
Performance Fundamentals of Large-Scale Web Architecture Evolution


This article is compiled from the book "Technical Architecture for Large-Scale Websites: Core Principles and Case Studies". The book is something of an inner-strength manual: it offers no hands-on tutorials, but its solid grounding in fundamental theory matters a great deal. It takes a clear point of view, addresses its problem domains in a targeted and comprehensive way, and expands the breadth and depth of its topics, touching on every aspect of architectural design.

Reading this book may not enable you to master the art of dragon slaying for large website architecture design, but at least it will give you a comprehensive understanding of the methodology and mindset of website architecture.

This is the kind of book to read through once in full, then flip open occasionally during work and study to clear up confusion and deepen understanding.

Features of large software systems

Architectural elements of a large website

The evolution of large websites

Initial phase of site architecture

The initial stage is relatively simple: a single server can usually handle the entire website, which is then gradually optimized so it can evolve in a better direction.

Separation of application services and data services

As the website's business grows, a single server gradually fails to meet demand: more and more user traffic degrades performance, and more and more data exhausts storage space.

At this point the application and the data need to be separated. After the separation, the website uses three servers: an application server, a file server, and a database server.

Using Caching to Improve Website Performance

As the number of users gradually increases, excessive pressure on the database leads to access delays, which in turn hurts the performance of the entire site and compromises the user experience.

Website access characteristics follow the same twenty-eight law as the distribution of wealth in the real world: 80% of business access is concentrated on 20% of the data.

Caching this small portion of hot data in memory reduces database access pressure, speeds up data access across the whole site, and also improves the database's write performance, since fewer reads compete for its resources.

The caches used by websites can be divided into two types.

The local cache on the application server.

A local cache is somewhat faster to access, but the amount of data it can hold is limited by the application server's memory, and it may contend with the application itself for that memory.
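A local cache bounded by an eviction policy keeps memory use under control. The sketch below is a minimal LRU cache in Python; the class and method names are illustrative, not from the book.

```python
from collections import OrderedDict

class LocalCache:
    """A minimal in-process LRU cache; capacity bounds memory use."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)   # mark as recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```

In practice Python's built-in `functools.lru_cache` covers the same idea for function results; the explicit class simply makes the eviction mechanics visible.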

Remote caching on a dedicated distributed cache server.

A remote distributed cache can run as a cluster, deploying servers with large memory as dedicated cache servers, which in theory provides a caching service unconstrained by any single machine's memory capacity.
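For a cache cluster, each key must be routed to one of the cache servers. A common technique (not specific to this book) is consistent hashing, sketched below; the node names are hypothetical.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring mapping cache keys to cluster nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas          # virtual nodes per server
        self._ring = []                   # sorted list of (hash, node)
        for node in nodes:
            for i in range(self.replicas):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        h = self._hash(key)
        hashes = [point for point, _ in self._ring]
        idx = bisect.bisect(hashes, h) % len(self._ring)  # wrap around
        return self._ring[idx][1]
```

Virtual nodes (the `replicas` parameter) spread each physical server around the ring, so adding or removing a server remaps only a small fraction of keys.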

Improving concurrent processing of websites with application server clusters

A single application server can handle a limited number of request connections, and during peak site access times, the application server becomes the bottleneck for the entire site.

The use of clusters is a common means for websites to solve the problem of high concurrency and massive amounts of data.

With a load-balancing scheduler in front, access requests from users' browsers can be distributed to any server in the application server cluster. If there are more users, more application servers are added to the cluster, so that the load on the application servers no longer becomes a bottleneck for the entire site.
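The simplest scheduling policy a load balancer can use is round-robin. The sketch below shows that policy plus scaling out by adding a server; names are illustrative.

```python
class RoundRobinBalancer:
    """Distributes incoming requests across application servers in turn."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._i = 0

    def pick(self):
        server = self.servers[self._i % len(self.servers)]
        self._i += 1
        return server

    def add_server(self, server):
        # Scaling out: new capacity joins the rotation immediately.
        self.servers.append(server)
```

Real load balancers (hardware, LVS, nginx, and so on) add health checks, weights, and session affinity on top of this basic rotation.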

Database read/write separation

After caching is in place, the vast majority of data read operations can be completed without the database, but some reads (cache misses, expired entries) and all write operations still need to access it. Once the site reaches a certain scale, the database becomes the site's bottleneck because of its high load.

Through the master-slave hot-standby capability provided by the database, two databases can be configured in a master-slave relationship, with data updates on one server synchronized to the other. This achieves read/write separation and relieves the database's load pressure.

The master database is responsible for write operations and synchronizes data updates to the slave database through the master-slave replication mechanism, while the slave database is responsible for read operations.

Many cloud providers currently have similar offerings; of course, you can also build your own database cluster and implement read/write separation in your business code.
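Implementing read/write separation in business code usually comes down to a routing layer that inspects each statement. A minimal sketch, assuming one master and a round-robin pool of slaves (all names hypothetical):

```python
class ReadWriteRouter:
    """Routes writes to the master and spreads reads across slave replicas."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._i = 0

    def route(self, sql):
        verb = sql.lstrip().split()[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master          # all writes go to the master
        slave = self.slaves[self._i % len(self.slaves)]
        self._i += 1                     # round-robin across slaves
        return slave
```

A real deployment also has to account for replication lag: a read issued immediately after a write may need to go to the master to see its own update.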

Accelerate website response with reverse proxies and CDNs

Due to the complex network environment in China, access speed varies greatly for users in different regions, so the website needs to accelerate access. The main means are CDNs and reverse proxies.

The basic principle behind both CDNs and reverse proxies is caching. The difference is that CDNs are deployed in network providers' server rooms, so that users requesting services from the website can fetch data from the provider's server room nearest to them.

The reverse proxy, on the other hand, is deployed in the central server room of the website. When a user request reaches the central server room, the first server to be accessed is the reverse proxy server, and if the resources requested by the user are cached in the reverse proxy server, it will be returned directly to the user.
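The reverse proxy's hit-or-fetch logic described above can be sketched in a few lines. This is a toy model with a TTL-based cache, not any particular proxy's implementation; the `origin_fetch` callable stands in for forwarding to the application servers.

```python
import time

class ReverseProxyCache:
    """Serve cached responses while fresh; otherwise fetch from the origin."""

    def __init__(self, origin_fetch, ttl=60):
        self.origin_fetch = origin_fetch   # callable: path -> response body
        self.ttl = ttl                      # seconds a cached entry stays fresh
        self._cache = {}                    # path -> (stored_at, body)

    def handle(self, path):
        entry = self._cache.get(path)
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]                 # cache hit: return directly
        body = self.origin_fetch(path)      # cache miss: go to the backend
        self._cache[path] = (time.time(), body)
        return body
```

Production proxies such as nginx additionally respect `Cache-Control` headers and only cache responses the origin marks as cacheable.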

Use of distributed file systems and distributed database systems

No single server, however powerful, can meet the continuously growing business needs of a large website, so both the file system and the database system need to become distributed.

Distributed databases are the last resort for splitting a website's database, used only when single-table data reaches a very large scale. Before that point, the more common means of database splitting is by business: data for different business lines is deployed on different physical servers.
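The two splitting strategies above can be expressed as routing functions: first split by business line, and only shard a single huge table as a last resort. Server names and shard counts below are hypothetical.

```python
# Business-based split: each business line owns its own physical database.
BUSINESS_DB = {
    "users": "db-users.internal",
    "orders": "db-orders.internal",
    "products": "db-products.internal",
}

def db_for(business):
    """Pick the physical database server that owns a business line's data."""
    return BUSINESS_DB[business]

def shard_for(user_id, shard_count=4):
    """Last resort: horizontally split one huge table across shards by key."""
    return f"db-orders-shard-{user_id % shard_count}"
```

Modulo sharding is the simplest scheme; it makes resharding expensive, which is one reason this step is deferred for as long as possible.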

Using NoSQL and Search Engines

As website business becomes more complex, so do the requirements for data storage and retrieval, and websites need to adopt non-relational database technologies such as NoSQL and non-database query technologies such as search engines. Open source products abound: Redis, MongoDB, Solr, the Elastic Stack, Hadoop, Spark…

Business unbundling

To cope with increasingly complex business scenarios, the entire website is divided into different product lines using a divide-and-conquer approach. For example, a large shopping site splits its home page, shops, orders, buyers, sellers, and so on into separate product lines, each the responsibility of a different business team.

Based on these product-line divisions, the site is split into many different applications, each deployed and maintained independently. Applications relate to one another through hyperlinks, distribute data through message queues, and, most commonly, form a complete associated system by accessing the same data storage system.

Distributed services

Since each application system needs to perform many of the same business operations, such as user management, product management, etc., these common operations can be extracted and deployed independently.

These reusable services connect to the database and provide common business services, while each application system only needs to manage its own user interface, invoking the common business services through distributed service calls to complete specific business operations.
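The split between a shared common service and the applications that call it can be sketched in-process. In reality the call would go over an RPC framework; here a direct method call stands in for the remote invocation, and all class and method names are hypothetical.

```python
class UserService:
    """Reusable common service: owns user data and the database connection."""

    def __init__(self):
        self._users = {}   # stand-in for the service's own database

    def register(self, uid, name):
        self._users[uid] = name
        return uid

    def get_user(self, uid):
        return self._users.get(uid)

class ShopApp:
    """Application layer: manages the user interface and delegates
    user operations to the shared service instead of hitting the DB."""

    def __init__(self, user_service):
        self.user_service = user_service   # injected service client

    def show_profile(self, uid):
        name = self.user_service.get_user(uid)
        return f"Profile page for {name}" if name else "User not found"
```

The key point is that `ShopApp` never touches user storage directly; every application reuses the same service, so the logic (and its database) lives in exactly one place.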
