High-concurrency risk control techniques demystified (Part 2)
How can new businesses be onboarded flexibly and efficiently?
Platformization
-Build a platform rather than a project - build a "Taobao" rather than a website that serves only a few businesses
-Abstract away from the specific business and generalize - if a kind of business need is likely to recur, modularize it, systematize it (e.g. a batch system), and build it into a platform capability
Dynamization
-Dynamic processes - the process for each business type can be adjusted at will, without changing any code
-Dynamic code - use groovy scripts to adjust online code dynamically without a release; besides the various flexible pre-built configurations, rules can be written as groovy scripts, and indicator functions can also be expressed in groovy, so not every change needs a release (see the sketch after this list)
-Dynamic configuration - configuration can be made dynamic using virtual tables: the structure of an arbitrary table is stored in a single unified table structure, which makes the configuration itself dynamic, somewhat like the document-oriented idea in NoSQL
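As a hedged illustration of the dynamic-code point above, here is a minimal Java sketch that evaluates a risk rule kept as a Groovy script string. The rule text, variable names, and class names are invented for illustration; in practice the script would be loaded from dynamic configuration and the compiled script cached.

```java
// A minimal sketch (rule text and variable names are illustrative): evaluate a risk rule kept
// as a Groovy script string, e.g. loaded from a config table, so changing it needs no release.
import groovy.lang.Binding;
import groovy.lang.GroovyShell;

public class GroovyRuleEngine {

    /** Evaluates a boolean rule script against the facts of one request. */
    public static boolean evaluate(String ruleScript, long orderAmount, long accountAgeDays) {
        Binding binding = new Binding();
        binding.setVariable("orderAmount", orderAmount);      // facts exposed to the rule
        binding.setVariable("accountAgeDays", accountAgeDays);
        GroovyShell shell = new GroovyShell(binding);
        return (Boolean) shell.evaluate(ruleScript);          // true means the rule is hit
    }

    public static void main(String[] args) {
        // In production the script text comes from dynamic configuration, not a hard-coded string,
        // and the compiled script should be cached instead of re-parsed on every call.
        String rule = "orderAmount > 10000 && accountAgeDays < 7";
        System.out.println(evaluate(rule, 20000, 3));         // true: new account, large order
    }
}
```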
How can I reduce response time to increase throughput?
Make good use of storage and caching
-Configuration data is loaded into local memory
-Frequently and repeatedly accessed data goes into redis
-Larger volumes of data with higher stability requirements go into hbase
-Detail data that needs to be searched quickly goes into es
-Data that needs decoupling and peak shaving goes into kafka
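To make the tiering above concrete, here is a minimal read-through sketch; the cache interfaces are stand-ins (real redis and hbase clients would be injected), and the names are illustrative rather than taken from the article.

```java
// A minimal read-through sketch of the storage tiers above (interfaces and names are
// illustrative): local memory first, then redis, then the slower backing store (e.g. hbase).
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TieredLookup {
    /** Stand-ins for a redis client and an hbase/DB reader; real clients would be injected. */
    public interface RemoteCache { String get(String key); void put(String key, String value); }
    public interface BackingStore { String load(String key); }

    private final ConcurrentMap<String, String> localCache = new ConcurrentHashMap<>();
    private final RemoteCache redis;
    private final BackingStore store;

    public TieredLookup(RemoteCache redis, BackingStore store) {
        this.redis = redis;
        this.store = store;
    }

    public String get(String key) {
        String value = localCache.get(key);            // tier 1: local memory, nanoseconds
        if (value != null) return value;
        value = redis.get(key);                        // tier 2: redis, sub-millisecond
        if (value == null) {
            value = store.load(key);                   // tier 3: hbase/DB, the slowest hop
            if (value != null) redis.put(key, value);  // backfill so the next read is cheap
        }
        if (value != null) localCache.put(key, value);
        return value;
    }
}
```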
As the following diagram shows, read latencies differ enormously across storage types; make good use of the various kinds of storage and rely on the time-consuming ones as little as possible.
The figure below shows a benchmark performance test of hbase. Don't overlook hbase: it can store and serve very large volumes of data while still responding in a very short time, which makes it a real performance boost for a risk control system. The most important accumulated data in the current risk control system is stored and accessed through hbase.
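As a hedged sketch of what rowkey-based access to accumulated data can look like (the table name, column family, and rowkey layout are assumptions, not the article's schema), a single Get or server-side increment on a well-designed rowkey stays at millisecond latency even on very large tables:

```java
// A minimal sketch (table/column names and rowkey layout are illustrative): read and update an
// accumulated counter in hbase with a single rowkey-based Increment/Get.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class AccumulatorStore {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("risk_accumulator"))) {

            byte[] rowKey = Bytes.toBytes("user123_pay_cnt_20240101"); // user + metric + day
            byte[] cf = Bytes.toBytes("d");
            byte[] col = Bytes.toBytes("cnt");

            table.incrementColumnValue(rowKey, cf, col, 1L);   // atomic server-side increment

            Result result = table.get(new Get(rowKey));        // single-row Get: millisecond-level
            byte[] raw = result.getValue(cf, col);
            System.out.println("accumulated count = " + (raw == null ? 0 : Bytes.toLong(raw)));
        }
    }
}
```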
Asynchronization
-At the system architecture level, make asynchronous whatever can be asynchronous, but avoid abusing asynchrony
Here is a practical example. During a load test, CPU sy and wa were both high, which roughly indicates too many threads and wasteful thread switching. On inspection, the three external calls made from asynchronous threads were not cheap, so those branch threads waited too long, leaving a large number of threads blocked on IO and switching frequently.
Based on the dynamic process configuration, the three external calls in the main flow were merged into one; sy and wa dropped sharply and the system was no longer overwhelmed. The two calls taken out of the main flow by the merge were decoupled through kafka and invoked from there afterwards (a producer sketch follows the figures below).
Single-machine TPS: 2,644.6 -> 3,079
Single-machine average response time: 149.3 -> 126.03
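To illustrate the kafka decoupling mentioned above, here is a minimal producer sketch; the topic name, broker address, and payload are assumptions. The main decision flow only publishes the request and returns; a separate consumer performs the merged-out external calls off the critical path.

```java
// A minimal sketch (topic/broker/payload are illustrative): publish the external-call request
// to kafka and return immediately; a downstream consumer performs the calls off the critical path.
import java.util.Properties;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

public class ExternalCallDecoupler {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // The main risk-decision thread only enqueues the request; it does not wait for the call.
            ProducerRecord<String, String> record =
                    new ProducerRecord<>("external-call-requests", "order-42", "{\"check\":\"device\"}");
            producer.send(record, (metadata, exception) -> {
                if (exception != null) exception.printStackTrace(); // alert/degrade instead of blocking
            });
        }
    }
}
```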
-Asynchronous log printing - log4j2 in all-async mode greatly improves throughput
The impact of logging on TPS can never be ignored: when printing of all logs was disabled as a test, system TPS shot straight up from 3,000 to 4,200.
But without logs you cannot operate and maintain an online system, and in a risk control system logs are an important troubleshooting tool. log4j2 exists precisely to print logs at high throughput: its all-async mode makes printing fully asynchronous, using the Disruptor internally to speed things up. As for why the Disruptor is fast, see the previous article, High-concurrency risk control techniques demystified (Part 1). A bootstrap sketch follows the figures below.
Single-machine TPS: 3,079 -> 3,686.1
Single-machine average response time: 126.03 -> 79.35
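To enable the all-async mode described above, the documented approach is to select log4j2's async logger context selector before any logger is created; the sketch below shows one way to do it from code. The log message is illustrative, and the LMAX disruptor jar must be on the classpath.

```java
// A minimal sketch: turn on log4j2 "all async" loggers by selecting the async context selector
// before any logger is created (equivalent to -DLog4jContextSelector=... on the JVM command line).
// The LMAX disruptor jar must be on the classpath for async loggers to work.
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class AsyncLoggingBootstrap {
    public static void main(String[] args) {
        System.setProperty("Log4jContextSelector",
                "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector");
        Logger log = LogManager.getLogger(AsyncLoggingBootstrap.class);
        // The caller only enqueues the event onto the disruptor ring buffer;
        // a background thread does the actual formatting and disk IO.
        log.info("risk decision finished, result=ACCEPT");
    }
}
```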
-Reduce the number of threads, and with it system CPU time, by making network calls asynchronous - use a netty-based client
To protect the throughput and execution time of the main thread, network calls often have to be made asynchronous. But some important asynchronous network calls still occupy a large number of threads in the thread pool; as the thread count climbs, sy stays high, which both wastes CPU and drags overall TPS down.
An NIO-based netty client is a great tool for solving this problem. In the figure below, the left side shows one thread waiting per connection, which ties up a large number of threads in waiting and drives sy and wa up; with a netty-based client, the connection threads are limited to a small number, and the callback business threads are also kept within a small range and stay busy instead of burning time in sy and wa.
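Below is a minimal sketch of such a client, not the author's actual implementation: a netty NIO client that confines connection handling to a small event-loop group and handles responses via callbacks instead of blocking one thread per outstanding call. The address, message, and handler are illustrative.

```java
// A minimal sketch (address, message, and handler are illustrative): a netty NIO client that
// keeps IO on a small event-loop group and handles responses in callbacks, instead of blocking
// one thread per outstanding external call.
import io.netty.bootstrap.Bootstrap;
import io.netty.channel.*;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;
import io.netty.handler.codec.string.StringDecoder;
import io.netty.handler.codec.string.StringEncoder;

public class AsyncRiskClient {
    public static void main(String[] args) throws Exception {
        EventLoopGroup group = new NioEventLoopGroup(2);   // a handful of IO threads is enough
        try {
            Bootstrap bootstrap = new Bootstrap()
                    .group(group)
                    .channel(NioSocketChannel.class)
                    .handler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            ch.pipeline().addLast(new StringEncoder(), new StringDecoder(),
                                    new SimpleChannelInboundHandler<String>() {
                                        @Override
                                        protected void channelRead0(ChannelHandlerContext ctx, String resp) {
                                            // Callback runs on the event loop; heavy work should be
                                            // handed to a small, bounded business thread pool.
                                            System.out.println("external service replied: " + resp);
                                        }
                                    });
                        }
                    });
            ChannelFuture future = bootstrap.connect("127.0.0.1", 9090).sync();
            future.channel().writeAndFlush("risk-query\n"); // non-blocking write, caller returns at once
            future.channel().closeFuture().sync();
        } finally {
            group.shutdownGracefully();
        }
    }
}
```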
-Use the thread pool well and keep the system stable
Thread pools are in fact an important means of keeping the system stable: the point is to keep resources within a controllable range rather than letting them grow without limit and overwhelm the machine.
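A minimal sketch of the kind of bounded pool this implies is below; the core/max sizes, queue length, and rejection policy are illustrative choices, not tuned values from the article.

```java
// A minimal sketch: a bounded thread pool that keeps resource usage in a controllable range.
// Pool sizes, queue length, and rejection policy are illustrative, not tuned values.
import java.util.concurrent.*;

public class BoundedPools {
    public static ExecutorService newBusinessPool() {
        return new ThreadPoolExecutor(
                16, 32,                                    // core / max threads: bounded, not unlimited
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1_000),           // bounded queue so the backlog cannot explode
                new ThreadPoolExecutor.CallerRunsPolicy()  // push back on the caller instead of dropping work
        );
    }
}
```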
How do you deal with big data?
Incremental thinking
-Problem: results must be computed from the raw table into a result table. Because the incremental result table cannot be reused, a full result table has to be recomputed every day, and the job looks enormous (a 182-hour computation).
-Solution: convert each required full computation into an incremental computation into a detail table, followed by a full computation from the detail table to the result table (the first run is actually slower, but each subsequent run takes only a few hours).
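The toy sketch below only illustrates the shape of the idea (table shapes and key naming are invented; the real job would run on a data platform, not in-process): each day only the increment is folded into a detail table, and the result table is then rebuilt from the much smaller detail table.

```java
// A minimal sketch of the incremental idea (illustrative only): fold each day's increment into a
// detail table, then build the result table from the detail table instead of from the raw data.
import java.util.HashMap;
import java.util.Map;

public class IncrementalRollup {
    // detailTable: key -> accumulated detail value, kept across days
    private final Map<String, Long> detailTable = new HashMap<>();

    /** Step 1: merge today's increment into the detail table (cost proportional to the increment). */
    public void applyDailyIncrement(Map<String, Long> dailyIncrement) {
        dailyIncrement.forEach((key, delta) -> detailTable.merge(key, delta, Long::sum));
    }

    /** Step 2: compute the result table from the (much smaller) detail table. */
    public Map<String, Long> buildResultTable() {
        return new HashMap<>(detailTable);
    }
}
```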
Large-volume association queries
-Problem: in association queries, a few simple associations often fan out into a huge amount of data. How do you handle that volume, and sort and page it for manual investigation?
-Solution: query es in multiple passes and cache paging information in redis. The algorithm is simple, but in practice many problems come up: ip associations often fan out into massive data sets, queries time out, and follow-up queries are even larger. Some tips: prune data early; impose business-level query restrictions, such as limiting query times; and batch the es queries so that a whole set of values can be passed in as query conditions at once (a sketch follows). The distributed top-K problem is the more interesting part; it is laid out in the es internals for anyone who wants to study it.
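Below is a minimal sketch of the batched es query idea; the index name, field names, page size, and the use of the high-level REST client are assumptions for illustration. A whole batch of associated values goes into one terms query, and the bounded page cursor would be cached in redis between passes.

```java
// A minimal sketch (index/field names and sizes are illustrative; the deprecated
// RestHighLevelClient is used for brevity): feed a whole batch of associated values into one
// es terms query instead of issuing one query per value, and keep the page size bounded.
import java.util.List;
import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortOrder;

public class AssociationQuery {
    public static void main(String[] args) throws Exception {
        try (RestHighLevelClient client =
                     new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")))) {

            List<String> associatedIps = List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"); // pruned beforehand

            SearchSourceBuilder source = new SearchSourceBuilder()
                    .query(QueryBuilders.termsQuery("ip", associatedIps)) // one query for the whole batch
                    .size(200)                                            // bounded page; cursor cached in redis
                    .sort("event_time", SortOrder.DESC);

            SearchResponse response = client.search(
                    new SearchRequest("risk_event_detail").source(source), RequestOptions.DEFAULT);

            for (SearchHit hit : response.getHits().getHits()) {
                System.out.println(hit.getSourceAsString());
            }
        }
    }
}
```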
Another approach is to store the relations directly as a graph, i.e. as vertices and edges, which makes the queries extremely simple; the graph database neo4j represents this conveniently. It has not been studied in depth because the data would need to be imported into yet another store; it is mentioned here for interested readers.
How do you maintain system stability?
Rate limiting
-If traffic becomes too heavy during a big promotion, you can rate-limit by business channel: a switch pushes a flag to turn the limiting on (see the sketch below).
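A minimal sketch of per-channel limiting behind a pushed switch flag is below; Guava's RateLimiter stands in for whatever limiter the platform actually uses, and the limit value and channel keys are illustrative.

```java
// A minimal sketch (limits and channel names are illustrative; Guava's RateLimiter is a stand-in):
// per-channel rate limiting behind a switch flag that can be pushed on during a big promotion.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import com.google.common.util.concurrent.RateLimiter;

public class ChannelRateLimiter {
    private volatile boolean limitSwitchOn = false;                      // flag pushed by the config center
    private final Map<String, RateLimiter> limiters = new ConcurrentHashMap<>();

    public void setLimitSwitch(boolean on) { this.limitSwitchOn = on; }

    /** Returns true if a request on this business channel may proceed. */
    public boolean tryPass(String channel) {
        if (!limitSwitchOn) return true;                                 // normal days: no limiting
        RateLimiter limiter =
                limiters.computeIfAbsent(channel, c -> RateLimiter.create(500.0)); // 500 req/s per channel
        return limiter.tryAcquire();                                     // shed excess traffic immediately
    }
}
```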
Degradation
-During peak periods, suspend some operations-side query requirements to lighten the load on the data systems; those queries are rescheduled to resume after midnight.
Contingency plan
-A contingency plan has to be prepared before every big promotion. The plan needs to cover a variety of extreme failure scenarios, so that if a failure does happen no one is caught fumbling.