Meitu A/B Testing in Practice: The Meepo System
Thank you for reading the 8th article from the Meitu Data Technology Team. Follow us for the latest developments in Meitu's data technology.
On November 4, 2008, Barack Obama won the election to become the 44th President of the United States. His victory owed much to his personal charisma, but the role of his campaign publicity team should not be underestimated. On the presidential campaign page, his team used A/B experiments to find the best of 16 candidate designs, raising the conversion rate of the campaign's "change" page by 40.6%.
Figure 1, via www.niaogebiji.com/article-17605-1.html
Figure 2, via www.niaogebiji.com/article-17605-1.html
They paired each image or video in Figure 1 with each of the text buttons in Figure 2, forming 4 × 4 = 16 combinations. Each variant received a share of the traffic, and after an observation period the variant with the highest conversion rate was chosen and rolled out.
Back in real life, consider two scenarios. In the first, a product update skips A/B experiments and goes straight online with full traffic. After launch, team members often scramble for data to prove they were right; even when the evidence is far-fetched, an announcement that "the metrics have gone up again" is enough to draw applause. Bear in mind that, in the A/B experiment experience of Google, Facebook, and Microsoft, 90% of new designs perform worse than the version already online. In the second, you are lucky enough to have team members who are all active and thoughtful, each with their own proposal, and it is hard to tell which proposal is best.
In these cases, A/B experiments, which "let the results speak", can help resolve awkward team decision-making. The decisions they can support are broad: visual design, page layout, copywriting, recommendation algorithms, grayscale releases, and more. Based on these needs, and combined with Meitu's own operations, we built Meitu's A/B experiment system: Meepo.
/ Architecture and implementation of the Meepo system /
System architecture
Figure 3
Figure 3 shows the architecture of Meitu's Meepo system. The Meepo back office provides experiment policy configuration and presents data analysis results, while the Meepo policy server receives policies pushed from the back office and serves them to clients. The AB SDK communicates with the policy server and obtains the relevant policy for the upper-layer business logic to act on; the statistics SDK is responsible for reporting event-tracking behaviour data. That data goes through log collection and analysis and is then displayed in the Meepo back office.
It is worth noting that the AB SDK reports some device model information to the AB policy server, which returns the corresponding user policy according to its allocation algorithm. After processing, the SDK hands the policy to the upper-layer business, which executes the relevant code path for that policy and produces behaviour logs. These logs are reported through the statistics SDK; we retrieve the data from the data center and analyze it to reach a conclusion about each experimental version.
Layered model
With limited traffic, it is usually not possible to run several experiments at the same time. Analyzing multiple variables simultaneously on the same set of experimental subjects often leads to parameter coupling.
To avoid parameter coupling, we adopted a layered model, whose overall structure is shown in Figure 4:
Figure 4
Each layer in the layered model owns all of the traffic. Within a layer, the experiments share that 100% of traffic and are mutually exclusive with one another. In short: traffic is exclusive within a layer and reused across layers.
For example, if Experiment 1 takes up 40% of the traffic in a layer, then Experiment 2 in the same layer can use at most 60%, and so on.
Figure 5
When running multiple experiments at the same time, if you want the results to be as accurate as possible and need to ensure that the experiments do not interfere with each other, build them in the same layer. A given user will then enter only one of the experiments in that layer.
If Experiment 1 and Experiment 2 are placed in different layers, each can be assigned up to 100% of the traffic. In this case, the same user may enter both Experiment 1 and Experiment 2.
Figure 6
If more experimental traffic is required and you can ensure that the experiments do not interfere with each other, you can place them in separate layers; the same user may then enter several experiments on different layers.
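The layering rules above can be sketched in a few lines of Python. This is a minimal illustration, not Meepo's implementation: the 100-bucket granularity, the MD5 hash salted with the layer name, and the names `bucket`, `assign`, and `layers` are all assumptions made for the example.

```python
import hashlib

BUCKETS = 100  # assumption: each layer's traffic is split into 100 percentage buckets


def bucket(user_id: str, layer: str) -> int:
    """Map a user to a bucket in one layer.

    Salting the hash with the layer name (an assumed technique) makes a user's
    bucket in one layer independent of their bucket in another layer.
    """
    digest = hashlib.md5(f"{layer}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % BUCKETS


def assign(user_id: str, layers: dict) -> dict:
    """Return {layer name: experiment name or None} for one user.

    `layers` maps a layer name to a list of (experiment, percent) pairs.
    Experiments in the same layer occupy disjoint bucket ranges, so a user
    enters at most one experiment per layer.
    """
    result = {}
    for layer, experiments in layers.items():
        if sum(percent for _, percent in experiments) > 100:
            raise ValueError(f"layer {layer!r} allocates more than 100% of traffic")
        b = bucket(user_id, layer)
        chosen, start = None, 0
        for name, percent in experiments:
            if start <= b < start + percent:
                chosen = name
                break
            start += percent
        result[layer] = chosen
    return result


# Hypothetical configuration: two exclusive experiments in one layer,
# and a third experiment that reuses all traffic in another layer.
layers = {
    "layer_a": [("experiment_1", 40), ("experiment_2", 60)],
    "layer_b": [("experiment_3", 100)],
}
print(assign("user-42", layers))
```

In this sketch a user lands in exactly one of `experiment_1` or `experiment_2`, and independently also lands in `experiment_3`, which mirrors the "exclusive within a layer, reused across layers" rule.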
Allocation algorithm
We have several user identifiers, such as imei, idfa, and gid, that can be used to assign users to experiments. So how is traffic distributed among them? As Figure 7 shows, each user identifier is independent within the Meepo system: each has its own 10,000 traffic shares, the identifiers are mutually exclusive, and each identifier uses a different randomization algorithm.
Figure 7
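As a rough illustration of the idea, the sketch below maps each identifier into one of 10,000 slots. The source does not describe the actual randomization algorithms, so the per-identifier salts, the SHA-256 hash, and the names `SALTS`, `slot`, and `in_experiment` are assumptions standing in for them.

```python
import hashlib

SLOTS = 10000  # from the text: each identifier type has its own 10,000 traffic shares

# Hypothetical salts standing in for "a different randomization algorithm"
# per identifier type; the real algorithms are not described in the article.
SALTS = {"imei": "salt-imei", "idfa": "salt-idfa", "gid": "salt-gid"}


def slot(id_type: str, id_value: str) -> int:
    """Map an identifier value to a slot in [0, SLOTS) for its identifier type."""
    if id_type not in SALTS:
        raise KeyError(f"unknown identifier type: {id_type}")
    digest = hashlib.sha256(f"{SALTS[id_type]}:{id_value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SLOTS


def in_experiment(id_type: str, id_value: str, start: int, end: int) -> bool:
    """True if the identifier's slot falls in the half-open range [start, end)."""
    return start <= slot(id_type, id_value) < end


# An experiment holding slots 0..1999 of the gid space covers about 20% of gid users.
sample = [f"gid-{i}" for i in range(20000)]
share = sum(in_experiment("gid", g, 0, 2000) for g in sample) / len(sample)
print(f"observed share: {share:.3f}")
```

Because each identifier type has its own salt, the same raw value hashed as an imei and as a gid generally lands in different slots, which keeps the identifier spaces independent of each other.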
Data analysis
If the data looks normal over an initial short period, the experiment continues to run until its scheduled end time, after which the experimental data is analyzed and interpreted to make a decision. In general, an A/B experiment should run for at least 1-2 weeks to give a more accurate result. The three data analysis approaches are described below:
We mainly judge whether the experimental version performs better than the control (original) version on a given metric by looking at the change in that metric's mean value and the confidence interval of that change.
If both bounds of the confidence interval are positive, or both are negative, the result is statistically significant. If one bound is positive and the other is negative, so that the interval contains zero, the result is not statistically significant. The example in Figure 8 gives a feel for this.
Figure 8
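The confidence-interval check can be sketched as follows. The article does not say how Meepo computes its intervals, so this example assumes a normal approximation for the difference of two sample means; the function names and the sample data are illustrative only.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_confidence_interval(control, test, confidence=0.95):
    """Confidence interval for mean(test) - mean(control).

    Assumes a normal approximation (reasonable for large samples); this is an
    illustrative choice, not necessarily the method Meepo uses.
    """
    diff = mean(test) - mean(control)
    std_err = sqrt(variance(control) / len(control) + variance(test) / len(test))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * std_err, diff + z * std_err


def is_significant(interval):
    """Significant when both bounds share a sign, i.e. the interval excludes zero."""
    low, high = interval
    return low > 0 or high < 0


# Made-up per-user metric values, for illustration only.
control = [10.0, 11.0, 9.0, 10.0] * 25
test = [12.0, 13.0, 11.0, 12.0] * 25

interval = diff_confidence_interval(control, test)
print(interval, is_significant(interval))
```

Comparing a sample against itself gives an interval centered on zero, which `is_significant` reports as not significant.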
Null hypothesis: there is no difference in performance between the experimental and control versions.
Alternative hypothesis: the experimental and control versions perform significantly differently.
The p-value is the probability, assuming the null hypothesis is true, of obtaining sample data at least as extreme as the data actually observed. For example, suppose a hypothesis test on our sample data gives p = 0.04. This means that if the null hypothesis were true, we would have only a 4% probability of drawing a sample at least this extreme.
So is a p-value of 0.04 small enough to reject the null hypothesis? That is decided by comparing p with the significance level α, the accepted probability of a Type I error. The decision rule for hypothesis testing is:
If p ≤ α, the null hypothesis is rejected. If p > α, the null hypothesis cannot be rejected.
Figure 9
As the formula in Figure 9 shows, if α is 0.05 and p = 0.04, then under the null hypothesis a small-probability event would have occurred in this experiment. Since we take it that a small-probability event does not occur in a single trial, we conclude by contradiction that the null hypothesis does not hold.
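The decision rule is easy to express in code. The sketch below pairs it with a two-sided two-proportion z-test as one possible way to obtain p for a conversion-rate metric; the article does not state which test Meepo uses, so the test, the function names, and the counts are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates.

    Uses a pooled two-proportion z-test, an assumed choice for this example.
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation at all, so there is no evidence of a difference
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Decision rule from the text: reject the null hypothesis when p <= alpha."""
    return p_value <= alpha


# Made-up counts: 100/1000 conversions in control, 150/1000 in the experiment.
p = two_proportion_p_value(100, 1000, 150, 1000)
print(p, reject_null(p))
```

With the article's own numbers, `reject_null(0.04, alpha=0.05)` returns `True`, matching the conclusion that the null hypothesis is rejected.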
/ Meepo System Module Explanation /
Figure 10
Meepo back office
The A/B experiment back-office server generates policies from the back-office configuration and from dimension matching. In the back-office configuration, an administrator manually specifies which traffic layer an experiment enters, which rules out the effects of parameter coupling. Its functions fall into three main parts: experiment configuration, data analysis, and experiment operations, as shown in Figure 11.
Figure 11
SDK
The SDK module is divided into a client SDK and a server SDK. Their functional details are shown in Figure 12:
Figure 12
The client SDK and the server SDK differ in two main ways: how they fetch policies and what data they receive. For fetching, the client SDK checks at startup whether the app is being cold-launched and, if so, requests the A/B policy; the server SDK also starts a daemon thread at startup, which fetches the policy once a minute. For data, the client SDK receives the code of one specific experiment version, namely the version the device has been assigned to. The server SDK receives all of the application's experiments on the Meepo policy server together with each experiment's traffic-split information, so the server can assign version codes itself.
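The two fetching styles can be sketched as below. This is a simplified illustration under stated assumptions, not the Meepo SDK: `fetch` is a hypothetical callable standing in for the request to the policy server, and the class names `ServerPolicyCache` and `ClientPolicy` are invented for the example.

```python
import threading


class ServerPolicyCache:
    """Server-side style: a daemon thread refreshes the policy periodically."""

    def __init__(self, fetch, interval_seconds: float = 60.0):
        self._fetch = fetch  # hypothetical callable returning all experiments
        self._interval = interval_seconds
        self._lock = threading.Lock()
        self._policy = fetch()  # initial fetch at startup
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            latest = self._fetch()
            with self._lock:
                self._policy = latest

    def policy(self):
        with self._lock:
            return self._policy

    def stop(self):
        self._stop.set()
        self._thread.join()


class ClientPolicy:
    """Client-side style: request the policy only on a cold launch."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable returning the device's version code
        self._policy = None

    def on_app_start(self, cold_launch: bool):
        if cold_launch:
            self._policy = self._fetch()
        return self._policy
```

A server would create one `ServerPolicyCache` at startup and read `policy()` on each request, while a client would call `on_app_start` from its launch hook. Error handling for a failed fetch is left out of this sketch.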
/ Case studies /
Next, a case study gives a more concrete feel for the Meepo system. When Meitu's selfie app BeautyPlus changed the position of its app wall, it used the Meepo system to decide whether to place it on the left or the right. The experimental procedure is shown in Figure 13:
Figure 13
The experimental results are shown in Figure 14. The data for experimental version 2 is statistically reliable and shows a lift of 4 points, so it can be released as the new version.
Figure 14
/ Future outlook /
We are also continuing to optimize the Meepo system, and we are moving primarily in this direction.
(end)