No.3 What is Data Operations (3): Mindset

Today's post is a summary of ways of thinking: the fourteen that matter most to me personally, shared here with you. Please forgive my crude hand-drawn sketches.

1.Signal and Noise (Reliability and Validity)

Reliability, put simply, means trustworthiness: when looking at data you will sometimes see points that differ sharply from the daily pattern, and reliability asks whether those points can be trusted. Validity is the degree to which a measurement instrument or tool accurately measures the thing it is meant to measure; it represents the accuracy of the data, and the more accurate, the more valid. In practice, though, a single-minded pursuit of validity can raise costs and lower efficiency. Reliability analysis isolates the noise from the signal, and validity analysis improves the signal's accuracy. This is the most important premise of data analysis.
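A minimal sketch of isolating noise in a signal: flag points that deviate sharply from the rest of a daily series. The data and the 3.5 cutoff are hypothetical; the median absolute deviation (MAD) is used instead of the mean so a single spike cannot mask itself.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Flag points that deviate sharply from the rest (likely noise).

    Uses the robust MAD statistic rather than mean/stdev, so one large
    spike does not inflate the spread it is measured against; 3.5 is a
    conventional cutoff for the modified z-score.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]

daily_dau = [1020, 998, 1011, 1005, 4900, 1003, 995]  # made-up daily data
print([v for v, f in zip(daily_dau, flag_outliers(daily_dau)) if f])  # [4900]
```

Whether the flagged spike is noise (a logging bug) or a real signal (a viral event) is exactly the reliability question the paragraph above raises.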

2.Time Series

Observing how a target changes over time is fairly basic thinking, but in practice, when the data changes and you can't remember what happened at the time, you need to keep a log. Three aspects of operations account for most data changes: holidays, social hot topics, and new feature releases.
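The log-keeping habit can be sketched as pairing large day-over-day changes with logged events. The dates, numbers, and the 20% change threshold here are all hypothetical:

```python
from datetime import date

# Hypothetical daily active users and an operations log of known events.
dau = {
    date(2018, 2, 14): 12000,
    date(2018, 2, 15): 21000,
    date(2018, 2, 16): 20500,
    date(2018, 2, 17): 12500,
}
event_log = {date(2018, 2, 15): "holiday: Chinese New Year"}

def explain_changes(series, log, pct=0.2):
    """Pair any day-over-day change above `pct` with a logged event, if any."""
    days = sorted(series)
    notes = []
    for prev, cur in zip(days, days[1:]):
        change = (series[cur] - series[prev]) / series[prev]
        if abs(change) >= pct:
            notes.append((cur, round(change, 2), log.get(cur, "no event logged")))
    return notes

for day, change, reason in explain_changes(dau, event_log):
    print(day, change, reason)
```

A change with "no event logged" next to it is precisely the gap in the log that this section warns about.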

3.Cohort Analysis

The English term is cohort analysis, translated as "contemporaneous cluster analysis" in Lean Analytics. The guiding idea is to partition the data into subsets that share the same characteristics and structure, divided by equal time periods, and then track how each subset changes over time. This type of analysis is common in retention work: when you analyze next-day retention, you are really splitting users into daily cohorts and watching how each cohort's retention changes day by day.
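The next-day-retention example above can be sketched as follows. The users and their active days are invented for illustration:

```python
# Hypothetical records: (user_id, signup_day, set of days the user was active).
users = [
    ("u1", 1, {1, 2, 3}),
    ("u2", 1, {1, 2}),
    ("u3", 1, {1}),
    ("u4", 2, {2, 3}),
    ("u5", 2, {2}),
]

def next_day_retention(users):
    """Group users into daily signup cohorts and compute day-1 retention."""
    cohorts = {}
    for _, day, active in users:
        total, retained = cohorts.get(day, (0, 0))
        cohorts[day] = (total + 1, retained + (day + 1 in active))
    return {day: retained / total
            for day, (total, retained) in sorted(cohorts.items())}

print(next_day_retention(users))  # cohort signup day -> next-day retention
```

Each cohort is fixed at signup time, so a retention change between the day-1 and day-2 cohorts points at something that changed between those days.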

Cohort analysis differs from conventional time-series analysis in that a time series tracks data changes along a continuous timeline, while cohorts compare data changes at a chosen time granularity.

4.Pipes and Funnels

Users have a chain of behaviors, and this chain can be seen as a pipeline: each module or page is a segment, joined node after node. Some nodes branch into different pipes, some leak, and some are badly clogged. We can measure the performance of different nodes with usage duration, bounce rate, and so on, and the measuring tool is the funnel. If a process-control engineer were operating and designing such a pipeline, he would need to know every valve and node. (Yes, this reminds me of my own profession.)
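A funnel over the pipeline nodes can be sketched as below. The step names and counts are hypothetical:

```python
# Hypothetical user counts at each node of a behaviour "pipeline".
funnel = [
    ("landing", 10000),
    ("signup", 4000),
    ("first_post", 1200),
    ("share", 300),
]

def funnel_rates(steps):
    """Per-step conversion (how leaky is this node) and overall conversion."""
    first = steps[0][1]
    rows = []
    for (_, prev_count), (name, count) in zip(steps, steps[1:]):
        rows.append((name, count / prev_count, count / first))
    return rows

for name, step_rate, overall in funnel_rates(funnel):
    print(f"{name}: step {step_rate:.0%}, overall {overall:.0%}")
```

The step rate identifies which node leaks worst; the overall rate shows how much of the original flow survives to each node.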

5.Classification and Matrix

Everyone classifies, but it needs to be done wisely. The basis for a good classification comes down to one sentence: core indicators should differ significantly between the classes. Take user segmentation as an example: tool users vs. community users, shallow-consumption vs. deep-consumption users, quality creators vs. ordinary users, and so on. The contrast in some core indicator between these groups is obvious, which lets us better analyze the behavioral differences between them.

A matrix is essentially a combination of classification bases: targets in different cells may look similar on one indicator yet differ significantly on others.
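A minimal sketch of a 2x2 user matrix combining two classification bases, visit frequency and session depth. The users and the cutoff values are hypothetical; in practice you would pick cuts where the core indicators clearly separate:

```python
def user_matrix(users, freq_cut=3, depth_cut=60):
    """Place each user in a 2x2 matrix: visit frequency x session depth.

    `freq_cut` (visits/week) and `depth_cut` (minutes/week) are assumed
    thresholds, not values from any real product.
    """
    grid = {}
    for name, visits, minutes in users:
        cell = ("high-freq" if visits >= freq_cut else "low-freq",
                "deep" if minutes >= depth_cut else "shallow")
        grid.setdefault(cell, []).append(name)
    return grid

users = [("a", 5, 90), ("b", 5, 10), ("c", 1, 80), ("d", 1, 5)]
print(user_matrix(users))
```

Users "b" and "c" are similar on neither axis alone but land in opposite off-diagonal cells, which is exactly the kind of difference a single classification would miss.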

Whether it is a classification or a matrix, the indicators may be quantitative or qualitative. Category and matrix thinking is most common when doing user segmentation.

6.Correlation and Causation

Correlation and causation thinking are widely applied, but correlation does not necessarily imply causation. If A and B are correlated, there are at least five possibilities: A causes B; B causes A; some C causes both A and B; A and B cause each other; or a small sample produced a coincidence.

The classic "beer and diapers" case is an instance of "C causes both A and B". A supermarket manager noticed that middle-aged men who came in to buy diapers would often buy beer as well; there seemed to be some relationship between the two. Without figuring out why, he quietly placed the beer next to the diapers, and sure enough the beer sold fast: it turned out the men sent out to buy diapers couldn't resist picking up a bottle of beer for themselves. Although this story is made up, it illustrates a critical piece of data-analysis thinking: different things can be highly correlated (like diapers and beer) without any direct causal relationship between them, because a common cause drives both.
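Measuring the correlation itself is straightforward; the judgment about causation is not. A sketch of the Pearson correlation coefficient, with invented weekly counts where a common cause (store traffic) drives both series:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Made-up weekly sales: both rise with overall store traffic (the common
# cause C), not because one causes the other.
diapers = [20, 25, 30, 35, 40]
beer    = [50, 60, 68, 80, 90]
print(round(pearson(diapers, beer), 2))
```

A coefficient near 1 here says only that the two series move together; which of the five possibilities explains it is a separate question.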


For any goal, many things are relevant, and different things have different degrees of influence, so the relevance of each influencing factor needs to be judged. Such judgments become difficult when quantification is impossible or costly. A classic example is how different departments claim credit when the numbers grow: new users are generally attributed to marketing, retention to product and development, and activity to operations.


When data changes, you might list every relevant segmentation dimension to find the exact cause, but that approach hurts analysis efficiency. To simplify, the different data can be folded into one formula with a few parameters; when a value in the formula changes, you can see how much it ultimately moves the target. Take the common chart-heat formula with a time gravity factor as an example: plays, likes, elapsed time, and the like are organized into a single formula to produce a ranking.
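A sketch of such a heat formula, in the spirit of the well-known Hacker News ranking: engagement divided by an age term raised to a gravity exponent. The engagement weights, the gravity value, and the items are all assumptions for illustration:

```python
def heat_score(plays, likes, hours_old, gravity=1.8):
    """Hypothetical chart-heat formula with a time gravity factor.

    Engagement is folded into one number, then decayed by age; the +2
    offset keeps brand-new items from dividing by a tiny number.
    """
    engagement = plays + 3 * likes  # assumed weight: one like ~ three plays
    return engagement / (hours_old + 2) ** gravity

items = [("old hit", 9000, 500, 48), ("new riser", 800, 120, 3)]
ranked = sorted(items, key=lambda i: heat_score(*i[1:]), reverse=True)
print([name for name, *_ in ranked])  # 'new riser' outranks 'old hit'
```

With everything in one formula, you can change any single input (plays, likes, age, gravity) and read off exactly how much the ranking moves, which is the simplification the paragraph describes.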

Take a dual vector foil from me! (A Three-Body Problem joke: the formula just performed a dimensional strike on the data.)

9.Closed Loop

One stretch of AARRR forms a closed loop, and in that case it is difficult to keep using the funnel to measure the effect of the whole loop. People generally smile bitterly when closed loops come up, because a closed-loop strategy is hard to get off the ground; still, you need closed-loop thinking when analyzing activities such as sharing.

10.Decomposition and Extension

Decomposition breaks a problem down into smaller dimensions, while extension takes the problem up to a larger dimension and then looks for similar problems to reference. During Chinese New Year the data drops, but it doesn't during other holidays; simply splitting our own data won't reveal why, so go up one dimension and look at the activity change across all apps.

Logic-tree thinking is helpful and widely used for breaking problems down, if only as a way to drill into them. Treat the known problem as the trunk, then think about which related problems or subtasks it involves. For each point that comes to mind, add a "branch" to the trunk and note which problem that branch represents; a large branch can sprout smaller branches, and so on, until all the related items are identified. The logic tree mainly helps you keep your thinking orderly, avoiding repetition and irrelevance.


When subdividing or splitting a problem, pay attention to granularity: the size of the smallest unit chosen when segmenting the target. In tag-based recommendation systems, for example, tag granularity is an important part of the algorithm: too coarse and recommendations lose accuracy; too fine and efficiency suffers. When tagging content, "good-looking" and "funny" can be separate tags, and for more precision "funny" can be split further into spoofs, sketches, and so on. There is no single proper granularity, only the granularity that fits the requirements.

12.MECE (Mutually Exclusive, Collectively Exhaustive)

MECE, similar in spirit to exhaustive enumeration, is a principle of problem decomposition: "mutually independent, collectively exhaustive". Whatever method you use to decompose a problem along different dimensions, whether fishbone diagrams, SWOT, or anything else, carry the MECE principle through to the end. Dimensions to consider when decomposing a problem include: time and space, components, elements, and logic.

13.Testing and Comparison

Analyzing a problem usually requires suitable tests. The most common are the A/B test and the MVT (multivariate test), which tests combinations of several elements at once on the same principle as the A/B test. During testing, pay attention to: stable marking of the test and control groups, reasonable traffic splitting, and controlled timing.
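Judging an A/B test result usually means asking whether the difference between groups is statistically significant. A sketch using the standard two-proportion z-test; the conversion counts are hypothetical:

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test: is B's conversion rate different
    from A's, beyond what random traffic splitting would produce?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)              # pooled rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))     # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p_value = two_proportion_z(200, 4000, 260, 4000)  # hypothetical groups
print(round(z, 2), round(p_value, 3))
```

A p-value below the conventional 0.05 suggests the difference is real; unstable group marking or uneven traffic splitting, the pitfalls listed above, would invalidate exactly this calculation.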

14.Data Visualization

Data visualization converts numbers into charts or other easily digested forms, because charts are faster and easier to understand, especially since different people differ in their ability to read raw data.
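Even without a charting library, the idea can be sketched with a text bar chart; the labels and numbers are invented:

```python
def bar_chart(data, width=30):
    """Render a quick text bar chart: a glance at bar lengths beats
    comparing raw numbers, which is the point of visualization."""
    top = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * max(1, round(value / top * width))
        lines.append(f"{label:>10} | {bar} {value}")
    return "\n".join(lines)

print(bar_chart({"new users": 1200, "retained": 780, "churned": 420}))
```

The relative bar lengths communicate the ratios instantly, which is all most readers of a report need.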

We all understand the principles, yet still can't live this life well (just kidding).

