The Recommender System You'll Love and Hate (Programmer's Edition)


The Love and Hate of Recommender Systems

As a programmer who has always been interested in recommender systems, I recently saw a user's tirade about an app supposedly listening in through the phone's microphone, and it reminded me just how much recommender systems are both loved and hated.

Recommendation systems appear in scenarios that almost all of us encounter in daily life. For example, as a basketball fan, if I search "kobe X basketball shoes" on Taobao, then some time later when I open Taobao again, the home page may recommend lots of basketball-shoe-related products. That is a fairly normal application scenario. Of course there may be other scenarios, such as the one the user above was ranting about: the phone's microphone being monitored so that the platform can learn the user's preferences and generate recommendations...

A good recommendation system inevitably requires accurate, as detailed as possible knowledge of the target user's preferences, and in doing so it sometimes inadvertently touches the user's privacy, which can provoke feelings of resistance.

On the other hand, a good recommendation system is also praised and loved by users. NetEase Cloud Music, for example, has a playlist recommendation feature that I believe is one of the main reasons many users choose this music app.

As for how to balance protecting user privacy with implementing recommendation features, I think it requires full interaction and trust between the platform and the user: the platform has an obligation to be transparent about what user information the application will collect, and the user has the right to protect the private information they do not want to disclose.

After all, a great recommendation system should create a win-win situation for both the user and the platform.

All of the above is just one programmer's humble opinion. Now let's get back to our day job and introduce the technology behind recommender systems!

Why you need a recommendation system

Today our generation is going through a shift from Information Technology (IT) to Data Technology (DT), and the most obvious sign of the DT era is information overload.

Luo Pang's 2017 New Year's Eve speech

In the DT era, flooded with a huge amount of information, how do we quickly help a specific user find the information they are interested in? There are two related techniques: search engines and recommender systems.

What is the difference between a search engine and a recommendation system? A search engine lets people find information (e.g., Baidu search); a recommender system lets information find people (e.g., Amazon's book recommendation list).

Unlike search engines, recommendation systems do not require users to describe their needs precisely; instead, they proactively provide information that matches users' interests and needs based on models of their historical behavior.

Amazon Store

From this it is clear that recommendation systems focus on proactively recommending information that may interest users whose needs are not yet explicit.

For example: how can consumers discover goods they will love without even looking for them; how can producers and platforms make their goods stand out, increase sales, and tap into the 'long tail' of products... Recommendation systems are designed to solve exactly these problems.

Simply put, personalized recommendation systems exist for consumers who are happy to spend 2 hours watching a movie they are interested in, but not 20 minutes picking one out.

What is a recommender system

By analyzing and mining user behavior, a recommendation system discovers the user's personalized needs and interests and recommends information or products that may interest them. A great recommendation system connects users, merchants, and the platform, and benefits all three.

Recommender system

Essentially, a recommendation system is the process of sorting all products according to a certain strategy for a specific user and then filtering out a number of products to recommend to the user.

2.1、The main methods of traditional recommender systems are:

1、Collaborative Filtering Recommendation (CF): this method collects and analyzes users' historical behaviors, activities, and preferences, calculates the similarity between a user and other users, and uses a weighted sum of similar users' ratings of a product to predict the target user's preference for that product (a formula sketch follows after this list). The advantage is that it can recommend products the user has never viewed; the disadvantages are the cold-start problem for new users without any behavior, and the sparsity problem when there is not enough user-item interaction data, which makes it hard for the model to find similar users.

2、Content-based Filtering Recommendation [1]: this method uses the content description of a product, abstracts meaningful features, and makes recommendations by computing the similarity between the user's interests and the product description (a small code sketch follows after this list). The advantage is that it is simple and direct: it does not depend on other users' ratings of the product, but measures product similarity through product attributes, and can therefore recommend products similar to those the user is already interested in. The disadvantage is that the same cold-start problem exists for new users without any behavior.

3、Hybrid Recommendation [2]: combines different inputs and techniques to make recommendations, compensating for the weaknesses of each individual technique.
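For item 1, the weighted prediction can be written out as follows (a sketch of one common form; the demo at the end of the article uses the simpler unnormalized sum). With S(u) the set of users most similar to the target user u, w_{u,v} the similarity between users u and v, and r_{v,i} user v's rating of item i:

\hat{r}_{u,i} = \frac{\sum_{v \in S(u)} w_{u,v} \, r_{v,i}}{\sum_{v \in S(u)} |w_{u,v}|}

For item 2, here is a minimal content-based sketch; the item names, attribute vectors, and the "average profile" rule are made up purely for illustration:

import numpy as np

# hypothetical item feature vectors built from content attributes
# columns: [basketball, running, shoes, book]
items = {
    "kobe_x_shoes":    np.array([1.0, 0.0, 1.0, 0.0]),
    "running_shoes":   np.array([0.0, 1.0, 1.0, 0.0]),
    "basketball_book": np.array([1.0, 0.0, 0.0, 1.0]),
}

def cosine(a, b):
    # cosine similarity between two feature vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# build a user profile as the average feature vector of the items the user liked
liked = ["kobe_x_shoes"]
profile = np.mean([items[name] for name in liked], axis=0)

# rank the remaining items by similarity between the user profile and the item description
candidates = [name for name in items if name not in liked]
print(sorted(candidates, key=lambda name: cosine(profile, items[name]), reverse=True))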

Collaborative filtering recommendations

The idea of collaborative-filtering-based recommendation is to discover users' preferences by mining their historical behavior data, group users by preference, and recommend items to users with similar tastes. Computing the recommendation results does not depend on any additional information about the items or the users; it relies only on users' ratings of items.

Data set composition

There are usually two methods.

1. Recommendation via similar users. Compare the similarity between users; the more similar two users are, the closer their tastes. This approach is called User-based Collaborative Filtering (UBCF, or UserCF);

2. Recommendation via similar items. Compare the similarity between items and recommend items similar to those the user has already rated. This approach is called Item-based Collaborative Filtering (IBCF, or ItemCF).

User-based: User-based Collaborative Filtering recommends to the user the products liked by other users with similar interests.

Item (product) based: Item-based Collaborative Filtering recommends products that are highly similar to the products the user preferred in the past.

At the heart of this algorithm is how to measure the similarity between users and users or between goods and goods.

There are various measures of similarity, such as Euclidean distance, the Pearson correlation coefficient, and cosine similarity.

Euclidean distance is a commonly used similarity measure, treating the Euclidean distance between samples as their degree of (dis)similarity; however, the scale of different features strongly affects Euclidean distance, whereas the Pearson correlation coefficient is not sensitive to scale.

Cosine similarity is one of the most commonly used measures for text similarity. Below we focus on cosine similarity.
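For two n-dimensional vectors x and y (rows of the rating matrix), the three measures mentioned above can be written as:

d_{euc}(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}

\rho(x, y) = \frac{\sum_{k} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k} (x_k - \bar{x})^2} \, \sqrt{\sum_{k} (y_k - \bar{y})^2}}

\cos(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|} = \frac{\sum_{k} x_k y_k}{\sqrt{\sum_{k} x_k^2} \, \sqrt{\sum_{k} y_k^2}}

The Pearson coefficient subtracts each vector's mean, which is why it is insensitive to the magnitude of the features; the cos_sim function in the demo at the end of the article implements the cosine form.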

3.1、The difference between user-based and item-based recommendation methods

1、UserCF: focuses on the hot items within a small group of similar users; it leans social and is generally suited to news recommendation.

Improvement: UserCF-IIF (which plays a role similar to TF-IDF). A drawback is that the number of users in real business is so large that it is difficult to explain the recommendation results.

2、ItemCF: focuses on personalization and reflects the continuity of the user's own interests. In addition, the item set cannot change too fast, because computing the item similarity matrix in real time is very time-consuming, which is why news sites generally do not use ItemCF.

ItemCF is used more often in real business, because recommendation results can be explained understandably in terms of the user's historical purchases.

Also, technically, UserCF needs to maintain a user similarity matrix, while ItemCF needs to maintain an item similarity matrix. In terms of storage, if there are many users, maintaining the user similarity matrix requires a lot of space; similarly, if there are many items, maintaining the item similarity matrix is costly.

From <<Recommender Systems>>
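A rough back-of-the-envelope comparison (the numbers are purely illustrative): the user similarity matrix has on the order of |U|^2 entries and the item similarity matrix on the order of |I|^2. For example,

|U| = 10^7 users  =>  |U|^2 = 10^{14} user-pair entries
|I| = 10^5 items  =>  |I|^2 = 10^{10} item-pair entries

so a platform with far more users than items usually finds ItemCF's matrix much cheaper to maintain, and the other way round when items dominate.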

The process of calculating ItemCF is divided into two main steps.

1. Calculate the similarity between items. [Normalizing the similarity matrix by its maximum value can improve the accuracy, coverage, and diversity of recommendations.]

2. Generate a recommendation list for this user based on item similarity and that user's historical behavior [sort].

[There is a demo of the Python implementation at the end of the article]
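Written out, the two steps above look like this (a sketch consistent with the demo at the end of the article): let w_{ij} be the similarity between items i and j, r_{u,j} user u's rating of item j, and N(u) the set of items u has already rated. The optional normalization in step 1 and the scoring in step 2 are:

w'_{ij} = \frac{w_{ij}}{\max_{j} w_{ij}}

p_{u,i} = \sum_{j \in N(u)} w_{ij} \, r_{u,j}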

Item similarity matrix calculation

Generating recommendation lists for specific users

Disadvantages of the algorithm.

This algorithm is relatively simple to implement, but can have certain problems in practice.

For example, some very popular items may be liked by many people, but there is little point in recommending such items to you, so the calculation needs to down-weight or remove them. The same goes for very generic items: reference books, laundry detergent, and the like are so universal that recommending them is also pointless. These are all 'dirty data' for a recommender system.

In addition, when a new user appears and we know nothing about their interests, how to make recommendations is an important issue. Usually at this point we simply recommend items with generally good feedback, meaning the recommendations are entirely based on the items themselves. Also, not all users rate many items; many users rate only a few, and how to handle users who reveal little about their interests is a major problem for recommendation systems.

Recommender systems in industry

Recommender systems are widely applied in industry, and related job openings are relatively plentiful; it is considered one of the more in-demand directions among machine-learning jobs. Having worked on recommender systems at two Internet companies, I have come to feel some of the differences between industry and academia. Here are some of my own impressions and insights.

1. Volume of data

Enterprise-level data generally starts at the gigabyte scale. It is hard to process such business data the way we handle small competition datasets; libraries such as Python's Pandas generally struggle with it. So most recommender systems are built on top of clusters: storage may be based on Hadoop HDFS, and the computing framework is usually Spark or a company's in-house data platform (on Alibaba's PAI platform the main task is just writing SQL... enviable). So the first step is to learn to use the Hadoop platform together with Spark; I now regret not learning these things properly while I was still in school.

Enterprise Recommender System
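As a small, hedged illustration of what "recommendation on a cluster" can look like, here is a sketch using Spark MLlib's ALS collaborative-filtering model; the HDFS path and column names are made up, and this is not any specific company's in-house platform:

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("cf-demo").getOrCreate()

# hypothetical ratings table stored on HDFS with columns userId, itemId, rating
ratings = spark.read.parquet("hdfs:///data/ratings.parquet")

# matrix-factorization-based collaborative filtering; hyperparameters are illustrative
als = ALS(rank=32, maxIter=10, regParam=0.1,
          userCol="userId", itemCol="itemId", ratingCol="rating",
          coldStartStrategy="drop")
model = als.fit(ratings)

# top-10 recommendations for every user
topk = model.recommendForAllUsers(10)
topk.show(5, truncate=False)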

2. Practical operational understanding

Different business scenarios require us to dig into the hidden information behind the actual business data. In large recommendation departments, recommendation teams are generally divided by business unit, and some further subdivide tasks within the team: for example, a dedicated base-platform group, people responsible for recall, and people responsible for ranking. The business logic also requires constant iteration; generally each engineer ships a new strategy roughly every week, iterating based on how well it performs online.

Meituan-Dianping

3、How to reasonably evaluate the effectiveness of the recommendation system?

Having taken part in a number of recommender-system data competitions, I have found that the platform typically provides an evaluation function, often some combination of common metrics such as accuracy and recall. In real business scenarios, however, it is difficult to give a single accurate evaluation function for the effectiveness of a recommender system. This touches on the dilemma of diversity versus accuracy in recommender systems.
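For reference, the first kind of metric usually looks roughly like this (a minimal sketch; competition platforms and companies each define their own weighted variants):

def precision_recall_at_k(recommended, relevant, k):
    """recommended: ranked list of recommended item ids; relevant: set of items the user actually liked."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# toy example: 2 of the top-3 recommendations hit the 3 relevant items
print(precision_recall_at_k(["a", "b", "c", "d"], {"b", "c", "e"}, k=3))  # (0.666..., 0.666...)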

If you want to recommend items a user will like, the "safest" way is to give them items that are particularly popular or highly rated: such items are more likely to be liked and, at worst, unlikely to be strongly disliked. However, such recommendations do not necessarily produce a good user experience, because the user probably already knows about the hot or popular products, so the information they receive is minimal, and they will not perceive the recommendation as "personalized".

In fact, McNee and others have warned that blind worship of accuracy metrics can hurt recommender systems, because it may leave users with "precise recommendations" that carry essentially zero new information, while their field of vision grows narrower and narrower. Narrowing the user's view is a major drawback of collaborative filtering algorithms, and it further aggravates the long-tail effect. At the same time, merchants applying personalized recommendation would also like more categories to appear in the recommendations, so as to stimulate new shopping needs.

Unfortunately, there is a tension between recommending diverse, novel items and the accuracy of the recommendations, because the former is risky: recommending something nobody has seen, or that has low ratings, may well be disliked, making the recommendations less effective. Very often it becomes a dilemma in which you can only sacrifice diversity to increase accuracy, or sacrifice accuracy to increase diversity. One possible solution is to post-process the recommendation list directly to enhance its diversity. While this method is certainly effective in practice, it has no theoretical basis or elegance to speak of and can only be considered a practical trick.
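"Post-processing the recommendation list" often means a greedy re-rank of the top candidates, in the spirit of maximal marginal relevance. A minimal sketch, where the trade-off weight lam and the toy inputs are purely illustrative:

def rerank_for_diversity(candidates, scores, sim, k, lam=0.7):
    """candidates: item ids ranked by predicted score; scores: item id -> predicted score;
    sim: sparse dict (id_a, id_b) -> similarity, missing pairs count as 0."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def value(item):
            # penalize an item by its similarity to what has already been picked
            max_sim = max((sim.get((item, s), 0.0) for s in selected), default=0.0)
            return lam * scores[item] - (1 - lam) * max_sim
        best = max(pool, key=value)
        selected.append(best)
        pool.remove(best)
    return selected

# toy usage: "a" and "b" are near-duplicates, so the re-ranked top 2 keeps only one of them
scores = {"a": 0.90, "b": 0.88, "c": 0.50}
sim = {("b", "a"): 0.95, ("c", "a"): 0.10}
print(rerank_for_diversity(["a", "b", "c"], scores, sim, k=2, lam=0.5))  # -> ['a', 'c']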

In general, we believe that a carefully tuned mixture of a high-accuracy algorithm and a high-diversity algorithm can improve both diversity and accuracy without sacrificing either. Unfortunately, there is still no clear interpretation or insight into why this works. The intricate relationship between diversity and accuracy, and the competition behind it, remains a tricky puzzle.

The article "A Review of Recommender System Evaluation" by Yushin Zhu and Linyuan Lu summarizes almost all the recommender system metrics that have ever appeared in the literature, which are based on the data itself and can be considered as the first level. In reality, it is the other two levels of evaluation that are more important when it comes to real world applications. The second level is the key performance metrics on the business application, such as conversion rate, purchase rate, unit price, number of categories purchased, etc., influenced by recommendations. The third level is the real user experience.

The vast majority of studies only address the first level of evaluation metrics, while industry really cares about the second level (e.g., exactly which metric, or combination of metrics, leads to higher spend per customer); the third level is the hardest, nobody truly knows it, and it can only be estimated from the second level. Therefore, establishing the relationship between first-level and second-level metrics becomes critical. Once this step is bridged, a large part of the barrier between theory and application is overcome.

Deep learning-based recommendation systems

In fact, the collaborative filtering methods described above are fairly traditional and still widely used in industry. Nowadays, with the rise of machine learning, many more techniques are being applied to recommender systems: on the traditional side, LR, GBDT, XGBoost, and LightGBM; on the deep-learning side, from the early use of word2vec to measure user similarity to CNN and RNN models that many recommendation teams are beginning to try.

Recommendation sorting technology changes at Lovecraft

Deep learning has an excellent ability to automatically extract features, learn multi-level abstract feature representations, and learn information about heterogeneous or cross-domain content, which can deal with the cold start problem of recommender systems to some extent.

YouTube video Fusion recommendation model

In the fusion recommendation model for movie recommendation:

1. First, user features and movie features are used as inputs to the neural network, where:

2. User characteristics incorporate information about four attributes, namely, user ID, gender, occupation and age.

3. The movie features incorporate three attributes: movie ID, movie genre ID, and movie title.

4. For the user features, the user ID is mapped to a vector representation of dimension 256 and fed into a fully connected layer, and the other three attributes are handled similarly. The feature representations of the four attributes are then each passed through a fully connected layer and summed.

5. For the movie features, the movie ID is processed in the same way as the user ID; the movie genre ID is fed directly into a fully connected layer as a vector; and the movie title is passed through a text convolutional neural network to obtain a fixed-length vector. The feature representations of the three attributes are then each passed through a fully connected layer and summed.

6. After obtaining the vector representations of the user and the movie, their cosine similarity is computed as the score of the recommendation system. Finally, the squared difference between this similarity score and the user's true rating is used as the loss function of this regression model (a small numerical sketch follows below).

Fusion recommendation model
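A tiny numpy sketch of step 6 above; the vectors are random stand-ins for the fused feature representations produced by the fully connected and text-CNN layers, and the dimension and rating scaling are purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
user_vec = rng.normal(size=200)   # stand-in for the fused user representation
movie_vec = rng.normal(size=200)  # stand-in for the fused movie representation

# cosine similarity between the two vectors is used as the predicted score
score = user_vec @ movie_vec / (np.linalg.norm(user_vec) * np.linalg.norm(movie_vec))

# the squared difference to the (scaled) true rating is the regression loss
true_rating = 4.0 / 5.0           # e.g. a 4-out-of-5 rating scaled into the score's range
loss = (score - true_rating) ** 2
print(score, loss)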

Nowadays in industry, the most basic setup uses collaborative filtering together with some other ranking method, such as GBDT, to cover the basic recommendation functions; deep-learning-based approaches are not yet applied that maturely. I hope my future work will require me to study in depth how to apply deep learning at scale in real recommendation scenarios; after all, I am still a recommender-system rookie. I have also long wanted to write about my knowledge and understanding of word2vec and its application in recommendation, which I will leave to a future article.

Addendum: a Python demo of collaborative filtering [to help understand the flow of the computation]

# coding:utf-8
from __future__ import division

import numpy as np
from math import log


# The first way to calculate similarity: cosine similarity between two row vectors
# [there are many ways to calculate similarity; here we use cosine similarity]
def cos_sim(x, y):
    """
    :param x (mat): row vector, can be a user or an item
    :param y (mat): row vector, can be a user or an item
    :return: the cosine similarity between x and y
    """
    # inner product of x and y
    inner_product = x * y.T
    norm = np.sqrt(x * x.T) * np.sqrt(y * y.T)
    # the cosine similarity
    return (inner_product / norm)[0, 0]


def similarity(data):
    """
    :param data: matrix
    :return: w (mat): similarity between any two rows; w is a symmetric matrix.
             By convention a row's similarity with itself is set to 0.
    """
    # number of users/items [the number of rows determines the dimension of the square matrix]
    m = np.shape(data)[0]
    # initialize the similarity matrix
    w = np.mat(np.zeros((m, m)))
    for i in range(m):
        for j in range(i, m):
            if j != i:
                # similarity between two rows [user-user or item-item]
                w[i, j] = cos_sim(data[i], data[j])
                w[j, i] = w[i, j]
            else:
                w[i, j] = 0
    return w

# The second way to calculate similarity: the log-likelihood ratio
def obtainK(a, b):
    # count co-occurrences of the two rating vectors:
    # k11: same non-zero rating, k22: both zero, k12/k21: only one is non-zero
    k11 = k12 = k21 = k22 = 0
    for i in range(len(a)):
        if a[i] == b[i] != 0: k11 += 1
        if a[i] == b[i] == 0: k22 += 1
        if a[i] != 0 and b[i] == 0: k12 += 1
        if a[i] == 0 and b[i] != 0: k21 += 1
    return k11, k12, k21, k22

def Entropy(*x):
    total = 0.0
    for i in x:
        total += i
    result = 0.0
    for j in x:
        # smooth zero counts so that log() is defined (j * log(...) would otherwise fail when j == 0)
        pinghua = 1 if j == 0 else 0
        result += j * log((j + pinghua) / total)
    return result

def loglikelihood(N, a, b):
    # log-likelihood ratio similarity between rating vectors a and b (not used in the demo below)
    k11, k12, k21, k22 = obtainK(a, b)
    rowEntropy = Entropy(k11, k12) + Entropy(k21, k22)
    colEntropy = Entropy(k11, k21) + Entropy(k12, k22)
    matEntropy = Entropy(k11, k12, k21, k22)
    sim = -2 * (matEntropy - colEntropy - rowEntropy)
    return sim

# User-based collaborative filtering
def user_based_recommend(data, w, user):
    """
    :param data (mat): user-item rating matrix
    :param w (mat): user similarity matrix
    :param user (int): user index
    :return: predict (list): recommendation list
    """
    # m is the number of users, n is the number of items
    m, n = np.shape(data)
    # this user's row: his item ratings
    user_product = data[user, ]
    print("Items this user has bought:", user_product, m, n)
    # e.g. user 0's item info [[4 3 0 5 0]] means only items 3 and 5 have not been bought
    # find the items the user has not rated -- these are the candidate recommendations
    not_score = []
    for i in range(n):
        if user_product[0, i] == 0:
            not_score.append(i)
    # predict a score for every item that has not been rated
    predict = {}
    for x in not_score:
        # all users' ratings of this item
        item = data[:, x]
        # iterate over every user's rating of the item
        # [the target user is included, but his self-similarity is 0, so he does not affect the weighted sum]
        for i in range(m):
            if item[i, 0] != 0:
                if x not in predict:
                    # similarity between user i and the target user * user i's rating of this item
                    predict[x] = w[user, i] * item[i, 0]
                else:
                    predict[x] = predict[x] + w[user, i] * item[i, 0]
    # sort by predicted score
    return sorted(predict.items(), key=lambda p: p[1], reverse=True)

# A concrete implementation of item-based collaborative filtering, as follows
def item_based_recommend(data, w, user):
    """
    :param data (mat): item-user rating matrix (the transpose of the user-item matrix)
    :param w (mat): item similarity matrix
    :param user (int): user index
    :return: predict (list): recommendation list
    """
    # the caller transposes the user-item matrix into an item-user matrix
    # data = data.T
    m, n = np.shape(data)  # m is the number of items, n is the number of users
    # this user's ratings of the items
    user_product = data[:, user].T
    # find the items the user has not rated [pick recommendations from the items he has not bought]
    not_score = []
    # iterate over this user's items and collect the ones without a rating
    for i in range(m):
        if user_product[0, i] == 0:
            not_score.append(i)
    # predict a score for every item that has not been rated
    predict = {}
    for x in not_score:
        # this user's ratings of the items
        item = user_product
        # iterate over all items
        for i in range(m):
            # the user has bought item i
            if item[0, i] != 0:
                if x not in predict:
                    # recommendation weight = similarity between item x and item i * the user's rating of item i
                    predict[x] = w[x, i] * item[0, i]
                else:
                    predict[x] = predict[x] + w[x, i] * item[0, i]
    # sort by predicted score
    return sorted(predict.items(), key=lambda p: p[1], reverse=True)

# 1. Data format: each row is one user's ratings of the items;
#    each column is different users' ratings of the same item;
#    a rating of 0 means the user has not bought / not rated the item.
'''
        item1, item2, item3, item4, item5
userA   [4,     3,     0,     5,     0],
userB   [5,     0,     4,     4,     0],
        [4,     0,     5,     0,     3],
        [2,     3,     0,     1,     0],
        [0,     4,     2,     0,     5]
'''

"""
1. UserCF, user-based collaborative filtering:
   First compute the user-user similarity.
   The items I that user u has not bought form the candidate recommendation set.
   For each candidate item I, iterate over all users U who have bought I and sum:
       similarity(U, u) * U's rating of I
   <i.e. use the ratings of all users who bought the candidate item, weighted by
    their similarity to this user, to obtain the candidate item's score.>
"""

# user-item rating matrix
User1 = np.mat([
    [4, 0, 0, 5, 1, 0, 0],
    [5, 0, 4, 4, 2, 1, 3],
    [4, 0, 5, 0, 2, 0, 2],
    [2, 3, 0, 1, 3, 1, 1],
    [0, 4, 2, 0, 1, 1, 4],
])
# print(User1)

# user similarity matrix: cosine similarity between any two users
w = similarity(np.mat(User1))
print("Similarity between users:\n", w)

# recommend items for user 0
predict = user_based_recommend(User1, w, 0)
print(predict)

"""
2. ItemCF, item-based collaborative filtering:
   First compute the item-item similarity.
   The items I that user u has not bought form the candidate recommendation set.
   For each candidate item I, iterate over all items J that user u has bought and sum:
       similarity(I, J) * u's rating of J
   <i.e. use only the items u has bought himself; weight his own ratings by the
    item-item similarity to obtain the candidate item's score.>
"""

# first transpose the user-item matrix into an item-user matrix
data = User1.T
print("ItemCF: item-user ratings:\n", data)
# then compute the similarity matrix between items
w = similarity(data)
print("Similarity between items:\n", w)
# recommend items for user 0
predict = item_based_recommend(data, w, 0)
print(predict)

