Facebook's Applied Machine Learning Platform
Here we provide a brief analytical interpretation of the article Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective, published by the Applied Machine Learning group at Facebook. The article gives a good picture of Facebook's machine learning platform and of the products built on it.
The article's 17 authors are engineers and scientists from Facebook. They are likely only the core members of the team behind the platform, which suggests that Facebook's machine learning platform is a complex environment built through the collaboration of many people.
The article helps dispel a number of myths and misconceptions, and it gives everyone a chance to learn how a large Internet company builds a machine learning platform. It opens with a series of important observations.
Facebook applies machine learning in a great many scenarios. Computer vision is only a small part of the picture.
Facebook uses a very wide range of machine learning algorithms, including Support Vector Machines, Logistic Regression, GBDT, Multilayer Perceptrons, CNNs and RNNs.
Facebook's machine learning workloads run on both GPUs and CPUs. Training uses GPUs or CPUs depending on the need, but the vast majority of inference still runs on CPUs.
Facebook's machine learning infrastructure places heavy emphasis on distributed training.
The article lists some of Facebook's major machine learning applications. Besides the familiar News Feed, Ads and Search, these include some lesser-known ones: Sigma (Facebook's internal framework for anomaly detection), Lumos (apparently a tool for extracting embeddings and other information from images), Facer (Facebook's face recognition framework), Language Translation (as the name implies, a platform for translating between languages) and Speech Recognition (likewise, a platform for speech recognition). Machine learning is clearly already used very widely at Facebook.
So which models do these applications actually use? Facer uses SVMs. Sigma uses GBDT. Ads, News Feed and Sigma all use MLPs. Lumos and Facer use CNNs. Text Understanding, Translation and Speech Recognition use RNNs.
On the deep learning framework side, Facebook currently supports two frameworks, Caffe2 and PyTorch, which serve as the production environment and the research environment, respectively. The authors briefly explain why they keep the two environments separate: the two have different needs. Production requires stability and efficiency, while research requires flexibility and versatility. The authors also recognize the problems that running multiple deep learning frameworks can cause, so they mention an exchange format called Open Neural Network Exchange (ONNX), which is meant to make it faster and easier to move a model from one framework to the other.
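To make the idea of an exchange format concrete, here is a toy sketch in Python. It is not ONNX and uses no ONNX or PyTorch API: the classes `Dense` and `ReLU`, the functions `export_model`, `run_framework_a` and `run_framework_b`, and the JSON layout are all hypothetical. Only the operator names `Gemm` and `Relu` are borrowed from ONNX, and `Gemm` is simplified here. The point is the workflow: a model built in one "framework" is written out as a neutral graph of standard operators plus weights, and a second runtime that knows nothing about the first framework's classes can load and execute it.

```python
import json
from dataclasses import dataclass
from typing import List

# "Framework A": a hypothetical research-side framework built from layer objects.
@dataclass
class Dense:
    weights: List[List[float]]  # shape: [out_features][in_features]
    bias: List[float]

    def __call__(self, x: List[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]

class ReLU:
    def __call__(self, x: List[float]) -> List[float]:
        return [max(0.0, v) for v in x]

def run_framework_a(layers, x: List[float]) -> List[float]:
    for layer in layers:
        x = layer(x)
    return x

def export_model(layers) -> str:
    """Serialize Framework A layers into a framework-neutral JSON op graph."""
    nodes = []
    for layer in layers:
        if isinstance(layer, Dense):
            nodes.append({"op": "Gemm", "W": layer.weights, "B": layer.bias})
        elif isinstance(layer, ReLU):
            nodes.append({"op": "Relu"})
        else:
            raise TypeError(f"unsupported layer: {type(layer).__name__}")
    return json.dumps({"format_version": 1, "nodes": nodes})

# "Framework B": a hypothetical production runtime that only understands
# the neutral description, never Framework A's classes.
def run_framework_b(serialized: str, x: List[float]) -> List[float]:
    graph = json.loads(serialized)
    for node in graph["nodes"]:
        if node["op"] == "Gemm":
            # Simplified Gemm: y = W.x + B (no alpha/beta/transpose attributes).
            x = [sum(w * xi for w, xi in zip(row, x)) + b
                 for row, b in zip(node["W"], node["B"])]
        elif node["op"] == "Relu":
            x = [max(0.0, v) for v in x]
        else:
            raise ValueError(f"unknown op: {node['op']}")
    return x

model = [Dense([[1.0, -2.0], [0.5, 0.5]], [0.0, 1.0]), ReLU(), Dense([[1.0, 1.0]], [0.25])]
exported = export_model(model)
print(run_framework_a(model, [3.0, 1.0]), run_framework_b(exported, [3.0, 1.0]))
```

Because the two runtimes agree only on an operator vocabulary, neither needs to import the other; that decoupling is what lets a research framework and a production framework evolve independently.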
As for how often models are retrained, some applications, like News Feed, retrain daily, Search retrains hourly, and others retrain weekly or only every few months. On the inference side, the authors first point out that different applications are likely to require different inference architectures. They also note that the most accurate prediction is not always needed right away: sometimes a less accurate result can be shown to the user first, and a more accurate result can be computed and pushed to the user later.
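The "approximate first, refine later" idea can be sketched with Python's standard `concurrent.futures` module. This is a minimal sketch under stated assumptions, not Facebook's serving design: `fast_model`, `accurate_model`, `serve` and the `push` callback are hypothetical stand-ins for a cheap model, an expensive model and whatever channel delivers updates to the user.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List

# Hypothetical delivery channel: push(kind, scores) sends an update to the user.
Push = Callable[[str, List[float]], None]

def fast_model(item: str) -> float:
    # Hypothetical cheap model: a crude but quick heuristic score.
    return len(item) / 10.0

def accurate_model(item: str) -> float:
    # Hypothetical expensive model; in practice this might be a large network.
    return sum(ord(c) for c in item) / 1000.0

def serve(items: List[str], push: Push, pool: ThreadPoolExecutor) -> Future:
    """Push approximate scores immediately, then refined scores when ready."""
    # 1. Answer right away with the cheap model so the user sees something.
    push("approximate", [fast_model(i) for i in items])
    # 2. Run the expensive model in the background.
    future = pool.submit(lambda: [accurate_model(i) for i in items])
    # 3. When it finishes, push the better result as an update.
    future.add_done_callback(lambda f: push("refined", f.result()))
    return future

updates = []
with ThreadPoolExecutor(max_workers=1) as pool:
    serve(["ab", "abcd"], lambda kind, scores: updates.append((kind, scores)), pool)
# Leaving the "with" block waits for the worker thread, so both updates are recorded.
for kind, scores in updates:
    print(kind, scores)
```

The design choice being illustrated is simply that the cheap answer is never blocked on the expensive one; in a real system the second push would travel over a network connection rather than into a local list.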
The article contains many more details that deserve attention. All in all, if you are interested in how large Internet companies apply machine learning and want an overview of a platform's architecture, software and hardware, this article is a good read.