A must-read to get started with TensorFlow: a Google AI intern's experience
By Jacob Buckman. Translated by Qiang Wang and Wu Ming. [Xinzhiyuan Introduction] The author of this article, Jacob, comes from the Google AI Residency program. He began a year-long research internship at Google in the summer of 2017 and, despite plenty of programming and machine learning experience, had never used TensorFlow before. This article is the practical TensorFlow tutorial Jacob wrote afterwards; as he puts it, it would have been nice if someone had told him all of this before he started learning TensorFlow.
Foreword: "My name is Jacob and I'm a scholar in Google's AIResidency program. When I entered the project in the summer of 2017, I had a lot of programming experience myself and a deep understanding of machine learning, but I had never worked with tensorFlow before. At the time, I thought I could master Tensorflow quickly with my own abilities, but I didn't realize how much of a stumble I would have in learning it. Even a few months after joining the project I was still occasionally confused about how to implement my new ideas with Tensorflow code.
Check it out! We've got a node containing the constant 2, courtesy of a function called tf.constant. When we print this variable, we see that it is a tf.Tensor object, which is a pointer to the node we just created, not the value itself. To emphasize the point, here is another example.
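For concreteness, here is a minimal sketch of this kind of example, assuming TensorFlow 1.x graph mode:

```python
import tensorflow as tf

two_node = tf.constant(2)
print(two_node)
# Tensor("Const:0", shape=(), dtype=int32) -- a pointer to the node, not the value 2

another_two_node = tf.constant(2)  # a second, separate node in the graph
two_node = tf.constant(2)          # a third node; the old one is merely orphaned
tf.constant(3)                     # a fourth node, never assigned to a Python variable

pointer_at_two_node = two_node     # no new node: only the Python pointer is copied
```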
Each time we call tf.constant, we create a new node in the graph. This is true even if the new node is functionally identical to an existing one, even if we reassign it to the same variable name, and even if we never assign it to a variable at all. In contrast, if we create a new variable and set it equal to an existing node, we simply copy the pointer to that node, and nothing is added to the graph.
Okay, let's take it a step further.
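A minimal sketch of that next step (again assuming TensorFlow 1.x):

```python
import tensorflow as tf

two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node   # the overloaded "+" adds an addition node to the graph
print(sum_node)
# Tensor("add:0", shape=(), dtype=int32) -- again a pointer, not the number 5
```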
Now we're getting somewhere - that's a real computation graph! Note that the + operation is overloaded in TensorFlow, so adding two tensors together also adds a node to the graph, even though it doesn't look like a TensorFlow operation. Okay, so two_node points to a node containing 2, three_node points to a node containing 3, and sum_node points to a node containing... 2 + 3? What's going on? Shouldn't it contain 5? As it turns out, no.
Bravo! We can also pass a list, sess.run([node1, node2, ...]), and have it return multiple outputs.
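A sketch of how a session is used to fetch values, including the list form (assuming TensorFlow 1.x; variable names are illustrative):

```python
import tensorflow as tf

two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node

sess = tf.Session()
print(sess.run(sum_node))              # 5
print(sess.run([two_node, sum_node]))  # [2, 5] -- several fetches in a single call
```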
In general, sess.run calls tend to be one of the biggest TensorFlow bottlenecks, so the fewer of them, the better. Whenever possible, return multiple items in a single sess.run call instead of making multiple calls.
Placeholders and feed_dict
The computations we've done so far have been boring: there is no way to feed them input, so they always output the same thing. A more practical use involves building a computation graph that takes an input, processes it in some (consistent) way, and returns an output.
This is a bad example, because it raises an exception. The placeholder expects to be given a value, but we didn't provide one, so TensorFlow crashes. To provide a value, we use the feed_dict argument of sess.run.
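A sketch of feeding a placeholder through feed_dict (assuming TensorFlow 1.x):

```python
import tensorflow as tf

input_placeholder = tf.placeholder(tf.int32)
sess = tf.Session()

# sess.run(input_placeholder) on its own raises InvalidArgumentError:
# the placeholder was never fed a value.
print(sess.run(input_placeholder, feed_dict={input_placeholder: 2}))  # 2
```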
Much better. Note the format of the value passed to feed_dict. Its keys should be the variables corresponding to placeholder nodes in the graph (which, as discussed above, really means pointers to the placeholder nodes in the graph). The corresponding values are the data to assign to each placeholder - usually a scalar or a NumPy array.
The third key abstraction: computation paths
Here is another example using placeholders.
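A sketch of the kind of placeholder example being referred to (assuming TensorFlow 1.x):

```python
import tensorflow as tf

input_placeholder = tf.placeholder(tf.int32)
three_node = tf.constant(3)
sum_node = input_placeholder + three_node

sess = tf.Session()
print(sess.run(three_node))  # 3 -- works fine
print(sess.run(sum_node))    # raises InvalidArgumentError: the placeholder was not fed
```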
Why does the second call to sess.run fail? We aren't even fetching input_placeholder, so why does it raise an error related to input_placeholder? The answer lies in the final key TensorFlow abstraction: the computation path. Luckily, this abstraction is very intuitive. When we call sess.run on a node that depends on other nodes in the graph, we need to compute the values of those nodes as well.
All three nodes need to be evaluated to compute the value of sum_node. Crucially, this includes our unfilled placeholder, which explains the exception! Now instead examine the computation path of three_node.
Thanks to the structure of the graph, we don't have to compute every node in order to evaluate the ones we want! Because we don't need to evaluate input_placeholder to evaluate three_node, running sess.run(three_node) does not raise an exception. The fact that TensorFlow automatically routes computation only through the necessary nodes is a huge advantage: it saves a lot of runtime when the computation graph is very large and has many unneeded nodes.
Another exception! When a variable node is first created, its value is essentially "null", and any attempt to evaluate it throws this exception. We can only evaluate a variable after assigning a value to it. There are two main ways to assign values to variables: initializers and tf.assign. Let's start with tf.assign.
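A sketch of a variable plus tf.assign (assuming TensorFlow 1.x; the names count_variable, zero_node, and assign_node follow the surrounding discussion):

```python
import tensorflow as tf

count_variable = tf.get_variable("count", [])     # a variable node; its value is still "null"
zero_node = tf.constant(0.)
assign_node = tf.assign(count_variable, zero_node)

sess = tf.Session()
# sess.run(count_variable) here would raise FailedPreconditionError (uninitialized value).
sess.run(assign_node)            # side effect: count_variable now holds 0
print(sess.run(count_variable))  # 0.0
```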
Compared to the nodes we've seen so far, tf.assign(target, value) has some unique properties. Identity operation: tf.assign(target, value) does no interesting computation of its own; it is always just equal to value. Side effects: when computation "flows through" assign_node, side effects happen to other things in the graph - in this case, the side effect is to replace the value of count_variable with the value stored in zero_node. Non-dependent edges: even though count_variable and assign_node are connected in the graph, neither depends on the other.
When computation flows through any node in the graph, the side effects controlled by that node also take effect. Thanks to the particular side effect of tf.assign, the memory associated with count_variable (which was previously "null") is now permanently set to 0. This means that the next time we call sess.run(count_variable), no exception is thrown; instead, we get 0. Next, let's look at initializers.
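A sketch of an initializer attached to a variable but never actually run (assuming TensorFlow 1.x; const_init_node follows the surrounding discussion):

```python
import tensorflow as tf

const_init_node = tf.constant_initializer(0.)
count_variable = tf.get_variable("count", [], initializer=const_init_node)

sess = tf.Session()
print(sess.run(count_variable))  # still raises FailedPreconditionError -- nothing was initialized
```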
What's going on here? Why doesn't the initializer work? The problem is the separation between the session and the graph. We've pointed the initializer property of get_variable at const_init_node, but that just adds a new connection between nodes in the graph. We haven't done anything about the cause of the exception: the memory associated with the variable node (which lives in the session, not the graph) is still "null". We need the session to tell const_init_node to actually update the variable.
To do this, we add another special node: init = tf.global_variables_initializer(). Like tf.assign, this is a node with side effects. Unlike tf.assign, we don't actually need to specify its inputs! tf.global_variables_initializer looks at the global graph at the moment it is created and automatically adds a dependency on every initializer in the graph.
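A sketch of the working version with tf.global_variables_initializer (assuming TensorFlow 1.x):

```python
import tensorflow as tf

const_init_node = tf.constant_initializer(0.)
count_variable = tf.get_variable("count", [], initializer=const_init_node)
init = tf.global_variables_initializer()  # a side-effect node depending on every initializer

sess = tf.Session()
sess.run(init)                   # computation flows through init, running all initializers
print(sess.run(count_variable))  # 0.0
```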
As you can see, the loss has essentially converged, and we end up with a good estimate of the true parameters. Only one or two lines of this code should be new to you: since you now have a solid understanding of TensorFlow's basic concepts, the rest should be easy to explain! The first line, optimizer = tf.train.GradientDescentOptimizer(1e-3), does not add a node to the graph. It just creates a Python object that contains some useful methods.
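For reference, a sketch of the kind of training script being discussed: a toy linear regression with hypothetical "true" parameters, assuming TensorFlow 1.x:

```python
import random
import tensorflow as tf

# Model parameters and placeholders for one (input, output) pair at a time.
m = tf.get_variable("m", [], initializer=tf.constant_initializer(0.))
b = tf.get_variable("b", [], initializer=tf.constant_initializer(0.))
input_placeholder = tf.placeholder(tf.float32)
output_placeholder = tf.placeholder(tf.float32)

y_guess = m * input_placeholder + b
loss = tf.square(output_placeholder - y_guess)

optimizer = tf.train.GradientDescentOptimizer(1e-3)  # just a Python object, no new nodes yet
train_op = optimizer.minimize(loss)                  # this call adds the gradient/update nodes

sess = tf.Session()
sess.run(tf.global_variables_initializer())

true_m, true_b = 1.0, 2.0                            # hypothetical "true" parameters
for _ in range(10000):
    x = random.random()
    y = true_m * x + true_b + random.gauss(0., 0.1)  # noisy sample from the true line
    _, cur_loss = sess.run([train_op, loss],
                           feed_dict={input_placeholder: x, output_placeholder: y})
print(sess.run([m, b]), cur_loss)                    # m and b should approach 1.0 and 2.0
```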
We see that the result is 5. But what if we want to inspect the intermediate values two_node and three_node? One way to inspect intermediate values is to add an extra fetch argument to sess.run pointing at each intermediate node to be examined, and then print those values after the call returns.
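A sketch of fetching intermediate values alongside the result (assuming TensorFlow 1.x):

```python
import tensorflow as tf

two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node

sess = tf.Session()
answer, inspection = sess.run([sum_node, [two_node, three_node]])
print(inspection)  # [2, 3]
print(answer)      # 5
```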
This usually works fine, but it can get a bit awkward as the code becomes more complex. A more convenient approach is to use a tf.Print statement. Confusingly, tf.Print is itself a TensorFlow node, with an output and side effects! It has two required arguments: a node to copy and a list of things to print. The "node to copy" can be any node in the graph; tf.Print is an identity operation with respect to the "node to copy", meaning it outputs an exact copy of its input. As a side effect, it prints every value in the list of things to print.
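A sketch of tf.Print in use (assuming TensorFlow 1.x):

```python
import tensorflow as tf

two_node = tf.constant(2)
three_node = tf.constant(3)
sum_node = two_node + three_node
# tf.Print returns a copy of its first argument; as a side effect it prints the
# listed values whenever this node is computed.
print_sum_node = tf.Print(sum_node, [two_node, three_node])

sess = tf.Session()
print(sess.run(print_sum_node))  # logs [2][3] (to stderr), then prints 5
```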
An important but somewhat subtle point about tf.Print: printing is really just a side effect. Like all other side effects, the printing only happens when computation flows through the tf.Print node. If the tf.Print node is not on the computation path, nothing gets printed. In particular, even if the original node that tf.Print is copying is on the computation path, the tf.Print node itself may not be. Watch out for this!
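A sketch of that pitfall (assuming TensorFlow 1.x): the tf.Print node is created but never ends up on the computation path, so nothing is logged:

```python
import tensorflow as tf

two_node = tf.constant(2)
# The copy is made here, but nothing downstream uses print_two_node...
print_two_node = tf.Print(two_node, [two_node])
three_node = tf.constant(3)
sum_node = two_node + three_node   # ...so the tf.Print node is off the computation path

sess = tf.Session()
print(sess.run(sum_node))          # 5, but nothing is logged by tf.Print
```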
There is a good resource here (https://wookayin.github.io/tensorflow-talk-debugging/#1) with more practical debugging advice.
Conclusion
I hope this article helps you better understand what TensorFlow is, how it works, and how to use it. The concepts presented here are fundamental to every TensorFlow program, but they only scratch the surface.