Commit 54e29389 authored by wnma3mz

update 9th

Parent af5d67f6
| Title | Summary |
| ------------------------------------------------------------ | ---- |
| [tensorflow](https://github.com/nicodjimenez/nicodjimenez.github.io/blob/master/_posts/2017-10-08-tensorflow.markdown) | |
| [The limitations of deep learning](https://blog.keras.io/the-limitations-of-deep-learning.html) | |
| [Demystifying the deep learning engine inside Alipay: xNN](https://mp.weixin.qq.com/s/ZuEi32ZBSjruvtyUimBgxQ?from=hackcv&hmsr=hackcv.com&utm_medium=hackcv.com&utm_source=hackcv.com) | |
# The limitations of deep learning
Original link: [The limitations of deep learning](https://blog.keras.io/the-limitations-of-deep-learning.html)
This post is adapted from Section 2 of Chapter 9 of my book, [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff) (Manning Publications).
[![Deep learning with Python](https://blog.keras.io/img/deep_learning_with_python_cover_thumbnail.png)](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff)
It is part of a series of two posts on the current limitations of deep learning, and its future.
This post is targeted at people who already have significant experience with deep learning (e.g. people who have read chapters 1 through 8 of the book). We assume a lot of pre-existing knowledge.
------
## Deep learning: the geometric view
The most surprising thing about deep learning is how simple it is. Ten years ago, no one expected that we would achieve such amazing results on machine perception problems by using simple parametric models trained with gradient descent. Now, it turns out that all you need is *sufficiently large* parametric models trained with gradient descent on *sufficiently many* examples. As Feynman once said about the universe, *"It's not complicated, it's just a lot of it"*.
In deep learning, everything is a vector, i.e. everything is a *point* in a *geometric space*. Model inputs (these could be text, images, etc.) and targets are first "vectorized", i.e. turned into some initial input vector space and target vector space. Each layer in a deep learning model operates one simple geometric transformation on the data that goes through it. Together, the chain of layers of the model forms one very complex geometric transformation, broken down into a series of simple ones. This complex transformation attempts to map the input space to the target space, one point at a time. This transformation is parametrized by the weights of the layers, which are iteratively updated based on how well the model is currently performing. A key characteristic of this geometric transformation is that it must be *differentiable*, which is required in order for us to be able to learn its parameters via gradient descent. Intuitively, this means that the geometric morphing from inputs to outputs must be smooth and continuous—a significant constraint.
The whole process of applying this complex geometric transformation to the input data can be visualized in 3D by imagining a person trying to uncrumple a paper ball: the crumpled paper ball is the manifold of the input data that the model starts with. Each movement operated by the person on the paper ball is similar to a simple geometric transformation operated by one layer. The full uncrumpling gesture sequence is the complex transformation of the entire model. Deep learning models are mathematical machines for uncrumpling complicated manifolds of high-dimensional data.
That's the magic of deep learning: turning meaning into vectors, into geometric spaces, then incrementally learning complex geometric transformations that map one space to another. All you need are spaces of sufficiently high dimensionality in order to capture the full scope of the relationships found in the original data.
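As a minimal illustration of this geometric picture (a toy sketch, not from the post; the data, layer sizes, and learning rate are arbitrary), here is a two-layer network written directly in numpy: two affine maps with a pointwise nonlinearity in between, trained by plain gradient descent to morph a 2-D input space onto a 1-D target space.

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy task: map 2-D inputs onto 1-D targets drawn from a smooth function.
X = rng.uniform(-1, 1, size=(256, 2))
y = np.sin(3.0 * X[:, :1]) * X[:, 1:]

# Parameters of the chain of simple transformations.
W1, b1 = 0.5 * rng.randn(2, 16), np.zeros(16)
W2, b2 = 0.5 * rng.randn(16, 1), np.zeros(1)
lr = 0.1

for step in range(5000):
    # Forward pass: each line is one simple, differentiable geometric transformation.
    h = np.tanh(X @ W1 + b1)       # affine map into a 16-D space, then a smooth "fold"
    pred = h @ W2 + b2             # affine map into the target space
    loss = np.mean((pred - y) ** 2)

    # Backward pass: the chain rule through the same transformations.
    g_pred = 2 * (pred - y) / len(X)
    g_W2, g_b2 = h.T @ g_pred, g_pred.sum(0)
    g_h = g_pred @ W2.T
    g_pre = g_h * (1 - h ** 2)     # derivative of tanh
    g_W1, g_b1 = X.T @ g_pre, g_pre.sum(0)

    # Gradient descent: nudge every weight against its gradient.
    W1 -= lr * g_W1; b1 -= lr * g_b1
    W2 -= lr * g_W2; b2 -= lr * g_b2

print("final mse:", loss)
```

Every line of the forward pass is one of the "simple geometric transformations" described above; stacking more such lines makes the overall morphing more intricate, but never less smooth.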
## The limitations of deep learning
The space of applications that can be implemented with this simple strategy is nearly infinite. And yet, many more applications are completely out of reach for current deep learning techniques—even given vast amounts of human-annotated data. Say, for instance, that you could assemble a dataset of hundreds of thousands—even millions—of English language descriptions of the features of a software product, as written by a product manager, as well as the corresponding source code developed by a team of engineers to meet these requirements. Even with this data, you could *not* train a deep learning model to simply read a product description and generate the appropriate codebase. That's just one example among many. In general, anything that requires reasoning—like programming, or applying the scientific method—long-term planning, and algorithmic-like data manipulation, is out of reach for deep learning models, no matter how much data you throw at them. Even learning a sorting algorithm with a deep neural network is tremendously difficult.
This is because a deep learning model is "just" *a chain of simple, continuous geometric transformations* mapping one vector space into another. All it can do is map one data manifold X into another manifold Y, assuming the existence of a learnable continuous transform from X to Y, and the availability of a *dense sampling* of X:Y to use as training data. So even though a deep learning model can be interpreted as a kind of program, inversely *most programs cannot be expressed as deep learning models*—for most tasks, either there exists no corresponding practically-sized deep neural network that solves the task, or even if there exists one, it may not be *learnable*, i.e. the corresponding geometric transform may be far too complex, or there may not be appropriate data available to learn it.
Scaling up current deep learning techniques by stacking more layers and using more training data can only superficially palliate some of these issues. It will not solve the more fundamental problem that deep learning models are very limited in what they can represent, and that most of the programs that one may wish to learn cannot be expressed as a continuous geometric morphing of a data manifold.
## The risk of anthropomorphizing machine learning models
One very real risk with contemporary AI is that of misinterpreting what deep learning models do, and overestimating their abilities. A fundamental feature of the human mind is our "theory of mind", our tendency to project intentions, beliefs and knowledge on the things around us. Drawing a smiley face on a rock suddenly makes it "happy"—in our minds. Applied to deep learning, this means that when we are able to somewhat successfully train a model to generate captions to describe pictures, for instance, we are led to believe that the model "understands" the contents of the pictures, as well as the captions it generates. We then proceed to be very surprised when any slight departure from the sort of images present in the training data causes the model to start generating completely absurd captions.
![Failure of a deep learning-based image captioning system.](https://blog.keras.io/img/limitations-of-dl/caption_fail.png)
In particular, this is highlighted by "adversarial examples", which are input samples to a deep learning network that are designed to trick the model into misclassifying them. You are already aware that it is possible to do gradient ascent in input space to generate inputs that maximize the activation of some convnet filter, for instance—this was the basis of the filter visualization technique we introduced in Chapter 5 (Note: of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff)), as well as the Deep Dream algorithm from Chapter 8. Similarly, through gradient ascent, one can slightly modify an image in order to maximize the class prediction for a given class. By taking a picture of a panda and adding to it a "gibbon" gradient, we can get a neural network to classify this panda as a gibbon. This evidences both the brittleness of these models, and the deep difference between the input-to-output mapping that they operate and our own human perception.
![An adversarial example: imperceptible changes in an image can upend a model's classification of the image.](https://blog.keras.io/img/limitations-of-dl/adversarial_example.png)
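The panda-to-gibbon trick amounts to a few lines of gradient ascent in input space. Here is a hedged Pytorch sketch (the pretrained classifier, the random stand-in image, and the target class index are placeholders chosen for illustration, not material from the post):

```python
import torch
from torchvision import models

model = models.resnet18(pretrained=True).eval()   # any differentiable classifier will do
image = torch.rand(1, 3, 224, 224)                # stand-in for the "panda" photo
image.requires_grad_(True)

logits = model(image)
target_class = 368                # hypothetical index for the "gibbon" class
score = logits[0, target_class]
score.backward()                  # gradient of the target-class score w.r.t. the pixels

# One small gradient-ascent step in input space (FGSM-style perturbation).
epsilon = 0.007
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("before:", logits.argmax(dim=1).item())
print("after: ", model(adversarial).argmax(dim=1).item())
```

The perturbation is bounded by `epsilon` per pixel, which is why an attack like this can be imperceptible to a human while still shifting the model's prediction.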
In short, deep learning models do not have any understanding of their input, at least not in any human sense. Our own understanding of images, sounds, and language, is grounded in our sensorimotor experience as humans—as embodied earthly creatures. Machine learning models have no access to such experiences and thus cannot "understand" their inputs in any human-relatable way. By annotating large numbers of training examples to feed into our models, we get them to learn a geometric transform that maps data to human concepts on this specific set of examples, but this mapping is just a simplistic sketch of the original model in our minds, the one developed from our experience as embodied agents—it is like a dim image in a mirror.
![Current machine learning models: like a dim image in a mirror.](https://blog.keras.io/img/limitations-of-dl/ml_model.png)
As a machine learning practitioner, always be mindful of this, and never fall into the trap of believing that neural networks understand the task they perform—they don't, at least not in a way that would make sense to us. They were trained on a different, far narrower task than the one we wanted to teach them: that of merely mapping training inputs to training targets, point by point. Show them anything that deviates from their training data, and they will break in the most absurd ways.
## Local generalization versus extreme generalization
There just seem to be fundamental differences between the straightforward geometric morphing from input to output that deep learning models do, and the way that humans think and learn. It isn't just the fact that humans learn by themselves from embodied experience instead of being presented with explicit training examples. Aside from the different learning processes, there is a fundamental difference in the nature of the underlying representations.
Humans are capable of far more than mapping immediate stimuli to immediate responses, like a deep net, or maybe an insect, would do. They maintain complex, *abstract models* of their current situation, of themselves, of other people, and can use these models to anticipate different possible futures and perform long-term planning. They are capable of merging together known concepts to represent something they have never experienced before—like picturing a horse wearing jeans, for instance, or imagining what they would do if they won the lottery. This ability to handle hypotheticals, to expand our mental model space far beyond what we can experience directly, in a word, to perform *abstraction* and *reasoning*, is arguably the defining characteristic of human cognition. I call it "extreme generalization": an ability to adapt to novel, never experienced before situations, using very little data or even no new data at all.
This stands in sharp contrast with what deep nets do, which I would call "local generalization": the mapping from inputs to outputs performed by deep nets quickly stops making sense if new inputs differ even slightly from what they saw at training time. Consider, for instance, the problem of learning the appropriate launch parameters to get a rocket to land on the moon. If you were to use a deep net for this task, whether trained using supervised learning or reinforcement learning, you would need to feed it with thousands or even millions of launch trials, i.e. you would need to expose it to a *dense sampling* of the input space, in order to learn a reliable mapping from input space to output space. By contrast, humans can use their power of abstraction to come up with physical models—rocket science—and derive an *exact* solution that will get the rocket on the moon in just one or a few trials. Similarly, if you developed a deep net controlling a human body, and wanted it to learn to safely navigate a city without getting hit by cars, the net would have to die many thousands of times in various situations until it could infer that cars are dangerous, and develop appropriate avoidance behaviors. Dropped into a new city, the net would have to relearn most of what it knows. On the other hand, humans are able to learn safe behaviors without having to die even once—again, thanks to their power of abstract modeling of hypothetical situations.
![Local generalization vs. extreme generalization.](https://blog.keras.io/img/limitations-of-dl/local_vs_extreme_generalization.png)
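The same point shows up even on a toy regression task: a network trained on a dense sampling of the interval [0, 1] to approximate y = 3x fits that interval well, but its predictions stop making sense as soon as the input leaves the sampled region. A small Pytorch sketch (architecture and hyperparameters chosen only for illustration):

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(3000):
    x = torch.rand(256, 1)                      # dense sampling of [0, 1] only
    loss = ((net(x) - 3 * x) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    for v in (0.5, 2.0, 10.0):
        x = torch.tensor([[v]])
        print(f"x = {v:5.1f}   net(x) = {net(x).item():8.2f}   3x = {3 * v:8.2f}")
```

Inside [0, 1] the fit is tight; far outside it, the tanh units saturate and the output drifts to a constant nowhere near 3x. A person who noticed the rule "multiply by three" would apply it at x = 10 just as easily as at x = 0.5.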
In short, despite our progress on machine perception, we are still very far from human-level AI: our models can only perform *local generalization*, adapting to new situations that must stay very close to past data, while human cognition is capable of *extreme generalization*, quickly adapting to radically novel situations, or planning for very long-term future situations.
## Take-aways
Here's what you should remember: the only real success of deep learning so far has been the ability to map space X to space Y using a continuous geometric transform, given large amounts of human-annotated data. Doing this well is a game-changer for essentially every industry, but it is still a very long way from human-level AI.
To lift some of these limitations and start competing with human brains, we need to move away from straightforward input-to-output mappings, and on to *reasoning* and *abstraction*. A likely appropriate substrate for abstract modeling of various situations and concepts is that of computer programs. We have said before (Note: in [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff)) that machine learning models could be defined as "learnable programs"; currently we can only learn programs that belong to a very narrow and specific subset of all possible programs. But what if we could learn *any* program, in a modular and reusable way? Let's see in the next post what the road ahead may look like.
You can read the second part here: [The future of deep learning](https://blog.keras.io/the-future-of-deep-learning.html).
*@fchollet, May 2017*
# **tensorflow**
Original link: [tensorflow](https://github.com/nicodjimenez/nicodjimenez.github.io/blob/master/_posts/2017-10-08-tensorflow.markdown)
# Introduction
Every few months I enter the following query into Google: “Tensorflow sucks” or “f*** Tensorflow”, hoping to find like-minded folk on the internet. Unfortunately, although Tensorflow has been around for about two years, I still cannot find a bashing of Tensorflow that leaves me fully satisfied.
Although I suppose it’s possible I might be asking the wrong search engine, I think there’s a different force at work here: Google envy. The phenomenon known as “Google deep envy” is the following set of assumptions made by engineers across the world:
- People who work at Google are more intelligent and competent than yourself
- If you learn Tensorflow you could get a deep learning job at Google! (keep deep dreaming young fellow)
- If your mediocre startup uses Tensorflow and you blog about its virtues maybe Google will want to buy it
- If you don’t “get” Tensorflow's unintuitive design, you’re just dumb
Let's leave our assumptions behind us for now and give Tensorflow an honest look.
When Tensorflow first came out, we were promised an end to the endless nightmare of poorly designed or poorly maintained deep learning frameworks (e.g. <https://github.com/BVLC/caffe/issues>). What we got instead was the deep learning framework equivalent of Java (write once, run everywhere), but less fun to work with, and with a purely declarative paradigm. Yuck.
Where did things go wrong? In trying to build a tool to satisfy everyone’s needs, it seems that Google built a product that does a so-so job of satisfying anyone's needs.
For researchers, Tensorflow is hard to learn and hard to use. Research is all about flexibility, and lack of flexibility is baked into Tensorflow at a deep level.
Want to extract the values of intermediate layers of a neural net? You’ll need to define a graph, and then execute it with the data passed in as a dictionary, and oh don’t forget to add the intermediate layers as outputs of the graph, or else you won’t be able to retrieve their values. Ok, that hurt, but it’s doable.
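For reference, the dance described above looks roughly like this in the TF 1.x graph-mode API (a sketch with made-up layer shapes, not code taken from the article):

```python
import numpy as np
import tensorflow as tf

# Build the graph first, keeping a handle on the intermediate tensor.
x = tf.placeholder(tf.float32, shape=[None, 4])
W1 = tf.Variable(tf.random_normal([4, 8]))
hidden = tf.nn.relu(tf.matmul(x, W1))        # the intermediate layer we care about
W2 = tf.Variable(tf.random_normal([8, 2]))
output = tf.matmul(hidden, W2)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # The intermediate tensor has to be listed among the fetches, and the input
    # fed through a dictionary; otherwise its value is simply not retrievable.
    hidden_val, output_val = sess.run(
        [hidden, output], feed_dict={x: np.random.rand(3, 4).astype(np.float32)})
    print(hidden_val.shape, output_val.shape)  # (3, 8) (3, 2)
```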
Want to execute layers conditionally, such as an RNN that stops whenever an end-of-sentence (EOS) token is produced? Someone using Pytorch will be on their 3rd failed AI startup by the time you're done with that.
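The reason the Pytorch crowd finishes so quickly is that this kind of conditional execution is just a Python loop with a `break`. A toy sketch (untrained GRU cell and an arbitrary vocabulary, purely illustrative):

```python
import torch

VOCAB, EOS, MAX_LEN = 32, 0, 50
embed = torch.nn.Embedding(VOCAB, 16)
cell = torch.nn.GRUCell(16, 32)
to_logits = torch.nn.Linear(32, VOCAB)

token = torch.tensor([1])                 # arbitrary start token
h = torch.zeros(1, 32)
generated = []
for _ in range(MAX_LEN):
    h = cell(embed(token), h)             # one RNN step: ordinary Python control flow
    token = to_logits(h).argmax(dim=-1)   # greedy choice of the next token
    if token.item() == EOS:               # stop the moment EOS is produced
        break
    generated.append(token.item())
print(generated)
```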
For machine learning practitioners such as myself, Tensorflow is not a great choice either. The declarative nature of the framework makes debugging much more difficult. The advantage of being able to run models on Android or iOS looks great until you see how big the framework binaries are (20MB+), or you try to look at the nearly non-existent C++ documentation, or you want to do any kind of conditional network execution, which is super useful in low resource situations such as mobile.
# Comparisons with other frameworks
It is true that the developers of Tensorflow are deep learning superstars. However, the original developer of Tensorflow that is probably most widely known and respected, Yangqing Jia, has recently left Google to join Facebook, where his Caffe2 project is quietly picking up steam (<https://github.com/caffe2/caffe2/graphs/contributors>, <https://github.com/caffe2/caffe2/issues>). Unlike Tensorflow, Caffe2 allows the user to execute a layer on a piece of data in one line of code. Radical!
In addition, Pytorch is quickly gaining popularity amongst top AI researchers. Torch users, although nursing RSI injuries from writing Lua code to perform simple string operations, simply aren't deserting in droves to Tensorflow -- they are switching to Pytorch. It appears that Tensorflow is just not good enough for top AI labs. Sorry, Google.
The most interesting question to me is why Google chose a purely declarative paradigm for Tensorflow in spite of the obvious downsides of this approach. Did they feel that encapsulating all the computation in a single computation graph would simplify executing models on their TPUs so they can cut Nvidia out of the millions of dollars to be made from cloud hosting of deep learning powered applications? It's difficult to say. Overall, Tensorflow does not feel like a pure open source project for the common good. Which I would have no problem with, had their design been sound. In comparison with beautiful Google open source projects out there such as Protobuf, Golang, and Kubernetes, Tensorflow falls dramatically short.
While the declarative paradigm is great for UI programming, there are many reasons why it is a problematic choice for deep learning.
Take the React Javascript library as an example, the standard choice today for interactive web applications. In React, it makes sense for the complexity of how data flows through the application to be hidden from the developer, since Javascript execution is generally orders of magnitude faster than updates to the DOM. React developers don't want to worry about the mechanics of how state is propagated, so long as the end user experience is "good enough".
On the other hand, in deep learning, a single layer can literally execute billions of FLOPs! And deep learning researchers care very much about the mechanics of how computation is done and want fine control, because they are constantly pushing the edge of what's possible (e.g. dynamic networks) and want easy access to intermediate results.
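The "billions of FLOPs" figure is easy to sanity-check with back-of-the-envelope arithmetic; here is the count for one hypothetical mid-network 3x3 convolution (shapes chosen only for illustration):

```python
# One 3x3 convolution: 256 input channels, 256 filters, 56x56 output,
# counting one multiply plus one add per multiply-accumulate.
in_ch, out_ch, k, h, w = 256, 256, 3, 56, 56
macs = k * k * in_ch * out_ch * h * w
print(f"{2 * macs / 1e9:.1f} GFLOPs for a single layer")   # ~3.7 GFLOPs
```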
# A concrete example
Let's look at a simple example of training a model to multiply its input by 3.
First, let's look at the Tensorflow example:
```python
# Declarative style: first build a static graph...
import tensorflow as tf
import numpy as np

X = tf.placeholder("float")
Y = tf.placeholder("float")
W = tf.Variable(np.random.random(), name="weight")

pred = tf.multiply(X, W)
cost = tf.reduce_sum(tf.pow(pred - Y, 2))
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
init = tf.global_variables_initializer()

# ...then execute it inside a session, feeding data through a dictionary.
with tf.Session() as sess:
    sess.run(init)
    for t in range(10000):
        x = np.array(np.random.random()).reshape((1, 1, 1, 1))
        y = x * 3
        (_, c) = sess.run([optimizer, cost], feed_dict={X: x, Y: y})
        print(c)
```
Now let's look at a Pytorch example that does the same thing:
```python
import numpy as np
import torch
from torch.autograd import Variable

model = torch.nn.Linear(1, 1)
loss_fn = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for t in range(10000):
    x = Variable(torch.from_numpy(np.random.random((1, 1)).astype(np.float32)))
    y = x * 3

    y_pred = model(x)            # forward pass of the input
    loss = loss_fn(y_pred, y)    # generate the loss
    optimizer.zero_grad()
    loss.backward()              # backprop: compute the gradients
    optimizer.step()             # apply the gradient update
    print(loss.data[0])
```
Although the Pytorch example is one less line of code, the operations are much more explicit, and the syntax follows the actual learning process much more closely inside the training loop:
1. Forward pass of input
2. Generate loss
3. Compute gradients
4. Backprop
whereas in Tensorflow the core operation is a magic `sess.run` call.
Why would you want to write more lines of code to end up with something more difficult to understand and maintain? Pytorch's interface is objectively much better than Tensorflow's. It's not even close.
# Conclusion
With Tensorflow, Google has created a framework that is simultaneously too low level to use comfortably for rapid prototyping, yet too high level to use comfortably in cutting edge research or in production environments that are resource constrained.
Let's be honest, when you have about half a dozen open source high-level libraries out there built on top of your already high-level library to make your library usable, you know something has gone terribly wrong:
- <http://tflearn.org/>
- <https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim>
- <https://github.com/fchollet/keras>
- <https://github.com/tensorflow/skflow>
Note: I will concede that Tensorboard (Tensorflow's monitoring tool) is a really good idea. If you want a beautiful monitoring solution for your machine learning project that includes advanced model comparison features, check out Losswise ([https://losswise.com](https://losswise.com/)). I developed it to allow machine learning developers such as myself to decouple tracking their model's performance from whatever machine learning library they use, and to implement many awesome features that I wanted which Tensorboard does not provide.