If you're unfamiliar with CNNs, check out the videos and notes of one of the best resources on them, the Stanford CS231n course *Convolutional Neural Networks for Visual Recognition* ([http://cs231n.stanford.edu](http://cs231n.stanford.edu)). Another good resource on CNNs is Chapter 6 of *Michael Nielsen's* online book, *Neural Networks and Deep Learning*: [http://neuralnetworksanddeeplearning.com/chap6.html#introducing_convolutional_networks](http://neuralnetworksanddeeplearning.com/chap6.html#introducing_convolutional_networks).
To further improve the accuracy, you can play with `retrain.py`'s other parameters, such as the number of training steps (`--how_many_training_steps`), the learning rate (`--learning_rate`), and data augmentation (`--flip_left_right`, `--random_crop`, `--random_scale`, `--random_brightness`). Generally, this is a tedious process that involves a lot of what Andrew Ng, one of the best-known deep learning experts, calls "dirty work" in his *Nuts and Bolts of Applying Deep Learning* talk (video available at [https://www.youtube.com/watch?v=F1ka6a13S9I](https://www.youtube.com/watch?v=F1ka6a13S9I)).
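For example, a retraining run that increases the training steps, lowers the learning rate, and turns on two kinds of augmentation might look like the following (an illustrative sketch: the paths and values are placeholders, and you should run `python retrain.py -h` to confirm the exact flags in your copy of the script):

```
python tensorflow/examples/image_retraining/retrain.py \
  --image_dir /tmp/my_photos \
  --how_many_training_steps 8000 \
  --learning_rate 0.005 \
  --flip_left_right \
  --random_brightness 10
```

Note that the augmentation flags slow training down considerably, since distorted images can't reuse the cached bottleneck values.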
TensorFlow's documentation ([https://www.tensorflow.org/performance/quantization](https://www.tensorflow.org/performance/quantization)) offers more details on quantization and why it works.
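Not TensorFlow's actual implementation (the docs above cover that), but here's a minimal numpy sketch of the linear 8-bit quantization idea, enough to see why the accuracy loss is tiny:

```py
import numpy as np

# Linear 8-bit quantization in a nutshell: map floats in [min, max] onto
# the integers 0..255, then map back. The round trip loses at most about
# half a quantization step of precision, which is why accuracy barely drops.
def quantize(x, x_min, x_max):
    scale = (x_max - x_min) / 255.0
    return np.round((x - x_min) / scale).astype(np.uint8)

def dequantize(q, x_min, x_max):
    scale = (x_max - x_min) / 255.0
    return q.astype(np.float32) * scale + x_min

w = np.array([-1.2, 0.0, 0.7, 2.5], dtype=np.float32)
q = quantize(w, w.min(), w.max())
print(q)                                # [  0  83 131 255]
print(dequantize(q, w.min(), w.max()))  # close to the original w
```

Each 32-bit float weight now takes 1 byte, shrinking the model roughly 4x.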
Andrej Karpathy wrote a good introduction to RCNN, "Playing around with RCNN, State of the Art Object Detector", in 2014 ([https://cs.stanford.edu/people/karpathy/rcnn](https://cs.stanford.edu/people/karpathy/rcnn)). There's also a nice video lecture on object detection, "Spatial Localization and Detection" by Justin Johnson, part of Stanford's CS231n course, with details on RCNN, Fast RCNN, Faster RCNN, and YOLO. SSD is described in detail at [https://github.com/weiliu89/caffe/tree/ssd](https://github.com/weiliu89/caffe/tree/ssd). And the cool YOLO2 website is [https://pjreddie.com/darknet/yolo](https://pjreddie.com/darknet/yolo).
Although the results of the original neural style transfer algorithm are amazing, its performance is poor: training is part of the process of generating a style-transferred image, and it usually takes a few minutes on a GPU, or about an hour on a CPU, to generate a good result.
If you're interested in the details of the original algorithm, you can read the paper, along with a well-documented Python implementation, at [https://github.com/log0/neural-style-painting/blob/master/art.py](https://github.com/log0/neural-style-painting/blob/master/art.py). We won't discuss this original algorithm, as it's not feasible to run on mobile phones, but it's fun and instructive to try it out to get a better understanding of how to use a pre-trained deep CNN model for different computer vision tasks.
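For reference, the heart of the original algorithm (by Gatys et al.) is an optimization loop: starting from a noise or content image $\vec{x}$, it iteratively updates $\vec{x}$ to minimize a weighted sum of a content loss and a style loss, both computed from the feature maps of a pre-trained CNN such as VGG; this per-image optimization is exactly why generation is so slow:

$$\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x}) = \alpha\,\mathcal{L}_{content}(\vec{p}, \vec{x}) + \beta\,\mathcal{L}_{style}(\vec{a}, \vec{x})$$

where $\vec{p}$ is the content photo, $\vec{a}$ is the style image, and $\alpha$, $\beta$ trade off content fidelity against stylization strength.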
RNNs allow us to handle sequences of input and/or output, because the network, by design, has memory of previous items in an input sequence, or can generate a sequence of outputs. This makes RNNs more appropriate for speech recognition (where the input is a sequence of words uttered by users), image captioning (where the output is a natural language sentence consisting of a series of words), text generation, and time series prediction. If you're unfamiliar with RNNs, you should definitely check out *Andrej Karpathy's* blog, *The Unreasonable Effectiveness of Recurrent Neural Networks* ([http://karpathy.github.io/2015/05/21/rnn-effectiveness](http://karpathy.github.io/2015/05/21/rnn-effectiveness)). We'll also cover some detailed RNN models later in the book.
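To see where that memory comes from, here's the basic recurrence at the heart of a vanilla RNN, the same formulation Karpathy's blog walks through (a toy numpy sketch with made-up dimensions, not production code):

```py
import numpy as np

# The new hidden state depends on the current input AND the previous
# hidden state, so `h` accumulates information across the whole sequence.
def rnn_step(x, h_prev, Wxh, Whh, bh):
    return np.tanh(x @ Wxh + h_prev @ Whh + bh)

rng = np.random.RandomState(0)
Wxh, Whh, bh = rng.randn(8, 16), rng.randn(16, 16), np.zeros(16)

h = np.zeros(16)                      # empty memory at the start
for x in rng.randn(5, 8):             # a sequence of 5 input vectors
    h = rnn_step(x, h, Wxh, Whh, bh)  # h now summarizes everything seen
print(h.shape)                        # (16,)
```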
The speech commands dataset is collected from an Open Speech Recording site ([https://aiyprojects.withgoogle.com/open_speech_recording](https://aiyprojects.withgoogle.com/open_speech_recording)). You should give it a try, and maybe contribute a few minutes of your own recordings to help improve it; you'll also get a sense of how you can collect your own speech commands dataset if needed. There's also a Kaggle competition ([https://www.kaggle.com/c/tensorflow-speech-recognition-challenge](https://www.kaggle.com/c/tensorflow-speech-recognition-challenge)) on using the dataset to build a model, where you can learn more about speech models and pick up useful tips.
As mobile developers, you probably don't need to understand the DFT and FFT in depth. But it helps to appreciate what makes all this model training work in mobile apps: behind the scenes of the TensorFlow simple speech commands model training that we're about to cover, it's the use of the FFT, one of the top 10 algorithms of the 20th century, among other things of course, that makes the CNN-based speech command recognition model training possible. For a fun and intuitive tutorial on the DFT, you can read this article: [http://practicalcryptography.com/miscellaneous/machine-learning/intuitive-guide-discrete-fourier-transform](http://practicalcryptography.com/miscellaneous/machine-learning/intuitive-guide-discrete-fourier-transform).
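To make the FFT connection concrete, here's a small numpy sketch (an illustration only; the real TensorFlow pipeline also applies windowing and MFCC-style processing) of how a one-second audio clip becomes a 2D spectrogram that a CNN can then treat like an image:

```py
import numpy as np

sample_rate = 16000
t = np.linspace(0, 1, sample_rate, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t)    # a fake 440 Hz test tone

# Slice the clip into short overlapping frames, then take the FFT
# magnitude of each frame: each frame becomes one row of the spectrogram.
frame_size, stride = 512, 256
frames = [audio[i:i + frame_size]
          for i in range(0, len(audio) - frame_size + 1, stride)]
spectrogram = np.array([np.abs(np.fft.rfft(f)) for f in frames])
print(spectrogram.shape)               # (num_frames, frame_size // 2 + 1)
```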
```
if (record.getState() != AudioRecord.STATE_INITIALIZED) return;
record.startRecording();
```
There are two classes in Android for recording audio: `MediaRecorder` and `AudioRecord`. `MediaRecorder` is easier to use than `AudioRecord`, but until Android API Level 24 (Android 7.0), which added support for recording raw, unprocessed audio, it could only save compressed audio files. According to [https://developer.android.com/about/dashboards/index.html](https://developer.android.com/about/dashboards/index.html), as of January 2018, more than 70% of Android devices in the market still run Android versions older than 7.0, so you probably would prefer not to target your app to Android 7.0 or above. In addition, to decode the compressed audio recorded by `MediaRecorder`, you have to use `MediaCodec`, which is pretty complicated. `AudioRecord`, albeit a low-level API, is actually perfect for recording raw, unprocessed data that is then sent to the speech commands recognition model for processing.
If you're interested in knowing more about the Nash Equilibrium, Google *"khan academy nash equilibrium"* and watch the two fun videos on it by Sal Khan. The Wikipedia page on the Nash Equilibrium and the article *"What is the Nash equilibrium and why does it matter?"* from The Economist's *The Economist explains* blog ([https://www.economist.com/blogs/economist-explains/2016/09/economist-explains-economics](https://www.economist.com/blogs/economist-explains/2016/09/economist-explains-economics)) are also good reads. Understanding the basic intuition and idea behind GANs will help you better appreciate why they have such great potential.
If you're not familiar with reinforcement learning or MCTS, there's lots of information about them on the internet. Consider checking out *Richard Sutton* and *Andrew Barto's* classic book, *Reinforcement Learning: An Introduction*, which is publicly available at [http://incompleteideas.net/book/the-book-2nd.html](http://incompleteideas.net/book/the-book-2nd.html). You can also watch the reinforcement learning course videos by *David Silver*, the technical lead for *AlphaGo* at *DeepMind*, on YouTube (search "reinforcement learning David Silver"). A fun and useful toolkit for reinforcement learning is OpenAI Gym ([https://gym.openai.com](https://gym.openai.com)). In the last chapter of the book, we'll go deeper into reinforcement learning and OpenAI Gym. For MCTS, check out its Wiki page, [https://en.wikipedia.org/wiki/Monte_Carlo_tree_search](https://en.wikipedia.org/wiki/Monte_Carlo_tree_search), as well as this blog: [http://tim.hibal.org/blog/alpha-zero-how-and-why-it-works](http://tim.hibal.org/blog/alpha-zero-how-and-why-it-works).
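If you want a quick taste of OpenAI Gym before we get to the last chapter, here's a minimal random-agent loop (a hedged sketch: it uses the classic `gym` interface of the time; later releases changed the `reset`/`step` signatures):

```py
import gym

env = gym.make('CartPole-v0')                   # a classic starter environment
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()          # act randomly
    obs, reward, done, info = env.step(action)
    if done:                                    # episode over, start again
        obs = env.reset()
env.close()
```

Replacing `env.action_space.sample()` with a learned policy is, in a nutshell, what reinforcement learning is about.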
In the next section, we'll look at a Keras implementation, with TensorFlow as the backend, of the AlphaZero algorithm, with the goal of using it to build and train a model to play Connect 4. You'll see what the model architecture looks like, as well as the key Keras code used to build the model.
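Just to preview the flavor of that code, a typical residual convolutional block of the kind AlphaZero-style networks stack looks roughly like this in Keras (an illustrative sketch with made-up filter counts and kernel sizes, not the book's exact model code):

```py
from keras.layers import Activation, BatchNormalization, Conv2D, add

# An illustrative residual block: two conv layers plus a skip connection,
# the basic repeating unit in AlphaZero-style networks.
def residual_block(x, filters=64):
    # Note: x must already have `filters` channels for the skip-add to work.
    y = Conv2D(filters, (3, 3), padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)
    y = Conv2D(filters, (3, 3), padding='same')(y)
    y = BatchNormalization()(y)
    y = add([x, y])                    # the skip connection
    return Activation('relu')(y)
```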
With the newly frozen, and optionally transformed and memory-mapped, model, you can always try it with the TensorFlow pod first to see whether you're lucky enough to be able to use it the simple way. In our case, loading our generated `alphazero19.pb` model with the TensorFlow pod causes the following error:
```py
Couldn't load model: Invalid argument: No OpKernel was registered to support Op 'Switch' with these attrs. Registered devices: [CPU], Registered kernels:
```
1. Build and train (or retrain) a TensorFlow model with TensorFlow, or with Keras using TensorFlow as the backend, such as the models we trained in the previous chapters.
You can also pick a prebuilt TensorFlow Lite model, such as the MobileNet models available at [https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md](https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md), which we used for retraining in [Chapter 2](../Text/02.html), *Classifying Images with Transfer Learning*. Each of the MobileNet model `tgz` files that you can download there contains a converted TensorFlow Lite model. For example, the `MobileNet_v1_1.0_224.tgz` file contains a `mobilenet_v1_1.0_224.tflite` file that you can use directly on mobile. If you use such a prebuilt TensorFlow Lite model, you can skip steps 2 and 3.
2. Freeze the trained TensorFlow model as a graph definition (`.pb`) file with its weights converted to constants, as we've done many times in the previous chapters.
3. Use the TensorFlow Lite converter tool to convert the TensorFlow model to a TensorFlow Lite model. You'll see a detailed example in the next section.
4. Deploy the TensorFlow Lite model on iOS or Android: for iOS, use the C++ API to load and run the model; for Android, use the Java API (a wrapper around the C++ API) to load and run the model. Unlike the `Session` class we used before in the TensorFlow Mobile projects, both the C++ and Java APIs use the TensorFlow Lite-specific `Interpreter` class to run model inference. In the next two sections, we'll show you the iOS C++ code and the Android Java code that use `Interpreter`.
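As a side note, before wiring a converted `.tflite` file into mobile code, you can sanity-check it on your desktop with the Python flavor of the same `Interpreter` concept. Here's a minimal sketch (it assumes a TensorFlow version where the API lives under `tf.lite`; older 1.x releases exposed it under `tf.contrib.lite`, and the model filename is just an example):

```py
import numpy as np
import tensorflow as tf

# Load a converted TensorFlow Lite model and run one inference on dummy input.
interpreter = tf.lite.Interpreter(model_path='mobilenet_v1_1.0_224.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

dummy = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]['index']).shape)
```

If this runs without an error such as an unsupported op, the model is at least loadable, which saves a debugging round trip on the device.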
If you run a TensorFlow Lite model on Android, and the Android device runs Android 8.1 (API level 27) or above and supports hardware acceleration with dedicated neural network hardware, a GPU, or some other digital signal processor, then the `Interpreter` will use the Android Neural Networks API ([https://developer.android.com/ndk/guides/neuralnetworks/index.html](https://developer.android.com/ndk/guides/neuralnetworks/index.html)) to speed up model execution. For example, Google's Pixel 2 phone has a custom chip optimized for image processing, which can be enabled on Android 8.1 to support hardware acceleration.
Run `pod install`. Then open `HelloTFLite.xcworkspace` in Xcode, rename `ViewController.m` to `ViewController.mm`, and add the necessary C++ and TensorFlow Lite header files. Your Xcode project should look like the following screenshot:
![](img/b4b1750b-cc97-42f8-a032-226b595d7e46.png)

Figure 11.1 A new Xcode iOS project using the TensorFlow Lite pod

We're only showing you how to use the TensorFlow Lite pod in your iOS apps. There's another way to add TensorFlow Lite to iOS, similar to building the custom TensorFlow Mobile iOS library that we've done many times in the previous chapters. For more information on how to build your own custom TensorFlow Lite iOS library, see the documentation at [https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/g3doc/ios.md](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/lite/g3doc/ios.md).
You'll also need an HDMI cable to connect the Raspberry Pi board to a computer monitor, as well as a USB keyboard and a USB mouse. In total, it costs about $200, including $110 for the GoPiGo, to build a Raspberry Pi robot that can move, see, listen, and speak. Although the GoPiGo kit may seem a bit pricey compared to the powerful yet inexpensive Raspberry Pi computer, without it an immobile Raspberry Pi would probably lose much of its appeal.
There's an older blog, *How to build a robot that "sees" with $100 and TensorFlow* ([https://www.oreilly.com/learning/how-to-build-a-robot-that-sees-with-100-and-tensorflow](https://www.oreilly.com/learning/how-to-build-a-robot-that-sees-with-100-and-tensorflow)), written by *Lukas Biewald* in September 2016, that covers how to use TensorFlow and Raspberry Pi 3, with some alternative parts, to build a robot that sees and speaks. It's a fun read. What we cover here offers more detailed steps for setting up Raspberry Pi 3 with GoPiGo, the user-friendly and Google-recommended toolkit for turning a Pi into a robot, and with a newer version of TensorFlow (1.6), in addition to adding voice command recognition and reinforcement learning.
We covered AlphaGo and AlphaZero in the last chapter, and Jim Fleming wrote an interesting blog entry titled *Before AlphaGo there was TD-Gammon* ([https://medium.com/jim-fleming/before-alphago-there-was-td-gammon-13deff866197](https://medium.com/jim-fleming/before-alphago-there-was-td-gammon-13deff866197)), about the first reinforcement learning application that trained itself, using a neural network as an evaluation function, to beat human Backgammon champions. Both the blog entry and the book *Reinforcement Learning: An Introduction* by *Sutton* and *Barto* have in-depth descriptions of TD-Gammon; you can also Google "Temporal Difference Learning and TD-Gammon" for the original paper if you want to know more about using neural networks as powerful universal function approximators.