If you're unfamiliar with CNNs, check out the videos and notes of one of the best resources on the topic, the Stanford CS231n course *Convolutional Neural Networks for Visual Recognition* ([http://cs231n.stanford.edu](http://cs231n.stanford.edu)). Another good resource is Chapter 6 of Michael Nielsen's online book, *Neural Networks and Deep Learning*: [http://neuralnetworksanddeeplearning.com/chap6.html#introducing_convolutional_networks](http://neuralnetworksanddeeplearning.com/chap6.html#introducing_convolutional_networks).
Andrej Karpathy wrote a good introduction to RCNN, "Playing around with RCNN, State of the Art Object Detector".
If you’re really interested in deep learning research and want to know all the details of how each detector works to decide which one to use, you should definitely read the papers of each method and try to reproduce the training process on your own. It’ll be a long but rewarding road. But if you want to take Andrej Karpathy’s advice, “don’t be a hero” (search on YouTube for “deep learning for computer vision Andrej”), then you can “take whatever works best, download a pre-trained model, potentially add/delete some parts of it, and fine-tune it on your app,” which is also the approach we’ll use here.
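To make that advice concrete, here is a minimal sketch of the "download a pre-trained model and fine-tune it" workflow using the Keras API; the five-class task and the `my_data/` directory are purely hypothetical placeholders:

```python
import tensorflow as tf

# Sketch of the "don't be a hero" approach: reuse a pre-trained backbone,
# replace its head, and fine-tune on your own (hypothetical) dataset.
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3), pooling="avg")
base.trainable = False  # freeze the pre-trained backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(5, activation="softmax"),  # new head for 5 example classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# `my_data/` is a placeholder folder of class-labeled images.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "my_data/", image_size=(224, 224), batch_size=32)
model.fit(train_ds, epochs=3)
```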
The TensorFlow Object Detection API is documented in detail on its official site ([https://github.com/tensorflow/models/tree/master/research/object_detection](https://github.com/tensorflow/models/tree/master/research/object_detection)), and you should definitely check out its "Quick Start: Jupyter notebook for off-the-shelf inference" guide, which quickly shows how to use a good pre-trained model for detection in Python. The documentation there, however, is spread across many different pages and can be hard to follow at times. In this section and the next, we'll streamline the official documentation by reorganizing the important details that are scattered across many places and adding more examples and code explanations, and we'll offer two step-by-step tutorials on the following:
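As a quick preview of what such off-the-shelf inference looks like in Python, here is a minimal sketch that assumes a pre-trained detection model exported in the TF2 SavedModel format; the model path and test image are placeholders, and the output dictionary keys follow the Object Detection API's usual conventions:

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Load a pre-trained detection model exported as a SavedModel; the path and
# image file below are placeholders for whatever you download and test with.
detect_fn = tf.saved_model.load("ssd_mobilenet_v2/saved_model")

image = np.array(Image.open("test_image.jpg"))
input_tensor = tf.convert_to_tensor(image)[tf.newaxis, ...]  # add batch dimension

detections = detect_fn(input_tensor)

# Typical output keys for Object Detection API models.
boxes = detections["detection_boxes"][0].numpy()    # normalized [ymin, xmin, ymax, xmax]
scores = detections["detection_scores"][0].numpy()
classes = detections["detection_classes"][0].numpy().astype(int)

for box, score, cls in zip(boxes, scores, classes):
    if score > 0.5:
        print(cls, score, box)
```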
Although the results of the original neural style transfer algorithm are amazing, its performance is poor: training is part of the style-transferred image generation process, and it usually takes a few minutes on a GPU, or about an hour on a CPU, to generate a good-looking result image.
If you're interested in the details of the original algorithm, you can read the paper along with a well-documented Python implementation at [https://github.com/log0/neural-style-painting/blob/master/art.py](https://github.com/log0/neural-style-painting/blob/master/art.py). We won't discuss this original algorithm further, as it's not feasible to run on mobile phones, but trying it out is fun and instructive, and it will give you a better understanding of how to use a pre-trained deep CNN model for different computer vision tasks.
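To give a feel for how the original algorithm uses a pre-trained CNN, here is a rough sketch of its core idea: a content loss plus a Gram-matrix style loss computed from VGG19 feature maps. The layer choices and loss weights are illustrative, not the exact ones used in the implementation linked above:

```python
import tensorflow as tf

# Core of the original style transfer idea: a frozen, pre-trained VGG19 supplies
# feature maps, and we define a content loss plus a Gram-matrix style loss that
# are minimized by updating the generated image's pixels directly.
vgg = tf.keras.applications.VGG19(weights="imagenet", include_top=False)
vgg.trainable = False

content_layer = "block4_conv2"
style_layers = ["block1_conv1", "block2_conv1", "block3_conv1"]
extractor = tf.keras.Model(
    vgg.input,
    {name: vgg.get_layer(name).output for name in [content_layer] + style_layers})

def gram_matrix(feat):
    # Correlations between feature channels are what capture "style".
    result = tf.einsum("bijc,bijd->bcd", feat, feat)
    shape = tf.shape(feat)
    num_locations = tf.cast(shape[1] * shape[2], tf.float32)
    return result / num_locations

def style_content_loss(generated, content_img, style_img,
                       content_weight=1e4, style_weight=1e-2):
    gen, con, sty = extractor(generated), extractor(content_img), extractor(style_img)
    content_loss = tf.reduce_mean((gen[content_layer] - con[content_layer]) ** 2)
    style_loss = tf.add_n([
        tf.reduce_mean((gram_matrix(gen[n]) - gram_matrix(sty[n])) ** 2)
        for n in style_layers])
    return content_weight * content_loss + style_weight * style_loss

# The generated image itself is the trainable variable; every optimization step
# nudges its pixels to lower this loss, which is why the original algorithm is
# slow: "training" happens anew for every image you stylize.
```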
An RNN allows us to handle sequences of input and/or output, because, by design, the network has memory of previous items in an input sequence or can generate a sequence of outputs. This makes RNNs well suited to speech recognition (where the input is a sequence of words uttered by a user), image captioning (where the output is a natural-language sentence consisting of a series of words), text generation, and time series prediction. If you're unfamiliar with RNNs, you should definitely check out Andrej Karpathy's blog post, *The Unreasonable Effectiveness of Recurrent Neural Networks* ([http://karpathy.github.io/2015/05/21/rnn-effectiveness](http://karpathy.github.io/2015/05/21/rnn-effectiveness)). We'll also cover some RNN models in detail later in the book.
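If you want to see the "memory" idea in code, here is a tiny, self-contained sketch of an RNN (an LSTM here) trained for next-character prediction, the same flavor of text generation Karpathy's post explores; the toy text and hyperparameters are illustrative only:

```python
import numpy as np
import tensorflow as tf

# Toy next-character prediction: the LSTM's hidden state carries "memory"
# of the characters seen so far in each input sequence.
text = "hello world " * 200
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
encoded = np.array([char_to_idx[c] for c in text])

seq_len = 10
X = np.stack([encoded[i:i + seq_len] for i in range(len(encoded) - seq_len)])
y = encoded[seq_len:]  # the character that follows each 10-character window

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(len(chars), 16),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(len(chars), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=2, batch_size=64, verbose=0)
```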
The speech commands dataset is collected via the Open Speech Recording site ([https://aiyprojects.withgoogle.com/open_speech_recording](https://aiyprojects.withgoogle.com/open_speech_recording)). You should give it a try, and maybe contribute a few minutes of your own recordings to help improve it; it will also give you a sense of how to collect your own speech commands dataset if needed. There's also a Kaggle competition ([https://www.kaggle.com/c/tensorflow-speech-recognition-challenge](https://www.kaggle.com/c/tensorflow-speech-recognition-challenge)) that uses the dataset to build models, where you can learn more about speech models and related tips.
As mobile developers, you probably don't need to understand DFT and FFT in depth. But it's worth appreciating how all this model training works before using it in mobile apps: behind the scenes of the TensorFlow simple speech commands model training that we're about to cover, it's the use of FFT, one of the top 10 algorithms of the 20th century, among other things of course, that makes CNN-based speech command recognition model training possible. For a fun and intuitive tutorial on DFT, you can read this article: [http://practicalcryptography.com/miscellaneous/machine-learning/intuitive-guide-discrete-fourier-transform](http://practicalcryptography.com/miscellaneous/machine-learning/intuitive-guide-discrete-fourier-transform).
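To make the FFT connection concrete, here is a small NumPy sketch that frames a one-second, 16 kHz waveform into overlapping windows and stacks the per-window FFT magnitudes into a log spectrogram, the kind of 2D "image" a CNN-based speech commands model trains on. The window and stride values are illustrative, not necessarily the ones the TensorFlow pipeline uses:

```python
import numpy as np

sample_rate = 16000
waveform = np.random.randn(sample_rate).astype(np.float32)  # stand-in for real audio

window_size, stride = 480, 160   # 30 ms windows taken every 10 ms
frames = [waveform[i:i + window_size]
          for i in range(0, len(waveform) - window_size + 1, stride)]

# FFT magnitude of each windowed frame, stacked over time.
spectrogram = np.array([
    np.abs(np.fft.rfft(frame * np.hanning(window_size)))
    for frame in frames])
log_spectrogram = np.log(spectrogram + 1e-6)

print(log_spectrogram.shape)  # (num_frames, num_frequency_bins) -> a 2D CNN input
```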
If you really want to port as much code as possible to Swift, you can replace the audio file conversion code written in C with Swift (see [https://developer.apple.com/documentation/audiotoolbox/extended_audio_file_services](https://developer.apple.com/documentation/audiotoolbox/extended_audio_file_services) for details). There are also some unofficial open source projects that offer Swift wrappers around the official TensorFlow C++ API. But for simplicity, and to strike the right balance, we'll keep the TensorFlow model inference, and in this example also the audio file reading and conversion, in C++ and Objective-C, and use Swift to control the UI and audio recording and to initiate the calls for audio processing and recognition.
That's all it takes to build a Swift iOS app that uses the speech commands recognition model. Now you can run it on the iOS Simulator or an actual device and see exactly the same results as with the Objective-C version.