# Building TensorFlow on Android
To get you started working with TensorFlow on Android, we'll walk through two
ways to build our TensorFlow mobile demos and deploying them on an Android
device. The first is Android Studio, which lets you build and deploy in an
IDE. The second is building with Bazel and deploying with ADB on the command
line.
Why choose one or the other of these methods?
The simplest way to use TensorFlow on Android is to use Android Studio. If you
aren't planning to customize your TensorFlow build at all, or if you want to use
Android Studio's editor and other features to build an app and just want to add
TensorFlow to it, we recommend using Android Studio.
If you are using custom ops, or have some other reason to build TensorFlow from
scratch, scroll down and see our instructions
for [building the demo with Bazel](#build_the_demo_using_bazel).
## Build the demo using Android Studio
**Prerequisites**
If you haven't already, do the following two things:
- Install [Android Studio](https://developer.android.com/studio/index.html),
following the instructions on their website.
- Clone the TensorFlow repository from Github:
git clone https://github.com/tensorflow/tensorflow
**Building**
1. Open Android Studio, and from the Welcome screen, select **Open an existing
Android Studio project**.
2. From the **Open File or Project** window that appears, navigate to and select
the `tensorflow/examples/android` directory from wherever you cloned the
TensorFlow Github repo. Click OK.
If it asks you to do a Gradle Sync, click OK.
You may also need to install various platforms and tools, if you get
errors like "Failed to find target with hash string 'android-23'" and similar.
3. Open the `build.gradle` file (you can go to **1:Project** in the side panel
and find it under the **Gradle Scripts** zippy under **Android**). Look for
the `nativeBuildSystem` variable and set it to `none` if it isn't already:
// set to 'bazel', 'cmake', 'makefile', 'none'
def nativeBuildSystem = 'none'
4. Click the Run button (the green arrow) or use **Run -> Run 'android'** from the top menu.
If it asks you to use Instant Run, click **Proceed Without Instant Run**.
Also, you need to have an Android device plugged in with developer options
enabled at this
point. See [here](https://developer.android.com/studio/run/device.html) for
more details on setting up developer devices.
This installs three apps on your phone that are all part of the TensorFlow
Demo. See [Android Sample Apps](#android_sample_apps) for more information about
them.
## Adding TensorFlow to your apps using Android Studio
To add TensorFlow to your own apps on Android, the simplest way is to add the
following lines to your Gradle build file:
allprojects {
repositories {
jcenter()
}
}
dependencies {
compile 'org.tensorflow:tensorflow-android:+'
}
This automatically downloads the latest stable version of TensorFlow as an AAR
and installs it in your project.
## Build the demo using Bazel
Another way to use TensorFlow on Android is to build an APK
using [Bazel](https://bazel.build/) and load it onto your device
using [ADB](https://developer.android.com/studio/command-line/adb.html). This
requires some knowledge of build systems and Android developer tools, but we'll
guide you through the basics here.
- First, follow our instructions for @{$install/install_sources$installing from
sources}. This will also guide you through installing Bazel and cloning the
TensorFlow code.
- Download the Android [SDK](https://developer.android.com/studio/index.html)
and [NDK](https://developer.android.com/ndk/downloads/index.html) if you do
not already have them. You need at least version 12b of the NDK, and 23 of the
SDK.
- In your copy of the TensorFlow source, update the
[WORKSPACE](https://github.com/tensorflow/tensorflow/blob/master/WORKSPACE)
file with the location of your SDK and NDK, where it says `<PATH_TO_NDK>`
and `<PATH_TO_SDK>`.
- Run Bazel to build the demo APK:
bazel build -c opt //tensorflow/examples/android:tensorflow_demo
- Use [ADB](https://developer.android.com/studio/command-line/adb.html#move) to
install the APK onto your device:
adb install -r bazel-bin/tensorflow/examples/android/tensorflow_demo.apk
Note: In general when compiling for Android with Bazel you need
`--config=android` on the Bazel command line, though in this case this
particular example is Android-only, so you don't need it here.
This installs three apps on your phone that are all part of the TensorFlow
Demo. See [Android Sample Apps](#android_sample_apps) for more information about
them.
## Android Sample Apps
The
[Android example code](https://www.tensorflow.org/code/tensorflow/examples/android/) is
a single project that builds and installs three sample apps which all use the
same underlying code. The sample apps all take video input from a phone's
camera:
- **TF Classify** uses the Inception v3 model to label the objects it’s pointed
at with classes from Imagenet. There are only 1,000 categories in Imagenet,
which misses most everyday objects and includes many things you’re unlikely to
encounter often in real life, so the results can often be quite amusing. For
example there’s no ‘person’ category, so instead it will often guess things it
does know that are often associated with pictures of people, like a seat belt
or an oxygen mask. If you do want to customize this example to recognize
objects you care about, you can use
the
[TensorFlow for Poets codelab](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/index.html#0) as
an example for how to train a model based on your own data.
- **TF Detect** uses a multibox model to try to draw bounding boxes around the
locations of people in the camera. These boxes are annotated with the
confidence for each detection result. Results will not be perfect, as this
kind of object detection is still an active research topic. The demo also
includes optical tracking for when objects move between frames, which runs
more frequently than the TensorFlow inference. This improves the user
experience since the apparent frame rate is faster, but it also gives the
ability to estimate which boxes refer to the same object between frames, which
is important for counting objects over time.
- **TF Stylize** implements a real-time style transfer algorithm on the camera
feed. You can select which styles to use and mix between them using the
palette at the bottom of the screen, and also switch the resolution of the
processing higher or lower.
When you build and install the demo, you'll see three app icons on your phone,
one for each of the demos. Tapping on them should open up the app and let you
explore what they do. You can enable profiling statistics on-screen by tapping
the volume up button while they’re running.
### Android Inference Library
Because Android apps need to be written in Java, and core TensorFlow is in C++,
TensorFlow has a JNI library to interface between the two. Its interface is aimed
only at inference, so it provides the ability to load a graph, set up inputs,
and run the model to calculate particular outputs. You can see the full
documentation for the minimal set of methods in
[TensorFlowInferenceInterface.java](https://www.tensorflow.org/code/tensorflow/contrib/android/java/org/tensorflow/contrib/android/TensorFlowInferenceInterface.java).
The demo applications use this interface, so they’re a good place to look for
example usage. You can download prebuilt binary jars
at
[ci.tensorflow.org](https://ci.tensorflow.org/view/Nightly/job/nightly-android/).
# Building Mobile Apps with TensorFlow
TensorFlow was designed from the ground up to be a good deep learning solution
for mobile platforms like Android and iOS. This guide is to help you understand
how to integrate TensorFlow into your mobile apps effectively and efficiently.
## About this Guide
This guide is aimed at developers who have a TensorFlow model that’s
successfully working in a desktop environment, and who want to integrate it into
a mobile application. Here are the main challenges you’ll face during that
process:
- Understanding how to use TensorFlow for mobile.
- Building TensorFlow for your platform.
- Integrating the TensorFlow library into your application.
- Preparing your model file for mobile deployment.
- Optimizing for latency, RAM usage, model file size, and binary size.
## Why run TensorFlow on mobile?
Traditionally, deep learning has been associated with data centers and giant
clusters of high-powered GPU machines. However, it can be very expensive and
time-consuming to send all of the data a device has access to across a network
connection. Running on mobile makes it possible to deliver very interactive
applications in a way that’s not possible when you have to wait for a network
round trip.
Here are some common use cases for on-device deep learning:
### Speech Recognition
There are a lot of interesting applications that can be built with a
speech-driven interface, and many of these require on-device processing. Most of
the time a user isn’t giving commands, and so streaming audio continuously to a
remote server would be a waste of bandwidth, since it would mostly be silence or
background noises. To solve this problem it’s common to have a small neural
network running on-device @{$tutorials/audio_recognition$listening out for a
particular keyword}. Once that keyword has been spotted, the rest of the
conversation can be transmitted over to the server for further processing if
more computing power is needed.
### Image Recognition
It can be very useful for a mobile app to be able to make sense of a camera
image. If your users are taking photos, recognizing what’s in them can help your
camera apps apply appropriate filters, or label the photos so they’re easily
findable. It’s important for embedded applications too, since you can use image
sensors to detect all sorts of interesting conditions, whether it’s spotting
endangered animals in the wild
or
[reporting how late your train is running](https://svds.com/tensorflow-image-recognition-raspberry-pi/).
TensorFlow comes with several examples of recognizing the types of objects
inside images along with a variety of different pre-trained models, and they can
all be run on mobile devices. You can try out
our
[Tensorflow for Poets](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/index.html#0) and
[Tensorflow for Poets 2: Optimize for Mobile](https://codelabs.developers.google.com/codelabs/tensorflow-for-poets-2/index.html#0) codelabs to
see how to take a pretrained model and run some very fast and lightweight
training to teach it to recognize specific objects, and then optimize it to
run on mobile.
### Object Localization
Sometimes it’s important to know where objects are in an image as well as what
they are. There are lots of augmented reality use cases that could benefit a
mobile app, such as guiding users to the right component when offering them
help fixing their wireless network or providing informative overlays on top of
landscape features. Embedded applications often need to count objects that are
passing by them, whether it’s pests in a field of crops, or people, cars and
bikes going past a street lamp.
TensorFlow offers a pretrained model for drawing bounding boxes around people
detected in images, together with tracking code to follow them over time. The
tracking is especially important for applications where you’re trying to count
how many objects are present over time, since it gives you a good idea when a
new object enters or leaves the scene. We have some sample code for this
available for Android [on
Github](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android),
and also a [more general object detection
model](https://github.com/tensorflow/models/tree/master/object_detection/README.md)
available as well.
### Gesture Recognition
It can be useful to be able to control applications with hand or other
gestures, either recognized from images or through analyzing accelerometer
sensor data. Creating those models is beyond the scope of this guide, but
TensorFlow is an effective way of deploying them.
### Optical Character Recognition
Google Translate’s live camera view is a great example of how effective
interactive on-device detection of text can be.
<div class="video-wrapper">
<iframe class="devsite-embedded-youtube-video" data-video-id="06olHmcJjS0"
data-autohide="1" data-showinfo="0" frameborder="0" allowfullscreen>
</iframe>
</div>
There are multiple steps involved in recognizing text in images. You first have
to identify the areas where the text is present, which is a variation on the
object localization problem, and can be solved with similar techniques. Once you
have an area of text, you then need to interpret it as letters, and then use a
language model to help guess what words they represent. The simplest way to
estimate what letters are present is to segment the line of text into individual
letters, and then apply a simple neural network to the bounding box of each. You
can get good results with the kind of models used for MNIST, which you can find
in TensorFlow’s tutorials, though you may want a higher-resolution input. A
more advanced alternative is to use an LSTM model to process a whole line of
text at once, with the model itself handling the segmentation into different
characters.
### Translation
Translating from one language to another quickly and accurately, even if you
don’t have a network connection, is an important use case. Deep networks are
very effective at this sort of task, and you can find descriptions of a lot of
different models in the literature. Often these are sequence-to-sequence
recurrent models where you’re able to run a single graph to do the whole
translation, without needing to run separate parsing stages.
### Text Classification
If you want to suggest relevant prompts to users based on what they’re typing or
reading, it can be very useful to understand the meaning of the text. This is
where text classification comes in. Text classification is an umbrella term
that covers everything from sentiment analysis to topic discovery. You’re likely
to have your own categories or labels that you want to apply, so the best place
to start is with an example
like
[Skip-Thoughts](https://github.com/tensorflow/models/tree/master/skip_thoughts/),
and then train on your own examples.
### Voice Synthesis
A synthesized voice can be a great way of giving users feedback or aiding
accessibility, and recent advances such as
[WaveNet](https://deepmind.com/blog/wavenet-generative-model-raw-audio/) show
that deep learning can offer very natural-sounding speech.
## How does it fit with the cloud?
These examples of use cases give an idea of how on-device networks can
complement cloud services. Cloud has a great deal of computing power in a
controlled environment, but running on devices can offer higher interactivity.
In situations where the cloud is unavailable, or your cloud capacity is limited,
you can provide an offline experience, or reduce cloud workload by processing
easy cases on device.
Doing on-device computation can also signal when it's time to switch to working
on the cloud. A good example of this is hotword detection in speech. Because
the device can constantly listen for a keyword locally, traffic only needs to go
to cloud-based speech recognition once that keyword is recognized. Without
the on-device component, the whole application wouldn’t be feasible, and this
pattern exists across several other applications as well. Recognizing that some
sensor input is interesting enough for further processing makes a lot of
interesting products possible.
## What hardware and software should you have?
TensorFlow runs on Ubuntu Linux, Windows 10, and OS X. For a list of all
supported operating systems and instructions to install TensorFlow, see
@{$install$Installing Tensorflow}.
Some of the scripts in this guide require you to compile TensorFlow from source,
so you’ll need more than just `pip install` to work through all the sample code.
To try out the mobile examples, you’ll need a device set up for development,
using
either [Android Studio](https://developer.android.com/studio/install.html),
or [Xcode](https://developer.apple.com/xcode/) if you're developing for iOS.
## What should you do before you get started?
Before thinking about how to get your solution on mobile:
1. Determine whether your problem is solvable by mobile machine learning
2. Create a labelled dataset to define your problem
3. Pick an effective model for the problem
We'll discuss these in more detail below.
### Is your problem solvable by mobile machine learning?
Once you have an idea of the problem you want to solve, you need to make a plan
of how to build your solution. The most important first step is making sure that
your problem is actually solvable, and the best way to do that is to mock it up
using humans in the loop.
For example, if you want to drive a robot toy car using voice commands, try
recording some audio from the device and listen back to it to see if you can
make sense of what’s being said. Often you’ll find there are problems in the
capture process, such as the motor drowning out speech or not being able to hear
at a distance, and you should tackle these problems before investing in the
modeling process.
Another example would be showing photos taken from your app to people, to see if
they can classify what’s in them in the way you’re looking for. If they can’t do
that (for example, trying to estimate calories in food from photos may be
impossible because all white soups look the same), then you’ll need to redesign
your experience to cope with that. A good rule of thumb is that if a human can’t
handle the task then it will be difficult to train a computer to do better.
### Create a labelled dataset
After you’ve solved any fundamental issues with your use case, you need to
create a labeled dataset to define what problem you’re trying to solve. This
step is extremely important, more so than picking which model to use. You want it
to be as representative as possible of your actual use case, since the model
will only be effective at the task you teach it. It’s also worth investing in
tools to make labeling the data as efficient and accurate as possible. For
example, if you’re able to switch from having to click a button on a web
interface to simple keyboard shortcuts, you may be able to speed up the
generation process a lot. You should also start by doing the initial labeling
yourself, so you can learn about the difficulties and likely errors, and
possibly change your labeling or data capture process to avoid them. Once you
and your team are able to consistently label examples (that is, once you
generally agree on the same labels for most examples), you can then try and
capture your knowledge in a manual and teach external raters how to run the same
process.
### Pick an effective model
The next step is to pick an effective model to use. You might be able to avoid
training a model from scratch if someone else has already implemented a model
similar to what you need; we have a repository of models implemented in
TensorFlow [on Github](https://github.com/tensorflow/models) that you can look
through. Lean towards the simplest model you can find, and try to get started as
soon as you have even a small amount of labelled data, since you’ll get the best
results when you’re able to iterate quickly. The shorter the time it takes to
try training a model and running it in a real application, the better overall
results you’ll see. It’s common for an algorithm to get great training accuracy
numbers but then fail to be useful within a real application because there’s a
mismatch between the dataset and real usage. Prototype end-to-end usage as soon
as possible to create a consistent user experience.
# Building TensorFlow on iOS
## Using CocoaPods
The simplest way to get started with TensorFlow on iOS is using the CocoaPods
package management system. You can add the `TensorFlow-experimental` pod to your
Podfile, which installs a universal binary framework. This makes it easy to get
started but has the disadvantage of being hard to customize, which is important
in case you want to shrink your binary size. If you do need the ability to
customize your libraries, see later sections on how to do that.
## Creating your own app
If you'd like to add TensorFlow capabilities to your own app, do the following:
- Create your own app or load your already-created app in Xcode.
- Add a file named Podfile at the project root directory with the following content:
target 'YourProjectName'
pod 'TensorFlow-experimental'
- Run `pod install` to download and install the `TensorFlow-experimental` pod.
- Open `YourProjectName.xcworkspace` and add your code.
- In your app's **Build Settings**, make sure to add `$(inherited)` to the
**Other Linker Flags**, and **Header Search Paths** sections.
## Running the Samples
You'll need Xcode 7.3 or later to run our iOS samples.
There are currently three examples: simple, benchmark, and camera. For now, you
can download the sample code by cloning the main tensorflow repository (we are
planning to make the samples available as a separate repository later).
From the root of the tensorflow folder, download [Inception
v1](https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip),
and extract the label and graph files into the data folders inside both the
simple and camera examples using these steps:
mkdir -p ~/graphs
curl -o ~/graphs/inception5h.zip \
https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip \
&& unzip ~/graphs/inception5h.zip -d ~/graphs/inception5h
cp ~/graphs/inception5h/* tensorflow/examples/ios/benchmark/data/
cp ~/graphs/inception5h/* tensorflow/examples/ios/camera/data/
cp ~/graphs/inception5h/* tensorflow/examples/ios/simple/data/
Change into one of the sample directories, download the
[Tensorflow-experimental](https://cocoapods.org/pods/TensorFlow-experimental)
pod, and open the Xcode workspace. Note that installing the pod can take a long
time since it is big (~450MB). If you want to run the simple example, then:
cd tensorflow/examples/ios/simple
pod install
open tf_simple_example.xcworkspace # note .xcworkspace, not .xcodeproj
# this is created by pod install
Run the simple app in the Xcode simulator. You should see a single-screen app
with a **Run Model** button. Tap that, and you should see some debug output
appear below indicating that the example Grace Hopper image in the data
directory has been analyzed, with a military uniform recognized.
Run the other samples using the same process. The camera example requires a real
device connected. Once you build and run that, you should get a live camera view
that you can point at objects to get real-time recognition results.
### iOS Example details
There are three demo applications for iOS, all defined in Xcode projects inside
[tensorflow/examples/ios](https://www.tensorflow.org/code/tensorflow/examples/ios/).
- **Simple**: This is a minimal example showing how to load and run a TensorFlow
model in as few lines as possible. It just consists of a single view with a
button that executes the model loading and inference when it's pressed.
- **Camera**: This is very similar to the Android TF Classify demo. It loads
Inception v3 and outputs its best label estimate for what’s in the live camera
view. As with the Android version, you can train your own custom model using
TensorFlow for Poets and drop it into this example with minimal code changes.
- **Benchmark**: This is quite close to Simple, but it runs the graph repeatedly and
outputs similar statistics to the benchmark tool on Android.
### Troubleshooting
- Make sure you use the TensorFlow-experimental pod (and not TensorFlow).
- The TensorFlow-experimental pod is currently about 450MB. The reason it is so
big is that we are bundling multiple platforms, and the pod includes all
TensorFlow functionality (e.g. operations). The final app size after build is
substantially smaller though (~25MB). Working with the complete pod is
convenient during development, but see the section below on how you can build
your own custom TensorFlow library to reduce the size.
## Building the TensorFlow iOS libraries from source
While CocoaPods is the quickest and easiest way of getting started, you sometimes
need more flexibility to determine which parts of TensorFlow your app should be
shipped with. For such cases, you can build the iOS libraries from
source. [This
guide](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/ios#building-the-tensorflow-ios-libraries-from-source)
contains detailed instructions on how to do that.
# Integrating TensorFlow libraries
Once you have made some progress on a model that addresses the problem you’re
trying to solve, it’s important to test it out inside your application
immediately. There are often unexpected differences between your training data
and what users actually encounter in the real world, and getting a clear picture
of the gap as soon as possible improves the product experience.
This page talks about how to integrate the TensorFlow libraries into your own
mobile applications, once you have already successfully built and deployed the
TensorFlow mobile demo apps.
## Linking the library
After you've managed to build the examples, you'll probably want to call
TensorFlow from one of your existing applications. The very easiest way to do
this is to use the Pod installation steps described
@{$mobile/ios_build#using_cocoapods$here}, but if you want to build TensorFlow
from source (for example to customize which operators are included) you'll need
to break out TensorFlow as a framework, include the right header files, and link
against the built libraries and dependencies.
### Android
For Android, you just need to link in a Java library contained in a JAR file
called `libandroid_tensorflow_inference_java.jar`. There are three ways to
include this functionality in your program:
1. Include the jcenter AAR which contains it, as in this
[example app](https://github.com/googlecodelabs/tensorflow-for-poets-2/blob/master/android/build.gradle#L59-L65)
2. Download the nightly precompiled version from
[ci.tensorflow.org](http://ci.tensorflow.org/view/Nightly/job/nightly-android/lastSuccessfulBuild/artifact/out/).
3. Build the JAR file yourself using the instructions [in our Android Github repo](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/android)
### iOS
Pulling in the TensorFlow libraries on iOS is a little more complicated. Here is
a checklist of what you’ll need to do to your iOS app:
- Link against tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a, usually
by adding `-L/your/path/tensorflow/contrib/makefile/gen/lib/` and
`-ltensorflow-core` to your linker flags.
- Link against the generated protobuf libraries by adding
`-L/your/path/tensorflow/contrib/makefile/gen/protobuf_ios/lib` and
`-lprotobuf` and `-lprotobuf-lite` to your command line.
- For the include paths, you need the root of your TensorFlow source folder as
the first entry, followed by
`tensorflow/contrib/makefile/downloads/protobuf/src`,
`tensorflow/contrib/makefile/downloads`,
`tensorflow/contrib/makefile/downloads/eigen`, and
`tensorflow/contrib/makefile/gen/proto`.
- Make sure your binary is built with `-force_load` (or the equivalent on your
platform), aimed at the TensorFlow library to ensure that it’s linked
correctly. More detail on why this is necessary can be found in the next
section, [Global constructor magic](#global_constructor_magic). On Linux-like
platforms, you’ll need different flags, more like
`-Wl,--allow-multiple-definition -Wl,--whole-archive`.
You’ll also need to link in the Accelerate framework, since this is used to
speed up some of the operations.
## Global constructor magic
One of the subtlest problems you may run up against is the “No session factory
registered for the given session options” error when trying to call TensorFlow
from your own application. To understand why this is happening and how to fix
it, you need to know a bit about the architecture of TensorFlow.
The framework is designed to be very modular, with a thin core and a large
number of specific objects that are independent and can be mixed and matched as
needed. To enable this, the coding pattern in C++ had to let modules easily
notify the framework about the services they offer, without requiring a central
list that has to be updated separately from each implementation. It also had to
allow separate libraries to add their own implementations without needing a
recompile of the core.
To achieve this capability, TensorFlow uses a registration pattern in a lot of
places. In the code, it looks like this:
class MulKernel : OpKernel {
  Status Compute(OpKernelContext* context) { … }
};
REGISTER_KERNEL(MulKernel, "Mul");
This would be in a standalone `.cc` file linked into your application, either
as part of the main set of kernels or as a separate custom library. The magic
part is that the `REGISTER_KERNEL()` macro is able to inform the core of
TensorFlow that it has an implementation of the Mul operation, so that it can be
called in any graphs that require it.
From a programming point of view, this setup is very convenient. The
implementation and registration code live in the same file, and adding new
implementations is as simple as compiling and linking it in. The difficult part
comes from the way that the `REGISTER_KERNEL()` macro is implemented. C++
doesn’t offer a good mechanism for doing this sort of registration, so we have
to resort to some tricky code. Under the hood, the macro is implemented so that
it produces something like this:
class RegisterMul {
 public:
  RegisterMul() {
    global_kernel_registry()->Register("Mul", [](){
      return new MulKernel();
    });
  }
};
RegisterMul g_register_mul;
This sets up a class `RegisterMul` with a constructor that tells the global
kernel registry what function to call when somebody asks it how to create a
“Mul” kernel. Then there’s a global object of that class, and so the constructor
should be called at the start of any program.
While this may sound sensible, the unfortunate part is that the global object
that’s defined is not used by any other code, so linkers not designed with this
in mind will decide that it can be deleted. As a result, the constructor is
never called, and the class is never registered. All sorts of modules use this
pattern in TensorFlow, and it happens that `Session` implementations are the
first to be looked for when the code is run, which is why it shows up as the
characteristic error when this problem occurs.
The solution is to force the linker to not strip any code from the library, even
if it believes it’s unused. On iOS, this step can be accomplished with the
`-force_load` flag, specifying a library path, and on Linux you need
`--whole-archive`. These persuade the linker to not be as aggressive about
stripping, and should retain the globals.
The actual implementation of the various `REGISTER_*` macros is a bit more
complicated in practice, but they all suffer the same underlying problem. If
you’re interested in how they work, [op_kernel.h](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op_kernel.h#L1091)
is a good place to start investigating.
## Protobuf problems
TensorFlow relies on
the [Protocol Buffer](https://developers.google.com/protocol-buffers/) library,
commonly known as protobuf. This library takes definitions of data structures
and produces serialization and access code for them in a variety of
languages. The tricky part is that this generated code needs to be linked
against shared libraries for the exact same version of the framework that was
used for the generator. This can be an issue when `protoc`, the tool used to
generate the code, is from a different version of protobuf than the libraries in
the standard linking and include paths. For example, you might be using a copy
of `protoc` that was built locally in `~/projects/protobuf-3.0.1.a`, but you have
libraries installed at `/usr/local/lib` and `/usr/local/include` that are from
3.0.0.
The symptoms of this issue are errors during the compilation or linking phases
with protobufs. Usually, the build tools take care of this, but if you’re using
the makefile, make sure you’re building the protobuf library locally and using
it, as shown in [this Makefile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/makefile/Makefile#L18).
Another situation that can cause problems is when protobuf headers and source
files need to be generated as part of the build process. This process makes
building more complex, since the first phase has to be a pass over the protobuf
definitions to create all the needed code files, and only after that can you go
ahead and do a build of the library code.
### Multiple versions of protobufs in the same app
Protobufs generate headers that are needed as part of the C++ interface to the
overall TensorFlow library. This complicates using the library as a standalone
framework.
If your application is already using version 1 of the protocol buffers library,
you may have trouble integrating TensorFlow because it requires version 2. If
you just try to link both versions into the same binary, you’ll see linking
errors because some of the symbols clash. To solve this particular problem, we
have an experimental script at [rename_protobuf.sh](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/makefile/rename_protobuf.sh).
You need to run this as part of the makefile build, after you’ve downloaded all
the dependencies:
tensorflow/contrib/makefile/download_dependencies.sh
tensorflow/contrib/makefile/rename_protobuf.sh
## Calling the TensorFlow API
Once you have the framework available, you then need to call into it. The usual
pattern is that you first load your model, which represents a preset set of
numeric computations, and then you run inputs through that model (for example,
images from a camera) and receive outputs (for example, predicted labels).
On Android, we provide the Java Inference Library that is focused on just this
use case, while on iOS and Raspberry Pi you call directly into the C++ API.
### Android
Here’s what a typical Inference Library sequence looks like on Android:
// Load the model from disk.
TensorFlowInferenceInterface inferenceInterface =
new TensorFlowInferenceInterface(assetManager, modelFilename);
// Copy the input data into TensorFlow.
inferenceInterface.feed(inputName, floatValues, 1, inputSize, inputSize, 3);
// Run the inference call.
inferenceInterface.run(outputNames, logStats);
// Copy the output Tensor back into the output array.
inferenceInterface.fetch(outputName, outputs);
You can find the source of this code in the [Android examples](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/TensorFlowImageClassifier.java#L107).
### iOS and Raspberry Pi
Here’s the equivalent code for iOS and Raspberry Pi:
// Load the model.
PortableReadFileToProto(file_path, &tensorflow_graph);
// Create a session from the model.
tensorflow::Status s = session->Create(tensorflow_graph);
if (!s.ok()) {
LOG(FATAL) << "Could not create TensorFlow Graph: " << s;
}
// Run the model.
std::string input_layer = "input";
std::string output_layer = "output";
std::vector<tensorflow::Tensor> outputs;
tensorflow::Status run_status = session->Run({{input_layer, image_tensor}},
{output_layer}, {}, &outputs);
if (!run_status.ok()) {
LOG(FATAL) << "Running model failed: " << run_status;
}
// Access the output data.
tensorflow::Tensor* output = &outputs[0];
This is all based on the
[iOS sample code](https://www.tensorflow.org/code/tensorflow/examples/ios/simple/RunModelViewController.mm),
but there’s nothing iOS-specific; the same code should be usable on any platform
that supports C++.
You can also find specific examples for Raspberry Pi
[here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/pi_examples/label_image/label_image.cc).
# Preparing models for mobile deployment
The requirements for storing model information during training are very
different from when you want to release it as part of a mobile app. This section
covers the tools involved in converting from a training model to something
releasable in production.
## What is up with all the different saved file formats?
You may find yourself getting very confused by all the different ways that
TensorFlow can save out graphs. To help, here’s a rundown of some of the
different components, and what they are used for. The objects are mostly defined
and serialized as protocol buffers:
- [NodeDef](https://www.tensorflow.org/code/tensorflow/core/framework/node_def.proto):
Defines a single operation in a model. It has a unique name, a list of the
names of other nodes it pulls inputs from, the operation type it implements
(for example `Add`, or `Mul`), and any attributes that are needed to control
that operation. This is the basic unit of computation for TensorFlow, and all
work is done by iterating through a network of these nodes, applying each one
in turn. One particular operation type that’s worth knowing about is `Const`,
since this holds information about a constant. This may be a single, scalar
number or string, but it can also hold an entire multi-dimensional tensor
array. The values for a `Const` are stored inside the `NodeDef`, and so large
constants can take up a lot of room when serialized.
- [Checkpoint](https://www.tensorflow.org/code/tensorflow/core/util/tensor_bundle/tensor_bundle.h). Another
way of storing values for a model is by using `Variable` ops. Unlike `Const`
ops, these don’t store their content as part of the `NodeDef`, so they take up
very little space within the `GraphDef` file. Instead their values are held in
RAM while a computation is running, and then saved out to disk as checkpoint
files periodically. This typically happens as a neural network is being
trained and weights are updated, so it’s a time-critical operation, and it may
happen in a distributed fashion across many workers, so the file format has to
be both fast and flexible. They are stored as multiple checkpoint files,
together with metadata files that describe what’s contained within the
checkpoints. When you’re referring to a checkpoint in the API (for example
when passing a filename in as a command line argument), you’ll use the common
prefix for a set of related files (see the short sketch after this list). If
you had these files:
/tmp/model/model-chkpt-1000.data-00000-of-00002
/tmp/model/model-chkpt-1000.data-00001-of-00002
/tmp/model/model-chkpt-1000.index
/tmp/model/model-chkpt-1000.meta
You would refer to them as `/tmp/model/model-chkpt-1000`.
- [GraphDef](https://www.tensorflow.org/code/tensorflow/core/framework/graph.proto):
Has a list of `NodeDefs`, which together define the computational graph to
execute. During training, some of these nodes will be `Variables`, and so if
you want to have a complete graph you can run, including the weights, you’ll
need to call a restore operation to pull those values from
checkpoints. Because checkpoint loading has to be flexible to deal with all of
the training requirements, this can be tricky to implement on mobile and
embedded devices, especially those with no proper file system available like
iOS. This is where
the
[`freeze_graph.py`](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py) script
comes in handy. As mentioned above, `Const` ops store their values as part of
the `NodeDef`, so if all the `Variable` weights are converted to `Const` nodes,
then we only need a single `GraphDef` file to hold the model architecture and
the weights. Freezing the graph handles the process of loading the
checkpoints, and then converts all Variables to Consts. You can then load the
resulting file in a single call, without having to restore variable values
from checkpoints. One thing to watch out for with `GraphDef` files is that
sometimes they’re stored in text format for easy inspection. These versions
usually have a ‘.pbtxt’ filename suffix, whereas the binary files end with
‘.pb’.
- [FunctionDefLibrary](https://www.tensorflow.org/code/tensorflow/core/framework/function.proto):
This appears in `GraphDef`, and is effectively a set of sub-graphs, each with
information about their input and output nodes. Each sub-graph can then be
used as an op in the main graph, allowing easy instantiation of different
nodes, in a similar way to how functions encapsulate code in other languages.
- [MetaGraphDef](https://www.tensorflow.org/code/tensorflow/core/protobuf/meta_graph.proto):
A plain `GraphDef` only has information about the network of computations, but
doesn’t have any extra information about the model or how it can be
used. `MetaGraphDef` contains a `GraphDef` defining the computation part of
the model, but also includes information like ‘signatures’, which are
suggestions about which inputs and outputs you may want to call the model
with, data on how and where any checkpoint files are saved, and convenience
tags for grouping ops together for ease of use.
- [SavedModel](https://www.tensorflow.org/code/tensorflow/core/protobuf/saved_model.proto):
It’s common to want to have different versions of a graph that rely on a
common set of variable checkpoints. For example, you might need a GPU and a
CPU version of the same graph, but keep the same weights for both. You might
also need some extra files (like label names) as part of your
model. The
[SavedModel](https://www.tensorflow.org/code/tensorflow/python/saved_model/README.md) format
addresses these needs by letting you save multiple versions of the same graph
without duplicating variables, and also storing asset files in the same
bundle. Under the hood, it uses `MetaGraphDef` and checkpoint files, along
with extra metadata files. It’s the format that you’ll want to use if you’re
deploying a web API using TensorFlow Serving, for example.
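To make the checkpoint naming concrete, here is a minimal sketch, assuming a
TF 1.x Python environment and placeholder paths, of how a `Saver` produces and
consumes the prefix-named files described in the Checkpoint entry above:

```python
import tensorflow as tf

# Minimal graph with a single variable so the Saver has something to write.
weights = tf.Variable(tf.zeros([10]), name="weights")
saver = tf.train.Saver()

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  tf.gfile.MakeDirs("/tmp/model")
  # Writes /tmp/model/model-chkpt-1000.data-*, .index, and .meta files.
  saver.save(sess, "/tmp/model/model-chkpt", global_step=1000)
  # Restoring takes the common prefix, not any individual file name.
  saver.restore(sess, "/tmp/model/model-chkpt-1000")
```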
## How do you get a model you can use on mobile?
In most situations, training a model with TensorFlow will give you a folder
containing a `GraphDef` file (usually ending with the `.pb` or `.pbtxt` extension) and
a set of checkpoint files. What you need for mobile or embedded deployment is a
single `GraphDef` file that’s been ‘frozen’, or had its variables converted into
inline constants so everything’s in one file. To handle the conversion, you’ll
need the `freeze_graph.py` script, that’s held in
[`tensorflow/python/tools/freeze_graph.py`](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py). You’ll run it like this:
bazel build tensorflow/python/tools:freeze_graph
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/tmp/model/my_graph.pb \
--input_checkpoint=/tmp/model/model.ckpt-1000 \
--output_graph=/tmp/frozen_graph.pb \
--output_node_names=output_node
The `input_graph` argument should point to the `GraphDef` file that holds your
model architecture. It’s possible that your `GraphDef` has been stored in a text
format on disk, in which case it’s likely to end in `.pbtxt` instead of `.pb`,
and you should add an extra `--input_binary=false` flag to the command.
The `input_checkpoint` should be the most recent saved checkpoint. As mentioned
in the checkpoint section, you need to give the common prefix to the set of
checkpoints here, rather than a full filename.
`output_graph` defines where the resulting frozen `GraphDef` will be
saved. Because it’s likely to contain a lot of weight values that take up a
large amount of space in text format, it’s always saved as a binary protobuf.
`output_node_names` is a list of the names of the nodes that you want to extract
the results of your graph from. This is needed because the freezing process
needs to understand which parts of the graph are actually needed, and which are
artifacts of the training process, like summarization ops. Only ops that
contribute to calculating the given output nodes will be kept. If you know how
your graph is going to be used, these should just be the names of the nodes you
pass into `Session::Run()` as your fetch targets. The easiest way to find the
node names is to inspect the Node objects while building your graph in Python.
Inspecting your graph in TensorBoard is another simple way. You can get some
suggestions on likely outputs by running the [`summarize_graph` tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/README.md#inspecting-graphs).
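As a small sketch, assuming a TF 1.x graph has already been defined in Python,
you can list every node name while the graph is being built and write the graph
out so TensorBoard can display it:

```python
import tensorflow as tf

# Assumes some graph has already been constructed in the default graph.
graph = tf.get_default_graph()

# Print every operation name and type; output_node_names for freeze_graph
# are picked from this list.
for op in graph.get_operations():
  print(op.name, op.type)

# Optionally write the graph out so it can be browsed in TensorBoard.
tf.summary.FileWriter("/tmp/graph_logs", graph).close()
```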
Because the output format for TensorFlow has changed over time, there are a
variety of other less commonly used flags available too, like `input_saver`, but
hopefully you shouldn’t need these on graphs trained with modern versions of the
framework.
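For reference, here is a rough Python sketch of what freezing does internally,
reusing the checkpoint prefix and output node name from the command above. This
is not the `freeze_graph.py` tool itself, and it assumes the Python code that
originally defined the graph is still available in the default graph:

```python
import tensorflow as tf
from tensorflow.python.framework import graph_util

with tf.Session() as sess:
  # Restore the trained variable values from the checkpoint prefix.
  saver = tf.train.Saver()
  saver.restore(sess, "/tmp/model/model.ckpt-1000")

  # Replace every Variable with a Const holding its current value, keeping
  # only nodes needed to compute the named outputs.
  frozen_graph_def = graph_util.convert_variables_to_constants(
      sess, sess.graph_def, ["output_node"])

  # Write the single, self-contained GraphDef as a binary protobuf.
  tf.train.write_graph(frozen_graph_def, "/tmp", "frozen_graph.pb",
                       as_text=False)
```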
## Using the Graph Transform Tool
A lot of the things you need to do to efficiently run a model on device are
available through the [Graph Transform
Tool](https://www.tensorflow.org/code/tensorflow/tools/graph_transforms/README.md). This
command-line tool takes an input `GraphDef` file, applies the set of rewriting
rules you request, and then writes out the result as a `GraphDef`. See the
documentation for more information on how to build and run this tool.
### Removing training-only nodes
TensorFlow `GraphDefs` produced by the training code contain all of the
computation that’s needed for back-propagation and updates of weights, as well
as the queuing and decoding of inputs, and the saving out of checkpoints. All of
these nodes are no longer needed during inference, and some of the operations
like checkpoint saving aren’t even supported on mobile platforms. To create a
model file that you can load on devices you need to delete those unneeded
operations by running the `strip_unused_nodes` rule in the Graph Transform Tool.
The trickiest part of this process is figuring out the names of the nodes you
want to use as inputs and outputs during inference. You'll need these anyway
once you start to run inference, but you also need them here so that the
transform can calculate which nodes are not needed on the inference-only
path. These may not be obvious from the training code. The easiest way to
determine the node name is to explore the graph with TensorBoard.
Remember that mobile applications typically gather their data from sensors and
have it as arrays in memory, whereas training typically involves loading and
decoding representations of the data stored on disk. In the case of Inception v3
for example, there’s a `DecodeJpeg` op at the start of the graph that’s designed
to take JPEG-encoded data from a file retrieved from disk and turn it into an
arbitrary-sized image. After that there’s a `ResizeBilinear` op to scale it to
the expected size, followed by a couple of other ops that convert the byte data
into float and scale the value magnitudes in the way the rest of the graph
expects. A typical mobile app will skip most of these steps because it’s getting
its input directly from a live camera, so the input node you will actually
supply will be the output of the `Mul` node in this case.
<img src="../images/inception_input.png" width="300">
You’ll need to do a similar process of inspection to figure out the correct
output nodes.
If you’ve just been given a frozen `GraphDef` file, and are not sure about the
contents, try using the `summarize_graph` tool to print out information
about the inputs and outputs it finds from the graph structure. Here’s an
example with the original Inception v3 file:
bazel run tensorflow/tools/graph_transforms:summarize_graph -- \
--in_graph=tensorflow_inception_graph.pb
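If you prefer to inspect the file from Python rather than use `summarize_graph`,
a short sketch like this (TF 1.x, using the same Inception file name as above)
prints every node so you can spot `Placeholder` inputs and likely outputs:

```python
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("tensorflow_inception_graph.pb", "rb") as f:
  graph_def.ParseFromString(f.read())

# Placeholder ops are usually inputs; nodes that nothing else consumes are
# usually the outputs you want.
for node in graph_def.node:
  print(node.name, node.op)
```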
Once you have an idea of what the input and output nodes are, you can feed them
into the graph transform tool as the `--input_names` and `--output_names`
arguments, and call the `strip_unused_nodes` transform, like this:
bazel run tensorflow/tools/graph_transforms:transform_graph -- \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb --inputs='Mul' --outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms'
One thing to look out for here is that you need to specify the size and type
that you want your inputs to be. This is because any values that you’re going to
be passing in as inputs to inference need to be fed to special `Placeholder` op
nodes, and the transform may need to create them if they don’t already exist. In
the case of Inception v3 for example, a `Placeholder` node replaces the old
`Mul` node that used to output the resized and rescaled image array, since we’re
going to be doing that processing ourselves before we call TensorFlow. It keeps
the original name though, which is why we always feed in inputs to `Mul` when we
run a session with our modified Inception graph.
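To illustrate that last point, here is a rough TF 1.x Python sketch, using the
node names and input shape from the transform above, of feeding the retained
`Mul` placeholder directly with an already pre-processed image array:

```python
import numpy as np
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("optimized_inception_graph.pb", "rb") as f:
  graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
  tf.import_graph_def(graph_def, name="")
  with tf.Session(graph=graph) as sess:
    # Stand-in for a camera frame that has already been resized to 299x299
    # and rescaled, since the stripped graph no longer does that itself.
    image = np.zeros((1, 299, 299, 3), dtype=np.float32)
    predictions = sess.run("softmax:0", feed_dict={"Mul:0": image})
```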
After you’ve run this process, you’ll have a graph that only contains the actual
nodes you need to run your prediction process. This is the point where it
becomes useful to run metrics on the graph, so it’s worth running
`summarize_graph` again to understand what’s in your model.
## What ops should you include on mobile?
There are hundreds of operations available in TensorFlow, and each one has
multiple implementations for different data types. On mobile platforms, the size
of the executable binary that’s produced after compilation is important, because
app download bundles need to be as small as possible for the best user
experience. If all of the ops and data types are compiled into the TensorFlow
library then the total size of the compiled library can be tens of megabytes, so
by default only a subset of ops and data types are included.
That means that if you load a model file that’s been trained on a desktop
machine, you may see the error “No OpKernel was registered to support Op” when
you load it on mobile. The first thing to try is to make sure you’ve stripped
out any training-only nodes, since the error will occur at load time even if the
op is never executed. If you’re still hitting the same problem once that’s done,
you’ll need to look at adding the op to your built library.
The criteria for including ops and types fall into several categories:
- Are they only useful in back-propagation, for gradients? Since mobile is
focused on inference, we don’t include these.
- Are they useful mainly for other training needs, such as checkpoint saving?
These we leave out.
- Do they rely on frameworks that aren’t always available on mobile, such as
libjpeg? To avoid extra dependencies we don’t include ops like `DecodeJpeg`.
- Are there types that aren’t commonly used? We don’t include boolean variants
of ops for example, since we don’t see much use of them in typical inference
graphs.
These ops are trimmed by default to optimize for inference on mobile, but it is
possible to alter some build files to change the default. After altering the
build files, you will need to recompile TensorFlow. See below for more details
on how to do this, and also see @{$mobile/optimizing#binary_size$Optimizing} for
more on reducing your binary size.
### Locate the implementation
Operations are broken into two parts. The first is the op definition, which
declares the signature of the operation, which inputs, outputs, and attributes
it has. These take up very little space, and so all are included by default. The
implementations of the op computations are done in kernels, which live in the
`tensorflow/core/kernels` folder. You need to compile the C++ file containing
the kernel implementation of the op you need into the library. To figure out
which file that is, you can search for the operation name in the source
files.
[Here’s an example search in github](https://github.com/search?utf8=%E2%9C%93&q=repo%3Atensorflow%2Ftensorflow+extension%3Acc+path%3Atensorflow%2Fcore%2Fkernels+REGISTER+Mul&type=Code&ref=searchresults).
You’ll see that this search is looking for the `Mul` op implementation, and it
finds it in `tensorflow/core/kernels/cwise_op_mul_1.cc`. You need to look for
macros beginning with `REGISTER`, with the op name you care about as one of the
string arguments.
In this case, the implementations are actually broken up across multiple `.cc`
files, so you’d need to include all of them in your build. If you’re more
comfortable using the command line for code search, here’s a grep command that
also locates the right files if you run it from the root of your TensorFlow
repository:
`grep 'REGISTER.*"Mul"' tensorflow/core/kernels/*.cc`
### Add the implementation to the build
If you’re using Bazel, and building for Android, you’ll want to add the files
you’ve found to
the
[`android_extended_ops_group1`](https://www.tensorflow.org/code/tensorflow/core/kernels/BUILD#L3565) or
[`android_extended_ops_group2`](https://www.tensorflow.org/code/tensorflow/core/kernels/BUILD#L3632) targets. You
may also need to include any .cc files they depend on in there. If the build
complains about missing header files, add the .h’s that are needed into
the
[`android_extended_ops`](https://www.tensorflow.org/code/tensorflow/core/kernels/BUILD#L3525) target.
If you’re using a makefile targeting iOS, Raspberry Pi, etc., go to
[`tensorflow/contrib/makefile/tf_op_files.txt`](https://www.tensorflow.org/code/tensorflow/contrib/makefile/tf_op_files.txt) and
add the right implementation files there.