3. Build the JAR file yourself using the instructions [in our Android Github repo](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/android)
### iOS
Pulling in the TensorFlow libraries on iOS is a little more complicated. Here is
a checklist of what you’ll need to do to your iOS app:
- Link against tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a, usually
by adding `-L/your/path/tensorflow/contrib/makefile/gen/lib/` and
`-ltensorflow-core` to your linker flags.
- Link against the generated protobuf libraries by adding
`-L/your/path/tensorflow/contrib/makefile/gen/protobuf_ios/lib` and
`-lprotobuf` and `-lprotobuf-lite` to your command line.
- For the include paths, you need the root of your TensorFlow source folder as
the first entry, followed by the dependency headers that the makefile build
downloads and generates (for example the protobuf and Eigen sources under
`tensorflow/contrib/makefile/downloads` and the generated protos under
`tensorflow/contrib/makefile/gen/proto`).
- You’ll also need to link in the Accelerate framework, since this is used to
speed up some of the operations.
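Once those flags are in place, a quick way to sanity-check the setup is to
create a session from C++. The following is only a minimal sketch (it assumes
the TensorFlow headers are on your include path and isn’t part of the official
checklist), but if it builds, links, and returns an OK status, the library is
wired up correctly:

```c++
#include "tensorflow/core/public/session.h"

// Sketch: confirm the static library linked and a session factory registered.
bool TensorFlowLinksOk() {
  tensorflow::Session* session = nullptr;
  tensorflow::Status status =
      tensorflow::NewSession(tensorflow::SessionOptions(), &session);
  if (!status.ok()) {
    // A "No session factory registered" error here usually means the linker
    // stripped the registration globals (see the next section).
    return false;
  }
  delete session;
  return true;
}
```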
## Global constructor magic
One of the subtlest problems you may run up against is the “No session factory
registered for the given session options” error when trying to call TensorFlow
from your own application. To understand why this is happening and how to fix
it, you need to know a bit about the architecture of TensorFlow.
The framework is designed to be very modular, with a thin core and a large
number of specific objects that are independent and can be mixed and matched as
needed. To enable this, the coding pattern in C++ had to let modules easily
notify the framework about the services they offer, without requiring a central
list that has to be updated separately from each implementation. It also had to
allow separate libraries to add their own implementations without needing a
recompile of the core.
To achieve this capability, TensorFlow uses a registration pattern in a lot of
places. In the code, it looks like this:
```c++
class MulKernel : public OpKernel {
  Status Compute(OpKernelContext* context) { … }
};

REGISTER_KERNEL(MulKernel, "Mul");
```
This would be in a standalone `.cc` file linked into your application, either
as part of the main set of kernels or as a separate custom library. The magic
part is that the `REGISTER_KERNEL()` macro is able to inform the core of
TensorFlow that it has an implementation of the Mul operation, so that it can be
called in any graphs that require it.
From a programming point of view, this setup is very convenient. The
implementation and registration code live in the same file, and adding new
implementations is as simple as compiling and linking it in. The difficult part
comes from the way that the `REGISTER_KERNEL()` macro is implemented. C++
doesn’t offer a good mechanism for doing this sort of registration, so we have
to resort to some tricky code. Under the hood, the macro is implemented so that
it produces something like this:
```c++
class RegisterMul {
 public:
  RegisterMul() {
    global_kernel_registry()->Register("Mul", []() {
      return new MulKernel();
    });
  }
};

RegisterMul g_register_mul;
```
This sets up a class `RegisterMul` with a constructor that tells the global
kernel registry what function to call when somebody asks it how to create a
“Mul” kernel. Then there’s a global object of that class, and so the constructor
should be called at the start of any program.
While this may sound sensible, the unfortunate part is that the global object
that’s defined is not used by any other code, so linkers not designed with this
in mind will decide that it can be deleted. As a result, the constructor is
never called, and the class is never registered. All sorts of modules use this
pattern in TensorFlow, and it happens that `Session` implementations are the
first to be looked for when the code is run, which is why it shows up as the
characteristic error when this problem occurs.
The solution is to force the linker to not strip any code from the library, even
if it believes it’s unused. On iOS, this step can be accomplished with the
`-force_load` flag, specifying a library path, and on Linux you need
`--whole-archive`. These persuade the linker to not be as aggressive about
stripping, and should retain the globals.
The actual implementation of the various `REGISTER_*` macros is a bit more
complicated in practice, but they all suffer the same underlying problem. If
you’re interested in how they work, [op_kernel.h](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op_kernel.h#L1091)
is a good place to start investigating.
## Protobuf problems
TensorFlow relies on
the [Protocol Buffer](https://developers.google.com/protocol-buffers/) library,
commonly known as protobuf. This library takes definitions of data structures
and produces serialization and access code for them in a variety of
languages. The tricky part is that this generated code needs to be linked
against shared libraries for the exact same version of the framework that was
used for the generator. This can be an issue when `protoc`, the tool used to
generate the code, is from a different version of protobuf than the libraries in
the standard linking and include paths. For example, you might be using a copy
of `protoc` that was built locally in `~/projects/protobuf-3.0.1.a`, but you have
libraries installed at `/usr/local/lib` and `/usr/local/include` that are from
3.0.0.
The symptoms of this issue are errors during the compilation or linking phases
with protobufs. Usually, the build tools take care of this, but if you’re using
the makefile, make sure you’re building the protobuf library locally and using
it, as shown in [this Makefile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/makefile/Makefile#L18).
Another situation that can cause problems is when protobuf headers and source
files need to be generated as part of the build process. This process makes
building more complex, since the first phase has to be a pass over the protobuf
definitions to create all the needed code files, and only after that can you go
ahead and do a build of the library code.
### Multiple versions of protobufs in the same app
Protobufs generate headers that are needed as part of the C++ interface to the
overall TensorFlow library. This complicates using the library as a standalone
framework.
If your application is already using version 1 of the protocol buffers library,
you may have trouble integrating TensorFlow because it requires version 2. If
you just try to link both versions into the same binary, you’ll see linking
errors because some of the symbols clash. To solve this particular problem, we
have an experimental script at [rename_protobuf.sh](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/makefile/rename_protobuf.sh).
You need to run this as part of the makefile build, after you’ve downloaded all
of the dependencies.
```java
// Copy the output Tensor back into the output array.
inferenceInterface.fetch(outputName, outputs);
```
You can find the source of this code in the [Android examples](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/android/src/org/tensorflow/demo/TensorFlowImageClassifier.java#L107).
### iOS and Raspberry Pi
Here’s the equivalent code for iOS and Raspberry Pi:
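The C++ call has the same shape as the Java version above. Here is a minimal
sketch rather than the full listing, assuming `input_layer` and `output_layer`
hold the node names and `resized_tensor` holds the preprocessed input:

```c++
// Sketch: run one step of the graph and read back the results.
// input_layer, output_layer, resized_tensor, and session are assumed to be
// set up earlier in the application.
std::vector<tensorflow::Tensor> outputs;
tensorflow::Status run_status = session->Run({{input_layer, resized_tensor}},
                                             {output_layer}, {}, &outputs);
if (!run_status.ok()) {
  LOG(ERROR) << "Running model failed: " << run_status;
}
```

The resulting tensors land in `outputs`, which plays the same role as the array
passed to `fetch()` in the Android code.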
It’s common to want to have different versions of a graph that rely on a
common set of variable checkpoints. For example, you might need a GPU and a
CPU version of the same graph, but keep the same weights for both. You might
also need some extra files (like label names) as part of your
model. The
[SavedModel](https://www.tensorflow.org/code/tensorflow/python/saved_model/README.md) format
addresses these needs by letting you save multiple versions of the same graph
without duplicating variables, and also storing asset files in the same
bundle. Under the hood, it uses `MetaGraphDef` and checkpoint files, along
with extra metadata files. It’s the format that you’ll want to use if you’re
deploying a web API using TensorFlow Serving, for example.
## How do you get a model you can use on mobile?
In most situations, training a model with TensorFlow will give you a folder
containing a `GraphDef` file (usually ending with the `.pb` or `.pbtxt` extension) and
a set of checkpoint files. What you need for mobile or embedded deployment is a
single `GraphDef` file that’s been ‘frozen’, or had its variables converted into
inline constants so everything’s in one file. To handle the conversion, you’ll
need the `freeze_graph.py` script, which is held in
[`tensorflow/python/tools/freeze_graph.py`](https://www.tensorflow.org/code/tensorflow/python/tools/freeze_graph.py). You’ll run it like this:
```bash
bazel build tensorflow/python/tools:freeze_graph
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/tmp/model/my_graph.pb \
--input_checkpoint=/tmp/model/model.ckpt-1000 \
--output_graph=/tmp/frozen_graph.pb \
--output_node_names=output_node
```
The `input_graph` argument should point to the `GraphDef` file that holds your
model architecture. It’s possible that your `GraphDef` has been stored in a text
format on disk, in which case it’s likely to end in `.pbtxt` instead of `.pb`,
and you should add an extra `--input_binary=false` flag to the command.
The `input_checkpoint` should be the most recent saved checkpoint. As mentioned
in the checkpoint section, you need to give the common prefix to the set of
checkpoints here, rather than a full filename.
`output_graph` defines where the resulting frozen `GraphDef` will be
saved. Because it’s likely to contain a lot of weight values that take up a
large amount of space in text format, it’s always saved as a binary protobuf.
`output_node_names` is a list of the names of the nodes that you want to extract
the results of your graph from. This is needed because the freezing process
needs to understand which parts of the graph are actually needed, and which are
artifacts of the training process, like summarization ops. Only ops that
contribute to calculating the given output nodes will be kept. If you know how
your graph is going to be used, these should just be the names of the nodes you
pass into `Session::Run()` as your fetch targets. The easiest way to find the
node names is to inspect the Node objects while building your graph in Python.
Inspecting your graph in TensorBoard is another simple way. You can get some
suggestions on likely outputs by running the [`summarize_graph` tool](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/README.md#inspecting-graphs).
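If you already have a `GraphDef` on disk, you can also dump the node names
directly from C++. This is only a sketch (the file name is a placeholder), but
it lists every node and its op type so you can pick out plausible outputs:

```c++
#include <iostream>

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/platform/env.h"

// Sketch: print every node name and op type in a GraphDef on disk.
// "my_graph.pb" is a placeholder path.
int main() {
  tensorflow::GraphDef graph_def;
  TF_CHECK_OK(tensorflow::ReadBinaryProto(tensorflow::Env::Default(),
                                          "my_graph.pb", &graph_def));
  for (const tensorflow::NodeDef& node : graph_def.node()) {
    std::cout << node.name() << " (" << node.op() << ")" << std::endl;
  }
  return 0;
}
```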
Because the output format for TensorFlow has changed over time, there are a
variety of other less commonly used flags available too, like `input_saver`, but
hopefully you shouldn’t need these on graphs trained with modern versions of the
framework.
## Using the Graph Transform Tool
A lot of the things you need to do to efficiently run a model on device are
available through the [Graph Transform
Tool](https://www.tensorflow.org/code/tensorflow/tools/graph_transforms/README.md). This
command-line tool takes an input `GraphDef` file, applies the set of rewriting
rules you request, and then writes out the result as a `GraphDef`. See the
documentation for more information on how to build and run this tool.
### Removing training-only nodes
TensorFlow `GraphDefs` produced by the training code contain all of the
computation that’s needed for back-propagation and updates of weights, as well
as the queuing and decoding of inputs, and the saving out of checkpoints. All of
these nodes are no longer needed during inference, and some of the operations
like checkpoint saving aren’t even supported on mobile platforms. To create a
model file that you can load on devices you need to delete those unneeded
operations by running the `strip_unused_nodes` rule in the Graph Transform Tool.
The trickiest part of this process is figuring out the names of the nodes you
want to use as inputs and outputs during inference. You'll need these anyway
once you start to run inference, but you also need them here so that the
transform can calculate which nodes are not needed on the inference-only
path. These may not be obvious from the training code. The easiest way to
determine the node name is to explore the graph with TensorBoard.
Remember that mobile applications typically gather their data from sensors and
have it as arrays in memory, whereas training typically involves loading and
decoding representations of the data stored on disk. In the case of Inception v3
for example, there’s a `DecodeJpeg` op at the start of the graph that’s designed
to take JPEG-encoded data from a file retrieved from disk and turn it into an
arbitrary-sized image. After that there’s a `ResizeBilinear` op to scale it to
the expected size, followed by a couple of other ops that convert the byte data
into floats and scale the value magnitudes in the way the rest of the graph
expects. A typical mobile app will skip most of these steps because it’s getting
its input directly from a live camera, so the input node you actually supply
will be the output of the `Mul` node in this case.
One thing to look out for here is that you need to specify the size and type
that you want your inputs to be. This is because any values that you’re going to
be passing in as inputs to inference need to be fed to special `Placeholder` op
nodes, and the transform may need to create them if they don’t already exist. In
the case of Inception v3 for example, a `Placeholder` node replaces the old
`Mul` node that used to output the resized and rescaled image array, since we’re
going to be doing that processing ourselves before we call TensorFlow. It keeps
the original name though, which is why we always feed in inputs to `Mul` when we
run a session with our modified Inception graph.
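Putting that together, here is a sketch of what feeding the renamed `Mul`
placeholder might look like in C++ for an Inception-v3-style graph. The shape
`{1, 299, 299, 3}` and the `softmax` output name are assumptions about the stock
model, so adjust them for your own graph:

```c++
#include <vector>

#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/public/session.h"

// Sketch: feed preprocessed camera pixels straight into the "Mul" placeholder.
// The shape and node names are assumptions based on the stock Inception v3
// graph; session is assumed to already hold the transformed GraphDef.
tensorflow::Tensor image_tensor(tensorflow::DT_FLOAT,
                                tensorflow::TensorShape({1, 299, 299, 3}));
// ... fill image_tensor.flat<float>() with the scaled image values here ...
std::vector<tensorflow::Tensor> outputs;
tensorflow::Status run_status =
    session->Run({{"Mul", image_tensor}}, {"softmax"}, {}, &outputs);
```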
After you’ve run this process, you’ll have a graph that only contains the actual
nodes you need to run your prediction process. This is the point where it
becomes useful to run metrics on the graph, so it’s worth running
`summarize_graph` again to understand what’s in your model.
## What ops should you include on mobile?
There are hundreds of operations available in TensorFlow, and each one has
multiple implementations for different data types. On mobile platforms, the size
of the executable binary that’s produced after compilation is important, because
app download bundles need to be as small as possible for the best user
experience. If all of the ops and data types are compiled into the TensorFlow
library then the total size of the compiled library can be tens of megabytes, so
by default only a subset of ops and data types are included.
That means that if you load a model file that’s been trained on a desktop
machine, you may see the error “No OpKernel was registered to support Op” when
you load it on mobile. The first thing to try is to make sure you’ve stripped
out any training-only nodes, since the error will occur at load time even if the
op is never executed. If you’re still hitting the same problem once that’s done,
you’ll need to look at adding the op to your built library.
The criteria for including ops and types fall into several categories:
- Are they only useful in back-propagation, for gradients? Since mobile is
focused on inference, we don’t include these.
- Are they useful mainly for other training needs, such as checkpoint saving?
These we leave out.
- Do they rely on frameworks that aren’t always available on mobile, such as
libjpeg? To avoid extra dependencies we don’t include ops like `DecodeJpeg`.
- Are there types that aren’t commonly used? We don’t include boolean variants
of ops for example, since we don’t see much use of them in typical inference
graphs.
These ops are trimmed by default to optimize for inference on mobile, but it is
possible to alter some build files to change the default. After altering the
build files, you will need to recompile TensorFlow. See below for more details
on how to do this, and also see @{$mobile/optimizing#binary_size$Optimizing} for
more on reducing your binary size.
### Locate the implementation
Operations are broken into two parts. The first is the op definition, which
declares the signature of the operation: the inputs, outputs, and attributes it
has. These take up very little space, and so all of them are included by default. The
implementations of the op computations are done in kernels, which live in the
`tensorflow/core/kernels` folder. You need to compile the C++ file containing
the kernel implementation of the op you need into the library. To figure out
which file that is, you can search for the operation name in the source
files.
[Here’s an example search in github](https://github.com/search?utf8=%E2%9C%93&q=repo%3Atensorflow%2Ftensorflow+extension%3Acc+path%3Atensorflow%2Fcore%2Fkernels+REGISTER+Mul&type=Code&ref=searchresults).
You’ll see that this search is looking for the `Mul` op implementation, and it
finds it in `tensorflow/core/kernels/cwise_op_mul_1.cc`. You need to look for
macros beginning with `REGISTER`, with the op name you care about as one of the
string arguments.
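For reference, the expanded form of a kernel registration looks roughly like
the following. This is only a sketch; the cwise kernel files actually wrap
registrations like this in helper macros that stamp out one entry per supported
device and type:

```c++
// Sketch of what a single expanded registration might look like for the CPU
// float kernel of Mul; the real files generate these via helper macros.
REGISTER_KERNEL_BUILDER(
    Name("Mul").Device(DEVICE_CPU).TypeConstraint<float>("T"),
    BinaryOp<CPUDevice, functor::mul<float>>);
```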
In this case, the implementations are actually broken up across multiple `.cc`
files, so you’d need to include all of them in your build. If you’re more
comfortable using the command line for code search, here’s a grep command that
also locates the right files if you run it from the root of your TensorFlow