`torch.utils.bottleneck` is a tool that can be used as an initial step for debugging bottlenecks in your program. It summarizes runs of your script with the Python profiler and PyTorch’s autograd profiler.
Due to the asynchronous nature of CUDA kernels, when running against CUDA code, the cProfile output and CPU-mode autograd profilers may not show correct timings: the reported CPU time covers only the time used to launch the kernels and does not include the time the kernels spend executing on the GPU unless the operation synchronizes. Ops that do synchronize appear to be extremely expensive under regular CPU-mode profilers. In these cases where timings are incorrect, the CUDA-mode autograd profiler may be helpful.
To decide which (CPU-only-mode or CUDA-mode) autograd profiler output to look at, you should first check if your script is CPU-bound (“CPU total time is much greater than CUDA total time”). If it is CPU-bound, looking at the results of the CPU-mode autograd profiler will help. If on the other hand your script spends most of its time executing on the GPU, then it makes sense to start looking for responsible CUDA operators in the output of the CUDA-mode autograd profiler.
Of course the reality is much more complicated and your script might not be in one of those two extremes depending on the part of the model you’re evaluating. If the profiler outputs don’t help, you could try looking at the result of [`torch.autograd.profiler.emit_nvtx()`](autograd.html#torch.autograd.profiler.emit_nvtx "torch.autograd.profiler.emit_nvtx") with `nvprof`. However, please take into account that the NVTX overhead is very high and often gives a heavily skewed timeline.
Warning
If you are profiling CUDA code, the first profiler that `bottleneck` runs (cProfile) will include the CUDA startup time (CUDA buffer allocation cost) in its time reporting. This should not matter if your bottlenecks result in code much slower than the CUDA startup time.
For more complicated uses of the profilers (like in a multi-GPU case), please see [https://docs.python.org/3/library/profile.html](https://docs.python.org/3/library/profile.html) or [`torch.autograd.profiler.profile()`](autograd.html#torch.autograd.profiler.profile "torch.autograd.profiler.profile") for more information.
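If you want to inspect the autograd profiler directly, outside of `bottleneck`, the following is a minimal sketch of running it in both CPU and CUDA mode; the model, tensor shapes, and the assumption that a CUDA device is available are placeholders for illustration:

```py
import torch

# Placeholder model and input; assumes a CUDA device is available.
model = torch.nn.Linear(128, 128).cuda()
x = torch.randn(32, 128, device='cuda')

# CPU-mode autograd profiler: asynchronous CUDA ops mostly show their launch cost.
with torch.autograd.profiler.profile() as cpu_prof:
    model(x).sum().backward()

# CUDA-mode autograd profiler: additionally records time spent executing on the GPU.
with torch.autograd.profiler.profile(use_cuda=True) as cuda_prof:
    model(x).sum().backward()

print(cpu_prof.key_averages().table())
print(cuda_prof.key_averages().table())
```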
Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward. This can cause persistent states like the RNG state to be advanced further than they would be without checkpointing. By default, checkpointing includes logic to juggle the RNG state such that checkpointed passes making use of RNG (through dropout for example) have deterministic output as compared to non-checkpointed passes. The logic to stash and restore RNG states can incur a moderate performance hit depending on the runtime of checkpointed operations. If deterministic output compared to non-checkpointed passes is not required, set the global flag `torch.utils.checkpoint.preserve_rng_state=False` to omit stashing and restoring the RNG state during each checkpoint.
```py
torch.utils.checkpoint.checkpoint(function, *args)
```
Checkpoint a model or part of the model
Checkpointing works by trading compute for memory. Rather than storing all intermediate activations of the entire computation graph for computing backward, the checkpointed part does **not** save intermediate activations, and instead recomputes them in backward pass. It can be applied on any part of a model.
Specifically, in the forward pass, `function` will run under `torch.no_grad()`, i.e., without storing the intermediate activations. Instead, the forward pass saves the inputs tuple and the `function` parameter. In the backward pass, the saved inputs and `function` are retrieved, and the forward pass is computed on `function` again, this time tracking the intermediate activations, and then the gradients are calculated using these activation values.
Checkpointing doesn’t work with [`torch.autograd.grad()`](autograd.html#torch.autograd.grad "torch.autograd.grad"), but only with [`torch.autograd.backward()`](autograd.html#torch.autograd.backward "torch.autograd.backward").
If `function` invocation during backward does anything different than the one during forward, e.g., due to some global variable, the checkpointed version won’t be equivalent, and unfortunately it can’t be detected.
* **function** – describes what to run in the forward pass of the model or part of the model. It should also know how to handle the inputs passed as the tuple. For example, in an LSTM, if the user passes `(activation, hidden)`, `function` should correctly use the first input as `activation` and the second input as `hidden`.
* **args** – tuple containing inputs to the `function`.
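A minimal usage sketch follows; the module and tensor shapes are illustrative, and the input is given `requires_grad=True` so that gradients flow through the checkpointed segment:

```py
import torch
from torch.utils.checkpoint import checkpoint

# Placeholder model: any callable that maps the inputs to an output works.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
)
x = torch.randn(8, 64, requires_grad=True)

# Intermediate activations inside `model` are not stored during forward;
# they are recomputed when backward reaches this checkpointed segment.
out = checkpoint(model, x)
out.sum().backward()
```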
A helper function for checkpointing sequential models.
Sequential models execute a list of modules/functions in order (sequentially). Therefore, we can divide such a model into segments and checkpoint each segment. All segments except the last will run under `torch.no_grad()`, i.e., without storing the intermediate activations. The inputs of each checkpointed segment will be saved for re-running the segment in the backward pass.
Checkpointing doesn’t work with [`torch.autograd.grad()`](autograd.html#torch.autograd.grad "torch.autograd.grad"), but only with [`torch.autograd.backward()`](autograd.html#torch.autograd.backward "torch.autograd.backward").
* **functions** – A [`torch.nn.Sequential`](nn.html#torch.nn.Sequential "torch.nn.Sequential") or a list of modules or functions (comprising the model) to run sequentially.
* **segments** – Number of chunks to create in the model.
* **inputs** – tuple of Tensors that are inputs to `functions`.
| Returns: | Output of running `functions` sequentially on `*inputs` |
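A minimal sketch of checkpointing a sequential model in two segments; the model and shapes are placeholders:

```py
import torch
from torch.utils.checkpoint import checkpoint_sequential

# Placeholder sequential model.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)
x = torch.randn(8, 64, requires_grad=True)

# Split the model into 2 segments and checkpoint each one.
out = checkpoint_sequential(model, 2, x)
out.sum().backward()
```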
Convenience method that creates a `setuptools.Extension` with the bare minimum (but often sufficient) arguments to build a CUDA/C++ extension. This includes the CUDA include path, library path and runtime library.
This `setuptools.build_ext` subclass takes care of passing the minimum required compiler flags (e.g. `-std=c++11`) as well as mixed C++/CUDA compilation (and support for CUDA files in general).
When using [`BuildExtension`](#torch.utils.cpp_extension.BuildExtension "torch.utils.cpp_extension.BuildExtension"), it is allowed to supply a dictionary for `extra_compile_args` (rather than the usual list) that maps from languages (`cxx` or `cuda`) to a list of additional compiler flags to supply to the compiler. This makes it possible to supply different flags to the C++ and CUDA compiler during mixed compilation.
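Assuming the convenience method described above corresponds to `torch.utils.cpp_extension.CUDAExtension`, a minimal `setup.py` sketch might look like the following; the package name and source file paths are placeholders:

```py
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='my_cuda_extension',  # placeholder name
    ext_modules=[
        # CUDAExtension adds the CUDA include/library paths and links cudart.
        CUDAExtension('my_cuda_extension',
                      ['my_extension.cpp', 'my_extension_kernel.cu']),
    ],
    # BuildExtension handles the C++/CUDA mixed compilation.
    cmdclass={'build_ext': BuildExtension},
)
```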
To load an extension, a Ninja build file is emitted, which is used to compile the given sources into a dynamic library. This library is subsequently loaded into the current Python process as a module and returned from this function, ready for use.
By default, the directory to which the build file is emitted and the resulting library compiled to is `<tmp>/torch_extensions/<name>`, where `<tmp>` is the temporary folder on the current platform and `<name>` the name of the extension. This location can be overridden in two ways. First, if the `TORCH_EXTENSIONS_DIR` environment variable is set, it replaces `<tmp>/torch_extensions` and all extensions will be compiled into subfolders of this directory. Second, if the `build_directory` argument to this function is supplied, it overrides the entire path, i.e. the library will be compiled into that folder directly.
To compile the sources, the default system compiler (`c++`) is used, which can be overridden by setting the `CXX` environment variable. To pass additional arguments to the compilation process, `extra_cflags` or `extra_ldflags` can be provided. For example, to compile your extension with optimizations, pass `extra_cflags=['-O3']`. You can also use `extra_cflags` to pass further include directories.
CUDA support with mixed compilation is provided. Simply pass CUDA source files (`.cu` or `.cuh`) along with other sources. Such files will be detected and compiled with nvcc rather than the C++ compiler. This includes passing the CUDA lib64 directory as a library directory, and linking `cudart`. You can pass additional flags to nvcc via `extra_cuda_cflags`, just like with `extra_cflags` for C++. Various heuristics for finding the CUDA install directory are used, which usually work fine. If not, setting the `CUDA_HOME` environment variable is the safest option.
* **name** – The name of the extension to build. This MUST be the same as the name of the pybind11 module!
* **sources** – A list of relative or absolute paths to C++ source files.
* **extra_cflags** – optional list of compiler flags to forward to the build.
* **extra_cuda_cflags** – optional list of compiler flags to forward to nvcc when building CUDA sources.
* **extra_ldflags** – optional list of linker flags to forward to the build.
* **extra_include_paths** – optional list of include directories to forward to the build.
* **build_directory** – optional path to use as build workspace.
* **verbose** – If `True`, turns on verbose logging of load steps.
* **with_cuda** – Determines whether CUDA headers and libraries are added to the build. If set to `None` (default), this value is automatically determined based on the existence of `.cu` or `.cuh` in `sources`. Set it to `True` to force CUDA headers and libraries to be included.
* **is_python_module** – If `True` (default), imports the produced shared library as a Python module. If `False`, loads it into the process as a plain dynamic library.
| Returns: | If `is_python_module` is `True`, returns the loaded PyTorch extension as a Python module. If `is_python_module` is `False`, returns nothing (the shared library is loaded into the process as a side effect). |
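A minimal JIT-loading sketch under these defaults; the extension name and source file paths are placeholders:

```py
from torch.utils.cpp_extension import load

# Builds into <tmp>/torch_extensions/my_extension (or under TORCH_EXTENSIONS_DIR
# if it is set) and imports the result. The .cu file is detected and compiled
# with nvcc; the .cpp file is compiled with the C++ compiler.
my_extension = load(
    name='my_extension',  # placeholder; must match the pybind11 module name
    sources=['my_extension.cpp', 'my_extension_kernel.cu'],
    extra_cflags=['-O3'],
    verbose=True,
)
```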
Loads a PyTorch C++ extension just-in-time (JIT) from string sources.
This function behaves exactly like [`load()`](#torch.utils.cpp_extension.load "torch.utils.cpp_extension.load"), but takes its sources as strings rather than filenames. These strings are stored to files in the build directory, after which the behavior of [`load_inline()`](#torch.utils.cpp_extension.load_inline "torch.utils.cpp_extension.load_inline") is identical to [`load()`](#torch.utils.cpp_extension.load "torch.utils.cpp_extension.load").
Sources may omit two required parts of a typical non-inline C++ extension: the necessary header includes, as well as the (pybind11) binding code. More precisely, strings passed to `cpp_sources` are first concatenated into a single `.cpp` file. This file is then prepended with `#include <torch/extension.h>`.
Furthermore, if the `functions` argument is supplied, bindings will be automatically generated for each function specified. `functions` can either be a list of function names, or a dictionary mapping from function names to docstrings. If a list is given, the name of each function is used as its docstring.
The sources in `cuda_sources` are concatenated into a separate `.cu` file and prepended with `torch/types.h`, `cuda.h` and `cuda_runtime.h` includes. The `.cpp` and `.cu` files are compiled separately, but ultimately linked into a single library. Note that no bindings are generated for functions in `cuda_sources` per se. To bind to a CUDA kernel, you must create a C++ function that calls it, and either declare or define this C++ function in one of the `cpp_sources` (and include its name in `functions`).
See [`load()`](#torch.utils.cpp_extension.load "torch.utils.cpp_extension.load") for a description of arguments omitted below.
Parameters:
* **cpp_sources** – A string, or list of strings, containing C++ source code.
* **cuda_sources** – A string, or list of strings, containing CUDA source code.
* **functions** – A list of function names for which to generate function bindings. If a dictionary is given, it should map function names to docstrings (which are otherwise just the function names).
* **with_cuda** – Determines whether CUDA headers and libraries are added to the build. If set to `None` (default), this value is automatically determined based on whether `cuda_sources` is provided. Set it to `True` to force CUDA headers and libraries to be included.
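A small sketch of defining and binding a function inline; the extension name and the `sin_add` function are illustrative:

```py
import torch
from torch.utils.cpp_extension import load_inline

cpp_source = """
at::Tensor sin_add(at::Tensor x, at::Tensor y) {
  return x.sin() + y.sin();
}
"""

# `#include <torch/extension.h>` is prepended to the concatenated source, and a
# binding for `sin_add` is generated because it is listed in `functions`.
module = load_inline(name='inline_extension',
                     cpp_sources=[cpp_source],
                     functions=['sin_add'])

print(module.sin_add(torch.ones(2), torch.ones(2)))
```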
Verifies that the given compiler is ABI-compatible with PyTorch.
| Parameters: | **compiler** ([_str_](https://docs.python.org/3/library/stdtypes.html#str "(in Python v3.7)")) – The compiler executable name to check (e.g. `g++`). Must be executable in a shell process. |
| --- | --- |
| Returns: | `False` if the compiler is (likely) ABI-incompatible with PyTorch, else `True`. |
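Assuming this entry documents `torch.utils.cpp_extension.check_compiler_abi_compatibility()`, a minimal check might look like this sketch:

```py
from torch.utils.cpp_extension import check_compiler_abi_compatibility

# Returns False if g++ is likely ABI-incompatible with the PyTorch binaries.
if not check_compiler_abi_compatibility('g++'):
    print('g++ may produce ABI-incompatible extensions')
```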