diff --git a/docs/design/coreclr/botr/ilc-architecture.md b/docs/design/coreclr/botr/ilc-architecture.md new file mode 100644 index 0000000000000000000000000000000000000000..697f2b7a852e9eb561022826a12dac09039edc9c --- /dev/null +++ b/docs/design/coreclr/botr/ilc-architecture.md @@ -0,0 +1,191 @@ +# ILC Compiler Architecture + +Author: Michal Strehovsky ([@MichalStrehovsky](https://github.com/MichalStrehovsky)) - 2018 + +ILC (IL Compiler) is an ahead of time compiler that transforms programs in CIL (Common Intermediate Language) into a target language or instruction set to be executed on a stripped down CoreCLR runtime. The input to ILC is the common instruction format generated by popular managed language compilers such as C#, VB.NET, or F#. The output of ILC is native code for the target platform, along with data structures required to support executing the code on the target runtime. With a bit of stretch, one could say that ILC is an ahead of time native compiler for C#. + +Traditionally, CIL has been compiled "just in time" (JIT). What this means is that the translation from CIL to the instruction set executable on the target runtime environment happened on an as-needed basis when the native code became necessary to continue execution of the program (e.g. on a first call to a CIL method). An ahead of time compiler tries to prepare the code and data structures in advance - before the program starts executing. The major advantages of having native code and data structures required for the code to execute available in advance are significant improvements to program startup time and working set. + +In a fully ahead of time compiled environment, the compiler is responsible for generating code and data structures for everything that might be needed at runtime - the presence of the original CIL instructions or program metadata (names of methods and their signature, for example) is no longer necessary after compilation. 
One important aspect to keep in mind is that ahead of time compilation does not preclude just in time compilation: one could imagine mixed modes of execution where some parts of the application are compiled ahead of time, while others are compiled just in time, or interpreted. ILC needs to support such modes of operation, since both have their advantages and disadvantages. We have prototyped such modes of execution in the past. + +## Goals + +* Compile CIL and produce native code for the target platform +* Generate essential data structures that the runtime requires to execute managed native code (exception handling and GC information for methods, data structures describing types, their GC layout and vtables, interface dispatch maps, etc.) +* Generate optional data structures the base class libraries require to provide rich managed APIs to user code (data structures that support reflection, interop, textual stack trace information, type loading at runtime, etc.) +* Support optional inputs from a whole program analysis step to influence compilation +* Support generating executable files and static/dynamic libraries (with a flat C-style public API surface) +* Support multiple modes of compilation: + * Single-file output: + * All input assemblies merged into a single object file generated by ILC (merging managed assemblies happens in ILC). This mode allows maximum optimization. + * Generating multiple object files that are merged using a platform linker into a single executable (merging happens in the native linker after ILC runs). This mode allows incremental compilation at the cost of inhibiting optimizations in ILC. + * Multi-file output (one or more input assemblies generating one or more dynamic libraries that link against each other dynamically). This mode allows code/data sharing among several executables or dynamic libraries, but inhibits many optimizations.
+* Multi-threaded compilation +* Generate native debug information for the target platform to allow debugging with native debuggers +* Generate outputs in the platform's native object file format (`.obj` and `.o` files) +* Have a defined behavior when input is incomplete (e.g. assemblies missing, or an assembly version mismatch) + +## ILC composition +ILC is composed of roughly three parts: the compilation driver, the compiler, and the code generation backends. + +### Compilation driver +The role of the compilation driver is to parse command line arguments, set up the compiler, and run the compilation. The process of setting up the compiler involves configuring a `CompilationBuilder`. The compilation builder exposes methods that let the driver configure and compose various pieces of the compiler as directed by the command line arguments. These components influence what gets compiled and how the compilation happens. Eventually the driver constructs a `Compilation` object that provides methods to run the compilation, inspect the results of the compilation, and write the outputs to a file on disk. + +Related classes: `CompilationBuilder`, `ICompilation` + +### Compiler +The compiler is the core component that the rest of this document describes. It's responsible for running the compilation process and generating data structures for the target runtime and target base class libraries. + +The compiler tries to stay policy-free as to what to compile, what data structures to generate, and how to do the compilation. The specific policies are supplied by the compilation driver as part of configuring the compilation. + +### Code generation backends +ILC is designed to support multiple code generation backends that target the same runtime.
What this means is that we have a model where there are common parts within the compiler (determining what needs to be compiled, generating data structures for the underlying runtime), and parts that are specific to a target environment. The common parts (the data structure layout) are not target specific - the target specific differences are limited to general questions, such as "does the target have a representation for relative pointers?", but the basic shape of the data structures is the same, no matter the target platform. +ILC currently supports the following codegen backends (with varying levels of completeness): + +* **RyuJIT**: native code generator also used as the JIT compiler in CoreCLR. This backend supports x64, arm64, and arm32 on Windows, Linux, macOS, and BSD. +* **LLVM**: the LLVM backend is currently used to generate WebAssembly code in connection with Emscripten. It lives in the [NativeAOT-LLVM branch](https://github.com/dotnet/runtimelab/tree/feature/NativeAOT-LLVM). + +This document describes the common parts of the compiler that are applicable to all codegen backends. + +In the past, ILC supported the following backends: + +* **CppCodegen**: a portable code generator that translates CIL into C++ code. This supports rapid platform bringup (instead of having to build a code generator for a new CPU architecture, it relies on the C++ compiler the platform likely already has). Portability comes at certain costs. This codegen backend wasn't [brought over](https://github.com/dotnet/corert/tree/master/src/ILCompiler.CppCodeGen/src) from the now-archived CoreRT repo. + +Related project files: ILCompiler.LLVM.csproj, ILCompiler.RyuJit.csproj + +## Dependency analysis +The core concept driving the compilation in ILC is dependency analysis. Dependency analysis is the process of determining the set of runtime artifacts (method code bodies and various data structures) that need to be generated into the output object file.
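At its core, this is a transitive marking problem: start from a set of root artifacts and keep pulling in everything they require. The following is a minimal, self-contained sketch of that idea; the `Node` and `DependencyMarker` types here are hypothetical stand-ins for illustration, not the compiler's actual `DependencyNodeCore<>`-based classes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of dependency marking. The real compiler
// uses DependencyNodeCore<> and a far more capable analyzer, but the
// core idea - mark everything transitively reachable from the roots -
// is the same.
public class Node
{
    public string Name;
    public List<Node> Dependencies = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class DependencyMarker
{
    // Worklist-based expansion: start from the roots and keep adding
    // dependencies until no new nodes appear in the graph.
    public static HashSet<Node> Mark(IEnumerable<Node> roots)
    {
        var marked = new HashSet<Node>();
        var worklist = new Queue<Node>(roots);
        while (worklist.Count > 0)
        {
            Node node = worklist.Dequeue();
            if (!marked.Add(node))
                continue; // already part of the graph
            foreach (Node dependency in node.Dependencies)
                worklist.Enqueue(dependency);
        }
        return marked;
    }
}
```

Anything never reached from a root (an unused method body, a type that is never allocated) simply never enters the graph, and therefore never takes up space in the output.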
Dependency analysis builds a graph where each vertex either +* represents an artifact that will be part of the output file (such as "compiled method body" or "data structure describing a type at runtime") - this is an "object node", or +* captures a certain abstract characteristic of the compiled program (such as "the program contains a virtual call to the `Object.GetHashCode` method") - a general "dependency node". General dependency nodes do not physically manifest themselves as bytes in the output, but they usually have edges that (transitively) lead to object nodes that do form parts of the output. + +The edges of the graph represent a "requires" relationship. The compilation process corresponds to building this graph and determining what nodes are part of the graph. + +Related classes: `DependencyNodeCore<>`, `ObjectNode` + +Related project files: ILCompiler.DependencyAnalysisFramework.csproj + +### Dependency expansion process +The compilation starts with a set of nodes in the dependency graph called compilation roots. The roots are specified by the compilation driver and typically contain the `Main()` method, but the exact set of roots depends on the compilation mode: the set of roots will be different when we're e.g. building a library, or when we're doing a multi-file compilation, or when we're building a single file app. + +The process begins by looking at the list of the root nodes and establishing their dependencies (dependent nodes). Once the dependencies are known, the compilation moves on to inspecting the dependencies of the dependencies, and so on, until all dependencies are known and marked within the dependency graph. When that happens, the compilation is done. + +The expansion of the graph is required to stay within the limits of a compilation group. A compilation group is a component that controls how the dependency graph is expanded.
Its role is best illustrated by contrasting a multi-file and a single-file compilation: in a single-file compilation, all methods and types that are statically reachable from the roots become part of the dependency graph, irrespective of the input assembly that defines them. In a multi-file compilation, some of the compilation happens as part of a different unit of work: the methods and types that are not part of the current unit of work shouldn't have their dependencies examined and they should not be a part of the dependency graph. + +The advantage of having the two abstractions (compilation roots, and a class that controls how the dependency graph is expanded) is that the core compilation process can be completely unaware of the specific compilation mode (e.g. whether we're building a library, or whether we're doing a multi-file compilation). The details are fully wrapped behind the two abstractions and give us great expressive power for defining or experimenting with new compilation modes, while keeping all the logic in a single place. For example, we support a single method compilation mode where we compile only one method. This mode is useful for troubleshooting code generation. The compilation driver can define additional compilation modes (e.g. a mode that compiles a single type and all the associated methods) without having to change the compiler itself. + +Related classes: `ICompilationRootProvider`, `CompilationModuleGroup` + +### Dependency types +The dependency graph analysis can work with several kinds of dependencies between the nodes: +* **Static dependencies**: these are the most common. If node A is part of the dependency graph and it declares it requires node B, node B also becomes part of the dependency graph. +* **Conditional dependencies**: Node A declares that it depends on node B, but only if node C is part of the graph. If that's the case, node B will become part of the graph if both A and C are in the graph.
+* **Dynamic dependencies**: These are quite expensive to have in the system, so we only use them rarely. They let the node inspect other nodes within the graph and inject nodes based on their presence. They are pretty much only used for analysis of generic virtual methods. + +To show what the dependency graph looks like in real life, let's look at an example of how an (optional) optimization within the compiler around virtual method usage tracking works: + +```csharp +abstract class Foo +{ + public abstract void VirtualMethod(); + public virtual void UnusedVirtualMethod() { } +} + +class Bar : Foo +{ + public override void VirtualMethod() { } + public override void UnusedVirtualMethod() { } +} + +class Baz : Foo +{ + public override void VirtualMethod() { } +} + +class Program +{ + static int Main() + { + Foo f = new Bar(); + f.VirtualMethod(); + return f is Baz ? 0 : 100; + } +} +``` + +The dependency graph for the above program would look something like this: + +![Dependency graph](images/simple-dependency-graph.svg) + +The rectangle-shaped nodes represent method bodies, the oval-shaped nodes represent types, the dashed rectangles represent virtual method use, and the dotted oval-shaped node is an unconstructed type. +The dashed edges are conditional dependencies, with the condition marked on the label. + +* `Program::Main` creates a new instance of `Bar`. For that, it will allocate an object on the GC heap and call a constructor to initialize it. Therefore, it needs the data structure that represents the `Bar` type and `Bar`'s default constructor. The method then calls `VirtualMethod`. Even though from this simple example we know what specific method body this will end up calling (we can devirtualize the call in our head), we can't know in general, so we say `Program::Main` also depends on "Virtual method use of Foo::VirtualMethod". The last line of the program performs a type check.
To do the type check, the generated code needs to reference a data structure that represents type `Baz`. The interesting thing about a type check is that we don't need to generate a full data structure describing the type, only enough to be able to tell if the cast succeeds. So we say `Program::Main` also depends on "unconstructed type data structure" for `Baz`. +* The data structure that represents type `Bar` has two important kinds of dependencies. It depends on its base type (`Foo`) - a pointer to it is required to make casting work - and it also contains the vtable. The entries in the vtable are conditional - if a virtual method is never called, we don't need to place it in the vtable. As a result of the situation in the graph, the method body for `Bar::VirtualMethod` is going to be part of the graph, but `Bar::UnusedVirtualMethod` will not, because it's conditioned on a node that is not present in the graph. +* The data structure that represents `Baz` is a bit different from `Bar`. We call this an "unconstructed type" structure. Unconstructed type structures don't contain a vtable, and therefore `Baz` is missing a virtual method use dependency for `Baz::VirtualMethod` conditioned on the use of `Foo::VirtualMethod`. + +Notice how using conditional dependencies helped us avoid compiling method bodies for `Foo::UnusedVirtualMethod` and `Bar::UnusedVirtualMethod` because the virtual method is never used. We also avoided generating `Baz::VirtualMethod`, because `Baz` was never allocated within the program. We generated the data structure that represents `Baz`, but because the data structure was only generated for the purposes of casting, it doesn't have a vtable that would pull `Baz::VirtualMethod` into the dependency graph. + +Note that while "constructed" and "unconstructed" type nodes are modelled separately in the dependency graph, at the object writing time they get coalesced into one. 
If the graph has a type in both the unconstructed and constructed form, only the constructed form will be emitted into the executable and places referring to the unconstructed form will be redirected to the constructed form, to maintain type identity. + +Related compiler switches: `--dgmllog` serializes the dependency graph into an XML file. The XML file captures all the nodes in the graph, but only captures the first edge leading to the node (knowing the first edge is enough for most purposes). `--fulllog` generates an even bigger XML file that captures all the edges. + +Related tools: [Dependency analysis viewer](../how-to-debug-compiler-dependency-analysis.md) is a tool that listens to ETW events generated by all the ILC compiler processes on the machine and lets you interactively explore the graph. + +## Object writing +The final phase of compilation is writing out the outputs. The output of the compilation depends on the target environment but will typically be some sort of object file. An object file typically consists of blobs of code or data with links (or relocations) between them, and symbols: named locations within a blob. The relocations point to symbols, either defined within the same object file, or in a different module. + +While the object file format is highly target specific, the compiler represents dependency nodes that have object data associated with them the same way irrespective of the target - with the `ObjectNode` class. The `ObjectNode` class allows its subclasses to specify the section in which to place their data (code, read only data, uninitialized data, etc.), and crucially, the data itself (represented by the `ObjectData` class returned from the `GetObjectData` method). + +At a high level, the role of the object writer is to go over all the marked `ObjectNode`s in the graph, retrieve their data, defined symbols, and relocations to other symbols, and store them in the object file.
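That loop can be pictured with a simplified sketch. All types and members below are illustrative stand-ins, not the actual ILCompiler object writer API; the real `ObjectNode`/`ObjectData` classes carry much more detail (relocation kinds, symbol definitions, alignment, and so on).

```csharp
using System.Collections.Generic;

// Simplified stand-ins for illustration; not the actual ILCompiler API.
public enum Section { Code, ReadOnlyData, UninitializedData }

public class ObjectData
{
    public byte[] Bytes = new byte[0];              // the blob to emit
    public string[] DefinedSymbols = new string[0]; // named locations within the blob
    public string[] Relocations = new string[0];    // symbols the blob refers to
}

public abstract class ObjectNode
{
    public abstract Section Section { get; }
    public abstract ObjectData GetObjectData();
}

// A trivial concrete node carrying a fixed blob of bytes.
public class BlobNode : ObjectNode
{
    private readonly Section _section;
    private readonly byte[] _bytes;
    public BlobNode(Section section, byte[] bytes) { _section = section; _bytes = bytes; }
    public override Section Section => _section;
    public override ObjectData GetObjectData() => new ObjectData { Bytes = _bytes };
}

public static class ObjectWriter
{
    // Walk the marked object nodes, ask each for its data, and tally the
    // bytes per section. A real writer would also emit the symbols and
    // resolve the relocations in the target object file format.
    public static Dictionary<Section, int> Emit(IEnumerable<ObjectNode> markedNodes)
    {
        var bytesPerSection = new Dictionary<Section, int>();
        foreach (ObjectNode node in markedNodes)
        {
            ObjectData data = node.GetObjectData();
            bytesPerSection.TryGetValue(node.Section, out int soFar);
            bytesPerSection[node.Section] = soFar + data.Bytes.Length;
        }
        return bytesPerSection;
    }
}
```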
+ +The NativeAOT compiler contains multiple object writers: +* Native object writer (`src/coreclr/tools/aot/ObjWriter`) based on LLVM that is capable of producing Windows PE, Linux ELF, and macOS Mach-O file formats +* Native object writer based on LLVM for WebAssembly +* Ready to run object writer that generates mixed CIL/native executables in the ready to run format for CoreCLR + +Related command line arguments: `--map` produces a map of all the object nodes that were emitted into the object file. + +## Optimization pluggability +An advantage of a fully ahead of time compiled environment is that the compiler can make closed world assumptions about the code being compiled. For example: lacking the ability to load arbitrary CIL at runtime (either through `Assembly.Load`, or `Reflection.Emit`), if the compiler sees that there's only one type implementing an interface, it can replace all the interface calls in the program with direct calls, and apply additional optimizations enabled by it, such as inlining. If the target environment allowed dynamic code, such an optimization would be invalid. + +The compiler is structured to allow such optimizations, but remains policy-free as to when the optimization should be applied. This allows both fully AOT compiled and mixed (JIT/interpreted) code execution strategies. The policies are always captured in an abstract class or an interface, the implementation of which is selected by the compilation driver and passed to the compilation builder. This allows a great degree of flexibility and gives a lot of power to influence the compilation from the compilation driver, without hardcoding the conditions when the optimization is applicable into the compiler. + +An example of such a policy is the virtual method table (vtable) generation policy. The compiler can build vtables two ways: lazily, or by reading the type's metadata and generating a vtable slot for every new virtual method present in the type's method list.
The dependency analysis example graph a couple of sections above described how conditional dependencies can be used to track what vtable slots and virtual method bodies we need to generate for a program to work. This is an example of an optimization that requires closed world assumptions. The policy is captured in a `VTableSliceProvider` class and allows the driver to select the vtable generation policy per type. This allows the compilation driver a great degree of control to fine-tune when the optimization is allowed to happen (e.g. even in the presence of a JIT, we could still allow this optimization to happen on types that are not visible/accessible from the non-AOT compiled parts of the program or through reflection). + +The policies that can be configured in the driver span a wide range of areas: generation of reflection metadata, devirtualization, generation of vtables, generation of stack trace metadata for `Exception.ToString`, generation of debug information, the source of IL for method bodies, etc. + +## IL scanning + +Another component of ILC is the IL scanner. IL scanning is an optional step that can be executed before the compilation. In many ways, IL scanning acts as another compilation with a null/dummy code generation backend. The IL scanner scans the IL of all the method bodies that become part of the dependency graph starting from the roots and expands their dependencies. The IL scanner ends up building the same dependency graph a code generation backend would, but the nodes in the graph that represent method bodies don't have any machine code instructions associated with them. This process is relatively fast since there's no code generation involved, but the resulting graph contains a lot of valuable insights into the compiled program. The dependency graph built by the IL scanner is a strict superset of the graph built by a real compilation since the IL scanner doesn't model optimizations such as inlining and devirtualization.
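To see why the scanner's graph is a strict superset, consider this small snippet (a hypothetical example, not taken from the compiler):

```csharp
class Shape
{
    public virtual double Area() { return 0; }
}

sealed class Square : Shape
{
    public double Side;
    public override double Area() { return Side * Side; }
}

static class Example
{
    // The IL scanner sees a virtual call below and records a
    // "virtual method use of Shape::Area" node in its graph. A code
    // generation backend may prove the receiver is always a Square
    // (the concrete type is known and Square is sealed), devirtualize
    // the call into a direct call to Square::Area, and possibly inline
    // it - so the virtual method use node can be absent from the graph
    // built during the real compilation.
    public static double Compute()
    {
        Shape s = new Square { Side = 2 };
        return s.Area();
    }
}
```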
+ +The results of the IL scanner are input into the subsequent compilation process. For example, the IL scanner can use the lazy vtable generation policy to build vtables with just the slots needed, and assign slot numbers to each slot in the vtable at the end of scanning. The vtable layouts computed lazily during scanning can then be used by the real compilation process to inline vtable lookups at the callsites. Inlining the vtable lookup at the callsite would not be possible with a lazy vtable generation policy because the exact slot assignments of lazy vtables aren't stable until the compilation is done. + +The IL scanning process is optional and the compilation driver can skip it if compilation throughput is more important than runtime code quality. + +Related classes: `ILScanner` + +## Coupling with the base class libraries +The compiler has a certain level of coupling with the underlying base class library (the `System.Private.*` libraries within the repo). The coupling is twofold: +* Binary format of the generated data structures +* Expectations about the existence of certain methods within the core library + +Examples of the binary formats generated by the compiler and used by the base class libraries would be the format of the data structure that represents a type at runtime (`MethodTable`), or the blob of bytes that describes non-essential information about the type (such as the type name, or a list of methods). These data structures form a contract and allow the managed code in the base class library to provide rich services to user code through library APIs at runtime (such as the reflection APIs). Generation of some of these data structures is optional, but for some it's mandatory because they're required to execute any managed code. + +The compiler also needs to call into some well-known entrypoints within the base class library to support the generated code. The base class library needs to define these methods. 
Examples of such entrypoints would be various helpers to throw `OverflowException` during mathematical operations, `IndexOutOfRangeException` during array access, or various helpers to aid in generating p/invoke marshalling code (e.g. converting a UTF-16 string to ANSI and back before/after invoking the native method). + +One interesting thing to point out is that the coupling of the compiler with the base class libraries is relatively loose (there are only a few mandatory parts). This allows different base class libraries to be used with ILC. Such base class libraries could look quite different from what regular .NET developers are used to (e.g. a `System.Object` that doesn't have a `ToString` method) but could allow using type safe code in environments where regular .NET would be considered "too heavy". Various experiments with such lightweight code have been done in the past, and some of them even shipped as part of the Windows operating system. + +An example of such an alternative base class library is [Test.CoreLib](../../../../src/coreclr/nativeaot/Test.CoreLib/). The `Test.CoreLib` library provides a very minimal API surface. This, coupled with the fact that it requires almost no initialization, makes it a great aid in bringing NativeAOT to new platforms. + +## Compiler-generated method bodies +Besides compiling the code provided by the user in the form of input assemblies, the compiler also needs to compile various helpers generated within the compiler. The helpers are used to lower some of the higher-level .NET constructs into concepts that the underlying code generation backend understands. These helpers are emitted as IL code on the fly, without being physically backed by IL in an assembly on disk. Having the higher-level concepts expressed as regular IL helps avoid having to implement the higher-level concept in each code generation backend (we only have to do it once because IL is the thing all backends understand).
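For illustration, one such helper - field-by-field equality for a value type - corresponds conceptually to C# along these lines. This is a hand-written sketch for a hypothetical `Point` struct; the actual helper is synthesized directly as IL, never exists as C# source, and handles additional cases.

```csharp
public struct Point
{
    public int X;
    public int Y;
}

public static class GeneratedHelpers
{
    // Conceptual C# equivalent of a compiler-generated field-by-field
    // equality helper for Point (hypothetical name and shape).
    public static bool PointEquals(Point left, object right)
    {
        if (!(right is Point))
            return false;
        Point other = (Point)right;
        return left.X == other.X && left.Y == other.Y;
    }
}
```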
+ +The helpers serve various purposes such as: +* Helpers to support invoking delegates +* Helpers that support marshalling parameters and return values for P/Invoke +* Helpers that support `ValueType.GetHashCode` and `ValueType.Equals` +* Helpers that support reflection: e.g. `Assembly.GetExecutingAssembly` + +Related classes: `ILEmitter`, `ILStubMethod` + +Related ILC command line switches: `--ildump` to dump all generated IL into a file and map debug information to it (allows source stepping through the generated IL at runtime). diff --git a/docs/design/coreclr/botr/images/simple-dependency-graph.gv b/docs/design/coreclr/botr/images/simple-dependency-graph.gv new file mode 100644 index 0000000000000000000000000000000000000000..48fe5e44e4460c6d31a43823687e3f5acc1fe7a3 --- /dev/null +++ b/docs/design/coreclr/botr/images/simple-dependency-graph.gv @@ -0,0 +1,43 @@ +digraph "SimpleDependencyGraph" { + +ordering=out; +rankdir=LR; + +node [shape=box]; +Code_Program_Main[label="Program::Main"]; +Code_Bar__ctor[label="Bar::.ctor"]; +Code_Bar_VirtualMethod[label="Bar::VirtualMethod"]; +Code_Bar_UnusedVirtualMethod[label="Bar::UnusedVirtualMethod"]; +Code_Foo_UnusedVirtualMethod[label="Foo::UnusedVirtualMethod"]; +Code_Foo__ctor[label="Foo::.ctor"]; +Code_Object__ctor[label="Object::.ctor"]; + +node [shape=ellipse]; +Type_Bar[label="Bar"]; +Type_Foo[label="Foo"]; +Type_Object[label="Object"]; + +node [shape=ellipse, style=dotted] +Type_Baz[label="Baz"] + +node [shape=box, style=dashed]; +Virtual_Foo_VirtualMethod[label="Foo::VirtualMethod"]; + +Code_Program_Main -> Code_Bar__ctor; +Code_Program_Main -> Type_Bar; +Code_Program_Main -> Virtual_Foo_VirtualMethod; +Code_Program_Main -> Type_Baz; +Type_Baz -> Type_Foo; + +Type_Bar -> Type_Foo; +Type_Foo -> Type_Object; +Type_Bar -> Code_Bar_VirtualMethod[label="Foo::VirtualMethod is used", style=dashed]; +Type_Bar -> Code_Bar_UnusedVirtualMethod[label="Foo::UnusedVirtualMethod is used", style=dashed]; +Type_Foo -> 
Code_Foo_UnusedVirtualMethod[label="Foo::UnusedVirtualMethod is used", style=dashed]; +Code_Bar__ctor -> Code_Foo__ctor; +Code_Foo__ctor -> Code_Object__ctor; + +overlap=false; +fontsize=12; + +} diff --git a/docs/design/coreclr/botr/images/simple-dependency-graph.svg b/docs/design/coreclr/botr/images/simple-dependency-graph.svg new file mode 100644 index 0000000000000000000000000000000000000000..60ca83174222ae61fec29acaf445ca4c6a5f127e --- /dev/null +++ b/docs/design/coreclr/botr/images/simple-dependency-graph.svg @@ -0,0 +1,136 @@ + + + + + + +SimpleDependencyGraph + + +Code_Program_Main + +Program::Main + + +Code_Bar__ctor + +Bar::.ctor + + +Code_Program_Main->Code_Bar__ctor + + + + +Type_Bar + +Bar + + +Code_Program_Main->Type_Bar + + + + +Type_Baz + +Baz + + +Code_Program_Main->Type_Baz + + + + +Virtual_Foo_VirtualMethod + +Foo::VirtualMethod + + +Code_Program_Main->Virtual_Foo_VirtualMethod + + + + +Code_Foo__ctor + +Foo::.ctor + + +Code_Bar__ctor->Code_Foo__ctor + + + + +Code_Bar_VirtualMethod + +Bar::VirtualMethod + + +Code_Bar_UnusedVirtualMethod + +Bar::UnusedVirtualMethod + + +Code_Foo_UnusedVirtualMethod + +Foo::UnusedVirtualMethod + + +Code_Object__ctor + +Object::.ctor + + +Code_Foo__ctor->Code_Object__ctor + + + + +Type_Bar->Code_Bar_VirtualMethod + + +Foo::VirtualMethod is used + + +Type_Bar->Code_Bar_UnusedVirtualMethod + + +Foo::UnusedVirtualMethod is used + + +Type_Foo + +Foo + + +Type_Bar->Type_Foo + + + + +Type_Foo->Code_Foo_UnusedVirtualMethod + + +Foo::UnusedVirtualMethod is used + + +Type_Object + +Object + + +Type_Foo->Type_Object + + + + +Type_Baz->Type_Foo + + + + + diff --git a/docs/workflow/building/coreclr/nativeaot.md b/docs/workflow/building/coreclr/nativeaot.md new file mode 100644 index 0000000000000000000000000000000000000000..576939fe9e3b70ee76a63916a9310c5c2c3a0f0f --- /dev/null +++ b/docs/workflow/building/coreclr/nativeaot.md @@ -0,0 +1,83 @@ +# Native AOT Developer Workflow + +The Native AOT toolchain can be currently built 
for Linux (x64/arm64), macOS (x64) and Windows (x64/arm64). + +## High Level Overview + +Native AOT is a stripped down version of the CoreCLR runtime specialized for ahead of time compilation, with an accompanying ahead of time compiler. + +The main components of the toolchain are: + +* The AOT compiler (ILC/ILCompiler) built on a shared codebase with crossgen2 (src/coreclr/tools/aot). Where crossgen2 generates ReadyToRun modules that contain code and data structures for the CoreCLR runtime, ILC generates code and self-describing data structures for a stripped down version of CoreCLR into object files. The object files use platform specific file formats (COFF with CodeView on Windows, ELF with DWARF on Linux, and Mach-O with DWARF on macOS). +* The stripped down CoreCLR runtime (NativeAOT specific files in src/coreclr/nativeaot/Runtime, the rest included from src/coreclr). The stripped down runtime is built into a static library that is linked with the object file generated by the AOT compiler using a platform-specific linker (link.exe on Windows, ld on Linux/macOS) to form a standalone executable. +* The bootstrapper library (src/coreclr/nativeaot/Bootstrap). This is a small native library that contains the actual native `main()` entrypoint and bootstraps the runtime and dispatches to managed code. Two flavors of the bootstrapper are built - one for executables, and another for dynamic libraries. +* The core libraries (src/coreclr/nativeaot): System.Private.CoreLib (corelib), System.Private.Reflection.* (the implementation of reflection), System.Private.TypeLoader (ability to load new types that were not generated statically). +* The dotnet integration (src/coreclr/nativeaot/BuildIntegration). This is a set of .targets/.props files that hook into `dotnet publish` to run the AOT compiler and execute the platform linker. + +The AOT compiler typically takes the app, core libraries, and framework libraries as input.
It then compiles the whole program into a single object file. Then the object file is linked to form a runnable executable. The executable is standalone (doesn't require a runtime), modulo any managed DllImports. + +The executable looks like a native executable, in the sense that it can be debugged with native debuggers and has full-fidelity access to locals and stepping information. + +## Building + +- [Install pre-requisites](../../README.md#build-requirements) +- Run `build[.cmd|.sh] clr+libs -rc [Debug|Release] -lc Release` from the repo root. This will restore nuget packages required for building and build the parts of the repo required for general CoreCLR development. Alternatively, instead of specifying `clr+libs`, you can specify `clr.jit+clr.tools+clr.nativeaotlibs+libs` which is more targeted and builds faster. Replace `clr.jit` with `clr.alljits` if you need to cross-compile. +- [NOT PORTED OVER YET] The build will place the toolchain packages at `artifacts\packages\[Debug|Release]\Shipping`. To publish your project using these packages: + - [NOT PORTED OVER YET] Add the package directory to your `nuget.config` file. For example, replace `dotnet-experimental` line in `samples\HelloWorld\nuget.config` with `` + - [NOT PORTED OVER YET] Run `dotnet publish --packages pkg -r [win-x64|linux-x64|osx-64] -c [Debug|Release]` to publish your project. The `--packages pkg` option restores the package into a local directory that is easy to clean up once you are done. It avoids polluting the global nuget cache with your locally built dev package. +- *Optional*. The ObjWriter component of the AOT compiler is not built by default. If you're working on ObjWriter or bringing up a new platform that doesn't have ObjWriter packages yet, as an additional prerequisite you need to run `build[.cmd|.sh] clr.objwriter` from the repo root before building the product.
+ +## Visual Studio Solutions + +The repository has a number of Visual Studio solution files (`*.sln`) that are useful for editing parts of the repository. Build the repo from the command line first before building using the solution files. Remember to select the appropriate configuration that you built. By default, `build.cmd` builds Debug x64, so `Debug` and `x64` must be selected in the solution build configuration drop-downs. + +- `src\coreclr\nativeaot\nativeaot.sln`. This solution is for the runtime libraries. +- `src\coreclr\tools\aot\ilc.sln`. This solution is for the compiler. + +Typical workflow for working on the compiler: +- Open `ilc.sln` in Visual Studio +- Set the "ILCompiler" project in Solution Explorer as your startup project +- Set Working directory in the project Debug options to your test project directory, e.g. `C:\runtimelab\samples\HelloWorld` +- Set Application arguments in the project Debug options to the response file that was generated by regular native AOT publishing of your test project, e.g. `@obj\Release\net6.0\win-x64\native\HelloWorld.ilc.rsp` +- Build & run using **F5** + +## Convenience Visual Studio "repro" project + +The typical native AOT runtime developer workflow is to compile a short piece of C# with native AOT and run it. The repo contains helper projects that make debugging the AOT compiler and the runtime easier. + +The workflow looks like this: + +- Build the repo using the Building instructions above +- Open the ilc.sln solution described above. This solution contains the compiler, but also an unrelated project named "repro". This repro project is a small Hello World. You can place any piece of C# you would like to compile in it. Building the project will compile the source code into IL, but also generate a response file that is suitable to pass to the AOT compiler. +- Make sure you set the solution configuration in VS to the configuration you just built (e.g. x64 Debug).
+ +- In the ILCompiler project properties, on the Debug tab, set the "Application arguments" to the generated response file. This will be a file such as "C:\runtime\artifacts\bin\repro\x64\Debug\compile-with-Release-libs.rsp". Prefix the path to the file with "@" to indicate this is a response file so that the "Application arguments" field looks like "@some\path\to\file.rsp". +- Build & run ILCompiler using **F5**. This will compile the repro project into an `.obj` file. You can debug the compiler and set breakpoints in it at this point. +- The last step is linking the object file into an executable so that we can launch the result of the AOT compilation. +- Open the `src\coreclr\tools\aot\ILCompiler\reproNative\reproNative.vcxproj` project in Visual Studio. This project is configured to pick up the `.obj` file we just compiled and link it with the rest of the runtime. +- Set the solution configuration to the tuple you've been using so far (e.g. x64 Debug) +- Build & run using **F5**. This will run the platform linker to link the obj file with the runtime and launch it. At this point you can debug the runtime and the various System.Private libraries. + +## Running tests + +If you haven't built the tests yet, run `src\tests\build[.cmd|.sh] nativeaot [Debug|Release] tree nativeaot`. This will build the smoke tests only - they usually suffice to ensure the runtime and compiler are in a workable shape. To build all Pri-0 tests, drop the `tree nativeaot` parameter. The `Debug`/`Release` parameter should match the build configuration you used to build the runtime. + +To run all the tests that got built, run `src\tests\run.cmd runnativeaottests [Debug|Release]` on Windows, or `src/tests/run.sh --runnativeaottests [Debug|Release]` on Linux. The `Debug`/`Release` flag should match the flag that was passed to `build.cmd` in the previous step.
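As a concrete sketch of the two commands above (Linux shell; the `Debug` configuration is an assumption and should match the configuration you built the runtime with):

```shell
# Build the native AOT smoke tests only (drop "tree nativeaot" for all Pri-0 tests).
src/tests/build.sh nativeaot Debug tree nativeaot

# Run everything that was built.
src/tests/run.sh --runnativeaottests Debug
```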
+ +To run an individual test (after it was built), navigate to the `artifacts\tests\coreclr\[Windows|Linux|OSX].x64.[Debug|Release]\$path_to_test` directory. `$path_to_test` matches the subtree of `src\tests`. You should see a `[.cmd|.sh]` file there. This file is a script that will compile and launch the individual test for you. Before invoking the script, set the following environment variables: + +* CORE_ROOT=$repo_root\artifacts\tests\coreclr\[Windows|Linux|OSX].x64.[Debug|Release]\Tests\Core_Root +* RunNativeAot=1 +* __TestDotNetCmd=$repo_root\dotnet[.cmd|.sh] + +`$repo_root` is the root of your clone of the repo. + +By default the test suite will delete the build artifacts (Native AOT images and response files) if the test compiled successfully. If you want to keep these files instead of deleting them after the test run, set the following environment variable and make sure you'll have enough disk space (tens of MB per test): + +* CLRTestNoCleanup=1 + +For more advanced scenarios, look at [Building Test Subsets](../../testing/coreclr/windows-test-instructions.md#building-test-subsets) and [Generating Core_Root](../../testing/coreclr/windows-test-instructions.md#generating-core_root) + +## Design Documentation + +- [ILC Compiler Architecture](../../../design/coreclr/botr/ilc-architecture.md) +- [Managed Type System](../../../design/coreclr/botr/managed-type-system.md) diff --git a/docs/workflow/debugging/coreclr/debugging-crossgen2.md b/docs/workflow/debugging/coreclr/debugging-aot-compilers.md similarity index 64% rename from docs/workflow/debugging/coreclr/debugging-crossgen2.md rename to docs/workflow/debugging/coreclr/debugging-aot-compilers.md index 4c37061943d43c4f05dcf03a36b751fb96e1de83..e2fd7f7e3bb9058af4cb1f7a8c31309110ff1166 100644 --- a/docs/workflow/debugging/coreclr/debugging-crossgen2.md +++ b/docs/workflow/debugging/coreclr/debugging-aot-compilers.md @@ -1,23 +1,25 @@ -How to Debug Crossgen2 +How to Debug CoreCLR AOT Compilers =================
-Crossgen2 brings with it a number of new challenges for debugging the compilation process. Fortunately, in addition to challenges, crossgen2 is designed to enhance various parts of the debugging experience. +CoreCLR comes with two AOT compilers that are built around a shared C# codebase - crossgen2 and ilc. Crossgen2 generates ReadyToRun images that are loadable into the JIT-based CoreCLR runtime. ILC generates platform-specific object files (COFF on Windows, ELF on Linux, Mach-O on macOS) that can be linked with the NativeAOT flavor of the CoreCLR runtime to form a self-contained executable or a shared library. -Important concerns to be aware of when debugging Crossgen2 +The managed AOT compilers bring with them a number of new challenges for debugging the compilation process. Fortunately, in addition to challenges, the compilers are designed to enhance various parts of the debugging experience. + +Important concerns to be aware of when debugging the managed compilers --------------------------------- -- Other than the JIT, Crossgen2 is a managed application -- By default Crossgen2 uses a multi-core compilation strategy -- A Crossgen2 process will have 2 copies of the JIT in the process at the same time, the one used to compile the target, and the one used to compile Crossgen2 itself. -- Crossgen2 does not parse environment variables for controlling the JIT (or any other behavior), all behavior is controlled via the command line -- The Crossgen2 command line as generated by the project system is quite complex +- Other than the JIT, the AOT compilers are managed applications +- By default the AOT compilers use a multi-core compilation strategy +- A compilation process will have 2 copies of the JIT in the process at the same time, the one used to compile the target, and the one used to compile the compiler itself.
+- The compilers don't parse environment variables for controlling the JIT (or any other behavior); all behavior is controlled via the command line +- The AOT compiler command line as generated by the project system is quite complex -Built in debugging aids in Crossgen2 +Built in debugging aids in the managed compilers --------------------------------- -- When debugging a multi-threaded component of Crossgen2 and not investigating a multi-threading issue itself, it is generally advisable to disable the use of multiple threads. -To do this use the `--parallelism 1` switch to specify that the maximum parallelism of the process shall be 1. +- When debugging a multi-threaded component of the compiler and not investigating a multi-threading issue itself, it is generally advisable to disable the use of multiple threads. +To do this use the `--parallelism 1` switch (for crossgen2) or `--singlethreaded` (for ILC) to specify that the maximum parallelism of the process shall be 1. - When debugging the behavior of compiling a single method, the compiler may be instructed to only compile a single method. This is done via the various --singlemethod options @@ -25,13 +27,13 @@ To do this use the `--parallelism 1` switch to specify that the maximum parallel - `--singlemethodindex` is used in cases where the method signature is the only distinguishing factor about the method. An index is used instead of a series of descriptive arguments, as specifying a signature exactly is extraordinarily complicated. - Repro args will look like the following ``--singlemethodtypename "Internal.Runtime.CompilerServices.Unsafe" --singlemethodname As --singlemethodindex 2 --singlemethodgenericarg "System.Runtime.Intrinsics.Vector256`1[[System.SByte]]" --singlemethodgenericarg "System.Runtime.Intrinsics.Vector256`1[[System.Double]]"`` -- Since Crossgen2 is by default multi-threaded, it produces results fairly quickly even when compiling using a Debug variant of the JIT.
In general, when debugging JIT issues we recommend using the debug JIT regardless of which environment caused a problem. +- Since the compilers are by default multi-threaded, they produce results fairly quickly even when compiling using a Debug variant of the JIT. In general, when debugging JIT issues we recommend using the debug JIT regardless of which environment caused a problem. -- Crossgen2 supports nearly arbitrary cross-targetting, including OS and architecture cross targeting. The only restriction is that 32bit architecture cannot compile targetting a 64bit architecture. This allows the use of the debugging environment most convenient to the developer. In particular, if there is an issue which crosses the managed/native boundary, it is often convenient to debug using the mixed mode debugger on Windows X64. - - If the correct set of assemblies/command line arguments are passed to the compiler Crossgen2 should produce binary identical output on all platforms. +- The compilers support nearly arbitrary cross-targeting, including OS and architecture cross-targeting. The only restriction is that a 32-bit architecture cannot compile targeting a 64-bit architecture. This allows the use of the debugging environment most convenient to the developer. In particular, if there is an issue which crosses the managed/native boundary, it is often convenient to debug using the mixed mode debugger on Windows X64. + - If the correct set of assemblies/command line arguments are passed to the compiler, it should produce binary-identical output on all platforms. - The compiler does not check the OS/Architecture specified for input assemblies, which allows compiling using a non-architecture/OS matched version of the framework to target an arbitrary target. While this isn't useful for diagnosing all issues, it can be cheaply used to identify the general behavior of a change on the full swath of supported architectures.
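Putting the switches above together, a single-method repro invocation might look like the following. This is a hypothetical sketch: the crossgen2 path, input assembly, output name, and reference glob (`-r`) are assumptions to be replaced with your own; the `--singlemethod*` arguments are the ones quoted earlier.

```shell
# Hypothetical single-method repro; paths are placeholders.
# Single quotes keep Unix shells from expanding wildcards and backquotes.
dotnet crossgen2/crossgen2.dll System.Private.CoreLib.dll \
  --out repro.ni.dll \
  -r 'core_root/*.dll' \
  --parallelism 1 \
  --targetos linux --targetarch x64 \
  --singlemethodtypename "Internal.Runtime.CompilerServices.Unsafe" \
  --singlemethodname As \
  --singlemethodindex 2 \
  --singlemethodgenericarg 'System.Runtime.Intrinsics.Vector256`1[[System.SByte]]' \
  --singlemethodgenericarg 'System.Runtime.Intrinsics.Vector256`1[[System.Double]]'
```

Note the use of single quotes around the generic arguments: in an unquoted or double-quoted string, Unix shells would treat the backquote as command substitution.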
-Control compilation behavior by using the `--targetos` and `--targetarch` switches. The default behavior is to target the crossgen2's own OS/Arch pair, but all 64bit versions of crossgen2 are capable of targetting arbitrary OS/Arch combinations. +Control compilation behavior by using the `--targetos` and `--targetarch` switches. The default behavior is to target the compiler's own OS/Arch pair, but all 64-bit versions of the compilers are capable of targeting arbitrary OS/Arch combinations. At the time of writing the current supported sets of valid arguments are: | Command line arguments | --- | @@ -49,28 +51,38 @@ At the time of writing the current supported sets of valid arguments are: - When using the NgenDump feature of the JIT, disable parallelism as described above or specify a single method to be compiled. Otherwise, output from multiple functions will be interleaved and inscrutable. -- Since there are 2 jits in the process, when debugging in the JIT, if the source files match up, there is a decent chance that a native debugger will stop at unfortunate and unexpected locations. This is extremely annoying, and to combat this, we generally recommend making a point of using a runtime which doesn't exactly match that of the crossgen2 in use. However, if that isn't feasible, it is also possible to disable symbol loading in most native debuggers. For instance, in Visual Studio, one would use the "Specify excluded modules" feature. +- Since there are 2 JITs in the process, when debugging in the JIT, if the source files match up, there is a decent chance that a native debugger will stop at unfortunate and unexpected locations. This is extremely annoying, and to combat this, we generally recommend making a point of using a runtime which doesn't exactly match that of the compiler in use. However, if that isn't feasible, it is also possible to disable symbol loading in most native debuggers.
For instance, in Visual Studio, one would use the "Specify excluded modules" feature. -Crossgen2 identifies the JIT to use by the means of a naming convention. By default it will use a JIT located in the same directory as the crossgen2.dll file. In addition there is support for a `--jitpath` switch to use a specific JIT. This option is intended to support A/B testing by the JIT team. The `--jitpath` option should only be used if the jit interface has not been changed. The JIT specified by the `--jitpath` switch must be compatible with the current settings of the `--targetos` and `--targetarch` switches. +- The compiler identifies the JIT to use by means of a naming convention. By default it will use a JIT located in the same directory as the crossgen2.dll file. In addition there is support for a `--jitpath` switch to use a specific JIT. This option is intended to support A/B testing by the JIT team. The `--jitpath` option should only be used if the JIT interface has not been changed. The JIT specified by the `--jitpath` switch must be compatible with the current settings of the `--targetos` and `--targetarch` switches. -In parallel to the crossgen2 project, there is a tool known as r2rdump. This tool can be used to dump the contents of a produced image to examine what was actually produced in the final binary. It has a large multitude of options to control exactly what is dumped, but in general it is able to dump any image produced by crossgen2, and display its contents in a human readable fashion. Specify `--disasm` to display disassembly. +- In parallel to the crossgen2 project, there is a tool known as r2rdump. This tool can be used to dump the contents of a produced ReadyToRun image to examine what was actually produced in the final binary. It has a multitude of options to control exactly what is dumped, but in general it is able to dump any image produced by crossgen2 and display its contents in a human-readable fashion.
Specify `--disasm` to display disassembly. -- If there is a need to debug the dependency graph of crossgen2 (which is a very rare need at this time), there is a visualization tool located in the corert repo. https://github.com/dotnet/corert/tree/master/src/ILCompiler.DependencyAnalysisFramework/ILCompiler-DependencyGraph-Viewer To use that tool, get the sources from the CoreRT repo, compile it, and run it on Windows before the crossgen2 compilation begins. It will present a live view of the graph as it is generated and allow for exploration to determine why some node is in the graph. Every node in the graph has a unique id that is visible to this tool, and it can be used in parallel with a debugger to understand what is happening in the crossgen2 process. Changes to move this tool to a more commonly built location and improve the fairly horrible UI are encouraged. +- If there is a need to debug the dependency graph of the compiler (which is a very rare need at this time), there is a visualization tool located in src\coreclr\tools\aot\DependencyGraphViewer. To use that tool, compile it, and run it on Windows before the compilation begins. It will present a live view of the graph as it is generated and allow for exploration to determine why some node is in the graph. Every node in the graph has a unique id that is visible to this tool, and it can be used in parallel with a debugger to understand what is happening in the compilation process. Changes to improve the fairly horrible UI are encouraged. -- When used in the official build system, the set of arguments passed to crossgen2 is extremely complex, especially with regards to the set of reference paths (each assembly is specified individually). To make it easier to use crossgen2 from the command line manually the tool will accept wildcards in its parsing of references. Please note that on Unix that the shell will happily expand these arguments by itself, which will not work correctly. 
In those situations enclose the argument in quotes to prevent the shell expansion. +- When used in the official build system, the set of arguments passed to the compiler is extremely complex, especially with regards to the set of reference paths (each assembly is specified individually). To make it easier to use crossgen2 from the command line manually, the tool will accept wildcards in its parsing of references. Please note that on Unix the shell will happily expand these arguments by itself, which will not work correctly. In those situations enclose the argument in quotes to prevent the shell expansion. - Crossgen2 supports the `--map` and `--mapcsv` arguments to produce map files of the produced output. These are primarily used for diagnosing size issues, as they describe the generated file in fairly high detail, as well as providing a number of interesting statistics about the produced output. +- ILC also supports the `--map` argument, but the format is different from crossgen2 because the output format is different too. + - Diagnosing why a specific method failed to compile in crossgen2 can be done by passing the `--verbose` switch to crossgen2. This will print many things, but in particular it will print the reason why a compilation was aborted due to an R2R format limitation. -- Crossgen2 can use either the version of dotnet that is used to build the product (as found by the dotnet.cmd or dotnet.sh script found in the root of the runtime repo) or it can use a sufficiently recent corerun.exe produced by constructing a test build. It is strongly recommended if using corerun.exe to use a release build of corerun for this purpose, as crossgen2 runs a very large amount of managed code. The version of corerun used does not need to come from the same build as the crossgen2.dll that is being debugging. In fact, I would recommnend using a different enlistment to build that corerun to avoid any confusion.
+- The compilers can use either the version of dotnet that is used to build the product (as found by the dotnet.cmd or dotnet.sh script found in the root of the runtime repo) or a sufficiently recent corerun.exe produced by constructing a test build. If using corerun.exe, it is strongly recommended to use a release build of corerun for this purpose, as crossgen2 runs a very large amount of managed code. The version of corerun used does not need to come from the same build as the crossgen2.dll/ilc.dll that is being debugged. In fact, I would recommend using a different enlistment to build that corerun to avoid any confusion. + +- In the runtime testbed, each test can be commanded to compile with crossgen2 by using environment variables. Just set the `RunCrossgen2` variable to 1, and optionally set the `CompositeBuildMode` variable to 1 if you wish to see the R2R behavior with composite image creation. -- In the runtime testbed, each test can be commanded to compile with crossgen2 by using environment variables. Just set the `RunCrossgen2` variable to 1, and optionally set the `CompositeBuildMode` variable to 1 if you wish to see the R2R behavior with composite image creation. By default this will simply use `dotnet` to run crossgen2. If you run the test batch script from the root of the enlistment on Windows this will just work; otherwise, you must set the `__TestDotNetCmd` environment variable to point at copy of `dotnet` or `corerun` that can run crossgen2. This is often the easiest way to run a simple test with crossgen2 for developers practiced in the CoreCLR testbed. See the example below of various techniques to use when diagnosing issues under crossgen2. +- By default the runtime test bed will simply use `dotnet` to run the managed compiler.
If you run the test batch script from the root of the enlistment on Windows this will just work; otherwise, you must set the `__TestDotNetCmd` environment variable to point at a copy of `dotnet` or `corerun` that can run the compiler. This is often the easiest way to run a simple test with the AOT compiler for developers practiced in the CoreCLR testbed. See the example below of various techniques to use when diagnosing issues under crossgen2. - When attempting to build crossgen2, you must build the clr.tools subset. If rebuilding a component of the JIT and wanting to use that in your inner loop, you must also build with either the clr.jit or clr.alljits subsets. If the jit interface is changed, the clr.runtime subset must also be rebuilt. - After completion of a product build, a functional copy of crossgen2.dll will be located in a bin directory in a path like `bin\coreclr\windows.x64.Debug\crossgen2`. After creating a test native layout via a command such as `src\tests\build generatelayoutonly`, there will be a copy of crossgen2 located in the `%CORE_ROOT%\crossgen2` directory. The version of crossgen2 in the test core_root directory will have the appropriate files for running under either an x64 dotnet.exe or under the target architecture. This was done to make it somewhat easier to do cross-platform development, and assumes the primary development machine is x64. +- The object files generated by the ILC compiler contain debug information for method bodies and types in the platform-specific format (CodeView on Windows, DWARF elsewhere). They also contain unwinding information in the platform format. As a result of that, NativeAOT executables can be debugged with platform debuggers (VS, WinDbg, GDB, LLDB) without any SOS-like extensions. They can also be inspected using disassemblers and tools that deal with native code (dumpbin, Ghidra, IDA). Make sure to pass the `-g` command line argument to enable debug info generation.
+ +- The ILC compiler typically compiles the whole program - it loosely corresponds to the composite mode of crossgen2. There is a multifile mode, where each managed assembly corresponds to a single object file, but this mode is not shipping. + +- The object files generated by the ILC compiler are written out using an LLVM-based object writer (`src\coreclr\tools\aot\ObjWriter`). The object writer uses the LLVM assembler APIs (APIs meant to be used by tools that convert textual assembly into machine code) to emit object files in PE/ELF/Mach-O formats. Normally the object writer is not built as part of the repo, but gets downloaded through NuGet. If you need to debug the object writer, you can build it by specifying the `clr.objwriter` subset to the root build script. It takes about 5 minutes to compile the object writer. + Example of debugging a test application in Crossgen2 ================================================ This example demonstrates debugging of a simple test in the CoreCLR testbed. diff --git a/docs/workflow/debugging/coreclr/debugging.md b/docs/workflow/debugging/coreclr/debugging.md index 3a11178ce90d0a02d275ecd9392bfeeedab26add..be4326ca66e07727cb351bc51660991f5ff9066c 100644 --- a/docs/workflow/debugging/coreclr/debugging.md +++ b/docs/workflow/debugging/coreclr/debugging.md @@ -73,10 +73,10 @@ The "COMPlus_EnableDiagnostics" environment variable can be used to disable mana export COMPlus_EnableDiagnostics=0 -Debugging Crossgen2 +Debugging AOT compilers ================================== -Debugging Crossgen2 is described in [this](debugging-crossgen2.md) document. +Debugging AOT compilers is described in [this](debugging-aot-compilers.md) document. Using Visual Studio Code ========================