diff --git a/Documentation/devicetree/usage-model.txt b/Documentation/devicetree/usage-model.txt new file mode 100644 index 0000000000000000000000000000000000000000..c5a80099b71c082af76629623923424b2acf0688 --- /dev/null +++ b/Documentation/devicetree/usage-model.txt @@ -0,0 +1,412 @@ +Linux and the Device Tree +------------------------- +The Linux usage model for device tree data + +Author: Grant Likely + +This article describes how Linux uses the device tree. An overview of +the device tree data format can be found on the device tree usage page +at devicetree.org[1]. + +[1] http://devicetree.org/Device_Tree_Usage + +The "Open Firmware Device Tree", or simply Device Tree (DT), is a data +structure and language for describing hardware. More specifically, it +is a description of hardware that is readable by an operating system +so that the operating system doesn't need to hard code details of the +machine. + +Structurally, the DT is a tree, or acyclic graph with named nodes, and +nodes may have an arbitrary number of named properties encapsulating +arbitrary data. A mechanism also exists to create arbitrary +links from one node to another outside of the natural tree structure. + +Conceptually, a common set of usage conventions, called 'bindings', +is defined for how data should appear in the tree to describe typical +hardware characteristics including data busses, interrupt lines, GPIO +connections, and peripheral devices. + +As much as possible, hardware is described using existing bindings to +maximize use of existing support code, but since property and node +names are simply text strings, it is easy to extend existing bindings +or create new ones by defining new nodes and properties. Be wary, +however, of creating a new binding without first doing some homework +about what already exists. There are currently two different, +incompatible, bindings for i2c busses that came about because the new +binding was created without first investigating how i2c devices were +already being enumerated in existing systems. + +1. History +---------- +The DT was originally created by Open Firmware as part of the +communication method for passing data from Open Firmware to a client +program (like to an operating system). An operating system used the +Device Tree to discover the topology of the hardware at runtime, and +thereby support a majority of available hardware without hard coded +information (assuming drivers were available for all devices). + +Since Open Firmware is commonly used on PowerPC and SPARC platforms, +the Linux support for those architectures has for a long time used the +Device Tree. + +In 2005, when PowerPC Linux began a major cleanup and to merge 32-bit +and 64-bit support, the decision was made to require DT support on all +powerpc platforms, regardless of whether or not they used Open +Firmware. To do this, a DT representation called the Flattened Device +Tree (FDT) was created which could be passed to the kernel as a binary +blob without requiring a real Open Firmware implementation. U-Boot, +kexec, and other bootloaders were modified to support both passing a +Device Tree Binary (dtb) and to modify a dtb at boot time. DT was +also added to the PowerPC boot wrapper (arch/powerpc/boot/*) so that +a dtb could be wrapped up with the kernel image to support booting +existing non-DT aware firmware. + +Some time later, FDT infrastructure was generalized to be usable by +all architectures. At the time of this writing, 6 mainlined +architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1 +out of mainline (nios) have some level of DT support. + +2. Data Model +------------- +If you haven't already read the Device Tree Usage[1] page, +then go read it now. It's okay, I'll wait.... + +2.1 High Level View +------------------- +The most important thing to understand is that the DT is simply a data +structure that describes the hardware. There is nothing magical about +it, and it doesn't magically make all hardware configuration problems +go away. What it does do is provide a language for decoupling the +hardware configuration from the board and device driver support in the +Linux kernel (or any other operating system for that matter). Using +it allows board and device support to become data driven; to make +setup decisions based on data passed into the kernel instead of on +per-machine hard coded selections. + +Ideally, data driven platform setup should result in less code +duplication and make it easier to support a wide range of hardware +with a single kernel image. + +Linux uses DT data for three major purposes: +1) platform identification, +2) runtime configuration, and +3) device population. + +2.2 Platform Identification +--------------------------- +First and foremost, the kernel will use data in the DT to identify the +specific machine. In a perfect world, the specific platform shouldn't +matter to the kernel because all platform details would be described +perfectly by the device tree in a consistent and reliable manner. +Hardware is not perfect though, and so the kernel must identify the +machine during early boot so that it has the opportunity to run +machine-specific fixups. + +In the majority of cases, the machine identity is irrelevant, and the +kernel will instead select setup code based on the machine's core +CPU or SoC. On ARM for example, setup_arch() in +arch/arm/kernel/setup.c will call setup_machine_fdt() in +arch/arm/kernel/devicetree.c which searches through the machine_desc +table and selects the machine_desc which best matches the device tree +data. It determines the best match by looking at the 'compatible' +property in the root device tree node, and comparing it with the +dt_compat list in struct machine_desc. + +The 'compatible' property contains a sorted list of strings starting +with the exact name of the machine, followed by an optional list of +boards it is compatible with sorted from most compatible to least. For +example, the root compatible properties for the TI BeagleBoard and its +successor, the BeagleBoard xM board might look like: + + compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3"; + compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3"; + +Where "ti,omap3-beagleboard-xm" specifies the exact model, it also +claims that it compatible with the OMAP 3450 SoC, and the omap3 family +of SoCs in general. You'll notice that the list is sorted from most +specific (exact board) to least specific (SoC family). + +Astute readers might point out that the Beagle xM could also claim +compatibility with the original Beagle board. However, one should be +cautioned about doing so at the board level since there is typically a +high level of change from one board to another, even within the same +product line, and it is hard to nail down exactly what is meant when one +board claims to be compatible with another. For the top level, it is +better to err on the side of caution and not claim one board is +compatible with another. The notable exception would be when one +board is a carrier for another, such as a CPU module attached to a +carrier board. + +One more note on compatible values. Any string used in a compatible +property must be documented as to what it indicates. Add +documentation for compatible strings in Documentation/devicetree/bindings. + +Again on ARM, for each machine_desc, the kernel looks to see if +any of the dt_compat list entries appear in the compatible property. +If one does, then that machine_desc is a candidate for driving the +machine. After searching the entire table of machine_descs, +setup_machine_fdt() returns the 'most compatible' machine_desc based +on which entry in the compatible property each machine_desc matches +against. If no matching machine_desc is found, then it returns NULL. + +The reasoning behind this scheme is the observation that in the majority +of cases, a single machine_desc can support a large number of boards +if they all use the same SoC, or same family of SoCs. However, +invariably there will be some exceptions where a specific board will +require special setup code that is not useful in the generic case. +Special cases could be handled by explicitly checking for the +troublesome board(s) in generic setup code, but doing so very quickly +becomes ugly and/or unmaintainable if it is more than just a couple of +cases. + +Instead, the compatible list allows a generic machine_desc to provide +support for a wide common set of boards by specifying "less +compatible" value in the dt_compat list. In the example above, +generic board support can claim compatibility with "ti,omap3" or +"ti,omap3450". If a bug was discovered on the original beagleboard +that required special workaround code during early boot, then a new +machine_desc could be added which implements the workarounds and only +matches on "ti,omap3-beagleboard". + +PowerPC uses a slightly different scheme where it calls the .probe() +hook from each machine_desc, and the first one returning TRUE is used. +However, this approach does not take into account the priority of the +compatible list, and probably should be avoided for new architecture +support. + +2.3 Runtime configuration +------------------------- +In most cases, a DT will be the sole method of communicating data from +firmware to the kernel, so also gets used to pass in runtime and +configuration data like the kernel parameters string and the location +of an initrd image. + +Most of this data is contained in the /chosen node, and when booting +Linux it will look something like this: + + chosen { + bootargs = "console=ttyS0,115200 loglevel=8"; + initrd-start = <0xc8000000>; + initrd-end = <0xc8200000>; + }; + +The bootargs property contains the kernel arguments, and the initrd-* +properties define the address and size of an initrd blob. The +chosen node may also optionally contain an arbitrary number of +additional properties for platform-specific configuration data. + +During early boot, the architecture setup code calls of_scan_flat_dt() +several times with different helper callbacks to parse device tree +data before paging is setup. The of_scan_flat_dt() code scans through +the device tree and uses the helpers to extract information required +during early boot. Typically the early_init_dt_scan_chosen() helper +is used to parse the chosen node including kernel parameters, +early_init_dt_scan_root() to initialize the DT address space model, +and early_init_dt_scan_memory() to determine the size and +location of usable RAM. + +On ARM, the function setup_machine_fdt() is responsible for early +scanning of the device tree after selecting the correct machine_desc +that supports the board. + +2.4 Device population +--------------------- +After the board has been identified, and after the early configuration data +has been parsed, then kernel initialization can proceed in the normal +way. At some point in this process, unflatten_device_tree() is called +to convert the data into a more efficient runtime representation. +This is also when machine-specific setup hooks will get called, like +the machine_desc .init_early(), .init_irq() and .init_machine() hooks +on ARM. The remainder of this section uses examples from the ARM +implementation, but all architectures will do pretty much the same +thing when using a DT. + +As can be guessed by the names, .init_early() is used for any machine- +specific setup that needs to be executed early in the boot process, +and .init_irq() is used to set up interrupt handling. Using a DT +doesn't materially change the behaviour of either of these functions. +If a DT is provided, then both .init_early() and .init_irq() are able +to call any of the DT query functions (of_* in include/linux/of*.h) to +get additional data about the platform. + +The most interesting hook in the DT context is .init_machine() which +is primarily responsible for populating the Linux device model with +data about the platform. Historically this has been implemented on +embedded platforms by defining a set of static clock structures, +platform_devices, and other data in the board support .c file, and +registering it en-masse in .init_machine(). When DT is used, then +instead of hard coding static devices for each platform, the list of +devices can be obtained by parsing the DT, and allocating device +structures dynamically. + +The simplest case is when .init_machine() is only responsible for +registering a block of platform_devices. A platform_device is a concept +used by Linux for memory or I/O mapped devices which cannot be detected +by hardware, and for 'composite' or 'virtual' devices (more on those +later). While there is no 'platform device' terminology for the DT, +platform devices roughly correspond to device nodes at the root of the +tree and children of simple memory mapped bus nodes. + +About now is a good time to lay out an example. Here is part of the +device tree for the NVIDIA Tegra board. + +/{ + compatible = "nvidia,harmony", "nvidia,tegra20"; + #address-cells = <1>; + #size-cells = <1>; + interrupt-parent = <&intc>; + + chosen { }; + aliases { }; + + memory { + device_type = "memory"; + reg = <0x00000000 0x40000000>; + }; + + soc { + compatible = "nvidia,tegra20-soc", "simple-bus"; + #address-cells = <1>; + #size-cells = <1>; + ranges; + + intc: interrupt-controller@50041000 { + compatible = "nvidia,tegra20-gic"; + interrupt-controller; + #interrupt-cells = <1>; + reg = <0x50041000 0x1000>, < 0x50040100 0x0100 >; + }; + + serial@70006300 { + compatible = "nvidia,tegra20-uart"; + reg = <0x70006300 0x100>; + interrupts = <122>; + }; + + i2s1: i2s@70002800 { + compatible = "nvidia,tegra20-i2s"; + reg = <0x70002800 0x100>; + interrupts = <77>; + codec = <&wm8903>; + }; + + i2c@7000c000 { + compatible = "nvidia,tegra20-i2c"; + #address-cells = <1>; + #size-cells = <0>; + reg = <0x7000c000 0x100>; + interrupts = <70>; + + wm8903: codec@1a { + compatible = "wlf,wm8903"; + reg = <0x1a>; + interrupts = <347>; + }; + }; + }; + + sound { + compatible = "nvidia,harmony-sound"; + i2s-controller = <&i2s1>; + i2s-codec = <&wm8903>; + }; +}; + +At .machine_init() time, Tegra board support code will need to look at +this DT and decide which nodes to create platform_devices for. +However, looking at the tree, it is not immediately obvious what kind +of device each node represents, or even if a node represents a device +at all. The /chosen, /aliases, and /memory nodes are informational +nodes that don't describe devices (although arguably memory could be +considered a device). The children of the /soc node are memory mapped +devices, but the codec@1a is an i2c device, and the sound node +represents not a device, but rather how other devices are connected +together to create the audio subsystem. I know what each device is +because I'm familiar with the board design, but how does the kernel +know what to do with each node? + +The trick is that the kernel starts at the root of the tree and looks +for nodes that have a 'compatible' property. First, it is generally +assumed that any node with a 'compatible' property represents a device +of some kind, and second, it can be assumed that any node at the root +of the tree is either directly attached to the processor bus, or is a +miscellaneous system device that cannot be described any other way. +For each of these nodes, Linux allocates and registers a +platform_device, which in turn may get bound to a platform_driver. + +Why is using a platform_device for these nodes a safe assumption? +Well, for the way that Linux models devices, just about all bus_types +assume that its devices are children of a bus controller. For +example, each i2c_client is a child of an i2c_master. Each spi_device +is a child of an SPI bus. Similarly for USB, PCI, MDIO, etc. The +same hierarchy is also found in the DT, where I2C device nodes only +ever appear as children of an I2C bus node. Ditto for SPI, MDIO, USB, +etc. The only devices which do not require a specific type of parent +device are platform_devices (and amba_devices, but more on that +later), which will happily live at the base of the Linux /sys/devices +tree. Therefore, if a DT node is at the root of the tree, then it +really probably is best registered as a platform_device. + +Linux board support code calls of_platform_populate(NULL, NULL, NULL) +to kick off discovery of devices at the root of the tree. The +parameters are all NULL because when starting from the root of the +tree, there is no need to provide a starting node (the first NULL), a +parent struct device (the last NULL), and we're not using a match +table (yet). For a board that only needs to register devices, +.init_machine() can be completely empty except for the +of_platform_populate() call. + +In the Tegra example, this accounts for the /soc and /sound nodes, but +what about the children of the SoC node? Shouldn't they be registered +as platform devices too? For Linux DT support, the generic behaviour +is for child devices to be registered by the parent's device driver at +driver .probe() time. So, an i2c bus device driver will register a +i2c_client for each child node, an SPI bus driver will register +its spi_device children, and similarly for other bus_types. +According to that model, a driver could be written that binds to the +SoC node and simply registers platform_devices for each of its +children. The board support code would allocate and register an SoC +device, a (theoretical) SoC device driver could bind to the SoC device, +and register platform_devices for /soc/interrupt-controller, /soc/serial, +/soc/i2s, and /soc/i2c in its .probe() hook. Easy, right? + +Actually, it turns out that registering children of some +platform_devices as more platform_devices is a common pattern, and the +device tree support code reflects that and makes the above example +simpler. The second argument to of_platform_populate() is an +of_device_id table, and any node that matches an entry in that table +will also get its child nodes registered. In the tegra case, the code +can look something like this: + +static void __init harmony_init_machine(void) +{ + /* ... */ + of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL); +} + +"simple-bus" is defined in the ePAPR 1.0 specification as a property +meaning a simple memory mapped bus, so the of_platform_populate() code +could be written to just assume simple-bus compatible nodes will +always be traversed. However, we pass it in as an argument so that +board support code can always override the default behaviour. + +[Need to add discussion of adding i2c/spi/etc child devices] + +Appendix A: AMBA devices +------------------------ + +ARM Primecells are a certain kind of device attached to the ARM AMBA +bus which include some support for hardware detection and power +management. In Linux, struct amba_device and the amba_bus_type is +used to represent Primecell devices. However, the fiddly bit is that +not all devices on an AMBA bus are Primecells, and for Linux it is +typical for both amba_device and platform_device instances to be +siblings of the same bus segment. + +When using the DT, this creates problems for of_platform_populate() +because it must decide whether to register each node as either a +platform_device or an amba_device. This unfortunately complicates the +device creation model a little bit, but the solution turns out not to +be too invasive. If a node is compatible with "arm,amba-primecell", then +of_platform_populate() will register it as an amba_device instead of a +platform_device.