1. 28 Jun, 2017 2 commits
    • D
      libnvdimm, nfit: enable support for volatile ranges · c9e582aa
      Dan Williams committed
      Allow volatile nfit ranges to participate in all the same infrastructure
      provided for persistent memory regions. A resulting namespace
      device will still be called "pmem", but the parent region type will be
      "nd_volatile". This is in preparation for disabling the dax ->flush()
      operation in the pmem driver when it is hosted on a volatile range.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      c9e582aa
    • D
      x86, libnvdimm, pmem: remove global pmem api · ca6a4657
      Dan Williams committed
      Now that all callers of the pmem api have been converted to dax helpers that
      call back to the pmem driver, we can remove include/linux/pmem.h and
      asm/pmem.h.
      
      Cc: <x86@kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      ca6a4657
  2. 05 May, 2017 2 commits
  3. 01 Mar, 2017 1 commit
    • D
      nfit, libnvdimm: fix interleave set cookie calculation · 86ef58a4
      Dan Williams committed
      The interleave-set cookie is a checksum that verifies the composition of
      an interleave set has not changed from when the namespace was initially
      created.  The checksum is calculated by sorting the DIMMs by their
      location in the interleave-set. The comparison for the sort must be
      64-bit wide, not byte-by-byte as performed by memcmp() in the broken
      case.
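The difference between the two sort orders can be illustrated with a minimal sketch. It assumes the 64-bit values are stored little-endian (as on x86); the sample values and function names are hypothetical and are not taken from the kernel source.

```python
import struct

def memcmp_order(values):
    # Sort 64-bit values by their little-endian byte representation,
    # which is what a byte-by-byte memcmp() comparison effectively does.
    return sorted(values, key=lambda v: struct.pack("<Q", v))

def numeric_order(values):
    # Sort by the full 64-bit numeric value (the intended comparison).
    return sorted(values)

# Hypothetical location values for two DIMMs in an interleave set.
locations = [0x0100, 0x0002]

byte_sorted = memcmp_order(locations)
value_sorted = numeric_order(locations)
print([hex(v) for v in byte_sorted])
print([hex(v) for v in value_sorted])
```

Because the lowest-order byte comes first in memory, the byte-wise comparison ranks 0x0100 (first byte 0x00) ahead of 0x0002 (first byte 0x02), so the two orders disagree whenever more than one DIMM is sorted.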
      
      Fix the implementation to accept correct cookie values in addition to
      the Linux "memcmp" order cookies, but only allow correct cookies to be
      generated going forward. It does mean that namespaces created by
      third-party-tooling, or created by newer kernels with this fix, will not
      validate on older kernels. However, there are a couple mitigating
      conditions:
      
          1/ platforms with namespace-label capable NVDIMMs are not widely
             available.
      
          2/ interleave-sets with a single-dimm are by definition not affected
             (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case.
      
      The cookie stored in the namespace label will be fixed by any write to
      the namespace label. The most straightforward way to achieve this is to
      write to the "alt_name" attribute of a namespace in sysfs.
      
      Cc: <stable@vger.kernel.org>
      Fixes: eaf96153 ("libnvdimm, nfit: add interleave-set state-tracking infrastructure")
      Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
      Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      86ef58a4
  4. 01 Feb, 2017 2 commits
    • D
      libnvdimm, namespace: do not delete namespace-id 0 · 9d032f42
      Dan Williams committed
      Given that the naming of pmem devices changes from the pmemX form to the
      pmemX.Y form when namespace id is greater than 0, arrange for namespaces
      with id-0 to be exempt from deletion. Otherwise a simple reconfiguration
      of an existing namespace to a new mode results in a name change of the
      resulting block device:
      
          # ndctl list --namespace=namespace1.0
          {
            "dev":"namespace1.0",
            "mode":"raw",
            "size":2147483648,
            "uuid":"3dadf3dc-89b9-4b24-b20e-abc8a4707ce3",
            "blockdev":"pmem1"
          }
      
          # ndctl create-namespace --reconfig=namespace1.0 --mode=memory --force
          {
            "dev":"namespace1.1",
            "mode":"memory",
            "size":2111832064,
            "uuid":"7b4a6341-7318-4219-a02c-fb57c0bbf613",
            "blockdev":"pmem1.1"
          }
      
      This change does require tooling changes to explicitly look for
      namespaceX.0 if the seed has already advanced to another namespace.
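The naming rule described above can be written as a small sketch. The function name is hypothetical; the rule itself (pmemX for namespace id 0, pmemX.Y otherwise) is the one stated in the commit message and visible in the ndctl output.

```python
def pmem_blockdev_name(region_id: int, namespace_id: int) -> str:
    # Namespace id 0 keeps the short pmemX form; any higher id
    # switches to the pmemX.Y form.
    if namespace_id > 0:
        return f"pmem{region_id}.{namespace_id}"
    return f"pmem{region_id}"

print(pmem_blockdev_name(1, 0))
print(pmem_blockdev_name(1, 1))
```

This is why deleting namespace1.0 during a reconfiguration and recreating it as namespace1.1 changes the block device from pmem1 to pmem1.1.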
      
      Cc: <stable@vger.kernel.org>
      Fixes: 98a29c39 ("libnvdimm, namespace: allow creation of multiple pmem-namespaces per region")
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      9d032f42
    • B
      nvdimm: constify device_type structures · 970d14e3
      Bhumika Goyal committed
      Declare device_type structure as const as it is only stored in the
      type field of a device structure. This field is of type const, so add
      const to declaration of device_type structure.
      
      File size before:
        text	   data	    bss	    dec	    hex	filename
        19278	   3199	     16	  22493	   57dd	nvdimm/namespace_devs.o
      
      File size after:
        text	   data	    bss	    dec	    hex	filename
        19929	   3160	     16	  23105	   5a41	nvdimm/namespace_devs.o
      Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      970d14e3
  5. 14 Jan, 2017 1 commit
  6. 16 Dec, 2016 1 commit
  7. 05 Dec, 2016 1 commit
  8. 29 Nov, 2016 1 commit
  9. 20 Oct, 2016 1 commit
  10. 08 Oct, 2016 7 commits
  11. 06 Oct, 2016 2 commits
  12. 01 Oct, 2016 1 commit
  13. 22 Sep, 2016 1 commit
  14. 02 Sep, 2016 1 commit
  15. 10 May, 2016 1 commit
    • D
      libnvdimm, dax: introduce device-dax infrastructure · cd03412a
      Dan Williams committed
      Device DAX is the device-centric analogue of Filesystem DAX
      (CONFIG_FS_DAX).  It allows persistent memory ranges to be allocated and
      mapped without need of an intervening file system.  This initial
      infrastructure arranges for a libnvdimm pfn-device to be represented as
      a different device-type so that it can be attached to a driver other
      than the pmem driver.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      cd03412a
  16. 23 Apr, 2016 1 commit
  17. 06 Mar, 2016 1 commit
  18. 27 Jan, 2016 1 commit
  19. 06 Jan, 2016 1 commit
    • D
      libnvdimm: fix namespace object confusion in is_uuid_busy() · e07ecd76
      Dan Williams committed
      When btt devices were re-worked to be child devices of regions this
      routine was overlooked.  It mistakenly attempts to_nd_namespace_pmem()
      or to_nd_namespace_blk() conversions on btt and pfn devices.  By luck to
      date we have happened to be hitting valid memory leading to a uuid
      miscompare, but a recent change to struct nd_namespace_common causes:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
       IP: [<ffffffff814610dc>] memcmp+0xc/0x40
       [..]
       Call Trace:
        [<ffffffffa0028631>] is_uuid_busy+0xc1/0x2a0 [libnvdimm]
        [<ffffffffa0028570>] ? to_nd_blk_region+0x50/0x50 [libnvdimm]
        [<ffffffff8158c9c0>] device_for_each_child+0x50/0x90
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      e07ecd76
  20. 25 Dec, 2015 1 commit
    • D
      libnvdimm, pfn: move 'memory mode' indication to sysfs · 0731de0d
      Dan Williams committed
      'Memory mode' is defined as the capability of a DAX mapping to be the
      source/target of DMA and other "direct I/O" scenarios.  While it
      currently requires allocating 'struct page' for each page frame of
      persistent memory in the namespace it will not always be the case.  Work
      continues on reducing the kernel's dependency on 'struct page'.
      
      Let's not maintain a suffix that is expected to lose meaning over time.
      In other words a future 'raw mode' pmem namespace may be as capable as
      today's 'memory mode' namespace.  Undo the encoding of the mode in the
      device name and leave it to other tooling to determine the mode of the
      namespace from its attributes.
      Reported-by: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      0731de0d
  21. 14 Dec, 2015 1 commit
  22. 09 Dec, 2015 1 commit
    • D
      nvdimm: improve diagnosibility of namespaces · bd26d0d0
      Dmitry Krivenok committed
      In order to bind a namespace to the driver, the user must first
      set all mandatory attributes in the following order:
      - uuid
      - size
      - sector_size (for blk namespace only)
      
      If the order is wrong, then the user will either be unable to set
      the attribute or unable to bind the namespace.
      
      This simple patch improves the diagnosability of common operations
      with namespaces by printing some details about the error
      instead of failing silently.
      
      Below are examples of error messages (assuming dyndbg is
      enabled for nvdimms):
      
      [/]# echo 4194304 > /sys/bus/nd/devices/region5/namespace5.0/size
      [  288.372612] nd namespace5.0: __size_store: uuid not set
      [  288.374839] nd namespace5.0: size_store: 400000 fail (-6)
      sh: write error: No such device or address
      [/]#
      
      [/]# echo namespace5.0 > /sys/bus/nd/drivers/nd_blk/bind
      [  554.671648] nd_blk namespace5.0: nvdimm_namespace_common_probe: sector size not set
      [  554.674688]  ndbus1: nd_blk.probe(namespace5.0) = -19
      sh: write error: No such device
      [/]#
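The two failures shown in the log can be modelled with a toy sketch. This is not the kernel's implementation: the class and method names are hypothetical, and only the two checks visible in the log are modelled (size before uuid fails with ENXIO, binding a blk namespace without a sector size fails with ENODEV).

```python
import errno

class NamespaceModel:
    """Toy model of the attribute-ordering checks (hypothetical names)."""

    def __init__(self, is_blk=False):
        self.is_blk = is_blk
        self.uuid = None
        self.size = None
        self.sector_size = None

    def set_uuid(self, uuid):
        self.uuid = uuid

    def set_size(self, size):
        # Mirrors the "__size_store: uuid not set" message, error -6.
        if self.uuid is None:
            raise OSError(errno.ENXIO, "uuid not set")
        self.size = size

    def set_sector_size(self, sector_size):
        self.sector_size = sector_size

    def bind(self):
        # Mirrors the "sector size not set" probe failure, error -19.
        if self.is_blk and self.sector_size is None:
            raise OSError(errno.ENODEV, "sector size not set")
        return True

ns = NamespaceModel(is_blk=True)
try:
    ns.set_size(4194304)          # wrong order: uuid is still unset
except OSError as exc:
    size_error = exc.errno
print(size_error == errno.ENXIO)
```

Setting the attributes in the documented order (uuid, size, then sector_size for blk namespaces) lets bind() succeed in this model.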
      Signed-off-by: Dmitry V. Krivenok <krivenok.dmitry@gmail.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      bd26d0d0
  23. 29 Aug, 2015 2 commits
    • D
      libnvdimm, pmem: direct map legacy pmem by default · 004f1afb
      Dan Williams committed
      The expectation is that the legacy / non-standard pmem discovery method
      (e820 type-12) will only ever be used to describe small quantities of
      persistent memory.  Larger capacities will be described via the ACPI
      NFIT.  When "allocate struct page from pmem" support is added this default
      policy can be overridden by assigning a legacy pmem namespace to a pfn
      device, however this would only be necessary if a platform used the
      legacy mechanism to define a very large range.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      004f1afb
    • D
      libnvdimm, pfn: 'struct page' provider infrastructure · e1455744
      Dan Williams committed
      Implement the base infrastructure for libnvdimm PFN devices. Similar to
      BTT devices they take a namespace as a backing device and layer
      functionality on top. In this case the functionality is reserving space
      for an array of 'struct page' entries to be handed out through
      pfn_to_page(). For now this is just the basic libnvdimm-device-model for
      configuring the base PFN device.
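As a rough illustration of how much space such a 'struct page' array needs, the sketch below assumes 4096-byte pages and a 64-byte 'struct page'. Both constants are assumptions for illustration (they vary by architecture and kernel configuration), and the function name is hypothetical.

```python
PAGE_SIZE = 4096          # assumed page size in bytes
STRUCT_PAGE_SIZE = 64     # assumed sizeof(struct page) in bytes

def page_array_bytes(namespace_size: int) -> int:
    # One 'struct page' entry is needed per page frame in the namespace.
    npfns = namespace_size // PAGE_SIZE
    return npfns * STRUCT_PAGE_SIZE

two_gib = 2 * 1024 ** 3
print(page_array_bytes(two_gib) // (1024 ** 2), "MiB")
```

Under these assumptions the reservation is about 1/64 of the namespace capacity, before any alignment or padding the real implementation may add.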
      
      As the namespace claiming mechanism for PFN devices is mostly identical
      to BTT devices drivers/nvdimm/claim.c is created to house the common
      bits.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      e1455744
  24. 15 Aug, 2015 1 commit
    • V
      libnvdimm, btt: write and validate parent_uuid · 6ec68954
      Vishal Verma committed
      When a BTT is instantiated on a namespace it must validate the namespace
      uuid matches the 'parent_uuid' stored in the btt superblock. This
      property enforces that changing the namespace UUID invalidates all
      former BTT instances on that storage. For "IO namespaces" that don't
      have a label or UUID, the parent_uuid is set to zero, and this
      validation is skipped. For such cases, old BTTs have to be invalidated
      by forcing the namespace to raw mode, and overwriting the BTT info
      blocks.
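The validation rule described above can be sketched as follows. The function names are hypothetical and the superblock is reduced to a bare 16-byte parent_uuid value; the real on-media layout is not modelled.

```python
import uuid

ZERO_UUID = bytes(16)

def parent_uuid_for(namespace_uuid):
    # Namespaces without a label or UUID record an all-zero parent_uuid.
    return ZERO_UUID if namespace_uuid is None else namespace_uuid

def btt_parent_uuid_ok(namespace_uuid, superblock_parent_uuid):
    # Validation is skipped when the namespace has no UUID to compare.
    if namespace_uuid is None:
        return True
    return namespace_uuid == superblock_parent_uuid

old_ns = uuid.UUID("3dadf3dc-89b9-4b24-b20e-abc8a4707ce3").bytes
new_ns = uuid.UUID("7b4a6341-7318-4219-a02c-fb57c0bbf613").bytes
stored = parent_uuid_for(old_ns)

print(btt_parent_uuid_ok(old_ns, stored))   # same namespace UUID
print(btt_parent_uuid_ok(new_ns, stored))   # UUID changed: stale BTT
```

Changing the namespace UUID therefore makes every previously written BTT fail validation, which is the invalidation property the commit describes.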
      
      Based on a patch by Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      6ec68954
  25. 26 Jun, 2015 4 commits
    • T
      libnvdimm: Add sysfs numa_node to NVDIMM devices · 74ae66c3
      Toshi Kani committed
      Add support of sysfs 'numa_node' to I/O-related NVDIMM devices
      under /sys/bus/nd/devices, regionN, namespaceN.0, and bttN.x.
      
      An example of numa_node values on a 2-socket system with a single
      NVDIMM range on each socket is shown below.
        /sys/bus/nd/devices
        |-- btt0.0/numa_node:0
        |-- btt1.0/numa_node:1
        |-- btt1.1/numa_node:1
        |-- namespace0.0/numa_node:0
        |-- namespace1.0/numa_node:1
        |-- region0/numa_node:0
        |-- region1/numa_node:1
      
      These numa_node files are then linked under the block class of
      their device names.
        /sys/class/block/pmem0/device/numa_node:0
        /sys/class/block/pmem1s/device/numa_node:1
      
      This enables numactl(8) to accept 'block:' and 'file:' paths of
      pmem and btt devices as shown in the examples below.
        numactl --preferred block:pmem0 --show
        numactl --preferred file:/dev/pmem1s --show
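A minimal sketch of reading the linked numa_node file for a block device follows. The helper name and the sysfs_root parameter are hypothetical (the parameter exists only so the demo can run against a throwaway directory instead of a real /sys tree); the path layout is the one shown in the commit message.

```python
import tempfile
from pathlib import Path

def read_numa_node(blockdev, sysfs_root="/sys"):
    # Path layout follows /sys/class/block/<dev>/device/numa_node.
    path = Path(sysfs_root, "class", "block", blockdev, "device", "numa_node")
    return int(path.read_text().strip())

# Demo against a temporary directory that mimics the sysfs layout.
with tempfile.TemporaryDirectory() as root:
    node_file = Path(root, "class", "block", "pmem0", "device", "numa_node")
    node_file.parent.mkdir(parents=True)
    node_file.write_text("0\n")
    demo_node = read_numa_node("pmem0", sysfs_root=root)

print(demo_node)
```

On a real system the default sysfs_root of /sys would be used, and the call fails with FileNotFoundError if the device does not exist.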
      Signed-off-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      74ae66c3
    • V
      libnvdimm, blk: add support for blk integrity · fcae6957
      Vishal Verma committed
      Support multiple block sizes (sector + metadata) for nd_blk in the
      same way as done for the BTT. Add the idea of an 'internal' lbasize,
      which is properly aligned and padded, and store metadata in this space.
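The 'internal' lbasize idea can be sketched as rounding the combined sector-plus-metadata size up to an alignment boundary. The 64-byte alignment and the function name below are assumptions for illustration, not values taken from this commit.

```python
INTERNAL_ALIGN = 64  # assumed alignment in bytes

def internal_lbasize(sector_size: int, meta_size: int,
                     align: int = INTERNAL_ALIGN) -> int:
    # Round (sector + metadata) up to the next multiple of 'align';
    # the padding bytes keep each internal block aligned.
    raw = sector_size + meta_size
    return (raw + align - 1) // align * align

print(internal_lbasize(512, 8))    # 520 bytes of payload, padded up
print(internal_lbasize(4096, 0))   # already aligned, no padding
```

With these assumed values a 512-byte sector carrying 8 bytes of metadata occupies 576 bytes internally, of which 56 bytes are padding.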
      Signed-off-by: Vishal Verma <vishal.l.verma@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      fcae6957
    • R
      libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory · 047fc8a1
      Ross Zwisler committed
      The libnvdimm implementation handles allocating dimm address space (DPA)
      between PMEM and BLK mode interfaces.  After DPA has been allocated from
      a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O
      as a struct bio based block device. Unlike PMEM, BLK is required to
      handle platform specific details like mmio register formats and memory
      controller interleave.  For this reason the libnvdimm generic nd_blk
      driver calls back into the bus provider to carry out the I/O.
      
      This initial implementation handles the BLK interface defined by the
      ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from
      DCR (dimm control region), BDW (block data window), IDT (interleave
      descriptor) NFIT structures and the hardware register format.
      [1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
      [2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      047fc8a1
    • V
      nd_btt: atomic sector updates · 5212e11f
      Vishal Verma committed
      BTT stands for Block Translation Table, and is a way to provide power
      fail sector atomicity semantics for block devices that have the ability
      to perform byte granularity IO. It relies on the capability of libnvdimm
      namespace devices to do byte aligned IO.
      
      The BTT works as a stacked block device, and reserves a chunk of space
      from the backing device for its accounting metadata. It is a bio-based
      driver because all IO is done synchronously, and there is no queuing or
      asynchronous completions at either the device or the driver level.
      
      The BTT uses 'lanes' to index into various 'on-disk' data structures,
      and lanes also act as a synchronization mechanism in case there are more
      CPUs than available lanes. We did a comparison between two lane lock
      strategies - first where we kept an atomic counter around that tracked
      which was the last lane that was used, and 'our' lane was determined by
      atomically incrementing that. That way, for the nr_cpus > nr_lanes case,
      theoretically, no CPU would be blocked waiting for a lane. The other
      strategy was to take the cpu number we're scheduled on and hash it to
      a lane number. Theoretically, this could block an IO that could've
      otherwise run using a different, free lane. But some fio workloads
      showed that the direct cpu -> lane hash performed faster than tracking
      'last lane' - my reasoning is the cache thrash caused by moving the
      atomic variable made that approach slower than simply waiting out the
      in-progress IO. This supports the conclusion that the driver can be a
      very simple bio-based one that does synchronous IOs instead of queuing.
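The two lane-selection strategies compared above can be sketched as follows. The names are hypothetical, a simple modulo stands in for the cpu-to-lane hash (an assumption), and itertools.count stands in for the shared atomic counter; locking of the lane itself is not modelled.

```python
import itertools

def lane_from_cpu(cpu: int, nr_lanes: int) -> int:
    # Direct cpu -> lane mapping: no shared state is touched, but two
    # CPUs that map to the same lane must wait for each other.
    return cpu % nr_lanes

class LastLaneCounter:
    """'Last lane' strategy: every IO bumps one shared counter."""

    def __init__(self, nr_lanes: int):
        self.nr_lanes = nr_lanes
        self._counter = itertools.count()

    def next_lane(self) -> int:
        # In the driver this would be an atomic increment, which is the
        # cache line that bounces between CPUs.
        return next(self._counter) % self.nr_lanes

counter = LastLaneCounter(nr_lanes=4)
print([counter.next_lane() for _ in range(5)])
print([lane_from_cpu(cpu, 4) for cpu in range(6)])
```

The counter spreads IO evenly across lanes, while the direct mapping can collide (CPUs 0 and 4 both get lane 0 here) but avoids the shared write that the commit message blames for the slowdown.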
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      [jmoyer: fix nmi watchdog timeout in btt_map_init]
      [jmoyer: move btt initialization to module load path]
      [jmoyer: fix memory leak in the btt initialization path]
      [jmoyer: Don't overwrite corrupted arenas]
      Signed-off-by: Vishal Verma <vishal.l.verma@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      5212e11f
  26. 25 Jun, 2015 1 commit
    • D
      libnvdimm: infrastructure for btt devices · 8c2f7e86
      Dan Williams committed
      NVDIMM namespaces, in addition to accepting "struct bio" based requests,
      also have the capability to perform byte-aligned accesses.  By default
      only the bio/block interface is used.  However, if another driver can
      make effective use of the byte-aligned capability it can claim the
      namespace interface and use the byte-aligned ->rw_bytes() interface.
      
      The BTT driver is the first consumer of this mechanism to allow
      adding atomic sector update semantics to a pmem or blk namespace.  This
      patch is the sysfs infrastructure to allow configuring a BTT instance
      for a namespace.  Enabling that BTT and performing i/o is in a
      subsequent patch.
      
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      8c2f7e86