1. 28 Jun, 2016: 1 commit
  2. 25 Jun, 2016: 1 commit
    • libnvdimm, pmem: allow nfit_test to override pmem_direct_access() · f295e53b
      Committed by Dan Williams
      Currently phys_to_pfn_t() is an exported symbol to allow nfit_test to
      override it and indicate that nfit_test-pmem is not device-mapped.  Now,
      we want to enable nfit_test to operate without DMA_CMA and the pmem it
      provides will no longer be physically contiguous, i.e. won't be capable
      of supporting direct_access requests larger than a page.  Make
      pmem_direct_access() a weak symbol so that it can be replaced by the
      tools/testing/nvdimm/ version, and move phys_to_pfn_t() to a static
      inline now that it no longer needs to be overridden.
      Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
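The weak-symbol override this commit relies on can be sketched in plain C, assuming GCC or Clang on an ELF toolchain (`__attribute__((weak))` is a compiler extension, not standard C). `demo_direct_access()` is a hypothetical stand-in; it does not have the kernel's pmem_direct_access() signature.

```c
#include <assert.h>

/*
 * Hypothetical weak default: report how many bytes starting at 'offset'
 * can be mapped contiguously in a device of 'size' bytes.
 */
__attribute__((weak)) long demo_direct_access(long offset, long size)
{
	assert(offset >= 0 && offset <= size);
	return size - offset; /* device-mapped pmem: the whole rest is contiguous */
}

/*
 * A unit-test object file could supply a strong definition with the same
 * name; the linker then prefers it over the weak one, for example to cap
 * each request at one 4 KiB page when the backing memory is not contiguous:
 *
 *	long demo_direct_access(long offset, long size)
 *	{
 *		long left = size - offset;
 *		return left < 4096 ? left : 4096;
 *	}
 */
```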
  3. 18 Jun, 2016: 1 commit
  4. 21 May, 2016: 1 commit
    • /dev/dax, pmem: direct access to persistent memory · ab68f262
      Committed by Dan Williams
      Device DAX is the device-centric analogue of Filesystem DAX
      (CONFIG_FS_DAX).  It allows memory ranges to be allocated and mapped
      without need of an intervening file system.  Device DAX is strict,
      precise and predictable.  Specifically this interface:
      
      1/ Guarantees fault granularity with respect to a given page size (pte,
      pmd, or pud) set at configuration time.
      
      2/ Enforces deterministic behavior by being strict about what fault
      scenarios are supported.
      
      For example, by forcing MADV_DONTFORK semantics and omitting MAP_PRIVATE
      support, device-dax guarantees that a mapping always behaves/performs the
      same once established.  It is the "what you see is what you get" access
      mechanism for differentiated memory, in contrast to filesystem DAX, which
      has filesystem-specific implementation semantics.
      
      Persistent memory is the first target, but the mechanism is also
      targeted for exclusive allocations of performance differentiated memory
      ranges.
      
      This commit is limited to the base device driver infrastructure to
      associate a dax device with a pmem range.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
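A minimal sketch of the strict fault-granularity rule in point 1/, using a hypothetical helper `demo_fault_fits()`: a fault is serviced only if one naturally aligned page of the configured size (4 KiB pte, 2 MiB pmd, 1 GiB pud) fits entirely inside the VMA; otherwise it is refused rather than retried with a smaller page. This illustrates the policy described above, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SZ_4K (UINT64_C(4) << 10)  /* pte */
#define DEMO_SZ_2M (UINT64_C(2) << 20)  /* pmd */
#define DEMO_SZ_1G (UINT64_C(1) << 30)  /* pud */

/* Can a fault at 'addr' be mapped with one 'align'-sized page in [start, end)? */
static bool demo_fault_fits(uint64_t addr, uint64_t start, uint64_t end,
			    uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
	uint64_t base = addr & ~(align - 1); /* round down to a page boundary */

	/* Strict: the whole page must sit inside the VMA, or the fault fails. */
	return base >= start && base + align <= end;
}
```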
  5. 10 May, 2016: 1 commit
    • libnvdimm, dax: introduce device-dax infrastructure · cd03412a
      Committed by Dan Williams
      Device DAX is the device-centric analogue of Filesystem DAX
      (CONFIG_FS_DAX).  It allows persistent memory ranges to be allocated and
      mapped without need of an intervening file system.  This initial
      infrastructure arranges for a libnvdimm pfn-device to be represented as
      a different device-type so that it can be attached to a driver other
      than the pmem driver.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  6. 06 May, 2016: 1 commit
  7. 29 Apr, 2016: 1 commit
    • nfit, libnvdimm: clarify "commands" vs "_DSMs" · e3654eca
      Committed by Dan Williams
      Clarify the distinction between "commands", the ioctls userspace calls
      to request the kernel take some action on a given dimm device, and
      "_DSMs", the actual function numbers used in the firmware interface to
      the DIMM.  _DSMs are ACPI specific whereas commands are Linux kernel
      generic.
      
      This is in preparation for breaking the 1:1 implicit relationship
      between the kernel ioctl number space and the firmware specific function
      numbers.
      
      Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
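To illustrate breaking the implicit 1:1 relationship, here is a hypothetical translation table in which kernel command numbers and _DSM function numbers no longer have to match. All enum names and function numbers are made up for the example. Function 0 is skipped because, in ACPI, _DSM function 0 is the query for supported functions, so 0 doubles as "unsupported" here.

```c
#include <assert.h>

/* Hypothetical kernel-generic command numbers (the ioctl namespace). */
enum demo_cmd {
	DEMO_CMD_SMART = 1,
	DEMO_CMD_GET_CONFIG_SIZE,
	DEMO_CMD_GET_CONFIG_DATA,
	DEMO_CMD_MAX,
};

/*
 * Hypothetical per-firmware table: index = command, value = _DSM function.
 * Entries left at 0 are unsupported.
 */
static const int demo_cmd_to_dsm[DEMO_CMD_MAX] = {
	[DEMO_CMD_SMART] = 1,
	[DEMO_CMD_GET_CONFIG_SIZE] = 4,
	[DEMO_CMD_GET_CONFIG_DATA] = 5,
};

/* Translate a command to a firmware function number, or -1 if unsupported. */
static int demo_cmd_to_dsm_func(int cmd)
{
	if (cmd <= 0 || cmd >= DEMO_CMD_MAX || demo_cmd_to_dsm[cmd] == 0)
		return -1;
	return demo_cmd_to_dsm[cmd];
}
```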
  8. 23 Apr, 2016: 1 commit
    • libnvdimm, pmem, pfn: make pmem_rw_bytes generic and refactor pfn setup · 200c79da
      Committed by Dan Williams
      In preparation for providing an alternative (to block device) access
      mechanism to persistent memory, convert pmem_rw_bytes() to
      nsio_rw_bytes().  This allows ->rw_bytes() functionality without
      requiring a 'struct pmem_device' to be instantiated.
      
      In other words, when ->rw_bytes() is in use, I/O is driven through
      'struct nd_namespace_io', otherwise it is driven through 'struct
      pmem_device' and the block layer.  This consolidates the disjoint calls
      to devm_exit_badblocks() and devm_memunmap() into a common
      devm_nsio_disable() and cleans up the init path to use a unified
      pmem_attach_disk() implementation.
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
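A minimal sketch of the idea behind nsio_rw_bytes(): byte-granular access needs only the namespace's mapping and size, not a block device. `struct demo_nsio` and `demo_nsio_rw_bytes()` are hypothetical simplifications; they omit media-error handling and any flushing a real driver needs for persistence.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for 'struct nd_namespace_io': just a mapping. */
struct demo_nsio {
	unsigned char *addr; /* virtual address of the mapped namespace */
	size_t size;         /* namespace size in bytes */
};

enum demo_rw { DEMO_READ, DEMO_WRITE };

/* Byte-granular I/O without a block device: bounds check, then copy. */
static int demo_nsio_rw_bytes(struct demo_nsio *nsio, size_t offset,
			      void *buf, size_t size, enum demo_rw rw)
{
	if (offset > nsio->size || size > nsio->size - offset)
		return -1; /* would run past the end of the namespace */
	if (rw == DEMO_READ)
		memcpy(buf, nsio->addr + offset, size);
	else
		memcpy(nsio->addr + offset, buf, size);
	return 0;
}
```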
  9. 12 Apr, 2016: 1 commit
  10. 06 Mar, 2016: 6 commits
  11. 20 Feb, 2016: 1 commit
  12. 01 Feb, 2016: 1 commit
  13. 10 Jan, 2016: 1 commit
  14. 25 Dec, 2015: 1 commit
  15. 15 Dec, 2015: 1 commit
  16. 01 Dec, 2015: 1 commit
    • nfit: Adjust for different _FIT and NFIT headers · 6b577c9d
      Committed by Linda Knippers
      When support for _FIT was added, the code presumed that the data
      returned by the _FIT method is identical to the NFIT table, which
      starts with an acpi_table_header.  However, the _FIT is defined
      to return data in the format of a series of NFIT-type structure
      entries and, as a method, has an acpi_object header rather than
      an acpi_table_header.
      
      To address the differences, explicitly save the acpi_table_header
      from the NFIT, since it is accessible through /sys, and change
      the nfit pointer in the acpi_desc structure to point to the
      table entries rather than the headers.
      
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Linda Knippers <linda.knippers@hpe.com>
      Acked-by: Vishal Verma <vishal.l.verma@intel.com>
      [vishal: fix up unit test for new header assumptions]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
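The header difference boils down to where the sub-structure walk starts. Every NFIT sub-structure begins with a 16-bit type and a 16-bit length, so one walker can serve both sources if the caller skips the NFIT's fixed table header and passes a _FIT buffer as-is. The names below are hypothetical, and the code assumes a little-endian host, since ACPI data is little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Common prefix of every NFIT sub-structure. */
struct demo_nfit_sub_header {
	uint16_t type;
	uint16_t length; /* length of this entry, header included */
};

/*
 * Count the entries in a buffer that holds only sub-structures: a _FIT
 * buffer as-is, or an NFIT after its fixed table header has been skipped.
 * Returns -1 on a malformed entry or trailing garbage.
 */
static int demo_count_nfit_entries(const uint8_t *buf, size_t len)
{
	int count = 0;
	size_t off = 0;

	while (off + sizeof(struct demo_nfit_sub_header) <= len) {
		struct demo_nfit_sub_header hdr;

		memcpy(&hdr, buf + off, sizeof(hdr)); /* avoid unaligned access */
		if (hdr.length < sizeof(hdr) || hdr.length > len - off)
			return -1;
		off += hdr.length;
		count++;
	}
	return off == len ? count : -1;
}
```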
  17. 13 Nov, 2015: 1 commit
  18. 03 Nov, 2015: 1 commit
    • acpi: nfit: Add support for hot-add · 20985164
      Committed by Vishal Verma
      Add a .notify callback to the acpi_nfit_driver that gets called on a
      hotplug event. From this, evaluate the _FIT ACPI method which returns
      the updated NFIT with handles for the hot-plugged NVDIMM.
      
      Iterate over the new NFIT, and add any new tables found, and
      register/enable the corresponding regions.
      
      In the nfit test framework, after normal initialization, update the NFIT
      with a new hot-plugged NVDIMM, and directly call into the driver to
      update its view of the available regions.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Elliott, Robert <elliott@hpe.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: <linux-acpi@vger.kernel.org>
      Cc: <linux-nvdimm@lists.01.org>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
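The hot-add step "add any new tables found" can be sketched as a merge that skips entries already known from boot or from a previous notification. For brevity this assumes hypothetical fixed-size 8-byte entries compared byte-for-byte; real NFIT entries vary in type and length, and every name below is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ENTRIES 16
#define DEMO_ENTRY_SIZE 8 /* hypothetical fixed entry size */

struct demo_nfit_cache {
	unsigned char entries[DEMO_MAX_ENTRIES][DEMO_ENTRY_SIZE];
	size_t count;
};

static bool demo_known(const struct demo_nfit_cache *c, const unsigned char *e)
{
	for (size_t i = 0; i < c->count; i++)
		if (memcmp(c->entries[i], e, DEMO_ENTRY_SIZE) == 0)
			return true;
	return false;
}

/* Merge an updated table; returns how many entries were new, -1 if full. */
static int demo_merge_fit(struct demo_nfit_cache *c,
			  const unsigned char (*tbl)[DEMO_ENTRY_SIZE], size_t n)
{
	int added = 0;

	for (size_t i = 0; i < n; i++) {
		if (demo_known(c, tbl[i]))
			continue; /* already registered earlier */
		if (c->count == DEMO_MAX_ENTRIES)
			return -1;
		memcpy(c->entries[c->count++], tbl[i], DEMO_ENTRY_SIZE);
		added++; /* a real driver would register/enable a region here */
	}
	return added;
}
```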
  19. 29 Aug, 2015: 2 commits
    • libnvdimm, pmem: 'struct page' for pmem · 32ab0a3f
      Committed by Dan Williams
      Enable the pmem driver to handle PFN device instances.  Attaching a pmem
      namespace to a pfn device triggers the driver to allocate and initialize
      struct page entries for pmem.  Memory capacity for this allocation comes
      exclusively from RAM for now which is suitable for low PMEM to RAM
      ratios.  This mechanism will be expanded later for setting an "allocate
      from PMEM" policy.
      
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
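The "low PMEM to RAM ratio" caveat follows from simple arithmetic. Assuming 4 KiB pages and a 64-byte `struct page` (typical on x86_64, but configuration dependent), the memmap costs 64/4096, about 1.6%, of the pmem capacity, so 1 TiB of pmem needs 16 GiB of RAM. `demo_memmap_bytes()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * RAM needed for the 'struct page' array describing 'pmem_bytes' of pmem,
 * given the page size and the (assumed) size of one 'struct page'.
 */
static uint64_t demo_memmap_bytes(uint64_t pmem_bytes, uint64_t page_size,
				  uint64_t struct_page_size)
{
	return (pmem_bytes / page_size) * struct_page_size;
}
```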
    • libnvdimm, pfn: 'struct page' provider infrastructure · e1455744
      Committed by Dan Williams
      Implement the base infrastructure for libnvdimm PFN devices. Similar to
      BTT devices they take a namespace as a backing device and layer
      functionality on top. In this case the functionality is reserving space
      for an array of 'struct page' entries to be handed out through
      pfn_to_page(). For now this is just the basic libnvdimm-device-model for
      configuring the base PFN device.
      
      As the namespace claiming mechanism for PFN devices is mostly identical
      to that of BTT devices, drivers/nvdimm/claim.c is created to house the
      common bits.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  20. 28 Aug, 2015: 1 commit
    • nd_blk: change aperture mapping from WC to WB · 67a3e8fe
      Committed by Ross Zwisler
      This should result in a pretty sizeable performance gain for reads.  For
      rough comparison I did some simple read testing using PMEM to compare
      reads of write combining (WC) mappings vs write-back (WB).  This was
      done on a random lab machine.
      
      PMEM reads from a write combining mapping:
      	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
      	100000+0 records in
      	100000+0 records out
      	409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s
      
      PMEM reads from a write-back mapping:
      	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
      	1000000+0 records in
      	1000000+0 records out
      	4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s
      
      To be able to safely support a write-back aperture I needed to add
      support for the "read flush" _DSM flag, as outlined in the DSM spec:
      
      http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
      
      This flag tells the ND BLK driver that it needs to flush the cache lines
      associated with the aperture after the aperture is moved but before any
      new data is read.  This ensures that any stale cache lines from the
      previous contents of the aperture will be discarded from the processor
      cache, and the new data will be read properly from the DIMM.  We know
      that the cache lines are clean and will be discarded without any
      writeback because either a) the previous aperture operation was a read,
      and we never modified the contents of the aperture, or b) the previous
      aperture operation was a write and we must have written back the dirtied
      contents of the aperture to the DIMM before the I/O was completed.
      
      In order to add support for the "read flush" flag I needed to add a
      generic routine to invalidate cache lines, mmio_flush_range().  This is
      protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
      only supported on x86.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
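A minimal sketch of the read-flush ordering: after the aperture has been moved and before its contents are copied out, the cache lines covering the aperture are invalidated when the _DSM flag is set. `demo_flush_range()` is a hypothetical, portable stand-in for mmio_flush_range() that only counts 64-byte lines (an assumed line size) instead of issuing cache-flush instructions and barriers; all other names are hypothetical too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CACHELINE 64 /* assumed cache-line size */

static size_t demo_lines_flushed; /* lines the stand-in "flushed" */

/* Stand-in for mmio_flush_range(): visit each cache line in the range. */
static void demo_flush_range(const void *addr, size_t size)
{
	uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(DEMO_CACHELINE - 1);
	uintptr_t end = (uintptr_t)addr + size;

	for (; p < end; p += DEMO_CACHELINE)
		demo_lines_flushed++; /* real code would flush line 'p' here */
}

struct demo_aperture {
	unsigned char *window; /* write-back mapped aperture */
	size_t size;
	int read_flush;        /* "read flush" flag reported by the _DSM */
};

/* Called after the aperture has been moved to new DIMM contents. */
static void demo_aperture_read(struct demo_aperture *ap, void *dst, size_t len)
{
	assert(len <= ap->size);
	/* Lines are known clean, so invalidating discards without writeback. */
	if (ap->read_flush)
		demo_flush_range(ap->window, len);
	memcpy(dst, ap->window, len);
}
```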
  21. 19 Aug, 2015: 1 commit
    • libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option · 7a67832c
      Committed by Dan Williams
      We currently register a platform device for e820 type-12 memory and
      register an nvdimm bus beneath it.  Registering the platform device
      triggers the device-core machinery to probe for a driver, but that
      search currently comes up empty.  Building the nvdimm-bus registration
      into the e820_pmem platform device registration in this way forces
      libnvdimm to be built-in.  Instead, convert the built-in portion of
      CONFIG_X86_PMEM_LEGACY to simply register a platform device and move the
      rest of the logic to the driver for e820_pmem, for the following
      reasons:
      
      1/ Letting e820_pmem support be a module allows building and testing
         libnvdimm.ko changes without rebooting
      
      2/ All the normal policy around modules can be applied to e820_pmem
         (unbind to disable and/or blacklisting the module from loading by
         default)
      
      3/ Moving the driver to a generic location and converting it to scan
         "iomem_resource" rather than "e820.map" means any other architecture can
         take advantage of this simple nvdimm resource discovery mechanism by
         registering a resource named "Persistent Memory (legacy)"
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
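A sketch of the name-based discovery in point 3/, assuming a flattened array in place of the kernel's `iomem_resource` tree. Only the resource name string comes from the text above; the structure, function and callback are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, flattened stand-in for entries under iomem_resource. */
struct demo_resource {
	const char *name;
	unsigned long long start, end; /* inclusive bounds */
};

/* Report every range published as legacy persistent memory. */
static size_t demo_find_legacy_pmem(const struct demo_resource *res, size_t n,
				    void (*found)(const struct demo_resource *))
{
	size_t hits = 0;

	for (size_t i = 0; i < n; i++) {
		if (strcmp(res[i].name, "Persistent Memory (legacy)") != 0)
			continue;
		if (found)
			found(&res[i]); /* e.g. register an nvdimm region */
		hits++;
	}
	return hits;
}
```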
  22. 15 Aug, 2015: 2 commits
  23. 28 Jul, 2015: 1 commit
  24. 11 Jul, 2015: 3 commits
  25. 26 Jun, 2015: 2 commits
    • libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only · 58138820
      Committed by Dan Williams
      Upon detection of an unarmed dimm in a region, arrange for descendant
      BTT, PMEM, or BLK instances to be read-only.  A dimm is primarily marked
      "unarmed" via flags passed by platform firmware (NFIT).
      
      The flags in the NFIT memory device sub-structure indicate the state of
      the data on the nvdimm relative to its energy source or last "flush to
      persistence".  For the most part there is nothing the driver can do but
      advertise the state of these flags in sysfs and emit a message if
      firmware indicates that the contents of the device may be corrupted.
      However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
      the block devices incorporating that nvdimm to be marked read-only.
      This is a safe default as the data is still available and new writes are
      held off until the administrator either forces read-write mode, or the
      energy source becomes armed.
      
      A 'read_only' attribute is added to REGION devices to allow for
      overriding the default read-only policy of all descendant block devices.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
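The read-only policy reduces to a small decision: an explicit setting of the region's 'read_only' attribute wins; otherwise the region is read-only if any of its dimms lacks the armed flag. The flag value and structure below are hypothetical and do not reflect the NFIT's actual bit layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MEM_ARMED 0x1u /* hypothetical "energy source armed" flag */

struct demo_region {
	const unsigned int *dimm_flags; /* firmware flags, one per dimm */
	size_t ndimms;
	int ro_override; /* -1: default policy; 0/1: admin-set 'read_only' */
};

static bool demo_region_read_only(const struct demo_region *r)
{
	if (r->ro_override >= 0)
		return r->ro_override; /* explicit sysfs setting wins */
	for (size_t i = 0; i < r->ndimms; i++)
		if (!(r->dimm_flags[i] & DEMO_MEM_ARMED))
			return true; /* one unarmed dimm makes the region read-only */
	return false;
}
```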
    • tools/testing/nvdimm: libnvdimm unit test infrastructure · 6bc75619
      Committed by Dan Williams
      'libnvdimm' is the first driver sub-system in the kernel to implement
      mocking for unit test coverage.  The nfit_test module gets built as an
      external module and arranges for external module replacements of nfit,
      libnvdimm, nd_pmem, and nd_blk.  These replacements use the linker
      --wrap option to redirect calls to ioremap() + request_mem_region() to
      custom defined unit test resources.  The end result is a fully
      functional nvdimm_bus, as far as userspace is concerned, but with the
      capability to perform otherwise destructive tests on emulated resources.
      
      Q: Why not use QEMU for this emulation?
      QEMU is not suitable for unit testing.  QEMU's role is to faithfully
      emulate the platform.  A unit test's role is to unfaithfully implement
      the platform with the goal of triggering bugs in the corners of the
      sub-system implementation.  As bugs are discovered in platforms, or the
      sub-system itself, the unit tests are extended to backstop a fix with a
      reproducer unit test.
      
      Another problem with QEMU is that it would require coordination of 3
      software projects instead of 2 (kernel + libndctl [1]) to maintain and
      execute the tests.  The chances of bit rot and the difficulty of
      getting the tests running go up non-linearly with the number of
      components involved.
      
      
      Q: Why submit this to the kernel tree instead of external modules in
         libndctl?
      Simple, to alleviate the same risk that out-of-tree external modules
      face.  Updates to drivers/nvdimm/ can be immediately evaluated to see if
      they have any impact on tools/testing/nvdimm/.
      
      
      Q: What are the negative implications of merging this?
      It is a unique maintenance burden because the purpose of mocking an
      interface to enable a unit test is to purposefully short circuit the
      semantics of a routine to enable testing.  For example
      __wrap_ioremap_cache() fakes the pmem driver into "ioremap()'ing" a test
      resource buffer allocated by dma_alloc_coherent().  The future
      maintenance burden hits when someone changes the semantics of
      ioremap_cache() and wonders what the implications are for the unit test.
      
      [1]: https://github.com/pmem/ndctl
      
      Cc: <linux-acpi@vger.kernel.org>
      Cc: Lv Zheng <lv.zheng@intel.com>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
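With GNU ld's `--wrap=symbol`, undefined references to `symbol` resolve to `__wrap_symbol`, and references to `__real_symbol` resolve to the original `symbol`. A wrapped ioremap_cache() can therefore serve emulated ranges from memory and fall through to the real routine otherwise. The sketch below keeps that lookup logic but uses hypothetical `demo_` names and calls the wrapper directly, so it builds without the linker flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A hypothetical emulated resource: fake "physical" range backed by memory. */
struct demo_test_res {
	uint64_t phys;
	size_t size;
	void *buf;
};

static struct demo_test_res *demo_res;
static size_t demo_nres;

/* Stand-in for the genuine routine (__real_*); no hardware in a unit test. */
static void *demo_real_ioremap(uint64_t phys, size_t size)
{
	(void)phys;
	(void)size;
	return NULL;
}

/* What a __wrap_* replacement would do: serve test ranges from memory. */
static void *demo_wrap_ioremap(uint64_t phys, size_t size)
{
	for (size_t i = 0; i < demo_nres; i++) {
		struct demo_test_res *r = &demo_res[i];

		if (phys >= r->phys && size <= r->size &&
		    phys - r->phys <= r->size - size)
			return (char *)r->buf + (phys - r->phys);
	}
	return demo_real_ioremap(phys, size);
}
```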