1. 30 6月, 2016 1 次提交
    • D
      nfit: fix format interface code byte order · 1bcbf42d
      Dan Williams 提交于
      Per JEDEC Annex L Release 3 the SPD data is:
      
      Bits 9~5 00 000 = Function Undefined
               00 001 = Byte addressable energy backed
               00 010 = Block addressed
               00 011 = Byte addressable, no energy backed
               All other codes reserved
      Bits 4~0 0 0000 = Proprietary interface
               0 0001 = Standard interface 1
               All other codes reserved; see Definitions of Functions
      
      ...and per the ACPI 6.1 spec:
      
          byte0: Bits 4~0 (0 or 1)
          byte1: Bits 9~5 (1, 2, or 3)
      
      ...so a format interface code displayed as 0x301 should be stored in the
      nfit as (0x1, 0x3), little-endian.
      
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Robert Elliott <elliott@hpe.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=121161
      Fixes: 30ec5fd4 ("nfit: fix format interface code byte order per ACPI6.1")
      Fixes: 5ad9a7fd ("acpi/nfit: Update nfit driver to comply with ACPI 6.1")
      Reported-by: NKristin Jacque <kristin.jacque@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      1bcbf42d
  2. 25 6月, 2016 1 次提交
    • D
      acpi, nfit: fix acpi_check_dsm() vs zero functions implemented · 4995734e
      Dan Williams 提交于
      QEMU 2.6 implements nascent support for nvdimm DSMs. Depending on
      configuration it may only implement the function0 dsm to indicate that
      no other DSMs are available. Commit 31eca76b "nfit, libnvdimm:
      limited/whitelisted dimm command marshaling mechanism" breaks QEMU, but
      QEMU is spec compliant.  Per the spec the way to indicate that no
      functions are supported is:
      
          If Function Index is zero, the return is a buffer containing one bit
          for each function index, starting with zero. Bit 0 indicates whether
          there is support for any functions other than function 0 for the
          specified UUID and Revision ID. If set to zero, no functions are
          supported (other than function zero) for the specified UUID and
          Revision ID.
      
      Update the nfit driver to determine the family (interface UUID) without
      requiring the implementation to define any other functions, i.e.
      short-circuit acpi_check_dsm() to succeed per the spec.  The nfit driver
      appears to be the only user passing funcs==0 to acpi_check_dsm(), so
      this behavior change of the common routine should be limited to the
      probing done by the nfit driver.
      
      Cc: Len Brown <lenb@kernel.org>
      Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
      Acked-by: N"Rafael J. Wysocki" <rafael@kernel.org>
      Fixes: 31eca76b ("nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism")
      Reported-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Tested-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      4995734e
  3. 06 5月, 2016 2 次提交
  4. 03 5月, 2016 1 次提交
  5. 30 4月, 2016 1 次提交
  6. 29 4月, 2016 2 次提交
    • D
      nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism · 31eca76b
      Dan Williams 提交于
      There are currently 4 known similar but incompatible definitions of the
      command sets that can be sent to an NVDIMM through ACPI.  It is also
      clear that future platform generations (ACPI or not) will continue to
      revise and extend the DIMM command set as new devices and use cases
      arrive.
      
      It is obviously untenable to continue to proliferate divergence
      of these command definitions, and to that end a standardization process
      has begun to provide for a unified specification.  However, that leaves a
      problem about what to do with this first generation where vendors are
      already shipping divergence.
      
      The Linux kernel can support these initial diverged platforms without
      giving platform-firmware free reign to continue to diverge and compound
      kernel maintenance overhead.  The kernel implementation can encourage
      standardization in two ways:
      
      1/ Require that any function code that userspace wants to send be
         explicitly white-listed in the implementation.  For ACPI this means
         function codes marked as supported by acpi_check_dsm() may
         only be invoked if they appear in the white-list.  A function must be
         publicly documented before it is added to the white-list.
      
      2/ The above restrictions can be trivially bypassed by using the
         "vendor-specific" payload command.  However, since vendor-specific
         commands are by definition not publicly documented and have the
         potential to corrupt the kernel's view of the dimm state, we provide a
         toggle to disable vendor-specific operations.  Enabling undefined
         behavior is a policy decision that can be made by the platform owner
         and encourages firmware implementations to choose public over
         private command implementations.
      
      Based on an initial patch from Jerry Hoemann
      Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      31eca76b
    • D
      nfit, libnvdimm: clarify "commands" vs "_DSMs" · e3654eca
      Dan Williams 提交于
      Clarify the distinction between "commands", the ioctls userspace calls
      to request the kernel take some action on a given dimm device, and
      "_DSMs", the actual function numbers used in the firmware interface to
      the DIMM.  _DSMs are ACPI specific whereas commands are Linux kernel
      generic.
      
      This is in preparation for breaking the 1:1 implicit relationship
      between the kernel ioctl number space and the firmware specific function
      numbers.
      
      Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      e3654eca
  7. 27 4月, 2016 2 次提交
    • T
      acpi/nfit: Add sysfs "id" for NVDIMM ID · 38a879ba
      Toshi Kani 提交于
      ACPI 6.1, section 5.2.25.9, defines an identifier for an NVDIMM.
      
      Change the NFIT driver to add a new sysfs file "id" under nfit
      directory.
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Robert Elliott <elliott@hpe.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      38a879ba
    • T
      acpi/nfit: Update nfit driver to comply with ACPI 6.1 · 5ad9a7fd
      Toshi Kani 提交于
      ACPI 6.1, Table 5-133, updates NVDIMM Control Region Structure
      as follows.
       - Valid Fields, Manufacturing Location, and Manufacturing Date
         are added from reserved range.  No change in the structure size.
       - IDs (SPD values) are stored as arrays of bytes (i.e. big-endian
         format).  The spec clarifies that they need to be represented
         as arrays of bytes as well.
      
      This patch makes the following changes to support this update.
       - Change the NFIT driver to show SPD ID values in big-endian
         format.
       - Change sprintf format to use "0x" instead of "#" since "%#02x"
         does not prepend '0'.
      
      link: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdfSigned-off-by: NToshi Kani <toshi.kani@hpe.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Robert Elliott <elliott@hpe.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      5ad9a7fd
  8. 12 4月, 2016 2 次提交
  9. 10 3月, 2016 1 次提交
    • T
      ACPI: Change NFIT driver to insert new resource · af1996ef
      Toshi Kani 提交于
      ACPI 6 defines persistent memory (PMEM) ranges in multiple
      firmware interfaces, e820, EFI, and ACPI NFIT table.  This EFI
      change, however, leads to hit a bug in the grub bootloader, which
      treats EFI_PERSISTENT_MEMORY type as regular memory and corrupts
      stored user data [1].
      
      Therefore, BIOS may set generic reserved type in e820 and EFI to
      cover PMEM ranges.  The kernel can initialize PMEM ranges from
      ACPI NFIT table alone.
      
      This scheme causes a problem in the iomem table, though.  On x86,
      for instance, e820_reserve_resources() initializes top-level entries
      (iomem_resource.child) from the e820 table at early boot-time.
      This creates "reserved" entry for a PMEM range, which does not allow
      region_intersects() to check with PMEM type.
      
      Change acpi_nfit_register_region() to call acpi_nfit_insert_resource(),
      which calls insert_resource() to insert a PMEM entry from NFIT when
      the iomem table does not have a PMEM entry already.  That is, when
      a PMEM range is marked as reserved type in e820, it inserts
      "Persistent Memory" entry, which results as follows.
      
       + "Persistent Memory"
          + "reserved"
      
      This allows the EINJ driver, which calls region_intersects() to check
      PMEM ranges, to work continuously even if BIOS sets reserved type
      (or sets nothing) to PMEM ranges in e820 and EFI.
      
      [1]: https://lists.gnu.org/archive/html/grub-devel/2015-11/msg00209.htmlSigned-off-by: NToshi Kani <toshi.kani@hpe.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      af1996ef
  10. 06 3月, 2016 6 次提交
    • D
      nfit, libnvdimm: clear poison command support · d4f32367
      Dan Williams 提交于
      Add the boiler-plate for a 'clear error' command based on section
      9.20.7.6 "Function Index 4 - Clear Uncorrectable Error" from the ACPI
      6.1 specification, and add a reference implementation in nfit_test.
      Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      d4f32367
    • D
      nfit: disable userspace initiated ars during scrub · 87bf572e
      Dan Williams 提交于
      While the nfit driver is issuing address range scrub commands and
      reaping the results do not permit an ars_start command issued from
      userspace.  The scrub thread assumes that all ars completions are for
      scrubs initiated by platform firmware at boot, or by the nfit driver.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      87bf572e
    • D
      nfit: scrub and register regions in a workqueue · 1cf03c00
      Dan Williams 提交于
      Address range scrub is a potentially long running process that we want
      to complete before any pmem regions are registered.  Perform this
      operation asynchronously to allow other drivers to load in the meantime.
      
      Platform firmware may have initiated a partial scrub prior to the driver
      loading, so we must be careful to consume those results before kicking
      off kernel initiated scrubs on other regions.
      
      This rework also makes the registration path more tolerant of scrub
      errors in that it splits scrubbing into 2 phases.  The first phase
      synchronously waits for a platform-firmware initiated scrub to complete.
      The second phase scans the remaining address ranges asynchronously and
      notifies the related driver(s) when the scrub completes.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      1cf03c00
    • D
      nfit, libnvdimm: async region scrub workqueue · 7ae0fa43
      Dan Williams 提交于
      Introduce a workqueue that will be used to run address range scrub
      asynchronously with the rest of nvdimm device probing.
      
      Userspace still wants notification when probing operations complete, so
      introduce a new callback to flush this workqueue when userspace is
      awaiting probe completion.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      7ae0fa43
    • D
      nfit, tools/testing/nvdimm: unify common init for acpi_nfit_desc · a61fe6f7
      Dan Williams 提交于
      The nvdimm unit test infrastructure performs its own initialization of
      an acpi_nfit_desc to specify test overrides over the native
      implementation.  Make it clear which attributes and operations it is
      overriding by re-using acpi_nfit_init_desc() as a common starting point.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      a61fe6f7
    • D
      libnvdimm, nfit: centralize command status translation · aef25338
      Dan Williams 提交于
      The return value from an 'ndctl_fn' reports the command execution
      status, i.e. was the command properly formatted and was it successfully
      submitted to the bus provider.  The new 'cmd_rc' parameter allows the bus
      provider to communicate command specific results, translated into
      common error codes.
      
      Convert the ARS commands to this scheme to:
      
      1/ Consolidate status reporting
      
      2/ Prepare for for expanding ars unit test cases
      
      3/ Make the implementation more generic
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      aef25338
  11. 05 3月, 2016 1 次提交
  12. 24 2月, 2016 1 次提交
  13. 20 2月, 2016 2 次提交
    • D
      libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing · 747ffe11
      Dan Williams 提交于
      Use the output length specified in the command to size the receive
      buffer rather than the arbitrary 4K limit.
      
      This bug was hiding the fact that the ndctl implementation of
      ndctl_bus_cmd_new_ars_status() was not specifying an output buffer size.
      
      Cc: <stable@vger.kernel.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      747ffe11
    • D
      nfit: fix multi-interface dimm handling, acpi6.1 compatibility · 6697b2cf
      Dan Williams 提交于
      ACPI 6.1 clarified that multi-interface dimms require multiple control
      region entries (DCRs) per dimm.  Previously we were assuming that a
      control region is only present when block-data-windows are present.
      This implementation was done with an eye to be compatibility with the
      looser ACPI 6.0 interpretation of this table.
      
      1/ When coalescing the memory device (MEMDEV) tables for a single dimm,
      coalesce on device_handle rather than control region index.
      
      2/ Whenever we disocver a control region with non-zero block windows
      re-scan for block-data-window (BDW) entries.
      
      We may need to revisit this if a DIMM ever implements a format interface
      outside of blk or pmem, but that is not on the foreseeable horizon.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      6697b2cf
  14. 10 1月, 2016 1 次提交
    • V
      libnvdimm: Add a poison list and export badblocks · 0caeef63
      Vishal Verma 提交于
      During region creation, perform Address Range Scrubs (ARS) for the SPA
      (System Physical Address) ranges to retrieve known poison locations from
      firmware. Add a new data structure 'nd_poison' which is used as a list
      in nvdimm_bus to store these poison locations.
      
      When creating a pmem namespace, if there is any known poison associated
      with its physical address space, convert the poison ranges to bad sectors
      that are exposed using the badblocks interface.
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0caeef63
  15. 12 12月, 2015 1 次提交
  16. 01 12月, 2015 3 次提交
  17. 03 11月, 2015 2 次提交
    • V
      acpi: nfit: Add support for hot-add · 20985164
      Vishal Verma 提交于
      Add a .notify callback to the acpi_nfit_driver that gets called on a
      hotplug event. From this, evaluate the _FIT ACPI method which returns
      the updated NFIT with handles for the hot-plugged NVDIMM.
      
      Iterate over the new NFIT, and add any new tables found, and
      register/enable the corresponding regions.
      
      In the nfit test framework, after normal initialization, update the NFIT
      with a new hot-plugged NVDIMM, and directly call into the driver to
      update its view of the available regions.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Elliott, Robert <elliott@hpe.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: <linux-acpi@vger.kernel.org>
      Cc: <linux-nvdimm@lists.01.org>
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      20985164
    • V
      nfit: in acpi_nfit_init, break on a 0-length table · 564d5011
      Vishal Verma 提交于
      If acpi_nfit_init is called (such as from nfit_test), with an nfit table
      that has more memory allocated than it needs (and a similarly large
      'size' field, add_tables would happily keep adding null SPA Range tables
      filling up all available memory.
      
      Make it friendlier by breaking out if a 0-length header is found in any
      of the tables.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: <linux-acpi@vger.kernel.org>
      Cc: <linux-nvdimm@lists.01.org>
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      564d5011
  18. 22 10月, 2015 1 次提交
  19. 15 10月, 2015 1 次提交
  20. 28 8月, 2015 3 次提交
    • D
      x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB · 96601adb
      Dan Williams 提交于
      Given that a write-back (WB) mapping plus non-temporal stores is
      expected to be the most efficient way to access PMEM, update the
      definition of ARCH_HAS_PMEM_API to imply arch support for
      WB-mapped-PMEM.  This is needed as a pre-requisite for adding PMEM to
      the direct map and mapping it with struct page.
      
      The above clarification for X86_64 means that memcpy_to_pmem() is
      permitted to use the non-temporal arch_memcpy_to_pmem() rather than
      needlessly fall back to default_memcpy_to_pmem() when the pcommit
      instruction is not available.  When arch_memcpy_to_pmem() is not
      guaranteed to flush writes out of cache, i.e. on older X86_32
      implementations where non-temporal stores may just dirty cache,
      ARCH_HAS_PMEM_API is simply disabled.
      
      The default fall back for persistent memory handling remains.  Namely,
      map it with the WT (write-through) cache-type and hope for the best.
      
      arch_has_pmem_api() is updated to only indicate whether the arch
      provides the proper helpers to meet the minimum "writes are visible
      outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()".  Code
      that cares whether wmb_pmem() actually flushes writes to pmem must now
      call arch_has_wmb_pmem() directly.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      [hch: set ARCH_HAS_PMEM_API=n on x86_32]
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      [toshi: x86_32 compile fixes]
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      96601adb
    • R
      nd_blk: change aperture mapping from WC to WB · 67a3e8fe
      Ross Zwisler 提交于
      This should result in a pretty sizeable performance gain for reads.  For
      rough comparison I did some simple read testing using PMEM to compare
      reads of write combining (WC) mappings vs write-back (WB).  This was
      done on a random lab machine.
      
      PMEM reads from a write combining mapping:
      	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=100000
      	100000+0 records in
      	100000+0 records out
      	409600000 bytes (410 MB) copied, 9.2855 s, 44.1 MB/s
      
      PMEM reads from a write-back mapping:
      	# dd of=/dev/null if=/dev/pmem0 bs=4096 count=1000000
      	1000000+0 records in
      	1000000+0 records out
      	4096000000 bytes (4.1 GB) copied, 3.44034 s, 1.2 GB/s
      
      To be able to safely support a write-back aperture I needed to add
      support for the "read flush" _DSM flag, as outlined in the DSM spec:
      
      http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
      
      This flag tells the ND BLK driver that it needs to flush the cache lines
      associated with the aperture after the aperture is moved but before any
      new data is read.  This ensures that any stale cache lines from the
      previous contents of the aperture will be discarded from the processor
      cache, and the new data will be read properly from the DIMM.  We know
      that the cache lines are clean and will be discarded without any
      writeback because either a) the previous aperture operation was a read,
      and we never modified the contents of the aperture, or b) the previous
      aperture operation was a write and we must have written back the dirtied
      contents of the aperture to the DIMM before the I/O was completed.
      
      In order to add support for the "read flush" flag I needed to add a
      generic routine to invalidate cache lines, mmio_flush_range().  This is
      protected by the ARCH_HAS_MMIO_FLUSH Kconfig variable, and is currently
      only supported on x86.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      67a3e8fe
    • T
      nfit: Clarify memory device state flags strings · 402bae59
      Toshi Kani 提交于
      ACPI 6.0 NFIT Memory Device State Flags in Table 5-129 defines
      NVDIMM status as follows.  These bits indicate multiple info,
      such as failures, pending event, and capability.
      
        Bit [0] set to 1 to indicate that the previous SAVE to the
        Memory Device failed.
        Bit [1] set to 1 to indicate that the last RESTORE from the
        Memory Device failed.
        Bit [2] set to 1 to indicate that platform flush of data to
        Memory Device failed. As a result, the restored data content
        may be inconsistent even if SAVE and RESTORE do not indicate
        failure.
        Bit [3] set to 1 to indicate that the Memory Device is observed
        to be not armed prior to OSPM hand off. A Memory Device is
        considered armed if it is able to accept persistent writes.
        Bit [4] set to 1 to indicate that the Memory Device observed
        SMART and health events prior to OSPM handoff.
      
      /sys/bus/nd/devices/nmemX/nfit/flags shows this flags info.
      The output strings associated with the bits are "save", "restore",
      "smart", etc., which can be confusing as they may be interpreted
      as positive status, i.e. save succeeded.
      
      Change also the dev_info() message in acpi_nfit_register_dimms()
      to be consistent with the sysfs flags strings.
      Reported-by: NRobert Elliott <elliott@hp.com>
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      [ross: rename 'not_arm' to 'not_armed']
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      [djbw: defer adding bit5, HEALTH_ENABLED, for now]
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      402bae59
  21. 26 8月, 2015 1 次提交
  22. 28 7月, 2015 2 次提交
  23. 11 7月, 2015 2 次提交