1. 01 10月, 2016 1 次提交
    • V
      libnvdimm: clear the internal poison_list when clearing badblocks · e046114a
      Vishal Verma 提交于
      nvdimm_clear_poison cleared the user-visible badblocks, and sent
      commands to the NVDIMM to clear the areas marked as 'poison', but it
      neglected to clear the same areas from the internal poison_list which is
      used to marshal ARS results before sorting them by namespace. As a
      result, once on-demand ARS functionality was added:
      
      37b137ff nfit, libnvdimm: allow an ARS scrub to be triggered on demand
      
      A scrub triggered from either sysfs or an MCE was found to be adding
      stale entries that had been cleared from gendisk->badblocks, but were
      still present in nvdimm_bus->poison_list. Additionally, the stale entries
      could be triggered into producing stale disk->badblocks by simply disabling
      and re-enabling the namespace or region.
      
      This adds the missing step of clearing poison_list entries when clearing
      poison, so that it is always in sync with badblocks.
      
      Fixes: 37b137ff ("nfit, libnvdimm: allow an ARS scrub to be triggered on demand")
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      e046114a
  2. 10 9月, 2016 1 次提交
  3. 24 7月, 2016 1 次提交
    • D
      libnvdimm: register nvdimm_bus devices with an nd_bus driver · 18515942
      Dan Williams 提交于
      A recent effort to add a new nvdimm bus provider attribute highlighted a
      race between interrogating nvdimm_bus->nd_desc and nvdimm_bus tear down.
      The typical way to handle these races is to take the device_lock() in
      the attribute method and validate that the device is still active.  In
      order for a device to be 'active' it needs to be associated with a
      driver.  So, we create the small boilerplate for a driver and register
      nvdimm_bus devices on the 'nvdimm_bus_type' bus.
      
      A result of this change is that ndbusX devices now appear under
      /sys/bus/nd/devices.  In fact this makes /sys/class/nd somewhat
      redundant, but removing that will need to take a long deprecation period
      given its use by ndctl binaries in the field.
      
      This change naturally pulls code from drivers/nvdimm/core.c to
      drivers/nvdimm/bus.c, so it is a nice code organization clean-up as
      well.
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      18515942
  4. 22 7月, 2016 1 次提交
  5. 13 7月, 2016 1 次提交
  6. 28 6月, 2016 1 次提交
  7. 18 6月, 2016 1 次提交
  8. 19 5月, 2016 1 次提交
  9. 10 5月, 2016 1 次提交
    • D
      libnvdimm, dax: introduce device-dax infrastructure · cd03412a
      Dan Williams 提交于
      Device DAX is the device-centric analogue of Filesystem DAX
      (CONFIG_FS_DAX).  It allows persistent memory ranges to be allocated and
      mapped without need of an intervening file system.  This initial
      infrastructure arranges for a libnvdimm pfn-device to be represented as
      a different device-type so that it can be attached to a driver other
      than the pmem driver.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      cd03412a
  10. 29 4月, 2016 2 次提交
    • D
      nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism · 31eca76b
      Dan Williams 提交于
      There are currently 4 known similar but incompatible definitions of the
      command sets that can be sent to an NVDIMM through ACPI.  It is also
      clear that future platform generations (ACPI or not) will continue to
      revise and extend the DIMM command set as new devices and use cases
      arrive.
      
      It is obviously untenable to continue to proliferate divergence
      of these command definitions, and to that end a standardization process
      has begun to provide for a unified specification.  However, that leaves a
      problem about what to do with this first generation where vendors are
      already shipping divergence.
      
      The Linux kernel can support these initial diverged platforms without
      giving platform-firmware free reign to continue to diverge and compound
      kernel maintenance overhead.  The kernel implementation can encourage
      standardization in two ways:
      
      1/ Require that any function code that userspace wants to send be
         explicitly white-listed in the implementation.  For ACPI this means
         function codes marked as supported by acpi_check_dsm() may
         only be invoked if they appear in the white-list.  A function must be
         publicly documented before it is added to the white-list.
      
      2/ The above restrictions can be trivially bypassed by using the
         "vendor-specific" payload command.  However, since vendor-specific
         commands are by definition not publicly documented and have the
         potential to corrupt the kernel's view of the dimm state, we provide a
         toggle to disable vendor-specific operations.  Enabling undefined
         behavior is a policy decision that can be made by the platform owner
         and encourages firmware implementations to choose public over
         private command implementations.
      
      Based on an initial patch from Jerry Hoemann
      Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      31eca76b
    • D
      nfit, libnvdimm: clarify "commands" vs "_DSMs" · e3654eca
      Dan Williams 提交于
      Clarify the distinction between "commands", the ioctls userspace calls
      to request the kernel take some action on a given dimm device, and
      "_DSMs", the actual function numbers used in the firmware interface to
      the DIMM.  _DSMs are ACPI specific whereas commands are Linux kernel
      generic.
      
      This is in preparation for breaking the 1:1 implicit relationship
      between the kernel ioctl number space and the firmware specific function
      numbers.
      
      Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      e3654eca
  11. 12 4月, 2016 1 次提交
  12. 08 4月, 2016 1 次提交
  13. 10 3月, 2016 1 次提交
  14. 06 3月, 2016 6 次提交
  15. 24 2月, 2016 1 次提交
  16. 20 2月, 2016 1 次提交
  17. 17 2月, 2016 1 次提交
  18. 01 7月, 2015 2 次提交
  19. 26 6月, 2015 3 次提交
    • T
      libnvdimm: Add sysfs numa_node to NVDIMM devices · 74ae66c3
      Toshi Kani 提交于
      Add support of sysfs 'numa_node' to I/O-related NVDIMM devices
      under /sys/bus/nd/devices, regionN, namespaceN.0, and bttN.x.
      
      An example of numa_node values on a 2-socket system with a single
      NVDIMM range on each socket is shown below.
        /sys/bus/nd/devices
        |-- btt0.0/numa_node:0
        |-- btt1.0/numa_node:1
        |-- btt1.1/numa_node:1
        |-- namespace0.0/numa_node:0
        |-- namespace1.0/numa_node:1
        |-- region0/numa_node:0
        |-- region1/numa_node:1
      
      These numa_node files are then linked under the block class of
      their device names.
        /sys/class/block/pmem0/device/numa_node:0
        /sys/class/block/pmem1s/device/numa_node:1
      
      This enables numactl(8) to accept 'block:' and 'file:' paths of
      pmem and btt devices as shown in the examples below.
        numactl --preferred block:pmem0 --show
        numactl --preferred file:/dev/pmem1s --show
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      74ae66c3
    • T
      libnvdimm: Set numa_node to NVDIMM devices · 41d7a6d6
      Toshi Kani 提交于
      ACPI NFIT table has System Physical Address Range Structure entries that
      describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is
      set in the flags.
      
      Change acpi_nfit_register_region() to map a proximity ID to its node ID,
      and set it to a new numa_node field of nd_region_desc, which is then
      conveyed to the nd_region device.
      
      The device core arranges for btt and namespace devices to inherit their
      node from their parent region.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      [djbw: move set_dev_node() from region.c to bus.c]
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      41d7a6d6
    • D
      libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only · 58138820
      Dan Williams 提交于
      Upon detection of an unarmed dimm in a region, arrange for descendant
      BTT, PMEM, or BLK instances to be read-only.  A dimm is primarily marked
      "unarmed" via flags passed by platform firmware (NFIT).
      
      The flags in the NFIT memory device sub-structure indicate the state of
      the data on the nvdimm relative to its energy source or last "flush to
      persistence".  For the most part there is nothing the driver can do but
      advertise the state of these flags in sysfs and emit a message if
      firmware indicates that the contents of the device may be corrupted.
      However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
      the block devices incorporating that nvdimm to be marked read-only.
      This is a safe default as the data is still available and new writes are
      held off until the administrator either forces read-write mode, or the
      energy source becomes armed.
      
      A 'read_only' attribute is added to REGION devices to allow for
      overriding the default read-only policy of all descendant block devices.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      58138820
  20. 25 6月, 2015 9 次提交
    • D
      libnvdimm: infrastructure for btt devices · 8c2f7e86
      Dan Williams 提交于
      NVDIMM namespaces, in addition to accepting "struct bio" based requests,
      also have the capability to perform byte-aligned accesses.  By default
      only the bio/block interface is used.  However, if another driver can
      make effective use of the byte-aligned capability it can claim namespace
      interface and use the byte-aligned ->rw_bytes() interface.
      
      The BTT driver is the initial first consumer of this mechanism to allow
      adding atomic sector update semantics to a pmem or blk namespace.  This
      patch is the sysfs infrastructure to allow configuring a BTT instance
      for a namespace.  Enabling that BTT and performing i/o is in a
      subsequent patch.
      
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      8c2f7e86
    • D
      libnvdimm: write blk label set · 0ba1c634
      Dan Williams 提交于
      After 'uuid', 'size', 'sector_size', and optionally 'alt_name' have been
      set to valid values the labels on the dimm can be updated.  The
      difference with the pmem case is that blk namespaces are limited to one
      dimm and can cover discontiguous ranges in dpa space.
      
      Also, after allocating label slots, it is useful for userspace to know
      how many slots are left.  Export this information in sysfs.
      
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Neil Brown <neilb@suse.de>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0ba1c634
    • D
      libnvdimm: pmem label sets and namespace instantiation. · bf9bccc1
      Dan Williams 提交于
      A complete label set is a PMEM-label per-dimm per-interleave-set where
      all the UUIDs match and the interleave set cookie matches the hosting
      interleave set.
      
      Present sysfs attributes for manipulation of a PMEM-namespace's
      'alt_name', 'uuid', and 'size' attributes.  A later patch will make
      these settings persistent by writing back the label.
      
      Note that PMEM allocations grow forwards from the start of an interleave
      set (lowest dimm-physical-address (DPA)).  BLK-namespaces that alias
      with a PMEM interleave set will grow allocations backward from the
      highest DPA.
      
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Neil Brown <neilb@suse.de>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      bf9bccc1
    • D
      libnvdimm, nfit: add interleave-set state-tracking infrastructure · eaf96153
      Dan Williams 提交于
      On platforms that have firmware support for reading/writing per-dimm
      label space, a portion of the dimm may be accessible via an interleave
      set PMEM mapping in addition to the dimm's BLK (block-data-window
      aperture(s)) interface.  A label, stored in a "configuration data
      region" on the dimm, disambiguates which dimm addresses are accessed
      through which exclusive interface.
      
      Add infrastructure that allows the kernel to block modifications to a
      label in the set while any member dimm is active.  Note that this is
      meant only for enforcing "no modifications of active labels" via the
      coarse ioctl command.  Adding/deleting namespaces from an active
      interleave set is always possible via sysfs.
      
      Another aspect of tracking interleave sets is tracking their integrity
      when DIMMs in a set are physically re-ordered.  For this purpose we
      generate an "interleave-set cookie" that can be recorded in a label and
      validated against the current configuration.  It is the bus provider
      implementation's responsibility to calculate the interleave set cookie
      and attach it to a given region.
      
      Cc: Neil Brown <neilb@suse.de>
      Cc: <linux-acpi@vger.kernel.org>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      eaf96153
    • D
      libnvdimm: support for legacy (non-aliasing) nvdimms · 3d88002e
      Dan Williams 提交于
      The libnvdimm region driver is an intermediary driver that translates
      non-volatile "region"s into "namespace" sub-devices that are surfaced by
      persistent memory block-device drivers (PMEM and BLK).
      
      ACPI 6 introduces the concept that a given nvdimm may simultaneously
      offer multiple access modes to its media through direct PMEM load/store
      access, or windowed BLK mode.  Existing nvdimms mostly implement a PMEM
      interface, some offer a BLK-like mode, but never both as ACPI 6 defines.
      If an nvdimm is single interfaced, then there is no need for dimm
      metadata labels.  For these devices we can take the region boundaries
      directly to create a child namespace device (nd_namespace_io).
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      3d88002e
    • D
      libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver infrastructure · 4d88a97a
      Dan Williams 提交于
      * Implement the device-model infrastructure for loading modules and
        attaching drivers to nvdimm devices.  This is a simple association of a
        nd-device-type number with a driver that has a bitmask of supported
        device types.  To facilitate userspace bind/unbind operations 'modalias'
        and 'devtype', that also appear in the uevent, are added as generic
        sysfs attributes for all nvdimm devices.  The reason for the device-type
        number is to support sub-types within a given parent devtype, be it a
        vendor-specific sub-type or otherwise.
      
      * The first consumer of this infrastructure is the driver
        for dimm devices.  It simply uses control messages to retrieve and
        store the configuration-data image (label set) from each dimm.
      
      Note: nd_device_register() arranges for asynchronous registration of
            nvdimm bus devices by default.
      
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Neil Brown <neilb@suse.de>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      4d88a97a
    • D
      libnvdimm: control (ioctl) messages for nvdimm_bus and nvdimm devices · 62232e45
      Dan Williams 提交于
      Most discovery/configuration of the nvdimm-subsystem is done via sysfs
      attributes.  However, some nvdimm_bus instances, particularly the
      ACPI.NFIT bus, define a small set of messages that can be passed to the
      platform.  For convenience we derive the initial libnvdimm-ioctl command
      formats directly from the NFIT DSM Interface Example formats.
      
          ND_CMD_SMART: media health and diagnostics
          ND_CMD_GET_CONFIG_SIZE: size of the label space
          ND_CMD_GET_CONFIG_DATA: read label space
          ND_CMD_SET_CONFIG_DATA: write label space
          ND_CMD_VENDOR: vendor-specific command passthrough
          ND_CMD_ARS_CAP: report address-range-scrubbing capabilities
          ND_CMD_ARS_START: initiate scrubbing
          ND_CMD_ARS_STATUS: report on scrubbing state
          ND_CMD_SMART_THRESHOLD: configure alarm thresholds for smart events
      
      If a platform later defines different commands than this set it is
      straightforward to extend support to those formats.
      
      Most of the commands target a specific dimm.  However, the
      address-range-scrubbing commands target the bus.  The 'commands'
      attribute in sysfs of an nvdimm_bus, or nvdimm, enumerate the supported
      commands for that object.
      
      Cc: <linux-acpi@vger.kernel.org>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reported-by: NNicholas Moulin <nicholas.w.moulin@linux.intel.com>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      62232e45
    • D
      libnvdimm, nfit: dimm/memory-devices · e6dfb2de
      Dan Williams 提交于
      Enable nvdimm devices to be registered on a nvdimm_bus.  The kernel
      assigned device id for nvdimm devicesis dynamic.  If userspace needs a
      more static identifier it should consult a provider-specific attribute.
      In the case where NFIT is the provider, the 'nmemX/nfit/handle' or
      'nmemX/nfit/serial' attributes may be used for this purpose.
      
      Cc: Neil Brown <neilb@suse.de>
      Cc: <linux-acpi@vger.kernel.org>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      e6dfb2de
    • D
      libnvdimm: control character device and nvdimm_bus sysfs attributes · 45def22c
      Dan Williams 提交于
      The control device for a nvdimm_bus is registered as an "nd" class
      device.  The expectation is that there will usually only be one "nd" bus
      registered under /sys/class/nd.  However, we allow for the possibility
      of multiple buses and they will listed in discovery order as
      ndctl0...ndctlN.  This character device hosts the ioctl for passing
      control messages.  The initial command set has a 1:1 correlation with
      the commands listed in the by the "NFIT DSM Example" document [1], but
      this scheme is extensible to future command sets.
      
      Note, nd_ioctl() and the backing ->ndctl() implementation are defined in
      a subsequent patch.  This is simply the initial registrations and sysfs
      attributes.
      
      [1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
      
      Cc: Neil Brown <neilb@suse.de>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: <linux-acpi@vger.kernel.org>
      Cc: Robert Moore <robert.moore@intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      45def22c