1. 26 Jun, 2015 6 commits
    • libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only · 58138820
      Dan Williams authored
      Upon detection of an unarmed dimm in a region, arrange for descendant
      BTT, PMEM, or BLK instances to be read-only.  A dimm is primarily marked
      "unarmed" via flags passed by platform firmware (NFIT).
      
      The flags in the NFIT memory device sub-structure indicate the state of
      the data on the nvdimm relative to its energy source or last "flush to
      persistence".  For the most part there is nothing the driver can do but
      advertise the state of these flags in sysfs and emit a message if
      firmware indicates that the contents of the device may be corrupted.
      However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
      the block devices incorporating that nvdimm to be marked read-only.
      This is a safe default as the data is still available and new writes are
      held off until the administrator either forces read-write mode, or the
      energy source becomes armed.
      
      A 'read_only' attribute is added to REGION devices to allow for
      overriding the default read-only policy of all descendant block devices.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
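The read-only policy described above can be modelled in a few lines. This is a minimal userspace sketch, not the driver code: the `Dimm` and `Region` classes, the `unarmed` field, and `read_only_override` are hypothetical names standing in for the NFIT flag and the region 'read_only' attribute.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Dimm:
    name: str
    unarmed: bool = False  # hypothetical stand-in for the firmware (NFIT) flag


@dataclass
class Region:
    dimms: List[Dimm]
    # None means the administrator has not touched the 'read_only' attribute;
    # True or False forces the policy for all descendant block devices.
    read_only_override: Optional[bool] = None

    def read_only(self) -> bool:
        """Default to read-only when any dimm in the region is unarmed."""
        if self.read_only_override is not None:
            return self.read_only_override
        return any(d.unarmed for d in self.dimms)


region = Region([Dimm("nmem0"), Dimm("nmem1", unarmed=True)])
default_policy = region.read_only()   # derived from the unarmed dimm
region.read_only_override = False     # administrator forces read-write
forced_policy = region.read_only()
```

The override is kept separate from the derived default so that clearing it would fall back to whatever the dimm flags imply.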
    • pmem: flag pmem block devices as non-rotational · 0f51c4fa
      Dan Williams authored
      ...since they are effectively SSDs as far as userspace is concerned.
      Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm: enable iostat · f0dc089c
      Dan Williams authored
      This is disabled by default as the overhead is prohibitive, but if the
      user takes the action to turn it on we'll oblige.
      Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
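For reference, the user action mentioned above is normally a write to the block queue's `iostats` attribute in sysfs. The sketch below assumes a device named `pmem0` (hypothetical) and takes the sysfs root as a parameter so it can be exercised against a scratch directory. The helper names are illustrative only.

```python
from pathlib import Path


def _iostats_attr(dev: str, sysfs_root: str) -> Path:
    # <sysfs_root>/<dev>/queue/iostats holds "0" (off) or "1" (on).
    return Path(sysfs_root) / dev / "queue" / "iostats"


def set_iostats(dev: str, enabled: bool, sysfs_root: str = "/sys/block") -> None:
    """Turn per-device I/O accounting on or off (needs root on a real system)."""
    _iostats_attr(dev, sysfs_root).write_text("1" if enabled else "0")


def iostats_enabled(dev: str, sysfs_root: str = "/sys/block") -> bool:
    """Report whether I/O accounting is currently enabled for the device."""
    return _iostats_attr(dev, sysfs_root).read_text().strip() == "1"
```

On a real system this would be called as `set_iostats("pmem0", True)`, which is equivalent to echoing 1 into the attribute from a shell.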
    • pmem: make_request cleanups · edc870e5
      Dan Williams authored
      Various cleanups:
      
      1/ Kill the BUG_ON since we've already told the block layer we don't
         support DISCARD on all these drivers.
      
      2/ Kill the 'rw' variable, no need to cache it.
      
      3/ Kill the local 'sector' variable.  bio_for_each_segment() is already
         advancing the iterator's sector number by the bio_vec length.
      
      4/ Kill the check for accessing past the end of the device, since
         generic_make_request_checks() already does that.
      Suggested-by: Christoph Hellwig <hch@lst.de>
      [hch: kill access past end of the device check]
      Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
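Item 3/ relies on the segment iterator carrying the current sector itself. As a rough illustration only (a Python model, not the kernel's bio_for_each_segment()), the generator below advances the sector by each segment's length in 512-byte units, so the caller has no separate sector variable to maintain. `walk_segments` and `SECTOR_SIZE` are names made up for this sketch.

```python
SECTOR_SIZE = 512  # bytes per sector, as used by the block layer


def walk_segments(start_sector, segment_lengths):
    """Yield (sector, length) for each segment of a request.

    The running sector lives inside the iterator and is advanced by the
    length of every segment that has been handed out.
    """
    sector = start_sector
    for length in segment_lengths:
        yield sector, length
        sector += length // SECTOR_SIZE


# A request starting at sector 8 with segments of 4096, 512 and 1024 bytes.
segments = list(walk_segments(8, [4096, 512, 1024]))
```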
    • libnvdimm, pmem: fix up max_hw_sectors · 43d3fa3a
      Dan Williams authored
      There is no hardware limit to enforce on the size of the i/o that can be passed
      to an nvdimm block device, so set it to UINT_MAX.
      Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • nd_btt: atomic sector updates · 5212e11f
      Vishal Verma authored
      BTT stands for Block Translation Table, and is a way to provide power
      fail sector atomicity semantics for block devices that have the ability
      to perform byte granularity IO. It relies on the capability of libnvdimm
      namespace devices to do byte aligned IO.
      
      The BTT works as a stacked block device, and reserves a chunk of space
      from the backing device for its accounting metadata. It is a bio-based
      driver because all IO is done synchronously, and there is no queuing or
      asynchronous completions at either the device or the driver level.
      
      The BTT uses 'lanes' to index into various 'on-disk' data structures,
      and lanes also act as a synchronization mechanism in case there are more
      CPUs than available lanes. We compared two lane lock strategies. In
      the first, we kept an atomic counter that tracked the last lane used,
      and 'our' lane was determined by atomically incrementing it. That
      way, for the nr_cpus > nr_lanes case, theoretically, no CPU would be
      blocked waiting for a lane. The other strategy was to hash the number
      of the cpu we're scheduled on to a lane number. Theoretically, this
      could block an IO that could've
      otherwise run using a different, free lane. But some fio workloads
      showed that the direct cpu -> lane hash performed faster than tracking
      'last lane' - my reasoning is the cache thrash caused by moving the
      atomic variable made that approach slower than simply waiting out the
      in-progress IO. This supports the conclusion that the driver can be a
      very simple bio-based one that does synchronous IOs instead of queuing.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      [jmoyer: fix nmi watchdog timeout in btt_map_init]
      [jmoyer: move btt initialization to module load path]
      [jmoyer: fix memory leak in the btt initialization path]
      [jmoyer: Don't overwrite corrupted arenas]
      Signed-off-by: Vishal Verma <vishal.l.verma@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
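The two lane selection strategies compared in this message can be sketched as follows. This is a simplified model, not the driver code: the modulo "hash" and the names `lane_by_cpu` and `LastLaneCounter` are assumptions made for illustration, and the counter only stands in for the shared atomic variable.

```python
import itertools


def lane_by_cpu(cpu: int, nr_lanes: int) -> int:
    """Direct cpu -> lane mapping (assumed here to be a simple modulo).

    Two CPUs that map to the same lane must wait for each other, even if
    another lane happens to be free.
    """
    return cpu % nr_lanes


class LastLaneCounter:
    """'Last lane used' strategy: every caller bumps one shared counter.

    In the kernel this would be a single atomic variable touched by every
    CPU, which is the source of the cache-line bouncing blamed for the
    slowdown.
    """

    def __init__(self, nr_lanes: int):
        self.nr_lanes = nr_lanes
        self._counter = itertools.count()

    def next_lane(self) -> int:
        return next(self._counter) % self.nr_lanes


# With 4 lanes, CPUs 1 and 5 share a lane under the direct mapping.
shared = lane_by_cpu(1, 4) == lane_by_cpu(5, 4)

# The counter hands out lanes round-robin regardless of the calling CPU.
counter = LastLaneCounter(4)
round_robin = [counter.next_lane() for _ in range(6)]
```

The direct mapping needs no shared state on the fast path, which matches the explanation given above for why it measured faster.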
  2. 25 Jun, 2015 4 commits
    • libnvdimm: infrastructure for btt devices · 8c2f7e86
      Dan Williams authored
      NVDIMM namespaces, in addition to accepting "struct bio" based requests,
      also have the capability to perform byte-aligned accesses.  By default
      only the bio/block interface is used.  However, if another driver can
      make effective use of the byte-aligned capability it can claim the
      namespace interface and use the byte-aligned ->rw_bytes() interface.
      
      The BTT driver is the first consumer of this mechanism to allow
      adding atomic sector update semantics to a pmem or blk namespace.  This
      patch is the sysfs infrastructure to allow configuring a BTT instance
      for a namespace.  Enabling that BTT and performing i/o is in a
      subsequent patch.
      
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm: pmem label sets and namespace instantiation. · bf9bccc1
      Dan Williams authored
      A complete label set is a PMEM-label per-dimm per-interleave-set where
      all the UUIDs match and the interleave set cookie matches the hosting
      interleave set.
      
      Present sysfs attributes for manipulation of a PMEM-namespace's
      'alt_name', 'uuid', and 'size' attributes.  A later patch will make
      these settings persistent by writing back the label.
      
      Note that PMEM allocations grow forwards from the start of an interleave
      set (lowest dimm-physical-address (DPA)).  BLK-namespaces that alias
      with a PMEM interleave set will grow allocations backward from the
      highest DPA.
      
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Neil Brown <neilb@suse.de>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
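The allocation directions noted above (PMEM forward from the lowest DPA, aliased BLK backward from the highest DPA) can be illustrated with a toy allocator. This is a sketch under simplifying assumptions: a single contiguous DPA range, no labels, and no freeing. `DpaRange` and its methods are hypothetical names.

```python
class DpaRange:
    """Toy model of one DPA range shared by PMEM and BLK allocations."""

    def __init__(self, base: int, size: int):
        self.pmem_end = base          # next free DPA for PMEM, grows upward
        self.blk_start = base + size  # lowest DPA held by BLK, grows downward

    def alloc_pmem(self, length: int):
        """Allocate from the low end; returns a half-open (start, end) range."""
        if self.pmem_end + length > self.blk_start:
            raise ValueError("PMEM allocation would collide with BLK space")
        start = self.pmem_end
        self.pmem_end += length
        return start, self.pmem_end

    def alloc_blk(self, length: int):
        """Allocate from the high end; returns a half-open (start, end) range."""
        if self.blk_start - length < self.pmem_end:
            raise ValueError("BLK allocation would collide with PMEM space")
        end = self.blk_start
        self.blk_start -= length
        return self.blk_start, end


space = DpaRange(base=0, size=100)
pmem_first = space.alloc_pmem(30)
blk_first = space.alloc_blk(20)
pmem_second = space.alloc_pmem(10)
blk_second = space.alloc_blk(10)
```

Growing from opposite ends lets either kind of allocation expand without moving the other until the two meet in the middle.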
    • libnvdimm, pmem: add libnvdimm support to the pmem driver · 9f53f9fa
      Dan Williams authored
      nd_pmem attaches to persistent memory regions and namespaces emitted by
      the libnvdimm subsystem, and, same as the original pmem driver, presents
      the system-physical-address range as a block device.
      
      The existing e820-type-12 to pmem setup is converted to an nvdimm_bus
      that emits an nd_namespace_io device.
      
      Note that the X in 'pmemX' is now derived from the parent region.  This
      provides some stability to the pmem device names from boot-to-boot.
      The minor numbers are also more predictable by passing 0 to
      alloc_disk().
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, pmem: move pmem to drivers/nvdimm/ · 18da2c9e
      Dan Williams authored
      Prepare the pmem driver to consume PMEM namespaces emitted by regions of
      an nvdimm_bus instance.  No functional change.
      Acked-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  3. 01 Apr, 2015 2 commits
    • drivers/block/pmem: Fix 32-bit build warning in pmem_alloc() · 4c1eaa23
      Ingo Molnar authored
      Fix:
      
        drivers/block/pmem.c: In function ‘pmem_alloc’:
        drivers/block/pmem.c:138:7: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘phys_addr_t’ [-Wformat=]
      
      By using the proper %pa format specifier we use for 'phys_addr_t' arguments.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-nvdimm@ml01.01.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • drivers/block/pmem: Add a driver for persistent memory · 9e853f23
      Ross Zwisler authored
      PMEM is a new driver that presents a reserved range of memory as
      a block device.  This is useful for developing with NV-DIMMs,
      and can be used with volatile memory as a development platform.
      
      This patch contains the initial driver from Ross Zwisler, with
      various changes: converted it to use a platform_device for
      discovery, fixed partition support and merged various patches
      from Boaz Harrosh.
      Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-nvdimm@ml01.01.org
      Link: http://lkml.kernel.org/r/1427872339-6688-3-git-send-email-hch@lst.de
      [ Minor cleanups. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>