  1. 08 Mar, 2018 1 commit
  2. 20 Jan, 2018 1 commit
    • libnvdimm, btt: fix uninitialized err_lock · d08cd5e0
      Committed by Jeff Moyer
      When a sector mode namespace is initially created, the arena's err_lock
      is not initialized.  If, on the other hand, the namespace already
      exists, the mutex is initialized.  To fix the issue, I moved the mutex
      initialization into arena_alloc, which is called by both
      discover_arenas and create_arenas.
      
      This was discovered on an older kernel where mutex_trylock checks the
      count to determine whether the lock is held.  Because the data structure
      is kzalloc-d, that count was 0 (held), and I/O to the device would hang
      forever waiting for the lock to be released (see btt_write_pg, for
      example).  Current kernels have a different mutex implementation that
      checks for a non-null owner, and so this doesn't show up as a problem.
      If that lock were ever contended, it might cause issues, but you'd have
      to be really unlucky, I think.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
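The hang described in this commit can be reproduced in miniature. The following is a minimal, single-threaded userspace sketch, not kernel code: `toy_mutex` is a hypothetical stand-in that models only the count-based scheme the message mentions (assuming 1 means unlocked and 0 means held), and every `toy_*` name is invented for illustration. It shows why a zero-filled lock looks held, and how initializing it in the shared allocator covers both the create and discover paths.

```c
#include <stdlib.h>

/* Toy count-based lock (assumption): 1 = unlocked, 0 = held.
 * Not thread-safe; it only illustrates the state encoding. */
struct toy_mutex {
    int count;
};

static void toy_mutex_init(struct toy_mutex *m)
{
    m->count = 1;
}

/* Returns 1 if the lock was taken, 0 if it already looked held. */
static int toy_mutex_trylock(struct toy_mutex *m)
{
    if (m->count == 1) {
        m->count = 0;
        return 1;
    }
    return 0;
}

static void toy_mutex_unlock(struct toy_mutex *m)
{
    m->count = 1;
}

/* Hypothetical, heavily reduced stand-in for an arena. */
struct toy_arena {
    struct toy_mutex err_lock;
};

/* Shared allocator: the lock is initialized here, so every caller
 * gets a usable lock regardless of which path created the arena. */
static struct toy_arena *toy_arena_alloc(void)
{
    struct toy_arena *a = calloc(1, sizeof(*a)); /* zero-filled, like kzalloc */

    if (!a)
        return NULL;
    toy_mutex_init(&a->err_lock);
    return a;
}

/* Both paths funnel through the shared allocator. */
static struct toy_arena *toy_create_arena(void)
{
    return toy_arena_alloc();
}

static struct toy_arena *toy_discover_arena(void)
{
    return toy_arena_alloc();
}
```

With this model, a zero-filled `struct toy_arena` fails `toy_mutex_trylock()` on the first attempt, while an arena obtained from either `toy_create_arena()` or `toy_discover_arena()` succeeds.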
  3. 22 Dec, 2017 1 commit
  4. 16 Nov, 2017 1 commit
  5. 10 Sep, 2017 1 commit
  6. 08 Sep, 2017 1 commit
  7. 07 Sep, 2017 1 commit
  8. 01 Sep, 2017 5 commits
    • libnvdimm, btt: rework error clearing · d9b83c75
      Committed by Vishal Verma
      Clearing errors or badblocks during a BTT write requires sending an ACPI
      DSM, which means potentially sleeping. Since a BTT IO happens in atomic
      context (preemption disabled, spinlocks may be held), we cannot perform
      error clearing in the course of an IO. Because of this, error clearing
      for BTT IOs has hitherto been disabled.
      
      In this patch we move error clearing out of the atomic section, and thus
      re-enable error clearing with BTTs. When we are about to add a block to
      the free list, we check if it was previously marked as an error, and if
      it was, we add it to the freelist, but also set a flag that says error
      clearing will be required. We then drop the lane (ending the atomic
      context), and send a zero buffer so that the error can be cleared. The
      error flag in the free list is protected by the nd 'lane', and is set
      only by a thread while it holds that lane. When the error is cleared,
      the flag is cleared, but while holding a mutex for that freelist index.
      
      When writing, we check for two things -
      1/ If the freelist mutex is held or if the error flag is set. If so,
      this is an error block that is being (or about to be) cleared.
      2/ If the block is a known badblock based on nsio->bb
      
      The second check is required because the BTT map error flag for a map
      entry only gets set when an error LBA is read. If we write to a new
      location that may not have the map error flag set, but still might be in
      the region's badblock list, we can trigger an EIO on the write, which is
      undesirable and completely avoidable.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
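The two write-time checks in this commit can be sketched as a single decision function. This is a minimal userspace illustration under stated assumptions, not the driver's code: the `*_sketch` types, `block_is_bad()`, and `write_must_clear_first()` are hypothetical names, the badblock list is modelled as a plain array, and the state of the freelist mutex is passed in as a boolean.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical freelist entry: the free block a write is about to use. */
struct free_entry_sketch {
    uint32_t block;
    bool has_err;   /* error flag; per the commit, set only under the lane */
};

/* Hypothetical badblock list, modelled as an unsorted array. */
struct badblock_list_sketch {
    const uint32_t *blocks;
    size_t count;
};

static bool block_is_bad(const struct badblock_list_sketch *bb, uint32_t block)
{
    for (size_t i = 0; i < bb->count; i++)
        if (bb->blocks[i] == block)
            return true;
    return false;
}

/* Returns true when the write must not proceed yet because the free
 * block is (or is about to be) undergoing error clearing. */
static bool write_must_clear_first(struct free_entry_sketch *ent,
                                   const struct badblock_list_sketch *bb,
                                   bool err_lock_held)
{
    /* Check 2: a known badblock is treated as an error block even if
     * no read ever set the error flag for it. */
    if (block_is_bad(bb, ent->block))
        ent->has_err = true;

    /* Check 1: the mutex is held or the error flag is set. */
    return err_lock_held || ent->has_err;
}
```

When the function returns true, the caller in this sketch would drop the lane, clear the error from process context, and retry the write.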
    • libnvdimm, btt: cache sector_size in arena_info · 75892004
      Committed by Vishal Verma
      In preparation for the error clearing rework, add sector_size in the
      arena_info struct.
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, btt: ensure that flags were also unchanged during a map_read · 1398199d
      Committed by Vishal Verma
      In btt_map_read, we read the map twice to make sure that the map entry
      didn't change after we added it to the read tracking table. In
      anticipation of expanding the use of the error bit, also make sure that
      the error and zero flags are constant across the two map reads.
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, btt: refactor map entry operations with macros · 0595d539
      Committed by Vishal Verma
      Add helpers for converting a raw map entry to just the block number, or
      either of the 'e' or 'z' flags in preparation for actually using the
      error flag to mark blocks with media errors.
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
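As an illustration of the kind of helper this commit describes, here is a minimal sketch. The bit layout is an assumption made for the example (a 32-bit entry with the 'z' flag in bit 31, the 'e' flag in bit 30, and the block number in the low 30 bits), and the `MAP_*_SKETCH` and `map_ent_*` names are hypothetical. The commit uses macros; inline functions are used here only for type checking.

```c
#include <stdint.h>

/* Assumed layout of a raw 32-bit map entry (illustration only):
 *   bit 31     'z' (zero) flag
 *   bit 30     'e' (error) flag
 *   bits 29..0 block number
 */
#define MAP_Z_SHIFT_SKETCH  31
#define MAP_E_SHIFT_SKETCH  30
#define MAP_Z_MASK_SKETCH   (UINT32_C(1) << MAP_Z_SHIFT_SKETCH)
#define MAP_E_MASK_SKETCH   (UINT32_C(1) << MAP_E_SHIFT_SKETCH)
#define MAP_LBA_MASK_SKETCH (~(MAP_Z_MASK_SKETCH | MAP_E_MASK_SKETCH))

/* Block number with both flag bits stripped. */
static inline uint32_t map_ent_lba(uint32_t ent)
{
    return ent & MAP_LBA_MASK_SKETCH;
}

/* 1 if the error flag is set, else 0. */
static inline int map_ent_e_flag(uint32_t ent)
{
    return !!(ent & MAP_E_MASK_SKETCH);
}

/* 1 if the zero flag is set, else 0. */
static inline int map_ent_z_flag(uint32_t ent)
{
    return !!(ent & MAP_Z_MASK_SKETCH);
}
```

Keeping the masks in one place means callers never open-code shifts, which matters once the error flag starts carrying meaning for media errors.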
    • libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path · 1db1f3ce
      Committed by Vishal Verma
      The IO context conversion for rw_bytes missed a case in the BTT write
      path (btt_map_write) which should've been marked as atomic.
      
      In reality this should not cause a problem, because map writes are too
      small for nsio_rw_bytes to attempt error clearing, but it should be
      fixed for posterity.
      
      Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
      things like the nfit unit tests, which don't actually sleep, can catch
      bugs like this.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  9. 30 Aug, 2017 1 commit
  10. 04 Jul, 2017 2 commits
  11. 01 Jul, 2017 1 commit
  12. 30 Jun, 2017 2 commits
  13. 28 Jun, 2017 1 commit
  14. 09 Jun, 2017 1 commit
  15. 11 May, 2017 2 commits
    • libnvdimm, btt: ensure that initializing metadata clears poison · b177fe85
      Committed by Vishal Verma
      If we had badblocks/poison in the metadata area of a BTT, recreating the
      BTT would not clear the poison in all cases, notably the flog area. This
      is because rw_bytes will only clear errors if the request being sent
      down is 512B aligned and sized.
      
      Make sure that when writing the map and info blocks, the rw_bytes being
      sent are of the correct size/alignment. For the flog, instead of doing
      the smaller log_entry writes only, first do a 'wipe' of the entire area
      by writing zeroes in large enough chunks so that errors get cleared.
      
      Cc: Andy Rudoff <andy.rudoff@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
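The alignment rule and the 'wipe' approach from this commit can be sketched in userspace. This is a simplified model, assuming a byte array stands in for the media; `request_can_clear_errors()` and `wipe_region()` are hypothetical names, and the sketch only counts which writes would qualify for error clearing rather than clearing anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_BYTES_SKETCH 512u

/* Per the commit message, errors are only cleared when the request is
 * 512B aligned and sized. */
static bool request_can_clear_errors(size_t offset, size_t len)
{
    return len != 0 &&
           offset % SECTOR_BYTES_SKETCH == 0 &&
           len % SECTOR_BYTES_SKETCH == 0;
}

/* Zero [offset, offset + len) in writes of at most `chunk` bytes and
 * return how many of those writes were eligible for error clearing. */
static size_t wipe_region(unsigned char *media, size_t offset,
                          size_t len, size_t chunk)
{
    size_t eligible = 0;

    assert(chunk > 0);
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;

        memset(media + offset, 0, n);
        if (request_can_clear_errors(offset, n))
            eligible++;
        offset += n;
        len -= n;
    }
    return eligible;
}
```

Wiping a region in small, log-entry-sized pieces yields no eligible writes in this model, whereas wiping the same region in sector-multiple chunks makes every write eligible, which is the behavior the commit relies on for the flog area.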
    • libnvdimm: add an atomic vs process context flag to rw_bytes · 3ae3d67b
      Committed by Vishal Verma
      nsio_rw_bytes can clear media errors, but this cannot be done while we
      are in an atomic context due to locking within ACPI. From the BTT,
      ->rw_bytes may be called either from atomic or process context depending
      on whether the calls happen during initialization or during IO.
      
      During init, we want to ensure error clearing happens, and the flag
      marking process context allows nsio_rw_bytes to do that. When called
      during IO, we're in atomic context, and error clearing can be skipped.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
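The effect of the context flag can be reduced to a one-line decision. The sketch below is illustrative only: `IO_FLAG_ATOMIC_SKETCH` and `should_clear_errors()` are hypothetical names and the flag value is arbitrary; only the idea of passing the caller's context down as a flag comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical flag bit marking a call made from atomic context. */
#define IO_FLAG_ATOMIC_SKETCH 0x1ul

/* Error clearing may sleep, so it is attempted only when the range has
 * an error and the caller did not mark the call as atomic. */
static bool should_clear_errors(unsigned long flags, bool range_has_error)
{
    if (!range_has_error)
        return false;
    return !(flags & IO_FLAG_ATOMIC_SKETCH);
}
```

In this model, initialization code would pass `0` so that clearing happens, and the IO path would pass `IO_FLAG_ATOMIC_SKETCH` so that it is skipped.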
  16. 09 Aug, 2016 1 commit
  17. 08 Aug, 2016 1 commit
  18. 05 Aug, 2016 1 commit
  19. 28 Jun, 2016 1 commit
    • block: convert to device_add_disk() · 0d52c756
      Committed by Dan Williams
      For block drivers that specify a parent device, convert them to use
      device_add_disk().
      
      This conversion was done with the following semantic patch:
      
          @@
          struct gendisk *disk;
          expression E;
          @@
      
          - disk->driverfs_dev = E;
          ...
          - add_disk(disk);
          + device_add_disk(E, disk);
      
          @@
          struct gendisk *disk;
          expression E1, E2;
          @@
      
          - disk->driverfs_dev = E1;
          ...
          E2 = disk;
          ...
          - add_disk(E2);
          + device_add_disk(E1, E2);
      
      ...plus some manual fixups for a few missed conversions.
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  20. 23 Apr, 2016 3 commits
  21. 05 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
      time ago with the promise that one day it would be possible to implement
      the page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely that it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it is a constant source of confusion whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Switching globally to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in the page cache are special.  They
      are not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files, so I ran spatch on them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
          virtual patch
      
          @@
          expression E;
          @@
          - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
          + E
      
          @@
          expression E;
          @@
          - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
          + E
      
          @@
          @@
          - PAGE_CACHE_SHIFT
          + PAGE_SHIFT
      
          @@
          @@
          - PAGE_CACHE_SIZE
          + PAGE_SIZE
      
          @@
          @@
          - PAGE_CACHE_MASK
          + PAGE_MASK
      
          @@
          expression E;
          @@
          - PAGE_CACHE_ALIGN(E)
          + PAGE_ALIGN(E)
      
          @@
          expression E;
          @@
          - page_cache_get(E)
          + get_page(E)
      
          @@
          expression E;
          @@
          - page_cache_release(E)
          + put_page(E)
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 10 Mar, 2016 1 commit
  23. 08 Nov, 2015 1 commit
  24. 22 Oct, 2015 1 commit
  25. 29 Aug, 2015 1 commit
    • libnvdimm, pfn: 'struct page' provider infrastructure · e1455744
      Committed by Dan Williams
      Implement the base infrastructure for libnvdimm PFN devices. Similar to
      BTT devices, they take a namespace as a backing device and layer
      functionality on top. In this case the functionality is reserving space
      for an array of 'struct page' entries to be handed out through
      pfn_to_page(). For now this is just the basic libnvdimm-device-model for
      configuring the base PFN device.
      
      As the namespace claiming mechanism for PFN devices is mostly identical
      to BTT devices, drivers/nvdimm/claim.c is created to house the common
      bits.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  26. 15 Aug, 2015 3 commits
  27. 29 Jul, 2015 1 commit
    • block: add a bi_error field to struct bio · 4246a0b6
      Committed by Christoph Hellwig
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up, and of not being passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
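The benefit of storing the error in the bio itself can be shown with a small model of chaining. This is a sketch under stated assumptions rather than the block layer's implementation: `struct bio_sketch` and `bio_sketch_complete()` are hypothetical, and the policy of keeping the first error seen by the parent is an assumption of the example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, reduced model of a bio that carries its own status. */
struct bio_sketch {
    int error;                  /* 0 on success, negative errno on failure */
    struct bio_sketch *parent;  /* NULL when the bio is not chained */
};

/* Record the completion status in the bio and pass it up the chain.
 * The parent keeps the first error it sees (assumed policy). */
static void bio_sketch_complete(struct bio_sketch *bio, int error)
{
    if (error)
        bio->error = error;

    if (bio->parent && bio->error && !bio->parent->error)
        bio->parent->error = bio->error;
}
```

Because the status is a full errno value stored in the structure, a specific error such as `-ENOSPC` survives queuing and reaches the parent, which neither of the two older mechanisms guaranteed on its own.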
  28. 28 Jul, 2015 1 commit
  29. 26 Jun, 2015 1 commit
    • libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only · 58138820
      Committed by Dan Williams
      Upon detection of an unarmed dimm in a region, arrange for descendant
      BTT, PMEM, or BLK instances to be read-only.  A dimm is primarily marked
      "unarmed" via flags passed by platform firmware (NFIT).
      
      The flags in the NFIT memory device sub-structure indicate the state of
      the data on the nvdimm relative to its energy source or last "flush to
      persistence".  For the most part there is nothing the driver can do but
      advertise the state of these flags in sysfs and emit a message if
      firmware indicates that the contents of the device may be corrupted.
      However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
      the block devices incorporating that nvdimm to be marked read-only.
      This is a safe default as the data is still available and new writes are
      held off until the administrator either forces read-write mode, or the
      energy source becomes armed.
      
      A 'read_only' attribute is added to REGION devices to allow for
      overriding the default read-only policy of all descendant block devices.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>