1. 10 September 2016, 1 commit
  2. 09 August 2016, 1 commit
  3. 08 August 2016, 2 commits
    • block: rename bio bi_rw to bi_opf · 1eff9d32
      Authored by Jens Axboe
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portion. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokenness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: Jens Axboe <axboe@fb.com>
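
      The layout described above (request flags in the low bits, the op code in
      the high bits) can be modeled with a small user-space sketch; the shift
      width, op code, and flag value below are illustrative stand-ins, not the
      kernel's actual definitions:

          #include <stdio.h>

          /* Illustrative only: one 32-bit word carrying an op code in the top
           * bits and request flags in the low bits, mirroring the bi_opf split. */
          #define OP_BITS    3
          #define OP_SHIFT   (32 - OP_BITS)
          #define OP_WRITE   1u                /* hypothetical op code  */
          #define FLAG_SYNC  (1u << 0)         /* hypothetical flag bit */

          static unsigned int pack_opf(unsigned int op, unsigned int flags)
          {
              return (op << OP_SHIFT) | flags;
          }

          int main(void)
          {
              unsigned int opf = pack_opf(OP_WRITE, FLAG_SYNC);

              printf("op=%u flags=0x%x\n",
                     opf >> OP_SHIFT, opf & ((1u << OP_SHIFT) - 1));
              return 0;
          }

      Code that kept assigning raw flag words to the whole field would silently
      clobber the op bits, which is why the rename forces such code to fail at
      compile time rather than misbehave at runtime.
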
    • block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b
      Authored by Jens Axboe
      Commit abf54548 changed it from an 'rw' flags type to the
      newer ops based interface, but now we're effectively leaking
      some bdev internals to the rest of the kernel. Since we only
      care about whether it's a read or a write at that level, just
      pass in a bool 'is_write' parameter instead.
      
      Then we can also move op_is_write() and friends back under
      CONFIG_BLOCK protection.
      Reviewed-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
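
      A compilable sketch of the shape of the resulting operations table; the
      struct name and the sector_t stand-in are assumptions for illustration,
      not the real kernel declarations:

          #include <stdbool.h>
          #include <stdint.h>

          typedef uint64_t sector_t;      /* stand-in for the kernel's sector_t */
          struct block_device;
          struct page;

          /* After this change the read/write intent is a plain bool, so callers
           * outside the block layer no longer need the op/flag helpers. */
          struct bdev_ops_sketch {
              int (*rw_page)(struct block_device *bdev, sector_t sector,
                             struct page *page, bool is_write);
          };
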
  4. 05 August 2016, 1 commit
  5. 24 July 2016, 4 commits
  6. 22 July 2016, 1 commit
  7. 21 July 2016, 1 commit
  8. 13 July 2016, 3 commits
  9. 12 July 2016, 6 commits
    • libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() · 7e267a8c
      Authored by Dan Williams
      Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer
      chasing through nd_region), and that we otherwise assume a platform has
      ADR capability when flush hints are not present, move nvdimm_flush() to
      REQ_FLUSH context.
      
      Note that we still arrange for nvdimm_flush() to be called even in the
      ADR case. We need at least one wmb() fence to push buffered writes in
      the cpu out to the ADR protected domain.
      
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
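
      A minimal user-space model of the resulting control flow for flush and
      FUA requests; the flag values, pmem_do_flush() and do_data_transfer()
      are hypothetical stand-ins for nvdimm_flush() and the pmem I/O path:

          #include <stdio.h>

          #define REQ_FLUSH (1u << 0)   /* illustrative bit values, not the kernel's */
          #define REQ_FUA   (1u << 1)

          struct bio_model { unsigned int opf; };

          static void pmem_do_flush(void)
          {
              puts("flush: push posted writes out to the ADR protected domain");
          }

          static void do_data_transfer(struct bio_model *bio)
          {
              (void)bio;                /* normal read/write handling elided */
          }

          static void pmem_request_sketch(struct bio_model *bio)
          {
              if (bio->opf & REQ_FLUSH)
                  pmem_do_flush();      /* honor the preflush before the data */

              do_data_transfer(bio);

              if (bio->opf & REQ_FUA)
                  pmem_do_flush();      /* make this write itself durable */
          }

          int main(void)
          {
              struct bio_model bio = { REQ_FLUSH | REQ_FUA };

              pmem_request_sketch(&bio);
              return 0;
          }
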
    • libnvdimm: cycle flush hints · 0c27af60
      Authored by Dan Williams
      When the NFIT provides multiple flush hint addresses per-dimm it is
      expressing that the platform is capable of processing multiple flush
      requests in parallel.  There is some fixed cost per flush request, so
      let that cost be shared in parallel across multiple cpus.
      
      Since there may not be enough flush hint addresses for each cpu to have
      one, keep a per-cpu index of the last used hint, hash it with current
      pid, and assume that access pattern and scheduler randomness will keep
      the flush-hint usage somewhat staggered across cpus.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
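
      The hint-selection idea above (a per-cpu cursor perturbed by the current
      pid so concurrent flushers fan out across the available addresses) can be
      modeled in a few lines; the mixing function and NUM_HINTS value are
      illustrative, not the kernel's helpers:

          #include <stdio.h>

          #define NUM_HINTS 4u            /* hypothetical flush hint addresses per dimm */

          static unsigned int last_idx;   /* models the per-cpu "last used hint" index */

          /* Illustrative integer mixer; the kernel would use its own hash helper. */
          static unsigned int mix(unsigned int v)
          {
              return v * 2654435761u;
          }

          static unsigned int pick_flush_hint(unsigned int pid)
          {
              last_idx = mix(last_idx + pid);
              return last_idx % NUM_HINTS;    /* which hint address to write */
          }

          int main(void)
          {
              for (unsigned int pid = 100; pid < 104; pid++)
                  printf("pid %u -> hint %u\n", pid, pick_flush_hint(pid));
              return 0;
          }
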
    • libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() · f284a4f2
      Authored by Dan Williams
      nvdimm_flush() is a replacement for the x86 'pcommit' instruction.  It is
      an optional write flushing mechanism that an nvdimm bus can provide for
      the pmem driver to consume.  In the case of the NFIT nvdimm-bus-provider
      nvdimm_flush() is implemented as a series of flush-hint-address [1]
      writes to each dimm in the interleave set (region) that backs the
      namespace.
      
      The nvdimm_has_flush() routine relies on platform firmware to describe
      the flushing capabilities of a platform.  It uses the heuristic of
      whether an nvdimm bus provider provides flush address data to return a
      ternary result:
      
            1: flush addresses defined
            0: dimm topology described without flush addresses (assume ADR)
       -errno: no topology information, unable to determine flush mechanism
      
      The pmem driver is expected to take the following actions on this ternary
      result:
      
            1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown
            0: do not set WC or FUA on the queue, take no further action
       -errno: warn and then operate as if nvdimm_has_flush() returned '0'
      
      The caveat of this heuristic is that it cannot distinguish the "dimm
      does not have flush address" case from the "platform firmware is broken
      and failed to describe a flush address".  Given we are already
      explicitly trusting the NFIT there's not much more we can do beyond
      blacklisting broken firmwares if they are ever encountered.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
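
      A sketch of how a driver might act on that ternary result when setting up
      its request queue; set_queue_write_cache() is a hypothetical stand-in for
      the real queue-flag configuration:

          #include <stdbool.h>
          #include <stdio.h>

          /* Stand-in for enabling REQ_FLUSH/REQ_FUA support on a request queue. */
          static void set_queue_write_cache(bool wc, bool fua)
          {
              printf("queue write cache: %d, FUA: %d\n", wc, fua);
          }

          static void pmem_setup_flush_sketch(int has_flush)
          {
              if (has_flush < 0) {
                  /* no topology information: warn, then behave as the '0' case */
                  fprintf(stderr, "unable to guarantee persistence of writes\n");
                  return;
              }
              if (has_flush == 0)
                  return;               /* ADR assumed: leave WC/FUA off the queue */

              set_queue_write_cache(true, true);    /* flush addresses defined */
          }

          int main(void)
          {
              pmem_setup_flush_sketch(1);
              pmem_setup_flush_sketch(0);
              pmem_setup_flush_sketch(-1);
              return 0;
          }
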
    • libnvdimm: keep region data alive over namespace removal · a8f72022
      Authored by Dan Williams
      nd_region device driver data will be used in the namespace i/o path.
      Re-order nd_region_remove() to ensure this data stays live across
      namespace device removal.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, nfit: move flush hint mapping to region-device driver-data · e5ae3b25
      Authored by Dan Williams
      In preparation for triggering flushes of a DIMM's writes-posted-queue
      (WPQ) via the pmem driver move mapping of flush hint addresses to the
      region driver.  Since this uses devm_nvdimm_memremap() the flush
      addresses will remain mapped while any region to which the dimm belongs
      is active.
      
      We need to communicate more information to the nvdimm core to facilitate
      this mapping, namely each dimm object now carries an array of flush hint
      address resources.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, nfit: remove nfit_spa_map() infrastructure · a8a6d2e0
      Authored by Dan Williams
      Now that all shared mappings are handled by devm_nvdimm_memremap() we no
      longer need nfit_spa_map() nor do we need to trigger a callback to the
      bus provider at region disable time.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  10. 08 July 2016, 1 commit
    • libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users · 29b9aa0a
      Authored by Dan Williams
      In preparation for generically mapping flush hint addresses for both the
      BLK and PMEM use case, provide a generic / reference counted mapping
      api.  Given the fact that a dimm may belong to multiple regions (PMEM
      and BLK), the flush hint addresses need to be held valid as long as any
      region associated with the dimm is active.  This is similar to the
      existing BLK-region case where multiple BLK-regions may share an
      aperture mapping.  Up-level this shared / reference-counted mapping
      capability from the nfit driver to a core nvdimm capability.
      
      This eliminates the need for the nd_blk_region.disable() callback.  Note
      that the removal of nfit_spa_map() and related infrastructure is
      deferred to a later patch.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
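
      The reference-counted sharing described above reduces to a small pattern:
      the first user of a range creates the mapping, later users bump a count,
      and the mapping is released only when the last user goes away. The
      structure and names below are a toy model, not the libnvdimm API:

          #include <stdlib.h>

          /* Toy model of one shared mapping guarded by a reference count. */
          struct shared_map {
              void *vaddr;
              unsigned int refcount;
          };

          static void *map_get(struct shared_map *m, size_t size)
          {
              if (m->refcount++ == 0)
                  m->vaddr = malloc(size);    /* stands in for memremap() */
              return m->vaddr;
          }

          static void map_put(struct shared_map *m)
          {
              if (--m->refcount == 0) {
                  free(m->vaddr);             /* stands in for memunmap() */
                  m->vaddr = NULL;
              }
          }

          int main(void)
          {
              struct shared_map m = { 0 };

              map_get(&m, 4096);      /* first region maps the flush addresses */
              map_get(&m, 4096);      /* a second region shares the mapping    */
              map_put(&m);            /* still mapped: one user remains        */
              map_put(&m);            /* last user gone: mapping released      */
              return 0;
          }
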
  11. 07 July 2016, 1 commit
  12. 28 June 2016, 2 commits
    • block: remove ->driverfs_dev · 52c44d93
      Authored by Dan Williams
      Now that all drivers that specify a ->driverfs_dev have been converted
      to device_add_disk(), the pointer can be removed from struct gendisk.
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • block: convert to device_add_disk() · 0d52c756
      Authored by Dan Williams
      For block drivers that specify a parent device, convert them to use
      device_add_disk().
      
      This conversion was done with the following semantic patch:
      
          @@
          struct gendisk *disk;
          expression E;
          @@
      
          - disk->driverfs_dev = E;
          ...
          - add_disk(disk);
          + device_add_disk(E, disk);
      
          @@
          struct gendisk *disk;
          expression E1, E2;
          @@
      
          - disk->driverfs_dev = E1;
          ...
          E2 = disk;
          ...
          - add_disk(E2);
          + device_add_disk(E1, E2);
      
      ...plus some manual fixups for a few missed conversions.
      
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  13. 25 June 2016, 1 commit
    • libnvdimm, pmem: allow nfit_test to override pmem_direct_access() · f295e53b
      Authored by Dan Williams
      Currently phys_to_pfn_t() is an exported symbol to allow nfit_test to
      override it and indicate that nfit_test-pmem is not device-mapped.  Now,
      we want to enable nfit_test to operate without DMA_CMA and the pmem it
      provides will no longer be physically contiguous, i.e. won't be capable
      of supporting direct_access requests larger than a page.  Make
      pmem_direct_access() a weak symbol so that it can be replaced by the
      tools/testing/nvdimm/ version, and move phys_to_pfn_t() to a static
      inline now that it no longer needs to be overridden.
      Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
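
      The weak-symbol mechanism itself can be demonstrated standalone: a
      definition marked __attribute__((weak)) is used unless some other object
      file linked into the final image provides a strong definition of the same
      symbol, which is how the tools/testing/nvdimm/ build can substitute its
      own version. The function below is a stand-in, not the pmem code:

          #include <stdio.h>

          /* Weak definition: any strong definition of the same symbol in another
           * object file takes precedence at link time. */
          __attribute__((weak)) long direct_access_sketch(void)
          {
              puts("default (weak) implementation");
              return 0;
          }

          int main(void)
          {
              /* With no overriding object linked in, the weak version runs. */
              return (int)direct_access_sketch();
          }
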
  14. 24 June 2016, 1 commit
    • libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment · 1ee6667c
      Authored by Dan Williams
      The updated ndctl unit tests discovered that if a pfn configuration with
      a 4K alignment is read from the namespace, that alignment will be
      ignored in favor of the default 2M alignment.  The result is that the
      configuration will fail initialization with a message like:
      
          dax6.1: bad offset: 0x22000 dax disabled align: 0x200000
      
      Fix this by allowing the alignment read from the info block to override
      the default, which is 2M not 0, in the autodetect path.  This also fixes a
      similar problem with the mode and alignment settings silently being
      overwritten by the kernel when userspace has changed it.  We now will
      either overwrite the info block if userspace changes the uuid or fail
      and warn if a live setting disagrees with the info block.
      
      Cc: <stable@vger.kernel.org>
      Cc: Micah Parrish <micah.parrish@hpe.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  15. 18 June 2016, 1 commit
  16. 16 June 2016, 1 commit
  17. 22 May 2016, 2 commits
    • libnvdimm, dax: fix deletion · 03dca343
      Authored by Dan Williams
      The ndctl unit tests discovered that the dax enabling omitted updates to
      nd_detach_and_reset().  This routine clears the device configuration
      when the namespace is detached.  Without this clearing userspace may
      assume that the device is in the process of being configured by another
      agent in the system.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, dax: fix alignment validation · 5e24c9fd
      Authored by Dan Williams
      Testing the dax-device autodetect support revealed a probe failure with
      the following result:
      
          dax0.1: bad offset: 0x8200000 dax disabled
      
      The original pfn-device implementation inferred the alignment from
      ilog2(offset); now that the alignment is explicit, the is_power_of_2() check
      needs replacing with a real sanity check against the recorded alignment.
      Otherwise the alignment check is useless in the implicit case and only
      the minimum size of the offset matters.
      
      This self-consistency check is further validated by the probe path that
      will re-check that the offset is large enough to contain all the
      metadata required to enable the device.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
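
      The difference between the two checks can be seen in a few lines: the old
      heuristic only asked whether the offset happened to be a power of two,
      while the fix validates the offset against the alignment actually recorded
      in the info block (assumed to be 2M here, matching the default above):

          #include <stdbool.h>
          #include <stdio.h>

          static bool is_power_of_2(unsigned long v)
          {
              return v && !(v & (v - 1));
          }

          int main(void)
          {
              unsigned long offset = 0x8200000;   /* offset from the probe failure above */
              unsigned long align  = 0x200000;    /* alignment recorded in the info block */

              printf("old check, power of two:        %d\n", is_power_of_2(offset));
              printf("new check, offset %% align == 0: %d\n", offset % align == 0);
              return 0;
          }
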
  18. 21 May 2016, 2 commits
  19. 19 May 2016, 2 commits
  20. 10 May 2016, 3 commits
  21. 07 May 2016, 1 commit
  22. 01 May 2016, 1 commit
    • libnvdimm, pfn: fix memmap reservation sizing · 658922e5
      Authored by Dan Williams
      When configuring a pfn-device instance to allocate the memmap array it
      needs to account for the fact that vmemmap_populate_hugepages()
      allocates struct page blocks in HPAGE_SIZE chunks.  We need to align the
      reserved area size to 2MB, otherwise arch_add_memory() runs out of memory
      while establishing the memmap:
      
       WARNING: CPU: 0 PID: 496 at arch/x86/mm/init_64.c:704 arch_add_memory+0xe7/0xf0
       [..]
       Call Trace:
        [<ffffffff8148bdb3>] dump_stack+0x85/0xc2
        [<ffffffff810a749b>] __warn+0xcb/0xf0
        [<ffffffff810a75cd>] warn_slowpath_null+0x1d/0x20
        [<ffffffff8106a497>] arch_add_memory+0xe7/0xf0
        [<ffffffff811d2097>] devm_memremap_pages+0x287/0x450
        [<ffffffff811d1ffa>] ? devm_memremap_pages+0x1ea/0x450
        [<ffffffffa0000298>] __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
        [<ffffffffa0047a58>] pmem_attach_disk+0x318/0x420 [nd_pmem]
        [<ffffffffa0047bcf>] nd_pmem_probe+0x6f/0x90 [nd_pmem]
        [<ffffffffa0009469>] nvdimm_bus_probe+0x69/0x110 [libnvdimm]
       [..]
        ndbus0: nd_pmem.probe(pfn3.0) = -12
       nd_pmem: probe of pfn3.0 failed with error -12
      libndctl: ndctl_pfn_enable: pfn3.0: failed to enable
      Reported-by: Namratha Kothapalli <namratha.n.kothapalli@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
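
      The sizing fix amounts to rounding the struct page reservation up to the
      huge-page granularity that vmemmap_populate_hugepages() allocates in. A
      small model of that arithmetic follows; the 64-byte struct page size and
      the namespace size are assumptions for illustration:

          #include <stdio.h>

          #define SZ_2M            (2UL << 20)
          #define PAGE_SIZE_MODEL  4096UL
          #define STRUCT_PAGE_SIZE 64UL      /* assumed sizeof(struct page) */

          /* Round v up to the next multiple of 'to' (a power of two). */
          static unsigned long align_up(unsigned long v, unsigned long to)
          {
              return (v + to - 1) & ~(to - 1);
          }

          int main(void)
          {
              unsigned long nr_pages = (100UL << 20) / PAGE_SIZE_MODEL;  /* 100M namespace */
              unsigned long memmap   = nr_pages * STRUCT_PAGE_SIZE;

              printf("raw memmap size:    %lu bytes\n", memmap);
              printf("2M-aligned reserve: %lu bytes\n", align_up(memmap, SZ_2M));
              return 0;
          }
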
  23. 29 April 2016, 1 commit
    • nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism · 31eca76b
      Authored by Dan Williams
      There are currently 4 known similar but incompatible definitions of the
      command sets that can be sent to an NVDIMM through ACPI.  It is also
      clear that future platform generations (ACPI or not) will continue to
      revise and extend the DIMM command set as new devices and use cases
      arrive.
      
      It is obviously untenable to continue to proliferate divergence
      of these command definitions, and to that end a standardization process
      has begun to provide for a unified specification.  However, that leaves a
      problem of what to do with this first generation, where vendors are
      already shipping divergent implementations.
      
      The Linux kernel can support these initial diverged platforms without
      giving platform firmware free rein to continue to diverge and compound
      kernel maintenance overhead.  The kernel implementation can encourage
      standardization in two ways:
      
      1/ Require that any function code that userspace wants to send be
         explicitly white-listed in the implementation.  For ACPI this means
         function codes marked as supported by acpi_check_dsm() may
         only be invoked if they appear in the white-list.  A function must be
         publicly documented before it is added to the white-list.
      
      2/ The above restrictions can be trivially bypassed by using the
         "vendor-specific" payload command.  However, since vendor-specific
         commands are by definition not publicly documented and have the
         potential to corrupt the kernel's view of the dimm state, we provide a
         toggle to disable vendor-specific operations.  Enabling undefined
         behavior is a policy decision that can be made by the platform owner
         and encourages firmware implementations to choose public over
         private command implementations.
      
      Based on an initial patch from Jerry Hoemann
      Cc: Jerry Hoemann <jerry.hoemann@hpe.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
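
      The white-listing rule in point 1/ above reduces to a simple membership
      test before a function code requested by userspace is passed through to
      firmware; the mask value and names here are illustrative, not the nfit
      driver's tables:

          #include <stdbool.h>
          #include <stdio.h>

          /* Hypothetical set of publicly documented function codes (bit N = code N). */
          #define ALLOWED_FUNCTIONS 0x3eULL      /* codes 1..5 allowed, for illustration */

          static bool function_allowed(unsigned int func)
          {
              if (func >= 64)
                  return false;
              return (ALLOWED_FUNCTIONS >> func) & 1;
          }

          int main(void)
          {
              printf("func 3: %s\n", function_allowed(3) ? "allowed" : "rejected");
              printf("func 9: %s\n", function_allowed(9) ? "allowed" : "rejected");
              return 0;
          }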