1. 26 July 2017 (1 commit)
    • libnvdimm: Stop using HPAGE_SIZE · 0dd69643
      Committed by Oliver O'Halloran
      Currently libnvdimm uses HPAGE_SIZE as the default alignment for DAX and
      PFN devices. HPAGE_SIZE is the default hugetlbfs page size and when
      hugetlbfs is disabled it defaults to PAGE_SIZE. Given DAX has more
      in common with THP than hugetlbfs we should probably be using
      HPAGE_PMD_SIZE, but this is undefined when THP is disabled, so let's just
      give it a new name.
      
      The other usage of HPAGE_SIZE in libnvdimm is when determining how large
      the altmap should be. For the reasons mentioned above it doesn't really
      make sense to use HPAGE_SIZE here either. PMD_SIZE seems to be safe to
      use in generic code and it happens to match the vmemmap allocation block
      on x86 and Power. It's still a hack, but it's a slightly nicer hack.
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      0dd69643
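      Below is a minimal, hypothetical sketch of the change described above. The
      name ND_PFN_DEFAULT_ALIGNMENT is invented for illustration (the commit's
      whole point is introducing a dedicated name); only HPAGE_SIZE,
      HPAGE_PMD_SIZE, PMD_SIZE and PAGE_SIZE come from the commit text.

        /* Hypothetical sketch, not the exact kernel identifier: pick a sane
         * default DAX/PFN alignment without leaning on hugetlbfs's HPAGE_SIZE. */
        #include <linux/mm.h>        /* PAGE_SIZE, PMD_SIZE */
        #include <linux/huge_mm.h>   /* HPAGE_PMD_SIZE when THP is enabled */

        #ifdef CONFIG_TRANSPARENT_HUGEPAGE
        #define ND_PFN_DEFAULT_ALIGNMENT HPAGE_PMD_SIZE
        #else
        #define ND_PFN_DEFAULT_ALIGNMENT PAGE_SIZE
        #endif

        /* For sizing the altmap the commit uses PMD_SIZE, which is safe in
         * generic code and matches the vmemmap block size on x86 and Power. */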
  2. 18 July 2017 (1 commit)
  3. 04 July 2017 (3 commits)
  4. 01 July 2017 (4 commits)
  5. 30 June 2017 (5 commits)
    • libnvdimm, btt: fix btt_rw_page not returning errors · c13c43d5
      Committed by Vishal Verma
      btt_rw_page was not propagating errors from btt_do_bvec, resulting in any
      IO errors via the rw_page path going unnoticed. The pmem driver recently
      fixed this in e10624f8 ("pmem: fail io-requests to known bad blocks"),
      but the same problem in BTT went neglected.
      
      Fixes: 5212e11f ("nd_btt: atomic sector updates")
      Cc: <stable@vger.kernel.org>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      c13c43d5
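      A hedged sketch of the kind of fix described: return the btt_do_bvec()
      result from btt_rw_page() instead of unconditionally reporting success.
      The signatures are simplified for illustration and are not the exact
      driver code.

        /* Illustrative only: simplified version of the rw_page path. */
        static int btt_rw_page(struct block_device *bdev, sector_t sector,
                               struct page *page, bool is_write)
        {
                struct btt *btt = bdev->bd_disk->private_data;
                int rc;

                /* Previously the return value was dropped, so media errors
                 * never reached the caller of ->rw_page(). */
                rc = btt_do_bvec(btt, NULL, page, PAGE_SIZE, 0, is_write, sector);
                if (rc == 0)
                        page_endio(page, is_write, 0);

                return rc;
        }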
    • acpi, nfit: quiet invalid block-aperture-region warnings · d5d51fec
      Committed by Dan Williams
      This state is already visible to userspace since the BLK region will not
      be enabled, and it is otherwise benign as it usually indicates that the
      DIMM is not configured.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      d5d51fec
    • libnvdimm, btt: BTT updates for UEFI 2.7 format · 14e49454
      Committed by Vishal Verma
      The UEFI 2.7 specification defines an updated BTT metadata format,
      bumping the revision to 2.0. Add support for the new format, while
      retaining compatibility for the old 1.1 format.
      
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Linda Knippers <linda.knippers@hpe.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      14e49454
    • libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region · 0b277961
      Committed by Dan Williams
      The pmem driver attaches to both persistent and volatile memory ranges
      advertised by the ACPI NFIT. When the region is volatile it is redundant
      to spend cycles flushing caches at fsync(). Check if the hosting region
      is volatile and do not set dax_write_cache() if it is.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      0b277961
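      A hedged sketch of the idea: only arm dax write-cache flushing when the
      backing region is persistent. dax_write_cache() is named in the commit
      message; the is_nd_pmem() check on the region device is an assumption
      about how the driver distinguishes persistent from volatile regions.

        /* Sketch: skip fsync()-time cache flushing for volatile regions. */
        static void pmem_setup_write_cache(struct dax_device *dax_dev,
                                           struct nd_region *nd_region)
        {
                /* Only persistent regions need their cpu caches flushed. */
                if (is_nd_pmem(&nd_region->dev))
                        dax_write_cache(dax_dev, true);
        }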
    • libnvdimm, pmem, dax: export a cache control attribute · 6e0c90d6
      Committed by Dan Williams
      The dax_flush() operation can be turned into a nop on platforms where
      firmware arranges for cpu caches to be flushed on a power-fail event.
      The ACPI 6.2 specification defines a mechanism for the platform to
      indicate this capability so the kernel can select the proper default.
      However, for other platforms, the administrator must toggle this setting
      manually.
      
      Given this flush setting is a dax-specific mechanism we advertise it
      through a 'dax' attribute group hanging off a host device. For example,
      a 'pmem0' block-device gets a 'dax' sysfs-subdirectory with a
      'write_cache' attribute to control response to dax cache flush requests.
      This is similar to the 'queue/write_cache' attribute that appears under
      block devices.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      6e0c90d6
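      A hedged sketch of how a 'dax/write_cache' attribute group can be wired
      up with the usual sysfs helpers. DEVICE_ATTR_RW and struct attribute_group
      are standard kernel mechanisms; dev_to_pmem() and the exact show/store
      bodies are illustrative assumptions.

        /* Illustrative 'dax' attribute group exposing a write_cache knob. */
        static ssize_t write_cache_show(struct device *dev,
                                        struct device_attribute *attr, char *buf)
        {
                struct pmem_device *pmem = dev_to_pmem(dev);   /* assumed helper */

                return sprintf(buf, "%d\n", !!dax_write_cache_enabled(pmem->dax_dev));
        }

        static ssize_t write_cache_store(struct device *dev,
                                         struct device_attribute *attr,
                                         const char *buf, size_t len)
        {
                struct pmem_device *pmem = dev_to_pmem(dev);   /* assumed helper */
                bool enable;

                if (kstrtobool(buf, &enable))
                        return -EINVAL;
                dax_write_cache(pmem->dax_dev, enable);
                return len;
        }
        static DEVICE_ATTR_RW(write_cache);

        static struct attribute *dax_attrs[] = {
                &dev_attr_write_cache.attr,
                NULL,
        };

        static const struct attribute_group dax_attribute_group = {
                .name  = "dax",    /* appears as <disk>/dax/write_cache in sysfs */
                .attrs = dax_attrs,
        };

      With such a group registered on the pmem gendisk, an administrator would
      toggle the behavior via /sys/block/pmem0/dax/write_cache, mirroring the
      existing queue/write_cache attribute.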
  6. 28 June 2017 (5 commits)
    • libnvdimm, nfit: enable support for volatile ranges · c9e582aa
      Committed by Dan Williams
      Allow volatile nfit ranges to participate in all the same infrastructure
      provided for persistent memory regions. A resulting namespace
      device will still be called "pmem", but the parent region type will be
      "nd_volatile". This is in preparation for disabling the dax ->flush()
      operation in the pmem driver when it is hosted on a volatile range.
      
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      c9e582aa
    • libnvdimm, pmem: fix persistence warning · c00b396e
      Committed by Dan Williams
      The pmem driver assumes if platform firmware describes the memory
      devices associated with a persistent memory range and
      CONFIG_ARCH_HAS_PMEM_API=y that it has all the mechanism necessary to
      flush data to a power-fail safe zone. We warn if the firmware does not
      describe memory devices, but we also need to warn if the architecture
      does not claim pmem support.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      c00b396e
    • x86, libnvdimm, pmem: remove global pmem api · ca6a4657
      Committed by Dan Williams
      Now that all callers of the pmem api have been converted to dax helpers that
      call back to the pmem driver, we can remove include/linux/pmem.h and
      asm/pmem.h.
      
      Cc: <x86@kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      ca6a4657
    • x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm · f2b61257
      Committed by Dan Williams
      Kill this globally defined wrapper and move to libnvdimm so that we can
      ultimately remove include/linux/pmem.h and asm/pmem.h.
      
      Cc: <x86@kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      f2b61257
    • block: don't bother with bounce limits for make_request drivers · 0b0bcacc
      Committed by Christoph Hellwig
      We only call blk_queue_bounce for request-based drivers, so stop messing
      with it for make_request based drivers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0b0bcacc
  7. 16 June 2017 (12 commits)
    • x86, dax, libnvdimm: remove wb_cache_pmem() indirection · 4e4f00a9
      Committed by Dan Williams
      With all handling of the CONFIG_ARCH_HAS_PMEM_API case being moved to
      libnvdimm and the pmem driver directly we do not need to provide global
      wrappers and fallbacks in the CONFIG_ARCH_HAS_PMEM_API=n case. The pmem
      driver will simply not link to arch_wb_cache_pmem() in that case.  Same
      as before, pmem flushing is only defined for x86_64, via
      clean_cache_range(), but it is straightforward to add other archs in the
      future.
      
      arch_wb_cache_pmem() is an exported function since the pmem module needs
      to find it, but it is privately declared in drivers/nvdimm/pmem.h because
      there are no consumers outside of the pmem driver.
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      4e4f00a9
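      As a hedged sketch of the x86_64 arrangement described above,
      arch_wb_cache_pmem() can simply hand its range to clean_cache_range(),
      which writes back the covering cache lines with clwb; the body below is
      illustrative rather than a copy of the kernel source.

        /* Illustrative x86_64 sketch: write back (without invalidating) the
         * cache lines covering [addr, addr + size). */
        #include <linux/export.h>
        #include <asm/processor.h>       /* boot_cpu_data */
        #include <asm/special_insns.h>   /* clwb() */

        static void clean_cache_range(void *addr, size_t size)
        {
                u16 clflush_size = boot_cpu_data.x86_clflush_size;
                unsigned long clflush_mask = clflush_size - 1;
                void *vend = addr + size;
                void *p;

                for (p = (void *)((unsigned long)addr & ~clflush_mask);
                     p < vend; p += clflush_size)
                        clwb(p);
        }

        void arch_wb_cache_pmem(void *addr, size_t size)
        {
                clean_cache_range(addr, size);
        }
        EXPORT_SYMBOL_GPL(arch_wb_cache_pmem);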
    • dax, pmem: introduce an optional 'flush' dax_operation · 3c1cebff
      Committed by Dan Williams
      Filesystem-DAX flushes caches whenever it writes to the address returned
      through dax_direct_access() and when writing back dirty radix entries.
      That flushing is only required in the pmem case, so add a dax operation
      to allow pmem to take this extra action, but skip it for other dax
      capable devices that do not provide a flush routine.
      
      An example for this differentiation might be a volatile ram disk where
      there is no expectation of persistence. In fact the pmem driver itself might
      front such an address range specified by the NFIT. So, this "no flush"
      property might be something passed down by the bus / libnvdimm.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      3c1cebff
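      A hedged sketch of what the optional ->flush() slot and a pmem-provided
      callback might look like; the struct below is renamed to make clear it is
      an illustration, not the in-tree dax_operations definition.

        /* Sketch: a driver that needs no cache writeback (e.g. a volatile ram
         * disk) leaves .flush NULL and filesystem-DAX simply skips the call. */
        struct dax_operations_sketch {
                long (*direct_access)(struct dax_device *, pgoff_t, long,
                                      void **, pfn_t *);
                void (*flush)(struct dax_device *, pgoff_t, void *, size_t);
        };

        static void pmem_dax_flush(struct dax_device *dax_dev, pgoff_t pgoff,
                                   void *addr, size_t size)
        {
                arch_wb_cache_pmem(addr, size);   /* write back cpu caches */
        }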
    • libnvdimm, pmem: Add sysfs notifications to badblocks · 975750a9
      Committed by Toshi Kani
      Sysfs "badblocks" information may be updated during run-time that:
       - MCE, SCI, and sysfs "scrub" may add new bad blocks
       - Writes and ioctl() may clear bad blocks
      
      Add support for sending sysfs notifications on the "badblocks" file
      under the region and pmem directories when their badblocks information
      is re-evaluated (but not necessarily changed) at run-time.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Linda Knippers <linda.knippers@hpe.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      975750a9
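      A minimal sketch of the mechanism: after badblocks are re-evaluated,
      wake any poll(2) waiters on the sysfs file with sysfs_notify(). The
      wrapper function name is illustrative.

        #include <linux/sysfs.h>

        /* Illustrative: tell user space to re-read the "badblocks" attribute,
         * e.g. after an MCE, an ARS scrub, or an error-clearing write. */
        static void nvdimm_badblocks_changed(struct device *dev)
        {
                sysfs_notify(&dev->kobj, NULL, "badblocks");
        }

      User space can then poll() the badblocks file and re-read it whenever
      the notification fires.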
    • libnvdimm, label: switch to using v1.2 labels by default · 8990cdf1
      Committed by Dan Williams
      The rules for which version of the label specification are in effect at
      any given point in time are as follows:
      
      1/ If a DIMM has an existing / valid index block then the version
         specified is used regardless if it is a previous version.
      
      2/ By default when the kernel is initializing new index blocks the
         latest specification version (v1.2 at time of writing) is used.
      
      3/ An environment that wants to force create v1.1 label-sets must
         arrange for userspace to disable all active regions / namespaces /
         dimms and write a valid set of v1.1 index blocks to the dimms.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      8990cdf1
    • libnvdimm, label: add address abstraction identifiers · b3fde74e
      Committed by Dan Williams
      Starting with v1.2 labels, 'address abstractions' can be hinted via an
      address abstraction id that implies an info-block format. The standard
      address abstraction in the specification is the v2 format of the
      Block-Translation-Table (BTT). Support for that is saved for a later
      patch, for now we add support for the Linux supported address
      abstractions BTT (v1), PFN, and DAX.
      
      The new 'holder_class' attribute for namespace devices is added for
      tooling to specify the 'abstraction_guid' to store in the namespace label.
      For v1.1 labels this field is undefined and any setting of
      'holder_class' away from the default 'none' value will only have effect
      until the driver is unloaded. Setting 'holder_class' requires that
      whatever device tries to claim the namespace must be of the specified
      class.
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      b3fde74e
    • libnvdimm, label: add v1.2 label checksum support · 355d8388
      Committed by Dan Williams
      The v1.2 namespace label specification adds a fletcher checksum to each
      label instance. Add generation and validation support for the new field.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      355d8388
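      For reference, a self-contained sketch of a Fletcher-64 style sum over a
      label's 32-bit words, in the spirit of what the commit adds; the kernel's
      own helper and the exact field handling (such as zeroing the checksum
      field while computing) may differ.

        #include <stdint.h>
        #include <stddef.h>

        /* Illustrative Fletcher-64: two running 32-bit sums folded into one
         * 64-bit result, computed over len bytes taken as u32 words. */
        static uint64_t fletcher64_sketch(const void *data, size_t len)
        {
                const uint32_t *words = data;
                uint32_t lo = 0, hi = 0;
                size_t i;

                for (i = 0; i < len / sizeof(uint32_t); i++) {
                        lo += words[i];
                        hi += lo;
                }
                return (uint64_t)hi << 32 | lo;
        }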
    • libnvdimm, label: update 'nlabel' and 'position' handling for local namespaces · 3934d841
      Committed by Dan Williams
      The v1.2 namespace label specification requires 'nlabel' and 'position'
      to be valid for the first ("lowest dpa") label in the set. It also
      requires all non-first labels to set those fields to 0xff.
      
      Linux does not much care if these values are correct, because we can
      just trust the count of labels with the matching uuid like the v1.1
      case. However, we set them correctly in case other environments care.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      3934d841
    • libnvdimm, label: populate 'isetcookie' for blk-aperture namespaces · 8f2bc243
      Committed by Dan Williams
      Starting with the v1.2 definition of namespace labels, the isetcookie
      field is populated and validated for blk-aperture namespaces. This adds
      some safety against inadvertent copying of namespace labels from one
      DIMM-device to another.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      8f2bc243
    • libnvdimm, label: populate the type_guid property for v1.2 namespaces · faec6f8a
      Committed by Dan Williams
      The type_guid refers to the "Address Range Type GUID" for the region
      backing a namespace as defined by the ACPI NFIT (NVDIMM Firmware Interface
      Table). This 'type' identifier specifies an access mechanism for the
      given namespace. This capability replaces the confusing usage of the
      'NSLABEL_FLAG_LOCAL' flag to indicate a block-aperture-mode namespace.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      faec6f8a
    • libnvdimm, label: honor the lba size specified in v1.2 labels · f979b13c
      Committed by Dan Williams
      Previously we only honored the lba size for blk-aperture mode
      namespaces. For pmem namespaces the lba size was just assumed to be 512.
      With the new v1.2 label definition and compatibility with other
      operating environments, the ->lbasize property is now respected for pmem
      namespaces.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      f979b13c
    • libnvdimm, label: add v1.2 interleave-set-cookie algorithm · c12c48ce
      Committed by Dan Williams
      The interleave-set-cookie algorithm is extended to incorporate all the
      same components that are used to generate an nvdimm unique-id. For
      backwards compatibility we still maintain the old v1.1 definition.
      Reported-by: Nicholas Moulin <nicholas.w.moulin@intel.com>
      Reported-by: Kaushik Kanetkar <kaushik.a.kanetkar@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      c12c48ce
    • libnvdimm, label: add v1.2 nvdimm label definitions · 564e871a
      Committed by Dan Williams
      In support of improved interoperability between operating systems and pre-boot
      environments, the Intel-proposed NVDIMM Namespace Specification [1] has been
      adopted and modified into the UEFI 2.7 NVDIMM Label Protocol [2].
      
      Update the definitions of the namespace label data structures so that the new
      format can be supported alongside the existing label format.
      
      The new specification changes the default label size to 256 bytes, so
      everywhere that relied on sizeof(struct nd_namespace_label) must now use the
      sizeof_namespace_label() helper.
      
      There should be no functional differences from these changes as the
      default is still the v1.1 128-byte format. Future patches will move the
      default to the v1.2 definition.
      
      [1]: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
      [2]: http://www.uefi.org/sites/default/files/resources/UEFI_Spec_2_7.pdf
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      564e871a
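      A hedged sketch of the sizeof_namespace_label() idea referenced above:
      the effective label size becomes a property of the label area's index
      version rather than a compile-time sizeof(); the parameters and values
      below are illustrative.

        /* Illustrative: choose the on-media label size from the index-block
         * version. v1.1 keeps 128-byte labels, v1.2 moves to 256 bytes. */
        static size_t sizeof_namespace_label_sketch(unsigned int major,
                                                    unsigned int minor)
        {
                if (major == 1 && minor == 1)
                        return 128;     /* legacy v1.1 format */
                return 256;             /* UEFI 2.7 / v1.2 format */
        }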
  8. 10 June 2017 (1 commit)
    • x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations · 0aed55af
      Committed by Dan Williams
      The pmem driver has a need to transfer data with a persistent memory
      destination and be able to rely on the fact that the destination writes are not
      cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
      (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
      to ensure data-writes have reached a power-fail-safe zone in the platform. The
      fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
      around and fence previous writes with an "sfence".
      
      Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
      memcpy_flushcache, that guarantee that the destination buffer is not dirty in
      the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
      will be used to replace the "pmem api" (include/linux/pmem.h +
      arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
      and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
      config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
      otherwise.
      
      This is meant to satisfy the concern from Linus that if a driver wants to do
      something beyond the normal nocache semantics it should be something private to
      that driver [1], and Al's concern that anything uaccess related belongs with
      the rest of the uaccess code [2].
      
      The first consumer of this interface is a new 'copy_from_iter' dax operation so
      that pmem can inject cache maintenance operations without imposing this
      overhead on other dax-capable drivers.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      0aed55af
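      A hedged sketch of the config gating described in the last paragraph:
      when CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE is not set, the flushcache
      variants fall back to the plain nocache/memcpy paths. The declarations
      below are illustrative rather than the exact headers.

        /* Illustrative fallback wiring for the flushcache helpers. */
        #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
        size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
        void memcpy_flushcache(void *dst, const void *src, size_t cnt);
        #else
        #define _copy_from_iter_flushcache _copy_from_iter_nocache
        static inline void memcpy_flushcache(void *dst, const void *src, size_t cnt)
        {
                memcpy(dst, src, cnt);
        }
        #endif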
  9. 09 June 2017 (1 commit)
  10. 05 June 2017 (1 commit)
  11. 11 May 2017 (2 commits)
    • libnvdimm, btt: ensure that initializing metadata clears poison · b177fe85
      Committed by Vishal Verma
      If we had badblocks/poison in the metadata area of a BTT, recreating the
      BTT would not clear the poison in all cases, notably the flog area. This
      is because rw_bytes will only clear errors if the request being sent
      down is 512B aligned and sized.
      
      Make sure that when writing the map and info blocks, the rw_bytes being
      sent are of the correct size/alignment. For the flog, instead of doing
      the smaller log_entry writes only, first do a 'wipe' of the entire area
      by writing zeroes in large enough chunks so that errors get cleared.
      
      Cc: Andy Rudoff <andy.rudoff@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      b177fe85
    • libnvdimm: add an atomic vs process context flag to rw_bytes · 3ae3d67b
      Committed by Vishal Verma
      nsio_rw_bytes can clear media errors, but this cannot be done while we
      are in an atomic context due to locking within ACPI. From the BTT,
      ->rw_bytes may be called either from atomic or process context depending
      on whether the calls happen during initialization or during IO.
      
      During init, we want to ensure error clearing happens, and the flag
      marking process context allows nsio_rw_bytes to do that. When called
      during IO, we're in atomic context, and error clearing can be skipped.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      3ae3d67b
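      A hedged sketch of the flag plumbing described above: IO-context callers
      pass an "atomic" flag so the nsio layer skips error clearing, while
      initialization-time callers leave it unset. The series uses a flag along
      the lines of NVDIMM_IO_ATOMIC; everything below uses illustrative local
      names.

        #include <linux/string.h>

        #define NVDIMM_IO_ATOMIC_SKETCH (1UL << 0)   /* illustrative flag bit */

        /* Illustrative rw_bytes-style write helper with a context flag. */
        static int nsio_write_sketch(void *media, const void *buf, size_t size,
                                     unsigned long flags)
        {
                memcpy(media, buf, size);   /* stand-in for the real media write */

                if (!(flags & NVDIMM_IO_ATOMIC_SKETCH)) {
                        /* Process context: safe to call into ACPI here and
                         * clear any known bad blocks covering this range. */
                }
                return 0;
        }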
  12. 09 May 2017 (1 commit)
    • treewide: use kv[mz]alloc* rather than opencoded variants · 752ade68
      Committed by Michal Hocko
      There are many code paths opencoding kvmalloc.  Let's use the helper
      instead.  The main difference to kvmalloc is that those users are
      usually not considering all the aspects of the memory allocator.  E.g.
      allocation requests <= 32kB (with 4kB pages) basically never fail and
      will invoke the OOM killer to satisfy the allocation.  This sounds too
      disruptive for something that has a reasonable fallback - the vmalloc.
      On the other hand those requests might fall back to vmalloc even when the
      memory allocator would succeed after several more reclaim/compaction
      attempts previously.  There is no guarantee something like that happens
      though.
      
      This patch converts many of those places to kv[mz]alloc* helpers because
      they are more conservative.
      
      Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
      Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdimm
      Acked-by: David Sterba <dsterba@suse.com> # btrfs
      Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
      Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Santosh Raspatur <santosh@chelsio.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Yishai Hadas <yishaih@mellanox.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: "Yan, Zheng" <zyan@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      752ade68
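      As a concrete illustration of the conversion pattern this patch applies,
      here is a hedged before/after; the function names are hypothetical.

        #include <linux/mm.h>        /* kvzalloc, kvfree */
        #include <linux/slab.h>
        #include <linux/vmalloc.h>

        /* Before: opencoded kmalloc-with-vmalloc-fallback (illustrative). */
        static void *alloc_table_old(size_t size)
        {
                void *p = kzalloc(size, GFP_KERNEL | __GFP_NOWARN);

                return p ? p : vzalloc(size);
        }

        /* After: let kvzalloc() pick the allocator and the retry policy. */
        static void *alloc_table_new(size_t size)
        {
                return kvzalloc(size, GFP_KERNEL);
        }

      Either allocation is released with kvfree(), which handles both the slab
      and the vmalloc case.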
  13. 05 May 2017 (3 commits)
    • libnvdimm, pfn: fix 'npfns' vs section alignment · d5483fed
      Committed by Dan Williams
      Fix failures to create namespaces due to the vmem_altmap not advertising
      enough free space to store the memmap.
      
       WARNING: CPU: 15 PID: 8022 at arch/x86/mm/init_64.c:656 arch_add_memory+0xde/0xf0
       [..]
       Call Trace:
        dump_stack+0x63/0x83
        __warn+0xcb/0xf0
        warn_slowpath_null+0x1d/0x20
        arch_add_memory+0xde/0xf0
        devm_memremap_pages+0x244/0x440
        pmem_attach_disk+0x37e/0x490 [nd_pmem]
        nd_pmem_probe+0x7e/0xa0 [nd_pmem]
        nvdimm_bus_probe+0x71/0x120 [libnvdimm]
        driver_probe_device+0x2bb/0x460
        bind_store+0x114/0x160
        drv_attr_store+0x25/0x30
      
      In commit 658922e5 "libnvdimm, pfn: fix memmap reservation sizing"
      we arranged for the capacity to be allocated, but failed to also update
      the 'npfns' parameter. This leads to cases where there is enough
      capacity reserved to hold all the allocated sections, but
      vmemmap_populate_hugepages() still encounters -ENOMEM from
      altmap_alloc_block_buf().
      
      This fix is a stop-gap until we can teach the core memory hotplug
      implementation to permit sub-section hotplug.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 658922e5 ("libnvdimm, pfn: fix memmap reservation sizing")
      Reported-by: Anisha Allada <anisha.allada@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      d5483fed
    • libnvdimm: handle locked label storage areas · 9d62ed96
      Committed by Dan Williams
      Per the latest version of the "NVDIMM DSM Interface Example" [1], the
      label data retrieval routine can report a "locked" status. In this case
      all regions associated with that DIMM are disabled until the label area
      is unlocked. Provide generic libnvdimm enabling for NVDIMMs with label
      data area locking capabilities.
      
      [1]: http://pmem.io/documents/
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      9d62ed96
    • libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED · 8f078b38
      Committed by Dan Williams
      This is a preparation patch for handling locked nvdimm label regions, a
      new concept as introduced by the latest DSM document on pmem.io [1]. A
      future patch will leverage nvdimm_set_locked() at DIMM probe time to
      flag regions that can not be enabled. There should be no functional
      difference resulting from this change.
      
      [1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example-V1.3.pdf
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      8f078b38
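      A hedged sketch of the bitops conversion: the NDD_* values become bit
      numbers used with set_bit()/test_bit() on the DIMM's flags word, and a
      small helper can mark a DIMM locked at probe time. The structure and
      helper names below are illustrative.

        #include <linux/bitops.h>

        /* Illustrative bit positions; with bitops the NDD_* values are bit
         * numbers rather than pre-shifted masks. */
        enum {
                NDD_SKETCH_ALIASING = 0,   /* DIMM participates in aliased regions */
                NDD_SKETCH_LOCKED   = 1,   /* label storage area is locked */
        };

        struct nvdimm_sketch {
                unsigned long flags;
        };

        static void nvdimm_set_locked_sketch(struct nvdimm_sketch *nvdimm)
        {
                set_bit(NDD_SKETCH_LOCKED, &nvdimm->flags);
        }

        static bool nvdimm_is_locked_sketch(struct nvdimm_sketch *nvdimm)
        {
                return test_bit(NDD_SKETCH_LOCKED, &nvdimm->flags);
        }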