1. 01 May, 2019 1 commit
    • D
      libnvdimm/namespace: Fix label tracking error · c4703ce1
      Dan Williams committed
      Users have reported intermittent occurrences of DIMM initialization
      failures due to duplicate allocations of address capacity detected in
      the labels, or errors of the form below; both have the same root cause.
      
          nd namespace1.4: failed to track label: 0
          WARNING: CPU: 17 PID: 1381 at drivers/nvdimm/label.c:863
      
          RIP: 0010:__pmem_label_update+0x56c/0x590 [libnvdimm]
          Call Trace:
           ? nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm]
           nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm]
           uuid_store+0x17e/0x190 [libnvdimm]
           kernfs_fop_write+0xf0/0x1a0
           vfs_write+0xb7/0x1b0
           ksys_write+0x57/0xd0
           do_syscall_64+0x60/0x210
      
      Unfortunately those reports were typically with a busy parallel
      namespace creation / destruction loop making it difficult to see the
      components of the bug. However, Jane provided a simple reproducer using
      the work-in-progress sub-section implementation.
      
      When ndctl is reconfiguring a namespace it may take an existing defunct
      / disabled namespace and reconfigure it with a new uuid and other
      parameters. Critically, namespace_update_uuid() takes existing address
      resources and renames them for the new namespace to use / reconfigure as
      it sees fit. The bug is that this rename only happens in the resource
      tracking tree. Existing labels with the old uuid are not reaped, leading
      to a scenario where multiple active labels reference the same span of
      address range.
      
      Teach namespace_update_uuid() to flag any references to the old uuid for
      reaping at the next label update attempt.
      
      Cc: <stable@vger.kernel.org>
      Fixes: bf9bccc1 ("libnvdimm: pmem label sets and namespace instantiation")
      Link: https://github.com/pmem/ndctl/issues/91
      Reported-by: Jane Chu <jane.chu@oracle.com>
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Reported-by: Erwin Tsaur <erwin.tsaur@oracle.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
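The fix described above can be pictured with a small userspace sketch. The `struct label_ent` layout, the `reap` field, and `flag_stale_labels()` are hypothetical names chosen for illustration; they are not the kernel's data structures. The sketch only shows the idea of marking every active label that still carries the old uuid so a later label update can delete it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16

/* Hypothetical stand-in for a tracked label; not the kernel's struct. */
struct label_ent {
	unsigned char uuid[UUID_LEN]; /* namespace the label belongs to */
	bool in_use;                  /* slot holds an active label */
	bool reap;                    /* delete this label at the next update */
};

/*
 * Flag every active label that still carries old_uuid, so that two label
 * sets never end up claiming the same address range after a rename.
 * Returns the number of labels flagged.
 */
static size_t flag_stale_labels(struct label_ent *labels, size_t count,
				const unsigned char *old_uuid)
{
	size_t flagged = 0;

	assert(old_uuid != NULL);
	assert(labels != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (!labels[i].in_use)
			continue;
		if (memcmp(labels[i].uuid, old_uuid, UUID_LEN) == 0) {
			labels[i].reap = true;
			flagged++;
		}
	}
	return flagged;
}
```

A caller would run this at rename time and then skip or delete flagged entries during the next label write.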
  2. 08 April, 2019 1 commit
  3. 30 March, 2019 2 commits
  4. 28 March, 2019 1 commit
  5. 23 March, 2019 1 commit
  6. 05 March, 2019 1 commit
  7. 02 March, 2019 1 commit
  8. 01 March, 2019 2 commits
    • V
      libnvdimm/btt: Fix LBA masking during 'free list' population · 9dedc73a
      Vishal Verma committed
      The Linux BTT implementation assumes that log entries will never have
      the 'zero' flag set, and indeed it never sets that flag for log entries
      itself.
      
      However, the UEFI spec is ambiguous on the exact format of the LBA field
      of a log entry, specifically as to whether it should include the
      additional flag bits or not. While a zero bit doesn't make sense in the
      context of a log entry, other BTT implementations might still have it set.
      
      If an implementation does happen to have it set, we would happily read
      it in as the next block to write to for writes. Since a high bit is set,
      it pushes the block number out of the range of an 'arena', and we fail
      such a write with an EIO.
      
      Follow the robustness principle, and tolerate such implementations by
      stripping out the zero flag when populating the free list during
      initialization. Additionally, use the same stripped out entries for
      detection of incomplete writes and map restoration that happens at this
      stage.
      
      Add a sysfs file 'log_zero_flags' that indicates the ability to accept
      such a layout to userspace applications. This enables 'ndctl
      check-namespace' to recognize whether the kernel is able to handle zero
      flags, or whether it should attempt a fix-up under the --repair option.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Dexuan Cui <decui@microsoft.com>
      Reported-by: Pedro d'Aquino Filocre F S Barbuda <pbarbuda@microsoft.com>
      Tested-by: Dexuan Cui <decui@microsoft.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
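As a rough illustration of the masking described above, the sketch below assumes a 32-bit LBA field whose top two bits are the 'zero' and 'error' flags. The macro and function names are made up for this example and are not taken from the driver. It shows why an unmasked entry with the zero flag set lands outside the arena, and how stripping the flags recovers a usable block number.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: the two high bits of the 32-bit field are flags. */
#define ENT_ZERO_FLAG  (UINT32_C(1) << 31)
#define ENT_ERROR_FLAG (UINT32_C(1) << 30)
#define ENT_LBA_MASK   (~(ENT_ZERO_FLAG | ENT_ERROR_FLAG))

/* Strip the flag bits so that only the block number remains. */
static uint32_t ent_lba(uint32_t raw)
{
	return raw & ENT_LBA_MASK;
}

/* A free-list block is usable only if it lies inside the arena. */
static int free_block_valid(uint32_t raw, uint32_t arena_blocks)
{
	assert(arena_blocks <= ENT_LBA_MASK);
	return ent_lba(raw) < arena_blocks;
}
```

With a 1024-block arena, a raw value of `0x80000005` compares as out of range when used directly, while `ent_lba()` reduces it to block 5.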
    • V
      libnvdimm/btt: Remove unnecessary code in btt_freelist_init · 2f8c9011
      Vishal Verma committed
      We call btt_log_read() twice, once to get the 'old' log entry, and again
      to get the 'new' entry. However, we have no use for the 'old' entry, so
      remove it.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  9. 23 February, 2019 1 commit
  10. 13 February, 2019 4 commits
    • D
      libnvdimm/pmem: Honor force_raw for legacy pmem regions · fa7d2e63
      Dan Williams committed
      For recovery, where non-dax access is needed to a given physical address
      range, and testing, allow the 'force_raw' attribute to override the
      default establishment of a dev_pagemap.
      
      Without this capability it is possible to end up with a namespace that
      cannot be activated due to a corrupted info-block, and one that cannot
      be repaired due to a section collision.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 004f1afb ("libnvdimm, pmem: direct map legacy pmem by default")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • D
      libnvdimm/pfn: Account for PAGE_SIZE > info-block-size in nd_pfn_init() · 11a35810
      Dan Williams committed
      Similar to "libnvdimm: Fix altmap reservation size calculation" provide
      for a reservation of a full page worth of info block space at info-block
      establishment time.  Typically there is already slack in the padding
      from honoring the default 2MB alignment, but provide for a reservation
      for corner case configurations that would otherwise fit.
      
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • O
      libnvdimm: Fix altmap reservation size calculation · 07464e88
      Oliver O'Halloran committed
      Libnvdimm reserves the first 8K of pfn and devicedax namespaces to
      store a superblock describing the namespace. This 8K reservation
      is contained within the altmap area which the kernel uses for the
      vmemmap backing for the pages within the namespace. The altmap
      allows for some pages at the start of the altmap area to be reserved
      and that mechanism is used to protect the superblock from being
      re-used as vmemmap backing.
      
      The number of PFNs to reserve is calculated using:
      
      	PHYS_PFN(SZ_8K)
      
      Which is implemented as:
      
       #define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT))
      
      So on systems where PAGE_SIZE is greater than 8K the reservation
      size is truncated to zero and the superblock area is re-used as
      vmemmap backing. As a result all the namespace information stored
      in the superblock (i.e. if it's a PFN or DAX namespace) is lost
      and the namespace needs to be re-created to get access to the
      contents.
      
      This patch fixes this by using PFN_UP() rather than PHYS_PFN() to ensure
      that at least one page is reserved. On systems with a 4K page size this
      patch should have no effect.
      
      Cc: stable@vger.kernel.org
      Cc: Dan Williams <dan.j.williams@intel.com>
      Fixes: ac515c08 ("libnvdimm, pmem, pfn: move pfn setup to the core")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
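The truncation described in this commit is easy to check with plain arithmetic. The sketch below is a userspace stand-in, not kernel code: `pfn_down()` and `pfn_up()` are illustrative helpers that take the page shift as a parameter and mirror the round-down behaviour of PHYS_PFN() and the round-up behaviour of PFN_UP() respectively.

```c
#include <assert.h>

#define SZ_8K 0x2000UL

/* Round-down byte-to-page conversion, the behaviour PHYS_PFN() has. */
static unsigned long pfn_down(unsigned long bytes, unsigned int page_shift)
{
	assert(page_shift < 8 * sizeof(unsigned long));
	return bytes >> page_shift;
}

/* Round-up byte-to-page conversion, the behaviour PFN_UP() has. */
static unsigned long pfn_up(unsigned long bytes, unsigned int page_shift)
{
	unsigned long page_size;

	assert(page_shift < 8 * sizeof(unsigned long));
	page_size = 1UL << page_shift;
	return (bytes + page_size - 1) >> page_shift;
}
```

With 4K pages (shift 12) both helpers give 2 pages for the 8K superblock. With 64K pages (shift 16) the round-down version gives 0, so nothing is reserved, while the round-up version gives 1.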
    • W
      libnvdimm, pfn: Fix over-trim in trim_pfn_device() · f101ada7
      Wei Yang committed
      When trying to see whether current nd_region intersects with others,
      trim_pfn_device() has already calculated the *size* to be expanded to
      SECTION size.
      
      Do not double append 'adjust' to 'size' when calculating whether the end
      of a region collides with the next pmem region.
      
      Fixes: ae86cbfe ("libnvdimm, pfn: Pad pfn namespaces relative to other regions")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  11. 03 February, 2019 1 commit
  12. 31 January, 2019 1 commit
  13. 22 January, 2019 1 commit
    • D
      libnvdimm/security: Require nvdimm_security_setup_events() to succeed · 1cd73865
      Dan Williams committed
      The following warning:
      
          ACPI0012:00: security event setup failed: -19
      
      ...is meant to capture exceptional failures of sysfs_get_dirent(),
      however it will also fail in the common case when security support is
      disabled. A few issues:
      
      1/ A dev_warn() report for a common case is too chatty
      2/ The setup of this notifier is generic, no need for it to be driven
         from the nfit driver, it can exist completely in the core.
      3/ If it fails for any reason besides security support being disabled,
         that's fatal and should abort DIMM activation. Userspace may hang if
         it never gets overwrite notifications.
      4/ The dirent needs to be released.
      
      Move the call to the core 'dimm' driver, make it conditional on security
      support being active, make it fatal for the exceptional case, add the
      missing sysfs_put() at device disable time.
      
      Fixes: 7d988097 ("...Add security DSM overwrite support")
      Reviewed-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  14. 16 January, 2019 2 commits
    • D
      libnvdimm/security: Fix nvdimm_security_state() state request selection · faa8bd6e
      Dave Jiang committed
      The input parameter should be enum nvdimm_passphrase_type instead of bool,
      so that callers can select either the extended master passphrase state or
      the regular user passphrase state.
      
      Fixes: 89fa9d8e ("...add Intel DSM 1.8 master passphrase support")
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • D
      libnvdimm/label: Clear 'updating' flag after label-set update · 966d23a0
      Dan Williams committed
      The UEFI 2.7 specification sets expectations that the 'updating' flag is
      eventually cleared. To date, the libnvdimm core has never adhered to
      that protocol. The policy of the core matches the policy of other
      multi-device info-block formats like MD-Software-RAID that expect
      administrator intervention on inconsistent info-blocks, not automatic
      invalidation.
      
      However, some pre-boot environments may unfortunately attempt to "clean
      up" the labels and invalidate a set when it fails to find at least one
      "non-updating" label in the set. Clear the updating flag after set
      updates to minimize the window of vulnerability to aggressive pre-boot
      environments.
      
      Ideally implementations would not write to the label area outside of
      creating namespaces.
      
      Note that this only minimizes the window, it does not close it as the
      system can still crash while clearing the flag and the set can be
      subsequently deleted / invalidated by the pre-boot environment.
      
      Fixes: f524bf27 ("libnvdimm: write pmem label set")
      Cc: <stable@vger.kernel.org>
      Cc: Kelly Couch <kelly.j.couch@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
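The two-phase behaviour described in this commit can be sketched in userspace as follows. The flag value, `struct label`, and the three helpers are assumptions made for illustration, not the libnvdimm implementation. The point is that a set in which every label is still marked 'updating' looks invalid to an aggressive pre-boot check, and that clearing the flag afterwards shortens the time the set spends in that state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position for the 'updating' flag in this sketch. */
#define LABEL_FLAG_UPDATING UINT32_C(0x8)

/* Hypothetical label holding only the flags word. */
struct label {
	uint32_t flags;
};

/* Phase 1: every label in the set is written with 'updating' set. */
static void mark_updating(struct label *set, size_t count)
{
	assert(set != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		set[i].flags |= LABEL_FLAG_UPDATING;
}

/* Phase 2: once the whole set is written, clear the flag again. */
static void clear_updating(struct label *set, size_t count)
{
	assert(set != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		set[i].flags &= ~LABEL_FLAG_UPDATING;
}

/* The check an aggressive pre-boot environment might apply to a set. */
static int has_stable_label(const struct label *set, size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (!(set[i].flags & LABEL_FLAG_UPDATING))
			return 1;
	return 0;
}
```

A crash between the two phases still leaves the set without a stable label, which is the remaining window the commit message mentions.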
  15. 07 January, 2019 1 commit
    • D
      acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node · 8fc5c735
      Dan Williams committed
      Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
      Interface Table), is the first known instance of a memory range
      described by a unique "target" proximity domain. "Initiator" and
      "target" proximity domains are the approach that the ACPI HMAT
      (Heterogeneous Memory Attributes Table) uses to describe the unique
      performance properties of a memory range relative to a given initiator
      (e.g. CPU or DMA device).
      
      Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
      char-device follows the traditional notion of 'numa-node' where the
      attribute conveys the closest online numa-node. That numa-node attribute
      is useful for cpu-binding and memory-binding processes *near* the
      device. However, when the memory range backing a 'pmem', or 'dax' device
      is onlined (memory hot-add) the memory-only-numa-node representing that
      address needs to be differentiated from the set of online nodes. In
      other words, the numa-node association of the device depends on whether
      you can bind processes *near* the cpu-numa-node in the offline
      device-case, or bind processes *on* the memory-range directly after the
      backing address range is onlined.
      
      Allow for the case that platform firmware describes persistent memory
      with a unique proximity domain, i.e. when it is distinct from the
      proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
      numa-node translation of that proximity through the libnvdimm region
      device to namespaces that are in device-dax mode. With this in place the
      proposed kmem driver [1] can optionally discover a unique numa-node
      number for the address range as it transitions the memory from an
      offline state managed by a device-driver to an online memory range
      managed by the core-mm.
      
      [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.com
      Reported-by: Fan Du <fan.du@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Oliver O'Halloran" <oohall@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  16. 29 December, 2018 1 commit
    • D
      mm, devm_memremap_pages: fix shutdown handling · a95c90f1
      Dan Williams committed
      The last step before devm_memremap_pages() returns success is to allocate
      a release action, devm_memremap_pages_release(), to tear the entire setup
      down.  However, the result from devm_add_action() is not checked.
      
      Checking the error from devm_add_action() is not enough.  The api
      currently relies on the fact that the percpu_ref it is using is killed by
      the time the devm_memremap_pages_release() is run.  Rather than continue
      this awkward situation, offload the responsibility of killing the
      percpu_ref to devm_memremap_pages_release() directly.  This allows
      devm_memremap_pages() to do the right thing relative to init failures and
      shutdown.
      
      Without this change we could fail to register the teardown of
      devm_memremap_pages().  The likelihood of hitting this failure is tiny as
      small memory allocations almost always succeed.  However, the impact of
      the failure is large given any future reconfiguration, or disable/enable,
      of an nvdimm namespace will fail forever as subsequent calls to
      devm_memremap_pages() will fail to setup the pgmap_radix since there will
      be stale entries for the physical address range.
      
      An argument could be made to require that the ->kill() operation be set in
      the @pgmap arg rather than passed in separately.  However, it helps code
      readability, tracking the lifetime of a given instance, to be able to grep
      the kill routine directly at the devm_memremap_pages() call site.
      
      Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Fixes: e8d51348 ("memremap: change devm_memremap_pages interface...")
      Reviewed-by: "Jérôme Glisse" <jglisse@redhat.com>
      Reported-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 23 December, 2018 1 commit
    • D
      libnvdimm/security: Quiet security operations · 37379cfc
      Dan Williams committed
      The security implementation is too chatty. For example, the common case
      is that security is not enabled / setup, and booting a qemu
      configuration currently yields:
      
          nvdimm nmem0: request_key() found no key
          nvdimm nmem0: failed to unlock dimm: -126
          nvdimm nmem1: request_key() found no key
          nvdimm nmem1: failed to unlock dimm: -126
      
      Convert all security related log messages to debug level.
      
      Cc: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  18. 22 December, 2018 6 commits
  19. 14 December, 2018 4 commits
  20. 11 December, 2018 3 commits
  21. 06 December, 2018 1 commit
  22. 05 December, 2018 1 commit
    • D
      acpi/nfit: Add support for Intel DSM 1.8 commands · b3ed2ce0
      Dave Jiang committed
      Add command definition for security commands defined in Intel DSM
      specification v1.8 [1]. This includes "get security state", "set
      passphrase", "unlock unit", "freeze lock", "secure erase", "overwrite",
      "overwrite query", "master passphrase enable/disable", and "master
      erase", . Since this adds several Intel definitions, move the relevant
      bits to their own header.
      
      These commands mutate physical data, but that manipulation is not cache
      coherent. The requirement to flush and invalidate caches makes these
      commands unsuitable to be called from userspace, so extra logic is added
      to detect and block these commands from being submitted via the ioctl
      command submission path.
      
      Lastly, the commands may contain sensitive key material that should not
      be dumped in a standard debug session. Update the nvdimm-command
      payload-dump facility to move security command payloads behind a
      default-off compile time switch.
      
      [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.8.pdf
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  23. 16 November, 2018 1 commit
  24. 12 October, 2018 1 commit