1. 02 9月, 2020 13 次提交
  2. 30 4月, 2020 5 次提交
  3. 15 1月, 2020 1 次提交
    • D
      acpi/nfit, device-dax: Identify differentiated memory with a unique numa-node · a0a4e71f
      Dan Williams 提交于
      commit 8fc5c73554db0ac18c0c6ac5b2099ab917f83bdf upstream
      
      Persistent memory, as described by the ACPI NFIT (NVDIMM Firmware
      Interface Table), is the first known instance of a memory range
      described by a unique "target" proximity domain. Where "initiator" and
      "target" proximity domains is an approach that the ACPI HMAT
      (Heterogeneous Memory Attributes Table) uses to described the unique
      performance properties of a memory range relative to a given initiator
      (e.g. CPU or DMA device).
      
      Currently the numa-node for a /dev/pmemX block-device or /dev/daxX.Y
      char-device follows the traditional notion of 'numa-node' where the
      attribute conveys the closest online numa-node. That numa-node attribute
      is useful for cpu-binding and memory-binding processes *near* the
      device. However, when the memory range backing a 'pmem', or 'dax' device
      is onlined (memory hot-add) the memory-only-numa-node representing that
      address needs to be differentiated from the set of online nodes. In
      other words, the numa-node association of the device depends on whether
      you can bind processes *near* the cpu-numa-node in the offline
      device-case, or bind process *on* the memory-range directly after the
      backing address range is onlined.
      
      Allow for the case that platform firmware describes persistent memory
      with a unique proximity domain, i.e. when it is distinct from the
      proximity of DRAM and CPUs that are on the same socket. Plumb the Linux
      numa-node translation of that proximity through the libnvdimm region
      device to namespaces that are in device-dax mode. With this in place the
      proposed kmem driver [1] can optionally discover a unique numa-node
      number for the address range as it transitions the memory from an
      offline state managed by a device-driver to an online memory range
      managed by the core-mm.
      
      [1]: https://lore.kernel.org/lkml/20181022201317.8558C1D8@viggo.jf.intel.comReported-by: NFan Du <fan.du@intel.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Oliver O'Halloran" <oohall@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      [yshi: Removed PowerPC stuff which is not applicable 4.19]
      Signed-off-by: NYang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: NGavin Shan <shan.gavin@linux.alibaba.com>
      a0a4e71f
  4. 12 10月, 2019 1 次提交
    • A
      libnvdimm/region: Initialize bad block for volatile namespaces · 2e93d24a
      Aneesh Kumar K.V 提交于
      [ Upstream commit c42adf87e4e7ed77f6ffe288dc90f980d07d68df ]
      
      We do check for a bad block during namespace init and that use
      region bad block list. We need to initialize the bad block
      for volatile regions for this to work. We also observe a lockdep
      warning as below because the lock is not initialized correctly
      since we skip bad block init for volatile regions.
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.3.0-rc1-15699-g3dee241c937e #149
       Call Trace:
       [c0000000f95cb250] [c00000000147dd84] dump_stack+0xe8/0x164 (unreliable)
       [c0000000f95cb2a0] [c00000000022ccd8] register_lock_class+0x308/0xa60
       [c0000000f95cb3a0] [c000000000229cc0] __lock_acquire+0x170/0x1ff0
       [c0000000f95cb4c0] [c00000000022c740] lock_acquire+0x220/0x270
       [c0000000f95cb580] [c000000000a93230] badblocks_check+0xc0/0x290
       [c0000000f95cb5f0] [c000000000d97540] nd_pfn_validate+0x5c0/0x7f0
       [c0000000f95cb6d0] [c000000000d98300] nd_dax_probe+0xd0/0x1f0
       [c0000000f95cb760] [c000000000d9b66c] nd_pmem_probe+0x10c/0x160
       [c0000000f95cb790] [c000000000d7f5ec] nvdimm_bus_probe+0x10c/0x240
       [c0000000f95cb820] [c000000000d0f844] really_probe+0x254/0x4e0
       [c0000000f95cb8b0] [c000000000d0fdfc] driver_probe_device+0x16c/0x1e0
       [c0000000f95cb930] [c000000000d10238] device_driver_attach+0x68/0xa0
       [c0000000f95cb970] [c000000000d1040c] __driver_attach+0x19c/0x1c0
       [c0000000f95cb9f0] [c000000000d0c4c4] bus_for_each_dev+0x94/0x130
       [c0000000f95cba50] [c000000000d0f014] driver_attach+0x34/0x50
       [c0000000f95cba70] [c000000000d0e208] bus_add_driver+0x178/0x2f0
       [c0000000f95cbb00] [c000000000d117c8] driver_register+0x108/0x170
       [c0000000f95cbb70] [c000000000d7edb0] __nd_driver_register+0xe0/0x100
       [c0000000f95cbbd0] [c000000001a6baa4] nd_pmem_driver_init+0x34/0x48
       [c0000000f95cbbf0] [c0000000000106f4] do_one_initcall+0x1d4/0x4b0
       [c0000000f95cbcd0] [c0000000019f499c] kernel_init_freeable+0x544/0x65c
       [c0000000f95cbdb0] [c000000000010d6c] kernel_init+0x2c/0x180
       [c0000000f95cbe20] [c00000000000b954] ret_from_kernel_thread+0x5c/0x68
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Link: https://lore.kernel.org/r/20190919083355.26340-1-aneesh.kumar@linux.ibm.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2e93d24a
  5. 09 8月, 2019 4 次提交
    • D
      libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock · 2364ed0d
      Dan Williams 提交于
      commit ca6bf264f6d856f959c4239cda1047b587745c67 upstream.
      
      A multithreaded namespace creation/destruction stress test currently
      deadlocks with the following lockup signature:
      
          INFO: task ndctl:2924 blocked for more than 122 seconds.
                Tainted: G           OE     5.2.0-rc4+ #3382
          "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
          ndctl           D    0  2924   1176 0x00000000
          Call Trace:
           ? __schedule+0x27e/0x780
           schedule+0x30/0xb0
           wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
           ? finish_wait+0x80/0x80
           uuid_store+0xe6/0x2e0 [libnvdimm]
           kernfs_fop_write+0xf0/0x1a0
           vfs_write+0xb7/0x1b0
           ksys_write+0x5c/0xd0
           do_syscall_64+0x60/0x240
      
           INFO: task ndctl:2923 blocked for more than 122 seconds.
                 Tainted: G           OE     5.2.0-rc4+ #3382
           "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
           ndctl           D    0  2923   1175 0x00000000
           Call Trace:
            ? __schedule+0x27e/0x780
            ? __mutex_lock+0x489/0x910
            schedule+0x30/0xb0
            schedule_preempt_disabled+0x11/0x20
            __mutex_lock+0x48e/0x910
            ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
            ? __lock_acquire+0x23f/0x1710
            ? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
            nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
            __dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
            ? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
            dax_pmem_probe+0xc/0x20 [dax_pmem]
            nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
            really_probe+0xef/0x390
            driver_probe_device+0xb4/0x100
      
      In this sequence an 'nd_dax' device is being probed and trying to take
      the lock on its backing namespace to validate that the 'nd_dax' device
      indeed has exclusive access to the backing namespace. Meanwhile, another
      thread is trying to update the uuid property of that same backing
      namespace. So one thread is in the probe path trying to acquire the
      lock, and the other thread has acquired the lock and tries to flush the
      probe path.
      
      Fix this deadlock by not holding the namespace device_lock over the
      wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
      the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
      subsequently dropped internally to wait_nvdimm_bus_probe_idle().
      
      Cc: <stable@vger.kernel.org>
      Fixes: bf9bccc1 ("libnvdimm: pmem label sets and namespace instantiation")
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Tested-by: NJane Chu <jane.chu@oracle.com>
      Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2364ed0d
    • D
      libnvdimm/bus: Prepare the nd_ioctl() path to be re-entrant · 7f000e7b
      Dan Williams 提交于
      commit 6de5d06e657acdbcf9637dac37916a4a5309e0f4 upstream.
      
      In preparation for not holding a lock over the execution of nd_ioctl(),
      update the implementation to allow multiple threads to be attempting
      ioctls at the same time. The bus lock still prevents multiple in-flight
      ->ndctl() invocations from corrupting each other's state, but static
      global staging buffers are moved to the heap.
      Reported-by: NVishal Verma <vishal.l.verma@intel.com>
      Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>
      Tested-by: NVishal Verma <vishal.l.verma@intel.com>
      Link: https://lore.kernel.org/r/156341208947.292348.10560140326807607481.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      7f000e7b
    • D
      libnvdimm/region: Register badblocks before namespaces · 32485369
      Dan Williams 提交于
      commit 700cd033a82d466ad8f9615f9985525e45f8960a upstream.
      
      Namespace activation expects to be able to reference region badblocks.
      The following warning sometimes triggers when asynchronous namespace
      activation races in front of the completion of namespace probing. Move
      all possible namespace probing after region badblocks initialization.
      
      Otherwise, lockdep sometimes catches the uninitialized state of the
      badblocks seqlock with stack trace signatures like:
      
          INFO: trying to register non-static key.
          pmem2: detected capacity change from 0 to 136365211648
          the code is fine but needs lockdep annotation.
          turning off the locking correctness validator.
          CPU: 9 PID: 358 Comm: kworker/u80:5 Tainted: G           OE     5.2.0-rc4+ #3382
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
          Workqueue: events_unbound async_run_entry_fn
          Call Trace:
           dump_stack+0x85/0xc0
          pmem1.12: detected capacity change from 0 to 8589934592
           register_lock_class+0x56a/0x570
           ? check_object+0x140/0x270
           __lock_acquire+0x80/0x1710
           ? __mutex_lock+0x39d/0x910
           lock_acquire+0x9e/0x180
           ? nd_pfn_validate+0x28f/0x440 [libnvdimm]
           badblocks_check+0x93/0x1f0
           ? nd_pfn_validate+0x28f/0x440 [libnvdimm]
           nd_pfn_validate+0x28f/0x440 [libnvdimm]
           ? lockdep_hardirqs_on+0xf0/0x180
           nd_dax_probe+0x9a/0x120 [libnvdimm]
           nd_pmem_probe+0x6d/0x180 [nd_pmem]
           nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
      
      Fixes: 48af2f7e52f4 ("libnvdimm, pfn: during init, clear errors...")
      Cc: <stable@vger.kernel.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>
      Link: https://lore.kernel.org/r/156341208365.292348.1547528796026249120.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      32485369
    • D
      libnvdimm/bus: Prevent duplicate device_unregister() calls · d16bbdbb
      Dan Williams 提交于
      commit 8aac0e2338916e273ccbd438a2b7a1e8c61749f5 upstream.
      
      A multithreaded namespace creation/destruction stress test currently
      fails with signatures like the following:
      
          sysfs group 'power' not found for kobject 'dax1.1'
          RIP: 0010:sysfs_remove_group+0x76/0x80
          Call Trace:
           device_del+0x73/0x370
           device_unregister+0x16/0x50
           nd_async_device_unregister+0x1e/0x30 [libnvdimm]
           async_run_entry_fn+0x39/0x160
           process_one_work+0x23c/0x5e0
           worker_thread+0x3c/0x390
      
          BUG: kernel NULL pointer dereference, address: 0000000000000020
          RIP: 0010:klist_put+0x1b/0x6c
          Call Trace:
           klist_del+0xe/0x10
           device_del+0x8a/0x2c9
           ? __switch_to_asm+0x34/0x70
           ? __switch_to_asm+0x40/0x70
           device_unregister+0x44/0x4f
           nd_async_device_unregister+0x22/0x2d [libnvdimm]
           async_run_entry_fn+0x47/0x15a
           process_one_work+0x1a2/0x2eb
           worker_thread+0x1b8/0x26e
      
      Use the kill_device() helper to atomically resolve the race of multiple
      threads issuing kill, device_unregister(), requests.
      Reported-by: NJane Chu <jane.chu@oracle.com>
      Reported-by: NErwin Tsaur <erwin.tsaur@oracle.com>
      Fixes: 4d88a97a ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
      Cc: <stable@vger.kernel.org>
      Link: https://github.com/pmem/ndctl/issues/96Tested-by: NTested-by: Jane Chu <jane.chu@oracle.com>
      Link: https://lore.kernel.org/r/156341207846.292348.10435719262819764054.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      d16bbdbb
  6. 31 7月, 2019 1 次提交
  7. 26 7月, 2019 1 次提交
    • D
      libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields · 2c0222b4
      Dan Williams 提交于
      commit 7e3e888dfc138089f4c15a81b418e88f0978f744 upstream.
      
      At namespace creation time there is the potential for the "expected to
      be zero" fields of a 'pfn' info-block to be filled with indeterminate
      data.  While the kernel buffer is zeroed on allocation it is immediately
      overwritten by nd_pfn_validate() filling it with the current contents of
      the on-media info-block location.  For fields like, 'flags' and the
      'padding' it potentially means that future implementations can not rely on
      those fields being zero.
      
      In preparation to stop using the 'start_pad' and 'end_trunc' fields for
      section alignment, arrange for fields that are not explicitly
      initialized to be guaranteed zero.  Bump the minor version to indicate
      it is safe to assume the 'padding' and 'flags' are zero.  Otherwise,
      this corruption is expected to benign since all other critical fields
      are explicitly initialized.
      
      Note The cc: stable is about spreading this new policy to as many
      kernels as possible not fixing an issue in those kernels.  It is not
      until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to
      section alignment" where this improper initialization becomes a problem.
      So if someone decides to backport "libnvdimm/pfn: Stop padding pmem
      namespaces to section alignment" (which is not tagged for stable), make
      sure this pre-requisite is flagged.
      
      Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwillia2-desk3.amr.corp.intel.com
      Fixes: 32ab0a3f ("libnvdimm, pmem: 'struct page' for pmem")
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>	[ppc64]
      Cc: <stable@vger.kernel.org>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jane Chu <jane.chu@oracle.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c0222b4
  8. 19 6月, 2019 1 次提交
    • Q
      libnvdimm: Fix compilation warnings with W=1 · 90a56454
      Qian Cai 提交于
      [ Upstream commit c01dafad77fea8d64c4fdca0a6031c980842ad65 ]
      
      Several places (dimm_devs.c, core.c etc) include label.h but only
      label.c uses NSINDEX_SIGNATURE, so move its definition to label.c
      instead.
      
      In file included from drivers/nvdimm/dimm_devs.c:23:
      drivers/nvdimm/label.h:41:19: warning: 'NSINDEX_SIGNATURE' defined but
      not used [-Wunused-const-variable=]
      
      Also, some places abuse "/**" which is only reserved for the kernel-doc.
      
      drivers/nvdimm/bus.c:648: warning: cannot understand function prototype:
      'struct attribute_group nd_device_attribute_group = '
      drivers/nvdimm/bus.c:677: warning: cannot understand function prototype:
      'struct attribute_group nd_numa_attribute_group = '
      
      Those are just some member assignments for the "struct attribute_group"
      instances and it can't be expressed in the kernel-doc.
      Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NQian Cai <cai@lca.pw>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      90a56454
  9. 31 5月, 2019 1 次提交
    • D
      libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead · ee6d3eb3
      Dan Williams 提交于
      commit 52f476a323f9efc959be1c890d0cdcf12e1582e0 upstream.
      
      Jeff discovered that performance improves from ~375K iops to ~519K iops
      on a simple psync-write fio workload when moving the location of 'struct
      page' from the default PMEM location to DRAM. This result is surprising
      because the expectation is that 'struct page' for dax is only needed for
      third party references to dax mappings. For example, a dax-mapped buffer
      passed to another system call for direct-I/O requires 'struct page' for
      sending the request down the driver stack and pinning the page. There is
      no usage of 'struct page' for first party access to a file via
      read(2)/write(2) and friends.
      
      However, this "no page needed" expectation is violated by
      CONFIG_HARDENED_USERCOPY and the check_copy_size() performed in
      copy_from_iter_full_nocache() and copy_to_iter_mcsafe(). The
      check_heap_object() helper routine assumes the buffer is backed by a
      slab allocator (DRAM) page and applies some checks.  Those checks are
      invalid, dax pages do not originate from the slab, and redundant,
      dax_iomap_actor() has already validated that the I/O is within bounds.
      Specifically that routine validates that the logical file offset is
      within bounds of the file, then it does a sector-to-pfn translation
      which validates that the physical mapping is within bounds of the block
      device.
      
      Bypass additional hardened usercopy overhead and call the 'no check'
      versions of the copy_{to,from}_iter operations directly.
      
      Fixes: 0aed55af ("x86, uaccess: introduce copy_from_iter_flushcache...")
      Cc: <stable@vger.kernel.org>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Reported-and-tested-by: NJeff Smits <jeff.smits@intel.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee6d3eb3
  10. 22 5月, 2019 1 次提交
    • D
      libnvdimm/namespace: Fix label tracking error · 866f0111
      Dan Williams 提交于
      commit c4703ce11c23423d4b46e3d59aef7979814fd608 upstream.
      
      Users have reported intermittent occurrences of DIMM initialization
      failures due to duplicate allocations of address capacity detected in
      the labels, or errors of the form below, both have the same root cause.
      
          nd namespace1.4: failed to track label: 0
          WARNING: CPU: 17 PID: 1381 at drivers/nvdimm/label.c:863
      
          RIP: 0010:__pmem_label_update+0x56c/0x590 [libnvdimm]
          Call Trace:
           ? nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm]
           nd_pmem_namespace_label_update+0xd6/0x160 [libnvdimm]
           uuid_store+0x17e/0x190 [libnvdimm]
           kernfs_fop_write+0xf0/0x1a0
           vfs_write+0xb7/0x1b0
           ksys_write+0x57/0xd0
           do_syscall_64+0x60/0x210
      
      Unfortunately those reports were typically with a busy parallel
      namespace creation / destruction loop making it difficult to see the
      components of the bug. However, Jane provided a simple reproducer using
      the work-in-progress sub-section implementation.
      
      When ndctl is reconfiguring a namespace it may take an existing defunct
      / disabled namespace and reconfigure it with a new uuid and other
      parameters. Critically namespace_update_uuid() takes existing address
      resources and renames them for the new namespace to use / reconfigure as
      it sees fit. The bug is that this rename only happens in the resource
      tracking tree. Existing labels with the old uuid are not reaped leading
      to a scenario where multiple active labels reference the same span of
      address range.
      
      Teach namespace_update_uuid() to flag any references to the old uuid for
      reaping at the next label update attempt.
      
      Cc: <stable@vger.kernel.org>
      Fixes: bf9bccc1 ("libnvdimm: pmem label sets and namespace instantiation")
      Link: https://github.com/pmem/ndctl/issues/91Reported-by: NJane Chu <jane.chu@oracle.com>
      Reported-by: NJeff Moyer <jmoyer@redhat.com>
      Reported-by: NErwin Tsaur <erwin.tsaur@oracle.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      866f0111
  11. 17 5月, 2019 3 次提交
  12. 24 3月, 2019 4 次提交
    • O
      libnvdimm: Fix altmap reservation size calculation · 3b8da135
      Oliver O'Halloran 提交于
      commit 07464e88365e9236febaca9ed1a2e2006d8bc952 upstream.
      
      Libnvdimm reserves the first 8K of pfn and devicedax namespaces to
      store a superblock describing the namespace. This 8K reservation
      is contained within the altmap area which the kernel uses for the
      vmemmap backing for the pages within the namespace. The altmap
      allows for some pages at the start of the altmap area to be reserved
      and that mechanism is used to protect the superblock from being
      re-used as vmemmap backing.
      
      The number of PFNs to reserve is calculated using:
      
      	PHYS_PFN(SZ_8K)
      
      Which is implemented as:
      
       #define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT))
      
      So on systems where PAGE_SIZE is greater than 8K the reservation
      size is truncated to zero and the superblock area is re-used as
      vmemmap backing. As a result all the namespace information stored
      in the superblock (i.e. if it's a PFN or DAX namespace) is lost
      and the namespace needs to be re-created to get access to the
      contents.
      
      This patch fixes this by using PFN_UP() rather than PHYS_PFN() to ensure
      that at least one page is reserved. On systems with a 4K pages size this
      patch should have no effect.
      
      Cc: stable@vger.kernel.org
      Cc: Dan Williams <dan.j.williams@intel.com>
      Fixes: ac515c08 ("libnvdimm, pmem, pfn: move pfn setup to the core")
      Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
      Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3b8da135
    • D
      libnvdimm/pmem: Honor force_raw for legacy pmem regions · 696c3752
      Dan Williams 提交于
      commit fa7d2e639cd90442d868dfc6ca1d4cc9d8bf206e upstream.
      
      For recovery, where non-dax access is needed to a given physical address
      range, and testing, allow the 'force_raw' attribute to override the
      default establishment of a dev_pagemap.
      
      Otherwise without this capability it is possible to end up with a
      namespace that can not be activated due to corrupted info-block, and one
      that can not be repaired due to a section collision.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 004f1afb ("libnvdimm, pmem: direct map legacy pmem by default")
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      696c3752
    • W
      libnvdimm, pfn: Fix over-trim in trim_pfn_device() · 6a89ed7a
      Wei Yang 提交于
      commit f101ada7da6551127d192c2f1742c1e9e0f62799 upstream.
      
      When trying to see whether current nd_region intersects with others,
      trim_pfn_device() has already calculated the *size* to be expanded to
      SECTION size.
      
      Do not double append 'adjust' to 'size' when calculating whether the end
      of a region collides with the next pmem region.
      
      Fixes: ae86cbfef381 "libnvdimm, pfn: Pad pfn namespaces relative to other regions"
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NWei Yang <richardw.yang@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6a89ed7a
    • D
      libnvdimm/label: Clear 'updating' flag after label-set update · 2b88d92e
      Dan Williams 提交于
      commit 966d23a006ca7b44ac8cf4d0c96b19785e0c3da0 upstream.
      
      The UEFI 2.7 specification sets expectations that the 'updating' flag is
      eventually cleared. To date, the libnvdimm core has never adhered to
      that protocol. The policy of the core matches the policy of other
      multi-device info-block formats like MD-Software-RAID that expect
      administrator intervention on inconsistent info-blocks, not automatic
      invalidation.
      
      However, some pre-boot environments may unfortunately attempt to "clean
      up" the labels and invalidate a set when it fails to find at least one
      "non-updating" label in the set. Clear the updating flag after set
      updates to minimize the window of vulnerability to aggressive pre-boot
      environments.
      
      Ideally implementations would not write to the label area outside of
      creating namespaces.
      
      Note that this only minimizes the window, it does not close it as the
      system can still crash while clearing the flag and the set can be
      subsequently deleted / invalidated by the pre-boot environment.
      
      Fixes: f524bf27 ("libnvdimm: write pmem label set")
      Cc: <stable@vger.kernel.org>
      Cc: Kelly Couch <kelly.j.couch@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b88d92e
  13. 13 1月, 2019 1 次提交
  14. 13 12月, 2018 1 次提交
  15. 14 11月, 2018 2 次提交