1. 14 Nov 2018, 3 commits
  2. 21 Aug 2018, 1 commit
  3. 20 Aug 2018, 1 commit
    •
      libnvdimm: fix ars_status output length calculation · 286e8771
      Committed by Vishal Verma
      Commit efda1b5d ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling")
      introduced additional hardening for ambiguity in the ACPI spec for
      ars_status output sizing. However, it had a couple of cases mixed up.
      Where it should have been checking for (and returning) "out_field[1] -
      4" it was using "out_field[1] - 8" and vice versa.
      
      This caused a four byte discrepancy in the buffer size passed on to
      the command handler, and in some cases, this caused memory corruption
      like:
      
        ./daxdev-errors.sh: line 76: 24104 Aborted   (core dumped) ./daxdev-errors $busdev $region
        malloc(): memory corruption
        Program received signal SIGABRT, Aborted.
        [...]
        #5  0x00007ffff7865a2e in calloc () from /lib64/libc.so.6
        #6  0x00007ffff7bc2970 in ndctl_bus_cmd_new_ars_status (ars_cap=ars_cap@entry=0x6153b0) at ars.c:136
        #7  0x0000000000401644 in check_ars_status (check=0x7fffffffdeb0, bus=0x604c20) at daxdev-errors.c:144
        #8  test_daxdev_clear_error (region_name=<optimized out>, bus_name=<optimized out>)
            at daxdev-errors.c:332
      
      Cc: <stable@vger.kernel.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Lukasz Dorau <lukasz.dorau@intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Fixes: efda1b5d ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling")
      Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      286e8771
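      A hypothetical userspace sketch of the disambiguation described above (illustrative names, not the kernel's actual code): firmware reports a total in out_field[1], but the ACPI spec is ambiguous about whether that total counts the 4-byte status field in addition to the 4-byte size field, so the fixed code picks the right subtraction (4 vs. 8) based on what was actually returned.

```c
/*
 * Hypothetical sketch of the ars_status size fix-up -- NOT the kernel's
 * code.  reported_total is what firmware placed in out_field[1];
 * bytes_returned is the payload size the command handler actually got.
 */
#include <assert.h>
#include <stdint.h>

static uint32_t ars_status_payload_size(uint32_t reported_total,
                                        uint32_t bytes_returned)
{
	if (reported_total - 4 == bytes_returned)
		return reported_total - 4;	/* total excluded the status field */
	return reported_total - 8;		/* total included status + size fields */
}
```

      Swapping the two subtractions, as the broken commit did, shifts the buffer size by four bytes and produces exactly the calloc()/malloc() corruption shown in the backtrace.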
  4. 31 Jul 2018, 1 commit
  5. 26 Jul 2018, 2 commits
  6. 18 Jul 2018, 2 commits
    •
      block: Add and use op_stat_group() for indexing disk_stat fields. · ddcf35d3
      Committed by Michael Callahan
      Add and use a new op_stat_group() function for indexing partition stat
      fields rather than indexing them by rq_data_dir() or bio_data_dir().
      This function works similarly to op_is_sync() in that it takes the
      request::cmd_flags or bio::bi_opf flags and determines which stats
      should get updated.
      
      In addition, the second parameter to generic_start_io_acct() and
      generic_end_io_acct() is now a REQ_OP rather than simply a read or
      write bit and it uses op_stat_group() on the parameter to determine
      the stat group.
      
      Note that the partition in_flight counts are not part of the per-cpu
      statistics and as such are not indexed via this function; they are
      now indexed by op_is_write().
      
      tj: Refreshed on top of v4.17.  Updated to pass around REQ_OP.
      Signed-off-by: Michael Callahan <michaelcallahan@fb.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Matias Bjorling <mb@lightnvm.io>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Alasdair Kergon <agk@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      ddcf35d3
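      A minimal sketch of the indexing change described above. In the kernel's REQ_OP encoding, bit 0 of an op carries the data direction, so a stat-group helper can map any op to the read or write bucket; the constants here are illustrative stand-ins for the kernel's definitions.

```c
/*
 * Sketch of direction-based stat indexing.  Assumes (as in the kernel's
 * REQ_OP encoding) that bit 0 of the op value is the write bit.
 */
#include <assert.h>
#include <stdbool.h>

enum { STAT_READ = 0, STAT_WRITE = 1 };

static bool op_is_write(unsigned int op)
{
	return op & 1;		/* direction bit of the op encoding */
}

static int op_stat_group(unsigned int op)
{
	return op_is_write(op) ? STAT_WRITE : STAT_READ;
}
```

      Centralizing this mapping in one helper is what later allows stat groups beyond read/write (e.g. discards) to be added without touching every accounting call site.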
    •
      block: make bdev_ops->rw_page() take a REQ_OP instead of bool · 3f289dcb
      Committed by Tejun Heo
      Commit c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for
      read/write") replaced @OP with boolean @is_write, which limited the
      amount of information going into ->rw_page() and more importantly
      page_endio(), which removed the need to expose block internals to mm.
      
      Unfortunately, we want to track discards separately and @is_write
      isn't enough information.  This patch updates bdev_ops->rw_page() to
      take REQ_OP instead but leaves page_endio() to take bool @is_write.
      This allows the block part of operations to have enough information
      while not leaking it to mm.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3f289dcb
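      An illustrative sketch (hypothetical enum names, not the kernel's) of why the bool parameter was insufficient: a single is_write flag collapses discards into writes, whereas a REQ_OP-style value lets ->rw_page() dispatch all three operations.

```c
/*
 * Sketch: with a bool, "discard" is indistinguishable from "write";
 * with an op enum the driver can dispatch each case.  Names are
 * hypothetical stand-ins for the kernel's REQ_OP_* values.
 */
#include <assert.h>
#include <string.h>

enum sketch_req_op { SKETCH_OP_READ, SKETCH_OP_WRITE, SKETCH_OP_DISCARD };

static const char *rw_page_dispatch(enum sketch_req_op op)
{
	switch (op) {
	case SKETCH_OP_READ:	return "read";
	case SKETCH_OP_WRITE:	return "write";
	case SKETCH_OP_DISCARD:	return "discard";
	}
	return "unknown";
}
```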
  7. 15 Jul 2018, 1 commit
  8. 29 Jun 2018, 2 commits
  9. 07 Jun 2018, 3 commits
    •
      libnvdimm, pmem: Do not flush power-fail protected CPU caches · 546eb031
      Committed by Ross Zwisler
      This commit:
      
      5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
      
      intended to make sure that deep flush was always available even on
      platforms which support a power-fail protected CPU cache.  An unintended
      side effect of this change was that we also lost the ability to skip
      flushing CPU caches on those power-fail protected platforms.
      
      Fix this by skipping the low level cache flushing in dax_flush() if we have
      CPU caches which are power-fail protected.  The user can still override this
      behavior by manually setting the write_cache state of a namespace.  See
      libndctl's ndctl_namespace_write_cache_is_enabled(),
      ndctl_namespace_enable_write_cache() and
      ndctl_namespace_disable_write_cache() functions.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      546eb031
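      A hedged sketch of the flush policy above, with illustrative field names (not the kernel's struct layout): skip the low-level CPU cache flush when the platform's caches are power-fail protected, unless the user re-enabled write_cache for the namespace via sysfs.

```c
/*
 * Sketch of the dax_flush() skip decision.  Field names are
 * illustrative; the real state lives in the pmem/dax device structs.
 */
#include <assert.h>
#include <stdbool.h>

struct ns_sketch {
	bool cache_power_fail_protected;	/* platform capability */
	bool write_cache_enabled;		/* user override via sysfs */
};

static bool needs_dax_flush(const struct ns_sketch *ns)
{
	if (ns->cache_power_fail_protected && !ns->write_cache_enabled)
		return false;	/* caches survive power loss: skip the flush */
	return true;
}
```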
    •
      libnvdimm, pmem: Unconditionally deep flush on *sync · ce7f11a2
      Committed by Ross Zwisler
      Prior to this commit we would only do a "deep flush" (have nvdimm_flush()
      write to each of the flush hints for a region) in response to an
      msync/fsync/sync call if the nvdimm_has_cache() returned true at the time
      we were setting up the request queue.  This happens due to the write cache
      value passed in to blk_queue_write_cache(), which then causes the block
      layer to send down BIOs with REQ_FUA and REQ_PREFLUSH set.  We do have a
      "write_cache" sysfs entry for namespaces, i.e.:
      
        /sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache
      
      which can be used to control whether or not the kernel thinks a given
      namespace has a write cache, but this didn't modify the deep flush behavior
      that we set up when the driver was initialized.  Instead, it only modified
      whether or not DAX would flush CPU caches via dax_flush() in response to
      *sync calls.
      
      Simplify this by making the *sync deep flush always happen, regardless of
      the write cache setting of a namespace.  The DAX CPU cache flushing will
      still be controlled by the write_cache setting of the namespace.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 5fdf8e5b ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      ce7f11a2
    •
      libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH · d2d6364d
      Committed by Ross Zwisler
      Complete the move from REQ_FLUSH to REQ_PREFLUSH that apparently started
      way back in v4.8.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      d2d6364d
  10. 03 Jun 2018, 2 commits
    •
      libnvdimm, e820: Register all pmem resources · d76401ad
      Committed by Dan Williams
      There is currently a mismatch between the resources that will trigger
      the e820_pmem driver to register/load and the resources that will
      actually be surfaced as pmem ranges. register_e820_pmem() uses
      walk_iomem_res_desc() which includes children and siblings. In contrast,
      e820_pmem_probe() only considers top level resources. For example the
      following resource tree results in the driver being loaded, but no
      resources being registered:
      
          398000000000-39bfffffffff : PCI Bus 0000:ae
            39be00000000-39bf07ffffff : PCI Bus 0000:af
              39be00000000-39beffffffff : 0000:af:00.0
                39be10000000-39beffffffff : Persistent Memory (legacy)
      
      Fix this up to allow definitions of "legacy" pmem ranges anywhere in
      system-physical address space. Not that it is recommended or safe to
      define a pmem range in PCI space, but it is useful for debug /
      experimentation, and the restriction on being a top-level resource was
      arbitrary.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      d76401ad
    •
      libnvdimm: Debug probe times · 3f46833d
      Committed by Dan Williams
      Instrument nvdimm_bus_probe() to emit timestamps for the start and end
      of libnvdimm device probing. This is useful for identifying sources of
      libnvdimm sub-system initialization latency.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      3f46833d
  11. 01 Jun 2018, 1 commit
    •
      libnvdimm, pmem: Preserve read-only setting for pmem devices · 254a4cd5
      Committed by Robert Elliott
      The pmem driver does not honor a forced read-only setting for very long:
      	$ blockdev --setro /dev/pmem0
      	$ blockdev --getro /dev/pmem0
      	1
      
      followed by various commands like these:
      	$ blockdev --rereadpt /dev/pmem0
      	or
      	$ mkfs.ext4 /dev/pmem0
      
      results in this in the kernel serial log:
      	 nd_pmem namespace0.0: region0 read-write, marking pmem0 read-write
      
      with the read-only setting lost:
      	$ blockdev --getro /dev/pmem0
      	0
      
      That's from bus.c nvdimm_revalidate_disk(), which always applies the
      setting from nd_region (which is initially based on the ACPI NFIT
      NVDIMM state flags not_armed bit).
      
      In contrast, commit 20bd1d02 ("scsi: sd: Keep disk read-only when
      re-reading partition") fixed this issue for SCSI devices to preserve
      the previous setting if it was set to read-only.
      
      This patch modifies bus.c to preserve any previous read-only setting.
      It also eliminates the kernel serial log print except for cases where
      read-write is changed to read-only, so it doesn't print read-only to
      read-only non-changes.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 58138820 ("libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only")
      Signed-off-by: Robert Elliott <elliott@hpe.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      254a4cd5
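      A sketch of the revalidation policy described above (illustrative signature, not the kernel's code): the region state may force a disk read-only, but a user-forced read-only disk is never silently flipped back to read-write.

```c
/*
 * Sketch of nvdimm_revalidate_disk()'s fixed policy: read-only can be
 * imposed (unarmed region) or preserved, never silently cleared.
 */
#include <assert.h>
#include <stdbool.h>

static bool revalidate_ro(bool disk_ro, bool region_ro)
{
	if (region_ro)
		return true;	/* unarmed region: disk must stay read-only */
	return disk_ro;		/* preserve any forced read-only setting */
}
```

      With this rule a `blockdev --setro` survives `--rereadpt`, and the log message is only worth printing on an actual read-write to read-only transition.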
  12. 23 May 2018, 2 commits
  13. 22 May 2018, 1 commit
    •
      mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS · e7638488
      Committed by Dan Williams
      In preparation for fixing dax-dma-vs-unmap issues, filesystems need to
      be able to rely on the fact that they will get wakeups on dev_pagemap
      page-idle events. Introduce MEMORY_DEVICE_FS_DAX and
      generic_dax_page_free() as common indicator / infrastructure for dax
      filesystems to require. With this change there are no users of the
      MEMORY_DEVICE_HOST designation, so remove it.
      
      The HMM sub-system extended dev_pagemap to arrange a callback when a
      dev_pagemap managed page is freed. Since a dev_pagemap page is free /
      idle when its reference count is 1 it requires an additional branch to
      check the page-type at put_page() time. Given put_page() is a hot-path
      we do not want to incur that check if HMM is not in use, so a static
      branch is used to avoid that overhead when not necessary.
      
      Now, the FS_DAX implementation wants to reuse this mechanism for
      receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific
      static-key into a generic mechanism that either HMM or FS_DAX code paths
      can enable.
      
      For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support,
      care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure.
      However, we still need to support FS_DAX in the FS_DAX_LIMITED case
      implemented by the s390/dcssblk driver.
      
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Thomas Meyer <thomas@m3y3r.de>
      Reported-by: Dave Jiang <dave.jiang@intel.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      e7638488
  14. 15 May 2018, 1 commit
    •
      x86/asm/memcpy_mcsafe: Return bytes remaining · 60622d68
      Committed by Dan Williams
      Machine check safe memory copies are currently deployed in the pmem
      driver whenever reading from persistent memory media, so that -EIO is
      returned rather than triggering a kernel panic. While this protects most
      pmem accesses, it is not complete in the filesystem-dax case. When
      filesystem-dax is enabled reads may bypass the block layer and the
      driver via dax_iomap_actor() and its usage of copy_to_iter().
      
      In preparation for creating a copy_to_iter() variant that can handle
      machine checks, teach memcpy_mcsafe() to return the number of bytes
      remaining rather than -EFAULT when an exception occurs.
      Co-developed-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: hch@lst.de
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/152539238119.31796.14318473522414462886.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      60622d68
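      A userspace sketch of the new calling convention: the copy returns how many bytes were NOT copied (0 on full success) rather than -EFAULT. The machine-check fault is simulated here by a fault offset; the real exception handling lives in x86 assembly.

```c
/*
 * Sketch of the memcpy_mcsafe() return-value semantics.  The
 * simulated_fault_at parameter is purely an illustration device.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

static size_t mcsafe_copy_sketch(void *dst, const void *src, size_t len,
				 size_t simulated_fault_at)
{
	size_t ok = len < simulated_fault_at ? len : simulated_fault_at;

	memcpy(dst, src, ok);
	return len - ok;	/* bytes remaining; 0 means success */
}
```

      Returning a remainder rather than a flat error is what lets a copy_to_iter()-style caller account for partial progress instead of discarding the whole transfer.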
  15. 20 Apr 2018, 2 commits
  16. 16 Apr 2018, 1 commit
    •
      libnvdimm, dimm: handle EACCES failures from label reads · e7c5a571
      Committed by Dan Williams
      The new support for the standard _LSR and _LSW methods neglected to also
      update the nvdimm_init_config_data() and nvdimm_set_config_data() to
      return the translated error code from failed commands. This precision is
      necessary because the locked status that was previously returned on
      ND_CMD_GET_CONFIG_SIZE commands is now returned on
      ND_CMD_{GET,SET}_CONFIG_DATA commands.
      
      If the kernel misses this indication it can inadvertently fall back to
      label-less mode when it should otherwise avoid all access to locked
      regions.
      
      Cc: <stable@vger.kernel.org>
      Fixes: 4b27db7e ("acpi, nfit: add support for the _LSI, _LSR, and...")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      e7c5a571
  17. 10 Apr 2018, 1 commit
    •
      libnvdimm, of_pmem: workaround OF_NUMA=n build error · 291717b6
      Committed by Dan Williams
      Stephen reports that an x86 allmodconfig build fails to build the
      of_pmem driver due to a missing definition of of_node_to_nid(). That
      helper is currently only exported in the OF_NUMA=y case. In other cases,
      ppc and sparc, it is a weak symbol, and outside of those platforms it is
      a static inline.
      
      Until an OF_NUMA=n configuration can reliably support usage of
      of_node_to_nid() in modules across architectures, mark this driver as
      'bool' instead of 'tristate'.
      
      Cc: Rob Herring <robh@kernel.org>
      Cc: Oliver O'Halloran <oohall@gmail.com>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      291717b6
  18. 07 Apr 2018, 5 commits
  19. 04 Apr 2018, 1 commit
  20. 03 Apr 2018, 1 commit
  21. 22 Mar 2018, 2 commits
    •
      libnvdimm, nfit: fix persistence domain reporting · fe9a552e
      Committed by Dan Williams
      The persistence domain is a point in the platform where once writes
      reach that destination the platform claims it will make them persistent
      relative to power loss. In the ACPI NFIT this is currently communicated
      as 2 bits in the "NFIT - Platform Capabilities Structure". The bits
      comprise a hierarchy, i.e. bit0 "CPU Cache Flush to NVDIMM Durability on
      Power Loss Capable" implies bit1 "Memory Controller Flush to NVDIMM
      Durability on Power Loss Capable".
      
      Commit 96c3a239 "libnvdimm: expose platform persistence attr..."
      shows the persistence domain as flags, but it's really an enumerated
      hierarchy.
      
      Fix this newly introduced user ABI to show the closest available
      persistence domain before userspace develops dependencies on seeing, or
      needing to develop code to tolerate, the raw NFIT flags communicated
      through the libnvdimm-generic region attribute.
      
      Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
      Reviewed-by: Dave Jiang <dave.jiang@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      fe9a552e
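      A sketch of the enumerated hierarchy described above (enum and parameter names are illustrative): because the CPU-cache capability bit implies the memory-controller one, userspace should see the closest available domain, not the raw flag pair.

```c
/*
 * Sketch of collapsing the two NFIT capability bits into one
 * enumerated persistence domain.
 */
#include <assert.h>
#include <stdbool.h>

enum persistence_domain { PD_NONE, PD_MEM_CTRL, PD_CPU_CACHE };

static enum persistence_domain nfit_to_domain(bool cpu_cache_bit,
					      bool mem_ctrl_bit)
{
	if (cpu_cache_bit)
		return PD_CPU_CACHE;	/* implies memory-controller durability */
	if (mem_ctrl_bit)
		return PD_MEM_CTRL;
	return PD_NONE;
}
```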
    •
      libnvdimm, region: hide persistence_domain when unknown · 896196dc
      Committed by Dan Williams
      Similar to other region attributes, do not emit the persistence_domain
      attribute if its contents are empty.
      
      Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
      Cc: Dave Jiang <dave.jiang@intel.com>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      896196dc
  22. 18 Mar 2018, 1 commit
    •
      block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> · 233bde21
      Committed by Bart Van Assche
      It happens often while I'm preparing a patch for a block driver that
      I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
      available for this driver? Do I have to introduce definitions of these
      constants before I can use these constants? To avoid this confusion,
      move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
      <linux/blkdev.h> header file such that these become available for all
      block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
      header file conditional so that including that header file after
      <linux/blkdev.h> does not cause the compiler to complain about a
      SECTOR_SIZE redefinition.
      
      Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
      not been removed from uapi header files nor from NAND drivers in
      which these constants are used for another purpose than converting
      block layer offsets and sizes into a number of sectors.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      233bde21
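      The constants this commit centralizes, shown with a typical bytes-to-sectors conversion as used throughout the block layer:

```c
/*
 * SECTOR_SHIFT/SECTOR_SIZE as defined in <linux/blkdev.h>: block layer
 * sector offsets and sizes are always in 512-byte units, independent of
 * the device's physical block size.
 */
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define SECTOR_SIZE  (1 << SECTOR_SHIFT)

static uint64_t bytes_to_sectors(uint64_t bytes)
{
	return bytes >> SECTOR_SHIFT;
}
```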
  23. 16 Mar 2018, 2 commits
  24. 15 Mar 2018, 1 commit