提交 · efbf6f50ada168b68f5e3de474bc8dee4a02d046 · openanolis / cloud-kernel

08 10月, 2017 2 次提交

libnvdimm: introduce 'flags' attribute for DIMM 'lock' and 'alias' status · efbf6f50

由 Dan Williams 提交于 9月 25, 2017

Given that we now how have two mechanisms for a DIMM to indicate that it
is locked:

    * NVDIMM_FAMILY_INTEL 'get_config_size' _DSM command

    * ACPI 6.2 Label Storage Read / Write commands

...export the generic libnvdimm DIMM status in a new 'flags' attribute.

This attribute can also reflect the 'alias' state which indicates
whether the nvdimm core is enforcing labels for aliased-region-capacity
that the given dimm is an interleave-set member.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

efbf6f50

acpi, nfit: add support for the _LSI, _LSR, and _LSW label methods · 4b27db7e

由 Dan Williams 提交于 9月 24, 2017

ACPI 6.2 adds support for named methods to access the label storage area
of an NVDIMM. We prefer these new methods if available and otherwise
fallback to the NVDIMM_FAMILY_INTEL _DSMs. The kernel ioctls,
ND_IOCTL_{GET,SET}_CONFIG_{SIZE,DATA}, remain generic and the driver
translates the 'package' payloads into the NVDIMM_FAMILY_INTEL 'buffer'
format to maintain compatibility with existing userspace and keep the
output buffer parsing code in the driver common.

The output payloads are mostly compatible save for the 'label area
locked' status that moves from the 'config_size' (_LSI) command to the
'config_read' (_LSR) command status.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

4b27db7e

29 9月, 2017 5 次提交

libnvdimm, namespace: fix label initialization to use valid seq numbers · b18d4b8a

由 Dan Williams 提交于 9月 26, 2017

The set of valid sequence numbers is {1,2,3}. The specification
indicates that an implementation should consider 0 a sign of a critical
error:

    UEFI 2.7: 13.19 NVDIMM Label Protocol

    Software never writes the sequence number 00, so a correctly
    check-summed Index Block with this sequence number probably indicates a
    critical error. When software discovers this case it treats it as an
    invalid Index Block indication.

While the expectation is that the invalid block is just thrown away, the
Robustness Principle says we should fix this to make both sequence
numbers valid.

Fixes: f524bf27 ("libnvdimm: write pmem label set")
Cc: <stable@vger.kernel.org>
Reported-by: NJuston Li <juston.li@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

b18d4b8a

libnvdimm, pfn: make 'resource' attribute only readable by root · 26417ae4

由 Dan Williams 提交于 9月 26, 2017

For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for pfn
devices only readable by root. Otherwise we disclose physical address
information.

Fixes: f6ed58c7 ("libnvdimm, pfn: 'resource'-address and 'size'...")
Cc: <stable@vger.kernel.org>
Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

26417ae4

libnvdimm, namespace: make 'resource' attribute only readable by root · c1fb3542

由 Dan Williams 提交于 9月 26, 2017

For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for
namespace devices only readable by root. Otherwise we disclose physical
address information.

Fixes: bf9bccc1 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: <stable@vger.kernel.org>
Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

c1fb3542

libnvdimm, region : make 'resource' attribute only readable by root · b8ff981f

由 Dan Williams 提交于 9月 26, 2017

For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for region
devices only readable by root. Otherwise we disclose physical address
information.

Fixes: 802f4be6 ("libnvdimm: Add 'resource' sysfs attribute to regions")
Cc: <stable@vger.kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

b8ff981f

libnvdimm, dimm: clear 'locked' status on successful DIMM enable · d34cb808

由 Dan Williams 提交于 9月 25, 2017

If we successfully enable a DIMM then it must not be locked and we can
clear the label-read failure condition. Otherwise, we need to reload the
entire bus provider driver to achieve the same effect, and that can
disrupt unrelated DIMMs and namespaces.

Fixes: 9d62ed96 ("libnvdimm: handle locked label storage areas")
Cc: <stable@vger.kernel.org>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

d34cb808

19 9月, 2017 1 次提交

libnvdimm, namespace: fix btt claim class crash · 33a56086

由 Dan Williams 提交于 9月 18, 2017

Maurice reports:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
    IP: holder_class_store+0x253/0x2b0 [libnvdimm]

...while trying to reconfigure an NVDIMM-N namespace into 'sector' /
'btt' mode. The crash points to this line:

    (gdb) li *(holder_class_store+0x253)
    0x7773 is in holder_class_store (drivers/nvdimm/namespace_devs.c:1420).
    1415            for (i = 0; i < nd_region->ndr_mappings; i++) {
    1416                    struct nd_mapping *nd_mapping = &nd_region->mapping[i];
    1417                    struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
    1418                    struct nd_namespace_index *nsindex;
    1419
    1420                    nsindex = to_namespace_index(ndd, ndd->ns_current);

...where we are failing because ndd is NULL due to NVDIMM-N dimms not
supporting labels.

Long story short, default to the BTTv1 format in the label-less /
NVDIMM-N case.

Fixes: 14e49454 ("libnvdimm, btt: BTT updates for UEFI 2.7 format")
Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: NMaurice A. Saldivar <maurice.a.saldivar@hpe.com>
Tested-by: NMaurice A. Saldivar <maurice.a.saldivar@hpe.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

33a56086

11 9月, 2017 1 次提交

dax: remove the pmem_dax_ops->flush abstraction · c3ca015f

由 Mikulas Patocka 提交于 8月 31, 2017

Commit abebfbe2 ("dm: add ->flush() dax operation support") is
buggy. A DM device may be composed of multiple underlying devices and
all of them need to be flushed. That commit just routes the flush
request to the first device and ignores the other devices.

It could be fixed by adding more complex logic to the device mapper. But
there is only one implementation of the method pmem_dax_ops->flush - that
is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
don't need the pmem_dax_ops->flush abstraction at all, we can call
arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
can't ever reach anything different from arch_wb_cache_pmem().

It should be also pointed out that for some uses of persistent memory it
is needed to flush only a very small amount of data (such as 1 cacheline),
and it would be overkill if we go through that device mapper machinery for
a single flushed cache line.

Fix this by removing the pmem_dax_ops->flush abstraction and call
arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
mapper code that forwards the flushes.

Fixes: abebfbe2 ("dm: add ->flush() dax operation support")
Cc: stable@vger.kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Reviewed-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

c3ca015f

10 9月, 2017 1 次提交

libnvdimm, btt: fix format string warnings · 04c3c982

由 Randy Dunlap 提交于 9月 08, 2017

Fix format warnings (seen on i386) in nvdimm/btt.c:

../drivers/nvdimm/btt.c: In function ‘btt_map_init’:
../drivers/nvdimm/btt.c:430:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t’ [-Wformat=]
   dev_WARN_ONCE(to_dev(arena), size < 512,
   ^
../drivers/nvdimm/btt.c: In function ‘btt_log_init’:
../drivers/nvdimm/btt.c:474:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t’ [-Wformat=]
   dev_WARN_ONCE(to_dev(arena), size < 512,
   ^

Fixes: 86652d2e ("libnvdimm, btt: clean up warning and error messages")
Reported-by: NArnd Bergmann <arnd@arndb.de>
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

04c3c982

08 9月, 2017 1 次提交

libnvdimm, btt: clean up warning and error messages · 86652d2e

由 Vishal Verma 提交于 9月 05, 2017

Convert all WARN* style messages to dev_WARN, and for errors in the IO
paths, use dev_err_ratelimited. Also remove some BUG_ONs in the IO path
and replace them with the above - no need to crash the machine in case
of an unaligned IO.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

86652d2e

07 9月, 2017 1 次提交

block, THP: make block_device_operations.rw_page support THP · 98cc093c

由 Huang Ying 提交于 9月 06, 2017

The .rw_page in struct block_device_operations is used by the swap
subsystem to read/write the page contents from/into the corresponding
swap slot in the swap device.  To support the THP (Transparent Huge
Page) swap optimization, the .rw_page is enhanced to support to
read/write THP if possible.

Link: http://lkml.kernel.org/r/20170724051840.2309-6-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

98cc093c

05 9月, 2017 1 次提交

libnvdimm, nfit: move the check on nd_reserved2 to the endpoint · 9edcad53

由 Meng Xu 提交于 9月 04, 2017

Delay the check of nd_reserved2 to the actual endpoint (acpi_nfit_ctl)
that uses it, as a prevention of a potential double-fetch bug.

While examining the kernel source code, I found a dangerous operation that
could turn into a double-fetch situation (a race condition bug) where
the same userspace memory region are fetched twice into kernel with sanity
checks after the first fetch while missing checks after the second fetch.

In the case of _IOC_NR(ioctl_cmd) == ND_CMD_CALL:

1. The first fetch happens in line 935 copy_from_user(&pkg, p, sizeof(pkg)

2. subsequently `pkg.nd_reserved2` is asserted to be all zeroes
(line 984 to 986).

3. The second fetch happens in line 1022 copy_from_user(buf, p, buf_len)

4. Given that `p` can be fully controlled in userspace, an attacker can
race condition to override the header part of `p`, say,
`((struct nd_cmd_pkg *)p)->nd_reserved2` to arbitrary value
(say nine 0xFFFFFFFF for `nd_reserved2`) after the first fetch but before the
second fetch. The changed value will be copied to `buf`.

5. There is no checks on the second fetches until the use of it in
line 1034: nd_cmd_clear_to_send(nvdimm_bus, nvdimm, cmd, buf) and
line 1038: nd_desc->ndctl(nd_desc, nvdimm, cmd, buf, buf_len, &cmd_rc)
which means that the assumed relation, `p->nd_reserved2` are all zeroes might
not hold after the second fetch. And once the control goes to these functions
we lose the context to assert the assumed relation.

6. Based on my manual analysis, `p->nd_reserved2` is not used in function
`nd_cmd_clear_to_send` and potential implementations of `nd_desc->ndctl`
so there is no working exploit against it right now. However, this could
easily turns to an exploitable one if careless developers start to use
`p->nd_reserved2` later and assume that they are all zeroes.

Move the validation of the nd_reserved2 field to the ->ndctl()
implementation where it has a stable buffer to evaluate.
Signed-off-by: NMeng Xu <mengxu.gatech@gmail.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

9edcad53

01 9月, 2017 8 次提交

libnvdimm: fix integer overflow static analysis warning · 58738c49

由 Dan Williams 提交于 8月 31, 2017

Dan reports:
    The patch 62232e45: "libnvdimm: control (ioctl) messages for
    nvdimm_bus and nvdimm devices" from Jun 8, 2015, leads to the
    following static checker warning:

            drivers/nvdimm/bus.c:1018 __nd_ioctl()
            warn: integer overflows 'buf_len'

    From a casual review, this seems like it might be a real bug.  On
    the first iteration we load some data into in_env[].  On the second
    iteration we read a use controlled "in_size" from nd_cmd_in_size().
    It can go up to UINT_MAX - 1.  A high number means we will fill the
    whole in_env[] buffer.  But we potentially keep looping and adding
    more to in_len so now it can be any value.

    It simple enough to change, but it feels weird that we keep looping
    even though in_env is totally full.  Shouldn't we just return an
    error if we don't have space for desc->in_num.

We keep looping because the size of the total input is allowed to be
bigger than the 'envelope' which is a subset of the payload that tells
us how much data to expect. For safety explicitly check that buf_len
does not overflow which is what the checker flagged.

Cc: <stable@vger.kernel.org>
Fixes: 62232e45: "libnvdimm: control (ioctl) messages for nvdimm_bus..."
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

58738c49

libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7

由 Robin Murphy 提交于 8月 31, 2017

mmio_flush_range() suffers from a lack of clearly-defined semantics,
and is somewhat ambiguous to port to other architectures where the
scope of the writeback implied by "flush" and ordering might matter,
but MMIO would tend to imply non-cacheable anyway. Per the rationale
in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the
only existing use is actually to invalidate clean cache lines for
ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
cleanup of the pmem API, that also now happens to be the exact purpose
of arch_invalidate_pmem(), which would be a far more well-defined tool
for the job.

Rather than risk potentially inconsistent implementations of
mmio_flush_range() for the sake of one callsite, streamline things by
removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
definitions up to the libnvdimm level, so they can be shared by NFIT
as well. This allows NFIT to be enabled for arm64.
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

5deb67f7

libnvdimm, btt: rework error clearing · d9b83c75

由 Vishal Verma 提交于 8月 30, 2017

Clearing errors or badblocks during a BTT write requires sending an ACPI
DSM, which means potentially sleeping. Since a BTT IO happens in atomic
context (preemption disabled, spinlocks may be held), we cannot perform
error clearing in the course of an IO. Due to this error clearing for
BTT IOs has hitherto been disabled.

In this patch we move error clearing out of the atomic section, and thus
re-enable error clearing with BTTs. When we are about to add a block to
the free list, we check if it was previously marked as an error, and if
it was, we add it to the freelist, but also set a flag that says error
clearing will be required. We then drop the lane (ending the atomic
context), and send a zero buffer so that the error can be cleared. The
error flag in the free list is protected by the nd 'lane', and is set
only be a thread while it holds that lane. When the error is cleared,
the flag is cleared, but while holding a mutex for that freelist index.

When writing, we check for two things -
1/ If the freelist mutex is held or if the error flag is set. If so,
this is an error block that is being (or about to be) cleared.
2/ If the block is a known badblock based on nsio->bb

The second check is required because the BTT map error flag for a map
entry only gets set when an error LBA is read. If we write to a new
location that may not have the map error flag set, but still might be in
the region's badblock list, we can trigger an EIO on the write, which is
undesirable and completely avoidable.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

d9b83c75

libnvdimm: fix potential deadlock while clearing errors · 0930a750

由 Vishal Verma 提交于 8月 30, 2017

With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths.
Specifically, the DSM to clear media errors is called during writes, so
that we can provide a writes-fix-errors model.

However it is easy to imagine a scenario like:
 -> write through the nvdimm driver
   -> acpi allocation
     -> writeback, causes more IO through the nvdimm driver
       -> deadlock

Fix this by using memalloc_noio_{save,restore}, which sets the GFP_NOIO
flag for the current scope when issuing commands/IOs that are expected
to clear errors.

Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

0930a750

libnvdimm, btt: cache sector_size in arena_info · 75892004

由 Vishal Verma 提交于 8月 30, 2017

In preparation for the error clearing rework, add sector_size in the
arena_info struct.
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

75892004

libnvdimm, btt: ensure that flags were also unchanged during a map_read · 1398199d

由 Vishal Verma 提交于 8月 30, 2017

In btt_map_read, we read the map twice to make sure that the map entry
didn't change after we added it to the read tracking table. In
anticipation of expanding the use of the error bit, also make sure that
the error and zero flags are constant across the two map reads.
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

1398199d

libnvdimm, btt: refactor map entry operations with macros · 0595d539

由 Vishal Verma 提交于 8月 30, 2017

Add helpers for converting a raw map entry to just the block number, or
either of the 'e' or 'z' flags in preparation for actually using the
error flag to mark blocks with media errors.
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

0595d539

libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path · 1db1f3ce

由 Vishal Verma 提交于 8月 30, 2017

The IO context conversion for rw_bytes missed a case in the BTT write
path (btt_map_write) which should've been marked as atomic.

In reality this should not cause a problem, because map writes are to
small for nsio_rw_bytes to attempt error clearing, but it should be
fixed for posterity.

Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
things like the nfit unit tests, which don't actually sleep, can catch
bugs like this.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

1db1f3ce

30 8月, 2017 2 次提交

libnvdimm, btt: check memory allocation failure · ed36b4db

由 Christophe Jaillet 提交于 8月 27, 2017

Check memory allocation failures and return -ENOMEM in such cases, as
already done few lines below for another memory allocation.

This avoids NULL pointers dereference.

Cc: <stable@vger.kernel.org>
Fixes: 14e49454 ("libnvdimm, btt: BTT updates for UEFI 2.7 format")
Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

ed36b4db

libnvdimm, label: fix index block size calculation · 02881768

由 Dan Williams 提交于 8月 29, 2017

The old calculation assumed that the label space was 128k and the label
size is 128. With v1.2 labels where the label size is 256 this
calculation will return zero. We are saved by the fact that the
nsindex_size is always pre-initialized from a previous 128 byte
assumption and we are lucky that the index sizes turn out the same.

Fix this going forward in case we start encountering different
geometries of label areas besides 128k.

Since the label size can change from one call to the next, drop the
caching of nsindex_size.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

02881768

24 8月, 2017 1 次提交

block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992

由 Christoph Hellwig 提交于 8月 23, 2017

This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

74d46992

16 8月, 2017 1 次提交

libnvdimm, pfn, dax: limit namespace alignments to the supported set · f13d2b61

由 Dan Williams 提交于 8月 11, 2017

Now that we properly advertise the supported pte, pmd, and pud sizes,
restrict the supported alignments that can be set on a namespace. This
assumes that userspace was not previously relying on the ability to set
odd alignments. At least ndctl only ever supported setting the namespace
alignment to 4K, 2M, or 1G.

Cc: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

f13d2b61

12 8月, 2017 2 次提交

libnvdimm, pfn, dax: show supported dax/pfn region alignments in sysfs · 1fdadbeb

由 Oliver O'Halloran 提交于 6月 27, 2017

The alignment of a DAX and PFN regions dictates the page sizes that can
be used to map the region. Even if the hardware page sizes are known the
actual range of supported page sizes that can be used with DAX depends
on the kernel configuration. As a result it's best that the kernel
advertises the alignments that should be used with these region types.

This patch adds the 'supported_alignments' region attribute to expose
this information to userspace.
Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
[djbw: integrate with nd_size_select_show() rename and other fixups]
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

1fdadbeb

libnvdimm: rename nd_sector_size_{show,store} to nd_size_select_{show,store} · b2c48f9f

由 Dan Williams 提交于 8月 11, 2017

Prepare for other another consumer of this size selection scheme that is
not a 'sector size'.

Cc: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

b2c48f9f

10 8月, 2017 1 次提交

block: pass in queue to inflight accounting · d62e26b3

由 Jens Axboe 提交于 6月 30, 2017

No functional change in this patch, just in preparation for
basing the inflight mechanism on the queue in question.
Reviewed-by: NBart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d62e26b3

05 8月, 2017 1 次提交

nfit, libnvdimm, region: export 'position' in mapping info · 401c0a19

由 Dan Williams 提交于 8月 04, 2017

It is useful to be able to know the position of a DIMM in an
interleave-set. Consider the case where the order of the DIMMs changes
causing a namespace to be invalidated because the interleave-set cookie no
longer matches. If the before and after state of each DIMM position is
known this state debugged by the system owner.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

401c0a19

26 7月, 2017 1 次提交

libnvdimm: Stop using HPAGE_SIZE · 0dd69643

由 Oliver O'Halloran 提交于 6月 27, 2017

Currently libnvdimm uses HPAGE_SIZE as the default alignment for DAX and
PFN devices. HPAGE_SIZE is the default hugetlbfs page size and when
hugetlbfs is disabled it defaults to PAGE_SIZE. Given DAX has more
in common with THP than hugetlbfs we should proably be using
HPAGE_PMD_SIZE, but this is undefined when THP is disabled so lets just
give it a new name.

The other usage of HPAGE_SIZE in libnvdimm is when determining how large
the altmap should be. For the reasons mentioned above it doesn't really
make sense to use HPAGE_SIZE here either. PMD_SIZE seems to be safe to
use in generic code and it happens to match the vmemmap allocation block
on x86 and Power. It's still a hack, but it's a slightly nicer hack.
Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

0dd69643

18 7月, 2017 1 次提交

libnvdimm: fix badblock range handling of ARS range · 4e3f0701

由 Toshi Kani 提交于 7月 07, 2017

__add_badblock_range() does not account sector alignment when
it sets 'num_sectors'.  Therefore, an ARS error record range
spanning across two sectors is set to a single sector length,
which leaves the 2nd sector unprotected.

Change __add_badblock_range() to set 'num_sectors' properly.

Cc: <stable@vger.kernel.org>
Fixes: 0caeef63 ("libnvdimm: Add a poison list and export badblocks")
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

4e3f0701

04 7月, 2017 3 次提交

libnvdimm, namespace: record 'lbasize' for pmem namespaces · 2de5148f

由 Dan Williams 提交于 7月 03, 2017

Commit f979b13c "libnvdimm, label: honor the lba size specified in
v1.2 labels") neglected to update the 'lbasize' in the label when the
namespace sector_size attribute was written. We need this value in the
label for inter-OS / pre-OS compatibility.

Fixes: f979b13c ("libnvdimm, label: honor the lba size specified in v1.2 labels")
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

2de5148f

block: guard bvec iteration logic · b1fb2c52

由 Dmitry Monakhov 提交于 6月 29, 2017

Currently if some one try to advance bvec beyond it's size we simply
dump WARN_ONCE and continue to iterate beyond bvec array boundaries.
This simply means that we endup dereferencing/corrupting random memory
region.

Sane reaction would be to propagate error back to calling context
But bvec_iter_advance's calling context is not always good for error
handling. For safity reason let truncate iterator size to zero which
will break external iteration loop which prevent us from unpredictable
memory range corruption. And even it caller ignores an error, it will
corrupt it's own bvecs, not others.

This patch does:
- Return error back to caller with hope that it will react on this
- Truncate iterator size

Code was added long time ago here 4550dd6c, luckily no one hit it
in real life :)
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: NMing Lei <ming.lei@redhat.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
[hch: switch to true/false returns instead of errno values]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

b1fb2c52

bio-integrity: fold bio_integrity_enabled to bio_integrity_prep · e23947bd

由 Dmitry Monakhov 提交于 6月 29, 2017

Currently all integrity prep hooks are open-coded, and if prepare fails
we ignore it's code and fail bio with EIO. Let's return real error to
upper layer, so later caller may react accordingly.

In fact no one want to use bio_integrity_prep() w/o bio_integrity_enabled,
so it is reasonable to fold it in to one function.
Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
[hch: merged with the latest block tree,
	return bool from bio_integrity_prep]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e23947bd

01 7月, 2017 4 次提交

libnvdimm: passthru functions clear to send · 53b85a44

由 Jerry Hoemann 提交于 6月 30, 2017

Have dsm functions called via the pass thru mechanism also
be checked against clear to send.
Signed-off-by: NJerry Hoemann <jerry.hoemann@hpe.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

53b85a44

libnvdimm, btt: convert some info messages to warn/err · e6be2dcb

由 Vishal Verma 提交于 6月 30, 2017

Some critical messages such as IO errors, metadata failures were printed
with dev_info. Make them louder by upgrading them to dev_warn or
dev_error.
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

e6be2dcb

libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime · 6aa734a2

由 Dan Williams 提交于 6月 30, 2017

We need to hold a reference on the 'dirent' until we are sure there are
no more notifications that will be sent. As noted in the new comments we
take advantage of the fact that the references are taken and dropped
under device_lock() and that nd_device_notify() holds device_lock() over
new badblocks notifications. The notifications that happen when
badblocks are cleared only occur while the device is active.

Also take the opportunity to fix up the error messages to report the
user visible effect of a sysfs_get_dirent() failure.

Fixes: 975750a9 ("libnvdimm, pmem: Add sysfs notifications to badblocks")
Cc: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

6aa734a2

libnvdimm: fix the clear-error check in nsio_rw_bytes · 7e5a21df

由 Vishal Verma 提交于 6月 30, 2017

A leftover from the 'bandaid' fix that disabled BTT error clearing in
rw_bytes resulted in an incorrect check. After we converted these checks
over to use the NVDIMM_IO_ATOMIC flag, the ndns->claim check was both
redundant, and incorrect. Remove it.

Fixes: 3ae3d67b ("libnvdimm: add an atomic vs process context flag to rw_bytes")
Cc: <stable@vger.kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

7e5a21df

30 6月, 2017 2 次提交

libnvdimm, btt: fix btt_rw_page not returning errors · c13c43d5

由 Vishal Verma 提交于 6月 29, 2017

btt_rw_page was not propagating errors frm btt_do_bvec, resulting in any
IO errors via the rw_page path going unnoticed. the pmem driver recently
fixed this in e10624f8 pmem: fail io-requests to known bad blocks
but same problem in BTT went neglected.

Fixes: 5212e11f ("nd_btt: atomic sector updates")
Cc: <stable@vger.kernel.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

c13c43d5

acpi, nfit: quiet invalid block-aperture-region warnings · d5d51fec

由 Dan Williams 提交于 6月 29, 2017

This state is already visible by userspace since the BLK region will not
be enabled, and it is otherwise benign as it usually indicates that the
DIMM is not configured.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

d5d51fec

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功