提交 · 8401c72c593d2be8607d2a0a4551ee5c867d6f2f · openeuler / raspberrypi-kernel

22 3月, 2018 2 次提交

libnvdimm, nfit: fix persistence domain reporting · fe9a552e

由 Dan Williams 提交于 3月 21, 2018

The persistence domain is a point in the platform where once writes
reach that destination the platform claims it will make them persistent
relative to power loss. In the ACPI NFIT this is currently communicated
as 2 bits in the "NFIT - Platform Capabilities Structure". The bits
comprise a hierarchy, i.e. bit0 "CPU Cache Flush to NVDIMM Durability on
Power Loss Capable" implies bit1 "Memory Controller Flush to NVDIMM
Durability on Power Loss Capable".

Commit 96c3a239 "libnvdimm: expose platform persistence attr..."
shows the persistence domain as flags, but it's really an enumerated
hierarchy.

Fix this newly introduced user ABI to show the closest available
persistence domain before userspace develops dependencies on seeing, or
needing to develop code to tolerate, the raw NFIT flags communicated
through the libnvdimm-generic region attribute.

Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
Reviewed-by: NDave Jiang <dave.jiang@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

fe9a552e

libnvdimm, region: hide persistence_domain when unknown · 896196dc

由 Dan Williams 提交于 3月 21, 2018

Similar to other region attributes, do not emit the persistence_domain
attribute if its contents are empty.

Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
Cc: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

896196dc

14 3月, 2018 1 次提交

libnvdimm: remove redundant assignment to pointer 'dev' · 0cbfeef2

由 Colin Ian King 提交于 2月 05, 2018

Pointer dev is being assigned a value that is never read, it is being
re-assigned the same value later on, hence the initialization is redundant
and can be removed.

Cleans up clang warning:
drivers/nvdimm/pfn_devs.c:307:17: warning: Value stored to 'dev' during
its initialization is never read
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

0cbfeef2

08 3月, 2018 1 次提交

libnvdimm, {btt, blk}: do integrity setup before add_disk() · 3ffb0ba9

由 Vishal Verma 提交于 3月 05, 2018

Prior to 25520d55 ("block: Inline blk_integrity in struct gendisk")
we needed to temporarily add a zero-capacity disk before registering for
blk-integrity. But adding a zero-capacity disk caused the partition
table scanning to bail early, and this resulted in partitions not coming
up after a probe of the BTT or blk namespaces.

We can now register for integrity before the disk has been added, and
this fixes the rescan problems.

Fixes: 25520d55 ("block: Inline blk_integrity in struct gendisk")
Reported-by: NDariusz Dokupil <dariusz.dokupil@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

3ffb0ba9

03 3月, 2018 1 次提交

libnvdimm: re-enable deep flush for pmem devices via fsync() · 5fdf8e5b

由 Dave Jiang 提交于 3月 02, 2018

Re-enable deep flush so that users always have a way to be sure that a
write makes it all the way out to media. Writes from the PMEM driver
always arrive at the NVDIMM since movnt is used to bypass the cache, and
the driver relies on the ADR (Asynchronous DRAM Refresh) mechanism to
flush write buffers on power failure. The Deep Flush mechanism is there
to explicitly write buffers to protect against (rare) ADR failure. This
change prevents a regression in deep flush behavior so that applications
can continue to depend on fsync() as a mechanism to trigger deep flush
in the filesystem-DAX case.

Fixes: 06e8ccda ("acpi: nfit: Add support for detect platform CPU cache...")
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NDave Jiang <dave.jiang@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

5fdf8e5b

03 2月, 2018 1 次提交

libnvdimm, namespace: remove redundant initialization of 'nd_mapping' · 59858d3d

由 Colin Ian King 提交于 1月 30, 2018

Pointer nd_mapping is being initialized to a value that is never read,
instead it is being updated to a new value in all the cases where it
is being read afterwards, hence the initialization is redundant and
can be removed.

Cleans up clang warning:
drivers/nvdimm/namespace_devs.c:2411:21: warning: Value stored to
'nd_mapping' during its initialization is never rea
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>

59858d3d

02 2月, 2018 2 次提交

libnvdimm: expose platform persistence attribute for nd_region · 96c3a239

由 Dave Jiang 提交于 1月 31, 2018

Providing a sysfs attribute for nd_region that shows the persistence
capabilities for the platform.
Signed-off-by: NDave Jiang <dave.jiang@intel.com>
Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>

96c3a239

acpi: nfit: Add support for detect platform CPU cache flush on power loss · 06e8ccda

由 Dave Jiang 提交于 1月 31, 2018

In ACPI 6.2a the platform capability structure has been added to the NFIT
tables. That provides software the ability to determine whether a system
supports the auto flushing of CPU caches on power loss. If the capability
is supported, we do not need to do dax_flush(). Plumbing the path to set the
property on per region from the NFIT tables.

This patch depends on the ACPI NFIT 6.2a platform capabilities support code
in include/acpi/actbl1.h.
Signed-off-by: NDave Jiang <dave.jiang@intel.com>
Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>

06e8ccda

20 1月, 2018 1 次提交

libnvdimm, btt: fix uninitialized err_lock · d08cd5e0

由 Jeff Moyer 提交于 12月 13, 2017

When a sector mode namespace is initially created, the arena's err_lock
is not initialized. If, on the other hand, the namespace already
exists, the mutex is initialized. To fix the issue, I moved the mutex
initialization into the arena_alloc, which is called by both
discover_arenas and create_arenas.

This was discovered on an older kernel where mutex_trylock checks the
count to determine whether the lock is held. Because the data structure
is kzalloc-d, that count was 0 (held), and I/O to the device would hang
forever waiting for the lock to be released (see btt_write_pg, for
example). Current kernels have a different mutex implementation that
checks for a non-null owner, and so this doesn't show up as a problem.
If that lock were ever contended, it might cause issues, but you'd have
to be really unlucky, I think.
Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

d08cd5e0

09 1月, 2018 1 次提交

memremap: change devm_memremap_pages interface to use struct dev_pagemap · e8d51348

由 Christoph Hellwig 提交于 12月 29, 2017

This new interface is similar to how struct device (and many others)
work. The caller initializes a 'struct dev_pagemap' as required
and calls 'devm_memremap_pages'. This allows the pagemap structure to
be embedded in another structure and thus container_of can be used. In
this way application specific members can be stored in a containing
struct.

This will be used by the P2P infrastructure and HMM could probably
be cleaned up to use it as well (instead of having it's own, similar
'hmm_devmem_pages_create' function).
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

e8d51348

22 12月, 2017 2 次提交

libnvdimm, btt: Fix an incompatibility in the log layout · 24e3a7fb

由 Vishal Verma 提交于 12月 18, 2017

Due to a spec misinterpretation, the Linux implementation of the BTT log
area had different padding scheme from other implementations, such as
UEFI and NVML.

This fixes the padding scheme, and defaults to it for new BTT layouts.
We attempt to detect the padding scheme in use when probing for an
existing BTT. If we detect the older/incompatible scheme, we continue
using it.
Reported-by: NJuston Li <juston.li@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Fixes: 5212e11f ("nd_btt: atomic sector updates")
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

24e3a7fb

libnvdimm, btt: add a couple of missing kernel-doc lines · 13b7954c

由 Vishal Verma 提交于 12月 14, 2017

Recent updates to btt.h neglected to add corresponding kernel-doc lines
for new structure members. Add them.
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

13b7954c

20 12月, 2017 2 次提交

libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment · 41fce90f

由 Dan Williams 提交于 12月 04, 2017

The following namespace configuration attempt:

    # ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f
    libndctl: ndctl_dax_enable: dax0.1: failed to enable
      Error: namespace0.0: failed to enable

    failed to reconfigure namespace: No such device or address

...fails when the backing memory range is not physically aligned to 1G:

    # cat /proc/iomem | grep Persistent
    210000000-30fffffff : Persistent Memory (legacy)

In the above example the 4G persistent memory range starts and ends on a
256MB boundary.

We handle this case correctly when needing to handle cases that violate
section alignment (128MB) collisions against "System RAM", and we simply
need to extend that padding/truncation for the 1GB alignment use case.

Cc: <stable@vger.kernel.org>
Fixes: 315c5625 ("libnvdimm, pfn: add 'align' attribute...")
Reported-and-tested-by: NJane Chu <jane.chu@oracle.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

41fce90f

libnvdimm, pfn: fix start_pad handling for aligned namespaces · 19deaa21

由 Dan Williams 提交于 12月 19, 2017

The alignment checks at pfn driver startup fail to properly account for
the 'start_pad' in the case where the namespace is misaligned relative
to its internal alignment. This is typically triggered in 1G aligned
namespace, but could theoretically trigger with small namespace
alignments. When this triggers the kernel reports messages of the form:

dax2.1: bad offset: 0x3c000000 dax disabled align: 0x40000000

Cc: <stable@vger.kernel.org>
Fixes: 1ee6667c ("libnvdimm, pfn, dax: fix initialization vs autodetect...")
Reported-by: NJane Chu <jane.chu@oracle.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

19deaa21

05 12月, 2017 1 次提交

nfit, libnvdimm: deprecate the generic SMART ioctl · cdd77d3e

由 Dan Williams 提交于 11月 17, 2017

The kernel's ND_IOCTL_SMART_THRESHOLD command is based on a payload
definition that has become broken / out-of-sync with recent versions of
the NVDIMM_FAMILY_INTEL definition. Deprecate the use of the
ND_IOCTL_SMART_THRESHOLD command in favor of the ND_CMD_CALL approach
taken by NVDIMM_FAMILY_{HPE,MSFT}, where we can manage the per-vendor
variance in userspace.

In a couple years, when the new scheme is widely deployed in userspace
packages, the ND_IOCTL_SMART_THRESHOLD support can be removed. For now
we prevent new binaries from compiling against the kernel header
definitions, but kernel still compatible with old binaries. The
libndctl.h [1] header is now the authoritative interface definition for
NVDIMM SMART.

[1]: https://github.com/pmem/ndctlSigned-off-by: NDan Williams <dan.j.williams@intel.com>

cdd77d3e

16 11月, 2017 1 次提交

bdi: introduce BDI_CAP_SYNCHRONOUS_IO · 23c47d2a

由 Minchan Kim 提交于 11月 15, 2017

As discussed at

  https://lkml.kernel.org/r/<20170728165604.10455-1-ross.zwisler@linux.intel.com>

someday we will remove rw_page().  If so, we need something to detect
such super-fast storage on which synchronous IO operations like the
current rw_page are always a win.

Introduces BDI_CAP_SYNCHRONOUS_IO to indicate such devices.  With it, we
could use various optimization techniques.

Link: http://lkml.kernel.org/r/1505886205-9671-3-git-send-email-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

23c47d2a

03 11月, 2017 2 次提交

libnvdimm, badrange: remove a WARN for list_empty · 89360b87

由 Vishal Verma 提交于 10月 30, 2017

Now that we're reusing the badrange functions for nfit_test, and that
exposes badrange injection/clearing to userspace via the DSM paths, it
is plausible that a user may call the clear DSM multiple times. Since it
is harmless to do so, we can remove the WARN in badrange_forget.

Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

89360b87

libnvdimm: move poison list functions to a new 'badrange' file · aa9ad44a

由 Dave Jiang 提交于 8月 23, 2017

nfit_test needs to use the poison list manipulation code as well. Make
it more generic and in the process rename poison to badrange, and move
all the related helpers to a new file.
Signed-off-by: NDave Jiang <dave.jiang@intel.com>
[vishal: Add badrange.o to nfit_test's Kbuild]
[vishal: add a missed include in bus.c for the new badrange functions]
[vishal: rename all instances of 'be' to 'bre']
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

aa9ad44a

02 11月, 2017 1 次提交

License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318

由 Greg Kroah-Hartman 提交于 11月 01, 2017

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b2441318

12 10月, 2017 1 次提交

treewide: Fix typos in Kconfig · 83fc61a5

由 Masanari Iida 提交于 9月 26, 2017

This patch fixes some spelling typos found in Kconfig files.
Signed-off-by: NMasanari Iida <standby24x7@gmail.com>
Acked-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

83fc61a5

08 10月, 2017 3 次提交

libnvdimm, namespace: make a couple of functions static · 65853a1d

由 Colin Ian King 提交于 10月 05, 2017

The functions create_namespace_pmem and create_namespace_blk are local
to the source and do not need to be in global scope, so make them static.

Cleans up sparse warnings:
symbol 'create_namespace_pmem' was not declared. Should it be static?
symbol 'create_namespace_blk' was not declared. Should it be static?
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

65853a1d

libnvdimm: introduce 'flags' attribute for DIMM 'lock' and 'alias' status · efbf6f50

由 Dan Williams 提交于 9月 25, 2017

Given that we now how have two mechanisms for a DIMM to indicate that it
is locked:

    * NVDIMM_FAMILY_INTEL 'get_config_size' _DSM command

    * ACPI 6.2 Label Storage Read / Write commands

...export the generic libnvdimm DIMM status in a new 'flags' attribute.

This attribute can also reflect the 'alias' state which indicates
whether the nvdimm core is enforcing labels for aliased-region-capacity
that the given dimm is an interleave-set member.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

efbf6f50

acpi, nfit: add support for the _LSI, _LSR, and _LSW label methods · 4b27db7e

由 Dan Williams 提交于 9月 24, 2017

ACPI 6.2 adds support for named methods to access the label storage area
of an NVDIMM. We prefer these new methods if available and otherwise
fallback to the NVDIMM_FAMILY_INTEL _DSMs. The kernel ioctls,
ND_IOCTL_{GET,SET}_CONFIG_{SIZE,DATA}, remain generic and the driver
translates the 'package' payloads into the NVDIMM_FAMILY_INTEL 'buffer'
format to maintain compatibility with existing userspace and keep the
output buffer parsing code in the driver common.

The output payloads are mostly compatible save for the 'label area
locked' status that moves from the 'config_size' (_LSI) command to the
'config_read' (_LSR) command status.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

4b27db7e

29 9月, 2017 5 次提交

libnvdimm, namespace: fix label initialization to use valid seq numbers · b18d4b8a

由 Dan Williams 提交于 9月 26, 2017

The set of valid sequence numbers is {1,2,3}. The specification
indicates that an implementation should consider 0 a sign of a critical
error:

    UEFI 2.7: 13.19 NVDIMM Label Protocol

    Software never writes the sequence number 00, so a correctly
    check-summed Index Block with this sequence number probably indicates a
    critical error. When software discovers this case it treats it as an
    invalid Index Block indication.

While the expectation is that the invalid block is just thrown away, the
Robustness Principle says we should fix this to make both sequence
numbers valid.

Fixes: f524bf27 ("libnvdimm: write pmem label set")
Cc: <stable@vger.kernel.org>
Reported-by: NJuston Li <juston.li@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

b18d4b8a

libnvdimm, pfn: make 'resource' attribute only readable by root · 26417ae4

由 Dan Williams 提交于 9月 26, 2017

For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for pfn
devices only readable by root. Otherwise we disclose physical address
information.

Fixes: f6ed58c7 ("libnvdimm, pfn: 'resource'-address and 'size'...")
Cc: <stable@vger.kernel.org>
Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

26417ae4

libnvdimm, namespace: make 'resource' attribute only readable by root · c1fb3542

由 Dan Williams 提交于 9月 26, 2017

For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for
namespace devices only readable by root. Otherwise we disclose physical
address information.

Fixes: bf9bccc1 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: <stable@vger.kernel.org>
Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

c1fb3542

libnvdimm, region : make 'resource' attribute only readable by root · b8ff981f

由 Dan Williams 提交于 9月 26, 2017

For the same reason that /proc/iomem returns 0's for non-root readers
and acpi tables are root-only, make the 'resource' attribute for region
devices only readable by root. Otherwise we disclose physical address
information.

Fixes: 802f4be6 ("libnvdimm: Add 'resource' sysfs attribute to regions")
Cc: <stable@vger.kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

b8ff981f

libnvdimm, dimm: clear 'locked' status on successful DIMM enable · d34cb808

由 Dan Williams 提交于 9月 25, 2017

If we successfully enable a DIMM then it must not be locked and we can
clear the label-read failure condition. Otherwise, we need to reload the
entire bus provider driver to achieve the same effect, and that can
disrupt unrelated DIMMs and namespaces.

Fixes: 9d62ed96 ("libnvdimm: handle locked label storage areas")
Cc: <stable@vger.kernel.org>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

d34cb808

19 9月, 2017 1 次提交

libnvdimm, namespace: fix btt claim class crash · 33a56086

由 Dan Williams 提交于 9月 18, 2017

Maurice reports:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
    IP: holder_class_store+0x253/0x2b0 [libnvdimm]

...while trying to reconfigure an NVDIMM-N namespace into 'sector' /
'btt' mode. The crash points to this line:

    (gdb) li *(holder_class_store+0x253)
    0x7773 is in holder_class_store (drivers/nvdimm/namespace_devs.c:1420).
    1415            for (i = 0; i < nd_region->ndr_mappings; i++) {
    1416                    struct nd_mapping *nd_mapping = &nd_region->mapping[i];
    1417                    struct nvdimm_drvdata *ndd = to_ndd(nd_mapping);
    1418                    struct nd_namespace_index *nsindex;
    1419
    1420                    nsindex = to_namespace_index(ndd, ndd->ns_current);

...where we are failing because ndd is NULL due to NVDIMM-N dimms not
supporting labels.

Long story short, default to the BTTv1 format in the label-less /
NVDIMM-N case.

Fixes: 14e49454 ("libnvdimm, btt: BTT updates for UEFI 2.7 format")
Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Reported-by: NMaurice A. Saldivar <maurice.a.saldivar@hpe.com>
Tested-by: NMaurice A. Saldivar <maurice.a.saldivar@hpe.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

33a56086

11 9月, 2017 1 次提交

dax: remove the pmem_dax_ops->flush abstraction · c3ca015f

由 Mikulas Patocka 提交于 8月 31, 2017

Commit abebfbe2 ("dm: add ->flush() dax operation support") is
buggy. A DM device may be composed of multiple underlying devices and
all of them need to be flushed. That commit just routes the flush
request to the first device and ignores the other devices.

It could be fixed by adding more complex logic to the device mapper. But
there is only one implementation of the method pmem_dax_ops->flush - that
is pmem_dax_flush() - and it calls arch_wb_cache_pmem(). Consequently, we
don't need the pmem_dax_ops->flush abstraction at all, we can call
arch_wb_cache_pmem() directly from dax_flush() because dax_dev->ops->flush
can't ever reach anything different from arch_wb_cache_pmem().

It should be also pointed out that for some uses of persistent memory it
is needed to flush only a very small amount of data (such as 1 cacheline),
and it would be overkill if we go through that device mapper machinery for
a single flushed cache line.

Fix this by removing the pmem_dax_ops->flush abstraction and call
arch_wb_cache_pmem() directly from dax_flush(). Also, remove the device
mapper code that forwards the flushes.

Fixes: abebfbe2 ("dm: add ->flush() dax operation support")
Cc: stable@vger.kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Reviewed-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

c3ca015f

10 9月, 2017 1 次提交

libnvdimm, btt: fix format string warnings · 04c3c982

由 Randy Dunlap 提交于 9月 08, 2017

Fix format warnings (seen on i386) in nvdimm/btt.c:

../drivers/nvdimm/btt.c: In function ‘btt_map_init’:
../drivers/nvdimm/btt.c:430:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t’ [-Wformat=]
   dev_WARN_ONCE(to_dev(arena), size < 512,
   ^
../drivers/nvdimm/btt.c: In function ‘btt_log_init’:
../drivers/nvdimm/btt.c:474:3: warning: format ‘%lx’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘size_t’ [-Wformat=]
   dev_WARN_ONCE(to_dev(arena), size < 512,
   ^

Fixes: 86652d2e ("libnvdimm, btt: clean up warning and error messages")
Reported-by: NArnd Bergmann <arnd@arndb.de>
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

04c3c982

08 9月, 2017 1 次提交

libnvdimm, btt: clean up warning and error messages · 86652d2e

由 Vishal Verma 提交于 9月 05, 2017

Convert all WARN* style messages to dev_WARN, and for errors in the IO
paths, use dev_err_ratelimited. Also remove some BUG_ONs in the IO path
and replace them with the above - no need to crash the machine in case
of an unaligned IO.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

86652d2e

07 9月, 2017 1 次提交

block, THP: make block_device_operations.rw_page support THP · 98cc093c

由 Huang Ying 提交于 9月 06, 2017

The .rw_page in struct block_device_operations is used by the swap
subsystem to read/write the page contents from/into the corresponding
swap slot in the swap device.  To support the THP (Transparent Huge
Page) swap optimization, the .rw_page is enhanced to support to
read/write THP if possible.

Link: http://lkml.kernel.org/r/20170724051840.2309-6-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Ross Zwisler <ross.zwisler@intel.com> [for brd.c, zram_drv.c, pmem.c]
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal L Verma <vishal.l.verma@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

98cc093c

05 9月, 2017 1 次提交

libnvdimm, nfit: move the check on nd_reserved2 to the endpoint · 9edcad53

由 Meng Xu 提交于 9月 04, 2017

Delay the check of nd_reserved2 to the actual endpoint (acpi_nfit_ctl)
that uses it, as a prevention of a potential double-fetch bug.

While examining the kernel source code, I found a dangerous operation that
could turn into a double-fetch situation (a race condition bug) where
the same userspace memory region are fetched twice into kernel with sanity
checks after the first fetch while missing checks after the second fetch.

In the case of _IOC_NR(ioctl_cmd) == ND_CMD_CALL:

1. The first fetch happens in line 935 copy_from_user(&pkg, p, sizeof(pkg)

2. subsequently `pkg.nd_reserved2` is asserted to be all zeroes
(line 984 to 986).

3. The second fetch happens in line 1022 copy_from_user(buf, p, buf_len)

4. Given that `p` can be fully controlled in userspace, an attacker can
race condition to override the header part of `p`, say,
`((struct nd_cmd_pkg *)p)->nd_reserved2` to arbitrary value
(say nine 0xFFFFFFFF for `nd_reserved2`) after the first fetch but before the
second fetch. The changed value will be copied to `buf`.

5. There is no checks on the second fetches until the use of it in
line 1034: nd_cmd_clear_to_send(nvdimm_bus, nvdimm, cmd, buf) and
line 1038: nd_desc->ndctl(nd_desc, nvdimm, cmd, buf, buf_len, &cmd_rc)
which means that the assumed relation, `p->nd_reserved2` are all zeroes might
not hold after the second fetch. And once the control goes to these functions
we lose the context to assert the assumed relation.

6. Based on my manual analysis, `p->nd_reserved2` is not used in function
`nd_cmd_clear_to_send` and potential implementations of `nd_desc->ndctl`
so there is no working exploit against it right now. However, this could
easily turns to an exploitable one if careless developers start to use
`p->nd_reserved2` later and assume that they are all zeroes.

Move the validation of the nd_reserved2 field to the ->ndctl()
implementation where it has a stable buffer to evaluate.
Signed-off-by: NMeng Xu <mengxu.gatech@gmail.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

9edcad53

01 9月, 2017 6 次提交

libnvdimm: fix integer overflow static analysis warning · 58738c49

由 Dan Williams 提交于 8月 31, 2017

Dan reports:
    The patch 62232e45: "libnvdimm: control (ioctl) messages for
    nvdimm_bus and nvdimm devices" from Jun 8, 2015, leads to the
    following static checker warning:

            drivers/nvdimm/bus.c:1018 __nd_ioctl()
            warn: integer overflows 'buf_len'

    From a casual review, this seems like it might be a real bug.  On
    the first iteration we load some data into in_env[].  On the second
    iteration we read a use controlled "in_size" from nd_cmd_in_size().
    It can go up to UINT_MAX - 1.  A high number means we will fill the
    whole in_env[] buffer.  But we potentially keep looping and adding
    more to in_len so now it can be any value.

    It simple enough to change, but it feels weird that we keep looping
    even though in_env is totally full.  Shouldn't we just return an
    error if we don't have space for desc->in_num.

We keep looping because the size of the total input is allowed to be
bigger than the 'envelope' which is a subset of the payload that tells
us how much data to expect. For safety explicitly check that buf_len
does not overflow which is what the checker flagged.

Cc: <stable@vger.kernel.org>
Fixes: 62232e45: "libnvdimm: control (ioctl) messages for nvdimm_bus..."
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

58738c49

libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7

由 Robin Murphy 提交于 8月 31, 2017

mmio_flush_range() suffers from a lack of clearly-defined semantics,
and is somewhat ambiguous to port to other architectures where the
scope of the writeback implied by "flush" and ordering might matter,
but MMIO would tend to imply non-cacheable anyway. Per the rationale
in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the
only existing use is actually to invalidate clean cache lines for
ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
cleanup of the pmem API, that also now happens to be the exact purpose
of arch_invalidate_pmem(), which would be a far more well-defined tool
for the job.

Rather than risk potentially inconsistent implementations of
mmio_flush_range() for the sake of one callsite, streamline things by
removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
definitions up to the libnvdimm level, so they can be shared by NFIT
as well. This allows NFIT to be enabled for arm64.
Signed-off-by: NRobin Murphy <robin.murphy@arm.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

5deb67f7

libnvdimm, btt: rework error clearing · d9b83c75

由 Vishal Verma 提交于 8月 30, 2017

Clearing errors or badblocks during a BTT write requires sending an ACPI
DSM, which means potentially sleeping. Since a BTT IO happens in atomic
context (preemption disabled, spinlocks may be held), we cannot perform
error clearing in the course of an IO. Due to this error clearing for
BTT IOs has hitherto been disabled.

In this patch we move error clearing out of the atomic section, and thus
re-enable error clearing with BTTs. When we are about to add a block to
the free list, we check if it was previously marked as an error, and if
it was, we add it to the freelist, but also set a flag that says error
clearing will be required. We then drop the lane (ending the atomic
context), and send a zero buffer so that the error can be cleared. The
error flag in the free list is protected by the nd 'lane', and is set
only be a thread while it holds that lane. When the error is cleared,
the flag is cleared, but while holding a mutex for that freelist index.

When writing, we check for two things -
1/ If the freelist mutex is held or if the error flag is set. If so,
this is an error block that is being (or about to be) cleared.
2/ If the block is a known badblock based on nsio->bb

The second check is required because the BTT map error flag for a map
entry only gets set when an error LBA is read. If we write to a new
location that may not have the map error flag set, but still might be in
the region's badblock list, we can trigger an EIO on the write, which is
undesirable and completely avoidable.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

d9b83c75

libnvdimm: fix potential deadlock while clearing errors · 0930a750

由 Vishal Verma 提交于 8月 30, 2017

With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths.
Specifically, the DSM to clear media errors is called during writes, so
that we can provide a writes-fix-errors model.

However it is easy to imagine a scenario like:
 -> write through the nvdimm driver
   -> acpi allocation
     -> writeback, causes more IO through the nvdimm driver
       -> deadlock

Fix this by using memalloc_noio_{save,restore}, which sets the GFP_NOIO
flag for the current scope when issuing commands/IOs that are expected
to clear errors.

Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

0930a750

libnvdimm, btt: cache sector_size in arena_info · 75892004

由 Vishal Verma 提交于 8月 30, 2017

In preparation for the error clearing rework, add sector_size in the
arena_info struct.
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

75892004

libnvdimm, btt: ensure that flags were also unchanged during a map_read · 1398199d

由 Vishal Verma 提交于 8月 30, 2017

In btt_map_read, we read the map twice to make sure that the map entry
didn't change after we added it to the read tracking table. In
anticipation of expanding the use of the error bit, also make sure that
the error and zero flags are constant across the two map reads.
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

1398199d