1. 13 Jul 2016 (1 commit)
  2. 12 Jul 2016 (2 commits)
    • libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() · 7e267a8c
      Committed by Dan Williams
      Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer
      chasing through nd_region), and that we otherwise assume a platform has
      ADR capability when flush hints are not present, move nvdimm_flush() to
      REQ_FLUSH context.
      
      Note that we still arrange for nvdimm_flush() to be called even in the
      ADR case. We need at least one wmb() fence to push buffered writes in
      the cpu out to the ADR protected domain.
      
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      7e267a8c
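      Below, an editor's sketch in C of the flush placement described above, assuming
      4.7-era bio flag names (bi_rw, REQ_FLUSH, REQ_FUA) and a caller that has already
      resolved the nd_region; the function name is hypothetical, not the in-tree
      pmem_make_request().

      #include <linux/bio.h>
      #include <linux/libnvdimm.h>

      static void sketch_handle_bio(struct nd_region *nd_region, struct bio *bio)
      {
              /* flush request: push prior buffered writes out to the ADR protected domain */
              if (bio->bi_rw & REQ_FLUSH)
                      nvdimm_flush(nd_region);

              /* ... transfer the bio's data to or from the pmem mapping ... */

              /* forced unit access: make this write durable before completing the bio */
              if (bio->bi_rw & REQ_FUA)
                      nvdimm_flush(nd_region);

              bio_endio(bio);
      }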
    • libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() · f284a4f2
      Committed by Dan Williams
      nvdimm_flush() is a replacement for the x86 'pcommit' instruction.  It is
      an optional write flushing mechanism that an nvdimm bus can provide for
      the pmem driver to consume.  In the case of the NFIT nvdimm-bus-provider
      nvdimm_flush() is implemented as a series of flush-hint-address [1]
      writes to each dimm in the interleave set (region) that backs the
      namespace.
      
      The nvdimm_has_flush() routine relies on platform firmware to describe
      the flushing capabilities of a platform.  It uses the heuristic of
      whether an nvdimm bus provider provides flush address data to return a
      ternary result:
      
            1: flush addresses defined
            0: dimm topology described without flush addresses (assume ADR)
       -errno: no topology information, unable to determine flush mechanism
      
      The pmem driver is expected to take the following actions on this ternary
      result:
      
            1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown
            0: do not set WC or FUA on the queue, take no further action
       -errno: warn and then operate as if nvdimm_has_flush() returned '0'
      
      The caveat of this heuristic is that it cannot distinguish the "dimm does
      not have a flush address" case from the "platform firmware is broken and
      failed to describe a flush address" case.  Given that we are already
      explicitly trusting the NFIT, there is not much more we can do beyond
      blacklisting broken firmware if it is ever encountered.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      f284a4f2
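      Below, a hedged sketch in C of how a pmem-like driver might act on that ternary
      result at probe time, assuming the blk_queue_write_cache() helper available since
      4.7; the function name is hypothetical and this is not the literal
      pmem_attach_disk() code.

      #include <linux/blkdev.h>
      #include <linux/device.h>
      #include <linux/libnvdimm.h>

      static void sketch_setup_flush(struct device *dev, struct nd_region *nd_region,
                      struct request_queue *q)
      {
              int has_flush = nvdimm_has_flush(nd_region);

              /* -errno: no topology information, warn and fall back to the ADR assumption */
              if (has_flush < 0)
                      dev_warn(dev, "unable to determine flush mechanism, assuming ADR\n");

              /*
               * 1: advertise a write cache plus FUA so the block layer issues
               *    REQ_FLUSH / REQ_FUA and nvdimm_flush() gets called;
               * 0 / -errno: no write cache, take no further action
               */
              blk_queue_write_cache(q, has_flush > 0, has_flush > 0);
      }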
  3. 25 Jun 2016 (1 commit)
    • libnvdimm, pmem: allow nfit_test to override pmem_direct_access() · f295e53b
      Committed by Dan Williams
      Currently phys_to_pfn_t() is an exported symbol to allow nfit_test to
      override it and indicate that nfit_test-pmem is not device-mapped.  Now,
      we want to enable nfit_test to operate without DMA_CMA and the pmem it
      provides will no longer be physically contiguous, i.e. won't be capable
      of supporting direct_access requests larger than a page.  Make
      pmem_direct_access() a weak symbol so that it can be replaced by the
      tools/testing/nvdimm/ version, and move phys_to_pfn_t() to a static
      inline now that it no longer needs to be overridden.
      Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      f295e53b
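      Below, a generic sketch of the weak-symbol override pattern this relies on; the
      function name and return values are hypothetical (the in-tree symbol being
      overridden is pmem_direct_access()), and the two definitions live in separate
      objects that get linked together.

      /* driver object: default definition, marked overridable at link time */
      long __weak sketch_direct_access(void *addr, long size)
      {
              return size;    /* real pmem: the whole physically contiguous range */
      }

      /* tools/testing/nvdimm/ object: a strong definition replaces the weak one */
      long sketch_direct_access(void *addr, long size)
      {
              return min_t(long, size, PAGE_SIZE);    /* test pmem is not contiguous */
      }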
  4. 16 Jun 2016 (1 commit)
  5. 21 May 2016 (1 commit)
  6. 19 May 2016 (1 commit)
  7. 07 May 2016 (1 commit)
  8. 01 May 2016 (1 commit)
    • libnvdimm, pfn: fix memmap reservation sizing · 658922e5
      Committed by Dan Williams
      When configuring a pfn-device instance to allocate the memmap array it
      needs to account for the fact that vmemmap_populate_hugepages()
      allocates struct page blocks in HPAGE_SIZE chunks.  We need to align the
      reserved area size to 2MB otherwise arch_add_memory() runs out of memory
      while establishing the memmap:
      
       WARNING: CPU: 0 PID: 496 at arch/x86/mm/init_64.c:704 arch_add_memory+0xe7/0xf0
       [..]
       Call Trace:
        [<ffffffff8148bdb3>] dump_stack+0x85/0xc2
        [<ffffffff810a749b>] __warn+0xcb/0xf0
        [<ffffffff810a75cd>] warn_slowpath_null+0x1d/0x20
        [<ffffffff8106a497>] arch_add_memory+0xe7/0xf0
        [<ffffffff811d2097>] devm_memremap_pages+0x287/0x450
        [<ffffffff811d1ffa>] ? devm_memremap_pages+0x1ea/0x450
        [<ffffffffa0000298>] __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
        [<ffffffffa0047a58>] pmem_attach_disk+0x318/0x420 [nd_pmem]
        [<ffffffffa0047bcf>] nd_pmem_probe+0x6f/0x90 [nd_pmem]
        [<ffffffffa0009469>] nvdimm_bus_probe+0x69/0x110 [libnvdimm]
       [..]
        ndbus0: nd_pmem.probe(pfn3.0) = -12
       nd_pmem: probe of pfn3.0 failed with error -12
      libndctl: ndctl_pfn_enable: pfn3.0: failed to enable
      Reported-by: Namratha Kothapalli <namratha.n.kothapalli@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      658922e5
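      Below, a sketch of the sizing rule behind the fix (helper name hypothetical,
      SZ_2M standing in for HPAGE_SIZE on x86_64): the struct page reservation is
      rounded up to the 2MB blocks that vmemmap_populate_hugepages() maps.

      #include <linux/kernel.h>
      #include <linux/mm_types.h>
      #include <linux/sizes.h>

      /* bytes to reserve in the pfn-device for 'npfns' worth of struct page */
      static unsigned long sketch_memmap_reserve(unsigned long npfns)
      {
              /* without the ALIGN(), arch_add_memory() can exhaust the reservation
               * part-way through and fail with -ENOMEM, as in the trace above */
              return ALIGN(sizeof(struct page) * npfns, SZ_2M);
      }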
  9. 23 Apr 2016 (9 commits)
  10. 16 Apr 2016 (1 commit)
  11. 08 Apr 2016 (1 commit)
    • libnvdimm, pfn: fix nvdimm_namespace_add_poison() vs section alignment · a3901802
      Committed by Dan Williams
      When section alignment padding is in effect we need to shift / truncate
      the range that is queried for poison by the 'start_pad' or 'end_trunc'
      reservations.
      
      It's easiest if we just pass in an adjusted resource range rather than
      deriving it from the passed in namespace.  With the resource range
      resolution pushed out to the caller we can also push the
      namespace-to-region lookup to the caller and drop the implicit pmem-type
      assumption about the passed in namespace object.
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      a3901802
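      Below, an illustrative sketch of the caller-side adjustment described above
      (function and variable names hypothetical): the range handed to the poison
      lookup is the namespace range shifted by 'start_pad' and truncated by
      'end_trunc', rather than being derived inside the helper.

      #include <linux/ioport.h>
      #include <linux/types.h>

      static void sketch_adjust_range(const struct resource *ns_res,
                      u64 start_pad, u64 end_trunc, struct resource *adjusted)
      {
              adjusted->start = ns_res->start + start_pad;    /* skip the section-alignment pad */
              adjusted->end = ns_res->end - end_trunc;        /* drop the truncated tail */
      }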
  12. 05 Apr 2016 (1 commit)
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.

      This promise never materialized, and it is unlikely that it ever will.

      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it is a constant source of confusion over whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      A global switch to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using the
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.

      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
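      For illustration, a hypothetical before/after of the mechanical conversion the
      semantic patch performs (not taken from any specific file touched by this
      commit):

      /* before */
      pgoff_t index = pos >> PAGE_CACHE_SHIFT;
      page_cache_release(page);

      /* after */
      pgoff_t index = pos >> PAGE_SHIFT;
      put_page(page);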
  13. 29 Mar 2016 (1 commit)
  14. 10 Mar 2016 (3 commits)
  15. 07 Mar 2016 (1 commit)
  16. 06 Mar 2016 (3 commits)
  17. 24 Feb 2016 (1 commit)
    • nvdimm: use 'u64' for pfn flags · c4544205
      Committed by Arnd Bergmann
      A recent bugfix changed pfn_t to always be 64-bit wide, but did not
      change the code in pmem.c, which is now broken on 32-bit architectures
      as reported by gcc:
      
      In file included from ../drivers/nvdimm/pmem.c:28:0:
      drivers/nvdimm/pmem.c: In function 'pmem_alloc':
      include/linux/pfn_t.h:15:17: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
       #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))
      
      This changes the intermediate pfn_flags in struct pmem_device to
      be 64 bit wide as well, so they can store the flags correctly.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: db78c222 ("mm: fix pfn_t vs highmem")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      c4544205
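      Below, a sketch of the shape of the fix, with the structure trimmed to the
      relevant field (names other than pfn_flags and PFN_DEV are hypothetical): the
      flags word must be wide enough for PFN_DEV even where 'unsigned long' is 32 bits.

      #include <linux/pfn_t.h>
      #include <linux/types.h>

      struct sketch_pmem_device {
              /* ... other fields elided ... */
              u64 pfn_flags;                  /* was: unsigned long pfn_flags; */
      };

      static void sketch_set_flags(struct sketch_pmem_device *pmem)
      {
              pmem->pfn_flags = PFN_DEV;      /* bit 61, which overflows a 32-bit long */
      }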
  18. 16 Jan 2016 (6 commits)
    • mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup · 5c2c2587
      Committed by Dan Williams
      get_dev_pagemap() enables paths like get_user_pages() to pin a dynamically
      mapped pfn-range (devm_memremap_pages()) while the resulting struct page
      objects are in use.  Unlike get_page() it may fail if the device is, or
      is in the process of being, disabled.  While the initial lookup of the
      range may be an expensive list walk, the result is cached to speed up
      subsequent lookups which are likely to be in the same mapped range.
      
      devm_memremap_pages() now requires a reference counter to be specified
      at init time.  For pmem this means moving request_queue allocation into
      pmem_alloc() so the existing queue usage counter can track "device
      pages".
      
      ZONE_DEVICE pages always have an elevated count and will never be on an
      lru reclaim list.  That space in 'struct page' can be redirected for
      other uses, but for safety introduce a poison value that will always
      trip __list_add() to assert.  This allows half of the struct list_head
      storage to be reclaimed with some assurance to back up the assumption
      that the page count never goes to zero and a list_add() is never
      attempted.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Logan Gunthorpe <logang@deltatee.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c2c2587
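      Below, a hedged sketch in C of the pin/unpin pattern described above, using the
      two helpers this patch introduces (declared in include/linux/memremap.h); the
      wrapper names are hypothetical.

      #include <linux/memremap.h>

      static struct dev_pagemap *sketch_pin(unsigned long pfn, struct dev_pagemap *cached)
      {
              /* reuses 'cached' when pfn is still inside the same mapping;
               * returns NULL if the device is, or is being, disabled */
              return get_dev_pagemap(pfn, cached);
      }

      static void sketch_unpin(struct dev_pagemap *pgmap)
      {
              if (pgmap)
                      put_dev_pagemap(pgmap);         /* drop the reference taken above */
      }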
    • libnvdimm, pmem: move request_queue allocation earlier in probe · 468ded03
      Committed by Dan Williams
      Before the dynamically allocated struct pages from devm_memremap_pages()
      can be put to use outside the driver, we need a mechanism to track
      whether they are still in use at teardown.  Towards that goal reorder
      the initialization sequence to allow the 'q_usage_counter' from the
      request_queue to be used by the devm_memremap_pages() implementation (in
      subsequent patches).
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      468ded03
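      Below, a sketch of the ordering this reorder enables, assuming the 4.5-era
      devm_memremap_pages() signature and with error handling trimmed; the wrapper
      name is hypothetical. The request_queue must exist first so that its
      q_usage_counter percpu_ref can double as the "device pages in use" count.

      #include <linux/blkdev.h>
      #include <linux/device.h>
      #include <linux/err.h>
      #include <linux/ioport.h>
      #include <linux/memremap.h>

      static void *sketch_map_pages(struct device *dev, struct resource *res,
                      struct request_queue **qp)
      {
              struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));

              if (!q)
                      return ERR_PTR(-ENOMEM);
              *qp = q;
              /* the queue usage counter now also tracks outstanding device pages */
              return devm_memremap_pages(dev, res, &q->q_usage_counter, NULL);
      }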
    • libnvdimm, pfn, pmem: allocate memmap array in persistent memory · d2c0f041
      Committed by Dan Williams
      Use the new vmem_altmap capability to enable the pmem driver to arrange
      for a struct page memmap to be established in persistent memory.
      
      [linux@roeck-us.net: mn10300: declare __pfn_to_phys() to fix build error]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d2c0f041
    • x86, mm: introduce vmem_altmap to augment vmemmap_populate() · 4b94ffdc
      Committed by Dan Williams
      In support of providing struct page for large persistent memory
      capacities, use struct vmem_altmap to change the default policy for
      allocating memory for the memmap array.  The default vmemmap_populate()
      allocates page table storage area from the page allocator.  Given
      persistent memory capacities relative to DRAM it may not be feasible to
      store the memmap in 'System Memory'.  Instead vmem_altmap represents
      pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
      requests.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b94ffdc
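      Below, a hedged sketch of describing a device-backed memmap reservation with
      struct vmem_altmap (field names per include/linux/memremap.h; the helper name
      and values are illustrative only).

      #include <linux/memremap.h>
      #include <linux/pfn.h>
      #include <linux/sizes.h>

      static void sketch_describe_memmap(phys_addr_t base, unsigned long memmap_bytes)
      {
              struct vmem_altmap altmap = {
                      .base_pfn = PFN_DOWN(base),             /* first pfn of the device range */
                      .reserve  = PFN_DOWN(SZ_8K),            /* info-block pages kept out of the memmap */
                      .free     = PFN_DOWN(memmap_bytes),     /* pages vmemmap_alloc_block_buf() may hand out */
              };

              /* a driver would now pass &altmap to devm_memremap_pages() */
              (void)altmap;
      }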
    • mm: introduce find_dev_pagemap() · 9476df7d
      Committed by Dan Williams
      There are several scenarios where we need to retrieve and update
      metadata associated with a given devm_memremap_pages() mapping, and the
      only lookup key available is a pfn in the range:
      
      1/ We want to augment vmemmap_populate() (called via arch_add_memory())
         to allocate memmap storage from pre-allocated pages reserved by the
         device driver.  At vmemmap_alloc_block_buf() time it grabs device pages
         rather than page allocator pages.  This is in support of
         devm_memremap_pages() mappings where the memmap is too large to fit in
         main memory (i.e. large persistent memory devices).
      
      2/ Taking a reference against the mapping when inserting device pages
         into the address_space radix of a given inode.  This facilitates
         unmap_mapping_range() and truncate_inode_pages() operations when the
         driver is tearing down the mapping.
      
      3/ get_user_pages() operations on ZONE_DEVICE memory require taking a
         reference against the mapping so that the driver teardown path can
         revoke and drain usage of device pages.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Logan Gunthorpe <logang@deltatee.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9476df7d
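      Below, a hedged sketch of the new lookup (wrapper name hypothetical): resolve
      the dev_pagemap covering a pfn, which is the primitive all three scenarios
      above build on; callers that need to hold the mapping take a reference via the
      helpers added in later patches.

      #include <linux/memremap.h>
      #include <linux/pfn.h>
      #include <linux/rcupdate.h>

      static bool sketch_is_device_pfn(unsigned long pfn)
      {
              struct dev_pagemap *pgmap;

              rcu_read_lock();
              pgmap = find_dev_pagemap(PFN_PHYS(pfn));        /* NULL if no mapping covers pfn */
              rcu_read_unlock();

              return pgmap != NULL;
      }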
    • mm, dax, pmem: introduce pfn_t · 34c0fd54
      Committed by Dan Williams
      For the purpose of communicating the optional presence of a 'struct
      page' for the pfn returned from ->direct_access(), introduce a type that
      encapsulates a page-frame-number plus flags.  These flags contain the
      historical "page_link" encoding for a scatterlist entry, but can also
      denote "device memory".  Where "device memory" is a set of pfns that are
      not part of the kernel's linear mapping by default, but are accessed via
      the same memory controller as ram.
      
      The motivation for this new type is large capacity persistent memory
      that needs struct page entries in the 'memmap' to support 3rd party DMA
      (i.e.  O_DIRECT I/O with a persistent memory source/target).  However,
      we also need it in support of maintaining a list of mapped inodes which
      need to be unmapped at driver teardown or freeze_bdev() time.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34c0fd54
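      Below, a hedged sketch of the encoding described above, using the helpers from
      include/linux/pfn_t.h (the function name is hypothetical).

      #include <linux/mm.h>
      #include <linux/pfn_t.h>

      static struct page *sketch_page_for(phys_addr_t phys)
      {
              /* device memory that also carries a memmap: PFN_DEV | PFN_MAP */
              pfn_t pfn = phys_to_pfn_t(phys, PFN_DEV | PFN_MAP);

              /* fails for device memory that lacks a memmap (PFN_DEV without PFN_MAP) */
              if (!pfn_t_has_page(pfn))
                      return NULL;

              return pfn_t_to_page(pfn);      /* struct page entry from the memmap */
      }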
  19. 10 Jan 2016 (4 commits)