提交 · fc17b6534eb8395f0b3133eb31d87deec32c642b · openanolis / cloud-kernel

01 5月, 2017 1 次提交

mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash · 71389703

由 Dan Williams 提交于 4月 28, 2017

The x86 conversion to the generic GUP code included a small change which causes
crashes and data corruption in the pmem code - not good.

The root cause is that the /dev/pmem driver code implicitly relies on the x86
get_user_pages() implementation doing a get_page() on the page refcount, because
get_page() does a get_zone_device_page() which properly refcounts pmem's separate
page struct arrays that are not present in the regular page struct structures.
(The pmem driver does this because it can cover huge memory areas.)

But the x86 conversion to the generic GUP code changed the get_page() to
page_cache_get_speculative() which is faster but doesn't do the
get_zone_device_page() call the pmem code relies on.

One way to solve the regression would be to change the generic GUP code to use
get_page(), but that would slow things down a bit and punish other generic-GUP
using architectures for an x86-ism they did not care about. (Arguably the pmem
driver was probably not working reliably for them: but nvdimm is an Intel
feature, so non-x86 exposure is probably still limited.)

So restructure the pmem code's interface with the MM instead: get rid of the
get/put_zone_device_page() distinction, integrate put_zone_device_page() into
__put_page() and and restructure the pmem completion-wait and teardown machinery:

Kirill points out that the calls to {get,put}_dev_pagemap() can be
removed from the mm fast path if we take a single get_dev_pagemap()
reference to signify that the page is alive and use the final put of the
page to drop that reference.

This does require some care to make sure that any waits for the
percpu_ref to drop to zero occur *after* devm_memremap_page_release(),
since it now maintains its own elevated reference.

This speeds up things while also making the pmem refcounting more robust going
forward.
Suggested-by: NKirill Shutemov <kirill.shutemov@linux.intel.com>
Tested-by: NKirill Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/149339998297.24933.1129582806028305912.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

71389703

29 4月, 2017 1 次提交

libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify · b2518c78

由 Toshi Kani 提交于 4月 25, 2017

The following BUG was observed when nd_pmem_notify() was called
for a BTT device.  The use of a pmem_device pointer is not valid
with BTT.

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
 IP: nd_pmem_notify+0x30/0xf0 [nd_pmem]
 Call Trace:
  nd_device_notify+0x40/0x50
  child_notify+0x10/0x20
  device_for_each_child+0x50/0x90
  nd_region_notify+0x20/0x30
  nd_device_notify+0x40/0x50
  nvdimm_region_notify+0x27/0x30
  acpi_nfit_scrub+0x341/0x590 [nfit]
  process_one_work+0x197/0x450
  worker_thread+0x4e/0x4a0
  kthread+0x109/0x140

Fix nd_pmem_notify() by setting nd_region and badblocks pointers
properly for BTT.

Cc: <stable@vger.kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Fixes: 71999466 ("libnvdimm: async notification support")
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

b2518c78

26 4月, 2017 2 次提交

x86, dax, pmem: remove indirection around memcpy_from_pmem() · 6abccd1b

由 Dan Williams 提交于 1月 13, 2017

memcpy_from_pmem() maps directly to memcpy_mcsafe(). The wrapper
serves no real benefit aside from affording a more generic function name
than the x86-specific 'mcsafe'. However this would not be the first time
that x86 terminology leaked into the global namespace. For lack of
better name, just use memcpy_mcsafe() directly.

This conversion also catches a place where we should have been using
plain memcpy, acpi_nfit_blk_single_io().

Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: NTony Luck <tony.luck@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

6abccd1b

block: remove block_device_operations ->direct_access() · d4b29fd7

由 Dan Williams 提交于 1月 27, 2017

Now that all the producers and consumers of dax interfaces have been
converted to using dax_operations on a dax_device, remove the block
device direct_access enabling.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

d4b29fd7

20 4月, 2017 1 次提交

pmem: add dax_operations support · c1d6e828

由 Dan Williams 提交于 1月 24, 2017

Setup a dax_device to have the same lifetime as the pmem block device
and add a ->direct_access() method that is equivalent to
pmem_direct_access(). Once fs/dax.c has been converted to use
dax_operations the old pmem_direct_access() will be removed.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

c1d6e828

13 1月, 2017 1 次提交

pmem: return EIO on read_pmem() failure · d47d1d27

由 Stefan Hajnoczi 提交于 1月 05, 2017

The read_pmem() function uses memcpy_mcsafe() on x86 where an EFAULT
error code indicates a failed read.  Block I/O should use EIO to
indicate failure.  Other pmem code paths (like bad blocks) already use
EIO so let's be consistent.

This fixes compatibility with consumers like btrfs that try to parse the
specific error code rather than treat all errors the same.
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

d47d1d27

17 12月, 2016 1 次提交

libnvdimm: fix mishandled nvdimm_clear_poison() return value · 868f036f

由 Dan Williams 提交于 12月 16, 2016

Colin, via static analysis, reports that the length could be negative
from nvdimm_clear_poison() in the error case. There was a similar
problem with commit 0a3f27b9 "libnvdimm, namespace: avoid multiple
sector calculations" that I noticed when merging the for-4.10/libnvdimm
topic branch into libnvdimm-for-next, but I missed this one. Fix both of
them to the following procedure:

* if we clear a block's worth of media, clear that many blocks in
  badblocks

* if we clear less than the requested size of the transfer return an
  error

* always invalidate cache after any non-error / non-zero
  nvdimm_clear_poison result

Fixes: 82bf1037 ("libnvdimm: check and clear poison before writing to pmem")
Fixes: 0a3f27b9 ("libnvdimm, namespace: avoid multiple sector calculations")
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Dave Jiang <dave.jiang@intel.com>
Reported-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

868f036f

05 12月, 2016 1 次提交

libnvdimm, namespace: avoid multiple sector calculations · 0a3f27b9

由 Fabian Frederick 提交于 12月 04, 2016

Use sector_t for cleared
Suggested-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

0a3f27b9

29 11月, 2016 1 次提交

libnvdimm: use consistent naming for request_mem_region() · 450c6633

由 Dan Williams 提交于 11月 28, 2016

Here is an example /proc/iomem listing for a system with 2 namespaces,
one in "sector" mode and one in "memory" mode:

  1fc000000-2fbffffff : Persistent Memory (legacy)
    1fc000000-2fbffffff : namespace1.0
  340000000-34fffffff : Persistent Memory
    340000000-34fffffff : btt0.1

Here is the corresponding ndctl listing:

  # ndctl list
  [
    {
      "dev":"namespace1.0",
      "mode":"memory",
      "size":4294967296,
      "blockdev":"pmem1"
    },
    {
      "dev":"namespace0.0",
      "mode":"sector",
      "size":267091968,
      "uuid":"f7594f86-badb-4592-875f-ded577da2eaf",
      "sector_size":4096,
      "blockdev":"pmem0s"
    }
  ]

Notice that the ndctl listing is purely in terms of namespace devices,
while the iomem listing leaks the internal "btt0.1" implementation
detail. Given that ndctl requires the namespace device name to change
the mode, for example:

  # ndctl create-namespace --reconfig=namespace0.0 --mode=raw --force

...use the namespace name in the iomem listing to keep the claiming
device name consistent across different mode settings.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

450c6633

20 10月, 2016 1 次提交

pmem: report error on clear poison failure · 3115bb02

由 Toshi Kani 提交于 10月 13, 2016

ACPI Clear Uncorrectable Error DSM function may fail or may be
unsupported on a platform.  pmem_clear_poison() returns without clearing
badblocks in such cases.  This failure is detected at the next read
(-EIO).

This behavior can lead to an issue when user keeps writing but does not
read immediately.  For instance, flight recorder file may be only read
when it is necessary for troubleshooting.

Change pmem_do_bvec() and pmem_clear_poison() to return -EIO so that
filesystem can log an error message on a write error.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

3115bb02

01 10月, 2016 1 次提交

pmem: reduce kmap_atomic sections to the memcpys only · bd697a80

由 Vishal Verma 提交于 9月 30, 2016

pmem_do_bvec used to kmap_atomic at the begin, and only unmap at the
end. Things like nvdimm_clear_poison may want to do nvdimm subsystem
bookkeeping operations that may involve taking locks or doing memory
allocations, and we can't do that from the atomic context. Reduce the
atomic context to just what needs it - the memcpy to/from pmem.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

bd697a80

08 8月, 2016 2 次提交

block: rename bio bi_rw to bi_opf · 1eff9d32

由 Jens Axboe 提交于 8月 05, 2016

Since commit 63a4cc24, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.
Signed-off-by: NJens Axboe <axboe@fb.com>

1eff9d32

block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b

由 Jens Axboe 提交于 8月 05, 2016

Commit abf54548 changed it from an 'rw' flags type to the
newer ops based interface, but now we're effectively leaking
some bdev internals to the rest of the kernel. Since we only
care about whether it's a read or a write at that level, just
pass in a bool 'is_write' parameter instead.

Then we can also move op_is_write() and friends back under
CONFIG_BLOCK protection.
Reviewed-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

c11f0c0b

05 8月, 2016 1 次提交

mm/block: convert rw_page users to bio op use · abf54548

由 Mike Christie 提交于 8月 04, 2016

The rw_page users were not converted to use bio/req ops. As a result
bdev_write_page is not passing down REQ_OP_WRITE and the IOs will
be sent down as reads.
Signed-off-by: NMike Christie <mchristi@redhat.com>
Fixes: 4e1b2d52 ("block, fs, drivers: remove REQ_OP compat defs and related code")

Modified by me to:

1) Drop op_flags passing into ->rw_page(), as we don't use it.
2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK
Signed-off-by: NJens Axboe <axboe@fb.com>

abf54548

24 7月, 2016 1 次提交

pmem: clarify a debug print in pmem_clear_poison · 5bf0b6e1

由 Vishal Verma 提交于 7月 22, 2016

Prefix the sector number being cleared with a '0x' to make it clear that
this is a hex value.
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

5bf0b6e1

21 7月, 2016 1 次提交

block: add QUEUE_FLAG_DAX for devices to advertise their DAX support · 163d4baa

由 Toshi Kani 提交于 6月 23, 2016

Currently, presence of direct_access() in block_device_operations
indicates support of DAX on its block device.  Because
block_device_operations is instantiated with 'const', this DAX
capablity may not be enabled conditinally.

In preparation for supporting DAX to device-mapper devices, add
QUEUE_FLAG_DAX to request_queue flags to advertise their DAX
support.  This will allow to set the DAX capability based on how
mapped device is composed.
Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
Acked-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <linux-s390@vger.kernel.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

163d4baa

13 7月, 2016 2 次提交

pmem: kill __pmem address space · 7a9eb206

由 Dan Williams 提交于 6月 03, 2016

The __pmem address space was meant to annotate codepaths that touch
persistent memory and need to coordinate a call to wmb_pmem().  Now that
wmb_pmem() is gone, there is little need to keep this annotation.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

7a9eb206

libnvdimm, pmem: flush posted-write queues on shutdown · 476f848a

由 Dan Williams 提交于 7月 09, 2016

Commit writes to media on system shutdown or pmem driver unload.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

476f848a

12 7月, 2016 2 次提交

libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() · 7e267a8c

由 Dan Williams 提交于 6月 01, 2016

Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer
chasing through nd_region), and that we otherwise assume a platform has
ADR capability when flush hints are not present, move nvdimm_flush() to
REQ_FLUSH context.

Note that we still arrange for nvdimm_flush() to be called even in the
ADR case. We need at least once wmb() fence to push buffered writes in
the cpu out to the ADR protected domain.

Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

7e267a8c

libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() · f284a4f2

由 Dan Williams 提交于 7月 07, 2016

nvdimm_flush() is a replacement for the x86 'pcommit' instruction.  It is
an optional write flushing mechanism that an nvdimm bus can provide for
the pmem driver to consume.  In the case of the NFIT nvdimm-bus-provider
nvdimm_flush() is implemented as a series of flush-hint-address [1]
writes to each dimm in the interleave set (region) that backs the
namespace.

The nvdimm_has_flush() routine relies on platform firmware to describe
the flushing capabilities of a platform.  It uses the heuristic of
whether an nvdimm bus provider provides flush address data to return a
ternary result:

      1: flush addresses defined
      0: dimm topology described without flush addresses (assume ADR)
 -errno: no topology information, unable to determine flush mechanism

The pmem driver is expected to take the following actions on this ternary
result:

      1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown
      0: do not set, WC or FUA on the queue, take no further action
 -errno: warn and then operate as if nvdimm_has_flush() returned '0'

The caveat of this heuristic is that it can not distinguish the "dimm
does not have flush address" case from the "platform firmware is broken
and failed to describe a flush address".  Given we are already
explicitly trusting the NFIT there's not much more we can do beyond
blacklisting broken firmwares if they are ever encountered.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

f284a4f2

28 6月, 2016 1 次提交

block: convert to device_add_disk() · 0d52c756

由 Dan Williams 提交于 6月 15, 2016

For block drivers that specify a parent device, convert them to use
device_add_disk().

This conversion was done with the following semantic patch:

    @@
    struct gendisk *disk;
    expression E;
    @@

    - disk->driverfs_dev = E;
    ...
    - add_disk(disk);
    + device_add_disk(E, disk);

    @@
    struct gendisk *disk;
    expression E1, E2;
    @@

    - disk->driverfs_dev = E1;
    ...
    E2 = disk;
    ...
    - add_disk(E2);
    + device_add_disk(E1, E2);

...plus some manual fixups for a few missed conversions.

Cc: Jens Axboe <axboe@fb.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

0d52c756

25 6月, 2016 1 次提交

libnvdimm, pmem: allow nfit_test to override pmem_direct_access() · f295e53b

由 Dan Williams 提交于 6月 17, 2016

Currently phys_to_pfn_t() is an exported symbol to allow nfit_test to
override it and indicate that nfit_test-pmem is not device-mapped. Now,
we want to enable nfit_test to operate without DMA_CMA and the pmem it
provides will no longer be physically contiguous, i.e. won't be capable
of supporting direct_access requests larger than a page. Make
pmem_direct_access() a weak symbol so that it can be replaced by the
tools/testing/nvdimm/ version, and move phys_to_pfn_t() to a static
inline now that it no longer needs to be overridden.
Acked-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

f295e53b

16 6月, 2016 1 次提交

libnvdimm: use devm_add_action_or_reset() · f02716db

由 Dan Williams 提交于 6月 15, 2016

Clean up needless calls to the action routine by letting
devm_add_action_or_reset() call it automatically. This does cause the
disk to registered and immediately unregistered when a memory allocation
fails, but the block layer should be prepared for such an event.
Reported-by: NSudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

f02716db

21 5月, 2016 1 次提交

libnvdimm, dax: autodetect support · c5ed9268

由 Dan Williams 提交于 5月 18, 2016

For autodetecting a previously established dax configuration we need the
info block to indicate block-device vs device-dax mode, and we need to
have the default namespace probe hand-off the configuration to the
dax_pmem driver.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

c5ed9268

19 5月, 2016 1 次提交

dax: enable dax in the presence of known media errors (badblocks) · 0a70bd43

由 Dan Williams 提交于 2月 24, 2016

1/ If a mapping overlaps a bad sector fail the request.

2/ Do not opportunistically report more dax-capable capacity than is
   requested when errors present.
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
[vishal: fix a conflict with system RAM collision patches]
[vishal: add a 'size' parameter to ->direct_access]
[vishal: fix a conflict with DAX alignment check patches]
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>

0a70bd43

07 5月, 2016 1 次提交

libnvdimm, pfn: fix ARCH=alpha allmodconfig build failure · 1b8d2afd

由 Dan Williams 提交于 5月 06, 2016

I had relied on the kbuild robot for cross build coverage, however it
only builds alpha_defconfig.  Switch from HPAGE_SIZE to PMD_SIZE, which
is more widely defined.

Fixes: 658922e5 ("libnvdimm, pfn: fix memmap reservation sizing")
Cc: <stable@vger.kernel.org>
Reported-by: NGuenter Roeck <guenter@roeck-us.net>
Tested-by: NGuenter Roeck <guenter@roeck-us.net>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

1b8d2afd

01 5月, 2016 1 次提交

libnvdimm, pfn: fix memmap reservation sizing · 658922e5

由 Dan Williams 提交于 4月 30, 2016

When configuring a pfn-device instance to allocate the memmap array it
needs to account for the fact that vmemmap_populate_hugepages()
allocates struct page blocks in HPAGE_SIZE chunks.  We need to align the
reserved area size to 2MB otherwise arch_add_memory() runs out of memory
while establishing the memmap:

 WARNING: CPU: 0 PID: 496 at arch/x86/mm/init_64.c:704 arch_add_memory+0xe7/0xf0
 [..]
 Call Trace:
  [<ffffffff8148bdb3>] dump_stack+0x85/0xc2
  [<ffffffff810a749b>] __warn+0xcb/0xf0
  [<ffffffff810a75cd>] warn_slowpath_null+0x1d/0x20
  [<ffffffff8106a497>] arch_add_memory+0xe7/0xf0
  [<ffffffff811d2097>] devm_memremap_pages+0x287/0x450
  [<ffffffff811d1ffa>] ? devm_memremap_pages+0x1ea/0x450
  [<ffffffffa0000298>] __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
  [<ffffffffa0047a58>] pmem_attach_disk+0x318/0x420 [nd_pmem]
  [<ffffffffa0047bcf>] nd_pmem_probe+0x6f/0x90 [nd_pmem]
  [<ffffffffa0009469>] nvdimm_bus_probe+0x69/0x110 [libnvdimm]
 [..]
  ndbus0: nd_pmem.probe(pfn3.0) = -12
 nd_pmem: probe of pfn3.0 failed with error -12
libndctl: ndctl_pfn_enable: pfn3.0: failed to enable
Reported-by: NNamratha Kothapalli <namratha.n.kothapalli@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

658922e5

23 4月, 2016 9 次提交

libnvdimm, pmem: kill ->pmem_queue and ->pmem_disk · 5a92289f

由 Dan Williams 提交于 3月 21, 2016

The devm conversion obviates the need to continue to remember the queue
and disk locally in the driver.
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

5a92289f

libnvdimm, pmem, pfn: move pfn setup to the core · ac515c08

由 Dan Williams 提交于 3月 22, 2016

Now that pmem internals have been disentangled from pfn setup, that code
can move to the core.  This is in preparation for adding another user of
the pfn-device capabilities.
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

ac515c08

libnvdimm, pmem, pfn: make pmem_rw_bytes generic and refactor pfn setup · 200c79da

由 Dan Williams 提交于 3月 22, 2016

In preparation for providing an alternative (to block device) access
mechanism to persistent memory, convert pmem_rw_bytes() to
nsio_rw_bytes().  This allows ->rw_bytes() functionality without
requiring a 'struct pmem_device' to be instantiated.

In other words, when ->rw_bytes() is in use i/o is driven through
'struct nd_namespace_io', otherwise it is driven through 'struct
pmem_device' and the block layer.  This consolidates the disjoint calls
to devm_exit_badblocks() and devm_memunmap() into a common
devm_nsio_disable() and cleans up the init path to use a unified
pmem_attach_disk() implementation.
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

200c79da

libnvdimm, pmem: clean up resource print / request · 947df02d

由 Dan Williams 提交于 3月 21, 2016

The leading '0x' in front of %pa is redundant, also we can just use %pR
to simplify the print statement.  The request parameters can be directly
taken from the resource as well.
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

947df02d

libnvdimm, pmem: use devm_add_action to release bdev resources · 030b99e3

由 Dan Williams 提交于 3月 17, 2016

Register a callback to clean up the request_queue and put the gendisk at
driver disable time.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

030b99e3

libnvdimm, pmem: use ->queuedata for driver private data · bd842b8c

由 Dan Williams 提交于 3月 18, 2016

Save a pointer chase by storing the driver private data in the
request_queue rather than the gendisk.
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

bd842b8c

libnvdimm, btt, convert nd_btt_probe() to devm · e32bc729

由 Dan Williams 提交于 3月 17, 2016

Pass the device performing the probe so we can use a devm allocation for
the btt superblock.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

e32bc729

libnvdimm, pfn, convert nd_pfn_probe() to devm · bd032943

由 Dan Williams 提交于 3月 17, 2016

Pass the device performing the probe so we can use a devm allocation for
the pfn superblock.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

bd032943

libnvdimm, pmem: kill pmem->ndns · 298f2bc5

由 Dan Williams 提交于 3月 15, 2016

We can derive the common namespace from other information.  We also do
not need to cache it because all the usages are in slow paths.
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

298f2bc5

16 4月, 2016 1 次提交

libnvdimm, pmem: clarify the write+clear_poison+write flow · 0a370d26

由 Dan Williams 提交于 4月 14, 2016

The ACPI specification does not specify the state of data after a clear
poison operation.  Potential future libnvdimm bus implementations for
other architectures also might not specify or disagree on the state of
data after clear poison.  Clarify why we write twice.
Reported-by: NJeff Moyer <jmoyer@redhat.com>
Reported-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>

0a370d26

08 4月, 2016 1 次提交

libnvdimm, pfn: fix nvdimm_namespace_add_poison() vs section alignment · a3901802

由 Dan Williams 提交于 4月 07, 2016

When section alignment padding is in effect we need to shift / truncate
the range that is queried for poison by the 'start_pad' or 'end_trunc'
reservations.

It's easiest if we just pass in an adjusted resource range rather than
deriving it from the passed in namespace.  With the resource range
resolution pushed out to the caller we can also push the
namespace-to-region lookup to the caller and drop the implicit pmem-type
assumption about the passed in namespace object.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

a3901802

05 4月, 2016 1 次提交

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

由 Kirill A. Shutemov 提交于 4月 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09cbfeaf

29 3月, 2016 1 次提交

x86, pmem: use memcpy_mcsafe() for memcpy_from_pmem() · fc0c2028

由 Dan Williams 提交于 3月 08, 2016

Update the definition of memcpy_from_pmem() to return 0 or a negative
error code.  Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe().

Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: NIngo Molnar <mingo@kernel.org>
Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

fc0c2028

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功