Commits · 5deb67f77a266010e2c10fb124b7516d0d258ce8 · openanolis / cloud-kernel

01 Sep, 2017 7 commits

libnvdimm, nd_blk: remove mmio_flush_range() · 5deb67f7

Authored by Robin Murphy on Aug 31, 2017

mmio_flush_range() suffers from a lack of clearly-defined semantics,
and is somewhat ambiguous to port to other architectures where the
scope of the writeback implied by "flush" and ordering might matter,
but MMIO would tend to imply non-cacheable anyway. Per the rationale
in 67a3e8fe ("nd_blk: change aperture mapping from WC to WB"), the
only existing use is actually to invalidate clean cache lines for
ARCH_MEMREMAP_PMEM type mappings *without* writeback. Since the recent
cleanup of the pmem API, that also now happens to be the exact purpose
of arch_invalidate_pmem(), which would be a far more well-defined tool
for the job.

Rather than risk potentially inconsistent implementations of
mmio_flush_range() for the sake of one callsite, streamline things by
removing it entirely and instead move the ARCH_MEMREMAP_PMEM related
definitions up to the libnvdimm level, so they can be shared by NFIT
as well. This allows NFIT to be enabled for arm64.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

5deb67f7

libnvdimm, btt: rework error clearing · d9b83c75

Authored by Vishal Verma on Aug 30, 2017

Clearing errors or badblocks during a BTT write requires sending an ACPI
DSM, which means potentially sleeping. Since a BTT IO happens in atomic
context (preemption disabled, spinlocks may be held), we cannot perform
error clearing in the course of an IO. Due to this, error clearing for
BTT IOs has hitherto been disabled.

In this patch we move error clearing out of the atomic section, and thus
re-enable error clearing with BTTs. When we are about to add a block to
the free list, we check if it was previously marked as an error, and if
it was, we add it to the freelist, but also set a flag that says error
clearing will be required. We then drop the lane (ending the atomic
context), and send a zero buffer so that the error can be cleared. The
error flag in the free list is protected by the nd 'lane', and is set
only by a thread while it holds that lane. When the error is cleared,
the flag is cleared, but while holding a mutex for that freelist index.

When writing, we check for two things -
1/ If the freelist mutex is held or if the error flag is set. If so,
this is an error block that is being (or about to be) cleared.
2/ If the block is a known badblock based on nsio->bb

The second check is required because the BTT map error flag for a map
entry only gets set when an error LBA is read. If we write to a new
location that may not have the map error flag set, but still might be in
the region's badblock list, we can trigger an EIO on the write, which is
undesirable and completely avoidable.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

d9b83c75

libnvdimm: fix potential deadlock while clearing errors · 0930a750

Authored by Vishal Verma on Aug 30, 2017

With the ACPI NFIT 'DSM' methods, acpi can be called from IO paths.
Specifically, the DSM to clear media errors is called during writes, so
that we can provide a writes-fix-errors model.

However it is easy to imagine a scenario like:
 -> write through the nvdimm driver
   -> acpi allocation
     -> writeback, causes more IO through the nvdimm driver
       -> deadlock

Fix this by using memalloc_noio_{save,restore}, which sets the GFP_NOIO
flag for the current scope when issuing commands/IOs that are expected
to clear errors.

Cc: <linux-acpi@vger.kernel.org>
Cc: <linux-nvdimm@lists.01.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

0930a750
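The scoped save/restore pattern that 0930a750 relies on can be illustrated with a small userspace model. This is only a sketch: `task_flags`, `SKETCH_NOIO`, and the `sketch_*` functions are hypothetical stand-ins, not the kernel's implementation, and the model merely shows why the saved value must be handed back on restore so that nested scopes behave correctly.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for a per-task "no IO" marker. */
#define SKETCH_NOIO 0x1u

/* Stand-in for per-task state; the kernel keeps this per thread. */
static unsigned int task_flags;

/* Enter a no-IO scope and return the previous state of the flag. */
static unsigned int sketch_noio_save(void)
{
    unsigned int old = task_flags & SKETCH_NOIO;

    task_flags |= SKETCH_NOIO;
    return old;
}

/* Leave a no-IO scope, putting back whatever state was saved on entry. */
static void sketch_noio_restore(unsigned int old)
{
    task_flags = (task_flags & ~SKETCH_NOIO) | old;
}

/* An allocator in this model may start IO only outside a no-IO scope. */
static bool sketch_alloc_may_do_io(void)
{
    return (task_flags & SKETCH_NOIO) == 0;
}
```

Because restore writes back the saved value rather than unconditionally clearing the flag, leaving an inner scope does not re-enable IO while an outer scope is still active.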

libnvdimm, btt: cache sector_size in arena_info · 75892004

Authored by Vishal Verma on Aug 30, 2017

In preparation for the error clearing rework, add sector_size in the
arena_info struct.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

75892004

libnvdimm, btt: ensure that flags were also unchanged during a map_read · 1398199d

Authored by Vishal Verma on Aug 30, 2017

In btt_map_read, we read the map twice to make sure that the map entry
didn't change after we added it to the read tracking table. In
anticipation of expanding the use of the error bit, also make sure that
the error and zero flags are constant across the two map reads.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

1398199d

libnvdimm, btt: refactor map entry operations with macros · 0595d539

Authored by Vishal Verma on Aug 30, 2017

Add helpers for converting a raw map entry to just the block number, or
either of the 'e' or 'z' flags in preparation for actually using the
error flag to mark blocks with media errors.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

0595d539
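The helpers described in 0595d539 can be sketched as follows. The layout assumed here (bit 31 for the 'z' flag, bit 30 for the 'e' flag, the low 30 bits for the block number) and every name in the block are illustrative assumptions, and the sketch uses inline functions where the commit speaks of macros.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout of a raw 32-bit map entry (an assumption of this sketch):
 *   bit 31     'z' (zero) flag
 *   bit 30     'e' (error) flag
 *   bits 29..0 block number
 */
#define MAP_Z_MASK   (UINT32_C(1) << 31)
#define MAP_E_MASK   (UINT32_C(1) << 30)
#define MAP_LBA_MASK (~(MAP_Z_MASK | MAP_E_MASK))

/* Strip both flag bits and return only the block number. */
static inline uint32_t map_ent_lba(uint32_t ent)
{
    return ent & MAP_LBA_MASK;
}

/* True if the entry carries the 'e' (error) flag. */
static inline bool map_ent_e_flag(uint32_t ent)
{
    return (ent & MAP_E_MASK) != 0;
}

/* True if the entry carries the 'z' (zero) flag. */
static inline bool map_ent_z_flag(uint32_t ent)
{
    return (ent & MAP_Z_MASK) != 0;
}
```

For example, under the assumed layout the raw entry 0x40000005 decodes to block 5 with only the error flag set.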

libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path · 1db1f3ce

Authored by Vishal Verma on Aug 30, 2017

The IO context conversion for rw_bytes missed a case in the BTT write
path (btt_map_write) which should've been marked as atomic.

In reality this should not cause a problem, because map writes are too
small for nsio_rw_bytes to attempt error clearing, but it should be
fixed for posterity.

Add a might_sleep() in the non-atomic section of nsio_rw_bytes so that
things like the nfit unit tests, which don't actually sleep, can catch
bugs like this.

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

1db1f3ce

30 Aug, 2017 2 commits

libnvdimm, btt: check memory allocation failure · ed36b4db

Authored by Christophe Jaillet on Aug 27, 2017

Check memory allocation failures and return -ENOMEM in such cases, as
already done a few lines below for another memory allocation.

This avoids a NULL pointer dereference.

Cc: <stable@vger.kernel.org>
Fixes: 14e49454 ("libnvdimm, btt: BTT updates for UEFI 2.7 format")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ed36b4db

libnvdimm, label: fix index block size calculation · 02881768

Authored by Dan Williams on Aug 29, 2017

The old calculation assumed that the label space was 128k and the label
size is 128. With v1.2 labels where the label size is 256 this
calculation will return zero. We are saved by the fact that the
nsindex_size is always pre-initialized from a previous 128 byte
assumption and we are lucky that the index sizes turn out the same.

Fix this going forward in case we start encountering different
geometries of label areas besides 128k.

Since the label size can change from one call to the next, drop the
caching of nsindex_size.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

02881768

16 Aug, 2017 1 commit

libnvdimm, pfn, dax: limit namespace alignments to the supported set · f13d2b61

Authored by Dan Williams on Aug 11, 2017

Now that we properly advertise the supported pte, pmd, and pud sizes,
restrict the supported alignments that can be set on a namespace. This
assumes that userspace was not previously relying on the ability to set
odd alignments. At least ndctl only ever supported setting the namespace
alignment to 4K, 2M, or 1G.

Cc: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

f13d2b61
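The restriction described in f13d2b61 amounts to a membership test against the advertised alignments. A minimal sketch, assuming the supported set is available as a zero-terminated array; `alignment_is_supported` and `example_alignments` are hypothetical names, and the 4K/2M/1G values simply mirror the sizes the commit message mentions for ndctl.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: accept a requested alignment only if it appears
 * in a zero-terminated list of supported alignments.
 */
static bool alignment_is_supported(unsigned long long align,
                                   const unsigned long long *supported)
{
    for (size_t i = 0; supported[i] != 0; i++) {
        if (supported[i] == align)
            return true;
    }
    return false;
}

/* Example set only; the real set depends on the kernel configuration. */
static const unsigned long long example_alignments[] = {
    4096ULL,     /* 4K */
    2ULL << 20,  /* 2M */
    1ULL << 30,  /* 1G */
    0            /* terminator */
};
```

With this set, a request for 2M is accepted while an odd value such as 8K is rejected.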

12 Aug, 2017 2 commits

libnvdimm, pfn, dax: show supported dax/pfn region alignments in sysfs · 1fdadbeb

Authored by Oliver O'Halloran on Jun 27, 2017

The alignment of DAX and PFN regions dictates the page sizes that can
be used to map the region. Even if the hardware page sizes are known, the
actual range of supported page sizes that can be used with DAX depends
on the kernel configuration. As a result it's best that the kernel
advertises the alignments that should be used with these region types.

This patch adds the 'supported_alignments' region attribute to expose
this information to userspace.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[djbw: integrate with nd_size_select_show() rename and other fixups]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

1fdadbeb

libnvdimm: rename nd_sector_size_{show,store} to nd_size_select_{show,store} · b2c48f9f

Authored by Dan Williams on Aug 11, 2017

Prepare for another consumer of this size selection scheme that is
not a 'sector size'.

Cc: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

b2c48f9f

05 Aug, 2017 1 commit

nfit, libnvdimm, region: export 'position' in mapping info · 401c0a19

Authored by Dan Williams on Aug 04, 2017

It is useful to be able to know the position of a DIMM in an
interleave-set. Consider the case where the order of the DIMMs changes
causing a namespace to be invalidated because the interleave-set cookie no
longer matches. If the before and after state of each DIMM position is
known, this state can be debugged by the system owner.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

401c0a19

26 Jul, 2017 1 commit

libnvdimm: Stop using HPAGE_SIZE · 0dd69643

Authored by Oliver O'Halloran on Jun 27, 2017

Currently libnvdimm uses HPAGE_SIZE as the default alignment for DAX and
PFN devices. HPAGE_SIZE is the default hugetlbfs page size and when
hugetlbfs is disabled it defaults to PAGE_SIZE. Given DAX has more
in common with THP than hugetlbfs we should probably be using
HPAGE_PMD_SIZE, but this is undefined when THP is disabled so let's just
give it a new name.

The other usage of HPAGE_SIZE in libnvdimm is when determining how large
the altmap should be. For the reasons mentioned above it doesn't really
make sense to use HPAGE_SIZE here either. PMD_SIZE seems to be safe to
use in generic code and it happens to match the vmemmap allocation block
on x86 and Power. It's still a hack, but it's a slightly nicer hack.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

0dd69643

18 Jul, 2017 1 commit

libnvdimm: fix badblock range handling of ARS range · 4e3f0701

Authored by Toshi Kani on Jul 07, 2017

__add_badblock_range() does not account for sector alignment when
it sets 'num_sectors'.  Therefore, an ARS error record range
spanning across two sectors is set to a single sector length,
which leaves the 2nd sector unprotected.

Change __add_badblock_range() to set 'num_sectors' properly.

Cc: <stable@vger.kernel.org>
Fixes: 0caeef63 ("libnvdimm: Add a poison list and export badblocks")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

4e3f0701
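The alignment problem described in 4e3f0701 is easy to see with concrete numbers. The sketch below is an illustration rather than the kernel code: both function names are hypothetical, 512-byte sectors are assumed, and the "length only" variant is a guess at the kind of calculation the commit message describes, not a copy of the old implementation.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u

/*
 * Length-only calculation: rounds the byte length up to whole sectors
 * but ignores where inside a sector the range starts.
 */
static uint64_t num_sectors_length_only(uint64_t len)
{
    return (len + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/*
 * Alignment-aware calculation: counts every sector touched by the byte
 * range [offset, offset + len). Assumes len > 0.
 */
static uint64_t num_sectors_aligned(uint64_t offset, uint64_t len)
{
    uint64_t start_sector = offset / SECTOR_SIZE;
    uint64_t end_sector = (offset + len + SECTOR_SIZE - 1) / SECTOR_SIZE;

    return end_sector - start_sector;
}
```

A 24-byte error record starting at byte offset 500 crosses the boundary between sector 0 and sector 1: the length-only form yields one sector, while the alignment-aware form yields two.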

04 Jul, 2017 3 commits

libnvdimm, namespace: record 'lbasize' for pmem namespaces · 2de5148f

Authored by Dan Williams on Jul 03, 2017

Commit f979b13c ("libnvdimm, label: honor the lba size specified in
v1.2 labels") neglected to update the 'lbasize' in the label when the
namespace sector_size attribute was written. We need this value in the
label for inter-OS / pre-OS compatibility.

Fixes: f979b13c ("libnvdimm, label: honor the lba size specified in v1.2 labels")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

2de5148f

block: guard bvec iteration logic · b1fb2c52

Authored by Dmitry Monakhov on Jun 29, 2017

Currently, if someone tries to advance a bvec beyond its size, we simply
emit a WARN_ONCE and continue to iterate beyond the bvec array boundaries.
This means that we end up dereferencing/corrupting a random memory
region.

A sane reaction would be to propagate the error back to the calling context,
but bvec_iter_advance's calling context is not always good for error
handling. For safety reasons, let's truncate the iterator size to zero, which
will break the external iteration loop and protect us from unpredictable
memory range corruption. And even if the caller ignores the error, it will
corrupt its own bvecs, not others.

This patch does:
- Return an error back to the caller in the hope that it will react to it
- Truncate the iterator size

The code was added a long time ago in 4550dd6c; luckily no one hit it
in real life :)
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
[hch: switch to true/false returns instead of errno values]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b1fb2c52
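The guard described in b1fb2c52 can be modelled in userspace with a simplified segment iterator. `struct seg`, `struct seg_iter`, and `seg_iter_advance` are hypothetical stand-ins for the block-layer types, and the sketch assumes the iterator's remaining size never exceeds the bytes actually left in the segment array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a single segment of a vector of buffers. */
struct seg {
    size_t len;
};

/* Stand-in for the iterator state. */
struct seg_iter {
    size_t size; /* bytes remaining in the whole iteration */
    size_t idx;  /* index of the current segment */
    size_t done; /* bytes already consumed in the current segment */
};

/*
 * Advance the iterator by 'bytes'. On an attempt to advance past the
 * end, truncate the remaining size to zero so that the caller's loop
 * terminates, and report failure with a boolean.
 */
static bool seg_iter_advance(const struct seg *segs, struct seg_iter *it,
                             size_t bytes)
{
    if (bytes > it->size) {
        it->size = 0;
        return false;
    }

    while (bytes) {
        size_t left = segs[it->idx].len - it->done;
        size_t step = bytes < left ? bytes : left;

        bytes -= step;
        it->size -= step;
        it->done += step;

        /* Move to the next segment once this one is fully consumed. */
        if (it->done == segs[it->idx].len) {
            it->done = 0;
            it->idx++;
        }
    }
    return true;
}
```

A caller that loops while `it.size` is non-zero stops after a failed advance even if it ignores the return value, which is the containment property the commit message argues for.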

bio-integrity: fold bio_integrity_enabled to bio_integrity_prep · e23947bd

Authored by Dmitry Monakhov on Jun 29, 2017

Currently all integrity prep hooks are open-coded, and if prepare fails
we ignore its code and fail the bio with EIO. Let's return the real error to
the upper layer, so the caller may react accordingly.

In fact no one wants to use bio_integrity_prep() w/o bio_integrity_enabled,
so it is reasonable to fold them into one function.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
[hch: merged with the latest block tree,
	return bool from bio_integrity_prep]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e23947bd

01 Jul, 2017 4 commits

libnvdimm: passthru functions clear to send · 53b85a44

Authored by Jerry Hoemann on Jun 30, 2017

Have dsm functions called via the pass thru mechanism also
be checked against clear to send.
Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

53b85a44

libnvdimm, btt: convert some info messages to warn/err · e6be2dcb

Authored by Vishal Verma on Jun 30, 2017

Some critical messages such as IO errors and metadata failures were printed
with dev_info. Make them louder by upgrading them to dev_warn or
dev_err.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

e6be2dcb

libnvdimm, region, pmem: fix 'badblocks' sysfs_get_dirent() reference lifetime · 6aa734a2

Authored by Dan Williams on Jun 30, 2017

We need to hold a reference on the 'dirent' until we are sure there are
no more notifications that will be sent. As noted in the new comments we
take advantage of the fact that the references are taken and dropped
under device_lock() and that nd_device_notify() holds device_lock() over
new badblocks notifications. The notifications that happen when
badblocks are cleared only occur while the device is active.

Also take the opportunity to fix up the error messages to report the
user visible effect of a sysfs_get_dirent() failure.

Fixes: 975750a9 ("libnvdimm, pmem: Add sysfs notifications to badblocks")
Cc: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

6aa734a2

libnvdimm: fix the clear-error check in nsio_rw_bytes · 7e5a21df

Authored by Vishal Verma on Jun 30, 2017

A leftover from the 'bandaid' fix that disabled BTT error clearing in
rw_bytes resulted in an incorrect check. After we converted these checks
over to use the NVDIMM_IO_ATOMIC flag, the ndns->claim check was both
redundant, and incorrect. Remove it.

Fixes: 3ae3d67b ("libnvdimm: add an atomic vs process context flag to rw_bytes")
Cc: <stable@vger.kernel.org>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

7e5a21df

30 Jun, 2017 5 commits

libnvdimm, btt: fix btt_rw_page not returning errors · c13c43d5

Authored by Vishal Verma on Jun 29, 2017

btt_rw_page was not propagating errors from btt_do_bvec, resulting in any
IO errors via the rw_page path going unnoticed. The pmem driver recently
fixed this in e10624f8 ("pmem: fail io-requests to known bad blocks"),
but the same problem in BTT went neglected.

Fixes: 5212e11f ("nd_btt: atomic sector updates")
Cc: <stable@vger.kernel.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

c13c43d5

acpi, nfit: quiet invalid block-aperture-region warnings · d5d51fec

Authored by Dan Williams on Jun 29, 2017

This state is already visible by userspace since the BLK region will not
be enabled, and it is otherwise benign as it usually indicates that the
DIMM is not configured.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

d5d51fec

libnvdimm, btt: BTT updates for UEFI 2.7 format · 14e49454

Authored by Vishal Verma on Jun 28, 2017

The UEFI 2.7 specification defines an updated BTT metadata format,
bumping the revision to 2.0. Add support for the new format, while
retaining compatibility for the old 1.1 format.

Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

14e49454

libnvdimm, pmem: disable dax flushing when pmem is fronting a volatile region · 0b277961

Authored by Dan Williams on Jun 09, 2017

The pmem driver attaches to both persistent and volatile memory ranges
advertised by the ACPI NFIT. When the region is volatile it is redundant
to spend cycles flushing caches at fsync(). Check if the hosting region
is volatile and do not set dax_write_cache() if it is.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

0b277961

libnvdimm, pmem, dax: export a cache control attribute · 6e0c90d6

Authored by Dan Williams on Jun 26, 2017

The dax_flush() operation can be turned into a nop on platforms where
firmware arranges for cpu caches to be flushed on a power-fail event.
The ACPI 6.2 specification defines a mechanism for the platform to
indicate this capability so the kernel can select the proper default.
However, for other platforms, the administrator must toggle this setting
manually.

Given this flush setting is a dax-specific mechanism we advertise it
through a 'dax' attribute group hanging off a host device. For example,
a 'pmem0' block-device gets a 'dax' sysfs-subdirectory with a
'write_cache' attribute to control response to dax cache flush requests.
This is similar to the 'queue/write_cache' attribute that appears under
block devices.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

6e0c90d6

28 Jun, 2017 5 commits

libnvdimm, nfit: enable support for volatile ranges · c9e582aa

Authored by Dan Williams on May 29, 2017

Allow volatile nfit ranges to participate in all the same infrastructure
provided for persistent memory regions. A resulting namespace
device will still be called "pmem", but the parent region type will be
"nd_volatile". This is in preparation for disabling the dax ->flush()
operation in the pmem driver when it is hosted on a volatile range.

Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

c9e582aa

libnvdimm, pmem: fix persistence warning · c00b396e

Authored by Dan Williams on May 29, 2017

The pmem driver assumes if platform firmware describes the memory
devices associated with a persistent memory range and
CONFIG_ARCH_HAS_PMEM_API=y that it has all the mechanism necessary to
flush data to a power-fail safe zone. We warn if the firmware does not
describe memory devices, but we also need to warn if the architecture
does not claim pmem support.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

c00b396e

x86, libnvdimm, pmem: remove global pmem api · ca6a4657

Authored by Dan Williams on Jan 13, 2017

Now that all callers of the pmem api have been converted to dax helpers that
call back to the pmem driver, we can remove include/linux/pmem.h and
asm/pmem.h.

Cc: <x86@kernel.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ca6a4657

x86, libnvdimm, pmem: move arch_invalidate_pmem() to libnvdimm · f2b61257

Authored by Dan Williams on May 29, 2017

Kill this globally defined wrapper and move to libnvdimm so that we can
ultimately remove include/linux/pmem.h and asm/pmem.h.

Cc: <x86@kernel.org>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

f2b61257

block: don't bother with bounce limits for make_request drivers · 0b0bcacc

Authored by Christoph Hellwig on Jun 19, 2017

We only call blk_queue_bounce for request-based drivers, so stop messing
with it for make_request based drivers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

0b0bcacc

16 Jun, 2017 8 commits

x86, dax, libnvdimm: remove wb_cache_pmem() indirection · 4e4f00a9

Authored by Dan Williams on May 29, 2017

With all handling of the CONFIG_ARCH_HAS_PMEM_API case being moved to
libnvdimm and the pmem driver directly we do not need to provide global
wrappers and fallbacks in the CONFIG_ARCH_HAS_PMEM_API=n case. The pmem
driver will simply not link to arch_wb_cache_pmem() in that case.  Same
as before, pmem flushing is only defined for x86_64, via
clean_cache_range(), but it is straightforward to add other archs in the
future.

arch_wb_cache_pmem() is an exported function since the pmem module needs
to find it, but it is privately declared in drivers/nvdimm/pmem.h because
there are no consumers outside of the pmem driver.

Cc: <x86@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

4e4f00a9

dax, pmem: introduce an optional 'flush' dax_operation · 3c1cebff

Authored by Dan Williams on May 29, 2017

Filesystem-DAX flushes caches whenever it writes to the address returned
through dax_direct_access() and when writing back dirty radix entries.
That flushing is only required in the pmem case, so add a dax operation
to allow pmem to take this extra action, but skip it for other dax
capable devices that do not provide a flush routine.

An example for this differentiation might be a volatile ram disk where
there is no expectation of persistence. In fact the pmem driver itself might
front such an address range specified by the NFIT. So, this "no flush"
property might be something passed down by the bus / libnvdimm.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

3c1cebff

libnvdimm, pmem: Add sysfs notifications to badblocks · 975750a9

Authored by Toshi Kani on Jun 12, 2017

Sysfs "badblocks" information may be updated during run-time, in that:
 - MCE, SCI, and sysfs "scrub" may add new bad blocks
 - Writes and ioctl() may clear bad blocks

Add support to send sysfs notifications to sysfs "badblocks" file
under region and pmem directories when their badblocks information
is re-evaluated (but is not necessarily changed) during run-time.
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

975750a9

libnvdimm, label: switch to using v1.2 labels by default · 8990cdf1

Authored by Dan Williams on Jun 07, 2017

The rules for which version of the label specification are in effect at
any given point in time are as follows:

1/ If a DIMM has an existing / valid index block then the version
specified is used regardless of whether it is a previous version.

2/ By default when the kernel is initializing new index blocks the
latest specification version (v1.2 at time of writing) is used.

3/ An environment that wants to force create v1.1 label-sets must
arrange for userspace to disable all active regions / namespaces /
dimms and write a valid set of v1.1 index blocks to the dimms.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

8990cdf1

libnvdimm, label: add address abstraction identifiers · b3fde74e

Authored by Dan Williams on Jun 04, 2017

Starting with v1.2 labels, 'address abstractions' can be hinted via an
address abstraction id that implies an info-block format. The standard
address abstraction in the specification is the v2 format of the
Block-Translation-Table (BTT). Support for that is saved for a later
patch, for now we add support for the Linux supported address
abstractions BTT (v1), PFN, and DAX.

The new 'holder_class' attribute for namespace devices is added for
tooling to specify the 'abstraction_guid' to store in the namespace label.
For v1.1 labels this field is undefined and any setting of
'holder_class' away from the default 'none' value will only have effect
until the driver is unloaded. Setting 'holder_class' requires that
whatever device tries to claim the namespace must be of the specified
class.

Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

b3fde74e

libnvdimm, label: add v1.2 label checksum support · 355d8388

Authored by Dan Williams on Jun 06, 2017

The v1.2 namespace label specification adds a fletcher checksum to each
label instance. Add generation and validation support for the new field.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

355d8388
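For readers unfamiliar with the checksum mentioned in 355d8388, a Fletcher-style 64-bit sum over 32-bit words can be sketched as below. This is a minimal userspace illustration with a hypothetical function name; it reads words in host byte order and does not model how the label's own checksum field is handled during generation or validation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Fletcher-style checksum: 'lo32' is a running sum of the 32-bit words
 * (wrapping at 32 bits), 'hi32' is a running sum of the successive
 * values of 'lo32'. The result packs hi32 into the upper half and lo32
 * into the lower half of a 64-bit value.
 */
static uint64_t fletcher64_sketch(const uint32_t *words, size_t nwords)
{
    uint32_t lo32 = 0;
    uint64_t hi32 = 0;

    for (size_t i = 0; i < nwords; i++) {
        lo32 += words[i];
        hi32 += lo32;
    }
    return (hi32 << 32) | lo32;
}
```

Unlike a plain sum, the second accumulator makes the result sensitive to word order: the inputs {1, 2} and {2, 1} produce different checksums.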

libnvdimm, label: update 'nlabel' and 'position' handling for local namespaces · 3934d841

Authored by Dan Williams on Jun 06, 2017

The v1.2 namespace label specification requires 'nlabel' and 'position'
to be valid for the first ("lowest dpa") label in the set. It also
requires all non-first labels to set those fields to 0xff.

Linux does not much care if these values are correct, because we can
just trust the count of labels with the matching uuid like the v1.1
case. However, we set them correctly in case other environments care.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

3934d841

libnvdimm, label: populate 'isetcookie' for blk-aperture namespaces · 8f2bc243

Authored by Dan Williams on Jun 06, 2017

Starting with the v1.2 definition of namespace labels, the isetcookie
field is populated and validated for blk-aperture namespaces. This adds
some safety against inadvertent copying of namespace labels from one
DIMM-device to another.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

8f2bc243

openanolis / cloud-kernel synced successfully almost 2 years ago