Commits · c12c48ce869d72029d70666f615cbd8f67fc14e9 · openanolis / cloud-kernel

16 Jun, 2017 1 commit

libnvdimm, label: add v1.2 interleave-set-cookie algorithm · c12c48ce

Authored by Dan Williams on Jun 04, 2017

The interleave-set-cookie algorithm is extended to incorporate all the
same components that are used to generate an nvdimm unique-id. For
backwards compatibility we still maintain the old v1.1 definition.

Reported-by: Nicholas Moulin <nicholas.w.moulin@intel.com>
Reported-by: Kaushik Kanetkar <kaushik.a.kanetkar@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

c12c48ce

05 May, 2017 2 commits

libnvdimm: handle locked label storage areas · 9d62ed96

Authored by Dan Williams on May 04, 2017

Per the latest version of the "NVDIMM DSM Interface Example" [1], the
label data retrieval routine can report a "locked" status. In this case
all regions associated with that DIMM are disabled until the label area
is unlocked. Provide generic libnvdimm enabling for NVDIMMs with label
data area locking capabilities.

[1]: http://pmem.io/documents/

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

9d62ed96

libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED · 8f078b38

Authored by Dan Williams on May 04, 2017

This is a preparation patch for handling locked nvdimm label regions, a
new concept as introduced by the latest DSM document on pmem.io [1]. A
future patch will leverage nvdimm_set_locked() at DIMM probe time to
flag regions that cannot be enabled. There should be no functional
difference resulting from this change.

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example-V1.3.pdf

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

8f078b38
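The conversion of NDD_ flags to bitops described above can be pictured with a small userspace sketch. Everything in it is hypothetical: the bit numbers, the `_sketch` names, and the non-atomic helpers are stand-ins for the kernel's `set_bit()`/`test_bit()` and are not taken from the patch itself.

```c
/* Hypothetical bit numbers; the kernel's actual NDD_* values may differ. */
enum {
	NDD_ALIASING_SKETCH = 0,
	NDD_LOCKED_SKETCH = 2,
};

struct nvdimm_sketch {
	unsigned long flags;	/* one bit per NDD_* flag */
};

/* Non-atomic userspace stand-ins for the kernel's set_bit()/test_bit(). */
static void sketch_set_bit(unsigned int nr, unsigned long *addr)
{
	*addr |= 1UL << nr;
}

static int sketch_test_bit(unsigned int nr, const unsigned long *addr)
{
	return (int)((*addr >> nr) & 1UL);
}

/* Plays the role the message gives nvdimm_set_locked(): mark the DIMM. */
static void nvdimm_set_locked_sketch(struct nvdimm_sketch *nvdimm)
{
	sketch_set_bit(NDD_LOCKED_SKETCH, &nvdimm->flags);
}
```

With flags addressed by bit number rather than by mask, a probe path could call `nvdimm_set_locked_sketch()` once and later code could query the state with `sketch_test_bit()`.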

01 Mar, 2017 1 commit

nfit, libnvdimm: fix interleave set cookie calculation · 86ef58a4

Authored by Dan Williams on Feb 28, 2017

The interleave-set cookie is a checksum that verifies the composition of
an interleave set has not changed since the namespace was initially
created.  The checksum is calculated by sorting the DIMMs by their
location in the interleave-set. The comparison for the sort must be
64-bit wide, not byte-by-byte as performed by memcmp() in the broken
case.

Fix the implementation to accept correct cookie values in addition to
the Linux "memcmp" order cookies, but only allow correct cookies to be
generated going forward. It does mean that namespaces created by
third-party-tooling, or created by newer kernels with this fix, will not
validate on older kernels. However, there are a couple of mitigating
conditions:

    1/ platforms with namespace-label capable NVDIMMs are not widely
       available.

    2/ interleave-sets with a single-dimm are by definition not affected
       (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case.

The cookie stored in the namespace label will be fixed by any write to the
namespace label; the most straightforward way to achieve this is to
write to the "alt_name" attribute of a namespace in sysfs.

Cc: <stable@vger.kernel.org>
Fixes: eaf96153 ("libnvdimm, nfit: add interleave-set state-tracking infrastructure")
Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

86ef58a4
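The difference between a 64-bit comparison and a byte-by-byte memcmp() comparison, which is the crux of the fix above, can be shown in a few lines. This is a minimal sketch, assuming the sort keys are stored as little-endian 64-bit values; the helper names are hypothetical and this is not the kernel's comparator.

```c
#include <stdint.h>
#include <string.h>

/* Store v as 8 little-endian bytes (the layout assumed in this sketch). */
static void put_le64(uint8_t out[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (8 * i));
}

/* Byte-by-byte ordering, as memcmp() sees the little-endian image. */
static int cmp_bytes(uint64_t a, uint64_t b)
{
	uint8_t ba[8], bb[8];
	int r;

	put_le64(ba, a);
	put_le64(bb, b);
	r = memcmp(ba, bb, sizeof(ba));
	return (r > 0) - (r < 0);
}

/* Numeric 64-bit ordering, the comparison the message calls for. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	return (a > b) - (a < b);
}
```

For the pair 0x0100 and 0x0001 the two comparators disagree: numerically 0x0100 is larger, but its first little-endian byte is 0x00, so the byte-wise comparison sorts it first. A sort built on the wrong comparator can therefore order the DIMMs differently and produce a different cookie.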

01 Feb, 2017 2 commits

libnvdimm, namespace: do not delete namespace-id 0 · 9d032f42

Authored by Dan Williams on Jan 25, 2017

Given that the naming of pmem devices changes from the pmemX form to the
pmemX.Y form when namespace id is greater than 0, arrange for namespaces
with id-0 to be exempt from deletion. Otherwise a simple reconfiguration
of an existing namespace to a new mode results in a name change of the
resulting block device:

    # ndctl list --namespace=namespace1.0
    {
      "dev":"namespace1.0",
      "mode":"raw",
      "size":2147483648,
      "uuid":"3dadf3dc-89b9-4b24-b20e-abc8a4707ce3",
      "blockdev":"pmem1"
    }

    # ndctl create-namespace --reconfig=namespace1.0 --mode=memory --force
    {
      "dev":"namespace1.1",
      "mode":"memory",
      "size":2111832064,
      "uuid":"7b4a6341-7318-4219-a02c-fb57c0bbf613",
      "blockdev":"pmem1.1"
    }

This change does require tooling changes to explicitly look for
namespaceX.0 if the seed has already advanced to another namespace.

Cc: <stable@vger.kernel.org>
Fixes: 98a29c39 ("libnvdimm, namespace: allow creation of multiple pmem-namespaces per region")
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

9d032f42

nvdimm: constify device_type structures · 970d14e3

Authored by Bhumika Goyal on Jan 25, 2017

Declare device_type structure as const as it is only stored in the
type field of a device structure. This field is of type const, so add
const to declaration of device_type structure.

File size before:
  text	   data	    bss	    dec	    hex	filename
  19278	   3199	     16	  22493	   57dd	nvdimm/namespace_devs.o

File size after:
  text	   data	    bss	    dec	    hex	filename
  19929	   3160	     16	  23105	   5a41	nvdimm/namespace_devs.o
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

970d14e3

14 Jan, 2017 1 commit

libnvdimm, namespace: fix pmem namespace leak, delete when size set to zero · 1f19b983

Authored by Dan Williams on Jan 09, 2017

Commit 98a29c39 ("libnvdimm, namespace: allow creation of multiple
pmem-namespaces per region") added support for establishing additional
pmem namespaces beyond the seed device, similar to blk namespaces.
However, it neglected to delete the namespace when the size is set to
zero.

Fixes: 98a29c39 ("libnvdimm, namespace: allow creation of multiple pmem-namespaces per region")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

1f19b983

16 Dec, 2016 1 commit

libnvdimm: replace mutex_is_locked() warnings with lockdep_assert_held · 9cf8bd52

Authored by Dan Williams on Dec 15, 2016

For warnings that should only ever trigger during development and
testing, replace WARN statements with lockdep_assert_held. The lockdep
pattern is prevalent, and these paths are well covered by libnvdimm
unit tests.

Reported-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

9cf8bd52

05 Dec, 2016 1 commit

libnvdimm, namespace: use octal for permissions · b44fe760

Authored by Fabian Frederick on Dec 04, 2016

According to commit f90774e1
("checkpatch: look for symbolic permissions and suggest octal instead")

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

b44fe760
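The equivalence between symbolic and octal permissions that motivates this change can be checked in userspace. The sketch below is illustrative only: `S_IRUGO_SKETCH` re-creates the kernel's combined read-for-all macro from the POSIX bits in `<sys/stat.h>`, and the two helper functions are hypothetical.

```c
#include <sys/stat.h>

/* Userspace re-creation of the kernel-style "read for user/group/other". */
#define S_IRUGO_SKETCH (S_IRUSR | S_IRGRP | S_IROTH)

/* Symbolic spelling of a read-only attribute mode. */
static unsigned int ro_mode_symbolic(void)
{
	return S_IRUGO_SKETCH;
}

/* Symbolic spelling of "owner may write, everyone may read". */
static unsigned int rw_mode_symbolic(void)
{
	return S_IRUGO_SKETCH | S_IWUSR;
}
```

The symbolic forms evaluate to 0444 and 0644 respectively, which is why the octal literals can replace them without changing behavior.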

29 Nov, 2016 1 commit

libnvdimm, namespace: fix the type of name variable · 238b323a

Authored by Nicolas Iooss on Nov 26, 2016

In create_namespace_blk(), the local variable "name" is defined as an
array of NSLABEL_NAME_LEN pointers:

    char *name[NSLABEL_NAME_LEN];

This variable is then used in calls to memcpy() and kmemdup() as if it
were char[NSLABEL_NAME_LEN]. Remove the star in the variable definition
to make it look right.

Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

238b323a
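The type mismatch described above is easy to see by comparing the sizes of the two declarations. This is a small illustration, assuming a value of 64 for `NSLABEL_NAME_LEN` (the message does not state it); the helper names are hypothetical.

```c
#include <stddef.h>

#define NSLABEL_NAME_LEN 64	/* assumed value, for illustration only */

/* The original declaration: an array of NSLABEL_NAME_LEN pointers. */
static size_t size_of_pointer_array(void)
{
	char *name[NSLABEL_NAME_LEN];

	return sizeof(name);
}

/* The corrected declaration: a buffer of NSLABEL_NAME_LEN characters. */
static size_t size_of_char_array(void)
{
	char name[NSLABEL_NAME_LEN];

	return sizeof(name);
}
```

The pointer array is `sizeof(char *)` times larger than needed, so copying NSLABEL_NAME_LEN bytes into it happens to fit. That would explain why the code worked despite the wrong type, and why the message describes the change as making the declaration "look right".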

20 Oct, 2016 1 commit

libnvdimm, namespace: potential NULL deref on allocation error · 75d29713

Authored by Dan Carpenter on Oct 12, 2016

If kcalloc() fails, "devs" is NULL and we dereference it when
checking "devs[i]".

Fixes: 1b40e09a ('libnvdimm: blk labels and namespace instantiation')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

75d29713
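The pattern behind this fix, checking the array pointer before indexing it in an error path, can be sketched in userspace. The helper below is hypothetical and uses `calloc()`/`free()` in place of the kernel allocators; it is not the code from the patch.

```c
#include <stdlib.h>

/*
 * Hypothetical cleanup helper: release every non-NULL entry, then the
 * array itself. Returns the number of entries released.
 */
static int release_devs(void **devs, size_t count)
{
	int released = 0;

	/* The allocation of the array may have failed: nothing to walk. */
	if (!devs)
		return 0;

	for (size_t i = 0; i < count; i++) {
		if (devs[i]) {
			free(devs[i]);
			released++;
		}
	}
	free(devs);
	return released;
}
```

Without the early `if (!devs)` return, the loop would evaluate `devs[i]` on a NULL pointer, which is the dereference the message describes.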

08 Oct, 2016 7 commits

libnvdimm, namespace: allow creation of multiple pmem-namespaces per region · 98a29c39

Authored by Dan Williams on Sep 30, 2016

Similar to BLK regions, publish new seed namespace devices to allow
unused PMEM region capacity to be consumed by additional namespaces.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

98a29c39

libnvdimm, namespace: lift single pmem limit in scan_labels() · 991d9020

Authored by Dan Williams on Oct 05, 2016

Now that the rest of the infrastructure has been converted to handle
multi-pmem configurations, lift the artificial barrier at scan time.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

991d9020

libnvdimm, namespace: filter out of range labels in scan_labels() · c969e24c

Authored by Dan Williams on Oct 05, 2016

Short-circuit doomed-to-fail label validation attempts by skipping
labels that are outside the given region. For example a DIMM that has
multiple PMEM regions will waste time attempting to create namespaces
only to find that the interleave-set-cookie does not validate, e.g.:

nd_region region6: invalid cookie in label: 73e608dc-47b9-4b2a-b5c7-2d55a32e0c2

Similar to how we skip BLK labels when performing PMEM validation we can
skip out-of-range labels early.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

c969e24c

libnvdimm, namespace: enable allocation of multiple pmem namespaces · 762d067d

Authored by Dan Williams on Oct 04, 2016

Now that we have nd_region_available_dpa() able to handle the presence
of multiple PMEM allocations in aliased PMEM regions, reuse that same
infrastructure to track allocations from free space.  In particular,
handle allocating from an aliased PMEM region in the case where there
are discontiguous holes.  The allocation rules for BLK and PMEM are
documented in the space_valid() helper:

    BLK-space is valid as long as it does not precede a PMEM
    allocation in a given region. PMEM-space must be contiguous
    and adjacent to an existing allocation (if one
    exists).

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

762d067d

libnvdimm, namespace: expand pmem device naming scheme for multi-pmem · 01220733

Authored by Dan Williams on Oct 05, 2016

pmem devices are currently named /dev/pmem<region-index>. Preserve the
naming of the 0th device, but add a ".<namespace-index>" for other
devices.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

01220733
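The naming rule described above (keep `pmem<region-index>` for the 0th namespace, append `.<namespace-index>` for the others) can be written as a short formatter. This is a minimal sketch; `pmem_name_sketch()` and its parameters are hypothetical and not the kernel's naming code.

```c
#include <stdio.h>

/*
 * Hypothetical formatter for the multi-pmem naming scheme:
 * namespace index 0 keeps the legacy "pmemX" form, others use "pmemX.Y".
 * Returns the value of snprintf().
 */
static int pmem_name_sketch(char *buf, size_t len, int region, int ns_index)
{
	if (ns_index == 0)
		return snprintf(buf, len, "pmem%d", region);
	return snprintf(buf, len, "pmem%d.%d", region, ns_index);
}
```

For region 1 this yields `pmem1` for namespace index 0 and `pmem1.1` for namespace index 1, matching the block device names shown in the ndctl output elsewhere in this log.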

libnvdimm, namespace: sort namespaces by dpa at init · 6ff3e912

Authored by Dan Williams on Oct 05, 2016

Add more determinism to initial namespace device-name assignments by
sorting the namespaces by starting dpa.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

6ff3e912
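Sorting by starting dpa so that device names are assigned deterministically could look like the following userspace sketch. The `ns_sketch` structure and both functions are hypothetical; the kernel works on its own namespace objects.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a discovered namespace. */
struct ns_sketch {
	uint64_t start_dpa;	/* first DIMM physical address of the namespace */
	int id;			/* discovery order, for illustration */
};

/* Order by starting dpa; the subtraction-free form avoids overflow. */
static int cmp_start_dpa(const void *a, const void *b)
{
	const struct ns_sketch *na = a;
	const struct ns_sketch *nb = b;

	return (na->start_dpa > nb->start_dpa) - (na->start_dpa < nb->start_dpa);
}

static void sort_by_dpa(struct ns_sketch *v, size_t n)
{
	qsort(v, n, sizeof(*v), cmp_start_dpa);
}
```

After sorting, index 0 always refers to the namespace with the lowest starting address, regardless of the order in which labels were discovered.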

libnvdimm, namespace: allow multiple pmem-namespaces per region at scan time · 0e3b0d12

Authored by Dan Williams on Oct 06, 2016

If label scanning finds multiple valid pmem namespaces, allow them to be
surfaced rather than fail namespace scanning. Support for creating
multiple namespaces per region is saved for a later patch.

Note that this adds some new error messages to clarify which of the pmem
namespaces in the set are potentially impacted by invalid labels.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

0e3b0d12

06 Oct, 2016 2 commits

libnvdimm, namespace: unify blk and pmem label scanning · 8a5f50d3

Authored by Dan Williams on Sep 22, 2016

In preparation for allowing multiple namespaces per pmem region, unify
blk and pmem label scanning. Given that blk regions already support
multiple namespaces, teaching that path how to do pmem namespace
scanning is an incremental step towards multiple pmem namespace support.
This should be functionally equivalent to the previous state in that
it stops after finding the first valid pmem label set.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

8a5f50d3

libnvdimm, namespace: refactor uuid_show() into a namespace_to_uuid() helper · f95b4bca

Authored by Dan Williams on Sep 21, 2016

The ability to translate a generic struct device pointer into a
namespace uuid is a useful utility as we go to unify the blk and pmem
label scanning paths.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

f95b4bca

01 Oct, 2016 1 commit

libnvdimm, label: convert label tracking to a linked list · ae8219f1

Authored by Dan Williams on Sep 19, 2016

In preparation for enabling multiple namespaces per pmem region, convert
the label tracking to use a linked list. In particular this will allow
select_pmem_id() to move labels from the unvalidated state to the
validated state. Currently we only track one validated set per-region.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ae8219f1

22 Sep, 2016 1 commit

libnvdimm, namespace: debug invalid interleave-set-cookie values · 4765218d

Authored by Dan Williams on Sep 15, 2016

If platform firmware fails to populate unique / non-zero serial number
data for each nvdimm in an interleave-set, it may cause pmem region
initialization to fail. Add a debug message for this case.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

4765218d

02 Sep, 2016 1 commit

nvdimm: Spelling s/unacknoweldged/unacknowledged/ · ae551e9c

Authored by Geert Uytterhoeven on Aug 31, 2016

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

ae551e9c

10 May, 2016 1 commit

libnvdimm, dax: introduce device-dax infrastructure · cd03412a

Authored by Dan Williams on Mar 11, 2016

Device DAX is the device-centric analogue of Filesystem DAX
(CONFIG_FS_DAX).  It allows persistent memory ranges to be allocated and
mapped without need of an intervening file system.  This initial
infrastructure arranges for a libnvdimm pfn-device to be represented as
a different device-type so that it can be attached to a driver other
than the pmem driver.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cd03412a

23 Apr, 2016 1 commit

libnvdimm: cleanup nvdimm_namespace_common_probe(), kill 'host' · 0bfb8dd3

Authored by Dan Williams on Apr 13, 2016

The 'host' variable can be killed as it is always the same as the passed-in
device.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

0bfb8dd3

06 Mar, 2016 1 commit

libnvdimm, pmem: adjust for section collisions with 'System RAM' · cfe30b87

Authored by Dan Williams on Mar 03, 2016

On a platform where 'Persistent Memory' and 'System RAM' are mixed
within a given sparsemem section, trim the namespace and notify about the
sub-optimal alignment.

Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

cfe30b87

27 Jan, 2016 1 commit

libnvdimm: fix mode determination for e820 devices · 9c412428

Authored by Dan Williams on Jan 23, 2016

Correctly display "safe" mode when a btt is established on an e820/memmap
defined pmem namespace.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

9c412428

06 Jan, 2016 1 commit

libnvdimm: fix namespace object confusion in is_uuid_busy() · e07ecd76

Authored by Dan Williams on Jan 05, 2016

When btt devices were re-worked to be child devices of regions this
routine was overlooked.  It mistakenly attempts to_nd_namespace_pmem()
or to_nd_namespace_blk() conversions on btt and pfn devices.  By luck to
date we have happened to be hitting valid memory leading to a uuid
miscompare, but a recent change to struct nd_namespace_common causes:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
 IP: [<ffffffff814610dc>] memcmp+0xc/0x40
 [..]
 Call Trace:
  [<ffffffffa0028631>] is_uuid_busy+0xc1/0x2a0 [libnvdimm]
  [<ffffffffa0028570>] ? to_nd_blk_region+0x50/0x50 [libnvdimm]
  [<ffffffff8158c9c0>] device_for_each_child+0x50/0x90

Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

e07ecd76

25 Dec, 2015 1 commit

libnvdimm, pfn: move 'memory mode' indication to sysfs · 0731de0d

Authored by Dan Williams on Dec 14, 2015

'Memory mode' is defined as the capability of a DAX mapping to be the
source/target of DMA and other "direct I/O" scenarios.  While it
currently requires allocating 'struct page' for each page frame of
persistent memory in the namespace, this will not always be the case.  Work
continues on reducing the kernel's dependency on 'struct page'.

Let's not maintain a suffix that is expected to lose meaning over time.
In other words a future 'raw mode' pmem namespace may be as capable as
today's 'memory mode' namespace.  Undo the encoding of the mode in the
device name and leave it to other tooling to determine the mode of the
namespace from its attributes.

Reported-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

0731de0d

14 Dec, 2015 1 commit

libnvdimm, pfn: fix pfn seed creation · 2dc43331

Authored by Dan Williams on Dec 13, 2015

Similar to btt, plant a new pfn seed when the existing one is activated.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>

2dc43331

09 Dec, 2015 1 commit

nvdimm: improve diagnosibility of namespaces · bd26d0d0

Authored by Dmitry Krivenok on Dec 02, 2015

In order to bind a namespace to the driver, the user must first
set all mandatory attributes in the following order:
- uuid
- size
- sector_size (for blk namespace only)

If the order is wrong, then the user either won't be able to set
the attribute or won't be able to bind the namespace.

This simple patch improves diagnosibility of common operations
with namespaces by printing some details about the error
instead of failing silently.

Below are examples of error messages (assuming dyndbg is
enabled for nvdimms):

[/]# echo 4194304 > /sys/bus/nd/devices/region5/namespace5.0/size
[  288.372612] nd namespace5.0: __size_store: uuid not set
[  288.374839] nd namespace5.0: size_store: 400000 fail (-6)
sh: write error: No such device or address
[/]#

[/]# echo namespace5.0 > /sys/bus/nd/drivers/nd_blk/bind
[  554.671648] nd_blk namespace5.0: nvdimm_namespace_common_probe: sector size not set
[  554.674688]  ndbus1: nd_blk.probe(namespace5.0) = -19
sh: write error: No such device
[/]#
Signed-off-by: Dmitry V. Krivenok <krivenok.dmitry@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

bd26d0d0

29 Aug, 2015 2 commits

libnvdimm, pmem: direct map legacy pmem by default · 004f1afb

Authored by Dan Williams on Aug 24, 2015

The expectation is that the legacy / non-standard pmem discovery method
(e820 type-12) will only ever be used to describe small quantities of
persistent memory.  Larger capacities will be described via the ACPI
NFIT.  When "allocate struct page from pmem" support is added, this default
policy can be overridden by assigning a legacy pmem namespace to a pfn
device; however, this would only be necessary if a platform used the
legacy mechanism to define a very large range.

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

004f1afb

libnvdimm, pfn: 'struct page' provider infrastructure · e1455744

Authored by Dan Williams on Jul 30, 2015

Implement the base infrastructure for libnvdimm PFN devices. Similar to
BTT devices, they take a namespace as a backing device and layer
functionality on top. In this case the functionality is reserving space
for an array of 'struct page' entries to be handed out through
pfn_to_page(). For now this is just the basic libnvdimm-device-model for
configuring the base PFN device.

As the namespace claiming mechanism for PFN devices is mostly identical
to that of BTT devices, drivers/nvdimm/claim.c is created to house the common
bits.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

e1455744

15 Aug, 2015 1 commit

libnvdimm, btt: write and validate parent_uuid · 6ec68954

Authored by Vishal Verma on Jul 29, 2015

When a BTT is instantiated on a namespace, it must validate that the
namespace uuid matches the 'parent_uuid' stored in the btt superblock. This
property enforces that changing the namespace UUID invalidates all
former BTT instances on that storage. For "IO namespaces" that don't
have a label or UUID, the parent_uuid is set to zero, and this
validation is skipped. For such cases, old BTTs have to be invalidated
by forcing the namespace to raw mode, and overwriting the BTT info
blocks.

Based on a patch by Dan Williams <dan.j.williams@intel.com>

Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

6ec68954
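The parent_uuid check described above reduces to a small predicate. The sketch below is an assumption-laden illustration: `btt_parent_uuid_ok()` is a hypothetical name, a 16-byte UUID is assumed, and a NULL `ns_uuid` models a namespace that has no label or UUID.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define UUID_LEN_SKETCH 16	/* assumed UUID size in bytes */

/*
 * Returns true when the BTT may be instantiated on the namespace.
 * ns_uuid == NULL models a namespace without a label or UUID, for
 * which the message says validation is skipped.
 */
static bool btt_parent_uuid_ok(const uint8_t *ns_uuid,
			       const uint8_t *parent_uuid)
{
	if (!ns_uuid)
		return true;
	return memcmp(ns_uuid, parent_uuid, UUID_LEN_SKETCH) == 0;
}
```

Because the comparison is against the current namespace uuid, assigning a new uuid to the namespace makes every previously written superblock fail the check, which is the invalidation property the message describes.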

26 Jun, 2015 4 commits

libnvdimm: Add sysfs numa_node to NVDIMM devices · 74ae66c3

Authored by Toshi Kani on Jun 19, 2015

Add support of sysfs 'numa_node' to I/O-related NVDIMM devices
under /sys/bus/nd/devices, regionN, namespaceN.0, and bttN.x.

An example of numa_node values on a 2-socket system with a single
NVDIMM range on each socket is shown below.
  /sys/bus/nd/devices
  |-- btt0.0/numa_node:0
  |-- btt1.0/numa_node:1
  |-- btt1.1/numa_node:1
  |-- namespace0.0/numa_node:0
  |-- namespace1.0/numa_node:1
  |-- region0/numa_node:0
  |-- region1/numa_node:1

These numa_node files are then linked under the block class of
their device names.
  /sys/class/block/pmem0/device/numa_node:0
  /sys/class/block/pmem1s/device/numa_node:1

This enables numactl(8) to accept 'block:' and 'file:' paths of
pmem and btt devices as shown in the examples below.
  numactl --preferred block:pmem0 --show
  numactl --preferred file:/dev/pmem1s --show

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

74ae66c3

libnvdimm, blk: add support for blk integrity · fcae6957

Authored by Vishal Verma on Jun 25, 2015

Support multiple block sizes (sector + metadata) for nd_blk in the
same way as done for the BTT. Add the idea of an 'internal' lbasize,
which is properly aligned and padded, and store metadata in this space.

Signed-off-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

fcae6957

libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory · 047fc8a1

Authored by Ross Zwisler on Jun 25, 2015

The libnvdimm implementation handles allocating dimm address space (DPA)
between PMEM and BLK mode interfaces.  After DPA has been allocated from
a BLK-region to a BLK-namespace the nd_blk driver attaches to handle I/O
as a struct bio based block device. Unlike PMEM, BLK is required to
handle platform specific details like mmio register formats and memory
controller interleave.  For this reason the libnvdimm generic nd_blk
driver calls back into the bus provider to carry out the I/O.

This initial implementation handles the BLK interface defined by the
ACPI 6 NFIT [1] and the NVDIMM DSM Interface Example [2] composed from
DCR (dimm control region), BDW (block data window), IDT (interleave
descriptor) NFIT structures and the hardware register format.

[1]: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
[2]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

047fc8a1

nd_btt: atomic sector updates · 5212e11f

Authored by Vishal Verma on Jun 25, 2015

BTT stands for Block Translation Table, and is a way to provide power
fail sector atomicity semantics for block devices that have the ability
to perform byte granularity IO. It relies on the capability of libnvdimm
namespace devices to do byte aligned IO.

The BTT works as a stacked block device, and reserves a chunk of space
from the backing device for its accounting metadata. It is a bio-based
driver because all IO is done synchronously, and there is no queuing or
asynchronous completions at either the device or the driver level.

The BTT uses 'lanes' to index into various 'on-disk' data structures,
and lanes also act as a synchronization mechanism in case there are more
CPUs than available lanes. We did a comparison between two lane lock
strategies - first where we kept an atomic counter around that tracked
which was the last lane that was used, and 'our' lane was determined by
atomically incrementing that. That way, for the nr_cpus > nr_lanes case,
theoretically, no CPU would be blocked waiting for a lane. The other
strategy was to take the cpu number we're scheduled on and hash it to
a lane number. Theoretically, this could block an IO that could've
otherwise run using a different, free lane. But some fio workloads
showed that the direct cpu -> lane hash performed faster than tracking
'last lane' - my reasoning is the cache thrash caused by moving the
atomic variable made that approach slower than simply waiting out the
in-progress IO. This supports the conclusion that the driver can be a
very simple bio-based one that does synchronous IOs instead of queuing.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
[jmoyer: fix nmi watchdog timeout in btt_map_init]
[jmoyer: move btt initialization to module load path]
[jmoyer: fix memory leak in the btt initialization path]
[jmoyer: Don't overwrite corrupted arenas]
Signed-off-by: Vishal Verma <vishal.l.verma@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

5212e11f
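The two lane-selection strategies compared in the message above can be sketched side by side. This is a hypothetical userspace illustration using C11 atomics: the function names are invented, and a plain modulo stands in for whatever cpu-to-lane hash the driver actually uses.

```c
#include <stdatomic.h>

/*
 * Strategy 1 (hypothetical sketch): a shared counter hands out lanes
 * round-robin. Every caller writes the same variable, so its cache
 * line moves between CPUs on each acquisition.
 */
static atomic_uint last_lane_sketch;

static unsigned int lane_from_counter(unsigned int nr_lanes)
{
	return atomic_fetch_add(&last_lane_sketch, 1) % nr_lanes;
}

/*
 * Strategy 2 (hypothetical sketch): derive the lane from the CPU
 * number. No shared state is written, but two CPUs that map to the
 * same lane must wait for each other.
 */
static unsigned int lane_from_cpu(unsigned int cpu, unsigned int nr_lanes)
{
	return cpu % nr_lanes;
}
```

The trade-off the message reports is visible in the shape of the code: the counter version touches shared memory on every call, while the cpu-based version only computes a value and accepts occasional contention on a busy lane.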

25 Jun, 2015 2 commits

libnvdimm: infrastructure for btt devices · 8c2f7e86

Authored by Dan Williams on Jun 25, 2015

NVDIMM namespaces, in addition to accepting "struct bio" based requests,
also have the capability to perform byte-aligned accesses.  By default
only the bio/block interface is used.  However, if another driver can
make effective use of the byte-aligned capability it can claim the namespace
interface and use the byte-aligned ->rw_bytes() interface.

The BTT driver is the first consumer of this mechanism to allow
adding atomic sector update semantics to a pmem or blk namespace.  This
patch is the sysfs infrastructure to allow configuring a BTT instance
for a namespace.  Enabling that BTT and performing i/o is in a
subsequent patch.

Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

8c2f7e86

libnvdimm: write blk label set · 0ba1c634

Authored by Dan Williams on May 30, 2015

After 'uuid', 'size', 'sector_size', and optionally 'alt_name' have been
set to valid values, the labels on the dimm can be updated.  The
difference with the pmem case is that blk namespaces are limited to one
dimm and can cover discontiguous ranges in dpa space.

Also, after allocating label slots, it is useful for userspace to know
how many slots are left.  Export this information in sysfs.

Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Neil Brown <neilb@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

0ba1c634

openanolis / cloud-kernel synced successfully more than 1 year ago
