提交 9f3a0941 编写于 作者: L Linus Torvalds

Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "This cycle was was not something I ever want to repeat as there were
  several late changes that have only now just settled.

  Half of the branch up to commit d2c997c0 ("fs, dax: use
  page->mapping to warn...") have been in -next for several releases.
  The of_pmem driver and the address range scrub rework were late
  arrivals, and the dax work was scaled back at the last moment.

  The of_pmem driver missed a previous merge window due to an oversight.
  A sense of obligation to rectify that miss is why it is included for
  4.17. It has acks from PowerPC folks. Stephen reported a build failure
  that only occurs when merging it with your latest tree, for now I have
  fixed that up by disabling modular builds of of_pmem. A test merge
  with your tree has received a build success report from the 0day robot
  over 156 configs.

  An initial version of the ARS rework was submitted before the merge
  window. It is self contained to libnvdimm, a net code reduction, and
  passing all unit tests.

  The filesystem-dax changes are based on the wait_var_event()
  functionality from tip/sched/core. However, late review feedback
  showed that those changes regressed truncate performance to a large
  degree. The branch was rewound to drop the truncate behavior change
  and now only includes preparation patches and cleanups (with full acks
  and reviews). The finalization of this dax-dma-vs-trnucate work will
  need to wait for 4.18.

  Summary:

   - A rework of the filesytem-dax implementation provides for detection
     of unmap operations (truncate / hole punch) colliding with
     in-progress device-DMA. A fix for these collisions remains a
     work-in-progress pending resolution of truncate latency and
     starvation regressions.

   - The of_pmem driver expands the users of libnvdimm outside of x86
     and ACPI to describe an implementation of persistent memory on
     PowerPC with Open Firmware / Device tree.

   - Address Range Scrub (ARS) handling is completely rewritten to
     account for the fact that ARS may run for 100s of seconds and there
     is no platform defined way to cancel it. ARS will now no longer
     block namespace initialization.

   - The NVDIMM Namespace Label implementation is updated to handle
     label areas as small as 1K, down from 128K.

   - Miscellaneous cleanups and updates to unit test infrastructure"

* tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits)
  libnvdimm, of_pmem: workaround OF_NUMA=n build error
  nfit, address-range-scrub: add module option to skip initial ars
  nfit, address-range-scrub: rework and simplify ARS state machine
  nfit, address-range-scrub: determine one platform max_ars value
  powerpc/powernv: Create platform devs for nvdimm buses
  doc/devicetree: Persistent memory region bindings
  libnvdimm: Add device-tree based driver
  libnvdimm: Add of_node to region and bus descriptors
  libnvdimm, region: quiet region probe
  libnvdimm, namespace: use a safe lookup for dimm device name
  libnvdimm, dimm: fix dpa reservation vs uninitialized label area
  libnvdimm, testing: update the default smart ctrl_temperature
  libnvdimm, testing: Add emulation for smart injection commands
  nfit, address-range-scrub: introduce nfit_spa->ars_state
  libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device'
  nfit, address-range-scrub: fix scrub in-progress reporting
  dax, dm: allow device-mapper to operate without dax support
  dax: introduce CONFIG_DAX_DRIVER
  fs, dax: use page->mapping to warn if truncate collides with a busy page
  ext2, dax: introduce ext2_dax_aops
  ...
Device-tree bindings for persistent memory regions
-----------------------------------------------------
Persistent memory refers to a class of memory devices that are:
a) Usable as main system memory (i.e. cacheable), and
b) Retain their contents across power failure.
Given b) it is best to think of persistent memory as a kind of memory mapped
storage device. To ensure data integrity the operating system needs to manage
persistent regions separately to the normal memory pool. To aid with that this
binding provides a standardised interface for discovering where persistent
memory regions exist inside the physical address space.
Bindings for the region nodes:
-----------------------------
Required properties:
- compatible = "pmem-region"
- reg = <base, size>;
The reg property should specificy an address range that is
translatable to a system physical address range. This address
range should be mappable as normal system memory would be
(i.e cacheable).
If the reg property contains multiple address ranges
each address range will be treated as though it was specified
in a separate device node. Having multiple address ranges in a
node implies no special relationship between the two ranges.
Optional properties:
- Any relevant NUMA assocativity properties for the target platform.
- volatile; This property indicates that this region is actually
backed by non-persistent memory. This lets the OS know that it
may skip the cache flushes required to ensure data is made
persistent after a write.
If this property is absent then the OS must assume that the region
is backed by non-volatile memory.
Examples:
--------------------
/*
* This node specifies one 4KB region spanning from
* 0x5000 to 0x5fff that is backed by non-volatile memory.
*/
pmem@5000 {
compatible = "pmem-region";
reg = <0x00005000 0x00001000>;
};
/*
* This node specifies two 4KB regions that are backed by
* volatile (normal) memory.
*/
pmem@6000 {
compatible = "pmem-region";
reg = < 0x00006000 0x00001000
0x00008000 0x00001000 >;
volatile;
};
......@@ -8048,6 +8048,14 @@ Q: https://patchwork.kernel.org/project/linux-nvdimm/list/
S: Supported
F: drivers/nvdimm/pmem*
LIBNVDIMM: DEVICETREE BINDINGS
M: Oliver O'Halloran <oohall@gmail.com>
L: linux-nvdimm@lists.01.org
Q: https://patchwork.kernel.org/project/linux-nvdimm/list/
S: Supported
F: drivers/nvdimm/of_pmem.c
F: Documentation/devicetree/bindings/pmem/pmem-region.txt
LIBNVDIMM: NON-VOLATILE MEMORY DEVICE SUBSYSTEM
M: Dan Williams <dan.j.williams@intel.com>
L: linux-nvdimm@lists.01.org
......
......@@ -824,6 +824,9 @@ static int __init opal_init(void)
/* Create i2c platform devices */
opal_pdev_init("ibm,opal-i2c");
/* Handle non-volatile memory devices */
opal_pdev_init("pmem-region");
/* Setup a heatbeat thread if requested by OPAL */
opal_init_heartbeat();
......
此差异已折叠。
......@@ -51,9 +51,8 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
if ((spa->address + spa->length - 1) < mce->addr)
continue;
found_match = 1;
dev_dbg(dev, "%s: addr in SPA %d (0x%llx, 0x%llx)\n",
__func__, spa->range_index, spa->address,
spa->length);
dev_dbg(dev, "addr in SPA %d (0x%llx, 0x%llx)\n",
spa->range_index, spa->address, spa->length);
/*
* We can break at the first match because we're going
* to rescan all the SPA ranges. There shouldn't be any
......
......@@ -117,10 +117,17 @@ enum nfit_dimm_notifiers {
NFIT_NOTIFY_DIMM_HEALTH = 0x81,
};
enum nfit_ars_state {
ARS_REQ,
ARS_DONE,
ARS_SHORT,
ARS_FAILED,
};
struct nfit_spa {
struct list_head list;
struct nd_region *nd_region;
unsigned int ars_required:1;
unsigned long ars_state;
u32 clear_err_unit;
u32 max_ars;
struct acpi_nfit_system_address spa[0];
......@@ -171,9 +178,8 @@ struct nfit_mem {
struct resource *flush_wpq;
unsigned long dsm_mask;
int family;
u32 has_lsi:1;
u32 has_lsr:1;
u32 has_lsw:1;
bool has_lsr;
bool has_lsw;
};
struct acpi_nfit_desc {
......@@ -191,18 +197,18 @@ struct acpi_nfit_desc {
struct device *dev;
u8 ars_start_flags;
struct nd_cmd_ars_status *ars_status;
size_t ars_status_size;
struct work_struct work;
struct delayed_work dwork;
struct list_head list;
struct kernfs_node *scrub_count_state;
unsigned int max_ars;
unsigned int scrub_count;
unsigned int scrub_mode;
unsigned int cancel:1;
unsigned int init_complete:1;
unsigned long dimm_cmd_force_en;
unsigned long bus_cmd_force_en;
unsigned long bus_nfit_cmd_force_en;
unsigned int platform_cap;
unsigned int scrub_tmo;
int (*blk_do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,
void *iobuf, u64 len, int rw);
};
......@@ -244,7 +250,7 @@ struct nfit_blk {
extern struct list_head acpi_descs;
extern struct mutex acpi_desc_lock;
int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc, u8 flags);
int acpi_nfit_ars_rescan(struct acpi_nfit_desc *acpi_desc, unsigned long flags);
#ifdef CONFIG_X86_MCE
void nfit_mce_register(void);
......
config DAX_DRIVER
select DAX
bool
menuconfig DAX
tristate "DAX: direct access to differentiated memory"
select SRCU
......@@ -16,7 +20,6 @@ config DEV_DAX
baseline memory pool. Mappings of a /dev/daxX.Y device impose
restrictions that make the mapping behavior deterministic.
config DEV_DAX_PMEM
tristate "PMEM DAX: direct access to persistent memory"
depends on LIBNVDIMM && NVDIMM_DAX && DEV_DAX
......
......@@ -257,8 +257,8 @@ static int __dev_dax_pte_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
dax_region = dev_dax->region;
if (dax_region->align > PAGE_SIZE) {
dev_dbg(dev, "%s: alignment (%#x) > fault size (%#x)\n",
__func__, dax_region->align, fault_size);
dev_dbg(dev, "alignment (%#x) > fault size (%#x)\n",
dax_region->align, fault_size);
return VM_FAULT_SIGBUS;
}
......@@ -267,8 +267,7 @@ static int __dev_dax_pte_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
phys = dax_pgoff_to_phys(dev_dax, vmf->pgoff, PAGE_SIZE);
if (phys == -1) {
dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__,
vmf->pgoff);
dev_dbg(dev, "pgoff_to_phys(%#lx) failed\n", vmf->pgoff);
return VM_FAULT_SIGBUS;
}
......@@ -299,14 +298,14 @@ static int __dev_dax_pmd_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
dax_region = dev_dax->region;
if (dax_region->align > PMD_SIZE) {
dev_dbg(dev, "%s: alignment (%#x) > fault size (%#x)\n",
__func__, dax_region->align, fault_size);
dev_dbg(dev, "alignment (%#x) > fault size (%#x)\n",
dax_region->align, fault_size);
return VM_FAULT_SIGBUS;
}
/* dax pmd mappings require pfn_t_devmap() */
if ((dax_region->pfn_flags & (PFN_DEV|PFN_MAP)) != (PFN_DEV|PFN_MAP)) {
dev_dbg(dev, "%s: region lacks devmap flags\n", __func__);
dev_dbg(dev, "region lacks devmap flags\n");
return VM_FAULT_SIGBUS;
}
......@@ -323,8 +322,7 @@ static int __dev_dax_pmd_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
pgoff = linear_page_index(vmf->vma, pmd_addr);
phys = dax_pgoff_to_phys(dev_dax, pgoff, PMD_SIZE);
if (phys == -1) {
dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__,
pgoff);
dev_dbg(dev, "pgoff_to_phys(%#lx) failed\n", pgoff);
return VM_FAULT_SIGBUS;
}
......@@ -351,14 +349,14 @@ static int __dev_dax_pud_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
dax_region = dev_dax->region;
if (dax_region->align > PUD_SIZE) {
dev_dbg(dev, "%s: alignment (%#x) > fault size (%#x)\n",
__func__, dax_region->align, fault_size);
dev_dbg(dev, "alignment (%#x) > fault size (%#x)\n",
dax_region->align, fault_size);
return VM_FAULT_SIGBUS;
}
/* dax pud mappings require pfn_t_devmap() */
if ((dax_region->pfn_flags & (PFN_DEV|PFN_MAP)) != (PFN_DEV|PFN_MAP)) {
dev_dbg(dev, "%s: region lacks devmap flags\n", __func__);
dev_dbg(dev, "region lacks devmap flags\n");
return VM_FAULT_SIGBUS;
}
......@@ -375,8 +373,7 @@ static int __dev_dax_pud_fault(struct dev_dax *dev_dax, struct vm_fault *vmf)
pgoff = linear_page_index(vmf->vma, pud_addr);
phys = dax_pgoff_to_phys(dev_dax, pgoff, PUD_SIZE);
if (phys == -1) {
dev_dbg(dev, "%s: pgoff_to_phys(%#lx) failed\n", __func__,
pgoff);
dev_dbg(dev, "pgoff_to_phys(%#lx) failed\n", pgoff);
return VM_FAULT_SIGBUS;
}
......@@ -399,9 +396,8 @@ static int dev_dax_huge_fault(struct vm_fault *vmf,
struct file *filp = vmf->vma->vm_file;
struct dev_dax *dev_dax = filp->private_data;
dev_dbg(&dev_dax->dev, "%s: %s: %s (%#lx - %#lx) size = %d\n", __func__,
current->comm, (vmf->flags & FAULT_FLAG_WRITE)
? "write" : "read",
dev_dbg(&dev_dax->dev, "%s: %s (%#lx - %#lx) size = %d\n", current->comm,
(vmf->flags & FAULT_FLAG_WRITE) ? "write" : "read",
vmf->vma->vm_start, vmf->vma->vm_end, pe_size);
id = dax_read_lock();
......@@ -460,7 +456,7 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
struct dev_dax *dev_dax = filp->private_data;
int rc, id;
dev_dbg(&dev_dax->dev, "%s\n", __func__);
dev_dbg(&dev_dax->dev, "trace\n");
/*
* We lock to check dax_dev liveness and will re-check at
......@@ -518,7 +514,7 @@ static int dax_open(struct inode *inode, struct file *filp)
struct inode *__dax_inode = dax_inode(dax_dev);
struct dev_dax *dev_dax = dax_get_private(dax_dev);
dev_dbg(&dev_dax->dev, "%s\n", __func__);
dev_dbg(&dev_dax->dev, "trace\n");
inode->i_mapping = __dax_inode->i_mapping;
inode->i_mapping->host = __dax_inode;
filp->f_mapping = inode->i_mapping;
......@@ -533,7 +529,7 @@ static int dax_release(struct inode *inode, struct file *filp)
{
struct dev_dax *dev_dax = filp->private_data;
dev_dbg(&dev_dax->dev, "%s\n", __func__);
dev_dbg(&dev_dax->dev, "trace\n");
return 0;
}
......@@ -575,7 +571,7 @@ static void unregister_dev_dax(void *dev)
struct inode *inode = dax_inode(dax_dev);
struct cdev *cdev = inode->i_cdev;
dev_dbg(dev, "%s\n", __func__);
dev_dbg(dev, "trace\n");
kill_dev_dax(dev_dax);
cdev_device_del(cdev, dev);
......
......@@ -34,7 +34,7 @@ static void dax_pmem_percpu_release(struct percpu_ref *ref)
{
struct dax_pmem *dax_pmem = to_dax_pmem(ref);
dev_dbg(dax_pmem->dev, "%s\n", __func__);
dev_dbg(dax_pmem->dev, "trace\n");
complete(&dax_pmem->cmp);
}
......@@ -43,7 +43,7 @@ static void dax_pmem_percpu_exit(void *data)
struct percpu_ref *ref = data;
struct dax_pmem *dax_pmem = to_dax_pmem(ref);
dev_dbg(dax_pmem->dev, "%s\n", __func__);
dev_dbg(dax_pmem->dev, "trace\n");
wait_for_completion(&dax_pmem->cmp);
percpu_ref_exit(ref);
}
......@@ -53,7 +53,7 @@ static void dax_pmem_percpu_kill(void *data)
struct percpu_ref *ref = data;
struct dax_pmem *dax_pmem = to_dax_pmem(ref);
dev_dbg(dax_pmem->dev, "%s\n", __func__);
dev_dbg(dax_pmem->dev, "trace\n");
percpu_ref_kill(ref);
}
......@@ -150,17 +150,7 @@ static struct nd_device_driver dax_pmem_driver = {
.type = ND_DRIVER_DAX_PMEM,
};
static int __init dax_pmem_init(void)
{
return nd_driver_register(&dax_pmem_driver);
}
module_init(dax_pmem_init);
static void __exit dax_pmem_exit(void)
{
driver_unregister(&dax_pmem_driver.drv);
}
module_exit(dax_pmem_exit);
module_nd_driver(dax_pmem_driver);
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Intel Corporation");
......
......@@ -124,10 +124,19 @@ int __bdev_dax_supported(struct super_block *sb, int blocksize)
return len < 0 ? len : -EIO;
}
if ((IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn))
|| pfn_t_devmap(pfn))
if (IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn)) {
/*
* An arch that has enabled the pmem api should also
* have its drivers support pfn_t_devmap()
*
* This is a developer warning and should not trigger in
* production. dax_flush() will crash since it depends
* on being able to do (page_address(pfn_to_page())).
*/
WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API));
} else if (pfn_t_devmap(pfn)) {
/* pass */;
else {
} else {
pr_debug("VFS (%s): error: dax support not enabled\n",
sb->s_id);
return -EOPNOTSUPP;
......
......@@ -201,7 +201,7 @@ config BLK_DEV_DM_BUILTIN
config BLK_DEV_DM
tristate "Device mapper support"
select BLK_DEV_DM_BUILTIN
select DAX
depends on DAX || DAX=n
---help---
Device-mapper is a low level volume manager. It works by allowing
people to specify mappings for ranges of logical sectors. Various
......
......@@ -154,6 +154,7 @@ static int linear_iterate_devices(struct dm_target *ti,
return fn(ti, lc->dev, lc->start, ti->len, data);
}
#if IS_ENABLED(CONFIG_DAX_DRIVER)
static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
{
......@@ -184,6 +185,11 @@ static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
}
#else
#define linear_dax_direct_access NULL
#define linear_dax_copy_from_iter NULL
#endif
static struct target_type linear_target = {
.name = "linear",
.version = {1, 4, 0},
......
......@@ -611,51 +611,6 @@ static int log_mark(struct log_writes_c *lc, char *data)
return 0;
}
static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
struct iov_iter *i)
{
struct pending_block *block;
if (!bytes)
return 0;
block = kzalloc(sizeof(struct pending_block), GFP_KERNEL);
if (!block) {
DMERR("Error allocating dax pending block");
return -ENOMEM;
}
block->data = kzalloc(bytes, GFP_KERNEL);
if (!block->data) {
DMERR("Error allocating dax data space");
kfree(block);
return -ENOMEM;
}
/* write data provided via the iterator */
if (!copy_from_iter(block->data, bytes, i)) {
DMERR("Error copying dax data");
kfree(block->data);
kfree(block);
return -EIO;
}
/* rewind the iterator so that the block driver can use it */
iov_iter_revert(i, bytes);
block->datalen = bytes;
block->sector = bio_to_dev_sectors(lc, sector);
block->nr_sectors = ALIGN(bytes, lc->sectorsize) >> lc->sectorshift;
atomic_inc(&lc->pending_blocks);
spin_lock_irq(&lc->blocks_lock);
list_add_tail(&block->list, &lc->unflushed_blocks);
spin_unlock_irq(&lc->blocks_lock);
wake_up_process(lc->log_kthread);
return 0;
}
static void log_writes_dtr(struct dm_target *ti)
{
struct log_writes_c *lc = ti->private;
......@@ -925,6 +880,52 @@ static void log_writes_io_hints(struct dm_target *ti, struct queue_limits *limit
limits->io_min = limits->physical_block_size;
}
#if IS_ENABLED(CONFIG_DAX_DRIVER)
static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
struct iov_iter *i)
{
struct pending_block *block;
if (!bytes)
return 0;
block = kzalloc(sizeof(struct pending_block), GFP_KERNEL);
if (!block) {
DMERR("Error allocating dax pending block");
return -ENOMEM;
}
block->data = kzalloc(bytes, GFP_KERNEL);
if (!block->data) {
DMERR("Error allocating dax data space");
kfree(block);
return -ENOMEM;
}
/* write data provided via the iterator */
if (!copy_from_iter(block->data, bytes, i)) {
DMERR("Error copying dax data");
kfree(block->data);
kfree(block);
return -EIO;
}
/* rewind the iterator so that the block driver can use it */
iov_iter_revert(i, bytes);
block->datalen = bytes;
block->sector = bio_to_dev_sectors(lc, sector);
block->nr_sectors = ALIGN(bytes, lc->sectorsize) >> lc->sectorshift;
atomic_inc(&lc->pending_blocks);
spin_lock_irq(&lc->blocks_lock);
list_add_tail(&block->list, &lc->unflushed_blocks);
spin_unlock_irq(&lc->blocks_lock);
wake_up_process(lc->log_kthread);
return 0;
}
static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
{
......@@ -961,6 +962,10 @@ static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
dax_copy:
return dax_copy_from_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
}
#else
#define log_writes_dax_direct_access NULL
#define log_writes_dax_copy_from_iter NULL
#endif
static struct target_type log_writes_target = {
.name = "log-writes",
......
......@@ -313,6 +313,7 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
return DM_MAPIO_REMAPPED;
}
#if IS_ENABLED(CONFIG_DAX_DRIVER)
static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
{
......@@ -353,6 +354,11 @@ static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
}
#else
#define stripe_dax_direct_access NULL
#define stripe_dax_copy_from_iter NULL
#endif
/*
* Stripe status:
*
......
......@@ -1826,7 +1826,7 @@ static void cleanup_mapped_device(struct mapped_device *md)
static struct mapped_device *alloc_dev(int minor)
{
int r, numa_node_id = dm_get_numa_node();
struct dax_device *dax_dev;
struct dax_device *dax_dev = NULL;
struct mapped_device *md;
void *old_md;
......@@ -1892,9 +1892,11 @@ static struct mapped_device *alloc_dev(int minor)
md->disk->private_data = md;
sprintf(md->disk->disk_name, "dm-%d", minor);
dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
if (!dax_dev)
goto bad;
if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
if (!dax_dev)
goto bad;
}
md->dax_dev = dax_dev;
add_disk_no_queue_reg(md->disk);
......
......@@ -20,7 +20,7 @@ if LIBNVDIMM
config BLK_DEV_PMEM
tristate "PMEM: Persistent memory block device support"
default LIBNVDIMM
select DAX
select DAX_DRIVER
select ND_BTT if BTT
select ND_PFN if NVDIMM_PFN
help
......@@ -102,4 +102,15 @@ config NVDIMM_DAX
Select Y if unsure
config OF_PMEM
# FIXME: make tristate once OF_NUMA dependency removed
bool "Device-tree support for persistent memory regions"
depends on OF
default LIBNVDIMM
help
Allows regions of persistent memory to be described in the
device-tree.
Select Y if unsure.
endif
......@@ -4,6 +4,7 @@ obj-$(CONFIG_BLK_DEV_PMEM) += nd_pmem.o
obj-$(CONFIG_ND_BTT) += nd_btt.o
obj-$(CONFIG_ND_BLK) += nd_blk.o
obj-$(CONFIG_X86_PMEM_LEGACY) += nd_e820.o
obj-$(CONFIG_OF_PMEM) += of_pmem.o
nd_pmem-y := pmem.o
......
......@@ -26,7 +26,7 @@ static void nd_btt_release(struct device *dev)
struct nd_region *nd_region = to_nd_region(dev->parent);
struct nd_btt *nd_btt = to_nd_btt(dev);
dev_dbg(dev, "%s\n", __func__);
dev_dbg(dev, "trace\n");
nd_detach_ndns(&nd_btt->dev, &nd_btt->ndns);
ida_simple_remove(&nd_region->btt_ida, nd_btt->id);
kfree(nd_btt->uuid);
......@@ -74,8 +74,8 @@ static ssize_t sector_size_store(struct device *dev,
nvdimm_bus_lock(dev);
rc = nd_size_select_store(dev, buf, &nd_btt->lbasize,
btt_lbasize_supported);
dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
rc, buf, buf[len - 1] == '\n' ? "" : "\n");
dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
buf[len - 1] == '\n' ? "" : "\n");
nvdimm_bus_unlock(dev);
device_unlock(dev);
......@@ -101,8 +101,8 @@ static ssize_t uuid_store(struct device *dev,
device_lock(dev);
rc = nd_uuid_store(dev, &nd_btt->uuid, buf, len);
dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
rc, buf, buf[len - 1] == '\n' ? "" : "\n");
dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
buf[len - 1] == '\n' ? "" : "\n");
device_unlock(dev);
return rc ? rc : len;
......@@ -131,8 +131,8 @@ static ssize_t namespace_store(struct device *dev,
device_lock(dev);
nvdimm_bus_lock(dev);
rc = nd_namespace_store(dev, &nd_btt->ndns, buf, len);
dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
rc, buf, buf[len - 1] == '\n' ? "" : "\n");
dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
buf[len - 1] == '\n' ? "" : "\n");
nvdimm_bus_unlock(dev);
device_unlock(dev);
......@@ -206,8 +206,8 @@ static struct device *__nd_btt_create(struct nd_region *nd_region,
dev->groups = nd_btt_attribute_groups;
device_initialize(&nd_btt->dev);
if (ndns && !__nd_attach_ndns(&nd_btt->dev, ndns, &nd_btt->ndns)) {
dev_dbg(&ndns->dev, "%s failed, already claimed by %s\n",
__func__, dev_name(ndns->claim));
dev_dbg(&ndns->dev, "failed, already claimed by %s\n",
dev_name(ndns->claim));
put_device(dev);
return NULL;
}
......@@ -346,8 +346,7 @@ int nd_btt_probe(struct device *dev, struct nd_namespace_common *ndns)
return -ENOMEM;
btt_sb = devm_kzalloc(dev, sizeof(*btt_sb), GFP_KERNEL);
rc = __nd_btt_probe(to_nd_btt(btt_dev), ndns, btt_sb);
dev_dbg(dev, "%s: btt: %s\n", __func__,
rc == 0 ? dev_name(btt_dev) : "<none>");
dev_dbg(dev, "btt: %s\n", rc == 0 ? dev_name(btt_dev) : "<none>");
if (rc < 0) {
struct nd_btt *nd_btt = to_nd_btt(btt_dev);
......
......@@ -358,6 +358,7 @@ struct nvdimm_bus *nvdimm_bus_register(struct device *parent,
nvdimm_bus->dev.release = nvdimm_bus_release;
nvdimm_bus->dev.groups = nd_desc->attr_groups;
nvdimm_bus->dev.bus = &nvdimm_bus_type;
nvdimm_bus->dev.of_node = nd_desc->of_node;
dev_set_name(&nvdimm_bus->dev, "ndbus%d", nvdimm_bus->id);
rc = device_register(&nvdimm_bus->dev);
if (rc) {
......@@ -984,8 +985,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
if (cmd == ND_CMD_CALL) {
func = pkg.nd_command;
dev_dbg(dev, "%s:%s, idx: %llu, in: %u, out: %u, len %llu\n",
__func__, dimm_name, pkg.nd_command,
dev_dbg(dev, "%s, idx: %llu, in: %u, out: %u, len %llu\n",
dimm_name, pkg.nd_command,
in_len, out_len, buf_len);
}
......@@ -996,8 +997,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
u32 copy;
if (out_size == UINT_MAX) {
dev_dbg(dev, "%s:%s unknown output size cmd: %s field: %d\n",
__func__, dimm_name, cmd_name, i);
dev_dbg(dev, "%s unknown output size cmd: %s field: %d\n",
dimm_name, cmd_name, i);
return -EFAULT;
}
if (out_len < sizeof(out_env))
......@@ -1012,9 +1013,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
buf_len = (u64) out_len + (u64) in_len;
if (buf_len > ND_IOCTL_MAX_BUFLEN) {
dev_dbg(dev, "%s:%s cmd: %s buf_len: %llu > %d\n", __func__,
dimm_name, cmd_name, buf_len,
ND_IOCTL_MAX_BUFLEN);
dev_dbg(dev, "%s cmd: %s buf_len: %llu > %d\n", dimm_name,
cmd_name, buf_len, ND_IOCTL_MAX_BUFLEN);
return -EINVAL;
}
......
......@@ -148,7 +148,7 @@ ssize_t nd_namespace_store(struct device *dev,
char *name;
if (dev->driver) {
dev_dbg(dev, "%s: -EBUSY\n", __func__);
dev_dbg(dev, "namespace already active\n");
return -EBUSY;
}
......
......@@ -134,7 +134,7 @@ static void nvdimm_map_release(struct kref *kref)
nvdimm_map = container_of(kref, struct nvdimm_map, kref);
nvdimm_bus = nvdimm_map->nvdimm_bus;
dev_dbg(&nvdimm_bus->dev, "%s: %pa\n", __func__, &nvdimm_map->offset);
dev_dbg(&nvdimm_bus->dev, "%pa\n", &nvdimm_map->offset);
list_del(&nvdimm_map->list);
if (nvdimm_map->flags)
memunmap(nvdimm_map->mem);
......@@ -230,8 +230,8 @@ static int nd_uuid_parse(struct device *dev, u8 *uuid_out, const char *buf,
for (i = 0; i < 16; i++) {
if (!isxdigit(str[0]) || !isxdigit(str[1])) {
dev_dbg(dev, "%s: pos: %d buf[%zd]: %c buf[%zd]: %c\n",
__func__, i, str - buf, str[0],
dev_dbg(dev, "pos: %d buf[%zd]: %c buf[%zd]: %c\n",
i, str - buf, str[0],
str + 1 - buf, str[1]);
return -EINVAL;
}
......
......@@ -24,7 +24,7 @@ static void nd_dax_release(struct device *dev)
struct nd_dax *nd_dax = to_nd_dax(dev);
struct nd_pfn *nd_pfn = &nd_dax->nd_pfn;
dev_dbg(dev, "%s\n", __func__);
dev_dbg(dev, "trace\n");
nd_detach_ndns(dev, &nd_pfn->ndns);
ida_simple_remove(&nd_region->dax_ida, nd_pfn->id);
kfree(nd_pfn->uuid);
......@@ -129,8 +129,7 @@ int nd_dax_probe(struct device *dev, struct nd_namespace_common *ndns)
pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, DAX_SIG);
dev_dbg(dev, "%s: dax: %s\n", __func__,
rc == 0 ? dev_name(dax_dev) : "<none>");
dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "<none>");
if (rc < 0) {
nd_detach_ndns(dax_dev, &nd_pfn->ndns);
put_device(dax_dev);
......
......@@ -67,9 +67,11 @@ static int nvdimm_probe(struct device *dev)
ndd->ns_next = nd_label_next_nsindex(ndd->ns_current);
nd_label_copy(ndd, to_next_namespace_index(ndd),
to_current_namespace_index(ndd));
rc = nd_label_reserve_dpa(ndd);
if (ndd->ns_current >= 0)
nvdimm_set_aliasing(dev);
if (ndd->ns_current >= 0) {
rc = nd_label_reserve_dpa(ndd);
if (rc == 0)
nvdimm_set_aliasing(dev);
}
nvdimm_clear_locked(dev);
nvdimm_bus_unlock(dev);
......
......@@ -131,7 +131,7 @@ int nvdimm_init_config_data(struct nvdimm_drvdata *ndd)
}
memcpy(ndd->data + offset, cmd->out_buf, cmd->in_length);
}
dev_dbg(ndd->dev, "%s: len: %zu rc: %d\n", __func__, offset, rc);
dev_dbg(ndd->dev, "len: %zu rc: %d\n", offset, rc);
kfree(cmd);
return rc;
......@@ -266,8 +266,7 @@ void nvdimm_drvdata_release(struct kref *kref)
struct device *dev = ndd->dev;
struct resource *res, *_r;
dev_dbg(dev, "%s\n", __func__);
dev_dbg(dev, "trace\n");
nvdimm_bus_lock(dev);
for_each_dpa_resource_safe(ndd, res, _r)
nvdimm_free_dpa(ndd, res);
......@@ -660,7 +659,7 @@ int nvdimm_bus_check_dimm_count(struct nvdimm_bus *nvdimm_bus, int dimm_count)
nd_synchronize();
device_for_each_child(&nvdimm_bus->dev, &count, count_dimms);
dev_dbg(&nvdimm_bus->dev, "%s: count: %d\n", __func__, count);
dev_dbg(&nvdimm_bus->dev, "count: %d\n", count);
if (count != dimm_count)
return -ENXIO;
return 0;
......
......@@ -45,9 +45,27 @@ unsigned sizeof_namespace_label(struct nvdimm_drvdata *ndd)
return ndd->nslabel_size;
}
static size_t __sizeof_namespace_index(u32 nslot)
{
return ALIGN(sizeof(struct nd_namespace_index) + DIV_ROUND_UP(nslot, 8),
NSINDEX_ALIGN);
}
static int __nvdimm_num_label_slots(struct nvdimm_drvdata *ndd,
size_t index_size)
{
return (ndd->nsarea.config_size - index_size * 2) /
sizeof_namespace_label(ndd);
}
int nvdimm_num_label_slots(struct nvdimm_drvdata *ndd)
{
return ndd->nsarea.config_size / (sizeof_namespace_label(ndd) + 1);
u32 tmp_nslot, n;
tmp_nslot = ndd->nsarea.config_size / sizeof_namespace_label(ndd);
n = __sizeof_namespace_index(tmp_nslot) / NSINDEX_ALIGN;
return __nvdimm_num_label_slots(ndd, NSINDEX_ALIGN * n);
}
size_t sizeof_namespace_index(struct nvdimm_drvdata *ndd)
......@@ -55,18 +73,14 @@ size_t sizeof_namespace_index(struct nvdimm_drvdata *ndd)
u32 nslot, space, size;
/*
* The minimum index space is 512 bytes, with that amount of
* index we can describe ~1400 labels which is less than a byte
* of overhead per label. Round up to a byte of overhead per
* label and determine the size of the index region. Yes, this
* starts to waste space at larger config_sizes, but it's
* unlikely we'll ever see anything but 128K.
* Per UEFI 2.7, the minimum size of the Label Storage Area is large
* enough to hold 2 index blocks and 2 labels. The minimum index
* block size is 256 bytes, and the minimum label size is 256 bytes.
*/
nslot = nvdimm_num_label_slots(ndd);
space = ndd->nsarea.config_size - nslot * sizeof_namespace_label(ndd);
size = ALIGN(sizeof(struct nd_namespace_index) + DIV_ROUND_UP(nslot, 8),
NSINDEX_ALIGN) * 2;
if (size <= space)
size = __sizeof_namespace_index(nslot) * 2;
if (size <= space && nslot >= 2)
return size / 2;
dev_err(ndd->dev, "label area (%d) too small to host (%d byte) labels\n",
......@@ -121,8 +135,7 @@ static int __nd_label_validate(struct nvdimm_drvdata *ndd)
memcpy(sig, nsindex[i]->sig, NSINDEX_SIG_LEN);
if (memcmp(sig, NSINDEX_SIGNATURE, NSINDEX_SIG_LEN) != 0) {
dev_dbg(dev, "%s: nsindex%d signature invalid\n",
__func__, i);
dev_dbg(dev, "nsindex%d signature invalid\n", i);
continue;
}
......@@ -135,8 +148,8 @@ static int __nd_label_validate(struct nvdimm_drvdata *ndd)
labelsize = 128;
if (labelsize != sizeof_namespace_label(ndd)) {
dev_dbg(dev, "%s: nsindex%d labelsize %d invalid\n",
__func__, i, nsindex[i]->labelsize);
dev_dbg(dev, "nsindex%d labelsize %d invalid\n",
i, nsindex[i]->labelsize);
continue;
}
......@@ -145,30 +158,28 @@ static int __nd_label_validate(struct nvdimm_drvdata *ndd)
sum = nd_fletcher64(nsindex[i], sizeof_namespace_index(ndd), 1);
nsindex[i]->checksum = __cpu_to_le64(sum_save);
if (sum != sum_save) {
dev_dbg(dev, "%s: nsindex%d checksum invalid\n",
__func__, i);
dev_dbg(dev, "nsindex%d checksum invalid\n", i);
continue;
}
seq = __le32_to_cpu(nsindex[i]->seq);
if ((seq & NSINDEX_SEQ_MASK) == 0) {
dev_dbg(dev, "%s: nsindex%d sequence: %#x invalid\n",
__func__, i, seq);
dev_dbg(dev, "nsindex%d sequence: %#x invalid\n", i, seq);
continue;
}
/* sanity check the index against expected values */
if (__le64_to_cpu(nsindex[i]->myoff)
!= i * sizeof_namespace_index(ndd)) {
dev_dbg(dev, "%s: nsindex%d myoff: %#llx invalid\n",
__func__, i, (unsigned long long)
dev_dbg(dev, "nsindex%d myoff: %#llx invalid\n",
i, (unsigned long long)
__le64_to_cpu(nsindex[i]->myoff));
continue;
}
if (__le64_to_cpu(nsindex[i]->otheroff)
!= (!i) * sizeof_namespace_index(ndd)) {
dev_dbg(dev, "%s: nsindex%d otheroff: %#llx invalid\n",
__func__, i, (unsigned long long)
dev_dbg(dev, "nsindex%d otheroff: %#llx invalid\n",
i, (unsigned long long)
__le64_to_cpu(nsindex[i]->otheroff));
continue;
}
......@@ -176,8 +187,7 @@ static int __nd_label_validate(struct nvdimm_drvdata *ndd)
size = __le64_to_cpu(nsindex[i]->mysize);
if (size > sizeof_namespace_index(ndd)
|| size < sizeof(struct nd_namespace_index)) {
dev_dbg(dev, "%s: nsindex%d mysize: %#llx invalid\n",
__func__, i, size);
dev_dbg(dev, "nsindex%d mysize: %#llx invalid\n", i, size);
continue;
}
......@@ -185,9 +195,8 @@ static int __nd_label_validate(struct nvdimm_drvdata *ndd)
if (nslot * sizeof_namespace_label(ndd)
+ 2 * sizeof_namespace_index(ndd)
> ndd->nsarea.config_size) {
dev_dbg(dev, "%s: nsindex%d nslot: %u invalid, config_size: %#x\n",
__func__, i, nslot,
ndd->nsarea.config_size);
dev_dbg(dev, "nsindex%d nslot: %u invalid, config_size: %#x\n",
i, nslot, ndd->nsarea.config_size);
continue;
}
valid[i] = true;
......@@ -356,8 +365,8 @@ static bool slot_valid(struct nvdimm_drvdata *ndd,
sum = nd_fletcher64(nd_label, sizeof_namespace_label(ndd), 1);
nd_label->checksum = __cpu_to_le64(sum_save);
if (sum != sum_save) {
dev_dbg(ndd->dev, "%s fail checksum. slot: %d expect: %#llx\n",
__func__, slot, sum);
dev_dbg(ndd->dev, "fail checksum. slot: %d expect: %#llx\n",
slot, sum);
return false;
}
}
......@@ -422,8 +431,8 @@ int nd_label_active_count(struct nvdimm_drvdata *ndd)
u64 dpa = __le64_to_cpu(nd_label->dpa);
dev_dbg(ndd->dev,
"%s: slot%d invalid slot: %d dpa: %llx size: %llx\n",
__func__, slot, label_slot, dpa, size);
"slot%d invalid slot: %d dpa: %llx size: %llx\n",
slot, label_slot, dpa, size);
continue;
}
count++;
......@@ -650,7 +659,7 @@ static int __pmem_label_update(struct nd_region *nd_region,
slot = nd_label_alloc_slot(ndd);
if (slot == UINT_MAX)
return -ENXIO;
dev_dbg(ndd->dev, "%s: allocated: %d\n", __func__, slot);
dev_dbg(ndd->dev, "allocated: %d\n", slot);
nd_label = to_label(ndd, slot);
memset(nd_label, 0, sizeof_namespace_label(ndd));
......@@ -678,7 +687,7 @@ static int __pmem_label_update(struct nd_region *nd_region,
sum = nd_fletcher64(nd_label, sizeof_namespace_label(ndd), 1);
nd_label->checksum = __cpu_to_le64(sum);
}
nd_dbg_dpa(nd_region, ndd, res, "%s\n", __func__);
nd_dbg_dpa(nd_region, ndd, res, "\n");
/* update label */
offset = nd_label_offset(ndd, nd_label);
......@@ -700,7 +709,7 @@ static int __pmem_label_update(struct nd_region *nd_region,
break;
}
if (victim) {
dev_dbg(ndd->dev, "%s: free: %d\n", __func__, slot);
dev_dbg(ndd->dev, "free: %d\n", slot);
slot = to_slot(ndd, victim->label);
nd_label_free_slot(ndd, slot);
victim->label = NULL;
......@@ -868,7 +877,7 @@ static int __blk_label_update(struct nd_region *nd_region,
slot = nd_label_alloc_slot(ndd);
if (slot == UINT_MAX)
goto abort;
dev_dbg(ndd->dev, "%s: allocated: %d\n", __func__, slot);
dev_dbg(ndd->dev, "allocated: %d\n", slot);
nd_label = to_label(ndd, slot);
memset(nd_label, 0, sizeof_namespace_label(ndd));
......@@ -928,7 +937,7 @@ static int __blk_label_update(struct nd_region *nd_region,
/* free up now unused slots in the new index */
for_each_set_bit(slot, victim_map, victim_map ? nslot : 0) {
dev_dbg(ndd->dev, "%s: free: %d\n", __func__, slot);
dev_dbg(ndd->dev, "free: %d\n", slot);
nd_label_free_slot(ndd, slot);
}
......@@ -1092,7 +1101,7 @@ static int del_labels(struct nd_mapping *nd_mapping, u8 *uuid)
active--;
slot = to_slot(ndd, nd_label);
nd_label_free_slot(ndd, slot);
dev_dbg(ndd->dev, "%s: free: %d\n", __func__, slot);
dev_dbg(ndd->dev, "free: %d\n", slot);
list_move_tail(&label_ent->list, &list);
label_ent->label = NULL;
}
......@@ -1100,7 +1109,7 @@ static int del_labels(struct nd_mapping *nd_mapping, u8 *uuid)
if (active == 0) {
nd_mapping_free_labels(nd_mapping);
dev_dbg(ndd->dev, "%s: no more active labels\n", __func__);
dev_dbg(ndd->dev, "no more active labels\n");
}
mutex_unlock(&nd_mapping->lock);
......
......@@ -33,7 +33,7 @@ enum {
BTTINFO_UUID_LEN = 16,
BTTINFO_FLAG_ERROR = 0x1, /* error state (read-only) */
BTTINFO_MAJOR_VERSION = 1,
ND_LABEL_MIN_SIZE = 512 * 129, /* see sizeof_namespace_index() */
ND_LABEL_MIN_SIZE = 256 * 4, /* see sizeof_namespace_index() */
ND_LABEL_ID_SIZE = 50,
ND_NSINDEX_INIT = 0x1,
};
......
......@@ -421,7 +421,7 @@ static ssize_t alt_name_store(struct device *dev,
rc = __alt_name_store(dev, buf, len);
if (rc >= 0)
rc = nd_namespace_label_update(nd_region, dev);
dev_dbg(dev, "%s: %s(%zd)\n", __func__, rc < 0 ? "fail " : "", rc);
dev_dbg(dev, "%s(%zd)\n", rc < 0 ? "fail " : "", rc);
nvdimm_bus_unlock(dev);
device_unlock(dev);
......@@ -1007,7 +1007,7 @@ static ssize_t __size_store(struct device *dev, unsigned long long val)
if (uuid_not_set(uuid, dev, __func__))
return -ENXIO;
if (nd_region->ndr_mappings == 0) {
dev_dbg(dev, "%s: not associated with dimm(s)\n", __func__);
dev_dbg(dev, "not associated with dimm(s)\n");
return -ENXIO;
}
......@@ -1105,8 +1105,7 @@ static ssize_t size_store(struct device *dev,
*uuid = NULL;
}
dev_dbg(dev, "%s: %llx %s (%d)\n", __func__, val, rc < 0
? "fail" : "success", rc);
dev_dbg(dev, "%llx %s (%d)\n", val, rc < 0 ? "fail" : "success", rc);
nvdimm_bus_unlock(dev);
device_unlock(dev);
......@@ -1270,8 +1269,8 @@ static ssize_t uuid_store(struct device *dev,
rc = nd_namespace_label_update(nd_region, dev);
else
kfree(uuid);
dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
rc, buf, buf[len - 1] == '\n' ? "" : "\n");
dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
buf[len - 1] == '\n' ? "" : "\n");
nvdimm_bus_unlock(dev);
device_unlock(dev);
......@@ -1355,9 +1354,8 @@ static ssize_t sector_size_store(struct device *dev,
rc = nd_size_select_store(dev, buf, lbasize, supported);
if (rc >= 0)
rc = nd_namespace_label_update(nd_region, dev);
dev_dbg(dev, "%s: result: %zd %s: %s%s", __func__,
rc, rc < 0 ? "tried" : "wrote", buf,
buf[len - 1] == '\n' ? "" : "\n");
dev_dbg(dev, "result: %zd %s: %s%s", rc, rc < 0 ? "tried" : "wrote",
buf, buf[len - 1] == '\n' ? "" : "\n");
nvdimm_bus_unlock(dev);
device_unlock(dev);
......@@ -1519,7 +1517,7 @@ static ssize_t holder_class_store(struct device *dev,
rc = __holder_class_store(dev, buf);
if (rc >= 0)
rc = nd_namespace_label_update(nd_region, dev);
dev_dbg(dev, "%s: %s(%zd)\n", __func__, rc < 0 ? "fail " : "", rc);
dev_dbg(dev, "%s(%zd)\n", rc < 0 ? "fail " : "", rc);
nvdimm_bus_unlock(dev);
device_unlock(dev);
......@@ -1717,8 +1715,7 @@ struct nd_namespace_common *nvdimm_namespace_common_probe(struct device *dev)
if (uuid_not_set(nsblk->uuid, &ndns->dev, __func__))
return ERR_PTR(-ENODEV);
if (!nsblk->lbasize) {
dev_dbg(&ndns->dev, "%s: sector size not set\n",
__func__);
dev_dbg(&ndns->dev, "sector size not set\n");
return ERR_PTR(-ENODEV);
}
if (!nd_namespace_blk_validate(nsblk))
......@@ -1798,9 +1795,7 @@ static bool has_uuid_at_pos(struct nd_region *nd_region, u8 *uuid,
}
if (found_uuid) {
dev_dbg(ndd->dev,
"%s duplicate entry for uuid\n",
__func__);
dev_dbg(ndd->dev, "duplicate entry for uuid\n");
return false;
}
found_uuid = true;
......@@ -1926,7 +1921,7 @@ static struct device *create_namespace_pmem(struct nd_region *nd_region,
}
if (i < nd_region->ndr_mappings) {
struct nvdimm_drvdata *ndd = to_ndd(&nd_region->mapping[i]);
struct nvdimm *nvdimm = nd_region->mapping[i].nvdimm;
/*
* Give up if we don't find an instance of a uuid at each
......@@ -1934,7 +1929,7 @@ static struct device *create_namespace_pmem(struct nd_region *nd_region,
* find a dimm with two instances of the same uuid.
*/
dev_err(&nd_region->dev, "%s missing label for %pUb\n",
dev_name(ndd->dev), nd_label->uuid);
nvdimm_name(nvdimm), nd_label->uuid);
rc = -EINVAL;
goto err;
}
......@@ -1994,14 +1989,13 @@ static struct device *create_namespace_pmem(struct nd_region *nd_region,
namespace_pmem_release(dev);
switch (rc) {
case -EINVAL:
dev_dbg(&nd_region->dev, "%s: invalid label(s)\n", __func__);
dev_dbg(&nd_region->dev, "invalid label(s)\n");
break;
case -ENODEV:
dev_dbg(&nd_region->dev, "%s: label not found\n", __func__);
dev_dbg(&nd_region->dev, "label not found\n");
break;
default:
dev_dbg(&nd_region->dev, "%s: unexpected err: %d\n",
__func__, rc);
dev_dbg(&nd_region->dev, "unexpected err: %d\n", rc);
break;
}
return ERR_PTR(rc);
......@@ -2334,8 +2328,8 @@ static struct device **scan_labels(struct nd_region *nd_region)
}
dev_dbg(&nd_region->dev, "%s: discovered %d %s namespace%s\n",
__func__, count, is_nd_blk(&nd_region->dev)
dev_dbg(&nd_region->dev, "discovered %d %s namespace%s\n",
count, is_nd_blk(&nd_region->dev)
? "blk" : "pmem", count == 1 ? "" : "s");
if (count == 0) {
......@@ -2467,7 +2461,7 @@ static int init_active_labels(struct nd_region *nd_region)
get_ndd(ndd);
count = nd_label_active_count(ndd);
dev_dbg(ndd->dev, "%s: %d\n", __func__, count);
dev_dbg(ndd->dev, "count: %d\n", count);
if (!count)
continue;
for (j = 0; j < count; j++) {
......
......@@ -340,7 +340,6 @@ static inline struct device *nd_dax_create(struct nd_region *nd_region)
}
#endif
struct nd_region *to_nd_region(struct device *dev);
int nd_region_to_nstype(struct nd_region *nd_region);
int nd_region_register_namespaces(struct nd_region *nd_region, int *err);
u64 nd_region_interleave_set_cookie(struct nd_region *nd_region,
......
// SPDX-License-Identifier: GPL-2.0+
#define pr_fmt(fmt) "of_pmem: " fmt
#include <linux/of_platform.h>
#include <linux/of_address.h>
#include <linux/libnvdimm.h>
#include <linux/module.h>
#include <linux/ioport.h>
#include <linux/slab.h>
static const struct attribute_group *region_attr_groups[] = {
&nd_region_attribute_group,
&nd_device_attribute_group,
NULL,
};
static const struct attribute_group *bus_attr_groups[] = {
&nvdimm_bus_attribute_group,
NULL,
};
struct of_pmem_private {
struct nvdimm_bus_descriptor bus_desc;
struct nvdimm_bus *bus;
};
static int of_pmem_region_probe(struct platform_device *pdev)
{
struct of_pmem_private *priv;
struct device_node *np;
struct nvdimm_bus *bus;
bool is_volatile;
int i;
np = dev_of_node(&pdev->dev);
if (!np)
return -ENXIO;
priv = kzalloc(sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
priv->bus_desc.attr_groups = bus_attr_groups;
priv->bus_desc.provider_name = "of_pmem";
priv->bus_desc.module = THIS_MODULE;
priv->bus_desc.of_node = np;
priv->bus = bus = nvdimm_bus_register(&pdev->dev, &priv->bus_desc);
if (!bus) {
kfree(priv);
return -ENODEV;
}
platform_set_drvdata(pdev, priv);
is_volatile = !!of_find_property(np, "volatile", NULL);
dev_dbg(&pdev->dev, "Registering %s regions from %pOF\n",
is_volatile ? "volatile" : "non-volatile", np);
for (i = 0; i < pdev->num_resources; i++) {
struct nd_region_desc ndr_desc;
struct nd_region *region;
/*
* NB: libnvdimm copies the data from ndr_desc into it's own
* structures so passing a stack pointer is fine.
*/
memset(&ndr_desc, 0, sizeof(ndr_desc));
ndr_desc.attr_groups = region_attr_groups;
ndr_desc.numa_node = of_node_to_nid(np);
ndr_desc.res = &pdev->resource[i];
ndr_desc.of_node = np;
set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
if (is_volatile)
region = nvdimm_volatile_region_create(bus, &ndr_desc);
else
region = nvdimm_pmem_region_create(bus, &ndr_desc);
if (!region)
dev_warn(&pdev->dev, "Unable to register region %pR from %pOF\n",
ndr_desc.res, np);
else
dev_dbg(&pdev->dev, "Registered region %pR from %pOF\n",
ndr_desc.res, np);
}
return 0;
}
static int of_pmem_region_remove(struct platform_device *pdev)
{
struct of_pmem_private *priv = platform_get_drvdata(pdev);
nvdimm_bus_unregister(priv->bus);
kfree(priv);
return 0;
}
static const struct of_device_id of_pmem_region_match[] = {
{ .compatible = "pmem-region" },
{ },
};
static struct platform_driver of_pmem_region_driver = {
.probe = of_pmem_region_probe,
.remove = of_pmem_region_remove,
.driver = {
.name = "of_pmem",
.owner = THIS_MODULE,
.of_match_table = of_pmem_region_match,
},
};
module_platform_driver(of_pmem_region_driver);
MODULE_DEVICE_TABLE(of, of_pmem_region_match);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("IBM Corporation");
......@@ -27,7 +27,7 @@ static void nd_pfn_release(struct device *dev)
struct nd_region *nd_region = to_nd_region(dev->parent);
struct nd_pfn *nd_pfn = to_nd_pfn(dev);
dev_dbg(dev, "%s\n", __func__);
dev_dbg(dev, "trace\n");
nd_detach_ndns(&nd_pfn->dev, &nd_pfn->ndns);
ida_simple_remove(&nd_region->pfn_ida, nd_pfn->id);
kfree(nd_pfn->uuid);
......@@ -94,8 +94,8 @@ static ssize_t mode_store(struct device *dev,
else
rc = -EINVAL;
}
dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
rc, buf, buf[len - 1] == '\n' ? "" : "\n");
dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
buf[len - 1] == '\n' ? "" : "\n");
nvdimm_bus_unlock(dev);
device_unlock(dev);
......@@ -144,8 +144,8 @@ static ssize_t align_store(struct device *dev,
nvdimm_bus_lock(dev);
rc = nd_size_select_store(dev, buf, &nd_pfn->align,
nd_pfn_supported_alignments());
dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
rc, buf, buf[len - 1] == '\n' ? "" : "\n");
dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
buf[len - 1] == '\n' ? "" : "\n");
nvdimm_bus_unlock(dev);
device_unlock(dev);
......@@ -171,8 +171,8 @@ static ssize_t uuid_store(struct device *dev,
device_lock(dev);
rc = nd_uuid_store(dev, &nd_pfn->uuid, buf, len);
dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
rc, buf, buf[len - 1] == '\n' ? "" : "\n");
dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
buf[len - 1] == '\n' ? "" : "\n");
device_unlock(dev);
return rc ? rc : len;
......@@ -201,8 +201,8 @@ static ssize_t namespace_store(struct device *dev,
device_lock(dev);
nvdimm_bus_lock(dev);
rc = nd_namespace_store(dev, &nd_pfn->ndns, buf, len);
dev_dbg(dev, "%s: result: %zd wrote: %s%s", __func__,
rc, buf, buf[len - 1] == '\n' ? "" : "\n");
dev_dbg(dev, "result: %zd wrote: %s%s", rc, buf,
buf[len - 1] == '\n' ? "" : "\n");
nvdimm_bus_unlock(dev);
device_unlock(dev);
......@@ -314,8 +314,8 @@ struct device *nd_pfn_devinit(struct nd_pfn *nd_pfn,
dev = &nd_pfn->dev;
device_initialize(&nd_pfn->dev);
if (ndns && !__nd_attach_ndns(&nd_pfn->dev, ndns, &nd_pfn->ndns)) {
dev_dbg(&ndns->dev, "%s failed, already claimed by %s\n",
__func__, dev_name(ndns->claim));
dev_dbg(&ndns->dev, "failed, already claimed by %s\n",
dev_name(ndns->claim));
put_device(dev);
return NULL;
}
......@@ -510,8 +510,7 @@ int nd_pfn_probe(struct device *dev, struct nd_namespace_common *ndns)
nd_pfn = to_nd_pfn(pfn_dev);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, PFN_SIG);
dev_dbg(dev, "%s: pfn: %s\n", __func__,
rc == 0 ? dev_name(pfn_dev) : "<none>");
dev_dbg(dev, "pfn: %s\n", rc == 0 ? dev_name(pfn_dev) : "<none>");
if (rc < 0) {
nd_detach_ndns(pfn_dev, &nd_pfn->ndns);
put_device(pfn_dev);
......
......@@ -66,7 +66,7 @@ static blk_status_t pmem_clear_poison(struct pmem_device *pmem,
rc = BLK_STS_IOERR;
if (cleared > 0 && cleared / 512) {
cleared /= 512;
dev_dbg(dev, "%s: %#llx clear %ld sector%s\n", __func__,
dev_dbg(dev, "%#llx clear %ld sector%s\n",
(unsigned long long) sector, cleared,
cleared > 1 ? "s" : "");
badblocks_clear(&pmem->bb, sector, cleared);
......@@ -547,17 +547,7 @@ static struct nd_device_driver nd_pmem_driver = {
.type = ND_DRIVER_NAMESPACE_IO | ND_DRIVER_NAMESPACE_PMEM,
};
static int __init pmem_init(void)
{
return nd_driver_register(&nd_pmem_driver);
}
module_init(pmem_init);
static void pmem_exit(void)
{
driver_unregister(&nd_pmem_driver.drv);
}
module_exit(pmem_exit);
module_nd_driver(nd_pmem_driver);
MODULE_AUTHOR("Ross Zwisler <ross.zwisler@linux.intel.com>");
MODULE_LICENSE("GPL v2");
......@@ -27,10 +27,10 @@ static int nd_region_probe(struct device *dev)
if (nd_region->num_lanes > num_online_cpus()
&& nd_region->num_lanes < num_possible_cpus()
&& !test_and_set_bit(0, &once)) {
dev_info(dev, "online cpus (%d) < concurrent i/o lanes (%d) < possible cpus (%d)\n",
dev_dbg(dev, "online cpus (%d) < concurrent i/o lanes (%d) < possible cpus (%d)\n",
num_online_cpus(), nd_region->num_lanes,
num_possible_cpus());
dev_info(dev, "setting nr_cpus=%d may yield better libnvdimm device performance\n",
dev_dbg(dev, "setting nr_cpus=%d may yield better libnvdimm device performance\n",
nd_region->num_lanes);
}
......
......@@ -182,6 +182,14 @@ struct nd_region *to_nd_region(struct device *dev)
}
EXPORT_SYMBOL_GPL(to_nd_region);
struct device *nd_region_dev(struct nd_region *nd_region)
{
if (!nd_region)
return NULL;
return &nd_region->dev;
}
EXPORT_SYMBOL_GPL(nd_region_dev);
struct nd_blk_region *to_nd_blk_region(struct device *dev)
{
struct nd_region *nd_region = to_nd_region(dev);
......@@ -1014,6 +1022,7 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
dev->parent = &nvdimm_bus->dev;
dev->type = dev_type;
dev->groups = ndr_desc->attr_groups;
dev->of_node = ndr_desc->of_node;
nd_region->ndr_size = resource_size(ndr_desc->res);
nd_region->ndr_start = ndr_desc->res->start;
nd_device_register(dev);
......
......@@ -15,8 +15,8 @@ config BLK_DEV_XPRAM
config DCSSBLK
def_tristate m
select DAX
select FS_DAX_LIMITED
select DAX_DRIVER
prompt "DCSSBLK support"
depends on S390 && BLOCK
help
......
......@@ -1948,11 +1948,6 @@ static int blkdev_releasepage(struct page *page, gfp_t wait)
static int blkdev_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
if (dax_mapping(mapping)) {
struct block_device *bdev = I_BDEV(mapping->host);
return dax_writeback_mapping_range(mapping, bdev, wbc);
}
return generic_writepages(mapping, wbc);
}
......
......@@ -73,16 +73,15 @@ fs_initcall(init_dax_wait_table);
#define RADIX_DAX_ZERO_PAGE (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 2))
#define RADIX_DAX_EMPTY (1 << (RADIX_TREE_EXCEPTIONAL_SHIFT + 3))
static unsigned long dax_radix_sector(void *entry)
static unsigned long dax_radix_pfn(void *entry)
{
return (unsigned long)entry >> RADIX_DAX_SHIFT;
}
static void *dax_radix_locked_entry(sector_t sector, unsigned long flags)
static void *dax_radix_locked_entry(unsigned long pfn, unsigned long flags)
{
return (void *)(RADIX_TREE_EXCEPTIONAL_ENTRY | flags |
((unsigned long)sector << RADIX_DAX_SHIFT) |
RADIX_DAX_ENTRY_LOCK);
(pfn << RADIX_DAX_SHIFT) | RADIX_DAX_ENTRY_LOCK);
}
static unsigned int dax_radix_order(void *entry)
......@@ -299,6 +298,63 @@ static void put_unlocked_mapping_entry(struct address_space *mapping,
dax_wake_mapping_entry_waiter(mapping, index, entry, false);
}
static unsigned long dax_entry_size(void *entry)
{
if (dax_is_zero_entry(entry))
return 0;
else if (dax_is_empty_entry(entry))
return 0;
else if (dax_is_pmd_entry(entry))
return PMD_SIZE;
else
return PAGE_SIZE;
}
static unsigned long dax_radix_end_pfn(void *entry)
{
return dax_radix_pfn(entry) + dax_entry_size(entry) / PAGE_SIZE;
}
/*
* Iterate through all mapped pfns represented by an entry, i.e. skip
* 'empty' and 'zero' entries.
*/
#define for_each_mapped_pfn(entry, pfn) \
for (pfn = dax_radix_pfn(entry); \
pfn < dax_radix_end_pfn(entry); pfn++)
static void dax_associate_entry(void *entry, struct address_space *mapping)
{
unsigned long pfn;
if (IS_ENABLED(CONFIG_FS_DAX_LIMITED))
return;
for_each_mapped_pfn(entry, pfn) {
struct page *page = pfn_to_page(pfn);
WARN_ON_ONCE(page->mapping);
page->mapping = mapping;
}
}
static void dax_disassociate_entry(void *entry, struct address_space *mapping,
bool trunc)
{
unsigned long pfn;
if (IS_ENABLED(CONFIG_FS_DAX_LIMITED))
return;
for_each_mapped_pfn(entry, pfn) {
struct page *page = pfn_to_page(pfn);
WARN_ON_ONCE(trunc && page_ref_count(page) > 1);
WARN_ON_ONCE(page->mapping && page->mapping != mapping);
page->mapping = NULL;
}
}
/*
* Find radix tree entry at given index. If it points to an exceptional entry,
* return it with the radix tree entry locked. If the radix tree doesn't
......@@ -405,6 +461,7 @@ static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index,
}
if (pmd_downgrade) {
dax_disassociate_entry(entry, mapping, false);
radix_tree_delete(&mapping->page_tree, index);
mapping->nrexceptional--;
dax_wake_mapping_entry_waiter(mapping, index, entry,
......@@ -454,6 +511,7 @@ static int __dax_invalidate_mapping_entry(struct address_space *mapping,
(radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_DIRTY) ||
radix_tree_tag_get(page_tree, index, PAGECACHE_TAG_TOWRITE)))
goto out;
dax_disassociate_entry(entry, mapping, trunc);
radix_tree_delete(page_tree, index);
mapping->nrexceptional--;
ret = 1;
......@@ -526,12 +584,13 @@ static int copy_user_dax(struct block_device *bdev, struct dax_device *dax_dev,
*/
static void *dax_insert_mapping_entry(struct address_space *mapping,
struct vm_fault *vmf,
void *entry, sector_t sector,
void *entry, pfn_t pfn_t,
unsigned long flags, bool dirty)
{
struct radix_tree_root *page_tree = &mapping->page_tree;
void *new_entry;
unsigned long pfn = pfn_t_to_pfn(pfn_t);
pgoff_t index = vmf->pgoff;
void *new_entry;
if (dirty)
__mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
......@@ -546,7 +605,11 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
}
spin_lock_irq(&mapping->tree_lock);
new_entry = dax_radix_locked_entry(sector, flags);
new_entry = dax_radix_locked_entry(pfn, flags);
if (dax_entry_size(entry) != dax_entry_size(new_entry)) {
dax_disassociate_entry(entry, mapping, false);
dax_associate_entry(new_entry, mapping);
}
if (dax_is_zero_entry(entry) || dax_is_empty_entry(entry)) {
/*
......@@ -657,17 +720,14 @@ static void dax_mapping_entry_mkclean(struct address_space *mapping,
i_mmap_unlock_read(mapping);
}
static int dax_writeback_one(struct block_device *bdev,
struct dax_device *dax_dev, struct address_space *mapping,
pgoff_t index, void *entry)
static int dax_writeback_one(struct dax_device *dax_dev,
struct address_space *mapping, pgoff_t index, void *entry)
{
struct radix_tree_root *page_tree = &mapping->page_tree;
void *entry2, **slot, *kaddr;
long ret = 0, id;
sector_t sector;
pgoff_t pgoff;
void *entry2, **slot;
unsigned long pfn;
long ret = 0;
size_t size;
pfn_t pfn;
/*
* A page got tagged dirty in DAX mapping? Something is seriously
......@@ -683,10 +743,10 @@ static int dax_writeback_one(struct block_device *bdev,
goto put_unlocked;
/*
* Entry got reallocated elsewhere? No need to writeback. We have to
* compare sectors as we must not bail out due to difference in lockbit
* compare pfns as we must not bail out due to difference in lockbit
* or entry type.
*/
if (dax_radix_sector(entry2) != dax_radix_sector(entry))
if (dax_radix_pfn(entry2) != dax_radix_pfn(entry))
goto put_unlocked;
if (WARN_ON_ONCE(dax_is_empty_entry(entry) ||
dax_is_zero_entry(entry))) {
......@@ -712,33 +772,15 @@ static int dax_writeback_one(struct block_device *bdev,
/*
* Even if dax_writeback_mapping_range() was given a wbc->range_start
* in the middle of a PMD, the 'index' we are given will be aligned to
* the start index of the PMD, as will the sector we pull from
* 'entry'. This allows us to flush for PMD_SIZE and not have to
* worry about partial PMD writebacks.
* the start index of the PMD, as will the pfn we pull from 'entry'.
* This allows us to flush for PMD_SIZE and not have to worry about
* partial PMD writebacks.
*/
sector = dax_radix_sector(entry);
pfn = dax_radix_pfn(entry);
size = PAGE_SIZE << dax_radix_order(entry);
id = dax_read_lock();
ret = bdev_dax_pgoff(bdev, sector, size, &pgoff);
if (ret)
goto dax_unlock;
/*
* dax_direct_access() may sleep, so cannot hold tree_lock over
* its invocation.
*/
ret = dax_direct_access(dax_dev, pgoff, size / PAGE_SIZE, &kaddr, &pfn);
if (ret < 0)
goto dax_unlock;
if (WARN_ON_ONCE(ret < size / PAGE_SIZE)) {
ret = -EIO;
goto dax_unlock;
}
dax_mapping_entry_mkclean(mapping, index, pfn_t_to_pfn(pfn));
dax_flush(dax_dev, kaddr, size);
dax_mapping_entry_mkclean(mapping, index, pfn);
dax_flush(dax_dev, page_address(pfn_to_page(pfn)), size);
/*
* After we have flushed the cache, we can clear the dirty tag. There
* cannot be new dirty data in the pfn after the flush has completed as
......@@ -749,8 +791,6 @@ static int dax_writeback_one(struct block_device *bdev,
radix_tree_tag_clear(page_tree, index, PAGECACHE_TAG_DIRTY);
spin_unlock_irq(&mapping->tree_lock);
trace_dax_writeback_one(mapping->host, index, size >> PAGE_SHIFT);
dax_unlock:
dax_read_unlock(id);
put_locked_mapping_entry(mapping, index);
return ret;
......@@ -808,8 +848,8 @@ int dax_writeback_mapping_range(struct address_space *mapping,
break;
}
ret = dax_writeback_one(bdev, dax_dev, mapping,
indices[i], pvec.pages[i]);
ret = dax_writeback_one(dax_dev, mapping, indices[i],
pvec.pages[i]);
if (ret < 0) {
mapping_set_error(mapping, ret);
goto out;
......@@ -877,6 +917,7 @@ static int dax_load_hole(struct address_space *mapping, void *entry,
int ret = VM_FAULT_NOPAGE;
struct page *zero_page;
void *entry2;
pfn_t pfn;
zero_page = ZERO_PAGE(0);
if (unlikely(!zero_page)) {
......@@ -884,14 +925,15 @@ static int dax_load_hole(struct address_space *mapping, void *entry,
goto out;
}
entry2 = dax_insert_mapping_entry(mapping, vmf, entry, 0,
pfn = page_to_pfn_t(zero_page);
entry2 = dax_insert_mapping_entry(mapping, vmf, entry, pfn,
RADIX_DAX_ZERO_PAGE, false);
if (IS_ERR(entry2)) {
ret = VM_FAULT_SIGBUS;
goto out;
}
vm_insert_mixed(vmf->vma, vaddr, page_to_pfn_t(zero_page));
vm_insert_mixed(vmf->vma, vaddr, pfn);
out:
trace_dax_load_hole(inode, vmf, ret);
return ret;
......@@ -1200,8 +1242,7 @@ static int dax_iomap_pte_fault(struct vm_fault *vmf, pfn_t *pfnp,
if (error < 0)
goto error_finish_iomap;
entry = dax_insert_mapping_entry(mapping, vmf, entry,
dax_iomap_sector(&iomap, pos),
entry = dax_insert_mapping_entry(mapping, vmf, entry, pfn,
0, write && !sync);
if (IS_ERR(entry)) {
error = PTR_ERR(entry);
......@@ -1280,13 +1321,15 @@ static int dax_pmd_load_hole(struct vm_fault *vmf, struct iomap *iomap,
void *ret = NULL;
spinlock_t *ptl;
pmd_t pmd_entry;
pfn_t pfn;
zero_page = mm_get_huge_zero_page(vmf->vma->vm_mm);
if (unlikely(!zero_page))
goto fallback;
ret = dax_insert_mapping_entry(mapping, vmf, entry, 0,
pfn = page_to_pfn_t(zero_page);
ret = dax_insert_mapping_entry(mapping, vmf, entry, pfn,
RADIX_DAX_PMD | RADIX_DAX_ZERO_PAGE, false);
if (IS_ERR(ret))
goto fallback;
......@@ -1409,8 +1452,7 @@ static int dax_iomap_pmd_fault(struct vm_fault *vmf, pfn_t *pfnp,
if (error < 0)
goto finish_iomap;
entry = dax_insert_mapping_entry(mapping, vmf, entry,
dax_iomap_sector(&iomap, pos),
entry = dax_insert_mapping_entry(mapping, vmf, entry, pfn,
RADIX_DAX_PMD, write && !sync);
if (IS_ERR(entry))
goto finish_iomap;
......
......@@ -814,6 +814,7 @@ extern const struct inode_operations ext2_file_inode_operations;
extern const struct file_operations ext2_file_operations;
/* inode.c */
extern void ext2_set_file_ops(struct inode *inode);
extern const struct address_space_operations ext2_aops;
extern const struct address_space_operations ext2_nobh_aops;
extern const struct iomap_ops ext2_iomap_ops;
......
......@@ -940,9 +940,6 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
loff_t offset = iocb->ki_pos;
ssize_t ret;
if (WARN_ON_ONCE(IS_DAX(inode)))
return -EIO;
ret = blockdev_direct_IO(iocb, inode, iter, ext2_get_block);
if (ret < 0 && iov_iter_rw(iter) == WRITE)
ext2_write_failed(mapping, offset + count);
......@@ -952,17 +949,16 @@ ext2_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
static int
ext2_writepages(struct address_space *mapping, struct writeback_control *wbc)
{
#ifdef CONFIG_FS_DAX
if (dax_mapping(mapping)) {
return dax_writeback_mapping_range(mapping,
mapping->host->i_sb->s_bdev,
wbc);
}
#endif
return mpage_writepages(mapping, wbc, ext2_get_block);
}
static int
ext2_dax_writepages(struct address_space *mapping, struct writeback_control *wbc)
{
return dax_writeback_mapping_range(mapping,
mapping->host->i_sb->s_bdev, wbc);
}
const struct address_space_operations ext2_aops = {
.readpage = ext2_readpage,
.readpages = ext2_readpages,
......@@ -990,6 +986,13 @@ const struct address_space_operations ext2_nobh_aops = {
.error_remove_page = generic_error_remove_page,
};
static const struct address_space_operations ext2_dax_aops = {
.writepages = ext2_dax_writepages,
.direct_IO = noop_direct_IO,
.set_page_dirty = noop_set_page_dirty,
.invalidatepage = noop_invalidatepage,
};
/*
* Probably it should be a library function... search for first non-zero word
* or memcmp with zero_page, whatever is better for particular architecture.
......@@ -1388,6 +1391,18 @@ void ext2_set_inode_flags(struct inode *inode)
inode->i_flags |= S_DAX;
}
void ext2_set_file_ops(struct inode *inode)
{
inode->i_op = &ext2_file_inode_operations;
inode->i_fop = &ext2_file_operations;
if (IS_DAX(inode))
inode->i_mapping->a_ops = &ext2_dax_aops;
else if (test_opt(inode->i_sb, NOBH))
inode->i_mapping->a_ops = &ext2_nobh_aops;
else
inode->i_mapping->a_ops = &ext2_aops;
}
struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
{
struct ext2_inode_info *ei;
......@@ -1480,14 +1495,7 @@ struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
ei->i_data[n] = raw_inode->i_block[n];
if (S_ISREG(inode->i_mode)) {
inode->i_op = &ext2_file_inode_operations;
if (test_opt(inode->i_sb, NOBH)) {
inode->i_mapping->a_ops = &ext2_nobh_aops;
inode->i_fop = &ext2_file_operations;
} else {
inode->i_mapping->a_ops = &ext2_aops;
inode->i_fop = &ext2_file_operations;
}
ext2_set_file_ops(inode);
} else if (S_ISDIR(inode->i_mode)) {
inode->i_op = &ext2_dir_inode_operations;
inode->i_fop = &ext2_dir_operations;
......
......@@ -107,14 +107,7 @@ static int ext2_create (struct inode * dir, struct dentry * dentry, umode_t mode
if (IS_ERR(inode))
return PTR_ERR(inode);
inode->i_op = &ext2_file_inode_operations;
if (test_opt(inode->i_sb, NOBH)) {
inode->i_mapping->a_ops = &ext2_nobh_aops;
inode->i_fop = &ext2_file_operations;
} else {
inode->i_mapping->a_ops = &ext2_aops;
inode->i_fop = &ext2_file_operations;
}
ext2_set_file_ops(inode);
mark_inode_dirty(inode);
return ext2_add_nondir(dentry, inode);
}
......@@ -125,14 +118,7 @@ static int ext2_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
if (IS_ERR(inode))
return PTR_ERR(inode);
inode->i_op = &ext2_file_inode_operations;
if (test_opt(inode->i_sb, NOBH)) {
inode->i_mapping->a_ops = &ext2_nobh_aops;
inode->i_fop = &ext2_file_operations;
} else {
inode->i_mapping->a_ops = &ext2_aops;
inode->i_fop = &ext2_file_operations;
}
ext2_set_file_ops(inode);
mark_inode_dirty(inode);
d_tmpfile(dentry, inode);
unlock_new_inode(inode);
......
......@@ -2716,12 +2716,6 @@ static int ext4_writepages(struct address_space *mapping,
percpu_down_read(&sbi->s_journal_flag_rwsem);
trace_ext4_writepages(inode, wbc);
if (dax_mapping(mapping)) {
ret = dax_writeback_mapping_range(mapping, inode->i_sb->s_bdev,
wbc);
goto out_writepages;
}
/*
* No pages to write? This is mainly a kludge to avoid starting
* a transaction for special inodes like journal inode on last iput()
......@@ -2942,6 +2936,27 @@ static int ext4_writepages(struct address_space *mapping,
return ret;
}
static int ext4_dax_writepages(struct address_space *mapping,
struct writeback_control *wbc)
{
int ret;
long nr_to_write = wbc->nr_to_write;
struct inode *inode = mapping->host;
struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
return -EIO;
percpu_down_read(&sbi->s_journal_flag_rwsem);
trace_ext4_writepages(inode, wbc);
ret = dax_writeback_mapping_range(mapping, inode->i_sb->s_bdev, wbc);
trace_ext4_writepages_result(inode, wbc, ret,
nr_to_write - wbc->nr_to_write);
percpu_up_read(&sbi->s_journal_flag_rwsem);
return ret;
}
static int ext4_nonda_switch(struct super_block *sb)
{
s64 free_clusters, dirty_clusters;
......@@ -3845,10 +3860,6 @@ static ssize_t ext4_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (ext4_has_inline_data(inode))
return 0;
/* DAX uses iomap path now */
if (WARN_ON_ONCE(IS_DAX(inode)))
return 0;
trace_ext4_direct_IO_enter(inode, offset, count, iov_iter_rw(iter));
if (iov_iter_rw(iter) == READ)
ret = ext4_direct_IO_read(iocb, iter);
......@@ -3934,6 +3945,13 @@ static const struct address_space_operations ext4_da_aops = {
.error_remove_page = generic_error_remove_page,
};
static const struct address_space_operations ext4_dax_aops = {
.writepages = ext4_dax_writepages,
.direct_IO = noop_direct_IO,
.set_page_dirty = noop_set_page_dirty,
.invalidatepage = noop_invalidatepage,
};
void ext4_set_aops(struct inode *inode)
{
switch (ext4_inode_journal_mode(inode)) {
......@@ -3946,7 +3964,9 @@ void ext4_set_aops(struct inode *inode)
default:
BUG();
}
if (test_opt(inode->i_sb, DELALLOC))
if (IS_DAX(inode))
inode->i_mapping->a_ops = &ext4_dax_aops;
else if (test_opt(inode->i_sb, DELALLOC))
inode->i_mapping->a_ops = &ext4_da_aops;
else
inode->i_mapping->a_ops = &ext4_aops;
......
......@@ -1060,6 +1060,45 @@ int noop_fsync(struct file *file, loff_t start, loff_t end, int datasync)
}
EXPORT_SYMBOL(noop_fsync);
int noop_set_page_dirty(struct page *page)
{
/*
* Unlike __set_page_dirty_no_writeback that handles dirty page
* tracking in the page object, dax does all dirty tracking in
* the inode address_space in response to mkwrite faults. In the
* dax case we only need to worry about potentially dirty CPU
* caches, not dirty page cache pages to write back.
*
* This callback is defined to prevent fallback to
* __set_page_dirty_buffers() in set_page_dirty().
*/
return 0;
}
EXPORT_SYMBOL_GPL(noop_set_page_dirty);
void noop_invalidatepage(struct page *page, unsigned int offset,
unsigned int length)
{
/*
* There is no page cache to invalidate in the dax case, however
* we need this callback defined to prevent falling back to
* block_invalidatepage() in do_invalidatepage().
*/
}
EXPORT_SYMBOL_GPL(noop_invalidatepage);
ssize_t noop_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
/*
* iomap based filesystems support direct I/O without need for
* this callback. However, it still needs to be set in
* inode->a_ops so that open/fcntl know that direct I/O is
* generally supported.
*/
return -EINVAL;
}
EXPORT_SYMBOL_GPL(noop_direct_IO);
/* Because kfree isn't assignment-compatible with void(void*) ;-/ */
void kfree_link(void *p)
{
......
......@@ -1195,16 +1195,22 @@ xfs_vm_writepages(
int ret;
xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
if (dax_mapping(mapping))
return dax_writeback_mapping_range(mapping,
xfs_find_bdev_for_inode(mapping->host), wbc);
ret = write_cache_pages(mapping, wbc, xfs_do_writepage, &wpc);
if (wpc.ioend)
ret = xfs_submit_ioend(wbc, wpc.ioend, ret);
return ret;
}
STATIC int
xfs_dax_writepages(
struct address_space *mapping,
struct writeback_control *wbc)
{
xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED);
return dax_writeback_mapping_range(mapping,
xfs_find_bdev_for_inode(mapping->host), wbc);
}
/*
* Called to move a page into cleanable state - and from there
* to be released. The page should already be clean. We always
......@@ -1367,17 +1373,6 @@ xfs_get_blocks(
return error;
}
STATIC ssize_t
xfs_vm_direct_IO(
struct kiocb *iocb,
struct iov_iter *iter)
{
/*
* We just need the method present so that open/fcntl allow direct I/O.
*/
return -EINVAL;
}
STATIC sector_t
xfs_vm_bmap(
struct address_space *mapping,
......@@ -1500,8 +1495,15 @@ const struct address_space_operations xfs_address_space_operations = {
.releasepage = xfs_vm_releasepage,
.invalidatepage = xfs_vm_invalidatepage,
.bmap = xfs_vm_bmap,
.direct_IO = xfs_vm_direct_IO,
.direct_IO = noop_direct_IO,
.migratepage = buffer_migrate_page,
.is_partially_uptodate = block_is_partially_uptodate,
.error_remove_page = generic_error_remove_page,
};
const struct address_space_operations xfs_dax_aops = {
.writepages = xfs_dax_writepages,
.direct_IO = noop_direct_IO,
.set_page_dirty = noop_set_page_dirty,
.invalidatepage = noop_invalidatepage,
};
......@@ -54,6 +54,7 @@ struct xfs_ioend {
};
extern const struct address_space_operations xfs_address_space_operations;
extern const struct address_space_operations xfs_dax_aops;
int xfs_setfilesize(struct xfs_inode *ip, xfs_off_t offset, size_t size);
......
......@@ -1285,7 +1285,10 @@ xfs_setup_iops(
case S_IFREG:
inode->i_op = &xfs_inode_operations;
inode->i_fop = &xfs_file_operations;
inode->i_mapping->a_ops = &xfs_address_space_operations;
if (IS_DAX(inode))
inode->i_mapping->a_ops = &xfs_dax_aops;
else
inode->i_mapping->a_ops = &xfs_address_space_operations;
break;
case S_IFDIR:
if (xfs_sb_version_hasasciici(&XFS_M(inode->i_sb)->m_sb))
......
......@@ -26,18 +26,42 @@ extern struct attribute_group dax_attribute_group;
#if IS_ENABLED(CONFIG_DAX)
struct dax_device *dax_get_by_host(const char *host);
struct dax_device *alloc_dax(void *private, const char *host,
const struct dax_operations *ops);
void put_dax(struct dax_device *dax_dev);
void kill_dax(struct dax_device *dax_dev);
void dax_write_cache(struct dax_device *dax_dev, bool wc);
bool dax_write_cache_enabled(struct dax_device *dax_dev);
#else
static inline struct dax_device *dax_get_by_host(const char *host)
{
return NULL;
}
static inline struct dax_device *alloc_dax(void *private, const char *host,
const struct dax_operations *ops)
{
/*
* Callers should check IS_ENABLED(CONFIG_DAX) to know if this
* NULL is an error or expected.
*/
return NULL;
}
static inline void put_dax(struct dax_device *dax_dev)
{
}
static inline void kill_dax(struct dax_device *dax_dev)
{
}
static inline void dax_write_cache(struct dax_device *dax_dev, bool wc)
{
}
static inline bool dax_write_cache_enabled(struct dax_device *dax_dev)
{
return false;
}
#endif
struct writeback_control;
int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff);
#if IS_ENABLED(CONFIG_FS_DAX)
int __bdev_dax_supported(struct super_block *sb, int blocksize);
......@@ -57,6 +81,8 @@ static inline void fs_put_dax(struct dax_device *dax_dev)
}
struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev);
int dax_writeback_mapping_range(struct address_space *mapping,
struct block_device *bdev, struct writeback_control *wbc);
#else
static inline int bdev_dax_supported(struct super_block *sb, int blocksize)
{
......@@ -76,22 +102,23 @@ static inline struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
{
return NULL;
}
static inline int dax_writeback_mapping_range(struct address_space *mapping,
struct block_device *bdev, struct writeback_control *wbc)
{
return -EOPNOTSUPP;
}
#endif
int dax_read_lock(void);
void dax_read_unlock(int id);
struct dax_device *alloc_dax(void *private, const char *host,
const struct dax_operations *ops);
bool dax_alive(struct dax_device *dax_dev);
void kill_dax(struct dax_device *dax_dev);
void *dax_get_private(struct dax_device *dax_dev);
long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages,
void **kaddr, pfn_t *pfn);
size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr,
size_t bytes, struct iov_iter *i);
void dax_flush(struct dax_device *dax_dev, void *addr, size_t size);
void dax_write_cache(struct dax_device *dax_dev, bool wc);
bool dax_write_cache_enabled(struct dax_device *dax_dev);
ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter,
const struct iomap_ops *ops);
......@@ -121,7 +148,4 @@ static inline bool dax_mapping(struct address_space *mapping)
return mapping->host && IS_DAX(mapping->host);
}
struct writeback_control;
int dax_writeback_mapping_range(struct address_space *mapping,
struct block_device *bdev, struct writeback_control *wbc);
#endif
......@@ -3127,6 +3127,10 @@ extern int simple_rmdir(struct inode *, struct dentry *);
extern int simple_rename(struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
extern int noop_fsync(struct file *, loff_t, loff_t, int);
extern int noop_set_page_dirty(struct page *page);
extern void noop_invalidatepage(struct page *page, unsigned int offset,
unsigned int length);
extern ssize_t noop_direct_IO(struct kiocb *iocb, struct iov_iter *iter);
extern int simple_empty(struct dentry *);
extern int simple_readpage(struct file *file, struct page *page);
extern int simple_write_begin(struct file *file, struct address_space *mapping,
......
......@@ -76,12 +76,14 @@ typedef int (*ndctl_fn)(struct nvdimm_bus_descriptor *nd_desc,
struct nvdimm *nvdimm, unsigned int cmd, void *buf,
unsigned int buf_len, int *cmd_rc);
struct device_node;
struct nvdimm_bus_descriptor {
const struct attribute_group **attr_groups;
unsigned long bus_dsm_mask;
unsigned long cmd_mask;
struct module *module;
char *provider_name;
struct device_node *of_node;
ndctl_fn ndctl;
int (*flush_probe)(struct nvdimm_bus_descriptor *nd_desc);
int (*clear_to_send)(struct nvdimm_bus_descriptor *nd_desc,
......@@ -123,6 +125,7 @@ struct nd_region_desc {
int num_lanes;
int numa_node;
unsigned long flags;
struct device_node *of_node;
};
struct device;
......@@ -164,6 +167,7 @@ void nvdimm_bus_unregister(struct nvdimm_bus *nvdimm_bus);
struct nvdimm_bus *to_nvdimm_bus(struct device *dev);
struct nvdimm *to_nvdimm(struct device *dev);
struct nd_region *to_nd_region(struct device *dev);
struct device *nd_region_dev(struct nd_region *nd_region);
struct nd_blk_region *to_nd_blk_region(struct device *dev);
struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus);
struct device *to_nvdimm_bus_dev(struct nvdimm_bus *nvdimm_bus);
......
......@@ -180,6 +180,12 @@ struct nd_region;
void nvdimm_region_notify(struct nd_region *nd_region, enum nvdimm_event event);
int __must_check __nd_driver_register(struct nd_device_driver *nd_drv,
struct module *module, const char *mod_name);
static inline void nd_driver_unregister(struct nd_device_driver *drv)
{
driver_unregister(&drv->drv);
}
#define nd_driver_register(driver) \
__nd_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
#define module_nd_driver(driver) \
module_driver(driver, nd_driver_register, nd_driver_unregister)
#endif /* __LINUX_ND_H__ */
此差异已折叠。
......@@ -93,6 +93,7 @@ struct nd_cmd_ars_err_inj_stat {
#define ND_INTEL_FW_FINISH_UPDATE 15
#define ND_INTEL_FW_FINISH_QUERY 16
#define ND_INTEL_SMART_SET_THRESHOLD 17
#define ND_INTEL_SMART_INJECT 18
#define ND_INTEL_SMART_HEALTH_VALID (1 << 0)
#define ND_INTEL_SMART_SPARES_VALID (1 << 1)
......@@ -111,6 +112,10 @@ struct nd_cmd_ars_err_inj_stat {
#define ND_INTEL_SMART_NON_CRITICAL_HEALTH (1 << 0)
#define ND_INTEL_SMART_CRITICAL_HEALTH (1 << 1)
#define ND_INTEL_SMART_FATAL_HEALTH (1 << 2)
#define ND_INTEL_SMART_INJECT_MTEMP (1 << 0)
#define ND_INTEL_SMART_INJECT_SPARE (1 << 1)
#define ND_INTEL_SMART_INJECT_FATAL (1 << 2)
#define ND_INTEL_SMART_INJECT_SHUTDOWN (1 << 3)
struct nd_intel_smart {
__u32 status;
......@@ -158,6 +163,17 @@ struct nd_intel_smart_set_threshold {
__u32 status;
} __packed;
struct nd_intel_smart_inject {
__u64 flags;
__u8 mtemp_enable;
__u16 media_temperature;
__u8 spare_enable;
__u8 spares;
__u8 fatal_enable;
__u8 unsafe_shutdown_enable;
__u32 status;
} __packed;
#define INTEL_FW_STORAGE_SIZE 0x100000
#define INTEL_FW_MAX_SEND_LEN 0xFFEC
#define INTEL_FW_QUERY_INTERVAL 250000
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册