1. 10 Mar 2016: 3 commits
  2. 07 Mar 2016: 1 commit
  3. 06 Mar 2016: 3 commits
  4. 24 Feb 2016: 1 commit
    • nvdimm: use 'u64' for pfn flags · c4544205
      Committed by Arnd Bergmann
      A recent bugfix changed pfn_t to always be 64-bit wide, but did not
      change the code in pmem.c, which is now broken on 32-bit architectures
      as reported by gcc:
      
      In file included from ../drivers/nvdimm/pmem.c:28:0:
      drivers/nvdimm/pmem.c: In function 'pmem_alloc':
      include/linux/pfn_t.h:15:17: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
       #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))
      
      This changes the intermediate pfn_flags in struct pmem_device to
      be 64-bit wide as well, so it can store the flags correctly.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: db78c222 ("mm: fix pfn_t vs highmem")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      c4544205
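
      A minimal sketch of the fix described above; PFN_DEV comes from the
      quoted header, while the surrounding struct layout is elided:

        /* include/linux/pfn_t.h: the flag sits in the top bits of a 64-bit value */
        #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))

        /* drivers/nvdimm/pmem.c: widen the intermediate flags field */
        struct pmem_device {
                /* was 'unsigned long', which truncates PFN_DEV on 32-bit */
                u64 pfn_flags;
                /* ... other members elided ... */
        };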
  5. 16 Jan 2016: 6 commits
    • mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup · 5c2c2587
      Committed by Dan Williams
      get_dev_pagemap() enables paths like get_user_pages() to pin a dynamically
      mapped pfn-range (devm_memremap_pages()) while the resulting struct page
      objects are in use.  Unlike get_page() it may fail if the device is, or
      is in the process of being, disabled.  While the initial lookup of the
      range may be an expensive list walk, the result is cached to speed up
      subsequent lookups which are likely to be in the same mapped range.
      
      devm_memremap_pages() now requires a reference counter to be specified
      at init time.  For pmem this means moving request_queue allocation into
      pmem_alloc() so the existing queue usage counter can track "device
      pages".
      
      ZONE_DEVICE pages always have an elevated count and will never be on an
      lru reclaim list.  That space in 'struct page' can be redirected for
      other uses, but for safety introduce a poison value that will always
      trip __list_add() to assert.  This allows half of the struct list_head
      storage to be reclaimed with some assurance to back up the assumption
      that the page count never goes to zero and a list_add() is never
      attempted.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Logan Gunthorpe <logang@deltatee.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c2c2587
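
      A minimal caller-side sketch of the interface above; the helper is
      hypothetical, and the second argument to get_dev_pagemap() is the
      cached result of a previous lookup (NULL on the first call):

        /* Hypothetical helper: pin the pagemap backing 'pfn' before use. */
        static int use_device_pfn(unsigned long pfn)
        {
                struct dev_pagemap *pgmap = get_dev_pagemap(pfn, NULL);

                if (!pgmap)
                        return -ENODEV; /* device disabled, or on its way out */
                /* ... pfn_to_page(pfn) is safe to dereference here ... */
                put_dev_pagemap(pgmap);
                return 0;
        }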
    • libnvdimm, pmem: move request_queue allocation earlier in probe · 468ded03
      Committed by Dan Williams
      Before the dynamically allocated struct pages from devm_memremap_pages()
      can be put to use outside the driver, we need a mechanism to track
      whether they are still in use at teardown.  Towards that goal reorder
      the initialization sequence to allow the 'q_usage_counter' from the
      request_queue to be used by the devm_memremap_pages() implementation (in
      subsequent patches).
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      468ded03
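
      A sketch of the reordered initialization, assuming 4.5-era block-layer
      calls; the wrapper name is hypothetical:

        /* Allocate the queue first so q->q_usage_counter exists before
         * any "device pages" are handed out (used by later patches). */
        static struct request_queue *pmem_queue_alloc(struct device *dev)
        {
                struct request_queue *q;

                q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));
                if (!q)
                        return NULL;
                /* devm_memremap_pages() and the rest of probe follow */
                return q;
        }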
    • libnvdimm, pfn, pmem: allocate memmap array in persistent memory · d2c0f041
      Committed by Dan Williams
      Use the new vmem_altmap capability to enable the pmem driver to arrange
      for a struct page memmap to be established in persistent memory.
      
      [linux@roeck-us.net: mn10300: declare __pfn_to_phys() to fix build error]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d2c0f041
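
      A sketch of the driver side, assuming the vmem_altmap introduced in the
      next entry below; 'dev', 'res', and 'q' come from the surrounding probe
      code, and the reserve size is an assumption for illustration:

        /* Carve the start of the pmem range out to hold its own memmap. */
        struct vmem_altmap altmap = {
                .base_pfn = PFN_DOWN(res->start),
                .reserve  = PFN_DOWN(SZ_8K), /* e.g. room for an info block */
        };
        void *addr = devm_memremap_pages(dev, res, &q->q_usage_counter,
                        &altmap);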
    • x86, mm: introduce vmem_altmap to augment vmemmap_populate() · 4b94ffdc
      Committed by Dan Williams
      In support of providing struct page for large persistent memory
      capacities, use struct vmem_altmap to change the default policy for
      allocating memory for the memmap array.  The default vmemmap_populate()
      allocates page table storage area from the page allocator.  Given
      persistent memory capacities relative to DRAM it may not be feasible to
      store the memmap in 'System Memory'.  Instead vmem_altmap represents
      pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
      requests.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b94ffdc
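
      A sketch of the structure as introduced; field names follow
      include/linux/memremap.h of this era:

        struct vmem_altmap {
                const unsigned long base_pfn; /* first pfn of the range */
                const unsigned long reserve;  /* pages never handed out */
                unsigned long free;           /* pages available for memmap */
                unsigned long align;
                unsigned long alloc;          /* pages allocated so far */
        };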
    • mm: introduce find_dev_pagemap() · 9476df7d
      Committed by Dan Williams
      There are several scenarios where we need to retrieve and update
      metadata associated with a given devm_memremap_pages() mapping, and the
      only lookup key available is a pfn in the range:
      
      1/ We want to augment vmemmap_populate() (called via arch_add_memory())
         to allocate memmap storage from pre-allocated pages reserved by the
         device driver.  At vmemmap_alloc_block_buf() time it grabs device pages
         rather than page allocator pages.  This is in support of
         devm_memremap_pages() mappings where the memmap is too large to fit in
         main memory (i.e. large persistent memory devices).
      
      2/ Taking a reference against the mapping when inserting device pages
         into the address_space radix of a given inode.  This facilitates
         unmap_mapping_range() and truncate_inode_pages() operations when the
         driver is tearing down the mapping.
      
      3/ get_user_pages() operations on ZONE_DEVICE memory require taking a
         reference against the mapping so that the driver teardown path can
         revoke and drain usage of device pages.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Logan Gunthorpe <logang@deltatee.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9476df7d
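
      A sketch of the new lookup and a hypothetical use; the RCU protection
      shown is an assumption about how the lookup is synchronized:

        struct dev_pagemap *find_dev_pagemap(resource_size_t phys);

        /* Recover per-mapping metadata from a pfn (illustrative only). */
        rcu_read_lock();
        pgmap = find_dev_pagemap(PFN_PHYS(pfn));
        if (pgmap)
                /* ... read or update metadata for this mapping ... */;
        rcu_read_unlock();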
    • mm, dax, pmem: introduce pfn_t · 34c0fd54
      Committed by Dan Williams
      For the purpose of communicating the optional presence of a 'struct
      page' for the pfn returned from ->direct_access(), introduce a type that
      encapsulates a page-frame-number plus flags.  These flags contain the
      historical "page_link" encoding for a scatterlist entry, but can also
      denote "device memory".  Where "device memory" is a set of pfns that are
      not part of the kernel's linear mapping by default, but are accessed via
      the same memory controller as ram.
      
      The motivation for this new type is large capacity persistent memory
      that needs struct page entries in the 'memmap' to support 3rd party DMA
      (i.e.  O_DIRECT I/O with a persistent memory source/target).  However,
      we also need it in support of maintaining a list of mapped inodes which
      need to be unmapped at driver teardown or freeze_bdev() time.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34c0fd54
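
      A sketch of the new type and its flag bits, per include/linux/pfn_t.h of
      this era; note that 'val' was later widened to u64 by the fix listed at
      the top of this log (c4544205), which is how it is shown here:

        typedef struct {
                u64 val; /* pfn in the low bits, flags in the high bits */
        } pfn_t;

        #define PFN_SG_CHAIN (1ULL << (BITS_PER_LONG_LONG - 1))
        #define PFN_SG_LAST  (1ULL << (BITS_PER_LONG_LONG - 2))
        #define PFN_DEV      (1ULL << (BITS_PER_LONG_LONG - 3))
        #define PFN_MAP      (1ULL << (BITS_PER_LONG_LONG - 4))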
  6. 10 Jan 2016: 6 commits
  7. 25 Dec 2015: 1 commit
  8. 14 Dec 2015: 1 commit
  9. 13 Dec 2015: 1 commit
  10. 11 Dec 2015: 1 commit
  11. 13 Nov 2015: 1 commit
  12. 08 Nov 2015: 1 commit
  13. 10 Oct 2015: 3 commits
  14. 17 Sep 2015: 1 commit
  15. 29 Aug 2015: 2 commits
    • libnvdimm, pmem: direct map legacy pmem by default · 004f1afb
      Committed by Dan Williams
      The expectation is that the legacy / non-standard pmem discovery method
      (e820 type-12) will only ever be used to describe small quantities of
      persistent memory.  Larger capacities will be described via the ACPI
      NFIT.  When "allocate struct page from pmem" support is added, this
      default policy can be overridden by assigning a legacy pmem namespace to
      a pfn device; however, this would only be necessary if a platform used
      the legacy mechanism to define a very large range.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      004f1afb
    • libnvdimm, pmem: 'struct page' for pmem · 32ab0a3f
      Committed by Dan Williams
      Enable the pmem driver to handle PFN device instances.  Attaching a pmem
      namespace to a pfn device triggers the driver to allocate and initialize
      struct page entries for pmem.  Memory capacity for this allocation comes
      exclusively from RAM for now, which is suitable for low PMEM-to-RAM
      ratios.  This mechanism will be expanded later for setting an "allocate
      from PMEM" policy.
      
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      32ab0a3f
  16. 28 Aug 2015: 2 commits
    • x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB · 96601adb
      Committed by Dan Williams
      Given that a write-back (WB) mapping plus non-temporal stores is
      expected to be the most efficient way to access PMEM, update the
      definition of ARCH_HAS_PMEM_API to imply arch support for
      WB-mapped-PMEM.  This is needed as a pre-requisite for adding PMEM to
      the direct map and mapping it with struct page.
      
      The above clarification for X86_64 means that memcpy_to_pmem() is
      permitted to use the non-temporal arch_memcpy_to_pmem() rather than
      needlessly fall back to default_memcpy_to_pmem() when the pcommit
      instruction is not available.  When arch_memcpy_to_pmem() is not
      guaranteed to flush writes out of cache, i.e. on older X86_32
      implementations where non-temporal stores may just dirty cache,
      ARCH_HAS_PMEM_API is simply disabled.
      
      The default fallback for persistent memory handling remains: map it
      with the WT (write-through) cache-type and hope for the best.
      
      arch_has_pmem_api() is updated to only indicate whether the arch
      provides the proper helpers to meet the minimum "writes are visible
      outside the cache hierarchy after memcpy_to_pmem() + wmb_pmem()".  Code
      that cares whether wmb_pmem() actually flushes writes to pmem must now
      call arch_has_wmb_pmem() directly.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      [hch: set ARCH_HAS_PMEM_API=n on x86_32]
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      [toshi: x86_32 compile fixes]
      Signed-off-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      96601adb
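
      A sketch of the clarified contract; the API names come from the commit
      message, while both wrapper functions are hypothetical:

        /* Visible outside the cache hierarchy after this pair returns. */
        static void pmem_store(void __pmem *dst, const void *src, size_t n)
        {
                memcpy_to_pmem(dst, src, n); /* non-temporal on x86-64 */
                wmb_pmem();
        }

        static bool pmem_writes_durable(void)
        {
                /* arch_has_pmem_api() alone no longer implies this */
                return arch_has_wmb_pmem();
        }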
    • dax: drop size parameter to ->direct_access() · cb389b9c
      Committed by Dan Williams
      None of the implementations currently use it.  The common
      bdev_direct_access() entry point handles all the size checks before
      calling ->direct_access().
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      cb389b9c
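
      A sketch of the resulting callback in struct block_device_operations;
      the __pmem annotation lands elsewhere in this series, so treat the
      exact types as approximate:

        long (*direct_access)(struct block_device *bdev, sector_t sector,
                        void __pmem **kaddr, unsigned long *pfn);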
  17. 21 Aug 2015: 1 commit
  18. 15 Aug 2015: 1 commit
  19. 29 Jul 2015: 1 commit
    • block: add a bi_error field to struct bio · 4246a0b6
      Committed by Christoph Hellwig
      Currently we have two different ways to signal an I/O error on a BIO:
      
       (1) by clearing the BIO_UPTODATE flag
       (2) by returning a Linux errno value to the bi_end_io callback
      
      The first one has the drawback of only communicating a single possible
      error (-EIO), and the second one has the drawback of not being persistent
      when bios are queued up and of not being passed along from child to parent
      bio in the ever more popular chaining scenario.  Having both mechanisms
      available has the additional drawback of utterly confusing driver authors
      and introducing bugs where various I/O submitters only deal with one of
      them, and the others have to add boilerplate code to deal with both kinds
      of error returns.
      
      So add a new bi_error field to store an errno value directly in struct
      bio and remove the existing mechanisms to clean all this up.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      4246a0b6
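
      A sketch of the new signalling; the helper is hypothetical, but
      bio->bi_error and the single-argument bio_endio() are what this change
      introduces:

        static void complete_bio(struct bio *bio, int error)
        {
                bio->bi_error = error; /* e.g. -EIO; survives bio chaining */
                bio_endio(bio);        /* the errno argument is gone */
        }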
  20. 28 Jul 2015: 1 commit
  21. 26 Jun 2015: 2 commits
    • arch, x86: pmem api for ensuring durability of persistent memory updates · 61031952
      Committed by Ross Zwisler
      Based on an original patch by Ross Zwisler [1].
      
      Writes to persistent memory have the potential to be posted to CPU
      cache, CPU write buffers, and platform write buffers (memory controller)
      before being committed to persistent media.  Provide APIs,
      memcpy_to_pmem(), wmb_pmem(), and memremap_pmem(), to write data to
      pmem and assert that it is durable in PMEM (a persistent linear address
      range).  A '__pmem' attribute is added so sparse can track proper usage
      of pointers to pmem.
      
      This continues the status quo of pmem being x86-only for 4.2, but the
      reworks to ioremap and the wider implementation of memremap() will
      enable other archs in 4.3.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000932.html
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      [djbw: various reworks]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      61031952
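
      A sketch of a write path using the new API; 'offset', 'size', and the
      buffer are assumptions, and the mapping call follows the 4.2-era
      prototype:

        void __pmem *pmem = memremap_pmem(offset, size);

        memcpy_to_pmem(pmem, buf, len); /* may sit in posted write buffers */
        wmb_pmem();                     /* now durable on the media */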
    • libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only · 58138820
      Committed by Dan Williams
      Upon detection of an unarmed dimm in a region, arrange for descendant
      BTT, PMEM, or BLK instances to be read-only.  A dimm is primarily marked
      "unarmed" via flags passed by platform firmware (NFIT).
      
      The flags in the NFIT memory device sub-structure indicate the state of
      the data on the nvdimm relative to its energy source or last "flush to
      persistence".  For the most part there is nothing the driver can do but
      advertise the state of these flags in sysfs and emit a message if
      firmware indicates that the contents of the device may be corrupted.
      However, for the case of ACPI_NFIT_MEM_ARMED, the driver can arrange for
      the block devices incorporating that nvdimm to be marked read-only.
      This is a safe default as the data is still available and new writes are
      held off until the administrator either forces read-write mode, or the
      energy source becomes armed.
      
      A 'read_only' attribute is added to REGION devices to allow for
      overriding the default read-only policy of all descendant block devices.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      58138820
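
      A sketch of the read-only policy; ACPI_NFIT_MEM_ARMED and set_disk_ro()
      are existing kernel identifiers, while the surrounding check is an
      assumption:

        /* Firmware says the energy source is not armed: hold off writes. */
        if (flags & ACPI_NFIT_MEM_ARMED)
                set_disk_ro(disk, 1);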