1. 13 Jul 2016 (1 commit)
  2. 12 Jul 2016 (2 commits)
    • libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() · 7e267a8c
      Committed by Dan Williams
      Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer
      chasing through nd_region), and that we otherwise assume a platform has
      ADR capability when flush hints are not present, move nvdimm_flush() to
      REQ_FLUSH context.
      
      Note that we still arrange for nvdimm_flush() to be called even in the
      ADR case. We need at least one wmb() fence to push buffered writes in
      the cpu out to the ADR protected domain.
      
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      7e267a8c
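      Below, an editor's sketch in C of the flush placement described above, assuming
      4.7-era bio flag names (bi_rw, REQ_FLUSH, REQ_FUA) and a caller that has already
      resolved the nd_region; the function name is hypothetical, not the in-tree
      pmem_make_request().

      #include <linux/bio.h>
      #include <linux/libnvdimm.h>

      static void sketch_handle_bio(struct nd_region *nd_region, struct bio *bio)
      {
              /* flush request: push prior buffered writes out to the ADR protected domain */
              if (bio->bi_rw & REQ_FLUSH)
                      nvdimm_flush(nd_region);

              /* ... transfer the bio's data to or from the pmem mapping ... */

              /* forced unit access: make this write durable before completing the bio */
              if (bio->bi_rw & REQ_FUA)
                      nvdimm_flush(nd_region);

              bio_endio(bio);
      }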
    • libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() · f284a4f2
      Committed by Dan Williams
      nvdimm_flush() is a replacement for the x86 'pcommit' instruction.  It is
      an optional write flushing mechanism that an nvdimm bus can provide for
      the pmem driver to consume.  In the case of the NFIT nvdimm-bus-provider
      nvdimm_flush() is implemented as a series of flush-hint-address [1]
      writes to each dimm in the interleave set (region) that backs the
      namespace.
      
      The nvdimm_has_flush() routine relies on platform firmware to describe
      the flushing capabilities of a platform.  It uses the heuristic of
      whether an nvdimm bus provider provides flush address data to return a
      ternary result:
      
            1: flush addresses defined
            0: dimm topology described without flush addresses (assume ADR)
       -errno: no topology information, unable to determine flush mechanism
      
      The pmem driver is expected to take the following actions on this ternary
      result:
      
            1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown
            0: do not set WC or FUA on the queue, take no further action
       -errno: warn and then operate as if nvdimm_has_flush() returned '0'
      
      The caveat of this heuristic is that it cannot distinguish the "dimm does
      not have a flush address" case from the "platform firmware is broken and
      failed to describe a flush address" case.  Given that we are already
      explicitly trusting the NFIT, there is not much more we can do beyond
      blacklisting broken firmware if it is ever encountered.
      
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      f284a4f2
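      Below, a hedged sketch in C of how a pmem-like driver might act on that ternary
      result at probe time, assuming the blk_queue_write_cache() helper available since
      4.7; the function name is hypothetical and this is not the literal
      pmem_attach_disk() code.

      #include <linux/blkdev.h>
      #include <linux/device.h>
      #include <linux/libnvdimm.h>

      static void sketch_setup_flush(struct device *dev, struct nd_region *nd_region,
                      struct request_queue *q)
      {
              int has_flush = nvdimm_has_flush(nd_region);

              /* -errno: no topology information, warn and fall back to the ADR assumption */
              if (has_flush < 0)
                      dev_warn(dev, "unable to determine flush mechanism, assuming ADR\n");

              /*
               * 1: advertise a write cache plus FUA so the block layer issues
               *    REQ_FLUSH / REQ_FUA and nvdimm_flush() gets called;
               * 0 / -errno: no write cache, take no further action
               */
              blk_queue_write_cache(q, has_flush > 0, has_flush > 0);
      }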
  3. 25 Jun 2016 (1 commit)
    • libnvdimm, pmem: allow nfit_test to override pmem_direct_access() · f295e53b
      Committed by Dan Williams
      Currently phys_to_pfn_t() is an exported symbol to allow nfit_test to
      override it and indicate that nfit_test-pmem is not device-mapped.  Now,
      we want to enable nfit_test to operate without DMA_CMA and the pmem it
      provides will no longer be physically contiguous, i.e. won't be capable
      of supporting direct_access requests larger than a page.  Make
      pmem_direct_access() a weak symbol so that it can be replaced by the
      tools/testing/nvdimm/ version, and move phys_to_pfn_t() to a static
      inline now that it no longer needs to be overridden.
      Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      f295e53b
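      Below, a generic sketch of the weak-symbol override pattern this relies on; the
      function name and return values are hypothetical (the in-tree symbol being
      overridden is pmem_direct_access()), and the two definitions live in separate
      objects that get linked together.

      /* driver object: default definition, marked overridable at link time */
      long __weak sketch_direct_access(void *addr, long size)
      {
              return size;    /* real pmem: the whole physically contiguous range */
      }

      /* tools/testing/nvdimm/ object: a strong definition replaces the weak one */
      long sketch_direct_access(void *addr, long size)
      {
              return min_t(long, size, PAGE_SIZE);    /* test pmem is not contiguous */
      }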
  4. 16 Jun 2016 (1 commit)
  5. 21 May 2016 (1 commit)
  6. 19 May 2016 (1 commit)
  7. 07 May 2016 (1 commit)
  8. 01 May 2016 (1 commit)
    • libnvdimm, pfn: fix memmap reservation sizing · 658922e5
      Committed by Dan Williams
      When configuring a pfn-device instance to allocate the memmap array it
      needs to account for the fact that vmemmap_populate_hugepages()
      allocates struct page blocks in HPAGE_SIZE chunks.  We need to align the
      reserved area size to 2MB otherwise arch_add_memory() runs out of memory
      while establishing the memmap:
      
       WARNING: CPU: 0 PID: 496 at arch/x86/mm/init_64.c:704 arch_add_memory+0xe7/0xf0
       [..]
       Call Trace:
        [<ffffffff8148bdb3>] dump_stack+0x85/0xc2
        [<ffffffff810a749b>] __warn+0xcb/0xf0
        [<ffffffff810a75cd>] warn_slowpath_null+0x1d/0x20
        [<ffffffff8106a497>] arch_add_memory+0xe7/0xf0
        [<ffffffff811d2097>] devm_memremap_pages+0x287/0x450
        [<ffffffff811d1ffa>] ? devm_memremap_pages+0x1ea/0x450
        [<ffffffffa0000298>] __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap]
        [<ffffffffa0047a58>] pmem_attach_disk+0x318/0x420 [nd_pmem]
        [<ffffffffa0047bcf>] nd_pmem_probe+0x6f/0x90 [nd_pmem]
        [<ffffffffa0009469>] nvdimm_bus_probe+0x69/0x110 [libnvdimm]
       [..]
        ndbus0: nd_pmem.probe(pfn3.0) = -12
       nd_pmem: probe of pfn3.0 failed with error -12
      libndctl: ndctl_pfn_enable: pfn3.0: failed to enable
      Reported-by: Namratha Kothapalli <namratha.n.kothapalli@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      658922e5
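      Below, a sketch of the sizing rule behind the fix (helper name hypothetical,
      SZ_2M standing in for HPAGE_SIZE on x86_64): the struct page reservation is
      rounded up to the 2MB blocks that vmemmap_populate_hugepages() maps.

      #include <linux/kernel.h>
      #include <linux/mm_types.h>
      #include <linux/sizes.h>

      /* bytes to reserve in the pfn-device for 'npfns' worth of struct page */
      static unsigned long sketch_memmap_reserve(unsigned long npfns)
      {
              /* without the ALIGN(), arch_add_memory() can exhaust the reservation
               * part-way through and fail with -ENOMEM, as in the trace above */
              return ALIGN(sizeof(struct page) * npfns, SZ_2M);
      }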
  9. 23 Apr 2016 (9 commits)
  10. 16 Apr 2016 (1 commit)
  11. 08 Apr 2016 (1 commit)
    • libnvdimm, pfn: fix nvdimm_namespace_add_poison() vs section alignment · a3901802
      Committed by Dan Williams
      When section alignment padding is in effect we need to shift / truncate
      the range that is queried for poison by the 'start_pad' or 'end_trunc'
      reservations.
      
      It's easiest if we just pass in an adjusted resource range rather than
      deriving it from the passed in namespace.  With the resource range
      resolution pushed out to the caller we can also push the
      namespace-to-region lookup to the caller and drop the implicit pmem-type
      assumption about the passed in namespace object.
      
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      a3901802
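      Below, an illustrative sketch of the caller-side adjustment described above
      (function and variable names hypothetical): the range handed to the poison
      lookup is the namespace range shifted by 'start_pad' and truncated by
      'end_trunc', rather than being derived inside the helper.

      #include <linux/ioport.h>
      #include <linux/types.h>

      static void sketch_adjust_range(const struct resource *ns_res,
                      u64 start_pad, u64 end_trunc, struct resource *adjusted)
      {
              adjusted->start = ns_res->start + start_pad;    /* skip the section-alignment pad */
              adjusted->end = ns_res->end - end_trunc;        /* drop the truncated tail */
      }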
  12. 05 Apr 2016 (1 commit)
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.

      This promise never materialized, and it is unlikely that it ever will.

      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it is a constant source of confusion over whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      A global switch to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using the
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.

      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
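      For illustration, a hypothetical before/after of the mechanical conversion the
      semantic patch performs (not taken from any specific file touched by this
      commit):

      /* before */
      pgoff_t index = pos >> PAGE_CACHE_SHIFT;
      page_cache_release(page);

      /* after */
      pgoff_t index = pos >> PAGE_SHIFT;
      put_page(page);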
  13. 29 Mar 2016 (1 commit)
  14. 10 Mar 2016 (3 commits)
  15. 07 Mar 2016 (1 commit)
  16. 06 Mar 2016 (3 commits)
  17. 24 Feb 2016 (1 commit)
    • nvdimm: use 'u64' for pfn flags · c4544205
      Committed by Arnd Bergmann
      A recent bugfix changed pfn_t to always be 64-bit wide, but did not
      change the code in pmem.c, which is now broken on 32-bit architectures
      as reported by gcc:
      
      In file included from ../drivers/nvdimm/pmem.c:28:0:
      drivers/nvdimm/pmem.c: In function 'pmem_alloc':
      include/linux/pfn_t.h:15:17: error: large integer implicitly truncated to unsigned type [-Werror=overflow]
       #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3))
      
      This changes the intermediate pfn_flags in struct pmem_device to
      be 64 bit wide as well, so they can store the flags correctly.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: db78c222 ("mm: fix pfn_t vs highmem")
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      c4544205
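      Below, a sketch of the shape of the fix, with the structure trimmed to the
      relevant field (names other than pfn_flags and PFN_DEV are hypothetical): the
      flags word must be wide enough for PFN_DEV even where 'unsigned long' is 32 bits.

      #include <linux/pfn_t.h>
      #include <linux/types.h>

      struct sketch_pmem_device {
              /* ... other fields elided ... */
              u64 pfn_flags;                  /* was: unsigned long pfn_flags; */
      };

      static void sketch_set_flags(struct sketch_pmem_device *pmem)
      {
              pmem->pfn_flags = PFN_DEV;      /* bit 61, which overflows a 32-bit long */
      }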
  18. 16 Jan 2016 (6 commits)
    • mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup · 5c2c2587
      Committed by Dan Williams
      get_dev_pagemap() enables paths like get_user_pages() to pin a dynamically
      mapped pfn-range (devm_memremap_pages()) while the resulting struct page
      objects are in use.  Unlike get_page() it may fail if the device is, or
      is in the process of being, disabled.  While the initial lookup of the
      range may be an expensive list walk, the result is cached to speed up
      subsequent lookups which are likely to be in the same mapped range.
      
      devm_memremap_pages() now requires a reference counter to be specified
      at init time.  For pmem this means moving request_queue allocation into
      pmem_alloc() so the existing queue usage counter can track "device
      pages".
      
      ZONE_DEVICE pages always have an elevated count and will never be on an
      lru reclaim list.  That space in 'struct page' can be redirected for
      other uses, but for safety introduce a poison value that will always
      trip __list_add() to assert.  This allows half of the struct list_head
      storage to be reclaimed with some assurance to back up the assumption
      that the page count never goes to zero and a list_add() is never
      attempted.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Logan Gunthorpe <logang@deltatee.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c2c2587
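      Below, a hedged sketch in C of the pin/unpin pattern described above, using the
      two helpers this patch introduces (declared in include/linux/memremap.h); the
      wrapper names are hypothetical.

      #include <linux/memremap.h>

      static struct dev_pagemap *sketch_pin(unsigned long pfn, struct dev_pagemap *cached)
      {
              /* reuses 'cached' when pfn is still inside the same mapping;
               * returns NULL if the device is, or is being, disabled */
              return get_dev_pagemap(pfn, cached);
      }

      static void sketch_unpin(struct dev_pagemap *pgmap)
      {
              if (pgmap)
                      put_dev_pagemap(pgmap);         /* drop the reference taken above */
      }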
    • libnvdimm, pmem: move request_queue allocation earlier in probe · 468ded03
      Committed by Dan Williams
      Before the dynamically allocated struct pages from devm_memremap_pages()
      can be put to use outside the driver, we need a mechanism to track
      whether they are still in use at teardown.  Towards that goal reorder
      the initialization sequence to allow the 'q_usage_counter' from the
      request_queue to be used by the devm_memremap_pages() implementation (in
      subsequent patches).
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      468ded03
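      Below, a sketch of the ordering this reorder enables, assuming the 4.5-era
      devm_memremap_pages() signature and with error handling trimmed; the wrapper
      name is hypothetical. The request_queue must exist first so that its
      q_usage_counter percpu_ref can double as the "device pages in use" count.

      #include <linux/blkdev.h>
      #include <linux/device.h>
      #include <linux/err.h>
      #include <linux/ioport.h>
      #include <linux/memremap.h>

      static void *sketch_map_pages(struct device *dev, struct resource *res,
                      struct request_queue **qp)
      {
              struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, dev_to_node(dev));

              if (!q)
                      return ERR_PTR(-ENOMEM);
              *qp = q;
              /* the queue usage counter now also tracks outstanding device pages */
              return devm_memremap_pages(dev, res, &q->q_usage_counter, NULL);
      }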
    • libnvdimm, pfn, pmem: allocate memmap array in persistent memory · d2c0f041
      Committed by Dan Williams
      Use the new vmem_altmap capability to enable the pmem driver to arrange
      for a struct page memmap to be established in persistent memory.
      
      [linux@roeck-us.net: mn10300: declare __pfn_to_phys() to fix build error]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d2c0f041
    • x86, mm: introduce vmem_altmap to augment vmemmap_populate() · 4b94ffdc
      Committed by Dan Williams
      In support of providing struct page for large persistent memory
      capacities, use struct vmem_altmap to change the default policy for
      allocating memory for the memmap array.  The default vmemmap_populate()
      allocates page table storage area from the page allocator.  Given
      persistent memory capacities relative to DRAM it may not be feasible to
      store the memmap in 'System Memory'.  Instead vmem_altmap represents
      pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
      requests.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b94ffdc
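      Below, a hedged sketch of describing a device-backed memmap reservation with
      struct vmem_altmap (field names per include/linux/memremap.h; the helper name
      and values are illustrative only).

      #include <linux/memremap.h>
      #include <linux/pfn.h>
      #include <linux/sizes.h>

      static void sketch_describe_memmap(phys_addr_t base, unsigned long memmap_bytes)
      {
              struct vmem_altmap altmap = {
                      .base_pfn = PFN_DOWN(base),             /* first pfn of the device range */
                      .reserve  = PFN_DOWN(SZ_8K),            /* info-block pages kept out of the memmap */
                      .free     = PFN_DOWN(memmap_bytes),     /* pages vmemmap_alloc_block_buf() may hand out */
              };

              /* a driver would now pass &altmap to devm_memremap_pages() */
              (void)altmap;
      }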
    • mm: introduce find_dev_pagemap() · 9476df7d
      Committed by Dan Williams
      There are several scenarios where we need to retrieve and update
      metadata associated with a given devm_memremap_pages() mapping, and the
      only lookup key available is a pfn in the range:
      
      1/ We want to augment vmemmap_populate() (called via arch_add_memory())
         to allocate memmap storage from pre-allocated pages reserved by the
         device driver.  At vmemmap_alloc_block_buf() time it grabs device pages
         rather than page allocator pages.  This is in support of
         devm_memremap_pages() mappings where the memmap is too large to fit in
         main memory (i.e. large persistent memory devices).
      
      2/ Taking a reference against the mapping when inserting device pages
         into the address_space radix of a given inode.  This facilitates
         unmap_mapping_range() and truncate_inode_pages() operations when the
         driver is tearing down the mapping.
      
      3/ get_user_pages() operations on ZONE_DEVICE memory require taking a
         reference against the mapping so that the driver teardown path can
         revoke and drain usage of device pages.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Logan Gunthorpe <logang@deltatee.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9476df7d
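      Below, a hedged sketch of the new lookup (wrapper name hypothetical): resolve
      the dev_pagemap covering a pfn, which is the primitive all three scenarios
      above build on; callers that need to hold the mapping take a reference via the
      helpers added in later patches.

      #include <linux/memremap.h>
      #include <linux/pfn.h>
      #include <linux/rcupdate.h>

      static bool sketch_is_device_pfn(unsigned long pfn)
      {
              struct dev_pagemap *pgmap;

              rcu_read_lock();
              pgmap = find_dev_pagemap(PFN_PHYS(pfn));        /* NULL if no mapping covers pfn */
              rcu_read_unlock();

              return pgmap != NULL;
      }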
    • mm, dax, pmem: introduce pfn_t · 34c0fd54
      Committed by Dan Williams
      For the purpose of communicating the optional presence of a 'struct
      page' for the pfn returned from ->direct_access(), introduce a type that
      encapsulates a page-frame-number plus flags.  These flags contain the
      historical "page_link" encoding for a scatterlist entry, but can also
      denote "device memory".  Where "device memory" is a set of pfns that are
      not part of the kernel's linear mapping by default, but are accessed via
      the same memory controller as ram.
      
      The motivation for this new type is large capacity persistent memory
      that needs struct page entries in the 'memmap' to support 3rd party DMA
      (i.e.  O_DIRECT I/O with a persistent memory source/target).  However,
      we also need it in support of maintaining a list of mapped inodes which
      need to be unmapped at driver teardown or freeze_bdev() time.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34c0fd54
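      Below, a hedged sketch of the encoding described above, using the helpers from
      include/linux/pfn_t.h (the function name is hypothetical).

      #include <linux/mm.h>
      #include <linux/pfn_t.h>

      static struct page *sketch_page_for(phys_addr_t phys)
      {
              /* device memory that also carries a memmap: PFN_DEV | PFN_MAP */
              pfn_t pfn = phys_to_pfn_t(phys, PFN_DEV | PFN_MAP);

              /* fails for device memory that lacks a memmap (PFN_DEV without PFN_MAP) */
              if (!pfn_t_has_page(pfn))
                      return NULL;

              return pfn_t_to_page(pfn);      /* struct page entry from the memmap */
      }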
  19. 10 Jan 2016 (4 commits)