1. 31 1月, 2016 1 次提交
    • A
      powerpc/book3s_32: Fix build error with checkpoint restart · 19f97c98
      Aneesh Kumar K.V 提交于
      In file included from mm/vmscan.c:54:0:
      include/linux/swapops.h: In function ‘pte_to_swp_entry’:
      include/linux/swapops.h:69:2: error: implicit declaration of function ‘pte_swp_soft_dirty’ [-Werror=implicit-function-declaration]
        if (pte_swp_soft_dirty(pte))
        ^
      include/linux/swapops.h:70:3: error: implicit declaration of function ‘pte_swp_clear_soft_dirty’ [-Werror=implicit-function-declaration]
         pte = pte_swp_clear_soft_dirty(pte);
      
      We support soft dirty tracking only with book3s 64 for now.
      So change the Kconfig dependency accordingly. Also CHECKPOINT_RESTORE
      feature is not really dependent on SOFT_DIRTY. We track the dependency
      between MEM_SOFT_DIRTY and ARCH_SOFT_DIRTY through headers
      
      Fixes: 7207f436 ("powerpc/mm: Add page soft dirty tracking")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      19f97c98
  2. 28 1月, 2016 2 次提交
    • A
      powerpc/mm: Fixup _HPAGE_CHG_MASK · 2d19fc63
      Aneesh Kumar K.V 提交于
      This was wrongly updated by commit 7aa9a23c ("powerpc, thp: remove
      infrastructure for handling splitting PMDs") during the last merge
      window. Fix it up.
      
      This could lead to incorrect behaviour in THP and/or mprotect(), at a
      minimum.
      
      Fixes: 7aa9a23c ("powerpc, thp: remove infrastructure for handling splitting PMDs")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2d19fc63
    • M
      powerpc/perf: Remove PPMU_HAS_SSLOT flag for Power8 · 370f06c8
      Madhavan Srinivasan 提交于
      Commit 7a786832 ("powerpc/perf: Add an explict flag indicating
      presence of SLOT field") introduced the PPMU_HAS_SSLOT flag to remove
      the assumption that MMCRA[SLOT] was present when PPMU_ALT_SIPR was not
      set.
      
      That commit's changelog also mentions that Power8 does not support
      MMCRA[SLOT]. However when the Power8 PMU support was merged, it
      errnoeously included the PPMU_HAS_SSLOT flag.
      
      So remove PPMU_HAS_SSLOT from the Power8 flags.
      
      mpe: On systems where MMCRA[SLOT] exists, the field occupies bits 37:39
      (IBM numbering). On Power8 bit 37 is reserved, and 38:39 overlap with
      the high bits of the Threshold Event Counter Mantissa. I am not aware of
      any published events which use the threshold counting mechanism, which
      would cause the mantissa bits to be set. So in practice this bug is
      unlikely to trigger.
      
      Fixes: e05b9b9e ("powerpc/perf: Power8 PMU support")
      Signed-off-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      370f06c8
  3. 27 1月, 2016 1 次提交
  4. 25 1月, 2016 1 次提交
  5. 21 1月, 2016 3 次提交
    • S
      powerpc: Remove newly added extra definition of pmd_dirty · 0e2bce74
      Stephen Rothwell 提交于
      Commit d5d6a443 ("arch/powerpc/include/asm/pgtable-ppc64.h:
      add pmd_[dirty|mkclean] for THP") added a new identical definition
      of pmd_dirty(). Remove it again.
      
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      0e2bce74
    • A
      powerpc: Simplify module TOC handling · c153693d
      Alan Modra 提交于
      PowerPC64 uses the symbol .TOC. much as other targets use
      _GLOBAL_OFFSET_TABLE_. It identifies the value of the GOT pointer (or in
      powerpc parlance, the TOC pointer). Global offset tables are generally
      local to an executable or shared library, or in the kernel, module. Thus
      it does not make sense for a module to resolve a relocation against
      .TOC. to the kernel's .TOC. value. A module has its own .TOC., and
      indeed the powerpc64 module relocation processing ignores the kernel
      value of .TOC. and instead calculates a module-local value.
      
      This patch removes code involved in exporting the kernel .TOC., tweaks
      modpost to ignore an undefined .TOC., and the module loader to twiddle
      the section symbol so that .TOC. isn't seen as undefined.
      
      Note that if the kernel was compiled with -msingle-pic-base then ELFv2
      would not have function global entry code setting up r2. In that case
      the module call stubs would need to be modified to set up r2 using the
      kernel .TOC. value, requiring some of this code to be reinstated.
      
      mpe: Furthermore a change in binutils master (not yet released) causes
      the current way we handle the TOC to no longer work when building with
      MODVERSIONS=y and RELOCATABLE=n. The symptom is that modules can not be
      loaded due to there being no version found for TOC.
      
      Cc: stable@vger.kernel.org # 3.16+
      Signed-off-by: NAlan Modra <amodra@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c153693d
    • C
      powerpc: Wire up copy_file_range() syscall · d7f9ee60
      Chandan Rajendra 提交于
      Test runs on a ppc64 BE guest succeeded using modified fstests.
      
      Also tested on ppc64 LE using a home made test - mpe.
      Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d7f9ee60
  6. 20 1月, 2016 2 次提交
  7. 19 1月, 2016 7 次提交
    • C
      numa: remove stale node_has_online_mem() define · 00d27c63
      Chris Metcalf 提交于
      This isn't used anywhere, so delete it.
      
      Looks like the last usage (in x86-specific code) was removed by Tejun
      in 2011 in commit bd6709a9 ("x86, NUMA: Make 32bit use common NUMA
      init path").
      Signed-off-by: NChris Metcalf <cmetcalf@ezchip.com>
      00d27c63
    • C
      arch/tile: move user_exit() to early kernel entry sequence · 1bb50cad
      Chris Metcalf 提交于
      This ensures that we always notify context tracking that we
      have exited from user space no matter how we enter the kernel.
      It is similar to how arm64 handles context tracking, for example.
      
      This allows the removal of all the exception_enter() calls that
      were added in commit 49e4e156 ("tile: support CONTEXT_TRACKING and
      thus NOHZ_FULL").
      Signed-off-by: NChris Metcalf <cmetcalf@ezchip.com>
      1bb50cad
    • C
      tile: fix bug in setting PT_FLAGS_DISABLE_IRQ on kernel entry · 9ce815ed
      Chris Metcalf 提交于
      This flag value is saved in ptregs and used to decide whether
      to disable irqs when returning from the kernel.  Commit 1168df528fe4
      ("tile: don't assume user privilege is zero") performed a bad
      merge from some KVM-enabled code that had not yet been upstreamed.
      
      The only issue with the old code is that we will read the interrupt
      mask in more conditions than we need to (e.g., coming from user
      space when user space has the Interrupt Critical Section bit set, or
      coming from a guest kernel), which is a slow multi-cycle operation.
      This change saves those few cycles in the common case.
      Signed-off-by: NChris Metcalf <cmetcalf@ezchip.com>
      9ce815ed
    • C
      tile: fix tilepro casts for readl, writel, etc · acfb699e
      Chris Metcalf 提交于
      Missing parentheses could cause an argument of the form
      "integer + pointer" to get cast to "(long)integer + pointer"
      and remain a pointer type, causing compiler warnings.
      Signed-off-by: NChris Metcalf <cmetcalf@ezchip.com>
      acfb699e
    • C
      tile: fix a -Wframe-larger-than warning · f3a26163
      Chris Metcalf 提交于
      The warning occurs in setup.c, where it is known that it can't be
      a problem, but it's still a good idea to silence the warning.
      The onstack array is converted from an s32 to a u8, which still
      is plenty of range for the values being managed there.
      Signed-off-by: NChris Metcalf <cmetcalf@ezchip.com>
      f3a26163
    • C
      tile: include the syscall number in the backtrace · 5ac65abd
      Chris Metcalf 提交于
      This information is easily available in the backtrace data and can
      be helpful when trying to figure out the backtrace, particularly
      if we're early in kernel entry or late in kernel exit.
      Signed-off-by: NChris Metcalf <cmetcalf@ezchip.com>
      5ac65abd
    • C
      arch/tile: adopt prepare_exit_to_usermode() model from x86 · 583b24a2
      Chris Metcalf 提交于
      This change is a prerequisite change for TASK_ISOLATION but also
      stands on its own for readability and maintainability.  The existing
      tile do_work_pending() was called in a loop from assembly on
      the slow path; this change moves the loop into C code as well.
      For the x86 version see commit c5c46f59 ("x86/entry: Add new,
      comprehensible entry and exit handlers written in C").
      
      This change exposes a pre-existing bug on the older tilepro platform;
      the singlestep processing is done last, but on tilepro (unlike tilegx)
      we enable interrupts while doing that processing, so we could in
      theory miss a signal or other asynchronous event.  A future change
      could fix this by breaking the singlestep work into a "prepare"
      step done in the main loop, and a "trigger" step done after exiting
      the loop.  Since this change is intended as purely a restructuring
      change, we call out the bug explicitly now, but don't yet fix it.
      Signed-off-by: NChris Metcalf <cmetcalf@ezchip.com>
      583b24a2
  8. 18 1月, 2016 2 次提交
  9. 17 1月, 2016 2 次提交
    • W
      Kconfig: remove HAVE_LATENCYTOP_SUPPORT · da48d094
      Will Deacon 提交于
      As illustrated by commit a3afe70b ("[S390] latencytop s390
      support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to
      advertise an implementation of save_stack_trace_tsk.
      
      However, as of 9212ddb5 ("stacktrace: provide save_stack_trace_tsk()
      weak alias") a dummy implementation is provided if STACKTRACE=y.  Given
      that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects
      STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Helge Deller <deller@gmx.de>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      da48d094
    • H
      parisc: Protect huge page pte changes with spinlocks · b0e55131
      Helge Deller 提交于
      PA-RISC doesn't have atomic instructions to modify page table entries, so it
      takes spinlock in the TLB handler and modifies the page table entry
      non-atomically. If you modify the page table entry without the spinlock, you
      may race with TLB handler on another CPU and your modification may be lost.
      Protect against that with usage of purge_tlb_start() and purge_tlb_end() which
      handles the TLB spinlock.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org # v4.4
      b0e55131
  10. 16 1月, 2016 19 次提交
    • D
      s390/mm: enable fixup_user_fault retrying · fef8953a
      Dominik Dingel 提交于
      By passing a non-null flag we allow fixup_user_fault to retry, which
      enables userfaultfd.  As during these retries we might drop the mmap_sem
      we need to check if that happened and redo the complete chain of
      actions.
      Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fef8953a
    • D
      mm: bring in additional flag for fixup_user_fault to signal unlock · 4a9e1cda
      Dominik Dingel 提交于
      During Jason's work with postcopy migration support for s390 a problem
      regarding gmap faults was discovered.
      
      The gmap code will call fixup_user_fault which will end up always in
      handle_mm_fault.  Till now we never cared about retries, but as the
      userfaultfd code kind of relies on it.  this needs some fix.
      
      This patchset does not take care of the futex code.  I will now look
      closer at this.
      
      This patch (of 2):
      
      With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault
      to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the
      faulting we ever unlocked mmap_sem.
      
      This patch brings in the logic to handle retries as well as it cleans up
      the current documentation.  fixup_user_fault was not having the same
      semantics as filemap_fault.  It never indicated if a retry happened and
      so a caller wasn't able to handle that case.  So we now changed the
      behaviour to always retry a locked mmap_sem.
      Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4a9e1cda
    • D
      mm, x86: get_user_pages() for dax mappings · 3565fce3
      Dan Williams 提交于
      A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver
      has established a devm_memremap_pages() mapping, i.e.  when the pfn_t
      return from ->direct_access() has PFN_DEV and PFN_MAP set.  Later, when
      encountering _PAGE_DEVMAP during a page table walk we lookup and pin a
      struct dev_pagemap instance to keep the result of pfn_to_page() valid
      until put_page().
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Tested-by: NLogan Gunthorpe <logang@deltatee.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3565fce3
    • D
      mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd · 5c7fb56e
      Dan Williams 提交于
      A dax-huge-page mapping while it uses some thp helpers is ultimately not
      a transparent huge page.  The distinction is especially important in the
      get_user_pages() path.  pmd_devmap() is used to distinguish dax-pmds
      from pmd_huge() and pmd_trans_huge() which have slightly different
      semantics.
      
      Explicitly mark the pmd_trans_huge() helpers that dax needs by adding
      pmd_devmap() checks.
      
      [kirill.shutemov@linux.intel.com: fix regression in handling mlocked pages in  __split_huge_pmd()]
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5c7fb56e
    • D
      mm, dax: convert vmf_insert_pfn_pmd() to pfn_t · f25748e3
      Dan Williams 提交于
      Similar to the conversion of vm_insert_mixed() use pfn_t in the
      vmf_insert_pfn_pmd() to tag the resulting pte with _PAGE_DEVICE when the
      pfn is backed by a devm_memremap_pages() mapping.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f25748e3
    • D
      mm, dax, gpu: convert vm_insert_mixed to pfn_t · 01c8f1c4
      Dan Williams 提交于
      Convert the raw unsigned long 'pfn' argument to pfn_t for the purpose of
      evaluating the PFN_MAP and PFN_DEV flags.  When both are set it triggers
      _PAGE_DEVMAP to be set in the resulting pte.
      
      There are no functional changes to the gpu drivers as a result of this
      conversion.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: David Airlie <airlied@linux.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01c8f1c4
    • D
      x86, mm: introduce _PAGE_DEVMAP · 69660fd7
      Dan Williams 提交于
      _PAGE_DEVMAP is a hardware-unused pte bit that will later be used in the
      get_user_pages() path to identify pfns backed by the dynamic allocation
      established by devm_memremap_pages.  Upon seeing that bit the gup path
      will lookup and pin the allocation while the pages are in use.
      
      Since the _PAGE_DEVMAP bit is > 32 it must be cast to u64 instead of a
      pteval_t to allow pmd_flags() usage in the realmode boot code to build.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69660fd7
    • D
      frv: fix compiler warning from definition of __pmd() · 6d8113c7
      Dan Williams 提交于
      Take into account that the pmd_t type is a array inside a struct, so it
      needs two levels of brackets to initialize.  Otherwise, a usage of __pmd
      generates a warning:
      
        include/linux/mm.h:986:2: warning: missing braces around initializer [-Wmissing-braces]
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d8113c7
    • D
      avr32: convert to asm-generic/memory_model.h · 083fc214
      Dan Williams 提交于
      Switch avr32/include/asm/page.h to use the common defintions for
      pfn_to_page(), page_to_pfn(), and ARCH_PFN_OFFSET.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      083fc214
    • D
      libnvdimm, pfn, pmem: allocate memmap array in persistent memory · d2c0f041
      Dan Williams 提交于
      Use the new vmem_altmap capability to enable the pmem driver to arrange
      for a struct page memmap to be established in persistent memory.
      
      [linux@roeck-us.net: mn10300: declare __pfn_to_phys() to fix build error]
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d2c0f041
    • D
      x86, mm: introduce vmem_altmap to augment vmemmap_populate() · 4b94ffdc
      Dan Williams 提交于
      In support of providing struct page for large persistent memory
      capacities, use struct vmem_altmap to change the default policy for
      allocating memory for the memmap array.  The default vmemmap_populate()
      allocates page table storage area from the page allocator.  Given
      persistent memory capacities relative to DRAM it may not be feasible to
      store the memmap in 'System Memory'.  Instead vmem_altmap represents
      pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
      requests.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4b94ffdc
    • D
      mm, dax, pmem: introduce pfn_t · 34c0fd54
      Dan Williams 提交于
      For the purpose of communicating the optional presence of a 'struct
      page' for the pfn returned from ->direct_access(), introduce a type that
      encapsulates a page-frame-number plus flags.  These flags contain the
      historical "page_link" encoding for a scatterlist entry, but can also
      denote "device memory".  Where "device memory" is a set of pfns that are
      not part of the kernel's linear mapping by default, but are accessed via
      the same memory controller as ram.
      
      The motivation for this new type is large capacity persistent memory
      that needs struct page entries in the 'memmap' to support 3rd party DMA
      (i.e.  O_DIRECT I/O with a persistent memory source/target).  However,
      we also need it in support of maintaining a list of mapped inodes which
      need to be unmapped at driver teardown or freeze_bdev() time.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      34c0fd54
    • D
      kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Dan Williams 提交于
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba049e93
    • D
      um: kill pfn_t · 16da3068
      Dan Williams 提交于
      The core has developed a need for a "pfn_t" type [1].  Convert the usage
      of pfn_t by usermode-linux to an unsigned long, and update pfn_to_phys()
      to drop its expectation of a typed pfn.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.htmlSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      16da3068
    • D
      pmem, dax: clean up clear_pmem() · 52db400f
      Dan Williams 提交于
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent may exhaust available DRAM introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 25):
      
      Both __dax_pmd_fault, and clear_pmem() were taking special steps to
      clear memory a page at a time to take advantage of non-temporal
      clear_page() implementations.  However, x86_64 does not use non-temporal
      instructions for clear_page(), and arch_clear_pmem() was always
      incurring the cost of __arch_wb_cache_pmem().
      
      Clean up the assumption that doing clear_pmem() a page at a time is more
      performant.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52db400f
    • M
      arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP · 05ee26d9
      Minchan Kim 提交于
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_mkclean for THP page MADV_FREE support.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      05ee26d9
    • M
      arch/arm/include/asm/pgtable-3level.h: add pmd_mkclean for THP · 44842045
      Minchan Kim 提交于
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_mkclean for THP page MADV_FREE support.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      44842045
    • M
      arch/powerpc/include/asm/pgtable-ppc64.h: add pmd_[dirty|mkclean] for THP · d5d6a443
      Minchan Kim 提交于
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
      support.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5d6a443
    • M
      arch/sparc/include/asm/pgtable_64.h: add pmd_[dirty|mkclean] for THP · 79cedb8f
      Minchan Kim 提交于
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
      support.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      79cedb8f