1. 16 Jan, 2016 40 commits
    • A
      mm/page_alloc.c: remove unused struct zone *z variable · f16f091b
      Alexander Kuleshov committed
      Remove unused struct zone *z variable which appeared in 86051ca5
      ("mm: fix usemap initialization").
      Signed-off-by: Alexander Kuleshov <kuleshovmail@gmail.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f16f091b
    • W
      mm/mlock.c: change can_do_mlock return value type to boolean · 7f43add4
      Wang Xiaoqiang committed
      Since can_do_mlock only returns 1 or 0, make it boolean.
      
      No functional change.
      
      [akpm@linux-foundation.org: update declaration in mm.h]
      Signed-off-by: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7f43add4
    • W
      mm/vmalloc.c: use macro IS_ALIGNED to judge the alignment · 61e16557
      Wang Xiaoqiang committed
      Just cleanup, no functional change.
      Signed-off-by: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      61e16557
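The alignment cleanup described above can be pictured with a small userspace sketch. `IS_ALIGNED_SKETCH` and `range_is_aligned` are hypothetical names for illustration, not the kernel's definitions, and the sketch assumes the alignment is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative stand-in for an IS_ALIGNED-style macro (hypothetical name).
 * Assumes 'a' is a power of two, so (a - 1) is a mask of the low bits.
 */
#define IS_ALIGNED_SKETCH(x, a) ((((uintptr_t)(x)) & ((uintptr_t)(a) - 1)) == 0)

/* Check both ends of a range with the macro instead of open-coding the mask. */
static bool range_is_aligned(uintptr_t start, uintptr_t end, uintptr_t align)
{
    return IS_ALIGNED_SKETCH(start, align) && IS_ALIGNED_SKETCH(end, align);
}
```

Naming the check makes the intent visible at the call site, which is presumably why such cleanups carry no functional change.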
    • T
      cgroup, memcg, writeback: drop spurious rcu locking around mem_cgroup_css_from_page() · 654a0dd0
      Tejun Heo committed
      In earlier versions, mem_cgroup_css_from_page() could return non-root
      css on a legacy hierarchy which can go away and required rcu locking;
      however, the eventual version simply returns the root cgroup if memcg is
      on a legacy hierarchy and thus doesn't need rcu locking around or in it.
      Remove spurious rcu lockings.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      654a0dd0
    • W
      mm/page_isolation: do some cleanup in "undo_isolate_page_range" · 6f8d2b8a
      Wang Xiaoqiang committed
      Use "IS_ALIGNED" to check the alignment rather than open-coding the test.
      Signed-off-by: Wang Xiaoqiang <wang_xiaoq@126.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6f8d2b8a
    • K
      memblock: fix section mismatch · 036fbb21
      Kirill A. Shutemov committed
      allmodconfig produces the following warning for me:
      
        WARNING: vmlinux.o(.text.unlikely+0x10314): Section mismatch in reference from the function movable_node_is_enabled() to the variable .meminit.data:movable_node_enabled
        The function movable_node_is_enabled() references
        the variable __meminitdata movable_node_enabled.
        This is often because movable_node_is_enabled lacks a __meminitdata
        annotation or the annotation of movable_node_enabled is wrong.
      
      Let's mark the function with __meminit.  It fixes the warning.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      036fbb21
    • D
      s390/mm: enable fixup_user_fault retrying · fef8953a
      Dominik Dingel committed
      By passing a non-null flag we allow fixup_user_fault to retry, which
      enables userfaultfd.  As during these retries we might drop the mmap_sem
      we need to check if that happened and redo the complete chain of
      actions.
      Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fef8953a
    • D
      mm: bring in additional flag for fixup_user_fault to signal unlock · 4a9e1cda
      Dominik Dingel committed
      During Jason's work with postcopy migration support for s390 a problem
      regarding gmap faults was discovered.
      
      The gmap code will call fixup_user_fault, which will always end up in
      handle_mm_fault.  Until now we never cared about retries, but as the
      userfaultfd code relies on them, this needs a fix.
      
      This patchset does not take care of the futex code.  I will now look
      closer at this.
      
      This patch (of 2):
      
      With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault
      to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the
      faulting we ever unlocked mmap_sem.
      
      This patch brings in the logic to handle retries and cleans up the
      current documentation.  fixup_user_fault did not have the same
      semantics as filemap_fault.  It never indicated if a retry happened and
      so a caller wasn't able to handle that case.  So we now changed the
      behaviour to always retry a locked mmap_sem.
      Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4a9e1cda
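The retry-and-report behaviour described in this commit message can be sketched as a small userspace simulation. Every name here (`sk_mm`, `sk_handle_fault`, `sk_fixup_fault`, the `SK_` flags) is hypothetical and only mimics the idea; it is not the kernel's implementation. The assumption is that the fault handler may drop the lock once when retry is allowed, and the caller then retakes the lock, records that it was dropped, and retries with retry disabled.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_FLAG_ALLOW_RETRY 0x1u  /* hypothetical: handler may drop the lock */
#define SK_FLAG_TRIED       0x2u  /* hypothetical: this is the second attempt */
#define SK_FAULT_RETRY      0x1   /* hypothetical: handler asks for a retry */

struct sk_mm {
    bool locked;       /* stands in for holding mmap_sem */
    int faults_seen;   /* how often the handler ran */
};

/* Simulated handler: on the first call, if allowed, drop the lock and ask to retry. */
static int sk_handle_fault(struct sk_mm *mm, unsigned int flags)
{
    mm->faults_seen++;
    if ((flags & SK_FLAG_ALLOW_RETRY) && mm->faults_seen == 1) {
        mm->locked = false;
        return SK_FAULT_RETRY;
    }
    return 0;
}

/*
 * Passing a non-NULL 'unlocked' enables retrying; the flag is set to true
 * if the lock was dropped at any point so the caller can redo its checks.
 */
static int sk_fixup_fault(struct sk_mm *mm, bool *unlocked)
{
    unsigned int flags = unlocked ? SK_FLAG_ALLOW_RETRY : 0;

    for (;;) {
        int ret = sk_handle_fault(mm, flags);

        if (!(ret & SK_FAULT_RETRY))
            return ret;

        mm->locked = true;              /* retake the lock before retrying */
        *unlocked = true;               /* tell the caller the lock was dropped */
        flags &= ~SK_FLAG_ALLOW_RETRY;  /* the second attempt must not drop it */
        flags |= SK_FLAG_TRIED;
    }
}
```

A caller that sees `unlocked` set would need to revalidate anything it looked up under the lock before continuing.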
    • D
      dax: re-enable dax pmd mappings · c046c321
      Dan Williams committed
      Now that the get_user_pages() path knows how to handle dax-pmd mappings,
      remove the protections that disabled dax-pmd support.
      
      Tests available from github.com/pmem/ndctl:
      
          make TESTS="lib/test-dax.sh lib/test-mmap.sh" check
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c046c321
    • D
      dax: provide diagnostics for pmd mapping failures · cbb38e41
      Dan Williams committed
      There is a wide gamut of conditions that can trigger the dax pmd path to
      fall back to pte mappings.  Ideally we'd have a syscall interface to
      determine mapping characteristics after the fact.  In the meantime
      provide debug messages.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Suggested-by: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cbb38e41
    • D
      mm, x86: get_user_pages() for dax mappings · 3565fce3
      Dan Williams committed
      A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver
      has established a devm_memremap_pages() mapping, i.e. when the pfn_t
      returned from ->direct_access() has PFN_DEV and PFN_MAP set.  Later, when
      encountering _PAGE_DEVMAP during a page table walk we look up and pin a
      struct dev_pagemap instance to keep the result of pfn_to_page() valid
      until put_page().
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Logan Gunthorpe <logang@deltatee.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3565fce3
    • D
      mm, dax: dax-pmd vs thp-pmd vs hugetlbfs-pmd · 5c7fb56e
      Dan Williams committed
      A dax-huge-page mapping, while it uses some thp helpers, is ultimately not
      a transparent huge page.  The distinction is especially important in the
      get_user_pages() path.  pmd_devmap() is used to distinguish dax-pmds
      from pmd_huge() and pmd_trans_huge() which have slightly different
      semantics.
      
      Explicitly mark the pmd_trans_huge() helpers that dax needs by adding
      pmd_devmap() checks.
      
      [kirill.shutemov@linux.intel.com: fix regression in handling mlocked pages in  __split_huge_pmd()]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c7fb56e
    • D
      mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup · 5c2c2587
      Dan Williams committed
      get_dev_pagemap() enables paths like get_user_pages() to pin a dynamically
      mapped pfn-range (devm_memremap_pages()) while the resulting struct page
      objects are in use.  Unlike get_page() it may fail if the device is, or
      is in the process of being, disabled.  While the initial lookup of the
      range may be an expensive list walk, the result is cached to speed up
      subsequent lookups which are likely to be in the same mapped range.
      
      devm_memremap_pages() now requires a reference counter to be specified
      at init time.  For pmem this means moving request_queue allocation into
      pmem_alloc() so the existing queue usage counter can track "device
      pages".
      
      ZONE_DEVICE pages always have an elevated count and will never be on an
      lru reclaim list.  That space in 'struct page' can be redirected for
      other uses, but for safety introduce a poison value that will always
      trip __list_add() to assert.  This allows half of the struct list_head
      storage to be reclaimed with some assurance to back up the assumption
      that the page count never goes to zero and a list_add() is never
      attempted.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Logan Gunthorpe <logang@deltatee.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c2c2587
    • D
      libnvdimm, pmem: move request_queue allocation earlier in probe · 468ded03
      Dan Williams committed
      Before the dynamically allocated struct pages from devm_memremap_pages()
      can be put to use outside the driver, we need a mechanism to track
      whether they are still in use at teardown.  Towards that goal reorder
      the initialization sequence to allow the 'q_usage_counter' from the
      request_queue to be used by the devm_memremap_pages() implementation (in
      subsequent patches).
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      468ded03
    • D
      mm, dax: convert vmf_insert_pfn_pmd() to pfn_t · f25748e3
      Dan Williams committed
      Similar to the conversion of vm_insert_mixed(), use pfn_t in
      vmf_insert_pfn_pmd() to tag the resulting pte with _PAGE_DEVMAP when the
      pfn is backed by a devm_memremap_pages() mapping.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f25748e3
    • D
      mm, dax, gpu: convert vm_insert_mixed to pfn_t · 01c8f1c4
      Dan Williams committed
      Convert the raw unsigned long 'pfn' argument to pfn_t for the purpose of
      evaluating the PFN_MAP and PFN_DEV flags.  When both are set it triggers
      _PAGE_DEVMAP to be set in the resulting pte.
      
      There are no functional changes to the gpu drivers as a result of this
      conversion.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: David Airlie <airlied@linux.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      01c8f1c4
    • D
      x86, mm: introduce _PAGE_DEVMAP · 69660fd7
      Dan Williams committed
      _PAGE_DEVMAP is a hardware-unused pte bit that will later be used in the
      get_user_pages() path to identify pfns backed by the dynamic allocation
      established by devm_memremap_pages().  Upon seeing that bit the gup path
      will look up and pin the allocation while the pages are in use.
      
      Since the _PAGE_DEVMAP bit is > 32 it must be cast to u64 instead of a
      pteval_t to allow pmd_flags() usage in the realmode boot code to build.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      69660fd7
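The point about a flag bit above position 32 needing a 64-bit cast can be shown with a short sketch. The names `SK_PAGE_BIT_DEVMAP`, `SK_PAGE_DEVMAP` and `sk_pte_is_devmap`, and the bit position 58, are assumptions made for illustration; the message only says the bit is greater than 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position, chosen only to be above 32 as the message requires. */
#define SK_PAGE_BIT_DEVMAP 58

/*
 * Cast to a 64-bit type before shifting. Shifting a 32-bit value by 58
 * would be undefined behaviour, which is the problem a narrower pte value
 * type would cause in code built for a 32-bit environment.
 */
#define SK_PAGE_DEVMAP ((uint64_t)1 << SK_PAGE_BIT_DEVMAP)

/* Test whether a raw 64-bit pte value carries the device-map flag. */
static bool sk_pte_is_devmap(uint64_t pte)
{
    return (pte & SK_PAGE_DEVMAP) != 0;
}
```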
    • D
      frv: fix compiler warning from definition of __pmd() · 6d8113c7
      Dan Williams committed
      Take into account that the pmd_t type is an array inside a struct, so it
      needs two levels of brackets to initialize.  Otherwise, a usage of __pmd
      generates a warning:
      
        include/linux/mm.h:986:2: warning: missing braces around initializer [-Wmissing-braces]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6d8113c7
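The two-level brace requirement mentioned above can be reproduced outside the kernel. `sk_pmd_t` and `sk_pmd()` are hypothetical stand-ins, and the array length of 4 is an arbitrary assumption; the only point is that a struct wrapping an array wants one brace pair for the struct and one for the array.

```c
/* Hypothetical type: an array wrapped in a struct, as the message describes. */
typedef struct {
    unsigned long ste[4];   /* length chosen arbitrarily for this sketch */
} sk_pmd_t;

/*
 * Outer braces initialize the struct, inner braces initialize the array
 * member. With a single pair, compilers using -Wmissing-braces warn.
 * Elements not listed are zero-initialized.
 */
#define sk_pmd(x) ((sk_pmd_t) { { (x) } })
```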
    • D
      hugetlb: fix compile error on tile · 888cdbc2
      Dan Williams committed
      Include asm/pgtable.h to get the definition for pud_t to fix:
      
        include/linux/hugetlb.h:203:29: error: unknown type name 'pud_t'
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Liviu Dudau <liviu.dudau@arm.com>
      Cc: Sudeep Holla <sudeep.holla@arm.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      888cdbc2
    • D
      avr32: convert to asm-generic/memory_model.h · 083fc214
      Dan Williams committed
      Switch avr32/include/asm/page.h to use the common definitions for
      pfn_to_page(), page_to_pfn(), and ARCH_PFN_OFFSET.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      083fc214
    • D
      libnvdimm, pfn, pmem: allocate memmap array in persistent memory · d2c0f041
      Dan Williams committed
      Use the new vmem_altmap capability to enable the pmem driver to arrange
      for a struct page memmap to be established in persistent memory.
      
      [linux@roeck-us.net: mn10300: declare __pfn_to_phys() to fix build error]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d2c0f041
    • D
      x86, mm: introduce vmem_altmap to augment vmemmap_populate() · 4b94ffdc
      Dan Williams committed
      In support of providing struct page for large persistent memory
      capacities, use struct vmem_altmap to change the default policy for
      allocating memory for the memmap array.  The default vmemmap_populate()
      allocates page table storage area from the page allocator.  Given
      persistent memory capacities relative to DRAM it may not be feasible to
      store the memmap in 'System Memory'.  Instead vmem_altmap represents
      pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
      requests.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b94ffdc
    • D
      mm: introduce find_dev_pagemap() · 9476df7d
      Dan Williams committed
      There are several scenarios where we need to retrieve and update
      metadata associated with a given devm_memremap_pages() mapping, and the
      only lookup key available is a pfn in the range:
      
      1/ We want to augment vmemmap_populate() (called via arch_add_memory())
         to allocate memmap storage from pre-allocated pages reserved by the
         device driver.  At vmemmap_alloc_block_buf() time it grabs device pages
         rather than page allocator pages.  This is in support of
         devm_memremap_pages() mappings where the memmap is too large to fit in
         main memory (i.e. large persistent memory devices).
      
      2/ Taking a reference against the mapping when inserting device pages
         into the address_space radix of a given inode.  This facilitates
         unmap_mapping_range() and truncate_inode_pages() operations when the
         driver is tearing down the mapping.
      
      3/ get_user_pages() operations on ZONE_DEVICE memory require taking a
         reference against the mapping so that the driver teardown path can
         revoke and drain usage of device pages.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Logan Gunthorpe <logang@deltatee.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9476df7d
    • D
      mm: skip memory block registration for ZONE_DEVICE · 260ae3f7
      Dan Williams committed
      Prevent userspace from trying and failing to online ZONE_DEVICE pages
      which are meant to never be onlined.
      
      For example on platforms with a udev rule like the following:
      
        SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"
      
      ...will generate futile attempts to online the ZONE_DEVICE sections.
      Example kernel messages:
      
          Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1004747
          Policy zone: Normal
          online_pages [mem 0x248000000-0x24fffffff] failed
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      260ae3f7
    • D
      mm, dax, pmem: introduce pfn_t · 34c0fd54
      Dan Williams committed
      For the purpose of communicating the optional presence of a 'struct
      page' for the pfn returned from ->direct_access(), introduce a type that
      encapsulates a page-frame-number plus flags.  These flags contain the
      historical "page_link" encoding for a scatterlist entry, but can also
      denote "device memory", where "device memory" is a set of pfns that are
      not part of the kernel's linear mapping by default, but are accessed via
      the same memory controller as ram.
      
      The motivation for this new type is large capacity persistent memory
      that needs struct page entries in the 'memmap' to support 3rd party DMA
      (i.e.  O_DIRECT I/O with a persistent memory source/target).  However,
      we also need it in support of maintaining a list of mapped inodes which
      need to be unmapped at driver teardown or freeze_bdev() time.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34c0fd54
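A type that "encapsulates a page-frame-number plus flags" could look roughly like the following userspace sketch. The names (`sk_pfn_t`, `SK_PFN_DEV`, `SK_PFN_MAP`, the helpers) and the choice of the top four bits for flags are assumptions made for illustration, not the layout the patch defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrapping the value in a struct keeps it from being mixed up with a raw pfn. */
typedef struct {
    uint64_t val;
} sk_pfn_t;

/* Hypothetical flag layout: the top four bits of the 64-bit value. */
#define SK_PFN_FLAGS_MASK (0xfULL << 60)
#define SK_PFN_DEV        (1ULL << 61)  /* pfn refers to device memory */
#define SK_PFN_MAP        (1ULL << 60)  /* pfn has a struct page mapping */

/* Combine a frame number and flags into one typed value. */
static sk_pfn_t sk_pfn_make(uint64_t pfn, uint64_t flags)
{
    sk_pfn_t p = { (pfn & ~SK_PFN_FLAGS_MASK) | (flags & SK_PFN_FLAGS_MASK) };
    return p;
}

/* Recover the plain frame number by masking off the flag bits. */
static uint64_t sk_pfn_number(sk_pfn_t p)
{
    return p.val & ~SK_PFN_FLAGS_MASK;
}

/* Both flags set: the case that would lead to a device-mapped pte. */
static bool sk_pfn_is_devmap(sk_pfn_t p)
{
    const uint64_t both = SK_PFN_DEV | SK_PFN_MAP;
    return (p.val & both) == both;
}
```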
    • D
      kvm: rename pfn_t to kvm_pfn_t · ba049e93
      Dan Williams committed
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent memory may exhaust available DRAM, introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 18):
      
      The core has developed a need for a "pfn_t" type [1].  Move the existing
      pfn_t in KVM to kvm_pfn_t [2].
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ba049e93
    • D
      um: kill pfn_t · 16da3068
      Dan Williams committed
      The core has developed a need for a "pfn_t" type [1].  Convert the usage
      of pfn_t by usermode-linux to an unsigned long, and update pfn_to_phys()
      to drop its expectation of a typed pfn.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      16da3068
    • T
      dax: Split pmd map when fallback on COW · 59bf4fb9
      Toshi Kani committed
      An infinite loop of PMD faults was observed when attempting to mlock() a
      private read-only PMD mmap'd range of a DAX file.
      
      __dax_pmd_fault() simply returns with VM_FAULT_FALLBACK when falling
      back to PTE on COW.  However, __handle_mm_fault() returns without
      falling back to handle_pte_fault() because a PMD map is present in this
      case.
      
      Change __dax_pmd_fault() to split the PMD map, if present, before
      returning with VM_FAULT_FALLBACK.
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      59bf4fb9
    • R
      mm, dax: fix livelock, allow dax pmd mappings to become writeable · 01871e59
      Ross Zwisler committed
      Prior to this change DAX PMD mappings that were made read-only were
      never able to be made writable again.  This is because the code in
      insert_pfn_pmd() that calls pmd_mkdirty() and pmd_mkwrite() would skip
      these calls if the PMD already existed in the page table.
      
      Instead, if we are doing a write always mark the PMD entry as dirty and
      writeable.  Without this code we can get into a condition where we mark
      the PMD as read-only, and then on a subsequent write fault we get into
      an infinite loop of PMD faults where we try unsuccessfully to make the
      PMD writeable.
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Reported-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      01871e59
    • D
      dax: fix lifetime of in-kernel dax mappings with dax_map_atomic() · b2e0d162
      Dan Williams committed
      The DAX implementation needs to protect new calls to ->direct_access()
      and usage of its return value against the driver for the underlying
      block device being disabled.  Use blk_queue_enter()/blk_queue_exit() to
      hold off blk_cleanup_queue() from proceeding, or otherwise fail new
      mapping requests if the request_queue is being torn down.
      
      This also introduces blk_dax_ctl to simplify the interface from fs/dax.c
      through dax_map_atomic() to bdev_direct_access().
      
      [willy@linux.intel.com: fix read() of a hole]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b2e0d162
    • D
      dax: guarantee page aligned results from bdev_direct_access() · fe683ada
      Dan Williams committed
      If a ->direct_access() implementation ever returns a map count less than
      PAGE_SIZE, catch the error in bdev_direct_access().  This simplifies
      error checking in upper layers.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fe683ada
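Centralizing the check in one place, so upper layers need not repeat it, might look like the sketch below. `sk_check_direct_access`, the 4096-byte page size, and the specific errno values are all assumptions for illustration; the message only says that a result smaller than PAGE_SIZE is caught as an error.

```c
#include <errno.h>

#define SK_PAGE_SIZE 4096L   /* assumed page size for this sketch */

/*
 * 'avail' stands in for the byte count reported by a ->direct_access()
 * implementation; 'wanted' is how much the caller asked for. Negative
 * values are treated as errors from the driver and passed through.
 */
static long sk_check_direct_access(long avail, long wanted)
{
    if (avail < 0)
        return avail;            /* driver already reported an error */
    if (avail < SK_PAGE_SIZE)
        return -ERANGE;          /* less than one page (errno is an assumption) */
    if (avail % SK_PAGE_SIZE)
        return -ENXIO;           /* not a whole number of pages (assumption) */
    return avail < wanted ? avail : wanted;
}
```

With the validation done once at this boundary, callers can assume any positive return is a page-aligned length.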
    • D
      dax: increase granularity of dax_clear_blocks() operations · 0e749e54
      Dan Williams committed
      dax_clear_blocks is currently performing a cond_resched() after every
      PAGE_SIZE memset.  We need not check so frequently; for example, md-raid
      only calls cond_resched() at stripe granularity.  Also, in preparation
      for introducing a dax_map_atomic() operation that temporarily pins a dax
      mapping, move the call to cond_resched() to the outer loop.
      
      The worst case latency between calls to cond_resched() after this change
      is 500us; the average latency is 133us.  This is up from a 10us max and
      4us average.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Jan Kara <jack@suse.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0e749e54
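Moving the reschedule point from the per-page inner loop to the outer loop can be illustrated with a userspace sketch. `sk_clear_blocks`, `sk_cond_resched` and the counter are hypothetical; the counter only stands in for a real yield so the effect on granularity can be observed. The sketch assumes `chunk` is non-zero.

```c
#include <stddef.h>
#include <string.h>

#define SK_PAGE_SIZE 4096u   /* assumed page size for this sketch */

static unsigned int sk_yield_count;

/* Stand-in for a reschedule point; here it only counts how often it is hit. */
static void sk_cond_resched(void)
{
    sk_yield_count++;
}

/*
 * Clear 'len' bytes in chunks of at most 'chunk' bytes. Pages are cleared
 * in the inner loop; the yield happens once per chunk in the outer loop
 * rather than once per page.
 */
static void sk_clear_blocks(unsigned char *buf, size_t len, size_t chunk)
{
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;
        size_t off;

        for (off = 0; off < n; off += SK_PAGE_SIZE) {
            size_t step = (n - off) < SK_PAGE_SIZE ? (n - off) : SK_PAGE_SIZE;
            memset(buf + off, 0, step);
        }

        buf += n;
        len -= n;
        sk_cond_resched();   /* coarser granularity: once per chunk */
    }
}
```

The trade-off is the one the message quantifies: fewer checks per byte cleared, at the cost of a longer worst-case interval between reschedule points.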
    • D
      pmem, dax: clean up clear_pmem() · 52db400f
      Dan Williams committed
      To date, we have implemented two I/O usage models for persistent memory,
      PMEM (a persistent "ram disk") and DAX (mmap persistent memory into
      userspace).  This series adds a third, DAX-GUP, that allows DAX mappings
      to be the target of direct-i/o.  It allows userspace to coordinate
      DMA/RDMA from/to persistent memory.
      
      The implementation leverages the ZONE_DEVICE mm-zone that went into
      4.3-rc1 (also discussed at kernel summit) to flag pages that are owned
      and dynamically mapped by a device driver.  The pmem driver, after
      mapping a persistent memory range into the system memmap via
      devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus
      page-backed pmem-pfns via flags in the new pfn_t type.
      
      The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the
      resulting pte(s) inserted into the process page tables with a new
      _PAGE_DEVMAP flag.  Later, when get_user_pages() is walking ptes it keys
      off _PAGE_DEVMAP to pin the device hosting the page range active.
      Finally, get_page() and put_page() are modified to take references
      against the device driver established page mapping.
      
      Finally, this need for "struct page" for persistent memory requires
      memory capacity to store the memmap array.  Given the memmap array for a
      large pool of persistent memory may exhaust available DRAM, introduce a
      mechanism to allocate the memmap from persistent memory.  The new
      "struct vmem_altmap *" parameter to devm_memremap_pages() enables
      arch_add_memory() to use reserved pmem capacity rather than the page
      allocator.
      
      This patch (of 25):
      
      Both __dax_pmd_fault() and clear_pmem() were taking special steps to
      clear memory a page at a time to take advantage of non-temporal
      clear_page() implementations.  However, x86_64 does not use non-temporal
      instructions for clear_page(), and arch_clear_pmem() was always
      incurring the cost of __arch_wb_cache_pmem().
      
      Clean up the assumption that doing clear_pmem() a page at a time is more
      performant.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      52db400f
    • K
      thp: fix split_huge_page() after mremap() of THP · bd56086f
      Kirill A. Shutemov committed
      Sasha Levin has reported a KASAN out-of-bounds bug [1].  It points to "if
      (!is_swap_pte(pte[i]))" in unfreeze_page_vma() as the problematic access.
      
      The cause is that split_huge_page() doesn't handle a THP correctly if it
      is not aligned to a PMD boundary.  This can happen after mremap().
      
      Test case (does not always trigger the bug):
      
      	#define _GNU_SOURCE
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <sys/mman.h>
      
      	#define MB (1024UL*1024)
      	#define SIZE (2*MB)
      	#define BASE ((void *)0x400000000000)
      
      	int main()
      	{
      		char *p;
      
      		p = mmap(BASE, SIZE, PROT_READ | PROT_WRITE,
      				MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
      				-1, 0);
      		if (p == MAP_FAILED)
      			perror("mmap"), exit(1);
      		p = mremap(BASE, SIZE, SIZE, MREMAP_FIXED | MREMAP_MAYMOVE,
      				BASE + SIZE + 8192);
      		if (p == MAP_FAILED)
      			perror("mremap"), exit(1);
      		system("echo 1 > /sys/kernel/debug/split_huge_pages");
      		return 0;
      	}
      
      The patch fixes freeze and unfreeze paths to handle page table boundary
      crossing.
      
      It also makes the mapcount vs. count check in split_huge_page_to_list()
      stricter:
       - after freeze we don't expect any subpage to be mapped, as we remove
         them from rmap when setting up migration entries;
       - count must be 1, meaning only the caller has a reference to the page.
      
      [1] https://gist.github.com/sashalevin/c67fbea55e7c0576972a
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bd56086f
    • M
      mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called · b8d3c4c3
      Minchan Kim committed
      We don't need to split a THP page when the MADV_FREE syscall is called
      if [start, len] is aligned to the THP size.  The split can be deferred
      until the VM decides to free the page in the reclaim path under heavy
      memory pressure.  That way we avoid unnecessary THP splits.
      
      To support this, the patch changes the pte dirtiness marking logic of
      THP.  Currently, splitting marks every pte of the page dirty
      unconditionally, which makes MADV_FREE void.  Instead, this patch
      propagates pmd dirtiness to all pages via PG_dirty and restores pte
      dirtiness from PG_dirty.  With this, if the pmd is clean (i.e.
      MADV_FREEed) when the split happens (e.g. in shrink_page_list), all of
      the pages are clean too, so we can discard them.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b8d3c4c3
    • M
      arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP · 05ee26d9
      Minchan Kim committed
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_mkclean for THP page MADV_FREE support.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      05ee26d9
    • M
      arch/arm/include/asm/pgtable-3level.h: add pmd_mkclean for THP · 44842045
      Minchan Kim committed
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_mkclean for THP page MADV_FREE support.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      44842045
    • M
      arch/powerpc/include/asm/pgtable-ppc64.h: add pmd_[dirty|mkclean] for THP · d5d6a443
      Minchan Kim committed
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
      support.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d5d6a443
    • M
      arch/sparc/include/asm/pgtable_64.h: add pmd_[dirty|mkclean] for THP · 79cedb8f
      Minchan Kim committed
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
      support.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      79cedb8f
    • M
      arch/x86/include/asm/pgtable.h: add pmd_[dirty|mkclean] for THP · 590a471c
      Minchan Kim committed
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
      support.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttilä <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      590a471c