1. 30 1月, 2016 8 次提交
    • T
      resource: Kill walk_iomem_res() · a8fc4253
      Toshi Kani 提交于
      walk_iomem_res_desc() replaced walk_iomem_res() and there is no
      caller to walk_iomem_res() any more. Kill it. Also remove @name
      from find_next_iomem_res() as it is no longer used.
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NDave Young <dyoung@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hanjun Guo <hanjun.guo@linaro.org>
      Cc: Jakub Sitnicki <jsitnicki@gmail.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm <linux-mm@kvack.org>
      Link: http://lkml.kernel.org/r/1453841853-11383-17-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a8fc4253
    • T
      x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search · f0f4711a
      Toshi Kani 提交于
      Change the callers of walk_iomem_res() scanning for the
      following resources by name to use walk_iomem_res_desc()
      instead.
      
       "ACPI Tables"
       "ACPI Non-volatile Storage"
       "Persistent Memory (legacy)"
       "Crash kernel"
      
      Note, the caller of walk_iomem_res() with "GART" will be removed
      in a later patch.
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chun-Yi <joeyli.kernel@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Lee, Chun-Yi <joeyli.kernel@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Minfei Huang <mnfhuang@gmail.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Takao Indoh <indou.takao@jp.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: kexec@lists.infradead.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/1453841853-11383-15-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f0f4711a
    • T
      resource: Add walk_iomem_res_desc() · 3f33647c
      Toshi Kani 提交于
      Add a new interface, walk_iomem_res_desc(), which walks through
      the iomem table by identifying a target with @flags and @desc.
      This interface provides the same functionality as
      walk_iomem_res(), but does not use strcmp() to @name for better
      efficiency.
      
      walk_iomem_res() is deprecated and will be removed in a later
      patch.
      Requested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      [ Fixup comments. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hanjun Guo <hanjun.guo@linaro.org>
      Cc: Jakub Sitnicki <jsitnicki@gmail.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm <linux-mm@kvack.org>
      Link: http://lkml.kernel.org/r/1453841853-11383-14-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3f33647c
    • T
      memremap: Change region_intersects() to take @flags and @desc · 1c29f25b
      Toshi Kani 提交于
      Change region_intersects() to identify a target with @flags and
      @desc, instead of @name with strcmp().
      
      Change the callers of region_intersects(), memremap() and
      devm_memremap(), to set IORESOURCE_SYSTEM_RAM in @flags and
      IORES_DESC_NONE in @desc when searching System RAM.
      
      Also, export region_intersects() so that the ACPI EINJ error
      injection driver can call this function in a later patch.
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jakub Sitnicki <jsitnicki@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm <linux-mm@kvack.org>
      Link: http://lkml.kernel.org/r/1453841853-11383-13-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1c29f25b
    • T
      resource: Change walk_system_ram() to use System RAM type · bd7e6cb3
      Toshi Kani 提交于
      Now that all System RAM resource entries have been initialized
      to IORESOURCE_SYSTEM_RAM type, change walk_system_ram_res() and
      walk_system_ram_range() to call find_next_iomem_res() by setting
      @res.flags to IORESOURCE_SYSTEM_RAM and @name to NULL. With this
      change, they walk through the iomem table to find System RAM
      ranges without the need to do strcmp() on the resource names.
      
      No functional change is made to the interfaces.
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      [ Boris: fixup comments. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jakub Sitnicki <jsitnicki@gmail.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm <linux-mm@kvack.org>
      Link: http://lkml.kernel.org/r/1453841853-11383-11-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bd7e6cb3
    • T
      kexec: Set IORESOURCE_SYSTEM_RAM for System RAM · 1a085d07
      Toshi Kani 提交于
      Set proper ioresource flags and types for crash kernel
      reservation areas.
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NDave Young <dyoung@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Minfei Huang <mnfhuang@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm <linux-mm@kvack.org>
      Link: http://lkml.kernel.org/r/1453841853-11383-8-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1a085d07
    • T
      resource: Add I/O resource descriptor · 43ee493b
      Toshi Kani 提交于
      walk_iomem_res() and region_intersects() still need to use
      strcmp() for searching a resource entry by @name in the iomem
      table.
      
      This patch introduces I/O resource descriptor 'desc' in struct
      resource for the iomem search interfaces. Drivers can assign
      their unique descriptor to a range when they support the search
      interfaces.
      
      Otherwise, 'desc' is set to IORES_DESC_NONE (0). This avoids
      changing most of the drivers as they typically allocate resource
      entries statically, or by calling alloc_resource(), kzalloc(),
      or alloc_bootmem_low(), which set the field to zero by default.
      A later patch will address some drivers that use kmalloc()
      without zero'ing the field.
      
      Also change release_mem_region_adjustable() to set 'desc' when
      its resource entry gets separated. Other resource interfaces are
      also changed to initialize 'desc' explicitly although
      alloc_resource() sets it to 0.
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jakub Sitnicki <jsitnicki@gmail.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm <linux-mm@kvack.org>
      Link: http://lkml.kernel.org/r/1453841853-11383-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      43ee493b
    • T
      resource: Handle resource flags properly · a3650d53
      Toshi Kani 提交于
      I/O resource flags consist of I/O resource types and modifier
      bits. Therefore, checking an I/O resource type in 'flags' must
      be performed with a bitwise operation.
      
      Fix find_next_iomem_res() and region_intersects() that simply
      compare 'flags' against a given value.
      
      Also change __request_region() to set 'res->flags' from
      resource_type() and resource_ext_type() of the parent, so that
      children nodes will inherit the extended I/O resource type.
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jakub Sitnicki <jsitnicki@gmail.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm <linux-mm@kvack.org>
      Link: http://lkml.kernel.org/r/1453841853-11383-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a3650d53
  2. 28 1月, 2016 1 次提交
    • A
      PM: APM_EMULATION does not depend on PM · 993e9fe1
      Arnd Bergmann 提交于
      The APM emulation code does multiple things, and some of them depend on
      PM_SLEEP, while the battery management does not. However, selecting
      the symbol like SHARPSL_PM does causes a Kconfig warning:
      
      warning: (SHARPSL_PM && PMAC_APM_EMU) selects APM_EMULATION which has unmet direct dependencies (PM && SYS_SUPPORTS_APM_EMULATION)
      
      From all I can tell, this is completely harmless, and we can simply allow
      APM_EMULATION to be enabled here, even if PM is not.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      993e9fe1
  3. 27 1月, 2016 1 次提交
  4. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  5. 22 1月, 2016 1 次提交
  6. 21 1月, 2016 14 次提交
  7. 20 1月, 2016 1 次提交
    • W
      pipe: limit the per-user amount of pages allocated in pipes · 759c0114
      Willy Tarreau 提交于
      On no-so-small systems, it is possible for a single process to cause an
      OOM condition by filling large pipes with data that are never read. A
      typical process filling 4000 pipes with 1 MB of data will use 4 GB of
      memory. On small systems it may be tricky to set the pipe max size to
      prevent this from happening.
      
      This patch makes it possible to enforce a per-user soft limit above
      which new pipes will be limited to a single page, effectively limiting
      them to 4 kB each, as well as a hard limit above which no new pipes may
      be created for this user. This has the effect of protecting the system
      against memory abuse without hurting other users, and still allowing
      pipes to work correctly though with less data at once.
      
      The limit are controlled by two new sysctls : pipe-user-pages-soft, and
      pipe-user-pages-hard. Both may be disabled by setting them to zero. The
      default soft limit allows the default number of FDs per process (1024)
      to create pipes of the default size (64kB), thus reaching a limit of 64MB
      before starting to create only smaller pipes. With 256 processes limited
      to 1024 FDs each, this results in 1024*64kB + (256*1024 - 1024) * 4kB =
      1084 MB of memory allocated for a user. The hard limit is disabled by
      default to avoid breaking existing applications that make intensive use
      of pipes (eg: for splicing).
      
      Reported-by: socketpair@gmail.com
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Mitigates: CVE-2013-4312 (Linux 2.0+)
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      759c0114
  8. 19 1月, 2016 1 次提交
  9. 17 1月, 2016 4 次提交
  10. 16 1月, 2016 8 次提交
    • D
      mm: bring in additional flag for fixup_user_fault to signal unlock · 4a9e1cda
      Dominik Dingel 提交于
      During Jason's work with postcopy migration support for s390 a problem
      regarding gmap faults was discovered.
      
      The gmap code will call fixup_user_fault which will end up always in
      handle_mm_fault.  Till now we never cared about retries, but as the
      userfaultfd code kind of relies on it.  this needs some fix.
      
      This patchset does not take care of the futex code.  I will now look
      closer at this.
      
      This patch (of 2):
      
      With the introduction of userfaultfd, kvm on s390 needs fixup_user_fault
      to pass in FAULT_FLAG_ALLOW_RETRY and give feedback if during the
      faulting we ever unlocked mmap_sem.
      
      This patch brings in the logic to handle retries as well as it cleans up
      the current documentation.  fixup_user_fault was not having the same
      semantics as filemap_fault.  It never indicated if a retry happened and
      so a caller wasn't able to handle that case.  So we now changed the
      behaviour to always retry a locked mmap_sem.
      Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4a9e1cda
    • D
      mm, x86: get_user_pages() for dax mappings · 3565fce3
      Dan Williams 提交于
      A dax mapping establishes a pte with _PAGE_DEVMAP set when the driver
      has established a devm_memremap_pages() mapping, i.e.  when the pfn_t
      return from ->direct_access() has PFN_DEV and PFN_MAP set.  Later, when
      encountering _PAGE_DEVMAP during a page table walk we lookup and pin a
      struct dev_pagemap instance to keep the result of pfn_to_page() valid
      until put_page().
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Tested-by: NLogan Gunthorpe <logang@deltatee.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3565fce3
    • D
      mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup · 5c2c2587
      Dan Williams 提交于
      get_dev_page() enables paths like get_user_pages() to pin a dynamically
      mapped pfn-range (devm_memremap_pages()) while the resulting struct page
      objects are in use.  Unlike get_page() it may fail if the device is, or
      is in the process of being, disabled.  While the initial lookup of the
      range may be an expensive list walk, the result is cached to speed up
      subsequent lookups which are likely to be in the same mapped range.
      
      devm_memremap_pages() now requires a reference counter to be specified
      at init time.  For pmem this means moving request_queue allocation into
      pmem_alloc() so the existing queue usage counter can track "device
      pages".
      
      ZONE_DEVICE pages always have an elevated count and will never be on an
      lru reclaim list.  That space in 'struct page' can be redirected for
      other uses, but for safety introduce a poison value that will always
      trip __list_add() to assert.  This allows half of the struct list_head
      storage to be reclaimed with some assurance to back up the assumption
      that the page count never goes to zero and a list_add() is never
      attempted.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Tested-by: NLogan Gunthorpe <logang@deltatee.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5c2c2587
    • D
      x86, mm: introduce vmem_altmap to augment vmemmap_populate() · 4b94ffdc
      Dan Williams 提交于
      In support of providing struct page for large persistent memory
      capacities, use struct vmem_altmap to change the default policy for
      allocating memory for the memmap array.  The default vmemmap_populate()
      allocates page table storage area from the page allocator.  Given
      persistent memory capacities relative to DRAM it may not be feasible to
      store the memmap in 'System Memory'.  Instead vmem_altmap represents
      pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf()
      requests.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4b94ffdc
    • D
      mm: introduce find_dev_pagemap() · 9476df7d
      Dan Williams 提交于
      There are several scenarios where we need to retrieve and update
      metadata associated with a given devm_memremap_pages() mapping, and the
      only lookup key available is a pfn in the range:
      
      1/ We want to augment vmemmap_populate() (called via arch_add_memory())
         to allocate memmap storage from pre-allocated pages reserved by the
         device driver.  At vmemmap_alloc_block_buf() time it grabs device pages
         rather than page allocator pages.  This is in support of
         devm_memremap_pages() mappings where the memmap is too large to fit in
         main memory (i.e. large persistent memory devices).
      
      2/ Taking a reference against the mapping when inserting device pages
         into the address_space radix of a given inode.  This facilitates
         unmap_mapping_range() and truncate_inode_pages() operations when the
         driver is tearing down the mapping.
      
      3/ get_user_pages() operations on ZONE_DEVICE memory require taking a
         reference against the mapping so that the driver teardown path can
         revoke and drain usage of device pages.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Tested-by: NLogan Gunthorpe <logang@deltatee.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9476df7d
    • D
      mm, dax, pmem: introduce pfn_t · 34c0fd54
      Dan Williams 提交于
      For the purpose of communicating the optional presence of a 'struct
      page' for the pfn returned from ->direct_access(), introduce a type that
      encapsulates a page-frame-number plus flags.  These flags contain the
      historical "page_link" encoding for a scatterlist entry, but can also
      denote "device memory".  Where "device memory" is a set of pfns that are
      not part of the kernel's linear mapping by default, but are accessed via
      the same memory controller as ram.
      
      The motivation for this new type is large capacity persistent memory
      that needs struct page entries in the 'memmap' to support 3rd party DMA
      (i.e.  O_DIRECT I/O with a persistent memory source/target).  However,
      we also need it in support of maintaining a list of mapped inodes which
      need to be unmapped at driver teardown or freeze_bdev() time.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      34c0fd54
    • K
      futex, thp: remove special case for THP in get_futex_key · 14d27abd
      Kirill A. Shutemov 提交于
      With new THP refcounting, we don't need tricks to stabilize huge page.
      If we've got reference to tail page, it can't split under us.
      
      This patch effectively reverts a5b338f2 ("thp: update futex compound
      knowledge").
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Tested-by: NArtem Savkov <artem.savkov@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      14d27abd
    • K
      memcg: adjust to support new THP refcounting · f627c2f5
      Kirill A. Shutemov 提交于
      As with rmap, with new refcounting we cannot rely on PageTransHuge() to
      check if we need to charge size of huge page form the cgroup.  We need
      to get information from caller to know whether it was mapped with PMD or
      PTE.
      
      We do uncharge when last reference on the page gone.  At that point if
      we see PageTransHuge() it means we need to unchange whole huge page.
      
      The tricky part is partial unmap -- when we try to unmap part of huge
      page.  We don't do a special handing of this situation, meaning we don't
      uncharge the part of huge page unless last user is gone or
      split_huge_page() is triggered.  In case of cgroup memory pressure
      happens the partial unmapped page will be split through shrinker.  This
      should be good enough.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f627c2f5