1. 29 December 2018, 14 commits
  2. 22 December 2018, 3 commits
    • mm, page_alloc: fix has_unmovable_pages for HugePages · 17e2e7d7
      Authored by Oscar Salvador
      While playing with gigantic hugepages and memory_hotplug, I triggered
      the following #PF when "cat memoryX/removable":
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
        #PF error: [normal kernel read fault]
        PGD 0 P4D 0
        Oops: 0000 [#1] SMP PTI
        CPU: 1 PID: 1481 Comm: cat Tainted: G            E     4.20.0-rc6-mm1-1-default+ #18
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
        RIP: 0010:has_unmovable_pages+0x154/0x210
        Call Trace:
         is_mem_section_removable+0x7d/0x100
         removable_show+0x90/0xb0
         dev_attr_show+0x1c/0x50
         sysfs_kf_seq_show+0xca/0x1b0
         seq_read+0x133/0x380
         __vfs_read+0x26/0x180
         vfs_read+0x89/0x140
         ksys_read+0x42/0x90
         do_syscall_64+0x5b/0x180
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The reason is that we do not pass the head page to page_hstate(), so
      the call to compound_order() in page_hstate() returns 0 and we end up
      checking every hstate's size against PAGE_SIZE.

      Obviously, no hstate matches that size, so we return NULL.  We then
      dereference that NULL pointer in hugepage_migration_supported() and
      get the #PF shown above.
      
      Fix that by getting the head page before calling page_hstate().
      
      Also, since gigantic pages span several pageblocks, readjust the
      logic for skipping pages.  While at it, we can also get rid of the
      round_up().
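
      The shape of the fix, as a sketch of the loop body inside
      has_unmovable_pages() (simplified; not necessarily the exact upstream
      hunk):

        if (PageHuge(page)) {
                struct page *head = compound_head(page);
                unsigned int skip_pages;

                /* page_hstate() needs the real compound order, so use the head */
                if (!hugepage_migration_supported(page_hstate(head)))
                        goto unmovable;

                /* skip the remainder of the (possibly gigantic) huge page */
                skip_pages = (1 << compound_order(head)) - (page - head);
                iter += skip_pages - 1;
                continue;
        }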
      
      [osalvador@suse.de: remove round_up(), adjust skip pages logic per Michal]
        Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de
      Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.de
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      17e2e7d7
    • mm: thp: fix flags for pmd migration when split · 2e83ee1d
      Authored by Peter Xu
      When splitting a huge migrating PMD, we transfer all the existing PMD
      bits and apply them again onto the small PTEs.  However, we fetch the
      bits unconditionally via pmd_soft_dirty(), pmd_write() or pmd_young(),
      even though they make no sense at all when the PMD is a migration
      entry.  Fix that up.  While at it, drop the ifdef as well, since it is
      no longer needed.
      
      Note that, if my understanding of the problem is correct, without this
      patch there is a chance of losing some of the dirty bits in the
      migrating PMD pages (on x86_64 we fetch bit 11, which is part of the
      swap offset, instead of bit 2), which could potentially corrupt the
      memory of a userspace program that depends on the dirty bit.
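
      Roughly what the fixed path in __split_huge_pmd_locked() looks like
      (simplified sketch, not the literal diff): only use the pmd_*()
      accessors on a present PMD; for a migration entry the bits live in
      the swap entry instead.

        pmd_migration = is_pmd_migration_entry(old_pmd);
        if (unlikely(pmd_migration)) {
                swp_entry_t entry = pmd_to_swp_entry(old_pmd);

                page = pfn_to_page(swp_offset(entry));
                write = is_write_migration_entry(entry);
                young = false;
                soft_dirty = pmd_swp_soft_dirty(old_pmd);
        } else {
                page = pmd_page(old_pmd);
                write = pmd_write(old_pmd);
                young = pmd_young(old_pmd);
                soft_dirty = pmd_soft_dirty(old_pmd);
        }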
      
      Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Reviewed-by: William Kucharski <william.kucharski@oracle.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Souptick Joarder <jrdr.linux@gmail.com>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: <stable@vger.kernel.org>	[4.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2e83ee1d
    • mm, memory_hotplug: initialize struct pages for the full memory section · 2830bf6f
      Authored by Mikhail Zaslonko
      If the memory end is not aligned with the sparse memory section
      boundary, the memmap of such a section is only partly initialized.
      This may lead to a VM_BUG_ON due to an uninitialized struct page
      access from the is_mem_section_removable() or test_pages_in_a_zone()
      functions, triggered by the memory_hotplug sysfs handlers:
      
      Here are the panic examples:
       CONFIG_DEBUG_VM=y
       CONFIG_DEBUG_VM_PGFLAGS=y
      
       kernel parameter mem=2050M
       --------------------------
       page:000003d082008000 is uninitialized and poisoned
       page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
       Call Trace:
       ( test_pages_in_a_zone+0xde/0x160)
         show_valid_zones+0x5c/0x190
         dev_attr_show+0x34/0x70
         sysfs_kf_seq_show+0xc8/0x148
         seq_read+0x204/0x480
         __vfs_read+0x32/0x178
         vfs_read+0x82/0x138
         ksys_read+0x5a/0xb0
         system_call+0xdc/0x2d8
       Last Breaking-Event-Address:
         test_pages_in_a_zone+0xde/0x160
       Kernel panic - not syncing: Fatal exception: panic_on_oops
      
       kernel parameter mem=3075M
       --------------------------
       page:000003d08300c000 is uninitialized and poisoned
       page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
       Call Trace:
       ( is_mem_section_removable+0xb4/0x190)
         show_mem_removable+0x9a/0xd8
         dev_attr_show+0x34/0x70
         sysfs_kf_seq_show+0xc8/0x148
         seq_read+0x204/0x480
         __vfs_read+0x32/0x178
         vfs_read+0x82/0x138
         ksys_read+0x5a/0xb0
         system_call+0xdc/0x2d8
       Last Breaking-Event-Address:
         is_mem_section_removable+0xb4/0x190
       Kernel panic - not syncing: Fatal exception: panic_on_oops
      
      Fix the problem by initializing the last memory section of each zone
      in memmap_init_zone() up to the very end, even if it goes beyond the
      zone end.
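
      A minimal sketch of that change in memmap_init_zone() (assuming
      SPARSEMEM; simplified, not necessarily the exact upstream hunk):

        #ifdef CONFIG_SPARSEMEM
        /*
         * If the zone does not span the rest of the section, initialize
         * those struct pages anyway so that paths which expect fully
         * initialized sections (e.g. memory hotplug) do not trip over
         * poisoned pages.
         */
        end_pfn = round_up(end_pfn, PAGES_PER_SECTION);
        #endif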
      
      Michal said:
      
      : This has always been a problem AFAIU.  It just went unnoticed
      : because we zeroed memmaps during allocation before f7f99100 ("mm:
      : stop zeroing memory during allocation in vmemmap"), and so the above
      : test would simply skip these ranges as belonging to zone 0 or
      : provide garbage.
      :
      : So I guess we mostly care about post-f7f99100 kernels, and
      : therefore Fixes: f7f99100 ("mm: stop zeroing memory during
      : allocation in vmemmap")
      
      Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com
      Fixes: f7f99100 ("mm: stop zeroing memory during allocation in vmemmap")
      Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
      Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Suggested-by: Michal Hocko <mhocko@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2830bf6f
  3. 15 December 2018, 3 commits
  4. 11 December 2018, 1 commit
    • mm: mmap: Allow for "high" userspace addresses · f6795053
      Authored by Steve Capper
      This patch adds support for "high" userspace addresses that are
      optionally supported on the system and have to be requested via a hint
      mechanism ("high" addr parameter to mmap).
      
      Architectures such as powerpc and x86 achieve this by making changes to
      their architectural versions of arch_get_unmapped_* functions. However,
      on arm64 we use the generic versions of these functions.
      
      Rather than duplicate the generic arch_get_unmapped_* implementations
      for arm64, this patch instead introduces two architectural helper macros
      and applies them to arch_get_unmapped_*:
       arch_get_mmap_end(addr) - get mmap upper limit depending on addr hint
       arch_get_mmap_base(addr, base) - get mmap_base depending on addr hint
      
      If these macros are not defined in architectural code, they default to
      (TASK_SIZE) and (base) respectively, so they should not introduce any
      behavioural changes to architectures that do not define them.
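
      A sketch of the default definitions and of how the generic code can
      consume them (simplified, illustrative only):

        #ifndef arch_get_mmap_end
        #define arch_get_mmap_end(addr)         (TASK_SIZE)
        #endif

        #ifndef arch_get_mmap_base
        #define arch_get_mmap_base(addr, base)  (base)
        #endif

        /* e.g. in the generic arch_get_unmapped_area_topdown(): */
        info.low_limit  = max(PAGE_SIZE, mmap_min_addr);
        info.high_limit = arch_get_mmap_base(addr, mm->mmap_base);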
      Signed-off-by: Steve Capper <steve.capper@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      f6795053
  5. 09 December 2018, 1 commit
    • Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask" · 356ff8a9
      Authored by David Rientjes
      This reverts commit 89c83fb5.
      
      This should have been done as part of 2f0799a0 ("mm, thp: restore
      node-local hugepage allocations").  The movement of the thp allocation
      policy from alloc_pages_vma() to alloc_hugepage_direct_gfpmask() was
      intended to only set __GFP_THISNODE for mempolicies that are not
      MPOL_BIND whereas the revert could set this regardless of mempolicy.
      
      While the check for MPOL_BIND between alloc_hugepage_direct_gfpmask()
      and alloc_pages_vma() was racy, that check has since been removed by
      the revert.  What is left is the possibility of using __GFP_THISNODE
      in policy_node() when it is unexpected, because the special handling
      for hugepages in alloc_pages_vma() was removed as part of the
      consolidation.
      
      Secondly, prior to 89c83fb5, alloc_pages_vma() implemented a somewhat
      different policy for hugepage allocations, which were allocated
      through alloc_hugepage_vma(): if the allocating process's node is in
      the set of allowed nodes, allocate with __GFP_THISNODE for that node
      (for MPOL_PREFERRED, use that node with __GFP_THISNODE instead).
      89c83fb5 changed this for shmem_alloc_hugepage() to allow fallback to
      other nodes, as it did for new_page() in mm/mempolicy.c, which is
      functionally different behavior and removes the requirement to
      allocate hugepages only locally.
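
      Roughly, that pre-consolidation hugepage path in alloc_pages_vma()
      looked like this (simplified sketch):

        if (unlikely(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && hugepage)) {
                int hpage_node = node;

                /* for MPOL_PREFERRED, target the preferred node instead */
                if (pol->mode == MPOL_PREFERRED && !(pol->flags & MPOL_F_LOCAL))
                        hpage_node = pol->v.preferred_node;

                nmask = policy_nodemask(gfp, pol);
                if (!nmask || node_isset(hpage_node, *nmask)) {
                        mpol_cond_put(pol);
                        /* node-local only: no fallback to remote nodes */
                        page = __alloc_pages_node(hpage_node,
                                                  gfp | __GFP_THISNODE, order);
                        goto out;
                }
        }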
      
      So this commit does a full revert of 89c83fb5 instead of the partial
      revert that was done in 2f0799a0.  The result is the same thp
      allocation policy for 4.20 that was in 4.19.
      
      Fixes: 89c83fb5 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
      Fixes: 2f0799a0 ("mm, thp: restore node-local hugepage allocations")
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      356ff8a9
  6. 06 December 2018, 2 commits
    • XArray: Add xa_cmpxchg_irq and xa_cmpxchg_bh · 55f3f7ea
      Authored by Matthew Wilcox
      These convenience wrappers match the other _irq and _bh wrappers we
      already have.  It turns out I'd already open-coded xa_cmpxchg_irq()
      in the shmem code, so convert that.
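
      A sketch of what such a wrapper looks like, modelled on the existing
      _irq variants (may not be the literal upstream definition):

        static inline void *xa_cmpxchg_irq(struct xarray *xa, unsigned long index,
                                           void *old, void *entry, gfp_t gfp)
        {
                void *curr;

                xa_lock_irq(xa);
                curr = __xa_cmpxchg(xa, index, old, entry, gfp);
                xa_unlock_irq(xa);

                return curr;
        }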
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      55f3f7ea
    • mm, thp: restore node-local hugepage allocations · 2f0799a0
      Authored by David Rientjes
      This is a full revert of ac5b2c18 ("mm: thp: relax __GFP_THISNODE for
      MADV_HUGEPAGE mappings") and a partial revert of 89c83fb5 ("mm, thp:
      consolidate THP gfp handling into alloc_hugepage_direct_gfpmask").
      
      By not setting __GFP_THISNODE, applications can allocate remote hugepages
      when the local node is fragmented or low on memory when either the thp
      defrag setting is "always" or the vma has been madvised with
      MADV_HUGEPAGE.
      
      Remote access to hugepages often has much higher latency than local
      pages of the native page size.  On Haswell, ac5b2c18 was shown to
      introduce a 13.9% access regression for binaries that remap their
      text segment to be backed by transparent hugepages.
      
      The intent of ac5b2c18 is to address an issue where a local node is
      low on memory or fragmented such that a hugepage cannot be allocated.  In
      every scenario where this was described as a fix, there is abundant and
      unfragmented remote memory available to allocate from, even with a greater
      access latency.
      
      If remote memory is also low or fragmented, not setting __GFP_THISNODE was
      also measured on Haswell to have a 40% regression in allocation latency.
      
      Restore __GFP_THISNODE for thp allocations.
      
      Fixes: ac5b2c18 ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
      Fixes: 89c83fb5 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2f0799a0
  7. 05 December 2018, 1 commit
    • dax: Fix unlock mismatch with updated API · 27359fd6
      Authored by Matthew Wilcox
      Internal to dax_unlock_mapping_entry(), dax_unlock_entry() is used to
      store a replacement entry in the Xarray at the given xas-index with the
      DAX_LOCKED bit clear. When called, dax_unlock_entry() expects the unlocked
      value of the entry relative to the current Xarray state to be specified.
      
      In most contexts dax_unlock_entry() is operating in the same scope as
      the matched dax_lock_entry(). However, in the dax_unlock_mapping_entry()
      case the implementation needs to recall the original entry.  In the
      case where the original entry is a 'pmd' entry, it is possible that
      the pfn used to do the lookup is misaligned relative to the value
      retrieved from the Xarray.
      
      Change the API to return the unlock cookie from dax_lock_page() and
      pass it to dax_unlock_page().  This fixes a bug where dax_unlock_page()
      assumed that the page was PMD-aligned if the entry was a PMD entry,
      with signatures like:
      
       WARNING: CPU: 38 PID: 1396 at fs/dax.c:340 dax_insert_entry+0x2b2/0x2d0
       RIP: 0010:dax_insert_entry+0x2b2/0x2d0
       [..]
       Call Trace:
        dax_iomap_pte_fault.isra.41+0x791/0xde0
        ext4_dax_huge_fault+0x16f/0x1f0
        ? up_read+0x1c/0xa0
        __do_fault+0x1f/0x160
        __handle_mm_fault+0x1033/0x1490
        handle_mm_fault+0x18b/0x3d0
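
      A caller-side sketch of the reworked convention described above
      (simplified; the cookie is whatever dax_lock_page() returned):

        dax_entry_t cookie;

        cookie = dax_lock_page(page);
        if (!cookie)
                return;                 /* nothing was locked */

        /* ... operate on the page while the Xarray entry stays locked ... */

        /* hand the original entry back instead of re-deriving it from the page */
        dax_unlock_page(page, cookie);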
      
      Link: https://lkml.kernel.org/r/20181130154902.GL10377@bombadil.infradead.org
      Fixes: 9f32d221 ("dax: Convert dax_lock_mapping_entry to XArray")
      Reported-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      27359fd6
  8. 01 December 2018, 15 commits