1. 17 8月, 2017 1 次提交
    • T
      x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages · ce0fa3e5
      Tony Luck 提交于
      Speculative processor accesses may reference any memory that has a
      valid page table entry.  While a speculative access won't generate
      a machine check, it will log the error in a machine check bank. That
      could cause escalation of a subsequent error since the overflow bit
      will be then set in the machine check bank status register.
      
      Code has to be double-plus-tricky to avoid mentioning the 1:1 virtual
      address of the page we want to map out otherwise we may trigger the
      very problem we are trying to avoid.  We use a non-canonical address
      that passes through the usual Linux table walking code to get to the
      same "pte".
      
      Thanks to Dave Hansen for reviewing several iterations of this.
      
      Also see:
      
        http://marc.info/?l=linux-mm&m=149860136413338&w=2Signed-off-by: NTony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Elliott, Robert (Persistent Memory) <elliott@hpe.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170816171803.28342-1-tony.luck@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ce0fa3e5
  2. 18 7月, 2017 2 次提交
    • T
      x86/mm: Add support to access boot related data in the clear · 8f716c9b
      Tom Lendacky 提交于
      Boot data (such as EFI related data) is not encrypted when the system is
      booted because UEFI/BIOS does not run with SME active. In order to access
      this data properly it needs to be mapped decrypted.
      
      Update early_memremap() to provide an arch specific routine to modify the
      pagetable protection attributes before they are applied to the new
      mapping. This is used to remove the encryption mask for boot related data.
      
      Update memremap() to provide an arch specific routine to determine if RAM
      remapping is allowed.  RAM remapping will cause an encrypted mapping to be
      generated. By preventing RAM remapping, ioremap_cache() will be used
      instead, which will provide a decrypted mapping of the boot related data.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/81fb6b4117a5df6b9f2eda342f81bbef4b23d2e5.1500319216.git.thomas.lendacky@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8f716c9b
    • T
      x86/mm: Extend early_memremap() support with additional attrs · f88a68fa
      Tom Lendacky 提交于
      Add early_memremap() support to be able to specify encrypted and
      decrypted mappings with and without write-protection. The use of
      write-protection is necessary when encrypting data "in place". The
      write-protect attribute is considered cacheable for loads, but not
      stores. This implies that the hardware will never give the core a
      dirty line with this memtype.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/479b5832c30fae3efa7932e48f81794e86397229.1500319216.git.thomas.lendacky@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f88a68fa
  3. 15 7月, 2017 1 次提交
  4. 13 7月, 2017 5 次提交
    • N
      writeback: rework wb_[dec|inc]_stat family of functions · 3e8f399d
      Nikolay Borisov 提交于
      Currently the writeback statistics code uses a percpu counters to hold
      various statistics.  Furthermore we have 2 families of functions - those
      which disable local irq and those which doesn't and whose names begin
      with double underscore.  However, they both end up calling
      __add_wb_stats which in turn calls percpu_counter_add_batch which is
      already irq-safe.
      
      Exploiting this fact allows to eliminated the __wb_* functions since
      they don't add any further protection than we already have.
      Furthermore, refactor the wb_* function to call __add_wb_stat directly
      without the irq-disabling dance.  This will likely result in better
      runtime of code which deals with modifying the stat counters.
      
      While at it also document why percpu_counter_add_batch is in fact
      preempt and irq-safe since at least 3 people got confused.
      
      Link: http://lkml.kernel.org/r/1498029937-27293-1-git-send-email-nborisov@suse.comSigned-off-by: NNikolay Borisov <nborisov@suse.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3e8f399d
    • M
      mm, migration: do not trigger OOM killer when migrating memory · 0f556856
      Michal Hocko 提交于
      Page migration (for memory hotplug, soft_offline_page or mbind) needs to
      allocate a new memory.  This can trigger an oom killer if the target
      memory is depleated.  Although quite unlikely, still possible,
      especially for the memory hotplug (offlining of memoery).
      
      Up to now we didn't really have reasonable means to back off.
      __GFP_NORETRY can fail just too easily and __GFP_THISNODE sticks to a
      single node and that is not suitable for all callers.
      
      But now that we have __GFP_RETRY_MAYFAIL we should use it.  It is
      preferable to fail the migration than disrupt the system by killing some
      processes.
      
      Link: http://lkml.kernel.org/r/20170623085345.11304-7-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0f556856
    • M
      mm: kvmalloc support __GFP_RETRY_MAYFAIL for all sizes · cc965a29
      Michal Hocko 提交于
      Now that __GFP_RETRY_MAYFAIL has a reasonable semantic regardless of the
      request size we can drop the hackish implementation for !costly orders.
      __GFP_RETRY_MAYFAIL retries as long as the reclaim makes a forward
      progress and backs of when we are out of memory for the requested size.
      Therefore we do not need to enforce__GFP_NORETRY for !costly orders just
      to silent the oom killer anymore.
      
      Link: http://lkml.kernel.org/r/20170623085345.11304-5-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cc965a29
    • M
      mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic · dcda9b04
      Michal Hocko 提交于
      __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
      the page allocator.  This has been true but only for allocations
      requests larger than PAGE_ALLOC_COSTLY_ORDER.  It has been always
      ignored for smaller sizes.  This is a bit unfortunate because there is
      no way to express the same semantic for those requests and they are
      considered too important to fail so they might end up looping in the
      page allocator for ever, similarly to GFP_NOFAIL requests.
      
      Now that the whole tree has been cleaned up and accidental or misled
      usage of __GFP_REPEAT flag has been removed for !costly requests we can
      give the original flag a better name and more importantly a more useful
      semantic.  Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
      that the allocator would try really hard but there is no promise of a
      success.  This will work independent of the order and overrides the
      default allocator behavior.  Page allocator users have several levels of
      guarantee vs.  cost options (take GFP_KERNEL as an example)
      
       - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
         attempt to free memory at all. The most light weight mode which even
         doesn't kick the background reclaim. Should be used carefully because
         it might deplete the memory and the next user might hit the more
         aggressive reclaim
      
       - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
         allocation without any attempt to free memory from the current
         context but can wake kswapd to reclaim memory if the zone is below
         the low watermark. Can be used from either atomic contexts or when
         the request is a performance optimization and there is another
         fallback for a slow path.
      
       - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
         non sleeping allocation with an expensive fallback so it can access
         some portion of memory reserves. Usually used from interrupt/bh
         context with an expensive slow path fallback.
      
       - GFP_KERNEL - both background and direct reclaim are allowed and the
         _default_ page allocator behavior is used. That means that !costly
         allocation requests are basically nofail but there is no guarantee of
         that behavior so failures have to be checked properly by callers
         (e.g. OOM killer victim is allowed to fail currently).
      
       - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
         and all allocation requests fail early rather than cause disruptive
         reclaim (one round of reclaim in this implementation). The OOM killer
         is not invoked.
      
       - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
         behavior and all allocation requests try really hard. The request
         will fail if the reclaim cannot make any progress. The OOM killer
         won't be triggered.
      
       - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
         and all allocation requests will loop endlessly until they succeed.
         This might be really dangerous especially for larger orders.
      
      Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
      because they already had their semantic.  No new users are added.
      __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
      there is no progress and we have already passed the OOM point.
      
      This means that all the reclaim opportunities have been exhausted except
      the most disruptive one (the OOM killer) and a user defined fallback
      behavior is more sensible than keep retrying in the page allocator.
      
      [akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
      [mhocko@suse.com: semantic fix]
        Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
      [mhocko@kernel.org: address other thing spotted by Vlastimil]
        Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Alex Belits <alex.belits@cavium.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: NeilBrown <neilb@suse.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dcda9b04
    • G
      mm/memory.c: mark create_huge_pmd() inline to prevent build failure · 91a90140
      Geert Uytterhoeven 提交于
      With gcc 4.1.2:
      
          mm/memory.o: In function `create_huge_pmd':
          memory.c:(.text+0x93e): undefined reference to `do_huge_pmd_anonymous_page'
      
      Interestingly, create_huge_pmd() is emitted in the assembler output, but
      never called.
      
      Converting transparent_hugepage_enabled() from a macro to a static
      inline function reduced the ability of the compiler to remove unused
      code.
      
      Fix this by marking create_huge_pmd() inline.
      
      Fixes: 16981d76 ("mm: improve readability of transparent_hugepage_enabled()")
      Link: http://lkml.kernel.org/r/1499842660-10665-1-git-send-email-geert@linux-m68k.orgSigned-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      91a90140
  5. 11 7月, 2017 31 次提交