1. 14 4月, 2018 1 次提交
  2. 12 4月, 2018 10 次提交
    • H
      parisc: Prevent panic at system halt · 67698287
      Helge Deller 提交于
      When issuing a "shutdown -h now", the reboot syscall calls kernel_halt()
      which shouldn't return, otherwise one gets this panic:
      
      reboot: System halted
      Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
      CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.16.0-32bit+ #560
      Backtrace:
       [<1018a694>] show_stack+0x18/0x28
       [<106e68a8>] dump_stack+0x80/0x10c
       [<101a4df8>] panic+0xfc/0x290
       [<101a90b8>] do_exit+0x73c/0x914
       [<101c7e38>] SyS_reboot+0x190/0x1d4
       [<1017e444>] syscall_exit+0x0/0x14
      
      Fix it by letting machine_halt() call machine_power_off() which doesn't
      return.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      67698287
    • M
      page cache: use xa_lock · b93b0163
      Matthew Wilcox 提交于
      Remove the address_space ->tree_lock and use the xa_lock newly added to
      the radix_tree_root.  Rename the address_space ->page_tree to ->i_pages,
      since we don't really care that it's a tree.
      
      [willy@infradead.org: fix nds32, fs/dax.c]
        Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
      Acked-by: NJeff Layton <jlayton@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b93b0163
    • M
      unicore32: turn flush_dcache_mmap_lock into a no-op · d339d705
      Matthew Wilcox 提交于
      Unicore doesn't walk the VMA tree in its flush_dcache_page()
      implementation, so has no need to take the tree_lock.
      
      Link: http://lkml.kernel.org/r/20180313132639.17387-5-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d339d705
    • M
      arm64: turn flush_dcache_mmap_lock into a no-op · 427c896f
      Matthew Wilcox 提交于
      ARM64 doesn't walk the VMA tree in its flush_dcache_page()
      implementation, so has no need to take the tree_lock.
      
      Link: http://lkml.kernel.org/r/20180313132639.17387-4-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      427c896f
    • M
      linux/const.h: move UL() macro to include/linux/const.h · 2dd8a62c
      Masahiro Yamada 提交于
      ARM, ARM64 and UniCore32 duplicate the definition of UL():
      
        #define UL(x) _AC(x, UL)
      
      This is not actually arch-specific, so it will be useful to move it to a
      common header.  Currently, we only have the uapi variant for
      linux/const.h, so I am creating include/linux/const.h.
      
      I also added _UL(), _ULL() and ULL() because _AC() is mostly used in
      the form either _AC(..., UL) or _AC(..., ULL).  I expect they will be
      replaced in follow-up cleanups.  The underscore-prefixed ones should
      be used for exported headers.
      
      Link: http://lkml.kernel.org/r/1519301715-31798-4-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NGuan Xuetao <gxt@mprc.pku.edu.cn>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2dd8a62c
    • P
      xen, mm: allow deferred page initialization for xen pv domains · 6f84f8d1
      Pavel Tatashin 提交于
      Juergen Gross noticed that commit f7f99100 ("mm: stop zeroing memory
      during allocation in vmemmap") broke XEN PV domains when deferred struct
      page initialization is enabled.
      
      This is because the xen's PagePinned() flag is getting erased from
      struct pages when they are initialized later in boot.
      
      Juergen fixed this problem by disabling deferred pages on xen pv
      domains.  It is desirable, however, to have this feature available as it
      reduces boot time.  This fix re-enables the feature for pv-dmains, and
      fixes the problem the following way:
      
      The fix is to delay setting PagePinned flag until struct pages for all
      allocated memory are initialized, i.e.  until after free_all_bootmem().
      
      A new x86_init.hyper op init_after_bootmem() is called to let xen know
      that boot allocator is done, and hence struct pages for all the
      allocated memory are now initialized.  If deferred page initialization
      is enabled, the rest of struct pages are going to be initialized later
      in boot once page_alloc_init_late() is called.
      
      xen_after_bootmem() walks page table's pages and marks them pinned.
      
      Link: http://lkml.kernel.org/r/20180226160112.24724-2-pasha.tatashin@oracle.comSigned-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Tested-by: NJuergen Gross <jgross@suse.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Jinbum Park <jinb.park7@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jia Zhang <zhang.jia@linux.alibaba.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6f84f8d1
    • M
      mm: introduce MAP_FIXED_NOREPLACE · a4ff8e86
      Michal Hocko 提交于
      Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2.
      
      This has started as a follow up discussion [3][4] resulting in the
      runtime failure caused by hardening patch [5] which removes MAP_FIXED
      from the elf loader because MAP_FIXED is inherently dangerous as it
      might silently clobber an existing underlying mapping (e.g.  stack).
      The reason for the failure is that some architectures enforce an
      alignment for the given address hint without MAP_FIXED used (e.g.  for
      shared or file backed mappings).
      
      One way around this would be excluding those archs which do alignment
      tricks from the hardening [6].  The patch is really trivial but it has
      been objected, rightfully so, that this screams for a more generic
      solution.  We basically want a non-destructive MAP_FIXED.
      
      The first patch introduced MAP_FIXED_NOREPLACE which enforces the given
      address but unlike MAP_FIXED it fails with EEXIST if the given range
      conflicts with an existing one.  The flag is introduced as a completely
      new one rather than a MAP_FIXED extension because of the backward
      compatibility.  We really want a never-clobber semantic even on older
      kernels which do not recognize the flag.  Unfortunately mmap sucks
      wrt flags evaluation because we do not EINVAL on unknown flags.  On
      those kernels we would simply use the traditional hint based semantic so
      the caller can still get a different address (which sucks) but at least
      not silently corrupt an existing mapping.  I do not see a good way
      around that.  Except we won't export expose the new semantic to the
      userspace at all.
      
      It seems there are users who would like to have something like that.
      Jemalloc has been mentioned by Michael Ellerman [7]
      
      Florian Weimer has mentioned the following:
      : glibc ld.so currently maps DSOs without hints.  This means that the kernel
      : will map right next to each other, and the offsets between them a completely
      : predictable.  We would like to change that and supply a random address in a
      : window of the address space.  If there is a conflict, we do not want the
      : kernel to pick a non-random address. Instead, we would try again with a
      : random address.
      
      John Hubbard has mentioned CUDA example
      : a) Searches /proc/<pid>/maps for a "suitable" region of available
      : VA space.  "Suitable" generally means it has to have a base address
      : within a certain limited range (a particular device model might
      : have odd limitations, for example), it has to be large enough, and
      : alignment has to be large enough (again, various devices may have
      : constraints that lead us to do this).
      :
      : This is of course subject to races with other threads in the process.
      :
      : Let's say it finds a region starting at va.
      :
      : b) Next it does:
      :     p = mmap(va, ...)
      :
      : *without* setting MAP_FIXED, of course (so va is just a hint), to
      : attempt to safely reserve that region. If p != va, then in most cases,
      : this is a failure (almost certainly due to another thread getting a
      : mapping from that region before we did), and so this layer now has to
      : call munmap(), before returning a "failure: retry" to upper layers.
      :
      :     IMPROVEMENT: --> if instead, we could call this:
      :
      :             p = mmap(va, ... MAP_FIXED_NOREPLACE ...)
      :
      :         , then we could skip the munmap() call upon failure. This
      :         is a small thing, but it is useful here. (Thanks to Piotr
      :         Jaroszynski and Mark Hairgrove for helping me get that detail
      :         exactly right, btw.)
      :
      : c) After that, CUDA suballocates from p, via:
      :
      :      q = mmap(sub_region_start, ... MAP_FIXED ...)
      :
      : Interestingly enough, "freeing" is also done via MAP_FIXED, and
      : setting PROT_NONE to the subregion. Anyway, I just included (c) for
      : general interest.
      
      Atomic address range probing in the multithreaded programs in general
      sounds like an interesting thing to me.
      
      The second patch simply replaces MAP_FIXED use in elf loader by
      MAP_FIXED_NOREPLACE.  I believe other places which rely on MAP_FIXED
      should follow.  Actually real MAP_FIXED usages should be docummented
      properly and they should be more of an exception.
      
      [1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org
      [2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org
      [3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
      [4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
      [5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
      [6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
      [7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au
      
      This patch (of 2):
      
      MAP_FIXED is used quite often to enforce mapping at the particular range.
      The main problem of this flag is, however, that it is inherently dangerous
      because it unmaps existing mappings covered by the requested range.  This
      can cause silent memory corruptions.  Some of them even with serious
      security implications.  While the current semantic might be really
      desiderable in many cases there are others which would want to enforce the
      given range but rather see a failure than a silent memory corruption on a
      clashing range.  Please note that there is no guarantee that a given range
      is obeyed by the mmap even when it is free - e.g.  arch specific code is
      allowed to apply an alignment.
      
      Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this
      behavior.  It has the same semantic as MAP_FIXED wrt.  the given address
      request with a single exception that it fails with EEXIST if the requested
      address is already covered by an existing mapping.  We still do rely on
      get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
      check for a conflicting vma after it returns.
      
      The flag is introduced as a completely new one rather than a MAP_FIXED
      extension because of the backward compatibility.  We really want a
      never-clobber semantic even on older kernels which do not recognize the
      flag.  Unfortunately mmap sucks wrt.  flags evaluation because we do not
      EINVAL on unknown flags.  On those kernels we would simply use the
      traditional hint based semantic so the caller can still get a different
      address (which sucks) but at least not silently corrupt an existing
      mapping.  I do not see a good way around that.
      
      [mpe@ellerman.id.au: fix whitespace]
      [fail on clashing range with EEXIST as per Florian Weimer]
      [set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
      Link: http://lkml.kernel.org/r/20171213092550.2774-2-mhocko@kernel.orgReviewed-by: NKhalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NMichael Ellerman <mpe@ellerman.id.au>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Joel Stanley <joel@jms.id.au>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Jason Evans <jasone@google.com>
      Cc: David Goldblatt <davidtgoldblatt@gmail.com>
      Cc: Edward Tomasz Napierała <trasz@FreeBSD.org>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a4ff8e86
    • K
      exec: pass stack rlimit into mm layout functions · 8f2af155
      Kees Cook 提交于
      Patch series "exec: Pin stack limit during exec".
      
      Attempts to solve problems with the stack limit changing during exec
      continue to be frustrated[1][2].  In addition to the specific issues
      around the Stack Clash family of flaws, Andy Lutomirski pointed out[3]
      other places during exec where the stack limit is used and is assumed to
      be unchanging.  Given the many places it gets used and the fact that it
      can be manipulated/raced via setrlimit() and prlimit(), I think the only
      way to handle this is to move away from the "current" view of the stack
      limit and instead attach it to the bprm, and plumb this down into the
      functions that need to know the stack limits.  This series implements
      the approach.
      
      [1] 04e35f44 ("exec: avoid RLIMIT_STACK races with prlimit()")
      [2] 779f4e1c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"")
      [3] to security@kernel.org, "Subject: existing rlimit races?"
      
      This patch (of 3):
      
      Since it is possible that the stack rlimit can change externally during
      exec (either via another thread calling setrlimit() or another process
      calling prlimit()), provide a way to pass the rlimit down into the
      per-architecture mm layout functions so that the rlimit can stay in the
      bprm structure instead of sitting in the signal structure until exec is
      finalized.
      
      Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.orgSigned-off-by: NKees Cook <keescook@chromium.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Cc: Brad Spengler <spender@grsecurity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f2af155
    • J
      ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y · 3d2054ad
      Joonsoo Kim 提交于
      CMA area is now managed by the separate zone, ZONE_MOVABLE, to fix many
      MM related problems.  In this implementation, if CONFIG_HIGHMEM = y,
      then ZONE_MOVABLE is considered as HIGHMEM and the memory of the CMA
      area is also considered as HIGHMEM.  That means that they are considered
      as the page without direct mapping.  However, CMA area could be in a
      lowmem and the memory could have direct mapping.
      
      In ARM, when establishing a new mapping for DMA, direct mapping should
      be cleared since two mapping with different cache policy could cause
      unknown problem.  With this patch, PageHighmem() for the CMA memory
      located in lowmem returns true so that the function for DMA mapping
      cannot notice whether it needs to clear direct mapping or not,
      correctly.  To handle this situation, this patch always clears direct
      mapping for such CMA memory.
      
      Link: http://lkml.kernel.org/r/1512114786-5085-4-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Tested-by: NTony Lindgren <tony@atomide.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3d2054ad
    • M
      mm, migrate: remove reason argument from new_page_t · 666feb21
      Michal Hocko 提交于
      No allocation callback is using this argument anymore.  new_page_node
      used to use this parameter to convey node_id resp.  migration error up
      to move_pages code (do_move_page_to_node_array).  The error status never
      made it into the final status field and we have a better way to
      communicate node id to the status field now.  All other allocation
      callbacks simply ignored the argument so we can drop it finally.
      
      [mhocko@suse.com: fix migration callback]
        Link: http://lkml.kernel.org/r/20180105085259.GH2801@dhcp22.suse.cz
      [akpm@linux-foundation.org: fix alloc_misplaced_dst_page()]
      [mhocko@kernel.org: fix build]
        Link: http://lkml.kernel.org/r/20180103091134.GB11319@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/20180103082555.14592-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NZi Yan <zi.yan@cs.rutgers.edu>
      Cc: Andrea Reale <ar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      666feb21
  3. 11 4月, 2018 3 次提交
  4. 10 4月, 2018 5 次提交
  5. 09 4月, 2018 1 次提交
  6. 08 4月, 2018 4 次提交
  7. 07 4月, 2018 3 次提交
  8. 06 4月, 2018 11 次提交
    • R
      ARM: sa1100/simpad: switch simpad CF to use gpiod APIs · 64b2f129
      Russell King 提交于
      Switch simpad's CF implementation to use the gpiod APIs.  The inverted
      detection is handled using gpiolib's native inversion abilities.
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      64b2f129
    • R
      ARM: sa1100/shannon: convert to generic CF sockets · b51af865
      Russell King 提交于
      Convert shannon to use the generic CF socket support.
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      b51af865
    • R
      ARM: sa1100/nanoengine: convert to generic CF sockets · 80c799db
      Russell King 提交于
      Convert nanoengine to use the generic CF socket support.
      Makefile fix from Arnd Bergmann <arnd@arndb.de>.
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      80c799db
    • R
      headers: untangle kmemleak.h from mm.h · 514c6032
      Randy Dunlap 提交于
      Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
      reason.  It looks like it's only a convenience, so remove kmemleak.h
      from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
      don't already #include it.  Also remove <linux/kmemleak.h> from source
      files that do not use it.
      
      This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
      be good to run it through the 0day bot for other $ARCHes.  I have
      neither the horsepower nor the storage space for the other $ARCHes.
      
      Update: This patch has been extensively build-tested by both the 0day
      bot & kisskb/ozlabs build farms.  Both of them reported 2 build failures
      for which patches are included here (in v2).
      
      [ slab.h is the second most used header file after module.h; kernel.h is
        right there with slab.h. There could be some minor error in the
        counting due to some #includes having comments after them and I didn't
        combine all of those. ]
      
      [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
      Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
      Link: http://kisskb.ellerman.id.au/kisskb/head/13396/Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>	[2 build failures]
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>	[2 build failures]
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: John Johansen <john.johansen@canonical.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      514c6032
    • H
      mm: fix races between swapoff and flush dcache · cb9f753a
      Huang Ying 提交于
      Thanks to commit 4b3ef9da ("mm/swap: split swap cache into 64MB
      trunks"), after swapoff the address_space associated with the swap
      device will be freed.  So page_mapping() users which may touch the
      address_space need some kind of mechanism to prevent the address_space
      from being freed during accessing.
      
      The dcache flushing functions (flush_dcache_page(), etc) in architecture
      specific code may access the address_space of swap device for anonymous
      pages in swap cache via page_mapping() function.  But in some cases
      there are no mechanisms to prevent the swap device from being swapoff,
      for example,
      
        CPU1					CPU2
        __get_user_pages()			swapoff()
          flush_dcache_page()
            mapping = page_mapping()
              ...				  exit_swap_address_space()
              ...				    kvfree(spaces)
              mapping_mapped(mapping)
      
      The address space may be accessed after being freed.
      
      But from cachetlb.txt and Russell King, flush_dcache_page() only care
      about file cache pages, for anonymous pages, flush_anon_page() should be
      used.  The implementation of flush_dcache_page() in all architectures
      follows this too.  They will check whether page_mapping() is NULL and
      whether mapping_mapped() is true to determine whether to flush the
      dcache immediately.  And they will use interval tree (mapping->i_mmap)
      to find all user space mappings.  While mapping_mapped() and
      mapping->i_mmap isn't used by anonymous pages in swap cache at all.
      
      So, to fix the race between swapoff and flush dcache, __page_mapping()
      is add to return the address_space for file cache pages and NULL
      otherwise.  All page_mapping() invoking in flush dcache functions are
      replaced with page_mapping_file().
      
      [akpm@linux-foundation.org: simplify page_mapping_file(), per Mike]
      Link: http://lkml.kernel.org/r/20180305083634.15174-1-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cb9f753a
    • D
      mm, powerpc: use vma_kernel_pagesize() in vma_mmu_pagesize() · 09135cc5
      Dan Williams 提交于
      Patch series "mm, smaps: MMUPageSize for device-dax", v3.
      
      Similar to commit 31383c68 ("mm, hugetlbfs: introduce ->split() to
      vm_operations_struct") here is another occasion where we want
      special-case hugetlbfs/hstate enabling to also apply to device-dax.
      
      This prompts the question what other hstate conversions we might do
      beyond ->split() and ->pagesize(), but this appears to be the last of
      the usages of hstate_vma() in generic/non-hugetlbfs specific code paths.
      
      This patch (of 3):
      
      The current powerpc definition of vma_mmu_pagesize() open codes looking
      up the page size via hstate.  It is identical to the generic
      vma_kernel_pagesize() implementation.
      
      Now, vma_kernel_pagesize() is growing support for determining the page
      size of Device-DAX vmas in addition to the existing Hugetlbfs page size
      determination.
      
      Ideally, if the powerpc vma_mmu_pagesize() used vma_kernel_pagesize() it
      would automatically benefit from any new vma-type support that is added
      to vma_kernel_pagesize().  However, the powerpc vma_mmu_pagesize() is
      prevented from calling vma_kernel_pagesize() due to a circular header
      dependency that requires vma_mmu_pagesize() to be defined before
      including <linux/hugetlb.h>.
      
      Break this circular dependency by defining the default vma_mmu_pagesize()
      as a __weak symbol to be overridden by the powerpc version.
      
      Link: http://lkml.kernel.org/r/151996254179.27922.2213728278535578744.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Jane Chu <jane.chu@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09135cc5
    • P
      x86/mm/memory_hotplug: determine block size based on the end of boot memory · 078eb6aa
      Pavel Tatashin 提交于
      Memory sections are combined into "memory block" chunks.  These chunks
      are the units upon which memory can be added and removed.
      
      On x86, the new memory may be added after the end of the boot memory,
      therefore, if block size does not align with end of boot memory, memory
      hot-plugging/hot-removing can be broken.
      
      Memory sections are combined into "memory block" chunks.  These chunks
      are the units upon which memory can be added and removed.
      
      On x86 the new memory may be added after the end of the boot memory,
      therefore, if block size does not align with end of boot memory, memory
      hotplugging/hotremoving can be broken.
      
      Currently, whenever machine is booted with more than 64G the block size
      is unconditionally increased to 2G from the base 128M.  This is done in
      order to reduce number of memory device files in sysfs:
      
      	/sys/devices/system/memory/memoryXXX
      
      We must use the largest allowed block size that aligns to the next
      address to be able to hotplug the next block of memory.
      
      So, when memory is larger or equal to 64G, we check the end address and
      find the largest block size that is still power of two but smaller or
      equal to 2G.
      
      Before, the fix:
      Run qemu with:
      -m 64G,slots=2,maxmem=66G -object memory-backend-ram,id=mem1,size=2G
      
      (qemu) device_add pc-dimm,id=dimm1,memdev=mem1
      Block size [0x80000000] unaligned hotplug range: start 0x1040000000,
      							size 0x80000000
      acpi PNP0C80:00: add_memory failed
      acpi PNP0C80:00: acpi_memory_enable_device() error
      acpi PNP0C80:00: Enumeration failure
      
      With the fix memory is added successfully as the block size is set to
      1G, and therefore aligns with start address 0x1040000000.
      
      [pasha.tatashin@oracle.com: v4]
        Link: http://lkml.kernel.org/r/20180215165920.8570-3-pasha.tatashin@oracle.com
      Link: http://lkml.kernel.org/r/20180213193159.14606-3-pasha.tatashin@oracle.comSigned-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      078eb6aa
    • A
      mm/migrate: rename migration reason MR_CMA to MR_CONTIG_RANGE · 31025351
      Anshuman Khandual 提交于
      alloc_contig_range() initiates compaction and eventual migration for the
      purpose of either CMA or HugeTLB allocations.  At present, the reason
      code remains the same MR_CMA for either of these cases.  Let's make it
      MR_CONTIG_RANGE which will appropriately reflect the reason code in both
      these cases.
      
      Link: http://lkml.kernel.org/r/20180202091518.18798-1-khandual@linux.vnet.ibm.comSigned-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31025351
    • H
      zboot: fix stack protector in compressed boot phase · 7bbaf27d
      Huacai Chen 提交于
      Calling __stack_chk_guard_setup() in decompress_kernel() is too late
      that stack checking always fails for decompress_kernel() itself.  So
      remove __stack_chk_guard_setup() and initialize __stack_chk_guard before
      we call decompress_kernel().
      
      Original code comes from ARM but also used for MIPS and SH, so fix them
      together.  If without this fix, compressed booting of these archs will
      fail because stack checking is enabled by default (>=4.16).
      
      Link: http://lkml.kernel.org/r/1522226933-29317-1-git-send-email-chenhc@lemote.com
      Fixes: 8779657d ("stackprotector: Introduce CONFIG_CC_STACKPROTECTOR_STRONG")
      Signed-off-by: NHuacai Chen <chenhc@lemote.com>
      Acked-by: NJames Hogan <jhogan@kernel.org>
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: NRich Felker <dalias@libc.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7bbaf27d
    • R
      ARM: decompressor: fix warning introduced in fortify patch · 5f8d561f
      Russell King 提交于
      Commit ee333554 ("ARM: 8749/1: Kconfig: Add ARCH_HAS_FORTIFY_SOURCE")
      introduced a new warning:
      
      arch/arm/boot/compressed/misc.c: In function 'fortify_panic':
      arch/arm/boot/compressed/misc.c:167:1: error: 'noreturn' function does return [-Werror]
      
      The simple solution would be to make 'error' a noreturn function, but
      this causes a prototype mismatch as the function is prototyped in
      several .c files.  So, move the function prototype to a new header.
      
      There are also a couple of variables that are also declared in several
      locations.  Clean this up while we are here.
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      5f8d561f
    • R
      time: tick-sched: Reorganize idle tick management code · 0e776768
      Rafael J. Wysocki 提交于
      Prepare the scheduler tick code for reworking the idle loop to
      avoid stopping the tick in some cases.
      
      The idea is to split the nohz idle entry call to decouple the idle
      time stats accounting and preparatory work from the actual tick stop
      code, in order to later be able to delay the tick stop once we reach
      more power-knowledgeable callers.
      
      Move away the tick_nohz_start_idle() invocation from
      __tick_nohz_idle_enter(), rename the latter to
      __tick_nohz_idle_stop_tick() and define tick_nohz_idle_stop_tick()
      as a wrapper around it for calling it from the outside.
      
      Make tick_nohz_idle_enter() only call tick_nohz_start_idle() instead
      of calling the entire __tick_nohz_idle_enter(), add another wrapper
      disabling and enabling interrupts around tick_nohz_idle_stop_tick()
      and make the current callers of tick_nohz_idle_enter() call it too
      to retain their current functionality.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      0e776768
  9. 05 4月, 2018 2 次提交