1. 20 Aug, 2017 1 commit
  2. 19 Aug, 2017 22 commits
    • L
      Merge branch 'akpm' (patches from Andrew) · 58d4e450
      Linus Torvalds committed
      Merge misc fixes from Andrew Morton:
       "14 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes
        mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM
        mm/mempolicy: fix use after free when calling get_mempolicy
        mm/cma_debug.c: fix stack corruption due to sprintf usage
        signal: don't remove SIGNAL_UNKILLABLE for traced tasks.
        mm, oom: fix potential data corruption when oom_reaper races with writer
        mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS
        slub: fix per memcg cache leak on css offline
        mm: discard memblock data later
        test_kmod: fix description for -s -and -c parameters
        kmod: fix wait on recursive loop
        wait: add wait_event_killable_timeout()
        kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog
        mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()
      58d4e450
    • K
      mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes · c715b72c
      Kees Cook committed
      Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000
      broke AddressSanitizer.  This is a partial revert of:
      
        eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
        02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
      
      The AddressSanitizer tool has hard-coded expectations about where
      executable mappings are loaded.
      
      The motivation for changing the PIE base in the above commits was to
      avoid the Stack-Clash CVEs that allowed executable mappings to get too
      close to heap and stack.  This was mainly a problem on 32-bit, but the
      64-bit bases were moved too, in an effort to proactively protect those
      systems (proofs of concept do exist that show 64-bit collisions, but
      other recent changes to fix stack accounting and setuid behaviors will
      minimize the impact).
      
      The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC
      base), so only the 64-bit PIE base needs to be reverted to let x86 and
      arm64 ASan binaries run again.  Future changes to the 64-bit PIE base on
      these architectures can be made optional once a more dynamic method for
      dealing with AddressSanitizer is found.  (e.g.  always loading PIE into
      the mmap region for marked binaries.)
      
      Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast
      Fixes: eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
      Fixes: 02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: Kostya Serebryany <kcc@google.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c715b72c
    • L
      mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM · 704b862f
      Laura Abbott committed
      Commit 19809c2d ("mm, vmalloc: use __GFP_HIGHMEM implicitly") added
      use of __GFP_HIGHMEM for allocations.  vmalloc_32 may use
      GFP_DMA/GFP_DMA32 which does not play nice with __GFP_HIGHMEM and will
      trigger a BUG in gfp_zone.
      
      Only add __GFP_HIGHMEM if we aren't using GFP_DMA/GFP_DMA32.
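
      The conditional described above can be sketched in plain C.  This is
      an illustration only, not the kernel patch: the DEMO_* flag values and
      the helper name demo_highmem_mask() are assumptions made up for this
      example, and the kernel's real gfp bits differ.

```c
#include <assert.h>

/* Hypothetical flag bits for illustration; the kernel's real values differ. */
typedef unsigned int gfp_t;
#define DEMO_GFP_DMA     0x01u
#define DEMO_GFP_HIGHMEM 0x02u
#define DEMO_GFP_DMA32   0x04u

/*
 * Add the highmem flag only when the caller did not request a DMA zone.
 * Combining a DMA zone flag with highmem is the combination the commit
 * message says trips a BUG in gfp_zone.
 */
static gfp_t demo_highmem_mask(gfp_t gfp_mask)
{
    if (gfp_mask & (DEMO_GFP_DMA | DEMO_GFP_DMA32))
        return gfp_mask;
    return gfp_mask | DEMO_GFP_HIGHMEM;
}
```

      A caller that passes a DMA32 mask gets it back unchanged; any other
      mask comes back with the highmem bit set.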
      
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1482249
      Link: http://lkml.kernel.org/r/20170816220705.31374-1-labbott@redhat.com
      Fixes: 19809c2d ("mm, vmalloc: use __GFP_HIGHMEM implicitly")
      Signed-off-by: Laura Abbott <labbott@redhat.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      704b862f
    • Z
      mm/mempolicy: fix use after free when calling get_mempolicy · 73223e4e
      zhong jiang committed
      I hit a use after free issue when executing trinity and reproduced it
      with KASAN enabled.  The related call trace is as follows.
      
        BUG: KASan: use after free in SyS_get_mempolicy+0x3c8/0x960 at addr ffff8801f582d766
        Read of size 2 by task syz-executor1/798
      
        INFO: Allocated in mpol_new.part.2+0x74/0x160 age=3 cpu=1 pid=799
           __slab_alloc+0x768/0x970
           kmem_cache_alloc+0x2e7/0x450
           mpol_new.part.2+0x74/0x160
           mpol_new+0x66/0x80
           SyS_mbind+0x267/0x9f0
           system_call_fastpath+0x16/0x1b
        INFO: Freed in __mpol_put+0x2b/0x40 age=4 cpu=1 pid=799
           __slab_free+0x495/0x8e0
           kmem_cache_free+0x2f3/0x4c0
           __mpol_put+0x2b/0x40
           SyS_mbind+0x383/0x9f0
           system_call_fastpath+0x16/0x1b
        INFO: Slab 0xffffea0009cb8dc0 objects=23 used=8 fp=0xffff8801f582de40 flags=0x200000000004080
        INFO: Object 0xffff8801f582d760 @offset=5984 fp=0xffff8801f582d600
      
        Bytes b4 ffff8801f582d750: ae 01 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
        Object ffff8801f582d760: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        Object ffff8801f582d770: 6b 6b 6b 6b 6b 6b 6b a5                          kkkkkkk.
        Redzone ffff8801f582d778: bb bb bb bb bb bb bb bb                          ........
        Padding ffff8801f582d8b8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
        Memory state around the buggy address:
        ffff8801f582d600: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff8801f582d680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        >ffff8801f582d700: fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb fc
      
      A non-shared memory policy is not protected against parallel removal
      by another thread; that protection normally comes from the mmap_sem.
      do_get_mempolicy, however, drops the lock midway while the policy can
      still be accessed later.
      
      The premature up_read is a historical artifact from the time when
      put_user was called in this path (see https://lwn.net/Articles/124754/),
      but that call has been gone since 8bccd85f ("[PATCH] Implement sys_* do_*
      layering in the memory policy layer.").  With the current mempolicy
      ref count model, dropping the lock early is what introduces the
      issue.
      
      Fix the issue by removing the premature release.
      
      Link: http://lkml.kernel.org/r/1502950924-27521-1-git-send-email-zhongjiang@huawei.com
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>	[2.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      73223e4e
    • P
      mm/cma_debug.c: fix stack corruption due to sprintf usage · da094e42
      Prakash Gupta committed
      name[] in cma_debugfs_add_one() can only accommodate 16 chars, including
      the terminating NUL, to store the sprintf output.  It's common for a cma
      device name to be longer than 15 chars.  This can cause stack
      corruption.  If the gcc stack protector is turned on, this can cause a
      panic due to stack corruption.
      
      Below is one example trace:
      
        Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in:
        ffffff8e69a75730
        Call trace:
           dump_backtrace+0x0/0x2c4
           show_stack+0x20/0x28
           dump_stack+0xb8/0xf4
           panic+0x154/0x2b0
           print_tainted+0x0/0xc0
           cma_debugfs_init+0x274/0x290
           do_one_initcall+0x5c/0x168
           kernel_init_freeable+0x1c8/0x280
      
      Fix the short sprintf buffer in cma_debugfs_add_one() by using
      scnprintf() instead of sprintf().
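
      The effect of the bounded formatter can be sketched in userspace C.
      scnprintf() exists only in the kernel, so demo_scnprintf() below is a
      stand-in built on vsnprintf(); the helper names, the 16-byte static
      buffer, and the "cma-%s" format are assumptions for illustration, not
      a copy of the patch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Userspace stand-in for the kernel's scnprintf(): never writes past
 * buf[size - 1] and returns the number of characters actually stored,
 * not counting the terminating NUL.
 */
static int demo_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    if (size == 0)
        return 0;
    va_start(args, fmt);
    n = vsnprintf(buf, size, fmt, args);
    va_end(args);
    if (n < 0)
        return 0;
    return (size_t)n < size ? n : (int)(size - 1);
}

/* Format a debugfs-style name into a fixed 16-byte buffer (hypothetical). */
static const char *demo_name_for(const char *dev_name)
{
    static char name[16];

    demo_scnprintf(name, sizeof(name), "cma-%s", dev_name);
    return name;
}
```

      With an over-long device name the result is truncated to 15 characters
      plus the NUL instead of overrunning the buffer, which is the property
      the fix relies on.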
      
      Link: http://lkml.kernel.org/r/1502446217-21840-1-git-send-email-guptap@codeaurora.org
      Fixes: f318dd08 ("cma: Store a name in the cma structure")
      Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
      Acked-by: Laura Abbott <labbott@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da094e42
    • J
      signal: don't remove SIGNAL_UNKILLABLE for traced tasks. · eb61b591
      Jamie Iles committed
      When forcing a signal, SIGNAL_UNKILLABLE is removed to prevent recursive
      faults, but this is undesirable when tracing.  For example, debugging an
      init process (whether global or namespace), hitting a breakpoint and
      SIGTRAP will force SIGTRAP and then remove SIGNAL_UNKILLABLE.
      Everything continues fine, but once debugging has finished, the init
      process is left killable, which is unlikely to be what the user
      expects, resulting in either an accidentally killed init or an init
      that stops reaping zombies.
      
      Link: http://lkml.kernel.org/r/20170815112806.10728-1-jamie.iles@oracle.com
      Signed-off-by: Jamie Iles <jamie.iles@oracle.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eb61b591
    • M
      mm, oom: fix potential data corruption when oom_reaper races with writer · 6b31d595
      Michal Hocko committed
      Wenwei Tao has noticed that our current assumption that the oom victim
      is dying and never doing any visible changes after it dies, and so the
      oom_reaper can tear it down, is not entirely true.
      
      __task_will_free_mem considers a task dying when SIGNAL_GROUP_EXIT is set,
      but do_group_exit sends SIGKILL to all threads _after_ the flag is set.
      So there is a race window when some threads won't have
      fatal_signal_pending while the oom_reaper could start unmapping the
      address space.  Moreover some paths might not check for fatal signals
      before each PF/g-u-p/copy_from_user.
      
      We already have a protection for oom_reaper vs.  PF races by checking
      MMF_UNSTABLE.  This has been, however, checked only for kernel threads
      (use_mm users) which can outlive the oom victim.  A simple fix would be
      to extend the current check in handle_mm_fault for all tasks but that
      wouldn't be sufficient because the current check assumes that a kernel
      thread would bail out after EFAULT from get_user*/copy_from_user and
      never re-read the same address which would succeed because the PF path
      has established page tables already.  This seems to be the case for the
      only existing use_mm user currently (virtio driver) but it is rather
      fragile in general.
      
      This is even more fragile in general for more complex paths such as
      generic_perform_write which can re-read the same address more times
      (e.g.  iov_iter_copy_from_user_atomic to fail and then
      iov_iter_fault_in_readable on retry).
      
      Therefore we have to implement MMF_UNSTABLE protection in a robust way
      and never make potentially corrupted content visible.  That requires
      hooking deeper into the PF path and checking for the flag _every time_
      before a pte for anonymous memory is established (that means all
      !VM_SHARED mappings).
      
      The corruption can be triggered artificially
      (http://lkml.kernel.org/r/201708040646.v746kkhC024636@www262.sakura.ne.jp)
      but there doesn't seem to be any real life bug report.  The race window
      is likely too tight to hit most of the time.
      
      Link: http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org
      Fixes: aac45363 ("mm, oom: introduce oom reaper")
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Wenwei Tao <wenwei.tww@alibaba-inc.com>
      Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6b31d595
    • M
      mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS · 5b53a6ea
      Michal Hocko committed
      Tetsuo Handa has noticed that the MMF_UNSTABLE SIGBUS path in
      handle_mm_fault causes a lockdep splat:
      
        Out of memory: Kill process 1056 (a.out) score 603 or sacrifice child
        Killed process 1056 (a.out) total-vm:4268108kB, anon-rss:2246048kB, file-rss:0kB, shmem-rss:0kB
        a.out (1169) used greatest stack depth: 11664 bytes left
        DEBUG_LOCKS_WARN_ON(depth <= 0)
        ------------[ cut here ]------------
        WARNING: CPU: 6 PID: 1339 at kernel/locking/lockdep.c:3617 lock_release+0x172/0x1e0
        CPU: 6 PID: 1339 Comm: a.out Not tainted 4.13.0-rc3-next-20170803+ #142
        Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
        RIP: 0010:lock_release+0x172/0x1e0
        Call Trace:
           up_read+0x1a/0x40
           __do_page_fault+0x28e/0x4c0
           do_page_fault+0x30/0x80
           page_fault+0x28/0x30
      
      The reason is that the page fault path might have dropped the mmap_sem
      and returned with VM_FAULT_RETRY.  The MMF_UNSTABLE check, however,
      rewrites the error path to VM_FAULT_SIGBUS, and we always expect mmap_sem
      to be taken in that path.  Fix this by taking mmap_sem when
      VM_FAULT_RETRY is returned in the MMF_UNSTABLE path.
      
      We cannot simply add VM_FAULT_SIGBUS to the existing error code because
      all arch specific page fault handlers and g-u-p would have to learn a
      new error code combination.
      
      Link: http://lkml.kernel.org/r/20170807113839.16695-2-mhocko@kernel.org
      Fixes: 3f70dc38 ("mm: make sure that kthreads will not refault oom reaped memory")
      Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Wenwei Tao <wenwei.tww@alibaba-inc.com>
      Cc: <stable@vger.kernel.org>	[4.9+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5b53a6ea
    • V
      slub: fix per memcg cache leak on css offline · f6ba4880
      Vladimir Davydov committed
      To avoid a possible deadlock, sysfs_slab_remove() schedules an
      asynchronous work to delete sysfs entries corresponding to the kmem
      cache.  To ensure the cache isn't freed before the work function is
      called, it takes a reference to the cache kobject.  The reference is
      supposed to be released by the work function.
      
      However, the work function (sysfs_slab_remove_workfn()) does nothing in
      case the cache sysfs entry has already been deleted, leaking the kobject
      and the corresponding cache.
      
      This may happen on a per memcg cache destruction, because sysfs entries
      of a per memcg cache are deleted on memcg offline if the cache is empty
      (see __kmemcg_cache_deactivate()).
      
      The kmemleak report looks like this:
      
        unreferenced object 0xffff9f798a79f540 (size 32):
          comm "kworker/1:4", pid 15416, jiffies 4307432429 (age 28687.554s)
          hex dump (first 32 bytes):
            6b 6d 61 6c 6c 6f 63 2d 31 36 28 31 35 39 39 3a  kmalloc-16(1599:
            6e 65 77 72 6f 6f 74 29 00 23 6b c0 ff ff ff ff  newroot).#k.....
          backtrace:
             kmemleak_alloc+0x4a/0xa0
             __kmalloc_track_caller+0x148/0x2c0
             kvasprintf+0x66/0xd0
             kasprintf+0x49/0x70
             memcg_create_kmem_cache+0xe6/0x160
             memcg_kmem_cache_create_func+0x20/0x110
             process_one_work+0x205/0x5d0
             worker_thread+0x4e/0x3a0
             kthread+0x109/0x140
             ret_from_fork+0x2a/0x40
        unreferenced object 0xffff9f79b6136840 (size 416):
          comm "kworker/1:4", pid 15416, jiffies 4307432429 (age 28687.573s)
          hex dump (first 32 bytes):
            40 fb 80 c2 3e 33 00 00 00 00 00 40 00 00 00 00  @...>3.....@....
            00 00 00 00 00 00 00 00 10 00 00 00 10 00 00 00  ................
          backtrace:
             kmemleak_alloc+0x4a/0xa0
             kmem_cache_alloc+0x128/0x280
             create_cache+0x3b/0x1e0
             memcg_create_kmem_cache+0x118/0x160
             memcg_kmem_cache_create_func+0x20/0x110
             process_one_work+0x205/0x5d0
             worker_thread+0x4e/0x3a0
             kthread+0x109/0x140
             ret_from_fork+0x2a/0x40
      
      Fix the leak by adding the missing call to kobject_put() to
      sysfs_slab_remove_workfn().
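
      The reference imbalance and its fix can be sketched with a toy
      refcount in plain C.  Everything here (struct demo_kobj and the demo_*
      helpers) is hypothetical and only models the pattern described above;
      it is not the slub code.

```c
#include <assert.h>

/* Toy stand-in for a kobject: a reference count plus a "has sysfs entry" flag. */
struct demo_kobj {
    int refcount;
    int in_sysfs;
};

static void demo_get(struct demo_kobj *k) { k->refcount++; }
static void demo_put(struct demo_kobj *k) { k->refcount--; }

/* Scheduling the asynchronous removal pins the object for the work function. */
static void demo_schedule_remove(struct demo_kobj *k)
{
    demo_get(k);
}

/* Work function with the fix applied: the reference is dropped on every path. */
static void demo_remove_workfn(struct demo_kobj *k)
{
    if (k->in_sysfs)
        k->in_sysfs = 0;    /* delete the sysfs entries */

    /* Without this put on the "already deleted" path, the object leaks. */
    demo_put(k);
}

/* Returns the refcount left after a schedule/run cycle, starting from 1. */
static int demo_refs_after_remove(int in_sysfs)
{
    struct demo_kobj k = { 1, in_sysfs };

    demo_schedule_remove(&k);
    demo_remove_workfn(&k);
    return k.refcount;
}
```

      In this model the count returns to its starting value whether or not
      the sysfs entry was still present when the work function ran.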
      
      Link: http://lkml.kernel.org/r/20170812181134.25027-1-vdavydov.dev@gmail.com
      Fixes: 3b7b3140 ("slub: make sysfs file removal asynchronous")
      Signed-off-by: Vladimir Davydov <vdavydov.dev@gmail.com>
      Reported-by: Andrei Vagin <avagin@gmail.com>
      Tested-by: Andrei Vagin <avagin@gmail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <stable@vger.kernel.org>	[4.12.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f6ba4880
    • P
      mm: discard memblock data later · 3010f876
      Pavel Tatashin committed
      There is an existing use after free bug when deferred struct pages are
      enabled:
      
      memblock_add() allocates memory for the memory array if more than
      128 entries are needed.  See the comment in e820__memblock_setup():
      
        * The bootstrap memblock region count maximum is 128 entries
        * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
        * than that - so allow memblock resizing.
      
      This memblock memory is freed here:
              free_low_memory_core_early()
      
      We access the freed memblock.memory later in boot when deferred pages
      are initialized in this path:
      
              deferred_init_memmap()
                      for_each_mem_pfn_range()
                        __next_mem_pfn_range()
                          type = &memblock.memory;
      
      One possible explanation for why this use-after-free hasn't been hit
      before is that the limit of INIT_MEMBLOCK_REGIONS has never been
      exceeded, at least on systems where deferred struct pages were enabled.
      
      Tested by reducing INIT_MEMBLOCK_REGIONS down to 4 from the current 128,
      and verifying in qemu that this code is getting executed and that the
      freed pages are sane.
      
      Link: http://lkml.kernel.org/r/1502485554-318703-2-git-send-email-pasha.tatashin@oracle.com
      Fixes: 7e18adb4 ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
      Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
      Reviewed-by: Bob Picco <bob.picco@oracle.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3010f876
    • L
      test_kmod: fix description for -s -and -c parameters · 768dc4e4
      Luis R. Rodriguez committed
      The descriptions were reversed; correct this.
      
      Link: http://lkml.kernel.org/r/20170809234635.13443-4-mcgrof@kernel.org
      Fixes: 64b67120 ("test_sysctl: add generic script to expand on tests")
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Reported-by: Daniel Mentz <danielmentz@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matt Redfearn <matt.redfearn@imgetc.com>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      768dc4e4
    • L
      kmod: fix wait on recursive loop · 2ba293c9
      Luis R. Rodriguez committed
      Recursive loops with module loading were previously handled in kmod by
      restricting the number of modprobe calls to 50.  If that limit was
      breached, request_module() would return an error and a user would see
      the following in their kernel dmesg:
      
        request_module: runaway loop modprobe binfmt-464c
        Starting init:/sbin/init exists but couldn't execute it (error -8)
      
      This issue could happen, for instance, when a 64-bit kernel boots a
      32-bit userspace on some architectures and has no 32-bit binary format
      handlers.  This is visible, for instance, when a CONFIG_MODULES enabled
      64-bit MIPS kernel boots into an o32 root filesystem and the binfmt
      handler for o32 binaries is not built-in.
      
      After commit 6d7964a7 ("kmod: throttle kmod thread limit") we now
      don't have any visible signs of an error and the kernel just waits for
      the loop to end somehow.
      
      Although this *particular* recursive loop could also be addressed by
      doing a sanity check on search_binary_handler() and disallowing a
      modular binfmt to be required for modprobe, a generic solution for any
      recursive kernel kmod issues is still needed.
      
      This should catch these loops.  We can investigate each loop and address
      each one separately as they come in; this, however, puts a stopgap in
      place for them, as before.
      
      Link: http://lkml.kernel.org/r/20170809234635.13443-3-mcgrof@kernel.org
      Fixes: 6d7964a7 ("kmod: throttle kmod thread limit")
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Reported-by: Matt Redfearn <matt.redfearn@imgtec.com>
      Tested-by: Matt Redfearn <matt.redfearn@imgetc.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2ba293c9
    • L
      wait: add wait_event_killable_timeout() · 8ada9279
      Luis R. Rodriguez committed
      These are the few pending fixes I have queued up for v4.13-final.  One
      is a generic regression fix for recursive loops on kmod and the other
      one is a trivial print out correction.
      
      During the v4.13 development we assumed that recursive kmod loops were
      no longer possible.  Clearly that is not true.  The regression fix makes
      use of a new killable wait.  We use a killable wait to be paranoid in
      how signals might be sent to modprobe and only accept a proper SIGKILL.
      The signal will only be available to userspace to issue *iff* a thread
      has already entered a wait state, and that happens only if we've already
      throttled after 50 kmod threads have been hit.
      
      Note that although it may seem excessive to trigger a failure after 5
      seconds if all kmod threads remain busy, prior to the series of changes
      that went into v4.13 we would actually *always* fatally fail any request
      which came in if the limit was already reached.  The new waiting
      implemented in v4.13 actually gives us *more* breathing room -- the wait
      for 5 seconds is a wait for *any* kmod thread to finish.  We give up and
      fail *iff* no kmod thread has finished and they're *all* running
      straight for 5 consecutive seconds.  If 50 kmod threads are running
      consecutively for 5 seconds something else must be really bad.
      
      Recursive loops with kmod are bad, but they're also hard to implement
      properly as a selftest without fooling current userspace tools like
      kmod [1].  For instance, kmod will complain when you run depmod if it
      finds a recursive loop with symbol dependency between modules; as such,
      this type of recursive loop cannot go upstream, since the
      modules_install target will fail after running depmod.
      
      These tests already exist on userspace kmod upstream though (refer to
      the testsuite/module-playground/mod-loop-*.c files).  The same is not
      true if request_module() is used though, or worse, if aliases are used.
      
      Likewise the issue with 64-bit kernels booting 32-bit userspace without
      a binfmt handler built-in is also currently not detected and proactively
      avoided by userspace kmod tools, or kconfig for all architectures.
      Although we could complain in the kernel when some of these individual
      recursive issues creep up, proactively avoiding these situations in
      userspace at build time is what we should keep striving for.
      
      Lastly, since recursive loops could happen with kmod, recursive loops
      may also be possible with other kernel usermode helpers.  This should
      be investigated, and long term, if we can come up with a more sensible
      generic solution, even better!
      
      [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20170809-kmod-for-v4.13-final
      [1] https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git
      
      This patch (of 3):
      
      This wait is similar to wait_event_interruptible_timeout() but accepts
      only SIGKILL as an interrupting signal.  Other signals are ignored.
      
      Link: http://lkml.kernel.org/r/20170809234635.13443-2-mcgrof@kernel.org
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Matt Redfearn <matt.redfearn@imgetc.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ada9279
    • N
      kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog · 92e5aae4
      Nicholas Piggin committed
      Commit 05a4a952 ("kernel/watchdog: split up config options") lost
      the perf-based hardlockup detector's dependency on PERF_EVENTS, which
      can result in broken builds with some powerpc configurations.
      
      Restore the dependency.  Add it in for x86 too: although x86 always
      selects PERF_EVENTS, it seems reasonable to make the dependency
      explicit.
      
      Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com
      Fixes: 05a4a952 ("kernel/watchdog: split up config options")
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      92e5aae4
    • J
      mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback() · 739f79fc
      Johannes Weiner committed
      Jaegeuk and Brad report a NULL pointer crash when writeback ending tries
      to update the memcg stats:
      
          BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
          IP: test_clear_page_writeback+0x12e/0x2c0
          [...]
          RIP: 0010:test_clear_page_writeback+0x12e/0x2c0
          Call Trace:
           <IRQ>
           end_page_writeback+0x47/0x70
           f2fs_write_end_io+0x76/0x180 [f2fs]
           bio_endio+0x9f/0x120
           blk_update_request+0xa8/0x2f0
           scsi_end_request+0x39/0x1d0
           scsi_io_completion+0x211/0x690
           scsi_finish_command+0xd9/0x120
           scsi_softirq_done+0x127/0x150
           __blk_mq_complete_request_remote+0x13/0x20
           flush_smp_call_function_queue+0x56/0x110
           generic_smp_call_function_single_interrupt+0x13/0x30
           smp_call_function_single_interrupt+0x27/0x40
           call_function_single_interrupt+0x89/0x90
          RIP: 0010:native_safe_halt+0x6/0x10
      
          (gdb) l *(test_clear_page_writeback+0x12e)
          0xffffffff811bae3e is in test_clear_page_writeback (./include/linux/memcontrol.h:619).
          614		mod_node_page_state(page_pgdat(page), idx, val);
          615		if (mem_cgroup_disabled() || !page->mem_cgroup)
          616			return;
          617		mod_memcg_state(page->mem_cgroup, idx, val);
          618		pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
          619		this_cpu_add(pn->lruvec_stat->count[idx], val);
          620	}
          621
          622	unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
          623							gfp_t gfp_mask,
      
      The issue is that writeback doesn't hold a page reference and the page
      might get freed after PG_writeback is cleared (and the mapping is
      unlocked) in test_clear_page_writeback().  The stat functions looking up
      the page's node or zone are safe, as those attributes are static across
      allocation and free cycles.  But page->mem_cgroup is not, and it will
      get cleared if we race with truncation or migration.
      
      It appears this race window has been around for a while, but was less
      likely to trigger when the memcg stats were updated first thing after
      PG_writeback is cleared.  Recent changes reshuffled this code to update
      the global node stats before the memcg ones, though, stretching the race
      window out to an extent where people can reproduce the problem.
      
      Update test_clear_page_writeback() to look up and pin page->mem_cgroup
      before clearing PG_writeback, then not use that pointer afterward.  It
      is a partial revert of 62cccb8c ("mm: simplify lock_page_memcg()")
      but leaves the pageref-holding callsites that aren't affected alone.
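
      The ordering described above can be illustrated with a small userspace
      sketch. It is not the kernel code: struct memcg, struct page, memcg_put(),
      simulate_page_freed() and clear_writeback_pinned() are hypothetical
      stand-ins, the reference count is a simplified model of pinning, and the
      race is injected through a callback rather than a real concurrent free.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct memcg {
    int refs;            /* simplified reference count */
    long nr_writeback;   /* per-cgroup writeback counter */
    bool freed;          /* set when the last reference is dropped */
};

struct page {
    bool writeback;      /* models PG_writeback */
    struct memcg *memcg; /* models page->mem_cgroup */
};

static void memcg_put(struct memcg *cg)
{
    if (--cg->refs == 0)
        cg->freed = true;
}

/* Models a racing truncation or free: the page drops its cgroup association. */
static void simulate_page_freed(struct page *page)
{
    if (page->memcg) {
        memcg_put(page->memcg);
        page->memcg = NULL;
    }
}

/*
 * Look up and pin the cgroup first, clear the writeback flag, then update
 * the stats through the pinned local pointer only.
 * Returns true if the page was under writeback.
 */
static bool clear_writeback_pinned(struct page *page,
                                   void (*race_window)(struct page *))
{
    struct memcg *cg = page->memcg;  /* look up before clearing the flag */
    bool was_set;

    if (cg)
        cg->refs++;                  /* pin */

    was_set = page->writeback;
    page->writeback = false;         /* from here on the page may go away */

    if (race_window)
        race_window(page);           /* test hook: inject the race */

    if (was_set && cg)
        cg->nr_writeback--;          /* local pointer, never page->memcg */

    if (cg)
        memcg_put(cg);               /* unpin */
    return was_set;
}
```

      In this model, a variant that read page->memcg again after clearing the
      flag would dereference NULL once simulate_page_freed() has run, which
      mirrors the crash in the trace above.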
      
      Link: http://lkml.kernel.org/r/20170809183825.GA26387@cmpxchg.org
      Fixes: 62cccb8c ("mm: simplify lock_page_memcg()")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Tested-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Reported-by: Bradley Bolen <bradleybolen@gmail.com>
      Tested-by: Brad Bolen <bradleybolen@gmail.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: <stable@vger.kernel.org>	[4.6+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      739f79fc
    • L
      Merge tag 'xfs-4.13-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · cc28fcdc
      Linus Torvalds committed
      Pull xfs fixes from Darrick Wong:
       "A handful more bug fixes for you today.
      
        Changes since last time:
      
         - Don't leak resources when mount fails
      
         - Don't accidentally clobber variables when looking for free inodes"
      
      * tag 'xfs-4.13-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: don't leak quotacheck dquots when cow recovery
        xfs: clear MS_ACTIVE after finishing log recovery
        iomap: fix integer truncation issues in the zeroing and dirtying helpers
        xfs: fix inobt inode allocation search optimization
      cc28fcdc
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 70bfc741
      Linus Torvalds committed
      Pull block fixes from Jens Axboe:
       "A small set of fixes that should go into this release. This contains:
      
         - An NVMe pull request from Christoph, with a few select fixes.
      
           One of them fixes a polling regression in this series, in which it's
           trivial to cause the kernel to disable most of the hardware queue
           interrupts.
      
         - Fixup for a blk-mq queue usage imbalance on request allocation,
           from Keith.
      
         - A xen block pull request from Konrad, fixing two issues with
           xen/xen-blkfront"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL
        nvme-pci: set cqe_seen on polled completions
        nvme-fabrics: fix reporting of unrecognized options
        nvmet-fc: eliminate incorrect static markers on local variables
        nvmet-fc: correct use after free on list teardown
        nvmet: don't overwrite identify sn/fr with 0-bytes
        xen-blkfront: use a right index when checking requests
        xen: fix bio vec merging
        blk-mq: Fix queue usage on failed request allocation
      70bfc741
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · edb20a1b
      Linus Torvalds committed
      Pull rdma fixes from Doug Ledford:
       "Fourth set of -rc fixes for 4.13 cycle. This is all of the -rc fixes
        that we know of. I suspect this will be the last rc pull request, but
        you never know, I could be wrong.
      
        Nothing major here. There are the i40iw patches I mentioned in my last
        pull request minus one that I pulled out because it wasn't a fix and
        not appropriate for the rc cycle. Then a few other items trickled in
        and were added to the pull request. It's fairly small aside from those
        five i40iw patches.
      
         - Set of five i40iw fixes (the first of these is rather large by line
           count consideration, but I decided to send it because it fixes a
           legitimate issue and the line count is because it does so by
           creating a new function and using it where needed instead of just
           patching up a few lines...a smaller fix could probably be done, but
           the larger fix is the better code solution)
      
         - One vmw_pvrdma fix
      
         - One hns_roce fix (this silences a checker warning, but can't
           actually happen, I expect a patch to remove this from all drivers
           that share this same check in for-next)
      
         - One iw_cxgb4 fix
      
         - Two IB core fixes"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
        IB/uverbs: Fix NULL pointer dereference during device removal
        IB/core: Protect sysfs entry on ib_unregister_device
        iw_cxgb4: fix misuse of integer variable
        IB/hns: fix memory leak on ah on error return path
        i40iw: Fix potential fcn_id_array out of bounds
        i40iw: Use correct alignment for CQ0 memory
        i40iw: Fix typecast of tcp_seq_num
        i40iw: Correct variable names
        i40iw: Fix parsing of query/commit FPM buffers
        RDMA/vmw_pvrdma: Report CQ missed events
      edb20a1b
    • L
      Merge tag 'powerpc-4.13-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 039a8e38
      Linus Torvalds committed
      Pull powerpc fixes from Michael Ellerman:
       "A bug in the VSX register saving that could cause userspace FP/VMX
        register corruption.
      
        Never seen to happen (that we know of), was found by code inspection,
        but still tagged for stable given the consequences"
      
      * tag 'powerpc-4.13-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
      039a8e38
    • L
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 42833468
      Linus Torvalds committed
      Pull ARM SoC fixes from Arnd Bergmann:
       "A small number of bugfixes, nothing serious this time. Here is a full
        list.
      
        4.13 regression fix:
      
         - imx7d-sdb pinctrl support regressed in 4.13 due to an incomplete
           patch
      
        DT fixes for recently added devices:
      
         - badly copied DT entries on imx6qdl-nitrogen6_som broke PCI reset
      
         - sama5d2 memory controller had the wrong ID and registers
      
         - imx7 power domains did not work correctly with deferred probing
           (driver added in 4.12)
      
         - Allwinner H5 pinctrl (added in 4.12) did not work right with GPIO
           interrupts
      
        Fixes for older bugs that just got noticed:
      
         - i.MX25 ADC support (added in 4.6) apparently never worked right due
           to a missing 'ranges' property in DT.
      
         - Renesas Salvator Audio support (added in v4.5) was broken for
           device repeated bind/unbind due to a naming conflict.
      
         - Various allwinner boards are missing an 'ethernet' alias in DT,
           leading to unstable device naming.
      
        Preventive bugfix:
      
         - TI Keystone needs a fix to prevent a NULL pointer dereference with
           an upcoming PM change"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        soc: ti: ti_sci_pm_domains: Populate name for genpd
        ARM: dts: imx6qdl-nitrogen6_som2: fix PCIe reset
        arm64: allwinner: h5: fix pinctrl IRQs
        arm64: allwinner: a64: sopine: add missing ethernet0 alias
        arm64: allwinner: a64: pine64: add missing ethernet0 alias
        arm64: allwinner: a64: bananapi-m64: add missing ethernet0 alias
        arm64: renesas: salvator-common: avoid audio_clkout naming conflict
        ARM: dts: i.MX25: add ranges to tscadc
        soc: imx: gpcv2: fix regulator deferred probe
        ARM: dts: at91: sama5d2: fix EBI/NAND controllers declaration
        ARM: dts: at91: sama5d2: use sama5d2 compatible string for SMC
        ARM: dts: imx7d-sdb: Put pinctrl_spi4 in the correct location
      42833468
    • L
      Merge tag 'sound-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · cb247857
      Linus Torvalds committed
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes, mostly for regression fixes (sequencer
        kconfig and emu10k1 probe) and device-specific quirks (three for USB
        and one for HD-audio).
      
        One significant change is a fix for races in ALSA sequencer core,
        which covers over the previous incomplete fix"
      
      * tag 'sound-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: emu10k1: Fix forgotten user-copy conversion in init code
        ALSA: usb-audio: add DSD support for new Amanero PID
        ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices
        ALSA: usb-audio: Apply sample rate quirk to Sennheiser headset
        ALSA: seq: 2nd attempt at fixing race creating a queue
        ALSA: hda/realtek - Fix pincfg for Dell XPS 13 9370
        ALSA: seq: Fix CONFIG_SND_SEQ_MIDI dependency
      cb247857
    • L
      Merge tag 'dma-mapping-4.13-3' of git://git.infradead.org/users/hch/dma-mapping · 4478976a
      Linus Torvalds committed
      Pull dma-mapping fix from Christoph Hellwig:
       "Another dma-mapping regression fix"
      
      * tag 'dma-mapping-4.13-3' of git://git.infradead.org/users/hch/dma-mapping:
        of: fix DMA mask generation
      4478976a
  3. 18 Aug, 2017 17 commits