1. 19 8月, 2017 13 次提交
    • M
      mm, oom: fix potential data corruption when oom_reaper races with writer · 6b31d595
      Michal Hocko 提交于
      Wenwei Tao has noticed that our current assumption that the oom victim
      is dying and never doing any visible changes after it dies, and so the
      oom_reaper can tear it down, is not entirely true.
      
      __task_will_free_mem consider a task dying when SIGNAL_GROUP_EXIT is set
      but do_group_exit sends SIGKILL to all threads _after_ the flag is set.
      So there is a race window when some threads won't have
      fatal_signal_pending while the oom_reaper could start unmapping the
      address space.  Moreover some paths might not check for fatal signals
      before each PF/g-u-p/copy_from_user.
      
      We already have a protection for oom_reaper vs.  PF races by checking
      MMF_UNSTABLE.  This has been, however, checked only for kernel threads
      (use_mm users) which can outlive the oom victim.  A simple fix would be
      to extend the current check in handle_mm_fault for all tasks but that
      wouldn't be sufficient because the current check assumes that a kernel
      thread would bail out after EFAULT from get_user*/copy_from_user and
      never re-read the same address which would succeed because the PF path
      has established page tables already.  This seems to be the case for the
      only existing use_mm user currently (virtio driver) but it is rather
      fragile in general.
      
      This is even more fragile in general for more complex paths such as
      generic_perform_write which can re-read the same address more times
      (e.g.  iov_iter_copy_from_user_atomic to fail and then
      iov_iter_fault_in_readable on retry).
      
      Therefore we have to implement MMF_UNSTABLE protection in a robust way
      and never make a potentially corrupted content visible.  That requires
      to hook deeper into the PF path and check for the flag _every time_
      before a pte for anonymous memory is established (that means all
      !VM_SHARED mappings).
      
      The corruption can be triggered artificially
      (http://lkml.kernel.org/r/201708040646.v746kkhC024636@www262.sakura.ne.jp)
      but there doesn't seem to be any real life bug report.  The race window
      should be quite tight to trigger most of the time.
      
      Link: http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org
      Fixes: aac45363 ("mm, oom: introduce oom reaper")
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NWenwei Tao <wenwei.tww@alibaba-inc.com>
      Tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b31d595
    • M
      mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS · 5b53a6ea
      Michal Hocko 提交于
      Tetsuo Handa has noticed that MMF_UNSTABLE SIGBUS path in
      handle_mm_fault causes a lockdep splat
      
        Out of memory: Kill process 1056 (a.out) score 603 or sacrifice child
        Killed process 1056 (a.out) total-vm:4268108kB, anon-rss:2246048kB, file-rss:0kB, shmem-rss:0kB
        a.out (1169) used greatest stack depth: 11664 bytes left
        DEBUG_LOCKS_WARN_ON(depth <= 0)
        ------------[ cut here ]------------
        WARNING: CPU: 6 PID: 1339 at kernel/locking/lockdep.c:3617 lock_release+0x172/0x1e0
        CPU: 6 PID: 1339 Comm: a.out Not tainted 4.13.0-rc3-next-20170803+ #142
        Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
        RIP: 0010:lock_release+0x172/0x1e0
        Call Trace:
           up_read+0x1a/0x40
           __do_page_fault+0x28e/0x4c0
           do_page_fault+0x30/0x80
           page_fault+0x28/0x30
      
      The reason is that the page fault path might have dropped the mmap_sem
      and returned with VM_FAULT_RETRY.  MMF_UNSTABLE check however rewrites
      the error path to VM_FAULT_SIGBUS and we always expect mmap_sem taken in
      that path.  Fix this by taking mmap_sem when VM_FAULT_RETRY is held in
      the MMF_UNSTABLE path.
      
      We cannot simply add VM_FAULT_SIGBUS to the existing error code because
      all arch specific page fault handlers and g-u-p would have to learn a
      new error code combination.
      
      Link: http://lkml.kernel.org/r/20170807113839.16695-2-mhocko@kernel.org
      Fixes: 3f70dc38 ("mm: make sure that kthreads will not refault oom reaped memory")
      Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Wenwei Tao <wenwei.tww@alibaba-inc.com>
      Cc: <stable@vger.kernel.org>	[4.9+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b53a6ea
    • V
      slub: fix per memcg cache leak on css offline · f6ba4880
      Vladimir Davydov 提交于
      To avoid a possible deadlock, sysfs_slab_remove() schedules an
      asynchronous work to delete sysfs entries corresponding to the kmem
      cache.  To ensure the cache isn't freed before the work function is
      called, it takes a reference to the cache kobject.  The reference is
      supposed to be released by the work function.
      
      However, the work function (sysfs_slab_remove_workfn()) does nothing in
      case the cache sysfs entry has already been deleted, leaking the kobject
      and the corresponding cache.
      
      This may happen on a per memcg cache destruction, because sysfs entries
      of a per memcg cache are deleted on memcg offline if the cache is empty
      (see __kmemcg_cache_deactivate()).
      
      The kmemleak report looks like this:
      
        unreferenced object 0xffff9f798a79f540 (size 32):
          comm "kworker/1:4", pid 15416, jiffies 4307432429 (age 28687.554s)
          hex dump (first 32 bytes):
            6b 6d 61 6c 6c 6f 63 2d 31 36 28 31 35 39 39 3a  kmalloc-16(1599:
            6e 65 77 72 6f 6f 74 29 00 23 6b c0 ff ff ff ff  newroot).#k.....
          backtrace:
             kmemleak_alloc+0x4a/0xa0
             __kmalloc_track_caller+0x148/0x2c0
             kvasprintf+0x66/0xd0
             kasprintf+0x49/0x70
             memcg_create_kmem_cache+0xe6/0x160
             memcg_kmem_cache_create_func+0x20/0x110
             process_one_work+0x205/0x5d0
             worker_thread+0x4e/0x3a0
             kthread+0x109/0x140
             ret_from_fork+0x2a/0x40
        unreferenced object 0xffff9f79b6136840 (size 416):
          comm "kworker/1:4", pid 15416, jiffies 4307432429 (age 28687.573s)
          hex dump (first 32 bytes):
            40 fb 80 c2 3e 33 00 00 00 00 00 40 00 00 00 00  @...>3.....@....
            00 00 00 00 00 00 00 00 10 00 00 00 10 00 00 00  ................
          backtrace:
             kmemleak_alloc+0x4a/0xa0
             kmem_cache_alloc+0x128/0x280
             create_cache+0x3b/0x1e0
             memcg_create_kmem_cache+0x118/0x160
             memcg_kmem_cache_create_func+0x20/0x110
             process_one_work+0x205/0x5d0
             worker_thread+0x4e/0x3a0
             kthread+0x109/0x140
             ret_from_fork+0x2a/0x40
      
      Fix the leak by adding the missing call to kobject_put() to
      sysfs_slab_remove_workfn().
      
      Link: http://lkml.kernel.org/r/20170812181134.25027-1-vdavydov.dev@gmail.com
      Fixes: 3b7b3140 ("slub: make sysfs file removal asynchronous")
      Signed-off-by: NVladimir Davydov <vdavydov.dev@gmail.com>
      Reported-by: NAndrei Vagin <avagin@gmail.com>
      Tested-by: NAndrei Vagin <avagin@gmail.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: <stable@vger.kernel.org>	[4.12.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f6ba4880
    • P
      mm: discard memblock data later · 3010f876
      Pavel Tatashin 提交于
      There is existing use after free bug when deferred struct pages are
      enabled:
      
      The memblock_add() allocates memory for the memory array if more than
      128 entries are needed.  See comment in e820__memblock_setup():
      
        * The bootstrap memblock region count maximum is 128 entries
        * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
        * than that - so allow memblock resizing.
      
      This memblock memory is freed here:
              free_low_memory_core_early()
      
      We access the freed memblock.memory later in boot when deferred pages
      are initialized in this path:
      
              deferred_init_memmap()
                      for_each_mem_pfn_range()
                        __next_mem_pfn_range()
                          type = &memblock.memory;
      
      One possible explanation for why this use-after-free hasn't been hit
      before is that the limit of INIT_MEMBLOCK_REGIONS has never been
      exceeded at least on systems where deferred struct pages were enabled.
      
      Tested by reducing INIT_MEMBLOCK_REGIONS down to 4 from the current 128,
      and verifying in qemu that this code is getting excuted and that the
      freed pages are sane.
      
      Link: http://lkml.kernel.org/r/1502485554-318703-2-git-send-email-pasha.tatashin@oracle.com
      Fixes: 7e18adb4 ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
      Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
      Reviewed-by: NSteven Sistare <steven.sistare@oracle.com>
      Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
      Reviewed-by: NBob Picco <bob.picco@oracle.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3010f876
    • L
      test_kmod: fix description for -s -and -c parameters · 768dc4e4
      Luis R. Rodriguez 提交于
      The descriptions were reversed, correct this.
      
      Link: http://lkml.kernel.org/r/20170809234635.13443-4-mcgrof@kernel.org
      Fixes: 64b67120 ("test_sysctl: add generic script to expand on tests")
      Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Reported-by: NDaniel Mentz <danielmentz@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matt Redfearn <matt.redfearn@imgetc.com>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      768dc4e4
    • L
      kmod: fix wait on recursive loop · 2ba293c9
      Luis R. Rodriguez 提交于
      Recursive loops with module loading were previously handled in kmod by
      restricting the number of modprobe calls to 50 and if that limit was
      breached request_module() would return an error and a user would see the
      following on their kernel dmesg:
      
        request_module: runaway loop modprobe binfmt-464c
        Starting init:/sbin/init exists but couldn't execute it (error -8)
      
      This issue could happen for instance when a 64-bit kernel boots a 32-bit
      userspace on some architectures and has no 32-bit binary format
      hanlders.  This is visible, for instance, when a CONFIG_MODULES enabled
      64-bit MIPS kernel boots a into o32 root filesystem and the binfmt
      handler for o32 binaries is not built-in.
      
      After commit 6d7964a7 ("kmod: throttle kmod thread limit") we now
      don't have any visible signs of an error and the kernel just waits for
      the loop to end somehow.
      
      Although this *particular* recursive loop could also be addressed by
      doing a sanity check on search_binary_handler() and disallowing a
      modular binfmt to be required for modprobe, a generic solution for any
      recursive kernel kmod issues is still needed.
      
      This should catch these loops.  We can investigate each loop and address
      each one separately as they come in, this however puts a stop gap for
      them as before.
      
      Link: http://lkml.kernel.org/r/20170809234635.13443-3-mcgrof@kernel.org
      Fixes: 6d7964a7 ("kmod: throttle kmod thread limit")
      Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Reported-by: NMatt Redfearn <matt.redfearn@imgtec.com>
      Tested-by: NMatt Redfearn <matt.redfearn@imgetc.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2ba293c9
    • L
      wait: add wait_event_killable_timeout() · 8ada9279
      Luis R. Rodriguez 提交于
      These are the few pending fixes I have queued up for v4.13-final.  One
      is a a generic regression fix for recursive loops on kmod and the other
      one is a trivial print out correction.
      
      During the v4.13 development we assumed that recursive kmod loops were
      no longer possible.  Clearly that is not true.  The regression fix makes
      use of a new killable wait.  We use a killable wait to be paranoid in
      how signals might be sent to modprobe and only accept a proper SIGKILL.
      The signal will only be available to userspace to issue *iff* a thread
      has already entered a wait state, and that happens only if we've already
      throttled after 50 kmod threads have been hit.
      
      Note that although it may seem excessive to trigger a failure afer 5
      seconds if all kmod thread remain busy, prior to the series of changes
      that went into v4.13 we would actually *always* fatally fail any request
      which came in if the limit was already reached.  The new waiting
      implemented in v4.13 actually gives us *more* breathing room -- the wait
      for 5 seconds is a wait for *any* kmod thread to finish.  We give up and
      fail *iff* no kmod thread has finished and they're *all* running
      straight for 5 consecutive seconds.  If 50 kmod threads are running
      consecutively for 5 seconds something else must be really bad.
      
      Recursive loops with kmod are bad but they're also hard to implement
      properly as a selftest without currently fooling current userspace tools
      like kmod [1].  For instance kmod will complain when you run depmod if
      it finds a recursive loop with symbol dependency between modules as such
      this type of recursive loop cannot go upstream as the modules_install
      target will fail after running depmod.
      
      These tests already exist on userspace kmod upstream though (refer to
      the testsuite/module-playground/mod-loop-*.c files).  The same is not
      true if request_module() is used though, or worst if aliases are used.
      
      Likewise the issue with 64-bit kernels booting 32-bit userspace without
      a binfmt handler built-in is also currently not detected and proactively
      avoided by userspace kmod tools, or kconfig for all architectures.
      Although we could complain in the kernel when some of these individual
      recursive issues creep up, proactively avoiding these situations in
      userspace at build time is what we should keep striving for.
      
      Lastly, since recursive loops could happen with kmod it may mean
      recursive loops may also be possible with other kernel usermode helpers,
      this should be investigated and long term if we can come up with a more
      sensible generic solution even better!
      
      [0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20170809-kmod-for-v4.13-final
      [1] https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git
      
      This patch (of 3):
      
      This wait is similar to wait_event_interruptible_timeout() but only
      accepts SIGKILL interrupt signal.  Other signals are ignored.
      
      Link: http://lkml.kernel.org/r/20170809234635.13443-2-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Jessica Yu <jeyu@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Daniel Mentz <danielmentz@google.com>
      Cc: David Binderman <dcb314@hotmail.com>
      Cc: Matt Redfearn <matt.redfearn@imgetc.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ada9279
    • N
      kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog · 92e5aae4
      Nicholas Piggin 提交于
      Commit 05a4a952 ("kernel/watchdog: split up config options") lost
      the perf-based hardlockup detector's dependency on PERF_EVENTS, which
      can result in broken builds with some powerpc configurations.
      
      Restore the dependency.  Add it in for x86 too, despite x86 always
      selecting PERF_EVENTS it seems reasonable to make the dependency
      explicit.
      
      Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com
      Fixes: 05a4a952 ("kernel/watchdog: split up config options")
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Acked-by: NDon Zickus <dzickus@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      92e5aae4
    • J
      mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback() · 739f79fc
      Johannes Weiner 提交于
      Jaegeuk and Brad report a NULL pointer crash when writeback ending tries
      to update the memcg stats:
      
          BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
          IP: test_clear_page_writeback+0x12e/0x2c0
          [...]
          RIP: 0010:test_clear_page_writeback+0x12e/0x2c0
          Call Trace:
           <IRQ>
           end_page_writeback+0x47/0x70
           f2fs_write_end_io+0x76/0x180 [f2fs]
           bio_endio+0x9f/0x120
           blk_update_request+0xa8/0x2f0
           scsi_end_request+0x39/0x1d0
           scsi_io_completion+0x211/0x690
           scsi_finish_command+0xd9/0x120
           scsi_softirq_done+0x127/0x150
           __blk_mq_complete_request_remote+0x13/0x20
           flush_smp_call_function_queue+0x56/0x110
           generic_smp_call_function_single_interrupt+0x13/0x30
           smp_call_function_single_interrupt+0x27/0x40
           call_function_single_interrupt+0x89/0x90
          RIP: 0010:native_safe_halt+0x6/0x10
      
          (gdb) l *(test_clear_page_writeback+0x12e)
          0xffffffff811bae3e is in test_clear_page_writeback (./include/linux/memcontrol.h:619).
          614		mod_node_page_state(page_pgdat(page), idx, val);
          615		if (mem_cgroup_disabled() || !page->mem_cgroup)
          616			return;
          617		mod_memcg_state(page->mem_cgroup, idx, val);
          618		pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
          619		this_cpu_add(pn->lruvec_stat->count[idx], val);
          620	}
          621
          622	unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
          623							gfp_t gfp_mask,
      
      The issue is that writeback doesn't hold a page reference and the page
      might get freed after PG_writeback is cleared (and the mapping is
      unlocked) in test_clear_page_writeback().  The stat functions looking up
      the page's node or zone are safe, as those attributes are static across
      allocation and free cycles.  But page->mem_cgroup is not, and it will
      get cleared if we race with truncation or migration.
      
      It appears this race window has been around for a while, but less likely
      to trigger when the memcg stats were updated first thing after
      PG_writeback is cleared.  Recent changes reshuffled this code to update
      the global node stats before the memcg ones, though, stretching the race
      window out to an extent where people can reproduce the problem.
      
      Update test_clear_page_writeback() to look up and pin page->mem_cgroup
      before clearing PG_writeback, then not use that pointer afterward.  It
      is a partial revert of 62cccb8c ("mm: simplify lock_page_memcg()")
      but leaves the pageref-holding callsites that aren't affected alone.
      
      Link: http://lkml.kernel.org/r/20170809183825.GA26387@cmpxchg.org
      Fixes: 62cccb8c ("mm: simplify lock_page_memcg()")
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Tested-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Reported-by: NBradley Bolen <bradleybolen@gmail.com>
      Tested-by: NBrad Bolen <bradleybolen@gmail.com>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: <stable@vger.kernel.org>	[4.6+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      739f79fc
    • L
      Merge tag 'powerpc-4.13-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 039a8e38
      Linus Torvalds 提交于
      Pull powerpc fixes from Michael Ellerman:
       "A bug in the VSX register saving that could cause userspace FP/VMX
        register corruption.
      
        Never seen to happen (that we know of), was found by code inspection,
        but still tagged for stable given the consequences"
      
      * tag 'powerpc-4.13-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
      039a8e38
    • L
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 42833468
      Linus Torvalds 提交于
      Pull ARM SoC fixes from Arnd Bergmann:
       "A small number of bugfixes, nothing serious this time. Here is a full
        list.
      
        4.13 regression fix:
      
         - imx7d-sdb pinctrl support regressed in 4.13 due to an incomplete
           patch
      
        DT fixes for recently added devices:
      
         - badly copied DT entries on imx6qdl-nitrogen6_som broke PCI reset
      
         - sama5d2 memory controller had the wrong ID and registers
      
         - imx7 power domains did not work correctly with deferred probing
           (driver added in 4.12)
      
         - Allwinner H5 pinctrl (added in 4.12) did not work right with GPIO
           interrupts
      
        Fixes for older bugs that just got noticed:
      
         - i.MX25 ADC support (added in 4.6) apparently never worked right due
           to a missing 'ranges' property in DT.
      
         - Renesas Salvador Audio support (added in v4.5) was broken for
           device repeated bind/unbind due to a naming conflict.
      
         - Various allwinner boards are missing an 'ethernet' alias in DT,
           leading to unstable device naming.
      
        Preventive bugfix:
      
         - TI Keystone needs a fix to prevent a NULL pointer dereference with
           an upcoming PM change"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        soc: ti: ti_sci_pm_domains: Populate name for genpd
        ARM: dts: imx6qdl-nitrogen6_som2: fix PCIe reset
        arm64: allwinner: h5: fix pinctrl IRQs
        arm64: allwinner: a64: sopine: add missing ethernet0 alias
        arm64: allwinner: a64: pine64: add missing ethernet0 alias
        arm64: allwinner: a64: bananapi-m64: add missing ethernet0 alias
        arm64: renesas: salvator-common: avoid audio_clkout naming conflict
        ARM: dts: i.MX25: add ranges to tscadc
        soc: imx: gpcv2: fix regulator deferred probe
        ARM: dts: at91: sama5d2: fix EBI/NAND controllers declaration
        ARM: dts: at91: sama5d2: use sama5d2 compatible string for SMC
        ARM: dts: imx7d-sdb: Put pinctrl_spi4 in the correct location
      42833468
    • L
      Merge tag 'sound-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · cb247857
      Linus Torvalds 提交于
      Pull sound fixes from Takashi Iwai:
       "A collection of small fixes, mostly for regression fixes (sequencer
        kconfig and emu10k1 probe) and device-specific quirks (three for USB
        and one for HD-audio).
      
        One significant change is a fix for races in ALSA sequencer core,
        which covers over the previous incomplete fix"
      
      * tag 'sound-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: emu10k1: Fix forgotten user-copy conversion in init code
        ALSA: usb-audio: add DSD support for new Amanero PID
        ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices
        ALSA: usb-audio: Apply sample rate quirk to Sennheiser headset
        ALSA: seq: 2nd attempt at fixing race creating a queue
        ALSA: hda/realtek - Fix pincfg for Dell XPS 13 9370
        ALSA: seq: Fix CONFIG_SND_SEQ_MIDI dependency
      cb247857
    • L
      Merge tag 'dma-mapping-4.13-3' of git://git.infradead.org/users/hch/dma-mapping · 4478976a
      Linus Torvalds 提交于
      Pull dma-mapping fix from Christoph Hellwig:
       "Another dma-mapping regression fix"
      
      * tag 'dma-mapping-4.13-3' of git://git.infradead.org/users/hch/dma-mapping:
        of: fix DMA mask generation
      4478976a
  2. 18 8月, 2017 15 次提交
  3. 17 8月, 2017 6 次提交
  4. 16 8月, 2017 6 次提交
    • B
      powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC · 5a69aec9
      Benjamin Herrenschmidt 提交于
      VSX uses a combination of the old vector registers, the old FP
      registers and new "second halves" of the FP registers.
      
      Thus when we need to see the VSX state in the thread struct
      (flush_vsx_to_thread()) or when we'll use the VSX in the kernel
      (enable_kernel_vsx()) we need to ensure they are all flushed into
      the thread struct if either of them is individually enabled.
      
      Unfortunately we only tested if the whole VSX was enabled, not if they
      were individually enabled.
      
      Fixes: 72cd7b44 ("powerpc: Uncomment and make enable_kernel_vsx() routine available")
      Cc: stable@vger.kernel.org # v4.3+
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      5a69aec9
    • T
      parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo · 40981160
      Thomas Bogendoerfer 提交于
      For 64bit kernels the lmmio_space_offset of the host bridge window
      isn't set correctly on systems with dino/cujo PCI host bridges.
      This leads to not assigned memory bars and failing drivers, which
      need to use these bars.
      Signed-off-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: <stable@vger.kernel.org>
      Acked-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      40981160
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 510c8a89
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Fix TCP checksum offload handling in iwlwifi driver, from Emmanuel
          Grumbach.
      
       2) In ksz DSA tagging code, free SKB if skb_put_padto() fails. From
          Vivien Didelot.
      
       3) Fix two regressions with bonding on wireless, from Andreas Born.
      
       4) Fix build when busypoll is disabled, from Daniel Borkmann.
      
       5) Fix copy_linear_skb() wrt. SO_PEEK_OFF, from Eric Dumazet.
      
       6) Set SKB cached route properly in inet_rtm_getroute(), from Florian
          Westphal.
      
       7) Fix PCI-E relaxed ordering handling in cxgb4 driver, from Ding
          Tianhong.
      
       8) Fix module refcnt leak in ULP code, from Sabrina Dubroca.
      
       9) Fix use of GFP_KERNEL in atomic contexts in AF_KEY code, from Eric
          Dumazet.
      
      10) Need to purge socket write queue in dccp_destroy_sock(), also from
          Eric Dumazet.
      
      11) Make bpf_trace_printk() work properly on 32-bit architectures, from
          Daniel Borkmann.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
        bpf: fix bpf_trace_printk on 32 bit archs
        PCI: fix oops when try to find Root Port for a PCI device
        sfc: don't try and read ef10 data on non-ef10 NIC
        net_sched: remove warning from qdisc_hash_add
        net_sched/sfq: update hierarchical backlog when drop packet
        net_sched: reset pointers to tcf blocks in classful qdiscs' destructors
        ipv4: fix NULL dereference in free_fib_info_rcu()
        net: Fix a typo in comment about sock flags.
        ipv6: fix NULL dereference in ip6_route_dev_notify()
        tcp: fix possible deadlock in TCP stack vs BPF filter
        dccp: purge write queue in dccp_destroy_sock()
        udp: fix linear skb reception with PEEK_OFF
        ipv6: release rt6->rt6i_idev properly during ifdown
        af_key: do not use GFP_KERNEL in atomic contexts
        tcp: ulp: avoid module refcnt leak in tcp_set_ulp
        net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
        net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
        PCI: Disable Relaxed Ordering Attributes for AMD A1100
        PCI: Disable Relaxed Ordering for some Intel processors
        PCI: Disable PCIe Relaxed Ordering if unsupported
        ...
      510c8a89
    • D
      bpf: fix bpf_trace_printk on 32 bit archs · 88a5c690
      Daniel Borkmann 提交于
      James reported that on MIPS32 bpf_trace_printk() is currently
      broken while MIPS64 works fine:
      
        bpf_trace_printk() uses conditional operators to attempt to
        pass different types to __trace_printk() depending on the
        format operators. This doesn't work as intended on 32-bit
        architectures where u32 and long are passed differently to
        u64, since the result of C conditional operators follows the
        "usual arithmetic conversions" rules, such that the values
        passed to __trace_printk() will always be u64 [causing issues
        later in the va_list handling for vscnprintf()].
      
        For example the samples/bpf/tracex5 test printed lines like
        below on MIPS32, where the fd and buf have come from the u64
        fd argument, and the size from the buf argument:
      
          [...] 1180.941542: 0x00000001: write(fd=1, buf=  (null), size=6258688)
      
        Instead of this:
      
          [...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512)
      
      One way to get it working is to expand various combinations
      of argument types into 8 different combinations for 32 bit
      and 64 bit kernels. Fix tested by James on MIPS32 and MIPS64
      as well that it resolves the issue.
      
      Fixes: 9c959c86 ("tracing: Allow BPF programs to call bpf_trace_printk()")
      Reported-by: NJames Hogan <james.hogan@imgtec.com>
      Tested-by: NJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88a5c690
    • D
      PCI: fix oops when try to find Root Port for a PCI device · 0e405232
      dingtianhong 提交于
      Eric report a oops when booting the system after applying
      the commit a99b646a ("PCI: Disable PCIe Relaxed..."):
      
      [    4.241029] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      [    4.247001] IP: pci_find_pcie_root_port+0x62/0x80
      [    4.253011] PGD 0
      [    4.253011] P4D 0
      [    4.253011]
      [    4.258013] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [    4.262015] Modules linked in:
      [    4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
      [    4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
      [    4.279002] task: ffffa2ee38cfa040 task.stack: ffffa51ec0004000
      [    4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
      [    4.290012] RSP: 0000:ffffa51ec0007ab8 EFLAGS: 00010246
      [    4.295003] RAX: 0000000000000000 RBX: ffffa2ee36bae000 RCX: 0000000000000006
      [    4.303002] RDX: 000000000000081c RSI: ffffa2ee38cfa8c8 RDI: ffffa2ee36bae000
      [    4.310013] RBP: ffffa51ec0007b58 R08: 0000000000000001 R09: 0000000000000000
      [    4.317001] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa51ec0007ad0
      [    4.324005] R13: ffffa2ee36bae098 R14: 0000000000000002 R15: ffffa2ee37204818
      [    4.331002] FS:  0000000000000000(0000) GS:ffffa2ee3fcc0000(0000) knlGS:0000000000000000
      [    4.339002] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    4.345001] CR2: 0000000000000050 CR3: 000000401000f000 CR4: 00000000001406e0
      [    4.351002] Call Trace:
      [    4.354012]  ? pci_configure_device+0x19f/0x570
      [    4.359002]  ? pci_conf1_read+0xb8/0xf0
      [    4.363002]  ? raw_pci_read+0x23/0x40
      [    4.366011]  ? pci_read+0x2c/0x30
      [    4.370014]  ? pci_read_config_word+0x67/0x70
      [    4.374012]  pci_device_add+0x28/0x230
      [    4.378012]  ? pci_vpd_f0_read+0x50/0x80
      [    4.382014]  pci_scan_single_device+0x96/0xc0
      [    4.386012]  pci_scan_slot+0x79/0xf0
      [    4.389001]  pci_scan_child_bus+0x31/0x180
      [    4.394014]  acpi_pci_root_create+0x1c6/0x240
      [    4.398013]  pci_acpi_scan_root+0x15f/0x1b0
      [    4.402012]  acpi_pci_root_add+0x2e6/0x400
      [    4.406012]  ? acpi_evaluate_integer+0x37/0x60
      [    4.411002]  acpi_bus_attach+0xdf/0x200
      [    4.415002]  acpi_bus_attach+0x6a/0x200
      [    4.418014]  acpi_bus_attach+0x6a/0x200
      [    4.422013]  acpi_bus_scan+0x38/0x70
      [    4.426011]  acpi_scan_init+0x10c/0x271
      [    4.429001]  acpi_init+0x2fa/0x348
      [    4.433004]  ? acpi_sleep_proc_init+0x2d/0x2d
      [    4.437001]  do_one_initcall+0x43/0x169
      [    4.441001]  kernel_init_freeable+0x1d0/0x258
      [    4.445003]  ? rest_init+0xe0/0xe0
      [    4.449001]  kernel_init+0xe/0x150
      
      ====================== cut here =============================
      
      It looks like the pci_find_pcie_root_port() was trying to
      find the Root Port for the PCI device which is the Root
      Port already, it will return NULL and trigger the problem,
      so check the highest_pcie_bridge to fix thie problem.
      
      Fixes: a99b646a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
      Fixes: c56d4450 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e405232
    • B
      sfc: don't try and read ef10 data on non-ef10 NIC · 61deee96
      Bert Kenward 提交于
      The MAC stats command takes a port ID, which doesn't exist on
      pre-ef10 NICs (5000- and 6000- series). This is extracted from the
      NIC specific data; we misinterpret this as the ef10 data structure,
      causing us to read potentially unallocated data. With a KASAN kernel
      this can cause errors with:
         BUG: KASAN: slab-out-of-bounds in efx_mcdi_mac_stats
      
      Fixes: 0a2ab4d9 ("sfc: set the port-id when calling MC_CMD_MAC_STATS")
      Reported-by: NStefano Brivio <sbrivio@redhat.com>
      Tested-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NBert Kenward <bkenward@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61deee96