1. 15 5月, 2019 2 次提交
  2. 14 5月, 2019 5 次提交
    • L
      fs/writeback: Attach inode's wb to root if needed · 4e97e104
      luanshi 提交于
      There might have tons of files queued in the writeback, awaiting for
      writing back. Unfortunately, the writeback's cgroup has been dead. In
      this case, we reassociate the inode with another writeback, but we
      possibly can't because the writeback associated with the dead cgroup is
      the only valid one. In this case, the new writeback is allocated,
      initialized and associated with the inode in the non-stopping fashion
      until all data resident in the inode's page cache are flushed to disk.
      It causes unnecessary high system load.
      
      This fixes the issue by enforce moving the inode to root cgroup when the
      previous binding cgroup becomes dead. With it, no more unnecessary
      writebacks are created, populated and the system load decreased by about
      6x in the test case we carried out:
          Without the patch: 30% system load
          With the patch:    5%  system load
      Signed-off-by: Nluanshi <zhangliguang@linux.alibaba.com>
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      4e97e104
    • J
      fs/writeback: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount · e9631ab0
      Jiufei Xue 提交于
      synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb
      switch may not go to the workqueue after synchronize_rcu(). Thus
      previous scheduled switches was not finished even flushing the
      workqueue, which will cause a NULL pointer dereferenced followed below.
      
      VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds.  Have a nice day...
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
      [<ffffffff8126a303>] evict+0xb3/0x180
      [<ffffffff8126a760>] iput+0x1b0/0x230
      [<ffffffff8127c690>] inode_switch_wbs_work_fn+0x3c0/0x6a0
      [<ffffffff810a5b2e>] worker_thread+0x4e/0x490
      [<ffffffff810a5ae0>] ? process_one_work+0x410/0x410
      [<ffffffff810ac056>] kthread+0xe6/0x100
      [<ffffffff8173c199>] ret_from_fork+0x39/0x50
      
      Replace the synchronize_rcu() call with a rcu_barrier() to wait for all
      pending callbacks to finish. And inc isw_nr_in_flight after call_rcu()
      in inode_switch_wbs() to make more sense.
      Suggested-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      e9631ab0
    • J
      fs/writeback: fix double free of blkcg_css · 38a9ff8e
      Jiufei Xue 提交于
      We have gotten a WARNNING when releasing blkcg_css:
      
      [332489.681635] WARNING: CPU: 55 PID: 14859 at lib/list_debug.c:56 __list_del_entry+0x81/0xc0
      [332489.682191] list_del corruption, ffff883e6b94d450->prev is LIST_POISON2 (dead000000000200)
      ......
      [332489.683895] CPU: 55 PID: 14859 Comm: kworker/55:2 Tainted: G
      [332489.684477] Hardware name: Inspur SA5248M4/X10DRT-PS, BIOS 4.05A
      10/11/2016
      [332489.685061] Workqueue: cgroup_destroy css_release_work_fn
      [332489.685654]  ffffc9001d92bd28 ffffffff81380042 ffffc9001d92bd78
      0000000000000000
      [332489.686269]  ffffc9001d92bd68 ffffffff81088f8b 0000003800000000
      ffff883e6b94d4a0
      [332489.686867]  ffff883e6b94d400 ffffffff81ce8fe0 ffff88375b24f400
      ffff883e6b94d4a0
      [332489.687479] Call Trace:
      [332489.688078]  [<ffffffff81380042>] dump_stack+0x63/0x81
      [332489.688681]  [<ffffffff81088f8b>] __warn+0xcb/0xf0
      [332489.689276]  [<ffffffff8108900f>] warn_slowpath_fmt+0x5f/0x80
      [332489.689877]  [<ffffffff8139e7c1>] __list_del_entry+0x81/0xc0
      [332489.690481]  [<ffffffff81125552>] css_release_work_fn+0x42/0x140
      [332489.691090]  [<ffffffff810a2db9>] process_one_work+0x189/0x420
      [332489.691693]  [<ffffffff810a309e>] worker_thread+0x4e/0x4b0
      [332489.692293]  [<ffffffff810a3050>] ? process_one_work+0x420/0x420
      [332489.692905]  [<ffffffff810a9616>] kthread+0xe6/0x100
      [332489.693504]  [<ffffffff810a9530>] ? kthread_park+0x60/0x60
      [332489.694099]  [<ffffffff817184e1>] ret_from_fork+0x41/0x50
      [332489.694722] ---[ end trace 0cf869c4a5cfba87 ]---
      ......
      
      This is caused by calling css_get after the css is killed by another
      thread described below:
      
                 Thread 1                       Thread 2
      cgroup_rmdir
        -> kill_css
          -> percpu_ref_kill_and_confirm
            -> css_killed_ref_fn
      
      css_killed_work_fn
        -> css_put
          -> css_release
                                              wb_get_create
      					  -> find_blkcg_css
      					    -> css_get
      					  -> css_put
      					    -> css_release (double free)
          -> css_release_workfn
            -> css_free_work_fn
             -> blkcg_css_free
      
      When doublefree happened, it may free the memory still used by
      other threads and cause a kernel panic.
      
      Fix this by using css_tryget_online in find_blkcg_css while will return
      false if the css is killed.
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      38a9ff8e
    • J
      88c2a69e
    • J
      writeback: add memcg_blkcg_link tree · a79cf2e1
      Jiufei Xue 提交于
      Here we add a global radix tree to link memcg and blkcg that the user
      attach the tasks to when using cgroup v1, which is used for writeback
      cgroup.
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      a79cf2e1
  3. 07 5月, 2019 1 次提交
    • J
      ovl: check the capability before cred overridden · 9c3a26bf
      Jiufei Xue 提交于
      We found that it return success when we set IMMUTABLE_FL flag to a file in
      docker even though the docker didn't have the capability
      CAP_LINUX_IMMUTABLE.
      
      The commit d1d04ef8 ("ovl: stack file ops") and dab5ca8f ("ovl: add
      lsattr/chattr support") implemented chattr operations on a regular overlay
      file. ovl_real_ioctl() overridden the current process's subjective
      credentials with ofs->creator_cred which have the capability
      CAP_LINUX_IMMUTABLE so that it will return success in
      vfs_ioctl()->cap_capable().
      
      Fix this by checking the capability before cred overridden. And here we
      only care about APPEND_FL and IMMUTABLE_FL, so get these information from
      inode.
      
      [SzM: move check and call to underlying fs inside inode locked region to
      prevent two such calls from racing with each other]
      Signed-off-by: NJiufei Xue <jiufei.xue@linux.alibaba.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      9c3a26bf
  4. 02 5月, 2019 1 次提交
  5. 26 4月, 2019 18 次提交
  6. 20 4月, 2019 13 次提交
    • G
      Linux 4.19.36 · c98875d9
      Greg Kroah-Hartman 提交于
      c98875d9
    • A
      appletalk: Fix compile regression · 0c00f71e
      Arnd Bergmann 提交于
      [ Upstream commit 27da0d2ef998e222a876c0cec72aa7829a626266 ]
      
      A bugfix just broke compilation of appletalk when CONFIG_SYSCTL
      is disabled:
      
      In file included from net/appletalk/ddp.c:65:
      net/appletalk/ddp.c: In function 'atalk_init':
      include/linux/atalk.h:164:34: error: expected expression before 'do'
       #define atalk_register_sysctl()  do { } while(0)
                                        ^~
      net/appletalk/ddp.c:1934:7: note: in expansion of macro 'atalk_register_sysctl'
        rc = atalk_register_sysctl();
      
      This is easier to avoid by using conventional inline functions
      as stubs rather than macros. The header already has inline
      functions for other purposes, so I'm changing over all the
      macros for consistency.
      
      Fixes: 6377f787aeb9 ("appletalk: Fix use-after-free in atalk_proc_exit")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0c00f71e
    • K
      mm: hide incomplete nr_indirectly_reclaimable in sysfs · 9e91db59
      Konstantin Khlebnikov 提交于
      In upstream branch this fixed by commit b29940c1abd7 ("mm: rename and
      change semantics of nr_indirectly_reclaimable_bytes").
      
      This fixes /sys/devices/system/node/node*/vmstat format:
      
      ...
      nr_dirtied 6613155
      nr_written 5796802
       11089216
      ...
      
      Cc: <stable@vger.kernel.org> # 4.19.y
      Fixes: 7aaf7727 ("mm: don't show nr_indirectly_reclaimable in /proc/vmstat")
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9e91db59
    • R
      mm: hide incomplete nr_indirectly_reclaimable in /proc/zoneinfo · d49dea54
      Roman Gushchin 提交于
      [fixed differently upstream, this is a work-around to resolve it for 4.19.y]
      
      Yongqin reported that /proc/zoneinfo format is broken in 4.14
      due to commit 7aaf7727 ("mm: don't show nr_indirectly_reclaimable
      in /proc/vmstat")
      
      Node 0, zone      DMA
        per-node stats
            nr_inactive_anon 403
            nr_active_anon 89123
            nr_inactive_file 128887
            nr_active_file 47377
            nr_unevictable 2053
            nr_slab_reclaimable 7510
            nr_slab_unreclaimable 10775
            nr_isolated_anon 0
            nr_isolated_file 0
            <...>
            nr_vmscan_write 0
            nr_vmscan_immediate_reclaim 0
            nr_dirtied   6022
            nr_written   5985
                         74240
            ^^^^^^^^^^
        pages free     131656
      
      The problem is caused by the nr_indirectly_reclaimable counter,
      which is hidden from the /proc/vmstat, but not from the
      /proc/zoneinfo. Let's fix this inconsistency and hide the
      counter from /proc/zoneinfo exactly as from /proc/vmstat.
      
      BTW, in 4.19+ the counter has been renamed and exported by
      the commit b29940c1abd7 ("mm: rename and change semantics of
      nr_indirectly_reclaimable_bytes"), so there is no such a problem
      anymore.
      
      Cc: <stable@vger.kernel.org> # 4.14.x-4.18.x
      Fixes: 7aaf7727 ("mm: don't show nr_indirectly_reclaimable in /proc/vmstat")
      Reported-by: NYongqin Liu <yongqin.liu@linaro.org>
      Signed-off-by: NRoman Gushchin <guro@fb.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d49dea54
    • K
      IB/hfi1: Failed to drain send queue when QP is put into error state · 7a462679
      Kaike Wan 提交于
      commit 662d66466637862ef955f7f6e78a286d8cf0ebef upstream.
      
      When a QP is put into error state, all pending requests in the send work
      queue should be drained. The following sequence of events could lead to a
      failure, causing a request to hang:
      
      (1) The QP builds a packet and tries to send through SDMA engine.
          However, PIO engine is still busy. Consequently, this packet is put on
          the QP's tx list and the QP is put on the PIO waiting list. The field
          qp->s_flags is set with HFI1_S_WAIT_PIO_DRAIN;
      
      (2) The QP is put into error state by the user application and
          notify_error_qp() is called, which removes the QP from the PIO waiting
          list and the packet from the QP's tx list. In addition, qp->s_flags is
          cleared of RVT_S_ANY_WAIT_IO bits, which does not include
          HFI1_S_WAIT_PIO_DRAIN bit;
      
      (3) The hfi1_schdule_send() function is called to drain the QP's send
          queue. Subsequently, hfi1_do_send() is called. Since the flag bit
          HFI1_S_WAIT_PIO_DRAIN is set in qp->s_flags, hfi1_send_ok() fails.  As
          a result, hfi1_do_send() bails out without draining any request from
          the send queue;
      
      (4) The PIO engine completes the sending and tries to wake up any QP on
          its waiting list. But the QP has been removed from the PIO waiting
          list and therefore is kept in sleep forever.
      
      The fix is to clear qp->s_flags of HFI1_S_ANY_WAIT_IO bits in step (2).
      HFI1_S_ANY_WAIT_IO includes RVT_S_ANY_WAIT_IO and HFI1_S_WAIT_PIO_DRAIN.
      
      Fixes: 2e2ba09e ("IB/rdmavt, IB/hfi1: Create device dependent s_flags")
      Cc: <stable@vger.kernel.org> # 4.19.x+
      Reviewed-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: NAlex Estrin <alex.estrin@intel.com>
      Signed-off-by: NKaike Wan <kaike.wan@intel.com>
      Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      7a462679
    • D
      bpf: fix use after free in bpf_evict_inode · e8eef7ad
      Daniel Borkmann 提交于
      [ Upstream commit 1da6c4d9140cb7c13e87667dc4e1488d6c8fc10f ]
      
      syzkaller was able to generate the following UAF in bpf:
      
        BUG: KASAN: use-after-free in lookup_last fs/namei.c:2269 [inline]
        BUG: KASAN: use-after-free in path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318
        Read of size 1 at addr ffff8801c4865c47 by task syz-executor2/9423
      
        CPU: 0 PID: 9423 Comm: syz-executor2 Not tainted 4.20.0-rc1-next-20181109+
        #110
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
        Google 01/01/2011
        Call Trace:
          __dump_stack lib/dump_stack.c:77 [inline]
          dump_stack+0x244/0x39d lib/dump_stack.c:113
          print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
          kasan_report_error mm/kasan/report.c:354 [inline]
          kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
          __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
          lookup_last fs/namei.c:2269 [inline]
          path_lookupat.isra.43+0x9f8/0xc00 fs/namei.c:2318
          filename_lookup+0x26a/0x520 fs/namei.c:2348
          user_path_at_empty+0x40/0x50 fs/namei.c:2608
          user_path include/linux/namei.h:62 [inline]
          do_mount+0x180/0x1ff0 fs/namespace.c:2980
          ksys_mount+0x12d/0x140 fs/namespace.c:3258
          __do_sys_mount fs/namespace.c:3272 [inline]
          __se_sys_mount fs/namespace.c:3269 [inline]
          __x64_sys_mount+0xbe/0x150 fs/namespace.c:3269
          do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x457569
        Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
        48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
        ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
        RSP: 002b:00007fde6ed96c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
        RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000457569
        RDX: 0000000020000040 RSI: 0000000020000000 RDI: 0000000000000000
        RBP: 000000000072bf00 R08: 0000000020000340 R09: 0000000000000000
        R10: 0000000000200000 R11: 0000000000000246 R12: 00007fde6ed976d4
        R13: 00000000004c2c24 R14: 00000000004d4990 R15: 00000000ffffffff
      
        Allocated by task 9424:
          save_stack+0x43/0xd0 mm/kasan/kasan.c:448
          set_track mm/kasan/kasan.c:460 [inline]
          kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
          __do_kmalloc mm/slab.c:3722 [inline]
          __kmalloc_track_caller+0x157/0x760 mm/slab.c:3737
          kstrdup+0x39/0x70 mm/util.c:49
          bpf_symlink+0x26/0x140 kernel/bpf/inode.c:356
          vfs_symlink+0x37a/0x5d0 fs/namei.c:4127
          do_symlinkat+0x242/0x2d0 fs/namei.c:4154
          __do_sys_symlink fs/namei.c:4173 [inline]
          __se_sys_symlink fs/namei.c:4171 [inline]
          __x64_sys_symlink+0x59/0x80 fs/namei.c:4171
          do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
        Freed by task 9425:
          save_stack+0x43/0xd0 mm/kasan/kasan.c:448
          set_track mm/kasan/kasan.c:460 [inline]
          __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
          kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
          __cache_free mm/slab.c:3498 [inline]
          kfree+0xcf/0x230 mm/slab.c:3817
          bpf_evict_inode+0x11f/0x150 kernel/bpf/inode.c:565
          evict+0x4b9/0x980 fs/inode.c:558
          iput_final fs/inode.c:1550 [inline]
          iput+0x674/0xa90 fs/inode.c:1576
          do_unlinkat+0x733/0xa30 fs/namei.c:4069
          __do_sys_unlink fs/namei.c:4110 [inline]
          __se_sys_unlink fs/namei.c:4108 [inline]
          __x64_sys_unlink+0x42/0x50 fs/namei.c:4108
          do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
          entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      In this scenario path lookup under RCU is racing with the final
      unlink in case of symlinks. As Linus puts it in his analysis:
      
        [...] We actually RCU-delay the inode freeing itself, but
        when we do the final iput(), the "evict()" function is called
        synchronously. Now, the simple fix would seem to just RCU-delay
        the kfree() of the symlink data in bpf_evict_inode(). Maybe
        that's the right thing to do. [...]
      
      Al suggested to piggy-back on the ->destroy_inode() callback in
      order to implement RCU deferral there which can then kfree() the
      inode->i_link eventually right before putting inode back into
      inode cache. By reusing free_inode_nonrcu() from there we can
      avoid the need for our own inode cache and just reuse generic
      one as we currently do.
      
      And in-fact on top of all this we should just get rid of the
      bpf_evict_inode() entirely. This means truncate_inode_pages_final()
      and clear_inode() will then simply be called by the fs core via
      evict(). Dropping the reference should really only be done when
      inode is unhashed and nothing reachable anymore, so it's better
      also moved into the final ->destroy_inode() callback.
      
      Fixes: 0f98621b ("bpf, inode: add support for symlinks and fix mtime/ctime")
      Reported-by: syzbot+fb731ca573367b7f6564@syzkaller.appspotmail.com
      Reported-by: syzbot+a13e5ead792d6df37818@syzkaller.appspotmail.com
      Reported-by: syzbot+7a8ba368b47fdefca61e@syzkaller.appspotmail.com
      Suggested-by: NAl Viro <viro@zeniv.linux.org.uk>
      Analyzed-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Link: https://lore.kernel.org/lkml/0000000000006946d2057bbd0eef@google.com/T/Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
      e8eef7ad
    • P
      include/linux/swap.h: use offsetof() instead of custom __swapoffset macro · 40c6d718
      Pi-Hsun Shih 提交于
      [ Upstream commit a4046c06be50a4f01d435aa7fe57514818e6cc82 ]
      
      Use offsetof() to calculate offset of a field to take advantage of
      compiler built-in version when possible, and avoid UBSAN warning when
      compiling with Clang:
      
        UBSAN: Undefined behaviour in mm/swapfile.c:3010:38
        member access within null pointer of type 'union swap_header'
        CPU: 6 PID: 1833 Comm: swapon Tainted: G S                4.19.23 #43
        Call trace:
         dump_backtrace+0x0/0x194
         show_stack+0x20/0x2c
         __dump_stack+0x20/0x28
         dump_stack+0x70/0x94
         ubsan_epilogue+0x14/0x44
         ubsan_type_mismatch_common+0xf4/0xfc
         __ubsan_handle_type_mismatch_v1+0x34/0x54
         __se_sys_swapon+0x654/0x1084
         __arm64_sys_swapon+0x1c/0x24
         el0_svc_common+0xa8/0x150
         el0_svc_compat_handler+0x2c/0x38
         el0_svc_compat+0x8/0x18
      
      Link: http://lkml.kernel.org/r/20190312081902.223764-1-pihsun@chromium.orgSigned-off-by: NPi-Hsun Shih <pihsun@chromium.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      40c6d718
    • C
      f2fs: fix to dirty inode for i_mode recovery · 48b0309f
      Chao Yu 提交于
      [ Upstream commit ca597bddedd94906cd761d8be6a3ad21292725de ]
      
      As Seulbae Kim reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202637
      
      We didn't recover permission field correctly after sudden power-cut,
      the reason is in setattr we didn't add inode into global dirty list
      once i_mode is changed, so latter checkpoint triggered by fsync will
      not flush last i_mode into disk, result in this problem, fix it.
      Reported-by: NSeulbae Kim <seulbae@gatech.edu>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      48b0309f
    • D
      rxrpc: Fix client call connect/disconnect race · 11582064
      David Howells 提交于
      [ Upstream commit 930c9f9125c85b5134b3e711bc252ecc094708e3 ]
      
      rxrpc_disconnect_client_call() reads the call's connection ID protocol
      value (call->cid) as part of that function's variable declarations.  This
      is bad because it's not inside the locked section and so may race with
      someone granting use of the channel to the call.
      
      This manifests as an assertion failure (see below) where the call in the
      presumed channel (0 because call->cid wasn't set when we read it) doesn't
      match the call attached to the channel we were actually granted (if 1, 2 or
      3).
      
      Fix this by moving the read and dependent calculations inside of the
      channel_lock section.  Also, only set the channel number and pointer
      variables if cid is not zero (ie. unset).
      
      This problem can be induced by injecting an occasional error in
      rxrpc_wait_for_channel() before the call to schedule().
      
      Make two further changes also:
      
       (1) Add a trace for wait failure in rxrpc_connect_call().
      
       (2) Drop channel_lock before BUG'ing in the case of the assertion failure.
      
      The failure causes a trace akin to the following:
      
      rxrpc: Assertion failed - 18446612685268945920(0xffff8880beab8c00) == 18446612685268621312(0xffff8880bea69800) is false
      ------------[ cut here ]------------
      kernel BUG at net/rxrpc/conn_client.c:824!
      ...
      RIP: 0010:rxrpc_disconnect_client_call+0x2bf/0x99d
      ...
      Call Trace:
       rxrpc_connect_call+0x902/0x9b3
       ? wake_up_q+0x54/0x54
       rxrpc_new_client_call+0x3a0/0x751
       ? rxrpc_kernel_begin_call+0x141/0x1bc
       ? afs_alloc_call+0x1b5/0x1b5
       rxrpc_kernel_begin_call+0x141/0x1bc
       afs_make_call+0x20c/0x525
       ? afs_alloc_call+0x1b5/0x1b5
       ? __lock_is_held+0x40/0x71
       ? lockdep_init_map+0xaf/0x193
       ? lockdep_init_map+0xaf/0x193
       ? __lock_is_held+0x40/0x71
       ? yfs_fs_fetch_data+0x33b/0x34a
       yfs_fs_fetch_data+0x33b/0x34a
       afs_fetch_data+0xdc/0x3b7
       afs_read_dir+0x52d/0x97f
       afs_dir_iterate+0xa0/0x661
       ? iterate_dir+0x63/0x141
       iterate_dir+0xa2/0x141
       ksys_getdents64+0x9f/0x11b
       ? filldir+0x111/0x111
       ? do_syscall_64+0x3e/0x1a0
       __x64_sys_getdents64+0x16/0x19
       do_syscall_64+0x7d/0x1a0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 45025bce ("rxrpc: Improve management and caching of client connection objects")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      11582064
    • S
      lib/div64.c: off by one in shift · a7e90c18
      Stanislaw Gruszka 提交于
      [ Upstream commit cdc94a37493135e355dfc0b0e086d84e3eadb50d ]
      
      fls counts bits starting from 1 to 32 (returns 0 for zero argument).  If
      we add 1 we shift right one bit more and loose precision from divisor,
      what cause function incorect results with some numbers.
      
      Corrected code was tested in user-space, see bugzilla:
         https://bugzilla.kernel.org/show_bug.cgi?id=202391
      
      Link: http://lkml.kernel.org/r/1548686944-11891-1-git-send-email-sgruszka@redhat.com
      Fixes: 658716d1 ("div64_u64(): improve precision on 32bit platforms")
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      Reported-by: NSiarhei Volkau <lis8215@gmail.com>
      Tested-by: NSiarhei Volkau <lis8215@gmail.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      a7e90c18
    • Y
      appletalk: Fix use-after-free in atalk_proc_exit · 6c42507f
      YueHaibing 提交于
      [ Upstream commit 6377f787aeb945cae7abbb6474798de129e1f3ac ]
      
      KASAN report this:
      
      BUG: KASAN: use-after-free in pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
      Read of size 8 at addr ffff8881f41fe5b0 by task syz-executor.0/2806
      
      CPU: 0 PID: 2806 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xfa/0x1ce lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
       remove_proc_entry+0xe8/0x420 fs/proc/generic.c:667
       atalk_proc_exit+0x18/0x820 [appletalk]
       atalk_exit+0xf/0x5a [appletalk]
       __do_sys_delete_module kernel/module.c:1018 [inline]
       __se_sys_delete_module kernel/module.c:961 [inline]
       __x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fb2de6b9c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200001c0
      RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb2de6ba6bc
      R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff
      
      Allocated by task 2806:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496
       slab_post_alloc_hook mm/slab.h:444 [inline]
       slab_alloc_node mm/slub.c:2739 [inline]
       slab_alloc mm/slub.c:2747 [inline]
       kmem_cache_alloc+0xcf/0x250 mm/slub.c:2752
       kmem_cache_zalloc include/linux/slab.h:730 [inline]
       __proc_create+0x30f/0xa20 fs/proc/generic.c:408
       proc_mkdir_data+0x47/0x190 fs/proc/generic.c:469
       0xffffffffc10c01bb
       0xffffffffc10c0166
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 2806:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458
       slab_free_hook mm/slub.c:1409 [inline]
       slab_free_freelist_hook mm/slub.c:1436 [inline]
       slab_free mm/slub.c:2986 [inline]
       kmem_cache_free+0xa6/0x2a0 mm/slub.c:3002
       pde_put+0x6e/0x80 fs/proc/generic.c:647
       remove_proc_entry+0x1d3/0x420 fs/proc/generic.c:684
       0xffffffffc10c031c
       0xffffffffc10c0166
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881f41fe500
       which belongs to the cache proc_dir_entry of size 256
      The buggy address is located 176 bytes inside of
       256-byte region [ffff8881f41fe500, ffff8881f41fe600)
      The buggy address belongs to the page:
      page:ffffea0007d07f80 count:1 mapcount:0 mapping:ffff8881f6e69a00 index:0x0
      flags: 0x2fffc0000000200(slab)
      raw: 02fffc0000000200 dead000000000100 dead000000000200 ffff8881f6e69a00
      raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881f41fe480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8881f41fe500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8881f41fe580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
       ffff8881f41fe600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
       ffff8881f41fe680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      It should check the return value of atalk_proc_init fails,
      otherwise atalk_exit will trgger use-after-free in pde_subdir_find
      while unload the module.This patch fix error cleanup path of atalk_init
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6c42507f
    • K
      drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI) · 539282e9
      Kevin Wang 提交于
      [ Upstream commit cac734c2dbd2514f14c8c6a17caba1990d83bf1d ]
      
      if use the legacy method to allocate object, when mqd_hiq need to run
      uninit code, it will be cause WARNING call trace.
      
      eg: (s3 suspend test)
      [   34.918944] Call Trace:
      [   34.918948]  [<ffffffff92961dc1>] dump_stack+0x19/0x1b
      [   34.918950]  [<ffffffff92297648>] __warn+0xd8/0x100
      [   34.918951]  [<ffffffff9229778d>] warn_slowpath_null+0x1d/0x20
      [   34.918991]  [<ffffffffc03ce1fe>] uninit_mqd_hiq_sdma+0x4e/0x50 [amdgpu]
      [   34.919028]  [<ffffffffc03d0ef7>] uninitialize+0x37/0xe0 [amdgpu]
      [   34.919064]  [<ffffffffc03d15a6>] kernel_queue_uninit+0x16/0x30 [amdgpu]
      [   34.919086]  [<ffffffffc03d26c2>] pm_uninit+0x12/0x20 [amdgpu]
      [   34.919107]  [<ffffffffc03d4915>] stop_nocpsch+0x15/0x20 [amdgpu]
      [   34.919129]  [<ffffffffc03c1dce>] kgd2kfd_suspend.part.4+0x2e/0x50 [amdgpu]
      [   34.919150]  [<ffffffffc03c2667>] kgd2kfd_suspend+0x17/0x20 [amdgpu]
      [   34.919171]  [<ffffffffc03c103a>] amdgpu_amdkfd_suspend+0x1a/0x20 [amdgpu]
      [   34.919187]  [<ffffffffc02ec428>] amdgpu_device_suspend+0x88/0x3a0 [amdgpu]
      [   34.919189]  [<ffffffff922e22cf>] ? enqueue_entity+0x2ef/0xbe0
      [   34.919205]  [<ffffffffc02e8220>] amdgpu_pmops_suspend+0x20/0x30 [amdgpu]
      [   34.919207]  [<ffffffff925c56ff>] pci_pm_suspend+0x6f/0x150
      [   34.919208]  [<ffffffff925c5690>] ? pci_pm_freeze+0xf0/0xf0
      [   34.919210]  [<ffffffff926b45c6>] dpm_run_callback+0x46/0x90
      [   34.919212]  [<ffffffff926b49db>] __device_suspend+0xfb/0x2a0
      [   34.919213]  [<ffffffff926b4b9f>] async_suspend+0x1f/0xa0
      [   34.919214]  [<ffffffff922c918f>] async_run_entry_fn+0x3f/0x130
      [   34.919216]  [<ffffffff922b9d4f>] process_one_work+0x17f/0x440
      [   34.919217]  [<ffffffff922bade6>] worker_thread+0x126/0x3c0
      [   34.919218]  [<ffffffff922bacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
      [   34.919220]  [<ffffffff922c1c31>] kthread+0xd1/0xe0
      [   34.919221]  [<ffffffff922c1b60>] ? insert_kthread_work+0x40/0x40
      [   34.919222]  [<ffffffff92974c1d>] ret_from_fork_nospec_begin+0x7/0x21
      [   34.919224]  [<ffffffff922c1b60>] ? insert_kthread_work+0x40/0x40
      [   34.919224] ---[ end trace 38cd9f65c963adad ]---
      Signed-off-by: NKevin Wang <kevin1.wang@amd.com>
      Reviewed-by: NOak Zeng <Oak.Zeng@amd.com>
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      539282e9
    • Y
      ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t · 91583411
      Yang Shi 提交于
      [ Upstream commit 143c2a89e0e5fda6c6fd08d7bc1126438c19ae90 ]
      
      When running kprobe on -rt kernel, the below bug is caught:
      
      |BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:931
      |in_atomic(): 1, irqs_disabled(): 128, pid: 14, name: migration/0
      |Preemption disabled at:[<802f2b98>] cpu_stopper_thread+0xc0/0x140
      |CPU: 0 PID: 14 Comm: migration/0 Tainted: G O 4.8.3-rt2 #1
      |Hardware name: Freescale LS1021A
      |[<8025a43c>] (___might_sleep)
      |[<80b5b324>] (rt_spin_lock)
      |[<80b5c31c>] (__patch_text_real)
      |[<80b5c3ac>] (patch_text_stop_machine)
      |[<802f2920>] (multi_cpu_stop)
      
      Since patch_text_stop_machine() is called in stop_machine() which
      disables IRQ, sleepable lock should be not used in this atomic context,
       so replace patch_lock to raw lock.
      Signed-off-by: NYang Shi <yang.shi@linaro.org>
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      91583411