1. 06 11月, 2019 17 次提交
  2. 13 10月, 2019 2 次提交
    • S
      tracing: Do not create tracefs files if tracefs lockdown is in effect · bf8e6021
      Steven Rostedt (VMware) 提交于
      If on boot up, lockdown is activated for tracefs, don't even bother creating
      the files. This can also prevent instances from being created if lockdown is
      in effect.
      
      Link: http://lkml.kernel.org/r/CAHk-=whC6Ji=fWnjh2+eS4b15TnbsS4VPVtvBOwCy1jjEG_JHQ@mail.gmail.comSuggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      bf8e6021
    • S
      tracefs: Revert ccbd54ff ("tracefs: Restrict tracefs when the kernel is locked down") · 3ed270b1
      Steven Rostedt (VMware) 提交于
      Running the latest kernel through my "make instances" stress tests, I
      triggered the following bug (with KASAN and kmemleak enabled):
      
      mkdir invoked oom-killer:
      gfp_mask=0x40cd0(GFP_KERNEL|__GFP_COMP|__GFP_RECLAIMABLE), order=0,
      oom_score_adj=0
      CPU: 1 PID: 2229 Comm: mkdir Not tainted 5.4.0-rc2-test #325
      Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
      Call Trace:
       dump_stack+0x64/0x8c
       dump_header+0x43/0x3b7
       ? trace_hardirqs_on+0x48/0x4a
       oom_kill_process+0x68/0x2d5
       out_of_memory+0x2aa/0x2d0
       __alloc_pages_nodemask+0x96d/0xb67
       __alloc_pages_node+0x19/0x1e
       alloc_slab_page+0x17/0x45
       new_slab+0xd0/0x234
       ___slab_alloc.constprop.86+0x18f/0x336
       ? alloc_inode+0x2c/0x74
       ? irq_trace+0x12/0x1e
       ? tracer_hardirqs_off+0x1d/0xd7
       ? __slab_alloc.constprop.85+0x21/0x53
       __slab_alloc.constprop.85+0x31/0x53
       ? __slab_alloc.constprop.85+0x31/0x53
       ? alloc_inode+0x2c/0x74
       kmem_cache_alloc+0x50/0x179
       ? alloc_inode+0x2c/0x74
       alloc_inode+0x2c/0x74
       new_inode_pseudo+0xf/0x48
       new_inode+0x15/0x25
       tracefs_get_inode+0x23/0x7c
       ? lookup_one_len+0x54/0x6c
       tracefs_create_file+0x53/0x11d
       trace_create_file+0x15/0x33
       event_create_dir+0x2a3/0x34b
       __trace_add_new_event+0x1c/0x26
       event_trace_add_tracer+0x56/0x86
       trace_array_create+0x13e/0x1e1
       instance_mkdir+0x8/0x17
       tracefs_syscall_mkdir+0x39/0x50
       ? get_dname+0x31/0x31
       vfs_mkdir+0x78/0xa3
       do_mkdirat+0x71/0xb0
       sys_mkdir+0x19/0x1b
       do_fast_syscall_32+0xb0/0xed
      
      I bisected this down to the addition of the proxy_ops into tracefs for
      lockdown. It appears that the allocation of the proxy_ops and then freeing
      it in the destroy_inode callback, is causing havoc with the memory system.
      Reading the documentation about destroy_inode and talking with Linus about
      this, this is buggy and wrong. When defining the destroy_inode() method, it
      is expected that the destroy_inode() will also free the inode, and not just
      the extra allocations done in the creation of the inode. The faulty commit
      causes a memory leak of the inode data structure when they are deleted.
      
      Instead of allocating the proxy_ops (and then having to free it) the checks
      should be done by the open functions themselves, and not hack into the
      tracefs directory. First revert the tracefs updates for locked_down and then
      later we can add the locked_down checks in the kernel/trace files.
      
      Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home
      
      Fixes: ccbd54ff ("tracefs: Restrict tracefs when the kernel is locked down")
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      3ed270b1
  3. 11 10月, 2019 2 次提交
    • J
      io_uring: fix sequence logic for timeout requests · 7adf4eaf
      Jens Axboe 提交于
      We have two ways a request can be deferred:
      
      1) It's a regular request that depends on another one
      2) It's a timeout that tracks completions
      
      We have a shared helper to determine whether to defer, and that
      attempts to make the right decision based on the request. But we
      only have some of this information in the caller. Un-share the
      two timeout/defer helpers so the caller can use the right one.
      
      Fixes: 5262f567 ("io_uring: IORING_OP_TIMEOUT support")
      Reported-by: Nyangerkun <yangerkun@huawei.com>
      Reviewed-by: NJackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7adf4eaf
    • C
      NFSv4: Fix leak of clp->cl_acceptor string · 1047ec86
      Chuck Lever 提交于
      Our client can issue multiple SETCLIENTID operations to the same
      server in some circumstances. Ensure that calls to
      nfs4_proc_setclientid() after the first one do not overwrite the
      previously allocated cl_acceptor string.
      
      unreferenced object 0xffff888461031800 (size 32):
        comm "mount.nfs", pid 2227, jiffies 4294822467 (age 1407.749s)
        hex dump (first 32 bytes):
          6e 66 73 40 6b 6c 69 6d 74 2e 69 62 2e 31 30 31  nfs@klimt.ib.101
          35 67 72 61 6e 67 65 72 2e 6e 65 74 00 00 00 00  5granger.net....
        backtrace:
          [<00000000ab820188>] __kmalloc+0x128/0x176
          [<00000000eeaf4ec8>] gss_stringify_acceptor+0xbd/0x1a7 [auth_rpcgss]
          [<00000000e85e3382>] nfs4_proc_setclientid+0x34e/0x46c [nfsv4]
          [<000000003d9cf1fa>] nfs40_discover_server_trunking+0x7a/0xed [nfsv4]
          [<00000000b81c3787>] nfs4_discover_server_trunking+0x81/0x244 [nfsv4]
          [<000000000801b55f>] nfs4_init_client+0x1b0/0x238 [nfsv4]
          [<00000000977daf7f>] nfs4_set_client+0xfe/0x14d [nfsv4]
          [<0000000053a68a2a>] nfs4_create_server+0x107/0x1db [nfsv4]
          [<0000000088262019>] nfs4_remote_mount+0x2c/0x59 [nfsv4]
          [<00000000e84a2fd0>] legacy_get_tree+0x2d/0x4c
          [<00000000797e947c>] vfs_get_tree+0x20/0xc7
          [<00000000ecabaaa8>] fc_mount+0xe/0x36
          [<00000000f15fafc2>] vfs_kern_mount+0x74/0x8d
          [<00000000a3ff4e26>] nfs_do_root_mount+0x8a/0xa3 [nfsv4]
          [<00000000d1c2b337>] nfs4_try_mount+0x58/0xad [nfsv4]
          [<000000004c9bddee>] nfs_fs_mount+0x820/0x869 [nfs]
      
      Fixes: f11b2a1c ("nfs4: copy acceptor name from context ... ")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      1047ec86
  4. 10 10月, 2019 3 次提交
  5. 09 10月, 2019 9 次提交
  6. 08 10月, 2019 7 次提交
    • A
      btrfs: silence maybe-uninitialized warning in clone_range · 431d3988
      Austin Kim 提交于
      GCC throws warning message as below:
      
      ‘clone_src_i_size’ may be used uninitialized in this function
      [-Wmaybe-uninitialized]
       #define IS_ALIGNED(x, a)  (((x) & ((typeof(x))(a) - 1)) == 0)
                             ^
      fs/btrfs/send.c:5088:6: note: ‘clone_src_i_size’ was declared here
       u64 clone_src_i_size;
         ^
      The clone_src_i_size is only used as call-by-reference
      in a call to get_inode_info().
      
      Silence the warning by initializing clone_src_i_size to 0.
      
      Note that the warning is a false positive and reported by older versions
      of GCC (eg. 7.x) but not eg 9.x. As there have been numerous people, the
      patch is applied. Setting clone_src_i_size to 0 does not otherwise make
      sense and would not do any action in case the code changes in the future.
      Signed-off-by: NAustin Kim <austindh.kim@gmail.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      [ add note ]
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      431d3988
    • P
      io_uring: remove wait loop spurious wakeups · 6805b32e
      Pavel Begunkov 提交于
      Any changes interesting to tasks waiting in io_cqring_wait() are
      commited with io_cqring_ev_posted(). However, io_ring_drop_ctx_refs()
      also tries to do that but with no reason, that means spurious wakeups
      every io_free_req() and io_uring_enter().
      
      Just use percpu_ref_put() instead.
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6805b32e
    • T
      writeback: fix use-after-free in finish_writeback_work() · 8e00c4e9
      Tejun Heo 提交于
      finish_writeback_work() reads @done->waitq after decrementing
      @done->cnt.  However, once @done->cnt reaches zero, @done may be freed
      (from stack) at any moment and @done->waitq can contain something
      unrelated by the time finish_writeback_work() tries to read it.  This
      led to the following crash.
      
        "BUG: kernel NULL pointer dereference, address: 0000000000000002"
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0002) - not-present page
        PGD 0 P4D 0
        Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
        CPU: 40 PID: 555153 Comm: kworker/u98:50 Kdump: loaded Not tainted
        ...
        Workqueue: writeback wb_workfn (flush-btrfs-1)
        RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
        Code: 48 89 d8 5b c3 e8 50 db 6b ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 9c 5b fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 48 89 d8 5b c3 89 c6 e8 fe ca 6b ff eb f2 66 90
        RSP: 0018:ffffc90049b27d98 EFLAGS: 00010046
        RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: 0000000000000003 RDI: 0000000000000002
        RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
        R10: ffff889fff407600 R11: ffff88ba9395d740 R12: 000000000000e300
        R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
        FS:  0000000000000000(0000) GS:ffff88bfdfa00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000002 CR3: 0000000002409005 CR4: 00000000001606e0
        Call Trace:
         __wake_up_common_lock+0x63/0xc0
         wb_workfn+0xd2/0x3e0
         process_one_work+0x1f5/0x3f0
         worker_thread+0x2d/0x3d0
         kthread+0x111/0x130
         ret_from_fork+0x1f/0x30
      
      Fix it by reading and caching @done->waitq before decrementing
      @done->cnt.
      
      Link: http://lkml.kernel.org/r/20190924010631.GH2233839@devbig004.ftw2.facebook.com
      Fixes: 5b9cce4c ("writeback: Generalize and expose wb_completion")
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Debugged-by: NChris Mason <clm@fb.com>
      Reviewed-by: NJens Axboe <axboe@kernel.dk>
      Cc: Jan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>	[5.2+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8e00c4e9
    • J
      fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc() · 2abb7d3b
      Jia-Ju Bai 提交于
      In ocfs2_info_scan_inode_alloc(), there is an if statement on line 283
      to check whether inode_alloc is NULL:
      
          if (inode_alloc)
      
      When inode_alloc is NULL, it is used on line 287:
      
          ocfs2_inode_lock(inode_alloc, &bh, 0);
              ocfs2_inode_lock_full_nested(inode, ...)
                  struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
      
      Thus, a possible null-pointer dereference may occur.
      
      To fix this bug, inode_alloc is checked on line 286.
      
      This bug is found by a static analysis tool STCheck written by us.
      
      Link: http://lkml.kernel.org/r/20190726033717.32359-1-baijiaju1990@gmail.comSigned-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2abb7d3b
    • J
      fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock() · 583fee3e
      Jia-Ju Bai 提交于
      In ocfs2_write_end_nolock(), there are an if statement on lines 1976,
      2047 and 2058, to check whether handle is NULL:
      
          if (handle)
      
      When handle is NULL, it is used on line 2045:
      
      	ocfs2_update_inode_fsync_trans(handle, inode, 1);
              oi->i_sync_tid = handle->h_transaction->t_tid;
      
      Thus, a possible null-pointer dereference may occur.
      
      To fix this bug, handle is checked before calling
      ocfs2_update_inode_fsync_trans().
      
      This bug is found by a static analysis tool STCheck written by us.
      
      Link: http://lkml.kernel.org/r/20190726033705.32307-1-baijiaju1990@gmail.comSigned-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      583fee3e
    • J
      fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry() · 56e94ea1
      Jia-Ju Bai 提交于
      In ocfs2_xa_prepare_entry(), there is an if statement on line 2136 to
      check whether loc->xl_entry is NULL:
      
          if (loc->xl_entry)
      
      When loc->xl_entry is NULL, it is used on line 2158:
      
          ocfs2_xa_add_entry(loc, name_hash);
              loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash);
              loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size);
      
      and line 2164:
      
          ocfs2_xa_add_namevalue(loc, xi);
              loc->xl_entry->xe_value_size = cpu_to_le64(xi->xi_value_len);
              loc->xl_entry->xe_name_len = xi->xi_name_len;
      
      Thus, possible null-pointer dereferences may occur.
      
      To fix these bugs, if loc-xl_entry is NULL, ocfs2_xa_prepare_entry()
      abnormally returns with -EINVAL.
      
      These bugs are found by a static analysis tool STCheck written by us.
      
      [akpm@linux-foundation.org: remove now-unused ocfs2_xa_add_entry()]
      Link: http://lkml.kernel.org/r/20190726101447.9153-1-baijiaju1990@gmail.comSigned-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      56e94ea1
    • J
      ocfs2: clear zero in unaligned direct IO · 7a243c82
      Jia Guo 提交于
      Unused portion of a part-written fs-block-sized block is not set to zero
      in unaligned append direct write.This can lead to serious data
      inconsistencies.
      
      Ocfs2 manage disk with cluster size(for example, 1M), part-written in
      one cluster will change the cluster state from UN-WRITTEN to WRITTEN,
      VFS(function dio_zero_block) doesn't do the cleaning because bh's state
      is not set to NEW in function ocfs2_dio_wr_get_block when we write a
      WRITTEN cluster.  For example, the cluster size is 1M, file size is 8k
      and we direct write from 14k to 15k, then 12k~14k and 15k~16k will
      contain dirty data.
      
      We have to deal with two cases:
       1.The starting position of direct write is outside the file.
       2.The starting position of direct write is located in the file.
      
      We need set bh's state to NEW in the first case.  In the second case, we
      need mapped twice because bh's state of area out file should be set to
      NEW while area in file not.
      
      [akpm@linux-foundation.org: coding style fixes]
      Link: http://lkml.kernel.org/r/5292e287-8f1a-fd4a-1a14-661e555e0bed@huawei.comSigned-off-by: NJia Guo <guojia12@huawei.com>
      Reviewed-by: NYiwen Jiang <jiangyiwen@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7a243c82