1. 21 Oct, 2019 2 commits
  2. 18 Oct, 2019 1 commit
    • iomap: iomap that extends beyond EOF should be marked dirty · 7684e2c4
      Dave Chinner committed
      When doing a direct IO that spans the current EOF, and there are
      written blocks beyond EOF that extend beyond the current write, the
      only metadata update that needs to be done is a file size extension.
      
      However, we don't mark such iomaps as IOMAP_F_DIRTY to indicate that
      IO completion metadata updates are required, and hence we may
      fail to correctly sync file size extensions made in IO completion
      when O_DSYNC writes are being used and the hardware supports FUA.
      
      Hence when setting IOMAP_F_DIRTY, we need to also take into account
      whether the iomap spans the current EOF. If it does, then we need to
      mark it dirty so that IO completion will call generic_write_sync()
      to flush the inode size update to stable storage correctly.
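The rule described above can be sketched in userspace C. This is only an illustration under stated assumptions: `sketch_iomap_flags()` and `SKETCH_IOMAP_F_DIRTY` are hypothetical names, and the real check lives in filesystem code that reads the inode size under the appropriate locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the IOMAP_F_DIRTY flag named in the message. */
#define SKETCH_IOMAP_F_DIRTY 0x1u

/*
 * Decide whether a mapping should be marked dirty.
 *   inode_meta_dirty: the inode already has unsynced metadata
 *   offset, length:   byte range covered by the mapping
 *   i_size:           current end of file, in bytes
 */
static unsigned int sketch_iomap_flags(bool inode_meta_dirty,
                                       uint64_t offset, uint64_t length,
                                       uint64_t i_size)
{
    unsigned int flags = 0;

    /* A mapping that reaches past EOF implies a size update at completion. */
    if (inode_meta_dirty || offset + length > i_size)
        flags |= SKETCH_IOMAP_F_DIRTY;

    return flags;
}
```

With the flag set, IO completion would take the generic_write_sync() path instead of relying on FUA alone.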
      
      Fixes: 3460cac1 ("iomap: Use FUA for pure data O_DSYNC DIO writes")
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      [darrick: removed the ext4 part; they'll handle it separately]
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  3. 15 Oct, 2019 2 commits
  4. 13 Oct, 2019 2 commits
    • tracing: Do not create tracefs files if tracefs lockdown is in effect · bf8e6021
      Steven Rostedt (VMware) committed
      If on boot up, lockdown is activated for tracefs, don't even bother creating
      the files. This can also prevent instances from being created if lockdown is
      in effect.
      
      Link: http://lkml.kernel.org/r/CAHk-=whC6Ji=fWnjh2+eS4b15TnbsS4VPVtvBOwCy1jjEG_JHQ@mail.gmail.com
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracefs: Revert ccbd54ff ("tracefs: Restrict tracefs when the kernel is locked down") · 3ed270b1
      Steven Rostedt (VMware) committed
      Running the latest kernel through my "make instances" stress tests, I
      triggered the following bug (with KASAN and kmemleak enabled):
      
      mkdir invoked oom-killer:
      gfp_mask=0x40cd0(GFP_KERNEL|__GFP_COMP|__GFP_RECLAIMABLE), order=0,
      oom_score_adj=0
      CPU: 1 PID: 2229 Comm: mkdir Not tainted 5.4.0-rc2-test #325
      Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
      Call Trace:
       dump_stack+0x64/0x8c
       dump_header+0x43/0x3b7
       ? trace_hardirqs_on+0x48/0x4a
       oom_kill_process+0x68/0x2d5
       out_of_memory+0x2aa/0x2d0
       __alloc_pages_nodemask+0x96d/0xb67
       __alloc_pages_node+0x19/0x1e
       alloc_slab_page+0x17/0x45
       new_slab+0xd0/0x234
       ___slab_alloc.constprop.86+0x18f/0x336
       ? alloc_inode+0x2c/0x74
       ? irq_trace+0x12/0x1e
       ? tracer_hardirqs_off+0x1d/0xd7
       ? __slab_alloc.constprop.85+0x21/0x53
       __slab_alloc.constprop.85+0x31/0x53
       ? __slab_alloc.constprop.85+0x31/0x53
       ? alloc_inode+0x2c/0x74
       kmem_cache_alloc+0x50/0x179
       ? alloc_inode+0x2c/0x74
       alloc_inode+0x2c/0x74
       new_inode_pseudo+0xf/0x48
       new_inode+0x15/0x25
       tracefs_get_inode+0x23/0x7c
       ? lookup_one_len+0x54/0x6c
       tracefs_create_file+0x53/0x11d
       trace_create_file+0x15/0x33
       event_create_dir+0x2a3/0x34b
       __trace_add_new_event+0x1c/0x26
       event_trace_add_tracer+0x56/0x86
       trace_array_create+0x13e/0x1e1
       instance_mkdir+0x8/0x17
       tracefs_syscall_mkdir+0x39/0x50
       ? get_dname+0x31/0x31
       vfs_mkdir+0x78/0xa3
       do_mkdirat+0x71/0xb0
       sys_mkdir+0x19/0x1b
       do_fast_syscall_32+0xb0/0xed
      
      I bisected this down to the addition of the proxy_ops into tracefs for
      lockdown. It appears that the allocation of the proxy_ops and then freeing
      it in the destroy_inode callback, is causing havoc with the memory system.
      Reading the documentation about destroy_inode and talking with Linus about
      this, this is buggy and wrong. When defining the destroy_inode() method, it
      is expected that the destroy_inode() will also free the inode, and not just
      the extra allocations done in the creation of the inode. The faulty commit
      causes a memory leak of the inode data structure when they are deleted.
      
      Instead of allocating the proxy_ops (and then having to free it) the checks
      should be done by the open functions themselves, and not hack into the
      tracefs directory. First revert the tracefs updates for locked_down and then
      later we can add the locked_down checks in the kernel/trace files.
      
      Link: http://lkml.kernel.org/r/20191011135458.7399da44@gandalf.local.home
      
      Fixes: ccbd54ff ("tracefs: Restrict tracefs when the kernel is locked down")
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  5. 11 Oct, 2019 2 commits
    • io_uring: fix sequence logic for timeout requests · 7adf4eaf
      Jens Axboe committed
      We have two ways a request can be deferred:
      
      1) It's a regular request that depends on another one
      2) It's a timeout that tracks completions
      
      We have a shared helper to determine whether to defer, and that
      attempts to make the right decision based on the request. But we
      only have some of this information in the caller. Un-share the
      two timeout/defer helpers so the caller can use the right one.
      
      Fixes: 5262f567 ("io_uring: IORING_OP_TIMEOUT support")
      Reported-by: yangerkun <yangerkun@huawei.com>
      Reviewed-by: Jackie Liu <liuyun01@kylinos.cn>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • NFSv4: Fix leak of clp->cl_acceptor string · 1047ec86
      Chuck Lever committed
      Our client can issue multiple SETCLIENTID operations to the same
      server in some circumstances. Ensure that calls to
      nfs4_proc_setclientid() after the first one do not overwrite the
      previously allocated cl_acceptor string.
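One way to avoid this kind of leak is to release the old string before storing a new one. The userspace sketch below illustrates that pattern only; `struct sketch_client` and `sketch_set_acceptor()` are hypothetical names, and the message does not say whether the actual fix frees the old string or skips the second assignment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the client structure holding the acceptor name. */
struct sketch_client {
    char *acceptor;
};

/*
 * Store a copy of 'name' in the client, releasing any earlier copy first
 * so that repeated calls do not leak the previous allocation.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int sketch_set_acceptor(struct sketch_client *clp, const char *name)
{
    char *copy = malloc(strlen(name) + 1);

    if (!copy)
        return -1;
    strcpy(copy, name);

    free(clp->acceptor);    /* free(NULL) is a no-op on the first call */
    clp->acceptor = copy;
    return 0;
}
```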
      
      unreferenced object 0xffff888461031800 (size 32):
        comm "mount.nfs", pid 2227, jiffies 4294822467 (age 1407.749s)
        hex dump (first 32 bytes):
          6e 66 73 40 6b 6c 69 6d 74 2e 69 62 2e 31 30 31  nfs@klimt.ib.101
          35 67 72 61 6e 67 65 72 2e 6e 65 74 00 00 00 00  5granger.net....
        backtrace:
          [<00000000ab820188>] __kmalloc+0x128/0x176
          [<00000000eeaf4ec8>] gss_stringify_acceptor+0xbd/0x1a7 [auth_rpcgss]
          [<00000000e85e3382>] nfs4_proc_setclientid+0x34e/0x46c [nfsv4]
          [<000000003d9cf1fa>] nfs40_discover_server_trunking+0x7a/0xed [nfsv4]
          [<00000000b81c3787>] nfs4_discover_server_trunking+0x81/0x244 [nfsv4]
          [<000000000801b55f>] nfs4_init_client+0x1b0/0x238 [nfsv4]
          [<00000000977daf7f>] nfs4_set_client+0xfe/0x14d [nfsv4]
          [<0000000053a68a2a>] nfs4_create_server+0x107/0x1db [nfsv4]
          [<0000000088262019>] nfs4_remote_mount+0x2c/0x59 [nfsv4]
          [<00000000e84a2fd0>] legacy_get_tree+0x2d/0x4c
          [<00000000797e947c>] vfs_get_tree+0x20/0xc7
          [<00000000ecabaaa8>] fc_mount+0xe/0x36
          [<00000000f15fafc2>] vfs_kern_mount+0x74/0x8d
          [<00000000a3ff4e26>] nfs_do_root_mount+0x8a/0xa3 [nfsv4]
          [<00000000d1c2b337>] nfs4_try_mount+0x58/0xad [nfsv4]
          [<000000004c9bddee>] nfs_fs_mount+0x820/0x869 [nfs]
      
      Fixes: f11b2a1c ("nfs4: copy acceptor name from context ... ")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  6. 10 Oct, 2019 3 commits
  7. 09 Oct, 2019 9 commits
  8. 08 Oct, 2019 8 commits
    • btrfs: silence maybe-uninitialized warning in clone_range · 431d3988
      Austin Kim committed
      GCC emits the following warning:
      
      ‘clone_src_i_size’ may be used uninitialized in this function
      [-Wmaybe-uninitialized]
       #define IS_ALIGNED(x, a)  (((x) & ((typeof(x))(a) - 1)) == 0)
                             ^
      fs/btrfs/send.c:5088:6: note: ‘clone_src_i_size’ was declared here
       u64 clone_src_i_size;
         ^
      The clone_src_i_size is only used as call-by-reference
      in a call to get_inode_info().
      
      Silence the warning by initializing clone_src_i_size to 0.
      
      Note that the warning is a false positive, reported by older versions
      of GCC (e.g. 7.x) but not by e.g. 9.x. As numerous people have run into
      it, the patch is applied. Setting clone_src_i_size to 0 does not otherwise
      make sense and would not do anything useful if the code changes in the
      future.
      Signed-off-by: Austin Kim <austindh.kim@gmail.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ add note ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • io_uring: remove wait loop spurious wakeups · 6805b32e
      Pavel Begunkov committed
      Any changes of interest to tasks waiting in io_cqring_wait() are
      committed with io_cqring_ev_posted(). However, io_ring_drop_ctx_refs()
      also tries to wake waiters for no reason, which causes spurious wakeups
      on every io_free_req() and io_uring_enter().
      
      Just use percpu_ref_put() instead.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • writeback: fix use-after-free in finish_writeback_work() · 8e00c4e9
      Tejun Heo committed
      finish_writeback_work() reads @done->waitq after decrementing
      @done->cnt.  However, once @done->cnt reaches zero, @done may be freed
      (from stack) at any moment and @done->waitq can contain something
      unrelated by the time finish_writeback_work() tries to read it.  This
      led to the following crash.
      
        "BUG: kernel NULL pointer dereference, address: 0000000000000002"
        #PF: supervisor write access in kernel mode
        #PF: error_code(0x0002) - not-present page
        PGD 0 P4D 0
        Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
        CPU: 40 PID: 555153 Comm: kworker/u98:50 Kdump: loaded Not tainted
        ...
        Workqueue: writeback wb_workfn (flush-btrfs-1)
        RIP: 0010:_raw_spin_lock_irqsave+0x10/0x30
        Code: 48 89 d8 5b c3 e8 50 db 6b ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 9c 5b fa 31 c0 ba 01 00 00 00 <f0> 0f b1 17 75 05 48 89 d8 5b c3 89 c6 e8 fe ca 6b ff eb f2 66 90
        RSP: 0018:ffffc90049b27d98 EFLAGS: 00010046
        RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000000000
        RDX: 0000000000000001 RSI: 0000000000000003 RDI: 0000000000000002
        RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001
        R10: ffff889fff407600 R11: ffff88ba9395d740 R12: 000000000000e300
        R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
        FS:  0000000000000000(0000) GS:ffff88bfdfa00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000000002 CR3: 0000000002409005 CR4: 00000000001606e0
        Call Trace:
         __wake_up_common_lock+0x63/0xc0
         wb_workfn+0xd2/0x3e0
         process_one_work+0x1f5/0x3f0
         worker_thread+0x2d/0x3d0
         kthread+0x111/0x130
         ret_from_fork+0x1f/0x30
      
      Fix it by reading and caching @done->waitq before decrementing
      @done->cnt.
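The ordering in the fix can be modelled in userspace with C11 atomics. This is a sketch under stated assumptions rather than the kernel code: the `sketch_*` types and functions are hypothetical, and a plain counter stands in for a real wait queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a wait queue: it only counts wakeups. */
struct sketch_waitq {
    int wakeups;
};

/* Hypothetical stand-in for the completion object that may live on a stack. */
struct sketch_completion {
    atomic_int cnt;
    struct sketch_waitq *waitq;
};

static void sketch_wake_up_all(struct sketch_waitq *wq)
{
    wq->wakeups++;
}

static void sketch_finish_work(struct sketch_completion *done)
{
    /* Cache the pointer first: once cnt hits zero, 'done' may be freed. */
    struct sketch_waitq *waitq = done->waitq;

    /* atomic_fetch_sub() returns the old value; 1 means this was the last. */
    if (atomic_fetch_sub(&done->cnt, 1) == 1)
        sketch_wake_up_all(waitq);    /* 'done' is not touched after this point */
}
```

The point of the pattern is that nothing dereferences `done` after the decrement, so the owner is free to release it as soon as the count reaches zero.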
      
      Link: http://lkml.kernel.org/r/20190924010631.GH2233839@devbig004.ftw2.facebook.com
      Fixes: 5b9cce4c ("writeback: Generalize and expose wb_completion")
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Debugged-by: Chris Mason <clm@fb.com>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Cc: Jan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>	[5.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: ocfs2: fix a possible null-pointer dereference in ocfs2_info_scan_inode_alloc() · 2abb7d3b
      Jia-Ju Bai committed
      In ocfs2_info_scan_inode_alloc(), there is an if statement on line 283
      to check whether inode_alloc is NULL:
      
          if (inode_alloc)
      
      When inode_alloc is NULL, it is used on line 287:
      
          ocfs2_inode_lock(inode_alloc, &bh, 0);
              ocfs2_inode_lock_full_nested(inode, ...)
                  struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
      
      Thus, a possible null-pointer dereference may occur.
      
      To fix this bug, inode_alloc is checked on line 286.
      
      This bug is found by a static analysis tool STCheck written by us.
      
      Link: http://lkml.kernel.org/r/20190726033717.32359-1-baijiaju1990@gmail.com
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: ocfs2: fix a possible null-pointer dereference in ocfs2_write_end_nolock() · 583fee3e
      Jia-Ju Bai committed
      In ocfs2_write_end_nolock(), there are if statements on lines 1976,
      2047 and 2058 that check whether handle is NULL:
      
          if (handle)
      
      When handle is NULL, it is used on line 2045:
      
      	ocfs2_update_inode_fsync_trans(handle, inode, 1);
              oi->i_sync_tid = handle->h_transaction->t_tid;
      
      Thus, a possible null-pointer dereference may occur.
      
      To fix this bug, handle is checked before calling
      ocfs2_update_inode_fsync_trans().
      
      This bug is found by a static analysis tool STCheck written by us.
      
      Link: http://lkml.kernel.org/r/20190726033705.32307-1-baijiaju1990@gmail.com
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: ocfs2: fix possible null-pointer dereferences in ocfs2_xa_prepare_entry() · 56e94ea1
      Jia-Ju Bai committed
      In ocfs2_xa_prepare_entry(), there is an if statement on line 2136 to
      check whether loc->xl_entry is NULL:
      
          if (loc->xl_entry)
      
      When loc->xl_entry is NULL, it is used on line 2158:
      
          ocfs2_xa_add_entry(loc, name_hash);
              loc->xl_entry->xe_name_hash = cpu_to_le32(name_hash);
              loc->xl_entry->xe_name_offset = cpu_to_le16(loc->xl_size);
      
      and line 2164:
      
          ocfs2_xa_add_namevalue(loc, xi);
              loc->xl_entry->xe_value_size = cpu_to_le64(xi->xi_value_len);
              loc->xl_entry->xe_name_len = xi->xi_name_len;
      
      Thus, possible null-pointer dereferences may occur.
      
      To fix these bugs, if loc->xl_entry is NULL, ocfs2_xa_prepare_entry()
      returns early with -EINVAL.
      
      These bugs are found by a static analysis tool STCheck written by us.
      
      [akpm@linux-foundation.org: remove now-unused ocfs2_xa_add_entry()]
      Link: http://lkml.kernel.org/r/20190726101447.9153-1-baijiaju1990@gmail.com
      Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: clear zero in unaligned direct IO · 7a243c82
      Jia Guo committed
      The unused portion of a partially written fs-block-sized block is not
      zeroed in an unaligned append direct write. This can lead to serious
      data inconsistencies.
      
      Ocfs2 manages disk space in clusters (for example, 1M). A partial write
      within one cluster changes the cluster state from UN-WRITTEN to WRITTEN,
      and VFS (function dio_zero_block) doesn't do the cleaning because bh's
      state is not set to NEW in function ocfs2_dio_wr_get_block when we write
      a WRITTEN cluster.  For example, the cluster size is 1M, file size is 8k
      and we direct write from 14k to 15k, then 12k~14k and 15k~16k will
      contain dirty data.
      
      We have to deal with two cases:
       1. The starting position of the direct write is outside the file.
       2. The starting position of the direct write is located in the file.
      
      We need to set bh's state to NEW in the first case.  In the second case,
      we need to map twice, because bh's state should be set to NEW for the
      area outside the file but not for the area inside it.
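The 14k to 15k example can be checked with a little arithmetic. The sketch below assumes a 4k fs block, which is what the 12k~14k and 15k~16k figures imply, and covers only the first case (write starting beyond EOF); `sketch_zero_ranges()` and `struct sketch_range` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end). */
struct sketch_range {
    uint64_t start;
    uint64_t end;
};

/*
 * For a direct write of 'len' bytes at 'pos' that starts beyond EOF,
 * compute the parts of the touched fs blocks that the write itself does
 * not cover and that therefore need zeroing.
 */
static void sketch_zero_ranges(uint64_t pos, uint64_t len, uint64_t blksz,
                               struct sketch_range *head,
                               struct sketch_range *tail)
{
    uint64_t end = pos + len;

    /* From the start of the first block up to the write position. */
    head->start = pos - pos % blksz;
    head->end = pos;

    /* From the end of the write up to the end of the last block. */
    tail->start = end;
    tail->end = (end + blksz - 1) / blksz * blksz;
}
```

For a 1k write at 14k this yields 12k..14k and 15k..16k, matching the ranges named in the message.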
      
      [akpm@linux-foundation.org: coding style fixes]
      Link: http://lkml.kernel.org/r/5292e287-8f1a-fd4a-1a14-661e555e0bed@huawei.com
      Signed-off-by: Jia Guo <guojia12@huawei.com>
      Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • uaccess: implement a proper unsafe_copy_to_user() and switch filldir over to it · c512c691
      Linus Torvalds committed
      In commit 9f79b78e ("Convert filldir[64]() from __put_user() to
      unsafe_put_user()") I made filldir() use unsafe_put_user(), which
      improves code generation on x86 enormously.
      
      But because we didn't have a "unsafe_copy_to_user()", the dirent name
      copy was also done by hand with unsafe_put_user() in a loop, and it
      turns out that a lot of other architectures didn't like that, because
      unlike x86, they have various alignment issues.
      
      Most non-x86 architectures trap and fix it up, and some (like xtensa)
      will just fail unaligned put_user() accesses unconditionally.  Which
      makes that "copy using put_user() in a loop" not work for them at all.
      
      I could make that code do explicit alignment etc, but the architectures
      that don't like unaligned accesses also don't really use the fancy
      "user_access_begin/end()" model, so they might just use the regular old
      __copy_to_user() interface.
      
      So this commit takes that looping implementation, turns it into the x86
      version of "unsafe_copy_to_user()", and makes other architectures
      implement the unsafe copy version as __copy_to_user() (the same way they
      do for the other unsafe_xyz() accessor functions).
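A looping copy of this kind can be modelled in userspace as follows. This is a sketch, not the kernel macro: the chunk widths (8, 4, 2, 1 bytes) and the `sketch_*` names are assumptions, and memcpy() stands in for the per-word user-space store.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy as many 'width'-byte chunks as fit, advancing both pointers.
 * Returns the number of bytes still left to copy.
 */
static size_t sketch_copy_step(char **dst, const char **src,
                               size_t len, size_t width)
{
    while (len >= width) {
        memcpy(*dst, *src, width);    /* stands in for one store of 'width' bytes */
        *dst += width;
        *src += width;
        len -= width;
    }
    return len;
}

/* Copy 'len' bytes using progressively narrower chunks. */
static void sketch_copy_chunks(char *dst, const char *src, size_t len)
{
    len = sketch_copy_step(&dst, &src, len, 8);
    len = sketch_copy_step(&dst, &src, len, 4);
    len = sketch_copy_step(&dst, &src, len, 2);
    (void)sketch_copy_step(&dst, &src, len, 1);
}
```

Because the wide steps run first from an arbitrary start address, this shape is only attractive where unaligned stores are cheap, which is the reason the message gives for keeping it x86-only.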
      
      Note that it only does this for the copying _to_ user space, and we
      still don't have a unsafe version of copy_from_user().
      
      That's partly because we have no current users of it, but also partly
      because the copy_from_user() case is slightly different and cannot
      efficiently be implemented in terms of a unsafe_get_user() loop (because
      gcc can't do asm goto with outputs).
      
      It would be trivial to do this using "rep movsb", which would work
      really nicely on newer x86 cores, but really badly on some older ones.
      
      Al Viro is looking at cleaning up all our user copy routines to make
      this all a non-issue, but for now we have this simple-but-stupid version
      for x86 that works fine for the dirent name copy case because those
      names are short strings and we simply don't need anything fancier.
      
      Fixes: 9f79b78e ("Convert filldir[64]() from __put_user() to unsafe_put_user()")
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Reported-and-tested-by: Tony Luck <tony.luck@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 07 Oct, 2019 9 commits
    • CIFS: Gracefully handle QueryInfo errors during open · 30573a82
      Pavel Shilovsky committed
      Currently if the client identifies problems when processing
      metadata returned in CREATE response, the open handle is being
      leaked. This causes multiple problems like a file missing a lease
      break by that client which causes high latencies to other clients
      accessing the file. Another side-effect of this is that the file
      can't be deleted.
      
      Fix this by closing the file after the client hits an error after
      the file was opened and the open descriptor wasn't returned to
      the user space. Also convert -ESTALE to -EOPENSTALE to allow
      the VFS to revalidate a dentry and retry the open.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: use cifsInodeInfo->open_file_lock while iterating to avoid a panic · cb248819
      Dave Wysochanski committed
      Commit 487317c9 ("cifs: add spinlock for the openFileList to
      cifsInodeInfo") added cifsInodeInfo->open_file_lock spin_lock to protect
      the openFileList, but missed a few places where cifs_inode->openFileList
      was enumerated.  Change these remaining tcon->open_file_lock to
      cifsInodeInfo->open_file_lock to avoid panic in is_size_safe_to_change.
      
      [17313.245641] RIP: 0010:is_size_safe_to_change+0x57/0xb0 [cifs]
      [17313.245645] Code: 68 40 48 89 ef e8 19 67 b7 f1 48 8b 43 40 48 8d 4b 40 48 8d 50 f0 48 39 c1 75 0f eb 47 48 8b 42 10 48 8d 50 f0 48 39 c1 74 3a <8b> 80 88 00 00 00 83 c0 01 a8 02 74 e6 48 89 ef c6 07 00 0f 1f 40
      [17313.245649] RSP: 0018:ffff94ae1baefa30 EFLAGS: 00010202
      [17313.245654] RAX: dead000000000100 RBX: ffff88dc72243300 RCX: ffff88dc72243340
      [17313.245657] RDX: dead0000000000f0 RSI: 00000000098f7940 RDI: ffff88dd3102f040
      [17313.245659] RBP: ffff88dd3102f040 R08: 0000000000000000 R09: ffff94ae1baefc40
      [17313.245661] R10: ffffcdc8bb1c4e80 R11: ffffcdc8b50adb08 R12: 00000000098f7940
      [17313.245663] R13: ffff88dc72243300 R14: ffff88dbc8f19600 R15: ffff88dc72243428
      [17313.245667] FS:  00007fb145485700(0000) GS:ffff88dd3e000000(0000) knlGS:0000000000000000
      [17313.245670] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [17313.245672] CR2: 0000026bb46c6000 CR3: 0000004edb110003 CR4: 00000000007606e0
      [17313.245753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [17313.245756] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [17313.245759] PKRU: 55555554
      [17313.245761] Call Trace:
      [17313.245803]  cifs_fattr_to_inode+0x16b/0x580 [cifs]
      [17313.245838]  cifs_get_inode_info+0x35c/0xa60 [cifs]
      [17313.245852]  ? kmem_cache_alloc_trace+0x151/0x1d0
      [17313.245885]  cifs_open+0x38f/0x990 [cifs]
      [17313.245921]  ? cifs_revalidate_dentry_attr+0x3e/0x350 [cifs]
      [17313.245953]  ? cifsFileInfo_get+0x30/0x30 [cifs]
      [17313.245960]  ? do_dentry_open+0x132/0x330
      [17313.245963]  do_dentry_open+0x132/0x330
      [17313.245969]  path_openat+0x573/0x14d0
      [17313.245974]  do_filp_open+0x93/0x100
      [17313.245979]  ? __check_object_size+0xa3/0x181
      [17313.245986]  ? audit_alloc_name+0x7e/0xd0
      [17313.245992]  do_sys_open+0x184/0x220
      [17313.245999]  do_syscall_64+0x5b/0x1b0
      
      Fixes: 487317c9 ("cifs: add spinlock for the openFileList to cifsInodeInfo")
      
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • fs: cifs: mute -Wunused-const-variable message · dd19c106
      Austin Kim committed
      After 'Initial git repository build' commit,
      'mapping_table_ERRHRD' variable has not been used.
      
      So 'mapping_table_ERRHRD' const variable could be removed
      to mute below warning message:
      
         fs/cifs/netmisc.c:120:40: warning: unused variable 'mapping_table_ERRHRD' [-Wunused-const-variable]
         static const struct smb_to_posix_error mapping_table_ERRHRD[] = {
                                                 ^
      Signed-off-by: Austin Kim <austindh.kim@gmail.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • smb3: cleanup some recent endian errors spotted by updated sparse · 52870d50
      Steve French committed
      Now that sparse has been fixed, it spotted a couple recent minor
      endian errors (and removed one additional sparse warning).
      
      Thanks to Luc Van Oostenryck for his help fixing sparse.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
    • xfs: assure zeroed memory buffers for certain kmem allocations · 3219e8cf
      Bill O'Donnell committed
      Guarantee zeroed memory buffers for cases where potential memory
      leak to disk can occur. In these cases, kmem_alloc is used and
      doesn't zero the buffer, opening the possibility of information
      leakage to disk.
      
      Use existing infrastructure (xfs_buf_allocate_memory) to obtain
      the already zeroed buffer from kernel memory.
      
      This solution avoids the performance issue that would occur if a
      wholesale change to replace kmem_alloc with kmem_zalloc was done.
      Signed-off-by: Bill O'Donnell <billodo@redhat.com>
      [darrick: fix bitwise complaint about kmflag_mask]
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: removed unused error variable from xchk_refcountbt_rec · d5cc14d9
      Aliasgar Surti committed
      Removed unused error variable. Instead of using error variable,
      returned the value directly as it wasn't updated.
      Signed-off-by: Aliasgar Surti <aliasgar.surti500@gmail.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: remove unused flags arg from xfs_get_aghdr_buf() · 6374ca03
      Eric Sandeen committed
      The flags arg is always passed as zero, so remove it.
      
      (xfs_buf_get_uncached takes flags to support XBF_NO_IOACCT for
      the sb, but that should never be relevant for xfs_get_aghdr_buf)
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: Fix tail rounding in xfs_alloc_file_space() · e093c4be
      Max Reitz committed
      To ensure that all blocks touched by the range [offset, offset + count)
      are allocated, we need to calculate the block count from the difference
      of the range end (rounded up) and the range start (rounded down).
      
      Before this patch, we just round up the byte count, which may lead to
      unaligned ranges not being fully allocated:
      
      $ touch test_file
      $ block_size=$(stat -fc '%S' test_file)
      $ fallocate -o $((block_size / 2)) -l $block_size test_file
      $ xfs_bmap test_file
      test_file:
              0: [0..7]: 1396264..1396271
              1: [8..15]: hole
      
      There should not be a hole there.  Instead, the first two blocks should
      be fully allocated.
      
      With this patch applied, the result is something like this:
      
      $ touch test_file
      $ block_size=$(stat -fc '%S' test_file)
      $ fallocate -o $((block_size / 2)) -l $block_size test_file
      $ xfs_bmap test_file
      test_file:
              0: [0..15]: 11024..11039
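The difference between the two calculations can be worked through in plain C. This is a sketch with hypothetical function names; it models only the rounding, not the allocation itself.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: round up the byte count alone, ignoring the start offset. */
static uint64_t sketch_blocks_old(uint64_t offset, uint64_t count, uint64_t bs)
{
    (void)offset;
    return (count + bs - 1) / bs;
}

/* New behaviour: (range end rounded up) minus (range start rounded down). */
static uint64_t sketch_blocks_new(uint64_t offset, uint64_t count, uint64_t bs)
{
    uint64_t first = offset / bs;                    /* start, rounded down */
    uint64_t last = (offset + count + bs - 1) / bs;  /* end, rounded up */

    return last - first;
}
```

With a 4096-byte block, an offset of 2048 and a length of 4096 (the fallocate call above), the old formula gives one block and the new one gives two, which matches the hole seen before the patch.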
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings · b212921b
      Linus Torvalds committed
      In commit 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map") we
      changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
      executable mappings.
      
      Then, people reported that it broke some binaries that had overlapping
      segments from the same file, and commit ad55eac7 ("elf: enforce
      MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
      overlaying elf segment cases.  But only some - despite the summary line
      of that commit, it only did it when it also does a temporary brk vma for
      one obvious overlapping case.
      
      Now Russell King reports another overlapping case with old 32-bit x86
      binaries, which doesn't trigger that limited case.  End result: we had
      better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.
      
      Yes, it's a sign of old binaries generated with old tool-chains, but we
      do pride ourselves on not breaking existing setups.
      
      This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
      and the old load_elf_library() use-cases, because nobody has reported
      breakage for those. Yet.
      
      Note that in all the cases seen so far, the overlapping elf sections
      seem to be just re-mapping of the same executable with different section
      attributes.  We could possibly introduce a new MAP_FIXED_NOFILECHANGE
      flag or similar, which acts like NOREPLACE, but allows just remapping
      the same executable file using different protection flags.
      
      It's not clear that would make a huge difference to anything, but if
      people really hate that "elf remaps over previous maps" behavior, maybe
      at least a more limited form of remapping would alleviate some concerns.
      
      Alternatively, we should take a look at our elf_map() logic to see if we
      end up not mapping things properly the first time.
      
      In the meantime, this is the minimal "don't do that then" patch while
      people hopefully think about it more.
      Reported-by: Russell King <linux@armlinux.org.uk>
      Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map")
      Fixes: ad55eac7 ("elf: enforce MAP_FIXED on overlaying elf segments")
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 06 10月, 2019 2 次提交
    • L
      Make filldir[64]() verify the directory entry filename is valid · 8a23eb80
      Linus Torvalds committed
      This has been discussed several times, and now filesystem people are
      talking about doing it individually at the filesystem layer, so head
      that off at the pass and just do it in getdents{64}().
      
      This is partially based on a patch by Jann Horn, but checks for NUL
      bytes as well, and somewhat simplified.
      
      There's also commentary about how it might be better if invalid names
      due to filesystem corruption don't cause an immediate failure, but only
      an error at the end of the readdir(), so that people can still see the
      filenames that are ok.
      
      There's also been discussion about just how much POSIX strictly speaking
      requires this since it's about filesystem corruption.  It's really more
      "protect user space from bad behavior" as pointed out by Jann.  But
      since Eric Biederman looked up the POSIX wording, here it is for context:
      
       "From readdir:
      
         The readdir() function shall return a pointer to a structure
         representing the directory entry at the current position in the
         directory stream specified by the argument dirp, and position the
         directory stream at the next entry. It shall return a null pointer
         upon reaching the end of the directory stream. The structure dirent
         defined in the <dirent.h> header describes a directory entry.
      
        From definitions:
      
         3.129 Directory Entry (or Link)
      
         An object that associates a filename with a file. Several directory
         entries can associate names with the same file.
      
        ...
      
         3.169 Filename
      
         A name consisting of 1 to {NAME_MAX} bytes used to name a file. The
         characters composing the name may be selected from the set of all
         character values excluding the slash character and the null byte. The
         filenames dot and dot-dot have special meaning. A filename is
         sometimes referred to as a 'pathname component'."
      
      Note that I didn't bother adding the checks to any legacy interfaces
      that nobody uses.
      
      Also note that if this ends up being noticeable as a performance
      regression, we can fix that to do a much more optimized model that
      checks for both NUL and '/' at the same time one word at a time.
      
      We haven't really tended to optimize 'memchr()', and it only checks for
      one pattern at a time anyway, and we really _should_ check for NUL too
      (but see the comment about "soft errors" in the code about why it
      currently only checks for '/').
      
      See the CONFIG_DCACHE_WORD_ACCESS case of hash_name() for how the name
      lookup code looks for pathname terminating characters in parallel.
      
      Link: https://lore.kernel.org/lkml/20190118161440.220134-2-jannh@google.com/
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jann Horn <jannh@google.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8a23eb80
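The two checking strategies mentioned in this commit message can be sketched in user-space C. This is only an illustration under stated assumptions: the function names `verify_name_simple` and `verify_name_wordwise` are hypothetical, the first follows the described current behavior (reject empty names and names containing '/'), and the second is one possible form of the "one word at a time" idea that also rejects NUL bytes. It is not the kernel's code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  ((uint64_t)0x0101010101010101ULL)
#define HIGHS (ONES * 0x80)

/* Classic bit trick: non-zero iff some byte of v is zero. */
static int has_zero_byte(uint64_t v)
{
	return ((v - ONES) & ~v & HIGHS) != 0;
}

/* Simple check: one memchr() pass, one pattern ('/') at a time. */
int verify_name_simple(const char *name, size_t len)
{
	if (len == 0)
		return -EIO;
	if (memchr(name, '/', len))
		return -EIO;
	return 0;
}

/* Word-wise check: tests eight bytes per step for both NUL and '/'. */
int verify_name_wordwise(const char *name, size_t len)
{
	if (len == 0)
		return -EIO;

	while (len >= sizeof(uint64_t)) {
		uint64_t w;

		memcpy(&w, name, sizeof(w)); /* safe unaligned load */
		/* XOR with repeated '/' turns every '/' byte into zero. */
		if (has_zero_byte(w) || has_zero_byte(w ^ (ONES * '/')))
			return -EIO;
		name += sizeof(w);
		len -= sizeof(w);
	}

	/* Remaining tail bytes are checked one at a time. */
	while (len--) {
		char c = *name++;

		if (c == '\0' || c == '/')
			return -EIO;
	}
	return 0;
}
```

For example, `verify_name_simple("a/b", 3)` returns `-EIO`, while `verify_name_wordwise()` additionally rejects a name with an embedded NUL byte.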
    • L
      Convert filldir[64]() from __put_user() to unsafe_put_user() · 9f79b78e
      Linus Torvalds committed
      We really should avoid the "__{get,put}_user()" functions entirely,
      because they can easily be mis-used and the original intent of being
      used for simple direct user accesses no longer holds in a post-SMAP/PAN
      world.
      
      Manually optimizing away the user access range check makes no sense any
      more, when the range check is generally much cheaper than the "enable
      user accesses" code that the __{get,put}_user() functions still need.
      
      So instead of __put_user(), use the unsafe_put_user() interface with
      user_access_{begin,end}() that really does generate better code these
      days, and which is generally a nicer interface.  Under some loads, the
      multiple user writes that filldir() does are actually quite noticeable.
      
      This also makes the dirent name copy use unsafe_put_user() with a couple
      of macros.  We do not want to make function calls with SMAP/PAN
      disabled, and the code this generates is quite good when the
      architecture uses "asm goto" for unsafe_put_user() like x86 does.
      
      Note that this doesn't bother with the legacy cases.  Nobody should use
      them anyway, so performance doesn't really matter there.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f79b78e
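The access pattern described in this commit (one range check, then several unchecked stores that share a fault label, then closing the access window) can be modeled in user space. The sketch below is a model only: the kernel's `user_access_begin()`, `unsafe_put_user()` and `user_access_end()` cannot be called outside the kernel, so every `mock_*` name, the `fake_user_buf` array, and `struct mock_dirent` are hypothetical stand-ins invented for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layout, loosely shaped like a directory entry. */
struct mock_dirent {
	uint64_t d_ino;
	uint16_t d_reclen;
	char d_name[16];
};

/* Stand-in for "user memory": the only region writes may target. */
struct mock_dirent fake_user_buf[2];
static bool access_open; /* models "user access enabled" */

/* One range check up front, then open the access window. */
static bool mock_user_access_begin(const void *ptr, size_t len)
{
	uintptr_t p = (uintptr_t)ptr;
	uintptr_t start = (uintptr_t)fake_user_buf;
	uintptr_t end = start + sizeof(fake_user_buf);

	if (p < start || p > end || len > end - p)
		return false;
	access_open = true;
	return true;
}

static void mock_user_access_end(void)
{
	access_open = false;
}

/* Unchecked store; jumps to `label` on a (simulated) fault. */
#define mock_unsafe_put(val, ptr, label)	\
	do {					\
		if (!access_open)		\
			goto label;		\
		*(ptr) = (val);			\
	} while (0)

/* Fill one entry: a single check, many stores, no function calls
 * while the access window is open. */
int fill_entry(struct mock_dirent *dst, uint64_t ino,
	       const char *name, size_t namlen)
{
	if (namlen >= sizeof(dst->d_name))
		return -EINVAL;
	if (!mock_user_access_begin(dst, sizeof(*dst)))
		return -EFAULT;

	mock_unsafe_put(ino, &dst->d_ino, efault);
	mock_unsafe_put((uint16_t)sizeof(*dst), &dst->d_reclen, efault);
	for (size_t i = 0; i < namlen; i++)
		mock_unsafe_put(name[i], &dst->d_name[i], efault);
	mock_unsafe_put('\0', &dst->d_name[namlen], efault);

	mock_user_access_end();
	return 0;

efault:
	mock_user_access_end(); /* always close the window on the error path */
	return -EFAULT;
}
```

The point of the shape is that the range check and the open/close cost are paid once per entry rather than once per store, which is the saving the commit message attributes to the conversion.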