1. 24 3月, 2019 1 次提交
    • H
      9p: use inode->i_lock to protect i_size_write() under 32-bit · e08ba890
      Hou Tao 提交于
      commit 5e3cc1ee1405a7eb3487ed24f786dec01b4cbe1f upstream.
      
      Use inode->i_lock to protect i_size_write(), else i_size_read() in
      generic_fillattr() may loop infinitely in read_seqcount_begin() when
      multiple processes invoke v9fs_vfs_getattr() or v9fs_vfs_getattr_dotl()
      simultaneously under 32-bit SMP environment, and a soft lockup will be
      triggered as show below:
      
        watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [stat:2217]
        Modules linked in:
        CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4
        Hardware name: Generic DT based system
        PC is at generic_fillattr+0x104/0x108
        LR is at 0xec497f00
        pc : [<802b8898>]    lr : [<ec497f00>]    psr: 200c0013
        sp : ec497e20  ip : ed608030  fp : ec497e3c
        r10: 00000000  r9 : ec497f00  r8 : ed608030
        r7 : ec497ebc  r6 : ec497f00  r5 : ee5c1550  r4 : ee005780
        r3 : 0000052d  r2 : 00000000  r1 : ec497f00  r0 : ed608030
        Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
        Control: 10c5387d  Table: ac48006a  DAC: 00000051
        CPU: 5 PID: 2217 Comm: stat Not tainted 5.0.0-rc1-00005-g7f702faf5a9e #4
        Hardware name: Generic DT based system
        Backtrace:
        [<8010d974>] (dump_backtrace) from [<8010dc88>] (show_stack+0x20/0x24)
        [<8010dc68>] (show_stack) from [<80a1d194>] (dump_stack+0xb0/0xdc)
        [<80a1d0e4>] (dump_stack) from [<80109f34>] (show_regs+0x1c/0x20)
        [<80109f18>] (show_regs) from [<801d0a80>] (watchdog_timer_fn+0x280/0x2f8)
        [<801d0800>] (watchdog_timer_fn) from [<80198658>] (__hrtimer_run_queues+0x18c/0x380)
        [<801984cc>] (__hrtimer_run_queues) from [<80198e60>] (hrtimer_run_queues+0xb8/0xf0)
        [<80198da8>] (hrtimer_run_queues) from [<801973e8>] (run_local_timers+0x28/0x64)
        [<801973c0>] (run_local_timers) from [<80197460>] (update_process_times+0x3c/0x6c)
        [<80197424>] (update_process_times) from [<801ab2b8>] (tick_nohz_handler+0xe0/0x1bc)
        [<801ab1d8>] (tick_nohz_handler) from [<80843050>] (arch_timer_handler_virt+0x38/0x48)
        [<80843018>] (arch_timer_handler_virt) from [<80180a64>] (handle_percpu_devid_irq+0x8c/0x240)
        [<801809d8>] (handle_percpu_devid_irq) from [<8017ac20>] (generic_handle_irq+0x34/0x44)
        [<8017abec>] (generic_handle_irq) from [<8017b344>] (__handle_domain_irq+0x6c/0xc4)
        [<8017b2d8>] (__handle_domain_irq) from [<801022e0>] (gic_handle_irq+0x4c/0x88)
        [<80102294>] (gic_handle_irq) from [<80101a30>] (__irq_svc+0x70/0x98)
        [<802b8794>] (generic_fillattr) from [<8056b284>] (v9fs_vfs_getattr_dotl+0x74/0xa4)
        [<8056b210>] (v9fs_vfs_getattr_dotl) from [<802b8904>] (vfs_getattr_nosec+0x68/0x7c)
        [<802b889c>] (vfs_getattr_nosec) from [<802b895c>] (vfs_getattr+0x44/0x48)
        [<802b8918>] (vfs_getattr) from [<802b8a74>] (vfs_statx+0x9c/0xec)
        [<802b89d8>] (vfs_statx) from [<802b9428>] (sys_lstat64+0x48/0x78)
        [<802b93e0>] (sys_lstat64) from [<80101000>] (ret_fast_syscall+0x0/0x28)
      
      [dominique.martinet@cea.fr: updated comment to not refer to a function
      in another subsystem]
      Link: http://lkml.kernel.org/r/20190124063514.8571-2-houtao1@huawei.com
      Cc: stable@vger.kernel.org
      Fixes: 7549ae3e ("9p: Use the i_size_[read, write]() macros instead of using inode->i_size directly.")
      Reported-by: NXing Gaopeng <xingaopeng@huawei.com>
      Signed-off-by: NHou Tao <houtao1@huawei.com>
      Signed-off-by: NDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e08ba890
  2. 21 11月, 2018 1 次提交
  3. 13 8月, 2018 1 次提交
  4. 01 8月, 2017 1 次提交
    • J
      fs: convert a pile of fsync routines to errseq_t based reporting · 3b49c9a1
      Jeff Layton 提交于
      This patch converts most of the in-kernel filesystems that do writeback
      out of the pagecache to report errors using the errseq_t-based
      infrastructure that was recently added. This allows them to report
      errors once for each open file description.
      
      Most filesystems have a fairly straightforward fsync operation. They
      call filemap_write_and_wait_range to write back all of the data and
      wait on it, and then (sometimes) sync out the metadata.
      
      For those filesystems this is a straightforward conversion from calling
      filemap_write_and_wait_range in their fsync operation to calling
      file_write_and_wait_range.
      Acked-by: NJan Kara <jack@suse.cz>
      Acked-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      3b49c9a1
  5. 16 7月, 2017 1 次提交
    • B
      fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks · 9d5b86ac
      Benjamin Coddington 提交于
      Since commit c69899a1 "NFSv4: Update of VFS byte range lock must be
      atomic with the stateid update", NFSv4 has been inserting locks in rpciod
      worker context.  The result is that the file_lock's fl_nspid is the
      kworker's pid instead of the original userspace pid.
      
      The fl_nspid is only used to represent the namespaced virtual pid number
      when displaying locks or returning from F_GETLK.  There's no reason to set
      it for every inserted lock, since we can usually just look it up from
      fl_pid.  So, instead of looking up and holding struct pid for every lock,
      let's just look up the virtual pid number from fl_pid when it is needed.
      That means we can remove fl_nspid entirely.
      
      The translaton and presentation of fl_pid should handle the following four
      cases:
      
      1 - F_GETLK on a remote file with a remote lock:
          In this case, the filesystem should determine the l_pid to return here.
          Filesystems should indicate that the fl_pid represents a non-local pid
          value that should not be translated by returning an fl_pid <= 0.
      
      2 - F_GETLK on a local file with a remote lock:
          This should be the l_pid of the lock manager process, and translated.
      
      3 - F_GETLK on a remote file with a local lock, and
      4 - F_GETLK on a local file with a local lock:
          These should be the translated l_pid of the local locking process.
      
      Fuse was already doing the correct thing by translating the pid into the
      caller's namespace.  With this change we must update fuse to translate
      to init's pid namespace, so that the locks API can then translate from
      init's pid namespace into the pid namespace of the caller.
      
      With this change, the locks API will expect that if a filesystem returns
      a remote pid as opposed to a local pid for F_GETLK, that remote pid will
      be <= 0.  This signifies that the pid is remote, and the locks API will
      forego translating that pid into the pid namespace of the local calling
      process.
      
      Finally, we convert remote filesystems to present remote pids using
      negative numbers. Have lustre, 9p, ceph, cifs, and dlm negate the remote
      pid returned for F_GETLK lock requests.
      
      Since local pids will never be larger than PID_MAX_LIMIT (which is
      currently defined as <= 4 million), but pid_t is an unsigned int, we
      should have plenty of room to represent remote pids with negative
      numbers if we assume that remote pid numbers are similarly limited.
      
      If this is not the case, then we run the risk of having a remote pid
      returned for which there is also a corresponding local pid.  This is a
      problem we have now, but this patch should reduce the chances of that
      occurring, while also returning those remote pid numbers, for whatever
      that may be worth.
      Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      9d5b86ac
  6. 25 2月, 2017 1 次提交
  7. 25 12月, 2016 1 次提交
  8. 01 7月, 2016 1 次提交
  9. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  10. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  11. 06 11月, 2015 1 次提交
  12. 23 10月, 2015 1 次提交
  13. 24 8月, 2015 1 次提交
    • V
      9p: fix return code of read() when count is 0 · b5ac1fb2
      Vincent Bernat 提交于
      When reading 0 bytes from an empty file on a 9P filesystem, the return
      code of read() was not 0 as expected due to an unitialized err variable.
      
      Tested with this simple program:
      
          #include <assert.h>
          #include <sys/types.h>
          #include <sys/stat.h>
          #include <fcntl.h>
          #include <unistd.h>
      
          int main(int argc, const char **argv)
          {
              assert(argc == 2);
              char buffer[256];
              int fd = open(argv[1], O_RDONLY|O_NOCTTY);
              assert(fd >= 0);
              assert(read(fd, buffer, 0) == 0);
              return 0;
          }
      Signed-off-by: NVincent Bernat <vincent@bernat.im>
      Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
      b5ac1fb2
  14. 12 4月, 2015 10 次提交
  15. 22 3月, 2015 1 次提交
  16. 20 3月, 2015 2 次提交
  17. 11 2月, 2015 1 次提交
  18. 09 10月, 2014 1 次提交
  19. 05 6月, 2014 1 次提交
  20. 02 6月, 2014 1 次提交
    • J
      locks: ensure that fl_owner is always initialized properly in flock and lease codepaths · 130d1f95
      Jeff Layton 提交于
      Currently, the fl_owner isn't set for flock locks. Some filesystems use
      byte-range locks to simulate flock locks and there is a common idiom in
      those that does:
      
          fl->fl_owner = (fl_owner_t)filp;
          fl->fl_start = 0;
          fl->fl_end = OFFSET_MAX;
      
      Since flock locks are generally "owned" by the open file description,
      move this into the common flock lock setup code. The fl_start and fl_end
      fields are already set appropriately, so remove the unneeded setting of
      that in flock ops in those filesystems as well.
      
      Finally, the lease code also sets the fl_owner as if they were owned by
      the process and not the open file description. This is incorrect as
      leases have the same ownership semantics as flock locks. Set them the
      same way. The lease code doesn't actually use the fl_owner value for
      anything, so this is more for consistency's sake than a bugfix.
      Reported-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: NJeff Layton <jlayton@poochiereds.net>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (Staging portion)
      Acked-by: NJ. Bruce Fields <bfields@fieldses.org>
      130d1f95
  21. 07 5月, 2014 2 次提交
  22. 08 4月, 2014 1 次提交
  23. 10 1月, 2014 1 次提交
  24. 24 11月, 2013 1 次提交
  25. 25 10月, 2013 1 次提交
  26. 26 8月, 2013 1 次提交
    • W
      fs/9p: avoid accessing utsname after namespace has been torn down · 50192abe
      Will Deacon 提交于
      During trinity fuzzing in a kvmtool guest, I stumbled across the
      following:
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000004
      PC is at v9fs_file_do_lock+0xc8/0x1a0
      LR is at v9fs_file_do_lock+0x48/0x1a0
      [<c01e2ed0>] (v9fs_file_do_lock+0xc8/0x1a0) from [<c0119154>] (locks_remove_flock+0x8c/0x124)
      [<c0119154>] (locks_remove_flock+0x8c/0x124) from [<c00d9bf0>] (__fput+0x58/0x1e4)
      [<c00d9bf0>] (__fput+0x58/0x1e4) from [<c0044340>] (task_work_run+0xac/0xe8)
      [<c0044340>] (task_work_run+0xac/0xe8) from [<c002e36c>] (do_exit+0x6bc/0x8d8)
      [<c002e36c>] (do_exit+0x6bc/0x8d8) from [<c002e674>] (do_group_exit+0x3c/0xb0)
      [<c002e674>] (do_group_exit+0x3c/0xb0) from [<c002e6f8>] (__wake_up_parent+0x0/0x18)
      
      I believe this is due to an attempt to access utsname()->nodename, after
      exit_task_namespaces() has been called, leaving current->nsproxy->uts_ns
      as NULL and causing the above dereference.
      
      A similar issue was fixed for lockd in 9a1b6bf8 ("LOCKD: Don't call
      utsname()->nodename from nlmclnt_setlockargs"), so this patch attempts
      something similar for 9pfs.
      
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
      50192abe
  27. 23 2月, 2013 1 次提交
  28. 22 2月, 2013 1 次提交
  29. 11 2月, 2013 1 次提交