1. 25 1月, 2018 2 次提交
  2. 16 11月, 2017 1 次提交
  3. 23 9月, 2017 1 次提交
  4. 21 9月, 2017 1 次提交
  5. 01 8月, 2017 1 次提交
    • J
      fs: convert a pile of fsync routines to errseq_t based reporting · 3b49c9a1
      Jeff Layton 提交于
      This patch converts most of the in-kernel filesystems that do writeback
      out of the pagecache to report errors using the errseq_t-based
      infrastructure that was recently added. This allows them to report
      errors once for each open file description.
      
      Most filesystems have a fairly straightforward fsync operation. They
      call filemap_write_and_wait_range to write back all of the data and
      wait on it, and then (sometimes) sync out the metadata.
      
      For those filesystems this is a straightforward conversion from calling
      filemap_write_and_wait_range in their fsync operation to calling
      file_write_and_wait_range.
      Acked-by: NJan Kara <jack@suse.cz>
      Acked-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      3b49c9a1
  6. 06 7月, 2017 2 次提交
    • R
      CIFS: fix circular locking dependency · 966681c9
      Rabin Vincent 提交于
      When a CIFS filesystem is mounted with the forcemand option and the
      following command is run on it, lockdep warns about a circular locking
      dependency between CifsInodeInfo::lock_sem and the inode lock.
      
       while echo foo > hello; do :; done & while touch -c hello; do :; done
      
      cifs_writev() takes the locks in the wrong order, but note that we can't
      only flip the order around because it releases the inode lock before the
      call to generic_write_sync() while it holds the lock_sem across that
      call.
      
      But, AFAICS, there is no need to hold the CifsInodeInfo::lock_sem across
      the generic_write_sync() call either, so we can release both the locks
      before generic_write_sync(), and change the order.
      
       ======================================================
       WARNING: possible circular locking dependency detected
       4.12.0-rc7+ #9 Not tainted
       ------------------------------------------------------
       touch/487 is trying to acquire lock:
        (&cifsi->lock_sem){++++..}, at: cifsFileInfo_put+0x88f/0x16a0
      
       but task is already holding lock:
        (&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&sb->s_type->i_mutex_key#11){+.+.+.}:
              __lock_acquire+0x1f74/0x38f0
              lock_acquire+0x1cc/0x600
              down_write+0x74/0x110
              cifs_strict_writev+0x3cb/0x8c0
              __vfs_write+0x4c1/0x930
              vfs_write+0x14c/0x2d0
              SyS_write+0xf7/0x240
              entry_SYSCALL_64_fastpath+0x1f/0xbe
      
       -> #0 (&cifsi->lock_sem){++++..}:
              check_prevs_add+0xfa0/0x1d10
              __lock_acquire+0x1f74/0x38f0
              lock_acquire+0x1cc/0x600
              down_write+0x74/0x110
              cifsFileInfo_put+0x88f/0x16a0
              cifs_setattr+0x992/0x1680
              notify_change+0x61a/0xa80
              utimes_common+0x3d4/0x870
              do_utimes+0x1c1/0x220
              SyS_utimensat+0x84/0x1a0
              entry_SYSCALL_64_fastpath+0x1f/0xbe
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&sb->s_type->i_mutex_key#11);
                                      lock(&cifsi->lock_sem);
                                      lock(&sb->s_type->i_mutex_key#11);
         lock(&cifsi->lock_sem);
      
        *** DEADLOCK ***
      
       2 locks held by touch/487:
        #0:  (sb_writers#10){.+.+.+}, at: mnt_want_write+0x41/0xb0
        #1:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870
      
       stack backtrace:
       CPU: 0 PID: 487 Comm: touch Not tainted 4.12.0-rc7+ #9
       Call Trace:
        dump_stack+0xdb/0x185
        print_circular_bug+0x45b/0x790
        __lock_acquire+0x1f74/0x38f0
        lock_acquire+0x1cc/0x600
        down_write+0x74/0x110
        cifsFileInfo_put+0x88f/0x16a0
        cifs_setattr+0x992/0x1680
        notify_change+0x61a/0xa80
        utimes_common+0x3d4/0x870
        do_utimes+0x1c1/0x220
        SyS_utimensat+0x84/0x1a0
        entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: 19dfc1f5 ("cifs: fix the race in cifs_writev()")
      Signed-off-by: NRabin Vincent <rabinv@axis.com>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      Acked-by: NPavel Shilovsky <pshilov@microsoft.com>
      966681c9
    • J
  7. 21 6月, 2017 1 次提交
  8. 10 5月, 2017 1 次提交
    • R
      CIFS: silence lockdep splat in cifs_relock_file() · 560d3889
      Rabin Vincent 提交于
      cifs_relock_file() can perform a down_write() on the inode's lock_sem even
      though it was already performed in cifs_strict_readv().  Lockdep complains
      about this.  AFAICS, there is no problem here, and lockdep just needs to be
      told that this nesting is OK.
      
       =============================================
       [ INFO: possible recursive locking detected ]
       4.11.0+ #20 Not tainted
       ---------------------------------------------
       cat/701 is trying to acquire lock:
        (&cifsi->lock_sem){++++.+}, at: cifs_reopen_file+0x7a7/0xc00
      
       but task is already holding lock:
        (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(&cifsi->lock_sem);
         lock(&cifsi->lock_sem);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       1 lock held by cat/701:
        #0:  (&cifsi->lock_sem){++++.+}, at: cifs_strict_readv+0x177/0x310
      
       stack backtrace:
       CPU: 0 PID: 701 Comm: cat Not tainted 4.11.0+ #20
       Call Trace:
        dump_stack+0x85/0xc2
        __lock_acquire+0x17dd/0x2260
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        ? preempt_schedule_irq+0x6b/0x80
        lock_acquire+0xcc/0x260
        ? lock_acquire+0xcc/0x260
        ? cifs_reopen_file+0x7a7/0xc00
        down_read+0x2d/0x70
        ? cifs_reopen_file+0x7a7/0xc00
        cifs_reopen_file+0x7a7/0xc00
        ? printk+0x43/0x4b
        cifs_readpage_worker+0x327/0x8a0
        cifs_readpage+0x8c/0x2a0
        generic_file_read_iter+0x692/0xd00
        cifs_strict_readv+0x29f/0x310
        generic_file_splice_read+0x11c/0x1c0
        do_splice_to+0xa5/0xc0
        splice_direct_to_actor+0xfa/0x350
        ? generic_pipe_buf_nosteal+0x10/0x10
        do_splice_direct+0xb5/0xe0
        do_sendfile+0x278/0x3a0
        SyS_sendfile64+0xc4/0xe0
        entry_SYSCALL_64_fastpath+0x1f/0xbe
      Signed-off-by: NRabin Vincent <rabinv@axis.com>
      Acked-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      560d3889
  9. 03 5月, 2017 2 次提交
  10. 11 4月, 2017 1 次提交
    • G
      CIFS: store results of cifs_reopen_file to avoid infinite wait · 1fa839b4
      Germano Percossi 提交于
      This fixes Continuous Availability when errors during
      file reopen are encountered.
      
      cifs_user_readv and cifs_user_writev would wait for ever if
      results of cifs_reopen_file are not stored and for later inspection.
      
      In fact, results are checked and, in case of errors, a chain
      of function calls leading to reads and writes to be scheduled in
      a separate thread is skipped.
      These threads will wake up the corresponding waiters once reads
      and writes are done.
      
      However, given the return value is not stored, when rc is checked
      for errors a previous one (always zero) is inspected instead.
      This leads to pending reads/writes added to the list, making
      cifs_user_readv and cifs_user_writev wait for ever.
      Signed-off-by: NGermano Percossi <germano.percossi@citrix.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      1fa839b4
  11. 25 2月, 2017 1 次提交
  12. 02 2月, 2017 2 次提交
    • P
      CIFS: Add copy into pages callback for a read operation · d70b9104
      Pavel Shilovsky 提交于
      Since we have two different types of reads (pagecache and direct)
      we need to process such responses differently after decryption of
      a packet. The change allows to specify a callback that copies a read
      payload data into preallocated pages.
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      d70b9104
    • P
      CIFS: Fix splice read for non-cached files · 9c25702c
      Pavel Shilovsky 提交于
      Currently we call copy_page_to_iter() for uncached reading into a pipe.
      This is wrong because it treats pages as VFS cache pages and copies references
      rather than actual data. When we are trying to read from the pipe we end up
      calling page_cache_pipe_buf_confirm() which returns -ENODATA. This error
      is translated into 0 which is returned to a user.
      
      This issue is reproduced by running xfs-tests suite (generic test #249)
      against mount points with "cache=none". Fix it by mapping pages manually
      and calling copy_to_iter() that copies data into the pipe.
      
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      9c25702c
  13. 06 12月, 2016 1 次提交
    • P
      CIFS: Fix a possible double locking of mutex during reconnect · 96a988ff
      Pavel Shilovsky 提交于
      With the current code it is possible to lock a mutex twice when
      a subsequent reconnects are triggered. On the 1st reconnect we
      reconnect sessions and tcons and then persistent file handles.
      If the 2nd reconnect happens during the reconnecting of persistent
      file handles then the following sequence of calls is observed:
      
      cifs_reopen_file -> SMB2_open -> small_smb2_init -> smb2_reconnect
      -> cifs_reopen_persistent_file_handles -> cifs_reopen_file (again!).
      
      So, we are trying to acquire the same cfile->fh_mutex twice which
      is wrong. Fix this by moving reconnecting of persistent handles to
      the delayed work (smb2_reconnect_server) and submitting this work
      every time we reconnect tcon in SMB2 commands handling codepath.
      
      This can also lead to corruption of a temporary file list in
      cifs_reopen_persistent_file_handles() because we can recursively
      call this function twice.
      
      Cc: Stable <stable@vger.kernel.org> # v4.9+
      Signed-off-by: NPavel Shilovsky <pshilov@microsoft.com>
      96a988ff
  14. 14 10月, 2016 2 次提交
  15. 13 10月, 2016 2 次提交
  16. 28 9月, 2016 2 次提交
  17. 27 7月, 2016 1 次提交
    • M
      mm, memcg: use consistent gfp flags during readahead · 8a5c743e
      Michal Hocko 提交于
      Vladimir has noticed that we might declare memcg oom even during
      readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
      restriction) while __do_page_cache_readahead uses
      page_cache_alloc_readahead which adds __GFP_NORETRY to prevent from
      OOMs.  This gfp mask discrepancy is really unfortunate and easily
      fixable.  Drop page_cache_alloc_readahead() which only has one user and
      outsource the gfp_mask logic into readahead_gfp_mask and propagate this
      mask from __do_page_cache_readahead down to read_pages.
      
      This alone would have only very limited impact as most filesystems are
      implementing ->readpages and the common implementation mpage_readpages
      does GFP_KERNEL (with mapping_gfp restriction) again.  We can tell it to
      use readahead_gfp_mask instead as this function is called only during
      readahead as well.  The same applies to read_cache_pages.
      
      ext4 has its own ext4_mpage_readpages but the path which has pages !=
      NULL can use the same gfp mask.  Btrfs, cifs, f2fs and orangefs are
      doing a very similar pattern to mpage_readpages so the same can be
      applied to them as well.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [mhocko@suse.com: restrict gfp mask in mpage_alloc]
        Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Changman Lee <cm224.lee@samsung.com>
      Cc: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a5c743e
  18. 24 6月, 2016 1 次提交
  19. 18 5月, 2016 1 次提交
  20. 02 5月, 2016 3 次提交
  21. 05 4月, 2016 2 次提交
    • K
      mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage · ea1754a0
      Kirill A. Shutemov 提交于
      Mostly direct substitution with occasional adjustment or removing
      outdated comments.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea1754a0
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  22. 29 3月, 2016 1 次提交
  23. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  24. 16 1月, 2016 1 次提交
    • K
      page-flags: define PG_locked behavior on compound pages · 48c935ad
      Kirill A. Shutemov 提交于
      lock_page() must operate on the whole compound page.  It doesn't make
      much sense to lock part of compound page.  Change code to use head
      page's PG_locked, if tail page is passed.
      
      This patch also gets rid of custom helper functions --
      __set_page_locked() and __clear_page_locked().  They are replaced with
      helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG.  Tail pages to these
      helper would trigger VM_BUG_ON().
      
      SLUB uses PG_locked as a bit spin locked.  IIUC, tail pages should never
      appear there.  VM_BUG_ON() is added to make sure that this assumption is
      correct.
      
      [akpm@linux-foundation.org: fix fs/cifs/file.c]
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      48c935ad
  25. 07 11月, 2015 1 次提交
  26. 23 10月, 2015 1 次提交
  27. 17 10月, 2015 1 次提交
    • M
      mm, fs: obey gfp_mapping for add_to_page_cache() · 063d99b4
      Michal Hocko 提交于
      Commit 6afdb859 ("mm: do not ignore mapping_gfp_mask in page cache
      allocation paths") has caught some users of hardcoded GFP_KERNEL used in
      the page cache allocation paths.  This, however, wasn't complete and
      there were others which went unnoticed.
      
      Dave Chinner has reported the following deadlock for xfs on loop device:
      : With the recent merge of the loop device changes, I'm now seeing
      : XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
      :
      : The deadlocked is as follows:
      :
      : kloopd1: loop_queue_read_work
      :       xfs_file_iter_read
      :       lock XFS inode XFS_IOLOCK_SHARED (on image file)
      :       page cache read (GFP_KERNEL)
      :       radix tree alloc
      :       memory reclaim
      :       reclaim XFS inodes
      :       log force to unpin inodes
      :       <wait for log IO completion>
      :
      : xfs-cil/loop1: <does log force IO work>
      :       xlog_cil_push
      :       xlog_write
      :       <loop issuing log writes>
      :               xlog_state_get_iclog_space()
      :               <blocks due to all log buffers under write io>
      :               <waits for IO completion>
      :
      : kloopd1: loop_queue_write_work
      :       xfs_file_write_iter
      :       lock XFS inode XFS_IOLOCK_EXCL (on image file)
      :       <wait for inode to be unlocked>
      :
      : i.e. the kloopd, with it's split read and write work queues, has
      : introduced a dependency through memory reclaim. i.e. that writes
      : need to be able to progress for reads make progress.
      :
      : The problem, fundamentally, is that mpage_readpages() does a
      : GFP_KERNEL allocation, rather than paying attention to the inode's
      : mapping gfp mask, which is set to GFP_NOFS.
      :
      : The didn't used to happen, because the loop device used to issue
      : reads through the splice path and that does:
      :
      :       error = add_to_page_cache_lru(page, mapping, index,
      :                       GFP_KERNEL & mapping_gfp_mask(mapping));
      
      This has changed by commit aa4d8616 ("block: loop: switch to VFS
      ITER_BVEC").
      
      This patch changes mpage_readpage{s} to follow gfp mask set for the
      mapping.  There are, however, other places which are doing basically the
      same.
      
      lustre:ll_dir_filler is doing GFP_KERNEL from the function which
      apparently uses GFP_NOFS for other allocations so let's make this
      consistent.
      
      cifs:readpages_get_pages is called from cifs_readpages and
      __cifs_readpages_from_fscache called from the same path obeys mapping
      gfp.
      
      ramfs_nommu_expand_for_mapping is hardcoding GFP_KERNEL as well
      regardless it uses mapping_gfp_mask for the page allocation.
      
      ext4_mpage_readpages is the called from the page cache allocation path
      same as read_pages and read_cache_pages
      
      As I've noticed in my previous post I cannot say I would be happy about
      sprinkling mapping_gfp_mask all over the place and it sounds like we
      should drop gfp_mask argument altogether and use it internally in
      __add_to_page_cache_locked that would require all the filesystems to use
      mapping gfp consistently which I am not sure is the case here.  From a
      quick glance it seems that some file system use it all the time while
      others are selective.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NDave Chinner <david@fromorbit.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Andreas Dilger <andreas.dilger@intel.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      063d99b4
  28. 11 9月, 2015 1 次提交
  29. 21 5月, 2015 1 次提交
  30. 11 5月, 2015 1 次提交