1. 27 Jul, 2016 1 commit
    • mm, memcg: use consistent gfp flags during readahead · 8a5c743e
      Michal Hocko committed
      Vladimir has noticed that we might declare memcg oom even during
      readahead because read_pages only uses GFP_KERNEL (with mapping_gfp
      restriction) while __do_page_cache_readahead uses
      page_cache_alloc_readahead which adds __GFP_NORETRY to prevent
      OOMs.  This gfp mask discrepancy is really unfortunate and easily
      fixable.  Drop page_cache_alloc_readahead() which only has one user and
      outsource the gfp_mask logic into readahead_gfp_mask and propagate this
      mask from __do_page_cache_readahead down to read_pages.
      
      This alone would have only very limited impact as most filesystems are
      implementing ->readpages and the common implementation mpage_readpages
      does GFP_KERNEL (with mapping_gfp restriction) again.  We can tell it to
      use readahead_gfp_mask instead as this function is called only during
      readahead as well.  The same applies to read_cache_pages.
      
      ext4 has its own ext4_mpage_readpages but the path which has pages !=
      NULL can use the same gfp mask.  Btrfs, cifs, f2fs and orangefs are
      doing a very similar pattern to mpage_readpages so the same can be
      applied to them as well.
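
A minimal userspace sketch of the discrepancy described above, not the kernel code itself. The `demo_` names and the flag bit values are assumptions made for illustration, and the real helper may set additional flags beyond `__GFP_NORETRY`.

```c
typedef unsigned int gfp_t;

/* Illustrative bit values only; the real ones are defined by the kernel. */
#define DEMO_GFP_IO      0x1u
#define DEMO_GFP_FS      0x2u
#define DEMO_GFP_NORETRY 0x4u
#define DEMO_GFP_KERNEL  (DEMO_GFP_IO | DEMO_GFP_FS)

/* Hypothetical stand-in for struct address_space. */
struct demo_mapping {
	gfp_t gfp_mask;
};

/* Before: the page allocation side added NORETRY ... */
static gfp_t demo_old_alloc_mask(const struct demo_mapping *m)
{
	return m->gfp_mask | DEMO_GFP_NORETRY;
}

/* ... while the read side used GFP_KERNEL restricted by the mapping. */
static gfp_t demo_old_read_mask(const struct demo_mapping *m)
{
	return DEMO_GFP_KERNEL & m->gfp_mask;
}

/* After: one helper computes the mask, and callers pass it down unchanged. */
static gfp_t demo_readahead_gfp_mask(const struct demo_mapping *m)
{
	return m->gfp_mask | DEMO_GFP_NORETRY;
}
```

With a mapping whose mask is `DEMO_GFP_KERNEL`, the two old helpers disagree on the NORETRY bit, which is the inconsistency the commit removes by computing the mask once.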
      
      [akpm@linux-foundation.org: coding-style fixes]
      [mhocko@suse.com: restrict gfp mask in mpage_alloc]
      Link: http://lkml.kernel.org/r/20160610074223.GC32285@dhcp22.suse.cz
      Link: http://lkml.kernel.org/r/1465301556-26431-1-git-send-email-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Changman Lee <cm224.lee@samsung.com>
      Cc: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 24 Jun, 2016 1 commit
  3. 18 May, 2016 1 commit
  4. 02 May, 2016 3 commits
  5. 05 Apr, 2016 2 commits
    • mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage · ea1754a0
      Kirill A. Shutemov committed
      Mostly direct substitution with occasional adjustment or removing
      outdated comments.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov committed
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion whether
      PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
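
As a rough illustration of why the first two substitutions are safe, the sketch below assumes both shifts are equal (12, i.e. 4 KiB pages, is an assumed value), so the shift difference is zero and the expression reduces to its operand. The `DEMO_` and `demo_` names are hypothetical.

```c
/* Assumed value for illustration: 4 KiB pages. */
#define DEMO_PAGE_SHIFT       12
/* The page cache shift was always defined to equal the page shift. */
#define DEMO_PAGE_CACHE_SHIFT DEMO_PAGE_SHIFT

/* Old style: scale a page cache index to units of PAGE_SIZE. */
static unsigned long demo_old_scale(unsigned long index)
{
	return index << (DEMO_PAGE_CACHE_SHIFT - DEMO_PAGE_SHIFT);
}

/* New style: the shift by zero is dropped entirely. */
static unsigned long demo_new_scale(unsigned long index)
{
	return index;
}
```

Both functions return the same value for every input, which is what lets the semantic patch replace the shifted expression with the bare operand.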
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files.  I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 29 Mar, 2016 1 commit
  7. 23 Jan, 2016 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Al Viro committed
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
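
The wrapper pattern can be sketched in userspace as follows. This is only a model: `demo_mutex` is a toy spin-style lock built on C11 `atomic_flag` standing in for the kernel's mutex, and every `demo_` name is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock standing in for the kernel's struct mutex. */
struct demo_mutex {
	atomic_flag held;
};

static void demo_mutex_init(struct demo_mutex *m)
{
	atomic_flag_clear(&m->held);
}

static void demo_mutex_lock(struct demo_mutex *m)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set(&m->held))
		;
}

static bool demo_mutex_trylock(struct demo_mutex *m)
{
	/* True when the lock was free and is now held by the caller. */
	return !atomic_flag_test_and_set(&m->held);
}

static void demo_mutex_unlock(struct demo_mutex *m)
{
	atomic_flag_clear(&m->held);
}

struct demo_inode {
	struct demo_mutex i_mutex;
};

/* Wrappers: inode_foo(inode) is mutex_foo(&inode->i_mutex). */
static void demo_inode_lock(struct demo_inode *inode)
{
	demo_mutex_lock(&inode->i_mutex);
}

static bool demo_inode_trylock(struct demo_inode *inode)
{
	return demo_mutex_trylock(&inode->i_mutex);
}

static void demo_inode_unlock(struct demo_inode *inode)
{
	demo_mutex_unlock(&inode->i_mutex);
}
```

Because callers only name the wrappers, the lock type behind `i_mutex` can later change (for example to a reader/writer lock) by editing the wrappers alone.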
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  8. 16 Jan, 2016 1 commit
    • page-flags: define PG_locked behavior on compound pages · 48c935ad
      Kirill A. Shutemov committed
      lock_page() must operate on the whole compound page.  It doesn't make
      much sense to lock part of a compound page.  Change the code to use the
      head page's PG_locked if a tail page is passed.
      
      This patch also gets rid of the custom helper functions
      __set_page_locked() and __clear_page_locked().  They are replaced with
      helpers generated by __SETPAGEFLAG/__CLEARPAGEFLAG.  Passing tail pages
      to these helpers would trigger VM_BUG_ON().
      
      SLUB uses PG_locked as a bit spin lock.  IIUC, tail pages should never
      appear there.  VM_BUG_ON() is added to make sure that this assumption is
      correct.
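
A simplified model of redirecting the lock bit to the head page. The structure layout and the `demo_` helpers are assumptions for illustration and do not mirror the kernel's `struct page`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page descriptor: `locked` stands in for PG_locked. */
struct demo_page {
	bool locked;
	struct demo_page *head; /* NULL for a head or non-compound page */
};

/* Resolve any page of a compound page to its head page. */
static struct demo_page *demo_compound_head(struct demo_page *page)
{
	return page->head ? page->head : page;
}

/* Try to take the lock; it always lives on the head page. */
static bool demo_trylock_page(struct demo_page *page)
{
	struct demo_page *head = demo_compound_head(page);

	if (head->locked)
		return false;
	head->locked = true;
	return true;
}

static void demo_unlock_page(struct demo_page *page)
{
	demo_compound_head(page)->locked = false;
}

static bool demo_page_locked(struct demo_page *page)
{
	return demo_compound_head(page)->locked;
}
```

Locking through any tail page therefore locks the whole compound page, and a second attempt through a different tail page fails until the first holder unlocks.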
      
      [akpm@linux-foundation.org: fix fs/cifs/file.c]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 07 Nov, 2015 1 commit
  10. 23 Oct, 2015 1 commit
  11. 17 Oct, 2015 1 commit
    • mm, fs: obey gfp_mapping for add_to_page_cache() · 063d99b4
      Michal Hocko committed
      Commit 6afdb859 ("mm: do not ignore mapping_gfp_mask in page cache
      allocation paths") has caught some users of hardcoded GFP_KERNEL used in
      the page cache allocation paths.  This, however, wasn't complete and
      there were others which went unnoticed.
      
      Dave Chinner has reported the following deadlock for xfs on loop device:
      : With the recent merge of the loop device changes, I'm now seeing
      : XFS deadlock on my single CPU, 1GB RAM VM running xfs/073.
      :
      : The deadlock is as follows:
      :
      : kloopd1: loop_queue_read_work
      :       xfs_file_iter_read
      :       lock XFS inode XFS_IOLOCK_SHARED (on image file)
      :       page cache read (GFP_KERNEL)
      :       radix tree alloc
      :       memory reclaim
      :       reclaim XFS inodes
      :       log force to unpin inodes
      :       <wait for log IO completion>
      :
      : xfs-cil/loop1: <does log force IO work>
      :       xlog_cil_push
      :       xlog_write
      :       <loop issuing log writes>
      :               xlog_state_get_iclog_space()
      :               <blocks due to all log buffers under write io>
      :               <waits for IO completion>
      :
      : kloopd1: loop_queue_write_work
      :       xfs_file_write_iter
      :       lock XFS inode XFS_IOLOCK_EXCL (on image file)
      :       <wait for inode to be unlocked>
      :
      : i.e. the kloopd, with its split read and write work queues, has
      : introduced a dependency through memory reclaim. i.e. that writes
      : need to be able to progress for reads to make progress.
      :
      : The problem, fundamentally, is that mpage_readpages() does a
      : GFP_KERNEL allocation, rather than paying attention to the inode's
      : mapping gfp mask, which is set to GFP_NOFS.
      :
      : This didn't use to happen, because the loop device used to issue
      : reads through the splice path and that does:
      :
      :       error = add_to_page_cache_lru(page, mapping, index,
      :                       GFP_KERNEL & mapping_gfp_mask(mapping));
      
      This has changed by commit aa4d8616 ("block: loop: switch to VFS
      ITER_BVEC").
      
      This patch changes mpage_readpage{s} to follow gfp mask set for the
      mapping.  There are, however, other places which are doing basically the
      same.
      
      lustre:ll_dir_filler is doing GFP_KERNEL from the function which
      apparently uses GFP_NOFS for other allocations so let's make this
      consistent.
      
      cifs:readpages_get_pages is called from cifs_readpages and
      __cifs_readpages_from_fscache called from the same path obeys mapping
      gfp.
      
      ramfs_nommu_expand_for_mapping hardcodes GFP_KERNEL as well, even
      though it uses mapping_gfp_mask for the page allocation.
      
      ext4_mpage_readpages is called from the page cache allocation path, the
      same as read_pages and read_cache_pages.
      
      As I noted in my previous post, I cannot say I would be happy about
      sprinkling mapping_gfp_mask all over the place.  It sounds like we
      should drop the gfp_mask argument altogether and use the mapping's mask
      internally in __add_to_page_cache_locked.  That would require all
      filesystems to use the mapping gfp consistently, which I am not sure is
      the case here.  From a quick glance it seems that some filesystems use
      it all the time while others are selective.
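
The masking idiom quoted in the report (`GFP_KERNEL & mapping_gfp_mask(mapping)`) can be modelled with made-up bit values. This is a sketch under assumptions: the `DEMO_` flag values are illustrative, and the real GFP_KERNEL and GFP_NOFS contain more bits than shown here.

```c
typedef unsigned int gfp_t;

/* Illustrative bit values only. */
#define DEMO_GFP_IO     0x1u
#define DEMO_GFP_FS     0x2u
#define DEMO_GFP_KERNEL (DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOFS   DEMO_GFP_IO

/* Restrict a GFP_KERNEL request by whatever the mapping allows. */
static gfp_t demo_page_cache_gfp(gfp_t mapping_mask)
{
	return DEMO_GFP_KERNEL & mapping_mask;
}

/* Reclaim may call back into the filesystem only if the FS bit is set. */
static int demo_may_recurse_into_fs(gfp_t gfp)
{
	return (gfp & DEMO_GFP_FS) != 0;
}
```

For a mapping set to `DEMO_GFP_NOFS`, the masked request loses the FS bit, so an allocation made under a filesystem lock cannot re-enter the filesystem through reclaim, which is the dependency behind the reported deadlock.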
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Dave Chinner <david@fromorbit.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Andreas Dilger <andreas.dilger@intel.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 11 Sep, 2015 1 commit
  13. 21 May, 2015 1 commit
  14. 11 May, 2015 1 commit
  15. 16 Apr, 2015 1 commit
  16. 12 Apr, 2015 5 commits
  17. 21 Mar, 2015 1 commit
    • cifs: fix use-after-free bug in find_writable_file · e1e9bda2
      David Disseldorp committed
      Under intermittent network outages, find_writable_file() is susceptible
      to the following race condition, which results in a use-after-free in
      the cifs_writepages code-path:
      
      Thread 1                                        Thread 2
      ========                                        ========
      
      inv_file = NULL
      refind = 0
      spin_lock(&cifs_file_list_lock)
      
      // invalidHandle found on openFileList
      
      inv_file = open_file
      // inv_file->count currently 1
      
      cifsFileInfo_get(inv_file)
      // inv_file->count = 2
      
      spin_unlock(&cifs_file_list_lock);
      
      cifs_reopen_file()                              cifs_close()
      // fails (rc != 0)                              ->cifsFileInfo_put()
                                                        spin_lock(&cifs_file_list_lock)
                                                        // inv_file->count = 1
                                                        spin_unlock(&cifs_file_list_lock)
      
      spin_lock(&cifs_file_list_lock);
      list_move_tail(&inv_file->flist,
            &cifs_inode->openFileList);
      spin_unlock(&cifs_file_list_lock);
      
      cifsFileInfo_put(inv_file);
      ->spin_lock(&cifs_file_list_lock)
      
        // inv_file->count = 0
        list_del(&cifs_file->flist);
        // cleanup!!
        kfree(cifs_file);
      
        spin_unlock(&cifs_file_list_lock);
      
      spin_lock(&cifs_file_list_lock);
      ++refind;
      // refind = 1
      goto refind_writable;
      
      At this point we loop back through with an invalid inv_file pointer
      and a refind value of 1. On second pass, inv_file is not overwritten on
      openFileList traversal, and is subsequently dereferenced.
      Signed-off-by: David Disseldorp <ddiss@suse.de>
      Reviewed-by: Jeff Layton <jlayton@samba.org>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
  18. 17 Feb, 2015 1 commit
  19. 11 Feb, 2015 1 commit
  20. 20 Jan, 2015 1 commit
    • Complete oplock break jobs before closing file handle · ca7df8e0
      Sachin Prabhu committed
      Commit c11f1df5 requires writers to wait for any pending oplock break
      handler to complete before proceeding to write. This is done by waiting
      on bit
      CIFS_INODE_PENDING_OPLOCK_BREAK in cifsFileInfo->flags. This bit is
      cleared by the oplock break handler job queued on the workqueue once it
      has completed handling the oplock break allowing writers to proceed with
      writing to the file.
      
      While testing, it was noticed that the filehandle could be closed while
      there is a pending oplock break which results in the oplock break
      handler on the cifsiod workqueue being cancelled before it has had a
      chance to execute and clear the CIFS_INODE_PENDING_OPLOCK_BREAK bit.
      Any subsequent attempt to write to this file hangs waiting for the
      CIFS_INODE_PENDING_OPLOCK_BREAK bit to be cleared.
      
      We fix this by ensuring that we also clear the bit
      CIFS_INODE_PENDING_OPLOCK_BREAK when we remove the oplock break handler
      from the workqueue.
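
A small state model of the fix described above, under stated assumptions: the real code uses a workqueue and waits on a bit, whereas this sketch tracks both as plain fields. All `demo_` names and the flag value are hypothetical.

```c
#include <stdbool.h>

/* Illustrative flag bit standing in for CIFS_INODE_PENDING_OPLOCK_BREAK. */
#define DEMO_PENDING_OPLOCK_BREAK 0x1u

struct demo_file_state {
	unsigned int flags;
	bool break_work_queued;
};

/* Server sent an oplock break: mark it pending and queue the handler. */
static void demo_queue_oplock_break(struct demo_file_state *st)
{
	st->flags |= DEMO_PENDING_OPLOCK_BREAK;
	st->break_work_queued = true;
}

/* Normal path: the handler runs and clears the pending bit. */
static void demo_oplock_break_handler(struct demo_file_state *st)
{
	st->break_work_queued = false;
	st->flags &= ~DEMO_PENDING_OPLOCK_BREAK;
}

/* Close path: cancelling queued work must also clear the pending bit. */
static void demo_close_file(struct demo_file_state *st)
{
	if (st->break_work_queued) {
		st->break_work_queued = false;
		st->flags &= ~DEMO_PENDING_OPLOCK_BREAK;
	}
}

/* Writers may only proceed once no oplock break is pending. */
static bool demo_writer_may_proceed(const struct demo_file_state *st)
{
	return (st->flags & DEMO_PENDING_OPLOCK_BREAK) == 0;
}
```

Without the clearing step in `demo_close_file`, a close that cancels the queued handler would leave the bit set forever and every later writer would block.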
      
      The bug was found by Red Hat QA while testing using ltp's fsstress
      command.
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Steve French <steve.french@primarydata.com>
  21. 17 Jan, 2015 3 commits
  22. 11 Dec, 2014 1 commit
  23. 20 Nov, 2014 1 commit
  24. 17 Oct, 2014 1 commit
    • Allow mknod and mkfifo on SMB2/SMB3 mounts · db8b631d
      Steve French committed
      The "sfu" mount option did not work on SMB2/SMB3 mounts.
      With these changes, when the "sfu" mount option is passed in
      on an smb2/smb2.1/smb3 mount, the client can emulate (and
      recognize) fifo and device files (character and block).
      
      In addition the "sfu" mount option should not conflict
      with "mfsymlinks" (symlink emulation) as we will never
      create "sfu" style symlinks, but using "sfu" mount option
      will allow us to recognize existing symlinks, created with
      Microsoft "Services for Unix" (SFU and SUA).
      
      To enable the "sfu" mount option for SMB2/SMB3 the calling
      syntax of the generic cifs/smb2/smb3 sync_read and sync_write
      protocol dependent function needed to be changed (we
      don't have a file struct in all cases), but this actually
      ended up simplifying the code a little.
      Signed-off-by: Steve French <smfrench@gmail.com>
  25. 09 Oct, 2014 1 commit
  26. 03 Oct, 2014 1 commit
  27. 22 Aug, 2014 1 commit
  28. 17 Aug, 2014 1 commit
    • CIFS: Fix SMB2 readdir error handling · 52755808
      Pavel Shilovsky committed
      SMB2 servers indicate the end of a directory search with the
      STATUS_NO_MORE_FILE error code, which is currently not processed.
      This causes the generic/257 xfstest to fail. Fix this by treating
      this error code as the end of the search in SMB2_query_directory.
      
      Also, when negotiating the CIFS protocol we tell the server to close
      the search automatically at the end, so there is no need to do
      it ourselves. In the case of the SMB2 protocol, we need to close it
      explicitly, so separate the close directory checks for the different
      protocols.
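
The two behaviours described above can be sketched roughly as follows. The status values are placeholders rather than the protocol's real codes, and the `demo_` functions are hypothetical helpers, not the cifs client's API.

```c
#include <stdbool.h>

/* Placeholder status codes; the numeric values are not the real ones. */
#define DEMO_STATUS_SUCCESS       0u
#define DEMO_STATUS_NO_MORE_FILES 1u
#define DEMO_STATUS_OTHER_ERROR   2u

struct demo_search {
	bool end_of_search;
};

/* Treat "no more files" as a clean end of search, not as a failure. */
static int demo_query_directory(struct demo_search *srch, unsigned int status)
{
	if (status == DEMO_STATUS_NO_MORE_FILES) {
		srch->end_of_search = true;
		return 0;
	}
	return status == DEMO_STATUS_SUCCESS ? 0 : -1;
}

/*
 * In this model a CIFS server closes the search itself once it has ended,
 * while an SMB2 search always needs an explicit close from the client.
 */
static bool demo_needs_explicit_close(bool is_smb2,
				      const struct demo_search *srch)
{
	if (!is_smb2 && srch->end_of_search)
		return false;
	return true;
}
```

Keeping the close decision in a per-protocol check is one way to model the "separate close directory checks" mentioned in the message.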
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
  29. 02 Aug, 2014 3 commits