1. 08 Jun 2016, 1 commit
  2. 05 Apr 2016, 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov committed
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with chunks bigger than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it's a constant source of confusion whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straightforward (a short before/after sketch follows the list):
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
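      
      For a concrete feel of the conversion, here is a hedged before/after
      sketch of a typical filesystem fragment (illustrative code, not taken
      from any particular file):
      
          /* Before: page-cache units pretended to be distinct from page units. */
          index = pos >> PAGE_CACHE_SHIFT;
          offset = pos & ~PAGE_CACHE_MASK;
          nr = len >> (PAGE_CACHE_SHIFT - PAGE_SHIFT);  /* == len: the shifts are equal */
          page_cache_get(page);
          /* ... */
          page_cache_release(page);
      
          /* After: plain page units everywhere. */
          index = pos >> PAGE_SHIFT;
          offset = pos & ~PAGE_MASK;
          nr = len;
          get_page(page);
          /* ... */
          put_page(page);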
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files, so I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 09 Aug 2014, 1 commit
  4. 07 Jun 2014, 1 commit
  5. 15 May 2014, 1 commit
  6. 07 May 2014, 7 commits
  7. 09 Aug 2013, 2 commits
    • reiserfs: locking, release lock around quota operations · d2d0395f
      Jeff Mahoney committed
      Previous commits released the write lock across quota operations but
      missed several places.  In particular, the free operations can also
      call into the file system code and take the write lock, causing
      deadlocks.
      
      This patch introduces some more helpers and uses them for quota call
      sites.  Without this patch applied, reiserfs + quotas runs into deadlocks
      under anything more than trivial load.
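      
      The pattern at a quota call site then looks roughly like this (a
      sketch; it assumes the depth-saving helpers from the companion
      "handle nested locks properly" patch below, and the exact call sites
      may differ):
      
          int depth;
      
          /* Drop the reiserfs write lock (remembering its depth) around the
           * quota operation, which may re-enter the filesystem and try to
           * take the write lock itself. */
          depth = reiserfs_write_unlock_nested(inode->i_sb);
          retval = dquot_alloc_block(inode, blocks);
          reiserfs_write_lock_nested(inode->i_sb, depth);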
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
    • reiserfs: locking, handle nested locks properly · 278f6679
      Jeff Mahoney committed
      The reiserfs write lock replaced the BKL and uses similar semantics.
      
      Frederic's locking code makes a distinction between when the lock is nested
      and when it's being acquired/released, but I don't think that's the right
      distinction to make.
      
      The right distinction is between the lock being released at end-of-use and
      the lock being released for a schedule. The unlock should return the depth
      and the lock should restore it, rather than the other way around as it is now.
      
      This patch implements that and adds a number of places where the lock
      should be dropped.
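      
      In other words, the interface looks roughly like this (a simplified
      sketch of the intended helpers, not the exact patch):
      
          /* Release the write lock for a schedule and report how deeply it was held. */
          int reiserfs_write_unlock_nested(struct super_block *s)
          {
                  struct reiserfs_sb_info *sbi = REISERFS_SB(s);
                  int depth;
      
                  if (sbi->lock_owner != current)  /* we don't hold it: nothing to drop */
                          return -1;
      
                  depth = sbi->lock_depth;
                  sbi->lock_depth = -1;
                  sbi->lock_owner = NULL;
                  mutex_unlock(&sbi->lock);
                  return depth;
          }
      
          /* Reacquire the write lock and restore the saved depth. */
          void reiserfs_write_lock_nested(struct super_block *s, int depth)
          {
                  struct reiserfs_sb_info *sbi = REISERFS_SB(s);
      
                  if (depth == -1)                 /* we didn't hold it before */
                          return;
      
                  mutex_lock(&sbi->lock);
                  sbi->lock_owner = current;
                  sbi->lock_depth = depth;
          }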
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
  8. 20 Nov 2012, 1 commit
  9. 21 Mar 2012, 1 commit
  10. 20 Mar 2012, 1 commit
  11. 05 Mar 2010, 1 commit
    • dquot: cleanup space allocation / freeing routines · 5dd4056d
      Christoph Hellwig committed
      Get rid of the alloc_space, free_space, reserve_space, claim_space and
      release_rsv dquot operations - they are always called from the filesystem
      and if a filesystem really needs its own (which none currently does)
      it can just call into its own routine directly.
      
      Move the shared logic into the common __dquot_alloc_space,
      dquot_claim_space_nodirty and __dquot_free_space low-level methods,
      and rationalize the wrappers around them to move as much code as
      possible into the common block for CONFIG_QUOTA vs. not.  Also rename
      all these helpers to dquot_* instead of vfs_dq_*.
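      
      To illustrate the shape of the result (a hedged sketch only; the exact
      signatures and stubs in include/linux/quotaops.h may differ), the
      wrappers end up as thin inlines around one shared low-level method,
      with the !CONFIG_QUOTA variants living in the same common block:
      
          #ifdef CONFIG_QUOTA
          int __dquot_alloc_space(struct inode *inode, qsize_t number, int warn, int reserve);
      
          static inline int dquot_alloc_space_nodirty(struct inode *inode, qsize_t nr)
          {
                  return __dquot_alloc_space(inode, nr, 1, 0);  /* warn, don't reserve */
          }
          #else
          static inline int dquot_alloc_space_nodirty(struct inode *inode, qsize_t nr)
          {
                  inode_add_bytes(inode, nr);  /* quota disabled: only account the space */
                  return 0;
          }
          #endif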
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  12. 14 Sep 2009, 5 commits
    • kill-the-bkl/reiserfs: move the concurrent tree accesses checks per superblock · 08f14fc8
      Frederic Weisbecker committed
      When do_balance() balances the tree, a trick is performed to
      provide the ability for other tree writers/readers to check whether
      do_balance() is executing concurrently (requires CONFIG_REISERFS_CHECK).
      
      This is done to protect concurrent accesses to the tree. The trick
      is the following:
      
      When do_balance() is called, a global variable called cur_tb is set to
      point to the tree currently being rebalanced.
      Once do_balance() finishes its work, cur_tb is set back to NULL.
      
      Then, concurrent tree readers/writers just have to check the value
      of cur_tb to ensure do_balance() isn't executing concurrently.
      If it is, that proves that schedule() occurred in do_balance(),
      which then relaxed the bkl that protected the tree.
      
      Now that the bkl has been turned into a mutex, this check is still
      fine even though do_balance() becomes preemptible: the write lock
      will not be automatically released on schedule(), so the tree is
      still protected.
      
      But this is only fine if we have a single reiserfs mount point.
      Indeed, because the bkl is a global lock, it didn't allow
      concurrent execution between a tree reader/writer in one mount point
      and a do_balance() on another tree from another mount point.
      
      So, given that all these readers/writers were never meant to be
      reentrant, the check now sometimes detects false positives with the
      per-superblock mutex, which does allow this reentrancy.
      
      This patch keeps the concurrent tree accesses check but moves it to
      the superblock, so that only trees from the same mount point are
      checked for concurrent access.
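      
      A minimal sketch of the idea (the helper names are hypothetical and,
      as in the real code, the check only matters under CONFIG_REISERFS_CHECK):
      
          struct reiserfs_sb_info {
                  /* ... existing fields ... */
          #ifdef CONFIG_REISERFS_CHECK
                  struct tree_balance *cur_tb;  /* non-NULL while do_balance() runs on this fs */
          #endif
          };
      
          /* was: global "cur_tb = tb" / "cur_tb = NULL" */
          static void do_balance_starting(struct tree_balance *tb)
          {
                  REISERFS_SB(tb->tb_sb)->cur_tb = tb;
          }
      
          static void do_balance_completed(struct tree_balance *tb)
          {
                  REISERFS_SB(tb->tb_sb)->cur_tb = NULL;
          }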
      
      [ Impact: fix spurious panic while running several reiserfs mount-points ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • kill-the-bkl/reiserfs: unlock only when needed in search_by_key · 2ac62695
      Frederic Weisbecker committed
      search_by_key() is the site that needs the write lock the most.
      This is mostly because it is a very central function and also
      because it releases/reacquires the write lock at least once each
      time it is called.
      
      Such a release/reacquire creates a lot of contention in this place and
      also widens the window that lets another thread change the tree.
      When that happens, the current search path over the tree must be
      retried from the beginning (the root), which is a wasteful and
      time-consuming recovery.
      
      This patch factorizes two release/reacquire sequences:
      
       - reading leaf node blocks
       - reading the current block
      
      The latter immediately follows the former.
      
      The whole sequence is safe as a single unlocked section because we
      check just afterwards whether the tree has changed during these
      operations.
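      
      The resulting pattern is roughly the following (a sketch, not the
      exact search_by_key() code; reiserfs_write_unlock/lock and
      fs_changed() are existing helpers, the rest is schematic):
      
          fs_gen = get_generation(sb);        /* remember the tree generation */
          reiserfs_write_unlock(sb);          /* single unlocked section ...  */
          bh = sb_bread(sb, block_number);    /* ... that may sleep in I/O    */
          reiserfs_write_lock(sb);
      
          if (fs_changed(fs_gen, sb)) {
                  /* Another writer changed the tree while we slept:
                   * restart the search from the root. */
                  goto research;
          }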
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • kill-the-bkl/reiserfs: reduce number of contentions in search_by_key() · 09eb47a7
      Frederic Weisbecker committed
      search_by_key() is a central function in reiserfs which searches
      the path in the fs tree from the root down to a node, given its key.
      
      It is the function that requests the write lock the most,
      because it lies on a very frequently used path.
      
      Also, we forget to release the lock while reading the next tree node,
      which makes us hold the lock wastefully.
      
      Then we release the lock while reading the current node and its children,
      all-in-one. It should be safe because we have a reference to these
      blocks and even if we read a block that will be concurrently changed,
      we have an fs_changed check later that will make us retry the path from
      the root.
      
      [ Impact: release the write lock while unused in a hot path ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • kill-the-BKL/reiserfs: release write lock while rescheduling on prepare_for_delete_or_cut() · 5e69e3a4
      Frederic Weisbecker committed
      prepare_for_delete_or_cut() can process several types of items,
      including indirect items, i.e. items which contain no file data
      themselves but pointers to the unformatted nodes holding the file's
      scattered data.
      
      In this case it has to zero out these pointers to the block numbers of
      the unformatted nodes and release those block numbers in the bitmap.
      
      This can take some time, so rescheduling is performed between each
      block processed.  We can safely release the write lock while
      rescheduling, as the bkl did, because the code checks just afterwards
      whether the item has moved after sleeping.
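      
      The shape of the change is roughly the following (a sketch; whether
      it is open-coded or wrapped in a helper is an implementation detail):
      
          /* Between two processed unformatted-node blocks. */
          if (need_resched()) {
                  reiserfs_write_unlock(sb);
                  cond_resched();              /* let others run; the lock is dropped */
                  reiserfs_write_lock(sb);
          }
          /* The caller then re-checks whether the item moved while we slept
           * before touching it again. */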
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • reiserfs: kill-the-BKL · 8ebc4232
      Frederic Weisbecker committed
      This patch is an attempt to remove the Bkl-based locking scheme from
      reiserfs.
      
      It is somewhat inspired by an old attempt by Peter Zijlstra:
      
         http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html
      
      The bkl is heavily used in this filesystem to prevent concurrent
      write accesses to the filesystem.
      
      Reiserfs makes deep use of the specific properties of the Bkl:
      
      - It can be acquired recursively by the same task
      - It is released on schedule() calls and reacquired when schedule() returns
      
      The two properties above are the roadmap for the reiserfs write locking,
      so it's very hard to simply replace it with a common mutex.
      
      - We need a lock that can be acquired recursively, unless we want to
        restructure several blocks of the code.
      - We need to identify the sites where the bkl was implicitly relaxed
        (schedule, wait, sync, etc.) so that we can in turn release and
        reacquire our new lock explicitly.
        Such implicit releases of the lock are often required to let other
        resource producers/consumers do their job, or we can suffer unexpected
        starvations or deadlocks.
      
      So the new lock that replaces the bkl here is a per-superblock mutex
      with a specific property: it can be acquired recursively by the same
      task, like the bkl.
      
      For that purpose, we add a lock owner and a lock depth field to the
      superblock information structure.
      
      The first axis of this patch is to turn the reiserfs_write_(un)lock()
      functions into wrappers that manage this mutex.  Also, some explicit
      calls to lock_kernel() have been converted to reiserfs_write_lock()
      helpers.
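      
      Sketched roughly (the lock, lock_owner and lock_depth fields are what
      the patch adds to the superblock info; the code below simplifies the
      real wrappers):
      
          void reiserfs_write_lock(struct super_block *s)
          {
                  struct reiserfs_sb_info *sbi = REISERFS_SB(s);
      
                  if (sbi->lock_owner != current) {
                          mutex_lock(&sbi->lock);   /* first acquisition by this task */
                          sbi->lock_owner = current;
                  }
                  sbi->lock_depth++;                /* recursive acquisition: just count */
          }
      
          void reiserfs_write_unlock(struct super_block *s)
          {
                  struct reiserfs_sb_info *sbi = REISERFS_SB(s);
      
                  if (--sbi->lock_depth == -1) {    /* depth starts at -1 when unlocked */
                          sbi->lock_owner = NULL;
                          mutex_unlock(&sbi->lock);
                  }
          }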
      
      The second axis is to find the important blocking sites (the schedule()
      family, wait_on_buffer(), sync_dirty_buffer(), etc.) and then apply an
      explicit release of the write lock at these locations before blocking.
      Then we can safely wait for those who can give us resources or those
      who need some.  Typically this is a fight between the current writer,
      the reiserfs workqueue (aka the async committer) and the pdflush threads.
      
      The third axis is a consequence of the second. The write lock is usually
      on top of a lock dependency chain which can include the journal lock, the
      flush lock or the commit lock.  So it's dangerous to release and then
      try to reacquire the write lock while we still hold other locks.
      
      This is fine with the bkl:
      
            T1                       T2
      
      lock_kernel()
          mutex_lock(A)
          unlock_kernel()
          // do something
                                  lock_kernel()
                                      mutex_lock(A) -> already locked by T1
                                      schedule() (and then unlock_kernel())
          lock_kernel()
          mutex_unlock(A)
          ....
      
      This is not fine with a mutex:
      
            T1                       T2
      
      mutex_lock(write)
          mutex_lock(A)
          mutex_unlock(write)
          // do something
                                 mutex_lock(write)
                                    mutex_lock(A) -> already locked by T1
                                    schedule()
      
          mutex_lock(write) -> already locked by T2
          deadlock
      
      The solution in this patch is to provide a helper which releases the
      write lock and sleeps a bit if we can't lock a mutex that depends on
      it.  It's another simulation of the bkl behaviour.
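      
      Schematically, such a helper could look like the following (a hedged
      sketch of the described behaviour only; the name is illustrative and
      the actual helper in the patch may simply drop the write lock around a
      blocking mutex_lock()):
      
          /* Lock a mutex that ranks below the write lock without deadlocking. */
          static void reiserfs_lock_dependent_mutex(struct mutex *m, struct super_block *s)
          {
                  while (!mutex_trylock(m)) {
                          /* The owner of 'm' may be waiting for our write lock:
                           * give it a chance to make progress, then retry. */
                          reiserfs_write_unlock(s);
                          schedule_timeout_uninterruptible(1);
                          reiserfs_write_lock(s);
                  }
          }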
      
      The last axis is to locate the fs callbacks that are called with the
      bkl held, according to Documentation/filesystems/Locking.
      
      Those are:
      
      - reiserfs_remount
      - reiserfs_fill_super
      - reiserfs_put_super
      
      Reiserfs didn't need to explicitly lock because of the context of these callbacks.
      But now we must take care of that with the new locking.
      
      After this patch, reiserfs suffers from a slight performance regression (for now).
      On UP, a high volume write with dd reports an average of 27 MB/s instead
      of 30 MB/s without the patch applied.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Bron Gondwana <brong@fastmail.fm>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 31 Mar 2009, 11 commits
  14. 26 Mar 2009, 1 commit
  15. 28 Apr 2008, 1 commit
  16. 15 Nov 2007, 1 commit
    • reiserfs: don't drop PG_dirty when releasing sub-page-sized dirty file · c06a018f
      Fengguang Wu committed
      This is not a new problem in 2.6.23-git17.  2.6.22/2.6.23 is buggy in the
      same way.
      
      Reiserfs could accumulate dirty sub-page-size files until umount time.
      They cannot be synced to disk by pdflush routines or explicit `sync'
      commands.  Only `umount' can do the trick.
      
      The direct cause is: the dirty page's PG_dirty is wrongly _cleared_.
      Call trace:
      	 [<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0
      	 [<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710
      	 [<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530
      	 [<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0
      	 [<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340
      	 [<ffffffff802a187c>] __fput+0xcc/0x1b0
      	 [<ffffffff802a1ba6>] fput+0x16/0x20
      	 [<ffffffff8029e676>] filp_close+0x56/0x90
      	 [<ffffffff8029fe0d>] sys_close+0xad/0x110
      	 [<ffffffff8020c41e>] system_call+0x7e/0x83
      
      Fix the bug by removing the cancel_dirty_page() call. Tests show that
      it causes no bad behaviors on various write sizes.
      
      === for the patient ===
      Here are more detailed demonstrations of the problem.
      
      1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to;
         and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed.
      
      ------------------------------ screen 0 ------------------------------
      [T0] root /home/wfg# cat > /test/tiny
      [T1] hi
      [T2] root /home/wfg#
      
      ------------------------------ screen 1 ------------------------------
      [T1] root /home/wfg# echo /test/tiny > /proc/filecache
      [T1] root /home/wfg# cat /proc/filecache
           # file /test/tiny
           # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
           # idx   len     state   refcnt
           0       1       ___UD__Bd_      2
      [T2] root /home/wfg# cat /proc/filecache
           # file /test/tiny
           # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
           # idx   len     state   refcnt
           0       1       ___U___Bd_      2
      
      2) note the non-zero 'cancelled_write_bytes' after /tmp/hi is copied.
      
      ------------------------------ screen 0 ------------------------------
      [T0] root /home/wfg# echo hi > /tmp/hi
      [T1] root /home/wfg# cp /tmp/hi /dev/stdin /test
      [T2] hi
      [T3] root /home/wfg#
      
      ------------------------------ screen 1 ------------------------------
      [T1] root /proc/4397# cd /proc/`pidof cp`
      [T1] root /proc/4713# cat io
           rchar: 8396
           wchar: 3
           syscr: 20
           syscw: 1
           read_bytes: 0
           write_bytes: 20480
           cancelled_write_bytes: 4096
      [T2] root /proc/4713# cat io
           rchar: 8399
           wchar: 6
           syscr: 21
           syscw: 2
           read_bytes: 0
           write_bytes: 24576
           cancelled_write_bytes: 4096
      
      //Question: the 'write_bytes' is a bit more than expected ;-)
      Tested-by: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
      Reviewed-by: Chris Mason <chris.mason@oracle.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 20 Oct 2007, 1 commit
  18. 27 Jul 2007, 1 commit
  19. 09 May 2007, 1 commit