1. 17 Sep, 2009 1 commit
    • F
      kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency · 193be0ee
      Frederic Weisbecker committed
      Alexander Beregalov reported the following warning:
      
      	=======================================================
      	[ INFO: possible circular locking dependency detected ]
      	2.6.31-03149-gdcc030a #1
      	-------------------------------------------------------
      	udevadm/716 is trying to acquire lock:
      	 (&mm->mmap_sem){++++++}, at: [<c107249a>] might_fault+0x4a/0xa0
      
      	but task is already holding lock:
      	 (sysfs_mutex){+.+.+.}, at: [<c10cb9aa>] sysfs_readdir+0x5a/0x200
      
      	which lock already depends on the new lock.
      
      	the existing dependency chain (in reverse order) is:
      
      	-> #3 (sysfs_mutex){+.+.+.}:
      	       [...]
      
      	-> #2 (&bdev->bd_mutex){+.+.+.}:
      	       [...]
      
      	-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
      	       [...]
      
      	-> #0 (&mm->mmap_sem){++++++}:
      	       [...]
      
      On reiserfs mount path, we take the reiserfs lock and while
      initializing the journal, we open the device, taking the
      bdev->bd_mutex. Then rescan_partition() may signal the change
      to sysfs.
      
      We have then the following dependency:
      
      	reiserfs_lock -> bd_mutex -> sysfs_mutex
      
      Later, while entering reiserfs_readpage() after a pagefault in an
      mmaped reiserfs file, we are holding the mm->mmap_sem, and we are going
      to take the reiserfs lock too.
      We have then the following dependency:
      
      	mm->mmap_sem -> reiserfs_lock
      
      which, expanded with the previous dependency gives us:
      
      	mm->mmap_sem -> reiserfs_lock -> bd_mutex -> sysfs_mutex
      
      Now while entering the sysfs readdir path, we are holding the
      sysfs_mutex. And when we copy a directory entry to the user buffer, we
      might fault and then take the mm->mmap_sem lock, which closes the
      circular locking dependency reported above.
      
      We can fix that by relaxing the reiserfs lock during the call to
      journal_init_dev(), which is the place where we open the mounted
      device.
      
      It is fine to relax the lock here because we are at the beginning of
      the reiserfs mount path and there is nothing to protect at this time:
      the journal is not initialized yet.
      We just keep this lock around for paranoid reasons.
      Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
      Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
      193be0ee
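The dependency chain in the commit above can be checked mechanically. The following is a minimal userspace sketch, not lockdep's implementation: it records "taken while held" edges between the four locks named in the report and asks whether taking mm->mmap_sem under sysfs_mutex would close a cycle. The enum, the struct and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Lock classes named in the report; the enum itself is illustrative. */
enum lock_class { MMAP_SEM, REISERFS_LOCK, BD_MUTEX, SYSFS_MUTEX, NR_LOCKS };

/* edge[a][b] is true when lock b has been taken while lock a was held. */
struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

static void add_dependency(struct lock_graph *g, enum lock_class held,
			   enum lock_class taken)
{
	assert(held < NR_LOCKS && taken < NR_LOCKS);
	g->edge[held][taken] = true;
}

/* Depth-first search: is "to" reachable from "from" along recorded edges? */
static bool reaches(const struct lock_graph *g, int from, int to,
		    bool *visited)
{
	if (from == to)
		return true;
	visited[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (g->edge[from][next] && !visited[next] &&
		    reaches(g, next, to, visited))
			return true;
	return false;
}

/*
 * Taking "taken" while holding "held" closes a cycle if "held" is
 * already reachable from "taken".
 */
static bool would_deadlock(const struct lock_graph *g, enum lock_class held,
			   enum lock_class taken)
{
	bool visited[NR_LOCKS] = { false };

	return reaches(g, taken, held, visited);
}

/*
 * Build the chain described in the commit message. When the mount path
 * relaxes the reiserfs lock around the device open, the
 * reiserfs_lock -> bd_mutex edge is never recorded.
 */
static struct lock_graph build_graph(bool mount_relaxes_lock)
{
	struct lock_graph g = { 0 };

	/* page fault in an mmaped file -> reiserfs_readpage() */
	add_dependency(&g, MMAP_SEM, REISERFS_LOCK);
	/* mount path opens the journal device under the reiserfs lock */
	if (!mount_relaxes_lock)
		add_dependency(&g, REISERFS_LOCK, BD_MUTEX);
	/* partition rescan signals the change to sysfs */
	add_dependency(&g, BD_MUTEX, SYSFS_MUTEX);
	return g;
}
```

With `build_graph(false)`, `would_deadlock(&g, SYSFS_MUTEX, MMAP_SEM)` reports the cycle; with `build_graph(true)` the chain is broken at the reiserfs lock and the same query reports no cycle.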
  2. 14 Sep, 2009 6 commits
    • F
      kill-the-bkl/reiserfs: acquire the inode mutex safely · c72e0575
      Frederic Weisbecker committed
      While searching a pathname, an inode mutex can be acquired
      in do_lookup() which calls reiserfs_lookup() which in turn
      acquires the write lock.
      
      On the other side reiserfs_fill_super() can acquire the write_lock
      and then call reiserfs_lookup_privroot() which can acquire an
      inode mutex (the root of the mount point).
      
      So we theoretically risk an AB - BA lock inversion that could lead
      to a deadlock.
      
      As for other lock dependencies found since the bkl to mutex
      conversion, the fix is to use reiserfs_mutex_lock_safe() which
      drops the lock dependency to the write lock.
      
      [ Impact: fix a possible deadlock with reiserfs ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      c72e0575
    • F
      kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe · c63e3c0b
      Frederic Weisbecker committed
      reiserfs_mutex_lock_safe() is a hack to avoid any dependency between
      an internal reiserfs mutex and the write lock, it has been proposed
      to follow the old bkl logic.
      
      The code does the following:
      
      while (!mutex_trylock(m)) {
      	reiserfs_write_unlock(s);
      	schedule();
      	reiserfs_write_lock(s);
      }
      
      It thus imitates the implicit behaviour of the lock when it was
      the BKL and had no such dependency:
      
      mutex_lock(m) {
      	if (fastpath)
      		let's go
      	else {
      		wait_for_mutex() {
      			schedule() {
      				unlock_kernel()
      				reacquire_lock_kernel()
      			}
      		}
      	}
      }
      
      The problem is that by using such an explicit schedule(), we don't
      benefit from adaptive mutex spinning on the owner.
      
      The logic in use now is:
      
      reiserfs_write_unlock(s);
      mutex_lock(m); // -> possible adaptive spinning
      reiserfs_write_lock(s);
      
      [ Impact: restore the use of adaptive spinning mutexes in reiserfs ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      c63e3c0b
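The "logic in use now" from the commit above can be tried out in userspace. This is a minimal sketch with POSIX threads, assuming a plain (non-recursive) mutex stands in for the reiserfs write lock; `struct toy_sb` and `toy_mutex_lock_safe()` are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the per-superblock state. */
struct toy_sb {
	pthread_mutex_t write_lock; /* plays the role of the write lock */
};

/*
 * Acquire m without ever sleeping on it while the write lock is held.
 * The caller must hold s->write_lock on entry and holds both on return.
 */
static void toy_mutex_lock_safe(pthread_mutex_t *m, struct toy_sb *s)
{
	int err;

	err = pthread_mutex_unlock(&s->write_lock);
	assert(err == 0);
	err = pthread_mutex_lock(m); /* may block; write lock is not held */
	assert(err == 0);
	err = pthread_mutex_lock(&s->write_lock);
	assert(err == 0);
	(void)err; /* silence unused warnings when built with NDEBUG */
}
```

Because the write lock is dropped before blocking on `m`, a thread that already owns `m` and is waiting for the write lock can make progress, which is the property the earlier trylock loop was emulating.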
    • F
      kill-the-BKL/reiserfs: release the write lock on flush_commit_list() · 6e3647ac
      Frederic Weisbecker committed
      flush_commit_list() uses ll_rw_block() to commit the pending log blocks.
      ll_rw_block() might sleep, and the bkl was released at this point. Then
      we can also relax the write lock at this point.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      6e3647ac
    • F
      kill-the-BKL/reiserfs: release the write lock before rescheduling on do_journal_end() · e6950a4d
      Frederic Weisbecker committed
      When do_journal_end() copies data to the journal block buffers in memory,
      it reschedules if needed between each block copied and dirtied.
      
      We can also release the write lock at this rescheduling stage,
      as the BKL did implicitly.
      
      [ Impact: release the reiserfs write lock when it is not needed ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      e6950a4d
    • F
      reiserfs, kill-the-BKL: fix unsafe j_flush_mutex lock · a412f9ef
      Frederic Weisbecker committed
      Impact: fix a deadlock
      
      The j_flush_mutex is acquired safely in journal.c:
      if we can't take it, we free the reiserfs per superblock lock
      and wait a bit.
      
      But there is one remaining place, in kupdate_transactions(), where
      j_flush_mutex is still acquired the traditional way. Thus the following
      scenario (warned about by lockdep) can happen:
      
      A						B
      
      mutex_lock(&write_lock)			mutex_lock(&write_lock)
      	mutex_lock(&j_flush_mutex)	mutex_lock(&j_flush_mutex) //block
      	mutex_unlock(&write_lock)
      	sleep...
      	mutex_lock(&write_lock) //deadlock
      
      Fix this by using reiserfs_mutex_lock_safe() in kupdate_transactions().
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Jeff Mahoney <jeffm@suse.com>
      LKML-Reference: <1239660635-12940-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a412f9ef
    • F
      reiserfs: kill-the-BKL · 8ebc4232
      Frederic Weisbecker committed
      This patch is an attempt to remove the BKL-based locking scheme from
      reiserfs.
      
      It is a bit inspired from an old attempt by Peter Zijlstra:
      
         http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html
      
      The bkl is heavily used in this filesystem to prevent
      concurrent write accesses to the filesystem.
      
      Reiserfs makes heavy use of the specific properties of the BKL:
      
      - It can be acquired recursively by the same task
      - It is released on the schedule() calls and reacquired when schedule() returns
      
      The two properties above are a roadmap for the reiserfs write locking so it's
      very hard to simply replace it with a common mutex.
      
      - We need a recursive-able locking unless we want to restructure several blocks
        of the code.
      - We need to identify the sites where the bkl was implicitly relaxed
        (schedule, wait, sync, etc...) so that we can in turn release and
        reacquire our new lock explicitly.
        Such implicit releases of the lock are often required to let other
        resources producer/consumer do their job or we can suffer unexpected
        starvations or deadlocks.
      
      So the new lock that replaces the bkl here is a per superblock mutex with a
      specific property: it can be acquired recursively by the same task, like the
      bkl.
      
      For such purpose, we integrate a lock owner and a lock depth field on the
      superblock information structure.
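
A recursive lock built from a plain mutex plus an owner and a depth field can be sketched in userspace as follows. This is an illustration with POSIX threads and C11 atomics, not the patch's code: `struct toy_sb_info`, `toy_write_lock()` and `toy_write_unlock()` are hypothetical names, the depth convention (0 means unlocked) is an assumption, and the address of a thread-local variable is used as the task identity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* One address per thread, used as a cheap thread identity. */
static _Thread_local char toy_self;

/* Hypothetical stand-in for the superblock information structure. */
struct toy_sb_info {
	pthread_mutex_t lock;
	_Atomic uintptr_t lock_owner; /* 0 when nobody holds the lock */
	int lock_depth;               /* only touched by the current owner */
};

static void toy_sb_init(struct toy_sb_info *sbi)
{
	int err = pthread_mutex_init(&sbi->lock, NULL);

	assert(err == 0);
	(void)err;
	atomic_init(&sbi->lock_owner, 0);
	sbi->lock_depth = 0;
}

static void toy_write_lock(struct toy_sb_info *sbi)
{
	uintptr_t me = (uintptr_t)&toy_self;

	/*
	 * Only this thread ever stores its own identity, so seeing it
	 * here means we already hold the mutex and must not take it again.
	 */
	if (atomic_load(&sbi->lock_owner) != me) {
		pthread_mutex_lock(&sbi->lock);
		atomic_store(&sbi->lock_owner, me);
	}
	sbi->lock_depth++;
}

static void toy_write_unlock(struct toy_sb_info *sbi)
{
	/* Unlocking without holding the lock is a caller bug. */
	assert(atomic_load(&sbi->lock_owner) == (uintptr_t)&toy_self);

	/* Release the mutex only when the outermost level unlocks. */
	if (--sbi->lock_depth == 0) {
		atomic_store(&sbi->lock_owner, 0);
		pthread_mutex_unlock(&sbi->lock);
	}
}
```

Nested calls from the same thread only bump the depth counter; the underlying mutex is taken once and released when the depth returns to zero.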
      
      The first axis on this patch is to turn reiserfs_write_(un)lock() function
      into a wrapper to manage this mutex. Also some explicit calls to
      lock_kernel() have been converted to reiserfs_write_lock() helpers.
      
      The second axis is to find the important blocking sites (schedule...(),
      wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit
      release of the write lock on these locations before blocking. Then we can
      safely wait for those who can give us resources or those who need some.
      Typically this is a fight between the current writer, the reiserfs workqueue
      (aka the async committer) and the pdflush threads.
      
      The third axis is a consequence of the second. The write lock is usually
      on top of a lock dependency chain which can include the journal lock, the
      flush lock or the commit lock. So it's dangerous to release and then try to
      reacquire the write lock while we still hold other locks.
      
      This is fine with the bkl:
      
            T1                       T2
      
      lock_kernel()
          mutex_lock(A)
          unlock_kernel()
          // do something
                                  lock_kernel()
                                      mutex_lock(A) -> already locked by T1
                                      schedule() (and then unlock_kernel())
          lock_kernel()
          mutex_unlock(A)
          ....
      
      This is not fine with a mutex:
      
            T1                       T2
      
      mutex_lock(write)
          mutex_lock(A)
          mutex_unlock(write)
          // do something
                                 mutex_lock(write)
                                    mutex_lock(A) -> already locked by T1
                                    schedule()
      
          mutex_lock(write) -> already locked by T2
          deadlock
      
      The solution in this patch is to provide a helper which releases the write
      lock and sleeps a bit if we can't lock a mutex that depends on it. It's another
      simulation of the bkl behaviour.
      
      The last axis is to locate the fs callbacks that are called with the bkl held,
      according to Documentation/filesystems/Locking.
      
      Those are:
      
      - reiserfs_remount
      - reiserfs_fill_super
      - reiserfs_put_super
      
      Reiserfs didn't need to explicitly lock because of the context of these callbacks.
      But now we must take care of that with the new locking.
      
      After this patch, reiserfs suffers from a slight performance regression (for now).
      On UP, a high volume write with dd reports an average of 27 MB/s instead
      of 30 MB/s without the patch applied.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Bron Gondwana <brong@fastmail.fm>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8ebc4232
  3. 11 Jul, 2009 1 commit
  4. 31 Mar, 2009 6 commits
    • J
      reiserfs: rename p_s_sb to sb · a9dd3643
      Jeff Mahoney committed
      This patch is a simple s/p_s_sb/sb/g to the reiserfs code.  This is the
      first in a series of patches to rip out some of the awful variable
      naming in reiserfs.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9dd3643
    • J
      reiserfs: strip trailing whitespace · 0222e657
      Jeff Mahoney committed
      This patch strips trailing whitespace from the reiserfs code.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0222e657
    • J
      reiserfs: rearrange journal abort · 32e8b106
      Jeff Mahoney committed
      This patch kills off reiserfs_journal_abort as it is never called, and
      combines __reiserfs_journal_abort_{soft,hard} into one function called
      reiserfs_abort_journal, which performs the same work. It is silent
      as opposed to the old version, since the message was always issued
      after a regular 'abort' message.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      32e8b106
    • J
      reiserfs: rework reiserfs_panic · c3a9c210
      Jeff Mahoney committed
      ReiserFS panics can be somewhat inconsistent.
      In some cases:
       * a unique identifier may be associated with it
       * the function name may be included
       * the device may be printed separately
      
      This patch aims to make warnings more consistent. reiserfs_warning() prints
      the device name, so printing it a second time is not required. The function
      name for a warning is always helpful in debugging, so it is now automatically
      inserted into the output. Hans has stated that every warning should have
      a unique identifier. Some cases lack them, others really shouldn't have them.
      reiserfs_warning() now expects an id associated with each message. In the
      rare case where one isn't needed, "" will suffice.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c3a9c210
    • J
      reiserfs: rework reiserfs_warning · 45b03d5e
      Jeff Mahoney committed
      ReiserFS warnings can be somewhat inconsistent.
      In some cases:
       * a unique identifier may be associated with it
       * the function name may be included
       * the device may be printed separately
      
      This patch aims to make warnings more consistent. reiserfs_warning() prints
      the device name, so printing it a second time is not required. The function
      name for a warning is always helpful in debugging, so it is now automatically
      inserted into the output. Hans has stated that every warning should have
      a unique identifier. Some cases lack them, others really shouldn't have them.
      reiserfs_warning() now expects an id associated with each message. In the
      rare case where one isn't needed, "" will suffice.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      45b03d5e
    • J
      reiserfs: audit transaction ids to always be unsigned ints · 600ed416
      Jeff Mahoney committed
      This patch fixes up the reiserfs code such that transaction ids are
      always unsigned ints.  In places they can currently be signed ints or
      unsigned longs.
      
      The former just causes an annoying clm-2200 warning and may join a
      transaction when it should wait.
      
      The latter is just for correctness since the disk format uses a 32-bit
      transaction id.  There aren't any runtime problems that result from it
      not wrapping at the correct location since the value is truncated
      correctly even on big endian systems.  The 0 value might make it to
      disk, but the mount-time checks will bump it to 10 itself.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      600ed416
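The type issues described in the commit above can be illustrated with a few lines of standalone C. This is only a sketch of the general signed/unsigned and truncation behaviour, assuming a 32-bit `unsigned int`; the `toy_*` helpers are hypothetical and do not reproduce the comparisons actually made in the reiserfs journal code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(unsigned int) * CHAR_BIT == 32,
	      "sketch assumes a 32-bit unsigned int");

/* Storing a wider in-memory counter in a 32-bit on-disk field truncates it. */
static uint32_t toy_disk_trans_id(uint64_t in_memory_id)
{
	return (uint32_t)in_memory_id;
}

/* An unsigned int counter wraps in memory at the same point as on disk. */
static unsigned int toy_next_trans_id(unsigned int id)
{
	return id + 1; /* wraparound is well defined for unsigned types */
}

/* Ordering of two ids when both are kept unsigned. */
static bool toy_older_unsigned(unsigned int a, unsigned int b)
{
	return a < b;
}

/*
 * Ordering of the same two ids when they are held in signed ints.
 * The conversion of values above INT_MAX is implementation-defined;
 * on common compilers they become negative.
 */
static bool toy_older_signed(unsigned int a, unsigned int b)
{
	return (int)a < (int)b;
}
```

For ids on either side of `INT_MAX`, the two ordering helpers typically disagree, which is the kind of mismatch that a single consistent unsigned type avoids.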
  5. 21 Oct, 2008 4 commits
  6. 05 Aug, 2008 2 commits
  7. 26 Jul, 2008 3 commits
  8. 30 Apr, 2008 1 commit
  9. 28 Apr, 2008 2 commits
  10. 19 Apr, 2008 1 commit
  11. 20 Oct, 2007 2 commits
  12. 17 Oct, 2007 3 commits
  13. 09 May, 2007 2 commits
  14. 30 Nov, 2006 1 commit
  15. 22 Nov, 2006 1 commit
  16. 21 Oct, 2006 1 commit
    • A
      [PATCH] separate bdi congestion functions from queue congestion functions · 3fcfab16
      Andrew Morton committed
      Separate out the concept of "queue congestion" from "backing-dev congestion".
      Congestion is a backing-dev concept, not a queue concept.
      
      The blk_* congestion functions are retained, as wrappers around the core
      backing-dev congestion functions.
      
      This proper layering is needed so that NFS can cleanly use the congestion
      functions, and so that CONFIG_BLOCK=n actually links.
      
      Cc: "Thomas Maier" <balagi@justmail.de>
      Cc: "Jens Axboe" <jens.axboe@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3fcfab16
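The layering described in the commit above, with queue-level helpers kept as thin wrappers around backing-dev helpers, can be sketched as follows. Every type and function here is a reduced, hypothetical stand-in (prefixed `toy_`) and not the kernel's actual API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
enum toy_dir { TOY_READ = 0, TOY_WRITE = 1, TOY_NR_DIRS };

struct toy_bdi {
	unsigned long state; /* one congestion bit per direction */
};

struct toy_queue {
	struct toy_bdi bdi; /* a request queue embeds its backing-dev info */
};

/* Core layer: works on a backing device and knows nothing about queues. */
static void toy_set_bdi_congested(struct toy_bdi *bdi, enum toy_dir dir)
{
	assert(dir < TOY_NR_DIRS);
	bdi->state |= 1UL << dir;
}

static void toy_clear_bdi_congested(struct toy_bdi *bdi, enum toy_dir dir)
{
	assert(dir < TOY_NR_DIRS);
	bdi->state &= ~(1UL << dir);
}

static bool toy_bdi_congested(const struct toy_bdi *bdi, enum toy_dir dir)
{
	assert(dir < TOY_NR_DIRS);
	return (bdi->state & (1UL << dir)) != 0;
}

/* Block layer: wrappers that forward to the core layer. */
static void toy_blk_set_queue_congested(struct toy_queue *q, enum toy_dir dir)
{
	toy_set_bdi_congested(&q->bdi, dir);
}

static void toy_blk_clear_queue_congested(struct toy_queue *q,
					  enum toy_dir dir)
{
	toy_clear_bdi_congested(&q->bdi, dir);
}
```

A user with no request queue at all, such as a network filesystem, can call the `toy_*_bdi_*` functions directly on its own `struct toy_bdi`, which is the point of moving the concept out of the queue layer.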
  17. 04 Oct, 2006 1 commit
  18. 30 Sep, 2006 1 commit
    • C
      [PATCH] Fix reiserfs latencies caused by data=ordered · a3172027
      Chris Mason committed
      ReiserFS does periodic cleanup of old transactions in order to limit the
      length of time a journal replay may take after a crash.  Sometimes, writing
      metadata from an old (already committed) transaction may require committing
      a newer transaction, which also requires writing all data=ordered buffers.
      This can cause very long stalls on journal_begin.
      
      This patch makes sure new transactions will not need to be committed before
      trying a periodic reclaim of an old transaction.  It is low risk because if
      a bad decision is made, it just means a slightly longer journal replay
      after a crash.
      Signed-off-by: Chris Mason <mason@suse.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a3172027
  19. 01 Jul, 2006 1 commit