1. 21 Jul, 2011 1 commit
  2. 14 Mar, 2011 1 commit
  3. 10 Mar, 2011 1 commit
  4. 18 Nov, 2010 1 commit
  5. 27 Oct, 2010 1 commit
    • writeback: remove nonblocking/encountered_congestion references · 1b430bee
      Wu Fengguang committed
      This removes more dead code that was somehow missed by commit 0d99519e
      (writeback: remove unused nonblocking and congestion checks).  There is
      no behavior change except for the removal of two entries from one of the
      ext4 tracing interfaces.
      
      The nonblocking checks in ->writepages are no longer used because the
      flusher now prefers to block on get_request_wait() rather than skip inodes
      on IO congestion.  The latter would lead to more seeky IO.
      
      The nonblocking checks in ->writepage are no longer used because they are
      redundant with the WB_SYNC_NONE check.
      
      We no longer set ->nonblocking in VM page out and page migration, because
      a) it's effectively redundant with WB_SYNC_NONE in current code
      b) its old semantics of "Don't get stuck on request queues" is misbehavior:
         that would skip some dirty inodes on congestion and page out others, which
         is unfair in terms of LRU age.
      
      Inspired by Christoph Hellwig. Thanks!
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: Steve French <sfrench@samba.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 26 Oct, 2010 1 commit
  7. 18 Aug, 2010 1 commit
  8. 10 Aug, 2010 6 commits
    • convert reiserfs to ->evict_inode() · 845a2cc0
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • always call inode_change_ok early in ->setattr · db78b877
      Christoph Hellwig committed
      Make sure we call inode_change_ok before doing any changes in ->setattr,
      and make sure to call it even if our fs wants to ignore normal UNIX
      permissions, but use the ATTR_FORCE to skip those.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
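The "validate first, then modify" ordering described in the commit above can be sketched in a small userspace model. This is only an illustration under stated assumptions: `model_inode`, `model_attr`, `model_change_ok`, `model_setattr` and the `MODEL_ATTR_*` flags are hypothetical names, not kernel APIs, and the mode range check is an invented stand-in for the non-permission checks that still run when permissions are skipped.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_ATTR_MODE  (1u << 0)  /* caller wants to change the mode */
#define MODEL_ATTR_FORCE (1u << 1)  /* skip permission checks only */

struct model_inode {
    unsigned int owner_uid;
    unsigned int mode;
};

struct model_attr {
    unsigned int valid;  /* bitmask of MODEL_ATTR_* */
    unsigned int mode;
};

/* Stand-in for the early validation step; it never modifies the inode. */
static int model_change_ok(const struct model_inode *inode,
                           const struct model_attr *attr,
                           unsigned int caller_uid)
{
    /* Invented sanity check: runs even when FORCE is set. */
    if ((attr->valid & MODEL_ATTR_MODE) && attr->mode > 07777)
        return -EINVAL;

    /* FORCE skips the permission checks below, nothing else. */
    if (attr->valid & MODEL_ATTR_FORCE)
        return 0;

    if ((attr->valid & MODEL_ATTR_MODE) && caller_uid != inode->owner_uid)
        return -EPERM;

    return 0;
}

/* Validation happens before any field of the inode is touched. */
static int model_setattr(struct model_inode *inode,
                         const struct model_attr *attr,
                         unsigned int caller_uid)
{
    int err;

    assert(inode && attr);

    err = model_change_ok(inode, attr, caller_uid);
    if (err)
        return err;  /* nothing has been modified yet */

    if (attr->valid & MODEL_ATTR_MODE)
        inode->mode = attr->mode;
    return 0;
}
```

In this model a rejected request leaves the inode untouched, which is the property that calling the check early is meant to provide.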
    • remove inode_setattr · 1025774c
      Christoph Hellwig committed
      Replace inode_setattr with opencoded variants of it in all callers.  This
      moves the remaining call to vmtruncate into the filesystem methods where it
      can be replaced with the proper truncate sequence.
      
      In a few cases it was obvious that we would never end up calling vmtruncate
      so it was left out in the opencoded variant:
      
       spufs: explicitly checks for ATTR_SIZE earlier
       btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
       ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above
      
      In addition to that ncpfs called inode_setattr with handcrafted iattrs,
      which allowed trimming down the opencoded variant.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • introduce __block_write_begin · 6e1db88d
      Christoph Hellwig committed
      Split up the block_write_begin implementation - __block_write_begin is a new
      trivial wrapper for block_prepare_write that always takes an already
      allocated page and can be either called from block_write_begin or filesystem
      code that already has a page allocated.  Remove the handling of already
      allocated pages from block_write_begin after switching all callers that
      do it to __block_write_begin.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • sort out blockdev_direct_IO variants · eafdc7d1
      Christoph Hellwig committed
      Move the call to vmtruncate to get rid of excessive blocks to the callers
      in preparation of the new truncate calling sequence.  This was only done
      for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
      was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
      its _newtrunc variant while at it as just opencoding the two additional
      parameters is shorter than the name suffix.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • Fix reiserfs_file_release() · 0e4f6a79
      Al Viro committed
      a) count file openers correctly; i_count use was completely wrong
      b) use new mutex for exclusion between final close/open/truncate,
      to protect tail-packing logic.  i_mutex use was wrong and resulted
      in deadlocks.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
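The idea of counting openers explicitly and serializing open against the final close, as described in the commit above, can be sketched in userspace. This is a simplified model and not the reiserfs code: `model_file`, `model_open` and `model_release` are hypothetical names, and taking the mutex on every open is an assumption made for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a per-file state; not the reiserfs structures. */
struct model_file {
    pthread_mutex_t lock;  /* excludes open vs. final close */
    int openers;           /* number of currently open handles */
    int tail_packs;        /* how many times the final-close work ran */
};

static void model_file_init(struct model_file *f)
{
    pthread_mutex_init(&f->lock, NULL);
    f->openers = 0;
    f->tail_packs = 0;
}

static void model_open(struct model_file *f)
{
    pthread_mutex_lock(&f->lock);
    f->openers++;
    pthread_mutex_unlock(&f->lock);
}

/* Returns true if this was the final close and the final-close work ran. */
static bool model_release(struct model_file *f)
{
    bool last;

    pthread_mutex_lock(&f->lock);
    assert(f->openers > 0);
    last = (--f->openers == 0);
    if (last)
        f->tail_packs++;  /* stand-in for tail packing on final close */
    pthread_mutex_unlock(&f->lock);
    return last;
}
```

Because the counter tracks opens rather than object references, the final-close work in this sketch runs exactly once, when the last opener goes away.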
  9. 17 Jun, 2010 1 commit
  10. 22 May, 2010 1 commit
  11. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo committed
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the following
      script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
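The header-selection rule stated in the commit above (gfp.h if only gfp is used, slab.h if slab is used) can be illustrated with a small sketch. The real conversion script is not reproduced in the commit message, so the token lists and the `needed_header` function below are assumptions chosen for illustration, and plain substring matching is a deliberate simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative tokens only; the real script's detection rules may differ. */
static const char *const slab_tokens[] = { "kmalloc(", "kzalloc(", "kfree(" };
static const char *const gfp_tokens[]  = { "GFP_KERNEL", "GFP_ATOMIC", "alloc_pages(" };

/* Returns 1 if any token occurs in src as a plain substring. */
static int contains_any(const char *src, const char *const *tokens, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strstr(src, tokens[i]))
            return 1;
    return 0;
}

/*
 * Returns the one header a source text needs, or NULL if neither is used.
 * slab.h wins over gfp.h because, per the commit message, slab.h already
 * includes gfp.h.
 */
static const char *needed_header(const char *src)
{
    assert(src != NULL);

    if (contains_any(src, slab_tokens, ARRAY_LEN(slab_tokens)))
        return "linux/slab.h";
    if (contains_any(src, gfp_tokens, ARRAY_LEN(gfp_tokens)))
        return "linux/gfp.h";
    return NULL;
}
```

A call such as `needed_header("p = kmalloc(n, GFP_KERNEL);")` selects slab.h even though a gfp flag also appears, since only the necessary include is kept.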
  12. 06 Mar, 2010 1 commit
  13. 05 Mar, 2010 5 commits
    • dquot: cleanup dquot initialize routine · 871a2931
      Christoph Hellwig committed
      Get rid of the initialize dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_initialize helper to __dquot_initialize
      and vfs_dq_init to dquot_initialize to have a consistent namespace.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: move dquot initialization responsibility into the filesystem · 907f4554
      Christoph Hellwig committed
      Currently various places in the VFS call vfs_dq_init directly.  This means
      we tie the quota code into the VFS.  Get rid of that and make the
      filesystem responsible for the initialization.  For most metadata operations
      this is a straightforward move into the methods, but for truncate and
      open it's a bit more complicated.
      
      For truncate we currently only call vfs_dq_init for the sys_truncate case
      because open already takes care of it for ftruncate and open(O_TRUNC) - the
      new code causes an additional vfs_dq_init for those which is harmless.
      
      For open the initialization is moved from do_filp_open into the open method,
      which means it happens slightly earlier now, and only for regular files.
      The latter is fine because we don't need to initialize it for operations
      on special files, and we already do it as part of the namespace operations
      for directories.
      
      Add a dquot_file_open helper that filesystems that support generic quotas
      can use to fill in ->open.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: cleanup dquot drop routine · 9f754758
      Christoph Hellwig committed
      Get rid of the drop dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_drop helper to __dquot_drop
      and vfs_dq_drop to dquot_drop to have a consistent namespace.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: cleanup dquot transfer routine · b43fa828
      Christoph Hellwig committed
      Get rid of the transfer dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_transfer helper to __dquot_transfer
      and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
      and make the new dquot_transfer return a normal negative errno value
      which all callers expect.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: cleanup inode allocation / freeing routines · 63936dda
      Christoph Hellwig committed
      Get rid of the alloc_inode and free_inode dquot operations - they are
      always called from the filesystem and if a filesystem really needs
      its own (which none currently does) it can just call into its
      own routine directly.
      
      Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
      call the lowlevel dquot_alloc_inode / dquot_free_inode routines
      directly, which now lose the number argument which is always 1.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  14. 15 Feb, 2010 1 commit
  15. 05 Jan, 2010 2 commits
    • reiserfs: Relax the lock before truncating pages · 108d3943
      Frederic Weisbecker committed
      While truncating a file, reiserfs_setattr() calls inode_setattr()
      that will truncate the mapping for the given inode, but for that
      it needs the page locks.
      
      In order to release these, the owners need the reiserfs lock to
      complete their jobs. But they can't, as we don't release it before
      calling inode_setattr().
      
      We need to do that to fix the following softlockups:
      
      INFO: task flush-8:0:2149 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      flush-8:0     D f51af998     0  2149      2 0x00000000
       f51af9ac 00000092 00000002 f51af998 c2803304 00000000 c1894ad0 010f3000
       f51af9cc c1462604 c189ef80 f51af974 c1710304 f715b450 f715b5ec c2807c40
       00000000 0005bb00 c2803320 c102c55b c1710304 c2807c50 c2803304 00000246
      Call Trace:
       [<c1462604>] ? schedule+0x434/0xb20
       [<c102c55b>] ? resched_task+0x4b/0x70
       [<c106fa22>] ? mark_held_locks+0x62/0x80
       [<c146414d>] ? mutex_lock_nested+0x1fd/0x350
       [<c14640b9>] mutex_lock_nested+0x169/0x350
       [<c1178cde>] ? reiserfs_write_lock+0x2e/0x40
       [<c1178cde>] reiserfs_write_lock+0x2e/0x40
       [<c11719a2>] do_journal_end+0xc2/0xe70
       [<c1172912>] journal_end+0xb2/0x120
       [<c11686b3>] ? pathrelse+0x33/0xb0
       [<c11729e4>] reiserfs_end_persistent_transaction+0x64/0x70
       [<c1153caa>] reiserfs_get_block+0x12ba/0x15f0
       [<c106fa22>] ? mark_held_locks+0x62/0x80
       [<c1154b24>] reiserfs_writepage+0xa74/0xe80
       [<c1465a27>] ? _raw_spin_unlock_irq+0x27/0x50
       [<c11f3d25>] ? radix_tree_gang_lookup_tag_slot+0x95/0xc0
       [<c10b5377>] ? find_get_pages_tag+0x127/0x1a0
       [<c106fa22>] ? mark_held_locks+0x62/0x80
       [<c106fcd4>] ? trace_hardirqs_on_caller+0x124/0x170
       [<c10bc1e0>] __writepage+0x10/0x40
       [<c10bc9ab>] write_cache_pages+0x16b/0x320
       [<c10bc1d0>] ? __writepage+0x0/0x40
       [<c10bcb88>] generic_writepages+0x28/0x40
       [<c10bcbd5>] do_writepages+0x35/0x40
       [<c11059f7>] writeback_single_inode+0xc7/0x330
       [<c11067b2>] writeback_inodes_wb+0x2c2/0x490
       [<c1106a86>] wb_writeback+0x106/0x1b0
       [<c1106cf6>] wb_do_writeback+0x106/0x1e0
       [<c1106c18>] ? wb_do_writeback+0x28/0x1e0
       [<c1106e0a>] bdi_writeback_task+0x3a/0xb0
       [<c10cbb13>] bdi_start_fn+0x63/0xc0
       [<c10cbab0>] ? bdi_start_fn+0x0/0xc0
       [<c105d1f4>] kthread+0x74/0x80
       [<c105d180>] ? kthread+0x0/0x80
       [<c100327a>] kernel_thread_helper+0x6/0x10
      3 locks held by flush-8:0/2149:
       #0:  (&type->s_umount_key#30){+++++.}, at: [<c110676f>] writeback_inodes_wb+0x27f/0x490
       #1:  (&journal->j_mutex){+.+...}, at: [<c117199a>] do_journal_end+0xba/0xe70
       #2:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178cde>] reiserfs_write_lock+0x2e/0x40
      INFO: task fstest:3813 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      fstest        D 00000002     0  3813   3812 0x00000000
       f5103c94 00000082 f5103c40 00000002 f5ad5450 00000007 f5103c28 011f3000
       00000006 f5ad5450 c10bb005 00000480 c1710304 f5ad5450 f5ad55ec c2907c40
       00000001 f5ad5450 f5103c74 00000046 00000002 f5ad5450 00000007 f5103c6c
      Call Trace:
       [<c10bb005>] ? free_hot_cold_page+0x1d5/0x280
       [<c1462d64>] io_schedule+0x74/0xc0
       [<c10b5a45>] sync_page+0x35/0x60
       [<c146325a>] __wait_on_bit_lock+0x4a/0x90
       [<c10b5a10>] ? sync_page+0x0/0x60
       [<c10b59e5>] __lock_page+0x85/0x90
       [<c105d660>] ? wake_bit_function+0x0/0x60
       [<c10bf654>] truncate_inode_pages_range+0x1e4/0x2d0
       [<c10bf75f>] truncate_inode_pages+0x1f/0x30
       [<c10bf7cf>] truncate_pagecache+0x5f/0xa0
       [<c10bf86a>] vmtruncate+0x5a/0x70
       [<c10fdb7d>] inode_setattr+0x5d/0x190
       [<c1150117>] reiserfs_setattr+0x1f7/0x2f0
       [<c1464569>] ? down_write+0x49/0x70
       [<c10fde01>] notify_change+0x151/0x330
       [<c10e6f3d>] do_truncate+0x6d/0xa0
       [<c10f4ce2>] do_filp_open+0x9a2/0xcf0
       [<c1465aec>] ? _raw_spin_unlock+0x2c/0x50
       [<c10fec50>] ? alloc_fd+0xe0/0x100
       [<c10e602d>] do_sys_open+0x6d/0x130
       [<c1002cfb>] ? sysenter_exit+0xf/0x16
       [<c10e615e>] sys_open+0x2e/0x40
       [<c1002ccc>] sysenter_do_call+0x12/0x32
      3 locks held by fstest/3813:
       #0:  (&sb->s_type->i_mutex_key#4){+.+.+.}, at: [<c10e6f33>] do_truncate+0x63/0xa0
       #1:  (&sb->s_type->i_alloc_sem_key#3){+.+.+.}, at: [<c10fdf07>] notify_change+0x257/0x330
       #2:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178c8e>] reiserfs_write_lock_once+0x2e/0x50
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Christian Kujau <lists@nerdbynature.de>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
    • reiserfs: Fix recursive lock on lchown · 5fe1533f
      Frederic Weisbecker committed
      On chown, reiserfs will call reiserfs_setattr() to change the owner
      of the given inode, but it may also recursively call
      reiserfs_setattr() to propagate the owner change to the private xattr
      files for this inode.
      
      Hence, the reiserfs lock may be acquired twice which is not wanted
      as reiserfs_setattr() calls journal_begin() that is going to try to
      relax the lock in order to safely acquire the journal mutex.
      
      Using reiserfs_write_lock_once() from reiserfs_setattr() solves
      the problem.
      
      This fixes the following warning, that precedes a lockdep report.
      
      WARNING: at fs/reiserfs/lock.c:95 reiserfs_lock_check_recursive+0x3f/0x50()
      Hardware name: MS-7418
      Unwanted recursive reiserfs lock!
      Pid: 4189, comm: fsstress Not tainted 2.6.33-rc2-tip-atom+ #195
      Call Trace:
       [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
       [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
       [<c103f7ac>] warn_slowpath_common+0x6c/0xc0
       [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
       [<c103f84b>] warn_slowpath_fmt+0x2b/0x30
       [<c1178bff>] reiserfs_lock_check_recursive+0x3f/0x50
       [<c1172ae3>] do_journal_begin_r+0x83/0x350
       [<c1172f2d>] journal_begin+0x7d/0x140
       [<c106509a>] ? in_group_p+0x2a/0x30
       [<c10fda71>] ? inode_change_ok+0x91/0x140
       [<c115007d>] reiserfs_setattr+0x15d/0x2e0
       [<c10f9bf3>] ? dput+0xe3/0x140
       [<c1465adc>] ? _raw_spin_unlock+0x2c/0x50
       [<c117831d>] chown_one_xattr+0xd/0x10
       [<c11780a3>] reiserfs_for_each_xattr+0x113/0x2c0
       [<c1178310>] ? chown_one_xattr+0x0/0x10
       [<c14641e9>] ? mutex_lock_nested+0x2a9/0x350
       [<c117826f>] reiserfs_chown_xattrs+0x1f/0x60
       [<c106509a>] ? in_group_p+0x2a/0x30
       [<c10fda71>] ? inode_change_ok+0x91/0x140
       [<c1150046>] reiserfs_setattr+0x126/0x2e0
       [<c1177c20>] ? reiserfs_getxattr+0x0/0x90
       [<c11b0d57>] ? cap_inode_need_killpriv+0x37/0x50
       [<c10fde01>] notify_change+0x151/0x330
       [<c10e659f>] chown_common+0x6f/0x90
       [<c10e67bd>] sys_lchown+0x6d/0x80
       [<c1002ccc>] sysenter_do_call+0x12/0x32
      ---[ end trace 7c2b77224c1442fc ]---
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Christian Kujau <lists@nerdbynature.de>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
  16. 18 Dec, 2009 1 commit
  17. 14 Dec, 2009 1 commit
    • reiserfs: Fix reiserfs lock and journal lock inversion dependency · cb1c2e51
      Frederic Weisbecker committed
      When we were using the bkl, we didn't care about dependencies against
      other locks, but the mutex conversion created new ones, which is why
      we have reiserfs_mutex_lock_safe(), which unlocks the reiserfs lock
      before acquiring another mutex.
      
      But this trick actually fails if we have acquired the reiserfs lock
      recursively, as we try to unlock it to acquire the new mutex without
      inverted dependency, but we eventually only decrease its depth.
      
      This happens in the case of a nested inode creation/deletion.
      Say we have no space left on the device, we create an inode
      and take the lock but fail to create its entry, then we release the
      inode using iput(), which calls reiserfs_delete_inode() that takes
      the reiserfs lock recursively. The path eventually ends up in
      journal_begin() where we try to take the journal safely but we
      fail because of the reiserfs lock recursion:
      
      [ INFO: possible circular locking dependency detected ]
      2.6.32-06486-g053fe57a #2
      -------------------------------------------------------
      vi/23454 is trying to acquire lock:
       (&journal->j_mutex){+.+...}, at: [<c110dac4>] do_journal_begin_r+0x64/0x2f0
      
      but task is already holding lock:
       (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>] reiserfs_write_lock+0x28/0x40
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
             [<c104f8f3>] validate_chain+0xa23/0xf70
             [<c1050325>] __lock_acquire+0x4e5/0xa70
             [<c105092a>] lock_acquire+0x7a/0xa0
             [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
             [<c11106a8>] reiserfs_write_lock+0x28/0x40
             [<c110dacb>] do_journal_begin_r+0x6b/0x2f0
             [<c110ddcf>] journal_begin+0x7f/0x120
             [<c10f76c2>] reiserfs_remount+0x212/0x4d0
             [<c1093997>] do_remount_sb+0x67/0x140
             [<c10a9ca6>] do_mount+0x436/0x6b0
             [<c10a9f86>] sys_mount+0x66/0xa0
             [<c1002c50>] sysenter_do_call+0x12/0x36
      
      -> #0 (&journal->j_mutex){+.+...}:
             [<c104fe38>] validate_chain+0xf68/0xf70
             [<c1050325>] __lock_acquire+0x4e5/0xa70
             [<c105092a>] lock_acquire+0x7a/0xa0
             [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
             [<c110dac4>] do_journal_begin_r+0x64/0x2f0
             [<c110ddcf>] journal_begin+0x7f/0x120
             [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140
             [<c10a55fc>] generic_delete_inode+0x9c/0x150
             [<c10a56ed>] generic_drop_inode+0x3d/0x60
             [<c10a4607>] iput+0x47/0x50
             [<c10e915c>] reiserfs_create+0x16c/0x1c0
             [<c109a9c1>] vfs_create+0xc1/0x130
             [<c109dbec>] do_filp_open+0x81c/0x920
             [<c109004f>] do_sys_open+0x4f/0x110
             [<c1090179>] sys_open+0x29/0x40
             [<c1002c50>] sysenter_do_call+0x12/0x36
      
      other info that might help us debug this:
      
      2 locks held by vi/23454:
       #0:  (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [<c109d64e>] do_filp_open+0x27e/0x920
       #1:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>] reiserfs_write_lock+0x28/0x40
      
      stack backtrace:
      Pid: 23454, comm: vi Not tainted 2.6.32-06486-g053fe57a #2
      Call Trace:
       [<c134b202>] ? printk+0x18/0x1e
       [<c104e960>] print_circular_bug+0xc0/0xd0
       [<c104fe38>] validate_chain+0xf68/0xf70
       [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10
       [<c1050325>] __lock_acquire+0x4e5/0xa70
       [<c105092a>] lock_acquire+0x7a/0xa0
       [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
       [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
       [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
       [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
       [<c110ff80>] ? delete_one_xattr+0x0/0x1c0
       [<c110dac4>] do_journal_begin_r+0x64/0x2f0
       [<c110ddcf>] journal_begin+0x7f/0x120
       [<c11105b5>] ? reiserfs_delete_xattrs+0x15/0x50
       [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140
       [<c10a55bf>] ? generic_delete_inode+0x5f/0x150
       [<c10ef490>] ? reiserfs_delete_inode+0x0/0x140
       [<c10a55fc>] generic_delete_inode+0x9c/0x150
       [<c10a56ed>] generic_drop_inode+0x3d/0x60
       [<c10a4607>] iput+0x47/0x50
       [<c10e915c>] reiserfs_create+0x16c/0x1c0
       [<c1099a5d>] ? inode_permission+0x7d/0xa0
       [<c109a9c1>] vfs_create+0xc1/0x130
       [<c10e8ff0>] ? reiserfs_create+0x0/0x1c0
       [<c109dbec>] do_filp_open+0x81c/0x920
       [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10
       [<c134dc0d>] ? _spin_unlock+0x1d/0x20
       [<c10a6eea>] ? alloc_fd+0xba/0xf0
       [<c109004f>] do_sys_open+0x4f/0x110
       [<c1090179>] sys_open+0x29/0x40
       [<c1002c50>] sysenter_do_call+0x12/0x36
      
      To fix this, use reiserfs_lock_once() from reiserfs_delete_inode(),
      which avoids adding reiserfs lock recursion.
      Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
  18. 21 Nov, 2009 1 commit
    • kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block() · 1d2c6cfd
      Frederic Weisbecker committed
      GFP_ATOMIC was used in reiserfs_get_block to not lose the Bkl so that
      nobody can modify the tree in the middle of its work. Now that we
      kicked out the bkl, we can use a more friendly flag. We use GFP_NOFS
      here because we already hold the reiserfs lock.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
      Cc: Thomas Gleixner <tglx@linutronix.de>
  19. 15 Oct, 2009 1 commit
    • kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0() · 27b3a5c5
      Frederic Weisbecker committed
      We had a watchdog in _get_block_create_0() that jumped to a fixup retry
      path in case the bkl got relaxed while calling kmap().
      This is not necessary anymore since we now have a reiserfs lock that is
      not implicitly relaxed while sleeping.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
      Cc: Thomas Gleixner <tglx@linutronix.de>
  20. 14 Sep, 2009 5 commits
    • kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write() · 7e942770
      Frederic Weisbecker committed
      reiserfs_commit_write() is always called with the write lock held.
      Thus the current calls to reiserfs_write_lock() in this function are
      acquiring the lock recursively.
      We can safely drop them.
      
      This also solves further assumptions for this lock to be really
      released while calling reiserfs_write_unlock().
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
    • kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end() · d6f5b0aa
      Frederic Weisbecker committed
      reiserfs_write_end() is a hot path in reiserfs.
      It contains two wasteful write lock acquire/release pairs that can be
      merged without changing the code logic.
      
      This patch factors them into a single protected section, reducing the
      amount of lock contention inside.
      
      [ Impact: reduce lock contention in a reiserfs hotpath ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • kill-the-bkl/reiserfs: lock only once on reiserfs_get_block() · 26931309
      Frederic Weisbecker committed
      reiserfs_get_block() is one of these sites where the write lock might
      be acquired recursively.
      
      It's a particular problem because this function is called very often.
      It's a hot spot which needs to reschedule() periodically while converting
      direct items to indirect ones because it can take some time.
      
      Then if we are applying the write lock release/reacquire pattern on
      schedule() here, it may not produce the desired effect since we may have
      locked in more than one depth.
      
      The solution is to use reiserfs_write_lock_once() which won't try
      to reacquire the lock recursively. Then the lock will be *really*
      released before schedule().
      
      Also, we only release the lock if TIF_NEED_RESCHED is set, to avoid
      creating numerous wasteful contentions.
      
      [ Impact: fix a case of a lock held for too long in reiserfs_get_block() ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file · 22c963ad
      Frederic Weisbecker committed
      Impact: fix a deadlock
      
      reiserfs_truncate_file() can be called from multiple contexts where
      the write lock may or may not already be held.
      
      This function also acquires (possibly recursively) the write
      lock. Subsequent releases before sleeping will not actually release
      the lock because the lock depth may be greater than one.
      
      A typical case is:
      
      reiserfs_file_release {
      	acquire_the_lock()
      	reiserfs_truncate_file()
      		reacquire_the_lock()
      		journal_begin() {
      			do_journal_begin_r() {
      				reiserfs_wait_on_write_block() {
      					/*
      					 * Not released because still one
      					 * depth owned
      					 */
      					release_lock()
      					wait_for_event()
      
      At this stage the event never happens, because the task which provides
      it needs the write lock.
      
      We use reiserfs_write_lock_once() here to ensure that we don't acquire the
      write lock recursively.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      LKML-Reference: <1239680065-25013-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      22c963ad
    • F
      reiserfs: kill-the-BKL · 8ebc4232
      Frederic Weisbecker committed
      This patch is an attempt to remove the BKL-based locking scheme from
      reiserfs.
      
      It is partly inspired by an earlier attempt by Peter Zijlstra:
      
         http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html
      
      The BKL is heavily used in this filesystem to prevent concurrent
      write accesses to the filesystem.
      
      Reiserfs relies heavily on two specific properties of the BKL:
      
      - It can be acquired recursively by the same task
      - It is released on schedule() calls and reacquired when schedule() returns
      
      The two properties above shape the whole reiserfs write locking, so it's
      very hard to simply replace the BKL with an ordinary mutex.
      
      - We need a recursive lock unless we want to restructure several blocks
        of the code.
      - We need to identify the sites where the BKL was implicitly relaxed
        (schedule, wait, sync, etc.) so that we can in turn release and
        reacquire our new lock explicitly.
        Such implicit releases of the lock are often required to let other
        resource producers/consumers do their job; otherwise we can suffer
        unexpected starvation or deadlocks.
      
      So the new lock that replaces the BKL here is a per-superblock mutex with a
      specific property: it can be acquired recursively by the same task, like the
      BKL.
      
      For this purpose, we add a lock owner field and a lock depth field to the
      superblock information structure.
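
A minimal userspace sketch of such an owner-plus-depth lock is given here for illustration only. It assumes pthreads and C11 atomics rather than kernel primitives, and the names rec_lock, rec_lock_acquire() and rec_lock_release() are hypothetical, not the ones used by this patch. The owner field says which thread holds the mutex, and the depth field counts how many times that thread has taken it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* The address of this thread-local byte identifies the calling thread. */
static _Thread_local char self_tag;

/* Hypothetical stand-in for the lock fields kept per superblock. */
struct rec_lock {
	pthread_mutex_t mutex;
	_Atomic(void *) owner;	/* NULL while unlocked */
	int depth;		/* nesting count, touched only by the owner */
};

static void rec_lock_init(struct rec_lock *l)
{
	pthread_mutex_init(&l->mutex, NULL);
	atomic_init(&l->owner, NULL);
	l->depth = 0;
}

/* Acquire the lock; a nested call by the owner only bumps the depth. */
static void rec_lock_acquire(struct rec_lock *l)
{
	void *me = &self_tag;

	if (atomic_load(&l->owner) != me) {
		pthread_mutex_lock(&l->mutex);
		atomic_store(&l->owner, me);
	}
	l->depth++;
}

/* Drop one nesting level; the mutex is released when depth reaches 0. */
static void rec_lock_release(struct rec_lock *l)
{
	/* Only the owning thread may release. */
	assert(atomic_load(&l->owner) == (void *)&self_tag);

	if (--l->depth == 0) {
		atomic_store(&l->owner, NULL);
		pthread_mutex_unlock(&l->mutex);
	}
}
```

Checking the owner before taking the mutex is what makes the nesting safe: only the current holder can ever see its own tag in the owner field, so any other thread falls through to the blocking path.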
      
      The first axis of this patch is to turn the reiserfs_write_(un)lock()
      functions into wrappers that manage this mutex. Also, some explicit calls to
      lock_kernel() have been converted to the reiserfs_write_lock() helpers.
      
      The second axis is to find the important blocking sites (schedule...(),
      wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit
      release of the write lock on these locations before blocking. Then we can
      safely wait for those who can give us resources or those who need some.
      Typically this is a fight between the current writer, the reiserfs workqueue
      (aka the async committer) and the pdflush threads.
      
      The third axis is a consequence of the second. The write lock is usually
      at the top of a lock dependency chain which can include the journal lock,
      the flush lock or the commit lock. So it's dangerous to release the write
      lock and then try to reacquire it while we still hold other locks.
      
      This is fine with the bkl:
      
            T1                       T2
      
      lock_kernel()
          mutex_lock(A)
          unlock_kernel()
          // do something
                                  lock_kernel()
                                      mutex_lock(A) -> already locked by T1
                                      schedule() (and then unlock_kernel())
          lock_kernel()
          mutex_unlock(A)
          ....
      
      This is not fine with a mutex:
      
            T1                       T2
      
      mutex_lock(write)
          mutex_lock(A)
          mutex_unlock(write)
          // do something
                                 mutex_lock(write)
                                    mutex_lock(A) -> already locked by T1
                                    schedule()
      
          mutex_lock(write) -> already locked by T2
          deadlock
      
      The solution in this patch is to provide a helper which releases the write
      lock and sleeps a bit if we can't lock a mutex that depends on it. It's
      another simulation of the BKL behaviour.
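
One way to picture such a helper is the userspace sketch below. It is an assumption-laden illustration, not the helper added by this patch: lock_dependent() is a hypothetical name, plain pthread mutexes stand in for both the write lock and the dependent mutex, and the write lock is treated as non-recursive for simplicity. The caller holds the write lock; if the dependent mutex is contended, the write lock is dropped before blocking and retaken afterwards, which breaks the cycle shown in the second diagram.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical helper. The caller must hold *write_lock and wants *dep.
 * On return the caller holds both.
 */
static void lock_dependent(pthread_mutex_t *write_lock, pthread_mutex_t *dep)
{
	int rc;

	/* Fast path: nobody holds dep, so keep the write lock. */
	if (pthread_mutex_trylock(dep) == 0)
		return;

	/*
	 * Contended: let the holder of dep make progress by giving up
	 * the write lock before sleeping on dep.
	 */
	rc = pthread_mutex_unlock(write_lock);
	assert(rc == 0);
	(void)rc;

	pthread_mutex_lock(dep);
	pthread_mutex_lock(write_lock);
}
```

The scheme only avoids the deadlock if every path that waits for the dependent mutex while holding the write lock goes through the helper; a direct blocking lock on that mutex would reintroduce the cycle.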
      
      The last axis is to locate the fs callbacks that are called with the BKL held,
      according to Documentation/filesystems/Locking.
      
      Those are:
      
      - reiserfs_remount
      - reiserfs_fill_super
      - reiserfs_put_super
      
      Reiserfs didn't need to explicitly lock because of the context of these callbacks.
      But now we must take care of that with the new locking.
      
      After this patch, reiserfs suffers from a slight performance regression (for now).
      On UP, a high volume write with dd reports an average of 27 MB/s instead
      of 30 MB/s without the patch applied.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Bron Gondwana <brong@fastmail.fm>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8ebc4232
  21. 24 Jun, 2009 1 commit
  22. 31 Mar, 2009 5 commits