1. 15 2月, 2010 1 次提交
  2. 05 1月, 2010 2 次提交
    • F
      reiserfs: Relax the lock before truncating pages · 108d3943
      Frederic Weisbecker 提交于
      While truncating a file, reiserfs_setattr() calls inode_setattr()
      that will truncate the mapping for the given inode, but for that
      it needs the pages locks.
      
      In order to release these, the owners need the reiserfs lock to
      complete their jobs. But they can't, as we don't release it before
      calling inode_setattr().
      
      We need to do that to fix the following softlockups:
      
      INFO: task flush-8:0:2149 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      flush-8:0     D f51af998     0  2149      2 0x00000000
       f51af9ac 00000092 00000002 f51af998 c2803304 00000000 c1894ad0 010f3000
       f51af9cc c1462604 c189ef80 f51af974 c1710304 f715b450 f715b5ec c2807c40
       00000000 0005bb00 c2803320 c102c55b c1710304 c2807c50 c2803304 00000246
      Call Trace:
       [<c1462604>] ? schedule+0x434/0xb20
       [<c102c55b>] ? resched_task+0x4b/0x70
       [<c106fa22>] ? mark_held_locks+0x62/0x80
       [<c146414d>] ? mutex_lock_nested+0x1fd/0x350
       [<c14640b9>] mutex_lock_nested+0x169/0x350
       [<c1178cde>] ? reiserfs_write_lock+0x2e/0x40
       [<c1178cde>] reiserfs_write_lock+0x2e/0x40
       [<c11719a2>] do_journal_end+0xc2/0xe70
       [<c1172912>] journal_end+0xb2/0x120
       [<c11686b3>] ? pathrelse+0x33/0xb0
       [<c11729e4>] reiserfs_end_persistent_transaction+0x64/0x70
       [<c1153caa>] reiserfs_get_block+0x12ba/0x15f0
       [<c106fa22>] ? mark_held_locks+0x62/0x80
       [<c1154b24>] reiserfs_writepage+0xa74/0xe80
       [<c1465a27>] ? _raw_spin_unlock_irq+0x27/0x50
       [<c11f3d25>] ? radix_tree_gang_lookup_tag_slot+0x95/0xc0
       [<c10b5377>] ? find_get_pages_tag+0x127/0x1a0
       [<c106fa22>] ? mark_held_locks+0x62/0x80
       [<c106fcd4>] ? trace_hardirqs_on_caller+0x124/0x170
       [<c10bc1e0>] __writepage+0x10/0x40
       [<c10bc9ab>] write_cache_pages+0x16b/0x320
       [<c10bc1d0>] ? __writepage+0x0/0x40
       [<c10bcb88>] generic_writepages+0x28/0x40
       [<c10bcbd5>] do_writepages+0x35/0x40
       [<c11059f7>] writeback_single_inode+0xc7/0x330
       [<c11067b2>] writeback_inodes_wb+0x2c2/0x490
       [<c1106a86>] wb_writeback+0x106/0x1b0
       [<c1106cf6>] wb_do_writeback+0x106/0x1e0
       [<c1106c18>] ? wb_do_writeback+0x28/0x1e0
       [<c1106e0a>] bdi_writeback_task+0x3a/0xb0
       [<c10cbb13>] bdi_start_fn+0x63/0xc0
       [<c10cbab0>] ? bdi_start_fn+0x0/0xc0
       [<c105d1f4>] kthread+0x74/0x80
       [<c105d180>] ? kthread+0x0/0x80
       [<c100327a>] kernel_thread_helper+0x6/0x10
      3 locks held by flush-8:0/2149:
       #0:  (&type->s_umount_key#30){+++++.}, at: [<c110676f>] writeback_inodes_wb+0x27f/0x490
       #1:  (&journal->j_mutex){+.+...}, at: [<c117199a>] do_journal_end+0xba/0xe70
       #2:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178cde>] reiserfs_write_lock+0x2e/0x40
      INFO: task fstest:3813 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      fstest        D 00000002     0  3813   3812 0x00000000
       f5103c94 00000082 f5103c40 00000002 f5ad5450 00000007 f5103c28 011f3000
       00000006 f5ad5450 c10bb005 00000480 c1710304 f5ad5450 f5ad55ec c2907c40
       00000001 f5ad5450 f5103c74 00000046 00000002 f5ad5450 00000007 f5103c6c
      Call Trace:
       [<c10bb005>] ? free_hot_cold_page+0x1d5/0x280
       [<c1462d64>] io_schedule+0x74/0xc0
       [<c10b5a45>] sync_page+0x35/0x60
       [<c146325a>] __wait_on_bit_lock+0x4a/0x90
       [<c10b5a10>] ? sync_page+0x0/0x60
       [<c10b59e5>] __lock_page+0x85/0x90
       [<c105d660>] ? wake_bit_function+0x0/0x60
       [<c10bf654>] truncate_inode_pages_range+0x1e4/0x2d0
       [<c10bf75f>] truncate_inode_pages+0x1f/0x30
       [<c10bf7cf>] truncate_pagecache+0x5f/0xa0
       [<c10bf86a>] vmtruncate+0x5a/0x70
       [<c10fdb7d>] inode_setattr+0x5d/0x190
       [<c1150117>] reiserfs_setattr+0x1f7/0x2f0
       [<c1464569>] ? down_write+0x49/0x70
       [<c10fde01>] notify_change+0x151/0x330
       [<c10e6f3d>] do_truncate+0x6d/0xa0
       [<c10f4ce2>] do_filp_open+0x9a2/0xcf0
       [<c1465aec>] ? _raw_spin_unlock+0x2c/0x50
       [<c10fec50>] ? alloc_fd+0xe0/0x100
       [<c10e602d>] do_sys_open+0x6d/0x130
       [<c1002cfb>] ? sysenter_exit+0xf/0x16
       [<c10e615e>] sys_open+0x2e/0x40
       [<c1002ccc>] sysenter_do_call+0x12/0x32
      3 locks held by fstest/3813:
       #0:  (&sb->s_type->i_mutex_key#4){+.+.+.}, at: [<c10e6f33>] do_truncate+0x63/0xa0
       #1:  (&sb->s_type->i_alloc_sem_key#3){+.+.+.}, at: [<c10fdf07>] notify_change+0x257/0x330
       #2:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c1178c8e>] reiserfs_write_lock_once+0x2e/0x50
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Christian Kujau <lists@nerdbynature.de>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      108d3943
    • F
      reiserfs: Fix recursive lock on lchown · 5fe1533f
      Frederic Weisbecker 提交于
      On chown, reiserfs will call reiserfs_setattr() to change the owner
      of the given inode, but it may also recursively call
      reiserfs_setattr() to propagate the owner change to the private xattr
      files for this inode.
      
      Hence, the reiserfs lock may be acquired twice which is not wanted
      as reiserfs_setattr() calls journal_begin() that is going to try to
      relax the lock in order to safely acquire the journal mutex.
      
      Using reiserfs_write_lock_once() from reiserfs_setattr() solves
      the problem.
      
      This fixes the following warning, that precedes a lockdep report.
      
      WARNING: at fs/reiserfs/lock.c:95 reiserfs_lock_check_recursive+0x3f/0x50()
      Hardware name: MS-7418
      Unwanted recursive reiserfs lock!
      Pid: 4189, comm: fsstress Not tainted 2.6.33-rc2-tip-atom+ #195
      Call Trace:
       [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
       [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
       [<c103f7ac>] warn_slowpath_common+0x6c/0xc0
       [<c1178bff>] ? reiserfs_lock_check_recursive+0x3f/0x50
       [<c103f84b>] warn_slowpath_fmt+0x2b/0x30
       [<c1178bff>] reiserfs_lock_check_recursive+0x3f/0x50
       [<c1172ae3>] do_journal_begin_r+0x83/0x350
       [<c1172f2d>] journal_begin+0x7d/0x140
       [<c106509a>] ? in_group_p+0x2a/0x30
       [<c10fda71>] ? inode_change_ok+0x91/0x140
       [<c115007d>] reiserfs_setattr+0x15d/0x2e0
       [<c10f9bf3>] ? dput+0xe3/0x140
       [<c1465adc>] ? _raw_spin_unlock+0x2c/0x50
       [<c117831d>] chown_one_xattr+0xd/0x10
       [<c11780a3>] reiserfs_for_each_xattr+0x113/0x2c0
       [<c1178310>] ? chown_one_xattr+0x0/0x10
       [<c14641e9>] ? mutex_lock_nested+0x2a9/0x350
       [<c117826f>] reiserfs_chown_xattrs+0x1f/0x60
       [<c106509a>] ? in_group_p+0x2a/0x30
       [<c10fda71>] ? inode_change_ok+0x91/0x140
       [<c1150046>] reiserfs_setattr+0x126/0x2e0
       [<c1177c20>] ? reiserfs_getxattr+0x0/0x90
       [<c11b0d57>] ? cap_inode_need_killpriv+0x37/0x50
       [<c10fde01>] notify_change+0x151/0x330
       [<c10e659f>] chown_common+0x6f/0x90
       [<c10e67bd>] sys_lchown+0x6d/0x80
       [<c1002ccc>] sysenter_do_call+0x12/0x32
      ---[ end trace 7c2b77224c1442fc ]---
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Christian Kujau <lists@nerdbynature.de>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      5fe1533f
  3. 18 12月, 2009 1 次提交
  4. 14 12月, 2009 1 次提交
    • F
      reiserfs: Fix reiserfs lock and journal lock inversion dependency · cb1c2e51
      Frederic Weisbecker 提交于
      When we were using the bkl, we didn't care about dependencies against
      other locks, but the mutex conversion created new ones, which is why
      we have reiserfs_mutex_lock_safe(), which unlocks the reiserfs lock
      before acquiring another mutex.
      
      But this trick actually fails if we have acquired the reiserfs lock
      recursively, as we try to unlock it to acquire the new mutex without
      inverted dependency, but we eventually only decrease its depth.
      
      This happens in the case of a nested inode creation/deletion.
      Say we have no space left on the device, we create an inode
      and tak the lock but fail to create its entry, then we release the
      inode using iput(), which calls reiserfs_delete_inode() that takes
      the reiserfs lock recursively. The path eventually ends up in
      journal_begin() where we try to take the journal safely but we
      fail because of the reiserfs lock recursion:
      
      [ INFO: possible circular locking dependency detected ]
      2.6.32-06486-g053fe57a #2
      -------------------------------------------------------
      vi/23454 is trying to acquire lock:
       (&journal->j_mutex){+.+...}, at: [<c110dac4>] do_journal_begin_r+0x64/0x2f0
      
      but task is already holding lock:
       (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>] reiserfs_write_lock+0x28/0x40
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
             [<c104f8f3>] validate_chain+0xa23/0xf70
             [<c1050325>] __lock_acquire+0x4e5/0xa70
             [<c105092a>] lock_acquire+0x7a/0xa0
             [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
             [<c11106a8>] reiserfs_write_lock+0x28/0x40
             [<c110dacb>] do_journal_begin_r+0x6b/0x2f0
             [<c110ddcf>] journal_begin+0x7f/0x120
             [<c10f76c2>] reiserfs_remount+0x212/0x4d0
             [<c1093997>] do_remount_sb+0x67/0x140
             [<c10a9ca6>] do_mount+0x436/0x6b0
             [<c10a9f86>] sys_mount+0x66/0xa0
             [<c1002c50>] sysenter_do_call+0x12/0x36
      
      -> #0 (&journal->j_mutex){+.+...}:
             [<c104fe38>] validate_chain+0xf68/0xf70
             [<c1050325>] __lock_acquire+0x4e5/0xa70
             [<c105092a>] lock_acquire+0x7a/0xa0
             [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
             [<c110dac4>] do_journal_begin_r+0x64/0x2f0
             [<c110ddcf>] journal_begin+0x7f/0x120
             [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140
             [<c10a55fc>] generic_delete_inode+0x9c/0x150
             [<c10a56ed>] generic_drop_inode+0x3d/0x60
             [<c10a4607>] iput+0x47/0x50
             [<c10e915c>] reiserfs_create+0x16c/0x1c0
             [<c109a9c1>] vfs_create+0xc1/0x130
             [<c109dbec>] do_filp_open+0x81c/0x920
             [<c109004f>] do_sys_open+0x4f/0x110
             [<c1090179>] sys_open+0x29/0x40
             [<c1002c50>] sysenter_do_call+0x12/0x36
      
      other info that might help us debug this:
      
      2 locks held by vi/23454:
       #0:  (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [<c109d64e>]
      do_filp_open+0x27e/0x920
       #1:  (&REISERFS_SB(s)->lock){+.+.+.}, at: [<c11106a8>]
      reiserfs_write_lock+0x28/0x40
      
      stack backtrace:
      Pid: 23454, comm: vi Not tainted 2.6.32-06486-g053fe57a #2
      Call Trace:
       [<c134b202>] ? printk+0x18/0x1e
       [<c104e960>] print_circular_bug+0xc0/0xd0
       [<c104fe38>] validate_chain+0xf68/0xf70
       [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10
       [<c1050325>] __lock_acquire+0x4e5/0xa70
       [<c105092a>] lock_acquire+0x7a/0xa0
       [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
       [<c134c78f>] mutex_lock_nested+0x5f/0x2b0
       [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
       [<c110dac4>] ? do_journal_begin_r+0x64/0x2f0
       [<c110ff80>] ? delete_one_xattr+0x0/0x1c0
       [<c110dac4>] do_journal_begin_r+0x64/0x2f0
       [<c110ddcf>] journal_begin+0x7f/0x120
       [<c11105b5>] ? reiserfs_delete_xattrs+0x15/0x50
       [<c10ef52f>] reiserfs_delete_inode+0x9f/0x140
       [<c10a55bf>] ? generic_delete_inode+0x5f/0x150
       [<c10ef490>] ? reiserfs_delete_inode+0x0/0x140
       [<c10a55fc>] generic_delete_inode+0x9c/0x150
       [<c10a56ed>] generic_drop_inode+0x3d/0x60
       [<c10a4607>] iput+0x47/0x50
       [<c10e915c>] reiserfs_create+0x16c/0x1c0
       [<c1099a5d>] ? inode_permission+0x7d/0xa0
       [<c109a9c1>] vfs_create+0xc1/0x130
       [<c10e8ff0>] ? reiserfs_create+0x0/0x1c0
       [<c109dbec>] do_filp_open+0x81c/0x920
       [<c104ca9b>] ? trace_hardirqs_off+0xb/0x10
       [<c134dc0d>] ? _spin_unlock+0x1d/0x20
       [<c10a6eea>] ? alloc_fd+0xba/0xf0
       [<c109004f>] do_sys_open+0x4f/0x110
       [<c1090179>] sys_open+0x29/0x40
       [<c1002c50>] sysenter_do_call+0x12/0x36
      
      To fix this, use reiserfs_lock_once() from reiserfs_delete_inode()
      which prevents from adding reiserfs lock recursion.
      Reported-by: NAlexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      cb1c2e51
  5. 21 11月, 2009 1 次提交
    • F
      kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block() · 1d2c6cfd
      Frederic Weisbecker 提交于
      GFP_ATOMIC was used in reiserfs_get_block to not lose the Bkl so that
      nobody can modify the tree in the middle of its work. Now that we
      kicked out the bkl, we can use a more friendly flag. We use GFP_NOFS
      here because we already hold the reiserfs lock.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      1d2c6cfd
  6. 15 10月, 2009 1 次提交
    • F
      kill-the-bkl/reiserfs: drop the fs race watchdog from _get_block_create_0() · 27b3a5c5
      Frederic Weisbecker 提交于
      We had a watchdog in _get_block_create_0() that jumped to a fixup retry
      path in case the bkl got relaxed while calling kmap().
      This is not necessary anymore since we now have a reiserfs lock that is
      not implicitly relaxed while sleeping.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      27b3a5c5
  7. 14 9月, 2009 5 次提交
    • F
      kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write() · 7e942770
      Frederic Weisbecker 提交于
      reiserfs_commit_write() is always called with the write lock held.
      Thus the current calls to reiserfs_write_lock() in this function are
      acquiring the lock recursively.
      We can safely drop them.
      
      This also solves further assumptions for this lock to be really
      released while calling reiserfs_write_unlock().
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
      7e942770
    • F
      kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end() · d6f5b0aa
      Frederic Weisbecker 提交于
      reiserfs_write_end() is a hot path in reiserfs.
      We have two wasteful write lock lock/release inside that can be gathered
      without changing the code logic.
      
      This patch factorizes them out in a single protected section, reducing the
      number of contentions inside.
      
      [ Impact: reduce lock contention in a reiserfs hotpath ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      d6f5b0aa
    • F
      kill-the-bkl/reiserfs: lock only once on reiserfs_get_block() · 26931309
      Frederic Weisbecker 提交于
      reiserfs_get_block() is one of these sites where the write lock might
      be acquired recursively.
      
      It's a particular problem because this function is called very often.
      It's a hot spot which needs to reschedule() periodically while converting
      direct items to indirect ones because it can take some time.
      
      Then if we are applying the write lock release/reacquire pattern on
      schedule() here, it may not produce the desired effect since we may have
      locked in more than one depth.
      
      The solution is to use reiserfs_write_lock_once() which won't try
      to reacquire the lock recursively. Then the lock will be *really*
      released before schedule().
      
      Also, we only release the lock if TIF_NEED_RESCHED is set to not
      create wasteful numerous contentions.
      
      [ Impact: fix a too long holded lock case in reiserfs_get_block() ]
      
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      26931309
    • F
      kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file · 22c963ad
      Frederic Weisbecker 提交于
      Impact: fix a deadlock
      
      reiserfs_truncate_file() can be called from multiple context where
      the write lock can be already hold or not.
      
      This function also acquire (possibly recursively) the write
      lock. Subsequent releases before sleeping will not actually release
      the lock because we may be in more than one lock depth degree.
      
      A typical case is:
      
      reiserfs_file_release {
      	acquire_the_lock()
      	reiserfs_truncate_file()
      		reacquire_the_lock()
      		journal_begin() {
      			do_journal_begin_r() {
      				reiserfs_wait_on_write_block() {
      					/*
      					 * Not released because still one
      					 * depth owned
      					 */
      					release_lock()
      					wait_for_event()
      
      At this stage the event never happen because the one which provides
      it needs the write lock.
      
      We use reiserfs_write_lock_once() here to ensure that we don't acquire the
      write lock recursively.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Alessio Igor Bogani <abogani@texware.it>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Alexander Beregalov <a.beregalov@gmail.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      LKML-Reference: <1239680065-25013-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      22c963ad
    • F
      reiserfs: kill-the-BKL · 8ebc4232
      Frederic Weisbecker 提交于
      This patch is an attempt to remove the Bkl based locking scheme from
      reiserfs and is intended.
      
      It is a bit inspired from an old attempt by Peter Zijlstra:
      
         http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html
      
      The bkl is heavily used in this filesystem to prevent from
      concurrent write accesses on the filesystem.
      
      Reiserfs makes a deep use of the specific properties of the Bkl:
      
      - It can be acqquired recursively by a same task
      - It is released on the schedule() calls and reacquired when schedule() returns
      
      The two properties above are a roadmap for the reiserfs write locking so it's
      very hard to simply replace it with a common mutex.
      
      - We need a recursive-able locking unless we want to restructure several blocks
        of the code.
      - We need to identify the sites where the bkl was implictly relaxed
        (schedule, wait, sync, etc...) so that we can in turn release and
        reacquire our new lock explicitly.
        Such implicit releases of the lock are often required to let other
        resources producer/consumer do their job or we can suffer unexpected
        starvations or deadlocks.
      
      So the new lock that replaces the bkl here is a per superblock mutex with a
      specific property: it can be acquired recursively by a same task, like the
      bkl.
      
      For such purpose, we integrate a lock owner and a lock depth field on the
      superblock information structure.
      
      The first axis on this patch is to turn reiserfs_write_(un)lock() function
      into a wrapper to manage this mutex. Also some explicit calls to
      lock_kernel() have been converted to reiserfs_write_lock() helpers.
      
      The second axis is to find the important blocking sites (schedule...(),
      wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit
      release of the write lock on these locations before blocking. Then we can
      safely wait for those who can give us resources or those who need some.
      Typically this is a fight between the current writer, the reiserfs workqueue
      (aka the async commiter) and the pdflush threads.
      
      The third axis is a consequence of the second. The write lock is usually
      on top of a lock dependency chain which can include the journal lock, the
      flush lock or the commit lock. So it's dangerous to release and trying to
      reacquire the write lock while we still hold other locks.
      
      This is fine with the bkl:
      
            T1                       T2
      
      lock_kernel()
          mutex_lock(A)
          unlock_kernel()
          // do something
                                  lock_kernel()
                                      mutex_lock(A) -> already locked by T1
                                      schedule() (and then unlock_kernel())
          lock_kernel()
          mutex_unlock(A)
          ....
      
      This is not fine with a mutex:
      
            T1                       T2
      
      mutex_lock(write)
          mutex_lock(A)
          mutex_unlock(write)
          // do something
                                 mutex_lock(write)
                                    mutex_lock(A) -> already locked by T1
                                    schedule()
      
          mutex_lock(write) -> already locked by T2
          deadlock
      
      The solution in this patch is to provide a helper which releases the write
      lock and sleep a bit if we can't lock a mutex that depend on it. It's another
      simulation of the bkl behaviour.
      
      The last axis is to locate the fs callbacks that are called with the bkl held,
      according to Documentation/filesystem/Locking.
      
      Those are:
      
      - reiserfs_remount
      - reiserfs_fill_super
      - reiserfs_put_super
      
      Reiserfs didn't need to explicitly lock because of the context of these callbacks.
      But now we must take care of that with the new locking.
      
      After this patch, reiserfs suffers from a slight performance regression (for now).
      On UP, a high volume write with dd reports an average of 27 MB/s instead
      of 30 MB/s without the patch applied.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: NIngo Molnar <mingo@elte.hu>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Bron Gondwana <brong@fastmail.fm>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8ebc4232
  8. 24 6月, 2009 1 次提交
  9. 31 3月, 2009 9 次提交
  10. 26 3月, 2009 1 次提交
  11. 06 1月, 2009 1 次提交
  12. 05 1月, 2009 1 次提交
    • N
      fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin 提交于
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in ints write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54566b2c
  13. 01 1月, 2009 1 次提交
  14. 23 10月, 2008 1 次提交
  15. 05 8月, 2008 1 次提交
  16. 09 7月, 2008 1 次提交
  17. 08 2月, 2008 1 次提交
  18. 06 2月, 2008 1 次提交
    • C
      Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user · eebd2aa3
      Christoph Lameter 提交于
      Simplify page cache zeroing of segments of pages through 3 functions
      
      zero_user_segments(page, start1, end1, start2, end2)
      
              Zeros two segments of the page. It takes the position where to
              start and end the zeroing which avoids length calculations and
      	makes code clearer.
      
      zero_user_segment(page, start, end)
      
              Same for a single segment.
      
      zero_user(page, start, length)
      
              Length variant for the case where we know the length.
      
      We remove the zero_user_page macro. Issues:
      
      1. Its a macro. Inline functions are preferable.
      
      2. The KM_USER0 macro is only defined for HIGHMEM.
      
         Having to treat this special case everywhere makes the
         code needlessly complex. The parameter for zeroing is always
         KM_USER0 except in one single case that we open code.
      
      Avoiding KM_USER0 makes a lot of code not having to be dealing
      with the special casing for HIGHMEM anymore. Dealing with
      kmap is only necessary for HIGHMEM configurations. In those
      configurations we use KM_USER0 like we do for a series of other
      functions defined in highmem.h.
      
      Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
      function could not be a macro. zero_user_* functions introduced
      here can be be inline because that constant is not used when these
      functions are called.
      
      Also extract the flushing of the caches to be outside of the kmap.
      
      [akpm@linux-foundation.org: fix nfs and ntfs build]
      [akpm@linux-foundation.org: fix ntfs build some more]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eebd2aa3
  19. 22 10月, 2007 1 次提交
  20. 20 10月, 2007 1 次提交
  21. 19 10月, 2007 1 次提交
  22. 17 10月, 2007 3 次提交
  23. 18 7月, 2007 1 次提交
  24. 10 5月, 2007 1 次提交
  25. 23 1月, 2007 1 次提交
    • V
      [PATCH] resierfs: avoid tail packing if an inode was ever mmapped · de14569f
      Vladimir Saveliev 提交于
      This patch fixes a confusion reiserfs has for a long time.
      
      On release file operation reiserfs used to try to pack file data stored in
      last incomplete page of some files into metadata blocks.  After packing the
      page got cleared with clear_page_dirty.  It did not take into account that
      the page may be mmaped into other process's address space.  Recent
      replacement for clear_page_dirty cancel_dirty_page found the confusion with
      sanity check that page has to be not mapped.
      
      The patch fixes the confusion by making reiserfs avoid tail packing if an
      inode was ever mmapped.  reiserfs_mmap and reiserfs_file_release are
      serialized with mutex in reiserfs specific inode.  reiserfs_mmap locks the
      mutex and sets a bit in reiserfs specific inode flags.
      reiserfs_file_release checks the bit having the mutex locked.  If bit is
      set - tail packing is avoided.  This eliminates a possibility that mmapped
      page gets cancel_page_dirty-ed.
      Signed-off-by: NVladimir Saveliev <vs@namesys.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Chris Mason <mason@suse.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de14569f