- 15 Oct 2009: 1 commit

Committed by Frederic Weisbecker

Reiserfs uses the ioctl callback for its file operations, which means that its ioctl path is still locked by the BKL, which synchronized it with the rest of the filesystem operations. We replaced that by locking it with the new reiserfs lock, but we only do so from the compat_ioctl callback. Fix that by locking reiserfs_ioctl() every time.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
-
- 05 Oct 2009: 1 commit

Committed by Frederic Weisbecker

While creating the reiserfs workqueue during journal initialization, we are holding the reiserfs lock, but create_workqueue() also takes the cpu_add_remove_lock, creating the following dependency:

- reiserfs lock -> cpu_add_remove_lock

But we also have the following existing dependencies:

- mm->mmap_sem -> reiserfs lock
- cpu_add_remove_lock -> cpu_hotplug.lock -> slub_lock -> sysfs_mutex

The merged dependency chain then becomes:

- mm->mmap_sem -> reiserfs lock -> cpu_add_remove_lock -> cpu_hotplug.lock -> slub_lock -> sysfs_mutex

But when we fill a directory entry in sysfs_readdir(), we are holding the sysfs_mutex, and we might also fault while copying the directory entry to userspace, leading to the following dependency:

- sysfs_mutex -> mm->mmap_sem

The end result is a lock inversion between sysfs_mutex and mm->mmap_sem, as reported in the following lockdep warning:

[ INFO: possible circular locking dependency detected ]
2.6.31-07095-g25a3912 #4
-------------------------------------------------------
udevadm/790 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<c1098942>] might_fault+0x72/0xc0

but task is already holding lock:
 (sysfs_mutex){+.+.+.}, at: [<c110813c>] sysfs_readdir+0x7c/0x260

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (sysfs_mutex){+.+.+.}:
[...]
-> #4 (slub_lock){+++++.}:
[...]
-> #3 (cpu_hotplug.lock){+.+.+.}:
[...]
-> #2 (cpu_add_remove_lock){+.+.+.}:
[...]
-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
[...]
-> #0 (&mm->mmap_sem){++++++}:
[...]

This can be fixed by relaxing the reiserfs lock while creating the workqueue. It is fine to relax the lock here: we only keep it around to pass the reiserfs lock checks and for paranoid reasons.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
-
- 17 Sep 2009: 1 commit

Committed by Frederic Weisbecker

Alexander Beregalov reported the following warning:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.31-03149-gdcc030a #1
-------------------------------------------------------
udevadm/716 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<c107249a>] might_fault+0x4a/0xa0

but task is already holding lock:
 (sysfs_mutex){+.+.+.}, at: [<c10cb9aa>] sysfs_readdir+0x5a/0x200

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (sysfs_mutex){+.+.+.}:
[...]
-> #2 (&bdev->bd_mutex){+.+.+.}:
[...]
-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
[...]
-> #0 (&mm->mmap_sem){++++++}:
[...]

On the reiserfs mount path, we take the reiserfs lock and, while initializing the journal, we open the device, taking bdev->bd_mutex. Then rescan_partition() may signal the change to sysfs. We then have the following dependency:

reiserfs_lock -> bd_mutex -> sysfs_mutex

Later, while entering reiserfs_readpage() after a page fault in an mmapped reiserfs file, we are holding mm->mmap_sem and are going to take the reiserfs lock too. We then have the following dependency:

mm->mmap_sem -> reiserfs_lock

which, combined with the previous dependency, gives us:

mm->mmap_sem -> reiserfs_lock -> bd_mutex -> sysfs_mutex

Now, while entering the sysfs readdir path, we are holding the sysfs_mutex. When we copy a directory entry to the user buffer, we might fault and then take the mm->mmap_sem lock, which leads to the reported circular locking dependency.

We can fix that by relaxing the reiserfs lock during the call to journal_init_dev(), which is where we open the mounted device. It is fine to relax the lock here because we are at the beginning of the reiserfs mount path and there is nothing to protect yet: the journal is not initialized. We just keep this lock around for paranoid reasons.

Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Tested-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
-
- 14 Sep 2009: 22 commits

Committed by Frederic Weisbecker

Until now, trying to unlock the reiserfs write lock while the current task doesn't hold it only led to a warning. We should actually warn and panic in this case to prevent the user's data from reaching an unstable state.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
-
Committed by Frederic Weisbecker

reiserfs_commit_write() is always called with the write lock held. Thus the current calls to reiserfs_write_lock() in this function acquire the lock recursively. We can safely drop them.

This also fixes code further down the path that assumes the lock is really released when it calls reiserfs_write_unlock().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
-
Committed by Frederic Weisbecker

reiserfs_mkdir() acquires the reiserfs lock, assuming it has been called from the directory inode callbacks without the lock held.

But it can also be called from other internal sites, such as reiserfs_xattr_init(), which already hold the lock. This recursive locking leads to further wrong assumptions. For example, later calls to reiserfs_mutex_lock_safe() won't actually release the reiserfs lock while we acquire a given mutex, creating unexpected lock inversions.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
-
Committed by Frederic Weisbecker

reiserfs_xattr_init() is called with the reiserfs write lock held, but if the ".reiserfs_priv" entry has not been created yet, we take the superblock root directory inode mutex until .reiserfs_priv is created.

This creates a lock dependency inversion against other sites such as reiserfs_file_release(), which takes an inode mutex and then the reiserfs lock.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
-
Committed by Frederic Weisbecker

When do_balance() balances the tree, a trick is used to let other tree writers/readers check whether do_balance() is executing concurrently (this requires CONFIG_REISERFS_CHECK). It is done to protect against concurrent accesses to the tree.

The trick is the following: when do_balance() is called, a single global variable called cur_tb takes a pointer to the current tree being rebalanced. Once do_balance() finishes its work, cur_tb is set back to NULL. Concurrent tree readers/writers then just have to check the value of cur_tb to ensure do_balance() isn't executing concurrently. If it is, that proves schedule() occurred in do_balance(), which then relaxed the BKL that protected the tree.

Now that the BKL has been turned into a mutex, this check is still fine even though do_balance() becomes preemptible: the write lock is not automatically released on schedule(), so the tree is still protected.

But this only holds if we have a single reiserfs mount point. Because the BKL is a global lock, it didn't allow a tree reader/writer on one mount point to run concurrently with a do_balance() on another tree from another mount point. So, assuming all these readers/writers weren't supposed to be reentrant, the current check now sometimes detects false positives with the per-superblock mutex, which allows this reentrancy.

This patch keeps the concurrent tree access check but moves it per superblock, so that only trees from the same mount point are checked for concurrent access.

[ Impact: fix spurious panic while running several reiserfs mount points ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
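A minimal sketch of the per-superblock check, assuming a simplified superblock structure. The names `sb_info`, `tree_balance`, `do_balance_begin()`, `do_balance_end()` and `balance_in_progress()` are hypothetical and do not come from the reiserfs source; the point is only that the "balance in progress" marker lives in each superblock instead of in a single global, so a balance on one mount point no longer looks like concurrent access on another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the tree being rebalanced. */
struct tree_balance {
    int id;
};

/* Hypothetical per-superblock info: holds the marker formerly kept in a global. */
struct sb_info {
    struct tree_balance *cur_tb; /* non-NULL while a balance runs on this mount */
};

/* Called when a balance starts on this mount point. */
static void do_balance_begin(struct sb_info *sb, struct tree_balance *tb)
{
    sb->cur_tb = tb;
}

/* Called once the balance on this mount point has finished. */
static void do_balance_end(struct sb_info *sb)
{
    sb->cur_tb = NULL;
}

/* Tree readers/writers only check their own mount point. */
static bool balance_in_progress(const struct sb_info *sb)
{
    return sb->cur_tb != NULL;
}
```

With a single global pointer, a balance running on `a` would also have been reported for an unrelated superblock `b`, which is the false positive described above.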
-
Committed by Frederic Weisbecker

While looking up a pathname, an inode mutex can be acquired in do_lookup(), which calls reiserfs_lookup(), which in turn acquires the write lock.

On the other side, reiserfs_fill_super() can acquire the write lock and then call reiserfs_lookup_privroot(), which can acquire an inode mutex (the root of the mount point).

So we theoretically risk an AB-BA lock inversion that could lead to a deadlock.

As for the other lock dependencies found since the BKL-to-mutex conversion, the fix is to use reiserfs_mutex_lock_safe(), which drops the lock dependency on the write lock.

[ Impact: fix a possible deadlock with reiserfs ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-
Committed by Frederic Weisbecker

search_by_key() is the site that requires the lock the most. This is mostly because it is a very central function, and also because it releases and reacquires the write lock at least once each time it is called.

Such release/reacquire cycles create a lot of contention at this point and also widen the window in which another thread can change the tree. When that happens, the current path search over the tree must be retried from the beginning (the root), which is a wasteful and time-consuming recovery.

This patch merges two release/reacquire sequences:

- reading leaf node blocks
- reading the current block

The latter immediately follows the former. The whole sequence is safe as a single unlocked section because we check right afterwards whether the tree has changed during these operations.

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-
Committed by Frederic Weisbecker

reiserfs_mutex_lock_safe() is a hack to avoid any dependency between an internal reiserfs mutex and the write lock; it was proposed to follow the old BKL logic.

The code does the following:

    while (!mutex_trylock(m)) {
        reiserfs_write_unlock(s);
        schedule();
        reiserfs_write_lock(s);
    }

It thus imitates the implicit behaviour of the lock when it was the BKL and had no such dependency:

    mutex_lock(m) {
        if (fastpath)
            let's go
        else {
            wait_for_mutex() {
                schedule() {
                    unlock_kernel()
                    reacquire_lock_kernel()
                }
            }
        }
    }

The problem is that by using such an explicit schedule(), we don't benefit from adaptive mutex spinning on the owner.

The logic used now is:

    reiserfs_write_unlock(s);
    mutex_lock(m); // -> possible adaptive spinning
    reiserfs_write_lock(s);

[ Impact: restore the use of adaptive spinning mutexes in reiserfs ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
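The two strategies can be compared in a userspace sketch with POSIX threads. This is an illustration under simplifying assumptions, not the reiserfs code: the write lock is modeled as a plain, non-recursive `pthread_mutex_t`, the helper names `mutex_lock_safe_old()` and `mutex_lock_safe_new()` are hypothetical, and `sched_yield()` stands in for the kernel's `schedule()`. Both helpers are entered and left with the write lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Old approach: poll the mutex, dropping the write lock between attempts.
 * The caller never blocks inside pthread_mutex_lock(), so whatever
 * spinning the mutex implementation could do on contention is never used. */
static void mutex_lock_safe_old(pthread_mutex_t *m, pthread_mutex_t *write_lock)
{
    while (pthread_mutex_trylock(m) != 0) {
        pthread_mutex_unlock(write_lock);
        sched_yield();                  /* stand-in for schedule() */
        pthread_mutex_lock(write_lock);
    }
}

/* New approach: drop the write lock, take the mutex with a normal blocking
 * (possibly spinning) lock, then retake the write lock. We never wait on m
 * while holding write_lock, so no write_lock -> m dependency is created. */
static void mutex_lock_safe_new(pthread_mutex_t *m, pthread_mutex_t *write_lock)
{
    pthread_mutex_unlock(write_lock);
    pthread_mutex_lock(m);
    pthread_mutex_lock(write_lock);
}
```

In both versions the acquisition order that can block is always `m` first, then the write lock, which is the order the safe helper is meant to enforce.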
-
Committed by Frederic Weisbecker

reiserfs_write_end() is a hot path in reiserfs. It contains two wasteful write lock acquire/release pairs that can be merged without changing the code logic.

This patch merges them into a single protected section, reducing contention.

[ Impact: reduce lock contention in a reiserfs hot path ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-
Committed by Frederic Weisbecker

search_by_key() is a central function in reiserfs which searches the path in the fs tree from the root to a node, given its key.

It is the function that requests the write lock most often, because it is a very frequently used path.

Also, we forget to release the lock while reading the next tree node, which makes us hold the lock wastefully.

We now release the lock while reading the current node and its children, all in one go. This should be safe because we hold a reference to these blocks, and even if we read a block that is concurrently changed, a later fs_changed check makes us retry the path from the root.

[ Impact: release the write lock while unused in a hot path ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-
Committed by Frederic Weisbecker

The write lock can be acquired recursively in reiserfs_lookup(). But we may want to *really* release the lock before a possible reschedule in a reiserfs_lookup() callee.

Hence we want to acquire the lock only once (i.e. not recursively).

[ Impact: prevent the write lock from wrongly staying held while sleeping ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-
Committed by Frederic Weisbecker

reiserfs_get_block() is one of the sites where the write lock might be acquired recursively.

This is a particular problem because this function is called very often. It is a hot spot which needs to reschedule periodically while converting direct items to indirect ones, because that can take some time.

So if we apply the write lock release/reacquire pattern around schedule() here, it may not have the desired effect, since we may have locked at more than one depth.

The solution is to use reiserfs_write_lock_once(), which won't acquire the lock recursively. The lock will then *really* be released before schedule().

Also, we only release the lock if TIF_NEED_RESCHED is set, to avoid creating numerous wasteful contentions.

[ Impact: fix a lock held for too long in reiserfs_get_block() ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-
Committed by Frederic Weisbecker

flush_commit_list() uses ll_rw_block() to commit the pending log blocks. ll_rw_block() might sleep, and the BKL was released at this point. So we can also relax the write lock here.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-
Committed by Frederic Weisbecker

reiserfs_read_bitmap_block() uses sb_bread() to read the bitmap block. This helper might sleep.

When the BKL was used, it was released at this point. We can therefore relax the write lock here too.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-
Committed by Frederic Weisbecker

get_neighbors() is used to get the left and/or right neighbors of a given block in order to balance a tree.

sb_bread() is used to read the buffers of these neighbor blocks, and it might sleep while waiting for this operation.

The BKL was released at this point, so we can also release the write lock before calling sb_bread().

This is safe because if the filesystem is changed after this lock release, the function returns REPEAT_SEARCH (aka SCHEDULE_OCCURRED in the function header comments) in order to repeat the neighbor search.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
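The "drop the lock around a blocking read, then detect changes" pattern used here (and in search_by_key()) can be sketched with a generation counter. Everything below is hypothetical: `fs_state`, `generation`, `read_block_unlocked()`, `get_neighbor()` and the `RESULT_*` values are illustrative names, and reiserfs uses its own fs_changed() check and return codes, which are not reproduced here. The `simulate_change` flag only models another task modifying the tree while the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>

enum lookup_result { RESULT_CARRY_ON, RESULT_REPEAT_SEARCH };

/* Hypothetical filesystem state protected by the write lock. */
struct fs_state {
    pthread_mutex_t write_lock;
    unsigned long generation;   /* bumped by every tree modification */
};

/* Hypothetical blocking read; must be called WITHOUT the write lock.
 * If simulate_change is set, another "task" modifies the tree meanwhile. */
static int read_block_unlocked(struct fs_state *fs, int blocknr, int simulate_change)
{
    if (simulate_change) {
        pthread_mutex_lock(&fs->write_lock);
        fs->generation++;
        pthread_mutex_unlock(&fs->write_lock);
    }
    return blocknr * 2;         /* pretend block contents */
}

/* Entered and left with fs->write_lock held. */
static enum lookup_result get_neighbor(struct fs_state *fs, int blocknr,
                                       int simulate_change, int *out)
{
    unsigned long gen = fs->generation;     /* snapshot under the lock */

    pthread_mutex_unlock(&fs->write_lock);  /* relax before sleeping I/O */
    *out = read_block_unlocked(fs, blocknr, simulate_change);
    pthread_mutex_lock(&fs->write_lock);

    /* If the tree changed while we slept, the caller restarts its search. */
    return fs->generation != gen ? RESULT_REPEAT_SEARCH : RESULT_CARRY_ON;
}
```

The design choice is that correctness no longer depends on holding the lock across I/O, only on noticing afterwards that the tree may have moved and retrying.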
-
Committed by Frederic Weisbecker

prepare_for_delete_or_cut() can process several types of items, including indirect items, i.e. items which contain no file data but pointers to the unformatted nodes that hold a file's data.

In this case it has to zero out these pointers to the block numbers of unformatted nodes and free those block numbers in the bitmap.

This can take some time, so a reschedule is performed between each processed block.

We can safely release the write lock while rescheduling, as the BKL did, because the code checks right afterwards whether the item has moved while sleeping.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-
Committed by Frederic Weisbecker

When do_journal_end() copies data to the journal block buffers in memory, it reschedules if needed between each block copied and dirtied.

We can also release the write lock at this rescheduling stage, as the BKL implicitly did.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
-
Committed by Frederic Weisbecker

Impact: fix a deadlock

reiserfs_dirty_inode() is the super_operations::dirty_inode() callback of reiserfs. It can be called from different contexts where the write lock may already be held.

But this function also grabs the write lock (possibly recursively). A subsequent release of the lock before sleeping will then not actually release it if the caller of mark_inode_dirty() (which in turn calls reiserfs_dirty_inode()) already owns the lock.

A typical case:

    reiserfs_write_end() {
        acquire_write_lock()
        mark_inode_dirty() {
            reiserfs_dirty_inode() {
                reacquire_write_lock() {
                    journal_begin() {
                        do_journal_begin_r() {
                            /*
                             * fail to release, still
                             * one depth of lock
                             */
                            release_write_lock()
                            reiserfs_wait_on_write_block() {
                                wait_event()

The event is usually provided by something which needs the write lock, but the lock hasn't been released.

We use reiserfs_write_lock_once() here to ensure we only grab the write lock at one level.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Frederic Weisbecker

Impact: fix a deadlock

reiserfs_truncate_file() can be called from multiple contexts where the write lock may or may not already be held.

This function also acquires the write lock (possibly recursively). Subsequent releases before sleeping will not actually release the lock, because we may be more than one lock depth deep.

A typical case is:

    reiserfs_file_release {
        acquire_the_lock()
        reiserfs_truncate_file()
            reacquire_the_lock()
            journal_begin() {
                do_journal_begin_r() {
                    reiserfs_wait_on_write_block() {
                        /*
                         * Not released because still one
                         * depth owned
                         */
                        release_lock()
                        wait_for_event()

At this stage the event never happens, because whoever provides it needs the write lock.

We use reiserfs_write_lock_once() here to ensure that we don't acquire the write lock recursively.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Frederic Weisbecker

Sometimes we don't want to hold the per-superblock write lock recursively, because we want to be sure it is actually released when we go to sleep.

This patch introduces the necessary tools for that.

reiserfs_write_lock_once() does the same job as reiserfs_write_lock(), except that it won't acquire the lock recursively if the current task already owns it. It also returns the lock_depth from before the call.

reiserfs_write_unlock_once() unlocks only if reiserfs_write_lock_once() returned a depth equal to -1, i.e. only if that call actually took the lock.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Frederic Weisbecker

Impact: fix a deadlock

The j_flush_mutex is acquired safely in journal.c: if we can't take it, we release the reiserfs per-superblock lock and wait a bit.

But there is a remaining place in kupdate_transactions() where j_flush_mutex is still acquired the traditional way. Thus the following scenario (warned about by lockdep) can happen:

    A                                   B

    mutex_lock(&write_lock)             mutex_lock(&write_lock)
    mutex_lock(&j_flush_mutex)          mutex_lock(&j_flush_mutex) //block
    mutex_unlock(&write_lock)
    sleep...
    mutex_lock(&write_lock) //deadlock

Fix this by using reiserfs_mutex_lock_safe() in kupdate_transactions().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
LKML-Reference: <1239660635-12940-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by Frederic Weisbecker

This patch is an attempt to remove the BKL-based locking scheme from reiserfs and is intended. It is partly inspired by an old attempt by Peter Zijlstra:

http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html

The BKL is heavily used in this filesystem to prevent concurrent write accesses to the filesystem.

Reiserfs makes deep use of the specific properties of the BKL:

- It can be acquired recursively by the same task
- It is released on schedule() calls and reacquired when schedule() returns

The two properties above are a roadmap for reiserfs write locking, so it's very hard to simply replace the BKL with a common mutex.

- We need recursion-capable locking unless we want to restructure several blocks of the code.
- We need to identify the sites where the BKL was implicitly relaxed (schedule, wait, sync, etc.) so that we can in turn release and reacquire our new lock explicitly. Such implicit releases of the lock are often required to let other resource producers/consumers do their job, or we can suffer unexpected starvation or deadlocks.

So the new lock that replaces the BKL here is a per-superblock mutex with a specific property: it can be acquired recursively by the same task, like the BKL. For this purpose, we add a lock owner and a lock depth field to the superblock information structure.

The first axis of this patch is to turn the reiserfs_write_(un)lock() functions into wrappers that manage this mutex. Also, some explicit calls to lock_kernel() have been converted to reiserfs_write_lock() helpers.

The second axis is to find the important blocking sites (schedule...(), wait_on_buffer(), sync_dirty_buffer(), etc.) and then apply an explicit release of the write lock at these locations before blocking. Then we can safely wait for those who can give us resources or those who need some. Typically this is a fight between the current writer, the reiserfs workqueue (aka the async committer) and the pdflush threads.

The third axis is a consequence of the second. The write lock is usually at the top of a lock dependency chain which can include the journal lock, the flush lock or the commit lock. So it's dangerous to release and try to reacquire the write lock while we still hold other locks.

This is fine with the BKL:

    T1                              T2

    lock_kernel()
        mutex_lock(A)
        unlock_kernel()
        // do something
                                    lock_kernel()
                                        mutex_lock(A) -> already locked by T1
                                        schedule() (and then unlock_kernel())
        lock_kernel()
        mutex_unlock(A)
        ....

This is not fine with a mutex:

    T1                              T2

    mutex_lock(write)
        mutex_lock(A)
        mutex_unlock(write)
        // do something
                                    mutex_lock(write)
                                        mutex_lock(A) -> already locked by T1
                                        schedule()

        mutex_lock(write) -> already locked by T2
        deadlock

The solution in this patch is to provide a helper which releases the write lock and sleeps a bit if we can't lock a mutex that depends on it. It's another simulation of the BKL behaviour.

The last axis is to locate the fs callbacks that are called with the BKL held, according to Documentation/filesystems/Locking.

Those are:

- reiserfs_remount
- reiserfs_fill_super
- reiserfs_put_super

Reiserfs didn't need to lock explicitly because of the context of these callbacks. But now we must take care of that with the new locking.

After this patch, reiserfs suffers from a slight performance regression (for now). On UP, a high-volume write with dd reports an average of 27 MB/s instead of 30 MB/s without the patch applied.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Bron Gondwana <brong@fastmail.fm>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
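The core of the recursive per-superblock lock (a mutex plus owner and depth fields) can be sketched in userspace C with POSIX threads and C11 atomics. This is a hypothetical model, not the reiserfs implementation: `struct sb_lock`, `sb_write_lock()`, `sb_write_unlock()` and the `CURRENT` task tag are illustrative names, and `abort()` stands in for a kernel `BUG()`. The depth starts at -1 when the lock is free and is only modified by the owning task.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical task identity: the address of a per-thread byte.
 * Only the owning thread ever stores its own address in `owner`. */
static _Thread_local char task_tag;
#define CURRENT ((void *)&task_tag)

/* Hypothetical per-superblock lock: a mutex plus owner and depth fields. */
struct sb_lock {
    pthread_mutex_t mutex;
    _Atomic(void *) owner;  /* task holding the mutex, or NULL */
    int depth;              /* -1 when free; only the owner modifies it */
};

#define SB_LOCK_INIT { PTHREAD_MUTEX_INITIALIZER, NULL, -1 }

/* Take the lock, nesting one level if the calling task already owns it. */
static void sb_write_lock(struct sb_lock *l)
{
    if (atomic_load(&l->owner) != CURRENT) {
        pthread_mutex_lock(&l->mutex);
        atomic_store(&l->owner, CURRENT);
    }
    l->depth++;
}

/* Drop one nesting level; release the mutex when the outermost level exits. */
static void sb_write_unlock(struct sb_lock *l)
{
    if (atomic_load(&l->owner) != CURRENT)
        abort();            /* unlocking without owning the lock is a bug */
    if (--l->depth == -1) {
        atomic_store(&l->owner, NULL);
        pthread_mutex_unlock(&l->mutex);
    }
}
```

A task that already holds the lock can call `sb_write_lock()` again without deadlocking. The flip side, which motivated several of the later fixes in this log, is that a single `sb_write_unlock()` before sleeping leaves the lock held whenever the caller was nested more than one level deep.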
-
- 07 Sep 2009: 1 commit

Committed by Mimi Zohar

- As ima_counts_put() may be called after the inode has been freed, verify that the inode is not NULL before dereferencing it.
- Maintain the IMA file counters in may_open() properly, decrementing any counter increments on subsequent errors.

Reported-by: Ciprian Docan <docan@eden.rutgers.edu>
Reported-by: J.R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: James Morris <jmorris@namei.org>
-
- 06 Sep 2009: 2 commits

Committed by Nicolas Pitre

In ext2_rename(), dir_page is acquired through ext2_dotdot(). It is then released through ext2_set_link(), but only if old_dir != new_dir. Otherwise, the pkmap reference count is never decremented and the page remains pinned forever. Repeat that a couple of times with highmem pages and all pkmap slots get exhausted; every further kmap() call then stalls on the pkmap_map_wait queue, at which point the whole system comes to a halt.

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Oleg Nesterov

Tom Horsley reports that his debugger hangs when it tries to read /proc/pid_of_tracee/maps. This has happened since commit 04b836cbf19e885f8366bccb2e4b0474346c02d ("mm_for_maps: take ->cred_guard_mutex to fix the race with exec") in 2.6.31.

But the root of the problem lies in the fact that the do_execve() path calls tracehook_report_exec(), which can stop if the tracer sets PT_TRACE_EXEC.

The tracee must not sleep in TASK_TRACED while holding this mutex. Even if we remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(), another task doing PTRACE_ATTACH should not hang until it is killed or the tracee resumes.

With this patch, do_execve() does not use ->cred_guard_mutex directly and we do not hold it throughout. Instead:

- introduce the prepare_bprm_creds() helper, which locks the mutex and calls prepare_exec_creds() to initialize bprm->cred.

- install_exec_creds() drops the mutex after commit_creds(), and thus before tracehook_report_exec()->ptrace_stop().

  Or, if exec fails, free_bprm() drops this mutex when bprm->cred != NULL, which indicates that install_exec_creds() was not called.

Reported-by: Tom Horsley <tom.horsley@att.net>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 Sep 2009: 1 commit

Committed by Sunil Mushran

Bug introduced by mainline commit e7432675.

The bug causes ocfs2_write_begin_nolock() to oops when len=0.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
-
- 03 Sep 2009: 1 commit

Committed by Massimo Cirillo

The function jffs2_nor_wbuf_flash_setup() doesn't allocate the verify buffer if CONFIG_JFFS2_FS_WBUF_VERIFY is defined, causing a kernel panic when that macro is enabled and the verify function is called. Similarly, jffs2_nor_wbuf_flash_cleanup() must free the buffer if CONFIG_JFFS2_FS_WBUF_VERIFY is enabled.

The following patch fixes the problem. It applies to the 2.6.30 kernel.

Signed-off-by: Massimo Cirillo <maxcir@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org
-
- 02 Sep 2009: 1 commit

Committed by Christoph Hellwig

Fix a small typo in the compat ioctl handler that caused the swapext compat handler never to be called.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Tested-by: Torsten Kaiser <just.for.lkml@googlemail.com>
Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
Reviewed-by: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Felix Blyakher <felixb@sgi.com>
-
- 01 Sep 2009: 1 commit

Committed by Ian Kent

In the recent change by Al Viro that converted various subsystems to use "struct path", one case was missed in the autofs4 module, which causes mounts to no longer expire.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 31 8月, 2009 1 次提交
-
-
由 Ryusuke Konishi 提交于
This fixes the following preempt count underflow reported by users under the title "[NILFS users] segctord problem" (Message-ID: <949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID: <debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>):
WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0()
Hardware name: HP Compaq 6530b (KR980UT#ABC)
Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod
Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7
Call Trace:
 [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0
 [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0
 [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20
 [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0
 [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2]
 [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2]
 [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2]
 [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2]
 [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2]
 [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40
 [<ffffffff803c06e0>] ? __up_write+0xe0/0x150
 [<ffffffff80262959>] ? up_write+0x9/0x10
 [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2]
 [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2]
 [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2]
 [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2]
 [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2]
 [<ffffffff80252633>] ? add_timer+0x13/0x20
 [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90
 [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40
 [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
 [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
 [<ffffffff8025e556>] kthread+0x56/0x90
 [<ffffffff8020cdea>] child_rip+0xa/0x20
 [<ffffffff8025e500>] ? kthread+0x0/0x90
 [<ffffffff8020cde0>] ? child_rip+0x0/0x20
This problem was caused by a missing radix_tree_preload() call in the retry path of the nilfs_btnode_prepare_change_key() function.
Reported-by: Eric A <eric225125@yahoo.com>
Reported-by: Jerome Poulin <jeromepoulin@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Jerome Poulin <jeromepoulin@gmail.com>
Cc: stable@kernel.org
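The failure mode can be illustrated with a small user-space model. This is a hypothetical sketch, not the nilfs2 code: `preload_depth` stands in for the preempt count that radix_tree_preload() raises and radix_tree_preload_end() lowers, and `model_preload`, `model_preload_end` and `run_change_key` are invented names. If the retry path drops the preload and jumps back without preloading again, the final end call has nothing left to release and the count underflows.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: preload_depth plays the role of the preempt count. */
static int preload_depth;
static int underflows;

static void model_preload(void)
{
    preload_depth++;            /* radix_tree_preload() returns with preemption disabled */
}

static void model_preload_end(void)
{
    if (preload_depth == 0) {   /* the kernel WARNs in sub_preempt_count() here */
        underflows++;
        return;
    }
    preload_depth--;            /* radix_tree_preload_end() re-enables preemption */
}

/* Simulate a key change that collides `collisions` times before succeeding.
 * Returns the number of underflows observed. */
static int run_change_key(bool preload_on_retry, int collisions)
{
    preload_depth = 0;
    underflows = 0;

    model_preload();
retry:
    if (collisions > 0) {       /* e.g. the insert found an existing entry */
        collisions--;
        model_preload_end();    /* drop the preload before retrying */
        if (preload_on_retry)
            model_preload();    /* the fix: preload again on the retry path */
        goto retry;
    }
    model_preload_end();
    assert(preload_depth >= 0);
    return underflows;
}
```

With `preload_on_retry` false, a single collision produces one underflow; with it true, the preload and end calls stay balanced no matter how many retries occur.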
-
- 29 Aug 2009: 1 commit
-
-
Committed by Eric Paris
Separating the addition and update of marks in inotify introduced a regression: inotify never receives events, because the inotify group mask is always 0. This mask should be updated any time a new mark is added.
Signed-off-by: Eric Paris <eparis@redhat.com>
-
- 28 Aug 2009: 4 commits
-
-
Committed by Eric Paris
Commit 0db501bd introduced a regression: it now sends a nul terminator, but the length accounting used when checking for space and when reporting to userspace did not take the terminator into account. This corrects all of the rounding logic.
Signed-off-by: Eric Paris <eparis@redhat.com>
-
Committed by Brian Rogers
When an event has no pathname, there is no need to pad it with a null byte and thereby generate an inotify_event-sized block of zeros. This fixes a regression introduced by commit 0db501bd, where my system would not finish booting because some process was confused by this padding.
Signed-off-by: Brian Rogers <brian@xyzw.org>
Signed-off-by: Eric Paris <eparis@redhat.com>
-
Committed by Tao Ma
In commit a5a0a630, when ocfs2_dentry_attach_lock fails, we call an extra iput and reset dentry->d_fsdata to NULL. This resolved one bug, but the fix is incomplete: the dentry is still there. When we want to use it again, ocfs2_dentry_revalidate does not catch it and returns true, which makes a later ocfs2_dentry_lock panic. One such bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162. The resolution is to add a check for dentry->d_fsdata in the revalidate process and return false if dentry->d_fsdata is NULL, so that ocfs2_lookup is called again.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
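The shape of the revalidate check can be sketched in user space. Everything here is hypothetical (`struct demo_dentry`, `demo_revalidate`, `demo_lookup`, `demo_take_lock`); it only mirrors the idea that a dentry whose `d_fsdata` is NULL must fail revalidation, so that a fresh lookup re-attaches the lock before anything dereferences it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real code uses struct dentry and the ocfs2 dentry lock. */
struct demo_dentry_lock {
    int refs;
};

struct demo_dentry {
    struct demo_dentry_lock *d_fsdata;   /* NULL if attaching the lock failed */
};

static struct demo_dentry_lock demo_lock_storage;

/* Revalidate: a dentry without its lock attached is reported as invalid (0),
 * so the caller performs a fresh lookup instead of reusing it. */
static int demo_revalidate(const struct demo_dentry *d)
{
    if (d->d_fsdata == NULL)
        return 0;
    return 1;
}

/* Stand-in for a lookup that (re)attaches the dentry lock. */
static void demo_lookup(struct demo_dentry *d)
{
    d->d_fsdata = &demo_lock_storage;
}

/* Taking the dentry lock dereferences d_fsdata, so it must be attached first. */
static int demo_take_lock(struct demo_dentry *d)
{
    if (!demo_revalidate(d))
        demo_lookup(d);
    assert(d->d_fsdata != NULL);
    return ++d->d_fsdata->refs;
}
```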
-
Committed by David Howells
kAFS crashes when asked to read a symbolic link, because page_getlink() passes a NULL file pointer to read_mapping_page(), but afs_readpage() expects a file pointer from which to extract a key. Modify afs_readpage() to request the appropriate key from the calling process's keyrings if no file struct with a key attached is supplied.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
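The key-selection fallback can be sketched as below. This is a hypothetical user-space model, not the kAFS code: `struct demo_key`, `struct demo_file`, `demo_request_key` and `demo_readpage_key` are invented names, and `demo_request_key` merely stands in for asking the calling process's keyrings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct key and struct file. */
struct demo_key {
    const char *description;
};

struct demo_file {
    struct demo_key *private_data;      /* key attached when the file was opened */
};

static struct demo_key demo_process_key = { "hypothetical-cell-key" };

/* Stand-in for requesting a key from the calling process's keyrings. */
static struct demo_key *demo_request_key(void)
{
    return &demo_process_key;
}

/* Pick the key for a page read: use the file's key when one is supplied,
 * otherwise (e.g. symlink reads with a NULL file) ask the process keyrings. */
static struct demo_key *demo_readpage_key(const struct demo_file *file)
{
    struct demo_key *key;

    if (file != NULL && file->private_data != NULL)
        key = file->private_data;
    else
        key = demo_request_key();
    assert(key != NULL);
    return key;
}
```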
-
- 27 Aug 2009: 2 commits
-
-
Committed by Eric W. Biederman
Before the rewrite, copy_event_to_user always wrote a terminating '\0' byte to user space after the filename. Since the rewrite, that terminating byte is skipped if the filename length is exactly a multiple of event_size. Ouch! So add one byte to name_size before we round up, and use clear_user to zero the user-space buffer (like /dev/zero does) instead of copying the strange nul_inotify_event. I can't quite convince myself that len_to_zero will never exceed 16, and even if it doesn't, clear_user should be more efficient and a more accurate reflection of what the code is trying to do.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
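The length rule can be sketched as follows, assuming the 16-byte `struct inotify_event` header found on common ABIs (checked with `static_assert`). `struct demo_inotify_event`, `DEMO_ROUNDUP`, `demo_padded_name_len` and `demo_event_bytes` are hypothetical: the helpers reserve one extra byte for the '\0' before rounding up, as this commit describes, and reserve nothing when there is no name, as in Brian Rogers' fix listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the fixed-size header of struct inotify_event (assumed layout). */
struct demo_inotify_event {
    int32_t wd;
    uint32_t mask;
    uint32_t cookie;
    uint32_t len;
    char name[];
};

static_assert(sizeof(struct demo_inotify_event) == 16, "assumed 16-byte header");

#define DEMO_ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* Bytes reserved for the name after the header: nothing when there is no name,
 * otherwise the name plus its terminating '\0', rounded up to a header-sized block. */
static size_t demo_padded_name_len(size_t name_len)
{
    const size_t event_size = sizeof(struct demo_inotify_event);

    if (name_len == 0)
        return 0;
    return DEMO_ROUNDUP(name_len + 1, event_size);
}

/* Total bytes one event occupies in the read() buffer (used for the space check). */
static size_t demo_event_bytes(size_t name_len)
{
    return sizeof(struct demo_inotify_event) + demo_padded_name_len(name_len);
}
```

A 16-byte name now reserves 32 bytes instead of 16, so a terminating '\0' always fits; without the `+ 1`, a name whose length is an exact multiple of 16 gets no terminator.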
-
Committed by Eric Paris
There are races around the idr storage of inotify watches. A watch could be found in the idr by sys_inotify_rm_watch(), but then be removed from the idr before that code performs its own removal. Move the locking and the refcounting so that these steps happen atomically.
Signed-off-by: Eric Paris <eparis@redhat.com>
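The idea of making lookup, removal and the reference drop one atomic step can be sketched with a single mutex. This is a hypothetical user-space model, not the inotify code: a fixed array stands in for the idr, and `struct demo_watch`, `demo_add_watch` and `demo_rm_watch` are invented names. It uses POSIX threads (link with `-pthread` on older toolchains).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_WD 16

/* Hypothetical stand-ins for an inotify watch and the wd -> watch idr. */
struct demo_watch {
    int wd;
    int refcnt;                 /* one reference is held by the idr */
};

static struct demo_watch *demo_idr[DEMO_MAX_WD];
static pthread_mutex_t demo_idr_lock = PTHREAD_MUTEX_INITIALIZER;

static int demo_add_watch(struct demo_watch *w)
{
    int added = 0;

    if (w->wd < 0 || w->wd >= DEMO_MAX_WD)
        return 0;
    pthread_mutex_lock(&demo_idr_lock);
    if (demo_idr[w->wd] == NULL) {
        demo_idr[w->wd] = w;
        w->refcnt++;            /* the idr now holds a reference */
        added = 1;
    }
    pthread_mutex_unlock(&demo_idr_lock);
    return added;
}

/* Look up and remove under one lock: if two callers race on the same wd,
 * only one of them can observe the entry and drop the idr's reference. */
static int demo_rm_watch(int wd)
{
    int removed = 0;

    if (wd < 0 || wd >= DEMO_MAX_WD)
        return 0;
    pthread_mutex_lock(&demo_idr_lock);
    struct demo_watch *w = demo_idr[wd];
    if (w != NULL) {
        demo_idr[wd] = NULL;
        assert(w->refcnt > 0);
        w->refcnt--;            /* drop the idr's reference inside the lock */
        removed = 1;
    }
    pthread_mutex_unlock(&demo_idr_lock);
    return removed;
}
```

A second removal of the same wd finds nothing and returns 0 instead of dropping the reference twice.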
-