提交 · ac78a07893d24d95ff5f39d0433c25210f224f07 · OpenHarmony / kernel_linux

15 10月, 2009 1 次提交

kill-the-bkl/reiserfs: always lock the ioctl path · ac78a078

由 Frederic Weisbecker 提交于 10月 14, 2009

Reiserfs uses the ioctl callback for its file operations, which means
that its ioctl path is still locked by the bkl, this was synchronizing
with the rest of the filsystem operations. We have changed that by
locking it with the new reiserfs lock but we do that only from the
compat_ioctl callback.

Fix that by locking reiserfs_ioctl() everytime.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>

ac78a078

05 10月, 2009 1 次提交

kill-the-bkl/reiserfs: fix reiserfs lock to cpu_add_remove_lock dependency · 48f6ba5e

由 Frederic Weisbecker 提交于 10月 05, 2009

While creating the reiserfs workqueue during the journal
initialization, we are holding the reiserfs lock, but
create_workqueue() also holds the cpu_add_remove_lock, creating
then the following dependency:

- reiserfs lock -> cpu_add_remove_lock

But we also have the following existing dependencies:

- mm->mmap_sem -> reiserfs lock
- cpu_add_remove_lock -> cpu_hotplug.lock -> slub_lock -> sysfs_mutex

The merged dependency chain then becomes:

- mm->mmap_sem -> reiserfs lock -> cpu_add_remove_lock ->
	cpu_hotplug.lock -> slub_lock -> sysfs_mutex

But when we fill a dir entry in sysfs_readir(), we are holding the
sysfs_mutex and we also might fault while copying the directory entry
to the user, leading to the following dependency:

- sysfs_mutex -> mm->mmap_sem

The end result is then a lock inversion between sysfs_mutex and
mm->mmap_sem, as reported in the following lockdep warning:

[ INFO: possible circular locking dependency detected ]
2.6.31-07095-g25a3912 #4
-------------------------------------------------------
udevadm/790 is trying to acquire lock:
 (&mm->mmap_sem){++++++}, at: [<c1098942>] might_fault+0x72/0xc0

but task is already holding lock:
 (sysfs_mutex){+.+.+.}, at: [<c110813c>] sysfs_readdir+0x7c/0x260

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (sysfs_mutex){+.+.+.}:
      [...]

-> #4 (slub_lock){+++++.}:
      [...]

-> #3 (cpu_hotplug.lock){+.+.+.}:
      [...]

-> #2 (cpu_add_remove_lock){+.+.+.}:
      [...]

-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
      [...]

-> #0 (&mm->mmap_sem){++++++}:
      [...]

This can be fixed by relaxing the reiserfs lock while creating the
workqueue.
This is fine to relax the lock here, we just keep it around to pass
through reiserfs lock checks and for paranoid reasons.
Reported-by: NAlexander Beregalov <a.beregalov@gmail.com>
Tested-by: NAlexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>

48f6ba5e

17 9月, 2009 1 次提交

kill-the-bkl/reiserfs: Fix induced mm->mmap_sem to sysfs_mutex dependency · 193be0ee

由 Frederic Weisbecker 提交于 9月 17, 2009

Alexander Beregalov reported the following warning:

	=======================================================
	[ INFO: possible circular locking dependency detected ]
	2.6.31-03149-gdcc030a #1
	-------------------------------------------------------
	udevadm/716 is trying to acquire lock:
	 (&mm->mmap_sem){++++++}, at: [<c107249a>] might_fault+0x4a/0xa0

	but task is already holding lock:
	 (sysfs_mutex){+.+.+.}, at: [<c10cb9aa>] sysfs_readdir+0x5a/0x200

	which lock already depends on the new lock.

	the existing dependency chain (in reverse order) is:

	-> #3 (sysfs_mutex){+.+.+.}:
	       [...]

	-> #2 (&bdev->bd_mutex){+.+.+.}:
	       [...]

	-> #1 (&REISERFS_SB(s)->lock){+.+.+.}:
	       [...]

	-> #0 (&mm->mmap_sem){++++++}:
	       [...]

On reiserfs mount path, we take the reiserfs lock and while
initializing the journal, we open the device, taking the
bdev->bd_mutex. Then rescan_partition() may signal the change
to sysfs.

We have then the following dependency:

	reiserfs_lock -> bd_mutex -> sysfs_mutex

Later, while entering reiserfs_readpage() after a pagefault in an
mmaped reiserfs file, we are holding the mm->mmap_sem, and we are going
to take the reiserfs lock too.
We have then the following dependency:

	mm->mmap_sem -> reiserfs_lock

which, expanded with the previous dependency gives us:

	mm->mmap_sem -> reiserfs_lock -> bd_mutex -> sysfs_mutex

Now while entering the sysfs readdir path, we are holding the
sysfs_mutex. And when we copy a directory entry to the user buffer, we
might fault and then take the mm->mmap_sem lock. Which leads to the
circular locking dependency reported.

We can fix that by relaxing the reiserfs lock during the call to
journal_init_dev(), which is the place where we open the mounted
device.

This is fine to relax the lock here because we are in the begining of
the reiserfs mount path and there is nothing to protect at this time,
the journal is not intialized.
We just keep this lock around for paranoid reasons.
Reported-by: NAlexander Beregalov <a.beregalov@gmail.com>
Tested-by: NAlexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>

193be0ee

14 9月, 2009 22 次提交

kill-the-bkl/reiserfs: panic in case of lock imbalance · 80503185

由 Frederic Weisbecker 提交于 8月 25, 2009

Until now, trying to unlock the reiserfs write lock whereas the current
task doesn't hold it lead to a simple warning.
We should actually warn and panic in this case to avoid the user datas
to reach an unstable state.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>

80503185

kill-the-bkl/reiserfs: fix recursive reiserfs write lock in reiserfs_commit_write() · 7e942770

由 Frederic Weisbecker 提交于 8月 25, 2009

reiserfs_commit_write() is always called with the write lock held.
Thus the current calls to reiserfs_write_lock() in this function are
acquiring the lock recursively.
We can safely drop them.

This also solves further assumptions for this lock to be really
released while calling reiserfs_write_unlock().
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>

7e942770

kill-the-bkl/reiserfs: fix recursive reiserfs lock in reiserfs_mkdir() · b10ab4c3

由 Frederic Weisbecker 提交于 8月 25, 2009

reiserfs_mkdir() acquires the reiserfs lock, assuming it has been called
from the dir inodes callbacks, without the lock held.

But it can also be called from other internal sites such as
reiserfs_xattr_init() which already holds the lock. This recursive
locking leads to further wrong assumptions. For example, later calls
to reiserfs_mutex_lock_safe() won't actually unlock the reiserfs lock
the time we acquire a given mutex, creating unexpected lock inversions.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>

b10ab4c3

kill-the-bkl/reiserfs: fix "reiserfs lock" / "inode mutex" lock inversion dependency · ae635c0b

由 Frederic Weisbecker 提交于 8月 25, 2009

reiserfs_xattr_init is called with the reiserfs write lock held, but
if the ".reiserfs_priv" entry is not created, we take the superblock
root directory inode mutex until .reiserfs_priv is created.

This creates a lock dependency inversion against other sites such as
reiserfs_file_release() which takes an inode mutex and the reiserfs
lock after.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>

ae635c0b

kill-the-bkl/reiserfs: move the concurrent tree accesses checks per superblock · 08f14fc8

由 Frederic Weisbecker 提交于 5月 16, 2009

When do_balance() balances the tree, a trick is performed to
provide the ability for other tree writers/readers to check whether
do_balance() is executing concurrently (requires CONFIG_REISERFS_CHECK).

This is done to protect concurrent accesses to the tree. The trick
is the following:

When do_balance is called, a unique global variable called cur_tb
takes a pointer to the current tree to be rebalanced.
Once do_balance finishes its work, cur_tb takes the NULL value.

Then, concurrent tree readers/writers just have to check the value
of cur_tb to ensure do_balance isn't executing concurrently.
If it is, then it proves that schedule() occured on do_balance(),
which then relaxed the bkl that protected the tree.

Now that the bkl has be turned into a mutex, this check is still
fine even though do_balance() becomes preemptible: the write lock
will not be automatically released on schedule(), so the tree is
still protected.

But this is only fine if we have a single reiserfs mountpoint.
Indeed, because the bkl is a global lock, it didn't allowed
concurrent executions between a tree reader/writer in a mount point
and a do_balance() on another tree from another mountpoint.

So assuming all these readers/writers weren't supposed to be
reentrant, the current check now sometimes detect false positives with
the current per-superblock mutex which allows this reentrancy.

This patch keeps the concurrent tree accesses check but moves it
per superblock, so that only trees from a same mount point are
checked to be not accessed concurrently.

[ Impact: fix spurious panic while running several reiserfs mount-points ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

08f14fc8

kill-the-bkl/reiserfs: acquire the inode mutex safely · c72e0575

由 Frederic Weisbecker 提交于 5月 16, 2009

While searching a pathname, an inode mutex can be acquired
in do_lookup() which calls reiserfs_lookup() which in turn
acquires the write lock.

On the other side reiserfs_fill_super() can acquire the write_lock
and then call reiserfs_lookup_privroot() which can acquire an
inode mutex (the root of the mount point).

So we theoretically risk an AB - BA lock inversion that could lead
to a deadlock.

As for other lock dependencies found since the bkl to mutex
conversion, the fix is to use reiserfs_mutex_lock_safe() which
drops the lock dependency to the write lock.

[ Impact: fix a possible deadlock with reiserfs ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

c72e0575

kill-the-bkl/reiserfs: unlock only when needed in search_by_key · 2ac62695

由 Frederic Weisbecker 提交于 5月 14, 2009

search_by_key() is the site which most requires the lock.
This is mostly because it is a very central function and also
because it releases/reaqcuires the write lock at least once each
time it is called.

Such release/reacquire creates a lot of contention in this place and
also opens more the window which let another thread changing the tree.
When it happens, the current path searching over the tree must be
retried from the beggining (the root) which is a wasteful and
time consuming recovery.

This patch factorizes two release/reacquire sequences:

- reading leaf nodes blocks
- reading current block

The latter immediately follows the former.

The whole sequence is safe as a single unlocked section because
we check just after if the tree has changed during these operations.

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

2ac62695

kill-the-bkl/reiserfs: use mutex_lock in reiserfs_mutex_lock_safe · c63e3c0b

由 Frederic Weisbecker 提交于 5月 08, 2009

reiserfs_mutex_lock_safe() is a hack to avoid any dependency between
an internal reiserfs mutex and the write lock, it has been proposed
to follow the old bkl logic.

The code does the following:

while (!mutex_trylock(m)) {
	reiserfs_write_unlock(s);
	schedule();
	reiserfs_write_lock(s);
}

It then imitate the implicit behaviour of the lock when it was
a Bkl and hadn't such dependency:

mutex_lock(m) {
	if (fastpath)
		let's go
	else {
		wait_for_mutex() {
			schedule() {
				unlock_kernel()
				reacquire_lock_kernel()
			}
		}
	}
}

The problem is that by using such explicit schedule(), we don't
benefit of the adaptive mutex spinning on owner.

The logic in use now is:

reiserfs_write_unlock(s);
mutex_lock(m); // -> possible adaptive spinning
reiserfs_write_lock(s);

[ Impact: restore the use of adaptive spinning mutexes in reiserfs ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

c63e3c0b

kill-the-bkl/reiserfs: factorize the locking in reiserfs_write_end() · d6f5b0aa

由 Frederic Weisbecker 提交于 5月 08, 2009

reiserfs_write_end() is a hot path in reiserfs.
We have two wasteful write lock lock/release inside that can be gathered
without changing the code logic.

This patch factorizes them out in a single protected section, reducing the
number of contentions inside.

[ Impact: reduce lock contention in a reiserfs hotpath ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

d6f5b0aa

kill-the-bkl/reiserfs: reduce number of contentions in search_by_key() · 09eb47a7

由 Frederic Weisbecker 提交于 5月 08, 2009

search_by_key() is a central function in reiserfs which searches
the patch in the fs tree from the root to a node given its key.

It is the function that is most requesting the write lock
because it's a path very often used.

Also we forget to release the lock while reading the next tree node,
making us holding the lock in a wasteful way.

Then we release the lock while reading the current node and its childs,
all-in-one. It should be safe because we have a reference to these
blocks and even if we read a block that will be concurrently changed,
we have an fs_changed check later that will make us retry the path from
the root.

[ Impact: release the write lock while unused in a hot path ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

09eb47a7

kill-the-bkl/reiserfs: don't hold the write recursively in reiserfs_lookup() · b1c839bb

由 Frederic Weisbecker 提交于 5月 08, 2009

The write lock can be acquired recursively in reiserfs_lookup(). But we may
want to *really* release the lock before possible rescheduling from a
reiserfs_lookup() callee.

Hence we want to only acquire the lock once (ie: not recursively).

[ Impact: prevent from possible false unreleased write lock on sleeping ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

b1c839bb

kill-the-bkl/reiserfs: lock only once on reiserfs_get_block() · 26931309

由 Frederic Weisbecker 提交于 5月 07, 2009

reiserfs_get_block() is one of these sites where the write lock might
be acquired recursively.

It's a particular problem because this function is called very often.
It's a hot spot which needs to reschedule() periodically while converting
direct items to indirect ones because it can take some time.

Then if we are applying the write lock release/reacquire pattern on
schedule() here, it may not produce the desired effect since we may have
locked in more than one depth.

The solution is to use reiserfs_write_lock_once() which won't try
to reacquire the lock recursively. Then the lock will be *really*
released before schedule().

Also, we only release the lock if TIF_NEED_RESCHED is set to not
create wasteful numerous contentions.

[ Impact: fix a too long holded lock case in reiserfs_get_block() ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

26931309

kill-the-BKL/reiserfs: release the write lock on flush_commit_list() · 6e3647ac

由 Frederic Weisbecker 提交于 5月 01, 2009

flush_commit_list() uses ll_rw_block() to commit the pending log blocks.
ll_rw_block() might sleep, and the bkl was released at this point. Then
we can also relax the write lock at this point.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

6e3647ac

kill-the-BKL/reiserfs: release the write lock inside reiserfs_read_bitmap_block() · 4c5eface

由 Frederic Weisbecker 提交于 5月 01, 2009

reiserfs_read_bitmap_block() uses sb_bread() to read the bitmap block. This
helper might sleep.

Then, when the bkl was used, it was released at this point. We can then
relax the write lock too here.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

4c5eface

kill-the-BKL/reiserfs: release the write lock inside get_neighbors() · 148d3504

由 Frederic Weisbecker 提交于 5月 01, 2009

get_neighbors() is used to get the left and/or right blocks
against a given one in order to balance a tree.

sb_bread() is used to read the buffer of these neighors blocks and
while it waits for this operation, it might sleep.

The bkl was released at this point, and then we can also release
the write lock before calling sb_bread().

This is safe because if the filesystem is changed after this
lock release, the function returns REPEAT_SEARCH (aka SCHEDULE_OCCURRED
in the function header comments) in order to repeat the neighbhor
research.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

148d3504

kill-the-BKL/reiserfs: release write lock while rescheduling on prepare_for_delete_or_cut() · 5e69e3a4

由 Frederic Weisbecker 提交于 4月 30, 2009

prepare_for_delete_or_cut() can process several types of items, including
indirect items, ie: items which contain no file data but pointers to
unformatted nodes scattering the datas of a file.

In this case it has to zero out these pointers to block numbers of
unformatted nodes and release the bitmap from these block numbers.

It can take some time, so a rescheduling() is performed between each
block processed. We can safely release the write lock while
rescheduling(), like the bkl did, because the code checks just after
if the item has moved after sleeping.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

5e69e3a4

kill-the-BKL/reiserfs: release the write lock before rescheduling on do_journal_end() · e6950a4d

由 Frederic Weisbecker 提交于 4月 30, 2009

When do_journal_end() copies data to the journal blocks buffers in memory,
it reschedules if needed between each block copied and dirtyfied.

We can also release the write lock at this rescheduling stage,
like did the bkl implicitly.

[ Impact: release the reiserfs write lock when it is not needed ]

Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

e6950a4d

kill-the-BKL/reiserfs: only acquire the write lock once in reiserfs_dirty_inode · dc8f6d89

由 Frederic Weisbecker 提交于 4月 14, 2009

Impact: fix a deadlock

reiserfs_dirty_inode() is the super_operations::dirty_inode() callback
of reiserfs. It can be called from different contexts where the write
lock can be already held.

But this function also grab the write lock (possibly recursively).
Subsequent release of the lock before sleep will actually not release
the lock if the caller of mark_inode_dirty() (which in turn calls
reiserfs_dirty_inode()) already owns the lock.

A typical case:

reiserfs_write_end() {
	acquire_write_lock()
	mark_inode_dirty() {
		reiserfs_dirty_inode() {
			reacquire_write_lock() {
				journal_begin() {
					do_journal_begin_r() {
						/*
						 * fail to release, still
						 * one depth of lock
						 */
						release_write_lock()
						reiserfs_wait_on_write_block() {
							wait_event()

The event is usually provided by something which needs the write lock but
it hasn't been released.

We use reiserfs_write_lock_once() here to ensure we only grab the
write lock in one level.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

dc8f6d89

kill-the-BKL/reiserfs: lock only once in reiserfs_truncate_file · 22c963ad

由 Frederic Weisbecker 提交于 4月 14, 2009

Impact: fix a deadlock

reiserfs_truncate_file() can be called from multiple context where
the write lock can be already hold or not.

This function also acquire (possibly recursively) the write
lock. Subsequent releases before sleeping will not actually release
the lock because we may be in more than one lock depth degree.

A typical case is:

reiserfs_file_release {
	acquire_the_lock()
	reiserfs_truncate_file()
		reacquire_the_lock()
		journal_begin() {
			do_journal_begin_r() {
				reiserfs_wait_on_write_block() {
					/*
					 * Not released because still one
					 * depth owned
					 */
					release_lock()
					wait_for_event()

At this stage the event never happen because the one which provides
it needs the write lock.

We use reiserfs_write_lock_once() here to ensure that we don't acquire the
write lock recursively.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

22c963ad

kill-the-BKL/reiserfs: provide a tool to lock only once the write lock · daf88c89

由 Frederic Weisbecker 提交于 4月 14, 2009

Sometimes we don't want to recursively hold the per superblock write
lock because we want to be sure it is actually released when we come
to sleep.

This patch introduces the necessary tools for that.

reiserfs_write_lock_once() does the same job than reiserfs_write_lock()
except that it won't try to acquire recursively the lock if the current
task already owns it. Also the lock_depth before the call of this function
is returned.

reiserfs_write_unlock_once() unlock only if reiserfs_write_lock_once()
returned a depth equal to -1, ie: only if it actually locked.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Chris Mason <chris.mason@oracle.com>
LKML-Reference: <1239680065-25013-2-git-send-email-fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

daf88c89

reiserfs, kill-the-BKL: fix unsafe j_flush_mutex lock · a412f9ef

由 Frederic Weisbecker 提交于 4月 14, 2009

Impact: fix a deadlock

The j_flush_mutex is acquired safely in journal.c:
if we can't take it, we free the reiserfs per superblock lock
and wait a bit.

But we have a remaining place in kupdate_transactions() where
j_flush_mutex is still acquired traditionnaly. Thus the following
scenario (warned by lockdep) can happen:

A						B

mutex_lock(&write_lock)			mutex_lock(&write_lock)
	mutex_lock(&j_flush_mutex)	mutex_lock(&j_flush_mutex) //block
	mutex_unlock(&write_lock)
	sleep...
	mutex_lock(&write_lock) //deadlock

Fix this by using reiserfs_mutex_lock_safe() in kupdate_transactions().
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@texware.it>
Cc: Jeff Mahoney <jeffm@suse.com>
LKML-Reference: <1239660635-12940-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a412f9ef

reiserfs: kill-the-BKL · 8ebc4232

由 Frederic Weisbecker 提交于 4月 07, 2009

This patch is an attempt to remove the Bkl based locking scheme from
reiserfs and is intended.

It is a bit inspired from an old attempt by Peter Zijlstra:

   http://lkml.indiana.edu/hypermail/linux/kernel/0704.2/2174.html

The bkl is heavily used in this filesystem to prevent from
concurrent write accesses on the filesystem.

Reiserfs makes a deep use of the specific properties of the Bkl:

- It can be acqquired recursively by a same task
- It is released on the schedule() calls and reacquired when schedule() returns

The two properties above are a roadmap for the reiserfs write locking so it's
very hard to simply replace it with a common mutex.

- We need a recursive-able locking unless we want to restructure several blocks
  of the code.
- We need to identify the sites where the bkl was implictly relaxed
  (schedule, wait, sync, etc...) so that we can in turn release and
  reacquire our new lock explicitly.
  Such implicit releases of the lock are often required to let other
  resources producer/consumer do their job or we can suffer unexpected
  starvations or deadlocks.

So the new lock that replaces the bkl here is a per superblock mutex with a
specific property: it can be acquired recursively by a same task, like the
bkl.

For such purpose, we integrate a lock owner and a lock depth field on the
superblock information structure.

The first axis on this patch is to turn reiserfs_write_(un)lock() function
into a wrapper to manage this mutex. Also some explicit calls to
lock_kernel() have been converted to reiserfs_write_lock() helpers.

The second axis is to find the important blocking sites (schedule...(),
wait_on_buffer(), sync_dirty_buffer(), etc...) and then apply an explicit
release of the write lock on these locations before blocking. Then we can
safely wait for those who can give us resources or those who need some.
Typically this is a fight between the current writer, the reiserfs workqueue
(aka the async commiter) and the pdflush threads.

The third axis is a consequence of the second. The write lock is usually
on top of a lock dependency chain which can include the journal lock, the
flush lock or the commit lock. So it's dangerous to release and trying to
reacquire the write lock while we still hold other locks.

This is fine with the bkl:

      T1                       T2

lock_kernel()
    mutex_lock(A)
    unlock_kernel()
    // do something
                            lock_kernel()
                                mutex_lock(A) -> already locked by T1
                                schedule() (and then unlock_kernel())
    lock_kernel()
    mutex_unlock(A)
    ....

This is not fine with a mutex:

      T1                       T2

mutex_lock(write)
    mutex_lock(A)
    mutex_unlock(write)
    // do something
                           mutex_lock(write)
                              mutex_lock(A) -> already locked by T1
                              schedule()

    mutex_lock(write) -> already locked by T2
    deadlock

The solution in this patch is to provide a helper which releases the write
lock and sleep a bit if we can't lock a mutex that depend on it. It's another
simulation of the bkl behaviour.

The last axis is to locate the fs callbacks that are called with the bkl held,
according to Documentation/filesystem/Locking.

Those are:

- reiserfs_remount
- reiserfs_fill_super
- reiserfs_put_super

Reiserfs didn't need to explicitly lock because of the context of these callbacks.
But now we must take care of that with the new locking.

After this patch, reiserfs suffers from a slight performance regression (for now).
On UP, a high volume write with dd reports an average of 27 MB/s instead
of 30 MB/s without the patch applied.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: NIngo Molnar <mingo@elte.hu>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Bron Gondwana <brong@fastmail.fm>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
LKML-Reference: <1239070789-13354-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8ebc4232

07 9月, 2009 1 次提交

IMA: update ima_counts_put · acd0c935

由 Mimi Zohar 提交于 9月 04, 2009

- As ima_counts_put() may be called after the inode has been freed,
verify that the inode is not NULL, before dereferencing it.

- Maintain the IMA file counters in may_open() properly, decrementing
any counter increments on subsequent errors.
Reported-by: NCiprian Docan <docan@eden.rutgers.edu>
Reported-by: NJ.R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: NMimi Zohar <zohar@us.ibm.com>
Acked-by: Eric Paris <eparis@redhat.com
Signed-off-by: NJames Morris <jmorris@namei.org>

acd0c935

06 9月, 2009 2 次提交

ext2: fix unbalanced kmap()/kunmap() · 9de6886e

由 Nicolas Pitre 提交于 9月 05, 2009

In ext2_rename(), dir_page is acquired through ext2_dotdot().  It is
then released through ext2_set_link() but only if old_dir != new_dir.
Failing that, the pkmap reference count is never decremented and the
page remains pinned forever.  Repeat that a couple times with highmem
pages and all pkmap slots get exhausted, and every further kmap() calls
end up stalling on the pkmap_map_wait queue at which point the whole
system comes to a halt.
Signed-off-by: NNicolas Pitre <nico@marvell.com>
Acked-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9de6886e

exec: do not sleep in TASK_TRACED under ->cred_guard_mutex · a2a8474c

由 Oleg Nesterov 提交于 9月 05, 2009

Tom Horsley reports that his debugger hangs when it tries to read
/proc/pid_of_tracee/maps, this happens since

	"mm_for_maps: take ->cred_guard_mutex to fix the race with exec"
	04b836cbf19e885f8366bccb2e4b0474346c02d

commit in 2.6.31.

But the root of the problem lies in the fact that do_execve() path calls
tracehook_report_exec() which can stop if the tracer sets PT_TRACE_EXEC.

The tracee must not sleep in TASK_TRACED holding this mutex.  Even if we
remove ->cred_guard_mutex from mm_for_maps() and proc_pid_attr_write(),
another task doing PTRACE_ATTACH should not hang until it is killed or the
tracee resumes.

With this patch do_execve() does not use ->cred_guard_mutex directly and
we do not hold it throughout, instead:

	- introduce prepare_bprm_creds() helper, it locks the mutex
	  and calls prepare_exec_creds() to initialize bprm->cred.

	- install_exec_creds() drops the mutex after commit_creds(),
	  and thus before tracehook_report_exec()->ptrace_stop().

	  or, if exec fails,

	  free_bprm() drops this mutex when bprm->cred != NULL which
	  indicates install_exec_creds() was not called.
Reported-by: NTom Horsley <tom.horsley@att.net>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NDavid Howells <dhowells@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a2a8474c

05 9月, 2009 1 次提交

ocfs2: ocfs2_write_begin_nolock() should handle len=0 · 8379e7c4

由 Sunil Mushran 提交于 9月 04, 2009

Bug introduced by mainline commit e7432675
The bug causes ocfs2_write_begin_nolock() to oops when len=0.
Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

8379e7c4

03 9月, 2009 1 次提交

JFFS2: add missing verify buffer allocation/deallocation · bc8cec0d

由 Massimo Cirillo 提交于 8月 27, 2009

The function jffs2_nor_wbuf_flash_setup() doesn't allocate the verify buffer
if CONFIG_JFFS2_FS_WBUF_VERIFY is defined, so causing a kernel panic when
that macro is enabled and the verify function is called. Similarly the
jffs2_nor_wbuf_flash_cleanup() must free the buffer if
CONFIG_JFFS2_FS_WBUF_VERIFY is enabled.
The following patch fixes the problem.
The following patch applies to 2.6.30 kernel.
Signed-off-by: NMassimo Cirillo <maxcir@gmail.com>
Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
Cc: stable@kernel.org

bc8cec0d

02 9月, 2009 1 次提交

xfs: actually enable the swapext compat handler · 3725867d

由 Christoph Hellwig 提交于 9月 01, 2009

Fix a small typo in the compat ioctl handler that cause the swapext
compat handler to never be called.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NTorsten Kaiser <just.for.lkml@googlemail.com>
Tested-by: NTorsten Kaiser <just.for.lkml@googlemail.com>
Reviewed-by: NEric Sandeen <sandeen@sandeen.net>
Reviewed-by: NFelix Blyakher <felixb@sgi.com>
Signed-off-by: NFelix Blyakher <felixb@sgi.com>

3725867d

01 9月, 2009 1 次提交

autofs4 - fix missed case when changing to use struct path · 37d0892c

由 Ian Kent 提交于 9月 01, 2009

In the recent change by Al Viro that changes verious subsystems
to use "struct path" one case was missed in the autofs4 module
which causes mounts to no longer expire.
Signed-off-by: NIan Kent <raven@themaw.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

37d0892c

31 8月, 2009 1 次提交

nilfs2: fix preempt count underflow in nilfs_btnode_prepare_change_key · b1f1b8ce

由 Ryusuke Konishi 提交于 8月 30, 2009

This will fix the following preempt count underflow reported from
users with the title "[NILFS users] segctord problem" (Message-ID:
<949415.6494.qm@web58808.mail.re1.yahoo.com> and Message-ID:
<debc30fc0908270825v747c1734xa59126623cfd5b05@mail.gmail.com>):

 WARNING: at kernel/sched.c:4890 sub_preempt_count+0x95/0xa0()
 Hardware name: HP Compaq 6530b (KR980UT#ABC)
 Modules linked in: bridge stp llc bnep rfcomm l2cap xfs exportfs nilfs2 cowloop loop vboxnetadp vboxnetflt vboxdrv btusb bluetooth uvcvideo videodev v4l1_compat v4l2_compat_ioctl32 arc4 snd_hda_codec_analog ecb iwlagn iwlcore rfkill lib80211 mac80211 snd_hda_intel snd_hda_codec ehci_hcd uhci_hcd usbcore snd_hwdep snd_pcm tg3 cfg80211 psmouse snd_timer joydev libphy ohci1394 snd_page_alloc hp_accel lis3lv02d ieee1394 led_class i915 drm i2c_algo_bit video backlight output i2c_core dm_crypt dm_mod
 Pid: 4197, comm: segctord Not tainted 2.6.30-gentoo-r4-64 #7
 Call Trace:
  [<ffffffff8023fa05>] ? sub_preempt_count+0x95/0xa0
  [<ffffffff802470f8>] warn_slowpath_common+0x78/0xd0
  [<ffffffff8024715f>] warn_slowpath_null+0xf/0x20
  [<ffffffff8023fa05>] sub_preempt_count+0x95/0xa0
  [<ffffffffa04ce4db>] nilfs_btnode_prepare_change_key+0x11b/0x190 [nilfs2]
  [<ffffffffa04d01ad>] nilfs_btree_assign_p+0x19d/0x1e0 [nilfs2]
  [<ffffffffa04d10ad>] nilfs_btree_assign+0xbd/0x130 [nilfs2]
  [<ffffffffa04cead7>] nilfs_bmap_assign+0x47/0x70 [nilfs2]
  [<ffffffffa04d9bc6>] nilfs_segctor_do_construct+0x956/0x20f0 [nilfs2]
  [<ffffffff805ac8e2>] ? _spin_unlock_irqrestore+0x12/0x40
  [<ffffffff803c06e0>] ? __up_write+0xe0/0x150
  [<ffffffff80262959>] ? up_write+0x9/0x10
  [<ffffffffa04ce9f3>] ? nilfs_bmap_test_and_clear_dirty+0x43/0x60 [nilfs2]
  [<ffffffffa04cd627>] ? nilfs_mdt_fetch_dirty+0x27/0x60 [nilfs2]
  [<ffffffffa04db5fc>] nilfs_segctor_construct+0x8c/0xd0 [nilfs2]
  [<ffffffffa04dc3dc>] nilfs_segctor_thread+0x15c/0x3a0 [nilfs2]
  [<ffffffffa04dbe20>] ? nilfs_construction_timeout+0x0/0x10 [nilfs2]
  [<ffffffff80252633>] ? add_timer+0x13/0x20
  [<ffffffff802370da>] ? __wake_up_common+0x5a/0x90
  [<ffffffff8025e960>] ? autoremove_wake_function+0x0/0x40
  [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
  [<ffffffffa04dc280>] ? nilfs_segctor_thread+0x0/0x3a0 [nilfs2]
  [<ffffffff8025e556>] kthread+0x56/0x90
  [<ffffffff8020cdea>] child_rip+0xa/0x20
  [<ffffffff8025e500>] ? kthread+0x0/0x90
  [<ffffffff8020cde0>] ? child_rip+0x0/0x20

This problem was caused due to a missing radix_tree_preload() call in
the retry path of nilfs_btnode_prepare_change_key() function.
Reported-by: NEric A <eric225125@yahoo.com>
Reported-by: NJerome Poulin <jeromepoulin@gmail.com>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: NJerome Poulin <jeromepoulin@gmail.com>
Cc: stable@kernel.org

b1f1b8ce

29 8月, 2009 1 次提交

inotify: update the group mask on mark addition · 750a8870

由 Eric Paris 提交于 8月 28, 2009

Seperating the addition and update of marks in inotify resulted in a
regression in that inotify never gets events. The inotify group mask is
always 0. This mask should be updated any time a new mark is added.
Signed-off-by: NEric Paris <eparis@redhat.com>

750a8870

28 8月, 2009 4 次提交

inotify: fix length reporting and size checking · 83cb10f0

由 Eric Paris 提交于 8月 28, 2009

0db501bd introduced a regresion in that it now sends a nul
terminator but the length accounting when checking for space or
reporting to userspace did not take this into account.  This corrects
all of the rounding logic.
Signed-off-by: NEric Paris <eparis@redhat.com>

83cb10f0

inotify: do not send a block of zeros when no pathname is available · b962e731

由 Brian Rogers 提交于 8月 28, 2009

When an event has no pathname, there's no need to pad it with a null byte and
therefore generate an inotify_event sized block of zeros. This fixes a
regression introduced by commit 0db501bd where
my system wouldn't finish booting because some process was being confused by
this.
Signed-off-by: NBrian Rogers <brian@xyzw.org>
Signed-off-by: NEric Paris <eparis@redhat.com>

b962e731

ocfs2: invalidate dentry if its dentry_lock isn't initialized. · a1b08e75

由 Tao Ma 提交于 8月 27, 2009

In commit a5a0a630, when
ocfs2_attch_dentry_lock fails, we call an extra iput and reset
dentry->d_fsdata to NULL. This resolve a bug, but it isn't
completed and the dentry is still there. When we want to use
it again, ocfs2_dentry_revalidate doesn't catch it and return
true. That make future ocfs2_dentry_lock panic out.
One bug is http://oss.oracle.com/bugzilla/show_bug.cgi?id=1162.

The resolution is to add a check for dentry->d_fsdata in
revalidate process and return false if dentry->d_fsdata is NULL,
so that a new ocfs2_lookup will be called again.
Signed-off-by: NTao Ma <tao.ma@oracle.com>
Signed-off-by: NJoel Becker <joel.becker@oracle.com>

a1b08e75

AFS: Stop readlink() on AFS crashing due to NULL 'file' ptr · 9886e836

由 David Howells 提交于 8月 27, 2009

kAFS crashes when asked to read a symbolic link because page_getlink()
passes a NULL file pointer to read_mapping_page(), but afs_readpage()
expects a file pointer from which to extract a key.

Modify afs_readpage() to request the appropriate key from the calling
process's keyrings if a file struct is not supplied with one attached.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9886e836

27 8月, 2009 2 次提交

inotify: Ensure we alwasy write the terminating NULL. · 0db501bd

由 Eric W. Biederman 提交于 8月 27, 2009

Before the rewrite copy_event_to_user always wrote a terqminating '\0'
byte to user space after the filename.  Since the rewrite that
terminating byte was skipped if your filename is exactly a multiple of
event_size.  Ouch!

So add one byte to name_size before we round up and use clear_user to
set userspace to zero like /dev/zero does instead of copying the
strange nul_inotify_event.  I can't quite convince myself len_to_zero
will never exceed 16 and even if it doesn't clear_user should be more
efficient and a more accurate reflection of what the code is trying to
do.
Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: NEric Paris <eparis@redhat.com>

0db501bd

inotify: fix locking around inotify watching in the idr · dead537d

由 Eric Paris 提交于 8月 24, 2009

The are races around the idr storage of inotify watches. It's possible
that a watch could be found from sys_inotify_rm_watch() in the idr, but it
could be removed from the idr before that code does it's removal. Move the
locking and the refcnt'ing so that these have to happen atomically.
Signed-off-by: NEric Paris <eparis@redhat.com>

dead537d

OpenHarmony / kernel_linux 上一次同步 3 年多

OpenHarmony / kernel_linux
上一次同步 3 年多