1. 09 Feb 2009 (1 commit)
  2. 07 Feb 2009 (5 commits)
    • eCryptfs: Regression in unencrypted filename symlinks · fd9fc842
      Committed by Tyler Hicks
      The addition of filename encryption caused a regression in unencrypted
      filename symlink support.  ecryptfs_copy_filename() is used when dealing
      with unencrypted filenames and it reported that the new, copied filename
      was a character longer than it should have been.
      
      This caused the return value of readlink() to count the NULL byte of the
      symlink target.  Most applications don't care about the extra NULL byte,
      but a version control system (bzr) helped in discovering the bug.
      Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fd9fc842
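
      A minimal, standalone sketch of the off-by-one described above. The
      copy_filename() helper is a hypothetical stand-in for
      ecryptfs_copy_filename(), not the eCryptfs code itself:

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical stand-in: copy the name and report its length.  Reporting
       * len + 1 (the buggy behaviour) makes callers such as readlink() count
       * the terminating NUL byte in the symlink target length. */
      static int copy_filename(char **copied, size_t *copied_len,
                               const char *name, size_t len)
      {
              *copied = malloc(len + 1);      /* +1 for the NUL is correct */
              if (!*copied)
                      return -1;
              memcpy(*copied, name, len);
              (*copied)[len] = '\0';
              *copied_len = len;              /* fix: report len, not len + 1 */
              return 0;
      }

      int main(void)
      {
              char *copy;
              size_t n;

              if (copy_filename(&copy, &n, "target", strlen("target")))
                      return 1;
              printf("reported length: %zu (expected 6)\n", n);
              free(copy);
              return 0;
      }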
    • elf core dump: fix get_user use · 92dc07b1
      Committed by Roland McGrath
      The elf_core_dump() code does its work with set_fs(KERNEL_DS) in force,
      so vma_dump_size() needs to switch back with set_fs(USER_DS) to safely
      use get_user() for a normal user-space address.
      
      Checking for VM_READ optimizes out the case where get_user() would fail
      anyway.  The vm_file check here was already superfluous given the control
      flow earlier in the function, so that is a cleanup/optimization unrelated
      to other changes but an obvious and trivial one.
      Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Roland McGrath <roland@redhat.com>
      92dc07b1
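
      A sketch of the pattern described above, assuming the 2009-era
      set_fs()/get_user() interface; this is an illustrative fragment, not the
      actual vma_dump_size() from fs/binfmt_elf.c, and the helper name and
      'magic' parameter are made up:

      /* The core dump path runs with set_fs(KERNEL_DS) in force, so switch to
       * USER_DS around get_user() when peeking at a user-space address. */
      static int first_word_matches(struct vm_area_struct *vma, unsigned long magic)
      {
              unsigned long word = 0;
              mm_segment_t old_fs;
              int match = 0;

              if (!(vma->vm_flags & VM_READ))   /* get_user() would fail anyway */
                      return 0;

              old_fs = get_fs();
              set_fs(USER_DS);                  /* let get_user() validate a user address */
              if (get_user(word, (unsigned long __user *)vma->vm_start) == 0)
                      match = (word == magic);
              set_fs(old_fs);                   /* back to KERNEL_DS for the rest of the dump */
              return match;
      }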
    • CRED: Fix SUID exec regression · 0bf2f3ae
      Committed by David Howells
      The patch:
      
      	commit a6f76f23
      	CRED: Make execve() take advantage of copy-on-write credentials
      
      moved the point at which the 'safeness' of a SUID/SGID exec is determined to
      before de_thread() is called.  This means that LSM_UNSAFE_SHARE is now
      calculated incorrectly.  That flag is set if any of the usage counts for
      fs_struct, files_struct and sighand_struct are greater than 1 at the time the
      determination is made, and all of them are greater than 1 for threads created
      by the pthread library.
      
      However, we want to make the security calculation before irrevocably damaging
      the process, so that we can still return an error code if we decide to reject
      the exec request on this basis.  That means the determination has to be made
      before calling de_thread().
      
      So, instead, we count up the number of threads (CLONE_THREAD) that are sharing
      our fs_struct (CLONE_FS), files_struct (CLONE_FILES) and sighand_structs
      (CLONE_SIGHAND/CLONE_THREAD) with us.  These will be killed by de_thread() and
      so can be discounted by check_unsafe_exec().
      
      We do have to be careful because CLONE_THREAD does not imply FS or FILES.
      
      We _assume_ that there will be no extra references to these structs held by the
      threads we're going to kill.
      
      This can be tested with the attached pair of programs.  Build the two programs
      using the Makefile supplied, and run ./test1 as a non-root user.  If
      successful, you should see something like:
      
      	[dhowells@andromeda tmp]$ ./test1
      	--TEST1--
      	uid=4043, euid=4043 suid=4043
      	exec ./test2
      	--TEST2--
      	uid=4043, euid=0 suid=0
      	SUCCESS - Correct effective user ID
      
      and if unsuccessful, something like:
      
      	[dhowells@andromeda tmp]$ ./test1
      	--TEST1--
      	uid=4043, euid=4043 suid=4043
      	exec ./test2
      	--TEST2--
      	uid=4043, euid=4043 suid=4043
      	ERROR - Incorrect effective user ID!
      
      The non-root user ID you see will depend on the user you run as.
      
      [test1.c]
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      #include <pthread.h>
      
      static void *thread_func(void *arg)
      {
      	while (1) {}
      }
      
      int main(int argc, char **argv)
      {
      	pthread_t tid;
      	uid_t uid, euid, suid;
      
      	printf("--TEST1--\n");
      	getresuid(&uid, &euid, &suid);
      	printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);
      
      	/* pthread_create() returns an error number on failure, not -1 */
      	if (pthread_create(&tid, NULL, thread_func, NULL) != 0) {
      		perror("pthread_create");
      		exit(1);
      	}
      
      	printf("exec ./test2\n");
      	execlp("./test2", "test2", NULL);
      	perror("./test2");
      	_exit(1);
      }
      
      [test2.c]
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      
      int main(int argc, char **argv)
      {
      	uid_t uid, euid, suid;
      
      	getresuid(&uid, &euid, &suid);
      	printf("--TEST2--\n");
      	printf("uid=%d, euid=%d suid=%d\n", uid, euid, suid);
      
      	if (euid != 0) {
      		fprintf(stderr, "ERROR - Incorrect effective user ID!\n");
      		exit(1);
      	}
      	printf("SUCCESS - Correct effective user ID\n");
      	exit(0);
      }
      
      [Makefile]
      CFLAGS = -D_GNU_SOURCE -Wall -Werror -Wunused
      all: test1 test2
      
      test1: test1.c
      	gcc $(CFLAGS) -o test1 test1.c -lpthread
      
      test2: test2.c
      	gcc $(CFLAGS) -o test2 test2.c
      	sudo chown root.root test2
      	sudo chmod +s test2
      Reported-by: David Smith <dsmith@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: David Smith <dsmith@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      0bf2f3ae
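
      A heavily simplified sketch of the counting described above; the helper
      name, field accesses and omitted locking are assumptions, so this is
      illustrative rather than the actual check_unsafe_exec() in fs/exec.c:

      /* Count how many threads in our (about to be killed) thread group share
       * our fs_struct and files_struct.  If the structs have more users than
       * that, something outside the group shares them and the exec must be
       * treated as sharing.  Proper locking of the thread list is omitted. */
      static int exec_share_is_unsafe(struct task_struct *p)
      {
              struct task_struct *t;
              unsigned int n_fs = 1, n_files = 1, n_sighand = 1;

              for (t = next_thread(p); t != p; t = next_thread(t)) {
                      if (t->fs == p->fs)
                              n_fs++;
                      if (t->files == p->files)
                              n_files++;
                      n_sighand++;    /* CLONE_THREAD implies a shared sighand */
              }

              return atomic_read(&p->fs->count) > n_fs ||
                     atomic_read(&p->files->count) > n_files ||
                     atomic_read(&p->sighand->count) > n_sighand;
      }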
    • vfs: Don't call attach_nobh_buffers() with an empty list · d4cf109f
      Committed by Dave Kleikamp
      This is a modification of a patch by Bill Pemberton <wfp5p@virginia.edu>
      
      nobh_write_end() could call attach_nobh_buffers() with head == NULL.
      This would result in a trap when attach_nobh_buffers() attempted to
      access bh->b_this_page.
      
      This can be illustrated by running the writev01 testcase from LTP on jfs.
      
      This error was introduced by commit 5b41e74a "vfs: fix data leak in
      nobh_write_end()".  That patch did not take into account that if
      PageMappedToDisk() is true upon entry to nobh_write_begin(), then no
      buffers will be allocated for the page.  In that case, we won't have to
      worry about a failed write leaving uninitialized data in the page.
      
      Of course, head != NULL implies !page_has_buffers(page), so no need to
      test both.
      Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Dmitri Monakhov <dmonakhov@openvz.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d4cf109f
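
      A minimal sketch of the guard this fix adds; the condition is paraphrased
      rather than copied from fs/buffer.c:

      /* head may be NULL here: when the page was already mapped to disk,
       * nobh_write_begin() allocated no buffers, so there is nothing to attach
       * and attach_nobh_buffers() must not walk an empty list. */
      if (head)
              attach_nobh_buffers(page, head);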
    • Btrfs: Make sure dir is non-null before doing S_ISGID checks · 42f15d77
      Committed by Chris Mason
      The S_ISGID check in btrfs_new_inode caused an oops during subvol creation
      because sometimes the dir is null.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      42f15d77
  3. 06 Feb 2009 (3 commits)
  4. 05 Feb 2009 (2 commits)
  5. 04 Feb 2009 (19 commits)
    • Btrfs: don't return congestion in write_cache_pages as often · 9b0d3ace
      Committed by Chris Mason
      On fast devices that go from congested to uncongested very quickly, pdflush
      is waiting too often in congestion_wait, and the FS is backing off too
      easily in write_cache_pages.
      
      For now, fix this on the btrfs side by only checking congestion after
      some bios have already gone down.  Longer term a real fix is needed
      for pdflush, but that is a larger project.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      9b0d3ace
    • Btrfs: Only prep for btree deletion balances when nodes are mostly empty · 7b78c170
      Committed by Chris Mason
      Whenever an item deletion is done, we need to balance all the nodes
      in the tree to make sure we don't end up with an empty node if a pointer
      is deleted.  This balance prep happens from the root of the tree down
      so we can drop our locks as we go.
      
      reada_for_balance was triggering read-ahead on neighboring nodes even
      when no balancing was required.  This adds an extra check so that both
      balance_level() and reada_for_balance() are skipped when a balance
      won't be required.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      7b78c170
    • Btrfs: fix btrfs_unlock_up_safe to walk the entire path · 12f4dacc
      Committed by Chris Mason
      btrfs_unlock_up_safe would break out at the first NULL node entry or
      unlocked node it found in the path.
      
      Some of the callers have missing nodes at the lower levels of the path, so this
      commit fixes things to check all the nodes in the path before returning.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      12f4dacc
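
      A sketch of the loop shape after this fix, assuming the usual btrfs_path
      layout (nodes[] and locks[] indexed by level); simplified from the real
      ctree.c code:

      /* Keep scanning all the way to the top instead of breaking on the first
       * NULL or unlocked slot, since lower levels of the path may legitimately
       * be missing. */
      void btrfs_unlock_up_safe(struct btrfs_path *path, int level)
      {
              int i;

              for (i = level; i < BTRFS_MAX_LEVEL; i++) {
                      if (!path->nodes[i])
                              continue;       /* was: break */
                      if (!path->locks[i])
                              continue;       /* was: break */
                      btrfs_tree_unlock(path->nodes[i]);
                      path->locks[i] = 0;
              }
      }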
    • Btrfs: change btrfs_del_leaf to drop locks earlier · 4d081c41
      Committed by Chris Mason
      btrfs_del_leaf does two things.  First it removes the pointer in the
      parent, and then it frees the block that has the leaf.  It has the
      parent node locked for both operations.
      
      But, it only needs the parent locked while it is deleting the pointer.
      After that it can safely free the block without the parent locked.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      4d081c41
    • Btrfs: Change btrfs_truncate_inode_items to stop when it hits the inode · 06d9a8d7
      Committed by Chris Mason
      btrfs_truncate_inode_items is setup to stop doing btree searches when
      it has finished removing the items for the inode.  It used to detect the
      end of the inode by looking for an objectid that didn't match the
      one we were searching for.
      
      But, this would result in an extra search through the btree, which
      adds extra balancing and cow costs to the operation.
      
      This commit adds a check to see if we found the inode item, which means
      we can stop searching early.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      06d9a8d7
    • Btrfs: Don't try to compress pages past i_size · f03d9301
      Committed by Chris Mason
      The compression code had some checks to make sure we were only
      compressing bytes inside of i_size, but it wasn't catching every
      case.  To make things worse, some incorrect math about the number
      of bytes remaining would make it try to compress more pages than the
      file really had.
      
      The fix used here is to fall back to the non-compression code in this
      case, which does all the proper cleanup of delalloc and other accounting.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      f03d9301
    • Btrfs: join the transaction in __btrfs_setxattr · 81144949
      Committed by Josef Bacik
      With selinux on we end up calling __btrfs_setxattr when we create an inode,
      which calls btrfs_start_transaction().  The problem is we've already called
      that in btrfs_new_inode, and in btrfs_start_transaction we end up doing a
      wait_current_trans().  If btrfs-transaction has started committing it will wait
      for all handles to finish, while the other process is waiting for the
      transaction to commit.  This is fixed by using btrfs_join_transaction, which
      won't wait for the transaction to commit.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      
      81144949
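
      A minimal sketch of the change, assuming the 2009-era signatures
      btrfs_join_transaction(root, nblocks) and btrfs_end_transaction(trans,
      root); the xattr insertion itself is elided:

      struct btrfs_trans_handle *trans;

      /* was: trans = btrfs_start_transaction(root, 1);
       * joining the running transaction skips wait_current_trans(), so the
       * xattr write cannot deadlock against a commit that is waiting for the
       * handle already held by btrfs_new_inode(). */
      trans = btrfs_join_transaction(root, 1);

      /* ... insert the xattr item under 'trans' ... */

      btrfs_end_transaction(trans, root);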
    • Btrfs: Handle SGID bit when creating inodes · 8c087b51
      Committed by Chris Ball
      Before this patch, new files/dirs would ignore the SGID bit on their
      parent directory and always be owned by the creating user's uid/gid.
      Signed-off-by: Chris Ball <cjb@laptop.org>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      
      8c087b51
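
      A sketch of the standard SGID inheritance rule being applied, including
      the NULL dir guard from the related fix above; paraphrased, not the exact
      btrfs_new_inode():

      /* New inodes inherit the group of an S_ISGID parent directory, and new
       * directories keep S_ISGID themselves.  dir can be NULL during subvolume
       * creation, hence the guard. */
      inode->i_uid = current_fsuid();
      if (dir && (dir->i_mode & S_ISGID)) {
              inode->i_gid = dir->i_gid;
              if (S_ISDIR(mode))
                      mode |= S_ISGID;
      } else {
              inode->i_gid = current_fsgid();
      }
      inode->i_mode = mode;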
    • Btrfs: Make btrfs_drop_snapshot work in larger and more efficient chunks · bd56b302
      Committed by Chris Mason
      Every transaction in btrfs creates a new snapshot, and then schedules the
      snapshot from the last transaction for deletion.  Snapshot deletion
      works by walking down the btree and dropping the reference counts
      on each btree block during the walk.
      
      If a given leaf or node has a reference count greater than one,
      the reference count is decremented and the subtree pointed to by that
      node is ignored.
      
      If the reference count is one, walking continues down into that node
      or leaf, and the references of everything it points to are decremented.
      
      The old code would try to work in small pieces, walking down the tree
      until it found the lowest leaf or node to free and then returning.  This
      was very friendly to the rest of the FS because it didn't have a huge
      impact on other operations.
      
      But it wouldn't always keep up with the rate that new commits added new
      snapshots for deletion, and it wasn't very optimal for the extent
      allocation tree because it wasn't finding leaves that were close together
      on disk and processing them at the same time.
      
      This changes things to walk down to a level 1 node and then process it
      in bulk.  All the leaf pointers are sorted and the leaves are dropped
      in order based on their extent number.
      
      The extent allocation tree and commit code are now fast enough for
      this kind of bulk processing to work without slowing the rest of the FS
      down.  Overall it does less IO and is better able to keep up with
      snapshot deletions under high load.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      bd56b302
    • Btrfs: Change btree locking to use explicit blocking points · b4ce94de
      Committed by Chris Mason
      Most of the btrfs metadata operations can be protected by a spinlock,
      but some operations still need to schedule.
      
      So far, btrfs has been using a mutex along with a trylock loop;
      most of the time it is able to avoid going for the full mutex, so
      the trylock loop is a big performance gain.
      
      This commit is step one for getting rid of the blocking locks entirely.
      btrfs_tree_lock takes a spinlock, and the code explicitly switches
      to a blocking lock when it starts an operation that can schedule.
      
      We'll be able to get rid of the blocking locks in smaller pieces over time.
      Tracing allows us to find the most common cause of blocking, so we
      can start with the hot spots first.
      
      The basic idea is:
      
      btrfs_tree_lock() returns with the spin lock held
      
      btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in
      the extent buffer flags, and then drops the spin lock.  The buffer is
      still considered locked by all of the btrfs code.
      
      If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops
      the spin lock and waits on a wait queue for the blocking bit to go away.
      
      Much of the code that needs to set the blocking bit finishes without actually
      blocking a good percentage of the time.  So, an adaptive spin is still
      used against the blocking bit to avoid very high context switch rates.
      
      btrfs_clear_lock_blocking() clears the blocking bit and returns
      with the spinlock held again.
      
      btrfs_tree_unlock() can be called on either blocking or spinning locks,
      it does the right thing based on the blocking bit.
      
      ctree.c has a helper function to set/clear all the locked buffers in a
      path as blocking.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      b4ce94de
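
      A usage sketch of the locking pattern described above, using only the
      helpers named in this commit; eb stands for the locked extent buffer and
      the work done while blocking is a placeholder:

      btrfs_tree_lock(eb);              /* returns with the spinlock held */

      btrfs_set_lock_blocking(eb);      /* set EXTENT_BUFFER_BLOCKING, drop the spinlock */
      read_other_blocks_from_disk();    /* placeholder for work that may schedule */
      btrfs_clear_lock_blocking(eb);    /* reacquire the spinlock, clear the bit */

      btrfs_tree_unlock(eb);            /* handles either spinning or blocking state */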
    • Btrfs: hash_lock is no longer needed · c487685d
      Committed by Chris Mason
      Before metadata is written to disk, it is updated to reflect that writeout
      has begun.  Once this update is done, the block must be cow'd before it
      can be modified again.
      
      This update was originally synchronized by using a per-fs spinlock.  Today
      the buffers for the metadata blocks are locked before writeout begins,
      and everyone that tests the flag has the buffer locked as well.
      
      So, the per-fs spinlock (called hash_lock for no good reason) is no
      longer required.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      c487685d
    • Btrfs: disable leak debugging checks in extent_io.c · 3935127c
      Committed by Chris Mason
      extent_io.c has debugging code to report and free leaked extent_state
      and extent_buffer objects at rmmod time.  This helps track down
      leaks and it saves you from rebooting just to properly remove the
      kmem_cache object.
      
      But, the code runs under a fairly expensive spinlock and the checks to
      see if it is currently enabled are not entirely consistent.  Some use
      #ifdef and some #if.
      
      This changes everything to #if and disables the leak checking.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      3935127c
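
      A small illustration of the #if-versus-#ifdef distinction this relies on;
      the macro name follows extent_io.c's LEAK_DEBUG, but the body here is
      illustrative:

      #define LEAK_DEBUG 0    /* with "#if", defining it as 0 disables the code;
                               * with "#ifdef", merely being defined would enable it */

      #if LEAK_DEBUG
      /* expensive bookkeeping: track every extent_state/extent_buffer on a
       * list under a spinlock so leaks can be reported at rmmod time */
      #endif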
    • Btrfs: sort references by byte number during btrfs_inc_ref · b7a9f29f
      Committed by Chris Mason
      When a block goes through cow, we update the reference counts of
      everything that block points to.  The internal pointers of the block
      can be in just about any order, and it is likely to have clusters of
      things that are close together and clusters of things that are not.
      
      To help reduce the seeks that come with updating all of these reference
      counts, sort them by byte number before actual updates are done.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      b7a9f29f
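
      A sketch of the sort described above, assuming a small helper struct (the
      name refsort is an assumption) and the kernel's sort() from linux/sort.h;
      the reference-update loop itself is elided:

      struct refsort {
              u64 bytenr;     /* disk byte number the pointer refers to */
              u32 slot;       /* slot in the node, so updates can find the pointer */
      };

      /* comparison function for sort(): order the to-be-updated references by
       * byte number so the reference count updates walk the disk in order */
      static int refsort_cmp(const void *a, const void *b)
      {
              const struct refsort *ra = a, *rb = b;

              if (ra->bytenr < rb->bytenr)
                      return -1;
              if (ra->bytenr > rb->bytenr)
                      return 1;
              return 0;
      }

      /* ... fill sorted[] from the node, then:
       * sort(sorted, nritems, sizeof(struct refsort), refsort_cmp, NULL);
       * and update the reference counts in sorted order ... */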
    • Btrfs: async threads should try harder to find work · b51912c9
      Committed by Chris Mason
      Tracing shows the delay between when an async thread goes to sleep
      and when more work is added is often very short.  This commit adds
      a little bit of delay and extra checking to the code right before
      we schedule out.
      
      It allows more work to be added to the worker
      without requiring notifications from other procs.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      b51912c9
    • Btrfs: selinux support · 0279b4cd
      Committed by Jim Owens
      Add call to LSM security initialization and save
      resulting security xattr for new inodes.
      
      Add xattr support to symlink inode ops.
      
      Set inode->i_op for existing special files.
      Signed-off-by: jim owens <jowens@hp.com>
      0279b4cd
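
      A sketch of the LSM call described above, assuming the 2009-era
      security_inode_init_security(inode, dir, &name, &value, &len) signature;
      the helper that actually stores the "security." xattr is hypothetical:

      /* Ask the active LSM (e.g. SELinux) for the security label of a freshly
       * created inode and store it as an xattr.  -EOPNOTSUPP means the LSM
       * does not label this inode, which is not an error. */
      static int sketch_init_security(struct inode *inode, struct inode *dir)
      {
              char *name;
              void *value;
              size_t len;
              int err;

              err = security_inode_init_security(inode, dir, &name, &value, &len);
              if (err)
                      return err == -EOPNOTSUPP ? 0 : err;

              /* hypothetical helper: prefix the name with "security." and store
               * it through the filesystem's setxattr path */
              err = store_security_xattr(inode, name, value, len);

              kfree(name);
              kfree(value);
              return err;
      }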
    • Btrfs: make btrfs acls selectable · bef62ef3
      Committed by Christian Hesse
      This patch adds a Kconfig menu entry to enable ACLs for btrfs,
      allowing FS_POSIX_ACL support to be selected at kernel compile time.
      
      (updated by Jeff Mahoney to make the changes in fs/btrfs/Kconfig instead)
      Signed-off-by: Christian Hesse <mail@earthworm.de>
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      bef62ef3
    • Btrfs: Catch missed bios in the async bio submission thread · a6837051
      Committed by Chris Mason
      The async bio submission thread was missing some bios that were
      added after it had decided there was no work left to do.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      a6837051
    • [XFS] Warn on transaction in flight on read-only remount · 43f3f057
      Committed by Felix Blyakher
      Until the VFS can correctly support read-only remount without racing,
      use WARN_ON instead of BUG_ON when detecting a transaction in flight
      after quiescing the filesystem.
      Signed-off-by: Felix Blyakher <felixb@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      43f3f057
    • xfs: Check buffer lengths in log recovery · 6139a236
      Committed by Dave Chinner
      Before trying to obtain, read or write a buffer,
      check that the buffer length is actually valid. If
      it is not valid, then something read in the recovery
      process has been corrupted and we should abort
      recovery.
      Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
      Tested-by: Eric Sesterhenn <snakebyte@gmx.de>
      Reviewed-by: Christoph Hellwig <hch@infradead.org>
      Reviewed-by: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Felix Blyakher <felixb@sgi.com>
      6139a236
  6. 03 Feb 2009 (6 commits)
    • ocfs2: add quota call to ocfs2_remove_btree_range() · fd4ef231
      Committed by Mark Fasheh
      We weren't reclaiming the clusters which get freed from this function,
      so any user punching holes in a file would still have those bytes accounted
      against him/her. Add the call to vfs_dq_free_space_nodirty() to fix this.
      Interestingly enough, the journal credits calculation already took this into
      account.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Jan Kara <jack@suse.cz>
      fd4ef231
    • ocfs2: Wakeup the downconvert thread after a successful cancel convert · a4b91965
      Committed by Sunil Mushran
      When two nodes holding PR locks on a resource concurrently attempt to
      upconvert the locks to EX, the master sends a BAST to one of the nodes. This
      message tells that node to first cancel-convert its upconvert request and then
      downconvert the lock to NL. Only once that lock is downconverted to NL can the
      master upconvert the first node's lock to EX.
      
      While the fs was doing the cancel convert, it was forgetting to wake up the
      downconvert thread after a successful cancel, leading to a deadlock.
      Reported-and-Tested-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      a4b91965
    • ocfs2: Access the xattr bucket only before modifying it. · 554e7f9e
      Committed by Tao Ma
      In ocfs2_xattr_value_truncate, we may call b-tree code which will
      extend the journal transaction. This has a potential problem: it
      may cause the already-accessed-but-not-dirtied buffers to be lost. So we'd
      better access the bucket after we call ocfs2_xattr_value_truncate.
      As for the root buffer for the xattr value, the b-tree code will
      access and dirty it, so we don't need to worry about it.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      554e7f9e
    • configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() · 0e033342
      Committed by Joel Becker
      When attaching default groups (subdirs) of a new group (in mkdir() or
      in configfs_register()), configfs recursively takes inode's mutexes
      along the path from the parent of the new group to the default
      subdirs. This is needed to ensure that the VFS will not race with
      operations on these sub-dirs. This is safe for the following reasons:
      
      - the VFS allows one to lock first an inode and second one of its
        children (The lock subclasses for this pattern are respectively
        I_MUTEX_PARENT and I_MUTEX_CHILD);
      - from this rule any inode path can be recursively locked in
        descending order as long as it stays under a single mountpoint and
        does not follow symlinks.
      
      Unfortunately lockdep does not know (yet?) how to handle such
      recursion.
      
      I've tried to use Peter Zijlstra's lock_set_subclass() helper to
      upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know
      that we might recursively lock some of their descendants, but this
      usage does not seem to fit the purpose of lock_set_subclass() because
      it leads to several i_mutex locked with subclass I_MUTEX_PARENT by
      the same task.
      
      From inside configfs it is not possible to serialize those recursive
      locking with a top-level one, because mkdir() and rmdir() are already
      called with inodes locked by the VFS. So using some
      mutex_lock_nest_lock() is not an option.
      
      I am proposing two solutions:
      1) one that wraps recursive mutex_lock()s with
         lockdep_off()/lockdep_on().
      2) (as suggested earlier by Peter Zijlstra) one that puts the
         i_mutexes recursively locked in different classes based on their
         depth from the top-level config_group created. This
         induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the
         nesting of configfs default groups whenever lockdep is activated
         but this limit looks reasonably high. Unfortunately, this also
         isolates VFS operations on configfs default groups from the others
         and thus lowers the chances to detect locking issues.
      
      This patch implements solution 1).
      
      Solution 2) looks better from lockdep's point of view, but fails with
      configfs_depend_item(). That would require reworking the locking
      scheme of configfs_depend_item() to remove the variable lock recursion
      depth, which I think is doable thanks to configfs_dirent_lock.
      For now, let's stick to solution 1).
      Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com>
      Acked-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      0e033342
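
      A minimal sketch of solution 1), wrapping the recursive i_mutex
      acquisition with lockdep_off()/lockdep_on(); the surrounding configfs
      group walk is omitted and the variable name is illustrative:

      /* Hide the recursive child-inode locking from lockdep, which cannot
       * model an arbitrarily deep PARENT -> CHILD -> ... chain taken by one
       * task.  The mutex itself is still taken and released normally. */
      lockdep_off();
      mutex_lock(&child_inode->i_mutex);
      lockdep_on();

      /* ... attach the default group below this inode ... */

      mutex_unlock(&child_inode->i_mutex);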
    • ocfs2: Fix possible deadlock in ocfs2_write_dquot() · f8afead7
      Committed by Jan Kara
      It could happen that some limit has been set via quotactl() and in parallel
      ->mark_dirty() is called from another thread doing e.g. dquot_alloc_space(). In
      such a case ocfs2_write_dquot() must not try to sync the dquot, because doing so
      needs the global quota lock, which ranks above transaction start.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      f8afead7
    • ocfs2: Push out dropping of dentry lock to ocfs2_wq · ea455f8a
      Committed by Jan Kara
      Dropping the last reference to a dentry lock is a complicated operation that
      also involves dropping a reference to the inode. The quota code in particular
      needs to obtain some quota locks on that path, which leads to a potential
      deadlock. Thus we defer dropping of the inode reference to ocfs2_wq.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      ea455f8a
  7. 30 Jan 2009 (4 commits)