Commits · b4ce94de9b4d64e8ab3cf155d13653c666e22b9b · openanolis / cloud-kernel

04 Feb, 2009 8 commits

Btrfs: Change btree locking to use explicit blocking points · b4ce94de

Committed by Chris Mason on Feb 04, 2009

Most of the btrfs metadata operations can be protected by a spinlock,
but some operations still need to schedule.

So far, btrfs has been using a mutex along with a trylock loop.
Most of the time it is able to avoid going for the full mutex, so
the trylock loop is a big performance gain.

This commit is step one for getting rid of the blocking locks entirely.
btrfs_tree_lock takes a spinlock, and the code explicitly switches
to a blocking lock when it starts an operation that can schedule.

We'll be able to get rid of the blocking locks in smaller pieces over time.
Tracing allows us to find the most common cause of blocking, so we
can start with the hot spots first.

The basic idea is:

btrfs_tree_lock() returns with the spin lock held

btrfs_set_lock_blocking() sets the EXTENT_BUFFER_BLOCKING bit in
the extent buffer flags, and then drops the spin lock.  The buffer is
still considered locked by all of the btrfs code.

If btrfs_tree_lock gets the spinlock but finds the blocking bit set, it drops
the spin lock and waits on a wait queue for the blocking bit to go away.

Much of the code that needs to set the blocking bit finishes without actually
blocking a good percentage of the time.  So, an adaptive spin is still
used against the blocking bit to avoid very high context switch rates.

btrfs_clear_lock_blocking() clears the blocking bit and returns
with the spinlock held again.

btrfs_tree_unlock() can be called on either blocking or spinning locks;
it does the right thing based on the blocking bit.

ctree.c has a helper function to set/clear all the locked buffers in a
path as blocking.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
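
The protocol described in this commit message can be modelled in user space. The following is only a sketch under stated assumptions, not the btrfs implementation: `struct eb_lock` and every `model_*` function are hypothetical names, C11 atomics stand in for the kernel spinlock, and `sched_yield()` stands in for the adaptive spin and wait queue. `model_try_tree_lock()` is an extra helper that exists only to make the state visible.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of the lock state, not kernel code. */
struct eb_lock {
    atomic_bool spin;     /* stands in for the extent buffer spinlock */
    atomic_bool blocking; /* stands in for EXTENT_BUFFER_BLOCKING */
};

static void eb_lock_init(struct eb_lock *l)
{
    atomic_init(&l->spin, false);
    atomic_init(&l->blocking, false);
}

static void spin_acquire(struct eb_lock *l)
{
    while (atomic_exchange_explicit(&l->spin, true, memory_order_acquire))
        ; /* busy-wait until the previous holder releases */
}

static void spin_release(struct eb_lock *l)
{
    atomic_store_explicit(&l->spin, false, memory_order_release);
}

/* Returns with the spinlock held and the blocking bit clear. */
static void model_tree_lock(struct eb_lock *l)
{
    for (;;) {
        spin_acquire(l);
        if (!atomic_load(&l->blocking))
            return;
        /* Someone holds the buffer in blocking mode: drop the spinlock
         * and wait for the bit to go away before retrying. */
        spin_release(l);
        while (atomic_load(&l->blocking))
            sched_yield();
    }
}

/* Non-waiting variant, used here only to observe the lock state. */
static bool model_try_tree_lock(struct eb_lock *l)
{
    if (atomic_exchange_explicit(&l->spin, true, memory_order_acquire))
        return false;
    if (atomic_load(&l->blocking)) {
        spin_release(l);
        return false;
    }
    return true;
}

/* Caller holds the spinlock; afterwards it still owns the buffer,
 * but through the blocking bit instead of the spinlock. */
static void model_set_lock_blocking(struct eb_lock *l)
{
    assert(atomic_load(&l->spin));
    atomic_store(&l->blocking, true);
    spin_release(l);
}

/* Retakes the spinlock, then clears the blocking bit. */
static void model_clear_lock_blocking(struct eb_lock *l)
{
    spin_acquire(l);
    atomic_store(&l->blocking, false);
}

/* Works for both modes, keyed off the blocking bit. */
static void model_tree_unlock(struct eb_lock *l)
{
    if (atomic_load(&l->blocking))
        atomic_store(&l->blocking, false);
    else
        spin_release(l);
}
```

In this model a second locker is kept out in both modes: while the owner spins, the spinlock itself excludes it, and while the owner blocks, the bit does.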

b4ce94de

Btrfs: hash_lock is no longer needed · c487685d

Committed by Chris Mason on Feb 04, 2009

Before metadata is written to disk, it is updated to reflect that writeout
has begun.  Once this update is done, the block must be cow'd before it
can be modified again.

This update was originally synchronized by using a per-fs spinlock.  Today
the buffers for the metadata blocks are locked before writeout begins,
and everyone that tests the flag has the buffer locked as well.

So, the per-fs spinlock (called hash_lock for no good reason) is no
longer required.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

c487685d

Btrfs: disable leak debugging checks in extent_io.c · 3935127c

Committed by Chris Mason on Feb 04, 2009

extent_io.c has debugging code to report and free leaked extent_state
and extent_buffer objects at rmmod time.  This helps track down
leaks and it saves you from rebooting just to properly remove the
kmem_cache object.

But, the code runs under a fairly expensive spinlock and the checks to
see if it is currently enabled are not entirely consistent.  Some use
#ifdef and some #if.

This changes everything to #if and disables the leak checking.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
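
A minimal sketch of why mixing the two checks is inconsistent, assuming the switch is a macro defined to 0. The macro name `SKETCH_LEAK_DEBUG` and both functions are hypothetical. `#ifdef` only asks whether the macro exists, while `#if` tests its value, so the two disagree once the macro is set to 0.

```c
/* Hypothetical debug switch: defined, but turned off. */
#define SKETCH_LEAK_DEBUG 0

/* #ifdef is true for any defined macro, even one defined as 0. */
static int enabled_by_ifdef(void)
{
#ifdef SKETCH_LEAK_DEBUG
    return 1;
#else
    return 0;
#endif
}

/* #if evaluates the value, so 0 really disables the code. */
static int enabled_by_if(void)
{
#if SKETCH_LEAK_DEBUG
    return 1;
#else
    return 0;
#endif
}
```

With the macro set to 0, `enabled_by_ifdef()` returns 1 and `enabled_by_if()` returns 0.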

3935127c

Btrfs: sort references by byte number during btrfs_inc_ref · b7a9f29f

Committed by Chris Mason on Feb 04, 2009

When a block goes through cow, we update the reference counts of
everything that block points to. The internal pointers of the block
can be in just about any order, and it is likely to have clusters of
things that are close together and clusters of things that are not.

To help reduce the seeks that come with updating all of these reference
counts, sort them by byte number before actual updates are done.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
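
The idea can be sketched with the standard `qsort()`. This is an illustration only: `struct ref_sketch` and its fields are hypothetical, and the actual patch may use a different sort routine and record layout.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one pointer found in the block being cow'd. */
struct ref_sketch {
    uint64_t bytenr; /* on-disk byte number the pointer refers to */
    uint32_t slot;   /* position of the pointer inside the block */
};

/* Order by byte number; explicit comparisons avoid overflow that a
 * subtraction of two 64-bit values could cause. */
static int cmp_bytenr(const void *a, const void *b)
{
    const struct ref_sketch *ra = a;
    const struct ref_sketch *rb = b;

    if (ra->bytenr < rb->bytenr)
        return -1;
    if (ra->bytenr > rb->bytenr)
        return 1;
    return 0;
}

/* Sort first, so the later reference count updates walk the disk in
 * ascending order instead of seeking back and forth. */
static void sort_refs_by_bytenr(struct ref_sketch *refs, size_t n)
{
    qsort(refs, n, sizeof(*refs), cmp_bytenr);
}
```

Keeping `slot` next to `bytenr` lets the caller map each sorted entry back to its original position in the block.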

b7a9f29f

Btrfs: async threads should try harder to find work · b51912c9

Committed by Chris Mason on Feb 04, 2009

Tracing shows the delay between when an async thread goes to sleep
and when more work is added is often very short.  This commit adds
a little bit of delay and extra checking to the code right before
we schedule out.

It allows more work to be added to the worker
without requiring notifications from other procs.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
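
A rough user-space sketch of the "check again before sleeping" idea. `struct worker_sketch`, `recheck_before_sleep()`, and the poll count are hypothetical, and `sched_yield()` stands in for whatever short delay the kernel code actually uses.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical worker state: number of queued work items. */
struct worker_sketch {
    atomic_int pending;
};

/* Poll the queue a bounded number of times before giving up.
 * Returns true if work showed up, in which case the worker can keep
 * running without sleeping and without needing a wakeup. */
static bool recheck_before_sleep(struct worker_sketch *w, int polls)
{
    for (int i = 0; i < polls; i++) {
        if (atomic_load(&w->pending) > 0)
            return true;
        sched_yield(); /* brief pause, gives producers a chance to run */
    }
    /* One last look right before the caller would schedule out. */
    return atomic_load(&w->pending) > 0;
}
```

The trade-off is a little extra CPU time in the worker in exchange for fewer sleep and wakeup round trips when new work arrives shortly after the queue drains.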

b51912c9

Btrfs: selinux support · 0279b4cd

Committed by Jim Owens on Feb 04, 2009

Add call to LSM security initialization and save
resulting security xattr for new inodes.

Add xattr support to symlink inode ops.

Set inode->i_op for existing special files.
Signed-off-by: jim owens <jowens@hp.com>

0279b4cd

Btrfs: make btrfs acls selectable · bef62ef3

Committed by Christian Hesse on Feb 04, 2009

This patch adds a menu entry to kconfig to enable acls for btrfs.
This allows you to enable FS_POSIX_ACL at kernel compile time.

(updated by Jeff Mahoney to make the changes in fs/btrfs/Kconfig instead)
Signed-off-by: Christian Hesse <mail@earthworm.de>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>

bef62ef3

Btrfs: Catch missed bios in the async bio submission thread · a6837051

Committed by Chris Mason on Feb 04, 2009

The async bio submission thread was missing some bios that were
added after it had decided there was no work left to do.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a6837051

29 Jan, 2009 1 commit

Btrfs: fix readdir on 32 bit machines · 89f135d8

Committed by Chris Mason on Jan 28, 2009

After btrfs_readdir has gone through all the directory items, it
sets the directory f_pos to the largest possible int.  This way
applications that mix readdir with creating new files don't
end up in an endless loop finding the new directory items as they go.

It was a workaround for a bug in git, but the assumption was that if git
could make this looping mistake then it would be a common problem.

The largest possible int chosen was INT_LIMIT(typeof(file->f_pos)),
and it is possible for that to be a larger number than 32 bit glibc
expects to come out of readdir.

This patch switches that to INT_LIMIT(off_t), which should keep
applications happy on 32 and 64 bit machines.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
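
The size mismatch can be illustrated with plain integer arithmetic. This sketch assumes that `f_pos` is a 64-bit signed type and that `off_t` is 32 bits wide on the affected machines. Both helper names are hypothetical, and the kernel's `INT_LIMIT()` macro itself is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest value of a signed integer type that is `bits` wide
 * (valid for 2 <= bits <= 64). */
static int64_t signed_max_for_bits(unsigned bits)
{
    return (int64_t)((UINT64_C(1) << (bits - 1)) - 1);
}

/* Would a non-negative directory position survive being handed to a
 * consumer that stores it in a signed type of the given width? */
static bool fits_in_signed(int64_t pos, unsigned bits)
{
    return pos <= signed_max_for_bits(bits);
}
```

Under these assumptions, an end-of-directory marker equal to the 64-bit maximum does not fit a 32-bit consumer, while the 32-bit maximum fits both.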

89f135d8

28 Jan, 2009 3 commits

nfsd: only set file_lock.fl_lmops in nfsd4_lockt if a stateowner is found · fa82a491

Committed by Jeff Layton on Jan 22, 2009

nfsd4_lockt does a search for a lockstateowner when building the lock
struct to test. If one is found, it'll set fl_owner to it. Regardless of
whether that happens, it'll also set fl_lmops. Given that this lock is
basically a "lightweight" lock that's just used for checking conflicts,
setting fl_lmops is probably not appropriate for it.

This behavior exposed a bug in DLM's GETLK implementation where it
wasn't clearing out the fields in the file_lock before filling in
conflicting lock info. While we were able to fix this in DLM, it
still seems pointless and dangerous to set the fl_lmops this way
when we may have a NULL lockstateowner.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@pig.fieldses.org>

fa82a491

nfsd: fix cred leak on every rpc · b914152a

Committed by J. Bruce Fields on Jan 20, 2009

Since override_creds() took its own reference on new, we need to release
our own reference.

(Note the put_cred on the return value puts the *old* value of
current->creds, not the new passed-in value).
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
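
The reference counting described above can be modelled with plain counters. This is a sketch, not the kernel credential API: every `*_sketch` name is hypothetical, and the only behaviour taken from the commit message is that overriding takes its own reference on the new credentials and returns the old ones.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted credential set. */
struct cred_sketch {
    int usage;
};

/* Stand-in for the current task's credential pointer. */
static struct cred_sketch *current_cred_sketch;

static struct cred_sketch *get_cred_sketch(struct cred_sketch *c)
{
    c->usage++;
    return c;
}

static void put_cred_sketch(struct cred_sketch *c)
{
    assert(c->usage > 0);
    c->usage--;
}

/* Installs new_cred, taking its own reference on it, and hands the
 * previous credentials (and their reference) back to the caller. */
static struct cred_sketch *override_creds_sketch(struct cred_sketch *new_cred)
{
    struct cred_sketch *old = current_cred_sketch;

    current_cred_sketch = get_cred_sketch(new_cred);
    return old;
}

/* The caller arrives holding one reference on new_cred. */
static void set_user_sketch(struct cred_sketch *new_cred)
{
    /* This put drops the *old* credentials, not new_cred. */
    put_cred_sketch(override_creds_sketch(new_cred));
    /* The fix: also drop the caller's own reference on new_cred. */
    put_cred_sketch(new_cred);
}
```

Without the final put, each call would leave `new_cred` with one reference more than the number of pointers to it, which is the per-RPC leak the commit title refers to.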

b914152a

nfsd: fix null dereference on error path · bf935a78

Committed by J. Bruce Fields on Jan 20, 2009

We're forgetting to check the return value from groups_alloc().
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

bf935a78

27 Jan, 2009 1 commit

inotify: clean up inotify_read and fix locking problems · 3632dee2

Committed by Vegard Nossum on Jan 22, 2009

If userspace supplies an invalid pointer to a read() of an inotify
instance, the inotify device's event list mutex is unlocked twice.
This causes an imbalance which effectively leaves the data structure
unprotected, and we can trigger oopses by accessing the inotify
instance from different tasks concurrently.

The best fix (contributed largely by Linus) is a total rewrite
of the function in question:

On Thu, Jan 22, 2009 at 7:05 AM, Linus Torvalds wrote:
> The thing to notice is that:
>
>  - locking is done in just one place, and there is no question about it
>   not having an unlock.
>
>  - that whole double-while(1)-loop thing is gone.
>
>  - use multiple functions to make nesting and error handling sane
>
>  - do error testing after doing the things you always need to do, ie do
>   this:
>
>        mutex_lock(..)
>        ret = function_call();
>        mutex_unlock(..)
>
>        .. test ret here ..
>
>   instead of doing conditional exits with unlocking or freeing.
>
> So if the code is written in this way, it may still be buggy, but at least
> it's not buggy because of subtle "forgot to unlock" or "forgot to free"
> issues.
>
> This _always_ unlocks if it locked, and it always frees if it got a
> non-error kevent.

Cc: John McCutchan <ttb@tentacle.dhs.org>
Cc: Robert Love <rlove@google.com>
Cc: <stable@kernel.org>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
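
The pattern from the quoted mail (lock, call a helper, unlock, and only then test the result) can be shown in a small self-contained form. This is a sketch and not the rewritten `inotify_read`: `struct dev_sketch`, the integer "events", and the counter that stands in for the mutex are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device: a counter stands in for the event list mutex
 * so that any unbalanced lock or unlock trips an assertion. */
struct dev_sketch {
    int lock_depth;
    int queued[4];
    size_t count;
};

static void lock_sketch(struct dev_sketch *d)
{
    assert(d->lock_depth == 0);
    d->lock_depth++;
}

static void unlock_sketch(struct dev_sketch *d)
{
    assert(d->lock_depth == 1);
    d->lock_depth--;
}

/* Called with the lock held. Returns 1 and stores an event, 0 if the
 * queue is empty, -1 if the caller's buffer is too small. It never
 * touches the lock, so no exit path can forget to unlock. */
static int get_one_event(struct dev_sketch *d, size_t room, int *out)
{
    if (d->count == 0)
        return 0;
    if (room < sizeof(*out))
        return -1;
    *out = d->queued[--d->count];
    return 1;
}

/* Locking happens in exactly one place; the result is tested only
 * after the unlock, as in the quoted example. */
static int read_one(struct dev_sketch *d, size_t room, int *out)
{
    int ret;

    lock_sketch(d);
    ret = get_one_event(d, room, out);
    unlock_sketch(d);

    return ret;
}
```

Because the helper has no locking of its own, adding another error return to it cannot reintroduce a double unlock.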

3632dee2

26 Jan, 2009 5 commits

fuse: fix poll notify · f6d47a17

Committed by Miklos Szeredi on Jan 26, 2009

Move fuse_copy_finish() to before calling fuse_notify_poll_wakeup().
This is not a big issue because fuse_notify_poll_wakeup() should be
atomic, but it's cleaner this way, and later uses of notification will
need to be able to finish the copying before performing some actions.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

f6d47a17

fuse: destroy bdi on umount · 26c36791

Committed by Miklos Szeredi on Jan 26, 2009

If a fuse filesystem is unmounted but the device file descriptor
remains open and a new mount reuses the old device number, then the
mount fails with EEXIST and the following warning is printed in the
kernel log:

  WARNING: at fs/sysfs/dir.c:462 sysfs_add_one+0x35/0x3d()
  sysfs: duplicate filename '0:15' can not be created

The cause is that the bdi belonging to the fuse filesystem was
destroyed only after the device file was released.  Fix this by
calling bdi_destroy() from fuse_put_super() instead.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org

26c36791

fuse: fuse_fill_super error handling cleanup · c2b8f006

Committed by Miklos Szeredi on Jan 26, 2009

Clean up error handling for the whole of fuse_fill_super() function.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

c2b8f006

fuse: fix missing fput on error · 3ddf1e7f

Committed by Miklos Szeredi on Jan 26, 2009

Fix the leaking file reference if allocation or initialization of
fuse_conn failed.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org

3ddf1e7f

fuse: fix NULL deref in fuse_file_alloc() · bb875b38

Committed by Dan Carpenter on Jan 26, 2009

ff is set to NULL and then dereferenced on line 65.  Compile tested only.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org

bb875b38

22 Jan, 2009 22 commits
- Btrfs: do less aggressive btree readahead · a7175319
  Committed by Chris Mason on Jan 22, 2009
```
Just before reading a leaf, btrfs scans the node for blocks that are
close by and reads them too.  It tries to build up a large window
of IO looking for blocks that are within a max distance from the top
and bottom of the IO window.

This patch changes things to just look for blocks within 64k of the
target block.  It will trigger less IO and make for lower latencies on
the read side.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
```
  a7175319
- fs/Kconfig: move 9p out · 0fcb4408
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  0fcb4408
- fs/Kconfig: move afs out · b2480c7f
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  b2480c7f
- fs/Kconfig: move coda out · 33a1a6fe
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  33a1a6fe
- fs/Kconfig: move the rest of ncpfs out · 9d7d6447
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  9d7d6447
- fs/Kconfig: move smbfs out · 213a41d4
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  213a41d4
- fs/Kconfig: move sunrpc out · 9098c24f
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  9098c24f
- fs/Kconfig: move nfsd out · e2b329e2
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  e2b329e2
- fs/Kconfig: move nfs out · 97afe47a
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  97afe47a
- fs/Kconfig: move ufs out · a276a52f
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  a276a52f
- fs/Kconfig: move sysv out · 8af915ba
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  8af915ba
- fs/Kconfig: move romfs out · 41810246
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  41810246
- fs/Kconfig: move qnx4 out · 4c741583
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  4c741583
- fs/Kconfig: move hpfs out · 928ea192
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  928ea192
- fs/Kconfig: move omfs out · da55e6f9
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  da55e6f9
- fs/Kconfig: move minix out · 8b1cd7d3
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  8b1cd7d3
- fs/Kconfig: move vxfs out · 22135169
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  22135169
- fs/Kconfig: move squashfs out · 22635ec9
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  22635ec9
- fs/Kconfig: move cramfs out · 2a22783b
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  2a22783b
- fs/Kconfig: move efs out · 571f0a0b
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  571f0a0b
- fs/Kconfig: move bfs out · 0ff42384
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  0ff42384
- fs/Kconfig: move befs out · 0b09eb32
  Committed by Alexey Dobriyan on Jan 22, 2009
```
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
```
  0b09eb32
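
For the readahead commit (a7175319) in the list above, the narrowed window amounts to a simple distance test around the target block. The sketch below uses hypothetical names (`READAHEAD_WINDOW_SKETCH`, `near_target()`), takes only the 64k figure from the commit message, and leaves out the scanning of the node itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Window size taken from the commit message: 64k on either side. */
#define READAHEAD_WINDOW_SKETCH (64 * 1024)

/* True if blockptr lies within the window around target. The branch
 * keeps the unsigned subtraction from wrapping around. */
static bool near_target(uint64_t target, uint64_t blockptr)
{
    uint64_t dist = blockptr > target ? blockptr - target
                                      : target - blockptr;

    return dist <= READAHEAD_WINDOW_SKETCH;
}
```

A block exactly 64k away still qualifies in this sketch; whether the real check is inclusive is an assumption.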

openanolis / cloud-kernel: synced successfully about 1 year ago