提交 · 4f2de97acee6532b36dd6e995b858343771ad126 · openeuler / Kernel

27 3月, 2012 4 次提交

Btrfs: set page->private to the eb · 4f2de97a

由 Josef Bacik 提交于 3月 07, 2012

We spend a lot of time looking up extent buffers from pages when we could just
store the pointer to the eb the page is associated with in page->private. This
patch does just that, and it makes things a little simpler and reduces a bit of
CPU overhead involved with doing metadata IO. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

4f2de97a

Btrfs: allow metadata blocks larger than the page size · 727011e0

由 Chris Mason 提交于 8月 06, 2010

A few years ago the btrfs code to support blocks lager than
the page size was disabled to fix a few corner cases in the
page cache handling.  This fixes the code to properly support
large metadata blocks again.

Since current kernels will crash early and often with larger
metadata blocks, this adds an incompat bit so that older kernels
can't mount it.

This also does away with different blocksizes for nodes and leaves.
You get a single block size for all tree blocks.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

727011e0

Btrfs: remove search_start and search_end from find_free_extent and callers · 81c9ad23

由 Josef Bacik 提交于 1月 18, 2012

We have been passing nothing but (u64)-1 to find_free_extent for search_end in
all of the callers, so it's completely useless, and we've always been passing 0
in as search_start, so just remove them as function arguments and move
search_start into find_free_extent. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

81c9ad23

Btrfs: remove the ideal caching code · 285ff5af

由 Josef Bacik 提交于 1月 13, 2012

This is a relic from before we had the disk space cache and it was to make
bootup times when you had btrfs as root not be so damned slow. Now that we have
the disk space cache this isn't a problem anymore and really having this code
casues uneeded fragmentation and complexity, so just remove it. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

285ff5af

19 3月, 2012 1 次提交

Don't limit non-nested epoll paths · 93dc6107

由 Jason Baron 提交于 3月 16, 2012

Commit 28d82dc1 ("epoll: limit paths") that I did to limit the
number of possible wakeup paths in epoll is causing a few applications
to longer work (dovecot for one).

The original patch is really about limiting the amount of epoll nesting
(since epoll fds can be attached to other fds). Thus, we probably can
allow an unlimited number of paths of depth 1. My current patch limits
it at 1000. And enforce the limits on paths that have a greater depth.

This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578Signed-off-by: NJason Baron <jbaron@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

93dc6107

17 3月, 2012 4 次提交

nilfs2: fix NULL pointer dereference in nilfs_load_super_block() · d7178c79

由 Ryusuke Konishi 提交于 3月 16, 2012

According to the report from Slicky Devil, nilfs caused kernel oops at
nilfs_load_super_block function during mount after he shrank the
partition without resizing the filesystem:

 BUG: unable to handle kernel NULL pointer dereference at 00000048
 IP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2]
 *pde = 00000000
 Oops: 0000 [#1] PREEMPT SMP
 ...
 Call Trace:
  [<d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2]
  [<d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2]
  [<c0226636>] mount_fs+0x36/0x180
  [<c023d961>] vfs_kern_mount+0x51/0xa0
  [<c023ddae>] do_kern_mount+0x3e/0xe0
  [<c023f189>] do_mount+0x169/0x700
  [<c023fa9b>] sys_mount+0x6b/0xa0
  [<c04abd1f>] sysenter_do_call+0x12/0x28
 Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43
 20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72
 48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00
 EIP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:ca9bbdcc
 CR2: 0000000000000048

This turned out due to a defect in an error path which runs if the
calculated location of the secondary super block was invalid.

This patch fixes it and eliminates the reported oops.
Reported-by: NSlicky Devil <slicky.dvl@gmail.com>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: NSlicky Devil <slicky.dvl@gmail.com>
Cc: <stable@vger.kernel.org>	[2.6.30+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d7178c79

nilfs2: clamp ns_r_segments_percentage to [1, 99] · 3d777a64

由 Haogang Chen 提交于 3月 16, 2012

ns_r_segments_percentage is read from the disk.  Bogus or malicious
value could cause integer overflow and malfunction due to meaningless
disk usage calculation.  This patch reports error when mounting such
bogus volumes.
Signed-off-by: NHaogang Chen <haogangchen@gmail.com>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3d777a64

afs: Remote abort can cause BUG in rxrpc code · c0173863

由 Anton Blanchard 提交于 3月 16, 2012

When writing files to afs I sometimes hit a BUG:

kernel BUG at fs/afs/rxrpc.c:179!

With a backtrace of:

	afs_free_call
	afs_make_call
	afs_fs_store_data
	afs_vnode_store_data
	afs_write_back_from_locked_page
	afs_writepages_region
	afs_writepages

The cause is:

	ASSERT(skb_queue_empty(&call->rx_queue));

Looking at a tcpdump of the session the abort happens because we
are exceeding our disk quota:

	rx abort fs reply store-data error diskquota exceeded (32)

So the abort error is valid. We hit the BUG because we haven't
freed all the resources for the call.

By freeing any skbs in call->rx_queue before calling afs_free_call
we avoid hitting leaking memory and avoid hitting the BUG.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c0173863

afs: Read of file returns EBADMSG · 2c724fb9

由 Anton Blanchard 提交于 3月 16, 2012

A read of a large file on an afs mount failed:

# cat junk.file > /dev/null
cat: junk.file: Bad message

Looking at the trace, call->offset wrapped since it is only an
unsigned short. In afs_extract_data:

        _enter("{%u},{%zu},%d,,%zu", call->offset, len, last, count);
...

        if (call->offset < count) {
                if (last) {
                        _leave(" = -EBADMSG [%d < %zu]", call->offset, count);
                        return -EBADMSG;
                }

Which matches the trace:

[cat   ] ==> afs_extract_data({65132},{524},1,,65536)
[cat   ] <== afs_extract_data() = -EBADMSG [0 < 65536]

call->offset went from 65132 to 0. Fix this by making call->offset an
unsigned int.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2c724fb9

11 3月, 2012 5 次提交

A
restore smp_mb() in unlock_new_inode() · 310fa7a3
由 Al Viro 提交于 3月 10, 2012
```
wait_on_inode() doesn't have ->i_lock
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
310fa7a3

vfs: fix return value from do_last() · 7f6c7e62

由 Miklos Szeredi 提交于 3月 06, 2012

complete_walk() returns either ECHILD or ESTALE.  do_last() turns this into
ECHILD unconditionally.  If not in RCU mode, this error will reach userspace
which is complete nonsense.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7f6c7e62

vfs: fix double put after complete_walk() · 097b180c

由 Miklos Szeredi 提交于 3月 06, 2012

complete_walk() already puts nd->path, no need to do it again at cleanup time.

This would result in Oopses if triggered, apparently the codepath is not too
well exercised.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
CC: stable@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

097b180c

udf: Fix deadlock in udf_release_file() · f6940fe9

由 Jan Kara 提交于 2月 20, 2012

udf_release_file() can be called from munmap() path with mmap_sem held.  Thus
we cannot take i_mutex there because that ranks above mmap_sem. Luckily,
i_mutex is not needed in udf_release_file() anymore since protection by
i_data_sem is enough to protect from races with write and truncate.
Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
Reviewed-by: NNamjae Jeon <linkinjeon@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

f6940fe9

vfs: Correctly set the dir i_mutex lockdep class · 978d6d8c

由 Tyler Hicks 提交于 12月 12, 2011

9a7aa12f introduced additional logic around setting the i_mutex
lockdep class for directory inodes. The idea was that some filesystems
may want their own special lockdep class for different directory
inodes and calling unlock_new_inode() should not clobber one of
those special classes.

I believe that the added conditional, around the *negated* return value
of lockdep_match_class(), caused directory inodes to be placed in the
wrong lockdep class.

inode_init_always() sets the i_mutex lockdep class with i_mutex_key for
all inodes. If the filesystem did not change the class during inode
initialization, then the conditional mentioned above was false and the
directory inode was incorrectly left in the non-directory lockdep class.
If the filesystem did set a special lockdep class, then the conditional
mentioned above was true and that class was clobbered with
i_mutex_dir_key.

This patch removes the negation from the conditional so that the i_mutex
lockdep class is properly set for directory inodes. Special classes are
preserved and directory inodes with unmodified classes are set with
i_mutex_dir_key.
Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

978d6d8c

10 3月, 2012 2 次提交

aio: fix the "too late munmap()" race · c7b28555

由 Al Viro 提交于 3月 08, 2012

Current code has put_ioctx() called asynchronously from aio_fput_routine();
that's done *after* we have killed the request that used to pin ioctx,
so there's nothing to stop io_destroy() waiting in wait_for_all_aios()
from progressing.  As the result, we can end up with async call of
put_ioctx() being the last one and possibly happening during exit_mmap()
or elf_core_dump(), neither of which expects stray munmap() being done
to them...

We do need to prevent _freeing_ ioctx until aio_fput_routine() is done
with that, but that's all we care about - neither io_destroy() nor
exit_aio() will progress past wait_for_all_aios() until aio_fput_routine()
does really_put_req(), so the ioctx teardown won't be done until then
and we don't care about the contents of ioctx past that point.

Since actual freeing of these suckers is RCU-delayed, we don't need to
bump ioctx refcount when request goes into list for async removal.
All we need is rcu_read_lock held just over the ->ctx_lock-protected
area in aio_fput_routine().
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Acked-by: NBenjamin LaHaise <bcrl@kvack.org>
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c7b28555

aio: fix io_setup/io_destroy race · 86b62a2c

由 Al Viro 提交于 3月 07, 2012

Have ioctx_alloc() return an extra reference, so that caller would drop it
on success and not bother with re-grabbing it on failure exit.  The current
code is obviously broken - io_destroy() from another thread that managed
to guess the address io_setup() would've returned would free ioctx right
under us; gets especially interesting if aio_context_t * we pass to
io_setup() points to PROT_READ mapping, so put_user() fails and we end
up doing io_destroy() on kioctx another thread has just got freed...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Acked-by: NBenjamin LaHaise <bcrl@kvack.org>
Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

86b62a2c

07 3月, 2012 2 次提交

CIFS: Do not kmalloc under the flocks spinlock · d5751469

由 Pavel Shilovsky 提交于 3月 05, 2012

Reorganize the code to make the memory already allocated before
spinlock'ed loop.

Cc: stable@vger.kernel.org
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NPavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

d5751469

cifs: possible memory leak in xattr. · b0f8ef20

由 Santosh Nayak 提交于 3月 02, 2012

Memory is allocated irrespective of whether CIFS_ACL is configured
or not. But free is happenning only if CIFS_ACL is set. This is a
possible memory leak scenario.

Fix is:
Allocate and free memory only if CIFS_ACL is configured.
Signed-off-by: NSantosh Nayak <santoshprasadnayak@gmail.com>
Reviewed-by: NShirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

b0f8ef20

06 3月, 2012 4 次提交

coredump_wait: don't call complete_vfork_done() · 57b59c4a

由 Oleg Nesterov 提交于 3月 05, 2012

Now that CLONE_VFORK is killable, coredump_wait() no longer needs
complete_vfork_done().  zap_threads() should find and kill all tasks with
the same ->mm, this includes our parent if ->vfork_done is set.

mm_release() becomes the only caller, unexport complete_vfork_done().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

57b59c4a

vfork: introduce complete_vfork_done() · c415c3b4

由 Oleg Nesterov 提交于 3月 05, 2012

No functional changes.

Move the clear-and-complete-vfork_done code into the new trivial helper,
complete_vfork_done().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c415c3b4

aio: wake up waiters when freeing unused kiocbs · 880641bb

由 Jeff Moyer 提交于 3月 05, 2012

Bart Van Assche reported a hung fio process when either hot-removing
storage or when interrupting the fio process itself.  The (pruned) call
trace for the latter looks like so:

  fio             D 0000000000000001     0  6849   6848 0x00000004
   ffff880092541b88 0000000000000046 ffff880000000000 ffff88012fa11dc0
   ffff88012404be70 ffff880092541fd8 ffff880092541fd8 ffff880092541fd8
   ffff880128b894d0 ffff88012404be70 ffff880092541b88 000000018106f24d
  Call Trace:
    schedule+0x3f/0x60
    io_schedule+0x8f/0xd0
    wait_for_all_aios+0xc0/0x100
    exit_aio+0x55/0xc0
    mmput+0x2d/0x110
    exit_mm+0x10d/0x130
    do_exit+0x671/0x860
    do_group_exit+0x44/0xb0
    get_signal_to_deliver+0x218/0x5a0
    do_signal+0x65/0x700
    do_notify_resume+0x65/0x80
    int_signal+0x12/0x17

The problem lies with the allocation batching code.  It will
opportunistically allocate kiocbs, and then trim back the list of iocbs
when there is not enough room in the completion ring to hold all of the
events.

In the case above, what happens is that the pruning back of events ends
up freeing up the last active request and the context is marked as dead,
so it is thus responsible for waking up waiters.  Unfortunately, the
code does not check for this condition, so we end up with a hung task.
Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
Reported-by: NBart Van Assche <bvanassche@acm.org>
Tested-by: NBart Van Assche <bvanassche@acm.org>
Cc: <stable@kernel.org>		[3.2.x only]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

880641bb

aout: move setup_arg_pages() prior to reading/mapping the binary · 6414fa6a

由 Al Viro 提交于 3月 05, 2012

Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6414fa6a

05 3月, 2012 1 次提交

vfs: move dentry_cmp from <linux/dcache.h> to fs/dcache.c · 5483f18e

由 Linus Torvalds 提交于 3月 04, 2012

It's only used inside fs/dcache.c, and we're going to play games with it
for the word-at-a-time patches.  This time we really don't even want to
export it, because it really is an internal function to fs/dcache.c, and
has been since it was introduced.

Having it in that extremely hot header file (it's included in pretty
much everything, thanks to <linux/fs.h>) is a disaster for testing
different versions, and is utterly pointless.

We really should have some kind of header file diet thing, where we
figure out which parts of header files are really better off private and
only result in more expensive compiles.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5483f18e

03 3月, 2012 7 次提交

Btrfs: fix casting error in scrub reada code · a175423c

由 Chris Mason 提交于 2月 28, 2012

The reada code from scrub was casting down a u64 to
an unsigned long so it could insert it into a radix tree.

What it really wanted to do was cast down the result of a shift, instead
of casting down the u64.  The bug resulted in trying to insert our
reada struct into the wrong place, which caused soft lockups and other
problems.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

a175423c

btrfs: fix locking issues in find_parent_nodes() · d3b01064

由 Li Zefan 提交于 3月 03, 2012

- We might unlock head->mutex while it was not locked
- We might leave the function without unlocking delayed_refs->lock
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d3b01064

vfs: export full_name_hash() function to modules · ae942ae7

由 Linus Torvalds 提交于 3月 02, 2012

Commit 5707c87f "vfs: uninline full_name_hash()" broke the modular
build, because it needs exporting now that it isn't inlined any more.
Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ae942ae7

vfs: split up name hashing in link_path_walk() into helper function · 200e9ef7

由 Linus Torvalds 提交于 3月 02, 2012

The code in link_path_walk() that finds out the length and the hash of
the next path component is some of the hottest code in the kernel. And
I have a version of it that does things at the full width of the CPU
wordsize at a time, but that means that we *really* want to split it up
into a separate helper function.

So this re-organizes the code a bit and splits the hashing part into a
helper function called "hash_name()". It returns the length of the
pathname component, while at the same time computing and writing the
hash to the appropriate location.

The code generation is slightly changed by this patch, but generally for
the better - and the added abstraction actually makes the code easier to
read too. And the new interface is well suited for replacing just the
"hash_name()" function with alternative implementations.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

200e9ef7

vfs: uninline full_name_hash() · 0145acc2

由 Linus Torvalds 提交于 3月 02, 2012

.. and also use it in lookup_one_len() rather than open-coding it.

There aren't any performance-critical users, so inlining it is silly.
But it wouldn't matter if it wasn't for the fact that the word-at-a-time
dentry name patches want to conditionally replace the function, and
uninlining it sets the stage for that.

So again, this is a preparatory patch that doesn't change any semantics,
and only prepares for a much cleaner and testable word-at-a-time dentry
name accessor patch.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0145acc2

vfs: trivial __d_lookup_rcu() cleanups · 8966be90

由 Linus Torvalds 提交于 3月 02, 2012

These don't change any semantics, but they clean up the code a bit and
mark some arguments appropriately 'const'.

They came up as I was doing the word-at-a-time dcache name accessor
code, and cleaning this up now allows me to send out a smaller relevant
interesting patch for the experimental stuff.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8966be90

regset: Prevent null pointer reference on readonly regsets · c8e25258

由 H. Peter Anvin 提交于 3月 02, 2012

The regset common infrastructure assumed that regsets would always
have .get and .set methods, but not necessarily .active methods.
Unfortunately people have since written regsets without .set methods.

Rather than putting in stub functions everywhere, handle regsets with
null .get or .set methods explicitly.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NRoland McGrath <roland@hack.frob.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c8e25258

02 3月, 2012 1 次提交

block: Fix NULL pointer dereference in sd_revalidate_disk · fe316bf2

由 Jun'ichi Nomura 提交于 3月 02, 2012

Since 2.6.39 (1196f8b8), when a driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions() to remove
in-kernel partition structures and raise KOBJ_CHANGE uevent.

However it ends up calling driver's revalidate_disk without open
and could cause oops.

In the case of SCSI:

  process A                  process B
  ----------------------------------------------
  sys_open
    __blkdev_get
      sd_open
        returns -ENOMEDIUM
                             scsi_remove_device
                               <scsi_device torn down>
      rescan_partitions
        sd_revalidate_disk
          <oops>
Oopses are reported here:
http://marc.info/?l=linux-scsi&m=132388619710052

This patch separates the partition invalidation from rescan_partitions()
and use it for -ENOMEDIUM case.
Reported-by: NHuajun Li <huajun.li.lee@gmail.com>
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: NJens Axboe <axboe@kernel.dk>

fe316bf2

29 2月, 2012 1 次提交

ecryptfs: fix printk format warning for size_t · 164974a8

由 Randy Dunlap 提交于 2月 28, 2012

Fix printk format warning (from Linus's suggestion):

on i386:
  fs/ecryptfs/miscdev.c:433:38: warning: format '%lu' expects type 'long unsigned int', but argument 4 has type 'unsigned int'

and on x86_64:
  fs/ecryptfs/miscdev.c:433:38: warning: format '%u' expects type 'unsigned int', but argument 4 has type 'long unsigned int'
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Cc:	Geert Uytterhoeven <geert@linux-m68k.org>
Cc:	Tyler Hicks <tyhicks@canonical.com>
Cc:	Dustin Kirkland <dustin.kirkland@gazzang.com>
Cc:	ecryptfs@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

164974a8

28 2月, 2012 4 次提交

GFS2: Read resource groups on mount · a365fbf3

由 Steven Whitehouse 提交于 2月 24, 2012

This makes mount take slightly longer, but at the same time, the first
write to the filesystem will be faster too. It also means that if there
is a problem in the resource index, then we can refuse to mount rather
than having to try and report that when the first write occurs.

In addition, to avoid recursive locking, we hvae to take account of
instances when the rindex glock may already be held when we are
trying to update the rbtree of resource groups.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

a365fbf3

GFS2: Ensure rindex is uptodate for fallocate · 9e73f571

由 Bob Peterson 提交于 2月 17, 2012

This patch fixes a problem whereby gfs2_grow was failing and causing GFS2
to assert. The problem was that when GFS2's fallocate operation tried to
acquire an "allocation" it made sure the rindex was up to date, and if not,
it called gfs2_rindex_update. However, if the file being fallocated was
the rindex itself, it was already locked at that point. By calling
gfs2_rindex_update at an earlier point in time, we bring rindex up to date
and thereby avoid trying to lock it when the "allocation" is acquired.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

9e73f571

GFS2: Read in rindex if necessary during unlink · 718b97bd

由 Bob Peterson 提交于 2月 16, 2012

This patch fixes a problem whereby you were unable to delete
files until other file system operations were done (such as
statfs, touch, writes, etc.) that caused the rindex to be
read in.
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

718b97bd

GFS2: Fix race between lru_list and glock ref count · 4043b886

由 Steven Whitehouse 提交于 1月 16, 2012

This patch fixes a narrow race window between the glock ref count
hitting zero and glocks being removed from the lru_list.
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

4043b886

27 2月, 2012 2 次提交

cifs: fix dentry refcount leak when opening a FIFO on lookup · 5bccda0e

由 Jeff Layton 提交于 2月 23, 2012

The cifs code will attempt to open files on lookup under certain
circumstances. What happens though if we find that the file we opened
was actually a FIFO or other special file?

Currently, the open filehandle just ends up being leaked leading to
a dentry refcount mismatch and oops on umount. Fix this by having the
code close the filehandle on the server if it turns out not to be a
regular file. While we're at it, change this spaghetti if statement
into a switch too.

Cc: stable@vger.kernel.org
Reported-by: NCAI Qian <caiqian@redhat.com>
Tested-by: NCAI Qian <caiqian@redhat.com>
Reviewed-by: NShirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

5bccda0e

CIFS: Fix mkdir/rmdir bug for the non-POSIX case · 6de2ce42

由 Pavel Shilovsky 提交于 2月 17, 2012

Currently we do inc/drop_nlink for a parent directory for every
mkdir/rmdir calls. That's wrong when Unix extensions are disabled
because in this case a server doesn't follow the same semantic and
returns the old value on the next QueryInfo request. As the result,
we update our value with the server one and then decrement it on
every rmdir call - go to negative nlink values.

Fix this by removing inc/drop_nlink for the parent directory from
mkdir/rmdir, setting it for a revalidation and ignoring NumberOfLinks
for directories when Unix extensions are disabled.
Signed-off-by: NPavel Shilovsky <piastry@etersoft.ru>
Reviewed-by: NJeff Layton <jlayton@samba.org>
Signed-off-by: NSteve French <smfrench@gmail.com>

6de2ce42

26 2月, 2012 1 次提交

autofs: work around unhappy compat problem on x86-64 · a32744d4

由 Ian Kent 提交于 2月 22, 2012

When the autofs protocol version 5 packet type was added in commit
5c0a32fc ("autofs4: add new packet type for v5 communications"), it
obvously tried quite hard to be word-size agnostic, and uses explicitly
sized fields that are all correctly aligned.

However, with the final "char name[NAME_MAX+1]" array at the end, the
actual size of the structure ends up being not very well defined:
because the struct isn't marked 'packed', doing a "sizeof()" on it will
align the size of the struct up to the biggest alignment of the members
it has.

And despite all the members being the same, the alignment of them is
different: a "__u64" has 4-byte alignment on x86-32, but native 8-byte
alignment on x86-64. And while 'NAME_MAX+1' ends up being a nice round
number (256), the name[] array starts out a 4-byte aligned.

End result: the "packed" size of the structure is 300 bytes: 4-byte, but
not 8-byte aligned.

As a result, despite all the fields being in the same place on all
architectures, sizeof() will round up that size to 304 bytes on
architectures that have 8-byte alignment for u64.

Note that this is *not* a problem for 32-bit compat mode on POWER, since
there __u64 is 8-byte aligned even in 32-bit mode. But on x86, 32-bit
and 64-bit alignment is different for 64-bit entities, and as a result
the structure that has exactly the same layout has different sizes.

So on x86-64, but no other architecture, we will just subtract 4 from
the size of the structure when running in a compat task. That way we
will write the properly sized packet that user mode expects.

Not pretty. Sadly, this very subtle, and unnecessary, size difference
has been encoded in user space that wants to read packets of *exactly*
the right size, and will refuse to touch anything else.
Reported-and-tested-by: NThomas Meyer <thomas@m3y3r.de>
Signed-off-by: NIan Kent <raven@themaw.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a32744d4

25 2月, 2012 1 次提交

epoll: ep_unregister_pollwait() can use the freed pwq->whead · 971316f0

由 Oleg Nesterov 提交于 2月 24, 2012

signalfd_cleanup() ensures that ->signalfd_wqh is not used, but
this is not enough. eppoll_entry->whead still points to the memory
we are going to free, ep_unregister_pollwait()->remove_wait_queue()
is obviously unsafe.

Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL,
change ep_unregister_pollwait() to check pwq->whead != NULL under
rcu_read_lock() before remove_wait_queue(). We add the new helper,
ep_remove_wait_queue(), for this.

This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because
->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand.
ep_unregister_pollwait()->remove_wait_queue() can play with already
freed and potentially reused ->sighand, but this is fine. This memory
must have the valid ->signalfd_wqh until rcu_read_unlock().
Reported-by: NMaxime Bizon <mbizon@freebox.fr>
Cc: <stable@kernel.org>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

971316f0

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功