Commits · 39a4bade8c1826b658316d66ee81c09b0a4d7d42 · openanolis / cloud-kernel

17 May, 2010 6 commits

ext4: Show journal_checksum option · 39a4bade

Committed by Jan Kara on May 16, 2010

We failed to show the journal_checksum option in /proc/mounts. Fix it.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

39a4bade

ext4: Fix for ext4_mb_collect_stats() · 291dae47

Committed by Curt Wohlgemuth on May 16, 2010

Fix ext4_mb_collect_stats() to use the correct test for s_bal_success; it
should be testing "best-extent.fe_len >= orig-extent.fe_len", not
"orig-extent.fe_len >= goal-extent.fe_len".
Signed-off-by: Curt Wohlgemuth <curtw@google.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

291dae47
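The corrected comparison can be sketched in a few lines of standalone C. This is only an illustration of the test quoted in the message: the `extent_len` struct and the `is_bal_success()` helper are hypothetical names, not the kernel's own types.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the allocator's extent descriptor. */
struct extent_len {
    int fe_len; /* length of the extent in blocks */
};

/*
 * An allocation counts as a success when the best extent found is at
 * least as long as the extent that was originally requested.
 */
static bool is_bal_success(const struct extent_len *best,
                           const struct extent_len *orig)
{
    return best->fe_len >= orig->fe_len;
}
```

Comparing the original request against the goal extent instead says nothing about what the allocator actually found, which is why the old test was misleading.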

ext4: check for a good block group before loading buddy pages · 8a57d9d6

Committed by Curt Wohlgemuth on May 16, 2010

This adds a new field in ext4_group_info to cache the largest available
block range in a block group; and don't load the buddy pages until *after*
we've done a sanity check on the block group.

With large allocation requests (e.g., fallocate(), 8MiB) and relatively full
partitions, it's easy to have no block groups with a block extent large
enough to satisfy the input request length.  This currently causes the loop
during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages
for EVERY block group.  That can be a lot of pages.  The patch below allows
us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we
have to check again after we lock the block group).

Addresses-Google-Bug: #2578108
Addresses-Google-Bug: #2704453
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

8a57d9d6
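A rough model of the idea in this commit: keep a cheap per-group summary and consult it before doing the expensive buddy-page load. The struct, field, and function names below are hypothetical, and the model ignores locking and the second check under the group lock.

```c
#include <stddef.h>

/* Hypothetical per-group summary; the field name is illustrative. */
struct group_summary {
    unsigned int largest_free; /* cached largest free range, in blocks */
};

/* Counts how many groups would still need their buddy pages loaded. */
static size_t groups_to_load(const struct group_summary *groups,
                             size_t ngroups, unsigned int request_len)
{
    size_t loads = 0;

    for (size_t i = 0; i < ngroups; i++) {
        /* Cheap check first: skip groups that cannot fit the request. */
        if (groups[i].largest_free < request_len)
            continue;
        loads++; /* only at this point would the buddy pages be loaded */
    }
    return loads;
}
```

On a nearly full file system most groups fail the cheap check, so the number of page loads drops from one per group to one per plausible candidate.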

ext4: Prevent creation of files larger than RLIMIT_FSIZE using fallocate · 6d19c42b

Committed by Nikanth Karthikesan on May 16, 2010

Currently, using posix_fallocate, one can bypass an RLIMIT_FSIZE limit
and create a file larger than the limit. Add a check for that.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Amit Arora <aarora@in.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6d19c42b
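The kind of check described here can be sketched in user-space C. The helper name `check_fallocate_limit()` and the choice of `-EFBIG` as the error are assumptions made for illustration; the commit message does not spell out either.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Returns 0 if preallocating [offset, offset + len) stays within the
 * file size limit, or -EFBIG if it would exceed it.
 */
static int check_fallocate_limit(rlim_t limit, unsigned long long offset,
                                 unsigned long long len)
{
    if (limit == RLIM_INFINITY)
        return 0; /* no limit configured */

    /* Written as a subtraction so offset + len cannot overflow. */
    if (offset > limit || len > limit - offset)
        return -EFBIG;

    return 0;
}
```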

ext4: Remove extraneous newlines in ext4_msg() calls · fbe845dd

Committed by Curt Wohlgemuth on May 16, 2010

Addresses-Google-Bug: #2562325
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

fbe845dd

ext4: Print mount options when mounting and add a remount message · d4c402d9

Committed by Curt Wohlgemuth on May 16, 2010

This adds a "re-mounted" message to ext4_remount(), and both it and
the mount message in ext4_fill_super() now have the original mount
options data string.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d4c402d9

16 May, 2010 13 commits

ext4: don't use quota reservation for speculative metadata · 72b8ab9d

Committed by Eric Sandeen on May 16, 2010

Because we can badly over-reserve metadata when we
calculate worst-case, it complicates things for quota, since
we must reserve and then claim later, retry on EDQUOT, etc.
Quota is also a generally smaller pool than fs free blocks,
so this over-reservation hurts more, and more often.

I'm of the opinion that it's not the worst thing to allow
metadata to push a user slightly over quota.  This simplifies
the code and avoids the false quota rejections that result
from worst-case speculation.

This patch stops the speculative quota-charging for
worst-case metadata requirements, and just charges quota
when the blocks are allocated at writeout.  It also is
able to remove the try-again loop on EDQUOT.

This patch has been tested indirectly by running the xfstests
suite with a hack to mount & enable quota prior to the test.

I also did a more specific test of fragmenting freespace
and then doing a large delalloc write under quota; quota
stopped me at the right amount of file IO, and then the
writeout generated enough metadata (due to the fragmentation)
that it put me slightly over quota, as expected.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

72b8ab9d

quota: add the option to not fail with EDQUOT in block · 0e05842b

Committed by Eric Sandeen on May 16, 2010

To simplify metadata tracking for delalloc writes, ext4
will simply claim metadata blocks at allocation time, without
first speculatively reserving the worst case and then freeing
what was not used.

To do this, we need a mechanism to track allocations in
the quota subsystem, but potentially allow that allocation
to actually go over quota.

This patch adds a DQUOT_SPACE_NOFAIL flag and function
variants for this purpose.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0e05842b
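The effect of a "nofail" flag can be modeled with a toy quota counter. Everything below is hypothetical: the `TOY_SPACE_*` names and values, the `toy_quota` struct, and `toy_alloc_space()` only imitate the behavior the message describes (track the allocation, but let it exceed the limit when asked to) and are not the kernel's quota interface.

```c
#include <errno.h>

/* Illustrative flag values, not taken from the kernel headers. */
#define TOY_SPACE_WARN    0x1 /* record a warning when over the limit */
#define TOY_SPACE_RESERVE 0x2 /* charge the reservation, not usage */
#define TOY_SPACE_NOFAIL  0x4 /* never fail with EDQUOT */

struct toy_quota {
    long long limit;
    long long used;
    long long reserved;
    int warned;
};

/* Charges n units to the quota according to the given flags. */
static int toy_alloc_space(struct toy_quota *q, long long n, int flags)
{
    int over = q->used + q->reserved + n > q->limit;

    if (over && (flags & TOY_SPACE_WARN))
        q->warned = 1;
    if (over && !(flags & TOY_SPACE_NOFAIL))
        return -EDQUOT; /* normal behavior: reject the allocation */

    if (flags & TOY_SPACE_RESERVE)
        q->reserved += n;
    else
        q->used += n;
    return 0;
}
```

With the nofail flag set, the charge is still recorded, so usage stays accurate even though the caller was allowed past the limit.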

quota: use flags interface for dquot alloc/free space · 56246f9a

Committed by Eric Sandeen on May 16, 2010

Switch __dquot_alloc_space and __dquot_free_space to take flags
to indicate whether to warn and/or to reserve (or free reserve).

This is slightly more readable at the callpoints, and makes it
cleaner to add a "nofail" option in the next patch.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

56246f9a

ext4: init statistics after journal recovery · 84061e07

Committed by Dmitry Monakhov on May 16, 2010

Currently the block/inode/dir counters are initialized before the journal
is recovered. After journal recovery this information will probably
change, and the free-blocks count is critical for correct delalloc mode
accounting.

https://bugzilla.kernel.org/show_bug.cgi?id=15768

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

84061e07

ext4: clean up inode bitmaps manipulation in ext4_free_inode · d17413c0

Committed by Dmitry Monakhov on May 16, 2010

- Reorganize the locking scheme to batch two atomic operations into one.
  This also allows us to state that a healthy group must obey the following rule:
  ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM);
- Fix a possible undefined pointer dereference.
- Even if group descriptor stats aren't accessible we have to update
  inode bitmaps.
- Move non-group members update out of group_lock.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

d17413c0
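The consistency rule stated in the first bullet can be illustrated with a small bitmap counter. The helpers below are hypothetical stand-ins written for this sketch (they count bits rather than bytes and assume a set bit means "in use"); they are not the kernel functions named in the message.

```c
#include <stddef.h>

/* Counts the zero (free) bits among the first nbits bits of a bitmap. */
static unsigned int count_free_bits(const unsigned char *bitmap, size_t nbits)
{
    unsigned int nfree = 0;

    for (size_t i = 0; i < nbits; i++) {
        if (!(bitmap[i / 8] & (1u << (i % 8))))
            nfree++;
    }
    return nfree;
}

/* A group is consistent when the descriptor count matches the bitmap. */
static int group_is_consistent(unsigned int desc_free,
                               const unsigned char *bitmap, size_t nbits)
{
    return desc_free == count_free_bits(bitmap, nbits);
}
```

Updating the bitmap and the descriptor counter under one lock is what makes it reasonable to expect this equality to hold at all times.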

ext4: Do not zero out uninitialized extents beyond i_size · 21ca087a

Committed by Dmitry Monakhov on May 16, 2010

The extents code will sometimes zero out blocks and mark them as
initialized instead of splitting an extent into several smaller ones.
This optimization, however, causes problems if the extent is beyond
i_size because fsck will complain if there are uninitialized blocks
after i_size as this can not be distinguished from an inode that has
an incorrect i_size field.

https://bugzilla.kernel.org/show_bug.cgi?id=15742

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

21ca087a

jbd2: Improve scalability by not taking j_state_lock in jbd2_journal_stop() · c35a56a0

Committed by Theodore Ts'o on May 16, 2010

One of the most contended locks in the jbd2 layer is j_state_lock when
running dbench.  This is especially true if using the real-time kernel
with its "sleeping spinlocks" patch that replaces spinlocks with
priority inheriting mutexes --- but it also shows up on large SMP
benchmarks.

Thanks to John Stultz for pointing this out.

Reviewed by Mingming Cao and Jan Kara.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c35a56a0

ext4: don't scan/accumulate more pages than mballoc will allocate · c445e3e0

Committed by Eric Sandeen on May 16, 2010

There was a bug reported on RHEL5 that a 10G dd on a 12G box
had a very, very slow sync after that.

At issue was the loop in write_cache_pages scanning all the way
to the end of the 10G file, even though the subsequent call
to mpage_da_submit_io would only actually write a smallish amt; then
we went back to the write_cache_pages loop ... wasting tons of time
in calling __mpage_da_writepage for thousands of pages we would
just revisit (many times) later.

Upstream it's not such a big issue for sys_sync because we get
to the loop with a much smaller nr_to_write, which limits the loop.

However, talking with Aneesh he realized that fsync upstream still
gets here with a very large nr_to_write and we face the same problem.

This patch makes mpage_add_bh_to_extent stop the loop after we've
accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately
causes the write_cache_pages loop to break.

Repeating the test with a dirty_ratio of 80 (to leave something for
fsync to do), I don't see huge IO performance gains, but the reduction
in cpu usage is striking: 80% usage with stock, and 2% with the
below patch.  Instrumenting the loop in write_cache_pages clearly
shows that we are wasting time here.

Eventually we need to change mpage_da_map_pages() to also submit its I/O
to the block layer, subsuming mpage_da_submit_io(), and then change it
to call ext4_get_blocks() multiple times.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

c445e3e0
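The capped accumulation described in this commit can be modeled with a simple loop. The `toy_mpd` struct and both functions are hypothetical; only the 2048-page cap and the idea of an `io_done` flag ending the scan come from the message.

```c
#include <stddef.h>

#define MAX_ACCUMULATED_PAGES 2048 /* cap named in the commit message */

struct toy_mpd {
    size_t accumulated; /* pages gathered so far */
    int io_done;        /* set to stop the scanning loop */
};

/* Adds one page and flags io_done once the cap is reached. */
static void toy_add_page(struct toy_mpd *mpd)
{
    mpd->accumulated++;
    if (mpd->accumulated >= MAX_ACCUMULATED_PAGES)
        mpd->io_done = 1;
}

/* Scans up to ndirty pages but stops as soon as io_done is set. */
static size_t toy_scan(struct toy_mpd *mpd, size_t ndirty)
{
    size_t visited = 0;

    while (visited < ndirty && !mpd->io_done) {
        toy_add_page(mpd);
        visited++;
    }
    return visited;
}
```

Without the cap, each pass would visit every dirty page of the file even though only a small batch gets written, which is the wasted CPU time the message measures.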

ext4: stop issuing discards if not supported by device · a30eec2a

Committed by Eric Sandeen on May 16, 2010

Turn off issuance of discard requests if the device does
not support it - similar to the action we take for barriers.
This will save a little computation time if a non-discardable
device is mounted with -o discard, and also makes it obvious
that it's not doing what was asked at mount time ...
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

a30eec2a
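One way to picture the behavior: the first failed discard clears the option so later calls return early. The `toy_sb` struct, its fields, and the use of `-EOPNOTSUPP` as the "not supported" signal are assumptions for this sketch, not details given in the message.

```c
#include <errno.h>

struct toy_sb {
    int discard_enabled; /* models the "-o discard" mount option */
    int device_supports; /* whether the device accepts discards */
    int requests_issued; /* how many requests reached the device */
};

/* Issues one discard; disables the option if the device rejects it. */
static int toy_issue_discard(struct toy_sb *sb)
{
    if (!sb->discard_enabled)
        return 0; /* option is off: nothing to do */

    sb->requests_issued++;
    if (!sb->device_supports) {
        sb->discard_enabled = 0; /* stop trying from now on */
        return -EOPNOTSUPP;
    }
    return 0;
}
```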

ext4: don't return to userspace after freezing the fs with a mutex held · 6b0310fb

Committed by Eric Sandeen on May 16, 2010

ext4_freeze() used jbd2_journal_lock_updates() which takes
the j_barrier mutex, and then returns to userspace.  The
kernel does not like this:

================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
lvcreate/1075 is leaving the kernel with locks still held!
1 lock held by lvcreate/1075:
 #0:  (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>]
jbd2_journal_lock_updates+0xe1/0xf0

Use vfs_check_frozen() added to ext4_journal_start_sb() and
ext4_force_commit() instead.

Addresses-Red-Hat-Bugzilla: #568503
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

6b0310fb

ext4: symlink must be handled via filesystem specific operation · 256a4535

Committed by Dmitry Monakhov on May 16, 2010

The generic setattr implementation is no longer responsible for
quota transfer, so symlinks must be handled via ext4_setattr.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

256a4535

ext4: check s_log_groups_per_flex in online resize code · 42007efd

Committed by Eric Sandeen on May 16, 2010

If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out,
and every other access to this first tests s_log_groups_per_flex;
same thing needs to happen in resize or we'll wander off into
a null pointer when doing an online resize of the file system.

Thanks to Christoph Biedl, who came up with the trivial testcase:

# truncate --size 128M fsfile
# mkfs.ext3 -F fsfile
# tune2fs -O extents,uninit_bg,dir_index,flex_bg,huge_file,dir_nlink,extra_isize fsfile
# e2fsck -yDf -C0 fsfile
# truncate --size 132M fsfile
# losetup /dev/loop0 fsfile
# mount /dev/loop0 mnt
# resize2fs -p /dev/loop0

https://bugzilla.kernel.org/show_bug.cgi?id=13549

Reported-by: Alessandro Polverini <alex@nibbles.it>
Test-case-by: Christoph Biedl <bugzilla.kernel.bpeb@manchmal.in-ulm.de>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

42007efd
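The guard described in this commit amounts to testing the log value before touching the per-flex-group array. The struct and function below are hypothetical and only model that pattern; they are not the resize code itself.

```c
#include <stddef.h>

struct toy_sbi {
    unsigned int log_groups_per_flex; /* 0 when flex groups are not in use */
    unsigned int *flex_free_blocks;   /* NULL unless flex groups exist */
};

/*
 * Adds a new group's free blocks to its flex-group counter.
 * Returns 1 if a counter was updated, 0 if there was nothing to update.
 */
static int toy_account_new_group(struct toy_sbi *sbi, unsigned int group,
                                 unsigned int free_blocks)
{
    if (!sbi->log_groups_per_flex)
        return 0; /* the array was never filled out: do not touch it */

    sbi->flex_free_blocks[group >> sbi->log_groups_per_flex] += free_blocks;
    return 1;
}
```

Skipping the early return is what would lead to the null pointer access during an online resize.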

ext4: fix quota accounting in case of fallocate · 35121c98

Committed by Dmitry Monakhov on May 16, 2010

allocated_meta_data is already included in 'used' variable.
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

35121c98

15 May, 2010 1 commit

ext4: allow defrag (EXT4_IOC_MOVE_EXT) in 32bit compat mode · b684b2ee

Committed by Christian Borntraeger on May 15, 2010

I have an x86_64 kernel with i386 userspace. e4defrag fails on the
EXT4_IOC_MOVE_EXT ioctl because it is not wired up for the compat
case. It seems that struct move_extent is compat safe; only types
with fixed widths are used:
{
        __u32 reserved;         /* should be zero */
        __u32 donor_fd;         /* donor file descriptor */
        __u64 orig_start;       /* logical start offset in block for orig */
        __u64 donor_start;      /* logical start offset in block for donor */
        __u64 len;              /* block length to be moved */
        __u64 moved_len;        /* moved block length */
};

Let's just wire up EXT4_IOC_MOVE_EXT for the compat case.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
CC: Akira Fujita <a-fujita@rs.jp.nec.com>

b684b2ee
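The "compat safe" claim can be checked at compile time. The sketch below mirrors the field order quoted in the message using `<stdint.h>` types; the struct name `move_extent_layout` is made up for the illustration. Because the two 32-bit fields together fill eight bytes, every 64-bit field lands on the same offset whether the ABI aligns 64-bit integers to 4 or to 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as the structure quoted in the commit message. */
struct move_extent_layout {
    uint32_t reserved;    /* should be zero */
    uint32_t donor_fd;    /* donor file descriptor */
    uint64_t orig_start;  /* logical start offset in block for orig */
    uint64_t donor_start; /* logical start offset in block for donor */
    uint64_t len;         /* block length to be moved */
    uint64_t moved_len;   /* moved block length */
};

/* The layout has no padding, so 32-bit and 64-bit callers agree on it. */
_Static_assert(sizeof(struct move_extent_layout) == 40,
               "unexpected padding in move_extent layout");
_Static_assert(offsetof(struct move_extent_layout, orig_start) == 8,
               "orig_start must follow the two 32-bit fields directly");
_Static_assert(offsetof(struct move_extent_layout, moved_len) == 32,
               "moved_len must be the last 8 bytes");
```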

14 May, 2010 1 commit

ext4: rename ext4_mb_release_desc() to ext4_mb_unload_buddy() · e39e07fd

Committed by Jing Zhang on May 14, 2010

This function cleans up after ext4_mb_load_buddy(), so the renaming
makes the code clearer.
Signed-off-by: Jing Zhang <zj.barak@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

e39e07fd

13 May, 2010 1 commit

ext4: Remove unnecessary call to ext4_get_group_desc() in mballoc · 62e823a2

Committed by Jing Zhang on May 13, 2010

Signed-off-by: Jing Zhang <zj.barak@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

62e823a2

12 May, 2010 1 commit

ext4: fix memory leaks in error path handling of ext4_ext_zeroout() · b720303d

Committed by Jing Zhang on May 12, 2010

When EIO occurs after bio is submitted, there is no memory free
operation for bio, which results in memory leakage. And there is also
no check against bio_alloc() for bio.
Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Signed-off-by: Jing Zhang <zj.barak@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

b720303d

11 May, 2010 1 commit

ext4: Fix coding style in fs/ext4/move_extent.c · c26d0bad

Committed by liuqi_123 on May 11, 2010

Making sure ee_block is initialized to zero to prevent gcc from
kvetching.  It's harmless (although it's not obvious that it's
harmless) from code inspection:

fs/ext4/move_extent.c:478: warning: 'start_ext.ee_block' may be used
uninitialized in this function

Thanks to Stefan Richter for first bringing this to the attention of
linux-ext4@vger.kernel.org.
Signed-off-by: LiuQi <lingjiujianke@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>

c26d0bad

10 May, 2010 1 commit

ext4: check missed return value in ext4_sync_file() · 0671e704

Committed by Dmitry Monakhov on May 10, 2010

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

0671e704

04 May, 2010 12 commits

ocfs2: Avoid a gcc warning in ocfs2_wipe_inode(). · d577632e

Committed by Joel Becker on May 03, 2010

gcc warns that a variable is uninitialized. It's actually handled, but
an early return fools gcc. Let's just initialize the variable to a
garbage value that will crash if the usage is ever broken.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

d577632e

ceph: remove bad auth_x kmem_cache · b0930f8d

Committed by Sage Weil on April 29, 2010

It's useless, since our allocations are already a power of 2. And it was
allocated per-instance (not globally), which caused a name collision when
we tried to mount a second file system with auth_x enabled.
Signed-off-by: Sage Weil <sage@newdream.net>

b0930f8d

ceph: fix lockless caps check · 7ff899da

Committed by Sage Weil on April 23, 2010

The __ variant requires caller to hold i_lock.
Signed-off-by: Sage Weil <sage@newdream.net>

7ff899da

ceph: clear dir complete, invalidate dentry on replayed rename · ea1409f9

Committed by Sage Weil on April 28, 2010

If a rename operation is resent to the MDS following an MDS restart, the
client does not get a full reply (containing the resulting metadata) back.
In that case, ceph_rename() needs to compensate by doing anything useful
that fill_inode() would have, like d_move().

It also needs to invalidate the dentry (to work around the vfs_rename_dir()
bug) and clear the dir complete flag, just like fill_trace().
Signed-off-by: Sage Weil <sage@newdream.net>

ea1409f9

ceph: fix direct io truncate offset · 5c6a2cdb

Committed by Sage Weil on April 22, 2010

truncate_inode_pages_range wants the end offset to align with the last byte
in a page.
Signed-off-by: Sage Weil <sage@newdream.net>

5c6a2cdb
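Aligning an end offset to the last byte of its page is a one-line bit operation. The sketch below assumes a 4096-byte page and a hypothetical helper name; it shows the arithmetic only and is not taken from the patch itself.

```c
#define TOY_PAGE_SIZE 4096ULL /* assumed page size for the illustration */

/* Returns the offset of the last byte of the page that contains 'end'. */
static unsigned long long last_byte_of_page(unsigned long long end)
{
    /* Setting all the low bits moves the offset to the page's final byte. */
    return end | (TOY_PAGE_SIZE - 1);
}
```

This works because the page size is a power of two, so `TOY_PAGE_SIZE - 1` is a mask of exactly the in-page offset bits.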

ceph: discard incoming messages with bad seq # · ae18756b

Committed by Sage Weil on April 22, 2010

We can get old message seq #'s after a tcp reconnect for stateful sessions
(i.e., the MDS).  If we get a higher seq #, that is an error, and we
shouldn't see any bad seq #'s for stateless (mon, osd) connections.
Signed-off-by: Sage Weil <sage@newdream.net>

ae18756b
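The three cases in this message (expected, old, and too-high sequence numbers) can be written as a small classifier. The enum and function names are hypothetical, and the sketch assumes the next expected number is the last accepted one plus one.

```c
enum seq_verdict {
    SEQ_ACCEPT,      /* exactly the next expected message */
    SEQ_DISCARD_OLD, /* already seen, e.g. resent after a reconnect */
    SEQ_ERROR_GAP    /* higher than expected: messages went missing */
};

/* Classifies an incoming sequence number against the last accepted one. */
static enum seq_verdict classify_seq(unsigned long long in_seq,
                                     unsigned long long msg_seq)
{
    unsigned long long expected = in_seq + 1;

    if (msg_seq < expected)
        return SEQ_DISCARD_OLD;
    if (msg_seq > expected)
        return SEQ_ERROR_GAP;
    return SEQ_ACCEPT;
}
```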

ceph: fix seq counting for skipped messages · 684be25c

Committed by Sage Weil on April 21, 2010

Increment in_seq even when the message is skipped for some reason.
Signed-off-by: Sage Weil <sage@newdream.net>

684be25c

ceph: add missing #includes · d45d0d97

Committed by Sage Weil on April 20, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

d45d0d97

ceph: fix leaked spinlock during mds reconnect · 0b0c06d1

Committed by Sage Weil on April 20, 2010

Signed-off-by: Sage Weil <sage@newdream.net>

0b0c06d1

ceph: print more useful version info on module load · c8f16584

Committed by Sage Weil on April 19, 2010

Decouple the client version from the server side.  Print relevant protocol
and map version info instead.
Signed-off-by: Sage Weil <sage@newdream.net>

c8f16584

ceph: fix snap realm splits · 91dee39e

Committed by Sage Weil on April 19, 2010

The snap realm split was checking i_snap_realm, not the list_head, to
determine if an inode belonged in the new realm.  The check always failed,
which meant we always moved the inode, corrupting the old realm's list and
causing various crashes.

Also wait to release old realm reference to avoid possibility of use after
free.
Signed-off-by: Sage Weil <sage@newdream.net>

91dee39e

ceph: clear dir complete on d_move · c10f5e12

Committed by Sage Weil on April 16, 2010

d_move() reorders the d_subdirs list, breaking the readdir result caching.
Unless/until d_move preserves that ordering, clear CEPH_I_COMPLETE on
rename.
Signed-off-by: Sage Weil <sage@newdream.net>

c10f5e12

03 May, 2010 1 commit

nilfs2: fix sync silent failure · 973bec34

Committed by Ryusuke Konishi on May 03, 2010

As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set.
And nilfs does not set s_bdi anywhere.  I noticed this problem by the
warning introduced by the recent commit 5129a469 ("Catch filesystem
lacking s_bdi").

 WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e()
 Hardware name: PowerEdge 2850
 Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas
 Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38
 Call Trace:
  [<c1028422>] warn_slowpath_common+0x60/0x90
  [<c102845f>] warn_slowpath_null+0xd/0x10
  [<c1095936>] vfs_kern_mount+0xc5/0x14e
  [<c1095a03>] do_kern_mount+0x32/0xbd
  [<c10a811e>] do_mount+0x671/0x6d0
  [<c1073794>] ? __get_free_pages+0x1f/0x21
  [<c10a684f>] ? copy_mount_options+0x2b/0xe2
  [<c107b634>] ? strndup_user+0x48/0x67
  [<c10a81de>] sys_mount+0x61/0x8f
  [<c100280c>] sysenter_do_call+0x12/0x32

This ensures that s_bdi is set for nilfs and fixes the sync silent failure.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

973bec34

02 May, 2010 2 commits

NFS: Fix RCU issues in the NFSv4 delegation code · 17d2c0a0

Committed by David Howells on May 01, 2010

Fix a number of RCU issues in the NFSv4 delegation code.

 (1) delegation->cred doesn't need to be RCU protected as it's essentially an
     invariant refcounted structure.

     By the time we get to nfs_free_delegation(), the delegation is being
     released, so no one else should be attempting to use the saved
     credentials, and they can be cleared.

     However, since the list of delegations could still be under traversal at
     this point by functions such as nfs_client_return_marked_delegations(),
     the cred should be released in nfs_do_free_delegation() rather than in
     nfs_free_delegation().  Simply using rcu_assign_pointer() to clear it is
     insufficient as that doesn't stop the cred from being destroyed, nor
     does calling put_rpccred() after call_rcu(), given that the latter is
     asynchronous.

 (2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use
     rcu_dereference_protected() because they can only be called if
     nfs_client::cl_lock is held, and that guards against anyone changing
     nfsi->delegation under it.  Furthermore, the barrier imposed by
     rcu_dereference() is superfluous, given that the spin_lock() is also a
     barrier.

 (3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client
     struct so that it can issue lockdep advice based on clp->cl_lock for (2).

 (4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation()
     should use rcu_access_pointer() outside the spinlocked region as they
     merely examine the pointer and don't follow it, thus rendering unnecessary
     the need to impose a partial ordering over the one item of interest.

     These result in an RCU warning like the following:

[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by mount.nfs4/2281:
 #0:  (&type->s_umount_key#34){+.+...}, at: [<ffffffff810b25b4>] deactivate_super+0x60/0x80
 #1:  (iprune_sem){+.+...}, at: [<ffffffff810c332a>] invalidate_inodes+0x39/0x13a

stack backtrace:
Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110
Call Trace:
 [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffffa00b4591>] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs]
 [<ffffffffa0095d63>] nfs4_clear_inode+0x11/0x1e [nfs]
 [<ffffffff810c2d92>] clear_inode+0x9e/0xf8
 [<ffffffff810c3028>] dispose_list+0x67/0x10e
 [<ffffffff810c340d>] invalidate_inodes+0x11c/0x13a
 [<ffffffff810b1dc1>] generic_shutdown_super+0x42/0xf4
 [<ffffffff810b1ebe>] kill_anon_super+0x11/0x4f
 [<ffffffffa009893c>] nfs4_kill_super+0x3f/0x72 [nfs]
 [<ffffffff810b25bc>] deactivate_super+0x68/0x80
 [<ffffffff810c6744>] mntput_no_expire+0xbb/0xf8
 [<ffffffff810c681b>] release_mounts+0x9a/0xb0
 [<ffffffff810c689b>] put_mnt_ns+0x6a/0x79
 [<ffffffffa00983a1>] nfs_follow_remote_path+0x5a/0x146 [nfs]
 [<ffffffffa0098334>] ? nfs_do_root_mount+0x82/0x95 [nfs]
 [<ffffffffa00985a9>] nfs4_try_mount+0x75/0xaf [nfs]
 [<ffffffffa0098874>] nfs4_get_sb+0x291/0x31a [nfs]
 [<ffffffff810b2059>] vfs_kern_mount+0xb8/0x177
 [<ffffffff810b2176>] do_kern_mount+0x48/0xe8
 [<ffffffff810c810b>] do_mount+0x782/0x7f9
 [<ffffffff810c8205>] sys_mount+0x83/0xbe
 [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b

Also on:

fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection!
 [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffffa00b4223>] nfs_inode_set_delegation+0xfe/0x219 [nfs]
 [<ffffffffa00a9c6f>] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs]
 [<ffffffffa00aa15d>] nfs4_do_open+0x2a6/0x3a6 [nfs]
 ...

And:

fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection!
 [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
 [<ffffffffa00b3bef>] nfs_free_delegation+0x3d/0x6e [nfs]
 [<ffffffffa00b3e71>] nfs_do_return_delegation+0x26/0x30 [nfs]
 [<ffffffffa00b406a>] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs]
 [<ffffffffa00b448a>] nfs_client_return_marked_delegations+0xc9/0x124 [nfs]
 ...
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

17d2c0a0

NFSv4: Fix the locking in nfs_inode_reclaim_delegation() · 8f649c37

Committed by Trond Myklebust on May 01, 2010

Ensure that we correctly rcu-dereference the delegation itself, and that we
protect against removal while we're changing the contents.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

8f649c37

openanolis / cloud-kernel synced successfully about 1 year ago