Commits · fb045adb99d9b7c562dc7fef834857f78249daa1 · openeuler / Kernel

07 Jan, 2011 · 2 commits

fs: dcache reduce branches in lookup path · fb045adb

Committed by Nick Piggin on Jan 07, 2011

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fb045adb
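
The idea in the commit message above can be sketched in userland C. This is only an illustration under stated assumptions: the `struct dentry` and `struct dentry_operations` here are simplified stand-ins, the `OP_*` flag names and values are hypothetical, and `sample_ops` exists purely for the example. Only the name `d_set_d_op` comes from the message itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits: one per operation that may be present. */
#define OP_HASH       0x1u
#define OP_COMPARE    0x2u
#define OP_REVALIDATE 0x4u
#define OP_DELETE     0x8u

struct dentry;

/* Simplified stand-in for the operations table (signatures are assumptions). */
struct dentry_operations {
    int (*d_hash)(const struct dentry *);
    int (*d_compare)(const struct dentry *, const struct dentry *);
    int (*d_revalidate)(struct dentry *);
    int (*d_delete)(const struct dentry *);
};

struct dentry {
    unsigned int d_flags;
    const struct dentry_operations *d_op;
};

/* Record once, at assignment time, which operations are set. */
static void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op)
{
    assert(dentry != NULL);
    dentry->d_op = op;
    dentry->d_flags &= ~(OP_HASH | OP_COMPARE | OP_REVALIDATE | OP_DELETE);
    if (!op)
        return;
    if (op->d_hash)
        dentry->d_flags |= OP_HASH;
    if (op->d_compare)
        dentry->d_flags |= OP_COMPARE;
    if (op->d_revalidate)
        dentry->d_flags |= OP_REVALIDATE;
    if (op->d_delete)
        dentry->d_flags |= OP_DELETE;
}

/* Fast path: a single flag test replaces loading d_op and then
 * d_op->d_revalidate just to find out that nothing needs calling. */
static int needs_revalidate(const struct dentry *dentry)
{
    return (dentry->d_flags & OP_REVALIDATE) != 0;
}

/* Example operations table with only d_revalidate set. */
static int always_valid(struct dentry *dentry)
{
    (void)dentry;
    return 1;
}

static const struct dentry_operations sample_ops = {
    .d_revalidate = always_valid,
};
```

With `sample_ops` installed through `d_set_d_op()`, `needs_revalidate()` returns nonzero while the hash, compare and delete bits stay clear, so a lookup that cares about those operations never has to dereference `d_op`.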

fs: icache RCU free inodes · fa0d7e3d

Committed by Nick Piggin on Jan 07, 2011

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downside of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (i.e. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fa0d7e3d

05 Oct, 2010 · 1 commit

BKL: Remove BKL from fat · 3768744c

Committed by Arnd Bergmann on Sep 14, 2010

The lock_kernel in fat_put_super is not needed because
it only protects the super block itself and we know that
no other thread can reach it because we are about to
kfree the object.

In the two fill_super functions, this converts the locking
to use lock_super like elsewhere in the fat code. This
is probably not needed either, but is consistent and puts
us on the safe side.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Jan Blunck <jblunck@infradead.org>

3768744c

10 Aug, 2010 · 3 commits

convert fatfs to ->evict_inode() · deee3ce4

Committed by Al Viro on Jun 05, 2010

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

deee3ce4

get rid of cont_write_begin_newtrunc · 282dc178

Committed by Christoph Hellwig on Jun 04, 2010

Move the call to vmtruncate to get rid of excessive blocks to the callers
in preparation for the new truncate sequence and rename the non-truncating
version to cont_write_begin.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

282dc178

sort out blockdev_direct_IO variants · eafdc7d1

Committed by Christoph Hellwig on Jun 04, 2010

Move the call to vmtruncate to get rid of excessive blocks to the callers
in preparation for the new truncate calling sequence. This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it, as just open-coding the two additional
parameters is shorter than the name suffix.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

eafdc7d1

28 May, 2010 · 1 commit

fat: convert to use the new truncate convention. · 459f6ed3

Committed by npiggin@suse.de on May 27, 2010

Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

459f6ed3

25 May, 2010 · 1 commit

fatfs: ratelimit corruption report · aaa04b48

Committed by OGAWA Hirofumi on May 24, 2010

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aaa04b48

16 Mar, 2010 · 1 commit

fat: Cleanup nls_unload() usage · 1bdb6f91

Committed by OGAWA Hirofumi on Mar 16, 2010

Other users don't check for NULL explicitly. So, these don't check
either, to remove the inconsistency.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

1bdb6f91

06 Mar, 2010 · 1 commit

pass writeback_control to ->write_inode · a9185b41

Committed by Christoph Hellwig on Mar 05, 2010

This gives the filesystem more information about the writeback that
is happening.  Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by being able to
distinguish between the different callers in more detail.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a9185b41

10 Feb, 2010 · 1 commit

fat: Fix stat->f_namelen · eeb5b4ae

Committed by Kevin Dankwardt on Feb 10, 2010

I found that the length of a file name when created cannot exceed 255
characters, yet, pathconf(), via statfs(), returns the maximum as 260.
Signed-off-by: Kevin Dankwardt <k@kcomputing.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

eeb5b4ae

21 Nov, 2009 · 1 commit

fat: make discard a mount option · 681142f9

Committed by Christoph Hellwig on Nov 21, 2009

Currently shipping discard-capable SSDs and arrays have rather sub-optimal
implementations of the command, and the use of it can cause massive
slowdowns.  Make issuing these commands optional, as it already is in btrfs
and gfs2.
Signed-off-by: Christoph Hellwig <hch@lst.de>
[hirofumi@mail.parknet.co.jp: tweaks, and add "discard" to fat_show_options]
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

681142f9

24 Sep, 2009 · 1 commit

fs: Make unload_nls() NULL pointer safe · 6d729e44

Committed by Thomas Gleixner on Aug 16, 2009

Most call sites of unload_nls() do:
	if (nls)
		unload_nls(nls);

Check the pointer inside unload_nls() like we do in kfree() and
simplify the call sites.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steve French <sfrench@us.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Petr Vandrovec <vandrove@vc.cvut.cz>
Cc: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6d729e44
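
The pattern described above, moving the NULL check from every call site into the callee, might look like the following userland sketch. The `struct nls_table` here is a hypothetical stand-in that models only a reference count; the real structure and what the real function releases are not shown in the message, so treat those details as assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the NLS table: only a reference count. */
struct nls_table {
    int refcount;
};

/* NULL-safe release, in the style of free(NULL): the check lives
 * here once, so callers no longer need "if (nls)" guards. */
static void unload_nls(struct nls_table *nls)
{
    if (!nls)
        return;
    assert(nls->refcount > 0);  /* sanity check for the sketch only */
    nls->refcount--;
}
```

A call site then shrinks from the guarded two-line form quoted in the message to a bare `unload_nls(nls);`, and passing NULL is simply a no-op.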

20 Sep, 2009 · 1 commit

fat: Check s_dirt in fat_sync_fs() · ed248b29

Committed by OGAWA Hirofumi on Sep 20, 2009

If we don't check sb->s_dirt, FSINFO is updated unconditionally,
which reduces the lifetime of flash-based devices.

So, this checks sb->s_dirt. sb->s_dirt is racy, but FSINFO is just a
hint, so even if there is a race and we hit it, it would not become a
big problem.

This also serves as a workaround for the suspend problem.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

ed248b29

01 Aug, 2009 · 1 commit

vfat: change the default from shortname=lower to shortname=mixed · 95523475

Committed by Paul Wise on Aug 01, 2009

Because, with "shortname=lower", copying one FAT filesystem tree to
another FAT filesystem tree using Linux results in semantically
different filesystems. (E.g.: Filenames which were once "all
uppercase" are now "all lowercase").

So, this changes the default of "shortname=lower" to "shortname=mixed".
Signed-off-by: Paul Wise <pabs3@bonedaddy.net>
[change fat_show_options()]
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

95523475

20 Jun, 2009 · 1 commit

fat: Fix the removal of opts->fs_dmask · 3e107603

Committed by OGAWA Hirofumi on Jun 20, 2009

Commit ce3b0f8d (New helper - current_umask()) removed opts->fs_dmask,
probably through a cut-and-paste mistake or something similar.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

3e107603

12 Jun, 2009 · 5 commits

fat: add ->sync_fs · f83d6d46

Committed by Christoph Hellwig on Jun 08, 2009

Add a ->sync_fs method for data integrity syncs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f83d6d46

Sanitize ->fsync() for FAT · b522412a

Committed by Al Viro on Jun 07, 2009

* mark directory data blocks as assoc. metadata
* add new inode to deal with FAT, mark FAT blocks as assoc. metadata of that
* now ->fsync() is trivial both for files and directories
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b522412a

->write_super lock_super pushdown · ebc1ac16

Committed by Christoph Hellwig on May 11, 2009

Push down lock_super into ->write_super instances and remove it from the
caller.

The following filesystems don't need ->s_lock in ->write_super and are skipped:

 * bfs, nilfs2 - no other uses of s_lock and have internal locks in
	->write_super
 * ext2 - uses BKL in ext2_write_super and has internal calls without s_lock
 * reiserfs - no other uses of s_lock and has reiserfs_write_lock (BKL) in
	->write_super
 * xfs - no other uses of s_lock and uses internal lock (buffer lock on
	superblock buffer) to serialize ->write_super.  Also xfs_fs_write_super
	is superfluous and will go away in the next merge window
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ebc1ac16

push BKL down into ->put_super · 6cfd0148

Committed by Christoph Hellwig on May 05, 2009

Move BKL into ->put_super from the only caller.  A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment.  Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.

[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6cfd0148

remove ->write_super call in generic_shutdown_super · 8c85e125

Committed by Christoph Hellwig on Apr 28, 2009

We just did a full fs writeout using sync_filesystem before, and if
that's not enough for the filesystem it can perform its own writeout
in ->put_super, which many filesystems already do.

Move a call to foofs_write_super into every foofs_put_super for now to
guarantee identical behaviour until it's cleaned up by the individual
filesystem maintainers.

Exceptions:

 - affs already has identical copy & pasted code at the beginning of
   affs_put_super so no need to do it twice.
 - xfs does the right thing without it and I have changes pending for
   the xfs tree touching this area so I don't really need conflicts
   here..
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

8c85e125

04 Jun, 2009 · 1 commit

FAT: add 'errors' mount option · 85c78591

Committed by Denis Karpov on Jun 04, 2009

On severe errors FAT remounts itself in read-only mode. Allow
specifying the desired FAT fs behavior through the 'errors' mount
option: panic, continue or remount read-only.

`mount -t [fat|vfat] -o errors=[panic,remount-ro,continue] \
	<bdev> <mount point>`

This is analogous to the ext2 fs 'errors' mount option.
Signed-off-by: Denis Karpov <ext-denis.2.karpov@nokia.com>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

85c78591
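
As a rough illustration of the three values listed above, the following userland sketch maps the option string to a behavior. The enum names and the `parse_errors_option()` helper are hypothetical and are not taken from the patch; only the three strings `panic`, `remount-ro` and `continue` come from the message.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for the three behaviors, plus an error value. */
enum errors_mode {
    ERRORS_CONTINUE,    /* keep going after a severe error */
    ERRORS_PANIC,       /* panic on a severe error */
    ERRORS_REMOUNT_RO,  /* remount the filesystem read-only */
    ERRORS_INVALID      /* unrecognized option value */
};

/* Map the text after "errors=" to a mode. */
static enum errors_mode parse_errors_option(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "continue") == 0)
        return ERRORS_CONTINUE;
    if (strcmp(value, "panic") == 0)
        return ERRORS_PANIC;
    if (strcmp(value, "remount-ro") == 0)
        return ERRORS_REMOUNT_RO;
    return ERRORS_INVALID;
}
```

Returning a distinct `ERRORS_INVALID` value lets the caller reject a misspelled option instead of silently falling back to a default.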

03 Apr, 2009 · 1 commit

fs/fat: return f_fsid for statfs(2) · aac49b75

Committed by Coly Li on Apr 02, 2009

Make fat return f_fsid info for statfs(2).
Signed-off-by: Coly Li <coly.li@suse.de>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

aac49b75

01 Apr, 2009 · 1 commit

New helper - current_umask() · ce3b0f8d

Committed by Al Viro on Mar 29, 2009

current->fs->umask is what most of fs_struct users are doing.
Put that into a helper function.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ce3b0f8d

12 Mar, 2009 · 1 commit

Fix _fat_bmap() locking · 3a95ea11

Committed by OGAWA Hirofumi on Mar 12, 2009

On the swapon() path, i_mutex is already held. So, this uses
i_alloc_sem instead.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: Laurent GUERBY <laurent@guerby.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3a95ea11

14 Nov, 2008 · 1 commit

CRED: Wrap task credential accesses in the FAT filesystem · f0ce7ee3

Committed by David Howells on Nov 14, 2008

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id().  In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: James Morris <jmorris@namei.org>

f0ce7ee3

12 Nov, 2008 · 1 commit

fat: make sure to set d_ops in fat_get_parent · 5a6bb103

Committed by Christoph Hellwig on Nov 12, 2008

fat_get_parent needs to set up the dentry operations, otherwise we might
lose them when the NFS server needs to reconnect out of cache inodes.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

5a6bb103

07 Nov, 2008 · 9 commits

fat: ->i_pos race fix · 9ca59f4c

Committed by OGAWA Hirofumi on Nov 06, 2008

i_pos is a 64-bit value, hence updating it is not atomic.

The only important place is fat_write_inode(); the other places that
access it without the lock are just for printk().

This adds a lock for "BITS_PER_LONG == 32" kernels.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9ca59f4c

fat: mmu_private race fix · 2bdf67eb

Committed by OGAWA Hirofumi on Nov 06, 2008

mmu_private is a 64-bit value, hence updating it is not atomic.

So, the access rule for mmu_private is that we must hold ->i_mutex. But
the fat_get_block() path doesn't follow the rule on the non-allocation path.

This fixes it by using i_size instead on the non-allocation path.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2bdf67eb

fat: Fix _fat_bmap() race · fa93ca18

Committed by OGAWA Hirofumi on Nov 06, 2008

fat_get_cluster() assumes the requested blocknr isn't truncated during
read. _fat_bmap() doesn't follow this rule.

This protects it with ->i_mutex.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fa93ca18

fat: Fix ATTR_RO for directory · dfc209c0

Committed by OGAWA Hirofumi on Nov 06, 2008

FAT has the ATTR_RO (read-only) attribute. But on Windows, the ATTR_RO
of a directory is actually just ignored, and is used only by
applications as a flag. E.g. it is set for customized folders by
Explorer.

http://msdn2.microsoft.com/en-us/library/aa969337.aspx

This adds the "rodir" option. If the user specifies it, ATTR_RO is used
as the read-only flag even for directories. Otherwise, inode->i_mode
is not used to hold ATTR_RO (i.e. fat_mode_can_save_ro() returns 0).
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dfc209c0

fat: Cleanup FAT attribute stuff · 9c0aa1b8

Committed by OGAWA Hirofumi on Nov 06, 2008

This adds three helpers:

fat_make_attrs() - makes FAT attributes from inode.
fat_make_mode()  - makes mode_t from FAT attributes.
fat_save_attrs() - saves FAT attributes to inode.

Then this replaces: MSDOS_MKMODE() by fat_make_mode(), fat_attr() by
fat_make_attrs(), ->i_attrs = attr & ATTR_UNUSED by fat_save_attrs().
And for the root inode, these are used with ATTR_DIR instead of the
bogus ATTR_NONE.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9c0aa1b8

fat: use fat_detach() in fat_clear_inode() · a993b542

Committed by OGAWA Hirofumi on Nov 06, 2008

Use fat_detach() instead of open-coding it.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a993b542

fat: improve fat_hash() · d3dfa822

Committed by OGAWA Hirofumi on Nov 06, 2008

fat_hash() uses an algorithm known to be bad. This uses hash_32()
instead. The following is a summary of the test.

old hash:
	hash func (1000 times): 33489 cycles
	total inodes in hash table: 70926
	largest bucket contains: 696
	smallest bucket contains: 54

new hash:
	hash func (1000 times): 33129 cycles
	total inodes in hash table: 70926
	largest bucket contains: 315
	smallest bucket contains: 236
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d3dfa822
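
The message above does not show the new hash itself. As a hedged sketch of the general technique, a multiplicative hash that keeps the top bits of the product, the userland code below may help. The multiplier, the 8-bit table size, and the function names with the `_sketch` suffix are all assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define FAT_HASH_BITS   8            /* assumed: 2^8 buckets */
#define GOLDEN_RATIO_32 0x9e370001u  /* assumed multiplier for this sketch */

/* Multiplicative hash: multiply by a large odd constant (wrapping
 * modulo 2^32) and keep the top "bits" bits of the product. */
static uint32_t hash_32_sketch(uint32_t val, unsigned int bits)
{
    assert(bits >= 1 && bits <= 32);
    return (uint32_t)(val * GOLDEN_RATIO_32) >> (32 - bits);
}

/* Bucket index for an on-disk position; the value is truncated
 * to its low 32 bits before hashing. */
static uint32_t fat_hash_sketch(int64_t i_pos)
{
    return hash_32_sketch((uint32_t)i_pos, FAT_HASH_BITS);
}
```

Because the result is taken from the high bits of the product, inputs that differ only in their low bits still spread across buckets, which is consistent with the narrower gap between the largest and smallest bucket reported in the test summary.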

fat: Fix and cleanup timestamp conversion · 7decd1cb

Committed by OGAWA Hirofumi on Nov 06, 2008

This cleans up date_dos2unix()/fat_date_unix2dos(). The new code should
be much more readable.

This also fixes those old functions. They don't handle 2100
correctly: 2100 isn't a leap year, but the old code treats it as one.
Also, with this, the centisecond field is handled and fixed.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7decd1cb
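
The 2100 problem mentioned above comes from the "every fourth year" shortcut, which is wrong for century years not divisible by 400. A minimal userland sketch of the full Gregorian rule follows; the helper names and the 1980 to 2107 range check are assumptions for illustration (FAT timestamps count years from 1980 in a 7-bit field), not the patch's code.

```c
#include <assert.h>

/* Full Gregorian rule: divisible by 4, except century years,
 * unless the century year is also divisible by 400. */
static int is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Days from 1980-01-01 up to January 1 of "year". */
static long days_before_year(int year)
{
    long days = 0;

    assert(year >= 1980 && year <= 2107);
    for (int y = 1980; y < year; y++)
        days += is_leap_year(y) ? 366 : 365;
    return days;
}
```

With this rule 2000 and 2104 are leap years while 2100 is not, so the year 2100 contributes 365 days rather than 366 to any day count that spans it.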

fat: split include/msdos_fs.h · 9e975dae

Committed by OGAWA Hirofumi on Nov 06, 2008

This splits __KERNEL__ stuff in include/msdos_fs.h into fs/fat/fat.h.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9e975dae

31 Oct, 2008 · 1 commit

fs: remove prepare_write/commit_write · 4e02ed4b

Committed by Nick Piggin on Oct 29, 2008

Nothing uses prepare_write or commit_write. Remove them from the tree
completely.

[akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4e02ed4b

23 Oct, 2008 · 1 commit

[PATCH] switch all filesystems over to d_obtain_alias · 44003728

Committed by Christoph Hellwig on Aug 11, 2008

Switch all users of d_alloc_anon to d_obtain_alias.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

44003728

14 Oct, 2008 · 1 commit

vfs: Use const for kernel parser table · a447c093

Committed by Steven Whitehouse on Oct 13, 2008

This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where it's required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.

This was posted for review some time ago and I believe it's been in -mm
since then.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a447c093

20 Aug, 2008 · 1 commit

vfat: fix 'sync' mount deadlock due to BKL->lock_super conversion · 5f22ca9b

Committed by Linus Torvalds on Aug 20, 2008

There was another FAT BKL conversion deadlock reported by Bart
Trojanowski due to the BKL being used as a recursive lock by FAT, which
was missed because it only triggers with 'sync' (or 'dirsync') mounts.

The recursion worked for the BKL, but after the conversion to lock_super
(which uses a mutex), it just deadlocks.

Thanks to Bart for debugging this and testing the fix.  The lock
debugging information from the original report:

  =============================================
  [ INFO: possible recursive locking detected ]
  2.6.27-rc3-bisect-00448-ga7f5aaf3 #16
  ---------------------------------------------
  mv/4020 is trying to acquire lock:
   (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  but task is already holding lock:
   (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  other info that might help us debug this:
  3 locks held by mv/4020:
   #0:  (&sb->s_type->i_mutex_key#9/1){--..}, at: [<c01b2336>] do_unlinkat+0x66/0x140
   #1:  (&sb->s_type->i_mutex_key#9){--..}, at: [<c01b0954>] vfs_unlink+0x84/0x110
   #2:  (&type->s_lock_key#9){--..}, at: [<c01a90fe>] lock_super+0x1e/0x20

  stack backtrace:
  Pid: 4020, comm: mv Not tainted 2.6.27-rc3-bisect-00448-ga7f5aaf3 #16
   [<c014e694>] validate_chain+0x984/0xea0
   [<c0108d70>] ? native_sched_clock+0x0/0xf0
   [<c014ee9c>] __lock_acquire+0x2ec/0x9b0
   [<c014f5cf>] lock_acquire+0x6f/0x90
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c044e5fd>] mutex_lock_nested+0xad/0x300
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c01a90fe>] ? lock_super+0x1e/0x20
   [<c01a90fe>] lock_super+0x1e/0x20
   [<f8b3a700>] fat_write_inode+0x60/0x2b0 [fat]
   [<c0450878>] ? _spin_unlock_irqrestore+0x48/0x80
   [<f8b3a953>] ? fat_sync_inode+0x3/0x20 [fat]
   [<f8b3a962>] fat_sync_inode+0x12/0x20 [fat]
   [<f8b37c7e>] fat_remove_entries+0xbe/0x120 [fat]
   [<f8b422ef>] vfat_unlink+0x5f/0x90 [vfat]
   [<f8b42290>] ? vfat_unlink+0x0/0x90 [vfat]
   [<c01b0968>] vfs_unlink+0x98/0x110
   [<c01b2400>] do_unlinkat+0x130/0x140
   [<c016a8f5>] ? audit_syscall_entry+0x105/0x150
   [<c01b253b>] sys_unlinkat+0x3b/0x40
   [<c01040d3>] sysenter_do_call+0x12/0x3f
   =======================

where the deadlock is due to the nesting of lock_super from vfat_unlink
to fat_write_inode:

 - do_unlinkat
   - vfs_unlink
     - vfat_unlink
       * lock_super
       - fat_remove_entries
         - fat_sync_inode
           - fat_write_inode
             * lock_super

and the fix is to simply remove the use of lock_super() in fat_write_inode.

The lock_super() there had been just an automatic conversion of the
kernel lock to the superblock lock, but no locking was actually needed
there, since the code in fat_write_inode already protected all relevant
accesses with a spinlock (sbi->inode_hash_lock to be exact).  The only
code inside the BKL (and thus the superblock lock) was accesses to local
variables or calls to functions that have long been SMP-safe (i.e.
sb_bread, mark_buffer_dirty and brelse).

Bart reports:
 "Looks good.  I ran 10 parallel processes creating 1M files truncating
  them, writing to them again and then deleting them.  This patch fixes
  the issue I ran into.

  Signed-off-by: Bart Trojanowski <bart@jukie.net>"
Reported-and-tested-by: Bart Trojanowski <bart@jukie.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5f22ca9b

openeuler / Kernel · synced successfully more than 1 year ago