Commits · 772cb7c83ba256a11c7bf99a11bef3858d23767c · openeuler / Kernel

12 Jul, 2008 2 commits

ext4: New inode allocation for FLEX_BG meta-data groups. · 772cb7c8

Committed by Jose R. Santos on Jul 11, 2008

This patch mostly controls the way inodes are allocated in order to
make ialloc aware of flex_bg block group grouping.  It achieves this
by bypassing the Orlov allocator when block group meta-data are packed
together through mke2fs.  Since the impact on the block allocator is
minimal, this patch should have little or no effect on other block
allocation algorithms. By controlling the inode allocation, it can
basically control where the initial search for new blocks begins and
thus indirectly manipulate the block allocator.

This allocator favors data and meta-data locality so the disk will
gradually be filled from block group zero upward.  This helps improve
performance by reducing seek time.  Since the group of inode tables
within one flex_bg is treated as one giant inode table, uninitialized
block groups would not need to partially initialize as many inode
tables as with Orlov, which would help fsck time as the filesystem usage
goes up.
Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
Signed-off-by: Valerie Clement <valerie.clement@bull.net>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: Add commit time into the commit block · 736603ab

Committed by Theodore Ts'o on Jul 11, 2008

Carlo Wood has demonstrated that it's possible to recover deleted
files from the journal.  Something that will make this easier is if we
can put the time of the commit into the commit block.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

14 Jul, 2008 3 commits

ext4: replace __FUNCTION__ occurrences · 4db9c54a

Committed by Stoyan Gaydarov on Jul 13, 2008

__FUNCTION__ is gcc-specific, use __func__ instead
Signed-off-by: Stoyan Gaydarov <stoyboyker@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
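As a small illustration of the substitution described in this commit, the sketch below uses the standard C99 `__func__` identifier. The helper name `current_function_name` and the `sketch_warn` macro are hypothetical and do not come from the patch.

```c
#include <stdio.h>

/* Hypothetical logging macro for illustration; not taken from the patch. */
#define sketch_warn(msg) fprintf(stderr, "%s: %s\n", __func__, (msg))

/*
 * __func__ is defined by C99 as the name of the enclosing function.
 * __FUNCTION__ is a GCC extension that expands to the same string.
 */
static const char *current_function_name(void)
{
	return __func__;
}
```

Calling `current_function_name()` yields the string "current_function_name" on any C99 compiler, which is the portability argument the commit message makes.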

ext4: fix error processing in mb_free_blocks · 7e5a8cdd

Committed by Shen Feng on Jul 13, 2008

The error processing of the return value of mb_free_blocks is meaningless
because it only returns 0.  This fix includes:

- make mb_free_blocks return void

- remove the error processing part in callers

- unlock group before calling ext4_error in mb_free_blocks
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: error proc entry creation when the fs/ext4 is not correctly created · cfbe7e4f

Committed by Shen Feng on Jul 13, 2008

When the directory fs/ext4 is not correctly created under proc, the entry
under this directory should not be created.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

12 Jul, 2008 16 commits

ext4: fix build failure if DX_DEBUG is enabled · f795e140

Committed by Li Zefan on Jul 11, 2008

ext4_next_entry() is used by the debugging function dx_show_leaf(), so
it must be defined before that function.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Remove unused variable from ext4_show_options · 7ad72ca6

Committed by Theodore Ts'o on Jul 11, 2008

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Rename read_block_bitmap() to ext4_read_block_bitmap() · 574ca174

Committed by Theodore Ts'o on Jul 11, 2008

Since this is a non-static function, make it ext4-specific to avoid
potential conflicts with other filesystems.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: remove double definitions of xattr macros · 3537576a

Committed by Shen Feng on Jul 11, 2008

remove the definitions of macros XATTR_TRUSTED_PREFIX and XATTR_USER_PREFIX
since they are defined in linux/xattr.h
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: miscellaneous error checks and coding cleanups for mballoc · 74767c5a

Committed by Shen Feng on Jul 11, 2008

ext4_mb_seq_history_open(): check if sbi->s_mb_history is NULL

ext4_mb_history_init(): replace kmalloc and memset with kzalloc

ext4_mb_init_backend(): remove memset since kzalloc is used

ext4_mb_init(): the return value of ext4_mb_init_backend is int,
	but i is unsigned, replace it with a new int variable.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
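The kmalloc-plus-memset to kzalloc cleanup mentioned above has a close user-space analogue in malloc-plus-memset versus calloc. The sketch below shows that analogue only; `struct history_sketch` and both function names are hypothetical, and nothing here is the kernel code itself.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a history buffer; not the ext4 structure. */
struct history_sketch {
	int entries[8];
};

/* Before: allocate, then zero in a second step (easy to forget). */
static struct history_sketch *history_alloc_old(void)
{
	struct history_sketch *h = malloc(sizeof(*h));

	if (h)
		memset(h, 0, sizeof(*h));
	return h;
}

/* After: one call returns zeroed memory, as kzalloc() does in the kernel. */
static struct history_sketch *history_alloc_new(void)
{
	return calloc(1, sizeof(struct history_sketch));
}
```

Both functions return memory with identical contents; the second form simply removes the separate memset, which is why the commit can also drop a now-redundant memset elsewhere.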

ext4: add error processing when calling ext4_mb_init_cache in mballoc · fdf6c7a7

Committed by Shen Feng on Jul 11, 2008

Add error processing for ext4_mb_load_buddy when it calls
ext4_mb_init_cache.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Fix ext4_mb_init_cache return error · 31b481dc

Committed by Mingming Cao on Jul 11, 2008

ext4_mb_init_cache() incorrectly always returns EIO on success. This
causes the caller of ext4_mb_init_cache() to fail when it checks the return
value.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
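One common way a function ends up returning an error code on success is a pessimistic default that is never cleared. The sketch below illustrates that general pattern under that assumption; it is not the actual ext4_mb_init_cache() code, and `init_cache_sketch` and its parameters are hypothetical.

```c
#include <errno.h>

/*
 * Hypothetical initialiser: read_ok[i] is non-zero when block i was
 * read successfully. Returns 0 on success or -EIO on a failed read.
 */
static int init_cache_sketch(int nblocks, const int *read_ok)
{
	int err = -EIO;	/* pessimistic default for the error paths */
	int i;

	for (i = 0; i < nblocks; i++) {
		if (!read_ok[i])
			goto out;
	}
	/* Without this assignment every caller would see -EIO on success. */
	err = 0;
out:
	return err;
}
```

A caller that checks `if (err) return err;` then behaves correctly on both paths.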

ext4: improve some code in rb tree part of dir.c · 69baee06

Committed by Shen Feng on Jul 11, 2008

* remove unnecessary code in free_rb_tree_fname

* rename free_rb_tree_fname to ext4_htree_create_dir_info
  since it and ext4_htree_free_dir_info are a pair

* replace kmalloc with kzalloc in ext4_htree_free_dir_info

All these make the code more readable and simple.
PS: this patch is also suitable for ext3.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: switch to seq_files · 91d99827

Committed by Alexey Dobriyan on Jul 11, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

ext4: Use BUG_ON() instead of BUG() · 07d45f12

Committed by Julia Lawall on Jul 11, 2008

if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
side-effects to allow a definition of BUG_ON that drops the code completely.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@ disable unlikely @ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (unlikely(E)) { BUG(); }
+ BUG_ON(E);
)

@@ expression E,f; @@

(
  if (<... f(...) ...>) { BUG(); }
|
- if (E) { BUG(); }
+ BUG_ON(E);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
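The transformation that the semantic patch performs can be sketched in user space as follows. The `BUG()` and `BUG_ON()` macros here are simplified stand-ins written for this example (the kernel's definitions differ), and `SKETCH_NO_BUG_CHECKS` and the two division helpers are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* User-space stand-ins for the kernel macros; assumptions for illustration. */
#define BUG() do { \
	fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
	abort(); \
} while (0)

#ifdef SKETCH_NO_BUG_CHECKS
/* A configuration that drops the check: cond is never evaluated. */
#define BUG_ON(cond) do { } while (0)
#else
#define BUG_ON(cond) do { if (cond) BUG(); } while (0)
#endif

/* Before: open-coded test followed by BUG(). */
static int checked_div_old(int a, int b)
{
	if (b == 0) {
		BUG();
	}
	return a / b;
}

/* After: the same check as BUG_ON(); "b == 0" has no side effects. */
static int checked_div_new(int a, int b)
{
	BUG_ON(b == 0);
	return a / b;
}
```

The side-effect condition in the commit message matters because a definition like the `SKETCH_NO_BUG_CHECKS` variant never evaluates its argument, so a condition containing a function call would silently stop running.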

ext4: start searching for the right extent from the goal group. · ed8f9c75

Committed by Aneesh Kumar K.V on Jul 11, 2008

With mballoc we search for the best extent using different
criteria. We should always use the goal group when we are
starting with a new criteria.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: fix comments to say "ext4" · 8a35694e

Committed by Shen Feng on Jul 11, 2008

Change second/third to fourth.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Fix mb_find_next_bit not to return larger than max · e7dfb246

Committed by Aneesh Kumar K.V on Jul 11, 2008

Some architectures implement ext4_find_next_bit and
ext4_find_next_zero_bit in such a way that they return
greater than max for some input values. Make sure
mb_find_next_bit and mb_find_next_zero_bit return the
right values.

On 2.6.25 we have include/asm-x86/bitops_32.h
static inline unsigned find_first_bit(const unsigned long *addr, unsigned size)
{
	unsigned x = 0;

	while (x < size) {
		unsigned long val = *addr++;
		if (val)
			return __ffs(val) + x;
		x += (sizeof(*addr)<<3);
	}
	return x;
}

This can return a value greater than size.

Reported and fixed here for Lustre:

https://bugzilla.lustre.org/show_bug.cgi?id=15932
https://bugzilla.lustre.org/attachment.cgi?id=17205
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
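The overshoot described in this commit, and the idea of clamping the result, can be reproduced in user space. The sketch below models the quoted word-at-a-time search and wraps it; `raw_find_first_bit`, `clamped_find_first_bit` and `lowest_set_bit` are hypothetical names, and the wrapper is an illustration of the approach rather than the patched mb_find_next_bit().

```c
#include <limits.h>

#define WORD_BITS ((unsigned)(sizeof(unsigned long) * CHAR_BIT))

/* Index of the lowest set bit; val must be non-zero. */
static unsigned lowest_set_bit(unsigned long val)
{
	unsigned n = 0;

	while (!(val & 1UL)) {
		val >>= 1;
		n++;
	}
	return n;
}

/* Modelled on the quoted snippet: scans whole words, so it can overshoot. */
static unsigned raw_find_first_bit(const unsigned long *addr, unsigned size)
{
	unsigned x = 0;

	while (x < size) {
		unsigned long val = *addr++;

		if (val)
			return lowest_set_bit(val) + x;
		x += WORD_BITS;
	}
	return x;
}

/* Wrapper that never reports a position beyond size. */
static unsigned clamped_find_first_bit(const unsigned long *addr, unsigned size)
{
	unsigned ret = raw_find_first_bit(addr, size);

	return ret > size ? size : ret;
}
```

With an all-zero word and `size` of 10, the raw search returns the word width, while the wrapper returns 10, which callers can treat as "not found".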

ext4: validate directory entry data before use · f3b35f06

Committed by Duane Griffin on Jul 11, 2008

ext4_dx_find_entry uses ext4_next_entry without verifying that the entry is
valid. If its rec_len == 0 this causes an infinite loop. Refactor the loop
to check the validity of entries before checking whether they match and
moving onto the next one.

There are other uses of ext4_next_entry in this file which also look
problematic. They should be reviewed and fixed if/when we have a test-case
that triggers them.

This patch fixes the first case (image hdb.25.softlockup.gz) reported in
http://bugzilla.kernel.org/show_bug.cgi?id=10882.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
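The infinite loop caused by a zero rec_len, and the validate-before-use ordering the commit describes, can be sketched with a simplified entry format. The 3-byte header layout, the return codes and `find_entry_sketch` are assumptions made for this example and do not match the on-disk ext4 format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified header: 2 bytes rec_len (host order) + 1 byte name_len. */
#define HDR_SIZE 3u

/*
 * Returns the byte offset of the entry named "name", -1 if it is not
 * present, or -2 if a corrupted entry is found.
 */
static long find_entry_sketch(const unsigned char *block, size_t block_size,
			      const char *name)
{
	size_t want = strlen(name);
	size_t off = 0;

	while (off + HDR_SIZE <= block_size) {
		uint8_t name_len = block[off + 2];
		uint16_t rec_len;

		memcpy(&rec_len, block + off, sizeof(rec_len));

		/* Validate first: rec_len == 0 would never advance "off". */
		if (rec_len < HDR_SIZE + name_len || rec_len > block_size - off)
			return -2;

		if (name_len == want &&
		    memcmp(block + off + HDR_SIZE, name, want) == 0)
			return (long)off;

		off += rec_len;
	}
	return -1;
}
```

Checking validity before comparing names means a corrupted entry stops the walk immediately instead of being matched or skipped.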

ext4: handle deleting corrupted indirect blocks · 71dc8fbc

Committed by Duane Griffin on Jul 11, 2008

While freeing indirect blocks we attach a journal head to the parent buffer
head, free the blocks, then journal the parent. If the indirect block list
is corrupted and points to the parent the journal head will be detached
when the block is cleared, causing an OOPS.

Check for that explicitly and handle it gracefully.

This patch fixes the third case (image hdb.20000057.nullderef.gz)
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

ext4: handle corrupted orphan list at mount · 91ef4caf

Committed by Duane Griffin on Jul 11, 2008

If the orphan node list includes valid, untruncatable nodes with nlink > 0
the ext4_orphan_cleanup loop which attempts to delete them will not do so,
causing it to loop forever. Fix by checking for such nodes in the
ext4_orphan_get function.

This patch fixes the second case (image hdb.20000009.softlockup.gz)
reported in http://bugzilla.kernel.org/show_bug.cgi?id=10882.
Signed-off-by: Duane Griffin <duaneg@dghda.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

13 Jul, 2008 2 commits

cifs: fix wksidarr declaration to be big-endian friendly · 536abdb0

Committed by Jeff Layton on Jul 12, 2008

The current definition of wksidarr works fine on little endian arches
(since cpu_to_le32 is a no-op there), but on big-endian arches, it fails
to compile with this error:

error: braced-group within expression allowed only inside a function

The problem is that this static declaration has cpu_to_le32 embedded
within it, and that expands into a function macro.  We need to use
__constant_cpu_to_le32() instead.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
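The reason a static initializer needs a constant-expression conversion can be sketched as below. `CONST_SWAB32`, `CONST_CPU_TO_LE32` and `struct sid_sketch` are hypothetical names written for this example, the table values are arbitrary, and the byte-order test relies on the `__BYTE_ORDER__` macro that GCC and Clang predefine (an assumption about the compiler).

```c
#include <stdint.h>

/* Pure constant expression, so it is legal inside a static initializer. */
#define CONST_SWAB32(x) ((uint32_t)( \
	(((uint32_t)(x) & 0x000000ffu) << 24) | \
	(((uint32_t)(x) & 0x0000ff00u) <<  8) | \
	(((uint32_t)(x) & 0x00ff0000u) >>  8) | \
	(((uint32_t)(x) & 0xff000000u) >> 24)))

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define CONST_CPU_TO_LE32(x) CONST_SWAB32(x)
#else
#define CONST_CPU_TO_LE32(x) ((uint32_t)(x))
#endif

/* Hypothetical table entry holding little-endian fields. */
struct sid_sketch {
	uint32_t sub_auth[2];
};

/* File-scope table: every initializer must be a constant expression. */
static const struct sid_sketch sid_table_sketch[] = {
	{ { CONST_CPU_TO_LE32(18), 0 } },
	{ { CONST_CPU_TO_LE32(32), CONST_CPU_TO_LE32(544) } },
};
```

A conversion implemented as a statement expression or an inline function call cannot appear in such a table, which matches the "braced-group within expression" error quoted in the commit message.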

cifs: fix inode leak in cifs_get_inode_info_unix · e911d0cc

Committed by Jeff Layton on Jul 12, 2008

Try this:

    mount a share with unix extensions
    create a file on it
    umount the share

You'll get the following message in the ring buffer:

VFS: Busy inodes after unmount of cifs. Self-destruct in 5 seconds.  Have a
nice day...

...the problem is that cifs_get_inode_info_unix is creating and hashing
a new inode even when it's going to return error anyway. The first
lookup when creating a file returns an error so we end up leaking this
inode before we do the actual create. This appears to be a regression
caused by commit 0e4bbde9.

The following patch seems to fix it for me, and fixes a minor
formatting nit as well.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Jul, 2008 1 commit

Fix reference counting race on log buffers · 49641f1a

Committed by Dave Chinner on Jul 11, 2008

When we release the iclog, we do an atomic_dec_and_lock to determine if
we are the last reference and need to trigger update of log headers and
writeout. However, in xlog_state_get_iclog_space() we also need to
check if we have the last reference count there. If we do, we release
the log buffer, otherwise we decrement the reference count.

But the compare and decrement in xlog_state_get_iclog_space() is not
atomic, so both places can see a reference count of 2 and neither will
release the iclog. That leads to a filesystem hang.

Close the race by replacing the atomic_read() and atomic_dec() pair with
atomic_add_unless() to ensure that they are executed atomically.
Signed-off-by: Dave Chinner <david@fromorbit.com>
Reviewed-by: Tim Shimmin <tes@sgi.com>
Tested-by: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
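The difference between a separate read-then-decrement and a single atomic "add unless" step can be sketched with C11 atomics. `add_unless` below is a user-space approximation of the kernel's atomic_add_unless() built from a compare-exchange loop, and `put_ref_or_report_last` is a hypothetical helper; neither is the XFS code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Atomically add delta to *v unless *v equals unless_val.
 * Returns true if the addition happened.
 */
static bool add_unless(atomic_int *v, int delta, int unless_val)
{
	int cur = atomic_load(v);

	while (cur != unless_val) {
		/* On failure, cur is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + delta))
			return true;
	}
	return false;
}

/*
 * Drop one reference unless the caller holds the last one.
 * Returns true when the caller holds the last reference and must
 * perform the release itself.
 */
static bool put_ref_or_report_last(atomic_int *refcnt)
{
	return !add_unless(refcnt, -1, 1);
}
```

Because the comparison and the decrement happen in one compare-exchange, two threads can no longer both observe a count of 2 and both skip the release.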

11 Jul, 2008 2 commits

exec: fix stack excutability without PT_GNU_STACK · 96a8e13e

Committed by Hugh Dickins on Jul 10, 2008

Kernel Bugzilla #11063 points out that on some architectures (e.g. x86_32)
exec'ing an ELF without a PT_GNU_STACK program header should default to an
executable stack; but this got broken by the unlimited argv feature because
stack vma is now created before the right personality has been established:
so breaking old binaries using nested function trampolines.

Therefore re-evaluate VM_STACK_FLAGS in setup_arg_pages, where stack
vm_flags used to be set, before the mprotect_fixup. Checking through
our existing VM_flags, none would have changed since insert_vm_struct:
so this seems safer than finding a way through the personality labyrinth.

Reported-by: pageexec@freemail.hu
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: Fix flags in ocfs2_file_lock · e988cf1c

Committed by Mark Fasheh on Jul 10, 2008

The stack-glue merge changed the way we use flags in dlmglue in that we now
use the fs/dlm equivalents. Unfortunately, a merge error left the new flock
code only partially updated. This took a while to show up though, because
the lock level constants are actually identical between o2dlm and fs/dlm.
The *_CONVERT and *_NOQUEUE flags have different values though, which is
eventually causing a crash in flags_to_o2dlm().
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

09 Jul, 2008 2 commits

reiserfs: discard prealloc in reiserfs_delete_inode · eb35c218

Committed by Jeff Mahoney on Jul 08, 2008

With the removal of struct file from the xattr code,
reiserfs_file_release() isn't used anymore, so the prealloc isn't
discarded.  This causes hangs later down the line.

This patch adds it to reiserfs_delete_inode.  In most cases it will be a
no-op due to it already having been called, but will avoid hangs with
xattrs.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

NFS: Fix readdir cache invalidation · 2aac05a9

Committed by Trond Myklebust on Jul 07, 2008

invalidate_inode_pages2_range() takes page offset arguments, not byte
ranges.

Another thought is that individual pages might perhaps get evicted by VM
pressure, in which case we might perhaps want to re-read not only the
evicted page, but all subsequent pages too (in case the server returns
more/less data per page so that the alignment of the next entry
changes). We should therefore remove the condition that we only do this on
page->index==0.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
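The distinction between byte ranges and page offsets can be made concrete with a small conversion helper. This is a sketch only: `SKETCH_PAGE_SHIFT` assumes 4 KiB pages, and `struct page_range` and `bytes_to_pages` are hypothetical names that do not appear in the NFS code.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Inclusive range of page indices. */
struct page_range {
	uint64_t first;
	uint64_t last;
};

/* Convert an inclusive byte range into the inclusive page range covering it. */
static struct page_range bytes_to_pages(uint64_t start_byte, uint64_t end_byte)
{
	struct page_range r;

	r.first = start_byte >> SKETCH_PAGE_SHIFT;
	r.last = end_byte >> SKETCH_PAGE_SHIFT;
	return r;
}
```

Passing byte values where page indices are expected would address pages thousands of times further into the file than intended, which is the class of mistake the commit message points out.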

08 Jul, 2008 1 commit

[PATCH] ocfs2/dlm: Fixes oops in dlm_new_lockres() · 18c6ac38

Committed by Sunil Mushran on Jul 07, 2008

Patch fixes a race that can result in an oops while adding a
lockres to the dlm lockres tracking list.

Bug introduced by mainline commit 29576f8b.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

06 Jul, 2008 2 commits

Fix pagemap_read() use of struct mm_walk · 5d7e0d2b

Committed by Andrew Morton on Jul 05, 2008

Fix some issues in pagemap_read noted by Alexey:

- initialize pagemap_walk.mm to "mm", so the code starts working as
  advertised

- initialize ->private to "&pm" so it wouldn't immediately oops in
  pagemap_pte_hole()

- unstatic struct pagemap_walk, so two threads won't fsckup each other
  (including those started by root, including flipping ->mm when you don't
  have permissions)

- pagemap_read() contains two calls to ptrace_may_attach(), second one
  looks unneeded.

- avoid possible kmalloc(0) and integer wraparound.

Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Personally, I'd just remove the functionality entirely  - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
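The last item in the list above, avoiding a zero-sized allocation and integer wraparound, follows a general pattern that can be sketched in user space. `alloc_entries_sketch` is a hypothetical helper using malloc(); it is not the pagemap_read() code and the exact checks in the patch may differ.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for "count" entries of "entry_size" bytes each.
 * Returns NULL for an empty request or when count * entry_size
 * would wrap around.
 */
static void *alloc_entries_sketch(size_t count, size_t entry_size)
{
	if (count == 0 || entry_size == 0)
		return NULL;
	if (count > SIZE_MAX / entry_size)
		return NULL;	/* the multiplication would overflow */
	return malloc(count * entry_size);
}
```

Dividing `SIZE_MAX` by the element size before multiplying is what keeps the check itself free of overflow.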

Fix clear_refs_write() use of struct mm_walk · 20cbc972

Committed by Andrew Morton on Jul 05, 2008

Don't use a static entry, so as to prevent races during concurrent use
of this function.
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

05 Jul, 2008 7 commits

security: filesystem capabilities: fix fragile setuid fixup code · 086f7316

Committed by Andrew G. Morgan on Jul 04, 2008

This commit includes a bugfix for the fragile setuid fixup code in the
case that filesystem capabilities are supported (in access()).  The effect
of this fix is gated on filesystem capability support because changing
securebits is only supported when filesystem capabilities support is
configured.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

add kernel-doc for simple_read_from_buffer and memory_read_from_buffer · 6d1029b5

Committed by Akinobu Mita on Jul 04, 2008

Add kernel-doc comments describing simple_read_from_buffer and
memory_read_from_buffer.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ntfs: update help text · 337e2ab5

Committed by Jess Guerrero on Jul 04, 2008

The url in the help text for ntfs should be updated.
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ecryptfs: remove unnecessary mux from ecryptfs_init_ecryptfs_miscdev() · c4a2d7fb

Committed by Michael Halcrow on Jul 04, 2008

The misc_mtx should provide all the protection required to keep the daemon
hash table sane during miscdev registration.  Since this mutex is causing
gratuitous lockdep warnings, this patch removes it.
Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com>
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

reiserfs: add missing unlock to an error path in reiserfs_quota_write() · 10dd08dc

Committed by Jan Kara on Jul 04, 2008

When write in reiserfs_quota_write() fails, we have to properly release
i_mutex. One error path has been missing the unlock...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
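The class of bug fixed here, an error path that returns with a lock still held, is usually avoided with a single exit label. The sketch below shows that pattern with a toy lock that only counts acquisitions; `struct sketch_mutex`, `struct quota_file_sketch` and `quota_write_sketch` are hypothetical and stand in for i_mutex and the real quota write routine.

```c
#include <stddef.h>
#include <string.h>

/* Toy lock that only tracks whether it is held; not a real mutex. */
struct sketch_mutex {
	int held;
};

static void sketch_lock(struct sketch_mutex *m)   { m->held++; }
static void sketch_unlock(struct sketch_mutex *m) { m->held--; }

struct quota_file_sketch {
	struct sketch_mutex lock;
	char data[16];
};

/* Returns the number of bytes written, or -1 on error. */
static long quota_write_sketch(struct quota_file_sketch *f, size_t off,
			       const char *buf, size_t len)
{
	long ret;

	sketch_lock(&f->lock);
	if (off > sizeof(f->data) || len > sizeof(f->data) - off) {
		ret = -1;
		goto out;	/* error path still goes through the unlock */
	}
	memcpy(f->data + off, buf, len);
	ret = (long)len;
out:
	sketch_unlock(&f->lock);
	return ret;
}
```

Routing every return through `out` makes it hard to add a new error check that forgets the unlock.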

ext4: add missing unlock to an error path in ext4_quota_write() · 4d04e4fb

Committed by Jan Kara on Jul 04, 2008

When write in ext4_quota_write() fails, we have to properly release
i_mutex.  One error path has been missing the unlock...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext3: add missing unlock to error path in ext3_quota_write() · f5c8f7da

Committed by Jan Kara on Jul 04, 2008

When write in ext3_quota_write() fails, we have to properly release
i_mutex.  One error path has been missing the unlock...
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

03 Jul, 2008 1 commit

9p: fix O_APPEND in legacy mode · 2e4bef41

Committed by Eric Van Hensbergen on Jun 24, 2008

The legacy protocol's open operation doesn't handle an append operation
(it is expected that the client take care of it).  We were incorrectly
passing the extended protocol's flag through even in legacy mode.  This
was reported in bugzilla report #10689.  This patch fixes the problem
by disallowing extended protocol open modes from being passed in legacy
mode and implementing append functionality on the client side by adding
a seek after the open.
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
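The two-part approach described in this commit (do not send the append flag to a legacy server, and emulate it with a seek on the client) can be sketched as a small planning step. This is a simplification using POSIX open flags; `struct legacy_open_plan` and `plan_legacy_open` are hypothetical, and the real 9p client translates to its own protocol mode bits.

```c
#include <fcntl.h>
#include <stdbool.h>

/* What the client should do for an open in legacy mode. */
struct legacy_open_plan {
	int  flags;		/* flags with the append bit removed */
	bool seek_to_end;	/* client must seek to end of file after open */
};

static struct legacy_open_plan plan_legacy_open(int flags)
{
	struct legacy_open_plan plan;

	plan.seek_to_end = (flags & O_APPEND) != 0;
	plan.flags = flags & ~O_APPEND;
	return plan;
}
```

A caller would issue the open with `plan.flags` and, when `plan.seek_to_end` is set, follow it with a seek to the end of the file before the first write.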

01 Jul, 2008 1 commit

Properly notify block layer of sync writes · 18ce3751

Committed by Jens Axboe on Jul 01, 2008

fsync_buffers_list() and sync_dirty_buffer() both issue async writes and
then immediately wait on them. Conceptually, that makes them sync writes
and we should treat them as such so that the IO schedulers can handle
them appropriately.

This patch fixes a write starvation issue that Lin Ming reported, where
xx is stuck for more than 2 minutes because of a large number of
synchronous IO in the system:

INFO: task kjournald:20558 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
kjournald     D ffff810010820978  6712 20558      2
ffff81022ddb1d10 0000000000000046 ffff81022e7baa10 ffffffff803ba6f2
ffff81022ecd0000 ffff8101e6dc9160 ffff81022ecd0348 000000008048b6cb
0000000000000086 ffff81022c4e8d30 0000000000000000 ffffffff80247537
Call Trace:
[<ffffffff803ba6f2>] kobject_get+0x12/0x17
[<ffffffff80247537>] getnstimeofday+0x2f/0x83
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d195>] io_schedule+0x5d/0x9f
[<ffffffff8029c1e7>] sync_buffer+0x3b/0x3f
[<ffffffff8066d3f0>] __wait_on_bit+0x40/0x6f
[<ffffffff8029c1ac>] sync_buffer+0x0/0x3f
[<ffffffff8066d48b>] out_of_line_wait_on_bit+0x6c/0x78
[<ffffffff80243909>] wake_bit_function+0x0/0x23
[<ffffffff8029e3ad>] sync_dirty_buffer+0x98/0xcb
[<ffffffff8030056b>] journal_commit_transaction+0x97d/0xcb6
[<ffffffff8023a676>] lock_timer_base+0x26/0x4b
[<ffffffff8030300a>] kjournald+0xc1/0x1fb
[<ffffffff802438db>] autoremove_wake_function+0x0/0x2e
[<ffffffff80302f49>] kjournald+0x0/0x1fb
[<ffffffff802437bb>] kthread+0x47/0x74
[<ffffffff8022de51>] schedule_tail+0x28/0x5d
[<ffffffff8020cac8>] child_rip+0xa/0x12
[<ffffffff80243774>] kthread+0x0/0x74
[<ffffffff8020cabe>] child_rip+0x0/0x12

Lin Ming confirms that this patch fixes the issue. I've run tests with
it for the past week and no ill effects have been observed, so I'm
proposing it for inclusion into 2.6.26.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
