提交 · 5e5027bd26ed4df735d29e66cd5c1c9b5959a587 · openeuler / Kernel

05 10月, 2009 1 次提交

Revert "Seperate read and write statistics of in_flight requests" · 0f78ab98

由 Jens Axboe 提交于 10月 04, 2009

This reverts commit a9327cac.

Corrado Zoccolo <czoccolo@gmail.com> reports:

"with 2.6.32-rc1 I started getting the following strange output from
"iostat -kx 2":
Linux 2.6.31bisect (et2) 	04/10/2009 	_i686_	(2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10,70    0,00    3,16   15,75    0,00   70,38

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda              18,22     0,00    0,67    0,01    14,77     0,02
43,94     0,01   10,53 39043915,03 2629219,87
sdb              60,89     9,68   50,79    3,04  1724,43    50,52
65,95     0,70   13,06 488437,47 2629219,87

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2,72    0,00    0,74    0,00    0,00   96,53

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6,68    0,00    0,99    0,00    0,00   92,33

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4,40    0,00    0,73    1,47    0,00   93,40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
avgrq-sz avgqu-sz   await  svctm  %util
sda               0,00     0,00    0,00    0,00     0,00     0,00
0,00     0,00    0,00   0,00 100,00
sdb               0,00     4,00    0,00    3,00     0,00    28,00
18,67     0,06   19,50 333,33 100,00

Global values for service time and utilization are garbage. For
interval values, utilization is always 100%, and service time is
higher than normal.

I bisected it down to:
[a9327cac] Seperate read and write
statistics of in_flight requests
and verified that reverting just that commit indeed solves the issue
on 2.6.32-rc1."

So until this is debugged, revert the bad commit.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

0f78ab98

03 10月, 2009 2 次提交

[PATCH] ext4: retry failed direct IO allocations · fbbf6945

由 Eric Sandeen 提交于 10月 02, 2009

On a 256M filesystem, doing this in a loop:

        xfs_io -F -f -d -c 'pwrite 0 64m' test
        rm -f test

eventually leads to ENOSPC.  (the xfs_io command does a
64m direct IO write to the file "test")

As with other block allocation callers, it looks like we need to
potentially retry the allocations on the initial ENOSPC.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

fbbf6945

ext4: Fix build warning in ext4_dirty_inode() · 74072d0a

由 Curt Wohlgemuth 提交于 10月 02, 2009

This fixes the following warning:

fs/ext4/inode.c: In function 'ext4_dirty_inode':
fs/ext4/inode.c:5615: warning: unused variable 'current_handle'

We remove the jbd_debug() statement which does use current_handle, as
it's not terribly important in the grand scheme of things.

Thanks to Stephen Rothwell for pointing this out.
Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

74072d0a

02 10月, 2009 6 次提交

afs: remove cache.h · 80e50be4

由 Christoph Hellwig 提交于 10月 01, 2009

It's just a wrapper for <linux/fscache.h>, so remove it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

80e50be4

const: constify remaining file_operations · 828c0950

由 Alexey Dobriyan 提交于 10月 01, 2009

[akpm@linux-foundation.org: fix KVM]
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Acked-by: NMike Frysinger <vapier@gentoo.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

828c0950

Btrfs: fix data space leak fix · fbf19087

由 Josef Bacik 提交于 10月 01, 2009

There is a problem where page_mkwrite can be called on a dirtied page that
already has a delalloc range associated with it.  The fix is to clear any
delalloc bits for the range we are dirtying so the space accounting gets
handled properly.  This is the same thing we do in the normal write case, so we
are consistent across the board.  With this patch we no longer leak reserved
space.
Signed-off-by: NJosef Bacik <jbacik@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

fbf19087

fs/bio.c: move EXPORT* macros to line after function · a112a71d

由 H Hartley Sweeten 提交于 9月 26, 2009

As mentioned in Documentation/CodingStyle, move EXPORT* macro's
to the line immediately after the closing function brace line.
Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

a112a71d

Btrfs: remove duplicates of filemap_ helpers · 8aa38c31

由 Christoph Hellwig 提交于 10月 01, 2009

Use filemap_fdatawrite_range and filemap_fdatawait_range instead of
local copies of the functions.  For filemap_fdatawait_range that
also means replacing the awkward old wait_on_page_writeback_range
calling convention with the regular filemap byte offsets.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

8aa38c31

Btrfs: take i_mutex before generic_write_checks · ab93dbec

由 Chris Mason 提交于 10月 01, 2009

btrfs_file_write was incorrectly calling generic_write_checks without
taking i_mutex.  This lead to problems with racing around i_size when
doing O_APPEND writes.

The fix here is to move i_mutex higher.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ab93dbec

01 10月, 2009 3 次提交

Btrfs: fix arguments to btrfs_wait_on_page_writeback_range · 35d62a94

由 Christoph Hellwig 提交于 9月 30, 2009

wait_on_page_writeback_range/btrfs_wait_on_page_writeback_range takes
a pagecache offset, not a byte offset into the file.  Shift the arguments
around to wait for the correct range
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

35d62a94

ext4: drop ext4dev compat · f0e2dfa7

由 Eric Sandeen 提交于 10月 01, 2009

Kconfig & super.c promised it'd be gone by 2.6.31, so it's
about time to drop it.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f0e2dfa7

ext4: fix a BUG_ON crash by checking that page has buffers attached to it · 1f94533d

由 Theodore Ts'o 提交于 9月 30, 2009

In ext4_num_dirty_pages() we were calling page_buffers() before
checking to see if the page actually had pages attached to it; this
would cause a BUG check crash in the inline function page_buffers().

Thanks to Markus Trippelsdorf for reporting this bug.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

1f94533d

30 9月, 2009 9 次提交

ext4: Fix time encoding with extra epoch bits · c1fccc06

由 Theodore Ts'o 提交于 9月 30, 2009

"Looking at ext4.h, I think the setting of extra time fields forgets to
mask the epoch bits so the epoch part overwrites nsec part. The second
change is only for coherency (2 -> EXT4_EPOCH_BITS)."

Thanks to Damien Guibouret for pointing out this problem.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

c1fccc06

jbd2: Use tracepoints for history file · bf699327

由 Theodore Ts'o 提交于 9月 30, 2009

The /proc/fs/jbd2/<dev>/history was maintained manually; by using
tracepoints, we can get all of the existing functionality of the /proc
file plus extra capabilities thanks to the ftrace infrastructure.  We
save memory as a bonus.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

bf699327

ext4: Use tracepoints for mb_history trace file · 296c355c

由 Theodore Ts'o 提交于 9月 30, 2009

The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.  

By ripping out the mb_history code and replacing it with ftrace
tracepoints, and we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

296c355c

Btrfs: fix deadlock with free space handling and user transactions · dd7e0b7b

由 Sage Weil 提交于 9月 29, 2009

If an ioctl-initiated transaction is open, we can't force a commit during
the free space checks in order to free up pinned extents or else we
deadlock.  Just ENOSPC instead.

A more satisfying solution that reserves space for the entire user
transaction up front is forthcoming...
Signed-off-by: NSage Weil <sage@newdream.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

dd7e0b7b

Btrfs: fix error cases for ioctl transactions · 1ab86aed

由 Sage Weil 提交于 9月 29, 2009

Fix leak of vfsmount write reference and open_ioctl_trans reference on
ENOMEM.  Clean up the error paths while we're at it.
Signed-off-by: NSage Weil <sage@newdream.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1ab86aed

ext4, jbd2: Drop unneeded printks at mount and unmount time · 90576c0b

由 Theodore Ts'o 提交于 9月 29, 2009

There are a number of kernel printk's which are printed when an ext4
filesystem is mounted and unmounted.  Disable them to economize space
in the system logs.  In addition, disabling the mballoc stats by
default saves a number of unneeded atomic operations for every block
allocation or deallocation.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

90576c0b

Btrfs: Use CONFIG_BTRFS_POSIX_ACL to enable ACL code · 3baf0bed

由 Chris Ball 提交于 9月 29, 2009

We've already defined CONFIG_BTRFS_POSIX_ACL in Kconfig, but we're
currently not using it and are testing CONFIG_FS_POSIX_ACL instead.
CONFIG_FS_POSIX_ACL states "Never use this symbol for ifdefs".
Signed-off-by: NChris Ball <cjb@laptop.org>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

3baf0bed

Btrfs: introduce missing kfree · fd2696f3

由 Julia Lawall 提交于 9月 29, 2009

Error handling code following a kzalloc should free the allocated data.

The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,f1,l;
position p1,p2;
expression *ptr != NULL;
@@

x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
<... when != x
     when != if (...) { <+...x...+> }
(
x->f1 = E
|
 (x->f1 == NULL || ...)
|
 f(...,x->f1,...)
)
...>
(
 return \(0\|<+...x...+>\|ptr\);
|
 return@p2 ...;
)

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>
Signed-off-by: NJulia Lawall <julia@diku.dk>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

fd2696f3

Btrfs: Fix setting umask when POSIX ACLs are not enabled · 49cf6f45

由 Chris Ball 提交于 9月 29, 2009

We currently set sb->s_flags |= MS_POSIXACL unconditionally, which is
incorrect -- it tells the VFS that it shouldn't set umask because we
will, yet we don't set it ourselves if we aren't using POSIX ACLs, so
the umask ends up ignored.
Signed-off-by: NChris Ball <cjb@laptop.org>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

49cf6f45

29 9月, 2009 1 次提交

ext4: Handle nested ext4_journal_start/stop calls without a journal · d3d1faf6

由 Curt Wohlgemuth 提交于 9月 29, 2009

This patch fixes a problem with handling nested calls to
ext4_journal_start/ext4_journal_stop, when there is no journal present.
Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

d3d1faf6

30 9月, 2009 1 次提交

ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode · f3dc272f

由 Curt Wohlgemuth 提交于 9月 29, 2009

This patch a problem that ext4_dirty_inode() was not calling
ext4_mark_inode_dirty() if the current_handle is not valid, which it
is the case in no journal mode.

It also removes a test for non-matching transaction which can never
happen.
Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f3dc272f

29 9月, 2009 9 次提交

ext4: Avoid updating the inode table bh twice in no journal mode · 830156c7

由 Frank Mayhar 提交于 9月 29, 2009

This is a cleanup of commit 91ac6f43.  Since ext4_mark_inode_dirty()
has already called ext4_mark_iloc_dirty(), which in turn calls
ext4_do_update_inode(), it's not necessary to have ext4_write_inode()
call ext4_do_update_inode() in no journal mode.  Indeed, it would be
duplicated work.
Reviewed-by: N"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NFrank Mayhar <fmayhar@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

830156c7

nilfs2: fix missing initialization of i_dir_start_lookup member · 3cc811bf

由 Ryusuke Konishi 提交于 9月 28, 2009

The i_dir_start_lookup field in nilfs_inode_info objects should be
cleared when the objects are allocated, but the the initialization was
missing in case of reading from disk.  This adds the initialization.

Since the variable just gives a start page on directory lookups, the
bug was nonfatal until now.
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

3cc811bf

nilfs2: fix missing zero-fill initialization of btree node cache · 1f28fcd9

由 Ryusuke Konishi 提交于 9月 28, 2009

This will fix file system corruption which infrequently happens after
mount.  The problem was reported from users with the title "[NILFS
users] Fail to mount NILFS." (Message-ID:
<200908211918.34720.yuri@itinteg.net>), and so forth.  I've also
experienced the corruption multiple times on kernel 2.6.30 and 2.6.31.

The problem turned out to be caused due to discordance between
mapping->nrpages of a btree node cache and the actual number of pages
hung on the cache; if the mapping->nrpages becomes zero even as it has
pages, truncate_inode_pages() returns without doing anything.  Usually
this is harmless except it may cause page leak, but garbage collection
fairly infrequently sees a stale page remained in the btree node cache
of DAT (i.e. disk address translation file of nilfs), and induces the
corruption.

I identified a missing initialization in btree node caches was the
root cause.  This corrects the bug.

I've tested this for kernel 2.6.30 and 2.6.31.
Reported-by: NYuri Chislov <yuri@itinteg.net>
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable <stable@kernel.org>

1f28fcd9

Btrfs: proper -ENOSPC handling · 9ed74f2d

由 Josef Bacik 提交于 9月 11, 2009

At the start of a transaction we do a btrfs_reserve_metadata_space() and
specify how many items we plan on modifying. Then once we've done our
modifications and such, just call btrfs_unreserve_metadata_space() for
the same number of items we reserved.

For keeping track of metadata needed for data I've had to add an extent_io op
for when we merge extents. This lets us track space properly when we are doing
sequential writes, so we don't end up reserving way more metadata space than
what we need.

The only place where the metadata space accounting is not done is in the
relocation code. This is because Yan is going to be reworking that code in the
near future, so running btrfs-vol -b could still possibly result in a ENOSPC
related panic. This patch also turns off the metadata_ratio stuff in order to
allow users to more efficiently use their disk space.

This patch makes it so we track how much metadata we need for an inode's
delayed allocation extents by tracking how many extents are currently
waiting for allocation. It introduces two new callbacks for the
extent_io tree's, merge_extent_hook and split_extent_hook. These help
us keep track of when we merge delalloc extents together and split them
up. Reservations are handled prior to any actually dirty'ing occurs,
and then we unreserve after we dirty.

btrfs_unreserve_metadata_for_delalloc() will make the appropriate
unreservations as needed based on the number of reservations we
currently have and the number of extents we currently have. Doing the
reservation outside of doing any of the actual dirty'ing lets us do
things like filemap_flush() the inode to try and force delalloc to
happen, or as a last resort actually start allocation on all delalloc
inodes in the fs. This has survived dbench, fs_mark and an fsx torture
test.
Signed-off-by: NJosef Bacik <jbacik@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9ed74f2d

ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first · f3ce8064

由 Theodore Ts'o 提交于 9月 28, 2009

Move the check to make sure the original and donor inodes are
different earlier, to avoid a potential deadlock by trying to lock the
same inode twice.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

f3ce8064

ext4: async direct IO for holes and fallocate support · 8d5d02e6

由 Mingming Cao 提交于 9月 28, 2009

For async direct IO that covers holes or fallocate, the end_io
callback function now queued the convertion work on workqueue but
don't flush the work rightaway as it might take too long to afford.

But when fsync is called after all the data is completed, user expects
the metadata also being updated before fsync returns.

Thus we need to flush the conversion work when fsync() is called.
This patch keep track of a listed of completed async direct io that
has a work queued on workqueue.  When fsync() is called, it will go
through the list and do the conversion.
Signed-off-by: NMingming Cao <cmm@us.ibm.com>

8d5d02e6

ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O · 4c0425ff

由 Mingming Cao 提交于 9月 28, 2009

Currently the DIO VFS code passes create = 0 when writing to the
middle of file.  It does this to avoid block allocation for holes, so
as not to expose stale data out when there is a parallel buffered read
(which does not hold the i_mutex lock).  Direct I/O writes into holes
falls back to buffered IO for this reason.

Since preallocated extents are treated as holes when doing a
get_block() look up (buffer is not mapped), direct IO over fallocate
also falls back to buffered IO.  Thus ext4 actually silently falls
back to buffered IO in above two cases, which is undesirable.

To fix this, this patch creates unitialized extents when a direct I/O
write into holes in sparse files, and registering an end_io callback which
converts the uninitialized extent to an initialized extent after the
I/O is completed.
Singed-Off-By: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

4c0425ff

ext4: Split uninitialized extents for direct I/O · 0031462b

由 Mingming Cao 提交于 9月 28, 2009

When writing into an unitialized extent via direct I/O, and the direct
I/O doesn't exactly cover the unitialized extent, split the extent
into uninitialized and initialized extents before submitting the I/O.
This avoids needing to deal with an ENOSPC error in the end_io
callback that gets used for direct I/O.

When the IO is complete, the written extent will be marked as initialized.

Singed-Off-By: Mingming Cao <cmm@us.ibm.com> 
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

0031462b

ext4: release reserved quota when block reservation for delalloc retry · 9f0ccfd8

由 Mingming Cao 提交于 9月 28, 2009

ext4_da_reserve_space() can reserve quota blocks multiple times if
ext4_claim_free_blocks() fail and we retry the allocation. We should
release the quota reservation before restarting.

Bug found by Jan Kara.
Signed-off-by: NMingming Cao <cmm@us.ibm.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

9f0ccfd8

30 9月, 2009 1 次提交

ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks · 55138e0b

由 Theodore Ts'o 提交于 9月 29, 2009

Work around problems in the writeback code to force out writebacks in
larger chunks than just 4mb, which is just too small.  This also works
around limitations in the ext4 block allocator, which can't allocate
more than 2048 blocks at a time.  So we need to defeat the round-robin
characteristics of the writeback code and try to write out as many
blocks in one inode before allowing the writeback code to move on to
another inode.  We add a a new per-filesystem tunable,
max_writeback_mb_bump, which caps this to a default of 128mb per
inode.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

55138e0b

28 9月, 2009 2 次提交

ext4: Fix hueristic which avoids group preallocation for closed files · 71780577

由 Theodore Ts'o 提交于 9月 28, 2009

The hueristic was designed to avoid using locality group preallocation
when writing the last segment of a closed file.  Fix it by move
setting size to the maximum of size and isize until after we check
whether size == isize.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

71780577

const: mark struct vm_struct_operations · f0f37e2f

由 Alexey Dobriyan 提交于 9月 27, 2009

* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f0f37e2f

27 9月, 2009 1 次提交

ext4: Use ext4_msg() for ext4_da_writepage() errors · 1693918e

由 Theodore Ts'o 提交于 9月 26, 2009

This allows the user to see what filesystem was involved with a
particular ext4_da_writepage() error.  Also, use KERN_CRIT which is
more appropriate than KERN_EMERG.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

1693918e

26 9月, 2009 4 次提交

writeback: pass in super_block to bdi_start_writeback() · a72bfd4d

由 Jens Axboe 提交于 9月 26, 2009

Sometimes we only want to write pages from a specific super_block,
so allow that to be passed in.

This fixes a problem with commit 56a131dc
causing writeback on all super_blocks on a bdi, where we only really
want to sync a specific sb from writeback_inodes_sb().
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

a72bfd4d

cifs: fix locking and list handling code in cifs_open and its helper · 3321b791

由 Jeff Layton 提交于 9月 25, 2009

The patch to remove cifs_init_private introduced a locking imbalance. It
didn't remove the leftover list addition code and the unlocking in that
function. cifs_new_fileinfo does the list addition now, so there should
be no need to do it outside of that function.

pCifsInode will never be NULL, so we don't need to check for that. This
patch also gets rid of the ugly locking and unlocking across function
calls.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Acked-by: NSteve French <sfrench@us.ibm.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

3321b791

writeback: writeback_inodes_sb() should use bdi_start_writeback() · 56a131dc

由 Jens Axboe 提交于 9月 25, 2009

Pointless to iterate other devices looking for a super, when
we have a bdi mapping.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

56a131dc

writeback: don't delay inodes redirtied by a fast dirtier · b3af9468

由 Wu Fengguang 提交于 9月 25, 2009

Debug traces show that in per-bdi writeback, the inode under writeback
almost always get redirtied by a busy dirtier.  We used to call
redirty_tail() in this case, which could delay inode for up to 30s.

This is unacceptable because it now happens so frequently for plain cp/dd,
that the accumulated delays could make writeback of big files very slow.

So let's distinguish between data redirty and metadata only redirty.
The first one is caused by a busy dirtier, while the latter one could
happen in XFS, NFS, etc. when they are doing delalloc or updating isize.

The inode being busy dirtied will now be requeued for next io, while
the inode being redirtied by fs will continue to be delayed to avoid
repeated IO.

CC: Jan Kara <jack@suse.cz>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Dave Chinner <david@fromorbit.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

b3af9468

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功