Commits · 1c4115e595dec42aa0e81ba47ef46e35b34ed428 · openeuler / raspberrypi-kernel

Sep 30, 2009 · 4 commits

ext4: Fix time encoding with extra epoch bits · c1fccc06

Committed by Theodore Ts'o on Sep 30, 2009

"Looking at ext4.h, I think the setting of extra time fields forgets to
mask the epoch bits so the epoch part overwrites nsec part. The second
change is only for coherency (2 -> EXT4_EPOCH_BITS)."

Thanks to Damien Guibouret for pointing out this problem.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
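
To make the bug concrete, here is a minimal user-space sketch of the extra-time encoding. It assumes the layout implied above: the low EXT4_EPOCH_BITS (2) bits of the 32-bit extra field carry seconds bits 32 and 33, and the remaining 30 bits carry nanoseconds. The macro names follow ext4.h, but the functions are illustrative stand-ins and not the kernel's code. For simplicity, the decoder treats the low 32 seconds bits as unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Layout of the 32-bit "extra" time field (names as in ext4.h):
 * bits 0-1 carry seconds bits 32-33, bits 2-31 carry nanoseconds. */
#define EXT4_EPOCH_BITS 2
#define EXT4_EPOCH_MASK ((1u << EXT4_EPOCH_BITS) - 1)
#define EXT4_NSEC_MASK  (~0u << EXT4_EPOCH_BITS)

/* Fixed encoder: the epoch is masked, so it cannot spill into nsec. */
static uint32_t encode_extra_time(int64_t sec, uint32_t nsec)
{
    uint32_t epoch = (uint32_t)((uint64_t)sec >> 32) & EXT4_EPOCH_MASK;

    assert(nsec < 1000000000u);   /* fits in the 30 nsec bits */
    return epoch | ((nsec << EXT4_EPOCH_BITS) & EXT4_NSEC_MASK);
}

/* Buggy encoder: unmasked high seconds bits land in the nsec bits. */
static uint32_t encode_extra_time_unmasked(int64_t sec, uint32_t nsec)
{
    return (uint32_t)((uint64_t)sec >> 32) | (nsec << EXT4_EPOCH_BITS);
}

/* Rebuild seconds from the 32-bit base field and the extra field. */
static int64_t decode_sec(uint32_t low32, uint32_t extra)
{
    return (int64_t)(((uint64_t)(extra & EXT4_EPOCH_MASK) << 32) | low32);
}

static uint32_t decode_nsec(uint32_t extra)
{
    return (extra & EXT4_NSEC_MASK) >> EXT4_EPOCH_BITS;
}
```

Consider a seconds value whose upper bits exceed the 2-bit epoch range, such as `5LL << 32`. The unmasked encoder leaks a bit into the nanosecond field, so the decoded nsec comes back as 1 instead of 0.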

jbd2: Use tracepoints for history file · bf699327

Committed by Theodore Ts'o on Sep 30, 2009

The /proc/fs/jbd2/<dev>/history was maintained manually; by using
tracepoints, we can get all of the existing functionality of the /proc
file plus extra capabilities thanks to the ftrace infrastructure.  We
save memory as a bonus.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Use tracepoints for mb_history trace file · 296c355c

Committed by Theodore Ts'o on Sep 30, 2009

The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a
number of problems: it required a largish amount of memory to be
allocated for each ext4 filesystem, and the s_mb_history_lock
introduced a CPU contention problem.

By ripping out the mb_history code and replacing it with ftrace
tracepoints, we get more functionality: timestamps, event
filtering, the ability to correlate mballoc history with other ext4
tracepoints, etc.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4, jbd2: Drop unneeded printks at mount and unmount time · 90576c0b

Committed by Theodore Ts'o on Sep 29, 2009

There are a number of kernel printk's which are printed when an ext4
filesystem is mounted and unmounted.  Disable them to economize space
in the system logs.  In addition, disabling the mballoc stats by
default saves a number of unneeded atomic operations for every block
allocation or deallocation.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Sep 29, 2009 · 1 commit

ext4: Handle nested ext4_journal_start/stop calls without a journal · d3d1faf6

Committed by Curt Wohlgemuth on Sep 29, 2009

This patch fixes a problem with handling nested calls to
ext4_journal_start/ext4_journal_stop, when there is no journal present.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
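
The commit message does not spell out how nesting is tracked. This sketch shows one way to make nested start/stop calls safe without a journal, under that assumption. It keeps a per-task nesting depth in the slot that would normally hold the handle pointer, and treats small values as "not a real handle". `task_journal_info`, `handle_valid()`, `NOJOURNAL_MAX_REF` and the two helpers are hypothetical names in a user-space model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model. `task_journal_info` stands in for the per-task
 * journal handle slot. With no journal, it holds a nesting depth cast
 * to a pointer; real handles are assumed to be addresses at or above
 * NOJOURNAL_MAX_REF, so small values can never be mistaken for one. */
#define NOJOURNAL_MAX_REF ((uintptr_t)4096)

typedef struct handle handle_t;          /* opaque; never dereferenced */
static handle_t *task_journal_info;

static int handle_valid(const handle_t *h)
{
    return (uintptr_t)h >= NOJOURNAL_MAX_REF;
}

/* Start without a journal: bump the depth instead of allocating. */
static handle_t *nojournal_start(void)
{
    uintptr_t depth = (uintptr_t)task_journal_info;

    assert(depth + 1 < NOJOURNAL_MAX_REF);  /* nesting too deep */
    task_journal_info = (handle_t *)(depth + 1);
    return task_journal_info;
}

/* Stop without a journal: only the outermost stop clears the slot. */
static void nojournal_stop(handle_t *h)
{
    uintptr_t depth = (uintptr_t)h;

    assert(depth > 0 && !handle_valid(h));  /* must be a fake handle */
    task_journal_info = (handle_t *)(depth - 1);
}
```

In this model, each stop restores the previous depth. An inner stop therefore leaves the outer caller's state intact, and the slot returns to NULL only after the outermost stop.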

Sep 30, 2009 · 1 commit

ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode · f3dc272f

Committed by Curt Wohlgemuth on Sep 29, 2009

This patch fixes a problem where ext4_dirty_inode() was not calling
ext4_mark_inode_dirty() if the current_handle is not valid, which is
the case in no journal mode.

It also removes a test for a non-matching transaction, which can never
happen.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Sep 29, 2009 · 8 commits

ext4: Avoid updating the inode table bh twice in no journal mode · 830156c7

Committed by Frank Mayhar on Sep 29, 2009

This is a cleanup of commit 91ac6f43.  Since ext4_mark_inode_dirty()
has already called ext4_mark_iloc_dirty(), which in turn calls
ext4_do_update_inode(), it's not necessary to have ext4_write_inode()
call ext4_do_update_inode() in no journal mode.  Indeed, it would be
duplicated work.
Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

nilfs2: fix missing initialization of i_dir_start_lookup member · 3cc811bf

Committed by Ryusuke Konishi on Sep 28, 2009

The i_dir_start_lookup field in nilfs_inode_info objects should be
cleared when the objects are allocated, but the initialization was
missing when reading from disk.  This adds the initialization.

Since the variable just gives a start page on directory lookups, the
bug was nonfatal until now.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

nilfs2: fix missing zero-fill initialization of btree node cache · 1f28fcd9

Committed by Ryusuke Konishi on Sep 28, 2009

This will fix file system corruption which infrequently happens after
mount.  The problem was reported by users with the title "[NILFS
users] Fail to mount NILFS." (Message-ID:
<200908211918.34720.yuri@itinteg.net>), and so forth.  I've also
experienced the corruption multiple times on kernel 2.6.30 and 2.6.31.

The problem turned out to be caused by a discrepancy between
mapping->nrpages of a btree node cache and the actual number of pages
hung on the cache; if mapping->nrpages becomes zero even though the
cache still has pages, truncate_inode_pages() returns without doing
anything.  Usually this is harmless apart from a possible page leak,
but garbage collection fairly infrequently sees a stale page remaining
in the btree node cache of the DAT (i.e. the disk address translation
file of nilfs), and this induces the corruption.

I identified a missing initialization in btree node caches as the
root cause.  This corrects the bug.

I've tested this for kernel 2.6.30 and 2.6.31.
Reported-by: Yuri Chislov <yuri@itinteg.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: stable <stable@kernel.org>

ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first · f3ce8064

Committed by Theodore Ts'o on Sep 28, 2009

Move the check that the original and donor inodes are different
earlier, to avoid a potential deadlock from trying to lock the
same inode twice.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
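
Here is a minimal pthreads sketch of the ordering fix. The identity check runs before either lock is taken, because a thread that locks the same default (non-recursive) mutex twice deadlocks. `struct inode_model` and both helpers are hypothetical. Taking the two locks in address order is an extra assumption, a common idiom for pairwise locking; the commit message does not state it.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode with a non-recursive lock. */
struct inode_model {
    pthread_mutex_t lock;
};

/* Check first, then lock. Taking the same default mutex twice from one
 * thread deadlocks, so orig == donor must be rejected before any lock.
 * Address ordering (an assumption, not from the commit) keeps two
 * callers with swapped arguments from deadlocking against each other. */
static int move_ext_lock(struct inode_model *orig, struct inode_model *donor)
{
    struct inode_model *first = orig, *second = donor;

    if (orig == donor)
        return -EINVAL;
    if ((uintptr_t)first > (uintptr_t)second) {
        first = donor;
        second = orig;
    }
    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    return 0;
}

static void move_ext_unlock(struct inode_model *orig, struct inode_model *donor)
{
    pthread_mutex_unlock(&orig->lock);
    pthread_mutex_unlock(&donor->lock);
}
```

Older toolchains may need `-pthread` when linking.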

ext4: async direct IO for holes and fallocate support · 8d5d02e6

Committed by Mingming Cao on Sep 28, 2009

For async direct IO that covers holes or fallocate, the end_io
callback function now queues the conversion work on a workqueue but
doesn't flush the work right away, as doing so might take too long.

But when fsync is called after all the data is completed, the user
expects the metadata to be updated as well before fsync returns.

Thus we need to flush the conversion work when fsync() is called.
This patch keeps track of a list of completed async direct IOs that
have work queued on the workqueue.  When fsync() is called, it will go
through the list and do the conversion.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
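
The bookkeeping described above can be sketched as a per-inode list. The end_io path only records a completed IO whose extent still needs conversion, and fsync() drains the list before returning. This is a simplified single-threaded model with hypothetical names (`dio_completion`, `dio_end_io()`, `fsync_flush_conversions()`). It omits the case where the workqueue runs conversions concurrently, which the real code has to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of a completed async direct IO whose extent
 * still has to be converted from uninitialized to initialized. */
struct dio_completion {
    int converted;                 /* 1 once the extent is initialized */
    struct dio_completion *next;   /* link in the per-inode list */
};

struct inode_model {
    struct dio_completion *pending; /* completed IOs awaiting conversion */
};

/* end_io path: don't convert inline; just record the pending work. */
static void dio_end_io(struct inode_model *inode, struct dio_completion *io)
{
    io->converted = 0;
    io->next = inode->pending;
    inode->pending = io;
}

/* The deferred conversion (a workqueue item in the real code). */
static void convert_extent(struct dio_completion *io)
{
    io->converted = 1;
}

/* fsync path: drain the list so metadata is current before returning.
 * Returns the number of conversions performed here. */
static int fsync_flush_conversions(struct inode_model *inode)
{
    int n = 0;

    while (inode->pending) {
        struct dio_completion *io = inode->pending;

        inode->pending = io->next;
        convert_extent(io);
        n++;
    }
    return n;
}
```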

ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O · 4c0425ff

Committed by Mingming Cao on Sep 28, 2009

Currently the DIO VFS code passes create = 0 when writing to the
middle of a file.  It does this to avoid block allocation for holes, so
as not to expose stale data when there is a parallel buffered read
(which does not hold the i_mutex lock).  Direct I/O writes into holes
fall back to buffered IO for this reason.

Since preallocated extents are treated as holes when doing a
get_block() lookup (the buffer is not mapped), direct IO over fallocate
also falls back to buffered IO.  Thus ext4 actually silently falls
back to buffered IO in the above two cases, which is undesirable.

To fix this, this patch creates uninitialized extents when a direct I/O
writes into holes in sparse files, and registers an end_io callback which
converts the uninitialized extent to an initialized extent after the
I/O is completed.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: Split uninitialized extents for direct I/O · 0031462b

Committed by Mingming Cao on Sep 28, 2009

When writing into an uninitialized extent via direct I/O, and the direct
I/O doesn't exactly cover the uninitialized extent, split the extent
into uninitialized and initialized extents before submitting the I/O.
This avoids needing to deal with an ENOSPC error in the end_io
callback that gets used for direct I/O.

When the IO is complete, the written extent will be marked as initialized.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
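
Here is a small sketch of the split. It assumes an extent is a half-open range of logical blocks and that the direct I/O lies entirely inside it. Before submission, the extent is cut into at most three pieces. Completion then only flips a flag, so nothing that can fail with ENOSPC is left for the end_io path. The types and function names are hypothetical. Following the second paragraph above, the written piece stays uninitialized until the I/O completes.

```c
#include <assert.h>

/* Hypothetical extent: logical blocks [start, start + len). */
struct extent {
    unsigned start, len;
    int uninit;  /* 1 = allocated but not yet written (reads as zeros) */
};

/* Split uninitialized extent `ex` around a direct IO covering
 * [io_start, io_start + io_len). Writes up to three pieces to `out`
 * (head, IO range, tail) and returns how many were produced. All the
 * allocation work happens here, before the IO is submitted. */
static int split_for_dio(struct extent ex, unsigned io_start,
                         unsigned io_len, struct extent out[3])
{
    unsigned ex_end = ex.start + ex.len, io_end = io_start + io_len;
    int n = 0;

    assert(ex.uninit && io_len > 0);
    assert(io_start >= ex.start && io_end <= ex_end);
    if (io_start > ex.start)
        out[n++] = (struct extent){ ex.start, io_start - ex.start, 1 };
    out[n++] = (struct extent){ io_start, io_len, 1 };
    if (io_end < ex_end)
        out[n++] = (struct extent){ io_end, ex_end - io_end, 1 };
    return n;
}

/* end_io: just mark the written piece initialized; no allocation. */
static void dio_complete(struct extent *written)
{
    written->uninit = 0;
}
```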

ext4: release reserved quota when block reservation for delalloc retry · 9f0ccfd8

Committed by Mingming Cao on Sep 28, 2009

ext4_da_reserve_space() can reserve quota blocks multiple times if
ext4_claim_free_blocks() fails and we retry the allocation. We should
release the quota reservation before restarting.

Bug found by Jan Kara.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
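
The fix amounts to undoing the quota reservation on every failed attempt. The sketch below models quota and free blocks as plain counters. `struct fs_model`, `da_reserve_space()`, and the rule that each retry frees one block are hypothetical stand-ins, chosen only so the retry path can be exercised.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical counters standing in for quota and free-block state. */
struct fs_model {
    long quota_reserved;  /* blocks charged against the user's quota */
    long free_blocks;     /* blocks that can still be claimed */
    int retries_left;     /* how many times a failed claim may retry */
};

static int claim_free_blocks(struct fs_model *fs, long n)
{
    if (fs->free_blocks < n)
        return -ENOSPC;
    fs->free_blocks -= n;
    return 0;
}

/* Reserve quota, then claim blocks; release the quota before retrying
 * so that each attempt charges it exactly once. */
static int da_reserve_space(struct fs_model *fs, long n)
{
    for (;;) {
        fs->quota_reserved += n;
        if (claim_free_blocks(fs, n) == 0)
            return 0;
        fs->quota_reserved -= n;   /* the fix: undo before retrying */
        if (fs->retries_left-- <= 0)
            return -ENOSPC;
        fs->free_blocks += 1;      /* model only: a retry frees a block */
    }
}
```

Start from 3 free blocks and reserve 4. The first claim fails and the retry succeeds. With the release in place, `quota_reserved` ends at 4; without it, the model would charge 8.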

Sep 30, 2009 · 1 commit

ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks · 55138e0b

Committed by Theodore Ts'o on Sep 29, 2009

Work around problems in the writeback code to force out writebacks in
larger chunks than just 4MB, which is just too small.  This also works
around limitations in the ext4 block allocator, which can't allocate
more than 2048 blocks at a time.  So we need to defeat the round-robin
characteristics of the writeback code and try to write out as many
blocks as possible in one inode before allowing the writeback code to
move on to another inode.  We add a new per-filesystem tunable,
max_writeback_mb_bump, which caps this to a default of 128MB per
inode.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
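
The arithmetic behind the tunable is a megabyte-to-page conversion. Assuming 4 KiB pages (a page shift of 12), 4MB is 1024 pages and the 128MB default is 32768 pages. The sketch below raises a writeback budget to the number of dirty pages, capped at `max_writeback_mb_bump`. The helper names are hypothetical, and the bump rule simplifies what the commit describes.

```c
#include <assert.h>

/* Convert a size in MB to pages for a given page shift (12 = 4 KiB). */
static long mb_to_pages(long mb, int page_shift)
{
    return mb << (20 - page_shift);
}

/* Hypothetical bump rule: raise the budget to cover the inode's dirty
 * pages, but never beyond max_writeback_mb_bump, and never lower it. */
static long bumped_nr_to_write(long nr_to_write, long dirty_pages,
                               long max_writeback_mb_bump, int page_shift)
{
    long cap = mb_to_pages(max_writeback_mb_bump, page_shift);
    long desired = dirty_pages < cap ? dirty_pages : cap;

    return nr_to_write < desired ? desired : nr_to_write;
}
```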

Sep 28, 2009 · 2 commits

ext4: Fix heuristic which avoids group preallocation for closed files · 71780577

Committed by Theodore Ts'o on Sep 28, 2009

The heuristic was designed to avoid using locality group preallocation
when writing the last segment of a closed file.  Fix it by moving the
step that sets size to the maximum of size and isize until after we
check whether size == isize.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
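
A small decision function shows the ordering problem. `size` is the end of the requested range and `isize` the file size, both in blocks. `file_closed` and `stream_threshold` are hypothetical parameters standing in for "no writers" and the filesystem's stream-allocation cutoff. If the max() ran first, `size == isize` would also hold for any write ending before EOF, and the closed-file check would fire far too often.

```c
#include <assert.h>

enum alloc_hint { HINT_NOPREALLOC, HINT_STREAM, HINT_GROUP };

/* Hypothetical sketch of the decision order described above. */
static enum alloc_hint choose_prealloc(unsigned long size,
                                       unsigned long isize,
                                       int file_closed,
                                       unsigned long stream_threshold)
{
    /* Test for "last segment of a closed file" BEFORE widening size. */
    if (size == isize && file_closed)
        return HINT_NOPREALLOC;

    if (isize > size)
        size = isize;               /* size = max(size, isize) */
    if (size >= stream_threshold)
        return HINT_STREAM;         /* large file: no group prealloc */
    return HINT_GROUP;
}
```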

const: mark struct vm_struct_operations · f0f37e2f

Committed by Alexey Dobriyan on Sep 27, 2009

* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
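
For readers unfamiliar with the pattern, here is a minimal sketch of a const operations table. Because the pointer in the owning struct is `const`-qualified, the table itself can be defined `static const`. `struct vm_ops_model` and `struct vma_model` are hypothetical stand-ins for vm_operations_struct and vm_area_struct. A general benefit of the pattern, not claimed in the commit message, is that the compiler can place such tables in read-only data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table in the style of vm_operations_struct. */
struct vm_ops_model {
    int (*fault)(unsigned long addr);
};

struct vma_model {
    const struct vm_ops_model *vm_ops;  /* pointer to const ops */
};

static int demo_fault(unsigned long addr)
{
    return addr == 0 ? -1 : 0;
}

/* `static const` lets the table live in read-only data. */
static const struct vm_ops_model demo_vm_ops = {
    .fault = demo_fault,
};

static int vma_fault(const struct vma_model *vma, unsigned long addr)
{
    if (vma->vm_ops == NULL || vma->vm_ops->fault == NULL)
        return -1;
    return vma->vm_ops->fault(addr);
}
```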

Sep 27, 2009 · 1 commit

ext4: Use ext4_msg() for ext4_da_writepage() errors · 1693918e

Committed by Theodore Ts'o on Sep 26, 2009

This allows the user to see what filesystem was involved with a
particular ext4_da_writepage() error.  Also, use KERN_CRIT which is
more appropriate than KERN_EMERG.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Sep 26, 2009 · 13 commits

writeback: pass in super_block to bdi_start_writeback() · a72bfd4d

Committed by Jens Axboe on Sep 26, 2009

Sometimes we only want to write pages from a specific super_block,
so allow that to be passed in.

This fixes a problem with commit 56a131dc
causing writeback on all super_blocks on a bdi, where we only really
want to sync a specific sb from writeback_inodes_sb().
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
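
This sketch shows what passing a super_block buys: one writeback pass over a device's dirty inodes. A NULL filter means every super_block; a non-NULL filter restricts the pass to that filesystem, which is what writeback_inodes_sb() wants. The structures and `bdi_writeback_pass()` are hypothetical models, not the kernel's interfaces.

```c
#include <assert.h>
#include <stddef.h>

struct sb_model { int id; };

struct inode_model {
    struct sb_model *sb;
    int dirty;
};

/* One pass over a device's dirty inodes. A NULL `only_sb` means any
 * super_block on the device; otherwise inodes from other super_blocks
 * are skipped. Returns the number of inodes "written". */
static int bdi_writeback_pass(struct inode_model *inodes, size_t n,
                              const struct sb_model *only_sb)
{
    int written = 0;

    for (size_t i = 0; i < n; i++) {
        if (!inodes[i].dirty)
            continue;
        if (only_sb && inodes[i].sb != only_sb)
            continue;
        inodes[i].dirty = 0;
        written++;
    }
    return written;
}
```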

cifs: fix locking and list handling code in cifs_open and its helper · 3321b791

Committed by Jeff Layton on Sep 25, 2009

The patch to remove cifs_init_private introduced a locking imbalance. It
didn't remove the leftover list addition code and the unlocking in that
function. cifs_new_fileinfo does the list addition now, so there should
be no need to do it outside of that function.

pCifsInode will never be NULL, so we don't need to check for that. This
patch also gets rid of the ugly locking and unlocking across function
calls.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

writeback: writeback_inodes_sb() should use bdi_start_writeback() · 56a131dc

Committed by Jens Axboe on Sep 25, 2009

Pointless to iterate other devices looking for a super, when
we have a bdi mapping.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: don't delay inodes redirtied by a fast dirtier · b3af9468

Committed by Wu Fengguang on Sep 25, 2009

Debug traces show that in per-bdi writeback, the inode under writeback
almost always gets redirtied by a busy dirtier.  We used to call
redirty_tail() in this case, which could delay the inode for up to 30s.

This is unacceptable because it now happens so frequently for plain cp/dd,
that the accumulated delays could make writeback of big files very slow.

So let's distinguish between data redirty and metadata only redirty.
The first one is caused by a busy dirtier, while the latter one could
happen in XFS, NFS, etc. when they are doing delalloc or updating isize.

The inode being busy dirtied will now be requeued for next io, while
the inode being redirtied by fs will continue to be delayed to avoid
repeated IO.

CC: Jan Kara <jack@suse.cz>
CC: Theodore Ts'o <tytso@mit.edu>
CC: Dave Chinner <david@fromorbit.com>
CC: Chris Mason <chris.mason@oracle.com>
CC: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
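
The new classification fits in a small decision function. The flag and list names below are hypothetical stand-ins. The sketch encodes only the rule stated above: data redirtied by a busy writer goes to the next IO pass, and metadata-only redirtying by the filesystem keeps the delay.

```c
#include <assert.h>

/* Hypothetical dirtiness bits for an inode redirtied during writeback. */
#define REDIRTY_DATA      0x1  /* data pages dirtied again (busy writer) */
#define REDIRTY_METADATA  0x2  /* only metadata, e.g. fs updating isize */

enum requeue_target {
    REQUEUE_NONE,         /* clean: nothing more to do */
    REQUEUE_NEXT_IO,      /* retry on the next IO pass, no delay */
    REQUEUE_DELAYED       /* redirty_tail()-style delay */
};

static enum requeue_target after_writeback(unsigned redirty)
{
    if (redirty & REDIRTY_DATA)
        return REQUEUE_NEXT_IO;   /* keep streaming a busy file */
    if (redirty & REDIRTY_METADATA)
        return REQUEUE_DELAYED;   /* avoid repeated IO for fs redirty */
    return REQUEUE_NONE;
}
```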

writeback: make the super_block pinning more efficient · 9ecc2738

Committed by Jens Axboe on Sep 24, 2009

Currently we pin the inode->i_sb for every single inode. This
increases cache traffic on the sb->s_umount semaphore. Let's instead
cache the inode sb pin state and keep the super_block pinned
for as long as we keep writing out inodes from the same
super_block.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
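
Here is a minimal model of the change. It walks the inodes in order and releases and re-takes the pin only when the super_block changes, instead of once per inode. `pin_sb()` and `unpin_sb()` are hypothetical counters standing in for the s_umount handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical super_block: `pins` is the current pin count and
 * `pin_ops` counts how often a pin was taken. */
struct sb_model { int pins; int pin_ops; };

static void pin_sb(struct sb_model *sb)   { sb->pins++; sb->pin_ops++; }
static void unpin_sb(struct sb_model *sb) { sb->pins--; }

/* Write inodes (given by their super_blocks) in order, keeping the
 * current super_block pinned across consecutive inodes from it. */
static void write_inodes(struct sb_model **inode_sbs, size_t n)
{
    struct sb_model *pinned = NULL;

    for (size_t i = 0; i < n; i++) {
        if (inode_sbs[i] != pinned) {
            if (pinned)
                unpin_sb(pinned);
            pinned = inode_sbs[i];
            pin_sb(pinned);
        }
        /* ... write out the inode here ... */
    }
    if (pinned)
        unpin_sb(pinned);
}
```

For the sequence a, a, a, b, b, a, the model takes three pins in total instead of six.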

writeback: don't resort for a single super_block in move_expired_inodes() · cf137307

Committed by Jens Axboe on Sep 24, 2009

If we only moved inodes from a single super_block to the temporary
list, there's no point in doing a resort for multiple super_blocks.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: move inodes from one super_block together · 5c03449d

Committed by Shaohua Li on Sep 24, 2009

__mark_inode_dirty adds inodes to the wb dirty list in random order. If a
disk has several partitions, writeback might keep the spindle moving
between partitions. To reduce the movement, it is better to write a big
chunk of one partition and then move to another. Inodes from one fs are
usually in one partition, so ideally moving inodes from one fs together
should reduce spindle movement. This patch tries to address this. Before
per-bdi writeback was added, the behavior was to write inodes from one fs
first and then another, so the patch restores the previous behavior.
The loop in the patch is a bit ugly; should we add a dirty list for each
superblock in bdi_writeback?

Tests on a two-partition disk with the attached fio script show about
3% to 6% improvement.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
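
Here is a sketch of the grouping step, assuming each expired inode is tagged with its super_block. Inodes are gathered per super_block in first-seen order, keeping their relative order within each group. The pass is skipped entirely when only one super_block is present, as in cf137307. The quadratic scan is deliberately simple. `struct inode_ref` and `group_by_sb()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expired-inode entry tagged with its super_block id. */
struct inode_ref { int sb; int ino; };

/* Reorder `v` so inodes of the same super_block are contiguous, in
 * first-seen super_block order, keeping relative order within each
 * group. `scratch` must hold n entries. Returns the number of
 * distinct super_blocks. */
static int group_by_sb(struct inode_ref *v, size_t n,
                       struct inode_ref *scratch)
{
    size_t i, j, k, out = 0;
    int nr_sb = 0;

    /* Fast path: a single super_block means nothing to resort. */
    for (i = 1; i < n && v[i].sb == v[0].sb; i++)
        ;
    if (i >= n)
        return n ? 1 : 0;

    for (k = 0; k < n; k++) {
        int seen = 0;

        for (j = 0; j < k; j++)
            if (v[j].sb == v[k].sb) { seen = 1; break; }
        if (seen)
            continue;               /* group already emitted */
        nr_sb++;
        for (j = k; j < n; j++)     /* gather this super_block's inodes */
            if (v[j].sb == v[k].sb)
                scratch[out++] = v[j];
    }
    for (k = 0; k < n; k++)
        v[k] = scratch[k];
    return nr_sb;
}
```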

writeback: get rid of incorrect references to pdflush in comments · 5b0830cb

Committed by Jens Axboe on Sep 23, 2009

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: improve readability of the wb_writeback() continue/break logic · 71fd05a8

Committed by Jens Axboe on Sep 23, 2009

And throw some comments in there, too.
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: cleanup writeback_single_inode() · ae1b7f7d

Committed by Wu Fengguang on Sep 23, 2009

Make the if-else straight in writeback_single_inode().
No behavior change.

Cc: Jan Kara <jack@suse.cz>
Cc: Michael Rubin <mrubin@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: kupdate writeback shall not stop when more io is possible · 7fbdea32

Committed by Wu Fengguang on Sep 23, 2009

Fix the kupdate case, which disregards wbc.more_io and stops writeback
prematurely even when there are more inodes to be synced.

wbc.more_io should always be respected.

Also remove the pages_skipped check. It is set when some page(s) of some
inode(s) cannot be written for now. Such inodes will be delayed for a while.
This variable has nothing to do with whether there are other writeable inodes.

CC: Jan Kara <jack@suse.cz>
CC: Dave Chinner <david@fromorbit.com>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

writeback: stop background writeback when below background threshold · d3ddec76

Committed by Wu Fengguang on Sep 23, 2009

Treat bdi_start_writeback(0) as a special request to do background write,
and stop such work when we are below the background dirty threshold.

Also simplify the (nr_pages <= 0) checks. Since we already pass in
nr_pages=LONG_MAX for WB_SYNC_ALL and background writes, we don't
need to worry about it being decreased to zero.
Reported-by: Richard Kennedy <richard@rsk.demon.co.uk>
CC: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
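
The stop conditions from this commit and from 7fbdea32 can be combined into one simplified loop. This is a model, not wb_writeback() itself. Each pass writes up to an assumed 1024-page chunk, and `nr_pages = LONG_MAX` means "no page limit", as in the commit above. A background job stops once dirty pages fall to the threshold. "More IO is possible" is approximated as "dirty pages remain".

```c
#include <assert.h>
#include <limits.h>

#define CHUNK 1024L  /* pages per pass: an assumed value */

/* Hypothetical writeback work item. */
struct wb_work {
    long nr_pages;       /* page budget; LONG_MAX means "no limit" */
    int for_background;  /* stop once below the background threshold */
};

/* Simplified loop: returns the number of pages written. */
static long wb_writeback_model(struct wb_work *work, long *dirty,
                               long bg_thresh)
{
    long wrote = 0;

    while (work->nr_pages > 0) {
        /* d3ddec76: background work ends once under the threshold. */
        if (work->for_background && *dirty <= bg_thresh)
            break;

        long chunk = *dirty < CHUNK ? *dirty : CHUNK;
        if (chunk > work->nr_pages)
            chunk = work->nr_pages;
        *dirty -= chunk;
        wrote += chunk;
        work->nr_pages -= chunk;

        /* 7fbdea32: keep going while more IO is possible; in this
         * model that simply means dirty pages remain. */
        if (*dirty == 0)
            break;
    }
    return wrote;
}
```

Start with 5000 dirty pages and a threshold of 2000. The background job writes three chunks (3072 pages) and stops at 1928 dirty pages. A job without the background flag drains everything.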

fs: Fix busyloop in wb_writeback() · a5989bdc

Committed by Jan Kara on Sep 16, 2009

If all inodes are under writeback (e.g. when there's only one inode
with dirty pages), wb_writeback() with WB_SYNC_NONE work basically degrades
to busylooping until the I_SYNC flag of the inode is cleared. Fix the problem
by waiting on the I_SYNC flag of an inode on the b_more_io list in case we
failed to write anything.
Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

Sep 25, 2009 · 7 commits

[CIFS] Remove build warning · 15dd4781

Committed by Steve French on Sep 25, 2009

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

cifs: fix problems with last two commits · 5d2c0e22

Committed by Jeff Layton on Sep 24, 2009

Fix problems with commits:

086f68bd
3bc303c2
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

[CIFS] Fix build break when keys support turned off · 0f59e61c

Committed by Steve French on Sep 25, 2009

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

procfs: disable per-task stack usage on NOMMU · c44972f1

Committed by Andrew Morton on Sep 24, 2009

It needs walk_page_range().
Reported-by: Michal Simek <monstr@monstr.eu>
Tested-by: Michal Simek <monstr@monstr.eu>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cifs: eliminate cifs_init_private · 086f68bd

Committed by Jeff Layton on Sep 21, 2009

...it does the same thing as cifs_fill_fileinfo, but doesn't handle the
flist ordering correctly. Also rename cifs_fill_fileinfo to a more
descriptive name and have it take an open flags arg instead of just a
write_only flag. That makes the logic in the callers a little simpler.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>

nfs[23] tcp breakage in mount with binary options · 36dd2fdb

Committed by Al Viro on Sep 24, 2009

We forget to set nfs_server.protocol in the tcp case when old-style binary
options are passed to mount.  The field remains zero and is never validated
afterwards.  As a result, we hit the BUG in fs/nfs/client.c:588.

The breakage was introduced in "NFS: Add nfs_alloc_parsed_mount_data",
merged yesterday...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

cifs: convert oplock breaks to use slow_work facility (try #4) · 3bc303c2

Committed by Jeff Layton on Sep 21, 2009

This is the fourth respin of the patch to convert oplock breaks to
use the slow_work facility.

A customer of ours was testing a backport of one of the earlier
patchsets, and hit a "Busy inodes after umount..." problem. An oplock
break job had raced with a umount, and the superblock got torn down and
its memory reused. When the oplock break job tried to dereference the
inode->i_sb, the kernel oopsed.

This patchset has the oplock break job hold an inode and vfsmount
reference until the oplock break completes.  With this, there should be
no need to take a tcon reference (the vfsmount implicitly holds one
already).

Currently, when an oplock break comes in there's a chance that the
oplock break job won't occur if the allocation of the oplock_q_entry
fails. There are also some rather nasty races in the allocation and
handling of these structs.

Rather than allocating oplock queue entries when an oplock break comes
in, add a few extra fields to the cifsFileInfo struct. Get rid of the
dedicated cifs_oplock_thread as well and queue the oplock break job to
the slow_work thread pool.

This approach also has the advantage that the oplock break jobs can
potentially run in parallel rather than be serialized like they are
today.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
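
The reference-holding scheme can be sketched with plain counters. Queuing a break pins the inode and the mount. The break state lives inside the open-file structure, so queuing needs no allocation that could fail. The job drops both references when it finishes. All names are hypothetical, and the worker is called directly rather than through slow_work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for an inode or vfsmount. */
struct ref { int count; };

static void get_ref(struct ref *r) { r->count++; }
static void put_ref(struct ref *r) { assert(r->count > 0); r->count--; }

/* Oplock-break state embedded in the open-file structure. */
struct open_file {
    struct ref *inode, *mnt;
    bool break_pending;
};

/* On an oplock break from the server: pin the inode and the mount for
 * as long as the job is outstanding, so a racing umount cannot tear
 * down the superblock underneath it. */
static bool queue_oplock_break(struct open_file *f)
{
    if (f->break_pending)
        return false;            /* already queued */
    get_ref(f->inode);
    get_ref(f->mnt);
    f->break_pending = true;
    return true;
}

/* The job itself; in the commit it runs from the slow_work pool. */
static void oplock_break_job(struct open_file *f)
{
    /* ... flush cached data, acknowledge the break ... */
    f->break_pending = false;
    put_ref(f->mnt);
    put_ref(f->inode);
}
```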

Sep 24, 2009 · 2 commits

task_struct cleanup: move binfmt field to mm_struct · 801460d0

Committed by Hiroshi Shimamoto on Sep 23, 2009

Because the binfmt does not differ between threads in the same process,
it can be moved from task_struct to mm_struct.  And the binfmt module is
handled per mm_struct instead of per task_struct.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/romfs: correct error-handling code · a21f3c2a

Committed by Julia Lawall on Sep 23, 2009

romfs_iget returns an ERR_PTR value in an error case instead of NULL.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@match exists@
expression x, E;
statement S1, S2;
@@

x = romfs_iget(...)
... when != x = E
(
*  if (x == NULL || ...) S1 else S2
|
*  if (x == NULL && ...) S1 else S2
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
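
To show why a NULL check misses the error, here is a user-space re-implementation of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom (the real helpers live in include/linux/err.h). A hypothetical `iget_model()` stands in for romfs_iget(). An error comes back as a pointer value in the top 4095 bytes of the address space, which is never NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* User-space copies of the ERR_PTR idiom: small negative errno values
 * are encoded as pointers in the last page of the address space. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct inode_model { int ino; };
static struct inode_model root_inode = { 1 };

/* Hypothetical stand-in for romfs_iget(): never returns NULL. */
static struct inode_model *iget_model(int ino)
{
    if (ino != 1)
        return ERR_PTR(-EIO);
    return &root_inode;
}

/* Correct caller: test with IS_ERR(), not against NULL. */
static long lookup_inode(int ino, struct inode_model **out)
{
    struct inode_model *inode = iget_model(ino);

    if (IS_ERR(inode))
        return PTR_ERR(inode);
    *out = inode;
    return 0;
}
```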