提交 · f88657ce3f9713a0c62101dffb0e972a979e77b9 · openanolis / cloud-kernel

06 9月, 2011 3 次提交

fs/9p: Add OS dependent open flags in 9p protocol · f88657ce

由 Aneesh Kumar K.V 提交于 8月 03, 2011

Some of the flags are OS/arch dependent we add a 9p
protocol value which maps to asm-generic/fcntl.h values in Linux
Based on the original patch from Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

f88657ce

fs/9p: Don't update file type when updating file attributes · 45089142

由 Aneesh Kumar K.V 提交于 7月 25, 2011

We should only update attributes that we can change on stat2inode.
Also do file type initialization in v9fs_init_inode.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>

45089142

fs/9p: Add fid before dentry instantiation · 5441ae5e

由 Aneesh Kumar K.V 提交于 7月 25, 2011

d_instantiate marks the dentry positive. So a parallel lookup and mkdir of
the directory can find dentry that doesn't have fid attached. This can result
in both the code path doing v9fs_fid_add which results in v9fs_dentry leak.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>

5441ae5e

01 9月, 2011 2 次提交

xfs: fix ->write_inode return values · 58d84c4e

由 Christoph Hellwig 提交于 8月 27, 2011

Currently we always redirty an inode that was attempted to be written out
synchronously but has been cleaned by an AIL pushed internall, which is
rather bogus.  Fix that by doing the i_update_core check early on and
return 0 for it.  Also include async calls for it, as doing any work for
those is just as pointless.  While we're at it also fix the sign for the
EIO return in case of a filesystem shutdown, and fix the completely
non-sensical locking around xfs_log_inode.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>
(cherry picked from commit 297db93bb74cf687510313eb235a7aec14d67e97)
Signed-off-by: NAlex Elder <aelder@sgi.com>

58d84c4e

xfs: fix xfs_mark_inode_dirty during umount · 866e4ed7

由 Christoph Hellwig 提交于 8月 27, 2011

During umount we do not add a dirty inode to the lru and wait for it to
become clean first, but force writeback of data and metadata with
I_WILL_FREE set.  Currently there is no way for XFS to detect that the
inode has been redirtied for metadata operations, as we skip the
mark_inode_dirty call during teardown.  Fix this by setting i_update_core
nanually in that case, so that the inode gets flushed during inode reclaim.

Alternatively we could enable calling mark_inode_dirty for inodes in
I_WILL_FREE state, and let the VFS dirty tracking handle this.  I decided
against this as we will get better I/O patterns from reclaim compared to
the synchronous writeout in write_inode_now, and always marking the inode
dirty in some way from xfs_mark_inode_dirty is a better safetly net in
either case.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>
(cherry picked from commit da6742a5a4cc844a9982fdd936ddb537c0747856)
Signed-off-by: NAlex Elder <aelder@sgi.com>

866e4ed7

31 8月, 2011 1 次提交

ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining · 8c0bec21

由 Jiaying Zhang 提交于 8月 31, 2011

The i_mutex lock and flush_completed_IO() added by commit 2581fdc8
in ext4_evict_inode() causes lockdep complaining about potential
deadlock in several places. In most/all of these LOCKDEP complaints
it looks like it's a false positive, since many of the potential
circular locking cases can't take place by the time the
ext4_evict_inode() is called; but since at the very least it may mask
real problems, we need to address this.

This change removes the flush_completed_IO() and i_mutex lock in
ext4_evict_inode(). Instead, we take a different approach to resolve
the software lockup that commit 2581fdc8 intends to fix. Rather
than having ext4-dio-unwritten thread wait for grabing the i_mutex
lock of an inode, we use mutex_trylock() instead, and simply requeue
the work item if we fail to grab the inode's i_mutex lock.

This should speed up work queue processing in general and also
prevents the following deadlock scenario: During page fault,
shrink_icache_memory is called that in turn evicts another inode B.
Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock. Since the i_mutex lock of inode A is
still hold before the page fault happened, we enter a deadlock.
Signed-off-by: NJiaying Zhang <jiayingz@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>

8c0bec21

27 8月, 2011 1 次提交

All Arch: remove linkage for sys_nfsservctl system call · f5b94099

由 NeilBrown 提交于 8月 26, 2011

The nfsservctl system call is now gone, so we should remove all
linkage for it.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f5b94099

26 8月, 2011 1 次提交

lockdep: Add helper function for dir vs file i_mutex annotation · e096d0c7

由 Josh Boyer 提交于 8月 25, 2011

Purely in-memory filesystems do not use the inode hash as the dcache
tells us if an entry already exists.  As a result, they do not call
unlock_new_inode, and thus directory inodes do not get put into a
different lockdep class for i_sem.

We need the different lockdep classes, because the locking order for
i_mutex is different for directory inodes and regular inodes.  Directory
inodes can do "readdir()", which takes i_mutex *before* possibly taking
mm->mmap_sem (due to a page fault while copying the directory entry to
user space).

In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
before accessing i_mutex.

The two cases can never happen for the same inode, so no real deadlock
can occur, but without the different lockdep classes, lockdep cannot
understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
can lead to false positives from lockdep like below:

    find/645 is trying to acquire lock:
     (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac

    but task is already holding lock:
     (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>]
    vfs_readdir+0x5b/0xb4

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
          [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
          [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
          [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
          [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
          [<ffffffff81111557>] mmap_region+0x258/0x432
          [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
          [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
          [<ffffffff8100c858>] sys_mmap+0x22/0x24
          [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

    -> #0 (&mm->mmap_sem){++++++}:
          [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
          [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
          [<ffffffff81109541>] might_fault+0x89/0xac
          [<ffffffff81149cff>] filldir+0x6f/0xc7
          [<ffffffff811586ea>] dcache_readdir+0x67/0x205
          [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
          [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
          [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

This patch moves the directory vs file lockdep annotation into a helper
function that can be called by in-memory filesystems and has hugetlbfs
call it.
Signed-off-by: NJosh Boyer <jwboyer@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e096d0c7

25 8月, 2011 1 次提交

xfs: deprecate the nodelaylog mount option · 242d6219

由 Christoph Hellwig 提交于 8月 24, 2011

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

242d6219

24 8月, 2011 1 次提交

fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message · c2183d1e

由 Miklos Szeredi 提交于 8月 24, 2011

FUSE_NOTIFY_INVAL_ENTRY didn't check the length of the write so the
message processing could overrun and result in a "kernel BUG at
fs/fuse/dev.c:629!"
Reported-by: NHan-Wen Nienhuys <hanwenn@gmail.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org

c2183d1e

23 8月, 2011 1 次提交

xfs: fix tracing builds inside the source tree · b6bede3b

由 Christoph Hellwig 提交于 8月 14, 2011

The code really requires the current source directory to be in the
header search path.  We already do this if building with an object
tree separate from the source, but it needs to be added manually
if building inside the source.  The cflags addition for it accidentally
got removed when collapsing the xfs directory structure.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <david@fromorbit.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

b6bede3b

21 8月, 2011 1 次提交

Btrfs: fix 64 bit divide problem · 6719db6a

由 Josef Bacik 提交于 8月 20, 2011

This fixes a regression introduced by commit cdcb725c ("Btrfs: check
if there is enough space for balancing smarter").  We can't do 64-bit
divides on 32-bit architectures.

In cases where we need to divide/multiply by 2 we should just left/right
shift respectively, and in cases where theres N number of devices use
do_div.  Also make the counters u64 to match up with rw_devices.
Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Acked-and-tested-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6719db6a

20 8月, 2011 1 次提交

ext4: flush any pending end_io requests before DIO reads w/dioread_nolock · dccaf33f

由 Jiaying Zhang 提交于 8月 19, 2011

There is a race between ext4 buffer write and direct_IO read with
dioread_nolock mount option enabled. The problem is that we clear
PageWriteback flag during end_io time but will do
uninitialized-to-initialized extent conversion later with dioread_nolock.
If an O_direct read request comes in during this period, ext4 will return
zero instead of the recently written data.

This patch checks whether there are any pending uninitialized-to-initialized
extent conversion requests before doing O_direct read to close the race.
Note that this is just a bandaid fix. The fundamental issue is that we
clear PageWriteback flag before we really complete an IO, which is
problem-prone. To fix the fundamental issue, we may need to implement an
extent tree cache that we can use to look up pending to-be-converted extents.
Signed-off-by: NJiaying Zhang <jiayingz@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

dccaf33f

19 8月, 2011 2 次提交

S
update cifs version to 1.75 · 04c05b4a
由 Steve French 提交于 8月 18, 2011
```
Signed-off-by: NSteve French <sfrench@us.ibm.com>
```
04c05b4a

[CIFS] possible memory corruption on mount · 13589c43

由 Steve French 提交于 8月 18, 2011

CIFS cleanup_volume_info_contents() looks like having a memory
corruption problem.
When UNCip is set to "&vol->UNC[2]" in cifs_parse_mount_options(), it
should not be kfree()-ed in cleanup_volume_info_contents().

Introduced in commit b946845aSigned-off-by: NJ.R. Okajima <hooanon05@yahoo.co.jp>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
CC: Stable <stable@kernel.org>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

13589c43

18 8月, 2011 4 次提交

Btrfs: set i_size properly when fallocating and we already · f1e490a7

由 Josef Bacik 提交于 8月 18, 2011

xfstests exposed a problem with preallocate when it fallocates a range that
already has an extent. We don't set the new i_size properly because we see that
we already have an extent. This isn't right and we should update i_size if the
space already exists. With this patch we now pass xfstests 075. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f1e490a7

btrfs: unlock on error in btrfs_file_llseek() · 9a4327ca

由 Dan Carpenter 提交于 8月 18, 2011

There were some unlocks on error missing in a recent patch to
btrfs_file_llseek().
Signed-off-by: NDan Carpenter <error27@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9a4327ca

btrfs: btrfs_permission's RO check shouldn't apply to device nodes · cb6db4e5

由 Jeff Mahoney 提交于 8月 15, 2011

This patch tightens the read-only access checks in btrfs_permission to
match the constraints in inode_permission. Currently, even though the
device node itself will be unmodified, read-write access to device nodes
is denied to when the device node resides on a read-only subvolume or a
is a file that has been marked read-only by the btrfs conversion utility.

With this patch applied, the check only affects regular files,
directories, and symlinks. It also restructures the code a bit so that
we don't duplicate the MAY_WRITE check for both tests.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

cb6db4e5

befs: Validate length of long symbolic links. · 338d0f0a

由 Timo Warns 提交于 8月 17, 2011

Signed-off-by: NTimo Warns <warns@pre-sense.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

338d0f0a

17 8月, 2011 13 次提交

fat: fat16 support maximum 4GB file/vol size as WinXP or 7. · 710d4403

由 Namjae Jeon 提交于 8月 17, 2011

FAT16 support maximum 4GB vol/file size with 64KB cluster size.

Win NT/XP/7 increased the maximum cluster size to 64KB, and file/vol
size increased 4GB also.  Although increasing, the file size of linux
FAT is still limited at 2GB.

I found that it is limited by sb->maxbytes(0x7fffffff) when partition
is formatted by FAT16.  sb->s_maxbytes in fill_super should be set to
0xffffffff like fat32.
Signed-off-by: NNamjae Jeon <linkinjeon@gmail.com>
Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

710d4403

fat: fix utf8 iocharset warning message · 186b5370

由 Mihai Moldovan 提交于 8月 17, 2011

The fat_msg function already formats the given message and appends
a newline to it - we don't need to do this in the passed message
string as well, or will end up with a blank line printed in the
kernel log ring buffer.

Also change the loglevel from error to warning.
Signed-off-by: NMihai Moldovan <ionic@ionic.de>
Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

186b5370

fat: fix build warning · 8c320c07

由 Jonas Aberg 提交于 8月 17, 2011

This fixes a compile warning (unititialized variable) in
the fat filesystem code.
Signed-off-by: NJonas Aberg <jonas.aberg@stericsson.com>
Signed-off-by: NLee Jones <lee.jones@linaro.org>
Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

8c320c07

Btrfs: truncate pages from clone ioctl target range · f81c9cdc

由 Sage Weil 提交于 8月 10, 2011

We need to truncate page cache pages for the clone ioctl target range or
else we'll confuse ourselves to no end.  If the old data was cached, we
used to still see it (until remount).  If the page was partially updated
we used to get a mix of old and new data.
Signed-off-by: NSage Weil <sage@newdream.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f81c9cdc

Btrfs: fix uninitialized sync_pending · 0e588859

由 Miao Xie 提交于 8月 05, 2011

sync_pending is uninitialized before it be used, fix it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0e588859

Btrfs: fix wrong free space information · bb3ac5a4

由 Miao Xie 提交于 8月 05, 2011

Btrfs subtracted the size of the allocated space twice when it allocated
the space from the bitmap in the cluster, it broke the free space information
and led to oops finally.

And this patch also fixes the bug that ctl->free_space was subtracted
without lock.
Reported-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

bb3ac5a4

btrfs: memory leak in btrfs_add_inode_defrag() · f4ac904c

由 Dan Carpenter 提交于 8月 05, 2011

We don't use the defrag struct on this path.
Signed-off-by: NDan Carpenter <error27@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f4ac904c

Btrfs: use plain page_address() in header fields setget functions · c97c2916

由 Li Zefan 提交于 8月 03, 2011

We've stopped using highmem for extent buffers.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c97c2916

Btrfs: forced readonly when btrfs_drop_snapshot() fails · cb1b69f4

由 Tsutomu Itoh 提交于 8月 09, 2011

The filesystem turns readonly instead of returning the error to the
caller when detected error in btrfs_drop_snapshot().
and, because the caller doesn't check the error, the function type is
changed to 'void'.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

cb1b69f4

Btrfs: check if there is enough space for balancing smarter · cdcb725c

由 liubo 提交于 8月 03, 2011

When checking if there is enough space for balancing a block group,
since we do not take raid types into consideration, we do not account
corrent amounts of space that we needed.  This makes us do some extra
work before we get ENOSPC.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

cdcb725c

Btrfs: fix a bug of balance on full multi-disk partitions · 38c01b96

由 liubo 提交于 8月 02, 2011

When balancing, we'll first try to shrink devices for some space,
but if it is working on a full multi-disk partition with raid protection,
we may encounter a bug, that is, while shrinking, total_bytes may be less
than bytes_used, and btrfs may allocate a dev extent that accesses out of
device's bounds.

Then we will not be able to write or read the data which stores at the end
of the device, and get the followings:

device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5
Btrfs detected SSD devices, enabling SSD mode
btrfs: relocating block group 476315648 flags 9
btrfs: found 4 extents
attempt to access beyond end of device
sdb5: rw=145, want=546176, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546304, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546432, limit=546147
attempt to access beyond end of device
sdb5: rw=145, want=546560, limit=546147
attempt to access beyond end of device
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

38c01b96

Btrfs: fix an oops of log replay · 34f3e4f2

由 liubo 提交于 8月 06, 2011

When btrfs recovers from a crash, it may hit the oops below:

------------[ cut here ]------------
kernel BUG at fs/btrfs/inode.c:4580!
[...]
RIP: 0010:[<ffffffffa03df251>]  [<ffffffffa03df251>] btrfs_add_link+0x161/0x1c0 [btrfs]
[...]
Call Trace:
 [<ffffffffa03e7b31>] ? btrfs_inode_ref_index+0x31/0x80 [btrfs]
 [<ffffffffa04054e9>] add_inode_ref+0x319/0x3f0 [btrfs]
 [<ffffffffa0407087>] replay_one_buffer+0x2c7/0x390 [btrfs]
 [<ffffffffa040444a>] walk_down_log_tree+0x32a/0x480 [btrfs]
 [<ffffffffa0404695>] walk_log_tree+0xf5/0x240 [btrfs]
 [<ffffffffa0406cc0>] btrfs_recover_log_trees+0x250/0x350 [btrfs]
 [<ffffffffa0406dc0>] ? btrfs_recover_log_trees+0x350/0x350 [btrfs]
 [<ffffffffa03d18b2>] open_ctree+0x1442/0x17d0 [btrfs]
[...]

This comes from that while replaying an inode ref item, we forget to
check those old conflicting DIR_ITEM and DIR_INDEX items in fs/file tree,
then we will come to conflict corners which lead to BUG_ON().
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Tested-by: NAndy Lutomirski <luto@mit.edu>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

34f3e4f2

Btrfs: detect wether a device supports discard · d5e2003c

由 Josef Bacik 提交于 8月 04, 2011

We have a problem where if a user specifies discard but doesn't actually support
it we will return EOPNOTSUPP from btrfs_discard_extent. This is a problem
because this gets called (in a fashion) from the tree log recovery code, which
has a nice little BUG_ON(ret) after it, which causes us to fail the tree log
replay. So instead detect wether our devices support discard when we're adding
them and then don't issue discards if we know that the device doesn't support
it. And just for good measure set ret = 0 in btrfs_issue_discard just in case
we still get EOPNOTSUPP so we don't screw anybody up like this again. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d5e2003c

16 8月, 2011 1 次提交

cifs: demote cERROR in build_path_from_dentry to cFYI · fa71f447

由 Jeff Layton 提交于 8月 08, 2011

Running the cthon tests on a recent kernel caused this message to pop
occasionally:

    CIFS VFS: did not end path lookup where expected namelen is 0

Some added debugging showed that namelen and dfsplen were both 0 when
this occurred. That means that the read_seqretry returned true.

Assuming that the comment inside the if statement is true, this should
be harmless and just means that we raced with a rename. If that is the
case, then there's no need for alarm and we can demote this to cFYI.

While we're at it, print the dfsplen too so that we can see what
happened here if the message pops during debugging.

Cc: stable@kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NSteve French <sfrench@us.ibm.com>

fa71f447

14 8月, 2011 3 次提交

ext4: fix nomblk_io_submit option so it correctly converts uninit blocks · 9dd75f1f

由 Theodore Ts'o 提交于 8月 13, 2011

Bug discovered by Jan Kara:

Finally, commit 1449032b returned back
the old IO submission code but apparently it forgot to return the old
handling of uninitialized buffers so we unconditionnaly call
block_write_full_page() without specifying end_io function. So AFAICS
we never convert unwritten extents to written in some cases. For
example when I mount the fs as: mount -t ext4 -o
nomblk_io_submit,dioread_nolock /dev/ubdb /mnt and do
        int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600);
        char buf[1024];
        memset(buf, 'a', sizeof(buf));
        fallocate(fd, 0, 0, 16384);
        write(fd, buf, sizeof(buf));

I get a file full of zeros (after remounting the filesystem so that
pagecache is dropped) instead of seeing the first KB contain 'a's.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

9dd75f1f

ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. · 32c80b32

由 Tao Ma 提交于 8月 13, 2011

EXT4_IO_END_UNWRITTEN flag set and the increase of i_aiodio_unwritten
should be done simultaneously since ext4_end_io_nolock always clear
the flag and decrease the counter in the same time.

We don't increase i_aiodio_unwritten when setting
EXT4_IO_END_UNWRITTEN so it will go nagative and causes some process
to wait forever.

Part of the patch came from Eric in his e-mail, but it doesn't fix the
problem met by Michael actually.

http://marc.info/?l=linux-ext4&m=131316851417460&w=2

Reported-and-Tested-by: Michael Tokarev<mjt@tls.msk.ru>
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

32c80b32

ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode · 2581fdc8

由 Jiaying Zhang 提交于 8月 13, 2011

Flush inode's i_completed_io_list before calling ext4_io_wait to
prevent the following deadlock scenario: A page fault happens while
some process is writing inode A. During page fault,
shrink_icache_memory is called that in turn evicts another inode
B. Inode B has some pending io_end work so it calls ext4_ioend_wait()
that waits for inode B's i_ioend_count to become zero. However, inode
B's ioend work was queued behind some of inode A's ioend work on the
same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
thread on that cpu is processing inode A's ioend work, it tries to
grab inode A's i_mutex lock. Since the i_mutex lock of inode A is
still hold before the page fault happened, we enter a deadlock.

Also moves ext4_flush_completed_IO and ext4_ioend_wait from
ext4_destroy_inode() to ext4_evict_inode(). During inode deleteion,
ext4_evict_inode() is called before ext4_destroy_inode() and in
ext4_evict_inode(), we may call ext4_truncate() without holding
i_mutex lock. As a result, there is a race between flush_completed_IO
that is called from ext4_ext_truncate() and ext4_end_io_work, which
may cause corruption on an io_end structure. This change moves
ext4_flush_completed_IO and ext4_ioend_wait from ext4_destroy_inode()
to ext4_evict_inode() to resolve the race between ext4_truncate() and
ext4_end_io_work during inode deletion.
Signed-off-by: NJiaying Zhang <jiayingz@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

2581fdc8

13 8月, 2011 4 次提交

ext4: Fix ext4_should_writeback_data() for no-journal mode · 441c8508

由 Curt Wohlgemuth 提交于 8月 13, 2011

ext4_should_writeback_data() had an incorrect sequence of
tests to determine if it should return 0 or 1: in
particular, even in no-journal mode, 0 was being returned
for a non-regular-file inode.

This meant that, in non-journal mode, we would use
ext4_journalled_aops for directories, symlinks, and other
non-regular files.  However, calling journalled aop
callbacks when there is no valid handle, can cause problems.

This would cause a kernel crash with Jan Kara's commit
2d859db3 ("ext4: fix data corruption in inodes with
journalled data"), because we now dereference 'handle' in
ext4_journalled_write_end().

I also added BUG_ONs to check for a valid handle in the
obviously journal-only aops callbacks.

I tested this running xfstests with a scratch device in
these modes:

   - no-journal
   - data=ordered
   - data=writeback
   - data=journal

All work fine; the data=journal run has many failures and a
crash in xfstests 074, but this is no different from a
vanilla kernel.
Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org

441c8508

xfs: remove subdirectories · c59d87c4

由 Christoph Hellwig 提交于 8月 12, 2011

Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
annoying subdirectories in the XFS source code.  Besides the large
amount of file rename the only changes are to the Makefile, a few
files including headers with the subdirectory prefix, and the binary
sysctl compat code that includes a header under fs/xfs/ from
kernel/.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

c59d87c4

xfs: don't expect xfs headers to be in subdirectories · 06f8e2d6

由 Alex Elder 提交于 8月 12, 2011

Fix up some #include directives in preparation for moving a few
header files out of xfs source subdirectories.

Note that "xfs_linux.h" also got its quoting convention for included
files switched.
Signed-off-by: NAlex Elder <aelder@sgi.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

06f8e2d6

xfs: replace xfs_buf_geterror() with bp->b_error · e5702805

由 Chandra Seetharaman 提交于 8月 03, 2011

Since we just checked bp for NULL, it is ok to replace
xfs_buf_geterror() with bp->b_error in these places.
Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

e5702805

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功