Commits · a06a2416038d317a6430e453f5bc5fd81834554d · openanolis / cloud-kernel

May 28, 2013 · 29 commits

f2fs: optimize several routines in node.h · a06a2416

Committed by Namjae Jeon on May 23, 2013

Several functions in node.h share common code that can be factored out
into common routines. This patch adds those routines and, to retain the
same call paths without major changes, adds some macros to access them.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

a06a2416

f2fs: remove unneeded initializations in f2fs_parent_dir · 4777f86b

Committed by Namjae Jeon on May 23, 2013

There is no need to initialize a few pointers in f2fs_parent_dir:
the initial values are never checked, and the pointers are assigned
directly before they are used.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

4777f86b

f2fs: push some variables to debug part · 35b09d82

Committed by Namjae Jeon on May 23, 2013

Some counters are needed only for statistical information
while debugging.
They can be controlled with CONFIG_F2FS_STAT_FS, so this patch
moves the use of a few variables under that flag.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

35b09d82

f2fs: align data types between on-disk and in-memory block addresses · a9841c4d

Committed by Jaegeuk Kim on May 24, 2013

The on-disk block address is defined as __le32, but the in-memory block
address, block_t, is defined as u64.

Let's synchronize them to 32 bits.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

a9841c4d
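As a rough illustration of why mismatched widths are risky, the sketch below stores a 64-bit in-memory address into a 4-byte little-endian field and reads it back. The `disk_addr` struct and the `disk_addr_*` and `addr_round_trips` helpers are hypothetical names made up for this example and are not f2fs code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an on-disk __le32 field: four little-endian bytes. */
struct disk_addr {
    uint8_t bytes[4];
};

/* Store the low 32 bits of an in-memory address in little-endian order. */
static void disk_addr_store(struct disk_addr *d, uint64_t addr)
{
    assert(d != NULL);
    for (int i = 0; i < 4; i++)
        d->bytes[i] = (uint8_t)(addr >> (8 * i));
}

/* Load the 32-bit little-endian value back into host order. */
static uint32_t disk_addr_load(const struct disk_addr *d)
{
    uint32_t v = 0;

    assert(d != NULL);
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)d->bytes[i] << (8 * i);
    return v;
}

/* Returns 1 if the address survives a store/load cycle, 0 if bits were lost. */
static int addr_round_trips(uint64_t addr)
{
    struct disk_addr d;

    disk_addr_store(&d, addr);
    return disk_addr_load(&d) == addr;
}
```

Under these assumptions, `addr_round_trips(0x12345678)` returns 1, while any address above 32 bits, such as `0x100000000ULL`, returns 0 because the upper bits never reach the disk. Using one 32-bit type on both sides makes that limit visible in the type itself.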

f2fs: dereferencing an ERR_PTR · f28c06fa

Committed by Dan Carpenter on May 23, 2013

There is an error path where "dir" is an ERR_PTR.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f28c06fa
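For readers unfamiliar with the idiom, here is a minimal userspace sketch of the kernel's error-pointer convention. The three helpers mirror the ones in `linux/err.h`; `demo_dir`, `demo_lookup` and `demo_dir_nlink` are hypothetical names used only for illustration and do not reflect the actual f2fs fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer helpers (see linux/err.h). */
#define MAX_ERRNO 4095

static void *ERR_PTR(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

static long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static int IS_ERR(const void *ptr)
{
    /* Error pointers occupy the top MAX_ERRNO values of the address space. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical directory object, used only for this illustration. */
struct demo_dir {
    int nlink;
};

static struct demo_dir demo_root = { .nlink = 2 };

/* Hypothetical lookup: inode number 0 is treated as "not found". */
static struct demo_dir *demo_lookup(unsigned int ino)
{
    if (ino == 0)
        return ERR_PTR(-ENOENT);
    return &demo_root;
}

/* Check IS_ERR() before touching the pointer; an ERR_PTR is not a valid address. */
static long demo_dir_nlink(unsigned int ino)
{
    struct demo_dir *dir = demo_lookup(ino);

    if (IS_ERR(dir))
        return PTR_ERR(dir);   /* propagate the errno, never dereference */
    return dir->nlink;
}
```

With this sketch, `demo_dir_nlink(0)` returns `-ENOENT` instead of reading through an invalid pointer, which is the class of bug the commit subject describes.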

f2fs: use ihold · 6f6fd833

Committed by Jaegeuk Kim on May 22, 2013

Use the following helper function committed by Al.

commit 7de9c6ee
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Sat Oct 23 11:11:40 2010 -0400

    new helper: ihold()

...
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

6f6fd833

f2fs: should not make_bad_inode on f2fs_link failure · 93ff10d6

Committed by Jaegeuk Kim on May 22, 2013

If f2fs_link hits -ENOSPC, we should not mark the inode as bad.
The inode is still alive.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

93ff10d6

f2fs: fix to handle do_recover_data errors · 39cf72cf

Committed by Jaegeuk Kim on May 22, 2013

This patch adds error handling to check_index_in_prev_nodes and its
caller, do_recover_data.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

39cf72cf

f2fs: reuse the locked dnode page and its inode · b292dcab

Committed by Jaegeuk Kim on May 22, 2013

This patch fixes the following deadlock bug during the recovery.

INFO: task mount:1322 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount           D ffffffff81125870     0  1322   1266 0x00000000
 ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046
 ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8
 ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520
Call Trace:
[<ffffffff81125870>] ? __lock_page+0x70/0x70
[<ffffffff816a92d9>] schedule+0x29/0x70
[<ffffffff816a93af>] io_schedule+0x8f/0xd0
[<ffffffff8112587e>] sleep_on_page+0xe/0x20
[<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff81125867>] __lock_page+0x67/0x70
[<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40
[<ffffffff81126857>] find_lock_page+0x67/0x80
[<ffffffff8112698f>] find_or_create_page+0x3f/0xb0
[<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs]
[<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs]
[<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs]
[<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40
[<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs]
[<ffffffff81184cf9>] mount_bdev+0x1c9/0x210
[<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs]
[<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffff81185a13>] mount_fs+0x43/0x1b0
[<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20
[<ffffffff811a0796>] vfs_kern_mount+0x76/0x120
[<ffffffff811a2cb7>] do_mount+0x237/0xa10
[<ffffffff81140b9b>] ? strndup_user+0x5b/0x80
[<ffffffff811a3520>] SyS_mount+0x90/0xe0
[<ffffffff816b3502>] system_call_fastpath+0x16/0x1b

The bug is triggered when check_index_in_prev_nodes tries to get the direct
node page by calling get_node_page.
At this point, if the direct node page is already locked by get_dnode_of_data,
its caller, we get a deadlock.

This patch adds an additional condition check so that a locked direct node
page is reused before the get_node_page call.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

b292dcab

f2fs: fix wrong condition check · b638f0c4

Committed by Jaegeuk Kim on May 21, 2013

Even though an orphan inode has a zero link_count, f2fs_gc is able to select
the inode for foreground gc.

- f2fs_gc
 - do_garbage_collect
   - gc_data_segment
     : f2fs_iget fails
     : get_valid_blocks() != 0, so it retries
--> here we get an infinite loop.

This patch resolves this issue.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

b638f0c4

f2fs: add f2fs_readonly() · 77888c1e

Committed by Jaegeuk Kim on May 20, 2013

Introduce a simple macro function for readability.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

77888c1e

f2fs: avoid RECLAIM_FS-ON-W: deadlock · 6f85b352

Committed by Jaegeuk Kim on May 20, 2013

This patch tries to avoid the following deadlock condition, in which the
reclaim path can trigger f2fs_balance_fs again.

=================================
[ INFO: inconsistent lock state ]
---------------------------------
inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
kswapd0/41 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&sbi->gc_mutex){+.+.?.}, at: f2fs_balance_fs+0xe6/0x100 [f2fs]
{RECLAIM_FS-ON-W} state was registered at:
  [<ffffffff810aa5a9>] mark_held_locks+0xb9/0x140
  [<ffffffff810aae85>] lockdep_trace_alloc+0x85/0xf0
  [<ffffffff8113ab2c>] __alloc_pages_nodemask+0x7c/0x9b0
  [<ffffffff81175aa8>] alloc_pages_current+0xb8/0x180
  [<ffffffff811319cf>] __page_cache_alloc+0xaf/0xd0
  [<ffffffff8113225c>] find_or_create_page+0x4c/0xb0
  [<ffffffffa021359e>] find_data_page+0x14e/0x210 [f2fs]
  [<ffffffffa021161b>] f2fs_gc+0x9eb/0xd90 [f2fs]
  [<ffffffffa0218fae>] f2fs_balance_fs+0xee/0x100 [f2fs]
  [<ffffffffa020848c>] f2fs_setattr+0x6c/0x200 [f2fs]
  [<ffffffff811ae51b>] notify_change+0x1db/0x3a0
  [<ffffffff8118fbd0>] do_truncate+0x60/0xa0
  [<ffffffff8118fd95>] vfs_truncate+0x185/0x1b0
  [<ffffffff8118fe1c>] do_sys_truncate+0x5c/0xa0
  [<ffffffff8118ffee>] SyS_truncate+0xe/0x10
  [<ffffffff816e2b42>] system_call_fastpath+0x16/0x1b
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

6f85b352

f2fs: don't do checkpoint if error is occurred · 2c2c149f

Committed by Jaegeuk Kim on May 20, 2013

If we hit an error during dentry recovery, we should not conduct a checkpoint.
Otherwise, some erroneous dentry blocks overwrite the existing blocks that
contain the remaining recovery information.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

2c2c149f

f2fs: fix to unlock page before exit · 45856aff

Committed by Jaegeuk Kim on May 20, 2013

If we get an error after lock_page, we should unlock the page before exit.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

45856aff

f2fs: remove unnecessary kmap/kunmap operations · 9a55ed65

Committed by Jaegeuk Kim on May 20, 2013

The allocated page used by the recovery is not in HIGHMEM, so we don't
need to use kmap/kunmap.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

9a55ed65

f2fs: reorganize f2fs_vm_page_mkwrite · 9851e6e1

Committed by Namjae Jeon on April 28, 2013

A few things can be changed in the default mkwrite function:
1) Call file_update_time at the start, before acquiring any lock.
2) Change the condition page_offset(page) >= i_size_read(inode) to
 page_offset(page) > i_size_read(inode).
3) Move wait_on_page_writeback.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

9851e6e1

f2fs: use list_for_each_entry rather than list_for_each_entry_safe · 145b04e5

Committed by majianpeng on May 14, 2013

We can do this because a global mutex, f2fs_stat_mutex, now protects the
list operations.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
[Jaegeuk Kim: add description]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

145b04e5
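As general background on the two iterator styles, the "_safe" variant caches the next entry so that the loop body may unlink the current one, while a walk that only reads does not need the cache. The sketch below shows that difference on a minimal singly linked list; `demo_node`, `demo_sum` and `demo_detach_all` are hypothetical stand-ins, not the kernel's `list_head` API.

```c
#include <stddef.h>

/* Minimal singly linked list, a hypothetical stand-in for the kernel's list_head. */
struct demo_node {
    int value;
    struct demo_node *next;
};

/* Read-only walk: nothing is unlinked, so reading n->next after the body is fine. */
static int demo_sum(const struct demo_node *head)
{
    int sum = 0;

    for (const struct demo_node *n = head; n != NULL; n = n->next)
        sum += n->value;
    return sum;
}

/*
 * Walk that unlinks every node: the next pointer is cached before the body
 * clears it, which is the job the "_safe" iterator does in the kernel.
 */
static int demo_detach_all(struct demo_node *head)
{
    int detached = 0;
    struct demo_node *next;

    for (struct demo_node *n = head; n != NULL; n = next) {
        next = n->next;   /* cache before the link is destroyed */
        n->next = NULL;
        detached++;
    }
    return detached;
}
```

Without the cached pointer, `demo_detach_all` would stop after the first node, because the link it needs has just been cleared. A loop shaped like `demo_sum` has no such hazard.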

f2fs: remove unecessary variable and code · 81fb5e87

Committed by Haicheng Li on May 14, 2013

Code cleanup with no behavior change.
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

81fb5e87

f2fs, lockdep: annotate mutex_lock_all() · bfe35965

Committed by Peter Zijlstra on May 16, 2013

Majianpeng reported a lockdep splat for f2fs. It turns out mutex_lock_all()
acquires an array of locks (in global/local lock style).

Any such operation is always serialized using cp_mutex, therefore there is no
fs_lock[] lock-order issue; tell lockdep about this using the
mutex_lock_nest_lock() primitive.
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

bfe35965

f2fs: add debug msgs in the recovery routine · f356fe0c

Committed by Jaegeuk Kim on May 16, 2013

This patch adds some trivial debugging messages in the recovery process.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

f356fe0c

f2fs: update inode page after creation · 44a83ff6

Committed by Jaegeuk Kim on May 20, 2013

I found a bug when testing power-off-recovery as follows.

[Bug Scenario]
1. create a file
2. fsync the file
3. reboot w/o any sync
4. try to recover the file
 - found its fsync mark
 - found its dentry mark
   : try to recover its dentry
    - get its file name
    - get its parent inode number
     : here we got zero value

We get the wrong parent inode number because the inode page was not fully
synchronized with the newly created inode's information.

In particular, f2fs previously stored fi->i_pino and wrote it to the cached
node page in the wrong order, which results in a zero-valued i_pino during
recovery.

So, this patch modifies the creation flow to fix the order in which the
inode page is synchronized with its inode.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

44a83ff6

f2fs: change get_new_data_page to pass a locked node page · 64aa7ed9

Committed by Jaegeuk Kim on May 20, 2013

This patch is for passing a locked node page to get_dnode_of_data.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

64aa7ed9

f2fs: skip get_node_page if locked node page is passed · 1646cfac

Committed by Jaegeuk Kim on May 20, 2013

If get_dnode_of_data gets a locked node page, let's skip redundant
get_node_page calls.
This is for further enhancement.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

1646cfac

f2fs: remove unnecessary por_doing check · 0a364af1

Committed by Jaegeuk Kim on May 16, 2013

This por_doing check is not related to the recovery process at all.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

0a364af1

f2fs: fix BUG_ON during f2fs_evict_inode(dir) · 74d0b917

Committed by Jaegeuk Kim on May 15, 2013

During the dentry recovery routine, recover_inode() triggers __f2fs_add_link
with its directory inode.

The following scenario triggers a bug.
 1. dir = f2fs_iget(pino)
 2. __f2fs_add_link(dir, name)
 3. iput(dir)
  -> f2fs_evict_inode() hits BUG_ON(atomic_read(fi->dirty_dents))

Kernel BUG at ffffffffa01c0676 [verbose debug info unavailable]
[<ffffffffa01c0676>] f2fs_evict_inode+0x276/0x300 [f2fs]
Call Trace:
 [<ffffffff8118ea00>] evict+0xb0/0x1b0
 [<ffffffff8118f1c5>] iput+0x105/0x190
 [<ffffffffa01d2dac>] recover_fsync_data+0x3bc/0x1070 [f2fs]
 [<ffffffff81692e8a>] ? io_schedule+0xaa/0xd0
 [<ffffffff81690acb>] ? __wait_on_bit_lock+0x7b/0xc0
 [<ffffffff8111a0e7>] ? __lock_page+0x67/0x70
 [<ffffffff81165e21>] ? kmem_cache_alloc+0x31/0x140
 [<ffffffff8118a502>] ? __d_instantiate+0x92/0xf0
 [<ffffffff812a949b>] ? security_d_instantiate+0x1b/0x30
 [<ffffffff8118a5b4>] ? d_instantiate+0x54/0x70

This means that we should flush all the dentry pages between iget() and iput().
During the recovery routine, however, that is not allowed for consistency
reasons, so we have to wait for the whole recovery process to finish.
After that, write_checkpoint flushes all the dirty dentry blocks, and we can
safely put the stale dir inodes from the dirty_dir_inode_list.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

74d0b917

f2fs: fix por_doing variable coverage · 8c26d7d5

Committed by Jaegeuk Kim on May 15, 2013

The reason for using sbi->por_doing is to reduce data writes during the
recovery.
find_fsync_dnodes() produces some dirty dentry pages, so we should
cover it with sbi->por_doing too.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

8c26d7d5

f2fs: remove redundant assignment · addbe45b

Committed by Jaegeuk Kim on May 15, 2013

We don't need to assign a value redundantly.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

addbe45b

f2fs: fix the inconsistent state of data pages · 650495de

Committed by Jaegeuk Kim on May 13, 2013

In get_lock_data_page, if there is a data race between get_dnode_of_data for
the node and grab_cache_page for the data, f2fs can hit the following
BUG_ON(dn.data_blkaddr == NEW_ADDR).

kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/data.c:251!
 [<ffffffffa044966c>] get_lock_data_page+0x1ec/0x210 [f2fs]
Call Trace:
 [<ffffffffa043b089>] f2fs_readdir+0x89/0x210 [f2fs]
 [<ffffffff811a0920>] ? fillonedir+0x100/0x100
 [<ffffffff811a0920>] ? fillonedir+0x100/0x100
 [<ffffffff811a07f8>] vfs_readdir+0xb8/0xe0
 [<ffffffff811a0b4f>] sys_getdents+0x8f/0x110
 [<ffffffff816d7999>] system_call_fastpath+0x16/0x1b

This bug can occur when the block address of the data block is changed
after f2fs_put_dnode().
To avoid that, this patch fixes the lock order of node and data blocks
so that the node block lock is covered by the data block lock.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

650495de

f2fs: fix inconsistency of block count during recovery · 65e5cd0a

Committed by Jaegeuk Kim on May 14, 2013

Currently f2fs recovers the dentry of fsynced files.
When power-off-recovery is conducted, the newly recovered inode should increase
the node block count as well as the inode block count.

This patch resolves the inconsistency that arises in the following scenario:

1. create a file
2. write data
3. fsync
4. reboot without sync
5. mount and recover the file
6. node block count is 1 and inode block count is 2
 : fall into the inconsistent state
7. unlink the file
 : trigger the following BUG_ON

------------[ cut here ]------------
kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/f2fs.h:716!
Call Trace:
 [<ffffffffa0344100>] ? get_node_page+0x50/0x1a0 [f2fs]
 [<ffffffffa0344bfc>] remove_inode_page+0x8c/0x100 [f2fs]
 [<ffffffffa03380f0>] ? f2fs_evict_inode+0x180/0x2d0 [f2fs]
 [<ffffffffa033812e>] f2fs_evict_inode+0x1be/0x2d0 [f2fs]
 [<ffffffff811c7a67>] evict+0xa7/0x1a0
 [<ffffffff811c82b5>] iput+0x105/0x190
 [<ffffffff811c2b30>] d_kill+0xe0/0x120
 [<ffffffff811c2c57>] dput+0xe7/0x1e0
 [<ffffffff811acc3d>] __fput+0x19d/0x2d0
 [<ffffffff811acd7e>] ____fput+0xe/0x10
 [<ffffffff81070645>] task_work_run+0xb5/0xe0
 [<ffffffff81002941>] do_notify_resume+0x71/0xb0
 [<ffffffff8175f14a>] int_signal+0x12/0x17
Reported-and-Tested-by: Chris Fries <C.Fries@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>

65e5cd0a

May 25, 2013 · 11 commits

aio: fix kioctx not being freed after cancellation at exit time · 03e04f04

Committed by Benjamin LaHaise on May 24, 2013

The recent changes overhauling fs/aio.c introduced a bug that results in
the kioctx not being freed when outstanding kiocbs are cancelled at
exit_aio() time.  Specifically, a kiocb that is cancelled has its
completion events discarded by batch_complete_aio(), which then fails to
wake up the process stuck in free_ioctx().  Fix this by modifying the
wait_event() condition in free_ioctx() appropriately.

This patch was tested with the cancel operation in the thread based code
posted yesterday.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Zach Brown <zab@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

03e04f04

ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap() · b4ca2b4b

Committed by Joseph Qi on May 24, 2013

Last time we found a lock/unlock bug in ocfs2_file_aio_write. We then
did a thorough search of all lock resources in ocfs2_inode_info,
including the rw, inode and open lockres, and found this bug.  My kernel
version is 3.0.13, and the bug is also present in the latest version,
3.9.  In ocfs2_fiemap, once ocfs2_get_clusters_nocache fails, it should
goto out_unlock instead of out, because we need to release the buffer
head, up the read alloc sem and unlock the inode.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b4ca2b4b

nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary · 136e8770

Committed by Ryusuke Konishi on May 24, 2013

nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary

DESCRIPTION:
 There are use cases in which a NILFS2 file system (formatted with a block
size smaller than 4 KB) can be remounted in RO mode after hitting a
"broken bmap" issue.

The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
 "The machine I've been trialling nilfs on is running Debian Testing,
  Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
  version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
  also reproduced it (identically) with Debian Unstable amd64 and Debian
  Experimental (using the 3.8-trunk kernel).  The problematic partitions
  were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."

SYMPTOMS:
(1) The system log contains error messages like:

    [63102.496756] nilfs_direct_assign: invalid pointer: 0
    [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
    [63102.496798]
    [63102.524403] Remounting filesystem read-only

(2) The NILFS2 file system is remounted in RO mode.

REPRODUCING PATH:
(1) Create a volume group named "unencrypted" with the vgcreate utility.
(2) Run the script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):

----------------[BEGIN SCRIPT]--------------------

VG=unencrypted
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------

REPRODUCIBILITY: 100%

INVESTIGATION:
Investigation showed that the issue occurs during segment construction
after the following sequence of user-space operations:

  open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
  fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  ftruncate(7, 60)

The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
bmap (inode number=28)" appears because of an attempt to get the block
number for the third block of the file with logical offset #3072 bytes.
As the output above shows, the file is 60 bytes in total, so a single
block (1 KB in size) is enough for the whole file.  The attempt to
operate on several blocks instead of one happens because the
nilfs_segctor_scan_file() method finds several dirty buffers for this
file.

The root cause of this issue is in nilfs_set_page_dirty function which
is called just before writing to an mmapped page.

When nilfs_page_mkwrite function handles a page at EOF boundary, it
fills hole blocks only inside EOF through __block_page_mkwrite().

The __block_page_mkwrite() function calls set_page_dirty() after filling
hole blocks, thus nilfs_set_page_dirty function (=
a_ops->set_page_dirty) is called.  However, the current implementation
of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
at EOF boundary.

As a result, buffers outside EOF are inconsistently marked dirty and
queued for write even though they are not mapped with nilfs_get_block
function.

FIX:
This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.

Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
for this issue.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk>
Reported-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

136e8770
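The numbers in the investigation above can be checked with a small sketch. It assumes a 4096-byte page, which the commit message does not state, and every `demo_` name is hypothetical. It only reproduces the size arithmetic and is not the nilfs2 fix itself.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks needed to hold file_size bytes (rounding up). */
static uint64_t demo_blocks_in_file(uint64_t file_size, uint32_t block_size)
{
    assert(block_size > 0);
    return (file_size + block_size - 1) / block_size;
}

/* Returns 1 if the buffer starting at byte offset buf_offset begins before EOF. */
static int demo_buffer_inside_eof(uint64_t buf_offset, uint64_t file_size)
{
    return buf_offset < file_size;
}

/* Count the buffers of one page that begin before EOF. */
static unsigned int demo_buffers_inside_eof(uint64_t page_offset,
                                            uint32_t page_size,
                                            uint32_t block_size,
                                            uint64_t file_size)
{
    unsigned int count = 0;

    assert(block_size > 0 && page_size % block_size == 0);
    for (uint32_t off = 0; off < page_size; off += block_size)
        if (demo_buffer_inside_eof(page_offset + off, file_size))
            count++;
    return count;
}
```

For the 60-byte file with 1 KB blocks, `demo_blocks_in_file(60, 1024)` is 1 and only one of the four buffers in the assumed 4 KB page begins before EOF. The buffer at byte offset 3072 lies outside EOF, so marking every buffer of the page dirty queues blocks that were never mapped.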

aio: fix io_getevents documentation · 6900807c

Committed by Jeff Moyer on May 24, 2013

In reviewing man pages, I noticed that io_getevents is documented to
update the timeout that gets passed into the library call.  This doesn't
happen in kernel space or in the library (even though it's documented to
do so in both places).  Unless there is objection, I'd like to fix the
comments/docs to match the code (I will also update the man page upon
consensus).
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6900807c

hfs: avoid crash in hfs_bnode_create · fb09c373

Committed by Jeff Mahoney on May 24, 2013

Commit 634725a9 ("hfs: cleanup HFS+ prints") removed the BUG_ON in
hfs_bnode_create in hfsplus.  This patch removes it from the hfs version
and avoids an fsfuzzer crash.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fb09c373

ocfs2: unlock rw lock if inode lock failed · afe1bb73

Committed by Joseph Qi on May 24, 2013

ocfs2_file_aio_write() takes ocfs2_rw_lock() first and then
ocfs2_inode_lock().

But if ocfs2_inode_lock() fails, it goes to out_sems without unlocking
the rw lock.  This causes a bug in ocfs2_lock_res_free() when it tests
res->l_ex_holders, which is increased in __ocfs2_cluster_lock() and
decreased in __ocfs2_cluster_unlock().
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: "Duyongfeng (B)" <du.duyongfeng@huawei.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

afe1bb73
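The pattern behind this fix can be sketched with a toy lock that only counts its holders. `demo_lockres`, `demo_lock`, `demo_unlock`, `demo_try_lock` and `demo_write` are hypothetical names for illustration. The real ocfs2 functions have different signatures and far more state.

```c
#include <assert.h>

/* Hypothetical lock resource that only tracks how many holders it has. */
struct demo_lockres {
    int holders;
};

static void demo_lock(struct demo_lockres *res)
{
    res->holders++;
}

static void demo_unlock(struct demo_lockres *res)
{
    assert(res->holders > 0);
    res->holders--;
}

/* Hypothetical second-stage lock that can be made to fail for the demo. */
static int demo_try_lock(struct demo_lockres *res, int should_fail)
{
    if (should_fail)
        return -1;
    res->holders++;
    return 0;
}

/*
 * Take rw first, then inode. Every exit path after the first lock
 * goes through out_rw_unlock, so the rw holder count returns to zero.
 */
static int demo_write(struct demo_lockres *rw, struct demo_lockres *inode,
                      int inode_lock_fails)
{
    int ret;

    demo_lock(rw);

    ret = demo_try_lock(inode, inode_lock_fails);
    if (ret < 0)
        goto out_rw_unlock;   /* skipping this unlock would leak a holder */

    /* ... the write itself would happen here ... */

    demo_unlock(inode);
out_rw_unlock:
    demo_unlock(rw);
    return ret;
}
```

Whether or not the second lock fails, `rw->holders` is back to 0 when `demo_write` returns. A failure path that jumped past `out_rw_unlock` would leave the count at 1, which is the kind of stale holder count the commit message says ocfs2_lock_res_free() trips over.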

fat: fix possible overflow for fat_clusters · 7b92d03c

Committed by OGAWA Hirofumi on May 24, 2013

The intermediate value of fat_clusters can overflow on 32-bit architectures.
Reported-by: Krzysztof Strasburger <strasbur@chkw386.ch.pwr.wroc.pl>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7b92d03c
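To make the overflow concrete, the sketch below models a 32-bit `unsigned long` with `uint32_t` and compares two ways of turning a FAT length in sectors into a count of FAT entries. The function names and the reordering are assumptions chosen for illustration and may not match the actual patch.

```c
#include <assert.h>
#include <stdint.h>

/* uint32_t models a 32-bit unsigned long; all names here are illustrative. */

/* Multiplies everything first: the product can wrap before the division. */
static uint32_t demo_clusters_naive(uint32_t fat_length, uint32_t block_size,
                                    uint32_t fat_bits)
{
    assert(fat_bits > 0);
    return fat_length * block_size * 8 / fat_bits;
}

/* Divides first to get entries per sector, keeping the intermediate small. */
static uint32_t demo_clusters_reordered(uint32_t fat_length, uint32_t block_size,
                                        uint32_t fat_bits)
{
    uint32_t entries_per_sector;

    assert(fat_bits > 0);
    entries_per_sector = block_size * 8 / fat_bits;
    return entries_per_sector * fat_length;
}
```

With 512-byte sectors, 32-bit entries and a FAT of 1048577 sectors, the naive product wraps past 2^32 and yields 128, while the reordered form yields 134217856. The two forms agree whenever nothing wraps and `block_size * 8` is a multiple of `fat_bits`. For 12-bit entries they can round differently, so the reordering is a sketch of the idea, not a drop-in replacement.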

xfs: remote attribute lookups require the value length · 7ae07780

Committed by Dave Chinner on May 20, 2013

When reading a remote attribute, to correctly calculate the length
of the data buffer for CRC-enabled filesystems, we need to know the
length of the attribute data. We get this information when we look
up the attribute, but we don't store it in the args structure along
with the other remote attr information we get from the lookup. Add
this information to the args structure so we can use it
appropriately.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit e461fcb1)

7ae07780

xfs: xfs_attr_shortform_allfit() does not handle attr3 format. · cf257abf

Committed by Dave Chinner on May 20, 2013

xfstests generic/117 fails with:

XFS: Assertion failed: leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC)

indicating a function that does not handle the attr3 format
correctly. Fix it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit b38958d7)

cf257abf

xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC · 7ced60ca

Committed by Dave Chinner on May 20, 2013

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit 72916fb8)

7ced60ca

xfs: fix missing KM_NOFS tags to keep lockdep happy · b17cb364

Committed by Dave Chinner on May 20, 2013

There are several places where we use KM_SLEEP allocation contexts
and use the fact that they are called from transaction context to
add KM_NOFS where appropriate. Unfortunately, there are several
places where the code makes this assumption but can be called from
outside transaction context but with filesystem locks held. These
places need explicit KM_NOFS annotations to avoid lockdep
complaining about reclaim contexts.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

(cherry picked from commit ac14876c)

b17cb364
