提交 · 3aab8f828ef358ae92545294a14cd52d510cc585 · openeuler / raspberrypi-kernel

10 7月, 2014 12 次提交

f2fs: introduce f2fs_write_failed to handle error case when write · 3aab8f82

由 Chao Yu 提交于 7月 02, 2014

When we fail in ->write_begin()/->direct_IO(), our allocated node block in disk
and page cache are still kept, despite these may not be used again.

This patch introduce f2fs_write_failed() to handle the error case of these two
interfaces, it will truncate page cache and blocks of this file according to
i_size.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

3aab8f82

G
f2fs: arguments cleanup of finding file flow functions · eee6160f
由 Gu Zheng 提交于 6月 24, 2014
```
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
```
eee6160f

f2fs: remove the needless point-cast · 1c3bb978

由 Gu Zheng 提交于 6月 27, 2014

Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

1c3bb978

f2fs: remove the redundant validation check of acl · 34e6d456

由 Gu Zheng 提交于 6月 24, 2014

kernel side(xx_init_acl), the acl is get/cloned from the parent dir's,
which is credible. So remove the redundant validation check of acl
here.
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

34e6d456

f2fs: reduce region of f2fs_lock_op covered for better concurrency · 1256010a

由 Chao Yu 提交于 6月 24, 2014

In our rename process, region of f2fs_lock_op covered is too big as some of the
code like f2fs_empty_dir/f2fs_find_entry are not needed to protect by this lock.

So in the extreme case like doing checkpoint when we rename old inode to exist
inode in a large directory could cause lower concurrency.

Let's reduce the region of f2fs_lock_op to fix this.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

1256010a

f2fs: replace count*size kzalloc by kcalloc · b434babf

由 Fabian Frederick 提交于 6月 23, 2014

kcalloc manages count*sizeof overflow.

Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

b434babf

f2fs: refactor flush_nat_entries codes for reducing NAT writes · aec71382

由 Chao Yu 提交于 6月 24, 2014

Although building NAT journal in cursum reduce the read/write work for NAT
block, but previous design leave us lower performance when write checkpoint
frequently for these cases:
1. if journal in cursum has already full, it's a bit of waste that we flush all
   nat entries to page for persistence, but not to cache any entries.
2. if journal in cursum is not full, we fill nat entries to journal util
   journal is full, then flush the left dirty entries to disk without merge
   journaled entries, so these journaled entries may be flushed to disk at next
   checkpoint but lost chance to flushed last time.

In this patch we merge dirty entries located in same NAT block to nat entry set,
and linked all set to list, sorted ascending order by entries' count of set.
Later we flush entries in sparse set into journal as many as we can, and then
flush merged entries to disk. In this way we can not only gain in performance,
but also save lifetime of flash device.

In my testing environment, it shows this patch can help to reduce NAT block
writes obviously. In hard disk test case: cost time of fsstress is stablely
reduced by about 5%.

1. virtual machine + hard disk:
fsstress -p 20 -n 200 -l 5
		node num	cp count	nodes/cp
based		4599.6		1803.0		2.551
patched		2714.6		1829.6		1.483

2. virtual machine + 32g micro SD card:
fsstress -p 20 -n 200 -l 1 -w -f chown=0 -f creat=4 -f dwrite=0
-f fdatasync=4 -f fsync=4 -f link=0 -f mkdir=4 -f mknod=4 -f rename=5
-f rmdir=5 -f symlink=0 -f truncate=4 -f unlink=5 -f write=0 -S

		node num	cp count	nodes/cp
based		84.5		43.7		1.933
patched		49.2		40.0		1.23

Our latency of merging op shows not bad when handling extreme case like:
merging a great number of dirty nats:
latency(ns)	dirty nat count
3089219		24922
5129423		27422
4000250		24523

change log from v1:
 o fix wrong logic in add_nat_entry when grab a new nat entry set.
 o swith to create slab cache in create_node_manager_caches.
 o use GFP_ATOMIC instead of GFP_NOFS to avoid potential long latency.

change log from v2:
 o make comment position more appropriate suggested by Jaegeuk Kim.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

aec71382

J
f2fs: clean up an unused parameter and assignment · a014e037
由 Jaegeuk Kim 提交于 6月 20, 2014
```
This patch cleans up simple unnecessary codes.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
```
a014e037

f2fs: introduce f2fs_do_tmpfile for code consistency · b97a9b5d

由 Jaegeuk Kim 提交于 6月 20, 2014

This patch adds f2fs_do_tmpfile to eliminate the redundant init_inode_metadata
flow.
Throught this, we can provide the consistent lock usage, e.g., fi->i_sem,  and
this will enable better debugging stuffs.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

b97a9b5d

f2fs: support ->tmpfile() · 50732df0

由 Chao Yu 提交于 6月 19, 2014

Add function f2fs_tmpfile() to support O_TMPFILE file creation, and modify logic
of init_inode_metadata to enable linkat temp file.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

50732df0

f2fs: avoid to truncate non-updated page partially · ca0a81b3

由 Chao Yu 提交于 6月 12, 2014

After we call find_data_page in truncate_partial_data_page, we could not
guarantee this page is updated or not as error may occurred in lower layer.

We'd better check status of the page to avoid this no updated page be
writebacked to device.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

ca0a81b3

f2fs: avoid unneeded SetPageUptodate in f2fs_write_end · 5576cd6c

由 Chao Yu 提交于 6月 12, 2014

We have already set page update in ->write_begin, so we should remove redundant
SetPageUptodate in ->write_end.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

5576cd6c

09 7月, 2014 7 次提交

f2fs: avoid to access NULL pointer in issue_flush_thread · 50e1f8d2

由 Chao Yu 提交于 7月 07, 2014

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=75861

Denis 2014-05-10 11:28:59 UTC reported:
"F2FS-fs (mmcblk0p28): mounting..
 Unable to handle kernel NULL pointer dereference at virtual address 00000018
 ...
 [<c0a2f678>] (_raw_spin_lock+0x3c/0x70) from [<c03a0330>] (issue_flush_thread+0x50/0x17c)
 [<c03a0330>] (issue_flush_thread+0x50/0x17c) from [<c01b4064>] (kthread+0x98/0xa4)
 [<c01b4064>] (kthread+0x98/0xa4) from [<c0108060>] (kernel_thread_exit+0x0/0x8)"

This patch assign cmd_control_info in sm_info before issue_flush_thread is being
created, so this make sure that issue flush thread will have no chance to access
invalid info in fcc.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Reviewed-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

50e1f8d2

f2fs: check bdi->dirty_exceeded when trying to skip data writes · 2743f865

由 Jaegeuk Kim 提交于 6月 28, 2014

If we don't check the current backing device status, balance_dirty_pages can
fall into infinite pausing routine.

This can be occurred when a lot of directories make a small number of dirty
dentry pages including files.
Reported-by: NBrian Chadwick <brianchad@westnet.com.au>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

2743f865

f2fs: do checkpoint for the renamed inode · b2c08299

由 Jaegeuk Kim 提交于 6月 30, 2014

If an inode is renamed, it should be registered as file_lost_pino to conduct
checkpoint at f2fs_sync_file.
Otherwise, the inode cannot be recovered due to no dent_mark in the following
scenario.

Note that, this scenario is from xfstests/322.

1. create "a"
2. fsync "a"
3. rename "a" to "b"
4. fsync "b"
5. Sudden power-cut

After recovery is done, "b" should be seen.
However, the result shows "a", since the recovery procedure does not enter
recover_dentry due to no dent_mark.

The reason is like below.
- The nid of "a" is checkpointed during #2, f2fs_sync_file.
- The inode page for "b" produced by #3 is written without dent_mark by
sync_node_pages.

So, this patch fixes this bug by assinging file_lost_pino to the "a"'s inode.
If the pino is lost, f2fs_sync_file conducts checkpoint, and then recovers
the latest pino and its dentry information for further recovery.
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

b2c08299

f2fs: release new entry page correctly in error path of f2fs_rename · dd4d961f

由 Chao Yu 提交于 6月 24, 2014

This patch correct releasing code of new_page to avoid BUG_ON in error patch of
f2fs_rename.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

dd4d961f

f2fs: fix error path in init_inode_metadata · 90d72459

由 Chao Yu 提交于 6月 24, 2014

If we fail in this path:
->init_inode_metadata
  ->make_empty_dir
    ->get_new_data_page
      ->grab_cache_page return -ENOMEM

We will bug on in error path of init_inode_metadata when call remove_inode_page
because i_block = 2 (one inode block will be released later & one dentry block).

We should release the dentry block in init_inode_metadata to avoid this BUG_ON,
and avoid leak of dentry block resource, because we never have second chance to
release that block in ->evict_inode as in upper error path we make this inode
'bad'.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

90d72459

f2fs: check lower bound nid value in check_nid_range · d6b7d4b3

由 Chao Yu 提交于 6月 12, 2014

This patch add lower bound verification for nid in check_nid_range, so nids
reserved like 0, node, meta passed by caller could be checked there.

And then check_nid_range could be used in f2fs_nfs_get_inode for simplifying
code.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

d6b7d4b3

f2fs: remove unused variables in f2fs_sm_info · 8bc6f60e

由 Chao Yu 提交于 6月 11, 2014

Remove unused variables in struct f2fs_sm_info.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

8bc6f60e

04 7月, 2014 3 次提交

fs/seq_file: fallback to vmalloc allocation · 058504ed

由 Heiko Carstens 提交于 7月 02, 2014

There are a couple of seq_files which use the single_open() interface.
This interface requires that the whole output must fit into a single
buffer.

E.g.  for /proc/stat allocation failures have been observed because an
order-4 memory allocation failed due to memory fragmentation.  In such
situations reading /proc/stat is not possible anymore.

Therefore change the seq_file code to fallback to vmalloc allocations
which will usually result in a couple of order-0 allocations and hence
also work if memory is fragmented.

For reference a call trace where reading from /proc/stat failed:

  sadc: page allocation failure: order:4, mode:0x1040d0
  CPU: 1 PID: 192063 Comm: sadc Not tainted 3.10.0-123.el7.s390x #1
  [...]
  Call Trace:
    show_stack+0x6c/0xe8
    warn_alloc_failed+0xd6/0x138
    __alloc_pages_nodemask+0x9da/0xb68
    __get_free_pages+0x2e/0x58
    kmalloc_order_trace+0x44/0xc0
    stat_open+0x5a/0xd8
    proc_reg_open+0x8a/0x140
    do_dentry_open+0x1bc/0x2c8
    finish_open+0x46/0x60
    do_last+0x382/0x10d0
    path_openat+0xc8/0x4f8
    do_filp_open+0x46/0xa8
    do_sys_open+0x114/0x1f0
    sysc_tracego+0x14/0x1a
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: NDavid Rientjes <rientjes@google.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

058504ed

/proc/stat: convert to single_open_size() · f74373a5

由 Heiko Carstens 提交于 7月 02, 2014

These two patches are supposed to "fix" failed order-4 memory
allocations which have been observed when reading /proc/stat.  The
problem has been observed on s390 as well as on x86.

To address the problem change the seq_file memory allocations to
fallback to use vmalloc, so that allocations also work if memory is
fragmented.

This approach seems to be simpler and less intrusive than changing
/proc/stat to use an interator.  Also it "fixes" other users as well,
which use seq_file's single_open() interface.

This patch (of 2):

Use seq_file's single_open_size() to preallocate a buffer that is large
enough to hold the whole output, instead of open coding it.  Also
calculate the requested size using the number of online cpus instead of
possible cpus, since the size of the output only depends on the number
of online cpus.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com>
Cc: Andrea Righi <andrea@betterlinux.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Stefan Bader <stefan.bader@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f74373a5

autofs4: fix false positive compile error · 571ff473

由 Ian Kent 提交于 7月 02, 2014

On strict build environments we can see:

  fs/autofs4/inode.c: In function 'autofs4_fill_super':
  fs/autofs4/inode.c:312: error: 'pgrp' may be used uninitialized in this function
  make[2]: *** [fs/autofs4/inode.o] Error 1
  make[1]: *** [fs/autofs4] Error 2
  make: *** [fs] Error 2
  make: *** Waiting for unfinished jobs....

This is due to the use of pgrp_set being used to indicate pgrp has has
been set rather than initializing pgrp itself.
Signed-off-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

571ff473

03 7月, 2014 13 次提交

Btrfs: fix crash when starting transaction · abdd2e80

由 Filipe Manana 提交于 6月 24, 2014

Often when starting a transaction we commit the currently running transaction,
which can end up writing block group caches when the current process has its
journal_info set to NULL (and not to a transaction). This makes our assertion
at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting
in a crash/hang. Therefore fix it by setting journal_info.

Two different traces of this issue follow below.

1)

    [51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
    [51502.242213] ------------[ cut here ]------------
    [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964!
    [51502.242669] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
    (...)
    [51502.244010] Call Trace:
    [51502.244010]  [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
    [51502.244010]  [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
    [51502.244010]  [<ffffffffa0357a6a>] commit_cowonly_roots+0x164/0x226 [btrfs]
    [51502.244010]  [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
    [51502.244010]  [<ffffffff8168ec7b>] ? _raw_spin_unlock+0x2b/0x40
    [51502.244010]  [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs]
    [51502.244010]  [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs]
    [51502.244010]  [<ffffffffa02d73e1>] __unlink_start_trans+0x31/0xe0 [btrfs]
    [51502.244010]  [<ffffffffa02dea67>] btrfs_unlink+0x37/0xc0 [btrfs]
    [51502.244010]  [<ffffffff811bb054>] ? do_unlinkat+0x114/0x2a0
    [51502.244010]  [<ffffffff811baebc>] vfs_unlink+0xcc/0x150
    [51502.244010]  [<ffffffff811bb1a0>] do_unlinkat+0x260/0x2a0
    [51502.244010]  [<ffffffff811a9ef4>] ? filp_close+0x64/0x90
    [51502.244010]  [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0
    [51502.244010]  [<ffffffff81349cab>] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [51502.244010]  [<ffffffff811be9eb>] SyS_unlinkat+0x1b/0x40
    [51502.244010]  [<ffffffff81698452>] system_call_fastpath+0x16/0x1b
    [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
    [51502.244010] RIP  [<ffffffffa03575da>] assfail.constprop.88+0x1e/0x20 [btrfs]

2)

    [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
    [25405.097488] ------------[ cut here ]------------
    [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964!
    [25405.097940] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
    (...)
    [25405.100008] Call Trace:
    [25405.100008]  [<ffffffffa02bc025>] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
    [25405.100008]  [<ffffffffa02c3bdc>] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
    [25405.100008]  [<ffffffffa035755a>] commit_cowonly_roots+0x164/0x226 [btrfs]
    [25405.100008]  [<ffffffffa02d53cd>] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
    [25405.100008]  [<ffffffff8109c170>] ? bit_waitqueue+0xc0/0xc0
    [25405.100008]  [<ffffffffa02d6259>] start_transaction+0x459/0x620 [btrfs]
    [25405.100008]  [<ffffffffa02d67ab>] btrfs_start_transaction+0x1b/0x20 [btrfs]
    [25405.100008]  [<ffffffffa02e3407>] btrfs_create+0x47/0x210 [btrfs]
    [25405.100008]  [<ffffffffa02d74cc>] ? btrfs_permission+0x3c/0x80 [btrfs]
    [25405.100008]  [<ffffffff811bc63b>] vfs_create+0x9b/0x130
    [25405.100008]  [<ffffffff811bcf19>] do_last+0x849/0xe20
    [25405.100008]  [<ffffffff811b9409>] ? link_path_walk+0x79/0x820
    [25405.100008]  [<ffffffff811bd5b5>] path_openat+0xc5/0x690
    [25405.100008]  [<ffffffff810ab07d>] ? trace_hardirqs_on+0xd/0x10
    [25405.100008]  [<ffffffff811cdcd2>] ? __alloc_fd+0x32/0x1d0
    [25405.100008]  [<ffffffff811be2a3>] do_filp_open+0x43/0xa0
    [25405.100008]  [<ffffffff811cddf1>] ? __alloc_fd+0x151/0x1d0
    [25405.100008]  [<ffffffff811abcfc>] do_sys_open+0x13c/0x230
    [25405.100008]  [<ffffffff810aaea6>] ? trace_hardirqs_on_caller+0x16/0x1e0
    [25405.100008]  [<ffffffff811abe12>] SyS_open+0x22/0x30
    [25405.100008]  [<ffffffff81698452>] system_call_fastpath+0x16/0x1b
    [25405.100008] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 <0f> 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
    [25405.100008] RIP  [<ffffffffa03570ca>] assfail.constprop.88+0x1e/0x20 [btrfs]
Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: NChris Mason <clm@fb.com>

abdd2e80

Btrfs: fix btrfs_print_leaf for skinny metadata · be2c765d

由 Josef Bacik 提交于 7月 02, 2014

We wouldn't actuall print the extent information if we had a skinny metadata
item, this fixes that.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

be2c765d

Btrfs: fix race of using total_bytes_pinned · d288db5d

由 Liu Bo 提交于 7月 02, 2014

This percpu counter @total_bytes_pinned is introduced to skip unnecessary
operations of 'commit transaction', it accounts for those space we may free
but are stuck in delayed refs.

And we zero out @space_info->total_bytes_pinned every transaction period so
we have a better idea of how much space we'll actually free up by committing
this transaction.  However, we do the 'zero out' part a little earlier, before
we actually unpin space, so we end up returning ENOSPC when we actually have
free space that's just unpinned from committing transaction.

xfstests/generic/074 complained then.

This fixes it by actually accounting the percpu pinned number when 'unpin',
and since it's protected by space_info->lock, the race is gone now.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

d288db5d

btrfs: use E2BIG instead of EIO if compression does not help · 130d5b41

由 David Sterba 提交于 6月 20, 2014

Return codes got updated in 60e1975a
(btrfs: return errno instead of -1 from compression)
lzo wrapper returns E2BIG in this case, do the same for zlib.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

130d5b41

btrfs: remove stale comment from btrfs_flush_all_pending_stuffs · 0a4eaea8

由 David Sterba 提交于 6月 20, 2014

Commit fcebe456 (Btrfs: rework qgroup
accounting) removed the qgroup accounting after delayed refs.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

0a4eaea8

Btrfs: fix use-after-free when cloning a trailing file hole · 14f59796

由 Filipe Manana 提交于 6月 29, 2014

The transaction handle was being used after being freed.

Cc: Chris Mason <clm@fb.com>
Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: NChris Mason <clm@fb.com>

14f59796

btrfs: fix null pointer dereference in btrfs_show_devname when name is null · 0aeb8a6e

由 Anand Jain 提交于 6月 30, 2014

dev->name is null but missing flag is not set.
Strictly speaking the missing flag should have been set, but there
are more places where code just checks if name is null. For now this
patch does the same.

stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000064
IP: [<ffffffffa0228908>] btrfs_show_devname+0x58/0xf0 [btrfs]

[<ffffffff81198879>] show_vfsmnt+0x39/0x130
[<ffffffff81178056>] m_show+0x16/0x20
[<ffffffff8117d706>] seq_read+0x296/0x390
[<ffffffff8115aa7d>] vfs_read+0x9d/0x160
[<ffffffff8115b549>] SyS_read+0x49/0x90
[<ffffffff817abe52>] system_call_fastpath+0x16/0x1b

reproducer:
mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
btrfstune -S 1 /dev/sdg1
modprobe -r btrfs && modprobe btrfs
mount -o degraded /dev/sdg1 /btrfs
btrfs dev add /dev/sdg3 /btrfs
Signed-off-by: NAnand Jain <Anand.Jain@oracle.com>
Signed-off-by: NChris Mason <clm@fb.com>

0aeb8a6e

btrfs: fix null pointer dereference in clone_fs_devices when name is null · e755f780

由 Anand Jain 提交于 6月 30, 2014

when one of the device path is missing btrfs_device name is null. So this
patch will check for that.

stack:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff812e18c0>] strlen+0x0/0x30
[<ffffffffa01cd92a>] ? clone_fs_devices+0xaa/0x160 [btrfs]
[<ffffffffa01cdcf7>] btrfs_init_new_device+0x317/0xca0 [btrfs]
[<ffffffff81155bca>] ? __kmalloc_track_caller+0x15a/0x1a0
[<ffffffffa01d6473>] btrfs_ioctl+0xaa3/0x2860 [btrfs]
[<ffffffff81132a6c>] ? handle_mm_fault+0x48c/0x9c0
[<ffffffff81192a61>] ? __blkdev_put+0x171/0x180
[<ffffffff817a784c>] ? __do_page_fault+0x4ac/0x590
[<ffffffff81193426>] ? blkdev_put+0x106/0x110
[<ffffffff81179175>] ? mntput+0x35/0x40
[<ffffffff8116d4b0>] do_vfs_ioctl+0x460/0x4a0
[<ffffffff8115c72e>] ? ____fput+0xe/0x10
[<ffffffff81068033>] ? task_work_run+0xb3/0xd0
[<ffffffff8116d547>] SyS_ioctl+0x57/0x90
[<ffffffff817a793e>] ? do_page_fault+0xe/0x10
[<ffffffff817abe52>] system_call_fastpath+0x16/0x1b

reproducer:
mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
btrfstune -S 1 /dev/sdg1
modprobe -r btrfs && modprobe btrfs
mount -o degraded /dev/sdg1 /btrfs
btrfs dev add /dev/sdg3 /btrfs
Signed-off-by: NAnand Jain <Anand.Jain@oracle.com>
Signed-off-by: NChris Mason <clm@fb.com>

e755f780

btrfs: fix nossd and ssd_spread mount option regression · 2aa06a35

由 Eric Sandeen 提交于 6月 27, 2014

The commit

07802534 btrfs: Cleanup the btrfs_parse_options for remount.

broke ssd options quite badly; it stopped making ssd_spread
imply ssd, and it made "nossd" unsettable.

Put things back at least as well as they were before
(though ssd mount option handling is still pretty odd:
# mount -o "nossd,ssd_spread" works?)
Reported-by: NRoman Mamedov <rm@romanrm.net>
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NChris Mason <clm@fb.com>

2aa06a35

Btrfs: fix race between balance recovery and root deletion · 5f316481

由 Wang Shilong 提交于 6月 26, 2014

Balance recovery is called when RW mounting or remounting from
RO to RW, it is called to finish roots merging.

When doing balance recovery, relocation root's corresponding
fs root(whose root refs is 0) might be destroyed by cleaner
thread, this will make btrfs fail to mount.

Fix this problem by holding @cleaner_mutex when doing balance
recovery.
Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

5f316481

Btrfs: atomically set inode->i_flags in btrfs_update_iflags · 3cc79392

由 Filipe Manana 提交于 6月 25, 2014

This change is based on the corresponding recent change for ext4:

  ext4: atomically set inode->i_flags in ext4_set_inode_flags()

That has the following commit message that applies to btrfs as well:

  "Use cmpxchg() to atomically set i_flags instead of clearing out the
   S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
   EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
   where an immutable file has the immutable flag cleared for a brief
   window of time."

Replacing EXT4_IMMUTABLE_FL and EXT4_APPEND_FL with BTRFS_INODE_IMMUTABLE
and BTRFS_INODE_APPEND, respectively.
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Reviewed-by: NSatoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: NChris Mason <clm@fb.com>

3cc79392

nfs: fix nfs4d readlink truncated packet · 69bbd9c7

由 Avi Kivity 提交于 6月 26, 2014

XDR requires 4-byte alignment; nfs4d READLINK reply writes out the padding,
but truncates the packet to the padding-less size.

Fix by taking the padding into consideration when truncating the packet.

Symptoms:

	# ll /mnt/
	ls: cannot read symbolic link /mnt/test: Input/output error
	total 4
	-rw-r--r--. 1 root root  0 Jun 14 01:21 123456
	lrwxrwxrwx. 1 root root  6 Jul  2 03:33 test
	drwxr-xr-x. 1 root root  0 Jul  2 23:50 tmp
	drwxr-xr-x. 1 root root 60 Jul  2 23:44 tree
Signed-off-by: NAvi Kivity <avi@cloudius-systems.com>
Fixes: 476a7b1f (nfsd4: don't treat readlink like a zero-copy operation)
Reviewed-by: NKinglong Mee <kinglongmee@gmail.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

69bbd9c7

kernfs: kernfs_notify() must be useable from non-sleepable contexts · ecca47ce

由 Tejun Heo 提交于 7月 01, 2014

d911d987 ("kernfs: make kernfs_notify() trigger inotify events
too") added fsnotify triggering to kernfs_notify() which requires a
sleepable context.  There are already existing users of
kernfs_notify() which invoke it from an atomic context and in general
it's silly to require a sleepable context for triggering a
notification.

The following is an invalid context bug triggerd by md invoking
sysfs_notify() from IO completion path.

 BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
 in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
 2 locks held by swapper/1/0:
  #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
  #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
 irq event stamp: 33518
 hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
 hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
 softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
 softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
  0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
  0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
 Call Trace:
  <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
  [<ffffffff810d4f14>] __might_sleep+0x184/0x240
  [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
  [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
  [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
  [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
  [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
  [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
  [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
  [<ffffffff813b46c4>] blk_update_request+0x94/0x420
  [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
  [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
  [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
  [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
  [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
  [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
  [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
  [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
  [<ffffffff81119436>] handle_edge_irq+0x66/0x130
  [<ffffffff8101c3e4>] handle_irq+0x84/0x150
  [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
  [<ffffffff818122f2>] common_interrupt+0x72/0x72
  <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
  [<ffffffff81025454>] default_idle+0x24/0x230
  [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
  [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
  [<ffffffff8104df1b>] start_secondary+0x25b/0x300

This patch fixes it by punting the notification delivery through a
work item.  This ends up adding an extra pointer to kernfs_elem_attr
enlarging kernfs_node by a pointer, which is not ideal but not a very
big deal either.  If this turns out to be an actual issue, we can move
kernfs_elem_attr->size to kernfs_node->iattr later.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NJosh Boyer <jwboyer@fedoraproject.org>
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

ecca47ce

29 6月, 2014 5 次提交

btrfs: only unlock block in verify_parent_transid if we locked it · 472b909f

由 Josef Bacik 提交于 6月 25, 2014

This is a regression from my patch a26e8c9f, we
need to only unlock the block if we were the one who locked it.  Otherwise this
will trip BUG_ON()'s in locking.c  Thanks,

cc: stable@vger.kernel.org
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

472b909f

Btrfs: assert send doesn't attempt to start transactions · 46c4e71e

由 Filipe Manana 提交于 6月 24, 2014

When starting a transaction just assert that current->journal_info
doesn't contain a send transaction stub, since send isn't supposed
to start transactions and when it finishes (either successfully or
not) it's supposed to set current->journal_info to NULL.

This is motivated by the change titled:

    Btrfs: fix crash when starting transaction
Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: NChris Mason <clm@fb.com>

46c4e71e

btrfs compression: reuse recently used workspace · c39aa705

由 Sergey Senozhatsky 提交于 6月 25, 2014

Add compression `workspace' in free_workspace() to
`idle_workspace' list head, instead of tail. So we have
better chances to reuse most recently used `workspace'.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

c39aa705

Btrfs: fix crash when mounting raid5 btrfs with missing disks · 5588383e

由 Liu Bo 提交于 6月 24, 2014

The reproducer is

$ mkfs.btrfs D1 D2 D3 -mraid5
$ mkfs.ext4 D2 && mkfs.ext4 D3
$ mount D1 /btrfs -odegraded

-------------------

[   87.672992] ------------[ cut here ]------------
[   87.673845] kernel BUG at fs/btrfs/raid56.c:1828!
...
[   87.673845] RIP: 0010:[<ffffffff813efc7e>]  [<ffffffff813efc7e>] __raid_recover_end_io+0x4ae/0x4d0
...
[   87.673845] Call Trace:
[   87.673845]  [<ffffffff8116bbc6>] ? mempool_free+0x36/0xa0
[   87.673845]  [<ffffffff813f0255>] raid_recover_end_io+0x75/0xa0
[   87.673845]  [<ffffffff81447c5b>] bio_endio+0x5b/0xa0
[   87.673845]  [<ffffffff81447cb2>] bio_endio_nodec+0x12/0x20
[   87.673845]  [<ffffffff81374621>] end_workqueue_fn+0x41/0x50
[   87.673845]  [<ffffffff813ad2aa>] normal_work_helper+0xca/0x2c0
[   87.673845]  [<ffffffff8108ba2b>] process_one_work+0x1eb/0x530
[   87.673845]  [<ffffffff8108b9c9>] ? process_one_work+0x189/0x530
[   87.673845]  [<ffffffff8108c15b>] worker_thread+0x11b/0x4f0
[   87.673845]  [<ffffffff8108c040>] ? rescuer_thread+0x290/0x290
[   87.673845]  [<ffffffff810939c4>] kthread+0xe4/0x100
[   87.673845]  [<ffffffff810938e0>] ? kthread_create_on_node+0x220/0x220
[   87.673845]  [<ffffffff817e7c7c>] ret_from_fork+0x7c/0xb0
[   87.673845]  [<ffffffff810938e0>] ? kthread_create_on_node+0x220/0x220

-------------------

It's because that we miscalculate @rbio->bbio->error so that it doesn't
reach maximum of tolerable errors while it should have.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Tested-by: Satoru Takeuchi<takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

5588383e

btrfs: create sprout should rename fsid on the sysfs as well · b2373f25

由 Anand Jain 提交于 6月 03, 2014

Creating sprout will change the fsid of the mounted root.
do the same on the sysfs as well.

reproducer:
 mount /dev/sdb /btrfs (seed disk)
 btrfs dev add /dev/sdc /btrfs
 mount -o rw,remount /btrfs
 btrfs dev del /dev/sdb /btrfs
 mount /dev/sdb /btrfs

Error:
kobject_add_internal failed for fe350492-dc28-4051-a601-e017b17e6145 with -EEXIST, don't try to register things with the same name in the same directory.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

b2373f25