- 30 4月, 2015 1 次提交
-
-
由 Forrest Liu 提交于
btrfs_release_extent_buffer_page() can't handle dummy extent that allocated by btrfs_clone_extent_buffer() properly. That is because reference count of pages that allocated by btrfs_clone_extent_buffer() was 2, 1 by alloc_page(), and another by attach_extent_buffer_page(). Running following command repeatly can check this memory leak problem btrfs inspect-internal inode-resolve 256 /mnt/btrfs Signed-off-by: NChien-Kuan Yeh <ckya@synology.com> Signed-off-by: NForrest Liu <forrestl@synology.com> Reviewed-by: NFilipe Manana <fdmanana@suse.com> Tested-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 26 4月, 2015 9 次提交
-
-
由 Yang Dongsheng 提交于
We need to fill inode when we found a node for it in delayed_nodes_tree. But we did not fill the ->last_trans currently, it will cause the test of xfstest/generic/311 fail. Scenario of the 311 is shown as below: Problem: (1). test_fd = open(fname, O_RDWR|O_DIRECT) (2). pwrite(test_fd, buf, 4096, 0) (3). close(test_fd) (4). drop_all_caches() <-------- "echo 3 > /proc/sys/vm/drop_caches" (5). test_fd = open(fname, O_RDWR|O_DIRECT) (6). fsync(test_fd); <-------- we did not get the correct log entry for the file Reason: When we re-open this file in (5), we would find a node in delayed_nodes_tree and fill the inode we are lookup with the information. But the ->last_trans is not filled, then the fsync() will check the ->last_trans and found it's 0 then say this inode is already in our tree which is commited, not recording the extents for it. Fix: This patch fill the ->last_trans properly and set the runtime_flags if needed in this situation. Then we can get the log entries we expected after (6) and generic/311 passed. Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com> Reviewed-by: NMiao Xie <miaoxie@huawei.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Omar Sandoval 提交于
Whenever the check for a send in progress introduced in commit 521e0546 (btrfs: protect snapshots from deleting during send) is hit, we return without unlocking inode->i_mutex. This is easy to see with lockdep enabled: [ +0.000059] ================================================ [ +0.000028] [ BUG: lock held when returning to user space! ] [ +0.000029] 4.0.0-rc5-00096-g3c435c1e #93 Not tainted [ +0.000026] ------------------------------------------------ [ +0.000029] btrfs/211 is leaving the kernel with locks still held! [ +0.000029] 1 lock held by btrfs/211: [ +0.000023] #0: (&type->i_mutex_dir_key){+.+.+.}, at: [<ffffffff8135b8df>] btrfs_ioctl_snap_destroy+0x2df/0x7a0 Make sure we unlock it in the error path. Reviewed-by: NFilipe Manana <fdmanana@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: NOmar Sandoval <osandov@osandov.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Omar Sandoval 提交于
If io_ctl_prepare_pages fails, the pages in io_ctl.pages are not valid. When we try to access them later, things will blow up in various ways. Also fix the comment about the return value, which is an errno on error, not -1, and update the cases where it was not. Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NOmar Sandoval <osandov@osandov.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Omar Sandoval 提交于
Consider the following interleaving of overlapping calls to alloc_extent_buffer: Call 1: - Successfully allocates a few pages with find_or_create_page - find_or_create_page fails, goto free_eb - Unlocks the allocated pages Call 2: - Calls find_or_create_page and gets a page in call 1's extent_buffer - Finds that the page is already associated with an extent_buffer - Grabs a reference to the half-written extent_buffer and calls mark_extent_buffer_accessed on it mark_extent_buffer_accessed will then try to call mark_page_accessed on a null page and panic. The fix is to decrement the reference count on the half-written extent_buffer before unlocking the pages so call 2 won't use it. We should also set exists = NULL in the case that we don't use exists to avoid accidentally returning a freed extent_buffer in an error case. Signed-off-by: NOmar Sandoval <osandov@osandov.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Omar Sandoval 提交于
This is one of the first places to give out when memory is tight. Handle it properly rather than with a BUG_ON. Also fix the comment about the return value, which is an ERR_PTR, not NULL, on error. Signed-off-by: NOmar Sandoval <osandov@osandov.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Forrest Liu 提交于
If device tree has hole, find_free_dev_extent() cannot find available address properly. The problem can be reproduce by following script. mntpath=/btrfs loopdev=/dev/loop0 filepath=/home/forrest/image umount $mntpath losetup -d $loopdev truncate --size 100g $filepath losetup $loopdev $filepath mkfs.btrfs -f $loopdev mount $loopdev $mntpath # make device tree with one big hole for i in `seq 1 1 100`; do fallocate -l 1g $mntpath/$i done sync for i in `seq 1 1 95`; do rm $mntpath/$i done sync # wait cleaner thread remove unused block group sleep 300 fallocate -l 1g $mntpath/aaa # failed to allocate new chunk fallocate -l 1g $mntpath/bbb Above script will make device tree with one big hole, and can only allocate just one chunk in a transaction, so failed to allocate new chunk for $mntpath/bbb item 8 key (1 DEV_EXTENT 2185232384) itemoff 15859 itemsize 48 dev extent chunk_tree 3 chunk objectid 256 chunk offset 106292051968 length 1073741824 item 9 key (1 DEV_EXTENT 104190705664) itemoff 15811 itemsize 48 dev extent chunk_tree 3 chunk objectid 256 chunk offset 103108575232 length 1073741824 Signed-off-by: NForrest Liu <forrestl@synology.com> Reviewed-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Chris Mason 提交于
Now that we're doing free space cache writeback outside the critical section in the commit, there is a bigger window for delalloc_bytes to be added after a cache has been written. find_free_extent may do this without putting the block group back into the dirty list, and also without a transaction running. Checking for delalloc_bytes in cache_save_setup means we might leave the cache marked as written without invalidating it. Consistency checks during mount will toss the cache, but it's better to get rid of the check in cache_save_setup and let it get invalidated by the checks already done during cache write out. Signed-off-by: NChris Mason <clm@fb.com>
-
由 Filipe Manana 提交于
While starting the writes of the dirty block group caches, if we don't find a block group item in the extent tree we were leaving without releasing our path, running delayed references and then looping again to process any new dirty block groups. However this second iteration of the loop could cause a deadlock because it tries to lock some other extent tree node/leaf which another task already locked and it's blocked because it's waiting for a lock on some node/leaf that is in our path that was not released before. We could also deadlock when running the delayed references - as we could end up trying to lock the same nodes/leafs that we have in our local path (with a different lock type). Got into such case when running xfstests: [20892.242791] ------------[ cut here ]------------ [20892.243776] WARNING: CPU: 0 PID: 13299 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]() [20892.245874] BTRFS: Transaction aborted (error -2) (...) [20892.269378] Call Trace: [20892.269915] [<ffffffff8142fa46>] dump_stack+0x4f/0x7b [20892.271097] [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad [20892.272173] [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb [20892.273386] [<ffffffffa0509a6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs] [20892.274857] [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48 [20892.275851] [<ffffffffa0509a6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs] [20892.277341] [<ffffffffa0515e10>] write_one_cache_group+0x68/0xaf [btrfs] [20892.278628] [<ffffffffa052088a>] btrfs_start_dirty_block_groups+0x18d/0x29b [btrfs] [20892.280191] [<ffffffffa052f077>] btrfs_commit_transaction+0x130/0x9c9 [btrfs] (...) [20892.291316] ---[ end trace 597f77e664245373 ]--- [20892.293955] BTRFS: error (device sdg) in write_one_cache_group:3184: errno=-2 No such entry [20892.297390] BTRFS info (device sdg): forced readonly [20892.298222] ------------[ cut here ]------------ [20892.299190] WARNING: CPU: 0 PID: 13299 at fs/btrfs/ctree.c:2683 btrfs_search_slot+0x7e/0x7d2 [btrfs]() (...) [20892.326253] Call Trace: [20892.326904] [<ffffffff8142fa46>] dump_stack+0x4f/0x7b [20892.329503] [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad [20892.330815] [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb [20892.332556] [<ffffffffa0510b73>] ? btrfs_search_slot+0x7e/0x7d2 [btrfs] [20892.333955] [<ffffffff81045f62>] warn_slowpath_null+0x1a/0x1c [20892.335562] [<ffffffffa0510b73>] btrfs_search_slot+0x7e/0x7d2 [btrfs] [20892.336849] [<ffffffff8107b024>] ? arch_local_irq_save+0x9/0xc [20892.338222] [<ffffffffa051ad52>] ? cache_save_setup+0x43/0x2a5 [btrfs] [20892.339823] [<ffffffffa051ad66>] ? cache_save_setup+0x57/0x2a5 [btrfs] [20892.341275] [<ffffffff814351a4>] ? _raw_spin_unlock+0x32/0x46 [20892.342810] [<ffffffffa0515de7>] write_one_cache_group+0x3f/0xaf [btrfs] [20892.344184] [<ffffffffa052088a>] btrfs_start_dirty_block_groups+0x18d/0x29b [btrfs] [20892.347162] [<ffffffffa052f077>] btrfs_commit_transaction+0x130/0x9c9 [btrfs] (...) [20892.361015] ---[ end trace 597f77e664245374 ]--- [21120.688097] INFO: task kworker/u8:17:29854 blocked for more than 120 seconds. [21120.689881] Tainted: G W 4.0.0-rc5-btrfs-next-9+ #2 [21120.691384] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. (...) [21120.703696] Call Trace: [21120.704310] [<ffffffff8143107e>] schedule+0x74/0x83 [21120.705490] [<ffffffffa055f025>] btrfs_tree_lock+0xd7/0x236 [btrfs] [21120.706757] [<ffffffff81075cd6>] ? signal_pending_state+0x31/0x31 [21120.708156] [<ffffffffa054ac1e>] lock_extent_buffer_for_io+0x3e/0x194 [btrfs] [21120.709892] [<ffffffffa054bb86>] ? btree_write_cache_pages+0x273/0x385 [btrfs] [21120.711605] [<ffffffffa054bc42>] btree_write_cache_pages+0x32f/0x385 [btrfs] [21120.723440] [<ffffffffa0527552>] btree_writepages+0x23/0x5c [btrfs] [21120.724943] [<ffffffff8110c4c8>] do_writepages+0x23/0x2c [21120.726008] [<ffffffff81176dde>] __writeback_single_inode+0x73/0x2fa [21120.727230] [<ffffffff8117714a>] ? writeback_sb_inodes+0xe5/0x38b [21120.728526] [<ffffffff811771fb>] ? writeback_sb_inodes+0x196/0x38b [21120.729701] [<ffffffff8117726a>] writeback_sb_inodes+0x205/0x38b (...) [21120.747853] INFO: task btrfs:13282 blocked for more than 120 seconds. [21120.749459] Tainted: G W 4.0.0-rc5-btrfs-next-9+ #2 [21120.751137] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. (...) [21120.768457] Call Trace: [21120.769039] [<ffffffff8143107e>] schedule+0x74/0x83 [21120.770107] [<ffffffffa052f25c>] btrfs_commit_transaction+0x315/0x9c9 [btrfs] [21120.771558] [<ffffffff81075cd6>] ? signal_pending_state+0x31/0x31 [21120.773659] [<ffffffffa056fd8c>] prepare_to_relocate+0xcb/0xd2 [btrfs] [21120.776257] [<ffffffffa05741da>] relocate_block_group+0x44/0x4a9 [btrfs] [21120.777755] [<ffffffffa05747a0>] ? btrfs_relocate_block_group+0x161/0x288 [btrfs] [21120.779459] [<ffffffffa05747a8>] btrfs_relocate_block_group+0x169/0x288 [btrfs] [21120.781153] [<ffffffffa0550403>] btrfs_relocate_chunk.isra.29+0x3e/0xa7 [btrfs] [21120.783918] [<ffffffffa05518fd>] btrfs_balance+0xaa4/0xc52 [btrfs] [21120.785436] [<ffffffff8114306e>] ? cpu_cache_get.isra.39+0xe/0x1f [21120.786434] [<ffffffffa0559252>] btrfs_ioctl_balance+0x23f/0x2b0 [btrfs] (...) [21120.889251] INFO: task fsstress:13288 blocked for more than 120 seconds. [21120.890526] Tainted: G W 4.0.0-rc5-btrfs-next-9+ #2 [21120.891773] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. (...) [21120.899960] Call Trace: [21120.900743] [<ffffffff8143107e>] schedule+0x74/0x83 [21120.903004] [<ffffffffa055f025>] btrfs_tree_lock+0xd7/0x236 [btrfs] [21120.904383] [<ffffffff81075cd6>] ? signal_pending_state+0x31/0x31 [21120.905608] [<ffffffffa051125b>] btrfs_search_slot+0x766/0x7d2 [btrfs] [21120.906812] [<ffffffff8114290e>] ? virt_to_head_page+0x9/0x2c [21120.907874] [<ffffffff81144b7f>] ? cache_alloc_debugcheck_after.isra.42+0x16c/0x1cb [21120.909551] [<ffffffffa05124e0>] btrfs_insert_empty_items+0x5d/0xa8 [btrfs] [21120.910914] [<ffffffffa0512585>] btrfs_insert_item+0x5a/0xa5 [btrfs] [21120.912181] [<ffffffffa0520271>] ? btrfs_create_pending_block_groups+0x96/0x130 [btrfs] [21120.913784] [<ffffffffa052028a>] btrfs_create_pending_block_groups+0xaf/0x130 [btrfs] [21120.915374] [<ffffffffa052ffc2>] __btrfs_end_transaction+0x84/0x366 [btrfs] [21120.916735] [<ffffffffa05302b4>] btrfs_end_transaction+0x10/0x12 [btrfs] [21120.917996] [<ffffffffa051ab26>] btrfs_check_data_free_space+0x11f/0x27c [btrfs] [21120.919478] [<ffffffffa051ba25>] btrfs_delalloc_reserve_space+0x1e/0x51 [btrfs] [21120.921226] [<ffffffffa05382f2>] btrfs_truncate_page+0x85/0x2c4 [btrfs] [21120.923121] [<ffffffffa0538572>] btrfs_cont_expand+0x41/0x3ef [btrfs] [21120.924449] [<ffffffffa0541091>] ? btrfs_file_write_iter+0x19a/0x431 [btrfs] [21120.926602] [<ffffffff8107b024>] ? arch_local_irq_save+0x9/0xc [21120.927769] [<ffffffffa0541091>] ? btrfs_file_write_iter+0x19a/0x431 [btrfs] [21120.929324] [<ffffffffa05410a0>] ? btrfs_file_write_iter+0x1a9/0x431 [btrfs] [21120.930723] [<ffffffffa05410d9>] btrfs_file_write_iter+0x1e2/0x431 [btrfs] [21120.931897] [<ffffffff81067d85>] ? get_parent_ip+0xe/0x3e [21120.934446] [<ffffffff811534c3>] new_sync_write+0x7c/0xa0 [21120.935528] [<ffffffff81153b58>] vfs_write+0xb2/0x117 (...) Fixes: 1bbc621e ("Btrfs: allow block group cache writeout outside critical section in commit") Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Filipe Manana 提交于
While running xfstests I ran into the following: [20892.242791] ------------[ cut here ]------------ [20892.243776] WARNING: CPU: 0 PID: 13299 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x52/0x114 [btrfs]() [20892.245874] BTRFS: Transaction aborted (error -2) [20892.247329] Modules linked in: btrfs dm_snapshot dm_bufio dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse$ [20892.258488] CPU: 0 PID: 13299 Comm: fsstress Tainted: G W 4.0.0-rc5-btrfs-next-9+ #2 [20892.262011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [20892.264738] 0000000000000009 ffff880427f8bc18 ffffffff8142fa46 ffffffff8108b6a2 [20892.266244] ffff880427f8bc68 ffff880427f8bc58 ffffffff81045ea5 ffff880427f8bc48 [20892.267761] ffffffffa0509a6d 00000000fffffffe ffff8803545d6f40 ffffffffa05a15a0 [20892.269378] Call Trace: [20892.269915] [<ffffffff8142fa46>] dump_stack+0x4f/0x7b [20892.271097] [<ffffffff8108b6a2>] ? console_unlock+0x361/0x3ad [20892.272173] [<ffffffff81045ea5>] warn_slowpath_common+0xa1/0xbb [20892.273386] [<ffffffffa0509a6d>] ? __btrfs_abort_transaction+0x52/0x114 [btrfs] [20892.274857] [<ffffffff81045f05>] warn_slowpath_fmt+0x46/0x48 [20892.275851] [<ffffffffa0509a6d>] __btrfs_abort_transaction+0x52/0x114 [btrfs] [20892.277341] [<ffffffffa0515e10>] write_one_cache_group+0x68/0xaf [btrfs] [20892.278628] [<ffffffffa052088a>] btrfs_start_dirty_block_groups+0x18d/0x29b [btrfs] [20892.280191] [<ffffffffa052f077>] btrfs_commit_transaction+0x130/0x9c9 [btrfs] [20892.281781] [<ffffffff8107d33d>] ? trace_hardirqs_on+0xd/0xf [20892.282873] [<ffffffffa054163b>] btrfs_sync_file+0x313/0x387 [btrfs] [20892.284111] [<ffffffff8117acad>] vfs_fsync_range+0x95/0xa4 [20892.285203] [<ffffffff810e603f>] ? time_hardirqs_on+0x15/0x28 [20892.286290] [<ffffffff8123960b>] ? trace_hardirqs_on_thunk+0x3a/0x3f [20892.287469] [<ffffffff8117acd8>] vfs_fsync+0x1c/0x1e [20892.288412] [<ffffffff8117ae54>] do_fsync+0x34/0x4e [20892.289348] [<ffffffff8117b07c>] SyS_fsync+0x10/0x14 [20892.290255] [<ffffffff81435b32>] system_call_fastpath+0x12/0x17 [20892.291316] ---[ end trace 597f77e664245373 ]--- [20892.293955] BTRFS: error (device sdg) in write_one_cache_group:3184: errno=-2 No such entry [20892.297390] BTRFS info (device sdg): forced readonly This happens because in btrfs_start_dirty_block_groups() we splice the transaction's list of dirty block groups into a local list and then we keep extracting the first element of the list without holding the cache_write_mutex mutex. This means that before we acquire that mutex the first block group on the list might be removed by a conurrent task running btrfs_remove_block_group(). So make sure we extract the first element (and test the list emptyness) while holding that mutex. Fixes: 1bbc621e ("Btrfs: allow block group cache writeout outside critical section in commit") Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 25 4月, 2015 4 次提交
-
-
由 Al Viro 提交于
Calling unlazy_walk() in walk_component() and do_last() when we find a symlink that needs to be followed doesn't acquire a reference to vfsmount. That's fine when the symlink is on the same vfsmount as the parent directory (which is almost always the case), but it's not always true - one _can_ manage to bind a symlink on top of something. And in such cases we end up with excessive mntput(). Cc: stable@vger.kernel.org # since 2.6.39 Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Jens Axboe 提交于
do_blockdev_direct_IO() increments and decrements the inode ->i_dio_count for each IO operation. It does this to protect against truncate of a file. Block devices don't need this sort of protection. For a capable multiqueue setup, this atomic int is the only shared state between applications accessing the device for O_DIRECT, and it presents a scaling wall for that. In my testing, as much as 30% of system time is spent incrementing and decrementing this value. A mixed read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with better latencies too. Before: clat percentiles (usec): | 1.00th=[ 33], 5.00th=[ 34], 10.00th=[ 34], 20.00th=[ 34], | 30.00th=[ 34], 40.00th=[ 34], 50.00th=[ 35], 60.00th=[ 35], | 70.00th=[ 35], 80.00th=[ 35], 90.00th=[ 37], 95.00th=[ 80], | 99.00th=[ 98], 99.50th=[ 151], 99.90th=[ 155], 99.95th=[ 155], | 99.99th=[ 165] After: clat percentiles (usec): | 1.00th=[ 95], 5.00th=[ 108], 10.00th=[ 129], 20.00th=[ 149], | 30.00th=[ 155], 40.00th=[ 161], 50.00th=[ 167], 60.00th=[ 171], | 70.00th=[ 177], 80.00th=[ 185], 90.00th=[ 201], 95.00th=[ 270], | 99.00th=[ 390], 99.50th=[ 398], 99.90th=[ 418], 99.95th=[ 422], | 99.99th=[ 438] In other setups, Robert Elliott reported seeing good performance improvements: https://lkml.org/lkml/2015/4/3/557 The more applications accessing the device, the worse it gets. Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells do_blockdev_direct_IO() that it need not worry about incrementing or decrementing the inode i_dio_count for this caller. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Elliott, Robert (Server Storage) <elliott@hp.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NJens Axboe <axboe@fb.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Johannes Berg 提交于
Al Viro's IOV changes broke 9p readdir() because the new code didn't abort the read when it returned nothing. The original code checked if the combined error/length was <= 0 but in the new code that accidentally got changed to just an error check. Add back the return from the function when nothing is read. Cc: Al Viro <viro@zeniv.linux.org.uk> Fixes: e1200fe6 ("9p: switch p9_client_read() to passing struct iov_iter *") Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Chris Mason 提交于
__btrfs_write_out_cache is holding the ctl->tree_lock while it prepares a list of bitmaps to record in the free space cache. It was dropping the lock while it worked on other components, which made a window for free_bitmap() to free the bitmap struct without removing it from the list. This changes things to hold the lock the whole time, and also makes sure we hold the lock during enospc cleanup. Reported-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
- 24 4月, 2015 15 次提交
-
-
由 Chris Mason 提交于
The code to fix stalls during free spache cache IO wasn't using the correct root when waiting on the IO for inode caches. This is only a problem when the inode cache is enabled with mount -o inode_cache This fixes the inode cache writeout to preserve any error values and makes sure not to override the root when inode cache writeout is done. Reported-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NChris Mason <clm@fb.com>
-
由 Andre Przywara 提交于
The brand new GCC 5.1.0 warns by default on using a boolean in the switch condition. This results in the following warning: fs/nfs/nfs4proc.c: In function 'nfs4_proc_get_rootfh': fs/nfs/nfs4proc.c:3100:10: warning: switch condition has boolean value [-Wswitch-bool] switch (auth_probe) { ^ This code was obviously using switch to make use of the fall-through semantics (without the usual comment, though). Rewrite that code using if statements to avoid the warning and make the code a bit more readable on the way. Signed-off-by: NAndre Przywara <andre.przywara@arm.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Firo Yang 提交于
Don't unnecessarily cast allocation return value in fs/nfs/inode.c::nfs_alloc_inode(). Signed-off-by: NFiro Yang <firogm@gmail.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Benjamin Coddington 提交于
If a READDIR reply comes back without any page data, avoid a NULL pointer dereference in xdr_copy_to_scratch(). BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 IP: [<ffffffff813a378d>] memcpy+0xd/0x110 ... Call Trace: ? xdr_inline_decode+0x7a/0xb0 [sunrpc] nfs3_decode_dirent+0x73/0x320 [nfsv3] nfs_readdir_page_filler+0xd5/0x4e0 [nfs] ? nfs3_rpc_wrapper.constprop.9+0x42/0xc0 [nfsv3] nfs_readdir_xdr_to_array+0x1fa/0x330 [nfs] ? mem_cgroup_commit_charge+0xac/0x160 ? nfs_readdir_xdr_to_array+0x330/0x330 [nfs] nfs_readdir_filler+0x22/0x90 [nfs] do_read_cache_page+0x7e/0x1a0 read_cache_page+0x1c/0x20 nfs_readdir+0x18e/0x660 [nfs] ? nfs3_xdr_dec_getattr3res+0x80/0x80 [nfsv3] iterate_dir+0x97/0x130 SyS_getdents+0x94/0x120 ? fillonedir+0xd0/0xd0 system_call_fastpath+0x12/0x17 Signed-off-by: NBenjamin Coddington <bcodding@redhat.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Nicolas Iooss 提交于
This reverts commit 5a254d08. Since commit 5a254d08 ("nfs: replace nfs_add_stats with nfs_inc_stats when add one"), nfs_readpage and nfs_do_writepage use nfs_inc_stats to increment NFSIOS_READPAGES and NFSIOS_WRITEPAGES instead of nfs_add_stats. However nfs_inc_stats does not do the same thing as nfs_add_stats with value 1 because these functions work on distinct stats: nfs_inc_stats increments stats from "enum nfs_stat_eventcounters" (in server->io_stats->events) and nfs_add_stats those from "enum nfs_stat_bytecounters" (in server->io_stats->bytes). Signed-off-by: NNicolas Iooss <nicolas.iooss_linux@m4x.org> Fixes: 5a254d08 ("nfs: replace nfs_add_stats with nfs_inc_stats...") Cc: stable@vger.kernel.org # 3.19+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
I added the nfs4 prefix to make it obvious that this file is built into the NFS v4 module, and not the generic client. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
This file is only used internally to the NFS v4 module, so it doesn't need to be in the global include path. I also renamed it from nfs_idmap.h to nfs4idmap.h to emphasize that it's an NFSv4-only include file. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
The idmapper is completely internal to the NFS v4 module, so this macro will always evaluate to true. This patch also removes unnecessary includes of this file from the generic NFS client. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
d4b18c3e (pnfs: remove GETDEVICELIST implementation) removed the GETDEVICELIST operation from the NFS client, but left a "hole" in the nfs4_procedures array. This caused /proc/self/mountstats to report an operation named "51" where GETDEVICELIST used to be. This patch adds a stub to fix mountstats. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Fixes: d4b18c3e (pnfs: remove GETDEVICELIST implementation) Cc: stable@vger.kernel.org # 3.17+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Peng Tao 提交于
For flexfiles driver, we might choose to read from mirror index other than 0 while mirror_count is always 1 for read. Reported-by: NJean Spector <jean@primarydata.com> Cc: <stable@vger.kernel.org> # v3.19+ Cc: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Peng Tao 提交于
For direct read that has IO size larger than rsize, we'll split it into several READ requests and nfs_direct_good_bytes() would count completed bytes incorrectly by eating last zero count reply. Fix it by handling mirror and non-mirror cases differently such that we only count mirrored writes differently. This fixes 5fadeb47("nfs: count DIO good bytes correctly with mirroring"). Reported-by: NJean Spector <jean@primarydata.com> Cc: <stable@vger.kernel.org> # v3.19+ Signed-off-by: NPeng Tao <tao.peng@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
2ef47eb1 (NFS: Fix use of nfs_attr_use_mounted_on_fileid()) was a good start to fixing a circular directory structure warning for NFS v4 "junctioned" mountpoints. Unfortunately, further testing continued to generate this error. My server is configured like this: anna@nfsd ~ % df Filesystem Size Used Avail Use% Mounted on /dev/vda1 9.1G 2.0G 6.5G 24% / /dev/vdc1 1014M 33M 982M 4% /exports /dev/vdc2 1014M 33M 982M 4% /exports/vol1 /dev/vdc3 1014M 33M 982M 4% /exports/vol1/vol2 anna@nfsd ~ % cat /etc/exports /exports/ *(rw,async,no_subtree_check,no_root_squash) /exports/vol1/ *(rw,async,no_subtree_check,no_root_squash) /exports/vol1/vol2 *(rw,async,no_subtree_check,no_root_squash) I've been running chown across the entire mountpoint twice in a row to hit this problem. The first run succeeds, but the second one fails with the circular directory warning along with: anna@client ~ % dmesg [Apr 3 14:28] NFS: server 192.168.100.204 error: fileid changed fsid 0:39: expected fileid 0x100080, got 0x80 WHere 0x80 is the mountpoint's fileid and 0x100080 is the mounted-on fileid. This patch fixes the issue by requesting an updated mounted-on fileid from the server during nfs_update_inode(), and then checking that the fileid stored in the nfs_inode matches either the fileid or mounted-on fileid returned by the server. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Jeff Layton 提交于
Chuck pointed out a problem that crept in with commit 6ffa30d3 (nfs: don't call blocking operations while !TASK_RUNNING). Linux counts tasks in uninterruptible sleep against the load average, so this caused the system's load average to be pinned at at least 1 when there was a NFSv4.1+ mount active. Not a huge problem, but it's probably worth fixing before we get too many complaints about it. This patch converts the code back to use TASK_INTERRUPTIBLE sleep, simply has it flush any signals on each loop iteration. In practice no one should really be signalling this thread at all, so I think this is reasonably safe. With this change, there's also no need to game the hung task watchdog so we can also convert the schedule_timeout call back to a normal schedule. Cc: <stable@vger.kernel.org> Reported-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJeff Layton <jeff.layton@primarydata.com> Tested-by: NChuck Lever <chuck.lever@oracle.com> Fixes: commit 6ffa30d3 (“nfs: don't call blocking . . .”) Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
At the very least, we should not be taking the i_mutex until after checking if the server even supports ALLOCATE or DEALLOCATE, allowing v4.0 or v4.1 to exit without potentially waiting on a lock. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Anna Schumaker 提交于
This patch adds a GETATTR to the end of ALLOCATE and DEALLOCATE operations so we can set the updated inode size and change attribute directly. DEALLOCATE will still need to release pagecache pages, so nfs42_proc_deallocate() now calls truncate_pagecache_range() before contacting the server. Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 22 4月, 2015 9 次提交
-
-
由 Yan, Zheng 提交于
For CEPH_OSD_CMPXATTR_MODE_U64, OSD expects the u64 to be encoded as string in object's xattr. Signed-off-by: NYan, Zheng <zyan@redhat.com>
-
由 Yan, Zheng 提交于
Signed-off-by: NYan, Zheng <zyan@redhat.com>
-
由 Yan, Zheng 提交于
sb->s_root can be null when umounting Signed-off-by: NYan, Zheng <zyan@redhat.com>
-
由 Giuseppe Cantavenera 提交于
nfsd triggered a BUG_ON in net_generic(...) when rpc_pipefs_event(...) in fs/nfsd/nfs4recover.c was called before assigning ntfsd_net_id. The following was observed on a MIPS 32-core processor: kernel: Call Trace: kernel: [<ffffffffc00bc5e4>] rpc_pipefs_event+0x7c/0x158 [nfsd] kernel: [<ffffffff8017a2a0>] notifier_call_chain+0x70/0xb8 kernel: [<ffffffff8017a4e4>] __blocking_notifier_call_chain+0x4c/0x70 kernel: [<ffffffff8053aff8>] rpc_fill_super+0xf8/0x1a0 kernel: [<ffffffff8022204c>] mount_ns+0xb4/0xf0 kernel: [<ffffffff80222b48>] mount_fs+0x50/0x1f8 kernel: [<ffffffff8023dc00>] vfs_kern_mount+0x58/0xf0 kernel: [<ffffffff802404ac>] do_mount+0x27c/0xa28 kernel: [<ffffffff80240cf0>] SyS_mount+0x98/0xe8 kernel: [<ffffffff80135d24>] handle_sys64+0x44/0x68 kernel: kernel: Code: 0040f809 00000000 2e020001 <00020336> 3c12c00d 3c02801a de100000 6442eb98 0040f809 kernel: ---[ end trace 7471374335809536 ]--- Fixed this behaviour by calling register_pernet_subsys(&nfsd_net_ops) before registering rpc_pipefs_event(...) with the notifier chain. Signed-off-by: NGiuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com> Signed-off-by: NLorenzo Restelli <lorenzo.restelli.ext@nokia.com> Reviewed-by: NKinlong Mee <kinglongmee@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Mark Salter 提交于
Commit f895b252 ("sunrpc: eliminate RPC_DEBUG") introduced use of IS_ENABLED() in a uapi header which leads to a build failure for userspace apps trying to use <linux/nfsd/debug.h>: linux/nfsd/debug.h:18:15: error: missing binary operator before token "(" #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) ^ Since this was only used to define NFSD_DEBUG if CONFIG_SUNRPC_DEBUG is enabled, replace instances of NFSD_DEBUG with CONFIG_SUNRPC_DEBUG. Cc: stable@vger.kernel.org Fixes: f895b252 "sunrpc: eliminate RPC_DEBUG" Signed-off-by: NMark Salter <msalter@redhat.com> Reviewed-by: NJeff Layton <jlayton@primarydata.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
In the case we already have a struct file (derived from a stateid), we still need to do permission-checking; otherwise an unauthorized user could gain access to a file by sniffing or guessing somebody else's stateid. Cc: stable@vger.kernel.org Fixes: dc97618d "nfsd4: separate splice and readv cases" Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
If the client uses a special stateid then we'll pass a NULL file to vfs_llseek. Fixes: 24bab491 " NFSD: Implement SEEK" Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: stable@vger.kernel.org Reported-by: NChristoph Hellwig <hch@infradead.org> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
vfs_fallocate will hit a NULL dereference if the client tries an ALLOCATE or DEALLOCATE with a special stateid. Fix that. (We also depend on the open to have broken any conflicting leases or delegations for us.) (If it turns out we need to allow special stateid's then we could do a temporary open here in the special-stateid case, as we do for read and write. For now I'm assuming it's not necessary.) Fixes: 95d871f0 "nfsd: Add ALLOCATE support" Cc: stable@vger.kernel.org Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Linus Torvalds 提交于
This reverts commit e2ac55b6. Huang Ying reports that this causes a hang at boot with debugfs disabled. It is true that the debugfs error checks are kind of confusing, and this code certainly merits more cleanup and thinking about it, but there's something wrong with the trivial "check not just for NULL, but for error pointers too" patch. Yes, with debugfs disabled, we will end up setting the o2hb_debug_dir pointer variable to an error pointer (-ENODEV), and then continue as if everything was fine. But since debugfs is disabled, all the _users_ of that pointer end up being compiled away, so even though the pointer can not be dereferenced, that's still fine. So it's confusing and somewhat questionable, but the "more correct" error checks end up causing more trouble than they fix. Reported-by: NHuang Ying <ying.huang@intel.com> Acked-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NChengyu Song <csong84@gatech.edu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 4月, 2015 2 次提交
-
-
由 Yan, Zheng 提交于
If a directory is complete, we want to keep the exclusive cap. So that MDS does not end up revoking the shared cap on every create/unlink operation. Signed-off-by: NYan, Zheng <zyan@redhat.com>
-
由 Ilya Dryomov 提交于
Don't pollute /proc/mounts with default options (presently these are dcache, nofsc and acl). Leave the acl/noacl however - it's a bit of a special case due to CONFIG_CEPH_FS_POSIX_ACL. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-