- 20 10月, 2011 2 次提交
-
-
由 Josef Bacik 提交于
We have not been reserving enough space for checksums. We were just reserving bytes for the checksum items themselves, we were not taking into account having to cow the tree and such. This patch adds a csum_bytes counter to the inode for keeping track of the number of bytes outstanding we have for checksums. Then we calculate how many leaves would be required for the checksums we are given and use that to reserve space. This adds a significant amount of bytes to our reservations, but we will handle this later. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
由 Josef Bacik 提交于
reserved_bytes is not used for anything in the inode, remove it. Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 18 9月, 2011 2 次提交
-
-
由 Josef Bacik 提交于
We can race with readdir and the RCU path walking stuff. This is because we clear the need lookup flag before actually instantiating the inode. This will lead the RCU path walk stuff to find a dentry it thinks is valid without a d_inode attached. So instead unhash the dentry when we first start the lookup, and then clear the flag after we've instantiated the dentry so we're garunteed to either try the slow lookup, or have the d_inode set properly. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Hidetoshi Seto 提交于
Since the d_off in the first dirent for "." (that originates from the 4th argument "offset" of filldir() for the 2nd dirent for "..") is wrongly assigned in btrfs_real_readdir(), telldir returns same offset for different locations. | # mkfs.btrfs /dev/sdb1 | # mount /dev/sdb1 fs0 | # cd fs0 | # touch file0 file1 | # ../test | telldir: 0 | readdir: d_off = 2, d_name = "." | telldir: 2 | readdir: d_off = 2, d_name = ".." | telldir: 2 | readdir: d_off = 3, d_name = "file0" | telldir: 3 | readdir: d_off = 2147483647, d_name = "file1" | telldir: 2147483647 To fix this problem, pass filp->f_pos (which is loff_t) instead. | # ../test | telldir: 0 | readdir: d_off = 1, d_name = "." | telldir: 1 | readdir: d_off = 2, d_name = ".." | telldir: 2 | readdir: d_off = 3, d_name = "file0" : At the moment the "offset" for "." is unused because there is no preceding dirent, however it is better to pass filp->f_pos to follow grammatical usage. Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 11 9月, 2011 3 次提交
-
-
由 Miao Xie 提交于
If we write some data into the data hole of the file(no preallocation for this hole), Btrfs will allocate some disk space, and update nbytes of the inode, but the other element--disk_i_size needn't be updated. At this condition, we must update inode metadata though disk_i_size is not changed(btrfs_ordered_update_i_size() return 1). # mkfs.btrfs /dev/sdb1 # mount /dev/sdb1 /mnt # touch /mnt/a # truncate -s 856002 /mnt/a # dd if=/dev/zero of=/mnt/a bs=4K count=1 conv=nocreat,notrunc # umount /mnt # btrfsck /dev/sdb1 root 5 inode 257 errors 400 found 32768 bytes used err is 1 Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Miao Xie 提交于
The function - btrfs_cont_expand() forgot to close the transaction handle before it jump out the while loop. Fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Sergei Trofimovich 提交于
iput() shouldn't be called for inodes in I_NEW state. We need to mark inode as constructed first. WARNING: at fs/inode.c:1309 iput+0x20b/0x210() Call Trace: [<ffffffff8103e7ba>] warn_slowpath_common+0x7a/0xb0 [<ffffffff8103e805>] warn_slowpath_null+0x15/0x20 [<ffffffff810eaf0b>] iput+0x20b/0x210 [<ffffffff811b96fb>] btrfs_iget+0x1eb/0x4a0 [<ffffffff811c3ad6>] btrfs_run_defrag_inodes+0x136/0x210 [<ffffffff811ad55f>] cleaner_kthread+0x17f/0x1a0 [<ffffffff81035b7d>] ? sub_preempt_count+0x9d/0xd0 [<ffffffff811ad3e0>] ? transaction_kthread+0x280/0x280 [<ffffffff8105af86>] kthread+0x96/0xa0 [<ffffffff814336d4>] kernel_thread_helper+0x4/0x10 [<ffffffff8105aef0>] ? kthread_worker_fn+0x190/0x190 [<ffffffff814336d0>] ? gs_change+0xb/0xb Signed-off-by: NSergei Trofimovich <slyfox@gentoo.org> CC: Konstantin Khlebnikov <khlebnikov@openvz.org> Tested-by: NDavid Sterba <dsterba@suse.cz> CC: Josef Bacik <josef@redhat.com> CC: Chris Mason <chris.mason@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 18 8月, 2011 1 次提交
-
-
由 Jeff Mahoney 提交于
This patch tightens the read-only access checks in btrfs_permission to match the constraints in inode_permission. Currently, even though the device node itself will be unmodified, read-write access to device nodes is denied to when the device node resides on a read-only subvolume or a is a file that has been marked read-only by the btrfs conversion utility. With this patch applied, the check only affects regular files, directories, and symlinks. It also restructures the code a bit so that we don't duplicate the MAY_WRITE check for both tests. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 02 8月, 2011 3 次提交
-
-
由 Jeff Mahoney 提交于
The set/clear bit and the extent split/merge hooks only ever return 0. Changing them to return void simplifies the error handling cases later. This patch changes the hook prototypes, the single implementation of each, and the functions that call them to return void instead. Since all four of these hooks execute under a spinlock, they're necessarily simple. Signed-off-by: NJeff Mahoney <jeffm@suse.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Li Zefan 提交于
We passed the wrong value to btrfs_force_ra(). Fix this by changing the argument of btrfs_force_ra() from last_index to nr_page. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Tsutomu Itoh 提交于
When btrfs_unlink_inode() and btrfs_orphan_add() in btrfs_unlink() are error, the error code is returned to the caller instead of BUG_ON(). Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 01 8月, 2011 1 次提交
-
-
由 Josef Bacik 提交于
In btrfs we have 2 indexes for inodes. One is for readdir, it's in this nice sequential order and works out brilliantly for readdir. However if you use ls, it usually stat's each file it gets from readdir. This is where the second index comes in, which is based on a hash of the name of the file. So then the lookup has to lookup this index, and then lookup the inode. The index lookup is going to be in random order (since its based on the name hash), which gives us less than stellar performance. Since we know the inode location from the readdir index, I create a dummy dentry and copy the location key into dentry->d_fsdata. Then on lookup if we have d_fsdata we use that location to lookup the inode, avoiding looking up the other directory index. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 7月, 2011 4 次提交
-
-
由 Chris Mason 提交于
Now that we are using regular file crcs for the free space cache, we can deadlock if we try to read the free_space_inode while we are updating the crc tree. This commit fixes things by using the commit_root to read the crcs. This is safe because we the free space cache file would already be loaded if that block group had been changed in the current transaction. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
The extent_buffers have a very complex interface where we use HIGHMEM for metadata and try to cache a kmap mapping to access the memory. The next commit adds reader/writer locks, and concurrent use of this kmap cache would make it even more complex. This commit drops the ability to use HIGHMEM with extent buffers, and rips out all of the related code. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
So I had this brilliant idea to use atomic counters for outstanding and reserved extents, but this turned out to be a bad idea. Consider this where we have 1 outstanding extent and 1 reserved extent Reserver Releaser atomic_dec(outstanding) now 0 atomic_read(outstanding)+1 get 1 atomic_read(reserved) get 1 don't actually reserve anything because they are the same atomic_cmpxchg(reserved, 1, 0) atomic_inc(outstanding) atomic_add(0, reserved) free reserved space for 1 extent Then the reserver now has no actual space reserved for it, and when it goes to finish the ordered IO it won't have enough space to do it's allocation and you get those lovely warnings. Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to GFP_HIGHUSER_MOVABLE. So instead use find_or_create_page in all cases where we need GFP_NOFS so we don't deadlock. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 27 7月, 2011 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 7月, 2011 1 次提交
-
-
由 Christoph Hellwig 提交于
Replace the ->check_acl method with a ->get_acl method that simply reads an ACL from disk after having a cache miss. This means we can replace the ACL checking boilerplate code with a single implementation in namei.c. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 21 7月, 2011 1 次提交
-
-
由 Al Viro 提交于
->d_parent is locked and stable there... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 20 7月, 2011 5 次提交
-
-
由 Al Viro 提交于
... and simplify the living hell out of callers Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
not used by the instances anymore. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
redundant; all callers get it duplicated in mask & MAY_NOT_BLOCK and none of them removes that bit. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
its value depends only on inode and does not change; we might as well store it in ->i_op->check_acl and be done with that. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 7月, 2011 3 次提交
-
-
由 Mark Fasheh 提交于
btrfs_iget() also needed an update so that errors from btrfs_locked_inode() are caught and bubbled back up. Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Mark Fasheh 提交于
I moved the path allocation up a few lines to the top of the function so that we couldn't get into the state where we've dropped delayed items and the extent cache but fail due to -ENOMEM. Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Mark Fasheh 提交于
This patch fixes many callers of btrfs_alloc_path() which BUG_ON allocation failure. All the sites that are fixed in this patch were checked by me to be fairly trivial to fix because of at least one of two criteria: - Callers of the function catch errors from it already so bubbling the error up will be handled. - Callers of the function might BUG_ON any nonzero return code in which case there is no behavior changed (but we still got to remove a BUG_ON) The following functions were updated: btrfs_lookup_extent, alloc_reserved_tree_block, btrfs_remove_block_group, btrfs_lookup_csums_range, btrfs_csum_file_blocks, btrfs_mark_extent_written, btrfs_inode_by_name, btrfs_new_inode, btrfs_symlink, insert_reserved_file_extent, and run_delalloc_nocow Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
- 07 7月, 2011 1 次提交
-
-
由 Miao Xie 提交于
We need to make sure the data relocation inode doesn't go through the delayed metadata updates, otherwise we get an oops during balance: kernel BUG at fs/btrfs/relocation.c:4303! [SNIP] Call Trace: [<ffffffffa03143fd>] ? update_ref_for_cow+0x22d/0x330 [btrfs] [<ffffffffa0314951>] __btrfs_cow_block+0x451/0x5e0 [btrfs] [<ffffffffa031355d>] ? read_block_for_search+0x14d/0x4d0 [btrfs] [<ffffffffa0314beb>] btrfs_cow_block+0x10b/0x240 [btrfs] [<ffffffffa031acae>] btrfs_search_slot+0x49e/0x7a0 [btrfs] [<ffffffffa032d8af>] btrfs_lookup_inode+0x2f/0xa0 [btrfs] [<ffffffff8147bf0e>] ? mutex_lock+0x1e/0x50 [<ffffffffa0380cf1>] btrfs_update_delayed_inode+0x71/0x160 [btrfs] [<ffffffffa037ff27>] ? __btrfs_release_delayed_node+0x67/0x190 [btrfs] [<ffffffffa0381cf8>] btrfs_run_delayed_items+0xe8/0x120 [btrfs] [<ffffffffa03365e0>] btrfs_commit_transaction+0x250/0x850 [btrfs] [<ffffffff810f91d9>] ? find_get_pages+0x39/0x130 [<ffffffffa0336cd5>] ? join_transaction+0x25/0x250 [btrfs] [<ffffffff81081de0>] ? wake_up_bit+0x40/0x40 [<ffffffffa03785fa>] prepare_to_relocate+0xda/0xf0 [btrfs] [<ffffffffa037f2bb>] relocate_block_group+0x4b/0x620 [btrfs] [<ffffffffa0334cf5>] ? btrfs_clean_old_snapshots+0x35/0x150 [btrfs] [<ffffffffa037fa43>] btrfs_relocate_block_group+0x1b3/0x2e0 [btrfs] [<ffffffffa0368ec0>] ? btrfs_tree_unlock+0x50/0x50 [btrfs] [<ffffffffa035e39b>] btrfs_relocate_chunk+0x8b/0x670 [btrfs] [<ffffffffa031303d>] ? btrfs_set_path_blocking+0x3d/0x50 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa031bea1>] ? btrfs_previous_item+0xb1/0x150 [btrfs] [<ffffffffa03577d8>] ? read_extent_buffer+0xd8/0x1d0 [btrfs] [<ffffffffa035f5aa>] btrfs_balance+0x21a/0x2b0 [btrfs] [<ffffffffa0368898>] btrfs_ioctl+0x798/0xd20 [btrfs] [<ffffffff8111e358>] ? handle_mm_fault+0x148/0x270 [<ffffffff814809e8>] ? do_page_fault+0x1d8/0x4b0 [<ffffffff81160d6a>] do_vfs_ioctl+0x9a/0x540 [<ffffffff811612b1>] sys_ioctl+0xa1/0xb0 [<ffffffff81484ec2>] system_call_fastpath+0x16/0x1b [SNIP] RIP [<ffffffffa037c1cc>] btrfs_reloc_cow_block+0x22c/0x270 [btrfs] Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 27 6月, 2011 1 次提交
-
-
由 Miao Xie 提交于
When iputting the inode, We may leave the delayed nodes if they have some delayed items that have not been dealt with. So when the inode is read again, we must look up the relative delayed node, and use the information in it to initialize the inode. Or we will get inconsonant inode information, it may cause that the same directory index number is allocated again, and hit the following oops: [ 5447.554187] err add delayed dir index item(name: pglog_0.965_0) into the insertion tree of the delayed node(root id: 262, inode id: 258, errno: -17) [ 5447.569766] ------------[ cut here ]------------ [ 5447.575361] kernel BUG at fs/btrfs/delayed-inode.c:1301! [SNIP] [ 5447.790721] Call Trace: [ 5447.793191] [<ffffffffa0641c4e>] btrfs_insert_dir_item+0x189/0x1bb [btrfs] [ 5447.800156] [<ffffffffa0651a45>] btrfs_add_link+0x12b/0x191 [btrfs] [ 5447.806517] [<ffffffffa0651adc>] btrfs_add_nondir+0x31/0x58 [btrfs] [ 5447.812876] [<ffffffffa0651d6a>] btrfs_create+0xf9/0x197 [btrfs] [ 5447.818961] [<ffffffff8111f840>] vfs_create+0x72/0x92 [ 5447.824090] [<ffffffff8111fa8c>] do_last+0x22c/0x40b [ 5447.829133] [<ffffffff8112076a>] path_openat+0xc0/0x2ef [ 5447.834438] [<ffffffff810c58e2>] ? __perf_event_task_sched_out+0x24/0x44 [ 5447.841216] [<ffffffff8103ecdd>] ? perf_event_task_sched_out+0x59/0x67 [ 5447.847846] [<ffffffff81121a79>] do_filp_open+0x3d/0x87 [ 5447.853156] [<ffffffff811e126c>] ? strncpy_from_user+0x43/0x4d [ 5447.859072] [<ffffffff8111f1f5>] ? getname_flags+0x2e/0x80 [ 5447.864636] [<ffffffff8111f179>] ? do_getname+0x14b/0x173 [ 5447.870112] [<ffffffff8111f1b7>] ? audit_getname+0x16/0x26 [ 5447.875682] [<ffffffff8112b1ab>] ? spin_lock+0xe/0x10 [ 5447.880882] [<ffffffff81112d39>] do_sys_open+0x69/0xae [ 5447.886153] [<ffffffff81112db1>] sys_open+0x20/0x22 [ 5447.891114] [<ffffffff813b9aab>] system_call_fastpath+0x16/0x1b Fix it by reusing the old delayed node. Reported-by: NJim Schutt <jaschut@sandia.gov> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Tested-by: NJim Schutt <jaschut@sandia.gov> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 25 6月, 2011 1 次提交
-
-
由 Chris Mason 提交于
When we create a new inode, we aren't filling in the field that records the transaction that last changed this inode. If we then go to fsync that inode, it will be skipped because the field isn't filled in. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 16 6月, 2011 1 次提交
-
-
由 Josef Bacik 提交于
The delayed ref patch accidently removed the btrfs_free_path in btrfs_unlink_subvol, this puts it back and means we don't leak a path. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 11 6月, 2011 2 次提交
-
-
由 Li Zefan 提交于
The WARN_ON() in start_transaction() was triggered while balancing. The cause is btrfs_relocate_chunk() started a transaction and then called iput() on the inode that stores free space cache, and iput() called btrfs_start_transaction() again. Reported-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Reviewed-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Jan Schmidt 提交于
When encountering an EIO while reading from a nodatasum extent, we insert an error record into the inode's failure tree. btrfs_readpage_end_io_hook returns early for nodatasum inodes. We'd better clear the failure tree in that case, otherwise the kernel complains about BUG extent_state: Objects remaining on kmem_cache_close() on rmmod. Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 04 6月, 2011 1 次提交
-
-
由 David Sterba 提交于
wrap checking of filesystem 'closing' flag and fix a few missing memory barriers. Signed-off-by: NDavid Sterba <dsterba@suse.cz>
-
- 27 5月, 2011 2 次提交
-
-
由 Christoph Hellwig 提交于
Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or anything else, so that the filesystem can track internally if it needs to push out a transaction for fdatasync or not. This is just the prototype change with no user for it yet. I plan to push large XFS changes for the next merge window, and getting this trivial infrastructure in this window would help a lot to avoid tree interdependencies. Also remove incorrect comments that ->dirty_inode can't block. That has been changed a long time ago, and many implementations rely on it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Chris Mason 提交于
This will detect small random writes into files and queue the up for an auto defrag process. It isn't well suited to database workloads yet, but works for smaller files such as rpm, sqlite or bdb databases. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 26 5月, 2011 3 次提交
-
-
由 Sage Weil 提交于
Btrfs has no problems with lingering references to unlinked directory inodes. CC: Chris Mason <chris.mason@oracle.com> CC: linux-btrfs@vger.kernel.org Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Sage Weil 提交于
Only a few file systems need this. Start by pushing it down into each rename method (except gfs2 and xfs) so that it can be dealt with on a per-fs basis. Acked-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Sage Weil 提交于
Only a few file systems need this. Start by pushing it down into each fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs basis. This does not change behavior for any in-tree file systems. Acked-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 24 5月, 2011 1 次提交
-
-
由 Julia Lawall 提交于
Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression struct btrfs_path * x; expression ra,rb; position p1,p2; @@ x = btrfs_alloc_path@p1(...) ... when != btrfs_free_path(x,...) when != if (...) { ... btrfs_free_path(x,...) ...} when != x = ra if(...) { ... when != x = rb when forall when != btrfs_free_path(x,...) \(return <+...x...+>; \| return@p2...; \) } @script:python@ p1 << r.p1; p2 << r.p2; @@ cocci.print_main("alloc",p1) cocci.print_secs("return",p2) // </smpl> Signed-off-by: NJulia Lawall <julia@diku.dk> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-