- 01 10月, 2013 2 次提交
-
-
由 Yan, Zheng 提交于
If client has outdated directory fragments information, it may request readdir an non-existent directory fragment. In this case, the MDS finds an approximate directory fragment and sends its contents back to the client. When receiving a reply with fragment that is different than the requested one, the client need to reset the 'readdir offset'. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
由 Yan, Zheng 提交于
If directory fragments change, fill_inode() inserts new frags into the fragtree, but it does not remove outdated frags from the fragtree. This patch fixes it. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 26 9月, 2013 1 次提交
-
-
由 Milosz Tanski 提交于
In some cases I'm on my ceph client cluster I'm seeing hunk kernel tasks in the invalidate page code path. This is due to the fact that we don't check if the page is marked as cache before calling fscache_wait_on_page_write(). This is the log from the hang INFO: task XXXXXX:12034 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ... Call Trace: [<ffffffff81568d09>] schedule+0x29/0x70 [<ffffffffa01d4cbd>] __fscache_wait_on_page_write+0x6d/0xb0 [fscache] [<ffffffff81083520>] ? add_wait_queue+0x60/0x60 [<ffffffffa029a3e9>] ceph_invalidate_fscache_page+0x29/0x50 [ceph] [<ffffffffa027df00>] ceph_invalidatepage+0x70/0x190 [ceph] [<ffffffff8112656f>] ? delete_from_page_cache+0x5f/0x70 [<ffffffff81133cab>] truncate_inode_page+0x8b/0x90 [<ffffffff81133ded>] truncate_inode_pages_range.part.12+0x13d/0x620 [<ffffffff8113431d>] truncate_inode_pages_range+0x4d/0x60 [<ffffffff811343b5>] truncate_inode_pages+0x15/0x20 [<ffffffff8119bbf6>] evict+0x1a6/0x1b0 [<ffffffff8119c3f3>] iput+0x103/0x190 ... Signed-off-by: NMilosz Tanski <milosz@adfin.com> Reviewed-by: NSage Weil <sage@inktank.com>
-
- 21 9月, 2013 27 次提交
-
-
由 Josef Bacik 提交于
Users have been complaining of the uuid tree stuff warning that there is no uuid root when trying to do snapshot operations. This is because if you mount -o ro we will not create the uuid tree. But then if you mount -o rw,remount we will still not create it and then any subsequent snapshot/subvol operations you try to do will fail gloriously. Fix this by creating the uuid_root on remount rw if it was not already there. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Mark Fasheh 提交于
btrfs_ioctl_file_extent_same() uses __put_user_unaligned() to copy some data back to it's argument struct. Unfortunately, not all architectures provide __put_user_unaligned(), so compiles break on them if btrfs is selected. Instead, just copy the whole struct in / out at the start and end of operations, respectively. Signed-off-by: NMark Fasheh <mfasheh@suse.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Guangyu Sun 提交于
Commit 2bc55652 (Btrfs: don't update atime on RO subvolumes) ensures that the access time of an inode is not updated when the inode lives in a read-only subvolume. However, if a directory on a read-only subvolume is accessed, the atime is updated. This results in a write operation to a read-only subvolume. I believe that access times should never be updated on read-only subvolumes. To reproduce: # mkfs.btrfs -f /dev/dm-3 (...) # mount /dev/dm-3 /mnt # btrfs subvol create /mnt/sub Create subvolume '/mnt/sub' # mkdir /mnt/sub/dir # echo "abc" > /mnt/sub/dir/file # btrfs subvol snapshot -r /mnt/sub /mnt/rosnap Create a readonly snapshot of '/mnt/sub' in '/mnt/rosnap' # stat /mnt/rosnap/dir File: `/mnt/rosnap/dir' Size: 8 Blocks: 0 IO Block: 4096 directory Device: 16h/22d Inode: 257 Links: 1 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2013-09-11 07:21:49.389157126 -0400 Modify: 2013-09-11 07:22:02.330156079 -0400 Change: 2013-09-11 07:22:02.330156079 -0400 # ls /mnt/rosnap/dir file # stat /mnt/rosnap/dir File: `/mnt/rosnap/dir' Size: 8 Blocks: 0 IO Block: 4096 directory Device: 16h/22d Inode: 257 Links: 1 Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2013-09-11 07:22:56.797151670 -0400 Modify: 2013-09-11 07:22:02.330156079 -0400 Change: 2013-09-11 07:22:02.330156079 -0400 Reported-by: NKoen De Wit <koen.de.wit@oracle.com> Signed-off-by: NGuangyu Sun <guangyu.sun@oracle.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Frank Holton 提交于
The kernel log entries for device label %s and device fsid %pU are missing the btrfs: prefix. Add those here. Signed-off-by: NFrank Holton <fholton@gmail.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 David Sterba 提交于
It's still possible to flip the filesystem into RW mode after it's remounted RO due to an abort. There are lots of places that check for the superblock error bit and will not write data, but we should not let the filesystem appear read-write. Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 chandan 提交于
This patch makes it possible to set BTRFS_FS_TREE_OBJECTID as the default subvolume by passing a subvolume id of 0. Signed-off-by: Nchandan <chandan@linux.vnet.ibm.com> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
In btrfs_sync_file(), if the call to btrfs_log_dentry_safe() returns a negative error (for e.g. -ENOMEM via btrfs_log_inode()), we would return without ending/freeing the transaction. Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
The BUG() was replaced by btrfs_error() and return -EIO with the patch "get rid of one BUG() in write_all_supers()", but the missing mutex_unlock() was overlooked. The 0-DAY kernel build service from Intel reported the missing unlock which was found by the coccinelle tool: fs/btrfs/disk-io.c:3422:2-8: preceding lock on line 3374 Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
We don't do the iput when we fail to allocate our delayed delalloc work in __start_delalloc_inodes, fix this. Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
This isn't used for anything anymore, just remove it. Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
This is a left over of how we used to wait for ordered extents, which was to grab the inode and then run filemap flush on it. However if we have an ordered extent then we already are holding a ref on the inode, and we just use btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on the inode to start work on the ordered extent. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
Forever ago I made the worst case calculator say that we could potentially split into 3 blocks for every level on the way down, which isn't right. If we split we're only going to get two new blocks, the one we originally cow'ed and the new one we're going to split. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
This reverts commit 70afa399. It is causing performance issues and wasn't actually correct. There were problems with the way we flushed delalloc and that was the real cause of the early enospc. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
Various people have hit a deadlock when running btrfs/011. This is because when replacing nocow extents we will take the i_mutex to make sure nobody messes with the file while we are replacing the extent. The problem is we are already holding a transaction open, which is a locking inversion, so instead we need to save these inodes we find and then process them outside of the transaction. Further we can't just lock the inode and assume we are good to go. We need to lock the extent range and then read back the extent cache for the inode to make sure the extent really still points at the physical block we want. If it doesn't we don't have to copy it. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
So if we have dir_index items in the log that means we also have the inode item as well, which means that the inode's i_size is correct. However when we process dir_index'es we call btrfs_add_link() which will increase the directory's i_size for the new entry. To fix this we need to just set the dir items i_size to 0, and then as we find dir_index items we adjust the i_size. btrfs_add_link() will do it for new entries, and if the entry already exists we can just add the name_len to the i_size ourselves. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
A user reported a bug where his log would not replay because he was getting -EEXIST back. This was because he had a file moved into a directory that was logged. What happens is the file had a lower inode number, and so it is processed first when replaying the log, and so we add the inode ref in for the directory it was moved to. But then we process the directories DIR_INDEX item and try to add the inode ref for that inode and it fails because we already added it when we replayed the inode. To solve this problem we need to just process any DIR_INDEX items we have in the log first so this all is taken care of, and then we can replay the rest of the items. With this patch my reproducer can remount the file system properly instead of erroring out. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
Liu introduced a local copy of the last log commit for an inode to make sure we actually log an inode even if a log commit has already taken place. In order to make sure we didn't relog the same inode multiple times he set this local copy to the current trans when we log the inode, because usually we log the inode and then sync the log. The exception to this is during rename, we will relog an inode if the name changed and it is already in the log. The problem with this is then we go to sync the inode, and our check to see if the inode has already been logged is tripped and we don't sync the log. To fix this we need to _also_ check against the roots last log commit, because it could be less than what is in our local copy of the log commit. This fixes a bug where we rename a file into a directory and then fsync the directory and then on remount the directory is no longer there. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
If you just create a directory and then fsync that directory and then pull the power plug you will come back up and the directory will not be there. That is because we won't actually create directories if we've logged files inside of them since they will be created on replay, but in this check we will set our logged_trans of our current directory if it happens to be a directory, making us think it doesn't need to be logged. Fix the logic to only do this to parent directories. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
So forever we have had this thing to limit the amount of delalloc pages we'll setup to be written out to 128mb. This is because we have to lock all the pages in this range, so anything above this gets a bit unweildly, and also without a limit we'll happily allocate gigantic chunks of disk space. Turns out our check for this wasn't quite right, we wouldn't actually limit the chunk we wanted to write out, we'd just stop looking for more space after we went over the limit. So if you do a giant 20gb dd on my box with lots of ram I could get 2gig extents. This is fine normally, except when you go to relocate these extents and we can't find enough space to relocate these moster extents, since we have to be able to allocate exactly the same sized extent to move it around. So fix this by actually enforcing the limit. With this patch I'm no longer seeing giant 1.5gb extents. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Miao Xie 提交于
By the current code, if the requested size is very large, and all the extents in the free space cache are small, we will waste lots of the cpu time to cut the requested size in half and search the cache again and again until it gets down to the size the allocator can return. In fact, we can know the max extent size in the cache after the first search, so we needn't cut the size in half repeatedly, and just use the max extent size directly. This way can save lots of cpu time and make the performance grow up when there are only fragments in the free space cache. According to my test, if there are only 4KB free space extents in the fs, and the total size of those extents are 256MB, we can reduce the execute time of the following test from 5.4s to 1.4s. dd if=/dev/zero of=<testfile> bs=1MB count=1 oflag=sync Changelog v2 -> v3: - fix the problem that we skip the block group with the space which is less than we need. Changelog v1 -> v2: - address the problem that we return a wrong start position when searching the free space in a bitmap. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 David Sterba 提交于
Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Stefan Behrens 提交于
We want to know if there are debugging features compiled in, this may affect performance. The message is printed before the sanity checks. (This commit message is a copy of David Sterba's commit message when he introduced btrfs_print_info()). Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
Instead of removing the current inode from the red black tree and then add the new one, just use the red black tree replace operation, which is more efficient. Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com> Reviewed-by: NZach Brown <zab@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Ilya Dryomov 提交于
If replace was suspended by the umount, replace target device is added to the fs_devices->alloc_list during a later mount. This is obviously wrong. ->is_tgtdev_for_dev_replace is supposed to guard against that, but ->is_tgtdev_for_dev_replace is (and can only ever be) initialized *after* everything is opened and fs_devices lists are populated. Fix this by checking the devid instead: for replace targets it's always equal to BTRFS_DEV_REPLACE_DEVID. Cc: Stefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Josef Bacik 提交于
If we failed to actually allocate the correct size of the extent to relocate we will end up in an infinite loop because we won't return an error, we'll just move on to the next extent. So fix this up by returning an error, and then fix all the callers to return an error up the stack rather than BUG_ON()'ing. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 David Howells 提交于
Don't try to dump the index key that distinguishes an object if netfs data in the cookie the object refers to has been cleared (ie. the cookie has passed most of the way through __fscache_relinquish_cookie()). Since the netfs holds the index key, we can't get at it once the ->def and ->netfs_data pointers have been cleared - and a NULL pointer exception will ensue, usually just after a: CacheFiles: Error: Unexpected object collision error is reported. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Josh Boyer 提交于
In cachefiles_check_auxdata(), we allocate auxbuf but fail to free it if we determine there's an error or that the data is stale. Further, assigning the output of vfs_getxattr() to auxbuf->len gives problems with checking for errors as auxbuf->len is a u16. We don't actually need to set auxbuf->len, so keep the length in a variable for now. We shouldn't need to check the upper limit of the buffer as an overflow there should be indicated by -ERANGE. While we're at it, fscache_check_aux() returns an enum value, not an int, so assign it to an appropriately typed variable rather than to ret. Signed-off-by: NJosh Boyer <jwboyer@fedoraproject.org> Signed-off-by: NDavid Howells <dhowells@redhat.com> cc: Hongyi Jia <jiayisuse@gmail.com> cc: Milosz Tanski <milosz@adfin.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 9月, 2013 3 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Bjorn Helgaas 提交于
This fixes a copy and paste error introduced by 9f060e22 ("block: Convert integrity to bvec_alloc_bs()"). Found by Coverity (CID 1020654). Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NKent Overstreet <koverstreet@google.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 17 9月, 2013 7 次提交
-
-
由 Miklos Szeredi 提交于
If O_CREAT|O_EXCL are passed to open, then we know that either - the file is successfully created, or - the operation fails in some way. So previously we set FILE_CREATED before calling ->atomic_open() so the filesystem doesn't have to. This, however, led to bugs in the implementation that went unnoticed when the filesystem didn't check for existence, yet returned success. To prevent this kind of bug, require filesystems to always explicitly set FILE_CREATED on O_CREAT|O_EXCL and verify this in the VFS. Also added a couple more verifications for the result of atomic_open(): - Warn if filesystem set FILE_CREATED despite the lack of O_CREAT. - Warn if filesystem set FILE_CREATED but gave a negative dentry. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Miklos Szeredi 提交于
Set FILE_CREATED on O_CREAT|O_EXCL. If the NFS server honored our request for exclusivity then this must be correct. Currently this is a no-op, since the VFS sets FILE_CREATED anyway. The next patch will, however, require this flag to be always set by filesystems. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Miklos Szeredi 提交于
In gfs2_create_inode() set FILE_CREATED in *opened. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Miklos Szeredi 提交于
If an error occurs after having called finish_open() then fput() needs to be called on the already opened file. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: Steve French <sfrench@samba.org> Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Miklos Szeredi 提交于
Fix documentation of ->atomic_open() and related functions: finish_open() and finish_no_open(). Also add details that seem to be unclear and a source of bugs (some of which are fixed in the following series). Cc-ing maintainers of all filesystems implementing ->atomic_open(). Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Sage Weil <sage@inktank.com> Cc: Steve French <sfrench@samba.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Don't drop ->wq_mutex before calling autofs4_notify_daemon() only to regain it there. Besides being pointless, that opens a race window where autofs4_wait_release() could've come and freed wq->name.name. And do the debugging printk in the "reused an existing wq" case before dropping ->wq_mutex - the same reason... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Acked-by: NIan Kent <raven@themaw.net>
-
由 Aruna Balakrishnaiah 提交于
Remove the messages indicating compression failure as it will add to the space during panic path. Reported-by: NSeiji Aguchi <seiji.aguchi@hds.com> Tested-by: NSeiji Aguchi <seiji.aguchi@hds.com> Signed-off-by: NAruna Balakrishnaiah <aruna@linux.vnet.ibm.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-