- 21 2月, 2013 8 次提交
-
-
由 Miao Xie 提交于
In some cases, we need commit the current transaction, but don't want to start a new one if there is no running transaction, so we introduce the function - btrfs_attach_transaction(), which can catch the current transaction, and return -ENOENT if there is no running transaction. But no running transaction doesn't mean the current transction completely, because we removed the running transaction before it completes. In some cases, it doesn't matter. But in some special cases, such as freeze fs, we hope the transaction is fully on disk, it will introduce some bugs, for example, we may feeze the fs and dump the data in the disk, if the transction doesn't complete, we would dump inconsistent data. So we need fix the above problem for those cases. We fixes this problem by introducing a function: btrfs_attach_transaction_barrier() if we hope all the transaction is fully on the disk, even they are not running, we can use this function. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Miao Xie 提交于
Now btrfs_commit_transaction() does this ret = btrfs_run_ordered_operations(root, 0) which async flushes all inodes on the ordered operations list, it introduced a deadlock that transaction-start task, transaction-commit task and the flush workers waited for each other. (See the following URL to get the detail http://marc.info/?l=linux-btrfs&m=136070705732646&w=2) As we know, if ->in_commit is set, it means someone is committing the current transaction, we should not try to join it if we are not JOIN or JOIN_NOLOCK, wait is the best choice for it. In this way, we can avoid the above problem. In this way, there is another benefit: there is no new transaction handle to block the transaction which is on the way of commit, once we set ->in_commit. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Miao Xie 提交于
In start_transactio(), we will try to join the transaction again after the current transaction is committed, so we should not release the reserved space of the qgroup. Fix it. Cc: Arne Jansen <sensille@gmx.net> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
Miao made the ordered operations stuff run async, which introduced a deadlock where we could get somebody (sync) racing in and committing the transaction while a commit was already happening. The new committer would try and flush ordered operations which would hang waiting for the commit to finish because it is done asynchronously and no longer inherits the callers trans handle. To fix this we need to make the ordered operations list a per transaction list. We can get new inodes added to the ordered operation list by truncating them and then having another process writing to them, so this makes it so that anybody trying to add an ordered operation _must_ start a transaction in order to add itself to the list, which will keep new inodes from getting added to the ordered operations list after we start committing. This should fix the deadlock and also keeps us from doing a lot more work than we need to during commit. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 David Sterba 提交于
The defrag operation can take very long, we want to have a way how to cancel it. The code checks for a pending signal at safe points in the defrag loops and returns EAGAIN. This means a user can press ^C after running 'btrfs fi defrag', woks for both defrag modes, files and root. Returning from the command was instant in my light tests, but may take longer depending on the aging factor of the filesystem. Signed-off-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Eric Sandeen 提交于
The entry point at the defrag ioctl always sets "cache only" to 0; the codepaths haven't run for a long time as far as I can tell. Chris says they're dead code, so remove them. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
I hit a deadlock where transaction commit was waiting on num_writers to be 0. This happened because somebody came into btrfs_commit_transaction and noticed we had aborted and it went to cleanup_transaction. This shouldn't happen because cleanup_transaction is really to fixup a bad commit, it doesn't do the normal trans handle cleanup things. So if we have an error just do the normal btrfs_end_transaction dance and return. Once we are in the actual commit path we can use cleanup_transaction and be good to go. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Miao Xie 提交于
There is no lock to protect fs_info->fs_state, it will introduce some problems, such as the value may be covered by the other task when several tasks modify it. For example: Task0 - CPU0 Task1 - CPU1 mov %fs_state rax or $0x1 rax mov %fs_state rax or $0x2 rax mov rax %fs_state mov rax %fs_state The expected value is 3, but in fact, it is 2. Though this problem doesn't happen now (because there is only one flag currently), the code is error prone, if we add other flags, the above problem will happen to a certainty. Now we use bit operation for it to fix the above problem. In this way, we can make the code more robust and be easy to add new flags. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 20 2月, 2013 5 次提交
-
-
由 Miao Xie 提交于
We forget to check the return value of btrfs_run_ordered_operations() when flushing all the pending stuffs, fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Miao Xie 提交于
We forget to check the return value of btrfs_start_delalloc_inodes(), fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
If we start running low on metadata space we will try to allocate a chunk, which could then try to allocate a chunk to add the device entry. The thing is we allocate a chunk before we try really hard to make the allocation, so we should be able to find space for the device entry. Add a flag to the trans handle so we know we're currently allocating a chunk so we can just bail out if we try to allocate another chunk. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Miao Xie 提交于
Since we do not want to delay the async transaction commit, we should use common work, not delayed work. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
-
由 Miao Xie 提交于
We clear the transaction object and the trans handle when they are about to be freed, it is unnecessary, cleanup it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
-
- 06 2月, 2013 1 次提交
-
-
由 Miao Xie 提交于
When we fail to start a transaction, we need to release the reserved free space and qgroup space, fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 25 1月, 2013 2 次提交
-
-
由 Miao Xie 提交于
First, though the current transaction->aborted check can stop the commit early and avoid unnecessary operations, it is too early, and some transaction handles don't end, those handles may set transaction->aborted after the check. Second, when we commit the transaction, we will wake up some worker threads to flush the space cache and inode cache. Those threads also allocate some transaction handles and may set transaction->aborted if some serious error happens. So we need more check for ->aborted when committing the transaction. Fix it. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Miao Xie 提交于
We may access and update transaction->aborted on the different CPUs without lock, so we need ACCESS_ONCE() wrapper to prevent the compiler from creating unsolicited accesses and make sure we can get the right value. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 18 12月, 2012 1 次提交
-
-
由 Chris Mason 提交于
The handling for directory crc hash overflows was fairly obscure, split_leaf returns EOVERFLOW when we try to extend the item and that is supposed to bubble up to userland. For a while it did so, but along the way we added better handling of errors and forced the FS readonly if we hit IO errors during the directory insertion. Along the way, we started testing only for EEXIST and the EOVERFLOW case was dropped. The end result is that we may force the FS readonly if we catch a directory hash bucket overflow. This fixes a few problem spots. First I add tests for EOVERFLOW in the places where we can safely just return the error up the chain. btrfs_rename is harder though, because it tries to insert the new directory item only after it has already unlinked anything the rename was going to overwrite. Rather than adding very complex logic, I added a helper to test for the hash overflow case early while it is still safe to bail out. Snapshot and subvolume creation had a similar problem, so they are using the new helper now too. Signed-off-by: NChris Mason <chris.mason@fusionio.com> Reported-by: NPascal Junod <pascal@junod.info>
-
- 17 12月, 2012 2 次提交
-
-
由 Miao Xie 提交于
If the id of the existed transaction is more than the one we specified, it means the specified transaction was commited, so we should return 0, not EINVAL. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Miao Xie 提交于
If there is no running transaction in the fs, we needn't start a new one when we want to start sync. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 13 12月, 2012 5 次提交
-
-
由 Stefan Behrens 提交于
This commit contains all the essential changes to the core code of Btrfs for support of the device replace procedure. Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Liu Bo 提交于
- 'nr' is no more used. - btrfs_btree_balance_dirty() and __btrfs_btree_balance_dirty() can share a bunch of code. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Julia Lawall 提交于
Use WARN rather than printk followed by WARN_ON(1), for conciseness. A simplified version of the semantic patch that makes this transformation is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression list es; @@ -printk( +WARN(1, es); -WARN_ON(1); // </smpl> Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: NDavid Sterba <dsterba@suse.cz> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Miao Xie 提交于
Consider the following case: Task1 Task2 start_transaction commit_transaction check pending snapshots list and the list is empty. add pending snapshot into list skip the delalloc flush end_transaction ... And then the problem that the snapshot is different with the source subvolume happen. This patch fixes the above problem by flush all pending stuffs when all the other tasks end the transaction. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Miao Xie 提交于
If we flush inodes with pending delalloc in a transaction, we may join the same transaction handler more than 2 times. The reason is: Task use_count of trans handle commit_transaction 1 |-> btrfs_start_delalloc_inodes 1 |-> run_delalloc_nocow 1 |-> join_transaction 2 |-> cow_file_range 2 |-> join_transaction 3 In fact, cow_file_range needn't join the transaction again because the caller have joined the transaction, so we fix this problem by this way. Reported-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 12 12月, 2012 3 次提交
-
-
由 Miao Xie 提交于
The process of the ordered operations is similar to the delalloc inode flush, so we handle them by flush workers. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Miao Xie 提交于
This patch introduce a new worker pool named "flush_workers", and if we want to force all the inode with pending delalloc to the disks, we can queue those inodes into the work queue of the worker pool, in this way, those inodes will be flushed by multi-task. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
由 Miao Xie 提交于
In some places(such as: evicting inode), we just can not flush the reserved space of delalloc, flushing the delayed directory index and delayed inode is OK, but we don't try to flush those things and just go back when there is no enough space to be reserved. This patch fixes this problem. We defined 3 types of the flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL. If we can in the transaction, we should not flush anything, or the deadlock would happen, so use NO_FLUSH. If we flushing the reserved space of delalloc would cause deadlock, use FLUSH_LIMIT. In the other cases, FLUSH_ALL is used, and we will flush all things. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@fusionio.com>
-
- 26 10月, 2012 1 次提交
-
-
由 Josef Bacik 提交于
On a really full file system I was getting ENOSPC back from btrfs_update_inode when trying to update the parent inode when creating a snapshot. Just use the fallback method so we can update the inode and not have to worry about having a delayed ref. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 09 10月, 2012 5 次提交
-
-
由 Josef Bacik 提交于
Everytime we write out dirty pages we search for an offset in the tree, convert the bits in the state, and then when we wait we search for the offset again and clear the bits. So for every dirty range in the io tree we are doing 4 rb searches, which is suboptimal. With this patch we are only doing 2 searches for every cycle (modulo weird things happening). Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Miao Xie 提交于
With the following debug patch: static int btrfs_freeze(struct super_block *sb) { + struct btrfs_fs_info *fs_info = btrfs_sb(sb); + struct btrfs_transaction *trans; + + spin_lock(&fs_info->trans_lock); + trans = fs_info->running_transaction; + if (trans) { + printk("Transid %llu, use_count %d, num_writer %d\n", + trans->transid, atomic_read(&trans->use_count), + atomic_read(&trans->num_writers)); + } + spin_unlock(&fs_info->trans_lock); return 0; } I found there was a orphan transaction after the freeze operation was done. It is because the transaction may not be committed when the transaction handle end even though it is the last handle of the current transaction. This design avoid committing the transaction frequently, but also introduce the above problem. So I add btrfs_attach_transaction() which can catch the current transaction and commit it. If there is no transaction, it will return ENOENT, and do not anything. This function also can be used to instead of btrfs_join_transaction_freeze() because it don't increase the writer counter and don't start a new transaction, so it also can fix the deadlock between sync and freeze. Besides that, it is used to instead of btrfs_join_transaction() in transaction_kthread(), because if there is no transaction, the transaction kthread needn't anything. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
-
由 Miao Xie 提交于
This patch add a type field into the transaction handle structure, in this way, we needn't implement various end-transaction functions and can make the code more simple and readable. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
-
由 Miao Xie 提交于
This patch fixes memory leak of the transaction handle which happened when starting transaction failed on a freezed fs. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
-
由 Miao Xie 提交于
The macro btrfs_abort_transaction() can get the line number of the code where the problem happens, so we should invoke it in the place that the error occurs, or we will lose the line number. Reported-by: NDavid Sterba <dave@jikos.cz> Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
-
- 04 10月, 2012 3 次提交
-
-
由 Josef Bacik 提交于
So we start our freeze, somebody comes in and does an fsync() on a file where we have to commit a transaction for whatever reason, and we will deadlock because the freeze is waiting on FS_FREEZE people to stop writing to the file system, but the transaction is waiting for its free space inodes to be written out, which are in turn waiting on sb_start_intwrite while trying to write the file extents. To fix this we'll just skip the sb_start_intwrite() if we TRANS_JOIN_NOLOCK since we're being waited on by a transaction commit so we're safe wrt to freeze and this will keep us from deadlocking. Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Liu Bo 提交于
nocow_only is now an obsolete argument. Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
-
由 Josef Bacik 提交于
I screwed this up, there is a race between checking if there is a running transaction and actually starting a transaction in sync where we could race with a freezer and get ourselves into trouble. To fix this we need to make a new join type to only do the try lock on the freeze stuff. If it fails we'll return EPERM and just return from sync. This fixes a hang Liu Bo reported when running xfstest 68 in a loop. Thanks, Reported-by: NLiu Bo <bo.li.liu@oracle.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
- 02 10月, 2012 4 次提交
-
-
由 Josef Bacik 提交于
So we have lots of places where we try to preallocate chunks in order to make sure we have enough space as we make our allocations. This has historically meant that we're constantly tweaking when we should allocate a new chunk, and historically we have gotten this horribly wrong so we way over allocate either metadata or data. To try and keep this from happening we are going to make it so that the block group item insertion is done out of band at the end of a transaction. This will allow us to create chunks even if we are trying to make an allocation for the extent tree. With this patch my enospc tests run faster (didn't expect this) and more efficiently use the disk space (this is what I wanted). Thanks, Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Josef Bacik 提交于
Sage reported the following lockdep backtrace ===================================== [ BUG: bad unlock balance detected! ] 3.6.0-rc2-ceph-00171-gc7ed62d #1 Not tainted ------------------------------------- btrfs-cleaner/7607 is trying to release lock (sb_internal) at: [<ffffffffa00422ae>] btrfs_commit_transaction+0xa6e/0xb20 [btrfs] but there are no more locks to release! other info that might help us debug this: 1 lock held by btrfs-cleaner/7607: #0: (&fs_info->cleaner_mutex){+.+...}, at: [<ffffffffa003b405>] cleaner_kthread+0x95/0x120 [btrfs] stack backtrace: Pid: 7607, comm: btrfs-cleaner Not tainted 3.6.0-rc2-ceph-00171-gc7ed62d #1 Call Trace: [<ffffffffa00422ae>] ? btrfs_commit_transaction+0xa6e/0xb20 [btrfs] [<ffffffff810afa9e>] print_unlock_inbalance_bug+0xfe/0x110 [<ffffffff810b289e>] lock_release_non_nested+0x1ee/0x310 [<ffffffff81172f9b>] ? kmem_cache_free+0x7b/0x160 [<ffffffffa004106c>] ? put_transaction+0x8c/0x130 [btrfs] [<ffffffffa00422ae>] ? btrfs_commit_transaction+0xa6e/0xb20 [btrfs] [<ffffffff810b2a95>] lock_release+0xd5/0x220 [<ffffffff81173071>] ? kmem_cache_free+0x151/0x160 [<ffffffff8117d9ed>] __sb_end_write+0x7d/0x90 [<ffffffffa00422ae>] btrfs_commit_transaction+0xa6e/0xb20 [btrfs] [<ffffffff81079850>] ? __init_waitqueue_head+0x60/0x60 [<ffffffff81634c6b>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffffa0042758>] __btrfs_end_transaction+0x368/0x3c0 [btrfs] [<ffffffffa0042808>] btrfs_end_transaction_throttle+0x18/0x20 [btrfs] [<ffffffffa00318f0>] btrfs_drop_snapshot+0x410/0x600 [btrfs] [<ffffffff8132babd>] ? do_raw_spin_unlock+0x5d/0xb0 [<ffffffffa00430ef>] btrfs_clean_old_snapshots+0xaf/0x150 [btrfs] [<ffffffffa003b405>] ? cleaner_kthread+0x95/0x120 [btrfs] [<ffffffffa003b419>] cleaner_kthread+0xa9/0x120 [btrfs] [<ffffffffa003b370>] ? btrfs_destroy_delayed_refs.isra.102+0x220/0x220 [btrfs] [<ffffffff810791ee>] kthread+0xae/0xc0 [<ffffffff810b379d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10 [<ffffffff81635430>] ? retint_restore_args+0x13/0x13 [<ffffffff81079140>] ? flush_kthread_work+0x1a0/0x1a0 [<ffffffff8163e740>] ? gs_change+0x13/0x13 This is because the throttle stuff can commit the transaction, which expects to be the one stopping the intwrite stuff, but we've already done it in the __btrfs_end_transaction. Moving the sb_end_intewrite after this logic makes the lockdep go away. Thanks, Tested-by: NSage Weil <sage@inktank.com> Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
-
由 Miao Xie 提交于
When we delete a inode, we will remove all the delayed items including delayed inode update, and then truncate all the relative metadata. If there is lots of metadata, we will end the current transaction, and start a new transaction to truncate the left metadata. In this way, we will leave a inode item that its link counter is > 0, and also may leave some directory index items in fs/file tree after the current transaction ends. In other words, the metadata in this fs/file tree is inconsistent. If we create a snapshot for this tree now, we will find a inode with corrupted metadata in the new snapshot, and we won't continue to drop the left metadata, because its link counter is not 0. We fix this problem by updating the inode item before the current transaction ends. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
-
由 Miao Xie 提交于
The snapshot should be the image of the fs tree before it was created, so the metadata of the snapshot should not exist in the its tree. But now, we found the directory item and directory name index is in both the snapshot tree and the fs tree. It introduces some problems and makes the users feel strange: # mkfs.btrfs /dev/sda1 # mount /dev/sda1 /mnt # mkdir /mnt/1 # cd /mnt/1 # btrfs subvolume snapshot /mnt snap0 # ls -a /mnt/1/snap0/1 . .. [no other file/dir] # ll /mnt/1/snap0/ total 0 drwxr-xr-x 1 root root 10 Ju1 24 12:11 1 ^^^ There is no file/dir in it, but it's size is 10 # cd /mnt/1/snap0/1/snap0 [Enter a unexisted directory successfully...] There is nothing in the directory 1 in snap0, but btrfs told the length of this directory is 10. Beside that, we can enter an unexisted directory, it is very strange to the users. # btrfs subvolume snapshot /mnt/1/snap0 /mnt/snap1 # ll /mnt/1/snap0/1/ total 0 [None] # ll /mnt/snap1/1/ total 0 drwxr-xr-x 1 root root 0 Ju1 24 12:14 snap0 And the source of snap1 did have any directory in Directory 1, but snap1 have a snap0, it is different between the source and the snapshot. So I think we should insert directory item and directory name index and update the parent inode as the last step of snapshot creation, and do not leave the useless metadata in the file tree. Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
-