1. 21 Feb 2013, 8 commits
    • Btrfs: fix uncompleted transaction · d4edf39b
      Committed by Miao Xie
      In some cases we need to commit the current transaction but don't want
      to start a new one if there is no running transaction, so we introduced
      btrfs_attach_transaction(), which catches the current transaction and
      returns -ENOENT if there is no running transaction.
      
      But the absence of a running transaction doesn't mean the current
      transaction has completed, because we remove the running transaction
      from the list before it completes. In some cases that doesn't matter,
      but in special cases, such as freezing the fs, we want the transaction
      to be fully on disk, otherwise bugs follow: we may freeze the fs and
      dump the data on the disk, and if the transaction hasn't completed, we
      would dump inconsistent data. So we need to fix the above problem for
      those cases.
      
      We fix this problem by introducing a function:
      	btrfs_attach_transaction_barrier()
      If we want the transaction to be fully on disk even though it is no
      longer running, we can use this function.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      d4edf39b
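      A minimal sketch of how the barrier variant can sit on top of
      btrfs_attach_transaction(); it assumes btrfs_wait_for_commit(root, 0)
      waits for the most recent transaction to reach disk, and the real
      error handling may differ:
      
      	struct btrfs_trans_handle *
      	btrfs_attach_transaction_barrier(struct btrfs_root *root)
      	{
      		struct btrfs_trans_handle *trans;
      
      		trans = btrfs_attach_transaction(root);
      		if (IS_ERR(trans) && PTR_ERR(trans) == -ENOENT) {
      			/* nothing running: wait for the last commit to hit disk */
      			btrfs_wait_for_commit(root, 0);
      		}
      		return trans;
      	}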
    • Btrfs: fix the deadlock between the transaction start/attach and commit · 178260b2
      Committed by Miao Xie
      Now btrfs_commit_transaction() does this
      
      ret = btrfs_run_ordered_operations(root, 0)
      
      which asynchronously flushes all inodes on the ordered operations list.
      This introduced a deadlock in which the transaction-start task, the
      transaction-commit task and the flush workers all waited for each other.
      (See the following URL for the details:
       http://marc.info/?l=linux-btrfs&m=136070705732646&w=2)
      
      As we know, if ->in_commit is set, someone is committing the current
      transaction, so we should not try to join it unless we are JOIN or
      JOIN_NOLOCK; waiting is the best choice. This avoids the above problem,
      and it has another benefit: once ->in_commit is set, no new transaction
      handle can block the transaction that is on its way to commit.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      178260b2
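      A rough sketch of the idea in start_transaction(); the names
      (->in_commit, wait_current_trans()) follow the code of that time, but
      the exact placement of the check in the patch may differ:
      
      	/* don't join a transaction that is already committing - wait for it */
      	if (type != TRANS_JOIN && type != TRANS_JOIN_NOLOCK) {
      		smp_mb();
      		if (root->fs_info->running_transaction &&
      		    root->fs_info->running_transaction->in_commit)
      			wait_current_trans(root);
      	}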
    • Btrfs: fix the qgroup reserved space is released prematurely · 4b824906
      Committed by Miao Xie
      In start_transaction(), we will try to join the transaction again after
      the current transaction is committed, so we should not release the
      reserved space of the qgroup at that point. Fix it.
      
      Cc: Arne Jansen <sensille@gmx.net>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      4b824906
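      A loose sketch of the retry in start_transaction(), assuming
      btrfs_qgroup_free() is the release helper and that the reservation is
      only given back on a real failure, never across the -EBUSY retry (the
      label name is made up):
      
      	do {
      		ret = join_transaction(root, type);
      		if (ret == -EBUSY)
      			wait_current_trans(root);	/* retry, keep the reservation */
      	} while (ret == -EBUSY);
      
      	if (ret < 0) {
      		/* only a real failure releases the reserved qgroup space */
      		btrfs_qgroup_free(root, qgroup_reserved);
      		goto fail;
      	}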
    • Btrfs: place ordered operations on a per transaction list · 569e0f35
      Committed by Josef Bacik
      Miao made the ordered operations stuff run async, which introduced a
      deadlock where we could get somebody (sync) racing in and committing the
      transaction while a commit was already happening.  The new committer would
      try and flush ordered operations which would hang waiting for the commit to
      finish because it is done asynchronously and no longer inherits the caller's
      trans handle.  To fix this we need to make the ordered operations list a per
      transaction list.  We can get new inodes added to the ordered operation list
      by truncating them and then having another process writing to them, so this
      makes it so that anybody trying to add an ordered operation _must_ start a
      transaction in order to add itself to the list, which will keep new inodes
      from getting added to the ordered operations list after we start committing.
      This should fix the deadlock and also keeps us from doing a lot more work
      than we need to during commit.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      569e0f35
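      A sketch of the structural change, assuming the list simply moves into
      struct btrfs_transaction and that adding an inode now requires a trans
      handle; the locking shown is illustrative:
      
      	struct btrfs_transaction {
      		/* ... existing fields ... */
      		struct list_head ordered_operations;	/* now per transaction */
      	};
      
      	/* callers must hold a transaction handle to add an inode */
      	void btrfs_add_ordered_operation(struct btrfs_trans_handle *trans,
      					 struct btrfs_root *root,
      					 struct inode *inode)
      	{
      		spin_lock(&root->fs_info->ordered_extent_lock);
      		if (list_empty(&BTRFS_I(inode)->ordered_operations))
      			list_add_tail(&BTRFS_I(inode)->ordered_operations,
      				      &trans->transaction->ordered_operations);
      		spin_unlock(&root->fs_info->ordered_extent_lock);
      	}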
    • btrfs: add cancellation points to defrag · 210549eb
      Committed by David Sterba
      The defrag operation can take very long and we want a way to cancel it.
      The code checks for a pending signal at safe points in the defrag loops
      and returns EAGAIN. This means a user can press ^C after running
      'btrfs fi defrag'; it works for both defrag modes, files and root.
      
      Returning from the command was instant in my light tests, but may take
      longer depending on the aging factor of the filesystem.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      210549eb
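      A minimal sketch of such a cancellation point; the helper name is an
      assumption, the check itself is just signal_pending():
      
      	static int btrfs_defrag_cancelled(struct btrfs_fs_info *fs_info)
      	{
      		return signal_pending(current);
      	}
      
      	/* at a safe point inside the defrag loop */
      	if (btrfs_defrag_cancelled(root->fs_info)) {
      		ret = -EAGAIN;
      		break;
      	}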
    • btrfs: remove cache only arguments from defrag path · de78b51a
      Committed by Eric Sandeen
      The entry point at the defrag ioctl always sets "cache only" to 0;
      the codepaths haven't run for a long time as far as I can
      tell.  Chris says they're dead code, so remove them.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      de78b51a
    • Btrfs: if we aren't committing just end the transaction if we error out · e4a2bcac
      Committed by Josef Bacik
      I hit a deadlock where transaction commit was waiting on num_writers to be
      0.  This happened because somebody came into btrfs_commit_transaction and
      noticed we had aborted and it went to cleanup_transaction.  This shouldn't
      happen because cleanup_transaction is really meant to fix up a bad commit; it
      doesn't do the normal trans handle cleanup things.  So if we have an error
      just do the normal btrfs_end_transaction dance and return.  Once we are in
      the actual commit path we can use cleanup_transaction and be good to go.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      e4a2bcac
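      A sketch of the early-out, assuming the abort flag lives on the
      transaction and the normal teardown is btrfs_end_transaction():
      
      	/* in btrfs_commit_transaction(), before the real commit work starts */
      	if (cur_trans->aborted) {
      		ret = cur_trans->aborted;
      		btrfs_end_transaction(trans, root);
      		return ret;
      	}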
    • Btrfs: use bit operation for ->fs_state · 87533c47
      Committed by Miao Xie
      There is no lock protecting fs_info->fs_state, which can introduce
      problems, such as one task's update being overwritten when several
      tasks modify the value concurrently. For example:
      	Task0 - CPU0		Task1 - CPU1
      	mov %fs_state rax
      	or $0x1 rax
      				mov %fs_state rax
      				or $0x2 rax
      	mov rax %fs_state
      				mov rax %fs_state
      The expected value is 3, but in fact, it is 2.
      
      Though this problem doesn't happen now (because there is only one
      flag currently), the code is error prone; if we add other flags,
      the above problem is certain to happen.
      
      Now we use bit operations for it to fix the above problem.
      This makes the code more robust and makes it easy to add new
      flags.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      87533c47
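      A sketch of the resulting style, assuming ->fs_state becomes an
      unsigned long bitmap and a flag constant along the lines of
      BTRFS_FS_STATE_ERROR is defined as a bit number:
      
      	/* setting a flag is now a single atomic RMW */
      	set_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state);
      
      	/* testing a flag */
      	if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
      		return -EROFS;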
  2. 20 Feb 2013, 5 commits
  3. 06 Feb 2013, 1 commit
  4. 25 Jan 2013, 2 commits
  5. 18 Dec 2012, 1 commit
    • Btrfs: fix hash overflow handling · 9c52057c
      Committed by Chris Mason
      The handling for directory crc hash overflows was fairly obscure,
      split_leaf returns EOVERFLOW when we try to extend the item and that is
      supposed to bubble up to userland.  For a while it did so, but along the
      way we added better handling of errors and forced the FS readonly if we
      hit IO errors during the directory insertion.
      
      Along the way, we started testing only for EEXIST and the EOVERFLOW case
      was dropped.  The end result is that we may force the FS readonly if we
      catch a directory hash bucket overflow.
      
      This fixes a few problem spots.  First I add tests for EOVERFLOW in the
      places where we can safely just return the error up the chain.
      
      btrfs_rename is harder though, because it tries to insert the new
      directory item only after it has already unlinked anything the rename
      was going to overwrite.  Rather than adding very complex logic, I added
      a helper to test for the hash overflow case early while it is still safe
      to bail out.
      
      Snapshot and subvolume creation had a similar problem, so they are using
      the new helper now too.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Reported-by: Pascal Junod <pascal@junod.info>
      9c52057c
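      A sketch of the early check in btrfs_rename(), assuming a helper along
      the lines of btrfs_check_dir_item_collision() that reports -EOVERFLOW
      when the new name would land in a full hash bucket:
      
      	/* bail out before we unlink anything the rename would overwrite */
      	ret = btrfs_check_dir_item_collision(dest, new_dir->i_ino,
      					     new_dentry->d_name.name,
      					     new_dentry->d_name.len);
      	if (ret)
      		return ret;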
  6. 17 Dec 2012, 2 commits
  7. 13 Dec 2012, 5 commits
  8. 12 Dec 2012, 3 commits
  9. 26 Oct 2012, 1 commit
  10. 09 Oct 2012, 5 commits
    • Btrfs: cache extent state when writing out dirty metadata pages · e6138876
      Committed by Josef Bacik
      Every time we write out dirty pages we search for an offset in the tree,
      convert the bits in the state, and then when we wait we search for the
      offset again and clear the bits.  So for every dirty range in the io tree we
      are doing 4 rb searches, which is suboptimal.  With this patch we are only
      doing 2 searches for every cycle (modulo weird things happening).  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      e6138876
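      The shape of the change as a sketch: the write side hands back a cached
      extent_state reference and the wait side reuses it, so the second
      rb-tree lookup per range is skipped. The call shapes follow the bit
      helpers of that era but the exact parameters are from memory; the
      cached_state in/out pointer is the point:
      
      	struct extent_state *cached_state = NULL;
      
      	/* write side: convert the bits and remember the state we touched */
      	convert_extent_bit(tree, start, end, EXTENT_NEED_WAIT,
      			   EXTENT_DIRTY, &cached_state, GFP_NOFS);
      
      	/* wait side: clear the bits through the cached state, no new search */
      	clear_extent_bit(tree, start, end, EXTENT_NEED_WAIT,
      			 0, 0, &cached_state, GFP_NOFS);
      	free_extent_state(cached_state);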
    • Btrfs: fix orphan transaction on the freezed filesystem · 354aa0fb
      Committed by Miao Xie
      With the following debug patch:
      
       static int btrfs_freeze(struct super_block *sb)
       {
      + 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
      +	struct btrfs_transaction *trans;
      +
      +	spin_lock(&fs_info->trans_lock);
      +	trans = fs_info->running_transaction;
      +	if (trans) {
      +		printk("Transid %llu, use_count %d, num_writer %d\n",
      +			trans->transid, atomic_read(&trans->use_count),
      +			atomic_read(&trans->num_writers));
      +	}
      +	spin_unlock(&fs_info->trans_lock);
       	return 0;
       }
      
      I found there was an orphan transaction after the freeze operation was done.
      
      This happens because the transaction may not be committed when the transaction
      handle ends, even if it is the last handle of the current transaction. This
      design avoids committing the transaction too frequently, but it also introduces
      the above problem.
      
      So I add btrfs_attach_transaction(), which catches the current transaction
      so that it can be committed. If there is no transaction, it returns -ENOENT
      and does nothing else.
      
      This function can also be used instead of btrfs_join_transaction_freeze(),
      because it doesn't increase the writer counter and doesn't start a new
      transaction, so it also fixes the deadlock between sync and freeze.
      
      Besides that, it is used instead of btrfs_join_transaction() in
      transaction_kthread(), because if there is no transaction, the transaction
      kthread has nothing to do.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      354aa0fb
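      A sketch of the transaction_kthread() usage described above; the real
      kthread also distinguishes other errors, this is just the gist:
      
      	trans = btrfs_attach_transaction(root);
      	if (IS_ERR(trans))
      		goto sleep;	/* -ENOENT: no transaction, nothing to commit */
      	btrfs_commit_transaction(trans, root);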
    • Btrfs: add a type field for the transaction handle · a698d075
      Committed by Miao Xie
      This patch adds a type field to the transaction handle structure; this
      way we needn't implement various end-transaction functions and can make
      the code simpler and more readable.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      a698d075
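      The idea, sketched with illustrative names: the handle remembers how it
      was started, so a single end path can branch on that instead of having
      one helper per start type:
      
      	struct btrfs_trans_handle {
      		u64 transid;
      		/* ... */
      		short type;	/* TRANS_START, TRANS_JOIN, TRANS_JOIN_NOLOCK, ... */
      	};
      
      	/* one __btrfs_end_transaction() can now do the right thing */
      	if (trans->type != TRANS_JOIN_NOLOCK)
      		sb_end_intwrite(root->fs_info->sb);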
    • Btrfs: fix memory leak in start_transaction() · e8830e60
      Committed by Miao Xie
      This patch fixes a memory leak of the transaction handle that happened
      when starting a transaction failed on a frozen fs.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      e8830e60
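      A sketch of the kind of fix this implies in start_transaction(): if
      joining fails after the handle was allocated, free the handle on the
      error path (the cache name follows the existing
      btrfs_trans_handle_cachep, the label is simplified):
      
      	if (ret < 0) {
      		/* don't leak the handle we just allocated */
      		kmem_cache_free(btrfs_trans_handle_cachep, h);
      		goto alloc_fail;
      	}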
    • Btrfs: fix the missing error information in create_pending_snapshot() · 8732d44f
      Committed by Miao Xie
      The macro btrfs_abort_transaction() records the line number of the code
      where the problem happens, so we should invoke it at the place where the
      error occurs, or we will lose that line number.
      Reported-by: David Sterba <dave@jikos.cz>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      8732d44f
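      The pattern this implies inside create_pending_snapshot(), sketched;
      the callee is just an example of a call that can fail:
      
      	ret = btrfs_insert_root(trans, tree_root, &key, new_root_item);
      	if (ret) {
      		/* aborting here records this line, not some later one */
      		btrfs_abort_transaction(trans, root, ret);
      		goto fail;
      	}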
  11. 04 Oct 2012, 3 commits
    • Btrfs: fix race with freeze and free space inodes · 98114659
      Committed by Josef Bacik
      So we start our freeze, somebody comes in and does an fsync() on a file
      where we have to commit a transaction for whatever reason, and we will
      deadlock because the freeze is waiting on FS_FREEZE people to stop writing
      to the file system, but the transaction is waiting for its free space inodes
      to be written out, which are in turn waiting on sb_start_intwrite while
      trying to write the file extents.  To fix this we'll just skip the
      sb_start_intwrite() if we are TRANS_JOIN_NOLOCK, since we're being waited on by a
      transaction commit so we're safe wrt freeze and this will keep us from
      deadlocking.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      98114659
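      A sketch of the guard in start_transaction(); the type constant follows
      the existing TRANS_JOIN_NOLOCK naming:
      
      	/*
      	 * JOIN_NOLOCK means a transaction commit is already waiting on us,
      	 * so it holds the freeze protection for us - taking another ref here
      	 * could deadlock against a concurrent freeze.
      	 */
      	if (type != TRANS_JOIN_NOLOCK)
      		sb_start_intwrite(root->fs_info->sb);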
    • Btrfs: kill obsolete arguments in btrfs_wait_ordered_extents · 6bbe3a9c
      Committed by Liu Bo
      nocow_only is now an obsolete argument.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      6bbe3a9c
    • Btrfs: fix race in sync and freeze again · 60376ce4
      Committed by Josef Bacik
      I screwed this up; there is a race between checking if there is a running
      transaction and actually starting a transaction in sync where we could race
      with a freezer and get ourselves into trouble.  To fix this we need to make
      a new join type to only do the try lock on the freeze stuff.  If it fails
      we'll return EPERM and just return from sync.  This fixes a hang Liu Bo
      reported when running xfstest 68 in a loop.  Thanks,
      Reported-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      60376ce4
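      A sketch of the try-lock join type, assuming the freeze protection of
      that era is taken through __sb_start_write(sb, SB_FREEZE_FS, wait),
      which can fail when asked not to wait; TRANS_JOIN_FREEZE is how such a
      type could be spelled:
      
      	if (type == TRANS_JOIN_FREEZE) {
      		/* don't block behind a freezer - give up and let sync return */
      		if (!__sb_start_write(root->fs_info->sb, SB_FREEZE_FS, false))
      			return ERR_PTR(-EPERM);
      	} else {
      		sb_start_intwrite(root->fs_info->sb);
      	}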
  12. 02 Oct 2012, 4 commits
    • Btrfs: delay block group item insertion · ea658bad
      Committed by Josef Bacik
      So we have lots of places where we try to preallocate chunks in order to
      make sure we have enough space as we make our allocations.  This has
      historically meant that we're constantly tweaking when we should allocate a
      new chunk, and historically we have gotten this horribly wrong so we way
      over allocate either metadata or data.  To try and keep this from happening
      we are going to make it so that the block group item insertion is done out
      of band at the end of a transaction.  This will allow us to create chunks
      even if we are trying to make an allocation for the extent tree.  With this
      patch my enospc tests run faster (didn't expect this) and more efficiently
      use the disk space (this is what I wanted).  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      ea658bad
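      The shape of the change, sketched; the list and helper names follow the
      description rather than the final code:
      
      	/* when a chunk is allocated, just queue the new block group */
      	list_add_tail(&cache->new_bg_list, &trans->new_bgs);
      
      	/* at the end of the transaction, insert all the pending items at once */
      	list_for_each_entry_safe(cache, tmp, &trans->new_bgs, new_bg_list) {
      		btrfs_insert_item(trans, root->fs_info->extent_root,
      				  &cache->key, &cache->item, sizeof(cache->item));
      		list_del_init(&cache->new_bg_list);
      	}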
    • Btrfs: move the sb_end_intwrite until after the throttle logic · 6df7881a
      Committed by Josef Bacik
      Sage reported the following lockdep backtrace
      
      =====================================
      [ BUG: bad unlock balance detected! ]
      3.6.0-rc2-ceph-00171-gc7ed62d #1 Not tainted
      -------------------------------------
      btrfs-cleaner/7607 is trying to release lock (sb_internal) at:
      [<ffffffffa00422ae>] btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
      but there are no more locks to release!
      
      other info that might help us debug this:
      1 lock held by btrfs-cleaner/7607:
       #0:  (&fs_info->cleaner_mutex){+.+...}, at: [<ffffffffa003b405>] cleaner_kthread+0x95/0x120 [btrfs]
      
      stack backtrace:
      Pid: 7607, comm: btrfs-cleaner Not tainted 3.6.0-rc2-ceph-00171-gc7ed62d #1
      Call Trace:
       [<ffffffffa00422ae>] ? btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
       [<ffffffff810afa9e>] print_unlock_inbalance_bug+0xfe/0x110
       [<ffffffff810b289e>] lock_release_non_nested+0x1ee/0x310
       [<ffffffff81172f9b>] ? kmem_cache_free+0x7b/0x160
       [<ffffffffa004106c>] ? put_transaction+0x8c/0x130 [btrfs]
       [<ffffffffa00422ae>] ? btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
       [<ffffffff810b2a95>] lock_release+0xd5/0x220
       [<ffffffff81173071>] ? kmem_cache_free+0x151/0x160
       [<ffffffff8117d9ed>] __sb_end_write+0x7d/0x90
       [<ffffffffa00422ae>] btrfs_commit_transaction+0xa6e/0xb20 [btrfs]
       [<ffffffff81079850>] ? __init_waitqueue_head+0x60/0x60
       [<ffffffff81634c6b>] ? _raw_spin_unlock+0x2b/0x40
       [<ffffffffa0042758>] __btrfs_end_transaction+0x368/0x3c0 [btrfs]
       [<ffffffffa0042808>] btrfs_end_transaction_throttle+0x18/0x20 [btrfs]
       [<ffffffffa00318f0>] btrfs_drop_snapshot+0x410/0x600 [btrfs]
       [<ffffffff8132babd>] ? do_raw_spin_unlock+0x5d/0xb0
       [<ffffffffa00430ef>] btrfs_clean_old_snapshots+0xaf/0x150 [btrfs]
       [<ffffffffa003b405>] ? cleaner_kthread+0x95/0x120 [btrfs]
       [<ffffffffa003b419>] cleaner_kthread+0xa9/0x120 [btrfs]
       [<ffffffffa003b370>] ? btrfs_destroy_delayed_refs.isra.102+0x220/0x220 [btrfs]
       [<ffffffff810791ee>] kthread+0xae/0xc0
       [<ffffffff810b379d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10
       [<ffffffff81635430>] ? retint_restore_args+0x13/0x13
       [<ffffffff81079140>] ? flush_kthread_work+0x1a0/0x1a0
       [<ffffffff8163e740>] ? gs_change+0x13/0x13
      
      This is because the throttle logic can commit the transaction, and the commit
      expects to be the one dropping the intwrite reference, but we've already dropped
      it in __btrfs_end_transaction.  Moving the sb_end_intwrite after this logic makes
      the lockdep warning go away.  Thanks,
      Tested-by: Sage Weil <sage@inktank.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      6df7881a
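      A sketch of the reordering in __btrfs_end_transaction(); the field and
      helper names are from the code of that time and may not be exact, the
      point is that the branch which may commit runs before we drop our own
      intwrite reference:
      
      	if (lock && should_end_transaction(trans, root))
      		trans->transaction->blocked = 1;
      
      	if (lock && cur_trans->blocked && !cur_trans->in_commit) {
      		if (throttle)
      			/* the commit drops the intwrite ref itself */
      			return btrfs_commit_transaction(trans, root);
      		else
      			wake_up_process(root->fs_info->transaction_kthread);
      	}
      
      	sb_end_intwrite(root->fs_info->sb);	/* now after the throttle logic */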
    • Btrfs: fix corrupted metadata in the snapshot · 8407aa46
      Committed by Miao Xie
      When we delete an inode, we remove all the delayed items, including the delayed
      inode update, and then truncate all the related metadata. If there is a lot of
      metadata, we end the current transaction and start a new transaction to
      truncate the remaining metadata. In this way, we leave an inode item whose
      link count is > 0, and we may also leave some directory index items in the
      fs/file tree after the current transaction ends. In other words, the metadata
      in this fs/file tree is inconsistent. If we create a snapshot of this tree now,
      we will find an inode with corrupted metadata in the new snapshot, and we won't
      continue to drop the remaining metadata, because its link count is not 0.
      
      We fix this problem by updating the inode item before the current transaction ends.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      8407aa46
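      The gist, sketched around the delete/evict loop: before the transaction
      is handed back, write the inode item out so the on-disk link count and
      size stay consistent for any snapshot taken in between:
      
      	ret = btrfs_truncate_inode_items(trans, root, inode, 0, 0);
      	if (ret != -EAGAIN)
      		break;
      
      	/* keep the inode item on disk consistent before this trans ends */
      	ret = btrfs_update_inode(trans, root, inode);
      	if (ret)
      		break;
      
      	btrfs_end_transaction(trans, root);
      	/* a new transaction is started and the loop continues */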
    • Btrfs: fix the snapshot that should not exist · 42874b3d
      Committed by Miao Xie
      The snapshot should be an image of the fs tree before it was created,
      so the metadata of the snapshot should not exist in its own tree. But we
      found that the directory item and directory name index are in both the
      snapshot tree and the fs tree. This introduces some problems and looks
      strange to users:
      
       # mkfs.btrfs /dev/sda1
       # mount /dev/sda1 /mnt
       # mkdir /mnt/1
       # cd /mnt/1
       # btrfs subvolume snapshot /mnt snap0
       # ls -a /mnt/1/snap0/1
       .	..	[no other file/dir]
      
       # ll /mnt/1/snap0/
       total 0
       drwxr-xr-x 1 root root 10 Jul 24 12:11 1
      			^^^
      			There is no file/dir in it, but its size is 10
      
       # cd /mnt/1/snap0/1/snap0
       [Entered a nonexistent directory successfully...]
      
      There is nothing in directory 1 in snap0, but btrfs reports the length of
      this directory as 10. Besides that, we can enter a nonexistent directory,
      which is very strange to users.
      
       # btrfs subvolume snapshot /mnt/1/snap0 /mnt/snap1
       # ll /mnt/1/snap0/1/
       total 0
       [None]
       # ll /mnt/snap1/1/
       total 0
       drwxr-xr-x 1 root root 0 Jul 24 12:14 snap0
      
      The source of snap1 didn't have any directory in directory 1, but snap1 has
      a snap0; the source and the snapshot differ.
      
      So I think we should insert the directory item and directory name index and
      update the parent inode as the last step of snapshot creation, and not leave
      the useless metadata in the fs/file tree.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      42874b3d
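      A sketch of the intended ordering in create_pending_snapshot() (i.e.
      during the transaction commit): hook the new snapshot into its parent
      directory only after the snapshot root exists, rather than before the
      commit; the call shapes follow the helpers of that era but are not
      exact:
      
      	/* last step: insert the dir item/index and update the parent inode */
      	ret = btrfs_insert_dir_item(trans, parent_root,
      				    dentry->d_name.name, dentry->d_name.len,
      				    parent_inode, &key, BTRFS_FT_DIR, index);
      	if (ret) {
      		btrfs_abort_transaction(trans, root, ret);
      		goto fail;
      	}
      
      	btrfs_i_size_write(parent_inode,
      			   parent_inode->i_size + dentry->d_name.len * 2);
      	ret = btrfs_update_inode(trans, parent_root, parent_inode);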