1. 02 10月, 2012 4 次提交
    • M
      Btrfs: fix full backref problem when inserting shared block reference · 361048f5
      Miao Xie 提交于
      If we create several snapshots at the same time, the following BUG_ON() will be
      triggered.
      
      	kernel BUG at fs/btrfs/extent-tree.c:6047!
      
      Steps to reproduce:
       # mkfs.btrfs <partition>
       # mount <partition> <mnt>
       # cd <mnt>
       # for ((i=0;i<2400;i++)); do touch long_name_to_make_tree_more_deep$i; done
       # for ((i=0; i<4; i++))
       > do
       > mkdir $i
       > for ((j=0; j<200; j++))
       > do
       > btrfs sub snap . $i/$j
       > done &
       > done
      
      The reason is:
      Before transaction commit, some operations changed the fs tree and new tree
      blocks were allocated because of COW. We used the implicit non-shared back
      reference for those newly allocated tree blocks because they were not shared by
      two or more trees.
      
      And then we created the first snapshot for the fs tree, according to the back
      reference rules, we also used implicit back refs for the child tree blocks of
      the root node of the fs tree, now those child nodes/leaves were shared by two
      trees.
      
      Then We didn't deal with the delayed references, and continued to change the fs
      tree(created the second snapshot and inserted the dir item of the new snapshot
      into the fs tree). According to the rules of the back reference, we added full
      back refs for those tree blocks whose parents have be shared by two trees.
      Now some newly allocated tree blocks had two types of the references.
      
      As we know, the delayed reference system handles these delayed references from
      back to front, and the full delayed reference is inserted after the implicit
      ones. So when we dealt with the back references of those newly allocated tree
      blocks, the full references was dealt with at first. And if the first reference
      is a shared back reference and the tree block that the reference points to is
      newly allocated, It would be considered as a tree block which is shared by two
      or more trees when it is allocated and should be a full back reference not a
      implicit one, the flag of its reference also should be set to FULL_BACKREF.
      But in fact, it was a non-shared tree block with a implicit reference at
      beginning, so it was not compulsory to set the flags to FULL_BACKREF. So BUG_ON
      was triggered.
      
      We have several methods to fix this bug:
      1. deal with delayed references after the snapshot is created and before we
         change the source tree of the snapshot. This is the easiest and safest way.
      2. modify the sort method of the delayed reference tree, make the full delayed
         references be inserted before the implicit ones. It is also very easy, but
         I don't know if it will introduce some problems or not.
      3. modify select_delayed_ref() and make it select the implicit delayed reference
         at first. This way is not so good because it may wastes CPU time if we have
         lots of delayed references.
      4. set the flags to FULL_BACKREF, this method is a little complex comparing with
         the 1st way.
      
      I chose the 1st way to fix it.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      361048f5
    • M
      Btrfs: fix error path in create_pending_snapshot() · 6fa9700e
      Miao Xie 提交于
      This patch fixes the following problem:
      - If we failed to deal with the delayed dir items, we should abort transaction,
        just as its comment said. Fix it.
      - If root reference or root back reference insertion failed, we should
        abort transaction. Fix it.
      - Fix the double free problem of pending->inherit.
      - Do not restore the trans->rsv if we doesn't change it.
      - make the error path more clearly.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      6fa9700e
    • S
      Btrfs: set journal_info in async trans commit worker · e209db7a
      Sage Weil 提交于
      We expect current->journal_info to point to the trans handle we are
      committing.
      Signed-off-by: NSage Weil <sage@inktank.com>
      e209db7a
    • S
      Btrfs: pass lockdep rwsem metadata to async commit transaction · 6fc4e354
      Sage Weil 提交于
      The freeze rwsem is taken by sb_start_intwrite() and dropped during the
      commit_ or end_transaction().  In the async case, that happens in a worker
      thread.  Tell lockdep the calling thread is releasing ownership of the
      rwsem and the async thread is picking it up.
      
      XFS plays the same trick in fs/xfs/xfs_aops.c.
      Signed-off-by: NSage Weil <sage@inktank.com>
      6fc4e354
  2. 29 8月, 2012 2 次提交
  3. 31 7月, 2012 1 次提交
    • J
      btrfs: Convert to new freezing mechanism · b2b5ef5c
      Jan Kara 提交于
      We convert btrfs_file_aio_write() to use new freeze check.  We also add proper
      freeze protection to btrfs_page_mkwrite(). We also add freeze protection to
      the transaction mechanism to avoid starting transactions on frozen filesystem.
      At minimum this is necessary to stop iput() of unlinked file to change frozen
      filesystem during truncation.
      
      Checks in cleaner_kthread() and transaction_kthread() can be safely removed
      since btrfs_freeze() will lock the mutexes and thus block the threads (and they
      shouldn't have anything to do anyway).
      
      CC: linux-btrfs@vger.kernel.org
      CC: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b2b5ef5c
  4. 26 7月, 2012 1 次提交
    • A
      Btrfs: introduce subvol uuids and times · 8ea05e3a
      Alexander Block 提交于
      This patch introduces uuids for subvolumes. Each
      subvolume has it's own uuid. In case it was snapshotted,
      it also contains parent_uuid. In case it was received,
      it also contains received_uuid.
      
      It also introduces subvolume ctime/otime/stime/rtime. The
      first two are comparable to the times found in inodes. otime
      is the origin/creation time and ctime is the change time.
      stime/rtime are only valid on received subvolumes.
      stime is the time of the subvolume when it was
      sent. rtime is the time of the subvolume when it was
      received.
      
      Additionally to the times, we have a transid for each
      time. They are updated at the same place as the times.
      
      btrfs receive uses stransid and rtransid to find out
      if a received subvolume changed in the meantime.
      
      If an older kernel mounts a filesystem with the
      extented fields, all fields become invalid. The next
      mount with a new kernel will detect this and reset the
      fields.
      Signed-off-by: NAlexander Block <ablock84@googlemail.com>
      Reviewed-by: NDavid Sterba <dave@jikos.cz>
      Reviewed-by: NArne Jansen <sensille@gmx.net>
      Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      Reviewed-by: NAlex Lyakas <alex.bolshoy.btrfs@gmail.com>
      8ea05e3a
  5. 24 7月, 2012 3 次提交
    • J
      Btrfs: change how we indicate we're adding csums · 0e721106
      Josef Bacik 提交于
      There is weird logic I had to put in place to make sure that when we were
      adding csums that we'd used the delalloc block rsv instead of the global
      block rsv.  Part of this meant that we had to free up our transaction
      reservation before we ran the delayed refs since csum deletion happens
      during the delayed ref work.  The problem with this is that when we release
      a reservation we will add it to the global reserve if it is not full in
      order to keep us going along longer before we have to force a transaction
      commit.  By releasing our reservation before we run delayed refs we don't
      get the opportunity to drain down the global reserve for the work we did, so
      we won't refill it as often.  This isn't a problem per-se, it just results
      in us possibly committing transactions more and more often, and in rare
      cases could cause those WARN_ON()'s to pop in use_block_rsv because we ran
      out of space in our block rsv.
      
      This also helps us by holding onto space while the delayed refs run so we
      don't end up with as many people trying to do things at the same time, which
      again will help us not force commits or hit the use_block_rsv warnings.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      0e721106
    • D
      Btrfs: small naming cleanup in join_transaction() · e4b50e14
      Dan Carpenter 提交于
      "root->fs_info" and "fs_info" are the same, but "fs_info" is prefered
      because it is shorter and that's what is used in the rest of the
      function.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      e4b50e14
    • C
      Btrfs: don't wait around for new log writers on an SSD · e39e64ac
      Chris Mason 提交于
      Waiting on spindles improves performance, but ssds want all the
      IO as quickly as we can push it down.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      e39e64ac
  6. 12 7月, 2012 5 次提交
  7. 10 7月, 2012 2 次提交
  8. 15 6月, 2012 2 次提交
    • J
      Btrfs: abort the transaction if the commit fails · 7b8b92af
      Josef Bacik 提交于
      If a transaction commit fails we don't abort it so we don't set an error on
      the file system.  This patch fixes that by actually calling the abort stuff
      and then adding a check for a fs error in the transaction start stuff to
      make sure it is caught properly.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      7b8b92af
    • J
      Btrfs: wake up transaction waiters when aborting a transaction · d7096fc3
      Josef Bacik 提交于
      I was getting lots of hung tasks and a NULL pointer dereference because we
      are not cleaning up the transaction properly when it aborts.  First we need
      to reset the running_transaction to NULL so we don't get a bad dereference
      for any start_transaction callers after this.  Also we cannot rely on
      waitqueue_active() since it's just a list_empty(), so just call wake_up()
      directly since that will do the barrier for us and such.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      d7096fc3
  9. 30 5月, 2012 3 次提交
  10. 19 4月, 2012 1 次提交
  11. 13 4月, 2012 1 次提交
  12. 29 3月, 2012 1 次提交
  13. 27 3月, 2012 1 次提交
  14. 22 3月, 2012 4 次提交
  15. 24 2月, 2012 1 次提交
  16. 23 2月, 2012 1 次提交
  17. 17 1月, 2012 1 次提交
  18. 07 1月, 2012 1 次提交
    • C
      Btrfs: run chunk allocations while we do delayed refs · 203bf287
      Chris Mason 提交于
      Btrfs tries to batch extent allocation tree changes to improve performance
      and reduce metadata trashing.  But it doesn't allocate new metadata chunks
      while it is doing allocations for the extent allocation tree.
      
      This commit changes the delayed refence code to do chunk allocations if we're
      getting low on room.  It prevents crashes and improves performance.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      203bf287
  19. 04 1月, 2012 2 次提交
    • J
      Btrfs: add waitqueue instead of doing busy waiting for more delayed refs · a168650c
      Jan Schmidt 提交于
      Now that we may be holding back delayed refs for a limited period, we
      might end up having no runnable delayed refs. Without this commit, we'd
      do busy waiting in that thread until another (runnable) ref arives.
      Instead, we're detecting this situation and use a waitqueue, such that
      we only try to run more refs after
      	a) another runnable ref was added  or
      	b) delayed refs are no longer held back
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      a168650c
    • A
      Btrfs: add sequence numbers to delayed refs · 00f04b88
      Arne Jansen 提交于
      Sequence numbers are needed to reconstruct the backrefs of a given extent to
      a certain point in time. The total set of backrefs consist of the set of
      backrefs recorded on disk plus the enqueued delayed refs for it that existed
      at that moment.
      
      This patch also adds a list that records all delayed refs which are
      currently in the process of being added.
      
      When walking all refs of an extent in btrfs_find_all_roots(), we freeze the
      current state of delayed refs, honor anythinh up to this point and prevent
      processing newer delayed refs to assert consistency.
      Signed-off-by: NArne Jansen <sensille@gmx.net>
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      00f04b88
  20. 22 12月, 2011 1 次提交
    • A
      Btrfs: mark delayed refs as for cow · 66d7e7f0
      Arne Jansen 提交于
      Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
      from every call site. The for_cow parameter will later on be used to
      determine if a ref will change anything with respect to qgroups.
      
      Delayed refs coming from relocation are always counted as for_cow, as they
      don't change subvol quota.
      
      Also pass in the fs_info for later use.
      
      btrfs_find_all_roots() will use this as an optimization, as changes that are
      for_cow will not change anything with respect to which root points to a
      certain leaf. Thus, we don't need to add the current sequence number to
      those delayed refs.
      Signed-off-by: NArne Jansen <sensille@gmx.net>
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      66d7e7f0
  21. 15 11月, 2011 1 次提交
    • L
      Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush · f1ebcc74
      Liu Bo 提交于
      The btrfs snapshotting code requires that once a root has been
      snapshotted, we don't change it during a commit.
      
      But there are two cases to lead to tree corruptions:
      
      1) multi-thread snapshots can commit serveral snapshots in a transaction,
         and this may change the src root when processing the following pending
         snapshots, which lead to the former snapshots corruptions;
      
      2) the free inode cache was changing the roots when it root the cache,
         which lead to corruptions.
      
      This fixes things by making sure we force COW the block after we create a
      snapshot during commiting a transaction, then any changes to the roots
      will result in COW, and we get all the fs roots and snapshot roots to be
      consistent.
      Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      f1ebcc74
  22. 11 11月, 2011 1 次提交