1. 26 Jul, 2012 1 commit
    • A
      Btrfs: introduce subvol uuids and times · 8ea05e3a
      Alexander Block committed
      This patch introduces uuids for subvolumes. Each
      subvolume has its own uuid. In case it was snapshotted,
      it also contains parent_uuid. In case it was received,
      it also contains received_uuid.
      
      It also introduces subvolume ctime/otime/stime/rtime. The
      first two are comparable to the times found in inodes. otime
      is the origin/creation time and ctime is the change time.
      stime/rtime are only valid on received subvolumes.
      stime is the time of the subvolume when it was
      sent. rtime is the time of the subvolume when it was
      received.
      
      In addition to the times, we have a transid for each
      time. They are updated at the same place as the times.
      
      btrfs receive uses stransid and rtransid to find out
      if a received subvolume changed in the meantime.
      
      If an older kernel mounts a filesystem with the
      extended fields, all fields become invalid. The next
      mount with a new kernel will detect this and reset the
      fields.
      Signed-off-by: Alexander Block <ablock84@googlemail.com>
      Reviewed-by: David Sterba <dave@jikos.cz>
      Reviewed-by: Arne Jansen <sensille@gmx.net>
      Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
      8ea05e3a
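The fields this commit describes can be pictured with a small model. This is a hypothetical Python sketch, not the kernel's on-disk structure: the field names follow the commit message, while the `SubvolInfo` class and the rule used in `changed_since_receive` (comparing the change transid against `rtransid`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class SubvolInfo:
    """Illustrative stand-in for the per-subvolume metadata (hypothetical)."""
    uuid: UUID
    parent_uuid: Optional[UUID] = None    # set when the subvolume is a snapshot
    received_uuid: Optional[UUID] = None  # set when the subvolume was received
    ctransid: int = 0  # transid of the last change
    otransid: int = 0  # transid at creation
    stransid: int = 0  # transid when sent (received subvolumes only)
    rtransid: int = 0  # transid when received (received subvolumes only)


def changed_since_receive(sv: SubvolInfo) -> bool:
    """Assumed rule: a received subvolume changed if it was modified in a
    transaction later than the one in which it was received."""
    if sv.received_uuid is None:
        raise ValueError("not a received subvolume")
    return sv.ctransid > sv.rtransid
```

A subvolume whose `ctransid` equals its `rtransid` would be treated as untouched under this assumption.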
  2. 24 Jul, 2012 3 commits
    • J
      Btrfs: change how we indicate we're adding csums · 0e721106
      Josef Bacik committed
      There is weird logic I had to put in place to make sure that when we were
      adding csums that we'd use the delalloc block rsv instead of the global
      block rsv.  Part of this meant that we had to free up our transaction
      reservation before we ran the delayed refs since csum deletion happens
      during the delayed ref work.  The problem with this is that when we release
      a reservation we will add it to the global reserve if it is not full in
      order to keep us going along longer before we have to force a transaction
      commit.  By releasing our reservation before we run delayed refs we don't
      get the opportunity to drain down the global reserve for the work we did, so
      we won't refill it as often.  This isn't a problem per se, it just results
      in us possibly committing transactions more and more often, and in rare
      cases could cause those WARN_ON()'s to pop in use_block_rsv because we ran
      out of space in our block rsv.
      
      This also helps us by holding onto space while the delayed refs run so we
      don't end up with as many people trying to do things at the same time, which
      again will help us not force commits or hit the use_block_rsv warnings.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      0e721106
    • D
      Btrfs: small naming cleanup in join_transaction() · e4b50e14
      Dan Carpenter committed
      "root->fs_info" and "fs_info" are the same, but "fs_info" is preferred
      because it is shorter and that's what is used in the rest of the
      function.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      e4b50e14
    • C
      Btrfs: don't wait around for new log writers on an SSD · e39e64ac
      Chris Mason committed
      Waiting on spindles improves performance, but SSDs want all the
      IO as quickly as we can push it down.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      e39e64ac
  3. 12 Jul, 2012 5 commits
  4. 10 Jul, 2012 2 commits
  5. 15 Jun, 2012 2 commits
    • J
      Btrfs: abort the transaction if the commit fails · 7b8b92af
      Josef Bacik committed
      If a transaction commit fails we don't abort it so we don't set an error on
      the file system.  This patch fixes that by actually calling the abort stuff
      and then adding a check for a fs error in the transaction start stuff to
      make sure it is caught properly.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      7b8b92af
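The control flow described above (abort on a failed commit, then refuse new transactions once the filesystem carries an error) can be sketched roughly as follows. This is a toy Python model under stated assumptions: `FileSystem`, `FsError`, and the `write_fn` callback are hypothetical names, not kernel interfaces.

```python
class FsError(Exception):
    """Raised when a transaction is started on a filesystem in error state."""


class FileSystem:
    def __init__(self):
        self.error = None  # sticky error set by an abort

    def abort_transaction(self, err):
        # Record only the first error so the original cause is kept.
        if self.error is None:
            self.error = err

    def start_transaction(self):
        # The added check: refuse new transactions once an error is set.
        if self.error is not None:
            raise FsError(f"filesystem has an error: {self.error}")
        return object()  # placeholder transaction handle

    def commit_transaction(self, write_fn):
        try:
            write_fn()  # hypothetical callback that performs the commit I/O
        except OSError as err:
            # The fix: a failed commit aborts instead of being ignored.
            self.abort_transaction(err)
            raise
```

After a failed commit, every later `start_transaction()` call in this model fails fast instead of proceeding on a damaged state.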
    • J
      Btrfs: wake up transaction waiters when aborting a transaction · d7096fc3
      Josef Bacik committed
      I was getting lots of hung tasks and a NULL pointer dereference because we
      are not cleaning up the transaction properly when it aborts.  First we need
      to reset the running_transaction to NULL so we don't get a bad dereference
      for any start_transaction callers after this.  Also we cannot rely on
      waitqueue_active() since it's just a list_empty(), so just call wake_up()
      directly since that will do the barrier for us and such.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      d7096fc3
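As a loose userspace analogy for the two fixes in this commit, the sketch below clears the running transaction and wakes all waiters unconditionally on abort. It uses Python's `threading.Condition` in place of a kernel waitqueue; the `TransactionState` class and its method names are hypothetical.

```python
import threading


class TransactionState:
    def __init__(self):
        self.cond = threading.Condition()
        self.running_transaction = object()  # placeholder for a live transaction
        self.aborted = False

    def wait_for_transaction_end(self, timeout=None):
        # Waiters sleep until no transaction is running.
        with self.cond:
            return self.cond.wait_for(
                lambda: self.running_transaction is None, timeout
            )

    def abort(self):
        with self.cond:
            # First fix: reset the pointer so later callers never touch
            # a dead transaction.
            self.running_transaction = None
            self.aborted = True
            # Second fix: wake everyone unconditionally instead of first
            # peeking at whether anyone appears to be waiting.
            self.cond.notify_all()
```

The state change and the wakeup happen under the same lock, which is the role the commit assigns to the barrier inside `wake_up()`.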
  6. 30 May, 2012 3 commits
  7. 19 Apr, 2012 1 commit
  8. 13 Apr, 2012 1 commit
  9. 29 Mar, 2012 1 commit
  10. 27 Mar, 2012 1 commit
  11. 22 Mar, 2012 4 commits
  12. 24 Feb, 2012 1 commit
  13. 23 Feb, 2012 1 commit
  14. 17 Jan, 2012 1 commit
  15. 07 Jan, 2012 1 commit
    • C
      Btrfs: run chunk allocations while we do delayed refs · 203bf287
      Chris Mason committed
      Btrfs tries to batch extent allocation tree changes to improve performance
      and reduce metadata thrashing.  But it doesn't allocate new metadata chunks
      while it is doing allocations for the extent allocation tree.
      
      This commit changes the delayed reference code to do chunk allocations if we're
      getting low on room.  It prevents crashes and improves performance.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      203bf287
  16. 04 Jan, 2012 2 commits
    • J
      Btrfs: add waitqueue instead of doing busy waiting for more delayed refs · a168650c
      Jan Schmidt committed
      Now that we may be holding back delayed refs for a limited period, we
      might end up having no runnable delayed refs. Without this commit, we'd
      do busy waiting in that thread until another (runnable) ref arrives.
      Instead, we detect this situation and use a waitqueue, such that
      we only try to run more refs after
      	a) another runnable ref was added  or
      	b) delayed refs are no longer held back
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      a168650c
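The two wakeup conditions listed in this commit can be illustrated with a condition variable. This is a simplified Python sketch, not the kernel code: `DelayedRefQueue` is a hypothetical class, and it assumes a single global "held back" flag rather than per-ref sequence checks.

```python
import threading
from collections import deque


class DelayedRefQueue:
    def __init__(self):
        self._cond = threading.Condition()
        self._refs = deque()
        self._held_back = False  # assumption: one global hold flag

    def hold_back(self):
        with self._cond:
            self._held_back = True

    def release(self):
        # Condition b): refs are no longer held back, so wake the runner.
        with self._cond:
            self._held_back = False
            self._cond.notify_all()

    def add(self, ref):
        # Condition a): a new ref arrived, so wake the runner to re-check.
        with self._cond:
            self._refs.append(ref)
            self._cond.notify_all()

    def next_runnable(self, timeout=None):
        """Sleep (no busy loop) until a ref is runnable, or time out."""
        with self._cond:
            ready = self._cond.wait_for(
                lambda: bool(self._refs) and not self._held_back, timeout
            )
            return self._refs.popleft() if ready else None
```

The runner thread only re-evaluates its predicate when one of the two events fires, which is the point of replacing the busy wait.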
    • A
      Btrfs: add sequence numbers to delayed refs · 00f04b88
      Arne Jansen committed
      Sequence numbers are needed to reconstruct the backrefs of a given extent to
      a certain point in time. The total set of backrefs consists of the set of
      backrefs recorded on disk plus the enqueued delayed refs for it that existed
      at that moment.
      
      This patch also adds a list that records all delayed refs which are
      currently in the process of being added.
      
      When walking all refs of an extent in btrfs_find_all_roots(), we freeze the
      current state of delayed refs, honor anything up to this point and prevent
      processing newer delayed refs to ensure consistency.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      00f04b88
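The idea of reconstructing backrefs "to a certain point in time" can be sketched with a monotonically increasing sequence number per delayed ref. The Python below is an illustrative model only: `DelayedRef`, `DelayedRefLog`, and the integer `delta` representation are assumptions, not the kernel's data structures.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class DelayedRef:
    seq: int     # sequence number assigned when the ref is queued
    extent: int  # extent the ref applies to
    delta: int   # +1 adds a backref, -1 drops one (simplified)


class DelayedRefLog:
    def __init__(self):
        self._next_seq = count(1)
        self.refs = []
        self.last_seq = 0

    def add(self, extent, delta):
        ref = DelayedRef(next(self._next_seq), extent, delta)
        self.last_seq = ref.seq
        self.refs.append(ref)
        return ref

    def freeze(self):
        """Return the sequence number that marks 'now' for a backref walk."""
        return self.last_seq

    def backref_count(self, extent, on_disk, frozen_seq):
        # On-disk backrefs plus every queued ref that existed at the freeze
        # point; newer refs are ignored to keep the view consistent.
        return on_disk + sum(
            r.delta for r in self.refs
            if r.extent == extent and r.seq <= frozen_seq
        )
```

A walk that froze the state before a later ref was queued keeps seeing the older count, even though the log has since grown.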
  17. 22 Dec, 2011 1 commit
    • A
      Btrfs: mark delayed refs as for cow · 66d7e7f0
      Arne Jansen committed
      Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
      from every call site. The for_cow parameter will later on be used to
      determine if a ref will change anything with respect to qgroups.
      
      Delayed refs coming from relocation are always counted as for_cow, as they
      don't change subvol quota.
      
      Also pass in the fs_info for later use.
      
      btrfs_find_all_roots() will use this as an optimization, as changes that are
      for_cow will not change anything with respect to which root points to a
      certain leaf. Thus, we don't need to add the current sequence number to
      those delayed refs.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      66d7e7f0
  18. 15 Nov, 2011 1 commit
    • L
      Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush · f1ebcc74
      Liu Bo committed
      The btrfs snapshotting code requires that once a root has been
      snapshotted, we don't change it during a commit.
      
      But there are two cases that lead to tree corruption:
      
      1) multi-thread snapshots can commit several snapshots in a transaction,
         and this may change the src root when processing the following pending
         snapshots, which corrupts the earlier snapshots;
      
      2) the free inode cache was changing the roots when it wrote the cache,
         which leads to corruption.
      
      This fixes things by making sure we force COW the block after we create a
      snapshot while committing a transaction, so that any changes to the roots
      will result in COW, and we get all the fs roots and snapshot roots to be
      consistent.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      f1ebcc74
  19. 11 Nov, 2011 1 commit
  20. 06 Nov, 2011 3 commits
    • C
      Btrfs: fix race during transaction joins · d43317dc
      Chris Mason committed
      While we're allocating ram for a new transaction, we drop our spinlock.
      When we get the lock back, we do check to see if a transaction started
      while we slept, but we don't check to make sure it isn't blocked
      because a commit has already started.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      d43317dc
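The race described here follows a common pattern: drop a lock to allocate, retake it, and re-validate everything that may have changed in between. The following Python sketch models that pattern under stated assumptions; `FsInfo`, `Transaction`, `TransactionBusy`, the `commit_in_progress` flag, and the `allocate` parameter are hypothetical names, and the real kernel function differs in detail.

```python
import threading


class TransactionBusy(Exception):
    """A commit has started; the caller must wait and retry later."""


class Transaction:
    def __init__(self):
        self.num_writers = 1


class FsInfo:
    def __init__(self):
        self.trans_lock = threading.Lock()
        self.running_transaction = None
        self.commit_in_progress = False  # assumed flag: joins are blocked


def join_transaction(fs, allocate=Transaction):
    while True:
        with fs.trans_lock:
            if fs.commit_in_progress:
                raise TransactionBusy()
            if fs.running_transaction is not None:
                fs.running_transaction.num_writers += 1
                return fs.running_transaction
        # Lock dropped: allocation may sleep, and the state may change.
        new = allocate()
        with fs.trans_lock:
            if fs.running_transaction is not None:
                continue  # someone else started one; loop and join it
            if fs.commit_in_progress:
                # The missing check: a commit began while we slept.
                raise TransactionBusy()
            fs.running_transaction = new
            return new
```

Passing `allocate` as a parameter is only a convenience for exercising the window between the two locked sections.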
    • C
      Btrfs: ClearPageError during writepage and clean_tree_block · bf0da8c1
      Chris Mason committed
      Failure testing was tripping up over stale PageError bits in
      metadata pages.  If we have an io error on a block, and later on
      end up reusing it, nobody ever clears PageError on those pages.
      
      During commit, we'll find PageError and think we had trouble writing
      the block, which will lead to aborts and other problems.
      
      This changes clean_tree_block and the btrfs writepage code to
      clear the PageError bit.  In both cases we're either completely
      done with the page or the page has good stuff and the error bit
      is no longer valid.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      bf0da8c1
    • D
      btrfs: separate superblock items out of fs_info · 6c41761f
      David Sterba committed
      fs_info is now ~9kb, more than fits into one page. This will cause
      mount failure when memory is too fragmented. Top space consumers are
      super block structures super_copy and super_for_commit, ~2.8kb each.
      Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)
      
      Add a wrapper for freeing fs_info and all of its dynamically allocated
      members.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      6c41761f
  21. 20 Oct, 2011 4 commits
    • J
      Btrfs: separate btrfs_block_rsv_check out into 2 different functions · 36ba022a
      Josef Bacik committed
      Currently btrfs_block_rsv_check does 2 things, it will either refill a block
      reserve like in the truncate or refill case, or it will check to see if there is
      enough space in the global reserve and possibly refill it.  However because of
      overcommit we could be well overcommitting ourselves just to try and refill the
      global reserve, when really we should just be committing the transaction.  So
      break this out into btrfs_block_rsv_refill and btrfs_block_rsv_check.  Refill
      will try to reserve more metadata if it can and btrfs_block_rsv_check will not,
      it will only tell you if the factor of the total space is still reserved.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      36ba022a
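The split described above separates a read-only question ("is enough still reserved?") from an action ("try to reserve more"). A minimal Python sketch of that separation follows. All names here (`SpacePool`, `BlockRsv`, `rsv_check`, `rsv_refill`) are hypothetical, and treating the factor as tenths of the target size is an assumption made for the example.

```python
class SpacePool:
    """Hypothetical pool of unreserved metadata space, in bytes."""
    def __init__(self, free):
        self.free = free


class BlockRsv:
    def __init__(self, size, reserved=0):
        self.size = size          # target size of the reserve
        self.reserved = reserved  # bytes currently held


def rsv_check(rsv, min_factor):
    """Report only: is at least min_factor/10 of the target still reserved?
    Never reserves anything (the factor-of-ten scale is an assumption)."""
    return rsv.reserved * 10 >= rsv.size * min_factor


def rsv_refill(rsv, min_reserved, pool):
    """Try to top the reserve up to min_reserved bytes from the pool."""
    if rsv.reserved >= min_reserved:
        return True
    needed = min_reserved - rsv.reserved
    if needed > pool.free:
        return False  # caller should fall back, e.g. commit the transaction
    pool.free -= needed
    rsv.reserved += needed
    return True
```

Keeping the check side-effect free means a caller that only wants to know whether to commit cannot accidentally overcommit space while asking.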
    • J
      Btrfs: release trans metadata bytes before flushing delayed refs · b24e03db
      Josef Bacik committed
      We started setting trans->block_rsv = NULL to allow the delayed refs flushing
      stuff to use the right block_rsv and then just made
      btrfs_trans_release_metadata() unconditionally use the trans block rsv.  The
      problem with this is we need to reserve some space in the transaction and then
      migrate it to the global block rsv, so we need to be able to free that out
      properly.  So instead just move btrfs_trans_release_metadata() before the
      delayed ref flushing and use trans->block_rsv for the freeing.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      b24e03db
    • J
      Btrfs: introduce mount option no_space_cache · 73bc1876
      Josef Bacik committed
      Some users have requested this and I've found I needed a way to disable cache
      loading without actually clearing the cache, so introduce the no_space_cache
      option.  Before we check the super block's cache generation field and if it was
      populated we always turned space caching on.  Now we check this and set the
      space cache option on, and then parse the mount options so that if we want it
      off it gets turned off.  Then we check the mount option all the places we do
      the caching work instead of checking the super's cache generation.  This makes
      things more consistent and lets us turn space caching off.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      73bc1876
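The ordering this commit relies on (derive a default from the super block, then let mount options override it) can be sketched in a few lines. This Python function is illustrative only: `space_cache_enabled` is a hypothetical helper, and interpreting "populated" as a non-zero generation is an assumption.

```python
def space_cache_enabled(super_cache_generation, mount_options):
    """Return whether space caching ends up on after option parsing."""
    # Step 1: default comes from the super block's cache generation.
    # Assumption: "populated" means a non-zero value.
    enabled = super_cache_generation != 0
    # Step 2: mount options are parsed afterwards, so they win.
    for opt in mount_options:
        if opt == "space_cache":
            enabled = True
        elif opt == "no_space_cache":
            enabled = False
    return enabled
```

Because later code consults only the resulting flag, an existing cache on disk can stay untouched while loading is disabled.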
    • J
      Btrfs: stop using write_one_page · 1728366e
      Josef Bacik committed
      While looking for a performance regression a user was complaining about, I
      noticed that we had a regression with the varmail test of filebench.  This was
      introduced by
      
      0d10ee2e
      
      which keeps us from calling writepages in writepage.  This is a correct change,
      however it happens to help the varmail test because we write out in larger
      chunks.  This is largely to do with how we write out dirty pages for each
      transaction.  If you run filebench with
      
      load varmail
      set $dir=/mnt/btrfs-test
      run 60
      
      prior to this patch you would get ~1420 ops/second, but with the patch you get
      ~1200 ops/second.  This is a 16% decrease.  So since we know the range of dirty
      pages we want to write out, don't write out in one page chunks, write out in
      ranges.  So to do this we call filemap_fdatawrite_range() on the range of bytes.
      Then we convert the DIRTY extents to NEED_WAIT extents.  When we then call
      btrfs_wait_marked_extents() we only have to filemap_fdatawait_range() on that
      range and clear the NEED_WAIT extents.  This doesn't get us back to our original
      speeds, but I've been seeing ~1380 ops/second, which is a <5% regression as
      opposed to a >15% regression.  That is acceptable given that the original commit
      greatly reduces our latency to begin with.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      1728366e
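The percentages quoted in this commit message can be checked with a short calculation. The Python below uses only the approximate ops/second figures given in the message; the `percent_drop` helper is a hypothetical name introduced for the example.

```python
def percent_drop(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100


baseline = 1420   # ops/second before the regression (approximate)
regressed = 1200  # ops/second with the regression (approximate)
patched = 1380    # ops/second with range-based writeout (approximate)

# Roughly 15.5%, which the message rounds to 16% and calls ">15%".
regression_pct = percent_drop(baseline, regressed)

# Roughly 2.8%, consistent with the "<5%" figure in the message.
remaining_pct = percent_drop(baseline, patched)
```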