1. 01 September 2013, 18 commits
    • G
      Btrfs: Make BTRFS_DEV_REPLACE_DEVID an unsigned long long constant · 6e71c47a
Committed by Geert Uytterhoeven
The internal btrfs device id is a u64, hence make the constant
BTRFS_DEV_REPLACE_DEVID "unsigned long long" as well, so we no longer need
a cast to print it.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      6e71c47a
    • S
      Btrfs: add mount option to force UUID tree checking · f420ee1e
Committed by Stefan Behrens
This should never be needed, but since all the functions to check and
rebuild the UUID tree are in place, add a mount option that makes it
possible to force this check-and-rebuild procedure (see the usage
sketch after this entry).
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      f420ee1e
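A minimal user-space sketch of how this option would be exercised. The
option spelling "rescan_uuid_tree" and the device/mount-point paths are
assumptions for illustration, not taken from the entry above.

  /* sketch: force the UUID tree check/rebuild at mount time */
  #include <stdio.h>
  #include <sys/mount.h>

  int main(void)
  {
      /* "rescan_uuid_tree" is the assumed spelling of the new option */
      if (mount("/dev/sdb1", "/mnt/btrfs", "btrfs", 0, "rescan_uuid_tree")) {
          perror("mount");
          return 1;
      }
      printf("mounted with a forced UUID tree check\n");
      return 0;
  }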
    • S
      Btrfs: check UUID tree during mount if required · 70f80175
Committed by Stefan Behrens
If the filesystem was mounted with an old kernel that was not
aware of the UUID tree, this is detected by looking at the
uuid_tree_generation field of the superblock (similar to how
the free space cache does it). If a mismatch is detected
      at mount time, a thread is started that does two things:
      1. Iterate through the UUID tree, check each entry, delete those
         entries that are not valid anymore (i.e., the subvol does not
         exist anymore or the value changed).
      2. Iterate through the root tree, for each found subvolume, add
         the UUID tree entries for the subvolume (if they are not
         already there).
      
      This mechanism is also used to handle and repair errors that
      happened during the initial creation and filling of the tree.
      The update of the uuid_tree_generation field (which indicates
      that the state of the UUID tree is up to date) is blocked until
      all create and repair operations are successfully completed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      70f80175
    • S
      Btrfs: introduce uuid-tree-gen field · 26432799
Committed by Stefan Behrens
In order to be able to detect the case that a filesystem is mounted
with an old kernel, add a uuid-tree-gen field, just like the free space
cache does. It is part of the super block and is written with each
commit. Old kernels do not know about this field and do not update it.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      26432799
    • S
      Btrfs: fill UUID tree initially · 803b2f54
Committed by Stefan Behrens
When the UUID tree is initially created, a task is spawned that
walks through the root tree. For each subvolume root_item found,
the uuid and received_uuid entries are added to the UUID tree.
This is a quick operation, so if somebody wants to unmount the
filesystem while the task is still running, the unmount is simply
delayed until the UUID tree building task is finished.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      803b2f54
    • S
      Btrfs: maintain subvolume items in the UUID tree · dd5f9615
Committed by Stefan Behrens
When a new subvolume or snapshot is created, a new UUID item is added
to the UUID tree. Such items are removed when the subvolume is deleted.
The ioctl to set the received subvolume UUID is also touched and now
additionally adds this received UUID to the UUID tree, together with the
ID of the subvolume. The latter is also done when read-only snapshots
are created, which inherit all the send/receive information from the
parent subvolume.

User mode programs use the BTRFS_IOC_TREE_SEARCH ioctl to search and
read the UUID tree (see the user-space lookup sketch after this entry).
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      dd5f9615
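The entry above points user-space programs at BTRFS_IOC_TREE_SEARCH for
reading the UUID tree. Below is a hedged sketch of such a lookup: the ioctl
and the search structures come from linux/btrfs.h of a UUID-tree-aware
kernel, while the key layout (UUID halves as objectid/offset) and the
BTRFS_UUID_KEY_SUBVOL value are assumptions based on this patch series, so
treat it as an illustration rather than a reference.

  #include <fcntl.h>
  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/btrfs.h>

  #ifndef BTRFS_UUID_TREE_OBJECTID
  #define BTRFS_UUID_TREE_OBJECTID 9ULL   /* assumed tree id */
  #endif
  #define BTRFS_UUID_KEY_SUBVOL 251       /* assumed key type */

  static uint64_t le64_from_bytes(const unsigned char *p)
  {
      uint64_t v = 0;
      for (int i = 7; i >= 0; i--)
          v = (v << 8) | p[i];
      return v;
  }

  static int lookup_subvol_by_uuid(int fs_fd, const unsigned char uuid[16])
  {
      struct btrfs_ioctl_search_args args;

      memset(&args, 0, sizeof(args));
      args.key.tree_id = BTRFS_UUID_TREE_OBJECTID;
      /* assumed key encoding: UUID split into two little-endian u64 halves */
      args.key.min_objectid = args.key.max_objectid = le64_from_bytes(uuid);
      args.key.min_offset = args.key.max_offset = le64_from_bytes(uuid + 8);
      args.key.min_type = args.key.max_type = BTRFS_UUID_KEY_SUBVOL;
      args.key.max_transid = (uint64_t)-1;
      args.key.nr_items = 1;

      if (ioctl(fs_fd, BTRFS_IOC_TREE_SEARCH, &args) < 0) {
          perror("BTRFS_IOC_TREE_SEARCH");
          return -1;
      }
      if (args.key.nr_items == 0)
          return -1;                      /* no UUID item found */

      /* the item payload is a list of little-endian subvolume ids */
      struct btrfs_ioctl_search_header *sh = (void *)args.buf;
      printf("subvol id: %llu\n", (unsigned long long)
             le64_from_bytes((const unsigned char *)(sh + 1)));
      return 0;
  }

  int main(void)
  {
      unsigned char uuid[16] = { 0 };     /* placeholder: a real subvol UUID */
      int fd = open("/mnt/btrfs", O_RDONLY);

      if (fd < 0) {
          perror("open");
          return 1;
      }
      return lookup_subvol_by_uuid(fd, uuid) ? 1 : 0;
  }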
    • S
      Btrfs: create UUID tree if required · f7a81ea4
Committed by Stefan Behrens
      This tree is not created by mkfs.btrfs. Therefore when a filesystem
      is mounted writable and the UUID tree does not exist, this tree is
      created if required. The tree is also added to the fs_info structure
      and initialized, but this commit does not yet read or write UUID tree
      elements.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      f7a81ea4
    • S
      Btrfs: introduce a tree for items that map UUIDs to something · 07b30a49
Committed by Stefan Behrens
Mapping UUIDs to subvolume IDs is currently an expensive operation.
The algorithm even has quadratic cost (in the number of existing
subvolumes), which means that it takes minutes to send/receive a
single subvolume if 10,000 subvolumes exist. But even linear cost
would be too much, since it is wasted work: the data structures that
map UUIDs to subvolume IDs are rebuilt every time a btrfs
send/receive instance is started.

It is much more efficient to maintain a searchable persistent data
structure in the filesystem, one that is updated whenever a
subvolume/snapshot is created or deleted, and when the received
subvolume UUID is set by the btrfs-receive tool.

Therefore this commit adds kernel code that is able to maintain data
structures in the filesystem which allow quickly searching for a given
UUID and retrieving the data assigned to it, such as which subvolume
ID is related to this UUID.
      
This commit adds a new tree to hold UUID-to-data mapping items. The
key of the items is the full UUID plus the key type BTRFS_UUID_KEY.
Multiple data blocks can be stored for a given UUID; a type/length/
value scheme is used (a small sketch of this follows the entry).
      
      Now follows the lengthy justification, why a new tree was added
      instead of using the existing root tree:
      
      The first approach was to not create another tree that holds UUID
      items. Instead, the items should just go into the top root tree.
Unfortunately this confused the algorithm that assigns the objectid
of subvolumes and snapshots. The reason is that
      btrfs_find_free_objectid() calls btrfs_find_highest_objectid() for
      the first created subvol or snapshot after mounting a filesystem,
      and this function simply searches for the largest used objectid in
      the root tree keys to pick the next objectid to assign. Of course,
      the UUID keys have always been the ones with the highest offset
      value, and the next assigned subvol ID was wastefully huge.
      
      To use any other existing tree did not look proper. To apply a
      workaround such as setting the objectid to zero in the UUID item
      key and to implement collision handling would either add
      limitations (in case of a btrfs_extend_item() approach to handle
      the collisions) or a lot of complexity and source code (in case a
      key would be looked up that is free of collisions). Adding new code
      that introduces limitations is not good, and adding code that is
      complex and lengthy for no good reason is also not good. That's the
      justification why a completely new tree was introduced.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      07b30a49
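To make the "multiple data blocks per UUID" point above concrete, here is a
small self-contained sketch (plain user-space C with made-up names, not the
kernel code): the key is derived from the UUID itself, and the item value
simply grows by one 64-bit subvolume id per mapping, which is how several
subvolumes sharing one UUID would be handled.

  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>

  struct uuid_item {
      size_t n_ids;
      uint64_t *ids;                  /* value: list of subvolume ids */
  };

  /* add a subvolume id to the item's value unless it is already present */
  static int uuid_item_add(struct uuid_item *item, uint64_t subid)
  {
      for (size_t i = 0; i < item->n_ids; i++)
          if (item->ids[i] == subid)
              return 0;               /* already mapped */
      uint64_t *tmp = realloc(item->ids, (item->n_ids + 1) * sizeof(*tmp));
      if (!tmp)
          return -1;
      item->ids = tmp;
      item->ids[item->n_ids++] = subid;
      return 0;
  }

  int main(void)
  {
      struct uuid_item item = { 0, NULL };

      uuid_item_add(&item, 256);
      uuid_item_add(&item, 257);
      uuid_item_add(&item, 256);      /* duplicate, ignored */
      printf("ids stored for this UUID: %zu\n", item.n_ids);
      free(item.ids);
      return 0;
  }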
    • S
      btrfs: mark some local function as 'static' · 171170c1
Committed by Sergei Trofimovich
Cc: Josef Bacik <jbacik@fusionio.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      171170c1
    • S
      Btrfs: get rid of sparse warnings · 35a3621b
Committed by Stefan Behrens
make C=2 fs/btrfs/ CF=-D__CHECK_ENDIAN__

I tried to filter out the warnings for which patches have already
been sent to the mailing list and are pending inclusion in btrfs-next.

All these changes should be obviously safe.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      35a3621b
    • J
      Btrfs: fix send issues related to inode number reuse · ba5e8f2e
Committed by Josef Bacik
If you are sending a snapshot and specifying a parent snapshot, we walk the
trees, figure out where they differ and send only the differences.  The way
we check for differences is whether the leaves are the same and whether the
keys within the leaves are the same.  So if the leaves are not the same (ie
the leaf has been cow'ed from the parent snapshot) we walk each item in the
send root and check it against the parent root.  If the items match exactly
then we don't do anything.  This doesn't quite work for inode refs, since
they only carry the name and the parent objectid.  If you move a file out of
a directory, remove that directory, re-create a directory with the same
inode number as the old one and then move the file back into that directory,
we will assume that nothing changed and you will get errors when you try to
receive.
      
      In order to fix this we need to do extra checking to see if the inode ref really
      is the same or not.  So do this by passing down BTRFS_COMPARE_TREE_SAME if the
      items match.  Then if the key type is an inode ref we can do some extra
      checking, otherwise we just keep processing.  The extra checking is to look up
      the generation of the directory in the parent volume and compare it to the
      generation of the send volume.  If they match then they are the same directory
      and we are good to go.  If they don't we have to add them to the changed refs
      list.
      
This means we have to track the generation of the ref we're trying to look up
when we iterate all the refs for a particular inode.  So in the case of looking
      for new refs we have to get the generation from the parent volume, and in the
      case of looking for deleted refs we have to get the generation from the send
      volume to compare with.
      
      There was also the issue of using a ulist to keep track of the directories we
      needed to check.  Because we can get a deleted ref and a new ref for the same
      inode number the ulist won't work since it indexes based on the value.  So
      instead just dup any directory ref we find and add it to a local list, and then
      process that list as normal and do away with using a ulist for this altogether.
      
      Before we would fail all of the tests in the far-progs that related to moving
      directories (test group 32).  With this patch we now pass these tests, and all
      of the tests in the far-progs send testing suite.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      ba5e8f2e
    • J
      Btrfs: avoid starting a transaction in the write path · 00361589
Committed by Josef Bacik
      I noticed while looking at a deadlock that we are always starting a transaction
      in cow_file_range().  This isn't really needed since we only need a transaction
      if we are doing an inline extent, or if the allocator needs to allocate a chunk.
      So push down all the transaction start stuff to be closer to where we actually
      need a transaction in all of these cases.  This will hopefully reduce our write
      latency when we are committing often.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      00361589
    • J
      Btrfs: fix heavy delalloc related deadlock · 9ffba8cd
Committed by Josef Bacik
      I added a patch where we started taking the ordered operations mutex when we
      waited on ordered extents.  We need this because we splice the list and process
      it, so if a flusher came in during this scenario it would think the list was
      empty and we'd usually get an early ENOSPC.  The problem with this is that this
      lock is used in transaction committing.  So we end up with something like this
      
      Transaction commit
      	-> wait on writers
      
      Delalloc flusher
      	-> run_ordered_operations (holds mutex)
      		->wait for filemap-flush to do its thing
      
      flush task
      	-> cow_file_range
		->wait on btrfs_join_transaction because we're committing
      
      some other task
      	-> commit_transaction because we notice trans->transaction->flush is set
      		-> run_ordered_operations (hang on mutex)
      
      We need to disentangle the ordered operations flushing from the delalloc
      flushing, since they are separate things.  This solves the deadlock issue I was
      seeing.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      9ffba8cd
    • D
      btrfs: add mount option to set commit interval · 8b87dc17
Committed by David Sterba
It's hardcoded to 30 seconds, which is fine for most users. Higher values
defer data being synced to permanent storage, with obvious consequences
when the system crashes. The upper bound is not enforced, but a warning is
printed if it's more than 300 seconds (5 minutes). A usage sketch follows
this entry.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      8b87dc17
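A small usage sketch for the new interval option; it assumes the option is
spelled "commit=<seconds>" and uses placeholder paths.

  #include <stdio.h>
  #include <sys/mount.h>

  int main(void)
  {
      /* ask btrfs to commit dirty data/metadata every 120 seconds */
      if (mount("/dev/sdb1", "/mnt/btrfs", "btrfs", 0, "commit=120")) {
          perror("mount");
          return 1;
      }
      printf("mounted with a 120 second commit interval\n");
      return 0;
  }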
    • J
      Btrfs: handle errors when doing slow caching · 36cce922
Committed by Josef Bacik
      Alex Lyakas reported a bug where wait_block_group_cache_progress() would wait
      forever if a drive failed.  This is because we just bail out if there is an
      error while trying to cache a block group, we don't update anybody who may be
      waiting.  So this introduces a new enum for the cache state in case of error and
      makes everybody bail out if we have an error.  Alex tested and verified this
      patch fixed his problem.  This fixes bz 59431.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      36cce922
    • M
      Btrfs: don't cache the csum value into the extent state tree · facc8a22
Committed by Miao Xie
Before applying this patch, we cached the csum value in the extent state
tree when reading data from the disk; this operation increased the lock
contention on the state tree.

Now we just store the csum value in the bio structure or another unshared
structure, so we can reduce the lock contention.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      facc8a22
    • Q
      btrfs: Cleanup for using BTRFS_SETGET_STACK instead of raw convert · 3cae210f
Committed by Qu Wenruo
Some code still uses the cpu_to_lexx helpers instead of the
BTRFS_SETGET_STACK_FUNCS declared in ctree.h.

Also add some BTRFS_SETGET_STACK_FUNCS for btrfs_header, btrfs_timespec
and other structures.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-by: Miao Xie <miaoxie@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      3cae210f
    • J
      btrfs: fall back to global reservation when removing subvolumes · ee3441b4
Committed by Jeff Mahoney
I recently did some ENOSPC testing that involved filling the disk
while creating and removing snapshots in a loop. During the test cycle,
I ran into an ENOSPC when trying to remove a snapshot, leaving the fs
stuck in ENOSPC even after a umount/mount cycle.

This patch allows subvolume removal to fall back onto the global
block reservation in order to succeed when it would have failed
otherwise.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      ee3441b4
2. 02 July 2013, 2 commits
    • J
      Btrfs: check if we can nocow if we don't have data space · 7ee9e440
Committed by Josef Bacik
We always just try to reserve data space when we write, but if we are out of
space yet have prealloc'ed extents we should still be able to write.  This
patch checks whether we can write into prealloc'ed space and, if we can, goes
ahead and allows the write to continue.  With this patch we now pass xfstests
generic/274.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      7ee9e440
    • J
      Btrfs: use a percpu to keep track of possibly pinned bytes · b150a4f1
Committed by Josef Bacik
      There are all of these checks in the ENOSPC code to see if committing the
      transaction would free up enough space to make the allocation.  This is because
      early on we just committed the transaction and hoped and prayed, which resulted
      in cases where it took _forever_ to get an ENOSPC when we really were out of
space.  So we check space_info->bytes_pinned, except this isn't completely true
because it doesn't account for space we may free that is stuck in delayed refs.
So tests like xfstests 226 would fail because we wouldn't commit the transaction
to free up the data space.  Instead, add a percpu counter that is a little
fuzzier: it adds bytes as soon as we try to free up the space, and removes any
space it doesn't actually free up when we get around to doing the actual free.
We then zero out this counter every transaction period so we have a better idea
of how much space we will actually free up by committing this transaction (see
the sketch after this entry).  With this patch we now pass xfstests 226.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      b150a4f1
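The accounting policy described above (add optimistically when a free is
queued, correct when the real free happens, reset every transaction) can be
illustrated with a small stand-alone counter. This is a conceptual sketch
with made-up names, not the btrfs implementation, and it uses one atomic
instead of a real percpu counter.

  #include <stdatomic.h>
  #include <stdio.h>

  static atomic_llong maybe_pinned;   /* bytes a commit might give back */

  static void queue_free(long long bytes)
  {
      atomic_fetch_add(&maybe_pinned, bytes);     /* optimistic estimate */
  }

  static void finish_free(long long queued, long long actually_freed)
  {
      /* subtract whatever did not get freed after all */
      atomic_fetch_sub(&maybe_pinned, queued - actually_freed);
  }

  static int commit_would_help(long long bytes_needed)
  {
      return atomic_load(&maybe_pinned) >= bytes_needed;
  }

  static void transaction_commit(void)
  {
      atomic_store(&maybe_pinned, 0);             /* fresh estimate per period */
  }

  int main(void)
  {
      queue_free(1 << 20);
      finish_free(1 << 20, 1 << 19);              /* only half was really freed */
      printf("commit helps for 256K? %d\n", commit_would_help(256 * 1024));
      transaction_commit();
      printf("after commit, helps?   %d\n", commit_would_help(1));
      return 0;
  }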
3. 01 July 2013, 1 commit
    • J
      Btrfs: fix transaction throttling for delayed refs · 1be41b78
Committed by Josef Bacik
Dave has this fs_mark script that can make btrfs abort given a sufficient
amount of ram.  This is because with more ram we can keep more dirty metadata
in cache, which in a roundabout way makes for many more pending delayed refs.
What happens is that we end up not throttling the transaction enough, so when
we go to commit the transaction after we've completely filled the file system
we'll abort() because we use all of the space in the global reserve and still
have delayed refs to run.  To fix this we need to make the delayed ref flushing
and the transaction throttling dependent upon the number of delayed refs that
we have instead of how much reserved space is left in the global reserve.  With
this patch we not only stop aborting transactions but we also get a smoother
run speed with fs_mark, and it makes us about 10% faster.  Thanks,
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      1be41b78
4. 14 June 2013, 12 commits
    • J
      Btrfs: exclude logged extents before replying when we are mixed · 8c2a1a30
Committed by Josef Bacik
With non-mixed block groups we replay the logs before we're allowed to do any
writes, so we get away with not pinning/removing the data extents until right
when we replay them.  However, with mixed block groups we allocate out of the
same pool, so we could easily allocate a metadata block that was logged in our
tree log.  To deal with this we just need to notice that we have mixed block
groups and do the normal excluding/removal dance during the pin stage of the
log replay; that way we don't allocate metadata blocks from areas where we
have logged data extents.  With this patch we now pass xfstests generic/311
with mixed block groups turned on.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      8c2a1a30
    • J
      Btrfs: fix qgroup rescan resume on mount · b382a324
Committed by Jan Schmidt
When called during mount, we cannot start the rescan worker thread until
open_ctree is done. This commit restructures the qgroup rescan internals to
enable a clean deferral of the rescan resume operation.

First of all, the struct qgroup_rescan is removed, saving us a malloc and
some initialization synchronization problems. Its only element (the worker
struct) now lives within fs_info, just like the rest of the rescan code.
      
      Then setting up a rescan worker is split into several reusable stages.
      Currently we have three different rescan startup scenarios:
      	(A) rescan ioctl
      	(B) rescan resume by mount
      	(C) rescan by quota enable
      
      Each case needs its own combination of the four following steps:
      	(1) set the progress [A, C: zero; B: state of umount]
      	(2) commit the transaction [A]
      	(3) set the counters [A, C: zero; B: state of umount]
      	(4) start worker [A, B, C]
      
qgroup_rescan_init does step (1). There's no extra function added to commit
a transaction; we've got that already. qgroup_rescan_zero_tracking does
step (3). Step (4) is nothing more than a call to the generic
btrfs_queue_worker.

We also get rid of a double check for the rescan progress during
btrfs_qgroup_account_ref, which is no longer required due to having step (2)
from the list above.
      
      As a side effect, this commit prepares to move the rescan start code from
      btrfs_run_qgroups (which is run during commit) to a less time critical
      section.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      b382a324
    • J
      Btrfs: simplify unlink reservations · d52be818
Committed by Josef Bacik
      Dave pointed out a problem where if you filled up a file system as much as
      possible you couldn't remove any files.  The whole unlink reservation thing is
      convoluted because it tries to guess if it's going to add space to unlink
      something or not, and has all these odd uncommented cases where it simply does
      not try.  So to fix this I've added a way to conditionally steal from the global
      reserve if we can't make our normal reservation.  If we have more than half the
      space in the global reserve free we will go ahead and steal from the global
      reserve.  With this patch Dave's reproducer now works and I can rm all the files
      on the file system.  Thanks,
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      d52be818
    • M
      Btrfs: make the state of the transaction more readable · 4a9d8bde
Committed by Miao Xie
We used 3 variables to track the state of the transaction; it was complex
and wasted memory. Besides that, it was hard to understand which types of
transaction handles should be blocked in each transaction state, so the
developers often made mistakes.

This patch improves on the above. In this patch, we define 6 states
for the transaction,
        enum btrfs_trans_state {
      	TRANS_STATE_RUNNING		= 0,
      	TRANS_STATE_BLOCKED		= 1,
      	TRANS_STATE_COMMIT_START	= 2,
      	TRANS_STATE_COMMIT_DOING	= 3,
      	TRANS_STATE_UNBLOCKED		= 4,
      	TRANS_STATE_COMPLETED		= 5,
      	TRANS_STATE_MAX			= 6,
        }
and just use 1 variable to track that state.

In order to make the blocked handle types for each state clearer,
we introduce an array:
        unsigned int btrfs_blocked_trans_types[TRANS_STATE_MAX] = {
      	[TRANS_STATE_RUNNING]		= 0U,
      	[TRANS_STATE_BLOCKED]		= (__TRANS_USERSPACE |
      					   __TRANS_START),
      	[TRANS_STATE_COMMIT_START]	= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH),
      	[TRANS_STATE_COMMIT_DOING]	= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN),
      	[TRANS_STATE_UNBLOCKED]		= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN |
      					   __TRANS_JOIN_NOLOCK),
      	[TRANS_STATE_COMPLETED]		= (__TRANS_USERSPACE |
      					   __TRANS_START |
      					   __TRANS_ATTACH |
      					   __TRANS_JOIN |
      					   __TRANS_JOIN_NOLOCK),
        }
It is very intuitive (a small sketch of how this table would be consulted
follows this entry).
      
Besides that, because we remove ->in_commit from the transaction structure,
the ->commit_lock which was used to protect it is unnecessary, so remove
->commit_lock as well.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      4a9d8bde
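A minimal, self-contained sketch of how such a table would be consulted; the
bit values and the subset of states and handle types used here are
illustrative only, not the kernel's definitions.

  #include <stdio.h>

  enum { TRANS_RUNNING, TRANS_BLOCKED, TRANS_COMMIT_DOING, TRANS_NR_STATES };

  #define TRANS_TYPE_START  (1U << 0)
  #define TRANS_TYPE_ATTACH (1U << 1)
  #define TRANS_TYPE_JOIN   (1U << 2)

  static const unsigned int blocked_types[TRANS_NR_STATES] = {
      [TRANS_RUNNING]      = 0U,
      [TRANS_BLOCKED]      = TRANS_TYPE_START,
      [TRANS_COMMIT_DOING] = TRANS_TYPE_START | TRANS_TYPE_ATTACH | TRANS_TYPE_JOIN,
  };

  /* a handle of 'type' must wait while the transaction is in 'state' */
  static int must_wait(unsigned int state, unsigned int type)
  {
      return (blocked_types[state] & type) != 0;
  }

  int main(void)
  {
      printf("JOIN blocked while RUNNING?      %d\n",
             must_wait(TRANS_RUNNING, TRANS_TYPE_JOIN));
      printf("JOIN blocked while COMMIT_DOING? %d\n",
             must_wait(TRANS_COMMIT_DOING, TRANS_TYPE_JOIN));
      return 0;
  }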
    • M
      Btrfs: introduce per-subvolume ordered extent list · 199c2a9c
Committed by Miao Xie
The reason we introduce the per-subvolume ordered extent list is the same
as for the per-subvolume delalloc inode list.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      199c2a9c
    • M
      Btrfs: introduce per-subvolume delalloc inode list · eb73c1b7
Committed by Miao Xie
When we create a snapshot, we currently have to flush all delalloc inodes in
the fs, although just flushing the inodes in the source tree would be enough.
So we introduce a per-subvolume delalloc inode list.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      eb73c1b7
    • M
      Btrfs: introduce grab/put functions for the root of the fs/file tree · b0feb9d9
Committed by Miao Xie
The grab/put functions will be used in the next patch, which needs to grab
the root object and ensure it is not freed. We use a reference counter
instead of the srcu lock in order to avoid blocking the memory reclaim task,
which invokes synchronize_srcu().
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      b0feb9d9
    • M
      Btrfs: cleanup the similar code of the fs root read · cb517eab
Committed by Miao Xie
There are several functions whose code is similar, such as
  btrfs_find_last_root()
  btrfs_read_fs_root_no_radix()

Besides that, some functions are invoked twice, which is unnecessary;
for example, we are sure that all roots found in
  btrfs_find_orphan_roots()
have their orphan items, so it is unnecessary to check the orphan
item again.

So clean this up.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      cb517eab
    • M
      Btrfs: make the snap/subv deletion end more early when the fs is R/O · babbf170
Committed by Miao Xie
The snapshot/subvolume deletion might take a lot of time, which would make
the remount task wait for a long time. This patch improves on that problem:
we break off the deletion if the fs is remounted read-only. It will make
the users happy.

Cc: David Sterba <dsterba@suse.cz>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      babbf170
    • A
      Minor format cleanup. · 1c89cdd1
Committed by Andreas Philipp
Clean up the format of the definitions of BTRFS_BLOCK_GROUP_RAID5 and
BTRFS_BLOCK_GROUP_RAID6.
Signed-off-by: Andreas Philipp <philipp.andreas@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      1c89cdd1
    • J
      Btrfs: add ioctl to wait for qgroup rescan completion · 57254b6e
Committed by Jan Schmidt
btrfs_qgroup_wait_for_completion waits until the currently running qgroup
operation completes. It returns immediately when no rescan process is in
progress. This is useful for automating things around the rescan process
(e.g. testing); a user-space sketch follows this entry.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      57254b6e
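A hedged user-space sketch of waiting for rescan completion; it assumes a
kernel that exposes BTRFS_IOC_QUOTA_RESCAN_WAIT in linux/btrfs.h and uses a
placeholder mount point.

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/btrfs.h>

  int main(void)
  {
      int fd = open("/mnt/btrfs", O_RDONLY);

      if (fd < 0) {
          perror("open");
          return 1;
      }
      /* blocks until the rescan completes; returns at once if none runs */
      if (ioctl(fd, BTRFS_IOC_QUOTA_RESCAN_WAIT) < 0) {
          perror("BTRFS_IOC_QUOTA_RESCAN_WAIT");
          return 1;
      }
      printf("qgroup rescan finished\n");
      return 0;
  }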
    • W
      Btrfs: introduce qgroup_ulist to avoid frequently allocating/freeing ulist · 1e8f9158
Committed by Wang Shilong
When doing qgroup accounting, we call ulist_alloc()/ulist_free() every time
we want to walk the qgroup tree.

By introducing 'qgroup_ulist', we only need to call ulist_alloc()/ulist_free()
once. This reduces some of the sys time spent allocating memory; see the
measurements below:
      
      fsstress -p 4 -n 10000 -d $dir
      
      With this patch:
      
      real    0m50.153s
      user    0m0.081s
      sys     0m6.294s
      
      real    0m51.113s
      user    0m0.092s
      sys     0m6.220s
      
      real    0m52.610s
      user    0m0.096s
      sys     0m6.125s	avg 6.213
      -----------------------------------------------------
      Without the patch:
      
      real    0m54.825s
      user    0m0.061s
      sys     0m10.665s
      
      real    1m6.401s
      user    0m0.089s
      sys     0m11.218s
      
      real    1m13.768s
      user    0m0.087s
      sys     0m10.665s       avg 10.849
      
we can see the sys time reduced by ~43%.
Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      1e8f9158
5. 18 May 2013, 2 commits
    • J
      Btrfs: handle running extent ops with skinny metadata · b1c79e09
Committed by Josef Bacik
Chris hit a bug where we weren't finding extent records when running extent ops.
This is because we use the delayed_ref_head when running the extent op, which
means we can't use the ->type checks to see if we are metadata.  We also lose
the level of the metadata we are working on.  So to fix this we can just check
the ->is_data field of the extent_op, and we can store the level of the buffer
we were modifying in the extent_op.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      b1c79e09
    • D
      btrfs: annotate quota tree for lockdep · 60b62978
Committed by David Sterba
The quota tree has been missing from the lockdep annotations, though no
warning has been seen in the wild.

There's currently one entry that does not belong there,
BTRFS_ORPHAN_OBJECTID.  No such tree exists; it's probably a copy &
paste mistake, as the id is defined among the tree ids.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      60b62978
6. 07 May 2013, 5 commits
    • D
      btrfs: enhance superblock checks · 1104a885
Committed by David Sterba
      The superblock checksum is not verified upon mount. <awkward silence>
      
      Add that check and also reorder existing checks to a more logical
      order.
      
Current mkfs.btrfs does not calculate the correct checksum of the
super_block, and thus a freshly created filesystem will fail to mount when
this patch is applied.

The first transaction commit calculates the correct superblock checksum and
saves it to disk.
      
      Reproducer:
$ mkfs.btrfs /dev/sda
      $ mount /dev/sda /mnt
      $ btrfs scrub start /mnt
      $ sleep 5
      $ btrfs scrub status /mnt
      ... super:2 ...
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      1104a885
    • D
      btrfs: fix misleading variable name for flags · b6919a58
Committed by David Sterba
The variable was named 'data' in btrfs_reserve_extent and that's the
only function that actually uses it to let btrfs_get_alloc_profile know
what profile we want. Then it's passed down as u64 flags.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      b6919a58
    • E
      btrfs: make static code static & remove dead code · 48a3b636
Committed by Eric Sandeen
      Big patch, but all it does is add statics to functions which
      are in fact static, then remove the associated dead-code fallout.
      
      removed functions:
      
      btrfs_iref_to_path()
      __btrfs_lookup_delayed_deletion_item()
      __btrfs_search_delayed_insertion_item()
      __btrfs_search_delayed_deletion_item()
      find_eb_for_page()
      btrfs_find_block_group()
      range_straddles_pages()
      extent_range_uptodate()
      btrfs_file_extent_length()
      btrfs_scrub_cancel_devid()
      btrfs_start_transaction_lflush()
      
      btrfs_print_tree() is left because it is used for debugging.
      btrfs_start_transaction_lflush() and btrfs_reada_detach() are
      left for symmetry.
      
      ulist.c functions are left, another patch will take care of those.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      48a3b636
    • J
      Btrfs: rescan for qgroups · 2f232036
Committed by Jan Schmidt
      If qgroup tracking is out of sync, a rescan operation can be started. It
      iterates the complete extent tree and recalculates all qgroup tracking data.
      This is an expensive operation and should not be used unless required.
      
      A filesystem under rescan can still be umounted. The rescan continues on the
      next mount.  Status information is provided with a separate ioctl while a
      rescan operation is in progress.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      2f232036
    • J
      Btrfs: separate sequence numbers for delayed ref tracking and tree mod log · fc36ed7e
Committed by Jan Schmidt
      Sequence numbers for delayed refs have been introduced in the first version
      of the qgroup patch set. To solve the problem of find_all_roots on a busy
      file system, the tree mod log was introduced. The sequence numbers for that
      were simply shared between those two users.
      
However, at one point in qgroup's quota accounting there's a statement
accessing the previous sequence number that's still just doing (seq - 1),
just as it would have had to in the very first version.
      
      To satisfy that requirement, this patch makes the sequence number counter 64
      bit and splits it into a major part (used for qgroup sequence number
      counting) and a minor part (incremented for each tree modification in the
      log). This enables us to go exactly one major step backwards, as required
      for qgroups, while still incrementing the sequence counter for tree mod log
      insertions to keep track of their order. Keeping them in a single variable
      means there's no need to change all the code dealing with comparisons of two
      sequence numbers.
      
      The sequence number is reset to 0 on commit (not new in this patch), which
      ensures we won't overflow the two 32 bit counters.
      
Without this fix, the qgroup tracking can occasionally go wrong and WARN_ONs
from the tree mod log code may happen (a small sketch of the major/minor
split follows this entry).
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      fc36ed7e
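To visualize the major/minor split, here is a stand-alone sketch; the names
and the exact bit manipulation are illustrative, not the kernel's
tree_mod_seq code.

  #include <stdint.h>
  #include <stdio.h>

  static uint64_t seq;    /* reset to 0 on every transaction commit */

  /* one tree modification was logged: bump the minor (low 32 bit) part */
  static uint64_t seq_next_minor(void)
  {
      return ++seq;
  }

  /* qgroup accounting moves on: bump the major (high 32 bit) part */
  static uint64_t seq_next_major(void)
  {
      seq = (seq | 0xffffffffULL) + 1;
      return seq;
  }

  /* the "previous" major step, as the qgroup accounting code needs it */
  static uint64_t seq_prev_major(uint64_t s)
  {
      return (s >> 32) ? ((s >> 32) - 1) << 32 : 0;
  }

  int main(void)
  {
      seq_next_minor();
      seq_next_minor();
      uint64_t major = seq_next_major();

      printf("seq=%#llx, previous major step=%#llx\n",
             (unsigned long long)major,
             (unsigned long long)seq_prev_major(major));
      return 0;
  }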