1. 12 Nov 2013 (2 commits)
    • Btrfs: add tests for find_lock_delalloc_range · 294e30fe
      Committed by Josef Bacik
      So both Liu and I made huge messes of find_lock_delalloc_range trying to fix
      stuff, me first by fixing extent size, then him by fixing something I broke and
      then me again telling him to fix it a different way.  So this is obviously a
      candidate for some testing.  This patch adds a pseudo fs so we can allocate fake
      inodes for tests that need an inode or pages.  Then it adds a bunch of tests to
      make sure find_lock_delalloc_range is acting the way it is supposed to.  With
      this patch and all of our previous patches to find_lock_delalloc_range I am sure
      it is working as expected now.  Thanks,
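      
      A user-space model of the property these tests check (illustrative only;
      the real tests run in the kernel against the pseudo-fs inodes, and all
      names below are invented):
      
        #include <assert.h>
        #include <stdint.h>
      
        /* Model delalloc state as one [da_start, da_end] byte range. */
        struct io_tree { uint64_t da_start, da_end; };
      
        /* Find the delalloc range at or after 'from'; returns 0 if none. */
        static int find_delalloc(const struct io_tree *t, uint64_t from,
                                 uint64_t *start, uint64_t *end)
        {
                if (from > t->da_end)
                        return 0;
                *start = from > t->da_start ? from : t->da_start;
                *end = t->da_end;
                return 1;
        }
      
        int main(void)
        {
                struct io_tree t = { .da_start = 0, .da_end = 4095 };
                uint64_t s, e;
      
                /* A lookup inside the range must find and clamp to it. */
                assert(find_delalloc(&t, 0, &s, &e) && s == 0 && e == 4095);
                /* A lookup past the range must find nothing. */
                assert(!find_delalloc(&t, 8192, &s, &e));
                return 0;
        }
      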
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: add a sanity test for btrfs_split_item · 06ea65a3
      Committed by Josef Bacik
      While looking at somebody's corruption I became completely convinced that
      btrfs_split_item was broken, so I wrote this test to verify that it was working
      as it was supposed to.  Thankfully it appears to be working as intended, so just
      add this test to make sure nobody breaks it in the future.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  2. 01 Sep 2013 (2 commits)
    • Btrfs: introduce a tree for items that map UUIDs to something · 07b30a49
      Committed by Stefan Behrens
      Mapping UUIDs to subvolume IDs is currently an expensive operation.
      The algorithm is quadratic in the number of existing subvolumes,
      which means it takes minutes to send/receive a single subvolume
      when 10,000 subvolumes exist. Even linear effort would be too much,
      since the work is wasted anyway: the data structures that map UUIDs
      to subvolume IDs are rebuilt every time a btrfs send/receive
      instance is started.
      
      It is much more efficient to maintain a searchable persistent data
      structure in the filesystem, one that is updated whenever a
      subvolume/snapshot is created and deleted, and when the received
      subvolume UUID is set by the btrfs-receive tool.
      
      This commit therefore adds kernel code that maintains data
      structures in the filesystem, allowing a quick search for a given
      UUID and retrieval of the data assigned to it, such as the
      subvolume ID that the UUID belongs to.
      
      This commit adds a new tree to hold UUID-to-data mapping items. The
      key of the items is the full UUID plus the key type BTRFS_UUID_KEY.
      Multiple data blocks can be stored for a given UUID; a
      type/length/value scheme is used.
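      
      One plausible reading of that key layout in C (a sketch; the text above
      only says the key carries the full UUID plus the BTRFS_UUID_KEY type, so
      the exact field assignment and byte order here are assumptions):
      
        #include <stdint.h>
        #include <string.h>
      
        #define BTRFS_UUID_SIZE 16
      
        struct key_model {              /* models (objectid, type, offset) */
                uint64_t objectid;
                uint8_t  type;          /* BTRFS_UUID_KEY */
                uint64_t offset;
        };
      
        /* Fold the 128-bit UUID into the two 64-bit halves of the key, so a
         * plain key lookup finds the items for that UUID directly. */
        static void uuid_to_key(const uint8_t uuid[BTRFS_UUID_SIZE],
                                uint8_t type, struct key_model *key)
        {
                memcpy(&key->objectid, uuid, sizeof(key->objectid));
                memcpy(&key->offset, uuid + sizeof(key->objectid),
                       sizeof(key->offset));
                key->type = type;
        }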
      
      Now follows the lengthy justification of why a new tree was added
      instead of using the existing root tree:
      
      The first approach was to not create another tree that holds UUID
      items. Instead, the items should just go into the top root tree.
      Unfortunately this confused the algorithm that assigns the objectids
      of subvolumes and snapshots. The reason is that
      btrfs_find_free_objectid() calls btrfs_find_highest_objectid() for
      the first created subvol or snapshot after mounting a filesystem,
      and this function simply searches for the largest used objectid in
      the root tree keys to pick the next objectid to assign. Of course,
      the UUID keys have always been the ones with the highest offset
      value, and the next assigned subvol ID was wastefully huge.
      
      Using any other existing tree did not look proper. Applying a
      workaround such as setting the objectid to zero in the UUID item
      key and implementing collision handling would either add
      limitations (with a btrfs_extend_item() approach to handling
      collisions) or a lot of complexity and source code (if a
      collision-free key had to be looked up). Adding new code that
      introduces limitations is not good, and adding code that is
      complex and lengthy for no good reason is also not good. Hence
      the justification for introducing a completely new tree.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: separate out tests into their own directory · dc11dd5d
      Committed by Josef Bacik
      The plan is to have a bunch of unit tests that run when btrfs is loaded, if
      you build with the appropriate config option.  My ultimate goal is to have a test
      for every non-static function we have, but at first I'm going to focus on the
      things that cause us the most problems.  To start out with this just adds a
      tests/ directory and moves the existing free space cache tests into that
      directory and sets up all of the infrastructure.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  3. 02 Feb 2013 (1 commit)
    • Btrfs: RAID5 and RAID6 · 53b381b3
      Committed by David Woodhouse
      This builds on David Woodhouse's original Btrfs raid5/6 implementation.
      The code has changed quite a bit, blame Chris Mason for any bugs.
      
      Read/modify/write is done after the higher levels of the filesystem have
      prepared a given bio.  This means the higher layers are not responsible
      for building full stripes, and they don't need to query for the topology
      of the extents that may get allocated during delayed allocation runs.
      It also means different files can easily share the same stripe.
      
      But, it does expose us to incorrect parity if we crash or lose power
      while doing a read/modify/write cycle.  This will be addressed in a
      later commit.
      
      Scrub is unable to repair crc errors on raid5/6 chunks.
      
      Discard does not work on raid5/6 (yet).
      
      The stripe size is fixed at 64KiB per disk.  This will be tunable
      in a later commit.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  4. 13 Dec 2012 (1 commit)
  5. 26 Jul 2012 (1 commit)
  6. 12 Jul 2012 (1 commit)
  7. 22 Dec 2011 (2 commits)
  8. 02 Oct 2011 (1 commit)
    • btrfs: initial readahead code and prototypes · 7414a03f
      Committed by Arne Jansen
      This is the implementation for the generic read ahead framework.
      
      To trigger a readahead, btrfs_reada_add must be called. It will start
      a read ahead for the given range [start, end) on tree root. The returned
      handle can either be used to wait on the readahead to finish
      (btrfs_reada_wait), or to send it to the background (btrfs_reada_detach).
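      
      A usage sketch (kernel context assumed; the return type and the two
      helpers come from the description above, everything else is
      illustrative):
      
        struct reada_control *rc;
      
        /* Prefetch all tree blocks covering [start_key, end_key) on 'root'. */
        rc = btrfs_reada_add(root, &start_key, &end_key);
        if (!IS_ERR(rc)) {
                if (need_results_now)
                        btrfs_reada_wait(rc);   /* block until it finishes */
                else
                        btrfs_reada_detach(rc); /* let it run in background */
        }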
      
      The read ahead works as follows:
      On btrfs_reada_add, the root of the tree is inserted into a radix_tree.
      reada_start_machine will then search for extents to prefetch and trigger
      some reads. When a read finishes for a node, all contained node/leaf
      pointers that lie in the given range will also be enqueued. The reads will
      be triggered in sequential order, thus giving a big win over a naive
      enumeration. It will also make use of multi-device layouts. Each disk
      will have its own read pointer and all disks will be utilized in parallel.
      Also, no two disks will read both sides of a mirror simultaneously, as this
      would waste seeking capacity. Instead, both disks will read different parts
      of the filesystem.
      Any number of readaheads can be started in parallel. The read order will be
      determined globally, i.e. 2 parallel readaheads will normally finish faster
      than the same 2 started one after another.
      
      Changes v2:
       - protect root->node by transaction instead of node_lock
       - fix missed branches:
          The readahead had a too simple check to determine if a branch from
          a node should be checked or not. It now also records the upper bound
          of each node to see if the requested RA range lies within.
       - use KERN_CONT for debug output, to avoid line breaks
       - defer reada_start_machine to worker to avoid deadlock
      
      Changes v3:
       - protect root->node by rcu
      
      Changes v5:
       - changed EIO-semantics of reada_tree_block_flagged
       - remove spin_lock from reada_control and make elems an atomic_t
       - remove unused read_total from reada_control
       - kill reada_key_cmp, use btrfs_comp_cpu_keys instead
       - use kref-style release functions where possible
       - return struct reada_control * instead of void * from btrfs_reada_add
      Signed-off-by: Arne Jansen <sensille@gmx.net>
  9. 29 Sep 2011 (1 commit)
  10. 02 Aug 2011 (1 commit)
  11. 21 May 2011 (1 commit)
    • btrfs: implement delayed inode items operation · 16cdcec7
      Committed by Miao Xie
      Changelog V5 -> V6:
      - Fix oom when the memory load is high, by storing the delayed nodes into the
        root's radix tree, and letting btrfs inodes go.
      
      Changelog V4 -> V5:
      - Fix the race on adding the delayed node to the inode, which is spotted by
        Chris Mason.
      - Merge Chris Mason's incremental patch into this patch.
      - Fix deadlock between readdir() and memory fault, which is reported by
        Itaru Kitayama.
      
      Changelog V3 -> V4:
      - Fix nested lock, which is reported by Itaru Kitayama, by updating space cache
        inode in time.
      
      Changelog V2 -> V3:
      - Fix the race between the delayed worker and the task which does delayed items
        balance, which is reported by Tsutomu Itoh.
      - Modify the patch to address David Sterba's comments.
      - Fix the bug of the cpu recursion spinlock, reported by Chris Mason
      
      Changelog V1 -> V2:
      - break up the global rb-tree, use a list to manage the delayed nodes,
        which is created for every directory and file, and used to manage the
        delayed directory name index items and the delayed inode item.
      - introduce a worker to deal with the delayed nodes.
      
      Compared with ext3/4, the performance of file creation and deletion on btrfs
      is very poor. The reason is that btrfs must do a lot of b+ tree insertions,
      such as inode item, directory name item, directory name index and so on.
      
      If we can delay some b+ tree insertions and deletions, we can improve the
      performance, so we made this patch which implements delayed directory name
      index insertion/deletion and delayed inode updates.
      
      Implementation:
      - introduce a delayed root object into the filesystem, which uses two lists to
        manage the delayed nodes which are created for every file/directory.
        One is used to manage all the delayed nodes that have delayed items. The
        other is used to manage the delayed nodes which are waiting to be dealt with
        by the work thread.
      - Every delayed node has two rb-trees: one is used to manage the directory name
        indexes which are going to be inserted into the b+ tree, and the other is used
        to manage the directory name indexes which are going to be deleted from the
        b+ tree.
      - introduce a worker to deal with the delayed operations. This worker is used
        to deal with the delayed directory name index item insertions and deletions
        and the delayed inode updates.
        When the number of delayed items exceeds the lower limit, we create works
        for some delayed nodes, insert them into the work queue of the worker, and
        then go back.
        When the number of delayed items exceeds the upper bound, we create works
        for all the delayed nodes that haven't been dealt with, insert them into the
        work queue of the worker, and then wait until the number of untreated items
        falls below some threshold value.
      - When we want to insert a directory name index into the b+ tree, we just add
        the information to the delayed inserting rb-tree.
        Then we check the number of delayed items and do delayed item balancing.
        (The balance policy is described above.)
      - When we want to delete a directory name index from the b+ tree, we first
        search for it in the inserting rb-tree. If we find it, we just drop it. If
        not, we add its key to the delayed deleting rb-tree.
        As with the delayed inserting rb-tree, we then check the number of delayed
        items and do delayed item balancing.
        (The same as for the inserting manipulation.)
      - When we want to update the metadata of some inode, we cache the data of the
        inode in the delayed node. The worker will flush it into the b+ tree after
        dealing with the delayed insertions and deletions.
      - We move the delayed node to the tail of the list after we access the
        delayed node. This way, we can cache more delayed items and merge more
        inode updates.
      - If we want to commit a transaction, we deal with all the delayed nodes.
      - The delayed node will be freed when we free the btrfs inode.
      - Before we log the inode items, we commit all the directory name index items
        and the delayed inode update.
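      
      A rough C model of the objects described above (field and type names are
      illustrative, not the kernel's):
      
        #include <stdint.h>
      
        /* Per-inode bookkeeping for deferred b+ tree work. */
        struct delayed_node_model {
                uint64_t ino;
                void *ins_root;      /* rb-tree: dir-index items to insert */
                void *del_root;      /* rb-tree: dir-index keys to delete  */
                int inode_dirty;     /* cached inode update pending        */
                struct delayed_node_model *next, *prev;  /* list linkage   */
        };
      
        /* One per filesystem: all nodes with pending items, plus the
         * subset already handed to the worker thread. */
        struct delayed_root_model {
                struct delayed_node_model *node_list;
                struct delayed_node_model *prepare_list;
        };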
      
      I did a quick test with the benchmark tool[1] and found we can improve the
      performance of file creation by ~15%, and file deletion by ~20%.
      
      Before applying this patch:
      Create files:
              Total files: 50000
              Total time: 1.096108
              Average time: 0.000022
      Delete files:
              Total files: 50000
              Total time: 1.510403
              Average time: 0.000030
      
      After applying this patch:
      Create files:
              Total files: 50000
              Total time: 0.932899
              Average time: 0.000019
      Delete files:
              Total files: 50000
              Total time: 1.215732
              Average time: 0.000024
      
      [1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3
      
      Many thanks to Kitayama-san for his help!
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: David Sterba <dave@jikos.cz>
      Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Tested-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  12. 12 May 2011 (1 commit)
    • btrfs: scrub · a2de733c
      Committed by Arne Jansen
      This adds an initial implementation for scrub. It works quite
      straightforwardly. Usermode issues an ioctl for each device in the
      fs. For each device, it enumerates the allocated device chunks. For
      each chunk, the contained extents are enumerated and the data checksums
      fetched. The extents are read sequentially and the checksums verified.
      If an error occurs (checksum or EIO), a good copy is searched for. If
      one is found, the bad copy will be rewritten.
      All enumerations happen from the commit roots. During a transaction
      commit, the scrubs get paused and afterwards continue from the new
      roots.
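      
      The error path in miniature (a user-space toy of the verify-and-repair
      decision; the kernel does this per extent using crc32c and the commit
      roots, and all names here are invented):
      
        #include <stddef.h>
        #include <stdint.h>
      
        #define NMIRRORS 2
        #define BLKSZ 4096
      
        static uint32_t toy_csum(const uint8_t *buf, size_t len)
        {
                uint32_t c = 0;
      
                while (len--)
                        c = c * 31 + *buf++;    /* stand-in for crc32c */
                return c;
        }
      
        /* Return the index of a mirror whose data matches the stored
         * checksum, or -1 if every copy is bad; a found good copy is then
         * used to rewrite the bad ones. */
        static int pick_good_mirror(uint8_t bufs[NMIRRORS][BLKSZ],
                                    uint32_t expect)
        {
                for (int i = 0; i < NMIRRORS; i++)
                        if (toy_csum(bufs[i], BLKSZ) == expect)
                                return i;
                return -1;
        }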
      
      This commit is based on the series originally posted to linux-btrfs
      with some improvements that resulted from comments from David Sterba,
      Ilya Dryomov and Jan Schmidt.
      Signed-off-by: Arne Jansen <sensille@gmx.net>
  13. 22 Dec 2010 (1 commit)
    • btrfs: Add lzo compression support · a6fa6fae
      Committed by Li Zefan
      Lzo is a much faster compression algorithm than zlib, so it would allow
      more users to enable transparent compression, and users can choose
      between compression ratio and speed for different applications.
      
      Usage:
      
       # mount -t btrfs -o compress[=<zlib,lzo>] dev /mnt
      or
       # mount -t btrfs -o compress-force[=<zlib,lzo>] dev /mnt
      
      "-o compress" without argument is still allowed for compatability.
      
      Compatibility:
      
      If we mount a filesystem with lzo compression, it will not be able to be
      mounted by old kernels. One reason is that otherwise btrfs would directly
      dump compressed data, which sits in an inline extent, to the user.
      
      Performance:
      
      The test copied a linux source tarball (~400M) from an ext4 partition
      to the btrfs partition, and then extracted it.
      
      (time in second)
                 lzo        zlib        nocompress
      copy:      10.6       21.7        14.9
      extract:   70.1       94.4        66.6
      
      (data size in MB)
                 lzo        zlib        nocompress
      copy:      185.87     108.69      394.49
      extract:   193.80     132.36      381.21
      
      Changelog:
      
      v1 -> v2:
      - Select LZO_COMPRESS and LZO_DECOMPRESS in btrfs Kconfig.
      - Add an incompatibility flag.
      - Fix error handling in compress code.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
  14. 10 Jun 2009 (1 commit)
    • Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE) · 5d4f98a2
      Committed by Yan Zheng
      This commit introduces a new kind of back reference for btrfs metadata.
      Once a filesystem has been mounted with this commit, IT WILL NO LONGER
      BE MOUNTABLE BY OLDER KERNELS.
      
      When a tree block in a subvolume tree is cow'd, the reference counts of all
      extents it points to are increased by one.  At transaction commit time,
      the old root of the subvolume is recorded in a "dead root" data structure,
      and the btree it points to is later walked, dropping reference counts
      and freeing any blocks where the reference count goes to 0.
      
      The increments done during cow and decrements done after commit cancel out,
      and the walk is a very expensive way to go about freeing the blocks that
      are no longer referenced by the new btree root.  This commit reduces the
      transaction overhead by avoiding the need for dead root records.
      
      When a non-shared tree block is cow'd, we free the old block at once, and the
      new block inherits the old block's references. When a tree block with reference
      count > 1 is cow'd, we increase the reference counts of all extents
      the new block points to by one, and decrease the old block's reference count by
      one.
      
      This dead tree avoidance code removes the need to modify the reference
      counts of lower level extents when a non-shared tree block is cow'd.
      But we still need to update the back ref for all pointers in the block.
      This is because the location of the block is recorded in the back ref
      item.
      
      We can solve this by introducing a new type of back ref. The new
      back ref provides information about the pointer's key, level and in which
      tree the pointer lives. This information allows us to find the pointer
      by searching the tree. The shortcoming of the new back ref is that it
      only works for pointers in tree blocks referenced by their owner trees.
      
      This is mostly a problem for snapshots, where resolving one of these
      fuzzy back references would be O(number_of_snapshots) and quite slow.
      The solution used here is to use the fuzzy back references in the common
      case where a given tree block is only referenced by one root,
      and use the full back references when multiple roots have a reference
      on a given block.
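      
      The two reference flavours, modelled in C (illustrative; the real item
      formats live in the disk-format headers):
      
        #include <stdint.h>
      
        /* Full back ref: names the parent block outright. Used once a
         * block is shared by more than one root. */
        struct full_backref_model {
                uint64_t parent_bytenr;       /* block holding the pointer */
        };
      
        /* Fuzzy back ref: enough to re-find the pointer by searching the
         * owning tree from the top. Only valid while the block is
         * referenced by its owner tree alone. */
        struct fuzzy_backref_model {
                uint64_t root;                /* tree the pointer lives in */
                uint64_t key_objectid;        /* key to search for         */
                uint8_t  level;               /* level holding the pointer */
        };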
      
      This commit adds a per-subvolume red-black tree to keep track of cached
      inodes. The red-black tree helps the balancing code find cached
      inodes whose inode numbers are within a given range.
      
      This commit improves the balancing code by introducing several data
      structures to keep the state of balancing. The most important one
      is the back ref cache. It caches how the upper level tree blocks are
      referenced. This greatly reduces the overhead of checking back refs.
      
      The improved balancing code scales significantly better with a large
      number of snapshots.
      
      This is a very large commit and was written in a number of
      pieces.  But, they depend heavily on the disk format change and were
      squashed together to make sure git bisect didn't end up in a
      bad state wrt space balancing or the format change.
      Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  15. 25 Apr 2009 (1 commit)
  16. 25 Mar 2009 (1 commit)
    • Btrfs: do extent allocation and reference count updates in the background · 56bec294
      Committed by Chris Mason
      The extent allocation tree maintains a reference count and full
      back reference information for every extent allocated in the
      filesystem.  For subvolume and snapshot trees, every time
      a block goes through COW, the new copy of the block adds a reference
      on every block it points to.
      
      If a btree node points to 150 leaves, then the COW code needs to go
      and add backrefs on 150 different extents, which might be spread all
      over the extent allocation tree.
      
      These updates currently happen during btrfs_cow_block, and most COWs
      happen during btrfs_search_slot.  btrfs_search_slot has locks held
      on both the parent and the node we are COWing, and so we really want
      to avoid IO during the COW if we can.
      
      This commit adds an rbtree of pending reference count updates and extent
      allocations.  The tree is ordered by byte number of the extent and byte number
      of the parent for the back reference.  The tree allows us to:
      
      1) Modify back references in something close to disk order, reducing seeks
      2) Significantly reduce the number of modifications made as block pointers
      are balanced around
      3) Do all of the extent insertion and back reference modifications outside
      of the performance critical btrfs_search_slot code.
      
      #3 has the added benefit of greatly reducing the btrfs stack footprint.
      The extent allocation tree modifications are done without the deep
      (and somewhat recursive) call chains used in the past.
      
      These delayed back reference updates must be done before the transaction
      commits, and so the rbtree is tied to the transaction.  Throttling is
      implemented to help keep the queue of backrefs at a reasonable size.
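      
      The ordering described above, as a comparator sketch (illustrative
      names; the rbtree is keyed this way so that processing walks the extent
      allocation tree in near disk order):
      
        #include <stdint.h>
      
        struct delayed_ref_model {
                uint64_t bytenr;      /* extent being referenced          */
                uint64_t parent;      /* block holding the back reference */
        };
      
        static int delayed_ref_cmp(const struct delayed_ref_model *a,
                                   const struct delayed_ref_model *b)
        {
                if (a->bytenr != b->bytenr)
                        return a->bytenr < b->bytenr ? -1 : 1;
                if (a->parent != b->parent)
                        return a->parent < b->parent ? -1 : 1;
                return 0;
        }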
      
      Since there was a similar mechanism in place for the extent tree
      extents, that is removed and replaced by the delayed reference tree.
      
      Yan Zheng <yan.zheng@oracle.com> helped review and fix up this code.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  17. 30 Oct 2008 (1 commit)
    • Btrfs: Add zlib compression support · c8b97818
      Committed by Chris Mason
      This is a large change for adding compression on reading and writing,
      both for inline and regular extents.  It does some fairly large
      surgery to the writeback paths.
      
      Compression is off by default and enabled by mount -o compress.  Even
      when the -o compress mount option is not used, it is possible to read
      compressed extents off the disk.
      
      If compression for a given set of pages fails to make them smaller, the
      file is flagged to avoid future compression attempts.
      
      * While finding delalloc extents, the pages are locked before being sent down
      to the delalloc handler.  This allows the delalloc handler to do complex things
      such as cleaning the pages, marking them writeback and starting IO on their
      behalf.
      
      * Inline extents are inserted at delalloc time now.  This allows us to compress
      the data before inserting the inline extent, and it allows us to insert
      an inline extent that spans multiple pages.
      
      * All of the in-memory extent representations (extent_map.c, ordered-data.c etc)
      are changed to record both an in-memory size and an on disk size, as well
      as a flag for compression.
      
      From a disk format point of view, the extent pointers in the file are changed
      to record the on disk size of a given extent and some encoding flags.
      Space in the disk format is allocated for compression encoding, as well
      as encryption and a generic 'other' field.  Neither the encryption or the
      'other' field are currently used.
      
      In order to limit the amount of data read for a single random read in the
      file, the size of a compressed extent is limited to 128k.  This is a
      software only limit, the disk format supports u64 sized compressed extents.
      
      In order to limit the ram consumed while processing extents, the uncompressed
      size of a compressed extent is limited to 256k.  This is a software only limit
      and will be subject to tuning later.
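      
      The two limits, restated as a sketch (constants from the text; the
      helper itself is illustrative):
      
        #include <stdbool.h>
        #include <stdint.h>
      
        #define MAX_COMPRESSED_EXTENT   (128 * 1024) /* caps random-read cost */
        #define MAX_UNCOMPRESSED_EXTENT (256 * 1024) /* caps ram per extent   */
      
        /* May this (uncompressed, compressed) pair be stored as one
         * compressed extent? Both limits are software-only; the disk
         * format allows u64-sized compressed extents. */
        static bool fits_compressed_extent(uint64_t uncompressed,
                                           uint64_t compressed)
        {
                return uncompressed <= MAX_UNCOMPRESSED_EXTENT &&
                       compressed <= MAX_COMPRESSED_EXTENT &&
                       compressed < uncompressed;   /* must actually shrink */
        }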
      
      Checksumming is still done on compressed extents, and it is done on the
      uncompressed version of the data.  This way additional encodings can be
      layered on without having to figure out which encoding to checksum.
      
      Compression happens at delalloc time, which is basically single threaded because
      it is usually done by a single pdflush thread.  This makes it tricky to
      spread the compression load across all the cpus on the box.  We'll have to
      look at parallel pdflush walks of dirty inodes at a later time.
      
      Decompression is hooked into readpages and it does spread across CPUs nicely.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  18. 09 Oct 2008 (1 commit)
  19. 30 Sep 2008 (1 commit)
    • Btrfs: add and improve comments · d352ac68
      Committed by Chris Mason
      This improves the comments at the top of many functions.  It didn't
      dive into the guts of functions because I was trying to
      avoid merging problems with the new allocator and back reference work.
      
      extent-tree.c and volumes.c were both skipped, and there is definitely
      more work to do in cleaning and commenting the code.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  20. 26 Sep 2008 (1 commit)
  21. 25 Sep 2008 (17 commits)
    • Btrfs: free space accounting redo · 0f9dd46c
      Committed by Josef Bacik
      1) replace the per fs_info extent_io_tree that tracked free space with two
      rb-trees per block group to track free space areas via offset and size.  The
      reason to do this is because most allocations come with a hint byte where to
      start, so we can usually find a chunk of free space at that hint byte to satisfy
      the allocation and get good space packing.  If we cannot find free space at or
      after the given offset we fall back on looking for a chunk of the given size as
      close to that given offset as possible.  When we fall back on the size search we
      also try to find a slot as close to the size we want as possible, to avoid
      breaking small chunks off of huge areas if possible.
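      
      A model of the dual index this describes (illustrative names): every
      free-space run is reachable both by offset, for the hint-byte fast
      path, and by size, for the fallback search.
      
        #include <stdint.h>
      
        /* One contiguous run of free space inside a block group. The real
         * entry sits in two rb-trees at once. */
        struct free_space_model {
                uint64_t offset;      /* keyed in the offset tree: find   */
                                      /* space at/after the hint byte     */
                uint64_t bytes;       /* keyed in the size tree: find the */
                                      /* closest-sized chunk to the want  */
        };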
      
      2) remove the extent_io_tree that tracked the block group cache from fs_info and
      replaced it with an rb-tree that tracks the block group cache via offset.  Also
      added a per-space_info list that tracks the block group cache for the particular
      space so we can look up related block groups easily.
      
      3) cleaned up the allocation code to make it a little easier to read and a
      little less complicated.  Basically there are 3 steps: first, look from our
      provided hint.  If we can't find space from that given hint, start back at our
      original search start and look for space from there.  If that fails try to
      allocate space if we can and start looking again.  If not we're screwed and need
      to start over again.
      
      4) small fixes.  There were some issues in volumes.c where we wouldn't allocate
      the rest of the disk.  Fixed cow_file_range to actually pass the alloc_hint,
      which has helped a good bit in making the fs_mark test I run have semi-normal
      results as we run out of space.  Generally with data allocations we don't track
      where we last allocated from, so every time we did a data allocation we'd search
      through every block group that we have looking for free space.  Now searching a
      block group with no free space isn't terribly time consuming, but it was causing
      a slight degradation as we got more data block groups.  The alloc_hint has fixed
      this slight degradation and made things semi-normal.
      
      There is still one nagging problem I'm working on where we will get ENOSPC when
      there is definitely plenty of space.  This only happens with metadata
      allocations, and only when we are almost full.  So you generally hit the 85%
      mark first, but sometimes you'll hit the BUG before you hit the 85% wall.  I'm
      still tracking it down, but until then this seems to be pretty stable and make a
      significant performance gain.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Add a write ahead tree log to optimize synchronous operations · e02119d5
      Committed by Chris Mason
      File syncs and directory syncs are optimized by copying their
      items into a special (copy-on-write) log tree.  There is one log tree per
      subvolume and the btrfs super block points to a tree of log tree roots.
      
      After a crash, items are copied out of the log tree and back into the
      subvolume.  See tree-log.c for all the details.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: compile when posix acl's are disabled · eab922ec
      Committed by Josef Bacik
      This patch makes btrfs compile properly when ACLs are disabled.  I
      tested this and it worked with CONFIG_FS_POSIX_ACL both off and on.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Switch btrfs_name_hash() to crc32c · 615f996f
      Committed by David Woodhouse
      Date: Tue, 19 Aug 2008 19:21:57 +0100
      Using a 64-bit hash as the readdir cookie is just asking for trouble,
      and gets it when we try to export the filesystem over NFS.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • NFS support for btrfs - v3 · be6e8dc0
      Committed by Balaji Rao
      Date: Mon, 21 Jul 2008 02:01:56 +0530
      Here's an implementation of NFS support for btrfs. It relies on the
      fixes going into 2.6.28 for the NFS readdir/lookup deadlock.
      
      This uses the btrfs_iget helper introduced previously.
      
      [dwmw2: Tidy up a little, switch to d_obtain_alias() w/compat routine,
      	change fh_type,	store parent's root object ID where needed,
      	fix some get_parent() and fs_to_dentry() bugs]
      Signed-off-by: Balaji Rao <balajirrao@gmail.com>
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Add a leaf reference cache · 31153d81
      Committed by Yan Zheng
      Much of the IO done while dropping snapshots is done looking up
      leaves in the filesystem trees to see if they point to any extents and
      to drop the references on any extents found.
      
      This creates a cache so that IO isn't required.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Add version strings on module load · b3c3da71
      Committed by Chris Mason
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Start btree concurrency work. · 925baedd
      Committed by Chris Mason
      The allocation trees and the chunk trees are serialized via their own
      dedicated mutexes.  This means allocation locking is still not very
      fine grained.
      
      The main FS btree is protected by locks on each block in the btree.  Locks
      are taken top / down, and as processing finishes on a given level of the
      tree, the lock is released after locking the lower level.
      
      The end result of a search is now a path where only the lowest level
      is locked.  Releasing or freeing the path drops any locks held.
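      
      The descent pattern in miniature (a toy with one child pointer per
      node; the btrfs code does this per btree level with its own locks):
      
        #include <pthread.h>
      
        struct node {
                pthread_mutex_t lock;
                struct node *child;   /* NULL at the lowest level */
        };
      
        /* Hand-over-hand: lock the lower level before dropping the one
         * above, so a finished search holds only the lowest lock. */
        static struct node *descend(struct node *root)
        {
                struct node *n = root;
      
                pthread_mutex_lock(&n->lock);
                while (n->child) {
                        struct node *c = n->child;
      
                        pthread_mutex_lock(&c->lock);   /* take child first */
                        pthread_mutex_unlock(&n->lock); /* then drop parent */
                        n = c;
                }
                return n;     /* returned locked; caller unlocks */
        }
      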
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: split out ioctl.c · f46b5a66
      Committed by Christoph Hellwig
      Split the ioctl handling out of inode.c into a file of its own.
      Also fix up checkpatch.pl warnings for the moved code.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Add async worker threads for pre and post IO checksumming · 8b712842
      Committed by Chris Mason
      Btrfs has been using workqueues to spread the checksumming load across
      other CPUs in the system.  But, workqueues only schedule work on the
      same CPU that queued the work, giving them a limited benefit for systems with
      higher CPU counts.
      
      This code adds a generic facility to schedule work with pools of kthreads,
      and changes the bio submission code to queue bios up.  The queueing is
      important to make sure large numbers of procs on the system don't
      turn streaming workloads into random workloads by sending IO down
      concurrently.
      
      The end result of all of this is much higher performance (and CPU usage) when
      doing checksumming on large machines.  Two worker pools are created,
      one for writes and one for endio processing.  The two could deadlock if
      we tried to service both from a single pool.
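      
      The shape of the facility, modelled (illustrative; two independent
      instances exist precisely so that write submission and endio
      processing can never wait on each other):
      
        struct work_model {
                void (*func)(struct work_model *self);
                struct work_model *next;      /* FIFO queue linkage */
        };
      
        /* A pool of kthreads draining one shared queue. One instance
         * handles pre-IO checksumming, a second handles post-IO (endio)
         * completion. */
        struct worker_pool_model {
                int num_threads;
                struct work_model *head, *tail;
        };
      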
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: tiny makefile cleanup · 95c9eb17
      Committed by Christoph Hellwig
      Use normal kbuild syntax to build acl.o conditionally and remove
      commented-out lines.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • 0b86a832
    • Btrfs: Split the extent_map code into two parts · d1310b2e
      Committed by Chris Mason
      There is now extent_map for mapping offsets in the file to disk and
      extent_io for state tracking, IO submission and extent_buffers.
      
      The new extent_map code shifts from [start,end] pairs to [start,len], and
      pushes the locking out into the caller.  This allows a few performance
      optimizations and is easier to use.
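      
      The representational shift, sketched (illustrative fields):
      
        #include <stdint.h>
      
        /* After the split: a pure file-offset-to-disk mapping, now
         * expressed as [start, start + len) instead of [start, end].
         * Lookup no longer takes the tree lock itself; callers
         * serialize around it. */
        struct extent_map_model {
                uint64_t start;         /* file offset   */
                uint64_t len;           /* bytes covered */
                uint64_t block_start;   /* disk location */
        };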
      
      A number of extent_map usage bugs were fixed, mostly with failing
      to remove extent_map entries when changing the file.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Fix compile on kernel without ACLs enabled · caaca38b
      Committed by Yan
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: Add data=ordered support · dc17ff8f
      Committed by Chris Mason
      This forces file data extents down the disk along with the metadata that
      references them.  The current implementation is fairly simple, and just
      writes out all of the dirty pages in an inode before the commit.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • xattr support for btrfs · 5103e947
      Committed by Josef Bacik
      Signed-off-by: Chris Mason <chris.mason@oracle.com>