1. 27 5月, 2011 1 次提交
  2. 24 5月, 2011 2 次提交
  3. 06 5月, 2011 1 次提交
  4. 02 5月, 2011 5 次提交
  5. 26 4月, 2011 1 次提交
  6. 25 4月, 2011 1 次提交
    • L
      Btrfs: Always use 64bit inode number · 33345d01
      Li Zefan 提交于
      There's a potential problem in 32bit system when we exhaust 32bit inode
      numbers and start to allocate big inode numbers, because btrfs uses
      inode->i_ino in many places.
      
      So here we always use BTRFS_I(inode)->location.objectid, which is an
      u64 variable.
      
      There are 2 exceptions that BTRFS_I(inode)->location.objectid !=
      inode->i_ino: the btree inode (0 vs 1) and empty subvol dirs (256 vs 2),
      and inode->i_ino will be used in those cases.
      
      Another reason to make this change is I'm going to use a special inode
      to save free ino cache, and the inode number must be > (u64)-256.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      33345d01
  7. 16 4月, 2011 1 次提交
  8. 13 4月, 2011 1 次提交
    • C
      Btrfs: make uncache_state unconditional · 109b36a2
      Chris Mason 提交于
      The extent_io code can take cached pointers into the extent state trees,
      and these can make lookups much faster in common operations.  The
      caching only happens when specific bits are set that prevent merging
      and splitting of the extent state.
      
      A help function was added to uncache the state, and it was testing
      the same set of conditionals.  This can leak in very strange corner
      cases where the lock bit goes away unexpectedly.
      
      The uncaching should be unconditional.  Once we have a ref on the
      extent we should always give it up.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      109b36a2
  9. 12 4月, 2011 2 次提交
    • A
      btrfs: using cached extent_state in set/unlock combinations · 507903b8
      Arne Jansen 提交于
      In several places the sequence (set_extent_uptodate, unlock_extent) is used.
      This leads to a duplicate lookup of the extent state. This patch lets
      set_extent_uptodate return a cached extent_state which can be passed to
      unlock_extent_cached.
      The occurences of the above sequences are updated to use the cache. Only
      end_bio_extent_readpage is updated that it first gets a cached state to
      pass it to the readpage_end_io_hook as the prototype requested and is later
      on being used for set/unlock.
      Signed-off-by: NArne Jansen <sensille@gmx.net>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      507903b8
    • S
      btrfs: properly handle overlapping areas in memmove_extent_buffer · 3387206f
      Sergei Trofimovich 提交于
      Fix data corruption caused by memcpy() usage on overlapping data.
      I've observed it first when found out usermode linux crash on btrfs.
      
      ?all chain is the following:
      ------------[ cut here ]------------
      WARNING: at /home/slyfox/linux-2.6/fs/btrfs/extent_io.c:3900 memcpy_extent_buffer+0x1a5/0x219()
      Call Trace:
      6fa39a58:  [<601b495e>] _raw_spin_unlock_irqrestore+0x18/0x1c
      6fa39a68:  [<60029ad9>] warn_slowpath_common+0x59/0x70
      6fa39aa8:  [<60029b05>] warn_slowpath_null+0x15/0x17
      6fa39ab8:  [<600efc97>] memcpy_extent_buffer+0x1a5/0x219
      6fa39b48:  [<600efd9f>] memmove_extent_buffer+0x94/0x208
      6fa39bc8:  [<600becbf>] btrfs_del_items+0x214/0x473
      6fa39c78:  [<600ce1b0>] btrfs_delete_one_dir_name+0x7c/0xda
      6fa39cc8:  [<600dad6b>] __btrfs_unlink_inode+0xad/0x25d
      6fa39d08:  [<600d7864>] btrfs_start_transaction+0xe/0x10
      6fa39d48:  [<600dc9ff>] btrfs_unlink_inode+0x1b/0x3b
      6fa39d78:  [<600e04bc>] btrfs_unlink+0x70/0xef
      6fa39dc8:  [<6007f0d0>] vfs_unlink+0x58/0xa3
      6fa39df8:  [<60080278>] do_unlinkat+0xd4/0x162
      6fa39e48:  [<600517db>] call_rcu_sched+0xe/0x10
      6fa39e58:  [<600452a8>] __put_cred+0x58/0x5a
      6fa39e78:  [<6007446c>] sys_faccessat+0x154/0x166
      6fa39ed8:  [<60080317>] sys_unlink+0x11/0x13
      6fa39ee8:  [<60016b80>] handle_syscall+0x58/0x70
      6fa39f08:  [<60021377>] userspace+0x2d4/0x381
      6fa39fc8:  [<60014507>] fork_handler+0x62/0x69
      ---[ end trace 70b0ca2ef0266b93 ]---
      
      http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg09302.htmlSigned-off-by: NSergei Trofimovich <slyfox@gentoo.org>
      Reviewed-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      3387206f
  10. 28 3月, 2011 1 次提交
    • L
      Btrfs: add initial tracepoint support for btrfs · 1abe9b8a
      liubo 提交于
      Tracepoints can provide insight into why btrfs hits bugs and be greatly
      helpful for debugging, e.g
                    dd-7822  [000]  2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0
                    dd-7822  [000]  2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0
       btrfs-transacti-7804  [001]  2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0)
       btrfs-transacti-7804  [001]  2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0)
       btrfs-transacti-7804  [001]  2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8
         flush-btrfs-2-7821  [001]  2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA
         flush-btrfs-2-7821  [001]  2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0)
         flush-btrfs-2-7821  [001]  2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0)
         flush-btrfs-2-7821  [000]  2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0)
       btrfs-endio-wri-7800  [001]  2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0)
       btrfs-endio-wri-7800  [001]  2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0)
      
      Here is what I have added:
      
      1) ordere_extent:
              btrfs_ordered_extent_add
              btrfs_ordered_extent_remove
              btrfs_ordered_extent_start
              btrfs_ordered_extent_put
      
      These provide critical information to understand how ordered_extents are
      updated.
      
      2) extent_map:
              btrfs_get_extent
      
      extent_map is used in both read and write cases, and it is useful for tracking
      how btrfs specific IO is running.
      
      3) writepage:
              __extent_writepage
              btrfs_writepage_end_io_hook
      
      Pages are cirtical resourses and produce a lot of corner cases during writeback,
      so it is valuable to know how page is written to disk.
      
      4) inode:
              btrfs_inode_new
              btrfs_inode_request
              btrfs_inode_evict
      
      These can show where and when a inode is created, when a inode is evicted.
      
      5) sync:
              btrfs_sync_file
              btrfs_sync_fs
      
      These show sync arguments.
      
      6) transaction:
              btrfs_transaction_commit
      
      In transaction based filesystem, it will be useful to know the generation and
      who does commit.
      
      7) back reference and cow:
      	btrfs_delayed_tree_ref
      	btrfs_delayed_data_ref
      	btrfs_delayed_ref_head
      	btrfs_cow_block
      
      Btrfs natively supports back references, these tracepoints are helpful on
      understanding btrfs's COW mechanism.
      
      8) chunk:
      	btrfs_chunk_alloc
      	btrfs_chunk_free
      
      Chunk is a link between physical offset and logical offset, and stands for space
      infomation in btrfs, and these are helpful on tracing space things.
      
      9) reserved_extent:
      	btrfs_reserved_extent_alloc
      	btrfs_reserved_extent_free
      
      These can show how btrfs uses its space.
      Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      1abe9b8a
  11. 18 3月, 2011 1 次提交
  12. 10 3月, 2011 1 次提交
    • J
      block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe 提交于
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      721a9602
  13. 09 3月, 2011 1 次提交
  14. 24 2月, 2011 1 次提交
    • C
      Btrfs: fix fiemap bugs with delalloc · ec29ed5b
      Chris Mason 提交于
      The Btrfs fiemap code wasn't properly returning delalloc extents,
      so applications that trust fiemap to decide if there are holes in the
      file see holes instead of delalloc.
      
      This reworks the btrfs fiemap code, adding a get_extent helper that
      searches for delalloc ranges and also adding a helper for extent_fiemap
      that skips past holes in the file.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      ec29ed5b
  15. 15 2月, 2011 2 次提交
    • C
      Btrfs: don't release pages when we can't clear the uptodate bits · e3f24cc5
      Chris Mason 提交于
      Btrfs tracks uptodate state in an rbtree as well as in the
      page bits.  This is supposed to enable us to use block sizes other than
      the page size, but there are a few parts still missing before that
      completely works.
      
      But, our readpage routine trusts this additional range based tracking
      of uptodateness, much in the same way the buffer head up to date bits
      are trusted for the other filesystems.
      
      The problem is that sometimes we need to allocate memory in order to
      split records in the rbtree, even when we are just clearing bits.  This
      can be difficult when our clearing function is called GFP_ATOMIC, which
      can happen in the releasepage path.
      
      So, what happens today looks like this:
      
      releasepage called with GFP_ATOMIC
      btrfs_releasepage calls clear_extent_bit
      clear_extent_bit fails to allocate ram, leaving the up to date bit set
      btrfs_releasepage returns success
      
      The end result is the page being gone, but btrfs thinking the range is
      up to date.   Later on if someone tries to read that same page, the
      btrfs readpage code will return immediately thinking the page is already
      up to date.
      
      This commit fixes things to fail the releasepage when we can't clear the
      extent state bits.  It covers both data pages and metadata tree blocks.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      e3f24cc5
    • C
      Btrfs: fix page->private races · eb14ab8e
      Chris Mason 提交于
      There is a race where btrfs_releasepage can drop the
      page->private contents just as alloc_extent_buffer is setting
      up pages for metadata.  Because of how the Btrfs page flags work,
      this results in us skipping the crc on the page during IO.
      
      This patch sovles the race by waiting until after the extent buffer
      is inserted into the radix tree before it sets page private.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      eb14ab8e
  16. 01 2月, 2011 1 次提交
  17. 29 1月, 2011 1 次提交
  18. 17 1月, 2011 1 次提交
  19. 22 12月, 2010 1 次提交
  20. 28 11月, 2010 1 次提交
    • J
      Btrfs: fix fiemap · 975f84fe
      Josef Bacik 提交于
      There are two big problems currently with FIEMAP
      
      1) We return extents for holes.  This isn't supposed to happen, we just don't
      return extents for holes and then userspace interprets the lack of an extent as
      a hole.
      
      2) We sometimes don't set FIEMAP_EXTENT_LAST properly.  This is because we wait
      to see a EXTENT_FLAG_VACANCY flag on the em, but this won't happen if say we ask
      fiemap to map up to the last extent in a file, and there is nothing but holes up
      to the i_size.  To fix this we need to lookup the last extent in this file and
      save the logical offset, so if we happen to try and map that extent we can be
      sure to set FIEMAP_EXTENT_LAST.
      
      With this patch we now pass xfstest 225, which we never have before.
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      975f84fe
  21. 22 11月, 2010 2 次提交
  22. 30 10月, 2010 2 次提交
  23. 29 10月, 2010 2 次提交
  24. 06 7月, 2010 1 次提交
  25. 26 5月, 2010 1 次提交
    • C
      Btrfs: rework O_DIRECT enospc handling · 4845e44f
      Chris Mason 提交于
      This changes O_DIRECT write code to mark extents as delalloc
      while it is processing them.  Yan Zheng has reworked the
      enospc accounting based on tracking delalloc extents and
      this makes it much easier to track enospc in the O_DIRECT code.
      
      There are a few space cases with the O_DIRECT code though,
      it only sets the EXTENT_DELALLOC bits, instead of doing
      EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because
      we don't want to mess with clearing the dirty and uptodate
      bits when things go wrong.  This is important because there
      are no pages in the page cache, so any extent state structs
      that we put in the tree won't get freed by releasepage.  We have
      to clear them ourselves as the DIO ends.
      
      With this commit, we reserve space at in btrfs_file_aio_write,
      and then as each btrfs_direct_IO call progresses it sets
      EXTENT_DELALLOC on the range.
      
      btrfs_get_blocks_direct is responsible for clearing the delalloc
      at the same time it drops the extent lock.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      4845e44f
  26. 25 5月, 2010 3 次提交
  27. 06 4月, 2010 1 次提交
    • N
      Btrfs: use add_to_page_cache_lru, use __page_cache_alloc · 28ecb609
      Nick Piggin 提交于
      Pagecache pages should be allocated with __page_cache_alloc, so they
      obey pagecache memory policies.
      
      add_to_page_cache_lru is exported, so it should be used. Benefits over
      using a private pagevec: neater code, 128 bytes fewer stack used, percpu
      lru ordering is preserved, and finally don't need to flush pagevec
      before returning so batching may be shared with other LRU insertions.
      
      Signed-off-by: Nick Piggin <npiggin@suse.de>:
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      28ecb609
  28. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6