1. 08 Apr, 2011 11 commits
    • C
      xfs: fix xfs_debug warnings · 957935dc
      Christoph Hellwig committed
      For a CONFIG_XFS_DEBUG=n build gcc complains about statements with no
      effect in xfs_debug:
      
      fs/xfs/quota/xfs_qm_syscalls.c: In function 'xfs_qm_scall_trunc_qfiles':
      fs/xfs/quota/xfs_qm_syscalls.c:291:3: warning: statement with no effect
      
      The reason is that the various new xfs message functions have a
      return value which is never used, and in the non-debug build the
      xfs_debug macro evaluates to a plain 0, which produces the above
      warnings.  This can be fixed by turning xfs_debug into an inline function
      instead of a macro, but in addition to that I've also changed all the
      message helpers to return void as we never use their return values.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      957935dc
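The shape of this fix can be sketched in userspace C. This is a minimal illustration, not the XFS code: `msg_warn`, `msg_debug`, `DEMO_DEBUG` and the `messages_emitted` counter are hypothetical names introduced here. The non-debug variant is an empty inline function, so a call to it is a real function call rather than a bare `0;` statement.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Demonstration-only counter so the behaviour can be observed. */
static int messages_emitted;

/* Hypothetical message helper; it returns void because no caller
 * ever looks at a return value. */
static void msg_warn(const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    messages_emitted++;
}

#ifdef DEMO_DEBUG
/* Debug build: forward to the real helper. */
#define msg_debug(fmt, ...) msg_warn(fmt, ##__VA_ARGS__)
#else
/* Non-debug build: an empty inline function instead of a macro that
 * expands to a plain 0, so "msg_debug(...);" is a valid call with
 * nothing for the compiler to flag as a statement with no effect. */
static inline void msg_debug(const char *fmt, ...)
{
    (void)fmt;
}
#endif
```

Built without `DEMO_DEBUG`, a call such as `msg_debug("value %d\n", 42);` compiles to nothing useful but also triggers no warning, while `msg_warn` still prints.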
    • C
      xfs: fix variable set but not used warnings · ecb697c1
      Christoph Hellwig committed
      GCC 4.6 now warns about variables set but not used.  Fix the trivially
      fixable warnings of this sort.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      ecb697c1
    • D
      xfs: convert log tail checking to a warning · da8a1a4a
      Dave Chinner committed
      On the Power platform, the log tail debug checks fire excessively
      causing the system to panic early in testing. The debug checks are
      known to be racy, though on x86_64 there is no evidence that they
      trigger at all.
      
      We want to keep the checks active on debug systems to alert us to
      problems with log space accounting, but we need to reduce the impact
      of a racy check on testing on the Power platform.
      
      As a result, convert the ASSERT conditions to warnings, and
      allow them to fire only once per filesystem mount. This will prevent
      false positives from interfering with testing, whilst still
      providing us with the indication that there may be a problem with log
      space accounting should that occur.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      da8a1a4a
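A warn-once-per-mount check of the kind described above might look like the following sketch. The structure `demo_mount`, its `tail_warned` field and the function `demo_check_once` are assumptions made for illustration, and the sketch ignores concurrency entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-mount state; the field names are assumptions. */
struct demo_mount {
    const char *name;
    bool tail_warned;   /* set once the first warning has been emitted */
};

/* Returns true when the checked condition failed. Only the first
 * failure on a given mount produces a message; later failures are
 * still reported to the caller but stay silent. */
static bool demo_check_once(struct demo_mount *mp, bool ok, const char *what)
{
    if (ok)
        return false;
    if (!mp->tail_warned) {
        mp->tail_warned = true;
        fprintf(stderr, "%s: %s (further reports suppressed)\n",
                mp->name, what);
    }
    return true;
}
```

Compared with an assertion, a failed check no longer stops the system, and the flag keeps a racy check from flooding the log.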
    • D
      xfs: catch bad block numbers freeing extents. · be65b18a
      Dave Chinner committed
      A fuzzed filesystem crashed a kernel when freeing an extent with a
      block number beyond the end of the filesystem. Convert all the debug
      asserts in xfs_free_extent() to active checks so that we catch bad
      extents and return that the filesystem is corrupted rather than
      crashing.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      be65b18a
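The difference between a debug assert and an active check can be sketched as below. This is a simplified model, not the body of xfs_free_extent(): `demo_geometry`, `demo_free_extent` and `DEMO_EFSCORRUPTED` are hypothetical, and every allocation group is assumed to have the same size.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical "filesystem corrupted" error code for this sketch. */
#ifdef EUCLEAN
#define DEMO_EFSCORRUPTED EUCLEAN
#else
#define DEMO_EFSCORRUPTED EIO
#endif

/* Simplified geometry: all allocation groups have the same size. */
struct demo_geometry {
    uint32_t ag_count;   /* number of allocation groups */
    uint32_t ag_blocks;  /* blocks per allocation group */
};

/* Validate the extent in every build, not only in debug builds, and
 * report corruption to the caller instead of crashing later. */
static int demo_free_extent(const struct demo_geometry *g,
                            uint32_t agno, uint32_t agbno, uint32_t len)
{
    if (len == 0 || agno >= g->ag_count)
        return -DEMO_EFSCORRUPTED;
    if (agbno >= g->ag_blocks || len > g->ag_blocks - agbno)
        return -DEMO_EFSCORRUPTED;

    /* A real implementation would free the extent here. */
    return 0;
}
```

The length test is written as `len > ag_blocks - agbno` so that a huge fuzzed length cannot wrap around in the addition.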
    • D
      xfs: push the AIL from memory reclaim and periodic sync · fd074841
      Dave Chinner committed
      When we are short on memory, we want to expedite the cleaning of
      dirty objects.  Hence when we run short on memory, we need to kick
      the AIL flushing into action to clean as many dirty objects as
      quickly as possible.  To implement this, sample the lsn of the log
      item at the head of the AIL and use that as the push target for the
      AIL flush.
      
      Further, we keep items in the AIL that are dirty that are not
      tracked any other way, so we can get objects sitting in the AIL that
      don't get written back until the AIL is pushed. Hence to get the
      filesystem to the idle state, we might need to push the AIL to flush
      out any remaining dirty objects sitting in the AIL. This requires
      the same push mechanism as the reclaim push.
      
      This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
      match the new xfs_ail_max_lsn() function introduced in this patch.
      Similarly for xfs_trans_ail_push -> xfs_ail_push.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      fd074841
    • D
      xfs: clean up code layout in xfs_trans_ail.c · cd4a3c50
      Dave Chinner committed
      This patch rearranges the location of functions in xfs_trans_ail.c
      to remove the need for forward declarations of those functions in
      preparation for adding new functions without the need for forward
      declarations.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      cd4a3c50
    • D
      xfs: convert the xfsaild threads to a workqueue · 0bf6a5bd
      Dave Chinner committed
      Similar to the xfssyncd, the per-filesystem xfsaild threads can be
      converted to a global workqueue and run periodically by delayed
      works. This makes sense for the AIL pushing because it uses
      variable timeouts depending on the work that needs to be done.
      
      By removing the xfsaild, we simplify the AIL pushing code and
      remove the need to spread the code to implement the threading
      and pushing across multiple files.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      0bf6a5bd
    • D
      xfs: introduce background inode reclaim work · a7b339f1
      Dave Chinner committed
      Background inode reclaim needs to run more frequently than the XFS
      syncd work is run, as 30s is too long between optimal reclaim runs.
      Add a new periodic work item to the xfs syncd workqueue to run a
      fast, non-blocking inode reclaim scan.
      
      Background inode reclaim is kicked by the act of marking inodes for
      reclaim.  When an AG is first marked as having reclaimable inodes,
      the background reclaim work is kicked. It will continue to run
      periodically until it detects that there are no more reclaimable
      inodes. It will be kicked again when the first inode is queued for
      reclaim.
      
      To ensure shrinker based inode reclaim throttles to the inode
      cleaning and reclaim rate but still reclaims inodes efficiently, make it
      kick the background inode reclaim so that when we are low on memory we are
      trying to reclaim inodes as efficiently as possible. This kick should
      not be necessary, but it will protect against failures to kick the
      background reclaim when inodes are first dirtied.
      
      To provide the rate throttling, make the shrinker pass do
      synchronous inode reclaim so that it blocks on inodes under IO. This
      means that the shrinker will reclaim inodes rather than just
      skipping over them, but it does not adversely affect the rate of
      reclaim because most dirty inodes are already under IO due to the
      background reclaim work the shrinker kicked.
      
      These two modifications solve one of the two OOM killer invocations
      Chris Mason reported recently when running a stress testing script.
      The particular workload trigger for the OOM killer invocation is
      where there are more threads than CPUs all unlinking files in an
      extremely memory constrained environment. Unlike other solutions,
      this one does not have an impact on performance when
      memory is not constrained or the number of concurrent threads
      operating is <= the number of CPUs.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      a7b339f1
    • D
      xfs: convert ENOSPC inode flushing to use new syncd workqueue · 89e4cb55
      Dave Chinner committed
      One of the problems with the current inode flush at ENOSPC is that we
      queue a flush per ENOSPC event, regardless of how many are already
      queued. This can result in hundreds of queued flushes, most of
      which simply burn CPU scanning and do no real work. This simply slows
      down allocation at ENOSPC.
      
      We really only need one active flush at a time, and we can easily
      implement that via the new xfs_syncd_wq. All we need to do is queue
      a flush if one is not already active, then block waiting for the
      currently active flush to complete. The result is that we only ever
      have a single ENOSPC inode flush active at a time and this greatly
      reduces the overhead of ENOSPC processing.
      
      On my 2p test machine, this results in tests exercising ENOSPC
      conditions running significantly faster - 042 halves execution time,
      083 drops from 60s to 5s, etc - while not introducing test
      regressions.
      
      This allows us to remove the old xfssyncd threads and infrastructure
      as they are no longer used.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      89e4cb55
    • D
      xfs: introduce a xfssyncd workqueue · c6d09b66
      Dave Chinner committed
      All of the work xfssyncd does is background functionality. There is
      no need for a thread per filesystem to do this work - it can all be
      managed by a global workqueue now that workqueues manage concurrency
      effectively.
      
      Introduce a new global xfssyncd workqueue, and convert the periodic
      work to use this new functionality. To do this, use a delayed work
      construct to schedule the next running of the periodic sync work
      for the filesystem. When the sync work is complete, queue a new
      delayed work for the next running of the sync work.
      
      For laptop mode, we wait on completion for the sync works, so ensure
      that the sync work queuing interface can flush and wait for work to
      complete to enable the work queue infrastructure to replace the
      current sequence number and wakeup that is used.
      
      Because the sync work does non-trivial amounts of work, mark the
      new work queue as CPU intensive.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      c6d09b66
    • D
      xfs: fix extent format buffer allocation size · e828776a
      Dave Chinner committed
      When formatting an inode item, we have to allocate a separate buffer
      to hold extents when there are delayed allocation extents on the
      inode and it is in extent format. The allocation size is derived
      from the in-core data fork representation, which accounts for
      delayed allocation extents, while the on-disk representation does
      not contain any delalloc extents.
      
      As a result of this mismatch, the allocated buffer can be far larger
      than needed to hold the real extent list which, due to the fact the
      inode is in extent format, is limited to the size of the literal
      area of the inode. However, we can have thousands of delalloc
      extents, resulting in an allocation size orders of magnitude larger
      than is needed to hold all the real extents.
      
      Fix this by limiting the size of the buffer being allocated to the
      size of the literal area of the inodes in the filesystem (i.e. the
      maximum size an inode fork can grow to).
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Alex Elder <aelder@sgi.com>
      e828776a
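The sizing rule described in this commit message can be illustrated with a small helper. This is one reading of "limiting the size of the buffer", not the actual patch: the function name and parameters are hypothetical, and multiplication overflow is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: choose the size of the temporary extent buffer.
 * The in-core extent count includes delayed allocation extents, so the
 * raw product can be far larger than anything an extent-format fork
 * can hold on disk. Clamp it to the literal area size. */
static size_t demo_extent_buf_size(size_t incore_extents,
                                   size_t extent_rec_size,
                                   size_t literal_area_size)
{
    size_t wanted = incore_extents * extent_rec_size;

    return wanted < literal_area_size ? wanted : literal_area_size;
}
```

With 5000 in-core extents of 16 bytes each and a 256-byte literal area, the helper returns 256 instead of 80000.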
  2. 31 Mar, 2011 1 commit
  3. 29 Mar, 2011 3 commits
  4. 28 Mar, 2011 25 commits
    • C
      Btrfs: fix __btrfs_map_block on 32 bit machines · d9d04879
      Chris Mason committed
      Recent changes for discard support didn't compile;
      this fixes them so they do not try to % 64 bit numbers.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      d9d04879
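On 32-bit kernels the `%` operator on a 64-bit value needs a compiler helper that the kernel does not provide, and the usual replacement is the kernel's `do_div()` macro, which divides a 64-bit value in place by a 32-bit base and returns the remainder. The sketch below models that contract in userspace. `demo_do_div` and `demo_stripe_index` are hypothetical names, and the stand-in itself uses `%`, which is fine outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in with the same contract as the kernel's do_div():
 * divide *n by a 32-bit base in place and return the remainder. */
static uint32_t demo_do_div(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}

/* Hypothetical caller: pick a stripe index from a 64-bit stripe number
 * without writing "stripe_nr % num_stripes" directly. */
static uint32_t demo_stripe_index(uint64_t stripe_nr, uint32_t num_stripes)
{
    uint64_t tmp = stripe_nr;   /* do_div() modifies its first argument */

    return demo_do_div(&tmp, num_stripes);
}
```

Because the dividend is modified in place, callers that still need the original value work on a copy, as `demo_stripe_index` does.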
    • M
      btrfs: fix possible deadlock by clearing __GFP_FS flag · 1561deda
      Miao Xie committed
      Using the GFP_HIGHUSER_MOVABLE flag to allocate the metadata's page may cause
      deadlock.
        Task1
        open()
          ...
          btrfs_search_slot()
            ...
            btrfs_cow_block()
      	...
      	alloc_page()
      	  wait for reclaiming
      					shrink_slab()
      					  ...
      					  shrink_icache_memory()
      					    ...
      					    btrfs_evict_inode()
      					      ...
      					      btrfs_search_slot()
      
      If the path is locked by task1, the deadlock happens.
      
      Because the btree's page cache is different from the file's page cache, it
      cannot allocate pages with the GFP_HIGHUSER_MOVABLE flag as is; we must clear
      the __GFP_FS flag in GFP_HIGHUSER_MOVABLE.
      Reported-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      1561deda
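Clearing one flag from a composite allocation mask is a plain bit operation, sketched here. The `DEMO_*` bit values are invented for illustration and do not match the kernel's real `__GFP_*` definitions; only the masking idiom is the point.

```c
#include <assert.h>

/* Hypothetical flag bits; the real __GFP_* values differ. */
#define DEMO_GFP_WAIT     0x01u
#define DEMO_GFP_IO       0x02u
#define DEMO_GFP_FS       0x04u   /* allocation may re-enter the filesystem */
#define DEMO_GFP_HIGHMEM  0x08u
#define DEMO_GFP_MOVABLE  0x10u

#define DEMO_GFP_HIGHUSER_MOVABLE \
    (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS | \
     DEMO_GFP_HIGHMEM | DEMO_GFP_MOVABLE)

/* Mask for btree (metadata) pages: everything the composite mask
 * allows except re-entering the filesystem during reclaim. */
static unsigned int demo_btree_gfp_mask(unsigned int mask)
{
    return mask & ~DEMO_GFP_FS;
}
```

With the FS bit cleared, reclaim triggered by the allocation is not allowed to call back into filesystem code, which is the path that deadlocks in the trace above.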
    • A
      btrfs: check link counter overflow in link(2) · c055e99e
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      c055e99e
    • A
      btrfs: don't mess with i_nlink of unlocked inode in rename() · 92986796
      Al Viro committed
      old_inode is not locked; it's not safe to play with its link
      count.  Instead of bumping it and calling btrfs_unlink_inode(),
      add a variant of the latter that does not do btrfs_drop_nlink()/
      btrfs_update_inode(), call it instead of btrfs_inc_nlink()/
      btrfs_unlink_inode() and do btrfs_update_inode() ourselves.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      92986796
    • T
      Btrfs: check return value of btrfs_alloc_path() · c2db1073
      Tsutomu Itoh committed
      Add checks on the return value of btrfs_alloc_path() in several places.
      Some of the callers are modified as part of this change.
      Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      c2db1073
    • L
      Btrfs: fix OOPS of empty filesystem after balance · c59021f8
      liubo committed
      btrfs will remove unused block groups after balance.
      When an empty filesystem is balanced, the block group with tag "DATA" may be
      dropped, and after umount and mount again, it will not find the "DATA" space_info,
      which leads to an OOPS.
      So we initialize the necessary space_infos (DATA, SYSTEM, METADATA) to avoid the OOPS.
      Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      c59021f8
    • L
      Btrfs: fix memory leak of empty filesystem after balance · 9f7c43c9
      liubo committed
      After Josef's patch (commit 3c14874a),
      btrfs will exclude super bytes when reading block groups (by marking an extent
      state UPTODATE).  However, these bytes do not get freed when balance removes
      unused block groups, and we won't process those removed ones any more, so when
      we umount and unload the btrfs module, btrfs hits a memory leak.
      
      This patch adds the missing free operation.
      
      Reproduce steps:
      $ mkfs.btrfs disk
      $ mount disk /mnt/btrfs -o loop
      $ btrfs filesystem balance /mnt/btrfs
      $ umount /mnt/btrfs
      $ rmmod btrfs
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      9f7c43c9
    • L
      Btrfs: fix return value of setflags ioctl · 2d4e6f6a
      liubo committed
      The setflags ioctl should return an error when any check fails.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      2d4e6f6a
    • Y
      Btrfs: fix uncheck memory allocations · dac97e51
      Yoshinori Sano committed
      To make Btrfs code more robust, several return value checks where memory
      allocation can fail are introduced. I use BUG_ON where I don't know how
      to handle the error properly, although this increases the number of uses
      of the notorious BUG_ON.
      Signed-off-by: Yoshinori Sano <yoshinori.sano@gmail.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      dac97e51
    • L
      btrfs: make inode ref log recovery faster · c622ae60
      liubo committed
      When we recover from crash via write-ahead log tree and process
      the inode refs, for each btrfs_inode_ref item, we will
      1) check if we already have a perfect match in fs/file tree, if
         we have, then we're done.
      2) search the corresponding back reference in fs/file tree, and
         check all the names in this back reference to see if they are
         also in the log to avoid conflict corners.
      3) recover the logged inode refs to fs/file tree.
      
      In current btrfs, however,
      - for 2)'s check, once is enough, since the checked back reference
        will remain unchanged after processing all the inode refs belonging
        to the key.
      - there is no need to do another 1) between 2) and 3).
      
      I've made a small test to show how it improves:
      
      $dd if=/dev/zero of=foobar bs=4K count=1
      $sync
      $make 100 hard links continuously, like ln foobar link_i
      $fsync foobar
      $echo b > /proc/sysrq-trigger
      after reboot
      $time mount DEV PATH
      
      without patch:
      real    0m0.285s
      user    0m0.001s
      sys     0m0.009s
      
      with patch:
      real    0m0.123s
      user    0m0.000s
      sys     0m0.010s
      
      Changelog v1->v2:
      - fix double free - pointed by David Sterba
      Changelog v2->v3:
      - adjust free order
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      c622ae60
    • L
      Btrfs: add btrfs_trim_fs() to handle FITRIM · f7039b1d
      Li Dongyang committed
      We take a free extent out of the allocator, trim it, then put it back.
      Before we trim the block group, we should make sure the block group is
      cached, so this also includes a small change to make cache_block_group()
      run without a transaction.
      Signed-off-by: Li Dongyang <lidongyang@novell.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      f7039b1d
    • L
      Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes · 5378e607
      Li Dongyang committed
      Callers of btrfs_discard_extent() should check if we are mounted with -o discard,
      as we want fitrim to work even if the fs is not mounted with -o discard.
      Also we should use REQ_DISCARD to map the free extent to get a full mapping.
      Finally, we only return errors if
      1. the error is not EOPNOTSUPP
      2. no device supports discard
      Signed-off-by: Li Dongyang <lidongyang@novell.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      5378e607
    • L
      Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP · fce3bb9a
      Li Dongyang committed
      btrfs_map_block() will only return a single stripe length, but we want the
      full extent to be mapped to each disk when we are trimming the extent,
      so we add a length to btrfs_bio_stripe and fill it in if we are mapping for REQ_DISCARD.
      Signed-off-by: Li Dongyang <lidongyang@novell.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      fce3bb9a
    • L
      Btrfs: make update_reserved_bytes() public · b4d00d56
      Li Dongyang committed
      Make the function public as we should update the reserved extents calculations
      after taking out an extent for trimming.
      Signed-off-by: Li Dongyang <lidongyang@novell.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      b4d00d56
    • M
      btrfs: return EXDEV when linking from different subvolumes · 3ab3564f
      Mark Fasheh committed
      btrfs_link returns EPERM if a cross-subvolume link is attempted.
      
      However, in this case I believe EXDEV to be the more appropriate value.
      From the link(2) man page:
      
      EXDEV  oldpath and newpath are not on the same mounted file system.  (Linux
             permits a file system to be mounted at multiple points, but link()
             does not work across different mount points, even if the same file
             system is mounted on both.)
      
      This matters because an application may have different behaviors based on
      return codes.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      3ab3564f
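An application that reacts differently to EXDEV and to other errors might be structured like this sketch. The enum and function names are hypothetical. The pattern shown is the common one of falling back to a copy when a hard link cannot cross a boundary, while treating other errors such as EPERM as real failures.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical outcome of a hard link attempt. */
enum demo_link_result {
    DEMO_LINKED,      /* link(2) succeeded */
    DEMO_NEED_COPY,   /* EXDEV: fall back to copying the file */
    DEMO_FAILED       /* any other error, e.g. EPERM */
};

/* Map an errno value from link(2) to the action the caller takes. */
static enum demo_link_result demo_classify_link_errno(int err)
{
    if (err == 0)
        return DEMO_LINKED;
    if (err == EXDEV)
        return DEMO_NEED_COPY;
    return DEMO_FAILED;
}

/* Try to hard link newpath to oldpath and classify the result. */
static enum demo_link_result demo_try_link(const char *oldpath,
                                           const char *newpath)
{
    if (link(oldpath, newpath) == 0)
        return DEMO_LINKED;
    return demo_classify_link_errno(errno);
}
```

A caller written this way copies the file when a cross-subvolume link returns EXDEV, but gives up if the same attempt returns EPERM.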
    • L
      Btrfs: Per file/directory controls for COW and compression · 75e7cb7f
      Liu Bo committed
      Data compression and data cow are controlled across the entire FS by mount
      options right now.  ioctls are needed to set this on a per file or per
      directory basis.  This has been proposed previously, but VFS developers
      wanted us to use generic ioctls rather than btrfs-specific ones.
      
      According to Chris's comment, there should be just one true compression
      method (probably LZO) stored in the super.  However, before doing this, we
      should wait until that method is stable enough to be adopted into the super.
      So I list it as a long term goal, and just store it in RAM today.
      
      After applying this patch, we can use the generic "FS_IOC_SETFLAGS" ioctl to
      control file and directory's datacow and compression attribute.
      
      NOTE:
       - The compression type is selected by these rules:
         If we mount btrfs with a compress option, i.e. zlib/lzo, that type is used.
         Otherwise, we'll use the default compress type (zlib today).
      
      v1->v2:
      - rebase to the latest btrfs.
      v2->v3:
      - fix a problem, i.e. when a file is set NOCOW via mount option, then this NOCOW
        will be screwed by inheritance from parent directory.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      75e7cb7f
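From userspace, the generic flags ioctls are typically used as a read-modify-write pair. The sketch below is Linux-only and makes some assumptions: `demo_add_inode_flags` is a hypothetical helper, and which flag bits a given btrfs version honours (for example `FS_NOCOW_FL` or `FS_COMPR_FL` from `<linux/fs.h>`) is not stated in the commit message above.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>

/* Hypothetical helper: OR extra flag bits into a file's or directory's
 * inode flags. Returns 0 on success and -1 on any failure. */
static int demo_add_inode_flags(const char *path, int extra)
{
    int fd = open(path, O_RDONLY);
    int flags = 0;
    int ret = -1;

    if (fd < 0)
        return -1;

    /* Read the current flags first so existing bits are preserved. */
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) == 0) {
        flags |= extra;
        if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0)
            ret = 0;
    }

    close(fd);
    return ret;
}
```

A call such as `demo_add_inode_flags("/mnt/btrfs/dir", FS_NOCOW_FL)` would request no-COW behaviour for that directory, assuming the filesystem accepts the flag. The path here is only an example.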
    • M
      btrfs: use GFP_NOFS instead of GFP_KERNEL · fc0e4a31
      Miao Xie committed
      In the filesystem context, we must allocate memory with GFP_NOFS,
      or we may start another filesystem operation and make the kswapd thread hang.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      fc0e4a31
    • T
      Btrfs: check return value of read_tree_block() · 97d9a8a4
      Tsutomu Itoh committed
      This patch checks the return value of read_tree_block() and
      performs error handling if it is NULL.
      Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      97d9a8a4
    • D
      btrfs: properly access unaligned checksum buffer · 7e75bf3f
      David Sterba committed
      On Fri, Mar 18, 2011 at 11:56:53AM -0400, Chris Mason wrote:
      > Thanks for fielding this one.  Does put_unaligned_le32 optimize away on
      > platforms with efficient access?  It would be great if we didn't need
      > the #ifdef.
      
      (quicktest: assembly output is same for put_unaligned_le32 and direct
      assignment on my x86_64)
      I was originally following examples in
      Documentation/unaligned-memory-access.txt. From other code it seems to me that
      the define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is intended for larger
      portions of code. Macros/wrappers for {put,get}_unaligned* are chosen via
      arch/<arch>/include/asm/unaligned.h accordingly, therefore it's safe to use
      put_unaligned_le32 without the ifdef.
      
      dave
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      7e75bf3f
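What an unaligned little-endian store has to guarantee can be shown with a portable userspace model. The `demo_` functions below are stand-ins written for this illustration, not the kernel's `put_unaligned_le32()` implementation. They assemble the value byte by byte, so they work at any address and on any host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian order at an address that may
 * not be 4-byte aligned. Byte stores never fault on alignment. */
static void demo_put_unaligned_le32(uint32_t val, void *p)
{
    uint8_t *b = p;

    b[0] = (uint8_t)(val);
    b[1] = (uint8_t)(val >> 8);
    b[2] = (uint8_t)(val >> 16);
    b[3] = (uint8_t)(val >> 24);
}

/* Matching load: rebuild the value from its four bytes. */
static uint32_t demo_get_unaligned_le32(const void *p)
{
    const uint8_t *b = p;

    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}
```

On architectures with efficient unaligned access a compiler can usually reduce such helpers to a single store or load, which is consistent with the quick test mentioned in the message above.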
    • T
      Btrfs: cleanup some BUG_ON() · db5b493a
      Tsutomu Itoh committed
      This patch changes some BUG_ON() calls to error returns.
      (However, most callers still use BUG_ON().)
      Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      db5b493a
    • L
      Btrfs: add initial tracepoint support for btrfs · 1abe9b8a
      liubo committed
      Tracepoints can provide insight into why btrfs hits bugs and be greatly
      helpful for debugging, e.g.
                    dd-7822  [000]  2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0
                    dd-7822  [000]  2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0
       btrfs-transacti-7804  [001]  2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0)
       btrfs-transacti-7804  [001]  2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0)
       btrfs-transacti-7804  [001]  2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8
         flush-btrfs-2-7821  [001]  2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA
         flush-btrfs-2-7821  [001]  2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0)
         flush-btrfs-2-7821  [001]  2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0)
         flush-btrfs-2-7821  [000]  2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0)
       btrfs-endio-wri-7800  [001]  2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0)
       btrfs-endio-wri-7800  [001]  2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0)
      
      Here is what I have added:
      
      1) ordered_extent:
              btrfs_ordered_extent_add
              btrfs_ordered_extent_remove
              btrfs_ordered_extent_start
              btrfs_ordered_extent_put
      
      These provide critical information to understand how ordered_extents are
      updated.
      
      2) extent_map:
              btrfs_get_extent
      
      extent_map is used in both read and write cases, and it is useful for tracking
      how btrfs specific IO is running.
      
      3) writepage:
              __extent_writepage
              btrfs_writepage_end_io_hook
      
      Pages are critical resources and produce a lot of corner cases during writeback,
      so it is valuable to know how a page is written to disk.
      
      4) inode:
              btrfs_inode_new
              btrfs_inode_request
              btrfs_inode_evict
      
      These can show where and when an inode is created and when an inode is evicted.
      
      5) sync:
              btrfs_sync_file
              btrfs_sync_fs
      
      These show sync arguments.
      
      6) transaction:
              btrfs_transaction_commit
      
      In a transaction based filesystem, it is useful to know the generation and
      who does the commit.
      
      7) back reference and cow:
      	btrfs_delayed_tree_ref
      	btrfs_delayed_data_ref
      	btrfs_delayed_ref_head
      	btrfs_cow_block
      
      Btrfs natively supports back references; these tracepoints are helpful for
      understanding btrfs's COW mechanism.
      
      8) chunk:
      	btrfs_chunk_alloc
      	btrfs_chunk_free
      
      A chunk is a link between physical offset and logical offset, and stands for space
      information in btrfs; these tracepoints are helpful for tracing space usage.
      
      9) reserved_extent:
      	btrfs_reserved_extent_alloc
      	btrfs_reserved_extent_free
      
      These can show how btrfs uses its space.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      1abe9b8a
    • C
      Btrfs: use RCU instead of a spinlock to protect the root node · 240f62c8
      Chris Mason committed
      The pointer to the extent buffer for the root of each tree
      is protected by a spinlock so that we can safely read the pointer
      and take a reference on the extent buffer.
      
      But now that the extent buffers are freed via RCU, we can safely
      use rcu_read_lock instead.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      240f62c8
    • R
      eCryptfs: write lock requested keys · b5695d04
      Roberto Sassu committed
      A requested key is write locked in order to prevent modifications on the
      authentication token while it is being used.
      Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
      Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
      b5695d04
    • R
      eCryptfs: move ecryptfs_find_auth_tok_for_sig() call before mutex_lock · 950983fc
      Roberto Sassu committed
      The ecryptfs_find_auth_tok_for_sig() call is moved before the
      mutex_lock(s->tfm_mutex) call in order to avoid possible deadlocks
      that may occur by holding the lock on the two semaphores 'key->sem' and
      's->tfm_mutex' in reverse order.
      Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
      Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
      950983fc
    • R
      eCryptfs: verify authentication tokens before their use · 0e1fc5ef
      Roberto Sassu committed
      The content of an authentication token may change if another requestor calls the
      update() method of the corresponding key. The new function
      ecryptfs_verify_auth_tok_from_key() retrieves the authentication token from
      the provided key and verifies that it is still valid before it is used to
      encrypt or decrypt an eCryptfs file.
      Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
      [tyhicks: Minor formatting changes]
      Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
      0e1fc5ef