提交 · 957935dcd8e11d6f789b4ed769b376040e15565b · openanolis / cloud-kernel

08 4月, 2011 11 次提交

由 Christoph Hellwig 提交于 4月 02, 2011

For a CONFIG_XFS_DEBUG=n build gcc complains about statements with no
effect in xfs_debug:

fs/xfs/quota/xfs_qm_syscalls.c: In function 'xfs_qm_scall_trunc_qfiles':
fs/xfs/quota/xfs_qm_syscalls.c:291:3: warning: statement with no effect

The reason for that is that the various new xfs message functions have a
return value which is never used, and in case of the non-debug build
xfs_debug the macro evaluates to a plain 0 which produces the above
warnings.  This can be fixed by turning xfs_debug into an inline function
instead of a macro, but in addition to that I've also changed all the
message helpers to return void as we never use their return values.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>

957935dc

xfs: fix variable set but not used warnings · ecb697c1

由 Christoph Hellwig 提交于 4月 04, 2011

GCC 4.6 now warnings about variables set but not used.  Fix the trivially
fixable warnings of this sort.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAlex Elder <aelder@sgi.com>

ecb697c1

xfs: convert log tail checking to a warning · da8a1a4a

由 Dave Chinner 提交于 4月 08, 2011

On the Power platform, the log tail debug checks fire excessively
causing the system to panic early in testing. The debug checks are
known to be racy, though on x86_64 there is no evidence that they
trigger at all.

We want to keep the checks active on debug systems to alert us to
problems with log space accounting, but we need to reduce the impact
of a racy check on testing on the Power platform.

As a result, convert the ASSERT conditions to warnings, and
allow them to fire only once per filesystem mount. This will prevent
false positives from interfering with testing, whilst still
providing us with the indication that they may be a problem with log
space accounting should that occur.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

da8a1a4a

xfs: catch bad block numbers freeing extents. · be65b18a

由 Dave Chinner 提交于 4月 08, 2011

A fuzzed filesystem crashed a kernel when freeing an extent with a
block number beyond the end of the filesystem. Convert all the debug
asserts in xfs_free_extent() to active checks so that we catch bad
extents and return that the filesytsem is corrupted rather than
crashing.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

be65b18a

xfs: push the AIL from memory reclaim and periodic sync · fd074841

由 Dave Chinner 提交于 4月 08, 2011

When we are short on memory, we want to expedite the cleaning of
dirty objects.  Hence when we run short on memory, we need to kick
the AIL flushing into action to clean as many dirty objects as
quickly as possible.  To implement this, sample the lsn of the log
item at the head of the AIL and use that as the push target for the
AIL flush.

Further, we keep items in the AIL that are dirty that are not
tracked any other way, so we can get objects sitting in the AIL that
don't get written back until the AIL is pushed. Hence to get the
filesystem to the idle state, we might need to push the AIL to flush
out any remaining dirty objects sitting in the AIL. This requires
the same push mechanism as the reclaim push.

This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
match the new xfs_ail_max_lsn() function introduced in this patch.
Similarly for xfs_trans_ail_push -> xfs_ail_push.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NAlex Elder <aelder@sgi.com>

fd074841

xfs: clean up code layout in xfs_trans_ail.c · cd4a3c50

由 Dave Chinner 提交于 4月 08, 2011

This patch rearranges the location of functions in xfs_trans_ail.c
to remove the need for forward declarations of those functions in
preparation for adding new functions without the need for forward
declarations.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NAlex Elder <aelder@sgi.com>

cd4a3c50

xfs: convert the xfsaild threads to a workqueue · 0bf6a5bd

由 Dave Chinner 提交于 4月 08, 2011

Similar to the xfssyncd, the per-filesystem xfsaild threads can be
converted to a global workqueue and run periodically by delayed
works. This makes sense for the AIL pushing because it uses
variable timeouts depending on the work that needs to be done.

By removing the xfsaild, we simplify the AIL pushing code and
remove the need to spread the code to implement the threading
and pushing across multiple files.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

0bf6a5bd

xfs: introduce background inode reclaim work · a7b339f1

由 Dave Chinner 提交于 4月 08, 2011

Background inode reclaim needs to run more frequently that the XFS
syncd work is run as 30s is too long between optimal reclaim runs.
Add a new periodic work item to the xfs syncd workqueue to run a
fast, non-blocking inode reclaim scan.

Background inode reclaim is kicked by the act of marking inodes for
reclaim.  When an AG is first marked as having reclaimable inodes,
the background reclaim work is kicked. It will continue to run
periodically untill it detects that there are no more reclaimable
inodes. It will be kicked again when the first inode is queued for
reclaim.

To ensure shrinker based inode reclaim throttles to the inode
cleaning and reclaim rate but still reclaim inodes efficiently, make it kick the
background inode reclaim so that when we are low on memory we are
trying to reclaim inodes as efficiently as possible. This kick shoul
d not be necessary, but it will protect against failures to kick the
background reclaim when inodes are first dirtied.

To provide the rate throttling, make the shrinker pass do
synchronous inode reclaim so that it blocks on inodes under IO. This
means that the shrinker will reclaim inodes rather than just
skipping over them, but it does not adversely affect the rate of
reclaim because most dirty inodes are already under IO due to the
background reclaim work the shrinker kicked.

These two modifications solve one of the two OOM killer invocations
Chris Mason reported recently when running a stress testing script.
The particular workload trigger for the OOM killer invocation is
where there are more threads than CPUs all unlinking files in an
extremely memory constrained environment. Unlike other solutions,
this one does not have a performance impact on performance when
memory is not constrained or the number of concurrent threads
operating is <= to the number of CPUs.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

a7b339f1

xfs: convert ENOSPC inode flushing to use new syncd workqueue · 89e4cb55

由 Dave Chinner 提交于 4月 08, 2011

On of the problems with the current inode flush at ENOSPC is that we
queue a flush per ENOSPC event, regardless of how many are already
queued. Thi can result in    hundreds of queued flushes, most of
which simply burn CPU scanned and do no real work. This simply slows
down allocation at ENOSPC.

We really only need one active flush at a time, and we can easily
implement that via the new xfs_syncd_wq. All we need to do is queue
a flush if one is not already active, then block waiting for the
currently active flush to complete. The result is that we only ever
have a single ENOSPC inode flush active at a time and this greatly
reduces the overhead of ENOSPC processing.

On my 2p test machine, this results in tests exercising ENOSPC
conditions running significantly faster - 042 halves execution time,
083 drops from 60s to 5s, etc - while not introducing test
regressions.

This allows us to remove the old xfssyncd threads and infrastructure
as they are no longer used.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

89e4cb55

xfs: introduce a xfssyncd workqueue · c6d09b66

由 Dave Chinner 提交于 4月 08, 2011

All of the work xfssyncd does is background functionality. There is
no need for a thread per filesystem to do this work - it can al be
managed by a global workqueue now they manage concurrency
effectively.

Introduce a new gglobal xfssyncd workqueue, and convert the periodic
work to use this new functionality. To do this, use a delayed work
construct to schedule the next running of the periodic sync work
for the filesystem. When the sync work is complete, queue a new
delayed work for the next running of the sync work.

For laptop mode, we wait on completion for the sync works, so ensure
that the sync work queuing interface can flush and wait for work to
complete to enable the work queue infrastructure to replace the
current sequence number and wakeup that is used.

Because the sync work does non-trivial amounts of work, mark the
new work queue as CPU intensive.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAlex Elder <aelder@sgi.com>

c6d09b66

xfs: fix extent format buffer allocation size · e828776a

由 Dave Chinner 提交于 4月 08, 2011

When formatting an inode item, we have to allocate a separate buffer
to hold extents when there are delayed allocation extents on the
inode and it is in extent format. The allocation size is derived
from the in-core data fork representation, which accounts for
delayed allocation extents, while the on-disk representation does
not contain any delalloc extents.

As a result of this mismatch, the allocated buffer can be far larger
than needed to hold the real extent list which, due to the fact the
inode is in extent format, is limited to the size of the literal
area of the inode. However, we can have thousands of delalloc
extents, resulting in an allocation size orders of magnitude larger
than is needed to hold all the real extents.

Fix this by limiting the size of the buffer being allocated to the
size of the literal area of the inodes in the filesystem (i.e. the
maximum size an inode fork can grow to).
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NAlex Elder <aelder@sgi.com>

e828776a

31 3月, 2011 1 次提交
- D
  xfs: fix unreferenced var error in xfs_buf.c · 89b3600c
  由 Dave Chinner 提交于 3月 29, 2011
```
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NAlex Elder <aelder@sgi.com>
```
  89b3600c
29 3月, 2011 3 次提交

fs: don't use igrab() while holding i_lock · 0444d76a

由 Dave Chinner 提交于 3月 29, 2011

Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph‥

If we are already holding the i_lock, we have a reference to the
inode so we can safely use ihold() to gain an extra reference. This
avoids hangs due to lock recursion on the i_lock now that the
inode_lock is gone and igrab() uses the i_lock itself.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0444d76a

Treat writes as new when holes span across page boundaries · 272b62c1

由 Goldwyn Rodrigues 提交于 2月 17, 2011

When a hole spans across page boundaries, the next write forces
a read of the block. This could end up reading existing garbage
data from the disk in ocfs2_map_page_blocks. This leads to
non-zero holes. In order to avoid this, mark the writes as new
when the holes span across page boundaries.
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.de>
Signed-off-by: Njlbec <jlbec@evilplan.org>

272b62c1

fs,ocfs2: Move o2net_get_func_run_time under CONFIG_OCFS2_FS_STATS. · ed59992e

由 Rakib Mullick 提交于 3月 18, 2011

When CONFIG_DEBUG_FS=y and CONFIG_OCFS2_FS_STATS=n, we get the
following warning:

fs/ocfs2/cluster/tcp.c:213:16: warning: ‘o2net_get_func_run_time’
defined but not used

Since o2net_get_func_run_time is only called from
o2net_update_recv_stats, so move it under CONFIG_OCFS2_FS_STATS.
Signed-off-by: NRakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Njlbec <jlbec@evilplan.org>

ed59992e

28 3月, 2011 25 次提交

Btrfs: fix __btrfs_map_block on 32 bit machines · d9d04879

由 Chris Mason 提交于 3月 27, 2011

Recent changes for discard support didn't compile,
this fixes them not to try and % 64 bit numbers.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d9d04879

btrfs: fix possible deadlock by clearing __GFP_FS flag · 1561deda

由 Miao Xie 提交于 3月 27, 2011

Using the GFP_HIGHUSER_MOVABLE flag to allocate the metadata's page may cause
deadlock.
  Task1
  open()
    ...
    btrfs_search_slot()
      ...
      btrfs_cow_block()
	...
	alloc_page()
	  wait for reclaiming
					shrink_slab()
					  ...
					  shrink_icache_memory()
					    ...
					    btrfs_evict_inode()
					      ...
					      btrfs_search_slot()

If the path is locked by task1, the deadlock happens.

So the btree's page cache is different with the file's page cache, it can not
allocate pages by GFP_HIGHUSER_MOVABLE flag, we must clear __GFP_FS flag in
GFP_HIGHUSER_MOVABLE flag.
Reported-by: NItaru Kitayama <kitayama@cl.bb4u.ne.jp>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1561deda

btrfs: check link counter overflow in link(2) · c055e99e

由 Al Viro 提交于 3月 04, 2011

Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c055e99e

btrfs: don't mess with i_nlink of unlocked inode in rename() · 92986796

由 Al Viro 提交于 3月 04, 2011

old_inode is not locked; it's not safe to play with its link
count.  Instead of bumping it and calling btrfs_unlink_inode(),
add a variant of the latter that does not do btrfs_drop_nlink()/
btrfs_update_inode(), call it instead of btrfs_inc_nlink()/
btrfs_unlink_inode() and do btrfs_update_inode() ourselves.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

92986796

Btrfs: check return value of btrfs_alloc_path() · c2db1073

由 Tsutomu Itoh 提交于 3月 01, 2011

Adding the check on the return value of btrfs_alloc_path() to several places.
And, some of callers are modified by this change.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c2db1073

Btrfs: fix OOPS of empty filesystem after balance · c59021f8

由 liubo 提交于 3月 07, 2011

btrfs will remove unused block groups after balance.
When a empty filesystem is balanced, the block group with tag "DATA" may be
dropped, and after umount and mount again, it will not find "DATA" space_info
and lead to OOPS.
So we initial the necessary space_infos(DATA, SYSTEM, METADATA) to avoid OOPS.
Reported-by: NDaniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c59021f8

Btrfs: fix memory leak of empty filesystem after balance · 9f7c43c9

由 liubo 提交于 3月 07, 2011

After Josef's patch(commit 3c14874a),
btrfs will exclude super bytes when reading block groups(by marking a extent
state UPTODATE).  However, these bytes do not get freed while balance remove
unused block groups, and we won't process those removed ones any more, when
we do umount and unload the btrfs module,  btrfs hits a memory leak.

This patch add the missing free operation.

Reproduce steps:
$ mkfs.btrfs disk
$ mount disk /mnt/btrfs -o loop
$ btrfs filesystem balance /mnt/btrfs
$ umount /mnt/btrfs
$ rmmod btrfs
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9f7c43c9

Btrfs: fix return value of setflags ioctl · 2d4e6f6a

由 liubo 提交于 2月 24, 2011

setflags ioctl should return error when any checks fail.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

2d4e6f6a

Btrfs: fix uncheck memory allocations · dac97e51

由 Yoshinori Sano 提交于 2月 15, 2011

To make Btrfs code more robust, several return value checks where memory
allocation can fail are introduced. I use BUG_ON where I don't know how
to handle the error properly, which increases the number of using the
notorious BUG_ON, though.
Signed-off-by: NYoshinori Sano <yoshinori.sano@gmail.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

dac97e51

btrfs: make inode ref log recovery faster · c622ae60

由 liubo 提交于 3月 26, 2011

When we recover from crash via write-ahead log tree and process
the inode refs, for each btrfs_inode_ref item, we will
1) check if we already have a perfect match in fs/file tree, if
   we have, then we're done.
2) search the corresponding back reference in fs/file tree, and
   check all the names in this back reference to see if they are
   also in the log to avoid conflict corners.
3) recover the logged inode refs to fs/file tree.

In current btrfs, however,
- for 2)'s check, once is enough, since the checked back reference
  will remain unchanged after processing all the inode refs belonged
  to the key.
- it has no need to do another 1) between 2) and 3).

I've made a small test to show how it improves,

$dd if=/dev/zero of=foobar bs=4K count=1
$sync
$make 100 hard links continuously, like ln foobar link_i
$fsync foobar
$echo b > /proc/sysrq-trigger
after reboot
$time mount DEV PATH

without patch:
real    0m0.285s
user    0m0.001s
sys     0m0.009s

with patch:
real    0m0.123s
user    0m0.000s
sys     0m0.010s

Changelog v1->v2:
- fix double free - pointed by David Sterba
Changelog v2->v3:
- adjust free order
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c622ae60

Btrfs: add btrfs_trim_fs() to handle FITRIM · f7039b1d

由 Li Dongyang 提交于 3月 24, 2011

We take an free extent out from allocator, trim it, then put it back,
but before we trim the block group, we should make sure the block group is
cached, so plus a little change to make cache_block_group() run without a
transaction.
Signed-off-by: NLi Dongyang <lidongyang@novell.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f7039b1d

Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes · 5378e607

由 Li Dongyang 提交于 3月 24, 2011

Callers of btrfs_discard_extent() should check if we are mounted with -o discard,
as we want to make fitrim to work even the fs is not mounted with -o discard.
Also we should use REQ_DISCARD to map the free extent to get a full mapping,
last we only return errors if
1. the error is not a EOPNOTSUPP
2. no device supports discard
Signed-off-by: NLi Dongyang <lidongyang@novell.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

5378e607

Btrfs: make btrfs_map_block() return entire free extent for each device of RAID0/1/10/DUP · fce3bb9a

由 Li Dongyang 提交于 3月 24, 2011

btrfs_map_block() will only return a single stripe length, but we want the
full extent be mapped to each disk when we are trimming the extent,
so we add length to btrfs_bio_stripe and fill it if we are mapping for REQ_DISCARD.
Signed-off-by: NLi Dongyang <lidongyang@novell.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

fce3bb9a

Btrfs: make update_reserved_bytes() public · b4d00d56

由 Li Dongyang 提交于 3月 24, 2011

Make the function public as we should update the reserved extents calculations
after taking out an extent for trimming.
Signed-off-by: NLi Dongyang <lidongyang@novell.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b4d00d56

btrfs: return EXDEV when linking from different subvolumes · 3ab3564f

由 Mark Fasheh 提交于 3月 22, 2011

btrfs_link returns EPERM if a cross-subvolume link is attempted.

However, in this case I believe EXDEV to be the more appropriate value.
>From the link(2) man page:

EXDEV  oldpath and newpath are not on the same mounted file system.  (Linux
       permits a file system to be mounted at multiple points, but link()
       does not work across different mount points, even if the same file
       system is mounted on both.)

This matters because an application may have different behaviors based on
return codes.
Signed-off-by: NMark Fasheh <mfasheh@suse.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

3ab3564f

Btrfs: Per file/directory controls for COW and compression · 75e7cb7f

由 Liu Bo 提交于 3月 22, 2011

Data compression and data cow are controlled across the entire FS by mount
options right now.  ioctls are needed to set this on a per file or per
directory basis.  This has been proposed previously, but VFS developers
wanted us to use generic ioctls rather than btrfs-specific ones.

According to Chris's comment, there should be just one true compression
method(probably LZO) stored in the super.  However, before this, we would
wait for that one method is stable enough to be adopted into the super.
So I list it as a long term goal, and just store it in ram today.

After applying this patch, we can use the generic "FS_IOC_SETFLAGS" ioctl to
control file and directory's datacow and compression attribute.

NOTE:
 - The compression type is selected by such rules:
   If we mount btrfs with compress options, ie, zlib/lzo, the type is it.
   Otherwise, we'll use the default compress type (zlib today).

v1->v2:
- rebase to the latest btrfs.
v2->v3:
- fix a problem, i.e. when a file is set NOCOW via mount option, then this NOCOW
  will be screwed by inheritance from parent directory.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

75e7cb7f

btrfs: use GFP_NOFS instead of GFP_KERNEL · fc0e4a31

由 Miao Xie 提交于 3月 24, 2011

In the filesystem context, we must allocate memory by GFP_NOFS,
or we may start another filesystem operation and make kswap thread hang up.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

fc0e4a31

Btrfs: check return value of read_tree_block() · 97d9a8a4

由 Tsutomu Itoh 提交于 3月 24, 2011

This patch is checking return value of read_tree_block(),
and if it is NULL, error processing.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

97d9a8a4

btrfs: properly access unaligned checksum buffer · 7e75bf3f

由 David Sterba 提交于 3月 18, 2011

On Fri, Mar 18, 2011 at 11:56:53AM -0400, Chris Mason wrote:
> Thanks for fielding this one.  Does put_unaligned_le32 optimize away on
> platforms with efficient access?  It would be great if we didn't need
> the #ifdef.

(quicktest: assembly output is same for put_unaligned_le32 and direct
assignment on my x86_64)
I was originally following examples in
Documentation/unaligned-memory-access.txt. From other code it seems to me that
the define CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is intended for larger
portions of code. Macros/wrappers for {put,get}_unaligned* are chosen via
arch/<arch>/include/asm/unaligned.h accordingly, therefore it's safe to use
put_unaligned_le32 without the ifdef.

dave
Signed-off-by: NChris Mason <chris.mason@oracle.com>

7e75bf3f

Btrfs: cleanup some BUG_ON() · db5b493a

由 Tsutomu Itoh 提交于 3月 23, 2011

This patch changes some BUG_ON() to the error return.
(but, most callers still use BUG_ON())
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

db5b493a

Btrfs: add initial tracepoint support for btrfs · 1abe9b8a

由 liubo 提交于 3月 24, 2011

Tracepoints can provide insight into why btrfs hits bugs and be greatly
helpful for debugging, e.g
              dd-7822  [000]  2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0
              dd-7822  [000]  2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0
 btrfs-transacti-7804  [001]  2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0)
 btrfs-transacti-7804  [001]  2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0)
 btrfs-transacti-7804  [001]  2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8
   flush-btrfs-2-7821  [001]  2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA
   flush-btrfs-2-7821  [001]  2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0)
   flush-btrfs-2-7821  [001]  2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0)
   flush-btrfs-2-7821  [000]  2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0)
 btrfs-endio-wri-7800  [001]  2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0)
 btrfs-endio-wri-7800  [001]  2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0)

Here is what I have added:

1) ordere_extent:
        btrfs_ordered_extent_add
        btrfs_ordered_extent_remove
        btrfs_ordered_extent_start
        btrfs_ordered_extent_put

These provide critical information to understand how ordered_extents are
updated.

2) extent_map:
        btrfs_get_extent

extent_map is used in both read and write cases, and it is useful for tracking
how btrfs specific IO is running.

3) writepage:
        __extent_writepage
        btrfs_writepage_end_io_hook

Pages are cirtical resourses and produce a lot of corner cases during writeback,
so it is valuable to know how page is written to disk.

4) inode:
        btrfs_inode_new
        btrfs_inode_request
        btrfs_inode_evict

These can show where and when a inode is created, when a inode is evicted.

5) sync:
        btrfs_sync_file
        btrfs_sync_fs

These show sync arguments.

6) transaction:
        btrfs_transaction_commit

In transaction based filesystem, it will be useful to know the generation and
who does commit.

7) back reference and cow:
	btrfs_delayed_tree_ref
	btrfs_delayed_data_ref
	btrfs_delayed_ref_head
	btrfs_cow_block

Btrfs natively supports back references, these tracepoints are helpful on
understanding btrfs's COW mechanism.

8) chunk:
	btrfs_chunk_alloc
	btrfs_chunk_free

Chunk is a link between physical offset and logical offset, and stands for space
infomation in btrfs, and these are helpful on tracing space things.

9) reserved_extent:
	btrfs_reserved_extent_alloc
	btrfs_reserved_extent_free

These can show how btrfs uses its space.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1abe9b8a

Btrfs: use RCU instead of a spinlock to protect the root node · 240f62c8

由 Chris Mason 提交于 3月 23, 2011

The pointer to the extent buffer for the root of each tree
is protected by a spinlock so that we can safely read the pointer
and take a reference on the extent buffer.

But now that the extent buffers are freed via RCU, we can safely
use rcu_read_lock instead.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

240f62c8

eCryptfs: write lock requested keys · b5695d04

由 Roberto Sassu 提交于 3月 21, 2011

A requested key is write locked in order to prevent modifications on the
authentication token while it is being used.
Signed-off-by: NRoberto Sassu <roberto.sassu@polito.it>
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>

b5695d04

eCryptfs: move ecryptfs_find_auth_tok_for_sig() call before mutex_lock · 950983fc

由 Roberto Sassu 提交于 3月 21, 2011

The ecryptfs_find_auth_tok_for_sig() call is moved before the
mutex_lock(s->tfm_mutex) instruction in order to avoid possible deadlocks
that may occur by holding the lock on the two semaphores 'key->sem' and
's->tfm_mutex' in reverse order.
Signed-off-by: NRoberto Sassu <roberto.sassu@polito.it>
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>

950983fc

eCryptfs: verify authentication tokens before their use · 0e1fc5ef

由 Roberto Sassu 提交于 3月 21, 2011

Authentication tokens content may change if another requestor calls the
update() method of the corresponding key. The new function
ecryptfs_verify_auth_tok_from_key() retrieves the authentication token from
the provided key and verifies if it is still valid before being used to
encrypt or decrypt an eCryptfs file.
Signed-off-by: NRoberto Sassu <roberto.sassu@polito.it>
[tyhicks: Minor formatting changes]
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>

0e1fc5ef

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功