提交 · 5f5bc6b1e2d5a6f827bc860ef2dc5b6f365d1339 · openeuler / Kernel

21 11月, 2014 1 次提交

Btrfs: make xattr replace operations atomic · 5f5bc6b1

由 Filipe Manana 提交于 11月 09, 2014

Replacing a xattr consists of doing a lookup for its existing value, delete
the current value from the respective leaf, release the search path and then
finally insert the new value. This leaves a time window where readers (getxattr,
listxattrs) won't see any value for the xattr. Xattrs are used to store ACLs,
so this has security implications.

This change also fixes 2 other existing issues which were:

*) Deleting the old xattr value without verifying first if the new xattr will
   fit in the existing leaf item (in case multiple xattrs are packed in the
   same item due to name hash collision);

*) Returning -EEXIST when the flag XATTR_CREATE is given and the xattr doesn't
   exist but we have have an existing item that packs muliple xattrs with
   the same name hash as the input xattr. In this case we should return ENOSPC.

A test case for xfstests follows soon.

Thanks to Alexandre Oliva for reporting the non-atomicity of the xattr replace
implementation.
Reported-by: NAlexandre Oliva <oliva@gnu.org>
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

5f5bc6b1

18 9月, 2014 1 次提交

btrfs: kill the key type accessor helpers · 962a298f

由 David Sterba 提交于 6月 04, 2014

btrfs_set_key_type and btrfs_key_type are used inconsistently along with
open coded variants. Other members of btrfs_key are accessed directly
without any helpers anyway.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

962a298f

29 1月, 2014 2 次提交

Btrfs: convert printk to btrfs_ and fix BTRFS prefix · efe120a0

由 Frank Holton 提交于 12月 20, 2013

Convert all applicable cases of printk and pr_* to the btrfs_* macros.

Fix all uses of the BTRFS prefix.
Signed-off-by: NFrank Holton <fholton@gmail.com>
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

efe120a0

Btrfs: fix max dir item size calculation · 878f2d2c

由 Filipe David Borba Manana 提交于 11月 27, 2013

We were accounting for sizeof(struct btrfs_item) twice, once
in the data_size variable and another time in the if statement
below.
Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

878f2d2c

12 11月, 2013 2 次提交

Btrfs: fix verification of dir_item · e46f5388

由 Filipe David Borba Manana 提交于 10月 28, 2013

We were ignoring the name component of the dir_item. Both the
name and data must fit within BTRFS_MAX_XATTR_SIZE(root).
Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

e46f5388

btrfs: drop unused parameter from btrfs_item_nr · dd3cc16b

由 Ross Kirk 提交于 9月 16, 2013

Remove unused eb parameter from btrfs_item_nr
Signed-off-by: NRoss Kirk <ross.kirk@gmail.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

dd3cc16b

07 5月, 2013 3 次提交

btrfs: make static code static & remove dead code · 48a3b636

由 Eric Sandeen 提交于 4月 25, 2013

Big patch, but all it does is add statics to functions which
are in fact static, then remove the associated dead-code fallout.

removed functions:

btrfs_iref_to_path()
__btrfs_lookup_delayed_deletion_item()
__btrfs_search_delayed_insertion_item()
__btrfs_search_delayed_deletion_item()
find_eb_for_page()
btrfs_find_block_group()
range_straddles_pages()
extent_range_uptodate()
btrfs_file_extent_length()
btrfs_scrub_cancel_devid()
btrfs_start_transaction_lflush()

btrfs_print_tree() is left because it is used for debugging.
btrfs_start_transaction_lflush() and btrfs_reada_detach() are
left for symmetry.

ulist.c functions are left, another patch will take care of those.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

48a3b636

Btrfs: remove unused argument of btrfs_extend_item() · 4b90c680

由 Tsutomu Itoh 提交于 4月 16, 2013

Argument 'trans' is not used in btrfs_extend_item().
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

4b90c680

Btrfs: cleanup of function where fixup_low_keys() is called · afe5fea7

由 Tsutomu Itoh 提交于 4月 16, 2013

If argument 'trans' is unnecessary in the function where
fixup_low_keys() is called, 'trans' is deleted.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

afe5fea7

18 12月, 2012 1 次提交

Btrfs: fix hash overflow handling · 9c52057c

由 Chris Mason 提交于 12月 17, 2012

The handling for directory crc hash overflows was fairly obscure,
split_leaf returns EOVERFLOW when we try to extend the item and that is
supposed to bubble up to userland.  For a while it did so, but along the
way we added better handling of errors and forced the FS readonly if we
hit IO errors during the directory insertion.

Along the way, we started testing only for EEXIST and the EOVERFLOW case
was dropped.  The end result is that we may force the FS readonly if we
catch a directory hash bucket overflow.

This fixes a few problem spots.  First I add tests for EOVERFLOW in the
places where we can safely just return the error up the chain.

btrfs_rename is harder though, because it tries to insert the new
directory item only after it has already unlinked anything the rename
was going to overwrite.  Rather than adding very complex logic, I added
a helper to test for the hash overflow case early while it is still safe
to bail out.

Snapshot and subvolume creation had a similar problem, so they are using
the new helper now too.
Signed-off-by: NChris Mason <chris.mason@fusionio.com>
Reported-by: NPascal Junod <pascal@junod.info>

9c52057c

22 3月, 2012 2 次提交

btrfs: replace many BUG_ONs with proper error handling · 79787eaa

由 Jeff Mahoney 提交于 3月 12, 2012

 btrfs currently handles most errors with BUG_ON. This patch is a work-in-
 progress but aims to handle most errors other than internal logic
 errors and ENOMEM more gracefully.

 This iteration prevents most crashes but can run into lockups with
 the page lock on occasion when the timing "works out."
Signed-off-by: NJeff Mahoney <jeffm@suse.com>

79787eaa

J
btrfs: return void in functions without error conditions · 143bede5
由 Jeff Mahoney 提交于 3月 01, 2012
```
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
```
143bede5

02 8月, 2011 1 次提交

Btrfs: remove redundant code for dir item lookup · 85d85a74

由 Li Zefan 提交于 7月 14, 2011

When we search a dir item with a specific hash code, we can
just return NULL without further checking if btrfs_search_slot()
returns 1.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

85d85a74

11 7月, 2011 1 次提交

Btrfs: try to only do one btrfs_search_slot in do_setxattr · fa09200b

由 Josef Bacik 提交于 5月 27, 2011

I've been watching how many btrfs_search_slot()'s we do and I noticed that when
we create a file with selinux enabled we were doing 2 each time we initialize
the security context. That's because we lookup the xattr first so we can delete
it if we're setting a new value to an existing xattr. But in the create case we
don't have any xattrs, so it is completely useless to have the extra lookup. So
re-arrange things so that we only lookup first if we specifically have
XATTR_REPLACE. That way in the basic case we only do 1 search, and in the more
complicated case we do the normal 2 lookups. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

fa09200b

24 5月, 2011 2 次提交

Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item · 1cd30799

由 Tsutomu Itoh 提交于 5月 19, 2011

Currently, btrfs_truncate_item and btrfs_extend_item returns only 0.
So, the check by BUG_ON in the caller is unnecessary.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1cd30799

btrfs: typo: 'btrfS' -> 'btrfs' · 9694b3fc

由 Sergei Trofimovich 提交于 5月 20, 2011

Signed-off-by: NSergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9694b3fc

22 5月, 2011 1 次提交
- C
  Btrfs: update the delayed inode code to use the btrfs_ino helper. · 0d0ca30f
  由 Chris Mason 提交于 5月 22, 2011
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
  0d0ca30f
21 5月, 2011 1 次提交

btrfs: implement delayed inode items operation · 16cdcec7

由 Miao Xie 提交于 4月 22, 2011

Changelog V5 -> V6:
- Fix oom when the memory load is high, by storing the delayed nodes into the
  root's radix tree, and letting btrfs inodes go.

Changelog V4 -> V5:
- Fix the race on adding the delayed node to the inode, which is spotted by
  Chris Mason.
- Merge Chris Mason's incremental patch into this patch.
- Fix deadlock between readdir() and memory fault, which is reported by
  Itaru Kitayama.

Changelog V3 -> V4:
- Fix nested lock, which is reported by Itaru Kitayama, by updating space cache
  inode in time.

Changelog V2 -> V3:
- Fix the race between the delayed worker and the task which does delayed items
  balance, which is reported by Tsutomu Itoh.
- Modify the patch address David Sterba's comment.
- Fix the bug of the cpu recursion spinlock, reported by Chris Mason

Changelog V1 -> V2:
- break up the global rb-tree, use a list to manage the delayed nodes,
  which is created for every directory and file, and used to manage the
  delayed directory name index items and the delayed inode item.
- introduce a worker to deal with the delayed nodes.

Compare with Ext3/4, the performance of file creation and deletion on btrfs
is very poor. the reason is that btrfs must do a lot of b+ tree insertions,
such as inode item, directory name item, directory name index and so on.

If we can do some delayed b+ tree insertion or deletion, we can improve the
performance, so we made this patch which implemented delayed directory name
index insertion/deletion and delayed inode update.

Implementation:
- introduce a delayed root object into the filesystem, that use two lists to
  manage the delayed nodes which are created for every file/directory.
  One is used to manage all the delayed nodes that have delayed items. And the
  other is used to manage the delayed nodes which is waiting to be dealt with
  by the work thread.
- Every delayed node has two rb-tree, one is used to manage the directory name
  index which is going to be inserted into b+ tree, and the other is used to
  manage the directory name index which is going to be deleted from b+ tree.
- introduce a worker to deal with the delayed operation. This worker is used
  to deal with the works of the delayed directory name index items insertion
  and deletion and the delayed inode update.
  When the delayed items is beyond the lower limit, we create works for some
  delayed nodes and insert them into the work queue of the worker, and then
  go back.
  When the delayed items is beyond the upper bound, we create works for all
  the delayed nodes that haven't been dealt with, and insert them into the work
  queue of the worker, and then wait for that the untreated items is below some
  threshold value.
- When we want to insert a directory name index into b+ tree, we just add the
  information into the delayed inserting rb-tree.
  And then we check the number of the delayed items and do delayed items
  balance. (The balance policy is above.)
- When we want to delete a directory name index from the b+ tree, we search it
  in the inserting rb-tree at first. If we look it up, just drop it. If not,
  add the key of it into the delayed deleting rb-tree.
  Similar to the delayed inserting rb-tree, we also check the number of the
  delayed items and do delayed items balance.
  (The same to inserting manipulation)
- When we want to update the metadata of some inode, we cached the data of the
  inode into the delayed node. the worker will flush it into the b+ tree after
  dealing with the delayed insertion and deletion.
- We will move the delayed node to the tail of the list after we access the
  delayed node, By this way, we can cache more delayed items and merge more
  inode updates.
- If we want to commit transaction, we will deal with all the delayed node.
- the delayed node will be freed when we free the btrfs inode.
- Before we log the inode items, we commit all the directory name index items
  and the delayed inode update.

I did a quick test by the benchmark tool[1] and found we can improve the
performance of file creation by ~15%, and file deletion by ~20%.

Before applying this patch:
Create files:
        Total files: 50000
        Total time: 1.096108
        Average time: 0.000022
Delete files:
        Total files: 50000
        Total time: 1.510403
        Average time: 0.000030

After applying this patch:
Create files:
        Total files: 50000
        Total time: 0.932899
        Average time: 0.000019
Delete files:
        Total files: 50000
        Total time: 1.215732
        Average time: 0.000024

[1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3

Many thanks for Kitayama-san's help!
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Reviewed-by: NDavid Sterba <dave@jikos.cz>
Tested-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Tested-by: NItaru Kitayama <kitayama@cl.bb4u.ne.jp>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

16cdcec7

02 5月, 2011 1 次提交

btrfs: drop unused parameter from btrfs_release_path · b3b4aa74

由 David Sterba 提交于 4月 21, 2011

parameter tree root it's not used since commit
5f39d397 ("Btrfs: Create extent_buffer
interface for large blocksizes")
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

b3b4aa74

28 3月, 2011 1 次提交

Btrfs: check return value of btrfs_alloc_path() · c2db1073

由 Tsutomu Itoh 提交于 3月 01, 2011

Adding the check on the return value of btrfs_alloc_path() to several places.
And, some of callers are modified by this change.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c2db1073

18 3月, 2011 1 次提交

Btrfs: add checks to verify dir items are correct · 22a94d44

由 Josef Bacik 提交于 3月 16, 2011

We need to make sure the dir items we get are valid dir items.  So any time we
try and read one check it with verify_dir_item, which will do various sanity
checks to make sure it looks sane.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

22a94d44

30 10月, 2010 1 次提交

Btrfs: Fix variables set but not read (bugs found by gcc 4.6) · 411fc6bc

由 Andi Kleen 提交于 10月 29, 2010

These are all the cases where a variable is set, but not
read which are really bugs.

- Couple of incorrect error handling fixed.
- One incorrect use of a allocation policy
- Some other things

Still needs more review.

Found by gcc 4.6's new warnings.

[akpm@linux-foundation.org: fix build.  Might have been bitrot]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

411fc6bc

18 12月, 2009 1 次提交

Btrfs: Pass transaction handle to security and ACL initialization functions · f34f57a3

由 Yan, Zheng 提交于 11月 12, 2009

Pass transaction handle down to security and ACL initialization
functions, so we can avoid starting nested transactions
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f34f57a3

22 9月, 2009 1 次提交

Btrfs: change how subvolumes are organized · 4df27c4d

由 Yan, Zheng 提交于 9月 21, 2009

btrfs allows subvolumes and snapshots anywhere in the directory tree.
If we snapshot a subvolume that contains a link to other subvolume
called subvolA, subvolA can be accessed through both the original
subvolume and the snapshot. This is similar to creating hard link to
directory, and has the very similar problems.

The aim of this patch is enforcing there is only one access point to
each subvolume. Only the first directory entry (the one added when
the subvolume/snapshot was created) is treated as valid access point.
The first directory entry is distinguished by checking root forward
reference. If the corresponding root forward reference is missing,
we know the entry is not the first one.

This patch also adds snapshot/subvolume rename support, the code
allows rename subvolume link across subvolumes.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

4df27c4d

25 3月, 2009 1 次提交

Btrfs: leave btree locks spinning more often · b9473439

由 Chris Mason 提交于 3月 13, 2009

btrfs_mark_buffer dirty would set dirty bits in the extent_io tree
for the buffers it was dirtying. This may require a kmalloc and it
was not atomic. So, anyone who called btrfs_mark_buffer_dirty had to
set any btree locks they were holding to blocking first.

This commit changes dirty tracking for extent buffers to just use a flag
in the extent buffer. Now that we have one and only one extent buffer
per page, this can be safely done without losing dirty bits along the way.

This also introduces a path->leave_spinning flag that callers of
btrfs_search_slot can use to indicate they will properly deal with a
path returned where all the locks are spinning instead of blocking.

Many of the btree search callers now expect spinning paths,
resulting in better btree concurrency overall.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b9473439

06 1月, 2009 1 次提交

Btrfs: Fix checkpatch.pl warnings · d397712b

由 Chris Mason 提交于 1月 05, 2009

There were many, most are fixed now.  struct-funcs.c generates some warnings
but these are bogus.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d397712b

30 9月, 2008 1 次提交

Btrfs: add and improve comments · d352ac68

由 Chris Mason 提交于 9月 29, 2008

This improves the comments at the top of many functions.  It didn't
dive into the guts of functions because I was trying to
avoid merging problems with the new allocator and back reference work.

extent-tree.c and volumes.c were both skipped, and there is definitely
more work todo in cleaning and commenting the code.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d352ac68

25 9月, 2008 8 次提交

Btrfs: Add a write ahead tree log to optimize synchronous operations · e02119d5

由 Chris Mason 提交于 9月 05, 2008

File syncs and directory syncs are optimized by copying their
items into a special (copy-on-write) log tree. There is one log tree per
subvolume and the btrfs super block points to a tree of log tree roots.

After a crash, items are copied out of the log tree and back into the
subvolume. See tree-log.c for all the details.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

e02119d5

Btrfs: implement memory reclaim for leaf reference cache · bcc63abb

由 Yan 提交于 7月 30, 2008

The memory reclaiming issue happens when snapshot exists. In that
case, some cache entries may not be used during old snapshot dropping,
so they will remain in the cache until umount.

The patch adds a field to struct btrfs_leaf_ref to record create time. Besides,
the patch makes all dead roots of a given snapshot linked together in order of
create time. After a old snapshot was completely dropped, we check the dead
root list and remove all cache entries created before the oldest dead root in
the list.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

bcc63abb

J
Btrfs: Implement new dir index format · aec7477b
由 Josef Bacik 提交于 7月 24, 2008
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
aec7477b

Btrfs: unaligned access fixes · df68b8a7

由 David Miller 提交于 2月 15, 2008

Btrfs set/get macros lose type information needed to avoid
unaligned accesses on sparc64.
ere is a patch for the kernel bits which fixes most of the
unaligned accesses on sparc64.

btrfs_name_hash is modified to return the hash value instead
of getting a return location via a (potentially unaligned)
pointer.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

df68b8a7

Btrfs: Implement ACLs setting and getting · 744f52f9

由 Yan 提交于 1月 14, 2008

ACLs are stored but not used for permission checks (yet)
Signed-off-by: NChris Mason <chris.mason@oracle.com>

744f52f9

J
xattr support for btrfs · 5103e947
由 Josef Bacik 提交于 11月 16, 2007
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
5103e947

Btrfs: Fix a number of inline extent problems that Yan Zheng reported. · 179e29e4

由 Chris Mason 提交于 11月 01, 2007

The fixes do a number of things:

1) Most btrfs_drop_extent callers will try to leave the inline extents in
place.  It can truncate bytes off the beginning of the inline extent if
required.

2) writepage can now update the inline extent, allowing mmap writes to
go directly into the inline extent.

3) btrfs_truncate_in_transaction truncates inline extents

4) extent_map.c fixed to not merge inline extent mappings and hole
mappings together
Signed-off-by: NChris Mason <chris.mason@oracle.com>

179e29e4

C
Btrfs: Create extent_buffer interface for large blocksizes · 5f39d397
由 Chris Mason 提交于 10月 15, 2007
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
5f39d397

11 7月, 2007 1 次提交

Btrfs: trivial include fixups · ec6b910f

由 Zach Brown 提交于 7月 11, 2007

Almost none of the files including module.h need to do so,
remove them.

Include sched.h in extent-tree.c to silence a warning about cond_resched()
being undeclared.
Signed-off-by: NZach Brown <zach.brown@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ec6b910f

23 6月, 2007 1 次提交
- C
  Btrfs: Audit callers and return codes to make sure -ENOSPC gets up the stack · 54aa1f4d
  由 Chris Mason 提交于 6月 22, 2007
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
  54aa1f4d
14 6月, 2007 1 次提交

btrfs: Code cleanup · f1ace244

由 Aneesh 提交于 6月 13, 2007

Attaching below is some of the code cleanups that i came across while
reading the code.

a) alloc_path already calls init_path.
b) Mention that btrfs_inode is the in memory copy.Ext4 have ext4_inode_info as
the in memory copy ext4_inode as the disk copy
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f1ace244

12 6月, 2007 1 次提交
- C
  Btrfs: add GPLv2 · 6cbd5570
  由 Chris Mason 提交于 6月 12, 2007
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
  6cbd5570
24 5月, 2007 1 次提交
- C
  Btrfs: rename · e06afa83
  由 Chris Mason 提交于 5月 23, 2007
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
  e06afa83

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功