Commits · cca1c81f43e26ab60c0d1090fb90992358d69bdf · openeuler / raspberrypi-kernel

24 May, 2011 12 commits

Btrfs: don't try to allocate from a block group that doesn't have enough space · cca1c81f

Committed by Josef Bacik on May 13, 2011

If we have a very large filesystem, we can spend a lot of time in
find_free_extent just trying to allocate from empty block groups.  So instead
check to see if the block group even has enough space for the allocation, and if
not go on to the next block group.
Signed-off-by: Josef Bacik <josef@redhat.com>
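
The check described in this message can be sketched in userspace C. This is only an illustration, not the kernel code: `struct block_group`, its `free_bytes` field and `pick_block_group()` are hypothetical names, and the real `find_free_extent` does far more work per group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a block group; not the kernel struct. */
struct block_group {
    uint64_t free_bytes;   /* assumed: unallocated bytes left in this group */
};

/*
 * Return the index of the first group that can hold num_bytes, or -1.
 * Groups without enough free space are skipped before any costly search.
 */
int pick_block_group(const struct block_group *groups, size_t count,
                     uint64_t num_bytes)
{
    for (size_t i = 0; i < count; i++) {
        if (groups[i].free_bytes < num_bytes)
            continue;   /* cheap check: go on to the next block group */
        return (int)i;  /* a real allocator would search inside the group here */
    }
    return -1;
}
```

The point of the sketch is that the comparison costs almost nothing, so a filesystem with many exhausted block groups no longer pays for a full search in each of them.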

cca1c81f

Btrfs: don't always do readahead · 026fd317

Committed by Josef Bacik on May 13, 2011

Our readahead is sort of sloppy, and really isn't always needed. For example if
ls is doing a stating ls (which is the default) it's going to stat in non-disk
order, so if say you have a directory with a stupid amount of files, readahead
is going to do nothing but waste time in the case of doing the stat. Taking the
unconditional readahead out made my test go from 57 minutes to 36 minutes. This
means that everywhere we do loop through the tree we want to make sure we do set
path->reada properly, so I went through and found all of the places where we
loop through the path and set reada to 1. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

026fd317

Btrfs: try not to sleep as much when doing slow caching · 589d8ade

Committed by Josef Bacik on May 11, 2011

When the fs is super full and we unmount the fs, we could get stuck in this
thing where unmount is waiting for the caching kthread to make progress and the
caching kthread keeps scheduling because we're in the middle of a commit. So
instead just let the caching kthread keep going and only yield if
need_resched(). This makes my horrible umount case go from taking up to 10
minutes to taking less than 20 seconds. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

589d8ade

Btrfs: kill BTRFS_I(inode)->block_group · d82a6f1d

Committed by Josef Bacik on May 11, 2011

Originally this was going to be used as a way to give hints to the allocator,
but frankly we can get much better hints elsewhere and it's not even used at all
for anything useful. In addition to being completely useless, when we initialize
an inode we try and find a freeish block group to set as the inode's block group,
and with a completely full 40gb fs this takes _forever_, so I imagine with say
1tb fs this is just unbearable. So just axe the thing altogether, we don't
need it and it saves us 8 bytes in the inode and saves us 500 microseconds per
inode lookup in my testcase. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

d82a6f1d

Btrfs: don't look at the extent buffer level 3 times in a row · 7e2355ba

Committed by Josef Bacik on May 11, 2011

We have a bit of debugging in btrfs_search_slot to make sure the level of the
cow block is the same as the original block we were cow'ing.  I don't think I've
ever seen this tripped, so kill it.  This saves us 2 kmap's per level in our
search.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

7e2355ba

Btrfs: map the node block when looking for readahead targets · cb25c2ea

Committed by Josef Bacik on May 11, 2011

If we have particularly full nodes, we could call btrfs_node_blockptr up to 32
times, which is 32 pairs of kmap/kunmap, which _sucks_. So go ahead and map the
extent buffer while we look for readahead targets. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

cb25c2ea

Btrfs: set range_start to the right start in count_range_bits · af60bed2

Committed by Josef Bacik on May 04, 2011

In count_range_bits we are adjusting total_bytes based on the range we are
searching for, but we don't adjust the range start according to the range we are
searching for, which makes for weird results.  For example, if the range

[0-8192]

is set DELALLOC, but I search for 4096-8192, I will get back 4096 for the number
of bytes found, but the range_start will be 0, which makes it look like the
range is [0-4096].  So instead set range_start = max(cur_start, state->start).
This makes everything come out right.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
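
The arithmetic in the example above can be worked through with a small sketch. It assumes a single extent state with inclusive bounds, so the DELALLOC range is modeled as bytes 0 through 8191; `struct state_range` and `count_overlap()` are hypothetical names, not the kernel's `count_range_bits`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single extent state with inclusive [start, end] bounds. */
struct state_range {
    uint64_t start;
    uint64_t end;
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

/*
 * Count the bytes of `state` that fall inside the inclusive search range
 * [search_start, search_end] and report where the overlap begins.
 * Returns 0 and leaves *range_start alone if there is no overlap.
 */
uint64_t count_overlap(const struct state_range *state,
                       uint64_t search_start, uint64_t search_end,
                       uint64_t *range_start)
{
    uint64_t lo = max_u64(search_start, state->start);
    uint64_t hi = min_u64(search_end, state->end);

    if (lo > hi)
        return 0;
    *range_start = lo;        /* the fix: clamp the start, not just the length */
    return hi - lo + 1;
}
```

Searching 4096 through 8191 against a state covering 0 through 8191 gives 4096 bytes with a start of 4096, so the length and the start now describe the same range.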

af60bed2

Btrfs: fix how we do space reservation for truncate · fcb80c2a

Committed by Josef Bacik on May 03, 2011

The ceph guys keep running into problems where we have space reserved in our
orphan block rsv when freeing it up. This is because they tend to do snapshots
a lot, so their truncates tend to use a bunch of space, so when we go to do
things like update the inode we have to steal reservation space in order to make
the reservation happen. This happens because truncate can use as much space as
it freaking feels like, but we still have to hold space for removing the orphan
item and updating the inode, which will definitely always happen. So in order
to fix this we need to split all of the reservation stuff up. So with this patch
we have

1) The orphan block reserve which only holds the space for deleting our orphan
item when everything is over.

2) The truncate block reserve which gets allocated and used specifically for the
space that the truncate will use on a per truncate basis.

3) The transaction will always have 1 item's worth of data reserved so we can
update the inode normally.

Hopefully this will make the ceph problem go away. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

fcb80c2a

Btrfs: kill trans_mutex · a4abeea4

Committed by Josef Bacik on April 11, 2011

We use trans_mutex for lots of things, here's a basic list

1) To serialize trans_handles joining the currently running transaction
2) To make sure that no new trans handles are started while we are committing
3) To protect the dead_roots list and the transaction lists

Really the serializing trans_handles joining is not too hard, and can really get
bogged down in acquiring a reference to the transaction. So replace the
trans_mutex with a trans_lock spinlock and use it to do the following

1) Protect fs_info->running_transaction. All trans handles have to do is check
this, and then take a reference of the transaction and keep on going.
2) Protect the fs_info->trans_list. This doesn't get used too much, basically
it just holds the current transactions, which will usually just be the currently
committing transaction and the currently running transaction at most.
3) Protect the dead roots list. This is only ever processed by splicing the
list so this is relatively simple.
4) Protect the fs_info->reloc_ctl stuff. This is very lightweight and was using
the trans_mutex before, so this is a pretty straightforward change.
5) Protect fs_info->no_trans_join. Because we don't hold the trans_lock over
the entirety of the commit we need to have a way to block new people from
creating a new transaction while we're doing our work. So we set no_trans_join
and in join_transaction we test to see if that is set, and if it is we do a
wait_on_commit.
6) Make the transaction use count atomic so we don't need to take locks to
modify it when we're dropping references.
7) Add a commit_lock to the transaction to make sure multiple people trying to
commit the same transaction don't race and commit at the same time.
8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
trans.

I have tested this with xfstests, but obviously it is a pretty hairy change so
lots of testing is greatly appreciated. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
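
Point 6 in the list above, an atomic use count that can be dropped without a lock, might look roughly like this in userspace C11. It is a sketch under assumptions: `struct txn` and the `txn_*` functions are hypothetical, and C11 `<stdatomic.h>` stands in for the kernel's `atomic_t` helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transaction; only the reference count is modeled. */
struct txn {
    atomic_int use_count;
};

void txn_init(struct txn *t)
{
    atomic_init(&t->use_count, 1);   /* creator holds the first reference */
}

/* Take a reference; per the message, joiners do this after checking
 * running_transaction under trans_lock. */
void txn_get(struct txn *t)
{
    atomic_fetch_add(&t->use_count, 1);
}

/* Drop a reference without any lock; returns true when the last one is gone. */
bool txn_put(struct txn *t)
{
    return atomic_fetch_sub(&t->use_count, 1) == 1;
}
```

Because `atomic_fetch_sub` returns the previous value, exactly one caller sees 1 and learns it released the final reference.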

a4abeea4

Btrfs: if we've already started a trans handle, use that one · 2a1eb461

Committed by Josef Bacik on April 13, 2011

We currently track trans handles in current->journal_info, but we don't actually
use it. This patch fixes it. This will cover the case where we have multiple
people starting transactions down the call chain. This keeps us from having to
allocate a new handle and all of that, we just increase the use count of the
current handle, save the old block_rsv, and return. I tested this with xfstests
and it worked out fine. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
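
The handle reuse described here can be modeled in userspace as follows. This is a sketch, not the btrfs code: a C11 `_Thread_local` pointer plays the role of `current->journal_info`, and `struct trans_handle`, `start_trans()` and `end_trans()` are hypothetical names with made-up fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical handle; block_rsv is just an opaque pointer here. */
struct trans_handle {
    int use_count;
    void *block_rsv;
    void *orig_rsv;      /* reservation saved when the handle is reused */
};

/* Stand-in for current->journal_info: one slot per thread. */
static _Thread_local struct trans_handle *current_handle;

struct trans_handle *start_trans(void)
{
    struct trans_handle *h = current_handle;

    if (h) {                          /* already inside a transaction */
        h->use_count++;
        h->orig_rsv = h->block_rsv;   /* remember the caller's reservation */
        return h;
    }
    h = calloc(1, sizeof(*h));
    if (!h)
        return NULL;
    h->use_count = 1;
    current_handle = h;
    return h;
}

void end_trans(struct trans_handle *h)
{
    if (--h->use_count > 0) {
        h->block_rsv = h->orig_rsv;   /* restore what the outer caller had */
        return;
    }
    current_handle = NULL;
    free(h);
}
```

A nested `start_trans()` returns the same pointer as the outer call, so nothing is allocated further down the call chain.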

2a1eb461

Btrfs: take away the num_items argument from btrfs_join_transaction · 7a7eaa40

Committed by Josef Bacik on April 13, 2011

I keep forgetting that btrfs_join_transaction() just ignores the num_items
argument, which leads me to sending pointless patches and looking stupid :). So
just kill the num_items argument from btrfs_join_transaction and
btrfs_start_ioctl_transaction, since neither of them use it. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

7a7eaa40

Btrfs: make sure to use the delalloc reserve when filling delalloc · 74b21075

Committed by Josef Bacik on April 13, 2011

In the prealloc filling code and compressed code we don't set trans->block_rsv
to the delalloc block reserve properly, which is going to make us use metadata
from the wrong pool, this patch fixes that. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

74b21075

15 May, 2011 5 commits

Btrfs: fix FS_IOC_SETFLAGS ioctl · ebcb904d

Committed by Li Zefan on April 15, 2011

Steps to reproduce the bug:

  - Call FS_IOC_SETFLAGS ioctl with flags=FS_COMPR_FL
  - Call FS_IOC_SETFLAGS ioctl with flags=0
  - Call FS_IOC_GETFLAGS ioctl, and you'll see FS_COMPR_FL is still set!
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
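
The message does not say what caused the stale flag, so the following is only a guess at one way the symptom could arise: a setter that ORs requested bits in and never clears anything. The `DEMO_*` constants and both functions are hypothetical and use illustrative bit values, not the real `FS_*_FL` definitions.

```c
#include <assert.h>

/* Illustrative bit values only; the real FS_*_FL constants differ. */
#define DEMO_FS_COMPR_FL     0x1u
#define DEMO_FS_NOCOW_FL     0x2u

/* Assumed buggy behaviour: only ever turns bits on, so flags=0 clears nothing. */
unsigned setflags_or_only(unsigned inode_flags, unsigned requested)
{
    return inode_flags | requested;
}

/* Expected behaviour: bits the caller controls are replaced, not accumulated. */
unsigned setflags_replace(unsigned inode_flags, unsigned requested)
{
    unsigned settable = DEMO_FS_COMPR_FL | DEMO_FS_NOCOW_FL;

    return (inode_flags & ~settable) | (requested & settable);
}
```

With the OR-only version, setting `DEMO_FS_COMPR_FL` and then passing 0 leaves the flag set, which matches the three reproduction steps; the replacing version returns 0.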

ebcb904d

Btrfs: fix FS_IOC_GETFLAGS ioctl · d0092bdd

Committed by Li Zefan on April 15, 2011

As we've added per file compression/cow support.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

d0092bdd

fs: remove FS_COW_FL · e1e8fb6a

Committed by Li Zefan on April 15, 2011

FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file
COW in btrfs, but FS_NOCOW_FL is sufficient.

The fact is we don't have corresponding BTRFS_INODE_COW flag.

COW is default, and FS_NOCOW_FL can be used to switch off COW for
a single file.

If we mount btrfs with nodatacow, a newly created file will be set with
the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the
FS_NOCOW_FL flag.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e1e8fb6a

Btrfs: fix easily get into ENOSPC in mixed case · 1aba86d6

Committed by liubo on April 08, 2011

When a btrfs disk is created with the mixed data & metadata option, it will have no
pure data or pure metadata space info.

In btrfs's for-linus branch, commit 78b1ea13838039cd88afdd62519b40b344d6c920
(Btrfs: fix OOPS of empty filesystem after balance) initializes space infos at
the very beginning.  The problem is that this initialization does not take the mixed
case into account, which causes btrfs to easily run into ENOSPC in the mixed
case.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1aba86d6

Prevent oopsing in posix_acl_valid() · f5de9391

Committed by Daniel J Blueman on May 03, 2011

If posix_acl_from_xattr() returns an error code, a negative address is
dereferenced causing an oops; fix by checking for error code first.
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
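
The "negative address" in this message is an error code encoded in a pointer. A userspace imitation of that pattern is sketched below; the `demo_*` helpers mimic the kernel's `ERR_PTR`, `IS_ERR` and `PTR_ERR` but are not the kernel macros, and `demo_acl_from_xattr()` is a hypothetical parser, not the real `posix_acl_from_xattr()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Userspace imitations of the kernel's ERR_PTR / IS_ERR / PTR_ERR helpers. */
void *demo_err_ptr(long err) { return (void *)(intptr_t)err; }
int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}
long demo_ptr_err(const void *p) { return (long)(intptr_t)p; }

struct demo_acl { int entries; };

/* Hypothetical parser: fails with -EINVAL on a NULL buffer. */
struct demo_acl *demo_acl_from_xattr(const void *buf, struct demo_acl *out)
{
    if (!buf)
        return demo_err_ptr(-EINVAL);
    out->entries = 1;
    return out;
}

/* Check for an encoded error before touching the result. */
int demo_set_acl(const void *buf)
{
    struct demo_acl storage;
    struct demo_acl *acl = demo_acl_from_xattr(buf, &storage);

    if (demo_is_err(acl))
        return (int)demo_ptr_err(acl);   /* propagate, do not dereference */
    return acl->entries > 0 ? 0 : -EINVAL;
}
```

Without the `demo_is_err()` test, `acl->entries` would read from an address near the top of the address space, which is the kind of dereference that oopses in the kernel.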

f5de9391

27 April, 2011 1 commit

Revert wrong fixes for common misspellings · e9c54999

Committed by Lucas De Marchi on April 26, 2011

These changes were incorrectly fixed by codespell. They have now been
manually corrected.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>

e9c54999

26 April, 2011 8 commits

Btrfs: cleanup error handling in inode.c · 7cf96da3

Committed by Tsutomu Itoh on April 25, 2011

Error handling in several places is changed so that the error number is
set only when an error occurs.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

7cf96da3

Btrfs: put the right bio if we have an error · 64728bbb

Committed by Josef Bacik on April 25, 2011

In btrfs_submit_direct_hook if the first btrfs_map_block fails we need to put
the orig_bio, not bio.
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

64728bbb

Btrfs: free bitmaps properly when evicting the cache · a4f0162f

Committed by Josef Bacik on April 25, 2011

If our space cache is wrong, we do the right thing and free up everything that
we loaded, however we don't reset the total_bitmaps counter or the thresholds or
anything. So in btrfs_remove_free_space_cache make sure to call free_bitmap()
if it's a bitmap, this will keep us from panicking when we check to make sure we
don't have too many bitmaps. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

a4f0162f

Btrfs: Free free_space item properly in btrfs_trim_block_group() · f789b684

Committed by Li Zefan on April 25, 2011

Since commit dc89e982, we've changed
to use a specific slab for allocation of free_space items.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f789b684

btrfs: add missing spin_unlock to a rare exit path · cfece4db

Committed by David Sterba on April 25, 2011

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

cfece4db

Btrfs: check return value of kmalloc() · 8d413713

Committed by Tsutomu Itoh on April 25, 2011

Checks on the return value of kmalloc() are added in some places.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

8d413713

btrfs: fix wrong allocating flag when reading page · 43e817a1

Committed by Itaru Kitayama on April 25, 2011

The space cache uses extent_readpages() to read free space information,
so we cannot use the GFP_KERNEL flag to allocate memory, or it may lead
to deadlock.
Signed-off-by: Itaru Kitayama <kitayama@cl.bb4u.ne.jp>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

43e817a1

Btrfs: fix missing mutex_unlock in btrfs_del_dir_entries_in_log() · a62f44a5

Committed by Tsutomu Itoh on April 25, 2011

It is necessary to unlock the mutex before returning an error when
btrfs_alloc_path() fails.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
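
The usual shape of this kind of fix is a single exit label that always releases the lock. The sketch below assumes a toy lock (`struct demo_mutex`) so that the released state can be observed; `demo_del_entries()` and its `fail_alloc` switch are hypothetical and only imitate a failing path allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy lock so the example can observe whether it was released. */
struct demo_mutex { bool locked; };

void demo_lock(struct demo_mutex *m)   { m->locked = true; }
void demo_unlock(struct demo_mutex *m) { m->locked = false; }

/*
 * `fail_alloc` simulates an allocation failure. Every exit, including the
 * error one, passes through the unlock label.
 */
int demo_del_entries(struct demo_mutex *m, bool fail_alloc)
{
    int ret = 0;
    void *path;

    demo_lock(m);
    path = fail_alloc ? NULL : malloc(16);
    if (!path) {
        ret = -ENOMEM;
        goto out_unlock;
    }
    /* ... work that needs both the lock and the path ... */
    free(path);
out_unlock:
    demo_unlock(m);
    return ret;
}
```

Returning directly from the `!path` branch would leave `locked` set, which is the bug the message describes.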

a62f44a5

20 April, 2011 1 commit

Btrfs: do some plugging in the submit_bio threads · 211588ad

Committed by Chris Mason on April 19, 2011

The Btrfs submit bio threads have a small number of
threads responsible for pushing down bios we've collected
for a large number of devices.

Since we do all the bios for a single device at once,
we want to make sure we unplug and send down the bios
for each device as we're done processing them.

The new plugging API removed the btrfs code to
unplug while processing bios, this adds it back with
the new API.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

211588ad

18 April, 2011 1 commit

Btrfs: fix free space cache leak · f65647c2

Committed by Chris Mason on April 18, 2011

The free space caching code was recently reworked to
cache all the pages it needed instead of using find_get_page everywhere.

One loop was missed though, so it ended up leaking pages.  This fixes
it to use our page array instead of find_get_page.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

f65647c2

16 April, 2011 3 commits

Btrfs: avoid taking the chunk_mutex in do_chunk_alloc · 6d74119f

Committed by Josef Bacik on April 11, 2011

Every time we try to allocate disk space we try and see if we can pre-emptively
allocate a chunk, but in the common case we don't allocate anything, so there is
no sense in taking the chunk_mutex at all. So instead if we are allocating a
chunk, mark it in the space_info so we don't get two people trying to allocate
at the same time. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Reviewed-by: Liu Bo <liubo2009@cn.fujitsu.com>
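
The idea of marking an allocation in progress so that two callers do not allocate at once can be illustrated with a C11 `atomic_flag`. This swaps in a different mechanism from whatever the patch itself uses (the message only says the state is marked in the space_info), and `struct demo_space_info` and `demo_try_chunk_alloc()` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical space_info with a single "allocation in progress" marker. */
struct demo_space_info {
    atomic_flag chunk_alloc;
    int chunks;
};

/*
 * Try to allocate a chunk. Returns false without doing anything if another
 * caller already owns the allocation, so no mutex is taken on that path.
 */
bool demo_try_chunk_alloc(struct demo_space_info *info)
{
    if (atomic_flag_test_and_set(&info->chunk_alloc))
        return false;               /* someone else is allocating */
    info->chunks++;                 /* stand-in for the real allocation */
    atomic_flag_clear(&info->chunk_alloc);
    return true;
}
```

In this sketch a losing caller simply gives up; whether the real code waits or returns is not stated in the message.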

6d74119f

Btrfs end_bio_extent_readpage should look for locked bits · 0d399205

Committed by Chris Mason on April 16, 2011

A recent commit caches the extent state in end_bio_extent_readpage,
but the search it does should look for locked extents.  This
fixes things to make it more effective.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0d399205

Btrfs: don't force chunk allocation in find_free_extent · 0e4f8f88

Committed by Chris Mason on April 15, 2011

find_free_extent likes to allocate in contiguous clusters,
which makes writeback faster, especially on SSD storage.  As
the FS fragments, these clusters become harder to find and we have
to decide between allocating a new chunk to make more clusters
or giving up on the cluster to allocate from the free space
we have.

Right now it creates too many chunks, and you can end up with
a whole FS that is mostly empty metadata chunks.  This commit
changes the allocation code to be more strict and only
allocate new chunks when we've made good use of the chunks we
already have.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

0e4f8f88

13 April, 2011 5 commits

Btrfs: Check validity before setting an acl · 329c5056

Committed by Miao Xie on April 13, 2011

Call posix_acl_valid() to check if an acl is valid or not.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

329c5056

Btrfs: Fix incorrect inode nlink in btrfs_link() · 3153495d

Committed by Miao Xie on April 13, 2011

Link count of the inode is not decreased if btrfs_set_inode_index()
fails.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

3153495d

Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir() · b9e03af0

Committed by Li Zefan on March 23, 2011

btrfs_next_leaf() can return -errno, and we should propagate
it to userspace.

This also simplifies how we walk the btree path.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
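
A loop that propagates such an error might be shaped like the sketch below. It assumes the three-way return convention (0 for "moved to the next leaf", a positive value for "no more leaves", a negative errno on failure); `struct demo_iter`, `demo_next_leaf()` and `demo_walk()` are hypothetical stand-ins, not the btrfs functions.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical leaf iterator using the assumed convention:
 * 0 = moved to the next leaf, > 0 = no more leaves, < 0 = -errno.
 */
struct demo_iter {
    int leaves_left;
    int fail_at;      /* leaf count at which to inject -EIO, or -1 for never */
};

int demo_next_leaf(struct demo_iter *it)
{
    if (it->leaves_left == it->fail_at)
        return -EIO;
    if (it->leaves_left == 0)
        return 1;
    it->leaves_left--;
    return 0;
}

/* Walk all leaves; return the number visited or a negative errno. */
int demo_walk(struct demo_iter *it)
{
    int visited = 1;   /* the starting leaf */

    for (;;) {
        int ret = demo_next_leaf(it);

        if (ret < 0)
            return ret;      /* propagate the error to the caller */
        if (ret > 0)
            break;           /* end of the tree, not an error */
        visited++;
    }
    return visited;
}
```

Treating any non-zero return as "done" would silently turn an I/O error into a short listing, which is what checking `ret < 0` separately avoids.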

b9e03af0

Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr() · 2e6a0035

Committed by Li Zefan on March 17, 2011

btrfs_next_leaf() can return -errno, and we should propagate
it to userspace.

This also simplifies how we walk the btree path.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>

2e6a0035

Btrfs: make uncache_state unconditional · 109b36a2

Committed by Chris Mason on April 12, 2011

The extent_io code can take cached pointers into the extent state trees,
and these can make lookups much faster in common operations.  The
caching only happens when specific bits are set that prevent merging
and splitting of the extent state.

A helper function was added to uncache the state, and it was testing
the same set of conditionals.  This can leak in very strange corner
cases where the lock bit goes away unexpectedly.

The uncaching should be unconditional.  Once we have a ref on the
extent we should always give it up.
Signed-off-by: Chris Mason <chris.mason@oracle.com>

109b36a2

12 April, 2011 4 commits

btrfs: using cached extent_state in set/unlock combinations · 507903b8

Committed by Arne Jansen on April 06, 2011

In several places the sequence (set_extent_uptodate, unlock_extent) is used.
This leads to a duplicate lookup of the extent state. This patch lets
set_extent_uptodate return a cached extent_state which can be passed to
unlock_extent_cached.
The occurrences of the above sequence are updated to use the cache. Only
end_bio_extent_readpage is changed so that it first gets a cached state to
pass to the readpage_end_io_hook, as the prototype requires, which is later
used for the set/unlock.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

507903b8

Btrfs: avoid taking the trans_mutex in btrfs_end_transaction · 13c5a93e

Committed by Josef Bacik on April 11, 2011

I've been working on making our O_DIRECT latency not suck and I noticed we were
taking the trans_mutex in btrfs_end_transaction. So to do this we convert
num_writers and use_count to atomic_t's and just decrement them in
btrfs_end_transaction. Instead of deleting the transaction from the trans list
in put_transaction we do that in btrfs_commit_transaction() since that's the
only time it actually needs to be removed from the list. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

13c5a93e

Btrfs: fix subvolume mount by name problem when default mount subvolume is set · e15d0542

Committed by Xin Zhong on April 06, 2011

We create two subvolumes (meego_root and meego_home) in the
btrfs root directory and set meego_root as the default mount
subvolume. After we remount btrfs, meego_root is mounted
at the top directory by default. Then when we try to mount
meego_home (subvol=meego_home) on a subdirectory, it fails.
The problem is that when the default mount subvolume is set to
meego_root, we search for meego_home in meego_root but cannot find
it. So the solution is to add a new mount option (subvolrootid)
to specify the subvol id of the root and search for the subvol name in it. For
our case, we can now use "-o subvolrootid=0,subvol=meego_home"
to mount meego_home.

Detailed information can be found in the meego bugzilla:
https://bugs.meego.com/show_bug.cgi?id=15055
Signed-off-by: Zhong, Xin <xin.zhong@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e15d0542

fix user annotation in ioctl.c · 13f2696f

Committed by Daniel J Blueman on April 11, 2011

Fix the address space annotation in ioctl.c.
Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>

 		       BTRFS_BLOCK_GROUP_SYSTEM,
@@ -2387,7 +2387,7 @@ long btrfs_ioctl_space_info(struct btrfs_root *root, void __user *arg)
 		up_read(&info->groups_sem);
 	}

-	user_dest = (struct btrfs_ioctl_space_info *)
+	user_dest = (struct btrfs_ioctl_space_info __user *)
 		(arg + sizeof(struct btrfs_ioctl_space_args));

 	if (copy_to_user(user_dest, dest_orig, alloc_size))
Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

13f2696f