提交 · be1a12a0dfed06cf1e62e35bf91620dc610a451a · openeuler / raspberrypi-kernel

09 4月, 2011 1 次提交

Btrfs: deal with the case that we run out of space in the cache · be1a12a0

由 Josef Bacik 提交于 4月 06, 2011

Currently we don't handle running out of space in the cache, so to fix this we
keep track of how far in the cache we are.  Then we only dirty the pages if we
successfully modify all of them, otherwise if we have an error or run out of
space we can just drop them and not worry about the vm writing them out.
Thanks,

Tested-by Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Signed-off-by: NJosef Bacik <josef@redhat.com>

be1a12a0

05 4月, 2011 1 次提交

Btrfs: Fix uninitialized root flags for subvolumes · 08fe4db1

由 Li Zefan 提交于 3月 28, 2011

root_item->flags and root_item->byte_limit are not initialized when
a subvolume is created. This bug is not revealed until we added
readonly snapshot support - now you mount a btrfs filesystem and you
may find the subvolumes in it are readonly.

To work around this problem, we steal a bit from root_item->inode_item->flags,
and use it to indicate if those fields have been properly initialized.
When we read a tree root from disk, we check if the bit is set, and if
not we'll set the flag and initialize the two fields of the root item.
Reported-by: NAndreas Philipp <philipp.andreas@gmail.com>
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Tested-by: NAndreas Philipp <philipp.andreas@gmail.com>
cc: stable@kernel.org
Signed-off-by: NChris Mason <chris.mason@oracle.com>

08fe4db1

28 3月, 2011 6 次提交

Btrfs: fix OOPS of empty filesystem after balance · c59021f8

由 liubo 提交于 3月 07, 2011

btrfs will remove unused block groups after balance.
When a empty filesystem is balanced, the block group with tag "DATA" may be
dropped, and after umount and mount again, it will not find "DATA" space_info
and lead to OOPS.
So we initial the necessary space_infos(DATA, SYSTEM, METADATA) to avoid OOPS.
Reported-by: NDaniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c59021f8

Btrfs: add btrfs_trim_fs() to handle FITRIM · f7039b1d

由 Li Dongyang 提交于 3月 24, 2011

We take an free extent out from allocator, trim it, then put it back,
but before we trim the block group, we should make sure the block group is
cached, so plus a little change to make cache_block_group() run without a
transaction.
Signed-off-by: NLi Dongyang <lidongyang@novell.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f7039b1d

Btrfs: adjust btrfs_discard_extent() return errors and trimmed bytes · 5378e607

由 Li Dongyang 提交于 3月 24, 2011

Callers of btrfs_discard_extent() should check if we are mounted with -o discard,
as we want to make fitrim to work even the fs is not mounted with -o discard.
Also we should use REQ_DISCARD to map the free extent to get a full mapping,
last we only return errors if
1. the error is not a EOPNOTSUPP
2. no device supports discard
Signed-off-by: NLi Dongyang <lidongyang@novell.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

5378e607

Btrfs: make update_reserved_bytes() public · b4d00d56

由 Li Dongyang 提交于 3月 24, 2011

Make the function public as we should update the reserved extents calculations
after taking out an extent for trimming.
Signed-off-by: NLi Dongyang <lidongyang@novell.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b4d00d56

Btrfs: Per file/directory controls for COW and compression · 75e7cb7f

由 Liu Bo 提交于 3月 22, 2011

Data compression and data cow are controlled across the entire FS by mount
options right now.  ioctls are needed to set this on a per file or per
directory basis.  This has been proposed previously, but VFS developers
wanted us to use generic ioctls rather than btrfs-specific ones.

According to Chris's comment, there should be just one true compression
method(probably LZO) stored in the super.  However, before this, we would
wait for that one method is stable enough to be adopted into the super.
So I list it as a long term goal, and just store it in ram today.

After applying this patch, we can use the generic "FS_IOC_SETFLAGS" ioctl to
control file and directory's datacow and compression attribute.

NOTE:
 - The compression type is selected by such rules:
   If we mount btrfs with compress options, ie, zlib/lzo, the type is it.
   Otherwise, we'll use the default compress type (zlib today).

v1->v2:
- rebase to the latest btrfs.
v2->v3:
- fix a problem, i.e. when a file is set NOCOW via mount option, then this NOCOW
  will be screwed by inheritance from parent directory.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

75e7cb7f

Btrfs: add initial tracepoint support for btrfs · 1abe9b8a

由 liubo 提交于 3月 24, 2011

Tracepoints can provide insight into why btrfs hits bugs and be greatly
helpful for debugging, e.g
              dd-7822  [000]  2121.641088: btrfs_inode_request: root = 5(FS_TREE), gen = 4, ino = 256, blocks = 8, disk_i_size = 0, last_trans = 8, logged_trans = 0
              dd-7822  [000]  2121.641100: btrfs_inode_new: root = 5(FS_TREE), gen = 8, ino = 257, blocks = 0, disk_i_size = 0, last_trans = 0, logged_trans = 0
 btrfs-transacti-7804  [001]  2146.935420: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29368320 (orig_level = 0), cow_buf = 29388800 (cow_level = 0)
 btrfs-transacti-7804  [001]  2146.935473: btrfs_cow_block: root = 1(ROOT_TREE), refs = 2, orig_buf = 29364224 (orig_level = 0), cow_buf = 29392896 (cow_level = 0)
 btrfs-transacti-7804  [001]  2146.972221: btrfs_transaction_commit: root = 1(ROOT_TREE), gen = 8
   flush-btrfs-2-7821  [001]  2155.824210: btrfs_chunk_alloc: root = 3(CHUNK_TREE), offset = 1103101952, size = 1073741824, num_stripes = 1, sub_stripes = 0, type = DATA
   flush-btrfs-2-7821  [001]  2155.824241: btrfs_cow_block: root = 2(EXTENT_TREE), refs = 2, orig_buf = 29388800 (orig_level = 0), cow_buf = 29396992 (cow_level = 0)
   flush-btrfs-2-7821  [001]  2155.824255: btrfs_cow_block: root = 4(DEV_TREE), refs = 2, orig_buf = 29372416 (orig_level = 0), cow_buf = 29401088 (cow_level = 0)
   flush-btrfs-2-7821  [000]  2155.824329: btrfs_cow_block: root = 3(CHUNK_TREE), refs = 2, orig_buf = 20971520 (orig_level = 0), cow_buf = 20975616 (cow_level = 0)
 btrfs-endio-wri-7800  [001]  2155.898019: btrfs_cow_block: root = 5(FS_TREE), refs = 2, orig_buf = 29384704 (orig_level = 0), cow_buf = 29405184 (cow_level = 0)
 btrfs-endio-wri-7800  [001]  2155.898043: btrfs_cow_block: root = 7(CSUM_TREE), refs = 2, orig_buf = 29376512 (orig_level = 0), cow_buf = 29409280 (cow_level = 0)

Here is what I have added:

1) ordere_extent:
        btrfs_ordered_extent_add
        btrfs_ordered_extent_remove
        btrfs_ordered_extent_start
        btrfs_ordered_extent_put

These provide critical information to understand how ordered_extents are
updated.

2) extent_map:
        btrfs_get_extent

extent_map is used in both read and write cases, and it is useful for tracking
how btrfs specific IO is running.

3) writepage:
        __extent_writepage
        btrfs_writepage_end_io_hook

Pages are cirtical resourses and produce a lot of corner cases during writeback,
so it is valuable to know how page is written to disk.

4) inode:
        btrfs_inode_new
        btrfs_inode_request
        btrfs_inode_evict

These can show where and when a inode is created, when a inode is evicted.

5) sync:
        btrfs_sync_file
        btrfs_sync_fs

These show sync arguments.

6) transaction:
        btrfs_transaction_commit

In transaction based filesystem, it will be useful to know the generation and
who does commit.

7) back reference and cow:
	btrfs_delayed_tree_ref
	btrfs_delayed_data_ref
	btrfs_delayed_ref_head
	btrfs_cow_block

Btrfs natively supports back references, these tracepoints are helpful on
understanding btrfs's COW mechanism.

8) chunk:
	btrfs_chunk_alloc
	btrfs_chunk_free

Chunk is a link between physical offset and logical offset, and stands for space
infomation in btrfs, and these are helpful on tracing space things.

9) reserved_extent:
	btrfs_reserved_extent_alloc
	btrfs_reserved_extent_free

These can show how btrfs uses its space.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1abe9b8a

26 3月, 2011 1 次提交

Btrfs: cleanup how we setup free space clusters · 4e69b598

由 Josef Bacik 提交于 3月 21, 2011

This patch makes the free space cluster refilling code a little easier to
understand, and fixes some things with the bitmap part of it. Currently we
either want to refill a cluster with

1) All normal extent entries (those without bitmaps)
2) A bitmap entry with enough space

The current code has this ugly jump around logic that will first try and fill up
the cluster with extent entries and then if it can't do that it will try and
find a bitmap to use. So instead split this out into two functions, one that
tries to find only normal entries, and one that tries to find bitmaps.

This also fixes a suboptimal thing we would do with bitmaps. If we used a
bitmap we would just tell the cluster that we were pointing at a bitmap and it
would do the tree search in the block group for that entry every time we tried
to make an allocation. Instead of doing that now we just add it to the clusters
group.

I tested this with my ENOSPC tests and xfstests and it survived.
Signed-off-by: NJosef Bacik <josef@redhat.com>

4e69b598

18 3月, 2011 4 次提交

Btrfs: add checks to verify dir items are correct · 22a94d44

由 Josef Bacik 提交于 3月 16, 2011

We need to make sure the dir items we get are valid dir items.  So any time we
try and read one check it with verify_dir_item, which will do various sanity
checks to make sure it looks sane.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

22a94d44

Btrfs: handle errors in btrfs_orphan_cleanup · 66b4ffd1

由 Josef Bacik 提交于 1月 31, 2011

If we cannot truncate an inode for some reason we will never delete the orphan
item associated with that inode, which means that we will loop forever in
btrfs_orphan_cleanup. Instead of doing this just return error so we fail to
mount. It sucks, but hey it's better than hanging. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

66b4ffd1

Btrfs: convert to the new truncate sequence · a41ad394

由 Josef Bacik 提交于 1月 31, 2011

->truncate() is going away, instead all of the work needs to be done in
->setattr().  So this converts us over to do this.  It's fairly straightforward,
just get rid of our .truncate inode operation and call btrfs_truncate() directly
from btrfs_setsize.  This works out better for us since truncate can technically
return ENOSPC, and before we had no way of letting anybody know.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

a41ad394

Btrfs: use a slab for the free space entries · dc89e982

由 Josef Bacik 提交于 1月 28, 2011

Since we alloc/free free space entries a whole lot, lets use a slab to keep
track of them. This makes some of my tests slightly faster. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

dc89e982

12 3月, 2011 1 次提交

Btrfs: break out of shrink_delalloc earlier · 36e39c40

由 Chris Mason 提交于 3月 12, 2011

Josef had changed shrink_delalloc to exit after three shrink
attempts, which wasn't quite enough because new writers could
race in and steal free space.

But it also fixed deadlocks and stalls as we tried to recover
delalloc reservations.  The code was tweaked to loop 1024
times, and would reset the counter any time a small amount
of progress was made.  This was too drastic, and with a
lot of writers we can end up stuck in shrink_delalloc forever.

The shrink_delalloc loop is fairly complex because the caller is looping
too, and the caller will go ahead and force a transaction commit to make
sure we reclaim space.

This reworks things to exit shrink_delalloc when we've forced some
writeback and the delalloc reservations have gone down.  This means
the writeback has not just started but has also finished at
least some of the metadata changes required to reclaim delalloc
space.

If we've got this wrong, we're returning ENOSPC too early, which
is a big improvement over the current behavior of hanging the machine.

Test 224 in xfstests hammers on this nicely, and with 1000 writers
trying to fill a 1GB drive we get our first ENOSPC at 93% full.  The
other writers are able to continue until we get 100%.

This is a worst case test for btrfs because the 1000 writers are doing
small IO, and the small FS size means we don't have a lot of room
for metadata chunks.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

36e39c40

17 2月, 2011 2 次提交

Btrfs: allow balance to explicitly allocate chunks as it relocates · c87f08ca

由 Chris Mason 提交于 2月 16, 2011

Btrfs device shrinking and balancing ends up reallocating all the blocks
in order to allow COW to move them to new destinations. It is somewhat
awkward in terms of ENOSPC because most of the enospc code is built
around the idea that some operation on a reference counted tree triggers
allocations in the non-reference counted trees.

This commit changes the balancing code to deal with enospc by trying to
allocate a new chunk. If that allocation succeeds, we go ahead and
retry whatever failed due to enospc.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c87f08ca

Btrfs: put ENOSPC debugging under a mount option · 91435650

由 Chris Mason 提交于 2月 16, 2011

ENOSPC in btrfs is getting to the point where the extra debugging isn't
required.  I've put it under mount -o enospc_debug just in case someone
is having difficult problems.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

91435650

18 1月, 2011 1 次提交

Btrfs: forced readonly mounts on errors · acce952b

由 liubo 提交于 1月 06, 2011

This patch comes from "Forced readonly mounts on errors" ideas.

As we know, this is the first step in being more fault tolerant of disk
corruptions instead of just using BUG() statements.

The major content:
- add a framework for generating errors that should result in filesystems
  going readonly.
- keep FS state in disk super block.
- make sure that all of resource will be freed and released at umount time.
- make sure that fter FS is forced readonly on error, there will be no more
  disk change before FS is corrected. For this, we should stop write operation.

After this patch is applied, the conversion from BUG() to such a framework can
happen incrementally.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

acce952b

17 1月, 2011 3 次提交

fs/btrfs: Fix build of ctree · f8b18087

由 Stefan Schmidt 提交于 1月 12, 2011

Fix the build failure in some configurations:

     CC [M]  fs/btrfs/ctree.o
  In file included from fs/btrfs/ctree.c:21:0:
  fs/btrfs/ctree.h:1003:17: error: field 'super_kobj' has incomplete type
  fs/btrfs/ctree.h:1074:17: error: field 'root_kobj' has incomplete type
  make[2]: *** [fs/btrfs/ctree.o] Error 1
  make[1]: *** [fs/btrfs] Error 2
  make: *** [fs] Error 2

caused by commit 57cc7215 ("headers: kobject.h redux")

We need to include kobject.h here.
Reported-by: NJeff Garzik <jeff@garzik.org>
Fix-suggested-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NStefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f8b18087

btrfs: fix wrong free space information of btrfs · 6d07bcec

由 Miao Xie 提交于 1月 05, 2011

When we store data by raid profile in btrfs with two or more different size
disks, df command shows there is some free space in the filesystem, but the
user can not write any data in fact, df command shows the wrong free space
information of btrfs.

 # mkfs.btrfs -d raid1 /dev/sda9 /dev/sda10
 # btrfs-show
 Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
 	Total devices 2 FS bytes used 28.00KB
 	devid    1 size 5.01GB used 2.03GB path /dev/sda9
 	devid    2 size 10.00GB used 2.01GB path /dev/sda10
 # btrfs device scan /dev/sda9 /dev/sda10
 # mount /dev/sda9 /mnt
 # dd if=/dev/zero of=tmpfile0 bs=4K count=9999999999
   (fill the filesystem)
 # sync
 # df -TH
 Filesystem	Type	Size	Used	Avail	Use%	Mounted on
 /dev/sda9	btrfs	17G	8.6G	5.4G	62%	/mnt
 # btrfs-show
 Label: none  uuid: a95cd49e-6e33-45b8-8741-a36153ce4b64
 	Total devices 2 FS bytes used 3.99GB
 	devid    1 size 5.01GB used 5.01GB path /dev/sda9
 	devid    2 size 10.00GB used 4.99GB path /dev/sda10

It is because btrfs cannot allocate chunks when one of the pairing disks has
no space, the free space on the other disks can not be used for ever, and should
be subtracted from the total space, but btrfs doesn't subtract this space from
the total. It is strange to the user.

This patch fixes it by calcing the free space that can be used to allocate
chunks.

Implementation:
1. get all the devices free space, and align them by stripe length.
2. sort the devices by the free space.
3. check the free space of the devices,
   3.1. if it is not zero, and then check the number of the devices that has
        more free space than this device,
        if the number of the devices is beyond the min stripe number, the free
        space can be used, and add into total free space.
        if the number of the devices is below the min stripe number, we can not
        use the free space, the check ends.
   3.2. if the free space is zero, check the next devices, goto 3.1

This implementation is just likely fake chunk allocation.

After appling this patch, df can show correct space information:
 # df -TH
 Filesystem	Type	Size	Used	Avail	Use%	Mounted on
 /dev/sda9	btrfs	17G	8.6G	0	100%	/mnt
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

6d07bcec

fs/btrfs: Fix build of ctree · f580eb09

由 Stefan Schmidt 提交于 1月 12, 2011

CC [M]  fs/btrfs/ctree.o
In file included from fs/btrfs/ctree.c:21:0:
fs/btrfs/ctree.h:1003:17: error: field <91>super_kobj<92> has incomplete type
fs/btrfs/ctree.h:1074:17: error: field <91>root_kobj<92> has incomplete type
make[2]: *** [fs/btrfs/ctree.o] Error 1
make[1]: *** [fs/btrfs] Error 2
make: *** [fs] Error 2

We need to include kobject.h here.
Reported-by: NJeff Garzik <jeff@garzik.org>
Fix-suggested-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NStefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f580eb09

07 1月, 2011 1 次提交
- N
  fs: provide rcu-walk aware permission i_ops · b74c79e9
  由 Nick Piggin 提交于 1月 07, 2011
```
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
```
  b74c79e9
23 12月, 2010 1 次提交

Btrfs: Add readonly snapshots support · b83cc969

由 Li Zefan 提交于 12月 20, 2010

Usage:

Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_arg_v2->flags, and call
ioctl(BTRFS_I0CTL_SNAP_CREATE_V2).

Implementation:

- Set readonly bit of btrfs_root_item->flags.
- Add readonly checks in btrfs_permission (inode_permission),
btrfs_setattr, btrfs_set/remove_xattr and some ioctls.

Changelog for v3:

- Eliminate btrfs_root->readonly, but check btrfs_root->root_item.flags.
- Rename BTRFS_ROOT_SNAP_RDONLY to BTRFS_ROOT_SUBVOL_RDONLY.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>

b83cc969

22 12月, 2010 2 次提交

btrfs: Add lzo compression support · a6fa6fae

由 Li Zefan 提交于 10月 25, 2010

Lzo is a much faster compression algorithm than gzib, so would allow
more users to enable transparent compression, and some users can
choose from compression ratio and speed for different applications

Usage:

 # mount -t btrfs -o compress[=<zlib,lzo>] dev /mnt
or
 # mount -t btrfs -o compress-force[=<zlib,lzo>] dev /mnt

"-o compress" without argument is still allowed for compatability.

Compatibility:

If we mount a filesystem with lzo compression, it will not be able be
mounted in old kernels. One reason is, otherwise btrfs will directly
dump compressed data, which sits in inline extent, to user.

Performance:

The test copied a linux source tarball (~400M) from an ext4 partition
to the btrfs partition, and then extracted it.

(time in second)
           lzo        zlib        nocompress
copy:      10.6       21.7        14.9
extract:   70.1       94.4        66.6

(data size in MB)
           lzo        zlib        nocompress
copy:      185.87     108.69      394.49
extract:   193.80     132.36      381.21

Changelog:

v1 -> v2:
- Select LZO_COMPRESS and LZO_DECOMPRESS in btrfs Kconfig.
- Add incompability flag.
- Fix error handling in compress code.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>

a6fa6fae

btrfs: Allow to add new compression algorithm · 261507a0

由 Li Zefan 提交于 12月 17, 2010

Make the code aware of compression type, instead of always assuming
zlib compression.

Also make the zlib workspace function as common code for all
compression types.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>

261507a0

22 11月, 2010 1 次提交

btrfs: make 1-bit signed fileds unsigned · 0410c94a

由 Mariusz Kozlowski 提交于 11月 20, 2010

Fixes these sparse warnings:
fs/btrfs/ctree.h:811:17: error: dubious one-bit signed bitfield
fs/btrfs/ctree.h:812:20: error: dubious one-bit signed bitfield
fs/btrfs/ctree.h:813:19: error: dubious one-bit signed bitfield
Signed-off-by: NMariusz Kozlowski <mk@lab.zgora.pl>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0410c94a

30 10月, 2010 2 次提交

Btrfs: allow subvol deletion by unprivileged user with -o user_subvol_rm_allowed · 4260f7c7

由 Sage Weil 提交于 10月 29, 2010

Add a mount option user_subvol_rm_allowed that allows users to delete a
(potentially non-empty!) subvol when they would otherwise we allowed to do
an rmdir(2).  We duplicate the may_delete() checks from the core VFS code
to implement identical security checks (minus the directory size check).
We additionally require that the user has write+exec permission on the
subvol root inode.
Signed-off-by: NSage Weil <sage@newdream.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

4260f7c7

Btrfs: async transaction commit · bb9c12c9

由 Sage Weil 提交于 10月 29, 2010

Add support for an async transaction commit that is ordered such that any
subsequent operations will join the following transaction, but does not
wait until the current commit is fully on disk. This avoids much of the
latency associated with the btrfs_commit_transaction for callers concerned
with serialization and not safety.

The wait_for_unblock flag controls whether we wait for the 'middle' portion
of commit_transaction to complete, which is necessary if the caller expects
some of the modifications contained in the commit to be available (this is
the case for subvol/snapshot creation).
Signed-off-by: NSage Weil <sage@newdream.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

bb9c12c9

29 10月, 2010 4 次提交

Btrfs: Add a clear_cache mount option · 88c2ba3b

由 Josef Bacik 提交于 9月 21, 2010

If something goes wrong with the free space cache we need a way to make sure
it's not loaded on mount and that it's cleared for everybody. When you pass the
clear_cache option it will make it so all block groups are setup to be cleared,
which keeps them from being loaded and then they will be truncated when the
transaction is committed. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

88c2ba3b

Btrfs: add support for mixed data+metadata block groups · 67377734

由 Josef Bacik 提交于 9月 16, 2010

There are just a few things that need to be fixed in the kernel to support mixed
data+metadata block groups. Mostly we just need to make sure that if we are
using mixed block groups that we continue to allocate mixed block groups as we
need them. Also we need to make sure __find_space_info will find our space info
if we search for DATA or METADATA only. Tested this with xfstests and it works
nicely. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

67377734

Btrfs: write out free space cache · 0cb59c99

由 Josef Bacik 提交于 7月 02, 2010

This is a simple bit, just dump the free space cache out to our preallocated
inode when we're writing out dirty block groups. There are a bunch of changes
in inode.c in order to account for special cases. Mostly when we're doing the
writeout we're holding trans_mutex, so we need to use the nolock transacation
functions. Also we can't do asynchronous completions since the async thread
could be blocked on already completed IO waiting for the transaction lock. This
has been tested with xfstests and btrfs filesystem balance, as well as my ENOSPC
tests. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

0cb59c99

Btrfs: create special free space cache inode · 0af3d00b

由 Josef Bacik 提交于 6月 21, 2010

In order to save free space cache, we need an inode to hold the data, and we
need a special item to point at the right inode for the right block group. So
first, create a special item that will point to the right inode, and the number
of extent entries we will have and the number of bitmaps we will have. We
truncate and pre-allocate space everytime to make sure it's uptodate.

This feature will be turned on as soon as you mount with -o space_cache, however
it is safe to boot into old kernels, they will just generate the cache the old
fashion way. When you boot back into a newer kernel we will notice that we
modified and not the cache and automatically discard the cache.
Signed-off-by: NJosef Bacik <josef@redhat.com>

0af3d00b

23 10月, 2010 3 次提交

Btrfs: rework how we reserve metadata bytes · 8bb8ab2e

由 Josef Bacik 提交于 10月 15, 2010

With multi-threaded writes we were getting ENOSPC early because somebody would
come in, start flushing delalloc because they couldn't make their reservation,
and in the meantime other threads would come in and use the space that was
getting freed up, so when the original thread went to check to see if they had
space they didn't and they'd return ENOSPC. So instead if we have some free
space but not enough for our reservation, take the reservation and then start
doing the flushing. The only time we don't take reservations is when we've
already overcommitted our space, that way we don't have people who come late to
the party way overcommitting ourselves. This also moves all of the retrying and
flushing code into reserve_metdata_bytes so it's all uniform. This keeps my
fs_mark test from returning -ENOSPC as soon as it starts and actually lets me
fill up the disk. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

8bb8ab2e

Btrfs: re-work delalloc flushing · 0019f10d

由 Josef Bacik 提交于 10月 15, 2010

Currently we try and flush delalloc, but we only do that in a sort of weak way,
which works fine in most cases but if we're under heavy pressure we need to be
able to wait for flushing to happen. Also instead of checking the bytes
reserved in the block_rsv, check the space info since it is more accurate. The
sync option will be used in a future patch.
Signed-off-by: NJosef Bacik <josef@redhat.com>

0019f10d

Btrfs: fix df regression · 89a55897

由 Josef Bacik 提交于 10月 14, 2010

The new ENOSPC stuff breaks out the raid types which breaks the way we were
reporting df to the system.  This fixes it back so that Available is the total
space available to data and used is the actual bytes used by the filesystem.
This means that Available is Total - data used - all of the metadata space.
Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

89a55897

10 8月, 2010 2 次提交

A
Make ->drop_inode() just return whether inode needs to be dropped · 45321ac5
由 Al Viro 提交于 6月 07, 2010
```
... and let iput_final() do the actual eviction or retention
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
45321ac5

convert btrfs to ->evict_inode() · bd555975

由 Al Viro 提交于 6月 07, 2010

NB: do we want btrfs_wait_ordered_range() on eviction of
inodes with positive i_nlink on subvolume with zero root_refs?
If not, btrfs_evict_inode() can be simplified by unconditionally
bailing out in case of i_nlink > 0 in the very beginning...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

bd555975

28 5月, 2010 1 次提交

drop unused dentry argument to ->fsync · 7ea80859

由 Christoph Hellwig 提交于 5月 26, 2010

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7ea80859

25 5月, 2010 3 次提交

Btrfs: add basic DIO read/write support · 4b46fce2

由 Josef Bacik 提交于 5月 23, 2010

This provides basic DIO support for reading and writing.  It does not do the
work to recover from mismatching checksums, that will come later.  A few design
changes have been made from Jim's code (sorry Jim!)

1) Use the generic direct-io code.  Jim originally re-wrote all the generic DIO
code in order to account for all of BTRFS's oddities, but thanks to that work it
seems like the best bet is to just ignore compression and such and just opt to
fallback on buffered IO.

2) Fallback on buffered IO for compressed or inline extents.  Jim's code did
it's own buffering to make dio with compressed extents work.  Now we just
fallback onto normal buffered IO.

3) Use ordered extents for the writes so that all of the

lock_extent()
lookup_ordered()

type checks continue to work.

4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with
DIO writes.

I've tested this with fsx and everything works great.  This patch depends on my
dio and filemap.c patches to work.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

4b46fce2

Btrfs: Metadata ENOSPC handling for balance · 3fd0a558

由 Yan, Zheng 提交于 5月 16, 2010

This patch adds metadata ENOSPC handling for the balance code.
It is consisted by following major changes:

1. Avoid COW tree leave in the phrase of merging tree.

2. Handle interaction with snapshot creation.

3. make the backref cache can live across transactions.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

3fd0a558

Btrfs: Pre-allocate space for data relocation · efa56464

由 Yan, Zheng 提交于 5月 16, 2010

Pre-allocate space for data relocation. This can detect ENOPSC
condition caused by fragmentation of free space.
Signed-off-by: NYan Zheng <zheng.yan@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

efa56464