提交 · 499f377f49f085ee4aa214c738e948e88626f39b · openeuler / raspberrypi-kernel

29 7月, 2015 1 次提交

btrfs: iterate over unused chunk space in FITRIM · 499f377f

由 Jeff Mahoney 提交于 6月 15, 2015

Since we now clean up block groups automatically as they become
empty, iterating over block groups is no longer sufficient to discard
unused space.

This patch iterates over the unused chunk space and discards any regions
that are unallocated, regardless of whether they were ever used. This is
a change for btrfs but is consistent with other file systems.

We do this in a transactionless manner since the discard process can take
a substantial amount of time and a transaction would need to be started
before the acquisition of the device list lock. That would mean a
transaction would be held open across /all/ of the discards collectively.
In order to prevent other threads from allocating or freeing chunks, we
hold the chunks lock across the search and discard calls. We release it
between searches to allow the file system to perform more-or-less
normally. Since the running transaction can commit and disappear while
we're using the transaction pointer, we take a reference to it and
release it after the search. This is safe since it would happen normally
at the end of the transaction commit after any locks are released anyway.
We also take the commit_root_sem to protect against a transaction starting
and committing while we're running.
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Reviewed-by: NFilipe Manana <fdmanana@suse.com>
Tested-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

499f377f

27 5月, 2015 3 次提交

Btrfs: sysfs: add pointer to access fs_info from fs_devices · 5a13f430

由 Anand Jain 提交于 3月 10, 2015

adds fs_info pointer with struct btrfs_fs_devices.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

5a13f430

A
Btrfs: introduce btrfs_get_fs_uuids to get fs_uuids · c73eccf7
由 Anand Jain 提交于 3月 10, 2015
```
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
```
c73eccf7

Btrfs: sysfs: move super_kobj and device_dir_kobj from fs_info to btrfs_fs_devices · 2e7910d6

由 Anand Jain 提交于 3月 10, 2015

This patch will provide a framework and help to create attributes
from the structure btrfs_fs_devices which are available even before
fs_info is created. So by moving the parent kobject super_kobj from
fs_info to btrfs_fs_devices, it will help to create attributes
from the btrfs_fs_devices as well.

Patches on top of this patch now will be able to create the
sys/fs/btrfs/fsid kobject and attributes from btrfs_fs_devices
when devices are scanned and registered to the kernel.

Just to note, this does not change any of the existing btrfs sysfs
external kobject names and its attributes and not even the life
cycle of them. Changes are internal only. And to ensure the same,
this path has been tested with various device operations and,
checking and comparing the sysfs kobjects and attributes with
sysfs kobject and attributes with out this patch, and they remain
same.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

2e7910d6

17 2月, 2015 1 次提交

btrfs: remove unused fs_info arg from btrfs_close_extra_devices() · 9eaed21e

由 Eric Sandeen 提交于 8月 01, 2014

The commit:
8dabb742 Btrfs: change core code of btrfs to support the
        device replace operations
added the fs_info argument, but never used it -
just remove it again.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>

9eaed21e

22 1月, 2015 3 次提交

Btrfs: Include map_type in raid_bio · 10f11900

由 Zhao Lei 提交于 1月 20, 2015

Corrent code use many kinds of "clever" way to determine operation
target's raid type, as:
  raid_map != NULL
  or
  raid_map[MAX_NR] == RAID[56]_Q_STRIPE

To make code easy to maintenance, this patch put raid type into
bbio, and we can always get raid type from bbio with a "stupid"
way.
Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

10f11900

Btrfs: add ref_count and free function for btrfs_bio · 6e9606d2

由 Zhao Lei 提交于 1月 20, 2015

1: ref_count is simple than current RBIO_HOLD_BBIO_MAP_BIT flag
   to keep btrfs_bio's memory in raid56 recovery implement.
2: free function for bbio will make code clean and flexible, plus
   forced data type checking in compile.

Changelog v1->v2:
 Rename following by David Sterba's suggestion:
 put_btrfs_bio() -> btrfs_put_bio()
 get_btrfs_bio() -> btrfs_get_bio()
 bbio->ref_count -> bbio->refs
Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

6e9606d2

Btrfs: Make raid_map array be inlined in btrfs_bio structure · 8e5cfb55

由 Zhao Lei 提交于 1月 20, 2015

It can make code more simple and clear, we need not care about
free bbio and raid_map together.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

8e5cfb55

03 12月, 2014 3 次提交

Btrfs: fix race between fs trimming and block group remove/allocation · 04216820

由 Filipe Manana 提交于 11月 27, 2014

Our fs trim operation, which is completely transactionless (doesn't start
or joins an existing transaction) consists of visiting all block groups
and then for each one to iterate its free space entries and perform a
discard operation against the space range represented by the free space
entries. However before performing a discard, the corresponding free space
entry is removed from the free space rbtree, and when the discard completes
it is added back to the free space rbtree.

If a block group remove operation happens while the discard is ongoing (or
before it starts and after a free space entry is hidden), we end up not
waiting for the discard to complete, remove the extent map that maps
logical address to physical addresses and the corresponding chunk metadata
from the the chunk and device trees. After that and before the discard
completes, the current running transaction can finish and a new one start,
allowing for new block groups that map to the same physical addresses to
be allocated and written to.

So fix this by keeping the extent map in memory until the discard completes
so that the same physical addresses aren't reused before it completes.

If the physical locations that are under a discard operation end up being
used for a new metadata block group for example, and dirty metadata extents
are written before the discard finishes (the VM might call writepages() of
our btree inode's i_mapping for example, or an fsync log commit happens) we
end up overwriting metadata with zeroes, which leads to errors from fsck
like the following:

        checking extents
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        read block failed check_tree_block
        owner ref check failed [833912832 16384]
        Errors found in extent allocation tree or chunk allocation
        checking free space cache
        checking fs roots
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        Check tree block failed, want=833912832, have=0
        read block failed check_tree_block
        root 5 root dir 256 error
        root 5 inode 260 errors 2001, no inode item, link count wrong
                unresolved ref dir 256 index 0 namelen 8 name foobar_3 filetype 1 errors 6, no dir index, no inode ref
        root 5 inode 262 errors 2001, no inode item, link count wrong
                unresolved ref dir 256 index 0 namelen 8 name foobar_5 filetype 1 errors 6, no dir index, no inode ref
        root 5 inode 263 errors 2001, no inode item, link count wrong
        (...)
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

04216820

Btrfs, replace: write dirty pages into the replace target device · 2c8cdd6e

由 Miao Xie 提交于 11月 14, 2014

The implementation is simple:
- In order to avoid changing the code logic of btrfs_map_bio and
  RAID56, we add the stripes of the replace target devices at the
  end of the stripe array in btrfs bio, and we sort those target
  device stripes in the array. And we keep the number of the target
  device stripes in the btrfs bio.
- Except write operation on RAID56, all the other operation don't
  take the target device stripes into account.
- When we do write operation, we read the data from the common devices
  and calculate the parity. Then write the dirty data and new parity
  out, at this time, we will find the relative replace target stripes
  and wirte the relative data into it.

Note: The function that copying old data on the source device to
the target device was implemented in the past, it is similar to
the other RAID type.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>

2c8cdd6e

Btrfs, scrub: repair the common data on RAID5/6 if it is corrupted · af8e2d1d

由 Miao Xie 提交于 10月 23, 2014

This patch implement the RAID5/6 common data repair function, the
implementation is similar to the scrub on the other RAID such as
RAID1, the differentia is that we don't read the data from the
mirror, we use the data repair function of RAID5/6.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>

af8e2d1d

25 11月, 2014 1 次提交

btrfs: Fix a lockdep warning when running xfstest. · 084b6e7c

由 Qu Wenruo 提交于 10月 30, 2014

The following lockdep warning is triggered during xfstests:

[ 1702.980872] =========================================================
[ 1702.981181] [ INFO: possible irq lock inversion dependency detected ]
[ 1702.981482] 3.18.0-rc1 #27 Not tainted
[ 1702.981781] ---------------------------------------------------------
[ 1702.982095] kswapd0/77 just changed the state of lock:
[ 1702.982415]  (&delayed_node->mutex){+.+.-.}, at: [<ffffffffa03b0b51>] __btrfs_release_delayed_node+0x41/0x1f0 [btrfs]
[ 1702.982794] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[ 1702.983160]  (&fs_info->dev_replace.lock){+.+.+.}

and interrupts could create inverse lock ordering between them.

[ 1702.984675]
other info that might help us debug this:
[ 1702.985524] Chain exists of:
  &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock

[ 1702.986799]  Possible interrupt unsafe locking scenario:

[ 1702.987681]        CPU0                    CPU1
[ 1702.988137]        ----                    ----
[ 1702.988598]   lock(&fs_info->dev_replace.lock);
[ 1702.989069]                                local_irq_disable();
[ 1702.989534]                                lock(&delayed_node->mutex);
[ 1702.990038]                                lock(&found->groups_sem);
[ 1702.990494]   <Interrupt>
[ 1702.990938]     lock(&delayed_node->mutex);
[ 1702.991407]
 *** DEADLOCK ***

It is because the btrfs_kobj_{add/rm}_device() will call memory
allocation with GFP_KERNEL,
which may flush fs page cache to free space, waiting for it self to do
the commit, causing the deadlock.

To solve the problem, move btrfs_kobj_{add/rm}_device() out of the
dev_replace lock range, also involing split the
btrfs_rm_dev_replace_srcdev() function into remove and free parts.

Now only btrfs_rm_dev_replace_remove_srcdev() is called in dev_replace
lock range, and kobj_{add/rm} and btrfs_rm_dev_replace_free_srcdev() are
called out of the lock range.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

084b6e7c

23 9月, 2014 1 次提交

Btrfs: remove empty block groups automatically · 47ab2a6c

由 Josef Bacik 提交于 9月 18, 2014

One problem that has plagued us is that a user will use up all of his space with
data, remove a bunch of that data, and then try to create a bunch of small files
and run out of space. This happens because all the chunks were allocated for
data since the metadata requirements were so low. But now there's a bunch of
empty data block groups and not enough metadata space to do anything. This
patch solves this problem by automatically deleting empty block groups. If we
notice the used count go down to 0 when deleting or on mount notice that a block
group has a used count of 0 then we will queue it to be deleted.

When the cleaner thread runs we will double check to make sure the block group
is still empty and then we will delete it. This patch has the side effect of no
longer having a bunch of BUG_ON()'s in the chunk delete code, which will be
helpful for both this and relocate. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

47ab2a6c

18 9月, 2014 11 次提交

Btrfs: do file data check by sub-bio's self · c1dc0896

由 Miao Xie 提交于 9月 12, 2014

Direct IO splits the original bio to several sub-bios because of the limit of
raid stripe, and the filesystem will wait for all sub-bios and then run final
end io process.

But it was very hard to implement the data repair when dio read failure happens,
because at the final end io function, we didn't know which mirror the data was
read from. So in order to implement the data repair, we have to move the file data
check in the final end io function to the sub-bio end io function, in which we can
get the mirror number of the device we access. This patch did this work as the
first step of the direct io data repair implementation.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

c1dc0896

Btrfs: fix use-after-free problem of the device during device replace · 67a2c45e

由 Miao Xie 提交于 9月 03, 2014

The problem is:
	Task0(device scan task)		Task1(device replace task)
	scan_one_device()
	mutex_lock(&uuid_mutex)
	device = find_device()
					mutex_lock(&device_list_mutex)
					lock_chunk()
					rm_and_free_source_device
					unlock_chunk()
					mutex_unlock(&device_list_mutex)
	check device

Destroying the target device if device replace fails also has the same problem.

We fix this problem by locking uuid_mutex during destroying source device or
target device, just like the device remove operation.

It is a temporary solution, we can fix this problem and make the code more
clear by atomic counter in the future.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

67a2c45e

Btrfs: fix unprotected device's variants on 32bits machine · 7cc8e58d

由 Miao Xie 提交于 9月 03, 2014

->total_bytes,->disk_total_bytes,->bytes_used is protected by chunk
lock when we change them, but sometimes we read them without any lock,
and we might get unexpected value. We fix this problem like inode's
i_size.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

7cc8e58d

Btrfs: fix wrong device bytes_used in the super block · ce7213c7

由 Miao Xie 提交于 9月 03, 2014

device->bytes_used will be changed when allocating a new chunk, and
disk_total_size will be changed if resizing is successful.
Meanwhile, the on-disk super blocks of the previous transaction
might not be updated. Considering the consistency of the metadata
in the previous transaction, We should use the size in the previous
transaction to check if the super block is beyond the boundary
of the device.

Though it is not big problem because we don't use it now, but anyway
it is better that we make it be consistent with the common metadata,
maybe we will use it in the future.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

ce7213c7

Btrfs: fix wrong disk size when writing super blocks · 935e5cc9

由 Miao Xie 提交于 9月 03, 2014

total_size will be changed when resizing a device, and disk_total_size
will be changed if resizing is successful. Meanwhile, the on-disk super
blocks of the previous transaction might not be updated. Considering
the consistency of the metadata in the previous transaction, We should
use the size in the previous transaction to check if the super block is
beyond the boundary of the device. Fix it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

935e5cc9

Btrfs: fix unprotected assignment of the target device · 1c43366d

由 Miao Xie 提交于 9月 03, 2014

We didn't protect the assignment of the target device, it might cause the
problem that the super block update was skipped because we might find wrong
size of the target device during the assignment. Fix it by moving the
assignment sentences into the initialization function of the target device.
And there is another merit that we can check if the target device is suitable
more early.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

1c43366d

Btrfs: cleanup unused num_can_discard in fs_devices · 90180da4

由 Miao Xie 提交于 9月 03, 2014

The member variants - num_can_discard - of fs_devices structure
are set, but no one use them to do anything. so remove them.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

90180da4

Btrfs: cleanup unused latest_devid and latest_trans in fs_devices · 443f24fe

由 Miao Xie 提交于 7月 24, 2014

The member variants - latest_devid and latest_trans - of fs_devices structure
are set, but no one use them to do anything. so remove them.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

443f24fe

M
Btrfs: update the comment of total_bytes and disk_total_bytes of btrfs_devie · 6ba40b61
由 Miao Xie 提交于 7月 24, 2014
```
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>
```
6ba40b61

Btrfs: Fix the problem that the dirty flag of dev stats is cleared · addc3fa7

由 Miao Xie 提交于 7月 24, 2014

The io error might happen during writing out the device stats, and the
device stats information and dirty flag would be update at that time,
but the current code didn't consider this case, just clear the dirty
flag, it would cause that we forgot to write out the new device stats
information. Fix it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

addc3fa7

Btrfs: make the device lock and its protected data in the same cacheline · d5ee37bc

由 Miao Xie 提交于 7月 24, 2014

The lock in btrfs_device structure was far away from its protected data, it would
make CPU load the cache line twice when we accessed them, move them together.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

d5ee37bc

20 6月, 2014 1 次提交

Btrfs: fix deadlock when mounting a degraded fs · c55f1396

由 Miao Xie 提交于 6月 19, 2014

The deadlock happened when we mount degraded filesystem, the reproduced
steps are following:
 # mkfs.btrfs -f -m raid1 -d raid1 <dev0> <dev1>
 # echo 1 > /sys/block/`basename <dev0>`/device/delete
 # mount -o degraded <dev1> <mnt>

The reason was that the counter -- bi_remaining was wrong. If the missing
or unwriteable device was the last device in the mapping array, we would
not submit the original bio, so we shouldn't increase bi_remaining of it
in btrfs_end_bio(), or we would skip the final endio handle.

Fix this problem by adding a flag into btrfs bio structure. If we submit
the original bio, we will set the flag, and we increase bi_remaining counter,
or we don't.

Though there is another way to fix it -- decrease bi_remaining counter of the
original bio when we make sure the original bio is not submitted, this method
need add more check and is easy to make mistake.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <clm@fb.com>

c55f1396

10 6月, 2014 1 次提交

btrfs: balance filter: add limit of processed chunks · 7d824b6f

由 David Sterba 提交于 5月 07, 2014

This started as debugging helper, to watch the effects of converting
between raid levels on multiple devices, but could be useful standalone.

In my case the usage filter was not finegrained enough and led to
converting too many chunks at once. Another example use is in connection
with drange+devid or vrange filters that allow to work with a specific
chunk or even with a chunk on a given device.

The limit filter applies last, the value of 0 means no limiting.

CC: Ilya Dryomov <idryomov@gmail.com>
CC: Hugo Mills <hugo@carfax.org.uk>
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

7d824b6f

11 3月, 2014 3 次提交

btrfs: Cleanup the "_struct" suffix in btrfs_workequeue · d458b054

由 Qu Wenruo 提交于 2月 28, 2014

Since the "_struct" suffix is mainly used for distinguish the differnt
btrfs_work between the original and the newly created one,
there is no need using the suffix since all btrfs_workers are changed
into btrfs_workqueue.

Also this patch fixed some codes whose code style is changed due to the
too long "_struct" suffix.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <jbacik@fb.com>

d458b054

btrfs: Replace fs_info->submit_workers with btrfs_workqueue. · a8c93d4e

由 Qu Wenruo 提交于 2月 28, 2014

Much like the fs_info->workers, replace the fs_info->submit_workers
use the same btrfs_workqueue.
Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Tested-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NJosef Bacik <jbacik@fb.com>

a8c93d4e

Btrfs: fix use-after-free in the finishing procedure of the device replace · c404e0dc

由 Miao Xie 提交于 1月 30, 2014

During device replace test, we hit a null pointer deference (It was very easy
to reproduce it by running xfstests' btrfs/011 on the devices with the virtio
scsi driver). There were two bugs that caused this problem:
- We might allocate new chunks on the replaced device after we updated
  the mapping tree. And we forgot to replace the source device in those
  mapping of the new chunks.
- We might get the mapping information which including the source device
  before the mapping information update. And then submit the bio which was
  based on that mapping information after we freed the source device.

For the first bug, we can fix it by doing mapping tree update and source
device remove in the same context of the chunk mutex. The chunk mutex is
used to protect the allocable device list, the above method can avoid
the new chunk allocation, and after we remove the source device, all
the new chunks will be allocated on the new device. So it can fix
the first bug.

For the second bug, we need make sure all flighting bios are finished and
no new bios are produced during we are removing the source device. To fix
this problem, we introduced a global @bio_counter, we not only inc/dec
@bio_counter outsize of map_blocks, but also inc it before submitting bio
and dec @bio_counter when ending bios.

Since Raid56 is a little different and device replace dosen't support raid56
yet, it is not addressed in the patch and I add comments to make sure we will
fix it in the future.
Reported-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fb.com>

c404e0dc

12 11月, 2013 2 次提交

btrfs: Pack struct btrfs_device · 3c45bfc1

由 Dulshani Gunawardhana 提交于 10月 31, 2013

Pack the structure btrfs_device in volumes.h to eliminate holes detected
by pahole, thus reducing binary memory footprint.
Signed-off-by: NDulshani Gunawardhana <dulshani.gunawardhana89@gmail.com>
Reviewed-by: NZach Brown <zab@redhat.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

3c45bfc1

Btrfs: remove scrub_super_lock holding in btrfs_sync_log() · 9b011adf

由 Wang Shilong 提交于 10月 25, 2013

Originally, we introduced scrub_super_lock to synchronize
tree log code with scrubbing super.

However we can replace scrub_super_lock with device_list_mutex,
because writing super will hold this mutex, this will reduce an extra
lock holding when writing supers in sync log code.
Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

9b011adf

01 9月, 2013 4 次提交

Btrfs: add btrfs_alloc_device and switch to it · 12bd2fc0

由 Ilya Dryomov 提交于 8月 23, 2013

Currently btrfs_device is allocated ad-hoc in a few different places,
and as a result not all fields are initialized properly.  In particular,
readahead state is only initialized in device_list_add (at scan time),
and not in btrfs_init_new_device (when the new device is added with
'btrfs dev add').  Fix this by adding an allocation helper and switch
everybody but __btrfs_close_devices to it.  (__btrfs_close_devices is
dealt with in a later commit.)
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

12bd2fc0

Btrfs: check UUID tree during mount if required · 70f80175

由 Stefan Behrens 提交于 8月 15, 2013

If the filesystem was mounted with an old kernel that was not
aware of the UUID tree, this is detected by looking at the
uuid_tree_generation field of the superblock (similar to how
the free space cache is doing it). If a mismatch is detected
at mount time, a thread is started that does two things:
1. Iterate through the UUID tree, check each entry, delete those
   entries that are not valid anymore (i.e., the subvol does not
   exist anymore or the value changed).
2. Iterate through the root tree, for each found subvolume, add
   the UUID tree entries for the subvolume (if they are not
   already there).

This mechanism is also used to handle and repair errors that
happened during the initial creation and filling of the tree.
The update of the uuid_tree_generation field (which indicates
that the state of the UUID tree is up to date) is blocked until
all create and repair operations are successfully completed.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

70f80175

Btrfs: create UUID tree if required · f7a81ea4

由 Stefan Behrens 提交于 8月 15, 2013

This tree is not created by mkfs.btrfs. Therefore when a filesystem
is mounted writable and the UUID tree does not exist, this tree is
created if required. The tree is also added to the fs_info structure
and initialized, but this commit does not yet read or write UUID tree
elements.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

f7a81ea4

Btrfs: don't cache the csum value into the extent state tree · facc8a22

由 Miao Xie 提交于 7月 25, 2013

Before applying this patch, we cached the csum value into the extent state
tree when reading some data from the disk, this operation increased the lock
contention of the state tree.

Now, we just store the csum value into the bio structure or other unshared
structure, so we can reduce the lock contention.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

facc8a22

02 7月, 2013 1 次提交

Btrfs: make the chunk allocator completely tree lockless · 6df9a95e

由 Josef Bacik 提交于 6月 27, 2013

When adjusting the enospc rules for relocation I ran into a deadlock because we
were relocating the only system chunk and that forced us to try and allocate a
new system chunk while holding locks in the chunk tree, which caused us to
deadlock. To fix this I've moved all of the dev extent addition and chunk
addition out to the delayed chunk completion stuff. We still keep the in-memory
stuff which makes sure everything is consistent.

One change I had to make was to search the commit root of the device tree to
find a free dev extent, and hold onto any chunk em's that we allocated in that
transaction so we do not allocate the same dev extent twice. This has the side
effect of fixing a bug with balance that has been there ever since balance
existed. Basically you can free a block group and it's dev extent and then
immediately allocate that dev extent for a new block group and write stuff to
that dev extent, all within the same transaction. So if you happen to crash
during a balance you could come back to a completely broken file system. This
patch should keep these sort of things from happening in the future since we
won't be able to allocate free'd dev extents until after the transaction
commits. This has passed all of the xfstests and my super annoying stress test
followed by a balance. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

6df9a95e

14 6月, 2013 1 次提交

Btrfs: cleanup the similar code of the fs root read · cb517eab

由 Miao Xie 提交于 5月 15, 2013

There are several functions whose code is similar, such as
  btrfs_find_last_root()
  btrfs_read_fs_root_no_radix()

Besides that, some functions are invoked twice, it is unnecessary,
for example, we are sure that all roots which is found in
  btrfs_find_orphan_roots()
have their orphan items, so it is unnecessary to check the orphan
item again.

So cleanup it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

cb517eab

18 5月, 2013 1 次提交

Btrfs: use a btrfs bioset instead of abusing bio internals · 9be3395b

由 Chris Mason 提交于 5月 17, 2013

Btrfs has been pointer tagging bi_private and using bi_bdev
to store the stripe index and mirror number of failed IOs.

As bios bubble back up through the call chain, we use these
to decide if and how to retry our IOs.  They are also used
to count IO failures on a per device basis.

Recently a bio tracepoint was added lead to crashes because
we were abusing bi_bdev.

This commit adds a btrfs bioset, and creates explicit fields
for the mirror number and stripe index.  The plan is to
extend this structure for all of the fields currently in
struct btrfs_bio, which will mean one less kmalloc in
our IO path.
Signed-off-by: NChris Mason <chris.mason@fusionio.com>
Reported-by: NTejun Heo <tj@kernel.org>

9be3395b

07 5月, 2013 1 次提交

btrfs: make static code static & remove dead code · 48a3b636

由 Eric Sandeen 提交于 4月 25, 2013

Big patch, but all it does is add statics to functions which
are in fact static, then remove the associated dead-code fallout.

removed functions:

btrfs_iref_to_path()
__btrfs_lookup_delayed_deletion_item()
__btrfs_search_delayed_insertion_item()
__btrfs_search_delayed_deletion_item()
find_eb_for_page()
btrfs_find_block_group()
range_straddles_pages()
extent_range_uptodate()
btrfs_file_extent_length()
btrfs_scrub_cancel_devid()
btrfs_start_transaction_lflush()

btrfs_print_tree() is left because it is used for debugging.
btrfs_start_transaction_lflush() and btrfs_reada_detach() are
left for symmetry.

ulist.c functions are left, another patch will take care of those.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

48a3b636

20 2月, 2013 1 次提交

Btrfs: move fs/btrfs/ioctl.h to include/uapi/linux/btrfs.h · 55e301fd

由 Filipe Brandenburger 提交于 1月 29, 2013

The header file will then be installed under /usr/include/linux so that
userspace applications can refer to Btrfs ioctls by name and use the same
structs used internally in the kernel.
Signed-off-by: NFilipe Brandenburger <filbranden@google.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

55e301fd