提交 · 8dabb7420f014ab0f9f04afae8ae046c0f48b270 · openanolis / cloud-kernel

13 12月, 2012 40 次提交

Btrfs: change core code of btrfs to support the device replace operations · 8dabb742

由 Stefan Behrens 提交于 11月 06, 2012

This commit contains all the essential changes to the core code
of Btrfs for support of the device replace procedure.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

8dabb742

Btrfs: add new sources for device replace code · e93c89c1

由 Stefan Behrens 提交于 11月 05, 2012

This adds a new file to the sources together with the header file
and the changes to ioctl.h and ctree.h that are required by the
new C source file. Additionally, 4 new functions are added to
volume.c that deal with device creation and destruction.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

e93c89c1

Btrfs: add code to scrub to copy read data to another disk · ff023aac

由 Stefan Behrens 提交于 11月 06, 2012

The device replace procedure makes use of the scrub code. The scrub
code is the most efficient code to read the allocated data of a disk,
i.e. it reads sequentially in order to avoid disk head movements, it
skips unallocated blocks, it uses read ahead mechanisms, and it
contains all the code to detect and repair defects.
This commit adds code to scrub to allow the scrub code to copy read
data to another disk.
One goal is to be able to perform as fast as possible. Therefore the
write requests are collected until huge bios are built, and the
write process is decoupled from the read process with some kind of
flow control, of course, in order to limit the allocated memory.
The best performance on spinning disks could by reached when the
head movements are avoided as much as possible. Therefore a single
worker is used to interface the read process with the write process.
The regular scrub operation works as fast as before, it is not
negatively influenced and actually it is more or less unchanged.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

ff023aac

Btrfs: handle errors from btrfs_map_bio() everywhere · 61891923

由 Stefan Behrens 提交于 11月 05, 2012

With the addition of the device replace procedure, it is possible
for btrfs_map_bio(READ) to report an error. This happens when the
specific mirror is requested which is located on the target disk,
and the copy operation has not yet copied this block. Hence the
block cannot be read and this error state is indicated by
returning EIO.
Some background information follows now. A new mirror is added
while the device replace procedure is running.
btrfs_get_num_copies() returns one more, and
btrfs_map_bio(GET_READ_MIRROR) adds one more mirror if a disk
location is involved that was already handled by the device
replace copy operation. The assigned mirror num is the highest
mirror number, e.g. the value 3 in case of RAID1.
If btrfs_map_bio() is invoked with mirror_num == 0 (i.e., select
any mirror), the copy on the target drive is never selected
because that disk shall be able to perform the write requests as
quickly as possible. The parallel execution of read requests would
only slow down the disk copy procedure. Second case is that
btrfs_map_bio() is called with mirror_num > 0. This is done from
the repair code only. In this case, the highest mirror num is
assigned to the target disk, since it is used last. And when this
mirror is not available because the copy procedure has not yet
handled this area, an error is returned. Everywhere in the code
the handling of such errors is added now.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

61891923

Btrfs: disallow some operations on the device replace target device · 63a212ab

由 Stefan Behrens 提交于 11月 05, 2012

This patch adds some code to disallow operations on the device that
is used as the target for the device replace operation.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

63a212ab

Btrfs: disallow mutually exclusive admin operations from user mode · 5ac00add

由 Stefan Behrens 提交于 11月 05, 2012

Btrfs admin operations that are manually started from user mode
and that cannot be executed at the same time return -EINPROGRESS.
A common way to enter and leave this locked section is introduced
since it used to be specific to the balance operation.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

5ac00add

Btrfs: introduce a btrfs_dev_replace_item type · a2bff640

由 Stefan Behrens 提交于 11月 05, 2012

Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

a2bff640

Btrfs: enhance btrfs structures for device replace support · e922e087

由 Stefan Behrens 提交于 11月 05, 2012

Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

e922e087

Btrfs: avoid risk of a deadlock in btrfs_handle_error · 1acd6831

由 Stefan Behrens 提交于 11月 05, 2012

Remove the attempt to cancel a running scrub or device replace
operation in btrfs_handle_error() because it adds the risk of
a deadlock. The only penalty of not canceling the operation is
that some I/O remains active until the procedure completes.
This is basically the same thing that happens to other tasks
that are running in user mode context, they are not affected
or stopped in btrfs_handle_error(), these tasks just need to
handle write errors correctly.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

1acd6831

Btrfs: pass fs_info instead of root · aa1b8cd4

由 Stefan Behrens 提交于 11月 05, 2012

A small number of functions that are used in a device replace
procedure when the operation is resumed at mount time are unable
to pass the same root pointer that would be used in the regular
(ioctl) context. And since the root pointer is not required, only
the fs_info is, the root pointer argument is replaced with the
fs_info pointer argument.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

aa1b8cd4

Btrfs: add btrfs_scratch_superblock() function · a8a6dab7

由 Stefan Behrens 提交于 11月 05, 2012

This new function is used by the device replace procedure in
a later patch.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

a8a6dab7

Btrfs: pass fs_info to btrfs_map_block() instead of mapping_tree · 3ec706c8

由 Stefan Behrens 提交于 11月 05, 2012

This is required for the device replace procedure in a later step.
Two calling functions also had to be changed to have the fs_info
pointer: repair_io_failure() and scrub_setup_recheck_block().
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

3ec706c8

Btrfs: Pass fs_info to btrfs_num_copies() instead of mapping_tree · 5d964051

由 Stefan Behrens 提交于 11月 05, 2012

This is required for the device replace procedure in a later step.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

5d964051

Btrfs: add two more find_device() methods · 7ba15b7d

由 Stefan Behrens 提交于 11月 05, 2012

The new function btrfs_find_device_missing_or_by_path() will be
used for the device replace procedure. This function itself calls
the second new function btrfs_find_device_by_path().
Unfortunately, it is not possible to currently make the rest of the
code use these functions as well, since all functions that look
similar at first view are all a little bit different in what they
are doing. But in the future, new code could benefit from these
two new functions, and currently, device replace uses them.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

7ba15b7d

Btrfs: move some common code into a subfunction · beaf8ab3

由 Stefan Behrens 提交于 11月 12, 2012

Some code to open block devices, to read the superblock and to
handle errors was repeated multiple times in 3 places, and the
following patch makes use of it as well. This code is now moved
into a subfunction.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

beaf8ab3

Btrfs: cleanup scrub bio and worker wait code · b6bfebc1

由 Stefan Behrens 提交于 11月 02, 2012

Just move some code into functions to make everything more readable.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

b6bfebc1

Btrfs: in scrub repair code, simplify alloc error handling · 34f5c8e9

由 Stefan Behrens 提交于 11月 02, 2012

In the scrub repair code, the code is changed to handle memory
allocation errors a little bit smarter. The change is to handle
it just like a read error. This simplifies the code and removes
a couple of lines of code, since the code to handle read errors
is there anyway.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

34f5c8e9

Btrfs: in scrub repair code, optimize the reading of mirrors · cb2ced73

由 Stefan Behrens 提交于 11月 02, 2012

In case that disk blocks need to be repaired (rewritten), the
current code at first (for simplicity reasons) reads all alternate
mirrors in the first step, afterwards selects the best one in a
second step. This is now changed to read one alternate mirror
after the other and to leave the loop early when a perfect mirror
is found.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

cb2ced73

Btrfs: make the scrub page array dynamically allocated · 7a9e9987

由 Stefan Behrens 提交于 11月 02, 2012

With the modified design (in order to support the devive replace
procedure) it is necessary to alloc the page array dynamically.
The reason is that pages are reused. At first a page is used for
the bio to read the data from the filesystem, then the same page
is reused for the bio that writes the data to the target disk.
Since the read process and the write process are completely
decoupled, this requires a new concept of refcounts and get/put
functions for pages, and it requires to use newly created pages
for each read bio which are freed after the write operation
is finished.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

7a9e9987

Btrfs: remove the block device pointer from the scrub context struct · a36cf8b8

由 Stefan Behrens 提交于 11月 02, 2012

The block device is removed from the scrub context state structure.
The scrub code as it is used for the device replace procedure reads
the source data from whereever it is optimal. The source device might
even be gone (disconnected, for instance due to a hardware failure).
Or the drive can be so faulty so that the device replace procedure
tries to avoid access to the faulty source drive as much as possible,
and only if all other mirrors are damaged, as a last resort, the
source disk is accessed.
The modified scrub code operates as if it would handle the source
drive and thereby generates an exact copy of the source disk on the
target disk, even if the source disk is not present at all. Therefore
the block device pointer to the source disk is removed in the scrub
context struct and moved into the lower level scope of scrub_bio,
fixup and page structures where the block device context is known.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

a36cf8b8

Btrfs: rename the scrub context structure · d9d181c1

由 Stefan Behrens 提交于 11月 02, 2012

The device replace procedure makes use of the scrub code. The scrub
code is the most efficient code to read the allocated data of a disk,
i.e. it reads sequentially in order to avoid disk head movements, it
skips unallocated blocks, it uses read ahead mechanisms, and it
contains all the code to detect and repair defects.
This commit is a first preparation step to adapt the scrub code to
be shareable for the device replace procedure.
The block device will be removed from the scrub context state
structure in a later step. It used to be the source block device.
The scrub code as it is used for the device replace procedure reads
the source data from whereever it is optimal. The source device might
even be gone (disconnected, for instance due to a hardware failure).
Or the drive can be so faulty so that the device replace procedure
tries to avoid access to the faulty source drive as much as possible,
and only if all other mirrors are damaged, as a last resort, the
source disk is accessed.
The modified scrub code operates as if it would handle the source
drive and thereby generates an exact copy of the source disk on the
target disk, even if the source disk is not present at all. Therefore
the block device pointer to the source disk is removed in a later
patch, and therefore the context structure is renamed (this is the
goal of the current patch) to reflect that no source block device
scope is there anymore.

Summary:
This first preparation step consists of a textual substitution of the
term "dev" to the term "ctx" whereever the scrub context is used.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

d9d181c1

Btrfs: protect devices list with its mutex · d25628bd

由 Liu Bo 提交于 11月 14, 2012

Since we've kill the bigger one volume_mutex, we need to add devices
list mutex back.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

d25628bd

Btrfs: cleanup for btrfs_btree_balance_dirty · b53d3f5d

由 Liu Bo 提交于 11月 14, 2012

- 'nr' is no more used.
- btrfs_btree_balance_dirty() and __btrfs_btree_balance_dirty() can share
  a bunch of code.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

b53d3f5d

Btrfs: merge inode_list in __merge_refs · 3ef5969c

由 Alexander Block 提交于 11月 08, 2012

When __merge_refs merges two refs, it is also needed to merge the
inode_list of both refs. Otherwise we have missed backrefs and memory
leaks. This happens for example if two inodes share an extent and
both lie in the same leaf and thus also have the same parent.
Signed-off-by: NAlexander Block <ablock84@googlemail.com>
Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

3ef5969c

Btrfs: set hole punching time properly · e1f5790e

由 Tsutomu Itoh 提交于 11月 08, 2012

Even if the hole punching is executed, the modification time of the
file is not updated.
So, current time is set to inode.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

e1f5790e

Btrfs: Don't trust the superblock label and simply printk("%s") it · d03f918a

由 Stefan Behrens 提交于 11月 05, 2012

Someone who is root or capable(CAP_SYS_ADMIN) could corrupt the
superblock and make Btrfs printk("%s") crash while holding the
uuid_mutex since nobody forces a limit on the string. Since the
uuid_mutex is significant, the system would be unusable
afterwards.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

d03f918a

Btrfs: fix a double free on pending snapshots in error handling · 109f2365

由 Liu Bo 提交于 11月 05, 2012

When creating a snapshot, failing to commit a transaction can end up
with aborting the transaction, following by doing a cleanup for it, where
we'll free all snapshots pending to disk.

So we check it and avoid double free on pending snapshots.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

109f2365

Btrfs: fix a deadlock in aborting transaction due to ENOSPC · 37c4146d

由 Liu Bo 提交于 11月 05, 2012

When committing a transaction, we may bail out of running delayed refs
due to ENOSPC, and then abort the current transaction to flip into readonly.

But we'll hit a deadlock on ref head's lock since we forget to release
its lock and other cleanup stuff.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

37c4146d

fs/btrfs: drop if around WARN_ON · 6c1500f2

由 Julia Lawall 提交于 11月 03, 2012

Just use WARN_ON rather than an if containing only WARN_ON(1).

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e;
@@
- if (e) WARN_ON(1);
+ WARN_ON(e);
// </smpl>
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

6c1500f2

fs/btrfs: use WARN · 31b1a2bd

由 Julia Lawall 提交于 11月 03, 2012

Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

31b1a2bd

Btrfs: fix missing log when BTRFS_INODE_NEEDS_FULL_SYNC is set · 5269b67e

由 Miao Xie 提交于 11月 01, 2012

If we set BTRFS_INODE_NEEDS_FULL_SYNC, we should log all the extent,
but now we forget to take it into account, and set a wrong max key,
if so, we will skip the file extent metadata when doing logging. Fix it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

5269b67e

Btrfs: fix unprotected extent map operation when logging file extents · bbe14267

由 Miao Xie 提交于 11月 01, 2012

We forget to protect the modified_extents list, fix it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

bbe14267

Btrfs: fix wrong file extent length · 315a9850

由 Miao Xie 提交于 11月 01, 2012

There are two types of the file extent - inline extent and regular extent,
When we log file extents, we didn't take inline extent into account, fix it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

315a9850

Btrfs: fix missing flush when committing a transaction · ca469637

由 Miao Xie 提交于 11月 01, 2012

Consider the following case:
	Task1				Task2
	start_transaction
					commit_transaction
					  check pending snapshots list and the
					  list is empty.
	add pending snapshot into list
					  skip the delalloc flush
	end_transaction
					  ...

And then the problem that the snapshot is different with the source subvolume
happen.

This patch fixes the above problem by flush all pending stuffs when all the
other tasks end the transaction.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

ca469637

Btrfs: fix joining the same transaction handler more than 2 times · b7d5b0a8

由 Miao Xie 提交于 11月 01, 2012

If we flush inodes with pending delalloc in a transaction, we may join
the same transaction handler more than 2 times.

The reason is:
  Task						use_count of trans handle
  commit_transaction				1
    |-> btrfs_start_delalloc_inodes		1
	  |-> run_delalloc_nocow		1
		|-> join_transaction		2
		|-> cow_file_range		2
			|-> join_transaction	3

In fact, cow_file_range needn't join the transaction again because the caller
have joined the transaction, so we fix this problem by this way.
Reported-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

b7d5b0a8

Btrfs: cleanup for btrfs_wait_order_range · 4fde183d

由 Liu Bo 提交于 11月 01, 2012

Variable 'found' is no more used.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

4fde183d

Btrfs: get right arguments for btrfs_wait_ordered_range · 9f3959c5

由 Liu Bo 提交于 11月 01, 2012

btrfs_wait_ordered_range expects for 'len' instead of 'end'.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

9f3959c5

Btrfs: do not log extents when we only log new names · 183f37fa

由 Liu Bo 提交于 11月 01, 2012

When we log new names, we need to log just enough to recreate the inode
during log replay, and there is no need to log extents along with it.

This actually fixes a bug revealed by xfstests 241, where it shows
that we're logging some extents that have not updated metadata,
so we don't get proper EXTENT_DATA items to be copied to log tree.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

183f37fa

Btrfs: don't allow degraded mount if too many devices are missing · 292fd7fc

由 Stefan Behrens 提交于 10月 30, 2012

The current behavior is to allow mounting or remounting a filesystem
writeable in degraded mode if at least one writeable device is
present.
The next failed write access to a missing device which is above
the tolerance of the configured level of redundancy results in an
read-only enforcement. Even without this, the next time
barrier_all_devices() is called and more devices are missing than
tolerable, the switch to read-only mode takes place.

In order to behave predictably and to provide proper feedback to
the user at mount time, this patch compares the number of missing
devices with the number of devices that are tolerated to be missing
according to the configured RAID level. If more devices are missing
than tolerated, e.g. if two devices are missing in case of RAID1,
only a read-only mount and remount is allowed.
Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

292fd7fc

Btrfs: Fix typo in fs/btrfs · d1423248

由 Masanari Iida 提交于 10月 31, 2012

Correct spelling typo in btrfs.
Signed-off-by: NMasanari Iida <standby24x7@gmail.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

d1423248

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功