- 13 Dec 2012, 30 commits
-
-
Submitted by Stefan Behrens
This is required for the device replace procedure in a later step. Two calling functions also had to be changed to take the fs_info pointer: repair_io_failure() and scrub_setup_recheck_block().
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Stefan Behrens
This is required for the device replace procedure in a later step.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Stefan Behrens
The new function btrfs_find_device_missing_or_by_path() will be used for the device replace procedure. This function itself calls the second new function, btrfs_find_device_by_path(). Unfortunately, it is currently not possible to make the rest of the code use these functions as well, since all the functions that look similar at first view differ slightly in what they do. But in the future, new code could benefit from these two functions, and currently, device replace uses them.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Stefan Behrens
Code to open block devices, read the superblock and handle errors was repeated in three places, and the following patch makes use of it as well. This code is now moved into a helper function.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Stefan Behrens
Just move some code into functions to make everything more readable.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Stefan Behrens
In the scrub repair code, memory allocation errors are now handled a little bit smarter: they are treated just like read errors. This simplifies the code and removes a couple of lines, since the code to handle read errors is there anyway.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Stefan Behrens
In case disk blocks need to be repaired (rewritten), the current code first (for simplicity reasons) reads all alternate mirrors and then selects the best one in a second step. This is now changed to read the alternate mirrors one after the other and to leave the loop early when a perfect mirror is found.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
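As a rough illustration of the new control flow, a minimal sketch; read_mirror() and block_is_good() are hypothetical helpers, not the actual scrub functions:

    for (mirror = 1; mirror <= num_copies; mirror++) {
            /* read only this alternate mirror; the old code read all of them up front */
            if (read_mirror(block, mirror))         /* hypothetical helper, nonzero on I/O error */
                    continue;                       /* try the next mirror */
            if (block_is_good(block, mirror))       /* hypothetical: csums and generation check out */
                    break;                          /* perfect mirror found, leave the loop early */
    }

The point of the change is visible in the break: once a good copy is found, no further mirrors are read at all.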
-
Submitted by Stefan Behrens
With the modified design (needed to support the device replace procedure) it is necessary to allocate the page array dynamically. The reason is that pages are reused: at first a page is used for the bio that reads the data from the filesystem, then the same page is reused for the bio that writes the data to the target disk. Since the read process and the write process are completely decoupled, this requires a new concept of refcounts and get/put functions for pages, and it requires newly created pages for each read bio, which are freed after the write operation is finished.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
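A minimal sketch of the refcounting idea described above; the sblock_page structure and its get/put helpers are illustrative names, not the actual btrfs scrub structures:

    struct sblock_page {
            struct page *page;      /* backing page shared by the read bio and the write bio */
            atomic_t refs;          /* one reference per user of the page */
    };

    static void sblock_page_get(struct sblock_page *spage)
    {
            atomic_inc(&spage->refs);
    }

    static void sblock_page_put(struct sblock_page *spage)
    {
            /* free the page only when the last user (read or write path) drops it */
            if (atomic_dec_and_test(&spage->refs)) {
                    if (spage->page)
                            __free_page(spage->page);
                    kfree(spage);
            }
    }

Because the read and write paths are decoupled, each path takes its own reference, and whichever finishes last ends up freeing the page.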
-
Submitted by Stefan Behrens
The block device is removed from the scrub context state structure. The scrub code, as it is used for the device replace procedure, reads the source data from wherever it is optimal. The source device might even be gone (disconnected, for instance due to a hardware failure). Or the drive can be so faulty that the device replace procedure tries to avoid access to the faulty source drive as much as possible, and only accesses the source disk as a last resort, if all other mirrors are damaged.

The modified scrub code operates as if it were handling the source drive and thereby generates an exact copy of the source disk on the target disk, even if the source disk is not present at all. Therefore the block device pointer to the source disk is removed from the scrub context struct and moved into the lower level scope of the scrub_bio, fixup and page structures, where the block device context is known.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Stefan Behrens
The device replace procedure makes use of the scrub code. The scrub code is the most efficient code to read the allocated data of a disk: it reads sequentially in order to avoid disk head movements, it skips unallocated blocks, it uses readahead mechanisms, and it contains all the code to detect and repair defects. This commit is a first preparation step to adapt the scrub code to be shareable for the device replace procedure.

The block device will be removed from the scrub context state structure in a later step. It used to be the source block device. The scrub code, as it is used for the device replace procedure, reads the source data from wherever it is optimal. The source device might even be gone (disconnected, for instance due to a hardware failure). Or the drive can be so faulty that the device replace procedure tries to avoid access to the faulty source drive as much as possible, and only accesses the source disk as a last resort, if all other mirrors are damaged. The modified scrub code operates as if it were handling the source drive and thereby generates an exact copy of the source disk on the target disk, even if the source disk is not present at all. Therefore the block device pointer to the source disk is removed in a later patch, and the context structure is renamed (which is the goal of the current patch) to reflect that no source block device scope is there anymore.

Summary: this first preparation step consists of a textual substitution of the term "dev" with the term "ctx" wherever the scrub context is used.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Liu Bo
Since we've killed the bigger volume_mutex, we need to add the devices list mutex back.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Liu Bo
- 'nr' is no longer used.
- btrfs_btree_balance_dirty() and __btrfs_btree_balance_dirty() can share a bunch of code.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Alexander Block
When __merge_refs merges two refs, the inode_list of both refs also needs to be merged. Otherwise we end up with missed backrefs and memory leaks. This happens, for example, if two inodes share an extent and both lie in the same leaf and thus also have the same parent.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Tsutomu Itoh
Even when hole punching is executed, the modification time of the file is not updated. So, set the current time on the inode.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Stefan Behrens
Someone who is root or capable(CAP_SYS_ADMIN) could corrupt the superblock and make Btrfs crash in printk("%s") while holding the uuid_mutex, since nobody forces a limit on the string. Since the uuid_mutex is significant, the system would be unusable afterwards.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Liu Bo
When creating a snapshot, failing to commit a transaction can end up aborting the transaction, followed by a cleanup for it, where we free all the snapshots pending to disk. So check for this and avoid a double free of the pending snapshots.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Liu Bo
When committing a transaction, we may bail out of running the delayed refs due to ENOSPC and then abort the current transaction to flip into read-only mode. But we'll hit a deadlock on the ref head's lock, since we forget to release the lock and do the other cleanup.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Julia Lawall
Just use WARN_ON rather than an if containing only WARN_ON(1). A simplified version of the semantic patch that makes this transformation is as follows (http://coccinelle.lip6.fr/):

// <smpl>
@@ expression e; @@
- if (e) WARN_ON(1);
+ WARN_ON(e);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
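For illustration, a hypothetical before/after of this transformation (the condition is made up, not a specific hunk from the patch); it works because WARN_ON() evaluates and reports its own condition:

    /* before */
    if (ret < 0)
            WARN_ON(1);

    /* after */
    WARN_ON(ret < 0);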
-
Submitted by Julia Lawall
Use WARN rather than printk followed by WARN_ON(1), for conciseness. A simplified version of the semantic patch that makes this transformation is as follows (http://coccinelle.lip6.fr/):

// <smpl>
@@ expression list es; @@
-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
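For illustration, a hypothetical before/after (the message text and variable are made up, not a specific hunk from the patch); WARN(1, ...) prints the message and the backtrace in one call:

    /* before */
    printk(KERN_ERR "btrfs: unexpected item type %d\n", type);
    WARN_ON(1);

    /* after */
    WARN(1, KERN_ERR "btrfs: unexpected item type %d\n", type);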
-
Submitted by Miao Xie
If we set BTRFS_INODE_NEEDS_FULL_SYNC, we should log all the extents, but now we forget to take it into account and set a wrong max key; as a result, we will skip the file extent metadata when doing the logging. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Miao Xie
We forget to protect the modified_extents list, fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Miao Xie
There are two types of file extents, inline extents and regular extents. When we log file extents, we didn't take inline extents into account. Fix it.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Miao Xie
Consider the following case:

    Task1                              Task2
    start_transaction
                                       commit_transaction
                                         check pending snapshots list and
                                         the list is empty.
    add pending snapshot into list
    skip the delalloc flush
    end_transaction
    ...

And then the problem that the snapshot is different from the source subvolume happens. This patch fixes the above problem by flushing all the pending work once all the other tasks have ended the transaction.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Miao Xie
If we flush inodes with pending delalloc in a transaction, we may join the same transaction handle more than twice. The reason is:

    Task                                   use_count of trans handle
    commit_transaction                     1
    |-> btrfs_start_delalloc_inodes        1
        |-> run_delalloc_nocow             1
            |-> join_transaction           2
                |-> cow_file_range         2
                    |-> join_transaction   3

In fact, cow_file_range needn't join the transaction again because the caller has already joined it, so we fix this problem that way.
Reported-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Liu Bo
The variable 'found' is no longer used.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Liu Bo
btrfs_wait_ordered_range() expects 'len' instead of 'end'.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Liu Bo
When we log new names, we need to log just enough to recreate the inode during log replay, and there is no need to log extents along with it. This actually fixes a bug revealed by xfstests 241, which shows that we're logging some extents whose metadata has not been updated, so we don't get proper EXTENT_DATA items to be copied to the log tree.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Stefan Behrens
The current behavior is to allow mounting or remounting a filesystem writeable in degraded mode if at least one writeable device is present. The next failed write access to a missing device which is above the tolerance of the configured level of redundancy results in a read-only enforcement. Even without this, the next time barrier_all_devices() is called and more devices are missing than tolerable, the switch to read-only mode takes place.

In order to behave predictably and to provide proper feedback to the user at mount time, this patch compares the number of missing devices with the number of devices that are tolerated to be missing according to the configured RAID level. If more devices are missing than tolerated, e.g. if two devices are missing in case of RAID1, only a read-only mount and remount is allowed.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
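A hedged sketch of the mount-time check described above; calc_tolerated_missing_devices(), the message text and the error code are illustrative assumptions, not necessarily the patch's actual symbols:

    /* refuse a writeable mount when more devices are missing than the
     * configured RAID level can tolerate, e.g. more than 1 for RAID1 */
    if (fs_devices->missing_devices >
        calc_tolerated_missing_devices(fs_info) &&      /* hypothetical helper */
        !(sb->s_flags & MS_RDONLY)) {
            printk(KERN_WARNING
                   "btrfs: too many missing devices, writeable mount is not allowed\n");
            return -EACCES;                             /* assumed error code */
    }

Doing the comparison at mount time is what gives the user immediate feedback instead of a later surprise switch to read-only.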
-
Submitted by Masanari Iida
Correct spelling typo in btrfs.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by jeff.liu
Remove an invalid size check from btrfs_shrink_dev(). The new size cannot be larger than device->total_bytes, as that was already verified before coming here (i.e. new_size < old_size), so the check is redundant.
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
- 12 Dec 2012, 10 commits
-
-
Submitted by Miao Xie
Though the processing of the ordered extents is a bit different from the delalloc inode flush, we can see it as a subset of the delalloc inode flush, so we also handle it with the flush workers.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Miao Xie
The processing of the ordered operations is similar to the delalloc inode flush, so we handle them with the flush workers as well.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Miao Xie
This patch introduces a new worker pool named "flush_workers". If we want to force all the inodes with pending delalloc to the disks, we can queue those inodes into the work queue of this worker pool; that way, the inodes are flushed by multiple tasks.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
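A hedged sketch of how inodes might be queued to such a pool, loosely following the btrfs async-thread API of that era (struct btrfs_work, btrfs_queue_worker()); the delalloc_work structure and flush_delalloc_fn() are illustrative assumptions, not necessarily the patch's actual code:

    struct delalloc_work {
            struct inode *inode;
            struct btrfs_work work;         /* executed by the flush_workers pool */
    };

    static void flush_delalloc_fn(struct btrfs_work *work)
    {
            struct delalloc_work *dw;

            dw = container_of(work, struct delalloc_work, work);
            filemap_flush(dw->inode->i_mapping);    /* start delalloc writeback for this inode */
    }

    /* for each inode with pending delalloc: */
    dw->work.func = flush_delalloc_fn;
    btrfs_queue_worker(&fs_info->flush_workers, &dw->work);

Each queued inode becomes an independent work item, so several worker threads can push delalloc to disk in parallel instead of one task walking the list serially.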
-
Submitted by Josef Bacik
Dave gave me an image of a very full file system that would abort the transaction because it ran out of space while committing the transaction. This is because we would think there was plenty of room to create a snapshot even though the global reserve was not full. This happens because we calculate the global reserve size before we unpin any space, so after we unpin the space we allow reservations to occur even though we haven't reserved all of the space for our global reserve. Fix this by adding to the global reserve while unpinning, in order to make sure we always have enough space to do our work. With this patch we no longer end up with an aborted transaction; we return ENOSPC properly to the person trying to create the snapshot. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Liu Bo
'disk_key' is not used at all.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Liu Bo
The argument 'tree_mod_log' is not necessary since all of its callers enable it.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Liu Bo
Since we don't use MOD_LOG_KEY_REMOVE_WHILE_MOVING to add nritems during rewinding, we should insert a MOD_LOG_KEY_REMOVE operation first.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Liu Bo
Key MOD_LOG_KEY_REMOVE_WHILE_MOVING means that we're doing a memmove inside an extent buffer node, and the node's number of items remains unchanged (unless we are inserting a single pointer, but we have MOD_LOG_KEY_ADD for that). So we don't need to increase the node's number of items during rewinding, otherwise we may get a node larger than the leafsize and cause general protection errors later.

Here are the details:
- If we do a memory move for inserting a single pointer, we need to increase the node's nritems by one, and we honor MOD_LOG_KEY_ADD for adding.
- If we do a memory move for deleting a single pointer, we need to decrease the node's nritems by one, and we honor MOD_LOG_KEY_REMOVE for deleting.
- If we do a memory move for balancing left/right, we need to decrease the node's nritems, and we honor MOD_LOG_KEY_REMOVE for balancing.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
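A simplified sketch of the nritems adjustment during rewinding that the three cases above imply (rewinding undoes the logged operation); it omits the key/pointer restore and is not the exact btrfs code:

    switch (tm->op) {
    case MOD_LOG_KEY_REMOVE:
    case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
            n++;    /* undo a removal: the pointer comes back, one more item */
            break;
    case MOD_LOG_KEY_REMOVE_WHILE_MOVING:
            /* part of a memmove: nritems stays unchanged while rewinding */
            break;
    case MOD_LOG_KEY_ADD:
            n--;    /* undo an insertion: one item fewer */
            break;
    }

Treating the REMOVE_WHILE_MOVING case as a no-op for nritems is exactly what keeps the rewound node from growing beyond the leafsize.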
-
Submitted by Miao Xie
When we find a bitmap free space entry, we may check whether the previous extent entry covers the offset or not. But if we find that this entry is also a bitmap entry, we will continue to check the previous entry of the current one in a while loop. This is unnecessary, because it is impossible that an extent entry which is in front of a bitmap entry can cover the offset of the entry after that bitmap entry.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-
Submitted by Josef Bacik
Alex reported a problem where we were writing between chunks on an rbd device. The thing is that we do bio_add_page using logical offsets, but the physical offset may be different. So when we map the bio, we now check whether the bio is still ok with the physical offset, and if it is not, we split the bio up and redo the bio_add_page with the physical sector. This fixes the problem for Alex and doesn't affect performance in the normal case. Thanks,
Reported-and-tested-by: Alex Elder <elder@inktank.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
-