提交 · 174ba50915b08dcfd07c8b5fb795b46a165fa09a · openeuler / Kernel

27 5月, 2011 5 次提交

Btrfs: use the device_list_mutex during write_dev_supers · 174ba509

由 Chris Mason 提交于 5月 27, 2011

write_dev_supers was changed to use RCU to protect the list of
devices, but it was then sleeping while it actually wrote the supers.
This fixes it to just use the mutex, since we really don't any
concurrency in write_dev_supers anyway.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

174ba509

Btrfs: setup free ino caching in a more asynchronous way · a47d6b70

由 Li Zefan 提交于 5月 26, 2011

For a filesystem that has lots of files in it, the first time we mount
it with free ino caching support, it can take quite a long time to
setup the caching before we can create new files.

Here we fill the cache with [highest_ino, BTRFS_LAST_FREE_OBJECTID]
before we start the caching thread to search through the extent tree.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

a47d6b70

btrfs scrub: don't coalesce pages that are logically discontiguous · 00d01bc1

由 Arne Jansen 提交于 5月 25, 2011

scrub_page collects several pages into one bio as long as they are physically
contiguous. As we only save one logical address for the whole bio, don't
collect pages that are physically contiguous but logically discontiguous.
Signed-off-by: NArne Jansen <sensille@gmx.net>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

00d01bc1

Btrfs: return -ENOMEM in clear_extent_bit · c309df07

由 Chris Mason 提交于 5月 26, 2011

The btrfs releasepage function depends on ENOMEM coming
back when it is called atomic.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c309df07

Btrfs: add mount -o auto_defrag · 4cb5300b

由 Chris Mason 提交于 5月 24, 2011

This will detect small random writes into files and
queue the up for an auto defrag process.  It isn't well suited to
database workloads yet, but works for smaller files such as rpm, sqlite
or bdb databases.
Signed-off-by: NChris Mason <chris.mason@oracle.com>

4cb5300b

24 5月, 2011 19 次提交

Btrfs: using rcu lock in the reader side of devices list · 1f78160c

由 Xiao Guangrong 提交于 4月 20, 2011

fs_devices->devices is only updated on remove and add device paths, so we can
use rcu to protect it in the reader side
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1f78160c

Btrfs: drop unnecessary device lock · 46224705

由 Xiao Guangrong 提交于 4月 20, 2011

Drop device_list_mutex for the reader side  on clone_fs_devices and
btrfs_rm_device pathes since the fs_info->volume_mutex can ensure the device
list is not updated

btrfs_close_extra_devices is the initialized path, we can not add or remove
device at this time, so we can simply drop the mutex safely, like other
initialized function does(add_missing_dev, __find_device, __btrfs_open_devices
...).
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

46224705

Btrfs: fix the race between remove dev and alloc chunk · 0c1daee0

由 Xiao Guangrong 提交于 4月 20, 2011

On remove device path, it updates device->dev_alloc_list but does not hold
chunk lock
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0c1daee0

Btrfs: fix the race between reading and updating devices · c9513edb

由 Xiao Guangrong 提交于 4月 20, 2011

On btrfs_congested_fn and __unplug_io_fn paths, we should hold
device_list_mutex to avoid remove/add device path to
update fs_devices->devices

On __btrfs_close_devices and btrfs_prepare_sprout paths, the devices in
fs_devices->devices or fs_devices->devices is updated, so we should hold
the mutex to avoid the reader side to reach them
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c9513edb

Btrfs: fix bh leak on __btrfs_open_devices path · 4f6c9328

由 Xiao Guangrong 提交于 4月 20, 2011

'bh' is forgot to release if no error is detected
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

4f6c9328

Btrfs: fix unsafe usage of merge_state · c7f895a2

由 Xiao Guangrong 提交于 4月 20, 2011

merge_state can free the current state if it can be merged with the next node,
but in set_extent_bit(), after merge_state, we still use the current extent to
get the next node and cache it into cached_state
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c7f895a2

Btrfs: allocate extent state and check the result properly · 8233767a

由 Xiao Guangrong 提交于 4月 20, 2011

It doesn't allocate extent_state and check the result properly:
- in set_extent_bit, it doesn't allocate extent_state if the path is not
  allowed wait

- in clear_extent_bit, it doesn't check the result after atomic-ly allocate,
  we trigger BUG_ON() if it's fail

- if allocate fail, we trigger BUG_ON instead of returning -ENOMEM since
  the return value of clear_extent_bit() is ignored by many callers
Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

8233767a

fs/btrfs: Add missing btrfs_free_path · b0839166

由 Julia Lawall 提交于 5月 14, 2011

Btrfs_alloc_path should be matched with btrfs_free_path in error-handling code.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
local idexpression struct btrfs_path * x;
expression ra,rb;
position p1,p2;
@@

x = btrfs_alloc_path@p1(...)
...  when != btrfs_free_path(x,...)
     when != if (...) { ... btrfs_free_path(x,...) ...}
     when != x = ra
if(...) { ... when != x = rb
     when forall
     when != btrfs_free_path(x,...)
 \(return <+...x...+>; \| return@p2...; \) }

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

cocci.print_main("alloc",p1)
cocci.print_secs("return",p2)
// </smpl>
Signed-off-by: NJulia Lawall <julia@diku.dk>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b0839166

Btrfs: check return value of btrfs_inc_extent_ref() · 37daa4f9

由 Tsutomu Itoh 提交于 4月 28, 2011

If return value of btrfs_inc_extent_ref() is not 0, BUG() is called.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

37daa4f9

Btrfs: return error to caller if read_one_inode() fails · c00e9493

由 Tsutomu Itoh 提交于 4月 28, 2011

When read_one_inode() fails, error code is returned to caller instead
of BUG_ON().
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c00e9493

Btrfs: BUG_ON is deleted from the caller of btrfs_truncate_item & btrfs_extend_item · 1cd30799

由 Tsutomu Itoh 提交于 5月 19, 2011

Currently, btrfs_truncate_item and btrfs_extend_item returns only 0.
So, the check by BUG_ON in the caller is unnecessary.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1cd30799

Btrfs: return error code to caller when btrfs_del_item fails · 65a246c5

由 Tsutomu Itoh 提交于 5月 19, 2011

The error code is returned instead of calling BUG_ON when
btrfs_del_item returns the error.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

65a246c5

Btrfs: return error code to caller when btrfs_previous_item fails · b0b802d7

由 Tsutomu Itoh 提交于 5月 19, 2011

The error code is returned instead of calling BUG_ON when
btrfs_previous_item returns the error.
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

b0b802d7

btrfs: fix typo 'testeing' -> 'testing' · 27160b6b

由 Sergei Trofimovich 提交于 5月 20, 2011

Signed-off-by: NSergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

27160b6b

btrfs: typo: 'btrfS' -> 'btrfs' · 9694b3fc

由 Sergei Trofimovich 提交于 5月 20, 2011

Signed-off-by: NSergei Trofimovich <slyfox@gentoo.org>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

9694b3fc

btrfs: don't spin in shrink_delalloc if there is nothing to free · c4f675cd

由 Sergei Trofimovich 提交于 5月 20, 2011

Observed as a large delay when --mixed filesystem is filled up.
Test example:
1. create tiny --mixed FS:
   $ dd if=/dev/zero of=2G.img seek=$((2048 * 1024 * 1024 - 1)) count=1 bs=1
   $ mkfs.btrfs --mixed 2G.img
   $ mount -oloop 2G.img /mnt/ut/
2. Try to fill it up:
   $ dd if=/dev/urandom of=10M.file bs=10240 count=1024
   $ seq 1 256 | while read file_no; do echo $file_no; time cp 10M.file ${file_no}.copy; done

Up to '200.copy' it goes fast, but when disk fills-up each -ENOSPC
message takes 3 seconds to pop-up _every_ ENOSPC (and in usermode linux
it's even more: 30-60 seconds!). (Maybe, time depends on kernel's timer resolution).

No IO, no CPU load, just rescheduling. Some debugging revealed busy spinning
in shrink_delalloc.
Signed-off-by: NSergei Trofimovich <slyfox@gentoo.org>
Reviewed-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

c4f675cd

btrfs: Delete unused version.sh script. · 0f3b708c

由 Jamey Sharp 提交于 5月 05, 2011

In 2008, commit b4f6c45d dropped the use
of fs/btrfs/version.sh, but left the script behind. Kill it.

Commit by Jamey Sharp and Josh Triplett.
Signed-off-by: NJamey Sharp <jamey@minilop.net>
Signed-off-by: NJosh Triplett <josh@joshtriplett.org>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0f3b708c

btrfs: Ensure the tree search ioctl returns the right number of records · e2156867

由 Hugo Mills 提交于 5月 14, 2011

Btrfs's tree search ioctl has a field to indicate that no more than a
given number of records should be returned. The ioctl doesn't honour
this, as the tested value is not incremented until the end of the
copy_to_sk function. This patch removes an unnecessary local variable,
and updates the num_found counter as each key is found in the tree.
Signed-off-by: NHugo Mills <hugo@carfax.org.uk>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

e2156867

BTRFS: Remove unused node_lock · 0956c798

由 Andi Kleen 提交于 5月 18, 2011

240f62c8 replaced the node_lock with rcu_read_lock, but forgot
to remove the actual lock in the data structure. Remove it here.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

0956c798

23 5月, 2011 1 次提交

Btrfs: do not flush csum items of unchanged file data during treelog · 8e531cdf

由 liubo 提交于 5月 06, 2011

The current code relogs the entire inode every time during fsync log,
and it is much better suited to small files rather than large ones.

During my performance test, the fsync performace of large files sucks,
and we can ascribe this to the tremendous amount of csum infos of the
large ones, cause we have to flush all of these csum infos into log trees
even when there are only _one_ change in the whole file data.  Apparently,
to optimize fsync, we need to create a filter to skip the unnecessary csum
ones, that is, the corresponding file data remains unchanged before this fsync.

Here I have some test results to show, I use sysbench to do "random write + fsync".

===
sysbench --test=fileio --num-threads=1 --file-num=2 --file-block-size=4K --file-total-size=8G --file-test-mode=rndwr --file-io-mode=sync --file-extra-flags=  [prepare, run]
===

Sysbench args:
  - Number of threads: 1
  - Extra file open flags: 0
  - 2 files, 4Gb each
  - Block size 4Kb
  - Number of random requests for random IO: 10000
  - Read/Write ratio for combined random IO test: 1.50
  - Periodic FSYNC enabled, calling fsync() each 100 requests.
  - Calling fsync() at the end of test, Enabled.
  - Using synchronous I/O mode
  - Doing random write test

Sysbench results:
===
   Operations performed:  0 Read, 10000 Write, 200 Other = 10200 Total
   Read 0b  Written 39.062Mb  Total transferred 39.062Mb
===
a) without patch:  (*SPEED* : 451.01Kb/sec)
   112.75 Requests/sec executed

b) with patch:     (*SPEED* : 4.7533Mb/sec)
   1216.84 Requests/sec executed

PS: I've made a _sub transid_ stuff patch, but it does not perform as effectively as this patch,
and I'm wanderring where the problem is and trying to improve it more.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

8e531cdf

22 5月, 2011 1 次提交
- C
  Btrfs: update the delayed inode code to use the btrfs_ino helper. · 0d0ca30f
  由 Chris Mason 提交于 5月 22, 2011
```
Signed-off-by: NChris Mason <chris.mason@oracle.com>
```
  0d0ca30f
21 5月, 2011 1 次提交

btrfs: implement delayed inode items operation · 16cdcec7

由 Miao Xie 提交于 4月 22, 2011

Changelog V5 -> V6:
- Fix oom when the memory load is high, by storing the delayed nodes into the
  root's radix tree, and letting btrfs inodes go.

Changelog V4 -> V5:
- Fix the race on adding the delayed node to the inode, which is spotted by
  Chris Mason.
- Merge Chris Mason's incremental patch into this patch.
- Fix deadlock between readdir() and memory fault, which is reported by
  Itaru Kitayama.

Changelog V3 -> V4:
- Fix nested lock, which is reported by Itaru Kitayama, by updating space cache
  inode in time.

Changelog V2 -> V3:
- Fix the race between the delayed worker and the task which does delayed items
  balance, which is reported by Tsutomu Itoh.
- Modify the patch address David Sterba's comment.
- Fix the bug of the cpu recursion spinlock, reported by Chris Mason

Changelog V1 -> V2:
- break up the global rb-tree, use a list to manage the delayed nodes,
  which is created for every directory and file, and used to manage the
  delayed directory name index items and the delayed inode item.
- introduce a worker to deal with the delayed nodes.

Compare with Ext3/4, the performance of file creation and deletion on btrfs
is very poor. the reason is that btrfs must do a lot of b+ tree insertions,
such as inode item, directory name item, directory name index and so on.

If we can do some delayed b+ tree insertion or deletion, we can improve the
performance, so we made this patch which implemented delayed directory name
index insertion/deletion and delayed inode update.

Implementation:
- introduce a delayed root object into the filesystem, that use two lists to
  manage the delayed nodes which are created for every file/directory.
  One is used to manage all the delayed nodes that have delayed items. And the
  other is used to manage the delayed nodes which is waiting to be dealt with
  by the work thread.
- Every delayed node has two rb-tree, one is used to manage the directory name
  index which is going to be inserted into b+ tree, and the other is used to
  manage the directory name index which is going to be deleted from b+ tree.
- introduce a worker to deal with the delayed operation. This worker is used
  to deal with the works of the delayed directory name index items insertion
  and deletion and the delayed inode update.
  When the delayed items is beyond the lower limit, we create works for some
  delayed nodes and insert them into the work queue of the worker, and then
  go back.
  When the delayed items is beyond the upper bound, we create works for all
  the delayed nodes that haven't been dealt with, and insert them into the work
  queue of the worker, and then wait for that the untreated items is below some
  threshold value.
- When we want to insert a directory name index into b+ tree, we just add the
  information into the delayed inserting rb-tree.
  And then we check the number of the delayed items and do delayed items
  balance. (The balance policy is above.)
- When we want to delete a directory name index from the b+ tree, we search it
  in the inserting rb-tree at first. If we look it up, just drop it. If not,
  add the key of it into the delayed deleting rb-tree.
  Similar to the delayed inserting rb-tree, we also check the number of the
  delayed items and do delayed items balance.
  (The same to inserting manipulation)
- When we want to update the metadata of some inode, we cached the data of the
  inode into the delayed node. the worker will flush it into the b+ tree after
  dealing with the delayed insertion and deletion.
- We will move the delayed node to the tail of the list after we access the
  delayed node, By this way, we can cache more delayed items and merge more
  inode updates.
- If we want to commit transaction, we will deal with all the delayed node.
- the delayed node will be freed when we free the btrfs inode.
- Before we log the inode items, we commit all the directory name index items
  and the delayed inode update.

I did a quick test by the benchmark tool[1] and found we can improve the
performance of file creation by ~15%, and file deletion by ~20%.

Before applying this patch:
Create files:
        Total files: 50000
        Total time: 1.096108
        Average time: 0.000022
Delete files:
        Total files: 50000
        Total time: 1.510403
        Average time: 0.000030

After applying this patch:
Create files:
        Total files: 50000
        Total time: 0.932899
        Average time: 0.000019
Delete files:
        Total files: 50000
        Total time: 1.215732
        Average time: 0.000024

[1] http://marc.info/?l=linux-btrfs&m=128212635122920&q=p3

Many thanks for Kitayama-san's help!
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
Reviewed-by: NDavid Sterba <dave@jikos.cz>
Tested-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Tested-by: NItaru Kitayama <kitayama@cl.bb4u.ne.jp>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

16cdcec7

15 5月, 2011 5 次提交

Btrfs: fix FS_IOC_SETFLAGS ioctl · ebcb904d

由 Li Zefan 提交于 4月 15, 2011

Steps to reproduce the bug:

  - Call FS_IOC_SETLFAGS ioctl with flags=FS_COMPR_FL
  - Call FS_IOC_SETFLAGS ioctl with flags=0
  - Call FS_IOC_GETFLAGS ioctl, and you'll see FS_COMPR_FL is still set!
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

ebcb904d

Btrfs: fix FS_IOC_GETFLAGS ioctl · d0092bdd

由 Li Zefan 提交于 4月 15, 2011

As we've added per file compression/cow support.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d0092bdd

fs: remove FS_COW_FL · e1e8fb6a

由 Li Zefan 提交于 4月 15, 2011

FS_COW_FL and FS_NOCOW_FL were newly introduced to control per file
COW in btrfs, but FS_NOCOW_FL is sufficient.

The fact is we don't have corresponding BTRFS_INODE_COW flag.

COW is default, and FS_NOCOW_FL can be used to switch off COW for
a single file.

If we mount btrfs with nodatacow, a newly created file will be set with
the FS_NOCOW_FL flag. So to turn on COW for it, we can just clear the
FS_NOCOW_FL flag.
Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

e1e8fb6a

Btrfs: fix easily get into ENOSPC in mixed case · 1aba86d6

由 liubo 提交于 4月 08, 2011

When a btrfs disk is created by mixed data & metadata option, it will have no
pure data or pure metadata space info.

In btrfs's for-linus branch, commit 78b1ea13838039cd88afdd62519b40b344d6c920
(Btrfs: fix OOPS of empty filesystem after balance) initializes space infos at
the very beginning.  The problem is this initialization does not take the mixed
case into account, which will cause btrfs will easily get into ENOSPC in mixed
case.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

1aba86d6

Prevent oopsing in posix_acl_valid() · f5de9391

由 Daniel J Blueman 提交于 5月 03, 2011

If posix_acl_from_xattr() returns an error code, a negative address is
dereferenced causing an oops; fix by checking for error code first.
Signed-off-by: NDaniel J Blueman <daniel.blueman@gmail.com>
Reviewed-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

f5de9391

13 5月, 2011 5 次提交

btrfs: quasi-round-robin for chunk allocation · 73c5de00

由 Arne Jansen 提交于 4月 12, 2011

In a multi device setup, the chunk allocator currently always allocates
chunks on the devices in the same order. This leads to a very uneven
distribution, especially with RAID1 or RAID10 and an uneven number of
devices.
This patch always sorts the devices before allocating, and allocates the
stripes on the devices with the most available space, as long as there
is enough space available. In a low space situation, it first tries to
maximize striping.
The patch also simplifies the allocator and reduces the checks for
corner cases.
The simplification is done by several means. First, it defines the
properties of each RAID type upfront. These properties are used afterwards
instead of differentiating cases in several places.
Second, the old allocator defined a minimum stripe size for each block
group type, tried to find a large enough chunk, and if this fails just
allocates a smaller one. This is now done in one step. The largest possible
chunk (up to max_chunk_size) is searched and allocated.
Because we now have only one pass, the allocation of the map (struct
map_lookup) is moved down to the point where the number of stripes is
already known. This way we avoid reallocation of the map.
We still avoid allocating stripes that are not a multiple of STRIPE_SIZE.

73c5de00

btrfs: heed alloc_start · a9c9bf68

由 Arne Jansen 提交于 4月 12, 2011

currently alloc_start is disregarded if the requested
chunk size is bigger than (device size - alloc_start),
but smaller than the device size.
The only situation where I see this could have made sense
was when a chunk equal the size of the device has been
requested. This was possible as the allocator failed to
take alloc_start into account when calculating the request
chunk size. As this gets fixed by this patch, the workaround
is not necessary anymore.

a9c9bf68

A
btrfs: move btrfs_cmp_device_free_bytes to super.c · bcd53741
由 Arne Jansen 提交于 4月 12, 2011
```
this function won't be used here anymore, so move it super.c where it is
used for df-calculation
```
bcd53741
D
btrfs: use unsigned type for single bit bitfield · 4ea02885
由 David Sterba 提交于 5月 12, 2011
```
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
```
4ea02885
D
btrfs: use printk_ratelimited instead of printk_ratelimit · 7a36ddec
由 David Sterba 提交于 5月 06, 2011
```
As per printk_ratelimit comment, it should not be used.
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
```
7a36ddec

12 5月, 2011 3 次提交

btrfs: add readonly flag · 8628764e

由 Arne Jansen 提交于 3月 23, 2011

setting the readonly flag prevents writes in case an error is detected
Signed-off-by: NArne Jansen <sensille@gmx.net>

8628764e

btrfs scrub: make fixups sync · 96e36920

由 Ilya Dryomov 提交于 4月 09, 2011

btrfs scrub - make fixups sync, don't reuse fixup bios

Fixups are already sync for csum failures, this patch makes them sync
for EIO case as well.

Fixups are now sharing pages with the parent sbio - instead of
allocating a separate page to do a fixup we grab the page from the sbio
buffer.

Fixup bios are no longer reused.

struct fixup is no longer needed, instead pass [sbio pointer, index].

Originally this was added to look at the possibility of sharing the code
between drive swap and scrub, but it actually fixes a serious bug in
scrub code where errors that could be corrected were ignored and
reported as uncorrectable.

btrfs scrub - restore bios properly after media errors

The current code reallocates a bio after a media error.  This is a
temporary measure introduced in v3 after a serious problem related to
bio reuse was found in v2 of scrub patchset.

Basically we did not reset bv_offset and bv_len fields of the bio_vec
structure.  They are changed in case I/O error happens, for example, at
offset 512 or 1024 into the page.  Also bi_flags field wasn't properly
setup before reusing the bio.
Signed-off-by: NArne Jansen <sensille@gmx.net>

96e36920

btrfs: new ioctls for scrub · 475f6387

由 Jan Schmidt 提交于 3月 11, 2011

adds ioctls necessary to start and cancel scrubs, to get current
progress and to get info about devices to be scrubbed.
Note that the scrub is done per-device and that the ioctl only
returns after the scrub for this devices is finished or has been
canceled.
Signed-off-by: NArne Jansen <sensille@gmx.net>

475f6387

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功