Commits · 2671485d395c07fca104c972785898d7c52fc942 · xiphi1978 / linux

02 Oct, 2012 2 commits

Btrfs: remove unused hint byte argument for btrfs_drop_extents · 2671485d

Authored by Josef Bacik on Aug 29, 2012

I audited all users of btrfs_drop_extents and found that nobody actually uses
the hint_byte argument. I'm sure it was used for something at some point but
it's not used now, and the way the pinning works the disk bytenr would never be
immediately useful anyway so let's just remove it. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

2671485d

Btrfs: turbo charge fsync · 5dc562c5

Authored by Josef Bacik on Aug 17, 2012

At least for the vm workload.  Currently on fsync we will

1) Truncate all items in the log tree for the given inode if they exist

and

2) Copy all items for a given inode into the log

The problem with this is that for things like VMs you can have lots of
extents from the fragmented writing behavior, and worse yet you may have
only modified a few extents, not the entire thing.  This patch fixes this
problem by tracking which transid modified our extent, and then when we do
the tree logging we find all of the extents we've modified in our current
transaction, sort them and commit them.  We also only truncate up to the
xattrs of the inode and copy that stuff in normally, and then just drop any
extents in the range we have that exist in the log already.  Here are some
numbers of a 50 meg fio job that does random writes and fsync()s after every
write

		Original	Patched
SATA drive	82KB/s		140KB/s
Fusion drive	431KB/s		2532KB/s

So around 2-6 times faster depending on your hardware.  There are a few
corner cases, for example if you truncate at all we have to do it the old
way since there is no way to be sure what is in the log is ok.  This
probably could be done smarter, but if you write-fsync-truncate-write-fsync
you deserve what you get.  All this work is in RAM of course so if your
inode gets evicted from cache and you read it in and fsync it we'll do it
the slow way if we are still in the same transaction that we last modified
the inode in.

The biggest cool part of this is that it requires no changes to the recovery
code, so if you fsync with this patch and crash and load an old kernel, it
will run the recovery and be a-ok.  I have tested this pretty thoroughly
with an fsync tester and everything comes back fine, as well as xfstests.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

5dc562c5
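The selection step described in "Btrfs: turbo charge fsync" can be sketched in user-space C. This is only an illustration of the idea, not the kernel code: `struct extent_rec`, `cmp_by_start()` and `collect_modified_extents()` are hypothetical names, and the sketch assumes every in-memory extent record carries the transid that last modified it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical in-memory extent record; the real kernel structures differ. */
struct extent_rec {
    uint64_t start;      /* file offset of the extent */
    uint64_t len;        /* length in bytes */
    uint64_t generation; /* transid that last modified this extent */
};

/* qsort comparator: order extents by file offset. */
static int cmp_by_start(const void *a, const void *b)
{
    const struct extent_rec *x = a, *y = b;

    if (x->start < y->start)
        return -1;
    return x->start > y->start;
}

/*
 * Copy the extents modified in transaction `transid` from `all` into `out`
 * (which must have room for `n` entries), sort them by offset, and return
 * how many were selected. Only these would need to be written to the log.
 */
size_t collect_modified_extents(const struct extent_rec *all, size_t n,
                                uint64_t transid, struct extent_rec *out)
{
    size_t count = 0;

    assert(all != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (all[i].generation == transid)
            out[count++] = all[i];
    }
    qsort(out, count, sizeof(*out), cmp_by_start);
    return count;
}
```

With a heavily fragmented file where only a few extents changed in the current transaction, the work done per fsync in this sketch scales with the number of modified extents rather than with the total extent count, which is the effect the commit message describes.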

29 Aug, 2012 1 commit

Btrfs: fix some endian bugs handling the root times · dadd1105

Authored by Dan Carpenter on Jul 30, 2012

"trans->transid" is cpu endian but we want to store the data as little
endian.  "item->ctime.nsec" is only 32 bits, not 64.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

dadd1105

09 Aug, 2012 1 commit

Btrfs: remove mnt_want_write call in btrfs_mksubvol · e00da206

Authored by Alexander Block on Aug 02, 2012

We got a recursive lock in mksubvol because the caller already held
a lock. I think we got into this due to a merge error. Commit a874a63e
removed the mnt_want_write call from btrfs_mksubvol and added a
replacement call to mnt_want_write_file in btrfs_ioctl_snap_create_transid.
Commit e7848683 however tried to move all calls to mnt_want_write above
i_mutex. So somewhere while merging this, it got mixed up. The
solution is to remove the mnt_want_write call completely from
mksubvol.
Reported-by: David Sterba <dave@jikos.cz>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

e00da206

31 Jul, 2012 1 commit

btrfs: Push mnt_want_write() outside of i_mutex · e7848683

Authored by Jan Kara on Jun 12, 2012

When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.

CC: Chris Mason <chris.mason@oracle.com>
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e7848683

26 Jul, 2012 3 commits

Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive · 31db9f7c

Authored by Alexander Block on Jul 25, 2012

This patch introduces the BTRFS_IOC_SEND ioctl that is
required for send. It allows btrfs-progs to implement
full and incremental sends. Patches for btrfs-progs will
follow.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>

31db9f7c

Btrfs: introduce subvol uuids and times · 8ea05e3a

Authored by Alexander Block on Jul 25, 2012

This patch introduces uuids for subvolumes. Each
subvolume has its own uuid. In case it was snapshotted,
it also contains parent_uuid. In case it was received,
it also contains received_uuid.

It also introduces subvolume ctime/otime/stime/rtime. The
first two are comparable to the times found in inodes. otime
is the origin/creation time and ctime is the change time.
stime/rtime are only valid on received subvolumes.
stime is the time of the subvolume when it was
sent. rtime is the time of the subvolume when it was
received.

Additionally to the times, we have a transid for each
time. They are updated at the same place as the times.

btrfs receive uses stransid and rtransid to find out
if a received subvolume changed in the meantime.

If an older kernel mounts a filesystem with the
extended fields, all fields become invalid. The next
mount with a new kernel will detect this and reset the
fields.
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>

8ea05e3a

Btrfs: Check INCOMPAT flags on remount and add helper function · 2b0ce2c2

Authored by Mitch Harder on Jul 24, 2012

In support of the recently added capability to remount with lzo
compression, provide a helper function to check the compression
INCOMPAT flags when remounting with lzo compression, and set
the flags if necessary.

Also, implement the new helper function when defragmenting with
explicit lzo compression and when setting the default subvolume.
Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

2b0ce2c2

25 Jul, 2012 1 commit

btrfs: allow cross-subvolume file clone · 362a20c5

Authored by David Sterba on Aug 01, 2011

Lift the EXDEV condition and allow different root trees for files being
cloned, then pass source inode's root when searching for extents.
Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from
one filesystem are mounted separately.
Signed-off-by: David Sterba <dsterba@suse.cz>

362a20c5

24 Jul, 2012 6 commits

Btrfs: do not set subvolume flags in readonly mode · b9ca0664

Authored by Liu Bo on Jun 29, 2012

$ mkfs.btrfs /dev/sdb7
$ btrfstune -S1 /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs
mount: block device /dev/sdb7 is write-protected, mounting read-only
$ btrfs dev add /dev/sdb8 /mnt/btrfs/

Now we get a btrfs in which mnt flags has readonly but sb flags does
not.  So for those ioctls that only check sb flags with MS_RDONLY, it
is going to be a problem.
Setting subvolume flags is such an ioctl, we should use mnt_want_write_file()
to check RO flags.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

b9ca0664

Btrfs: use mnt_want_write_file instead of mnt_want_write · e54bfa31

Authored by Liu Bo on Jun 29, 2012

mnt_want_write_file is faster when the file has been opened for write.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

e54bfa31

Btrfs: remove redundant r/o check for superblock · 768e9dfe

Authored by Liu Bo on Jun 29, 2012

mnt_want_write() and mnt_want_write_file() will check sb->s_flags with
MS_RDONLY, and we don't need to do it ourselves.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

768e9dfe

Btrfs: check write access to mount earlier while creating snapshots · a874a63e

Authored by Liu Bo on Jun 29, 2012

Move check of write access to mount into upper functions so that we can
use mnt_want_write_file instead, which is faster than mnt_want_write.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>

a874a63e

btrfs: join DEV_STATS ioctls to one · b27f7c0c

Authored by David Sterba on Jun 22, 2012

Commit c11d2c23 (Btrfs: add ioctl to get and reset the device
stats) introduced two ioctls doing almost the same thing distinguished
by just the ioctl number which encodes "do reset after read". I have
suggested

http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg16604.html

to implement it via the ioctl args. This hasn't happened, and I think we
should use a cleaner way to pass flags and should not waste ioctl
numbers.

CC: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: David Sterba <dsterba@suse.cz>

b27f7c0c
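The design argued for in "btrfs: join DEV_STATS ioctls to one" is to carry the "reset after read" request as a flag inside the argument structure instead of encoding it in a second ioctl number. A user-space sketch of that shape follows. All names here (`SKETCH_NR_STATS`, `SKETCH_STATS_RESET`, `struct sketch_dev_stats_args`, `sketch_get_dev_stats()`) are hypothetical and are not claimed to match the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NR_STATS    5
#define SKETCH_STATS_RESET (1ULL << 0) /* hypothetical flag: reset after read */

/* Hypothetical argument block shared by "get" and "get and reset". */
struct sketch_dev_stats_args {
    uint64_t flags;                   /* in: behaviour modifiers */
    uint64_t values[SKETCH_NR_STATS]; /* out: counter snapshot */
};

/* One entry point: always copy the counters, optionally clear them. */
void sketch_get_dev_stats(uint64_t counters[SKETCH_NR_STATS],
                          struct sketch_dev_stats_args *args)
{
    assert(counters != NULL && args != NULL);
    memcpy(args->values, counters, sizeof(args->values));
    if (args->flags & SKETCH_STATS_RESET)
        memset(counters, 0, sizeof(args->values));
}
```

A flags field also leaves room for further modifiers later without allocating more ioctl numbers, which is the concern the message raises.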

btrfs: ignore unfragmented file checks in defrag when compression enabled - rebased · a43a2111

Authored by Andrew Mahone on Jun 19, 2012

Rebased on btrfs-next and retested.

Inform should_defrag_range if BTRFS_DEFRAG_RANGE_COMPRESS is set. If so, skip
checks for adjacent extents and extent size when deciding whether to defrag,
as these can prevent an uncompressed and unfragmented file from being
compressed as requested.
Signed-off-by: Andrew Mahone <andrew.mahone@gmail.com>

a43a2111

23 Jul, 2012 1 commit

btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file() · 11e62a8f

Authored by Al Viro on Jul 19, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

11e62a8f

12 Jul, 2012 2 commits

Btrfs: add qgroup inheritance · 6f72c7e2

Authored by Arne Jansen on Sep 14, 2011

When creating a subvolume or snapshot, it is necessary
to initialize the qgroup account with a copy of some
other (tracking) qgroup. This patch adds parameters
to the ioctls to pass the information from which qgroup
to inherit.
Signed-off-by: Arne Jansen <sensille@gmx.net>

6f72c7e2

Btrfs: add qgroup ioctls · 5d13a37b

Authored by Arne Jansen on Sep 14, 2011

Ioctls to control the qgroup feature like adding and
removing qgroups and assigning qgroups.
Signed-off-by: Arne Jansen <sensille@gmx.net>

5d13a37b

16 Jun, 2012 1 commit

Btrfs: cast devid to unsigned long long for printk %llu · a8c4a33b

Authored by Chris Mason on Jun 15, 2012

Avoid warning in 32 bit machines
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

a8c4a33b
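The fix named in this commit title amounts to making the argument type match the `%llu` conversion explicitly. A user-space sketch of the same pattern, with `snprintf` standing in for printk; `format_devid()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a 64-bit device id. The cast guarantees the argument really is
 * an unsigned long long, whatever underlying type uint64_t maps to on
 * the target platform, so it always agrees with %llu.
 */
int format_devid(char *buf, size_t size, uint64_t devid)
{
    assert(buf != NULL);
    return snprintf(buf, size, "devid %llu", (unsigned long long)devid);
}
```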

15 Jun, 2012 3 commits

Btrfs: do not resize a seeding device · 4e42ae1b

Authored by Liu Bo on Jun 14, 2012

Seeding devices are not supposed to change any more.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

4e42ae1b

Btrfs: fix defrag regression · 6c282eb4

Authored by Li Zefan on Jun 11, 2012

If a file has 3 small extents:

| ext1 | ext2 | ext3 |

Running "btrfs fi defrag" will only defrag the last two extents, if those
extent mappings haven't been read into memory from disk.

This bug was introduced by commit 17ce6ef8
("Btrfs: add a check to decide if we should defrag the range")

The cause is, that commit looked into previous and next extents using
lookup_extent_mapping() only.

While at it, remove the code that checks the previous extent, since
it's sufficient to check the next extent.
Signed-off-by: Li Zefan <lizefan@huawei.com>

6c282eb4

Btrfs: use rcu to protect device->name · 606686ee

Authored by Josef Bacik on Jun 04, 2012

Al pointed out that we can just toss out the old name on a device and add a
new one arbitrarily, so anybody who uses device->name in printk could
possibly use free'd memory. Instead of adding locking around all of this he
suggested doing it with RCU, so I've introduced a struct rcu_string that
does just that and have gone through and protected all accesses to
device->name that aren't under the uuid_mutex with rcu_read_lock(). This
protects us and I will use it for dealing with removing the device that we
used to mount the file system in a later patch. Thanks,
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Josef Bacik <josef@redhat.com>

606686ee

30 May, 2012 5 commits

Btrfs: add ioctl to get and reset the device stats · c11d2c23

Authored by Stefan Behrens on May 25, 2012

An ioctl interface is added to get the device statistic counters.
A second ioctl is added to atomically get and reset these counters.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

c11d2c23

Btrfs: do not do balance in readonly mode · 9ba1f6e4

Authored by Liu Bo on May 11, 2012

In normal cases, we would not be allowed to do balance in RO mode.
However, when we're using a seeding device and adding another device to sprout,
things will change:

$ mkfs.btrfs /dev/sdb7
$ btrfstune -S 1 /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs -o ro
$ btrfs fi bal /mnt/btrfs   -----------------------> fail.
$ btrfs dev add /dev/sdb8 /mnt/btrfs
$ btrfs fi bal /mnt/btrfs   -----------------------> works!

It should not be designed as an exception, and we'd better add another check for
mnt flags.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Reviewed-by: Josef Bacik <josef@redhat.com>

9ba1f6e4

Btrfs: NUL-terminate path buffer in DEV_INFO ioctl result · a27202fb

Authored by Jim Meyering on Apr 26, 2012

A device with name of length BTRFS_DEVICE_PATH_NAME_MAX or longer
would not be NUL-terminated in the DEV_INFO ioctl result buffer.
Signed-off-by: Jim Meyering <meyering@redhat.com>

a27202fb
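The problem described here is the classic `strncpy` pitfall: when the source is as long as or longer than the destination, no terminator is written. A minimal user-space sketch of the usual remedy follows. `SKETCH_PATH_MAX` is a small stand-in for BTRFS_DEVICE_PATH_NAME_MAX and `copy_dev_path()` is a hypothetical helper; the commit message does not show the actual change, so this is one plausible shape only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PATH_MAX 16 /* stand-in for BTRFS_DEVICE_PATH_NAME_MAX */

/* Copy a device name into a fixed buffer, truncating but always terminating. */
void copy_dev_path(char dst[SKETCH_PATH_MAX], const char *name)
{
    assert(dst != NULL && name != NULL);
    strncpy(dst, name, SKETCH_PATH_MAX - 1);
    dst[SKETCH_PATH_MAX - 1] = '\0'; /* explicit NUL for over-long names */
}
```

A caller that later treats the buffer as a C string can then rely on finding a terminator within the buffer, at the cost of silently truncating long names.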

Fix minor type issues · 2eec6c81

Authored by Daniel J Blueman on Apr 26, 2012

Address some minor type issues identified by sparse checker.
Signed-off-by: Daniel J Blueman <daniel@quora.org>

2eec6c81

Btrfs: use i_version instead of our own sequence · 0c4d2d95

Authored by Josef Bacik on Apr 05, 2012

We've been keeping around the inode sequence number in hopes that somebody
would use it, but nobody uses it and people actually use i_version which
serves the same purpose, so use i_version where we used the incore inode's
sequence number and that way the sequence is updated properly across the
board, and not just in file write. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

0c4d2d95

26 May, 2012 1 commit

Btrfs: don't set for_cow parameter for tree block functions · 5581a51a

Authored by Jan Schmidt on May 16, 2012

Three callers of btrfs_free_tree_block or btrfs_alloc_tree_block passed
parameter for_cow = 1. In fact, these two functions should never mark
their tree modification operations as for_cow, because they can change
the number of blocks referenced by a tree.

Hence, we remove the extra for_cow parameter from these functions and
make them pass a zero down.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

5581a51a

19 Apr, 2012 1 commit

Btrfs: fix btrfs_ioctl_dev_info() crash on missing device · 99ba55ad

Authored by Stefan Behrens on Mar 19, 2012

When a filesystem is mounted with the degraded option, it is
possible that some of the devices are not there.
btrfs_ioctl_dev_info() crashes in this case because the device
name is a NULL pointer. This ioctl was only used for scrub.
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

99ba55ad
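The crash described here comes from dereferencing a device name that is NULL for a missing device. The commit message does not show the fix; the sketch below assumes the natural remedy of checking the pointer and reporting an empty path. `struct sketch_device` and `sketch_dev_info_path()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device record; name is NULL when the device is missing. */
struct sketch_device {
    const char *name;
};

/* Report the device path, or an empty string for a missing device. */
void sketch_dev_info_path(const struct sketch_device *dev, char *dst,
                          size_t size)
{
    assert(dev != NULL && dst != NULL && size > 0);
    if (dev->name) {
        strncpy(dst, dev->name, size - 1);
        dst[size - 1] = '\0';
    } else {
        dst[0] = '\0';
    }
}
```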

29 Mar, 2012 5 commits

Btrfs: update to the right index of defragment · e1f041e1

Authored by Liu Bo on Mar 29, 2012

When we use autodefrag, we forget to update the index which indicates
the last page we've dirtied.  And we'll set dirty flags on the same set of
pages again and again.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

e1f041e1

Btrfs: do not bother to defrag an extent if it is a big real extent · 66c26892

Authored by Liu Bo on Mar 29, 2012

$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag
$ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3072              10 eof
/mnt/btrfs/foobar: 1 extent found

Now we have a big real extent [0, 40960), but autodefrag will still defrag it.

$ sync
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
 ext logical physical expected length flags
   0       0     3082              10 eof
/mnt/btrfs/foobar: 1 extent found

So if we already find a big real extent, we're ok about that, just skip it.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

66c26892

Btrfs: add a check to decide if we should defrag the range · 17ce6ef8

Authored by Liu Bo on Mar 29, 2012

If our file's layout is as follows:
| hole | data1 | hole | data2 |

we do not need to defrag this file, because this file has holes and
cannot be merged into one extent.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

17ce6ef8
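The reasoning in this message is that a data extent surrounded by holes has nothing to merge with, so defragmenting it gains nothing. A user-space sketch of that check follows. `struct sketch_extent` and `sketch_should_defrag()` are hypothetical names, and the layout is modelled as a simple array rather than the kernel's extent maps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one file extent: either a hole or real data. */
struct sketch_extent {
    bool is_hole;
};

/* Defrag extent i only if it is data and has a data neighbour to merge with. */
bool sketch_should_defrag(const struct sketch_extent *ext, size_t n, size_t i)
{
    assert(ext != NULL && i < n);
    if (ext[i].is_hole)
        return false;

    bool prev_data = i > 0 && !ext[i - 1].is_hole;
    bool next_data = i + 1 < n && !ext[i + 1].is_hole;

    return prev_data || next_data;
}
```

For the layout `| hole | data1 | hole | data2 |` from the message, the function returns false for every extent, so the file is left alone.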

Btrfs: fix the mismatch of page->mapping · 1f12bd06

Authored by Liu Bo on Mar 29, 2012

commit 600a45e1
(Btrfs: fix deadlock on page lock when doing auto-defragment)
fixes the deadlock on page, but it also introduces another bug.

A page may have been truncated after unlock & lock.
So we need to find it again to get the right one.

And since we've held i_mutex lock, inode size remains unchanged and
we can drop isize overflow checks.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

1f12bd06

Btrfs: fix race between direct io and autodefrag · ecb8bea8

Authored by Liu Bo on Mar 29, 2012

The bug is from running xfstests 209 with autodefrag.

The race is as follows:
       t1                       t2(autodefrag)
   direct IO
     invalidate pagecache
     dio(old data)             add_inode_defrag
     invalidate pagecache
   endio

   direct IO
     invalidate pagecache
                                run_defrag
                                  readpage(old data)
                                  set page dirty (old data)
     dio(new data, rewrite)
     invalidate pagecache (*)
     endio

t2(autodefrag) will get old data into pagecache via readpage and set
pagecache dirty.  Meanwhile, invalidate pagecache(*) will fail due to
dirty flags in pages.  So the old data may be flushed into disk by
flush thread, which will lead to data loss.

And so does the case of user defragment progs.

The patch fixes this race by holding i_mutex when we readpage and set page dirty.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

ecb8bea8

27 Mar, 2012 1 commit

Btrfs: fix regression in scrub path resolving · 7a3ae2f8

Authored by Jan Schmidt on Mar 23, 2012

In commit 4692cf58 we introduced new backref walking code for btrfs. This
assumes we're searching live roots, which requires a transaction context.
While scrubbing, however, we must not join a transaction because this could
deadlock with the commit path. Additionally, what scrub really wants to do
is resolving a logical address in the commit root it's currently checking.

This patch adds support for logical to path resolving on commit roots and
makes scrub use that.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

7a3ae2f8

22 Mar, 2012 3 commits

btrfs: replace many BUG_ONs with proper error handling · 79787eaa

Authored by Jeff Mahoney on Mar 12, 2012

btrfs currently handles most errors with BUG_ON. This patch is a
work-in-progress but aims to handle most errors other than internal logic
errors and ENOMEM more gracefully.

This iteration prevents most crashes but can run into lockups with
the page lock on occasion when the timing "works out."
Signed-off-by: Jeff Mahoney <jeffm@suse.com>

79787eaa

btrfs: Don't BUG_ON errors from btrfs_create_subvol_root() · ce598979

Authored by Mark Fasheh on Jul 26, 2011

This is called from only one place - create_subvol() which passes errors
safely back out to its caller, btrfs_mksubvol where they are handled.

Additionally, btrfs_create_subvol_root() itself BUGs needlessly on an error
return from btrfs_update_inode(). Since create_subvol() was fixed to catch
errors we can bubble this one up too.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ce598979

btrfs: drop gfp_t from lock_extent · d0082371

Authored by Jeff Mahoney on Mar 01, 2012

lock_extent and unlock_extent are always called with GFP_NOFS, drop the
argument and use GFP_NOFS consistently.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>

d0082371

23 Feb, 2012 1 commit

Btrfs: add extra sanity checks on the path names in btrfs_mksubvol · 16780cab

Authored by Chris Mason on Feb 20, 2012

Signed-off-by: Chris Mason <chris.mason@oracle.com>

16780cab

17 Feb, 2012 1 commit

Btrfs: fix deadlock on page lock when doing auto-defragment · 600a45e1

Authored by Miao Xie on Feb 16, 2012

When I ran xfstests repeatedly on an auto-defragment btrfs, the deadlock
happened.

Steps to reproduce:
[tty0]
 # export MOUNT_OPTIONS="-o autodefrag"
 # export TEST_DEV=<partition1>
 # export TEST_DIR=<mountpoint1>
 # export SCRATCH_DEV=<partition2>
 # export SCRATCH_MNT=<mountpoint2>
 # while [ 1 ]
 > do
 > ./check 091 127 263
 > sleep 1
 > done
[tty1]
 # while [ 1 ]
 > do
 > echo 3 > /proc/sys/vm/drop_caches
 > done

Several hours later, the test processes will hang on, and the deadlock will
happen on page lock.

The reason is that:
  Auto defrag task		Flush thread			Test task
				btrfs_writepages()
				  add ordered extent
				  (including page 1, 2)
				  set page 1 writeback
				  set page 2 writeback
				endio_fn()
				  end page 2 writeback
								release page 2
lock page 1
alloc and lock page 2
page 2 is not uptodate
  btrfs_readpage()
    start ordered extent()
    btrfs_writepages()
      try  to lock page 1

so deadlock happens.

Fix this bug by unlocking the page which is in writeback, and re-locking it
after the writeback end.
Signed-off-by: Miao Xie <miax@cn.fujitsu.com>

600a45e1