Commits · 0308af4465897c889e32754ef37bb465a1b2b872 · xiphi1978 / linux

18 Sep, 2014 10 commits

Btrfs: fix unprotected device's variants on 32bits machine · 7cc8e58d

Authored by Miao Xie on Sep 03, 2014

->total_bytes, ->disk_total_bytes and ->bytes_used are protected by the
chunk lock when we change them, but we sometimes read them without any
lock, so we might get an unexpected value. Fix this the same way as the
inode's i_size.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

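For illustration, the i_size-style protection this commit refers to can be sketched in userspace C11. On 32-bit SMP kernels, i_size_read() guards the 64-bit value with a sequence counter so that a reader retries if a writer was active in between. The sketch below is a stand-in under stated assumptions, not the btrfs code: `struct dev_size`, `dev_size_write()` and `dev_size_read()` are hypothetical names, and it assumes a single writer at a time (the role the chunk lock plays in the commit).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit device field read on a 32-bit machine. */
struct dev_size {
	atomic_uint seq;         /* even: stable, odd: write in progress */
	_Atomic uint32_t lo, hi; /* the 64-bit value split into two halves */
};

/* Assumes writers are already serialized by an external lock. */
static void dev_size_write(struct dev_size *d, uint64_t v)
{
	unsigned int s = atomic_load_explicit(&d->seq, memory_order_relaxed);

	/* Mark the value as being updated before touching the halves. */
	atomic_store_explicit(&d->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->lo, (uint32_t)v, memory_order_relaxed);
	atomic_store_explicit(&d->hi, (uint32_t)(v >> 32), memory_order_relaxed);
	/* Publish the update: the counter is even again. */
	atomic_store_explicit(&d->seq, s + 2, memory_order_release);
}

/* Lockless reader: retry until both halves come from the same update. */
static uint64_t dev_size_read(struct dev_size *d)
{
	unsigned int s1, s2;
	uint32_t lo, hi;

	do {
		s1 = atomic_load_explicit(&d->seq, memory_order_acquire);
		lo = atomic_load_explicit(&d->lo, memory_order_relaxed);
		hi = atomic_load_explicit(&d->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&d->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);

	return ((uint64_t)hi << 32) | lo;
}
```

Without the counter, a 32-bit reader could combine the low half of one value with the high half of another, which is the unexpected value the commit message describes.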

Btrfs: add missing compression property remove in btrfs_ioctl_setflags · 78a017a2

Authored by Filipe Manana on Sep 11, 2014

The behaviour of a 'chattr -c' consists of getting the current flags,
clearing the FS_COMPR_FL bit and then sending the result to the set
flags ioctl - this means the bit FS_NOCOMP_FL isn't set in the flags
passed to the ioctl. This results in the compression property not being
cleared from the inode - it was cleared only if the bit FS_NOCOMP_FL
was set in the received flags.

Reproducer:

    $ mkfs.btrfs -f /dev/sdd
    $ mount /dev/sdd /mnt && cd /mnt
    $ mkdir a
    $ chattr +c a
    $ touch a/file
    $ lsattr a/file
    --------c------- a/file
    $ chattr -c a
    $ touch a/file2
    $ lsattr a/file2
    --------c------- a/file2
    $ lsattr -d a
    ---------------- a
Reported-by: Andreas Schneider <asn@cryptomilk.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

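The flag handling described in this commit can be sketched as a small userspace model. This is not the kernel function: `struct sketch_inode` and `sketch_setflags()` are hypothetical, and the boolean `has_compression_prop` merely stands in for the inode's compression property. The two flag values mirror the definitions in <linux/fs.h>.

```c
#include <stdbool.h>

#define FS_COMPR_FL  0x00000004u /* compress file */
#define FS_NOCOMP_FL 0x00000400u /* don't compress */

/* Hypothetical, simplified view of the per-inode compression state. */
struct sketch_inode {
	bool compress;
	bool nocompress;
	bool has_compression_prop;
};

static void sketch_setflags(struct sketch_inode *ip, unsigned int flags)
{
	if (flags & FS_NOCOMP_FL) {
		ip->compress = false;
		ip->nocompress = true;
		ip->has_compression_prop = false;
	} else if (flags & FS_COMPR_FL) {
		ip->compress = true;
		ip->nocompress = false;
		ip->has_compression_prop = true;
	} else {
		/*
		 * 'chattr -c' lands here: neither bit is set. The fix is to
		 * drop the property in this branch as well, so new files in
		 * the directory stop inheriting compression.
		 */
		ip->compress = false;
		ip->nocompress = false;
		ip->has_compression_prop = false;
	}
}
```

With the property cleared in the last branch, the `a/file2` step of the reproducer would no longer show the `c` attribute.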

Btrfs: make btrfs_search_forward return with nodes unlocked · f98de9b9

Authored by Filipe Manana on Aug 04, 2014

None of the uses of btrfs_search_forward() need to have the path
nodes (level >= 1) read locked, only the leaf needs to be locked
while the caller processes it. Therefore make it return a path
with all nodes unlocked, except for the leaf.

This change is motivated by the observation that during a file
fsync we repeatedly call btrfs_search_forward() and process the
returned leaf while upper nodes of the returned path (level >= 1)
are read locked, which unnecessarily blocks other tasks that want
to write to the same fs/subvol btree.
Therefore instead of modifying the fsync code to unlock all nodes
with level >= 1 immediately after calling btrfs_search_forward(),
change btrfs_search_forward() to do it, so that it benefits all
callers.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: wake up transaction thread from SYNC_FS ioctl · 2fad4e83

Authored by David Sterba on Jul 23, 2014

The transaction thread may want to do more work, namely it pokes the
cleaner kthread that will start processing uncleaned subvols.

This can be triggered by the user via the 'btrfs fi sync' command; otherwise
there was a delay of up to 30 seconds before the cleaner started to clean
old snapshots.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: device delete must be sysloged · ec95d491

Authored by Anand Jain on Jul 01, 2014

As in the disk add patch, a disk detached from the volume must also be
recorded in the syslog, for the same reason.
Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: device add must be sysloged · 43d20761

Authored by Anand Jain on Jul 01, 2014

When we add a new disk to a mounted btrfs we currently don't record it.
Adding a disk is a critical change to the btrfs configuration, so it
must be recorded in the syslog to help offline investigation of
reported customer problems.
Signed-off-by: Anand Jain <Anand.Jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: use DIV_ROUND_UP instead of open-coded variants · ed6078f7

Authored by David Sterba on Jun 05, 2014

The form

  (value + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT

is equivalent to

  (value + PAGE_CACHE_SIZE - 1) / PAGE_CACHE_SIZE

The rest is a simple substitution, no difference in the generated
assembly code.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

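The equivalence claimed in this commit can be checked with a short sketch. The page size here is an assumption (4 KiB, shift of 12), and the two helper functions are hypothetical names; `DIV_ROUND_UP` is written out the way the kernel macro is conventionally defined.

```c
/* Assumed values for illustration: 4 KiB pages. */
#define PAGE_CACHE_SHIFT 12
#define PAGE_CACHE_SIZE  (1UL << PAGE_CACHE_SHIFT)

/* Round-up division: smallest q such that q * d >= n. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Open-coded form that the commit replaces. */
static unsigned long pages_open_coded(unsigned long bytes)
{
	return (bytes + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
}

/* Same computation expressed with the helper macro. */
static unsigned long pages_div_round_up(unsigned long bytes)
{
	return DIV_ROUND_UP(bytes, PAGE_CACHE_SIZE);
}
```

Because PAGE_CACHE_SIZE is a power of two and the operands are unsigned, the division compiles down to the same shift, which is consistent with the commit's note about identical generated code.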

btrfs: use nodesize everywhere, kill leafsize · 707e8a07

Authored by David Sterba on Jun 04, 2014

The nodesize and leafsize were never of different values. Unify the
usage and make nodesize the one. Cleanup the redundant checks and
helpers.

Shaves a few bytes from .text:

  text    data     bss     dec     hex filename
852418   24560   23112  900090   dbbfa btrfs.ko.before
851074   24584   23112  898770   db6d2 btrfs.ko.after
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: kill the key type accessor helpers · 962a298f

Authored by David Sterba on Jun 04, 2014

btrfs_set_key_type and btrfs_key_type are used inconsistently along with
open coded variants. Other members of btrfs_key are accessed directly
without any helpers anyway.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: cleanup ino cache members of btrfs_root · 57cdc8db

Authored by David Sterba on Feb 05, 2014

The naming is confusing, generic yet used for a specific cache. Add a
prefix 'ino_' or rename appropriately.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

09 Sep, 2014 1 commit

Btrfs: kfree()ing ERR_PTRs · c47ca32d

Authored by Dan Carpenter on Sep 04, 2014

The "inherit" in btrfs_ioctl_snap_create_v2() and "vol_args" in
btrfs_ioctl_rm_dev() are ERR_PTRs so we can't call kfree() on them.

These kinds of bugs are "One Err Bugs" where there is just one error
label that does everything. I could set the "inherit = NULL" and keep
the single out label but it ends up being more complicated that way. It
makes the code simpler to re-order the unwind so it's in the mirror
order of the allocation and introduce some new error labels.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

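The mirrored unwind this commit describes can be sketched in userspace C. Everything here is a stand-in under stated assumptions: `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` are simplified re-implementations of the kernel helpers of the same names, `dup_or_err()` is a hypothetical allocator that fails with an encoded errno, and `snap_create_sketch()` is not the real ioctl handler.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO 4095

/* Simplified userspace versions of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical allocator that returns an error pointer on failure. */
static void *dup_or_err(const void *src, size_t len)
{
	void *p;

	if (!src)
		return ERR_PTR(-EFAULT);
	p = malloc(len);
	if (!p)
		return ERR_PTR(-ENOMEM);
	memcpy(p, src, len);
	return p;
}

static int snap_create_sketch(const void *uargs, const void *uinherit,
			      size_t len)
{
	void *args, *inherit;
	int ret = 0;

	args = dup_or_err(uargs, len);
	if (IS_ERR(args))
		return PTR_ERR(args); /* nothing to free yet */

	inherit = dup_or_err(uinherit, len);
	if (IS_ERR(inherit)) {
		ret = PTR_ERR(inherit);
		goto free_args; /* skip free(inherit): it holds an errno */
	}

	/* ... the real work would happen here ... */

	free(inherit);
free_args:
	free(args);
	return ret;
}
```

Each label releases only what was successfully allocated before the jump, so an error pointer is never handed to the free routine.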

27 Aug, 2014 1 commit

Btrfs: fix autodefrag with compression · e9512d72

Authored by Chris Mason on Aug 26, 2014

The autodefrag code skips defrag when two extents are adjacent.  But one
big advantage for autodefrag is cutting down on the number of small
extents, even when they are adjacent.  This commit changes it to defrag
all small extents.
Signed-off-by: Chris Mason <clm@fb.com>

21 Aug, 2014 2 commits

Btrfs: clone, don't create invalid hole extent map · 62e2390e

Authored by Filipe Manana on Aug 08, 2014

When cloning a file that consists of an inline extent, we were creating
an extent map that represents a non-existing trailing hole starting at a
file offset that isn't a multiple of the sector size. This happened because
when processing an inline extent we weren't aligning the extent's length to
the sector size, and therefore incorrectly treating the range
[inline_extent_length, sector_size) as a hole.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

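The alignment step this commit refers to can be illustrated with a small sketch. It assumes a power-of-two sector size; `ALIGN_UP` and `inline_extent_end()` are hypothetical names used only here, not the identifiers in the btrfs clone code.

```c
#include <stdint.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

/*
 * End offset to use for an inline extent that starts at 'start' and holds
 * 'inline_len' bytes: rounded up to the sector size so that no phantom
 * hole is recorded between the end of the data and the sector boundary.
 */
static uint64_t inline_extent_end(uint64_t start, uint64_t inline_len,
				  uint32_t sectorsize)
{
	return ALIGN_UP(start + inline_len, sectorsize);
}
```

For a 100-byte inline extent at offset 0 with 4096-byte sectors, the unaligned end would be 100, leaving the range from 100 to 4096 to be misread as a hole; the aligned end is 4096.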

Btrfs: race free update of commit root for ro snapshots · 9c3b306e

Authored by Filipe Manana on Jul 31, 2014

This is a better solution for the problem addressed in the following
commit:

    Btrfs: update commit root on snapshot creation after orphan cleanup
    (3821f348)

The previous solution wasn't the best because of 2 reasons:

    1) It added another full transaction commit, which is more expensive
       than just swapping the commit root with the root;

    2) If a reboot happened after the first transaction commit (the one
       that creates the snapshot) and before the second transaction commit,
       then we would end up with the same problem if a send using that
       snapshot was requested before the first transaction commit after
       the reboot.

This change addresses those 2 issues. The second issue is addressed by
switching the commit root in the dentry lookup VFS callback, which is
also called by the snapshot/subvol creation ioctl and performs orphan
cleanup if needed. Like the vfs, the ioctl locks the parent inode too,
preventing race issues between a dentry lookup and snapshot creation.

Cc: Alex Lyakas <alex.btrfs@zadarastorage.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

03 Jul, 2014 2 commits

Btrfs: fix use-after-free when cloning a trailing file hole · 14f59796

Authored by Filipe Manana on Jun 29, 2014

The transaction handle was being used after being freed.

Cc: Chris Mason <clm@fb.com>
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: atomically set inode->i_flags in btrfs_update_iflags · 3cc79392

Authored by Filipe Manana on Jun 25, 2014

This change is based on the corresponding recent change for ext4:

  ext4: atomically set inode->i_flags in ext4_set_inode_flags()

That has the following commit message that applies to btrfs as well:

  "Use cmpxchg() to atomically set i_flags instead of clearing out the
   S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
   EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race
   where an immutable file has the immutable flag cleared for a brief
   window of time."

Replacing EXT4_IMMUTABLE_FL and EXT4_APPEND_FL with BTRFS_INODE_IMMUTABLE
and BTRFS_INODE_APPEND, respectively.
Reviewed-by: David Sterba <dsterba@suse.cz>
Reviewed-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

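The compare-and-exchange approach quoted in this commit can be sketched with C11 atomics in userspace. This is an illustration, not the kernel code: the `SK_*` flag values and `set_managed_flags()` are hypothetical, and `atomic_compare_exchange_weak()` plays the role that cmpxchg() plays in the kernel.

```c
#include <stdatomic.h>

/* Hypothetical flag bits, for illustration only. */
#define SK_IMMUTABLE 0x1u
#define SK_APPEND    0x2u
#define SK_OTHER     0x100u /* unrelated bit that must be preserved */
#define SK_MANAGED   (SK_IMMUTABLE | SK_APPEND)

/*
 * Replace only the managed bits in one atomic step. Readers never see an
 * intermediate value in which the managed bits were cleared but not yet
 * set again.
 */
static void set_managed_flags(atomic_uint *i_flags, unsigned int new_fl)
{
	unsigned int old = atomic_load(i_flags);
	unsigned int want;

	do {
		want = (old & ~SK_MANAGED) | (new_fl & SK_MANAGED);
		/* On failure 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(i_flags, &old, want));
}
```

The racy alternative is a clear followed by a separate set, which briefly exposes a file without its immutable bit.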

14 Jun, 2014 1 commit

btrfs: new ioctl TREE_SEARCH_V2 · cc68a8a5

Authored by Gerhard Heift on Jan 30, 2014

This new ioctl call allows the user to supply a buffer of varying size in which
a tree search can store its results. This is much more flexible if you want to
receive items which are larger than the current fixed buffer of 3992 bytes or
if you want to fetch more items at once. For example, some items of type
EXTENT_CSUM are larger than this buffer.
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>

13 Jun, 2014 5 commits

btrfs: tree_search, search_ioctl: direct copy to userspace · ba346b35

Authored by Gerhard Heift on Jan 30, 2014

By copying each found item separately to userspace, we do not need an
extra buffer in the kernel.
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>

btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW · 9b6e817d

Authored by Gerhard Heift on Jan 30, 2014

If an item in tree_search is too large to be stored in the given buffer, return
the needed size (including the header).
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>

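The behaviour described in this commit (fail with -EOVERFLOW and report the size that would have been needed, header included) can be sketched as follows. The header layout and the `copy_item()` helper are hypothetical simplifications, not the actual copy_to_sk() code, and the sketch assumes the caller keeps `*off` within the buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-item header; the real search header has more fields. */
struct sk_header {
	unsigned int len;
};

/*
 * Append one header plus payload at *off. If the item does not fit in the
 * remaining space, store the total size it needs in *needed and fail.
 */
static int copy_item(char *buf, size_t buf_size, size_t *off,
		     const void *item, unsigned int item_len, size_t *needed)
{
	struct sk_header h = { .len = item_len };
	size_t total = sizeof(h) + item_len;

	if (total > buf_size - *off) {
		*needed = total; /* includes the header */
		return -EOVERFLOW;
	}

	memcpy(buf + *off, &h, sizeof(h));
	memcpy(buf + *off + sizeof(h), item, item_len);
	*off += total;
	return 0;
}
```

A caller that receives -EOVERFLOW can retry with a buffer of at least the reported size instead of guessing.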

btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer · 8f5f6178

Authored by Gerhard Heift on Jan 30, 2014

In copy_to_sk, if an item is too large for the given buffer, it now returns
-EOVERFLOW instead of copying a search_header with len = 0. For backward
compatibility for the first item it still copies such a header to the buffer,
but not any other following items, which could have fitted.

tree_search changes -EOVERFLOW back to 0 to behave similarly to the way it
behaved before this patch.
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>

btrfs: tree_search, search_ioctl: accept varying buffer · 12544442

Authored by Gerhard Heift on Jan 30, 2014

Rewrite search_ioctl to accept a buffer of varying size.
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>

btrfs: tree_search: eliminate redundant nr_items check · 25c9bc2e

Authored by Gerhard Heift on Jan 30, 2014

If the number of items reached the given limit of nr_items, we can leave
copy_to_sk without updating the key. Also by returning 1 we leave the loop in
search_ioctl without rechecking if we reached the given limit.
Signed-off-by: Gerhard Heift <Gerhard@Heift.Name>
Signed-off-by: Chris Mason <clm@fb.com>
Acked-by: David Sterba <dsterba@suse.cz>

10 Jun, 2014 17 commits

Btrfs: make fsync work after cloning into a file · 7ffbb598

Authored by Filipe Manana on Jun 09, 2014

When cloning into a file, we were correctly replacing the extent
items in the target range and removing the extent maps. However
we weren't replacing the extent maps with new ones that point to
the new extents - as a consequence, an incremental fsync (when the
inode doesn't have the full sync flag) was a NOOP, since it relies
on the existence of extent maps in the modified list of the inode's
extent map tree, which was empty. Therefore add new extent maps to
reflect the target clone range.

A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

trivial: fs/btrfs/ioctl.c: fix typo s/substract/subtract/ · 93915584

Authored by Antonio Ospite on Jun 04, 2014

Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix clone to deal with holes when NO_HOLES feature is enabled · f82a9901

Authored by Filipe Manana on Jun 01, 2014

If the NO_HOLES feature is enabled holes don't have file extent items in
the btree that represent them anymore. This made the clone operation
ignore the gaps that exist between consecutive file extent items and
therefore not create the holes at the destination. When not using the
NO_HOLES feature, the holes were created at the destination.

A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: replace EINVAL with ERANGE for resize when ULLONG_MAX · 902c68a4

Authored by Gui Hecheng on May 29, 2014

To be accurate about the error case,
if the new size is beyond ULLONG_MAX, return ERANGE instead of EINVAL.
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: update commit root on snapshot creation after orphan cleanup · 3821f348

Authored by Filipe Manana on Jun 03, 2014

On snapshot creation (either writable or read-only), we do orphan cleanup
against the root of the snapshot. If the cleanup did remove any orphans,
then the current root node will be different from the commit root node
until the next transaction commit happens.

A send operation always uses the commit root of a snapshot - this means
it will see the orphans if it starts computing the send stream before the
next transaction commit happens (triggered by a timer or sync(), for example),
which is when the commit root gets assigned a reference to current root,
where the orphans are not visible anymore. The consequence of send seeing
the orphans is explained below.

For example:

    mkfs.btrfs -f /dev/sdd
    mount -o commit=999 /dev/sdd /mnt

    # open a file with O_TMPFILE and leave it open
    # write some data to the file
    btrfs subvolume snapshot -r /mnt /mnt/snap1

    btrfs send /mnt/snap1 -f /tmp/send.data

The send operation will fail with the following error:

    ERROR: send ioctl failed with -116: Stale file handle

What happens here is that our snapshot has an orphan inode still visible
through the commit root, that corresponds to the tmpfile. However send
will attempt to call inode.c:btrfs_iget(), with the goal of reading the
file's data, which will return -ESTALE because it will use the current
root (and not the commit root) of the snapshot.

Of course, there are other cases where we can get orphans, but this
example using a tmpfile makes it much easier to reproduce the issue.

Therefore on snapshot creation, after calling btrfs_orphan_cleanup, if
the commit root is different from the current root, just commit the
transaction associated with the snapshot's root (if it exists), so that
a send will not see any orphans that don't exist anymore. This also
guarantees a send will always see the same content regardless of whether
a transaction commit happened already before the send was requested and
after the orphan cleanup (meaning the commit root and current roots are
the same) or it hasn't happened yet (commit and current roots are
different).
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: ioctl, don't re-lock extent range when not necessary · ff5df9b8

Authored by Filipe Manana on May 30, 2014

In ioctl.c:lock_extent_range(), after locking our target range, the
ordered extent that btrfs_lookup_first_ordered_extent() returns us
may not overlap our target range at all. In this case we would just
unlock our target range, wait for any new ordered extents that overlap
the range to complete, lock again the range and repeat all these steps
until we don't get any ordered extent and the delalloc flag isn't set
in the io tree for our target range.

Therefore just stop if we get an ordered extent that doesn't overlap
our target range and the delalloc flag isn't set for the range in
the inode's io tree.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

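The stop condition this commit describes can be written as a small predicate. This is a sketch under assumptions: `struct ordered_sketch`, `ranges_overlap()` and `can_stop_waiting()` are hypothetical names, a null pointer models "no ordered extent found", and the boolean models the delalloc flag being set for the target range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of an ordered extent. */
struct ordered_sketch {
	uint64_t file_offset;
	uint64_t len;
};

/* Half-open ranges [off, off + len) overlap if each starts before the
 * other one ends. */
static bool ranges_overlap(uint64_t a_off, uint64_t a_len,
			   uint64_t b_off, uint64_t b_len)
{
	return a_off < b_off + b_len && b_off < a_off + a_len;
}

/*
 * True when the locked target range can be used as is: no delalloc is
 * pending for it, and the first ordered extent found (if any) lies
 * entirely outside it.
 */
static bool can_stop_waiting(const struct ordered_sketch *ordered,
			     uint64_t off, uint64_t len, bool delalloc_set)
{
	if (delalloc_set)
		return false;
	if (!ordered)
		return true;
	return !ranges_overlap(ordered->file_offset, ordered->len, off, len);
}
```

Before the change, any ordered extent returned by the lookup caused another unlock, wait and relock cycle, even when it did not touch the target range.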

Btrfs: avoid visiting all extent items when cloning a range · 2c463823

Authored by Filipe Manana on May 31, 2014

When cloning a range of a file, we were visiting all the extent items in
the btree that belong to our source inode. We don't need to visit those
extent items that don't overlap the range we are cloning, as doing so only
makes us waste time and do unnecessary btree navigations (btrfs_next_leaf)
for inodes that have a large number of file extent items in the btree.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: set dead flag on the right root when destroying snapshot · c55bfa67

Authored by Filipe Manana on May 25, 2014

We were setting the BTRFS_ROOT_SUBVOL_DEAD flag on the root of the
parent of our target snapshot, instead of setting it in the target
snapshot's root.

This is easy to observe by running the following scenario:

    mkfs.btrfs -f /dev/sdd
    mount /dev/sdd /mnt

    btrfs subvolume create /mnt/first_subvol
    btrfs subvolume snapshot -r /mnt /mnt/mysnap1

    btrfs subvolume delete /mnt/first_subvol
    btrfs subvolume snapshot -r /mnt /mnt/mysnap2

    btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/send.data

The send command failed because the send ioctl returned -EPERM.
A test case for xfstests follows.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: ensure readers see new data after a clone operation · c125b8bf

Authored by Filipe Manana on May 23, 2014

We were cleaning the clone target file range from the page cache before
we did replace the file extent items in the fs tree. This was racy,
as right after cleaning the relevant range from the page cache and before
replacing the file extent items, a read against that range could be
performed by another task and populate again the page cache with stale
data (stale after the cloning finishes). As a result, reads issued after
the clone operation successfully finishes could get old data (potentially
for a very long time). Therefore evict the pages after replacing the file
extent items, so that subsequent reads will always get the new data.

Similarly, we were prone to races while cloning the file extent items
because we weren't locking the target range and waiting for any existing
ordered extents against that range to complete. It was possible that
after cloning the extent items, a write operation that was performed
before the clone operation and overlaps the same range, would end up
undoing all or part of the work the clone operation did (a worker task
running inode.c:btrfs_finish_ordered_io). Therefore lock the target
range in the io tree, wait for all pending ordered extents against that
range to finish and then safely perform the cloning.

The issue of reading stale data after the clone operation is easy to
reproduce by running the following C program in a loop until it exits
with return value 1.

 #include <unistd.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <errno.h>
 #include <pthread.h>
 #include <fcntl.h>
 #include <assert.h>
 #include <asm/types.h>
 #include <linux/ioctl.h>
 #include <sys/stat.h>
 #include <sys/types.h>
 #include <sys/ioctl.h>

 #define SRC_FILE "/mnt/sdd/foo"
 #define DST_FILE "/mnt/sdd/bar"
 #define FILE_SIZE (16 * 1024)
 #define PATTERN_SRC 'X'
 #define PATTERN_DST 'Y'

struct btrfs_ioctl_clone_range_args {
	__s64 src_fd;
	__u64 src_offset, src_length;
	__u64 dest_offset;
};

 #define BTRFS_IOCTL_MAGIC 0x94
 #define BTRFS_IOC_CLONE_RANGE _IOW(BTRFS_IOCTL_MAGIC, 13, \
				   struct btrfs_ioctl_clone_range_args)

static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
static int clone_done = 0;
static int reader_ready = 0;
static int stale_data = 0;

static void *reader_loop(void *arg)
{
	char buf[4096], want_buf[4096];

	memset(want_buf, PATTERN_SRC, 4096);
	pthread_mutex_lock(&mutex);
	reader_ready = 1;
	pthread_mutex_unlock(&mutex);

	while (1) {
		int done, fd, ret;

		fd = open(DST_FILE, O_RDONLY);
		assert(fd != -1);

		pthread_mutex_lock(&mutex);
		done = clone_done;
		pthread_mutex_unlock(&mutex);

		ret = read(fd, buf, 4096);
		assert(ret == 4096);
		close(fd);

		if (done) {
			ret = memcmp(buf, want_buf, 4096);
			if (ret == 0) {
				printf("Found new content\n");
			} else {
				printf("Found old content\n");
				pthread_mutex_lock(&mutex);
				stale_data = 1;
				pthread_mutex_unlock(&mutex);
			}
			break;
		}
	}
	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_t reader;
	int ret, i, fd;
	struct btrfs_ioctl_clone_range_args clone_args;
	int fd1, fd2;

	ret = remove(SRC_FILE);
	if (ret == -1 && errno != ENOENT) {
		fprintf(stderr, "Error deleting src file: %s\n", strerror(errno));
		return 1;
	}
	ret = remove(DST_FILE);
	if (ret == -1 && errno != ENOENT) {
		fprintf(stderr, "Error deleting dst file: %s\n", strerror(errno));
		return 1;
	}

	fd = open(SRC_FILE, O_CREAT | O_WRONLY | O_TRUNC, S_IRWXU);
	assert(fd != -1);
	for (i = 0; i < FILE_SIZE; i++) {
		char c = PATTERN_SRC;
		ret = write(fd, &c, 1);
		assert(ret == 1);
	}
	close(fd);
	fd = open(DST_FILE, O_CREAT | O_WRONLY | O_TRUNC, S_IRWXU);
	assert(fd != -1);
	for (i = 0; i < FILE_SIZE; i++) {
		char c = PATTERN_DST;
		ret = write(fd, &c, 1);
		assert(ret == 1);
	}
	close(fd);
	sync();

	ret = pthread_create(&reader, NULL, reader_loop, NULL);
	assert(ret == 0);
	while (1) {
		int r;
		pthread_mutex_lock(&mutex);
		r = reader_ready;
		pthread_mutex_unlock(&mutex);
		if (r) break;
	}

	fd1 = open(SRC_FILE, O_RDONLY);
	if (fd1 < 0) {
		fprintf(stderr, "Error open src file: %s\n", strerror(errno));
		return 1;
	}
	fd2 = open(DST_FILE, O_RDWR);
	if (fd2 < 0) {
		fprintf(stderr, "Error open dst file: %s\n", strerror(errno));
		return 1;
	}
	clone_args.src_fd = fd1;
	clone_args.src_offset = 0;
	clone_args.src_length = 4096;
	clone_args.dest_offset = 0;
	ret = ioctl(fd2, BTRFS_IOC_CLONE_RANGE, &clone_args);
	assert(ret == 0);
	close(fd1);
	close(fd2);

	pthread_mutex_lock(&mutex);
	clone_done = 1;
	pthread_mutex_unlock(&mutex);
	ret = pthread_join(reader, NULL);
	assert(ret == 0);

	pthread_mutex_lock(&mutex);
	ret = stale_data ? 1 : 0;
	pthread_mutex_unlock(&mutex);
	return ret;
}
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: replace simple_strtoull() with kstrtoull() · 58dfae63

Authored by ZhangZhen on May 13, 2014

Use the newer and more pleasant kstrtoull() to replace simple_strtoull(),
because simple_strtoull() is marked as obsolete.
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: Chris Mason <clm@fb.com>

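The practical difference between the two parsing styles is that a strict parser reports errors instead of silently stopping at the first bad character. A userspace sketch of that strict style follows. It only approximates kstrtoull() (for example, it rejects a leading plus sign), `parse_u64_strict()` is a hypothetical name, and it is built on the standard strtoull().

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Strict decimal parse: 0 on success, -EINVAL on malformed input,
 * -ERANGE on overflow. A single trailing newline is tolerated.
 */
static int parse_u64_strict(const char *s, unsigned long long *res)
{
	char *end;
	unsigned long long v;

	/* strtoull() alone would accept leading whitespace and signs. */
	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;

	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL; /* trailing garbage such as "12abc" */

	*res = v;
	return 0;
}
```

A lenient parser would return 12 for the input "12abc" with no error indication unless the caller inspected the end pointer itself.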

Btrfs: rework qgroup accounting · fcebe456

Authored by Josef Bacik on May 13, 2014

Currently qgroups account for space by intercepting delayed ref updates to fs
trees. It does this by adding sequence numbers to delayed ref updates so that
it can figure out how the tree looked before the update so we can adjust the
counters properly. The problem with this is that it does not allow delayed refs
to be merged, so if you say are defragging an extent with 5k snapshots pointing
to it we will thrash the delayed ref lock because we need to go back and
manually merge these things together. Instead we want to process quota changes
when we know they are going to happen, like when we first allocate an extent, we
free a reference for an extent, we add new references etc. This patch
accomplishes this by only adding qgroup operations for real ref changes. We
only modify the sequence number when we need to lookup roots for bytenrs, this
reduces the amount of churn on the sequence number and allows us to merge
delayed refs as we add them most of the time. This patch encompasses a bunch of
architectural changes

1) qgroup ref operations: instead of tracking qgroup operations through the
delayed refs we simply add new ref operations whenever we notice that we need to
when we've modified the refs themselves.

2) tree mod seq: we no longer have this separation of major/minor counters.
this makes the sequence number stuff much more sane and we can remove some
locking that was needed to protect the counter.

3) delayed ref seq: we now read the tree mod seq number and use that as our
sequence. This means each new delayed ref doesn't have its own unique sequence
number, rather whenever we go to lookup backrefs we inc the sequence number so
we can make sure to keep any new operations from screwing up our world view at
that given point. This allows us to merge delayed refs during runtime.

With all of these changes the delayed ref stuff is a little saner and the qgroup
accounting stuff no longer goes negative in some cases like it was before.
Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: use bitfield instead of integer data type for the some variants in btrfs_root · 27cdeb70

Authored by Miao Xie on Apr 02, 2014

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: assert that send is not in progres before root deletion · 61155aa0

Authored by David Sterba on Apr 15, 2014

CC: Miao Xie <miaox@cn.fujitsu.com>
CC: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: protect snapshots from deleting during send · 521e0546

Authored by David Sterba on Apr 15, 2014

The patch "Btrfs: fix protection between send and root deletion"
(18f687d5) does not actually prevent deletion of the snapshot; it only
takes care during background cleaning, which seems rather user
unfriendly. This patch implements the idea presented in

http://www.spinics.net/lists/linux-btrfs/msg30813.html

- add an internal root_item flag to denote a dead root
- check if the send_in_progress is set and refuse to delete, otherwise
  set the flag and proceed
- check the flag in send similar to the btrfs_root_readonly checks, for
  all involved roots

The root lookup in send via btrfs_read_fs_root_no_name will check if the
root is really dead or not. If it is, ENOENT, aborted send. If it's
alive, it's protected by send_in_progress, send can continue.

CC: Miao Xie <miaox@cn.fujitsu.com>
CC: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: make FS_INFO ioctl available to anyone · e4ef90ff

Authored by David Sterba on Apr 24, 2014

This ioctl provides basic info about the filesystem that can be obtained
in other ways (e.g. sysfs), so there's no reason to restrict it to
CAP_SYSADMIN.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: make DEV_INFO ioctl available to anyone · 7d6213c5

Authored by David Sterba on Apr 24, 2014

This ioctl provides basic info about the devices that can be obtained in
other ways (e.g. sysfs), so there's no reason to restrict it to
CAP_SYSADMIN.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: retrieve more info from FS_INFO ioctl · 80a773fb

Authored by David Sterba on May 07, 2014

Provide the basic information about filesystem through the ioctl:
* b-tree node size (same as leaf size)
* sector size
* expected alignment of CLONE_RANGE and EXTENT_SAME ioctl arguments

Backward compatibility: if the values are 0, kernel does not provide
this information, the applications should ignore them.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

21 May, 2014 1 commit

Btrfs: fix EIO on reading file after ioctl clone works on it · d3ecfcdf

Authored by Liu Bo on May 09, 2014

For inline data extent, we need to make its length aligned, otherwise,
we can get a phantom extent map which confuses readpages() to return -EIO.

This can be detected by xfstests/btrfs/035.
Reported-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>