提交 · a80c8dcf7e5065adc555ef8ffb256df11e3293e3 · bug2833 / cloud-kernel

02 10月, 2012 11 次提交

Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag · 9e8a4a8b

由 Liu Bo 提交于 9月 05, 2012

We're going to use this flag EXTENT_DEFRAG to indicate which range
belongs to defragment so that we can implement snapshow-aware defrag:

We set the EXTENT_DEFRAG flag when dirtying the extents that need
defragmented, so later on writeback thread can differentiate between
normal writeback and writeback started by defragmentation.
Original-Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>

9e8a4a8b

Btrfs: add a new "type" field into the block reservation structure · 66d8f3dd

由 Miao Xie 提交于 9月 06, 2012

Sometimes we need choose the method of the reservation according to the type
of the block reservation, such as the reservation for the delayed inode update.
Now we identify the type just by comparing the address of the reservation
variants, it is very ugly if it is a temporary one because we need compare it
with all the common reservation variants. So we add a new "type" field to keep
the type the reservation variants.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>

66d8f3dd

Btrfs: do not take cleanup_work_sem in btrfs_run_delayed_iputs() · ac14aed6

由 Sage Weil 提交于 8月 30, 2012

Josef has suggested that this is not necessary.  Removing it also avoids
this lockdep splat (after the new sb_internal locking stuff was added):

[  604.090449] ======================================================
[  604.114819] [ INFO: possible circular locking dependency detected ]
[  604.139262] 3.6.0-rc2-ceph-00144-g463b030 #1 Not tainted
[  604.162193] -------------------------------------------------------
[  604.186139] btrfs-cleaner/6669 is trying to acquire lock:
[  604.209555]  (sb_internal#2){.+.+..}, at: [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
[  604.257100]
[  604.257100] but task is already holding lock:
[  604.300366]  (&fs_info->cleanup_work_sem){.+.+..}, at: [<ffffffffa0048002>] btrfs_run_delayed_iputs+0x72/0x130 [btrfs]
[  604.352989]
[  604.352989] which lock already depends on the new lock.
[  604.352989]
[  604.427104]
[  604.427104] the existing dependency chain (in reverse order) is:
[  604.478493]
[  604.478493] -> #1 (&fs_info->cleanup_work_sem){.+.+..}:
[  604.529313]        [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
[  604.559621]        [<ffffffff81632b69>] down_read+0x39/0x4e
[  604.589382]        [<ffffffffa004db98>] btrfs_lookup_dentry+0x218/0x550 [btrfs]
[  604.596161] btrfs: unlinked 1 orphans
[  604.675002]        [<ffffffffa006aadd>] create_subvol+0x62d/0x690 [btrfs]
[  604.708859]        [<ffffffffa006d666>] btrfs_mksubvol.isra.52+0x346/0x3a0 [btrfs]
[  604.772466]        [<ffffffffa006d7f2>] btrfs_ioctl_snap_create_transid+0x132/0x190 [btrfs]
[  604.842245]        [<ffffffffa006d8ae>] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs]
[  604.912852]        [<ffffffffa00708ae>] btrfs_ioctl+0x138e/0x1990 [btrfs]
[  604.951888]        [<ffffffff8118e9b8>] do_vfs_ioctl+0x98/0x560
[  604.989961]        [<ffffffff8118ef11>] sys_ioctl+0x91/0xa0
[  605.026628]        [<ffffffff8163d569>] system_call_fastpath+0x16/0x1b
[  605.064404]
[  605.064404] -> #0 (sb_internal#2){.+.+..}:
[  605.126832]        [<ffffffff810b25e8>] __lock_acquire+0x1ac8/0x1b90
[  605.163671]        [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
[  605.200228]        [<ffffffff8117dac6>] __sb_start_write+0xc6/0x1b0
[  605.236818]        [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
[  605.274029]        [<ffffffffa00431a3>] btrfs_start_transaction+0x13/0x20 [btrfs]
[  605.340520]        [<ffffffffa004ccfa>] btrfs_evict_inode+0x19a/0x330 [btrfs]
[  605.378720]        [<ffffffff811972c8>] evict+0xb8/0x1c0
[  605.416057]        [<ffffffff811974d5>] iput+0x105/0x210
[  605.452373]        [<ffffffffa0048082>] btrfs_run_delayed_iputs+0xf2/0x130 [btrfs]
[  605.521627]        [<ffffffffa003b5e1>] cleaner_kthread+0xa1/0x120 [btrfs]
[  605.560520]        [<ffffffff810791ee>] kthread+0xae/0xc0
[  605.598094]        [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10
[  605.636499]
[  605.636499] other info that might help us debug this:
[  605.636499]
[  605.736504]  Possible unsafe locking scenario:
[  605.736504]
[  605.801931]        CPU0                    CPU1
[  605.835126]        ----                    ----
[  605.867093]   lock(&fs_info->cleanup_work_sem);
[  605.898594]                                lock(sb_internal#2);
[  605.931954]                                lock(&fs_info->cleanup_work_sem);
[  605.965359]   lock(sb_internal#2);
[  605.994758]
[  605.994758]  *** DEADLOCK ***
[  605.994758]
[  606.075281] 2 locks held by btrfs-cleaner/6669:
[  606.104528]  #0:  (&fs_info->cleaner_mutex){+.+...}, at: [<ffffffffa003b5d5>] cleaner_kthread+0x95/0x120 [btrfs]
[  606.165626]  #1:  (&fs_info->cleanup_work_sem){.+.+..}, at: [<ffffffffa0048002>] btrfs_run_delayed_iputs+0x72/0x130 [btrfs]
[  606.231297]
[  606.231297] stack backtrace:
[  606.287723] Pid: 6669, comm: btrfs-cleaner Not tainted 3.6.0-rc2-ceph-00144-g463b030 #1
[  606.347823] Call Trace:
[  606.376184]  [<ffffffff8162a77c>] print_circular_bug+0x1fb/0x20c
[  606.409243]  [<ffffffff810b25e8>] __lock_acquire+0x1ac8/0x1b90
[  606.441343]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
[  606.474583]  [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
[  606.505934]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
[  606.539429]  [<ffffffff8132babd>] ? do_raw_spin_unlock+0x5d/0xb0
[  606.571719]  [<ffffffff8117dac6>] __sb_start_write+0xc6/0x1b0
[  606.603498]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
[  606.637405]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
[  606.670165]  [<ffffffff81172e75>] ? kmem_cache_alloc+0xb5/0x160
[  606.702144]  [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
[  606.735562]  [<ffffffffa00256a6>] ? block_rsv_add_bytes+0x56/0x80 [btrfs]
[  606.769861]  [<ffffffffa00431a3>] btrfs_start_transaction+0x13/0x20 [btrfs]
[  606.804575]  [<ffffffffa004ccfa>] btrfs_evict_inode+0x19a/0x330 [btrfs]
[  606.838756]  [<ffffffff81634c6b>] ? _raw_spin_unlock+0x2b/0x40
[  606.872010]  [<ffffffff811972c8>] evict+0xb8/0x1c0
[  606.903800]  [<ffffffff811974d5>] iput+0x105/0x210
[  606.935416]  [<ffffffffa0048082>] btrfs_run_delayed_iputs+0xf2/0x130 [btrfs]
[  606.970510]  [<ffffffffa003b5d5>] ? cleaner_kthread+0x95/0x120 [btrfs]
[  607.005648]  [<ffffffffa003b5e1>] cleaner_kthread+0xa1/0x120 [btrfs]
[  607.040724]  [<ffffffffa003b540>] ? btrfs_destroy_delayed_refs.isra.102+0x220/0x220 [btrfs]
[  607.104740]  [<ffffffff810791ee>] kthread+0xae/0xc0
[  607.137119]  [<ffffffff810b379d>] ? trace_hardirqs_on+0xd/0x10
[  607.169797]  [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10
[  607.202472]  [<ffffffff81635430>] ? retint_restore_args+0x13/0x13
[  607.235884]  [<ffffffff81079140>] ? flush_kthread_work+0x1a0/0x1a0
[  607.268731]  [<ffffffff8163e740>] ? gs_change+0x13/0x13
Signed-off-by: NSage Weil <sage@inktank.com>

ac14aed6

Btrfs: add hole punching · 2aaa6655

由 Josef Bacik 提交于 8月 29, 2012

This patch adds hole punching via fallocate.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

2aaa6655

Btrfs: remove unused hint byte argument for btrfs_drop_extents · 2671485d

由 Josef Bacik 提交于 8月 29, 2012

I audited all users of btrfs_drop_extents and found that nobody actually uses
the hint_byte argument. I'm sure it was used for something at some point but
it's not used now, and the way the pinning works the disk bytenr would never be
immediately useful anyway so lets just remove it. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

2671485d

Btrfs: fix a bug in checking whether a inode is already in log · 46d8bc34

由 Liu Bo 提交于 8月 29, 2012

This is based on Josef's "Btrfs: turbo charge fsync".

The current btrfs checks if an inode is in log by comparing
root's last_log_commit to inode's last_sub_trans[2].

But the problem is that this root->last_log_commit is shared among
inodes.

Say we have N inodes to be logged, after the first inode,
root's last_log_commit is updated and the N-1 remained files will
be skipped.

This fixes the bug by keeping a local copy of root's last_log_commit
inside each inode and this local copy will be maintained itself.

[1]: we regard each log transaction as a subset of btrfs's transaction,
i.e. sub_trans
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>

46d8bc34

Btrfs: fix wrong orphan count of the fs/file tree · 321f0e70

由 Miao Xie 提交于 8月 28, 2012

If we add a new orphan item, we should increase the atomic counter,
not decrease it. Fix it.
Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>

321f0e70

Btrfs: improve fsync by filtering extents that we want · 4e2f84e6

由 Liu Bo 提交于 8月 27, 2012

This is based on Josef's "Btrfs: turbo charge fsync".

The above Josef's patch performs very good in random sync write test,
because we won't have too much extents to merge.

However, it does not performs good on the test:
dd if=/dev/zero of=foobar bs=4k count=12500 oflag=sync

The reason is when we do sequencial sync write, we need to merge the
current extent just with the previous one, so that we can get accumulated
extents to log:

A(4k) --> AA(8k) --> AAA(12k) --> AAAA(16k) ...

So we'll have to flush more and more checksum into log tree, which is the
bottleneck according to my tests.

But we can avoid this by telling fsync the real extents that are needed
to be logged.

With this, I did the above dd sync write test (size=50m),

         w/o (orig)   w/ (josef's)   w/ (this)
SATA      104KB/s       109KB/s       121KB/s
ramdisk   1.5MB/s       1.5MB/s       10.7MB/s (613%)
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>

4e2f84e6

Btrfs: do not needlessly restart the transaction for enospc · ca7e70f5

由 Josef Bacik 提交于 8月 27, 2012

We will stop and restart a transaction every time we move to a different leaf
when truncating a file. This is for enospc reasons, but really we could
probably get away with doing this a little better by actually working until we
hit an ENOSPC. So add a ->failfast flag to the block_rsv and set it when we do
truncates which will fail as soon as the block rsv runs out of space, and then
at that point we can stop and restart the transaction and refill the block rsv
and carry on. This will make rm'ing of a file with lots of extents a bit
faster. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

ca7e70f5

Btrfs: turbo charge fsync · 5dc562c5

由 Josef Bacik 提交于 8月 17, 2012

At least for the vm workload.  Currently on fsync we will

1) Truncate all items in the log tree for the given inode if they exist

and

2) Copy all items for a given inode into the log

The problem with this is that for things like VMs you can have lots of
extents from the fragmented writing behavior, and worst yet you may have
only modified a few extents, not the entire thing.  This patch fixes this
problem by tracking which transid modified our extent, and then when we do
the tree logging we find all of the extents we've modified in our current
transaction, sort them and commit them.  We also only truncate up to the
xattrs of the inode and copy that stuff in normally, and then just drop any
extents in the range we have that exist in the log already.  Here are some
numbers of a 50 meg fio job that does random writes and fsync()s after every
write

		Original	Patched
SATA drive	82KB/s		140KB/s
Fusion drive	431KB/s		2532KB/s

So around 2-6 times faster depending on your hardware.  There are a few
corner cases, for example if you truncate at all we have to do it the old
way since there is no way to be sure what is in the log is ok.  This
probably could be done smarter, but if you write-fsync-truncate-write-fsync
you deserve what you get.  All this work is in RAM of course so if your
inode gets evicted from cache and you read it in and fsync it we'll do it
the slow way if we are still in the same transaction that we last modified
the inode in.

The biggest cool part of this is that it requires no changes to the recovery
code, so if you fsync with this patch and crash and load an old kernel, it
will run the recovery and be a-ok.  I have tested this pretty thoroughly
with an fsync tester and everything comes back fine, as well as xfstests.
Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

5dc562c5

Btrfs: update last trans if we don't update the inode · 7c735313

由 Josef Bacik 提交于 8月 13, 2012

There is a completely impossible situation to hit where you can preallocate
a file, fsync it, write into the preallocated region, have the transaction
commit twice and then fsync and then immediately lose power and lose all of
the contents of the write. This patch fixes this just so I feel better
about the situation and because it is lightweight, we just update the
last_trans when we finish an ordered IO and we don't update the inode
itself. This way we are completely safe and I feel better. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

7c735313

29 8月, 2012 6 次提交

Btrfs: fix ordered extent leak when failing to start a transaction · d280e5be

由 Liu Bo 提交于 8月 21, 2012

We cannot just return error before freeing ordered extent and releasing reserved
space when we fail to start a transacion.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

d280e5be

Btrfs: fix a dio write regression · 24c03fa5

由 Liu Bo 提交于 8月 22, 2012

This bug is introduced by commit 3b8bde746f6f9bd36a9f05f5f3b6e334318176a9
(Btrfs: lock extents as we map them in DIO).

In dio write, we should unlock the section which we didn't do IO on in case that
we fall back to buffered write.  But we need to not only unlock the section
but also cleanup reserved space for the section.

This bug was found while running xfstests 133, with this 133 no longer complains.
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NChris Mason <chris.mason@oracle.com>

24c03fa5

Btrfs: fix enospc problems when deleting a subvol · 5a24e84c

由 Josef Bacik 提交于 8月 08, 2012

Subvol delete is a special kind of awful where we use the global reserve to
cover the ENOSPC requirements. The problem is once we're done removing
everything we do a btrfs_update_inode(), which by default will try to do the
delayed update stuff which will use it's own reserve. There will be no
space in this reserve and we'll return ENOSPC. So instead use
btrfs_update_inode_fallback() which will just fallback to updating the inode
item in the case of enospc. This is fine because the global reserve covers
the space requirements for this. With this patch I can now delete a subvol
on a problem image Dave Sterba sent me. Thanks,
Reported-by: NDavid Sterba <dave@jikos.cz>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

5a24e84c

Btrfs: barrier before waitqueue_active · 66657b31

由 Josef Bacik 提交于 8月 01, 2012

We need a barrir before calling waitqueue_active otherwise we will miss
wakeups.  So in places that do atomic_dec(); then atomic_read() use
atomic_dec_return() which imply a memory barrier (see memory-barriers.txt)
and then add an explicit memory barrier everywhere else that need them.
Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

66657b31

Btrfs: don't allocate a seperate csums array for direct reads · c329861d

由 Josef Bacik 提交于 8月 03, 2012

We've been allocating a big array for csums instead of storing them in the
io_tree like we do for buffered reads because previously we were locking the
entire range, so we didn't have an extent state for each sector of the
range. But now that we do the range locking as we map the buffers we can
limit the mapping lenght to sectorsize and use the private part of the
io_tree for our csums. This allows us to avoid an extra memory allocation
for direct reads which could incur latency. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

c329861d

Btrfs: lock extents as we map them in DIO · eb838e73

由 Josef Bacik 提交于 7月 31, 2012

A deadlock in xfstests 113 was uncovered by commit

d187663e

This is because we would not return EIOCBQUEUED for short AIO reads, instead
we'd wait for the DIO to complete and then return the amount of data we
transferred, which would allow our stuff to unlock the remaning amount. But
with this change this no longer happens, so if we have a short AIO read (for
example if we try to read past EOF), we could leave the section from EOF to
the end of where we tried to read locked. Fixing this is tricky since there
is no clear way to know exactly how much data DIO truly submitted for IO, so
to make this less hard on ourselves and less combersome we need to lock the
extents as we try to map them, and then we unlock any areas we didn't
actually map. This makes us completely safe from deadlocks and reliance on
a particular behavior of the DIO code. This also lays the groundwork for
allowing us to use the normal csum storage method for reads which means we
can remove an allocation. Thanks,
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

eb838e73

04 8月, 2012 1 次提交

btrfs: nuke pdflush from comments · b2570314

由 Artem Bityutskiy 提交于 7月 25, 2012

The pdflush thread is long gone, so this patch removes references to pdflush
from btrfs comments.

Cc: Chris Mason <chris.mason@fusionio.com>
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b2570314

31 7月, 2012 1 次提交

btrfs: Convert to new freezing mechanism · b2b5ef5c

由 Jan Kara 提交于 6月 12, 2012

We convert btrfs_file_aio_write() to use new freeze check.  We also add proper
freeze protection to btrfs_page_mkwrite(). We also add freeze protection to
the transaction mechanism to avoid starting transactions on frozen filesystem.
At minimum this is necessary to stop iput() of unlinked file to change frozen
filesystem during truncation.

Checks in cleaner_kthread() and transaction_kthread() can be safely removed
since btrfs_freeze() will lock the mutexes and thus block the threads (and they
shouldn't have anything to do anyway).

CC: linux-btrfs@vger.kernel.org
CC: Chris Mason <chris.mason@oracle.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b2b5ef5c

26 7月, 2012 1 次提交

Btrfs: introduce subvol uuids and times · 8ea05e3a

由 Alexander Block 提交于 7月 25, 2012

This patch introduces uuids for subvolumes. Each
subvolume has it's own uuid. In case it was snapshotted,
it also contains parent_uuid. In case it was received,
it also contains received_uuid.

It also introduces subvolume ctime/otime/stime/rtime. The
first two are comparable to the times found in inodes. otime
is the origin/creation time and ctime is the change time.
stime/rtime are only valid on received subvolumes.
stime is the time of the subvolume when it was
sent. rtime is the time of the subvolume when it was
received.

Additionally to the times, we have a transid for each
time. They are updated at the same place as the times.

btrfs receive uses stransid and rtransid to find out
if a received subvolume changed in the meantime.

If an older kernel mounts a filesystem with the
extented fields, all fields become invalid. The next
mount with a new kernel will detect this and reset the
fields.
Signed-off-by: NAlexander Block <ablock84@googlemail.com>
Reviewed-by: NDavid Sterba <dave@jikos.cz>
Reviewed-by: NArne Jansen <sensille@gmx.net>
Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: NAlex Lyakas <alex.bolshoy.btrfs@gmail.com>

8ea05e3a

24 7月, 2012 6 次提交

Btrfs: zero unused bytes in inode item · 293f7e07

由 Li Zefan 提交于 7月 10, 2012

The otime field is not zeroed, so users will see random otime in an old
filesystem with a new kernel which has otime support in the future.

The reserved bytes are also not zeroed, and we'll have compatibility
issue if we make use of those bytes.
Signed-off-by: NLi Zefan <lizefan@huawei.com>

293f7e07

Btrfs: kill free_space pointer from inode structure · b4d7c3c9

由 Li Zefan 提交于 7月 09, 2012

Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type,
which means every inode has the same BTRFS_I(inode)->free_space pointer.

This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits).
Signed-off-by: NLi Zefan <lizefan@huawei.com>

b4d7c3c9

Btrfs: kill root from btrfs_is_free_space_inode · 83eea1f1

由 Liu Bo 提交于 7月 10, 2012

Since root can be fetched via BTRFS_I macro directly, we can save an args
for btrfs_is_free_space_inode().
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

83eea1f1

Btrfs: fix typo in cow_file_range_async and async_cow_submit · 287082b0

由 Liu Bo 提交于 6月 28, 2012

It should be 10 * 1024 * 1024.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

287082b0

Btrfs: return error of btrfs_update_inode() to caller · b9959295

由 Tsutomu Itoh 提交于 6月 25, 2012

We didn't check error of btrfs_update_inode(), but that error looks
easy to bubble back up.
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NTsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

b9959295

Btrfs: don't update atime on RO subvolumes · 2bc55652

由 Alexander Block 提交于 6月 15, 2012

Before the update_time inode operation was indroduced, it was
not possible to prevent updates of atime on RO subvolumes. VFS
was only able to check for RO on the mount, but did not know
anything about btrfs subvolumes.

btrfs_update_time does now check if the root is RO and skip
updating of times.
Signed-off-by: NAlexander Block <ablock84@googlemail.com>

2bc55652

14 7月, 2012 3 次提交

don't pass nameidata to ->create() · ebfc3b49

由 Al Viro 提交于 6月 10, 2012

boolean "does it have to be exclusive?" flag is passed instead;
Local filesystem should just ignore it - the object is guaranteed
not to be there yet.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ebfc3b49

stop passing nameidata to ->lookup() · 00cd8dd3

由 Al Viro 提交于 6月 10, 2012

Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument.  And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

00cd8dd3

A
vfs: switch i_dentry/d_alias to hlist · b3d9b7a3
由 Al Viro 提交于 6月 09, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
b3d9b7a3

03 7月, 2012 2 次提交

Btrfs: fix wrong check during log recovery · 6bf02314

由 Liu Bo 提交于 6月 25, 2012

When we're evicting an inode during log recovery, we need to ensure that the inode
is not in orphan state any more, which means inode's run_time flags has _no_
BTRFS_INODE_HAS_ORPHAN_ITEM.  Thus, the BUG_ON was triggered because of a wrong
check for the flags.
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NJosef Bacik <jbacik@fusionio.com>

6bf02314

Btrfs: fix dio write vs buffered read race · c3473e83

由 Josef Bacik 提交于 6月 19, 2012

Miao pointed out there's a problem with mixing dio writes and buffered
reads. If the read happens between us invalidating the page range and
actually locking the extent we can bring in pages into page cache. Then
once the write finishes if somebody tries to read again it will just find
uptodate pages and we'll read stale data. So we need to lock the extent and
check for uptodate bits in the range. If there are uptodate bits we need to
unlock and invalidate again. This will keep this race from happening since
we will hold the extent locked until we create the ordered extent, and then
teh read side always waits for ordered extents. There was also a race in
how we updated i_size, previously we were relying on the generic DIO stuff
to adjust the i_size after the DIO had completed, but this happens outside
of the extent lock which means reads could come in and not see the updated
i_size. So instead move this work into where we create the extents, and
then this way the update ordered i_size stuff works properly in the endio
handlers. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

c3473e83

21 6月, 2012 1 次提交

Btrfs: delay iput with async extents · cb77fcd8

由 Josef Bacik 提交于 6月 15, 2012

There is some concern that these iput()'s could be the final iputs and could
induce lockups on people waiting on writeback. This would happen in the
rare case that we don't create ordered extents because of an error, but it
is theoretically possible and we already have a mechanism to deal with this
so just make them delayed iputs to negate any worry.
Signed-off-by: NJosef Bacik <josef@redhat.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

cb77fcd8

15 6月, 2012 5 次提交

Btrfs: fix missing inherited flag in rename · bc178237

由 Liu Bo 提交于 6月 14, 2012

When we move a file into a directory with compression flag, we need to
inherite BTRFS_INODE_COMPRESS and clear BTRFS_INODE_NOCOMPRESS as well.
But if we move a file into a directory without compression flag, we need
to clear both of them.

It is the way how our setflags deals with compression flag, so keep
the same behaviour here.
Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: NChris Mason <chris.mason@fusionio.com>

bc178237

Btrfs: call filemap_fdatawrite twice for compression · 7ddf5a42

由 Josef Bacik 提交于 6月 08, 2012

I removed this in an earlier commit and I was wrong. Because compression
can return from filemap_fdatawrite() without having actually set any of it's
pages as writeback() it can make filemap_fdatawait() do essentially nothing,
and then we won't find any ordered extents because they may not have been
created yet. So not only does this make fsync() completely useless, but it
will also screw up if you truncate on a non-page aligned offset since we
zero out the end and then wait on ordered extents and then call drop caches.
We can drop the cache before the io completes and then we try to unpin the
extent we just wrote we won't find it and everything goes sideways. So fix
this by putting it back and put a giant comment there to keep me from trying
to remove it in the future. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

7ddf5a42

Btrfs: keep inode pinned when compressing writes · 8180ef88

由 Josef Bacik 提交于 6月 08, 2012

A user reported lots of problems using compression on the new code and it
turns out part of the problem was that igrab() was failing when we added a
new ordered extent. This is because when writing out an inode under
compression we immediately return without actually doing anything to the
pages, and then in another thread at some point down the line actually do
the ordered dance. The problem is between the point that we start writeback
and we actually add the ordered extent we could be trying to reclaim the
inode, which makes igrab() return NULL. So we need to do an igrab() when we
create the async extent and then drop it when we are done with it. This
makes sure we stay pinned in memory until the ordered extent can get a
reference on it and we are good to go. With this patch we no longer panic
in btrfs_finish_ordered_io(). Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

8180ef88

Btrfs: unlock everything properly in the error case for nocow · 17ca04af

由 Josef Bacik 提交于 5月 31, 2012

I was getting hung on umount when a transaction was aborted because a range
of one of the free space inodes was still locked. This is because the nocow
stuff doesn't unlock anything on error. This fixed the problem and I
verified that is what was happening. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

17ca04af

Btrfs: pass locked_page into extent_clear_unlock_delalloc if theres an error · beb42dd7

由 Josef Bacik 提交于 5月 30, 2012

While doing my enospc work I got a transaction abortion that resulted in a
panic when we tried to unlock_page() an already unlocked page.  This is
because we aren't calling extent_clear_unlock_delalloc with the locked page
so it was unlocking all the pages in the range.  This is wrong since
__extent_writepage expects to have the page locked still unless we return
*page_started as 1.  This should keep us from panicing.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

beb42dd7

02 6月, 2012 1 次提交

Btrfs: move over to use ->update_time · e41f941a

由 Josef Bacik 提交于 3月 26, 2012

Btrfs had been doing it's own file_update_time so we could catch ENOSPC
properly, so just update our btrfs_update_time to work with the new stuff and
then we'll be fancy later.  Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

e41f941a

30 5月, 2012 2 次提交

Btrfs: fall back to non-inline if we don't have enough space · 2adcac1a

由 Josef Bacik 提交于 5月 23, 2012

If cow_file_range_inline fails with ENOSPC we abort the transaction which
isn't very nice. This really shouldn't be happening anyways but there's no
sense in making it a horrible error when we can easily just go allocate
normal data space for this stuff. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

2adcac1a

Btrfs: fix how we deal with the orphan block rsv · 8a35d95f

由 Josef Bacik 提交于 5月 23, 2012

Ceph was hitting this race where we would remove an inode from the per-root
orphan list before we would release the space we had reserved for the inode.
We actually don't need a list or anything, we just need to make sure the
root doesn't try to free up the orphan reserve until after the inodes have
released their reservations. So use an atomic counter instead of a list on
the root and only decrement the counter after we've released our
reservation. I've tested this as well as several others and we no longer
see the warnings that you would see while running ceph. Thanks,
Btrfs: fix how we deal with the orphan block rsv

8a35d95f

bug2833 / cloud-kernel 与 Fork 源项目一致

bug2833 / cloud-kernel
与 Fork 源项目一致