Commits · 462d6fac8960a3ba797927adfcbd29d503eb16fd · openanolis / cloud-kernel

20 Oct, 2011: 33 commits

Btrfs: introduce convert_extent_bit · 462d6fac

Committed by Josef Bacik on Sep 26, 2011

If I have a range where I know a certain bit is set and I want to set it to another
bit, the only option I have is to set the new bit and then clear the old one, which will result
in 2 tree searches.  This is inefficient, so introduce convert_extent_bit which
will go through and set the bit I want and clear the old bit I don't want.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

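To make the single-pass idea concrete, here is a minimal sketch in C. It is not the btrfs extent_io code: it assumes a hypothetical flat array of states (`struct sketch_state`, `convert_bits`) in place of the real rbtree, and it skips splitting states that only partly overlap the range. What it shows is the point of convert_extent_bit: one walk over the range sets the wanted bit and clears the unwanted one, where set-then-clear would walk the range twice.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one extent state: a half-open byte range
 * [start, end) tagged with a bitmask. A sorted array replaces the real
 * rbtree purely to keep the sketch short. */
struct sketch_state {
	uint64_t start;
	uint64_t end;
	unsigned int bits;
};

/* Set `set_bits` and clear `clear_bits` on every state overlapping
 * [start, end) in a single pass. Returns how many states were touched.
 * Splitting states that only partly overlap the range is omitted. */
static size_t convert_bits(struct sketch_state *states, size_t n,
			   uint64_t start, uint64_t end,
			   unsigned int set_bits, unsigned int clear_bits)
{
	size_t touched = 0;

	for (size_t i = 0; i < n; i++) {
		if (states[i].end <= start || states[i].start >= end)
			continue;	/* no overlap with the range */
		states[i].bits |= set_bits;
		states[i].bits &= ~clear_bits;
		touched++;
	}
	return touched;
}
```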

Btrfs: check unused against how much space we actually want · ef3be457

Committed by Josef Bacik on Sep 22, 2011

There is a bug that may lead to early ENOSPC in our reservation code. We've
been checking against num_bytes which may be above and beyond what we want to
actually reserve, which could give us a false ENOSPC. Fix this by making sure
the unused space is above how much we want to reserve and not how much we're
trying to flush. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: fix orphan cleanup regression · a8c9e576

Committed by Josef Bacik on Sep 21, 2011

In fixing how we deal with bad inodes, we had a regression in the orphan cleanup
code, since it expects to get a bad inode back.  So fix it to deal with getting
-ESTALE back by deleting the orphan item manually and moving on.  Thanks,
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: use the inode's mapping mask for allocating pages · 3b16a4e3

Committed by Josef Bacik on Sep 21, 2011

Johannes pointed out we were allocating only kernel pages for doing writes,
which is kind of a big deal if you are on 32bit and have more than a gig of ram.
So fix our allocations to use the mapping's gfp but still clear __GFP_FS so we
don't re-enter.  Thanks,
Reported-by: Johannes Weiner <jweiner@redhat.com>
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: delay iput when deleting a block group · 455757c3

Committed by Josef Bacik on Sep 19, 2011

I kept getting warnings from evict because we were calling
btrfs_start_transaction() with a transaction already started when doing a
balance. This is because we remove a block group which requires a transaction,
and then put the last reference on the cache inode. Instead of doing this we
need to delay the iput so that it is not done inside a running transaction.
This gets rid of our warnings. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: make sure to unset trans->block_rsv before running delayed refs · 9c8d86db

Committed by Josef Bacik on Sep 19, 2011

Checksums are charged in 2 different ways. The first case is when we're writing
to the disk, we account for the new checksums with the delalloc block rsv. In
order for this to work we check if we're allocating a block for the csum root
and if trans->block_rsv == the delalloc block rsv. But when we're deleting the
csums because of cow, this is charged to the global block rsv, and is done when
we run the delayed refs. So we need to make sure that trans->block_rsv == NULL
when running the delayed refs. So set it to NULL and reset it in
should_end_transaction, and set it to NULL in commit_transaction. This got rid
of the ridiculous amount of warnings I was seeing when trying to do a balance.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: stop passing a trans handle all around the reservation code · 4a92b1b8

Committed by Josef Bacik on Aug 30, 2011

The only thing that we need to have a trans handle for is in
reserve_metadata_bytes, and that's to know how much flushing we can do. So
instead of passing it around, just check current->journal_info for a
trans_handle so we know if we can commit a transaction to try and free up space
or not.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: don't get the block_rsv in btrfs_free_tree_block · d02c9955

Committed by Josef Bacik on Aug 30, 2011

Since the durable block rsv stuff has been killed there is no need to get the
block_rsv in btrfs_free_tree_block anymore.
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: use the transactions block_rsv for the csum root · 4c13d758

Committed by Josef Bacik on Aug 30, 2011

The alloc warnings everybody has been seeing are because we have been reserving
space for csums, but we weren't actually using that space. So make
get_block_rsv() return the trans->block_rsv if we're modifying the csum root.
Also set the trans->block_rsv to NULL so that if we modify the csum root when
running delayed refs, that comes out of the global reserve like it's supposed
to. With this patch I'm not seeing those alloc warnings anymore. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: handle enospc accounting for free space inodes · c09544e0

Committed by Josef Bacik on Aug 30, 2011

Since free space inodes now use normal checksumming we need to make sure to
account for their metadata use. So reserve metadata space, and then if we fail
to write out the metadata we can just release it, otherwise it will be freed up
when the io completes. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: put the block group cache after we commit the super · 300e4f8a

Committed by Josef Bacik on Aug 29, 2011

In moving some enospc stuff around I noticed that when we unmount we are often
evicting the free space cache inodes before we do our last commit. This isn't
bad, but it makes us constantly have to re-read the inodes back. So instead
don't evict the cache until after we do our last commit, this will make things a
little less crappy and makes a future enospc change work properly. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: set truncate block rsv's size · 4a338542

Committed by Josef Bacik on Aug 29, 2011

While debugging a different issue I noticed that we were always reserving space
when we tried to use our truncate block rsv's. This is because they didn't have
a ->size value, so use_block_rsv just assumes there is nothing reserved and it
does a reserve_metadata_bytes. This is because btrfs_check_block_rsv() doesn't
actually add to the size of the block rsv. That seems to be the right thing to
do so set ->size to the minimum truncate size we need, since we will always only
refill to that size anyway, and this way everything works out correctly.
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: don't increase the block_rsv's size when emergency allocating space · 7f701508

Committed by Josef Bacik on Aug 22, 2011

If we have to emergency reserve space we need to not increase the block_rsv
size, otherwise we'll leak space. Take for instance delalloc, say we reserve
4k, and we use that 4k, and then we have to emergency allocate another 4k, we
bump the size up to 8k, however we've only accounted for 4k in reservations in
all of our supporting logic, so we'll go to free the 4k and end up having a size
of 4k, which will cause us to later not free as much space. I saw this doing
testing where I wasn't reserving enough space for something but was still
leaking space, very frustrating. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

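The leak in the delalloc example can be replayed with plain arithmetic. The sketch below is a hypothetical model (`struct sketch_rsv` with only `size` and `reserved`, and `leftover_size`), not the kernel's block_rsv: with the old behaviour the emergency allocation bumps `size` to 8k, the caller releases only the 4k it accounted for, and 4k of size is left behind; with the fix nothing is left.

```c
#include <stdint.h>

/* Hypothetical, simplified block reservation: `size` is how much the rsv
 * is supposed to hold, `reserved` is how much it currently holds. */
struct sketch_rsv {
	uint64_t size;
	uint64_t reserved;
};

/* Replay the scenario from the commit message and return the size left
 * behind once the caller releases everything it accounted for. */
static uint64_t leftover_size(int bump_size_on_emergency)
{
	struct sketch_rsv rsv = { 0, 0 };

	/* normal delalloc reservation of 4k, tracked by the caller */
	rsv.size += 4096;
	rsv.reserved += 4096;

	/* that 4k gets used */
	rsv.reserved -= 4096;

	/* emergency allocation of another 4k, consumed straight away */
	if (bump_size_on_emergency)
		rsv.size += 4096;

	/* the caller only accounted for 4k, so it only releases 4k */
	rsv.size -= 4096;
	return rsv.size;
}
```

`leftover_size(1)` models the behaviour before the fix and `leftover_size(0)` the behaviour after it.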

Btrfs: fix space leak when we fail to make an allocation · 7ed49f18

Committed by Josef Bacik on Aug 19, 2011

When changing back to using a spin_lock to protect the extent counters I decided
that since we would only be dropping our original extent, it was ok to just drop
the extent and return. However since somebody else could have come in and done
a reservation, we need to do the normal song and dance to clear the reservation
out properly. So calculate how much space we need to free, and then subtract
what we just attempted to reserve. If it's more, then we know we need to drop
those bytes from the delalloc block rsv. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

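The release step on the failure path reduces to one comparison. `bytes_to_release()` is a hypothetical helper, not a btrfs function, and it assumes both amounts have already been computed under the lock: if the amount the accounting says must be freed exceeds what this caller just tried to reserve, only the difference is dropped from the delalloc block rsv; otherwise nothing is.

```c
#include <stdint.h>

/* Hypothetical helper for the failure path of a delalloc reservation.
 * to_free:   bytes the recalculated accounting says must be dropped
 * attempted: bytes this caller tried to reserve, which never arrived
 * Only the excess is actually held in the delalloc block rsv and
 * needs to be released from it. */
static uint64_t bytes_to_release(uint64_t to_free, uint64_t attempted)
{
	return to_free > attempted ? to_free - attempted : 0;
}
```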

Btrfs: fix call to btrfs_search_slot in free space cache · a9b5fcdd

Committed by Josef Bacik on Aug 19, 2011

We are setting ins_len to 1 even though we are just modifying an item that should
be there already. This may cause the search stuff to split nodes on the way
down needlessly. Set this to 0 since we aren't inserting anything. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: allow callers to specify if flushing can occur for btrfs_block_rsv_check · 482e6dc5

Committed by Josef Bacik on Aug 19, 2011

If you run xfstest 224 you will get lots of messages about not being able to
delete inodes and that they will be cleaned up next mount. This is because
btrfs_block_rsv_check was not calling reserve_metadata_bytes with the ability to
flush, so if there was not enough space, it simply failed. But in truncate and
evict case we could easily flush space to try and get enough space to do our
work, so make btrfs_block_rsv_check take a flush argument to pass down to
reserve_metadata_bytes. Now xfstests 224 runs fine without all those
complaints. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: reduce the amount of space needed for truncates · 07127184

Committed by Josef Bacik on Aug 19, 2011

With btrfs_truncate_inode_items we always return if we have to go to another
leaf, which makes us do our reservation again. This means we will only ever
modify one leaf at a time, so we only need one item's worth of slack space. Also,
since we are deleting we will not be creating nodes as we go down; if anything
we'll be freeing them as we merge them together, so make a different
calculation for truncate which will only have the worst case usage of COW'ing
the entire path down to the leaf. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

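The two worst cases can be compared as a formula. This sketch assumes a maximum tree height of 8 (btrfs's BTRFS_MAX_LEVEL, mirrored here as `SKETCH_MAX_LEVEL`) and, for the insert case, a multiplier of 3 per item to cover splits; the multiplier and all function names are illustrative assumptions. With 4 KiB leaves and nodes, one truncate item then costs 32 KiB against 96 KiB for an insert.

```c
#include <stdint.h>

/* Assumption: btrfs trees are at most 8 levels deep (BTRFS_MAX_LEVEL). */
#define SKETCH_MAX_LEVEL 8

/* Cost of COW'ing one full path: one leaf plus every node above it. */
static uint64_t path_cow_bytes(uint64_t leafsize, uint64_t nodesize)
{
	return leafsize + nodesize * (SKETCH_MAX_LEVEL - 1);
}

/* Insert-style worst case: the path may also split, assumed here to be
 * covered by a factor of 3 per item. */
static uint64_t calc_insert_bytes(uint64_t leafsize, uint64_t nodesize,
				  unsigned int items)
{
	return path_cow_bytes(leafsize, nodesize) * 3 * items;
}

/* Truncate worst case: deletion never creates nodes, so only the
 * existing path down to the leaf is COW'ed. */
static uint64_t calc_trunc_bytes(uint64_t leafsize, uint64_t nodesize,
				 unsigned int items)
{
	return path_cow_bytes(leafsize, nodesize) * items;
}
```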

Btrfs: only reserve space in fallocate if we have to do a preallocate · 1b9c332b

Committed by Josef Bacik on Aug 17, 2011

Lukas found a problem where if he tries to fallocate over the same region twice
and the first fallocate took up all the space we would fail with ENOSPC. This
is because we reserve the total space we want to use for fallocate, regardless
of whether or not we will have to actually preallocate. So instead move the
check into the loop where we actually have to do the preallocate. Thanks,
Tested-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Josef Bacik <josef@redhat.com>

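A sketch of the reordered check, assuming a hypothetical list of ranges that are either already allocated or holes (`struct sketch_range`, `fallocate_ranges`, and a plain free-byte counter standing in for the space accounting): space is only charged when a hole is actually preallocated, so repeating the call over a fully allocated region needs no free space at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent map entry: a range that is either already
 * allocated or a hole that needs preallocation. */
struct sketch_range {
	uint64_t len;
	int allocated;
};

/* Reserve inside the loop, only for ranges that really need
 * preallocation, instead of reserving the whole request up front.
 * Returns 0 on success, -1 (standing in for ENOSPC) if space runs out. */
static int fallocate_ranges(const struct sketch_range *r, size_t n,
			    uint64_t *free_bytes)
{
	for (size_t i = 0; i < n; i++) {
		if (r[i].allocated)
			continue;	/* nothing to preallocate, nothing to reserve */
		if (r[i].len > *free_bytes)
			return -1;
		*free_bytes -= r[i].len;
	}
	return 0;
}
```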

Btrfs: kill btrfs_truncate_reserve_metadata · 5e962c78

Committed by Josef Bacik on Aug 08, 2011

Since we've optimized the truncate path, we no longer require this function.
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: optimize how we account for space in truncate · 907cbceb

Committed by Josef Bacik on Aug 08, 2011

Currently we're starting and stopping a transaction for no real reason, so kill
that and just reserve enough space as if we can truncate all in one transaction.
Also use btrfs_block_rsv_check() for our reserve to minimize the amount of space
we may have to allocate for our slack space. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: don't try to commit in btrfs_block_rsv_check · 13553e52

Committed by Josef Bacik on Aug 08, 2011

We will try and reserve metadata bytes in btrfs_block_rsv_check and if we cannot
because we have a transaction open it will return EAGAIN, so we do not need to
try and commit the transaction again.
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: kill unused parts of block_rsv · dabdb640

Committed by Josef Bacik on Aug 08, 2011

The priority and refill_used flags are not used anymore, and neither is the
usage counter, so just remove them from btrfs_block_rsv.
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: ratelimit the generation printk for the free space cache · 6ab60601

Committed by Josef Bacik on Aug 08, 2011

A user reported getting spammed by this message when moving to 3.0. Since we
switched to the normal checksumming infrastructure all old free space caches
will be wrong and need to be regenerated so people are likely to see this
message a lot, so ratelimit it so it doesn't fill up their logs and freak them
out.  Thanks,
Reported-by: Andrew Lutomirski <luto@mit.edu>
Signed-off-by: Josef Bacik <josef@redhat.com>

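In the kernel this kind of throttling would typically go through the printk_ratelimited() helper. The sketch below is a hypothetical userspace analogue of the interval/burst scheme (`struct sketch_ratelimit`, `ratelimit_ok`), with the clock passed in explicitly so the behaviour is easy to follow: at most `burst` messages get through per `interval`, and the rest are dropped until a new window opens.

```c
#include <stdbool.h>

/* Hypothetical interval/burst rate limiter: at most `burst` messages
 * are allowed per `interval` time units. */
struct sketch_ratelimit {
	long interval;
	int burst;
	long begin;	/* start of the current window */
	int printed;	/* messages allowed in the current window */
};

static bool ratelimit_ok(struct sketch_ratelimit *rs, long now)
{
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;	/* open a new window */
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	return false;			/* suppressed */
}
```

With `interval = 5` and `burst = 3`, ten messages at time 0 produce three lines of output, and the next one is allowed again at time 5.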

Btrfs: fix how we reserve space for deleting inodes · 4289a667

Committed by Josef Bacik on Aug 05, 2011

I converted btrfs_truncate to do sane reservations for truncate, but didn't
convert btrfs_evict_inode. Basically we need to save the orphan_rsv for
deleting the orphan item, and do normal reservations for our truncate. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: kill the durable block rsv stuff · 37be25bc

Committed by Josef Bacik on Aug 05, 2011

This is confusing code and isn't used by anything anymore, so delete it.
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: kill the orphan space calculation for snapshots · dba68306

Committed by Josef Bacik on Aug 04, 2011

This patch kills off the calculation for the amount of space needed for the
orphan operations during a snapshot. The thing is we only do snapshots on
commit, so any space that is in the block_rsv->freed[] isn't going to be in the
new snapshot anyway, so there isn't any reason to require that space to be
reserved for the snapshot to occur. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: calculate checksum space correctly · 7709cde3

Committed by Josef Bacik on Aug 04, 2011

We have not been reserving enough space for checksums. We were just reserving
bytes for the checksum items themselves, we were not taking into account having
to cow the tree and such. This patch adds a csum_bytes counter to the inode for
keeping track of the number of bytes outstanding we have for checksums. Then we
calculate how many leaves would be required for the checksums we are given and
use that to reserve space. This adds a significant amount of bytes to our
reservations, but we will handle this later. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

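The "how many leaves" step is a rounded-up division. In this sketch `csum_bytes_to_leaves()` is a hypothetical helper and every size is an assumption passed in by the caller (bytes of data per checksum, bytes per checksum, usable checksum bytes per leaf). Each leaf it counts would then be charged for COW'ing its path, which is where the extra reservation comes from.

```c
#include <stdint.h>

/* Estimate how many leaves the checksums for `csum_bytes` of data could
 * fill. Every parameter is an assumption supplied by the caller:
 * sectorsize   - bytes of data covered by one checksum
 * csum_size    - bytes per checksum (e.g. 4 for crc32c)
 * leaf_payload - usable checksum bytes in one leaf */
static uint64_t csum_bytes_to_leaves(uint64_t csum_bytes, uint64_t sectorsize,
				     uint64_t csum_size, uint64_t leaf_payload)
{
	uint64_t csums_per_leaf = leaf_payload / csum_size;
	uint64_t num_csums = csum_bytes / sectorsize;

	/* round up: a partly filled leaf still has to be accounted for */
	return (num_csums + csums_per_leaf - 1) / csums_per_leaf;
}
```

For example, 1 MiB of data at 4 KiB per checksum gives 256 checksums; at 250 checksums per leaf that rounds up to 2 leaves.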

Btrfs: skip looking for delalloc if we don't have ->fill_delalloc · 9e487107

Committed by Josef Bacik on Aug 01, 2011

We always look for delalloc bytes in our io_tree so we can fill in delalloc.
This is fine in most cases, but if we're writing out the btree_inode this is
just a superfluous tree search on the io_tree, and if we have a lot of metadata
dirty this could be an expensive check. So instead check to see if our io_tree
has a ->fill_delalloc op, and if not don't even bother doing the lookup.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

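The check amounts to testing an optional function pointer before doing the expensive work. Everything here is hypothetical (`struct sketch_io_ops`, `struct sketch_io_tree`, `sketch_writepage`, and a counter standing in for the tree search). An ops table without a `fill_delalloc` hook, as for the btree inode, skips the search entirely.

```c
#include <stddef.h>

/* Hypothetical ops table: only data inodes supply fill_delalloc. */
struct sketch_io_ops {
	int (*fill_delalloc)(void *ctx);
};

struct sketch_io_tree {
	const struct sketch_io_ops *ops;
	int lookups;	/* how many delalloc searches were performed */
};

/* Stand-in for a data inode's fill_delalloc hook; does nothing here. */
static int data_fill_delalloc(void *ctx)
{
	(void)ctx;
	return 0;
}

/* Stand-in for the (potentially expensive) delalloc search. */
static void find_delalloc_range(struct sketch_io_tree *tree)
{
	tree->lookups++;
}

/* Only search for delalloc when there is a hook that could use it;
 * the btree inode has none, so its writeback skips the search. */
static void sketch_writepage(struct sketch_io_tree *tree)
{
	if (!tree->ops || !tree->ops->fill_delalloc)
		return;
	find_delalloc_range(tree);
	tree->ops->fill_delalloc(NULL);
}
```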

Btrfs: use bytes_may_use for all ENOSPC reservations · fb25e914

Committed by Josef Bacik on Jul 26, 2011

We have been using bytes_reserved for metadata reservations, which is wrong
since we use that to keep track of outstanding reservations from the allocator.
This resulted in us doing a lot of silly things to make sure we don't allocate a
bunch of metadata chunks since we never had a real view of how much space was
actually in use by metadata.

This passes Arne's enospc test and xfstests as well as my own enospc tests.
Hopefully this will get us moving in the right direction. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

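A sketch of the accounting split described above. The counter names borrow btrfs's, but `struct sketch_space_info`, the set of counters, and the exact check in `try_reserve()` are simplified assumptions. A metadata reservation is checked against everything already used or promised, and a successful one is recorded in `bytes_may_use`, leaving `bytes_reserved` to the allocator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down space accounting. */
struct sketch_space_info {
	uint64_t total_bytes;
	uint64_t bytes_used;
	uint64_t bytes_reserved;	/* outstanding allocator reservations */
	uint64_t bytes_pinned;
	uint64_t bytes_readonly;
	uint64_t bytes_may_use;		/* ENOSPC (metadata) reservations */
};

/* Try to reserve metadata bytes. Everything already used or promised
 * counts against total_bytes; success is recorded in bytes_may_use,
 * never in bytes_reserved. */
static bool try_reserve(struct sketch_space_info *si, uint64_t bytes)
{
	uint64_t used = si->bytes_used + si->bytes_reserved + si->bytes_pinned +
			si->bytes_readonly + si->bytes_may_use;

	if (used + bytes > si->total_bytes)
		return false;
	si->bytes_may_use += bytes;
	return true;
}
```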

Btrfs: fix how we mount subvol=<whatever> · 830c4adb

Committed by Josef Bacik on Jul 25, 2011

We've only been able to mount with subvol=<whatever> where whatever was a subvol
within whatever root we had as the default. This allows us to mount -o
subvol=path/to/subvol/you/want relative to the normal fs_tree root. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: use d_obtain_alias when mounting subvol/subvolid · ba5b8958

Committed by Josef Bacik on Jul 25, 2011

Currently what we do is just wrong.  We either

1) Alloc a new "root" dentry with sb->s_root as its parent, which is just wrong
as we could walk into this subvol later on via another path and hilarity could
ensue.  Also we don't check the return value of d_splice_alias which isn't good
either.

or

2) Do a d_find_alias(), but we could have lost our dentry from the cache by this
point and found nothing.

So use d_obtain_alias().  In the case that we already have the inode/dentry in
cache we will get the correct dentry.  If not we will get a disconnected dentry
tree so if we walk into it later on everything will be connected up properly.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: kill reserved_bytes in inode · 0cbbdf7c

Committed by Josef Bacik on Jul 14, 2011

reserved_bytes is not used for anything in the inode, remove it.
Signed-off-by: Josef Bacik <josef@redhat.com>


Btrfs: move stuff around in btrfs_inode to get better packing · f1bdcc0a

Committed by Josef Bacik on Jul 14, 2011

Moving things around to give us better packing in the btrfs_inode.  This reduces
the size of our inode by 8 bytes.  Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>

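Why reordering fields shrinks a structure: each 8-byte field must start at an aligned offset, so a 1-byte field placed right before it costs padding. The two structs below are hypothetical and do not reproduce btrfs_inode. On a typical LP64 ABI the first is 32 bytes and the second 24, a saving of the same kind (8 bytes) as the one the commit describes.

```c
#include <stdint.h>

/* Hypothetical layout with small fields interleaved between large ones. */
struct sketch_loose {
	uint8_t  flag_a;	/* padding follows to align the next field */
	uint64_t counter;
	uint8_t  flag_b;	/* and padding follows again here */
	uint64_t generation;
};

/* Same fields, largest first: the two small fields now share a single
 * padded tail instead of each causing its own gap. */
struct sketch_packed {
	uint64_t counter;
	uint64_t generation;
	uint8_t  flag_a;
	uint8_t  flag_b;
};
```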

01 Oct, 2011: 1 commit

Btrfs: force a page fault if we have a shorty copy on a page boundary · b6316429

Committed by Josef Bacik on Sep 30, 2011

A user reported a problem where ceph was getting into 100% cpu usage while doing
some writing. It turns out it's because we were doing a short write on a not
uptodate page, which means we'd fall back to one page at a time and fault the
page in. The problem is our position is on the page boundary, so our fault in
logic wasn't actually reading the page, so we'd just spin forever or until the
page got read in by somebody else. This will force a readpage if we end up
doing a short copy. Alexandre could reproduce this easily with ceph and reports
it fixes his problem. I also wrote a reproducer that no longer hangs my box
with this patch. Thanks,
Reported-and-tested-by: Alexandre Oliva <aoliva@redhat.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

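A hypothetical sketch of the decision involved. `must_read_page`, `SKETCH_PAGE_SIZE`, and the `force_uptodate` flag are assumptions, and real code must also consider where the write ends. A not-uptodate page is only read in when the write starts mid-page, so a retry that starts exactly on a page boundary never reads it and keeps failing. Forcing the read after a short copy breaks that loop.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Should a page be read in before user data is copied into it?
 * A mid-page start needs the old contents; a page-aligned start was
 * assumed to overwrite the page, so it was skipped unless the caller
 * forces it (e.g. after a short copy). */
static bool must_read_page(uint64_t pos, bool page_uptodate,
			   bool force_uptodate)
{
	if (page_uptodate)
		return false;
	return (pos & (SKETCH_PAGE_SIZE - 1)) != 0 || force_uptodate;
}
```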

27 Sep, 2011: 3 commits

vfs: remove LOOKUP_NO_AUTOMOUNT flag · b6c8069d

Committed by Linus Torvalds on Sep 27, 2011

That flag no longer makes sense, since we don't look up automount points
as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT
handling was buggy to begin with: it would avoid automounting even for
cases where we really *needed* to do the automount handling, and could
return ENOENT for autofs entries that hadn't been instantiated yet.

With our new non-eager automount semantics, one discussion has been
about adding an AT_AUTOMOUNT flag to vfs_fstatat (and thus the
newfstatat() and fstatat64() system calls), but it's probably not worth
it: you can always force at least directory automounting by simply
adding the final '/' to the filename, which works for *all* of the stat
family system calls, old and new.

So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a
result of our bad default behavior.
Acked-by: Ian Kent <raven@themaw.net>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


VFS: Fix the remaining automounter semantics regressions · 815d405c

Committed by Trond Myklebust on Sep 26, 2011

The consensus seems to be that system calls such as stat() etc should
not trigger an automount.  Neither should the l* versions.

This patch therefore adds a LOOKUP_AUTOMOUNT flag to tag those lookups
that _should_ trigger an automount on the last path element.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ Edited to leave out the cases that are already covered by LOOKUP_OPEN,
  LOOKUP_DIRECTORY and LOOKUP_CREATE - all of which also fundamentally
  force automounting for their own reasons   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


vfs pathname lookup: Add LOOKUP_AUTOMOUNT flag · d94c177b

Committed by Linus Torvalds on Sep 26, 2011

Since we've now turned around and made LOOKUP_FOLLOW *not* force an
automount, we want to add the ability to force an automount event on
lookup even if we don't happen to have one of the other flags that force
it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..)

Most cases will never want to use this, since you'd normally want to
delay automounting as long as possible, which usually implies
LOOKUP_OPEN (when we open a file or directory, we really cannot avoid
the automount any more).

But Trond argued sufficiently forcefully that at a minimum bind mounting
a file and quotactl will want to force the automount lookup.  Some other
cases (like nfs_follow_remote_path()) could use it too, although
LOOKUP_DIRECTORY would work there as well.

This commit just adds the flag and logic, no users yet, though.  It also
doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and
was made irrelevant by the same change that made us not follow on
LOOKUP_FOLLOW.

Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

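The resulting rule for the final path component can be written as a flag test. The `SK_LOOKUP_*` values below are arbitrary bits chosen for this sketch, not the kernel's constants, and `triggers_automount` is a hypothetical helper. LOOKUP_FOLLOW alone no longer forces an automount. The explicit LOOKUP_AUTOMOUNT flag does, as do the flags that imply it, including LOOKUP_DIRECTORY, which is what a trailing '/' on a stat() path amounts to.

```c
#include <stdbool.h>

/* Arbitrary flag values for this sketch only (not the kernel's). */
enum {
	SK_LOOKUP_FOLLOW    = 1 << 0,
	SK_LOOKUP_DIRECTORY = 1 << 1,
	SK_LOOKUP_AUTOMOUNT = 1 << 2,
	SK_LOOKUP_PARENT    = 1 << 3,
	SK_LOOKUP_OPEN      = 1 << 4,
	SK_LOOKUP_CREATE    = 1 << 5,
};

/* Should looking up the final component trigger an automount?
 * Following symlinks is not enough by itself; one of the flags that
 * really needs the mounted tree has to be present. */
static bool triggers_automount(unsigned int flags)
{
	return (flags & (SK_LOOKUP_PARENT | SK_LOOKUP_DIRECTORY |
			 SK_LOOKUP_OPEN | SK_LOOKUP_CREATE |
			 SK_LOOKUP_AUTOMOUNT)) != 0;
}
```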

22 Sep, 2011: 3 commits

teach /proc/$pid/numa_maps about transparent hugepages · 32ef4384

Committed by Dave Hansen on Sep 20, 2011

This is modeled after the smaps code.

It detects transparent hugepages and then does a single gather_stats()
for the page as a whole. This has two benefits:
 1. It is more efficient since it does many pages in a single shot.
 2. It does not have to break down the huge page.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


break out numa_maps gather_pte_stats() checks · 3200a8aa

Committed by Dave Hansen on Sep 20, 2011

gather_pte_stats() does a number of checks on a target page
to see whether it should even be considered for statistics.
This breaks that code out into a separate function so that
we can use it in the transparent hugepage case in the next
patch.
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Christoph Lameter <cl@gentwo.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


make /proc/$pid/numa_maps gather_stats() take variable page size · eb4866d0

Committed by Dave Hansen on Sep 20, 2011

We need to teach the numa_maps code about transparent huge pages.  The
first step is to teach gather_stats() that the pte it is dealing with
might represent more than one page.

Note that we will use this in a moment for transparent huge pages, since
they use a single pmd_t which _acts_ as a "surrogate" for a bunch
of smaller pte_t's.

I'm a _bit_ unhappy that this interface counts in hugetlbfs page sizes
for hugetlbfs pages and PAGE_SIZE for normal pages.  That means that to
figure out how many _bytes_ "dirty=1" means, you must first know the
hugetlbfs page size.  That's easier said than done especially if you
don't have visibility into the mount.

But, that's probably a discussion for another day especially since it
would change behavior to fix it.  But, just in case anyone wonders why
this patch only passes a '1' in the hugetlb case...
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

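The idea of a stats call that takes a page count can be sketched with a hypothetical, trimmed-down counter struct (`struct sketch_numa_counts`, `gather_stats_sketch`). One call accounts an entry that stands for `nr_pages` base pages. That count is 1 for a normal pte, 512 for a 2 MiB transparent huge page on 4 KiB base pages (an x86-64-style assumption), and 1 for a hugetlbfs page, which stays counted in huge-page units as the commit notes.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down per-VMA counters in the spirit of numa_maps. */
struct sketch_numa_counts {
	unsigned long pages;
	unsigned long dirty;
};

/* Account one mapping entry that may represent nr_pages base pages,
 * so a huge page is counted once, without breaking it down. */
static void gather_stats_sketch(struct sketch_numa_counts *md, bool dirty,
				unsigned long nr_pages)
{
	md->pages += nr_pages;
	if (dirty)
		md->dirty += nr_pages;
}
```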
