Commits · 1c994a0909a556508c2cc26ab5d9e13c5ce33aa0 · openeuler / Kernel

10 Sep 2014, 10 commits

locks: consolidate "nolease" routines · 1c994a09

Committed by Jeff Layton 10 years ago

GFS2 and NFS have setlease routines that always just return -EINVAL.
Turn that into a generic routine that can live in fs/libfs.c.
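
A minimal userspace sketch of the consolidation described above, assuming an ops table with a `setlease` hook. The names `nolease_stub`, `struct file_ops_sketch`, `gfs2_like_fops`, `nfs_like_fops` and `try_setlease` are hypothetical stand-ins, not the kernel's definitions. The point is only that two filesystems can share one stub instead of each carrying a private copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types; only pointers are passed around. */
struct file;
struct file_lock;

/* Hypothetical ops table with a single lease hook. */
struct file_ops_sketch {
    int (*setlease)(struct file *filp, long arg, struct file_lock **flp);
};

/* One shared stub: leases are never supported, so always refuse. */
int nolease_stub(struct file *filp, long arg, struct file_lock **flp)
{
    (void)filp;
    (void)arg;
    (void)flp;
    return -EINVAL;
}

/* Both "filesystems" point at the same routine instead of private copies. */
const struct file_ops_sketch gfs2_like_fops = { .setlease = nolease_stub };
const struct file_ops_sketch nfs_like_fops = { .setlease = nolease_stub };

/* Hypothetical caller: dispatch through the hook. */
int try_setlease(const struct file_ops_sketch *ops, struct file *filp, long arg)
{
    assert(ops != NULL && ops->setlease != NULL);
    return ops->setlease(filp, arg, NULL);
}
```

Calling `try_setlease()` through either table returns `-EINVAL`, and both tables hold the same function pointer.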

Cc: <linux-nfs@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: <cluster-devel@redhat.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

locks: remove lock_may_read and lock_may_write · 699688a4

Committed by Jeff Layton 10 years ago

There are no callers of these functions.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

lockd: rip out deferred lock handling from testlock codepath · 09802fd2

Committed by Jeff Layton 10 years ago

As Kinglong points out, the nlm_block->b_fl field is no longer used at
all. Also, vfs_test_lock in the generic locking code will only return
FILE_LOCK_DEFERRED if FL_SLEEP is set, and it isn't here.

The only other place that returns that value is the DLM lock code, but
it only does that in dlm_posix_lock, never in dlm_posix_get.

Remove all of the deferred locking code from the testlock codepath
since it doesn't appear to ever be used anyway.

I do have a small concern that this might cause a behavior change in the
case where you have a block already sitting on the list when the
testlock request comes in, but that looks like it doesn't really work
properly anyway. I think it's best to just pass that down to
vfs_test_lock and let the filesystem report that instead of trying to
infer what's going on with the lock by looking at an existing block.

Cc: cluster-devel@redhat.com
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>

NFSD: Get reference of lockowner when copying file_lock · aef9583b

Committed by Kinglong Mee 10 years ago

v5: use nfs4_get_stateowner() instead of an inline function
v3: Update based on Jeff's comments
v2: Fix incorrect use of struct file_lock_operations for handling the owner
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

NFSD: New helper nfs4_get_stateowner() for atomic_inc sop reference · b5971afa

Committed by Kinglong Mee 10 years ago

v5: same as the first version
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

locks: Copy fl_lmops information for conflock in locks_copy_conflock() · f328296e

Committed by Kinglong Mee 10 years ago

Commit d5b9026a ([PATCH] knfsd: locks: flag NFSv4-owned locks) uses the
fl_lmops field in file_lock to check for an nfsd4 lockowner.

However, commit 1a747ee0 (locks: don't call ->copy_lock methods on return
of conflicting locks) causes the fl_lmops of a conflock to always be NULL.

Commit 0996905f (lockd: posix_test_lock() should not call
locks_copy_lock()) also causes the fl_lmops of a conflock to always be NULL.

Make sure the private information is copied by fl_copy_lock() in struct
file_lock_operations, and merge __locks_copy_lock() into fl_copy_lock().

Jeff advises: "Set fl_lmops on conflocks, but don't set fl_ops.
fl_ops are superfluous, since they are callbacks into the filesystem.
There should be no need to bother the filesystem at all with info
in a conflock. But, lock _ownership_ matters for conflocks and that's
indicated by the fl_lmops. So you really do want to copy the fl_lmops
for conflocks I think."

v5: add missing call to locks_release_private() in nlmsvc_testlock()
v4: only copy fl_lmops for conflock, don't copy fl_ops
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

locks: New ops in lock_manager_operations for get/put owner · 5c97d7b1

Committed by Kinglong Mee 10 years ago

NFSD or another lock manager may increase the owner's reference count,
so add two new operations for copying and releasing the owner.

v5: change order from 2/6 to 3/6
v4: rename lm_copy_owner/lm_release_owner to lm_get_owner/lm_put_owner
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

locks: Rename __locks_copy_lock() to locks_copy_conflock() · 3fe0fff1

Committed by Kinglong Mee 10 years ago

Jeff advises: "Right now __locks_copy_lock is only used to copy
conflocks. It would be good to rename that to something more
distinct (i.e. locks_copy_conflock), to make it clear that we're
generating a conflock there."

v5: change order from 3/6 to 2/6
v4: new patch that only renames the function
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

locks: Remove unused conf argument from lm_grant · d0449b90

Committed by Joe Perches 10 years ago

This argument is always NULL so don't pass it around.

[jlayton: remove dependencies on previous patches in series]
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

locks: pass correct "before" pointer to locks_unlink_lock in generic_add_lease · f39b913c

Committed by Jeff Layton 10 years ago

The argument to locks_unlink_lock can't be just any pointer to a
pointer. It must be a pointer to the fl_next field in the previous
lock in the list.
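
The constraint above is the usual pointer-to-pointer unlink idiom for singly linked lists. The sketch below is a userspace illustration under assumed names (`struct lock`, `unlink_lock`, `remove_by_id` are hypothetical, and `next` stands in for `fl_next`); it is not the kernel code. `before` must always be either the address of the list head or the address of the previous element's `next` field, which is what the traversal loop guarantees.

```c
#include <assert.h>
#include <stddef.h>

struct lock {
    int id;
    struct lock *next;   /* plays the role of fl_next */
};

/*
 * Unlink *before from its list. 'before' must point at the link that
 * currently references the victim: the list head or the previous
 * element's 'next' field. Any other pointer-to-pointer would leave the
 * list still referencing the removed element.
 */
void unlink_lock(struct lock **before)
{
    assert(before != NULL && *before != NULL);
    struct lock *fl = *before;

    *before = fl->next;
    fl->next = NULL;
}

/* Walk the list while tracking the correct 'before' pointer. */
int remove_by_id(struct lock **head, int id)
{
    for (struct lock **before = head; *before != NULL;
         before = &(*before)->next) {
        if ((*before)->id == id) {
            unlink_lock(before);
            return 1;
        }
    }
    return 0;
}
```

Passing the address of a local copy of the element pointer would update only that copy and leave the predecessor still pointing at the removed element.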

Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

03 Sep 2014, 2 commits

nfs: do not start the callback thread until we set rqstp->rq_task · 66f09ca7

Committed by Trond Myklebust 10 years ago

This fixes an Oopsable race when starting up the callback server.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

lockd: Do not start the lockd thread before we've set nlmsvc_rqst->rq_task · d4e89902

Committed by Trond Myklebust 10 years ago

This fixes an Oopsable race when starting lockd.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

29 Aug 2014, 3 commits

nfsd4: remove labeled NFS warning from config help · ccad7dad

Committed by J. Bruce Fields 10 years ago

The working group appears committed to keeping the protocol stable, and
the code has gotten some use and seems to work OK.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

NFSD: Update some as-yet unused 4.2 error codes · 2b8941b9

Committed by Anna Schumaker 10 years ago

Recent NFS v4.2 drafts have removed NFS4ERR_METADATA_NOTSUPP and
reassigned the error code to NFS4ERR_UNION_NOTSUPP.

I also add in the NFS4ERR_OFFLOAD_NO_REQS error code.

We're not using any of these yet, so there's no harm done.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

NFSD: Remove duplicate initialization of file_lock · 6cd90662

Committed by Kinglong Mee 10 years ago

locks_alloc_lock() has already initialized struct file_lock, so there is
no need to re-initialize it here.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

19 Aug 2014, 1 commit

nfsd: allow turning off nfsv3 readdir_plus · 18c01ab3

Committed by Rajesh Ghanekar 10 years ago

One of our customers' applications only needs file names, not file
attributes. With directories holding 10K+ inodes (assuming the buffer
cache has the directory blocks with the file names cached, but the inode
cache is limited and hence needs to evict older cached inodes), older
inodes are evicted periodically. So if they keep doing readdir(2) from an
NFS client on multiple directories, some directories' files are
periodically removed from the inode cache, and hence a new readdir(2) on
the same directory requires disk access to bring the inodes back into the
inode cache.

As a READDIRPLUS request also fetches attributes, doing a getattr on each
file on the server, it causes unnecessary disk accesses. If READDIRPLUS
from the NFS client is answered with -ENOTSUPP, the NFS client falls back
to the READDIR request, which just gets the names of the files in a
directory, not attributes, hence avoiding disk accesses on the server.

There's already a corresponding client-side mount option, but an export
option reduces the need for configuration across multiple clients.

This flag affects NFSv3 only. If it turns out it's needed for NFSv4 as
well then we may have to figure out how to extend the behavior to NFSv4,
but it's not currently obvious how to do that.
Signed-off-by: Rajesh Ghanekar <rajesh_ghanekar@symantec.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

18 Aug 2014, 13 commits

nfsd4: reserve adequate space for LOCK op · f7b43d0c

Committed by J. Bruce Fields 10 years ago

As of 8c7424cf "nfsd4: don't try to encode conflicting owner if low
on space", we permit the server to process a LOCK operation even if
there might not be space to return the conflicting lockowner, because
we've made returning the conflicting lockowner optional.

However, the rpc server still wants to know the most we might possibly
return, so we need to take into account the possible conflicting
lockowner in the svc_reserve_space() call here.

Symptoms were log messages like "RPC request reserved 88 but used 108".

Fixes: 8c7424cf "nfsd4: don't try to encode conflicting owner if low on space"
Reported-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: remove obsolete comment · 1383bf37

Committed by J. Bruce Fields 10 years ago

We do what Neil suggests now.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd3: Check write permission after checking existence · 63bab065

Committed by Ross Lagerwall 10 years ago

When creating a file that already exists in a read-only directory with
O_EXCL, the NFSv3 server returns EACCES rather than EEXIST (which local
files and the NFSv4 server return). Fix this by checking the MAY_CREATE
permission only if the file does not exist. Since this already happens
in do_nfsd_create, the check in nfsd3_proc_create can simply be removed.
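
The ordering described above can be sketched in a few lines of userspace C. This is an illustration only: `struct dir_state` and `create_exclusive` are hypothetical names, and the real checks live in do_nfsd_create. It shows that testing for existence before testing for create permission yields EEXIST rather than EACCES in a read-only directory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the server knows before creating a file. */
struct dir_state {
    bool writable;       /* caller may create entries in the directory */
    bool target_exists;  /* the requested name is already present */
};

/*
 * Exclusive create: report an existing file first, and only consult the
 * create permission when the file really has to be created.
 */
int create_exclusive(const struct dir_state *dir)
{
    assert(dir != NULL);

    if (dir->target_exists)
        return -EEXIST;
    if (!dir->writable)
        return -EACCES;
    return 0;
}
```

With the checks in the opposite order, the first case (existing file in a read-only directory) would return `-EACCES`, which is the behavior the commit removes.
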
Signed-off-by: Ross Lagerwall <rosslagerwall@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: call nfs4_put_deleg_lease outside of state_lock · afbda402

Committed by Jeff Layton 10 years ago

Currently, we hold the state_lock when releasing the lease. That's
potentially problematic in the future if we allow for setlease methods
that can sleep. Move the nfs4_put_deleg_lease call out of the delegation
unhashing routine (which was always a bit goofy anyway), and into the
unlocked sections of the callers of unhash_delegation_locked.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: protect lease-related nfs4_file fields with fi_lock · 6bcc034e

Committed by Jeff Layton 10 years ago

Currently these fields are protected with the state_lock, but that
doesn't really make a lot of sense. These fields are "private" to the
nfs4_file, and can be protected with the more granular fi_lock.

The fi_lock is already held when setting these fields. Make the code
hold the fp->fi_lock when clearing the lease-related fields in the
nfs4_file, and no longer require that the state_lock be held when
calling into this function.

To prevent lock inversion with the i_lock, we also move the vfs_setlease
and fput calls outside of the fi_lock. This also sets us up for allowing
vfs_setlease calls to block in the future.

Finally, remove a redundant NULL pointer check. unhash_delegation_locked
locks the fp->fi_lock prior to that check, so fp in that function must
never be NULL.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: Reorder nfsd_cache_match to check more powerful discriminators first · ef9b16dc

Committed by Trond Myklebust 10 years ago

We would normally expect the xid and the checksum to be the best
discriminators. Check them before looking at the procedure number,
etc.
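
As a rough illustration of the reordering, the sketch below compares the fields most likely to differ first so that mismatches are rejected early. The structure and field names (`struct cache_key`, `xid`, `csum`, `proc`, `vers`, `len`) are assumptions made for this example and do not mirror the actual nfsd reply-cache entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the data a reply-cache lookup compares. */
struct cache_key {
    uint32_t xid;   /* RPC transaction id */
    uint32_t csum;  /* checksum over the start of the request */
    uint32_t proc;  /* procedure number */
    uint32_t vers;  /* protocol version */
    uint32_t len;   /* request length */
};

bool cache_match(const struct cache_key *rq, const struct cache_key *rp)
{
    assert(rq != NULL && rp != NULL);

    /* Strongest discriminators first: most non-matches fail here. */
    if (rq->xid != rp->xid || rq->csum != rp->csum)
        return false;

    /* Weaker discriminators are only checked for likely matches. */
    return rq->proc == rp->proc && rq->vers == rp->vers &&
           rq->len == rp->len;
}
```

The result is the same whatever the order of the comparisons; only the expected number of comparisons per non-matching entry changes.
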
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: split DRC global spinlock into per-bucket locks · 89a26b3d

Committed by Trond Myklebust 10 years ago

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: convert num_drc_entries to an atomic_t · 31e60f52

Committed by Trond Myklebust 10 years ago

...so we can remove the spinlocking around it.
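
A userspace analogue of this change, using C11 `<stdatomic.h>` in place of the kernel's atomic_t. The names `num_entries`, `entry_added`, `entry_removed` and `entry_count` are hypothetical. The counter is updated with atomic read-modify-write operations, so it needs no lock of its own.

```c
#include <assert.h>
#include <stdatomic.h>

/* Entry counter shared by all threads; no spinlock guards it. */
static atomic_uint num_entries = 0;

/* Returns the count after the increment. */
unsigned int entry_added(void)
{
    return atomic_fetch_add(&num_entries, 1) + 1;
}

/* Returns the count after the decrement. */
unsigned int entry_removed(void)
{
    assert(atomic_load(&num_entries) > 0);
    return atomic_fetch_sub(&num_entries, 1) - 1;
}

unsigned int entry_count(void)
{
    return atomic_load(&num_entries);
}
```

This only makes the counter itself safe to update concurrently; any list or hash bucket the entries live on still needs its own protection.
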
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: Remove the cache_hash list · 11acf6ef

Committed by Trond Myklebust 10 years ago

Now that the lru list is per-bucket, we don't need a second list for
searches.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: convert the lru list into a per-bucket thing · bedd4b61

Committed by Trond Myklebust 10 years ago

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: Clean up drc cache in preparation for global spinlock elimination · 7142b98d

Committed by Trond Myklebust 10 years ago

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfs: Ensure that nfs_callback_start_svc sets the server rq_task... · 88799977

Committed by Trond Myklebust 10 years ago

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

lockd: Ensure that lockd_start_svc sets the server rq_task... · d6a7ce42

Committed by Trond Myklebust 10 years ago

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

15 Aug 2014, 9 commits

btrfs: disable strict file flushes for renames and truncates · 8d875f95

Committed by Chris Mason 10 years ago

Truncates and renames are often used to replace old versions of a file
with new versions.  Applications often expect this to be an atomic
replacement, even if they haven't done anything to make sure the new
version is fully on disk.

Btrfs has strict flushing in place to make sure that renaming over an
old file with a new file will fully flush out the new file before
allowing the transaction commit with the rename to complete.

This ordering means the commit code needs to be able to lock file pages,
and there are a few paths in the filesystem where we will try to end a
transaction with the page lock held.  It's rare, but these things can
deadlock.

This patch removes the ordered flushes and switches to a best effort
filemap_flush like ext4 uses. It's not perfect, but it should fix the
deadlocks.
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix csum tree corruption, duplicate and outdated checksums · 27b9a812

Committed by Filipe Manana 10 years ago

Under rare circumstances we can end up leaving 2 versions of a checksum
for the same file extent range.

The reason for this is that after calling btrfs_next_leaf we process
slot 0 of the leaf it returns, instead of processing the slot set in
path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after
btrfs_next_leaf() releases the path and before it searches for the next
leaf, another task might cause a split of the next leaf, which migrates
some of its keys to the leaf we were processing before calling
btrfs_next_leaf(). In this case btrfs_next_leaf() returns again the
same leaf but with path->slots[0] having a slot number corresponding
to the first new key it got, that is, a slot number that didn't exist
before calling btrfs_next_leaf(), as the leaf now has more keys than
it had before. So we must really process the returned leaf starting at
path->slots[0] always, as it isn't always 0, and the key at slot 0 can
have an offset much lower than our search offset/bytenr.

For example, consider the following scenario, where we have:

sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472

  Leaf N:

    slot = 0                           slot = btrfs_header_nritems() - 1
  |-------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
  |-------------------------------------------------------------------|

  Leaf N + 1:

      slot = 0                          slot = btrfs_header_nritems() - 1
  |--------------------------------------------------------------------|
  | [(CSUM CSUM 40161280), size 32] ... [(CSUM CSUM 40615936), size 8] |
  |--------------------------------------------------------------------|

Because we are at the last slot of leaf N, we call btrfs_next_leaf() to
find the next highest key, which releases the current path and then searches
for that next key. However after releasing the path and before finding that
next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call
to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore
btrfs_next_leaf() will return us a path again with leaf N but with the slot
pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N
is then:

    slot = 0                        slot = btrfs_header_nritems() - 2  slot = btrfs_header_nritems() - 1
  |----------------------------------------------------------------------------------------------------|
  | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4]  [(CSUM CSUM 40161280), size 32] |
  |----------------------------------------------------------------------------------------------------|

And incorrectly using slot 0 makes us set next_offset to 39239680 and we jump
into the "insert:" label, which will set tmp to:

    tmp = min((sums->len - total_bytes) >> blocksize_bits,
        (next_offset - file_key.offset) >> blocksize_bits) =
    min((16384 - 0) >> 12, (39239680 - 40157184) >> 12) =
    min(4, (u64)-917504 = 18446744073708634112 >> 12) = 4

and

   ins_size = csum_size * tmp = 4 * 4 = 16 bytes.

In other words, we insert a new csum item in the tree with key
(CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums
for all the data (4 blocks of 4096 bytes each = sums->len). Which is wrong,
because the item with key (CSUM CSUM 40161280) (the one that was moved from
leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288
bytes of our data and won't get those old checksums removed.

So this leaves us 2 different checksums for 3 4kb blocks of data in the tree,
and breaks the logical rule:

   Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover

An obvious bad effect of this is that a subsequent csum tree lookup to get
the checksum of any of the blocks with logical offset of 40161280, 40165376
or 40169472 (the last 3 4kb blocks of file data), will get the old checksums.
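
The `tmp` computation quoted above can be reproduced in userspace to see the unsigned wraparound at work. This is a sketch with hypothetical helper names (`min_u64`, `csum_blocks_to_insert`), not the btrfs code; it only replays the arithmetic with the values from the example.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/*
 * Number of checksum blocks that would go into a single csum item,
 * following the min() expression from the commit message.
 */
uint64_t csum_blocks_to_insert(uint64_t sums_len, uint64_t total_bytes,
                               uint64_t next_offset, uint64_t key_offset,
                               unsigned int blocksize_bits)
{
    assert(blocksize_bits < 64);

    uint64_t remaining = (sums_len - total_bytes) >> blocksize_bits;
    /* Wraps to a huge value when next_offset < key_offset. */
    uint64_t gap = (next_offset - key_offset) >> blocksize_bits;

    return min_u64(remaining, gap);
}
```

With the key wrongly taken from slot 0, `csum_blocks_to_insert(16384, 0, 39239680, 40157184, 12)` gives 4, because the negative difference wraps and stops limiting the result. Feeding the same formula the key that was moved into leaf N, `csum_blocks_to_insert(16384, 0, 40161280, 40157184, 12)`, gives 1, so the new item would stop short of the existing one.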

Cc: stable@vger.kernel.org
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: Fix memory corruption by ulist_add_merge() on 32bit arch · 4eb1f66d

Committed by Takashi Iwai 10 years ago

We've got bug reports that btrfs crashes when quota is enabled on
32bit kernel, typically with an Oops like the one below:
 BUG: unable to handle kernel NULL pointer dereference at 00000004
 IP: [<f9234590>] find_parent_nodes+0x360/0x1380 [btrfs]
 *pde = 00000000
 Oops: 0000 [#1] SMP
 CPU: 0 PID: 151 Comm: kworker/u8:2 Tainted: G S      W 3.15.2-1.gd43d97e-default #1
 Workqueue: btrfs-qgroup-rescan normal_work_helper [btrfs]
 task: f1478130 ti: f147c000 task.ti: f147c000
 EIP: 0060:[<f9234590>] EFLAGS: 00010213 CPU: 0
 EIP is at find_parent_nodes+0x360/0x1380 [btrfs]
 EAX: f147dda8 EBX: f147ddb0 ECX: 00000011 EDX: 00000000
 ESI: 00000000 EDI: f147dda4 EBP: f147ddf8 ESP: f147dd38
  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
 CR0: 8005003b CR2: 00000004 CR3: 00bf3000 CR4: 00000690
 Stack:
  00000000 00000000 f147dda4 00000050 00000001 00000000 00000001 00000050
  00000001 00000000 d3059000 00000001 00000022 000000a8 00000000 00000000
  00000000 000000a1 00000000 00000000 00000001 00000000 00000000 11800000
 Call Trace:
  [<f923564d>] __btrfs_find_all_roots+0x9d/0xf0 [btrfs]
  [<f9237bb1>] btrfs_qgroup_rescan_worker+0x401/0x760 [btrfs]
  [<f9206148>] normal_work_helper+0xc8/0x270 [btrfs]
  [<c025e38b>] process_one_work+0x11b/0x390
  [<c025eea1>] worker_thread+0x101/0x340
  [<c026432b>] kthread+0x9b/0xb0
  [<c0712a71>] ret_from_kernel_thread+0x21/0x30
  [<c0264290>] kthread_create_on_node+0x110/0x110

This indicates a NULL corruption in the prefs_delayed list.  Further
investigation and bisection showed that the call to ulist_add_merge()
causes the corruption.

ulist_add_merge() takes a u64 as aux and writes a 64bit value into
old_aux.  The callers of this function in backref.c, however, pass a
pointer to a pointer as old_aux.  That is, the function writes a
64bit value over a 32bit pointer.  This put a NULL in the adjacent
variable, in this case prefs_delayed.

Here is a quick attempt to band-aid over this: a new function,
ulist_add_merge_ptr(), is introduced to properly pass and store a pointer
value instead of a u64.  There are still ugly void ** casts remaining
in the callers because void ** cannot be taken implicitly.  But it's
safer than explicit cast to u64, anyway.
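
The safe pattern can be sketched in userspace as follows. This is an assumption-laden illustration, not the btrfs implementation: `add_merge_u64` is a hypothetical single-slot stand-in for ulist_add_merge(), and `add_merge_ptr` plays the role of the new pointer wrapper. The key point is that the 64bit write lands in a real 64bit temporary, and only then is the value narrowed back into the caller's pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-slot "list": remembers one aux value. */
static uint64_t stored_aux;
static int have_entry;

/*
 * Stand-in for the u64 interface: returns 1 if the entry was added,
 * 0 if it already existed, in which case the old aux value is written
 * as a full 64bit quantity through old_aux.
 */
int add_merge_u64(uint64_t aux, uint64_t *old_aux)
{
    assert(old_aux != NULL);

    if (have_entry) {
        *old_aux = stored_aux;
        return 0;
    }
    stored_aux = aux;
    have_entry = 1;
    return 1;
}

/*
 * Pointer wrapper: never lets the 64bit store touch the caller's
 * pointer variable directly, so a 32bit pointer cannot be overrun.
 */
int add_merge_ptr(void *aux, void **old_aux)
{
    assert(old_aux != NULL);

    uint64_t old64 = (uintptr_t)*old_aux;
    int ret = add_merge_u64((uintptr_t)aux, &old64);

    *old_aux = (void *)(uintptr_t)old64;
    return ret;
}
```

Casting the caller's `void **` straight to `uint64_t *` instead would write 8 bytes into a 4 byte object on a 32bit build, which is the kind of overrun the commit describes.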

Bugzilla: https://bugzilla.novell.com/show_bug.cgi?id=887046
Cc: <stable@vger.kernel.org> [v3.11+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: fix compressed write corruption on enospc · ce62003f

Committed by Liu Bo 10 years ago

When we fail to allocate space for the whole compressed extent, we fall
back to uncompressed IO, but we forgot to redirty the pages that belong
to this compressed extent. These 'clean' pages simply skip the 'submit'
part and go to endio directly, so we end up with data corruption because
nothing is written.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Tested-by: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: correctly handle return from ulist_add · f90e579c

Committed by Mark Fasheh 10 years ago

ulist_add() can return '1' on success, which qgroup_subtree_accounting()
doesn't take into account. As a result, that value can be bubbled up to
callers, causing an error to be printed. Fix this by only returning the
value of ulist_add() when it indicates an error.
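
A small sketch of the return-value convention involved, using a hypothetical fixed-size set (`struct small_set`, `set_add`, `account_value`) in place of the real ulist. The add helper has three outcomes, and the caller forwards only the negative one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SET_CAP 4

struct small_set {
    unsigned long vals[SET_CAP];
    size_t count;
};

/* Returns 1 if val was added, 0 if already present, -ENOMEM if full. */
int set_add(struct small_set *set, unsigned long val)
{
    assert(set != NULL);

    for (size_t i = 0; i < set->count; i++) {
        if (set->vals[i] == val)
            return 0;
    }
    if (set->count == SET_CAP)
        return -ENOMEM;
    set->vals[set->count++] = val;
    return 1;
}

/* Caller: a positive "added" result is success, not an error code. */
int account_value(struct small_set *set, unsigned long val)
{
    int ret = set_add(set, val);

    if (ret < 0)
        return ret;
    return 0;
}
```

Returning `ret` unconditionally would leak the value 1 to callers that treat any non-zero result as a failure.
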
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: qgroup: account shared subtrees during snapshot delete · 1152651a

Committed by Mark Fasheh 10 years ago

During its tree walk, btrfs_drop_snapshot() will skip any shared
subtrees it encounters. This is incorrect when we have qgroups
turned on as those subtrees need to have their contents
accounted. In particular, the case we're concerned with is when
removing our snapshot root leaves the subtree with only one root
reference.

In those cases we need to find the last remaining root and add
each extent in the subtree to the corresponding qgroup exclusive
counts.

This patch implements the shared subtree walk and a new qgroup
operation, BTRFS_QGROUP_OPER_SUB_SUBTREE. When an operation of
this type is encountered during qgroup accounting, we search for
any root references to that extent and in the case that we find
only one reference left, we go ahead and do the math on its
exclusive counts.
Signed-off-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: read lock extent buffer while walking backrefs · 6f7ff6d7

Committed by Filipe Manana 10 years ago

Before processing the extent buffer, acquire a read lock on it, so
that we're safe against concurrent updates on the extent buffer.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

Btrfs: __btrfs_mod_ref should always use no_quota · e339a6b0

Committed by Josef Bacik 10 years ago

Previously I extended the no_quota arg to btrfs_dec/inc_ref because I didn't
understand how snapshot delete was using it and assumed that we needed the
quota operations there.  With Mark's work this has turned out not to be the
case: we _always_ need to use no_quota for btrfs_dec/inc_ref, so just drop the
argument and make __btrfs_mod_ref call its process function with no_quota
always set.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

btrfs: adjust statfs calculations according to raid profiles · ba7b6e62

Committed by David Sterba 10 years ago

This has been discussed in thread:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/32528

and this patch implements this proposal:
http://thread.gmane.org/gmane.comp.file-systems.btrfs/32536

Works fine for "clean" raid profiles where the raid factor correction
does the right job. Otherwise it's pessimistic and may show low space
although there's still some left.

The df numbers are slightly wrong in case of mixed block groups, but this
is not a major usecase and can be addressed later.

The RAID56 numbers are wrong almost the same way as before and will be
addressed separately.

CC: Hugo Mills <hugo@carfax.org.uk>
CC: cwillu <cwillu@cwillu.com>
CC: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

14 Aug 2014, 2 commits

locks: move locks_free_lock calls in do_fcntl_add_lease outside spinlock · 2dfb928f

Committed by Jeff Layton 10 years ago

There's no need to call locks_free_lock here while still holding the
i_lock. Defer that until the lock has been dropped.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

locks: defer freeing locks in locks_delete_lock until after i_lock has been dropped · ed9814d8

Committed by Jeff Layton 10 years ago

In commit 72f98e72 (locks: turn lock_flocks into a spinlock), we
moved from using the BKL to a global spinlock. With this change, we lost
the ability to block in the fl_release_private operation.

This is problematic for NFS (and probably some other filesystems as
well). Add a new list_head argument to locks_delete_lock. If that
argument is non-NULL, then queue any locks that we want to free to the
list instead of freeing them.

Then, add a new locks_dispose_list function that will walk such a list
and call locks_free_lock on them after the i_lock has been dropped.
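
The dispose-list pattern described above can be sketched in userspace as follows. All names here (`struct inode_sketch`, `delete_lock`, `dispose_list`, `remove_all_locks`) are hypothetical, and an integer flag stands in for the spinlock so the example stays single-threaded. Locks are unlinked while the "lock" is held, parked on a private list, and freed only after it has been dropped.

```c
#include <assert.h>
#include <stdlib.h>

struct lock {
    int id;
    struct lock *next;
};

struct inode_sketch {
    int i_lock_held;      /* stand-in for a held spinlock */
    struct lock *locks;   /* locks attached to this inode */
};

/* Counts frees done while the "lock" was held; should stay 0. */
static int freed_while_locked;

static void free_lock(const struct inode_sketch *inode, struct lock *fl)
{
    if (inode->i_lock_held)
        freed_while_locked++;
    free(fl);
}

struct lock *add_lock(struct inode_sketch *inode, int id)
{
    struct lock *fl = malloc(sizeof(*fl));

    assert(fl != NULL);
    fl->id = id;
    fl->next = inode->locks;
    inode->locks = fl;
    return fl;
}

/* Unlink *before; queue it on 'dispose' if given, else free right away. */
static void delete_lock(struct inode_sketch *inode, struct lock **before,
                        struct lock **dispose)
{
    assert(before != NULL && *before != NULL);
    struct lock *fl = *before;

    *before = fl->next;
    if (dispose != NULL) {
        fl->next = *dispose;
        *dispose = fl;
    } else {
        free_lock(inode, fl);
    }
}

/* Free everything queued earlier; call only after dropping the lock. */
static void dispose_list(const struct inode_sketch *inode,
                         struct lock **dispose)
{
    while (*dispose != NULL) {
        struct lock *fl = *dispose;

        *dispose = fl->next;
        free_lock(inode, fl);
    }
}

int remove_all_locks(struct inode_sketch *inode)
{
    struct lock *dispose = NULL;
    int removed = 0;

    inode->i_lock_held = 1;               /* "spin_lock" */
    while (inode->locks != NULL) {
        delete_lock(inode, &inode->locks, &dispose);
        removed++;
    }
    inode->i_lock_held = 0;               /* "spin_unlock" */

    dispose_list(inode, &dispose);        /* frees may now block safely */
    return removed;
}
```

Passing `NULL` as the dispose list keeps the old free-immediately behavior, which corresponds to the lease_modify exception mentioned in the commit message.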

Finally, change all of the callers of locks_delete_lock to pass in a
list_head, except for lease_modify. That function can be called long
after the i_lock has been acquired. Deferring the freeing of a lease
after unlocking it in that function is non-trivial until we overhaul
some of the spinlocking in the lease code.

Currently though, no filesystem that sets fl_release_private supports
leases, so this is not currently a problem. We'll eventually want to
make the same change in the lease code, but it needs a lot more work
before we can reasonably do so.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
