Commits · 1ccbad9f9f9bd36db26a10f0b17fbaf12b3ae93a · openanolis / cloud-kernel

24 Apr, 2015 5 commits

nfs: fix DIO good bytes calculation · 1ccbad9f

Authored by Peng Tao on Apr 09, 2015

For direct read that has IO size larger than rsize, we'll split
it into several READ requests and nfs_direct_good_bytes() would
count completed bytes incorrectly by eating last zero count reply.

Fix it by handling mirror and non-mirror cases differently such that
we only count mirrored writes differently.

This fixes 5fadeb47 ("nfs: count DIO good bytes correctly with mirroring").
Reported-by: Jean Spector <jean@primarydata.com>
Cc: <stable@vger.kernel.org> # v3.19+
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1ccbad9f

nfs: Fetch MOUNTED_ON_FILEID when updating an inode · ea96d1ec

Authored by Anna Schumaker on Apr 03, 2015

2ef47eb1 (NFS: Fix use of nfs_attr_use_mounted_on_fileid()) was a good
start to fixing a circular directory structure warning for NFS v4
"junctioned" mountpoints.  Unfortunately, further testing continued to
generate this error.

My server is configured like this:

anna@nfsd ~ % df
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       9.1G  2.0G  6.5G  24% /
/dev/vdc1      1014M   33M  982M   4% /exports
/dev/vdc2      1014M   33M  982M   4% /exports/vol1
/dev/vdc3      1014M   33M  982M   4% /exports/vol1/vol2

anna@nfsd ~ % cat /etc/exports
/exports/          *(rw,async,no_subtree_check,no_root_squash)
/exports/vol1/     *(rw,async,no_subtree_check,no_root_squash)
/exports/vol1/vol2 *(rw,async,no_subtree_check,no_root_squash)

I've been running chown across the entire mountpoint twice in a row to
hit this problem.  The first run succeeds, but the second one fails with
the circular directory warning along with:

anna@client ~ % dmesg
[Apr 3 14:28] NFS: server 192.168.100.204 error: fileid changed
              fsid 0:39: expected fileid 0x100080, got 0x80

Where 0x80 is the mountpoint's fileid and 0x100080 is the mounted-on
fileid.

This patch fixes the issue by requesting an updated mounted-on fileid
from the server during nfs_update_inode(), and then checking that the
fileid stored in the nfs_inode matches either the fileid or mounted-on
fileid returned by the server.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ea96d1ec
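
The check described in ea96d1ec can be sketched in userspace C as follows. This is only an illustration of the comparison, not the kernel code: `struct fattr_sketch` and `fileid_is_consistent()` are hypothetical names, and the real nfs_update_inode() works on kernel structures that are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the attributes in a server reply. */
struct fattr_sketch {
    uint64_t fileid;            /* fileid of the object itself */
    uint64_t mounted_on_fileid; /* fileid of the directory that is mounted on */
    bool has_mounted_on_fileid; /* did the server return the mounted-on fileid? */
};

/*
 * Return true when the fileid cached in the inode matches either the
 * fileid or the mounted-on fileid returned by the server.
 */
bool fileid_is_consistent(uint64_t cached_fileid,
                          const struct fattr_sketch *fattr)
{
    if (cached_fileid == fattr->fileid)
        return true;
    if (fattr->has_mounted_on_fileid &&
        cached_fileid == fattr->mounted_on_fileid)
        return true;
    return false;
}
```

With the values from the dmesg output above (cached 0x100080, server fileid 0x80, mounted-on fileid 0x100080), the sketch reports the inode as consistent instead of raising the "fileid changed" error.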

nfs: fix high load average due to callback thread sleeping · 5d05e54a

Authored by Jeff Layton on Mar 20, 2015

Chuck pointed out a problem that crept in with commit 6ffa30d3 (nfs:
don't call blocking operations while !TASK_RUNNING). Linux counts tasks
in uninterruptible sleep against the load average, so this caused the
system's load average to be pinned at at least 1 when there was a
NFSv4.1+ mount active.

Not a huge problem, but it's probably worth fixing before we get too
many complaints about it. This patch converts the code back to use
TASK_INTERRUPTIBLE sleep and simply has it flush any signals on each loop
iteration. In practice no one should really be signalling this thread at
all, so I think this is reasonably safe.

With this change, there's also no need to game the hung task watchdog so
we can also convert the schedule_timeout call back to a normal schedule.

Cc: <stable@vger.kernel.org>
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Fixes: commit 6ffa30d3 (“nfs: don't call blocking . . .”)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5d05e54a

NFS: Reduce time spent holding the i_mutex during fallocate() · f830f7dd

Authored by Anna Schumaker on Mar 16, 2015

At the very least, we should not be taking the i_mutex until after
checking if the server even supports ALLOCATE or DEALLOCATE, allowing
v4.0 or v4.1 to exit without potentially waiting on a lock.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

f830f7dd

NFS: Don't zap caches on fallocate() · 9a51940b

Authored by Anna Schumaker on Mar 16, 2015

This patch adds a GETATTR to the end of ALLOCATE and DEALLOCATE
operations so we can set the updated inode size and change attribute
directly.  DEALLOCATE will still need to release pagecache pages, so
nfs42_proc_deallocate() now calls truncate_pagecache_range() before
contacting the server.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9a51940b

28 Mar, 2015 16 commits

NFS: Block new writes while syncing data in nfs_getattr() · 8c18d76b

Authored by Trond Myklebust on Mar 25, 2015

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

8c18d76b

NFSv4.1/pnfs: Separate out metadata and data consistency for pNFS · 5bb89b47

Authored by Trond Myklebust on Mar 25, 2015

The LAYOUTCOMMIT operation means different things to different layout types.
For blocks and objects, it is both a data and metadata consistency operation.
For files and flexfiles, it is only a metadata consistency operation.

This patch separates out the 2 cases, allowing the files/flexfiles layout
drivers to optimise away the data consistency calls to layoutcommit.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5bb89b47

NFSv4.1/pnfs: Ensure we send layoutcommit before return-on-close · 7140171e

Authored by Trond Myklebust on Mar 25, 2015

We must not send a close or delegreturn that would result in a
return-on-close of the layout without ensuring that we've also
sent the necessary layoutcommit.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7140171e

NFSv4.1/pnfs: Ensure that writes respect the O_SYNC flag when doing O_DIRECT · a0815d55

Authored by Trond Myklebust on Mar 25, 2015

If the caller does not specify the O_SYNC flag, then it is legitimate
to return from O_DIRECT without doing a pNFS layoutcommit operation.
However if the file is opened O_DIRECT|O_SYNC then we'd better get it
right.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a0815d55

NFSv4: Truncating file opens should also sync O_DIRECT writes · 9e1681c2

Authored by Trond Myklebust on Mar 25, 2015

We don't just want to sync out buffered writes, but also O_DIRECT ones.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9e1681c2

NFS: File unlock needs to be a metadata synchronisation point · d9dabc1a

Authored by Trond Myklebust on Mar 26, 2015

File unlock needs to update both data and metadata on the NFS server
in order to act as a synchronisation point for other clients.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d9dabc1a

NFS: Add a helper to sync both O_DIRECT and buffered writes · 4d346bea

Authored by Trond Myklebust on Mar 25, 2015

Then apply it to nfs_setattr() and nfs_getattr().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4d346bea

NFSv4.1/pnfs: Refactor pnfs_set_layoutcommit() · 67af7611

Authored by Trond Myklebust on Mar 25, 2015

pnfs_set_layoutcommit() and pnfs_commit_set_layoutcommit() are 100% identical
except for the function arguments. Refactor to eliminate the difference.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

67af7611

NFSv4.1/pnfs: Fix setting of layoutcommit last write byte · 29559b11

Authored by Trond Myklebust on Mar 25, 2015

If the NFS_INO_LAYOUTCOMMIT flag was unset, then we _must_ ensure that
we also reset the last write byte (lwb) for that layout. The current
code depends on us clearing the lwb when we clear NFS_INO_LAYOUTCOMMIT,
which is not the case when we call pnfs_clear_layoutcommit().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

29559b11

NFSv4: Return the delegation before returning the layout in evict_inode() · 415320fc

Authored by Trond Myklebust on Mar 25, 2015

Minor optimisation for the case where the layout has return-on-close
enabled.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

415320fc

NFSv4: Allow tracing of NFSv4 fsync calls · 81b79afb

Authored by Trond Myklebust on Mar 25, 2015

I appear to have missed this when adding the ftrace probes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

81b79afb

NFS: Fix free_deveiceid -> free_deviceid · fc87701b

Authored by Trond Myklebust on Mar 09, 2015

Make it easier to grep for these functions by name.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

fc87701b

NFSv4.1: Don't cache deviceids that have no notifications · df52699e

Authored by Trond Myklebust on Mar 09, 2015

The spec says that once all layouts that reference a given deviceid
have been returned, then we are only allowed to continue to cache
the deviceid if the metadata server supports notifications.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

df52699e

NFSv4.1: Allow getdeviceinfo to return notification info back to caller · 4e590803

Authored by Trond Myklebust on Mar 09, 2015

We are only allowed to cache deviceinfo if the server supports notifications
and actually promises to call us back when changes occur. Right now, we
request those notifications, but then we don't check the server's reply.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4e590803

NFSv4.1: Cleanup - don't opencode nfs4_put_deviceid_node() · fb1458f4

Authored by Trond Myklebust on Mar 09, 2015

There really is no reason to do so.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

fb1458f4

NFSv4.1: Convert pNFS deviceid to use kfree_rcu() · 84a80f62

Authored by Trond Myklebust on Mar 09, 2015

Use of synchronize_rcu() when unmounting and potentially freeing a lot
of deviceids is problematic. There really is no reason why we can't just
use kfree_rcu() here.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

84a80f62

13 Mar, 2015 2 commits

nfs: clean up nfs_direct_IO · 2854475f

Authored by Peng Tao on Jan 21, 2015

This follows up "nfs: fix dio deadlock when O_DIRECT flag is flipped"
and removes the unnecessary CONFIG_NFS_SWAP switch.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2854475f

NFSv4: Append delegations to the per-client list instead of prepending · 38942ba2

Authored by Trond Myklebust on Mar 04, 2015

Do so on the assumption that for most use cases, that list will turn into
a more or less LRU-ordered list, and so the list traversals in
nfs_client_return_marked_delegations() are likely to be shorter before
hitting a candidate to return.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

38942ba2

06 Mar, 2015 3 commits

Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref. · dd9ef135

Authored by Quentin Casasnovas on Mar 03, 2015

Improper arithmetic when calculating the address of the extended ref could
lead to an out of bounds memory read and kernel panic.
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
cc: stable@vger.kernel.org # v3.7+
Signed-off-by: Chris Mason <clm@fb.com>

dd9ef135

Btrfs: fix data loss in the fast fsync path · 3a8b36f3

Authored by Filipe Manana on Mar 01, 2015

When using the fast file fsync code path we can miss the fact that new
writes happened since the last file fsync and therefore return without
waiting for the IO to finish and write the new extents to the fsync log.

Here's an example scenario where the fsync will miss the fact that new
file data exists that wasn't yet durably persisted:

1. fs_info->last_trans_committed == N - 1 and current transaction is
   transaction N (fs_info->generation == N);

2. do a buffered write;

3. fsync our inode, this clears our inode's full sync flag, starts
   an ordered extent and waits for it to complete - when it completes
   at btrfs_finish_ordered_io(), the inode's last_trans is set to the
   value N (via btrfs_update_inode_fallback -> btrfs_update_inode ->
   btrfs_set_inode_last_trans);

4. transaction N is committed, so fs_info->last_trans_committed is now
   set to the value N and fs_info->generation remains with the value N;

5. do another buffered write, when this happens btrfs_file_write_iter
   sets our inode's last_trans to the value N + 1 (that is
   fs_info->generation + 1 == N + 1);

6. transaction N + 1 is started and fs_info->generation now has the
   value N + 1;

7. transaction N + 1 is committed, so fs_info->last_trans_committed
   is set to the value N + 1;

8. fsync our inode - because it doesn't have the full sync flag set,
   we only start the ordered extent, we don't wait for it to complete
   (only in a later phase) therefore its last_trans field has the
   value N + 1 set previously by btrfs_file_write_iter(), and so we
   have:

       inode->last_trans <= fs_info->last_trans_committed
           (N + 1)              (N + 1)

   Which made us not log the last buffered write and exit the fsync
   handler immediately, returning success (0) to user space and resulting
   in data loss after a crash.

This can actually be triggered deterministically and the following excerpt
from a testcase I made for xfstests triggers the issue. It moves a dummy
file across directories and then fsyncs the old parent directory - this
is just to trigger a transaction commit, so moving files around isn't
directly related to the issue but it was chosen because running 'sync' for
example does more than just committing the current transaction, as it
flushes/waits for all file data to be persisted. The issue can also happen
at random periods, since the transaction kthread periodically commits the
current transaction (about every 30 seconds by default).
The body of the test is:

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create our main test file 'foo', the one we check for data loss.
  # By doing an fsync against our file, it makes btrfs clear the 'needs_full_sync'
  # bit from its flags (btrfs inode specific flags).
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 8K" \
                  -c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io

  # Now create one other file and 2 directories. We will move this second file
  # from one directory to the other later because it forces btrfs to commit its
  # currently open transaction if we fsync the old parent directory. This is
  # necessary to trigger the data loss bug that affected btrfs.
  mkdir $SCRATCH_MNT/testdir_1
  touch $SCRATCH_MNT/testdir_1/bar
  mkdir $SCRATCH_MNT/testdir_2

  # Make sure everything is durably persisted.
  sync

  # Write more 8Kb of data to our file.
  $XFS_IO_PROG -c "pwrite -S 0xbb 8K 8K" $SCRATCH_MNT/foo | _filter_xfs_io

  # Move our 'bar' file into a new directory.
  mv $SCRATCH_MNT/testdir_1/bar $SCRATCH_MNT/testdir_2/bar

  # Fsync our first directory. Because it had a file moved into some other
  # directory, this made btrfs commit the currently open transaction. This is
  # a condition necessary to trigger the data loss bug.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir_1

  # Now fsync our main test file. If the fsync succeeds, we expect the 8Kb of
  # data we wrote previously to be persisted and available if a crash happens.
  # This did not happen with btrfs, because of the transaction commit that
  # happened when we fsynced the parent directory.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

  # Simulate a crash/power loss.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # Now check that all data we wrote before are available.
  echo "File content after log replay:"
  od -t x1 $SCRATCH_MNT/foo

  status=0
  exit

The expected golden output for the test, which is what we get with this
fix applied (or when running against ext3/4 and xfs), is:

  wrote 8192/8192 bytes at offset 0
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  wrote 8192/8192 bytes at offset 8192
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  File content after log replay:
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0020000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
  *
  0040000

Without this fix applied, the output shows the test file does not have
the second 8Kb extent that we successfully fsynced:

  wrote 8192/8192 bytes at offset 0
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  wrote 8192/8192 bytes at offset 8192
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  File content after log replay:
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0020000

So fix this by skipping the fsync only if we're doing a full sync and
if the inode's last_trans is <= fs_info->last_trans_committed, or if
the inode is already in the log. Also remove setting the inode's
last_trans in btrfs_file_write_iter since it's useless/unreliable.

Also because btrfs_file_write_iter no longer sets inode->last_trans to
fs_info->generation + 1, don't set last_trans to 0 if we bail out and don't
bail out if last_trans is 0, otherwise something as simple as the following
example wouldn't log the second write on the last fsync:

  1. write to file

  2. fsync file

  3. fsync file
       |--> btrfs_inode_in_log() returns true and it set last_trans to 0

  4. write to file
       |--> btrfs_file_write_iter() no longer sets last_trans, so it
            remained with a value of 0
  5. fsync
       |--> inode->last_trans == 0, so it bails out without logging the
            second write

A test case for xfstests will be sent soon.

CC: <stable@vger.kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

3a8b36f3
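
The skip condition that 3a8b36f3 describes can be modelled in a few lines of userspace C. This is a sketch based only on the commit message: `struct fsync_state_sketch` and both function names are hypothetical, and the "before" predicate is inferred from the description of step 8 rather than copied from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the values the fsync path looks at. */
struct fsync_state_sketch {
    uint64_t inode_last_trans;     /* inode->last_trans */
    uint64_t last_trans_committed; /* fs_info->last_trans_committed */
    bool inode_in_log;             /* inode already in the fsync log */
    bool full_sync;                /* inode has the full sync flag set */
};

/* Behaviour described for the bug: skip whenever last_trans looks committed. */
bool skip_fsync_before_fix(const struct fsync_state_sketch *s)
{
    return s->inode_in_log ||
           s->inode_last_trans <= s->last_trans_committed;
}

/* Behaviour described for the fix: trust last_trans only on a full sync. */
bool skip_fsync_after_fix(const struct fsync_state_sketch *s)
{
    return s->inode_in_log ||
           (s->full_sync &&
            s->inode_last_trans <= s->last_trans_committed);
}
```

Plugging in step 8 of the scenario (last_trans and last_trans_committed both N + 1, full sync flag clear, inode not in the log), the first predicate skips the fsync and loses the write, while the second one does not skip.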

Btrfs: remove extra run_delayed_refs in update_cowonly_root · f5c0a122

Authored by Josef Bacik on Mar 02, 2015

This got added with my dirty_bgs patch, it's not needed.  Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>

f5c0a122

05 Mar, 2015 1 commit

locks: fix fasync_struct memory leak in lease upgrade/downgrade handling · 0164bf02

Authored by Jeff Layton on Mar 04, 2015

Commit 8634b51f (locks: convert lease handling to file_lock_context)
introduced a regression in the handling of lease upgrade/downgrades.

In the event that we already have a lease on a file and are going to
either upgrade or downgrade it, we skip doing any list insertion or
deletion and simply re-call lm_setup on the existing lease.

As of commit 8634b51f however, we end up calling lm_setup on the
lease that was passed in, instead of on the existing lease. This causes
us to leak the fasync_struct that was allocated in the event that there
was not already an existing one (as it always appeared that there
wasn't one).

Fixes: 8634b51f (locks: convert lease handling to file_lock_context)
Reported-and-Tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>

0164bf02

04 Mar, 2015 4 commits

NFSv4.1: Clear the old state by our client id before establishing a new lease · e11259f9

Authored by Trond Myklebust on Mar 03, 2015

If the call to exchange-id returns with the EXCHGID4_FLAG_CONFIRMED_R flag
set, then that means our lease was established by a previous mount instance.
Ensure that we detect this situation, and that we clear the state held by
that mount.
Reported-by: Jorge Mora <Jorge.Mora@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e11259f9

NFSv4: Fix a race in NFSv4.1 server trunking discovery · 48d66b97

Authored by Trond Myklebust on Mar 03, 2015

We do not want to allow a race with another NFS mount to cause
nfs41_walk_client_list() to establish a lease on our nfs_client before
we're done checking for trunking.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

48d66b97

NFS: Don't write enable new pages while an invalidation is proceeding · ef070dcb

Authored by Trond Myklebust on Mar 03, 2015

nfs_vm_page_mkwrite() should wait until the page cache invalidation
is finished. This is the second patch in a 2 patch series to deprecate
the NFS client's reliance on nfs_release_page() in the context of
nfs_invalidate_mapping().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ef070dcb

NFS: Fix a regression in the read() syscall · 874f9463

Authored by Trond Myklebust on Mar 02, 2015

When invalidating the page cache for a regular file, we want to first
sync all dirty data to disk and then call invalidate_inode_pages2().
The latter relies on nfs_launder_page() and nfs_release_page() to deal
respectively with dirty pages, and unstable written pages.

When commit 95905446 ("NFS: avoid deadlocks with loop-back mounted
NFS filesystems.") changed the behaviour of nfs_release_page(), then it
made it possible for invalidate_inode_pages2() to fail with an EBUSY.
Unfortunately, that error is then propagated back to read().

Let's therefore work around the problem for now by protecting the call
to sync the data and invalidate_inode_pages2() so that they are atomic
w.r.t. the addition of new writes.
Later on, we can revisit whether or not we still need nfs_launder_page()
and nfs_release_page().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

874f9463

03 Mar, 2015 9 commits

eCryptfs: don't pass fs-specific ioctl commands through · 6d65261a

Authored by Tyler Hicks on Feb 24, 2015

eCryptfs can't be aware of what to expect after passing an
arbitrary ioctl command through to the lower filesystem. The ioctl
command may trigger an action in the lower filesystem that is
incompatible with eCryptfs.

One specific example is when one attempts to use the Btrfs clone
ioctl command when the source file is in the Btrfs filesystem that
eCryptfs is mounted on top of and the destination fd is from a new file
created in the eCryptfs mount. The ioctl syscall incorrectly returns
success because the command is passed down to Btrfs which thinks that it
was able to do the clone operation. However, the result is an empty
eCryptfs file.

This patch allows the trim, {g,s}etflags, and {g,s}etversion ioctl
commands through and then copies up the inode metadata from the lower
inode to the eCryptfs inode to catch any changes made to the lower
inode's metadata. Those five ioctl commands are mostly common across all
filesystems but the whitelist may need to be further pruned in the
future.

https://bugzilla.kernel.org/show_bug.cgi?id=93691
https://launchpad.net/bugs/1305335
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Cc: Rocko <rockorequin@hotmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: stable@vger.kernel.org # v2.6.36+: c43f7b8f eCryptfs: Handle ioctl calls with unlocked and compat functions

6d65261a
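
The whitelist described in 6d65261a can be pictured as a simple switch over the ioctl command. The sketch below is a userspace illustration that assumes the Linux UAPI header `<linux/fs.h>` is available; `ioctl_passes_through()` is a hypothetical name, and the sketch covers neither the compat variants nor the metadata copy-up that the patch performs afterwards.

```c
#include <stdbool.h>
#include <linux/fs.h> /* FITRIM, FS_IOC_{GET,SET}FLAGS, FS_IOC_{GET,SET}VERSION */

/*
 * Return true only for the five commands the commit message names:
 * trim, {g,s}etflags and {g,s}etversion. Everything else is rejected
 * instead of being handed to the lower filesystem.
 */
bool ioctl_passes_through(unsigned int cmd)
{
    switch (cmd) {
    case FITRIM:
    case FS_IOC_GETFLAGS:
    case FS_IOC_SETFLAGS:
    case FS_IOC_GETVERSION:
    case FS_IOC_SETVERSION:
        return true;
    default:
        return false;
    }
}
```

A filesystem-specific command such as the Btrfs clone ioctl falls into the default branch, so it can no longer appear to succeed while leaving an empty eCryptfs file behind.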

NFSv4: Ensure we skip delegations that are already being returned · ec3ca4e5

Authored by Trond Myklebust on Feb 26, 2015

In nfs_client_return_marked_delegations() and nfs_delegation_reap_unclaimed()
we want to optimise the loop traversal by skipping delegations that are
already in the process of being returned.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ec3ca4e5

NFSv4: Pin the superblock while we're returning the delegation · 9f0f8e12

Authored by Trond Myklebust on Feb 26, 2015

This patch ensures that the superblock doesn't go ahead and disappear
underneath us while the state manager thread is returning delegations.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9f0f8e12

NFSv4: Ensure we honour NFS_DELEGATION_RETURNING in nfs_inode_set_delegation() · ade04647

Authored by Trond Myklebust on Feb 27, 2015

Ensure that nfs_inode_set_delegation() doesn't inadvertently detach a
delegation that is already in the process of being returned.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ade04647

NFSv4: Ensure that we don't reap a delegation that is being returned · b04b22f4

Authored by Trond Myklebust on Feb 26, 2015

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b04b22f4

NFS: Fix stateid used for NFS v4 closes · 369d6b7f

Authored by Anna Schumaker on Mar 02, 2015

After 566fcec6 the client uses the "current stateid" from the
nfs4_state structure to close a file.  This could potentially contain a
delegation stateid, which is disallowed by the protocol and causes
servers to return NFS4ERR_BAD_STATEID.  This patch restores the
(correct) behavior of sending the open stateid to close a file.
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: 566fcec6 (NFSv4: Fix an atomicity problem in CLOSE)
Signed-off-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

369d6b7f

Btrfs: incremental send, don't rename a directory too soon · 84471e24

Authored by Filipe Manana on Feb 28, 2015

There's one more case where we can't issue a rename operation for a
directory as soon as we process it. We used to delay directory renames
only if they have some ancestor directory with a higher inode number
that got renamed too, but there's another case where we need to delay
the rename too - when a directory A is renamed to the old name of a
directory B but that directory B has its rename delayed because it
has now (in the send root) an ancestor with a higher inode number that
was renamed. If we don't delay the directory rename in this case, the
receiving end of the send stream will attempt to rename A to the old
name of B before B got renamed to its new name, which results in a
"directory not empty" error. So fix this by delaying directory renames
for this case too.

Steps to reproduce:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ mkdir /mnt/a
  $ mkdir /mnt/b
  $ mkdir /mnt/c
  $ touch /mnt/a/file

  $ btrfs subvolume snapshot -r /mnt /mnt/snap1

  $ mv /mnt/c /mnt/x
  $ mv /mnt/a /mnt/x/y
  $ mv /mnt/b /mnt/a

  $ btrfs subvolume snapshot -r /mnt /mnt/snap2

  $ btrfs send /mnt/snap1 -f /tmp/1.send
  $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.send

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt2
  $ btrfs receive /mnt2 -f /tmp/1.send
  $ btrfs receive /mnt2 -f /tmp/2.send
  ERROR: rename b -> a failed. Directory not empty

A test case for xfstests follows soon.
Reported-by: Ames Cornish <ames@cornishes.net>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

84471e24

btrfs: fix lost return value due to variable shadowing · 1932b7be

Authored by David Sterba on Feb 24, 2015

A block-local variable stores error code but btrfs_get_blocks_direct may
not return it in the end as there's a ret defined in the function scope.

CC: <stable@vger.kernel.org> # 3.6+
Fixes: d187663e ("Btrfs: lock extents as we map them in DIO")
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

1932b7be
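
The class of bug fixed by 1932b7be is easy to reproduce outside the kernel. The sketch below is a generic illustration of variable shadowing, not the btrfs_get_blocks_direct() code; all function names are hypothetical.

```c
#include <errno.h>

/* Hypothetical step that always fails, standing in for any call that can error. */
int failing_step(void)
{
    return -EIO;
}

/* Buggy shape: the inner declaration shadows the function-scope ret. */
int get_blocks_shadowed(void)
{
    int ret = 0;

    {
        int ret = failing_step(); /* new variable, hides the outer ret */
        if (ret)
            goto out;
    }
out:
    return ret; /* outer ret is still 0, so the error is lost */
}

/* Fixed shape: assign to the function-scope variable instead. */
int get_blocks_fixed(void)
{
    int ret = 0;

    {
        ret = failing_step();
        if (ret)
            goto out;
    }
out:
    return ret; /* propagates -EIO */
}
```

Building with `-Wshadow` makes GCC and Clang warn about the inner declaration in the first function, which is one way to catch this pattern early.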

Btrfs: do not ignore errors from btrfs_lookup_xattr in do_setxattr · 5cdf83ed

Authored by Filipe Manana on Feb 23, 2015

The return value from btrfs_lookup_xattr() can be a pointer encoding an
error, therefore deal with it. This fixes commit 5f5bc6b1
("Btrfs: make xattr replace operations atomic").
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

5cdf83ed
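
For readers unfamiliar with pointers that encode an error, the idea behind 5cdf83ed can be sketched in userspace C. The helpers below imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention under different names; `lookup_sketch()` and `setxattr_sketch()` are hypothetical, and returning -ENODATA for a missing item is an assumption made for the illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value in a pointer. */
void *err_ptr_sketch(long error)
{
    return (void *)(intptr_t)error;
}

/* A pointer in the top MAX_ERRNO_SKETCH addresses encodes an error. */
int is_err_sketch(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Decode the errno value from an error pointer. */
long ptr_err_sketch(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static int dummy_item;

/* Hypothetical lookup: negative mode fails, 0 finds nothing, else finds an item. */
void *lookup_sketch(int mode)
{
    if (mode < 0)
        return err_ptr_sketch(mode);
    if (mode == 0)
        return NULL;
    return &dummy_item;
}

/* The caller must test for an encoded error before testing for NULL. */
int setxattr_sketch(int mode)
{
    void *di = lookup_sketch(mode);

    if (is_err_sketch(di))
        return (int)ptr_err_sketch(di);
    if (!di)
        return -ENODATA;
    return 0;
}
```

Without the `is_err_sketch()` test, an error pointer is non-NULL and would be treated as a valid item, which is the kind of mistake the commit addresses.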

openanolis / cloud-kernel synced successfully almost 2 years ago