Commits · a6b6d5b85abf4914bbceade5dddd54c345c64136 · openanolis / cloud-kernel

15 Aug, 2017 22 commits

NFS: Use an atomic_long_t to count the number of requests · a6b6d5b8

Committed by Trond Myklebust on Aug 01, 2017

Rather than forcing us to take the inode->i_lock just in order to bump
the number.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a6b6d5b8

NFSv4: Use a mutex to protect the per-inode commit lists · e824f99a

Committed by Trond Myklebust on Aug 01, 2017

The commit lists can get very large, so using the inode->i_lock can
end up affecting general metadata performance.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e824f99a

NFS: Refactor nfs_page_find_head_request() · b30d2f04

Committed by Trond Myklebust on Aug 01, 2017

Split out the 2 cases so that we can treat the locking differently.
The issue is that the locking in the pageswapcache cache is highly
linked to the commit list locking.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b30d2f04

NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request() · bd37d6fc

Committed by Trond Myklebust on Aug 01, 2017

Hide the locking from nfs_lock_and_join_requests() so that we can
separate out the requirements for swapcache pages.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

bd37d6fc

NFS: Fix up nfs_page_group_covers_page() · 7e8a30f8

Committed by Trond Myklebust on Jul 17, 2017

Fix up the test in nfs_page_group_covers_page(). The simplest implementation
is to check that we have a set of intersecting or contiguous subrequests
that connect page offset 0 to nfs_page_length(req->wb_page).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7e8a30f8
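
The coverage test described in 7e8a30f8 can be illustrated with a small interval check. This is only a sketch in Python, not the kernel's C implementation: the function name `group_covers_page` and the `(offset, length)` tuple representation of subrequests are assumptions made for illustration.

```python
def group_covers_page(subrequests, page_length):
    """Return True if the (offset, length) ranges connect offset 0 to page_length.

    Hypothetical model: each subrequest is an (offset, length) tuple within
    one page. Ranges may overlap or touch, but must leave no gap.
    """
    covered_to = 0  # everything in [0, covered_to) is known to be covered
    for offset, length in sorted(subrequests):
        if offset > covered_to:
            return False  # gap between the covered prefix and this range
        covered_to = max(covered_to, offset + length)
        if covered_to >= page_length:
            return True
    return covered_to >= page_length
```

For example, two contiguous 2048-byte subrequests cover a 4096-byte page, while a group with a hole between bytes 1024 and 2048 does not.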

NFS: Remove unused parameter from nfs_page_group_lock() · 1344b7ea

Committed by Trond Myklebust on Jul 17, 2017

nfs_page_group_lock() is now always called with the 'nonblock'
parameter set to 'false'.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1344b7ea

NFS: Remove unused function nfs_page_group_lock_wait() · dee83046

Committed by Trond Myklebust on Jul 17, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

dee83046

NFS: Remove nfs_page_group_clear_bits() · 902a4c00

Committed by Trond Myklebust on Jul 19, 2017

At this point, we only expect ever to potentially see PG_REMOVE and
PG_TEARDOWN being set on the subrequests.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

902a4c00

NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases · 5b2b5187

Committed by Trond Myklebust on Jul 19, 2017

Since nfs_page_group_destroy() does not take any locks on the requests
to be freed, we need to ensure that we don't inadvertently free the
request in nfs_destroy_unlinked_subrequests() while the last reference
is being released elsewhere.

Do this by:

1) Taking a reference to the request unless it is already being freed
2) Checking (under the page group lock) if PG_TEARDOWN is already set before
   freeing an unreferenced request in nfs_destroy_unlinked_subrequests()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5b2b5187

NFS: Further optimise nfs_lock_and_join_requests() · 74a6d4b5

Committed by Trond Myklebust on Jul 19, 2017

When locking the entire group in order to remove subrequests,
the locks are always taken in order, and with the page group
lock being taken after the page head is locked. The intention
is that:

1) The lock on the group head guarantees that requests may not
   be removed from the group (although new entries could be appended
   if we're not holding the group lock).
2) It is safe to drop and retake the page group lock while iterating
   through the list, in particular when waiting for a subrequest lock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

74a6d4b5

NFS: Reduce inode->i_lock contention in nfs_lock_and_join_requests() · b5bab9bf

Committed by Trond Myklebust on Jul 17, 2017

We should no longer need the inode->i_lock, now that we've
straightened out the request locking. The locking schema is now:

1) Lock page head request
2) Lock the page group
3) Lock the subrequests one by one

Note that there is a subtle race with nfs_inode_remove_request() due
to the fact that the latter does not lock the page head, when removing
it from the struct page. Only the last subrequest is locked, hence
we need to re-check that the PagePrivate(page) is still set after
we've locked all the subrequests.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b5bab9bf

NFS: Remove page group limit in nfs_flush_incompatible() · 7e6cca6c

Committed by Trond Myklebust on Jul 17, 2017

nfs_try_to_update_request() should be able to cope now.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7e6cca6c

NFS: Teach nfs_try_to_update_request() to deal with request page_groups · f6032f21

Committed by Trond Myklebust on Jul 17, 2017

Simplify the code, and avoid some flushes to disk.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

f6032f21

NFS: Fix the inode request accounting when pages have subrequests · b66aaa8d

Committed by Trond Myklebust on Jul 18, 2017

Both nfs_destroy_unlinked_subrequests() and nfs_lock_and_join_requests()
manipulate the inode flags adjusting the NFS_I(inode)->nrequests.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b66aaa8d

NFS: Don't unlock writebacks before declaring PG_WB_END · 31a01f09

Committed by Trond Myklebust on Jul 18, 2017

We don't want nfs_lock_and_join_requests() to start fiddling with
the request before the call to nfs_page_group_sync_on_bit().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

31a01f09

NFS: Don't check request offset and size without holding a lock · e14bebf6

Committed by Trond Myklebust on Jul 17, 2017

Request offsets and sizes are not guaranteed to be stable unless you
are holding the request locked.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e14bebf6

NFS: Fix an ABBA issue in nfs_lock_and_join_requests() · a0e265bc

Committed by Trond Myklebust on Jul 17, 2017

All other callers of nfs_page_group_lock() appear to already hold the
page lock on the head page, so doing it in the opposite order here
is inefficient, although not deadlock prone since we roll back all
locks on contention.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a0e265bc
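
The remark in a0e265bc that an inconsistent lock order is "not deadlock prone since we roll back all locks on contention" can be illustrated with a generic try-lock pattern. The sketch below uses Python's `threading.Lock` purely as a stand-in for the kernel's request and page group locks; `try_lock_all` is a hypothetical helper, not a kernel function.

```python
import threading


def try_lock_all(locks):
    """Try to take every lock in order; on contention, release all and fail.

    Because no lock is ever held while blocking on another one, two callers
    that take the locks in opposite (ABBA) order cannot deadlock. They can,
    however, waste work by repeatedly rolling back, which is why a
    consistent order is still preferable.
    """
    held = []
    for lock in locks:
        if not lock.acquire(blocking=False):
            # Contention: undo everything taken so far, in reverse order.
            for taken in reversed(held):
                taken.release()
            return False
        held.append(lock)
    return True
```

A caller would typically retry after a failed attempt, for example after waiting for the contended lock to be released.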

NFS: Fix a reference and lock leak in nfs_lock_and_join_requests() · 7cb9cd9a

Committed by Trond Myklebust on Jul 17, 2017

Yes, this is a situation that should never happen (hence the WARN_ON)
but we should still ensure that we free up the locks and references to
the faulty pages.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7cb9cd9a

NFS: Ensure we always dereference the page head last · 08fead2a

Committed by Trond Myklebust on Jul 18, 2017

This fixes a race with nfs_page_group_sync_on_bit() whereby the
call to wake_up_bit() in nfs_page_group_unlock() could occur after
the page header had been freed.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

08fead2a

NFS: Reduce lock contention in nfs_try_to_update_request() · 1403390d

Committed by Trond Myklebust on Jul 17, 2017

Micro-optimisation to move the lockless check into the for(;;) loop.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1403390d

NFS: Reduce lock contention in nfs_page_find_head_request() · 82749dd4

Committed by Trond Myklebust on Jul 17, 2017

Add a lockless check for whether or not the page might be carrying
an existing writeback before we grab the inode->i_lock.
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

82749dd4

NFS: Simplify page writeback · 6d17d653

Committed by Trond Myklebust on Jul 09, 2017

We don't expect the page header lock to ever be held across I/O, so
it should always be safe to wait for it, even if we're doing nonblocking
writebacks.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

6d17d653

12 Aug, 2017 1 commit

pnfs/blocklayout: require 64-bit sector_t · 8a9d6e96

Committed by Christoph Hellwig on Aug 05, 2017

The blocklayout code does not compile cleanly for a 32-bit sector_t,
and also has no reliable checks for device sizes, which makes it
unsafe to use with a kernel that doesn't support large block devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5c83746a ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

8a9d6e96

10 Aug, 2017 1 commit

NFSv4: Ignore NFS4ERR_OLD_STATEID in nfs41_check_open_stateid() · c0ca0e59

Committed by Trond Myklebust on Aug 08, 2017

If the call to TEST_STATEID returns NFS4ERR_OLD_STATEID, then it just
means we raced with other calls to OPEN.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

c0ca0e59

09 Aug, 2017 1 commit

nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays · 1feb2616

Committed by Weston Andros Adamson on Aug 01, 2017

The client was freeing the nfs4_ff_layout_ds, but not the contained
nfs4_ff_ds_version array.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

1feb2616

02 Aug, 2017 2 commits

NFSv4: Fix double frees in nfs4_test_session_trunk() · d9cb7330

Committed by Trond Myklebust on Aug 01, 2017

rpc_clnt_add_xprt() expects the callback function to be synchronous, and
expects to release the transport and switch references itself.

Fixes: 04fa2c6b ("NFS pnfs data server multipath session trunking")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

d9cb7330

NFSv4: Fix EXCHANGE_ID corrupt verifier issue · fd40559c

Committed by Trond Myklebust on Aug 01, 2017

The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was
changed to be asynchronous by commit 8d89bd70. If we interrupt
the call to rpc_wait_for_completion_task(), we can therefore end up
transmitting random stack contents in lieu of the verifier.

Fixes: 8d89bd70 ("NFS setup async exchange_id")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

fd40559c

29 Jul, 2017 1 commit

NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter · b7dbcc0e

Committed by Benjamin Coddington on Jul 28, 2017

nfs4_retry_setlk() sets the task's state to TASK_INTERRUPTIBLE within the
same region protected by the wait_queue's lock after checking for a
notification from CB_NOTIFY_LOCK callback. However, after releasing that
lock, a wakeup for that task may race in before the call to
freezable_schedule_timeout_interruptible() and set TASK_WAKING, then
freezable_schedule_timeout_interruptible() will set the state back to
TASK_INTERRUPTIBLE before the task will sleep. The result is that the task
will sleep for the entire duration of the timeout.

Since we've already set TASK_INTERRUPTIBLE in the locked section, just use
freezable_schedule_timeout() instead.

Fixes: a1d617d8 ("nfs: allow blocking locks to be awoken by lock callbacks")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b7dbcc0e

27 Jul, 2017 3 commits

NFS: Optimize fallocate by refreshing mapping when needed. · 6ba80d43

Committed by NeilBrown on Jul 24, 2017

posix_fallocate() will allocate space in an NFS file by considering
the last byte of every 4K block. If it is before EOF, it will read
the byte and if it is zero, a zero is written out. If it is after EOF,
the zero is unconditionally written.

For the blocks beyond EOF, if NFS believes its cache is valid, it will
expand these writes to write full pages, and then will merge the pages.
This results in (typically) 1MB writes. If NFS believes its cache is
not valid (particularly if NFS_INO_INVALID_DATA or
NFS_INO_REVAL_PAGECACHE are set - see nfs_write_pageuptodate()), it will
send the individual 1-byte writes. This results in (typically) 256 times
as many RPC requests, and can be substantially slower.

Currently nfs_revalidate_mapping() is only used when reading a file or
mmapping a file, as these are times when the content needs to be
up-to-date. Writes don't generally need the cache to be up-to-date, but
writes beyond EOF can benefit, particularly in the posix_fallocate()
case.

So this patch calls nfs_revalidate_mapping() when writing beyond EOF -
i.e. when there is a gap between the end of the file and the start of
the write. If the cache is thought to be out of date (as happens after
taking a file lock), this will cause a GETATTR, and the two flags
mentioned above will be cleared. With this, posix_fallocate() on a
newly locked file does not generate excessive tiny writes.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

6ba80d43
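
The "256 times as many RPC requests" figure in 6ba80d43 follows from the ratio between the merged write size and the 4K block size. The following is a back-of-the-envelope sketch only: the function name is hypothetical, the 1 MiB merged write size is the "typical" value quoted in the commit message, and the model ignores the read-and-compare path used for blocks before EOF.

```python
BLOCK_SIZE = 4 * 1024            # posix_fallocate() touches one byte per 4K block
MERGED_WRITE_SIZE = 1024 * 1024  # assumed: the "typically 1MB" merged write


def write_rpc_counts(region_bytes):
    """Return (merged_rpcs, single_byte_rpcs) for a region beyond EOF.

    merged_rpcs: writes expanded to full pages and merged (cache valid).
    single_byte_rpcs: one 1-byte write per 4K block (cache not valid).
    """
    merged = -(-region_bytes // MERGED_WRITE_SIZE)  # ceiling division
    single = -(-region_bytes // BLOCK_SIZE)
    return merged, single
```

Under these assumptions, preallocating 64 MiB takes 64 merged writes but 16384 single-byte writes, a factor of 256.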

NFS: invalidate file size when taking a lock. · 442ce049

Committed by NeilBrown on Jul 24, 2017

Prior to commit ca0daa27 ("NFS: Cache aggressively when file is open
for writing"), NFS would revalidate, or invalidate, the file size when
taking a lock.  Since that commit it only invalidates the file content.

If the file size is changed on the server while waiting for the lock, the
client will have an incorrect understanding of the file size and could
corrupt data.  This particularly happens when writing beyond the
(supposed) end of file and can easily be demonstrated with
posix_fallocate().

If an application opens an empty file, waits for a write lock, and then
calls posix_fallocate(), glibc will determine that the underlying
filesystem doesn't support fallocate (assuming version 4.1 or earlier)
and will write out a '0' byte at the end of each 4K page in the region
being fallocated that is after the end of the file.
NFS will (usually) detect that these writes are beyond EOF and will
expand them to cover the whole page, and then will merge the pages.
Consequently, NFS will write out large blocks of zeroes beyond where it
thought EOF was.  If EOF had moved, the pre-existing part of the file
will be over-written.  Locking should have protected against this,
but it doesn't.

This patch restores the use of nfs_zap_caches() which invalidated the
cached attributes.  When posix_fallocate() asks for the file size, the
request will go to the server and get a correct answer.

cc: stable@vger.kernel.org (v4.8+)
Fixes: ca0daa27 ("NFS: Cache aggressively when file is open for writing")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

442ce049

NFS: Use raw NFS access mask in nfs4_opendata_access() · 1e6f2095

Committed by Anna Schumaker on Jul 25, 2017

Commit bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
access cache") changed how the access results are stored after an
access() call.  An NFS v4 OPEN might have access bits returned with the
opendata, so we should use the NFS4_ACCESS values when determining the
return value in nfs4_opendata_access().

Fixes: bd8b2441 ("NFS: Store the raw NFS access mask in the inode's
access cache")
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Tested-by: Takashi Iwai <tiwai@suse.de>

1e6f2095

22 Jul, 2017 1 commit

NFS/filelayout: Fix racy setting of fl->dsaddr in filelayout_check_deviceid() · 1ebf9801

Committed by Trond Myklebust on Jul 20, 2017

We must set fl->dsaddr once, and once only, even if there are multiple
processes calling filelayout_check_deviceid() for the same layout
segment.
Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

1ebf9801

21 Jul, 2017 5 commits

NFS: Be more careful about mapping file permissions · ecbb903c

Committed by Trond Myklebust on Jul 11, 2017

When mapping a directory, we want the MAY_WRITE permissions to reflect
whether or not we have permission to modify, add and delete the directory
entries. MAY_EXEC must map to lookup permissions.

On the other hand, for files, we want MAY_WRITE to reflect a permission
to modify and extend the file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

ecbb903c
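
The mapping described in ecbb903c can be sketched as a small function. This is a Python illustration, not the kernel code: `access_to_may_mask` is a hypothetical name, the ACCESS bit values are those of the NFSv4 ACCESS operation, the MAY_* values are the Linux VFS ones, and the handling of MAY_EXEC for regular files (mapped to the EXECUTE bit) is an assumption, since the commit message only spells out the directory case.

```python
# NFSv4 ACCESS result bits
ACCESS_READ = 0x01
ACCESS_LOOKUP = 0x02
ACCESS_MODIFY = 0x04
ACCESS_EXTEND = 0x08
ACCESS_DELETE = 0x10
ACCESS_EXECUTE = 0x20

# Linux VFS permission bits
MAY_EXEC = 0x1
MAY_WRITE = 0x2
MAY_READ = 0x4

# Directories need modify, add (extend) and delete; files need modify and extend.
DIR_MAY_WRITE = ACCESS_MODIFY | ACCESS_EXTEND | ACCESS_DELETE
FILE_MAY_WRITE = ACCESS_MODIFY | ACCESS_EXTEND


def access_to_may_mask(access_result, is_dir):
    """Translate raw NFS access bits into a MAY_* mask (illustrative only)."""
    mask = 0
    if access_result & ACCESS_READ:
        mask |= MAY_READ
    if is_dir:
        if (access_result & DIR_MAY_WRITE) == DIR_MAY_WRITE:
            mask |= MAY_WRITE
        if access_result & ACCESS_LOOKUP:
            mask |= MAY_EXEC  # MAY_EXEC on a directory means lookup
    else:
        if (access_result & FILE_MAY_WRITE) == FILE_MAY_WRITE:
            mask |= MAY_WRITE
        if access_result & ACCESS_EXECUTE:
            mask |= MAY_EXEC  # assumption: not stated in the commit message
    return mask
```

Note that the write check requires all of the relevant bits: a directory result with MODIFY and EXTEND but without DELETE does not yield MAY_WRITE in this sketch.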

NFS: Store the raw NFS access mask in the inode's access cache · bd8b2441

Committed by Trond Myklebust on Jul 11, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

bd8b2441

NFSv3: Convert nfs3_proc_access() to use nfs_access_set_mask() · eda3e208

Committed by Trond Myklebust on Jul 11, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

eda3e208

NFS: Refactor NFS access to kernel access mask calculation · 15d4b73a

Committed by Trond Myklebust on Jul 11, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

15d4b73a

nfs: count correct array for mnt3_counts array size · ecc7b435

Committed by Eryu Guan on Jul 18, 2017

The array size of mnt3_counts should be the size of the array
mnt3_procedures, not mnt_procedures, though they are the same size
right now. Found this by code inspection.

Fixes: 1c5876dd ("sunrpc: move p_count out of struct rpc_procinfo")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eryu Guan <eguan@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

ecc7b435

20 Jul, 2017 3 commits

Revert commit ("pNFS: Don't send COMMITs to the DSes if...") · 21329736

Committed by Trond Myklebust on Jul 12, 2017

Doing the test without taking any locks is racy, and so really it makes
more sense to do it in the flexfiles code (which is the only case that
cares).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

21329736

pNFS/flexfiles: Handle expired layout segments in ff_layout_initiate_commit() · 4b75053e

Committed by Trond Myklebust on Jul 12, 2017

If the layout has expired due to a fencing event, then we should not
attempt to commit to the DS.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

4b75053e

NFS: Fix another COMMIT race in pNFS · 41181886

Committed by Trond Myklebust on Jul 12, 2017

We must make sure that cinfo->ds->ncommitting is in sync with the
commit list, since it is checked as part of pnfs_commit_list().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

41181886

openanolis / cloud-kernel synced successfully more than 1 year ago