- 12 Sep 2017, 2 commits
-
-
Committed by NeilBrown
1/ Remove the 'start' and 'end' args from nfs_file_fsync_commit(). They aren't used.
2/ Make nfs_context_set_write_error() a "static inline" in internal.h so we can...
3/ Use nfs_context_set_write_error() instead of mapping_set_error() if nfs_pageio_add_request() fails before sending any request. NFS generally keeps errors in the open_context, not the mapping, so this is more consistent.
4/ If filemap_write_and_wait_range() reports any error, still check ctx->error. The value in ctx->error is likely to be more useful.
As part of this, NFS_CONTEXT_ERROR_WRITE is cleared slightly earlier, before nfs_file_fsync_commit() is called, rather than at the start of that function.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
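For reference, a minimal sketch of what such a "static inline" helper in internal.h looks like, based only on the description above: record the error in the open context and set NFS_CONTEXT_ERROR_WRITE so a later fsync can report it.

  static inline void nfs_context_set_write_error(struct nfs_open_context *ctx,
                                                 int error)
  {
          ctx->error = error;
          set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
  }

Point 4 above then amounts to: after filemap_write_and_wait_range(), test-and-clear NFS_CONTEXT_ERROR_WRITE and prefer ctx->error over the mapping error when reporting back to the caller.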
-
Committed by Chuck Lever
Tools like tcpdump and rpcdebug can be very useful. But there are plenty of environments where they are difficult or impossible to use. For example, we've had customers report I/O failures during workloads so heavy that collecting network traffic or enabling RPC debugging are themselves onerous. The kernel's static tracepoints are lightweight (less likely to introduce timing changes) and efficient (the trace data is compact). They also work in scenarios where capturing network traffic is not possible due to lack of hardware support (some InfiniBand HCAs) or where data or network privacy is a concern. Introduce tracepoints that show when an NFS READ, WRITE, or COMMIT is initiated, and when it completes. Record the arguments and results of each operation, which are not shown by the existing sunrpc module's tracepoints. For instance, the recorded offset and count can be used to match an "initiate" event to a "done" event. If an NFS READ result returns fewer bytes than requested, or zero bytes, seeing the EOF flag can be probative. Seeing an NFS4ERR_BAD_STATEID result is also an indication of a particular class of problems. The timing information attached to each event record can often be useful as well.
Usage example:
  [root@manet tmp]# trace-cmd record -e nfs:*initiate* -e nfs:*done
  /sys/kernel/debug/tracing/events/nfs/*initiate*/filter
  /sys/kernel/debug/tracing/events/nfs/*done/filter
  Hit Ctrl^C to stop recording
  ^CKernel buffer statistics:
    Note: "entries" are the entries left in the kernel ring buffer and are not recorded in the trace data. They should all be zero.
  CPU: 0
  entries: 0
  overrun: 0
  commit overrun: 0
  bytes: 3680
  oldest event ts: 78.367422
  now ts: 100.124419
  dropped events: 0
  read events: 74
... and so on.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
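As a rough illustration of what such a static tracepoint declaration looks like (the real definitions live in fs/nfs/nfstrace.h with the usual trace-header boilerplate; the event name and field set below are simplified, not the exact in-tree ones):

  TRACE_EVENT(nfs_initiate_read,
          TP_PROTO(const struct inode *inode, loff_t offset, unsigned long count),
          TP_ARGS(inode, offset, count),
          TP_STRUCT__entry(
                  __field(u64, fileid)
                  __field(loff_t, offset)
                  __field(unsigned long, count)
          ),
          TP_fast_assign(
                  __entry->fileid = NFS_FILEID(inode);   /* which file */
                  __entry->offset = offset;              /* where the READ starts */
                  __entry->count  = count;               /* how many bytes requested */
          ),
          TP_printk("fileid=%llu offset=%lld count=%lu",
                  (unsigned long long)__entry->fileid,
                  (long long)__entry->offset, __entry->count)
  );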
-
- 10 Sep 2017, 4 commits
-
-
Committed by Trond Myklebust
If we skip a subrequest due to a zero refcount, we should still count the byte range that it covered so that we accurately reconstruct the original request size. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
That can deadlock if this is the last reference, since nfs_page_group_destroy() calls nfs_page_group_sync_on_bit(). Note that even if the page was removed from the subpage list, req->wb_head could still be pointing to the old head. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
It's pretty much a duplicate of nfs_scan_commit_list() that also clears the PG_COMMIT_TO_DS flag. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Since the commit list is not ordered, it is possible for nfs_scan_commit_list() to hold a request that nfs_lock_and_join_requests() is waiting for, while at the same time trying to grab a request that nfs_lock_and_join_requests() already holds. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 07 Sep 2017, 1 commit
-
-
Committed by NeilBrown
Commit fbe77c30 ("NFS: move rw_mode to nfs_pageio_header") reintroduced some pointless code that commit 518662e0 ("NFS: fix usage of mempools.") had recently removed. Remove it again. Cc: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 20 Aug 2017, 1 commit
-
-
Committed by Trond Myklebust
Now that the mirror allocation has been moved, the parameter can go. Also remove the redundant symbol export. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 15 Aug 2017, 23 commits
-
-
Committed by Trond Myklebust
If a request is on the commit list but is locked, we will currently skip it, which can lead to livelocking when the commit count doesn't reduce to zero. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Switch away from using the inode->i_lock for this, to avoid contention with other metadata manipulation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Rather than forcing us to take the inode->i_lock just in order to bump the number. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
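A minimal sketch of the pattern being described, assuming the counter in question is NFS_I(inode)->nrequests and that it becomes an atomic_long_t (the helper name below is illustrative):

  static void nfs_inode_add_request_count(struct inode *inode)
  {
          /* No need to take spin_lock(&inode->i_lock) just to bump the counter. */
          atomic_long_inc(&NFS_I(inode)->nrequests);
  }
-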
Committed by Trond Myklebust
The commit lists can get very large, so using the inode->i_lock can end up affecting general metadata performance. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Split out the two cases so that we can treat the locking differently. The issue is that the locking in the page swapcache case is closely linked to the commit list locking. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Hide the locking from nfs_lock_and_join_requests() so that we can separate out the requirements for swapcache pages. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Fix up the test in nfs_page_group_covers_page(). The simplest implementation is to check that we have a set of intersecting or contiguous subrequests that connect page offset 0 to nfs_page_length(req->wb_page). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
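A simplified sketch of that coverage test (illustrative helper, not the exact in-tree nfs_page_group_covers_page()): repeatedly look for a subrequest whose byte range covers the current position and advance it, until either the whole range 0..len is connected or a gap is found.

  static bool page_group_covers_range(struct nfs_page *head, unsigned int len)
  {
          unsigned int pos = 0;

          while (pos < len) {
                  struct nfs_page *req = head;
                  bool advanced = false;

                  do {    /* scan the whole group for a subrequest covering 'pos' */
                          if (req->wb_pgbase <= pos &&
                              req->wb_pgbase + req->wb_bytes > pos) {
                                  pos = req->wb_pgbase + req->wb_bytes;
                                  advanced = true;
                          }
                          req = req->wb_this_page;
                  } while (req != head);

                  if (!advanced)
                          return false;   /* gap found: the page is not fully covered */
          }
          return true;
  }
-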
Committed by Trond Myklebust
nfs_page_group_lock() is now always called with the 'nonblock' parameter set to 'false'. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
At this point, the only flags we ever expect to potentially see set on the subrequests are PG_REMOVE and PG_TEARDOWN. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Since nfs_page_group_destroy() does not take any locks on the requests to be freed, we need to ensure that we don't inadvertently free the request in nfs_destroy_unlinked_subrequests() while the last reference is being released elsewhere. Do this by:
1) Taking a reference to the request unless it is already being freed
2) Checking (under the page group lock) if PG_TEARDOWN is already set before freeing an unreferenced request in nfs_destroy_unlinked_subrequests()
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
When locking the entire group in order to remove subrequests, the locks are always taken in order, with the page group lock being taken after the page head is locked. The intention is that:
1) The lock on the group head guarantees that requests may not be removed from the group (although new entries could be appended if we're not holding the group lock).
2) It is safe to drop and retake the page group lock while iterating through the list, in particular when waiting for a subrequest lock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
We should no longer need the inode->i_lock, now that we've straightened out the request locking. The locking schema is now:
1) Lock the page head request
2) Lock the page group
3) Lock the subrequests one by one
Note that there is a subtle race with nfs_inode_remove_request() due to the fact that the latter does not lock the page head when removing it from the struct page. Only the last subrequest is locked, hence we need to re-check that PagePrivate(page) is still set after we've locked all the subrequests.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
nfs_try_to_update_request() should be able to cope now. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Simplify the code, and avoid some flushes to disk. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Both nfs_destroy_unlinked_subrequests() and nfs_lock_and_join_requests() manipulate the inode flags when adjusting NFS_I(inode)->nrequests. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
We don't want nfs_lock_and_join_requests() to start fiddling with the request before the call to nfs_page_group_sync_on_bit(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Request offsets and sizes are not guaranteed to be stable unless you are holding the request locked. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
All other callers of nfs_page_group_lock() appear to already hold the page lock on the head page, so doing it in the opposite order here is inefficient, although not deadlock-prone since we roll back all locks on contention. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Yes, this is a situation that should never happen (hence the WARN_ON), but we should still ensure that we free up the locks and references to the faulty pages. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Micro-optimisation: move the lockless check into the for(;;) loop. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Add a lockless check for whether or not the page might be carrying an existing writeback before we grab the inode->i_lock. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
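A sketch of the kind of lockless pre-check being described (illustrative helper, not the exact in-tree code): a page that has no nfs_page attached and is not in the swap cache cannot be carrying an existing writeback, so there is no need to take inode->i_lock at all in that case.

  static bool nfs_page_may_have_writeback(struct page *page)
  {
          /* PagePrivate() means an nfs_page is attached; swapcache pages are tracked separately. */
          return PagePrivate(page) || PageSwapCache(page);
  }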
-
Committed by Trond Myklebust
We don't expect the page header lock to ever be held across I/O, so it should always be safe to wait for it, even if we're doing nonblocking writebacks. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 14 Jul 2017, 2 commits
-
-
Committed by Trond Myklebust
Now that the writes will schedule a commit on their own, we don't need nfs_write_inode() to schedule one if there are outstanding writes and we're being called in non-blocking mode. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Trond Myklebust
If the page cache is being flushed, then we want to ensure that we do start a commit once the pages are done being flushed. If we just wait until all I/O is done to that file, we can end up livelocking until the balance_dirty_pages() mechanism puts its foot down and forces I/O to stop. So instead we do more or less the same thing that O_DIRECT does, and set up a counter to tell us when the flush is done. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
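A generic sketch of that counter idea (the structure and names below are illustrative, not the ones actually added to fs/nfs/write.c): reference-count the pages handed to writeback during this flush, and kick off the COMMIT as soon as the last one completes, instead of waiting for all outstanding I/O to the file.

  struct flush_completion {
          refcount_t      pending;        /* pages from this flush still in flight */
          struct inode    *inode;         /* file whose flush we are tracking */
  };

  static void flush_completion_put(struct flush_completion *fc)
  {
          if (refcount_dec_and_test(&fc->pending)) {
                  nfs_commit_inode(fc->inode, 0); /* last page done: start the commit */
                  kfree(fc);
          }
  }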
-
- 09 May 2017, 1 commit
-
-
Committed by Olga Kornievskaia
Instead of messing with the commit path, which has been causing issues, add a COMMIT op after the COPY and ask for stable copies in the first place. It saves a round trip, since after the COPY the client sends a COMMIT anyway. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 27 Apr 2017, 1 commit
-
-
Committed by Trond Myklebust
If the client receives a fatal server error from nfs_pageio_add_request(), then we should always truncate the page on which the error occurred. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 26 Apr 2017, 1 commit
-
-
Committed by Trond Myklebust
If the server has already returned a fatal write error for this file that the user has not yet received, then don't write back the other pages. Instead, act as if they have been sent, and have returned with the same error. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 21 Apr 2017, 4 commits
-
-
Committed by Jan Kara
Allocate struct backing_dev_info separately instead of embedding it inside the superblock. This unifies handling of bdi among users. CC: Anna Schumaker <anna.schumaker@netapp.com> CC: linux-nfs@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Jens Axboe <axboe@fb.com>
-
Committed by Benjamin Coddington
Let's try to have it in a cacheline in nfs4_proc_pgio_rpc_prepare(). Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Fred Isaman
Signed-off-by: Fred Isaman <fred.isaman@gmail.com> Fixes: 0bcbf039 ("nfs: handle request add failure properly") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by NeilBrown
When passed GFP flags that allow sleeping (such as GFP_NOIO), mempool_alloc() will never return NULL; it will wait until memory is available. This means that we don't need to handle failure, but that we do need to ensure one thread doesn't call mempool_alloc() twice on the one pool without queuing or freeing the first allocation. If multiple threads did this during times of high memory pressure, the pool could be exhausted and a deadlock could result. pnfs_generic_alloc_ds_commits() attempts to allocate from the nfs_commit_mempool while already holding an allocation from that pool. This is not safe. So change nfs_commitdata_alloc() to take a flag that indicates whether failure is acceptable. In pnfs_generic_alloc_ds_commits(), accept failure and handle it as we currently do. Elsewhere, do not accept failure, and do not handle it. Even when failure is acceptable, we want to succeed if possible. That means both:
- using an entry from the pool if there is one
- waiting for direct reclaim if there isn't.
We call mempool_alloc(GFP_NOWAIT) to achieve the first, then kmem_cache_alloc(GFP_NOIO|__GFP_NORETRY) to achieve the second. Each of these can fail, but together they do the best they can without blocking indefinitely. The objects returned by kmem_cache_alloc() will still be freed by mempool_free(). This is safe as mempool_alloc() uses exactly the same function to allocate objects (since the mempool was created with mempool_create_slab_pool()). The objects returned by mempool_alloc() and kmem_cache_alloc() are indistinguishable, so mempool_free() will handle both identically, either adding to the pool or calling kmem_cache_free(). Also, don't test for failure when allocating from nfs_wdata_mempool.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
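A sketch of the resulting allocation strategy, assuming a backing slab cache named nfs_cdata_cachep (the pool name nfs_commit_mempool comes from the text above): try the pool without blocking first, then fall back to direct reclaim, and only then report failure.

  struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail)
  {
          struct nfs_commit_data *p;

          if (never_fail) {
                  /* Sleeping allocation from the pool cannot fail. */
                  p = mempool_alloc(nfs_commit_mempool, GFP_NOIO);
          } else {
                  /* Don't block on the pool: we may already hold one of its objects. */
                  p = mempool_alloc(nfs_commit_mempool, GFP_NOWAIT);
                  if (!p)         /* allow direct reclaim, but don't retry forever */
                          p = kmem_cache_alloc(nfs_cdata_cachep,
                                               GFP_NOIO | __GFP_NORETRY);
                  if (!p)
                          return NULL;
          }

          memset(p, 0, sizeof(*p));
          INIT_LIST_HEAD(&p->pages);
          return p;
  }

As the message notes, objects obtained either way can still be released with mempool_free(), since the pool was created with mempool_create_slab_pool() over the same slab cache.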
-