Commits · ea51f94b45a0fd657c61206c1b648cc72f95befa · openanolis / cloud-kernel

17 Aug, 2018 2 commits

pNFS: Treat RECALLCONFLICT like DELAY... · ea51f94b

Committed by Trond Myklebust on Aug 15, 2018

Yes, it is possible to get trapped in a loop, but the server should be
administratively revoking the recalled layout if it never gets returned.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
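
The behaviour described above can be sketched in plain C as follows. This is a minimal illustration, not the kernel code: the `sketch_` names are hypothetical, and only the two numeric status values are taken from the NFSv4.1 protocol. The helper shows both statuses leading to the same "back off and resend" decision, and the loop shows why an endless run of RECALLCONFLICT replies would keep the client retrying.

```c
#include <stdbool.h>

/* Numeric values assigned to these status codes by the NFSv4.1 protocol. */
enum {
	SKETCH_NFS4ERR_DELAY          = 10008,
	SKETCH_NFS4ERR_RECALLCONFLICT = 10061,
};

/* Hypothetical helper: should the caller back off and resend the request? */
static bool sketch_should_retry_after_delay(int status)
{
	switch (status) {
	case SKETCH_NFS4ERR_DELAY:
	case SKETCH_NFS4ERR_RECALLCONFLICT: /* handled exactly like DELAY */
		return true;
	default:
		return false;
	}
}

/*
 * Hypothetical retry loop. 'statuses' stands in for the replies the server
 * sends to successive attempts. Returns the number of attempts made before
 * a non-retryable status was seen (or before the replies ran out).
 */
static int sketch_attempts_until_done(const int *statuses, int n)
{
	int attempts = 0;

	while (attempts < n) {
		int status = statuses[attempts++];

		if (!sketch_should_retry_after_delay(status))
			break;
	}
	return attempts;
}
```

A real client would sleep between attempts; the sketch leaves the delay out so that it stays deterministic.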

pNFS: When updating the stateid in layoutreturn, also update the recall range · ecf84026

Committed by Trond Myklebust on Aug 15, 2018

When we update the layout stateid in nfs4_layoutreturn_refresh_stateid, we
should also update the range in order to let the server know we're actually
returning everything.

Fixes: 16c278dbfa63 ("pnfs: Fix handling of NFS4ERR_OLD_STATEID replies...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

09 Aug, 2018 3 commits

pnfs: Use true and false for boolean values · 10db5b7a

Committed by Gustavo A. R. Silva on Aug 01, 2018

Return statements in functions returning bool should use true or false
instead of an integer value.

This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

pnfs: pnfs_find_lseg() should not check NFS_LSEG_LAYOUTRETURN · 2230ca0d

Committed by Trond Myklebust on Aug 01, 2018

Layout segment validity is determined only by the NFS_LSEG_VALID flag. If
it is set, the layout segment is findable. As it is, when the flexfiles
driver sets NFS_LSEG_LAYOUTRETURN to indicate that we cannot discard
the layout segment, but that it must be returned, then this can result
in an unnecessary layoutget storm.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
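
As a rough illustration of the rule stated above, the lookup test could be modelled like this. The flag names and bit positions here are hypothetical stand-ins, not the kernel's definitions; the point is only that the "marked for return" bit plays no part in deciding whether a segment can be found.

```c
#include <stdbool.h>

/* Hypothetical bit positions; the kernel defines its own flag values. */
#define SKETCH_LSEG_VALID        (1u << 0)
#define SKETCH_LSEG_LAYOUTRETURN (1u << 1)

/*
 * A segment can be found by a lookup whenever it is valid, even if it has
 * also been marked as "must be returned".
 */
static bool sketch_lseg_is_findable(unsigned int flags)
{
	return (flags & SKETCH_LSEG_VALID) != 0;
}
```

Under this model a segment carrying both bits is still reused, so the caller has no reason to issue a fresh LAYOUTGET for it.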

pnfs: Fix handling of NFS4ERR_OLD_STATEID replies to layoutreturn · c16467dc

Committed by Trond Myklebust on Jul 29, 2018

If the server tells us that our layoutreturn raced with another layout
update, then we must ensure that the new layout segments are not in use
before we resend with an updated layout stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

27 Jul, 2018 4 commits

pNFS: Parse the results of layoutget on open even if permissions checks fail · af9b6d75

Committed by Trond Myklebust on Jun 29, 2018

Even if the results of the permissions checks failed, we should parse
the results of the layout on open call so that we can return the
layout if required.
Note that we also want to ignore the sequence counter for whether or not
a layout recall occurred. If the recall pertained to our OPEN, then the
callback will know, and will attempt to wait for us to finish processing
anyway.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pNFS: Wait for stale layoutget calls to complete in pnfs_update_layout() · 411ae722

Committed by Trond Myklebust on Jun 23, 2018

If the old layout was recalled, and we returned NFS4ERR_NOMATCHINGLAYOUT
then we need to wait for all outstanding layoutget calls to complete
before we can send a new one.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pNFS: Ignore non-recalled layouts in pnfs_layout_need_return() · f0b42981

Committed by Trond Myklebust on Jun 23, 2018

If a layout has been recalled, then we should fire off a layoutreturn as
soon as all the layout segments that match the recall have been retired.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pNFS: Don't discard layout segments that are marked for return · e0b7d420

Committed by Trond Myklebust on Jun 23, 2018

If there are layout segments that are marked for return, then we need
to ensure that pnfs_mark_matching_lsegs_return() does not just
silently discard them, but it should tell the caller that there is a
layoutreturn scheduled.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

12 Jun, 2018 1 commit

skip LAYOUTRETURN if layout is invalid · 93b7f7ad

Committed by Olga Kornievskaia on Jun 11, 2018

Currently, when IO to DS fails, client returns the layout and
retries against the MDS. However, then on umounting (inode eviction)
it returns the layout again.

This is because pnfs_return_layout() was changed in
commit d78471d3 ("pnfs/blocklayout: set PNFS_LAYOUTRETURN_ON_ERROR")
to always set NFS_LAYOUT_RETURN_REQUESTED so even if we returned
the layout, it will be returned again. Instead, let's also check
if we have already marked the layout invalid.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

01 Jun, 2018 14 commits

pnfs: Don't call commit on failed layoutget-on-open · 32f1c28f

Committed by Trond Myklebust on May 22, 2018

If the layoutget on open call failed, we can't really commit the inode,
so don't bother calling it.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pNFS: Don't send LAYOUTGET on OPEN for read, if we already have cached data · 64294b08

Committed by Trond Myklebust on Feb 02, 2017

If we're only opening the file for reading, and the file is empty and/or
we already have cached data, then heuristically optimise away the
LAYOUTGET.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
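
The heuristic above might be expressed as a small predicate like the one below. This is only a sketch under assumptions: the function name and its three parameters are hypothetical, and the kernel derives the equivalent information from the inode and open flags rather than from plain arguments.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical predicate: return true when the LAYOUTGET can be left out of
 * the OPEN compound. Only read-only opens qualify, and then only if the
 * file is empty or some of its data is already cached.
 */
static bool sketch_skip_layoutget_on_open(bool opened_for_write,
					  size_t file_size,
					  size_t cached_pages)
{
	if (opened_for_write)
		return false;
	return file_size == 0 || cached_pages > 0;
}
```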

NFSv4/pnfs: Don't switch off layoutget-on-open for transient errors · 8dc96566

Committed by Trond Myklebust on Feb 01, 2017

Ensure that we only switch off the LAYOUTGET operation in the OPEN
compound when the server is truly broken, and/or it is complaining
that the compound is too large.
Currently, we end up turning off the functionality permanently,
even for transient errors such as EACCES or ENOSPC.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
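
One way to picture the distinction drawn above is an error classifier such as the following. It is a sketch, not the kernel logic: the function name is hypothetical, the choice of `ENOTSUP` as the "truly broken server" signal is an assumption made for illustration, and positive error values are used here although the kernel passes negative ones. The two "too big" values are the NFSv4.1 protocol numbers.

```c
#include <errno.h>
#include <stdbool.h>

/* Values assigned to these status codes by the NFSv4.1 protocol. */
enum {
	SKETCH_NFS4ERR_REQ_TOO_BIG = 10065,
	SKETCH_NFS4ERR_REP_TOO_BIG = 10066,
};

/*
 * Hypothetical classifier: return true only for errors that justify turning
 * LAYOUTGET-on-OPEN off for good. Transient errors keep the feature on.
 */
static bool sketch_should_disable_lgopen(int err)
{
	switch (err) {
	case ENOTSUP:                    /* assumption: server cannot do this at all */
	case SKETCH_NFS4ERR_REQ_TOO_BIG: /* compound request too large */
	case SKETCH_NFS4ERR_REP_TOO_BIG: /* compound reply too large */
		return true;
	case EACCES:                     /* transient: permissions may change */
	case ENOSPC:                     /* transient: space may be freed */
	default:
		return false;
	}
}
```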

NFSv4/pnfs: Ensure pnfs_parse_lgopen() won't try to parse uninitialised data · d49e0d5b

Committed by Trond Myklebust on Feb 01, 2017

We need to ensure that pnfs_parse_lgopen() doesn't try to parse a
struct nfs4_layoutget_res that was not filled by a successful call
to decode_layoutget(). This can happen if we performed a cached open,
or if either the OP_ACCESS or OP_GETATTR operations preceding the
OP_LAYOUTGET in the compound returned an error.

By initialising the 'status' field to NFS4ERR_DELAY, we ensure that
pnfs_parse_lgopen() won't try to interpret the structure.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
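
The sentinel technique described above can be shown with a reduced model. The struct and function names below are hypothetical simplifications of `struct nfs4_layoutget_res`, `decode_layoutget()` and `pnfs_parse_lgopen()`; only the idea of pre-loading the status with NFS4ERR_DELAY (protocol value 10008) comes from the commit message.

```c
#include <stdbool.h>

enum {
	SKETCH_NFS4_OK       = 0,
	SKETCH_NFS4ERR_DELAY = 10008, /* value from the NFSv4.1 protocol */
};

/* Reduced, hypothetical stand-in for the LAYOUTGET result structure. */
struct sketch_layoutget_res {
	int status;          /* result of decoding the LAYOUTGET reply */
	unsigned int length; /* meaningful only when status is OK */
};

/* Mark the result as "not filled in" before the compound is sent. */
static void sketch_layoutget_res_init(struct sketch_layoutget_res *res)
{
	res->status = SKETCH_NFS4ERR_DELAY;
	res->length = 0;
}

/* Runs only if the reply really contained a LAYOUTGET result. */
static void sketch_decode_layoutget(struct sketch_layoutget_res *res,
				    unsigned int length)
{
	res->length = length;
	res->status = SKETCH_NFS4_OK;
}

/* Refuses to interpret a structure that the decoder never touched. */
static bool sketch_parse_lgopen(const struct sketch_layoutget_res *res,
				unsigned int *length_out)
{
	if (res->status != SKETCH_NFS4_OK)
		return false;
	*length_out = res->length;
	return true;
}
```

If an earlier operation in the compound fails, the decoder is never reached, the sentinel stays in place, and the parser returns without reading the remaining fields.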

pnfs: Fix manipulation of NFS_LAYOUT_FIRST_LAYOUTGET · 30ae2412

Committed by Fred Isaman on Oct 18, 2016

The flag was not always being cleared after LAYOUTGET on OPEN.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pnfs: Add barrier to prevent lgopen using LAYOUTGET during recall · c49b5209

Committed by Fred Isaman on Oct 05, 2016

Since the LAYOUTGET on OPEN can be sent without prior inode information,
existing methods to prevent LAYOUTGET from being sent while processing
CB_LAYOUTRECALL don't work. Track if a recall occurred while LAYOUTGET
was being sent, and if so ignore the results.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
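
A common way to "track if a recall occurred" while a request is in flight is a sequence counter that is sampled before sending and compared afterwards. The sketch below assumes that approach; the struct, its field and the function names are hypothetical, and the model is single-threaded, so it omits the locking or atomics a real implementation would need.

```c
#include <stdbool.h>

/* Hypothetical per-client state: bumped once for every layout recall. */
struct sketch_client {
	unsigned long recall_seq;
};

/* Sample the counter just before the OPEN+LAYOUTGET compound is sent. */
static unsigned long sketch_lgopen_begin(const struct sketch_client *clp)
{
	return clp->recall_seq;
}

/* Called when a CB_LAYOUTRECALL is processed. */
static void sketch_layoutrecall(struct sketch_client *clp)
{
	clp->recall_seq++;
}

/* The LAYOUTGET result is usable only if no recall happened meanwhile. */
static bool sketch_lgopen_result_usable(const struct sketch_client *clp,
					unsigned long snapshot)
{
	return clp->recall_seq == snapshot;
}
```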

pnfs: Stop attempting LAYOUTGET on OPEN on failure · 6e01260c

Committed by Fred Isaman on Oct 04, 2016

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pnfs: Add LAYOUTGET to OPEN of an existing file · 78746a38

Committed by Fred Isaman on Sep 22, 2016

Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pNFS: Refactor nfs4_layoutget_release() · 29a8bfe5

Committed by Trond Myklebust on May 30, 2018

Move the actual freeing of the struct nfs4_layoutget into fs/nfs/pnfs.c
where it can be reused by the layoutget on open code.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pnfs: Add LAYOUTGET to OPEN of a new file · 2409a976

Committed by Fred Isaman on Oct 06, 2016

This triggers when we have no pre-existing inode to attach to.
The preexisting case is saved for later.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pnfs: Change pnfs_alloc_init_layoutget_args call signature · 5e36e2a9

Committed by Fred Isaman on Oct 06, 2016

Don't send in a layout, instead use the (possibly NULL) inode.

This is needed for LAYOUTGET attached to an OPEN where the inode is not
yet set.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pnfs: Move nfs4_opendata into nfs4_fs.h · 1b146fcf

Committed by Fred Isaman on Sep 21, 2016

It will be needed now by the pnfs code.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pnfs: move allocations out of nfs4_proc_layoutget · dacb452d

Committed by Fred Isaman on Sep 19, 2016

They work better in the new alloc_init function.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

pnfs: refactor send_layoutget · 587f03de

Committed by Fred Isaman on Sep 21, 2016

Pull out the alloc/init part for eventual reuse by OPEN.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

09 Mar, 2018 1 commit

pNFS: Prevent the layout header refcount going to zero in pnfs_roc() · 9c6376eb

Committed by Trond Myklebust on Mar 07, 2018

Ensure that we hold a reference to the layout header when processing
the pNFS return-on-close so that the refcount value does not inadvertently
go to zero.
Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.10+
Tested-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>

15 Jan, 2018 2 commits

nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds · ba4a76f7

Committed by Scott Mayhew on Dec 15, 2017

Currently when falling back to doing I/O through the MDS (via
pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
without releasing the reference taken on the dreq
via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
nfs_direct_pgio_init.  It then takes another reference on the dreq via
nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
as a result the requester will become stuck in inode_dio_wait.  Once
that happens, other processes accessing the inode will become stuck as
well.

Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean
up correctly by calling hdr->completion_ops->completion() instead of
calling hdr->release() directly.

This can be reproduced (sometimes) by performing "storage failover
takeover" commands on NetApp filer while doing direct I/O from a client.

This can also be reproduced using SystemTap to simulate a failure while
doing direct I/O from a client (from Dave Wysochanski
<dwysocha@redhat.com>):

stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }'
Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 1ca018d2 ("pNFS: Fix a memory leak when attempted pnfs fails")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
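
The leak described above comes down to which teardown path drops the reference that header initialisation took. The toy model below illustrates that; its types and functions are hypothetical and far simpler than `nfs_pgio_header` and `nfs_direct_req`, with a plain integer standing in for the reference count.

```c
/* Toy stand-in for the direct I/O request that outstanding headers pin. */
struct sketch_dreq {
	int refs;
};

/* Toy stand-in for the pgio header. */
struct sketch_hdr {
	struct sketch_dreq *dreq;
	int released;
};

/* Header initialisation takes a reference on the direct I/O request. */
static void sketch_hdr_init(struct sketch_hdr *hdr, struct sketch_dreq *dreq)
{
	hdr->dreq = dreq;
	hdr->released = 0;
	dreq->refs++;
}

/* Frees the header only; it knows nothing about the dreq reference. */
static void sketch_hdr_release(struct sketch_hdr *hdr)
{
	hdr->released = 1;
}

/* Completion path: drops the dreq reference, then releases the header. */
static void sketch_hdr_completion(struct sketch_hdr *hdr)
{
	hdr->dreq->refs--;
	sketch_hdr_release(hdr);
}
```

Calling `sketch_hdr_release()` directly leaves `refs` at 1 forever, which corresponds to the requester waiting indefinitely; going through `sketch_hdr_completion()` brings it back to 0.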

pnfs/blocklayout: handle transient devices · b3dce6a2

Committed by Benjamin Coddington on Dec 08, 2017

PNFS block/SCSI layouts should gracefully handle cases where block devices
are not available when a layout is retrieved, or the block devices are
removed while the client holds a layout.

While setting up a layout segment, keep a record of an unavailable or
un-parsable block device in cache with a flag so that subsequent layouts do
not spam the server with GETDEVINFO. We can reuse the current
NFS_DEVICEID_UNAVAILABLE handling with one variation: instead of reusing
the device, we will discard it and send a fresh GETDEVINFO after the
timeout, since the lookup and validation of the device occurs within the
GETDEVINFO response handling.

A lookup of a layout segment that references an unavailable device will
return a segment with the NFS_LSEG_UNAVAILABLE flag set. This will allow
the pgio layer to mark the layout with the appropriate fail bit, which
forces subsequent IO to the MDS, and prevents spamming the server with
LAYOUTGET, LAYOUTRETURN.

Finally, when IO to a block device fails, look up the block device(s)
referenced by the pgio header, and mark them as unavailable.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
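
The caching policy described above (remember that a device is unavailable, send I/O to the MDS meanwhile, and ask the server again only after a timeout) could be sketched as below. Everything here is a hypothetical illustration: the names, the 120-second timeout and the use of plain second counts are assumptions, not values taken from the kernel.

```c
#include <stdbool.h>

/* Hypothetical timeout in seconds; not the kernel's actual value. */
#define SKETCH_UNAVAILABLE_TIMEOUT 120L

/* Hypothetical cache entry for one block device. */
struct sketch_device {
	bool unavailable;
	long marked_at; /* time in seconds when the device was marked */
};

enum sketch_action {
	SKETCH_USE_DEVICE,        /* device is fine, do I/O through it */
	SKETCH_FAIL_TO_MDS,       /* still marked: send I/O to the MDS */
	SKETCH_RESEND_GETDEVINFO, /* timeout passed: discard and re-query */
};

/* Record that the device could not be opened or parsed. */
static void sketch_mark_unavailable(struct sketch_device *dev, long now)
{
	dev->unavailable = true;
	dev->marked_at = now;
}

/* Decide what a layout segment setup should do with this cache entry. */
static enum sketch_action sketch_device_lookup(const struct sketch_device *dev,
					       long now)
{
	if (!dev->unavailable)
		return SKETCH_USE_DEVICE;
	if (now - dev->marked_at < SKETCH_UNAVAILABLE_TIMEOUT)
		return SKETCH_FAIL_TO_MDS;
	return SKETCH_RESEND_GETDEVINFO;
}
```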

18 Nov, 2017 4 commits

pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-close · 7380020e

Committed by Trond Myklebust on Nov 06, 2017

If our layoutreturn on close operation returns an NFS4ERR_OLD_STATEID,
then try to update the stateid and retry. We know that there should
be no further LAYOUTGET requests being launched.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: Fix bool initialization/comparison · 6089dd0d

Committed by Thomas Meyer on Oct 07, 2017

Bool initializations should use true and false. Bool tests don't need
comparisons.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

fs, nfs: convert pnfs_layout_hdr.plh_refcount from atomic_t to refcount_t · 2b28a7be

Committed by Elena Reshetova on Oct 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situations and be exploitable.

The variable pnfs_layout_hdr.plh_refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
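
To make the properties listed above concrete, here is a userspace sketch of a reference counter that refuses to overflow or underflow. It is written with C11 atomics and hypothetical `sketch_` names; it is not the kernel's `refcount_t` implementation, which differs in detail, but it shows the same idea: a counter at zero cannot be revived, and a counter that reaches its maximum stays pinned there instead of wrapping.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Once the counter reaches this value it never changes again. */
#define SKETCH_REFCOUNT_SATURATED UINT_MAX

typedef struct {
	atomic_uint refs;
} sketch_refcount_t;

static void sketch_refcount_set(sketch_refcount_t *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

/* Take a reference unless the object is already dead (count of zero). */
static bool sketch_refcount_inc_not_zero(sketch_refcount_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false; /* freed object: do not revive it */
		if (old == SKETCH_REFCOUNT_SATURATED)
			return true;  /* pinned: never wrap around to zero */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Drop a reference; returns true when the caller must free the object. */
static bool sketch_refcount_dec_and_test(sketch_refcount_t *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == SKETCH_REFCOUNT_SATURATED)
			return false; /* underflow or pinned: leak, never free */
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```

Choosing to leak a saturated object rather than free it trades a bounded memory leak for the absence of a use-after-free.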

fs, nfs: convert pnfs_layout_segment.pls_refcount from atomic_t to refcount_t · eba6dd69

Committed by Elena Reshetova on Oct 20, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This helps avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

12 Sep, 2017 1 commit

pNFS: Use the standard I/O stateid when calling LAYOUTGET · 70d2f7b1

Committed by Trond Myklebust on Sep 11, 2017

Instead of having a private method for copying the open/delegation stateid,
use the same call that is used for standard I/O through the MDS.

Note that this means we transmit the stateid with a zero seqid, avoiding
issues with NFS4ERR_OLD_STATEID.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

09 Sep, 2017 1 commit

NFS: Fix 2 use after free issues in the I/O code · 196639eb

Committed by Trond Myklebust on Sep 08, 2017

The writeback code wants to send a commit after processing the pages,
which is why we want to delay releasing the struct path until after
that's done.

Also, the layout code expects that we do not free the inode before
we've put the layout segments in pnfs_writehdr_free() and
pnfs_readhdr_free().

Fixes: 919e3bd9 ("NFS: Ensure we commit after writeback is complete")
Fixes: 4714fb51 ("nfs: remove pgio_header refcount, related cleanup")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

15 Aug, 2017 1 commit

NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg() · 8205b9ce

Committed by Trond Myklebust on Aug 01, 2017

Now that we no longer hold the inode->i_lock when manipulating the
commit lists, it is safe to call pnfs_put_lseg() again.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

24 May, 2017 1 commit

pnfs: Fix the check for requests in range of layout segment · 08cb5b0f

Committed by Benjamin Coddington on May 22, 2017

It's possible and acceptable for NFS to attempt to add requests beyond the
range of the current pgio->pg_lseg, a case which should be caught and
limited by the pg_test operation. However, the current handling of this
case replaces pgio->pg_lseg with a new layout segment (after a WARN) within
that pg_test operation. That will cause all the previously added requests
to be submitted with this new layout segment, which may not be valid for
those requests.

Fix this problem by only returning zero for the number of bytes to coalesce
from pg_test for this case which allows any previously added requests to
complete on the current layout segment. The check for requests starting
out of range of the layout segment moves to pg_init, so that the
replacement of pgio->pg_lseg will be done when the next request is added.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
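
The fixed pg_test behaviour can be pictured as a pure range check that reports zero coalescable bytes for a request starting outside the current segment, without touching the segment itself. The sketch below is an assumption-laden model: the struct and function are hypothetical, and the real pg_test callback takes the pageio descriptor and request structures rather than raw offsets.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte range covered by the current layout segment. */
struct sketch_lseg {
	uint64_t offset;
	uint64_t length;
};

/*
 * Returns how many bytes of the request may be coalesced onto 'lseg'.
 * Zero means "do not add this request here"; the segment is left alone so
 * that requests added earlier still complete on it.
 */
static size_t sketch_pg_test(const struct sketch_lseg *lseg,
			     uint64_t req_offset, size_t req_len)
{
	/* One past the last covered byte, clamped so the sum cannot wrap. */
	uint64_t end = lseg->length > UINT64_MAX - lseg->offset
			? UINT64_MAX
			: lseg->offset + lseg->length;

	if (req_offset < lseg->offset || req_offset >= end)
		return 0; /* request starts outside the segment */

	uint64_t room = end - req_offset;

	return room < req_len ? (size_t)room : req_len;
}
```

In this model, picking a replacement segment is left to whoever starts the next batch of requests, which mirrors the move of that check to pg_init.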

03 May, 2017 2 commits

pNFS: Fix a deadlock when coalescing writes and returning the layout · 61f454e3

Committed by Trond Myklebust on May 01, 2017

Consider the following deadlock:

Process P1	Process P2		Process P3
==========	==========		==========
					lock_page(page)

		lseg = pnfs_update_layout(inode)

lo = NFS_I(inode)->layout
pnfs_error_mark_layout_for_return(lo)

		lock_page(page)

					lseg = pnfs_update_layout(inode)

In this scenario,
- P1 has declared the layout to be in error, but P2 holds a reference to
  a layout segment on that inode, so the layoutreturn is deferred.
- P2 is waiting for a page lock held by P3.
- P3 is asking for a new layout segment, but is blocked waiting
  for the layoutreturn.

The fix is to ensure that pnfs_error_mark_layout_for_return() does
not set the NFS_LAYOUT_RETURN flag, which blocks P3. Instead, we allow
the latter to call LAYOUTGET so that it can make progress and unblock
P2.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pNFS: Don't clear the layout return info if there are segments to return · 5466d214

Committed by Trond Myklebust on May 01, 2017

In pnfs_clear_layoutreturn_info, ensure that we don't clear the layout
return info if there are new segments queued for return due to, for
instance, a race between a LAYOUTRETURN and a failed I/O attempt.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

29 Apr, 2017 3 commits

pNFS: Ensure we commit the layout if it has been invalidated · 1f18b82c

Committed by Trond Myklebust on Apr 29, 2017

If the layout is being invalidated on the server, then we must
invoke nfs_commit_inode() to ensure any commits to the DS get
cleared out.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path · 37f8aa16

Committed by Trond Myklebust on Apr 29, 2017

If the attempt to write through pNFS fails, we need to use the same
failure semantics as for the read path: If the FF_FLAGS_NO_IO_THRU_MDS
flag is set or we have sufficient valid DSes, then we must retry through
pNFS.

Fixes: d67ae825 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pNFS: Ensure we check layout validity before marking it for return · bdebfccd

Committed by Trond Myklebust on Apr 27, 2017

pnfs_error_mark_layout_for_return needs to check that the layout is
valid before calling pnfs_set_plh_return_info().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

openanolis / cloud-kernel synced successfully over 1 year ago
