Commits · 3be0f80b5fe9c16eca2d538f799b94ca8aa59433 · openeuler / Kernel

18 Nov, 2017 1 commit

NFSv4.1: Fix up replays of interrupted requests · 3be0f80b

Authored by Trond Myklebust on Oct 19, 2017

If the previous request on a slot was interrupted before it was
processed by the server, then our slot sequence number may be out of whack,
and so we try the next operation using the old sequence number.

The problem with this is that not all servers check to see that the
client is replaying the same operations as previously when they decide
to go to the replay cache, and so instead of the expected error of
NFS4ERR_SEQ_FALSE_RETRY, we get a replay of the old reply, which could
(if the operations match up) be mistaken by the client for a new reply.

To fix this, we attempt to send a COMPOUND containing only the SEQUENCE op
in order to resync our slot sequence number.

Cc: Olga Kornievskaia <olga.kornievskaia@gmail.com>
[olga.kornievskaia@gmail.com: fix an Oops]
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

3be0f80b
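
The replay-cache decision described in the commit above can be pictured with a toy model of a server-side slot. This is a minimal sketch, assuming the NFSv4.1 rule that a request carrying the slot's current sequence ID is treated as a retry and one carrying the next ID is treated as new. The names `toy_slot`, `toy_classify`, and the `TOY_*` values are hypothetical and are not taken from the kernel or from any server.

```c
#include <stdint.h>

/* Hypothetical outcome codes for this toy model (not kernel identifiers). */
enum toy_result {
    TOY_NEW_REQUEST,    /* execute the COMPOUND and cache its reply */
    TOY_REPLAY_CACHED,  /* return the cached reply for the slot */
    TOY_SEQ_MISORDERED  /* sequence ID is neither current nor next */
};

/* Hypothetical server-side slot: remembers the last sequence ID executed. */
struct toy_slot {
    uint32_t last_seqid;
};

/*
 * Classify an incoming sequence ID against the slot state.
 * A server that does not also compare the operations in the request
 * takes the TOY_REPLAY_CACHED branch even when the client has sent a
 * different COMPOUND, which is the situation the commit describes.
 */
enum toy_result toy_classify(struct toy_slot *slot, uint32_t seqid)
{
    uint32_t next = (uint32_t)(slot->last_seqid + 1); /* wraps at 2^32 */

    if (seqid == slot->last_seqid)
        return TOY_REPLAY_CACHED;
    if (seqid == next) {
        slot->last_seqid = seqid;
        return TOY_NEW_REQUEST;
    }
    return TOY_SEQ_MISORDERED;
}
```

In this model, a client that was interrupted after sending sequence ID 6 and then reuses 6 for a different operation lands in the replay branch if the server had already executed the first request.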

17 Oct, 2017 4 commits

NFS: remove special-case revalidate in nfs_opendir() · 1fea73ac

Authored by NeilBrown on Aug 25, 2017

Commit f5a73672 ("NFS: allow close-to-open cache semantics to
apply to root of NFS filesystem") added a call to
__nfs_revalidate_inode() to nfs_opendir() as the lookup
process wouldn't reliably do this.

Subsequent commit a3fbbde7 ("VFS: we need to set LOOKUP_JUMPED
on mountpoint crossing") makes this unnecessary.  So remove the
unnecessary code.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

1fea73ac

NFS: revalidate "." etc correctly on "open". · b688741c

Authored by NeilBrown on Aug 25, 2017

For correct close-to-open semantics, NFS must validate
the change attribute of a directory (or file) on open.

Since commit ecf3d1f1 ("vfs: kill FS_REVAL_DOT by adding a
d_weak_revalidate dentry op"), open() of "." or a path ending ".." is
not revalidated reliably (except when that directory is a mount point).

Prior to that commit, "." was revalidated using nfs_lookup_revalidate()
which checks the LOOKUP_OPEN flag and forces revalidation if the flag is
set.
Since that commit, nfs_weak_revalidate() is used for NFSv3 (which
ignores the flags) and nothing is used for NFSv4.

This is fixed by using nfs_lookup_verify_inode() in
nfs_weak_revalidate().  This does the revalidation exactly when needed.
Also, add a definition of .d_weak_revalidate for NFSv4.

The incorrect behavior is easily demonstrated by running "echo *" in
some non-mountpoint NFS directory while watching network traffic.
Without this patch, "echo *" sometimes doesn't produce any traffic.
With the patch it always does.

Fixes: ecf3d1f1 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op")
cc: stable@vger.kernel.org (3.9+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

b688741c

NFS: Don't compare apples to elephants to determine access bits · 1750d929

Authored by Anna Schumaker on Jul 26, 2017

The NFS_ACCESS_* flags aren't a 1:1 mapping to the MAY_* flags, so
checking for MAY_WHATEVER might have surprising results in
nfs*_proc_access().  Let's simplify this check when determining which
bits to ask for, and do it in a generic place instead of copying code
for each NFS version.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

1750d929
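
To illustrate why the two flag families in the commit above are not a 1:1 mapping, here is a minimal sketch of one possible translation from a MAY_* style mask to NFS access bits. The `TOY_ACCESS_*` values follow the ACCESS bit definitions of the NFS protocol and the `TOY_MAY_*` values follow the Linux MAY_* flags as I understand them. The function `toy_access_bits` is hypothetical and is not the code this commit adds or removes.

```c
#include <stdbool.h>

/* Access bits as defined for the NFS ACCESS operation. */
#define TOY_ACCESS_READ    0x01u
#define TOY_ACCESS_LOOKUP  0x02u
#define TOY_ACCESS_MODIFY  0x04u
#define TOY_ACCESS_EXTEND  0x08u
#define TOY_ACCESS_DELETE  0x10u
#define TOY_ACCESS_EXECUTE 0x20u

/* Permission bits with the same values as the Linux MAY_* flags. */
#define TOY_MAY_EXEC  0x1u
#define TOY_MAY_WRITE 0x2u
#define TOY_MAY_READ  0x4u

/*
 * Hypothetical translation from a MAY_* style mask to the access bits
 * to request. One MAY_* bit fans out to several access bits, and the
 * set depends on whether the object is a directory.
 */
unsigned int toy_access_bits(unsigned int may_mask, bool is_dir)
{
    unsigned int bits = 0;

    if (may_mask & TOY_MAY_READ)
        bits |= TOY_ACCESS_READ;
    if (may_mask & TOY_MAY_WRITE) {
        bits |= TOY_ACCESS_MODIFY | TOY_ACCESS_EXTEND;
        if (is_dir)
            bits |= TOY_ACCESS_DELETE;
    }
    if (may_mask & TOY_MAY_EXEC)
        bits |= is_dir ? TOY_ACCESS_LOOKUP : TOY_ACCESS_EXECUTE;
    return bits;
}
```

Because a single permission bit expands to a different set of access bits for files and directories, testing a reply against one MAY_* flag can give surprising answers, which is the concern the commit message raises.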

NFS: Create NFS_ACCESS_* flags · 3c181827

Authored by Anna Schumaker on Jul 26, 2017

Passing the NFS v4 flags into the v3 code seems weird to me, even if
they are defined to the same values.  This patch adds in generic flags
to help me feel better.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

3c181827

05 Oct, 2017 1 commit

NFSv4/pnfs: Fix an infinite layoutget loop · e8fa33a6

Authored by Trond Myklebust on Oct 04, 2017

Since we can now use a lock stateid or a delegation stateid that
differs from the context stateid, we need to change the test in
nfs4_layoutget_handle_exception() to take this into account.

This fixes an infinite layoutget loop in the NFS client whereby
it keeps retrying the initial layoutget using the same broken
stateid.

Fixes: 70d2f7b1 ("pNFS: Use the standard I/O stateid when...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e8fa33a6

02 Oct, 2017 4 commits

nfs/filelayout: fix oops when freeing filelayout segment · 0a47df11

Authored by Scott Mayhew on Sep 29, 2017

Check for a NULL dsaddr in filelayout_free_lseg() before calling
nfs4_fl_put_deviceid().  This fixes the following oops:

[ 1967.645207] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
[ 1967.646010] IP: [<ffffffffc06d6aea>] nfs4_put_deviceid_node+0xa/0x90 [nfsv4]
[ 1967.646010] PGD c08bc067 PUD 915d3067 PMD 0
[ 1967.753036] Oops: 0000 [#1] SMP
[ 1967.753036] Modules linked in: nfs_layout_nfsv41_files ext4 mbcache jbd2 loop rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache amd64_edac_mod ipmi_ssif edac_mce_amd edac_core kvm_amd sg kvm ipmi_si ipmi_devintf irqbypass pcspkr k8temp ipmi_msghandler i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common amdkfd amd_iommu_v2 radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mptsas ttm scsi_transport_sas mptscsih drm mptbase serio_raw i2c_core bnx2 dm_mirror dm_region_hash dm_log dm_mod
[ 1967.790031] CPU: 2 PID: 1370 Comm: ls Not tainted 3.10.0-709.el7.test.bz1463784.x86_64 #1
[ 1967.790031] Hardware name: IBM BladeCenter LS21 -[7971AC1]-/Server Blade, BIOS -[BAE155AUS-1.10]- 06/03/2009
[ 1967.790031] task: ffff8800c42a3f40 ti: ffff8800c4064000 task.ti: ffff8800c4064000
[ 1967.790031] RIP: 0010:[<ffffffffc06d6aea>]  [<ffffffffc06d6aea>] nfs4_put_deviceid_node+0xa/0x90 [nfsv4]
[ 1967.790031] RSP: 0000:ffff8800c4067978  EFLAGS: 00010246
[ 1967.790031] RAX: ffffffffc062f000 RBX: ffff8801d468a540 RCX: dead000000000200
[ 1967.790031] RDX: ffff8800c40679f8 RSI: ffff8800c4067a0c RDI: 0000000000000000
[ 1967.790031] RBP: ffff8800c4067980 R08: ffff8801d468a540 R09: 0000000000000000
[ 1967.790031] R10: 0000000000000000 R11: ffffffffffffffff R12: ffff8801d468a540
[ 1967.790031] R13: ffff8800c40679f8 R14: ffff8801d5645300 R15: ffff880126f15ff0
[ 1967.790031] FS:  00007f11053c9800(0000) GS:ffff88012bd00000(0000) knlGS:0000000000000000
[ 1967.790031] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1967.790031] CR2: 0000000000000030 CR3: 0000000094b55000 CR4: 00000000000007e0
[ 1967.790031] Stack:
[ 1967.790031]  ffff8801d468a540 ffff8800c4067990 ffffffffc062d2fe ffff8800c40679b0
[ 1967.790031]  ffffffffc062b5b4 ffff8800c40679f8 ffff8801d468a540 ffff8800c40679d8
[ 1967.790031]  ffffffffc06d39af ffff8800c40679f8 ffff880126f16078 0000000000000001
[ 1967.790031] Call Trace:
[ 1967.790031]  [<ffffffffc062d2fe>] nfs4_fl_put_deviceid+0xe/0x10 [nfs_layout_nfsv41_files]
[ 1967.790031]  [<ffffffffc062b5b4>] filelayout_free_lseg+0x24/0x90 [nfs_layout_nfsv41_files]
[ 1967.790031]  [<ffffffffc06d39af>] pnfs_free_lseg_list+0x5f/0x80 [nfsv4]
[ 1967.790031]  [<ffffffffc06d5a67>] _pnfs_return_layout+0x157/0x270 [nfsv4]
[ 1967.790031]  [<ffffffffc06c17dd>] nfs4_evict_inode+0x4d/0x70 [nfsv4]
[ 1967.790031]  [<ffffffff8121de19>] evict+0xa9/0x180
[ 1967.790031]  [<ffffffff8121e729>] iput+0xf9/0x190
[ 1967.790031]  [<ffffffffc0652cea>] nfs_dentry_iput+0x3a/0x50 [nfs]
[ 1967.790031]  [<ffffffff8121ab4f>] shrink_dentry_list+0x20f/0x490
[ 1967.790031]  [<ffffffff8121b018>] d_invalidate+0xd8/0x150
[ 1967.790031]  [<ffffffffc065446b>] nfs_readdir_page_filler+0x40b/0x600 [nfs]
[ 1967.790031]  [<ffffffffc0654bbd>] nfs_readdir_xdr_to_array+0x20d/0x3b0 [nfs]
[ 1967.790031]  [<ffffffff811f3482>] ? __mem_cgroup_commit_charge+0xe2/0x2f0
[ 1967.790031]  [<ffffffff81183208>] ? __add_to_page_cache_locked+0x48/0x170
[ 1967.790031]  [<ffffffffc0654d60>] ? nfs_readdir_xdr_to_array+0x3b0/0x3b0 [nfs]
[ 1967.790031]  [<ffffffffc0654d82>] nfs_readdir_filler+0x22/0x90 [nfs]
[ 1967.790031]  [<ffffffff8118351f>] do_read_cache_page+0x7f/0x190
[ 1967.790031]  [<ffffffff81215d30>] ? fillonedir+0xe0/0xe0
[ 1967.790031]  [<ffffffff8118366c>] read_cache_page+0x1c/0x30
[ 1967.790031]  [<ffffffffc0654f9b>] nfs_readdir+0x1ab/0x6b0 [nfs]
[ 1967.790031]  [<ffffffffc06bd1c0>] ? nfs4_xdr_dec_layoutget+0x270/0x270 [nfsv4]
[ 1967.790031]  [<ffffffff81215d30>] ? fillonedir+0xe0/0xe0
[ 1967.790031]  [<ffffffff81215c20>] vfs_readdir+0xb0/0xe0
[ 1967.790031]  [<ffffffff81216045>] SyS_getdents+0x95/0x120
[ 1967.790031]  [<ffffffff816b9449>] system_call_fastpath+0x16/0x1b
[ 1967.790031] Code: 90 31 d2 48 89 d0 5d c3 85 f6 74 f5 8d 4e 01 89 f0 f0 0f b1 0f 39 f0 74 e2 89 c6 eb eb 0f 1f 40 00 66 66 66 66 90 55 48 89 e5 53 <48> 8b 47 30 48 89 fb a8 04 74 3b 8b 57 60 83 fa 02 74 19 8d 4a
[ 1967.790031] RIP  [<ffffffffc06d6aea>] nfs4_put_deviceid_node+0xa/0x90 [nfsv4]
[ 1967.790031]  RSP <ffff8800c4067978>
[ 1967.790031] CR2: 0000000000000030

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 1ebf9801 ("NFS/filelayout: Fix racy setting of fl->dsaddr...")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

0a47df11

NFS: Fix uninitialized rpc_wait_queue · 68ebf8fe

Authored by Benjamin Coddington on Sep 22, 2017

Michael Sterrett reports a NULL pointer dereference on NFSv3 mounts when
CONFIG_NFS_V4 is not set because the NFS UOC rpc_wait_queue has not been
initialized.  Move the initialization of the queue out of the CONFIG_NFS_V4
conditional section.

Fixes: 7d6ddf88 ("NFS: Add an iocounter wait function for async RPC tasks")
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

68ebf8fe

NFS: Cleanup error handling in nfs_idmap_request_key() · cdb2e53f

Authored by Dan Carpenter on Sep 21, 2017

nfs_idmap_get_desc() can't actually return zero.  But if it did then
we would return ERR_PTR(0) which is NULL and the caller,
nfs_idmap_get_key(), doesn't expect that so it leads to a NULL pointer
dereference.

I've cleaned this up by changing the "<=" to "<" so it's more clear that
we don't return ERR_PTR(0).

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

cdb2e53f
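
The ERR_PTR(0) hazard in the commit above can be reproduced in userspace. This is a minimal sketch with stand-ins modelled on the kernel's ERR_PTR() and IS_ERR() helpers. The names `toy_err_ptr`, `toy_is_err`, `toy_wrap_result`, and `TOY_MAX_ERRNO` are hypothetical, and the code is not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer, as ERR_PTR() does. */
static inline void *toy_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Only the top TOY_MAX_ERRNO addresses count as encoded errors. */
static inline bool toy_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-(long)TOY_MAX_ERRNO);
}

/*
 * Hypothetical helper mirroring the shape of the check in the commit:
 * only a negative return is turned into an error pointer, so a zero
 * return can never be encoded as toy_err_ptr(0), which is just NULL.
 */
void *toy_wrap_result(long ret, void *value)
{
    if (ret < 0)
        return toy_err_ptr(ret);
    return value;
}
```

A caller that tests only `toy_is_err()` would treat `toy_err_ptr(0)` as a valid pointer and dereference NULL, which is why the comparison is tightened from "<=" to "<".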

nfs: RPC_MAX_AUTH_SIZE is in bytes · 35c036ef

Authored by J. Bruce Fields on Sep 20, 2017

RPC_MAX_AUTH_SIZE is expressed in bytes, not 4-byte words. Treating it as a
word count causes the client to request a larger-than-necessary session
replay slot size.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

35c036ef
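
The unit mix-up in the commit above is easy to see with a little arithmetic. This is a minimal sketch, assuming a reply size built from a header counted in 4-byte XDR words plus an authentication area counted in bytes. The function names and the figures used with them are illustrative only and are not the kernel's macros.

```c
#include <stddef.h>

#define TOY_XDR_UNIT 4u   /* XDR encodes data in 4-byte units */

/* Reply size when the header is counted in words and auth in bytes. */
size_t toy_reply_bytes(size_t hdr_words, size_t auth_bytes)
{
    return hdr_words * TOY_XDR_UNIT + auth_bytes;
}

/* The mistaken variant: auth size treated as words, so it is scaled by 4. */
size_t toy_reply_bytes_mistaken(size_t hdr_words, size_t auth_bytes)
{
    return (hdr_words + auth_bytes) * TOY_XDR_UNIT;
}
```

With a 3-word header and a 400-byte authentication area, the first function yields 412 bytes and the second 1612 bytes, so the mistaken variant over-requests by three times the authentication size.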

12 Sep, 2017 3 commits

NFS: various changes relating to reporting IO errors. · bf4b4905

Authored by NeilBrown on Sep 11, 2017

1/ remove 'start' and 'end' args from nfs_file_fsync_commit().
   They aren't used.

2/ Make nfs_context_set_write_error() a "static inline" in internal.h
   so we can...

3/ Use nfs_context_set_write_error() instead of mapping_set_error()
   if nfs_pageio_add_request() fails before sending any request.
   NFS generally keeps errors in the open_context, not the mapping,
   so this is more consistent.

4/ If filemap_write_and_wait_range() reports any error, still
   check ctx->error.  The value in ctx->error is likely to be
   more useful.  As part of this, NFS_CONTEXT_ERROR_WRITE is
   cleared slightly earlier, before nfs_file_fsync_commit() is called,
   rather than at the start of that function.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

bf4b4905

NFS: Add static NFS I/O tracepoints · 8224b273

Authored by Chuck Lever on Aug 21, 2017

Tools like tcpdump and rpcdebug can be very useful. But there are
plenty of environments where they are difficult or impossible to
use. For example, we've had customers report I/O failures during
workloads so heavy that collecting network traffic or enabling
RPC debugging are themselves onerous.

The kernel's static tracepoints are lightweight (less likely to
introduce timing changes) and efficient (the trace data is compact).
They also work in scenarios where capturing network traffic is not
possible due to lack of hardware support (some InfiniBand HCAs) or
where data or network privacy is a concern.

Introduce tracepoints that show when an NFS READ, WRITE, or COMMIT
is initiated, and when it completes. Record the arguments and
results of each operation, which are not shown by existing sunrpc
module's tracepoints.

For instance, the recorded offset and count can be used to match an
"initiate" event to a "done" event. If an NFS READ result returns
fewer bytes than requested or zero, seeing the EOF flag can be
probative. Seeing an NFS4ERR_BAD_STATEID result is also an indication
of a particular class of problems. The timing information attached
to each event record can often be useful as well.

Usage example:

[root@manet tmp]# trace-cmd record -e nfs:*initiate* -e nfs:*done
/sys/kernel/debug/tracing/events/nfs/*initiate*/filter
/sys/kernel/debug/tracing/events/nfs/*done/filter
Hit Ctrl^C to stop recording
^CKernel buffer statistics:
  Note: "entries" are the entries left in the kernel ring buffer and are not
        recorded in the trace data. They should all be zero.

CPU: 0
entries: 0
overrun: 0
commit overrun: 0
bytes: 3680
oldest event ts:    78.367422
now ts:   100.124419
dropped events: 0
read events: 74

... and so on.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

8224b273

pNFS: Use the standard I/O stateid when calling LAYOUTGET · 70d2f7b1

Authored by Trond Myklebust on Sep 11, 2017

Instead of having a private method for copying the open/delegation stateid,
use the same call that is used for standard I/O through the MDS.

Note that this means we transmit the stateid with a zero seqid, avoiding
issues with NFS4ERR_OLD_STATEID.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

70d2f7b1

10 Sep, 2017 4 commits

NFS: Count the bytes of skipped subrequests in nfs_lock_and_join_requests() · 1bd5d6d0

Authored by Trond Myklebust on Sep 09, 2017

If we skip a subrequest due to a zero refcount, we should still count
the byte range that it covered so that we accurately reconstruct the
original request size.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1bd5d6d0

NFS: Don't hold the group lock when calling nfs_release_request() · 8b77484f

Authored by Trond Myklebust on Sep 09, 2017

That can deadlock if this is the last reference since
nfs_page_group_destroy() calls nfs_page_group_sync_on_bit().
Note that even if the page was removed from the subpage list,
the req->wb_head could still be pointing to the old head.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

8b77484f

NFS: Remove pnfs_generic_transfer_commit_list() · 5d2a9d9d

Authored by Trond Myklebust on Sep 09, 2017

It's pretty much a duplicate of nfs_scan_commit_list() that also
clears the PG_COMMIT_TO_DS flag.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5d2a9d9d

NFS: nfs_lock_and_join_requests and nfs_scan_commit_list can deadlock · 137da553

Authored by Trond Myklebust on Sep 09, 2017

Since the commit list is not ordered, it is possible for nfs_scan_commit_list()
to hold a request that nfs_lock_and_join_requests() is waiting for, while
at the same time trying to grab a request that nfs_lock_and_join_requests()
already holds.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

137da553

09 Sep, 2017 1 commit

NFS: Fix 2 use after free issues in the I/O code · 196639eb

Authored by Trond Myklebust on Sep 08, 2017

The writeback code wants to send a commit after processing the pages,
which is why we want to delay releasing the struct path until after
that's done.

Also, the layout code expects that we do not free the inode before
we've put the layout segments in pnfs_writehdr_free() and
pnfs_readhdr_free().

Fixes: 919e3bd9 ("NFS: Ensure we commit after writeback is complete")
Fixes: 4714fb51 ("nfs: remove pgio_header refcount, related cleanup")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

196639eb

07 Sep, 2017 5 commits

NFS: Sync the correct byte range during synchronous writes · e973b1a5

Authored by tarangg@amazon.com on Sep 07, 2017

Since commit 18290650 ("NFS: Move buffered I/O locking into
nfs_file_write()") nfs_file_write() has not flushed the correct byte
range during synchronous writes.  generic_write_sync() expects that
iocb->ki_pos points to the right edge of the range rather than the
left edge.

To replicate the problem, open a file with O_DSYNC, have the client
write at increasing offsets, and then print the successful offsets.
Block port 2049 partway through that sequence, and observe that the
client application indicates successful writes in advance of what the
server received.

Fixes: 18290650 ("NFS: Move buffered I/O locking into nfs_file_write()")
Signed-off-by: Jacob Strauss <jsstraus@amazon.com>
Signed-off-by: Tarang Gupta <tarangg@amazon.com>
Tested-by: Tarang Gupta <tarangg@amazon.com>
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e973b1a5
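
The "right edge" expectation in the commit above can be shown with a small range calculation. This is a minimal sketch, assuming the sync helper derives its flush range by subtracting the byte count from the file position it is given. The type `toy_range` and the function `toy_sync_range` are hypothetical and only model that arithmetic.

```c
#include <stdint.h>

/* Hypothetical inclusive byte range, used only for this illustration. */
struct toy_range {
    int64_t start;
    int64_t end;
};

/*
 * Compute the range to flush after a write of `count` bytes, given the
 * file position after the write (the right edge of the written data).
 */
struct toy_range toy_sync_range(int64_t pos_after_write, int64_t count)
{
    struct toy_range r = {
        .start = pos_after_write - count,
        .end = pos_after_write - 1,
    };
    return r;
}
```

For a 100-byte write at offset 4096, passing the right edge 4196 gives the range 4096 to 4195. Passing the left edge 4096 by mistake gives 3996 to 4095, which misses the data that was just written.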

fscache: remove unused ->now_uncached callback · 26b433d0

Authored by Jan Kara on Sep 06, 2017

Patch series "Ranged pagevec lookup", v2.

In this series I make pagevec_lookup() update the index (to be
consistent with pagevec_lookup_tag() and also as a preparation for
ranged lookups), provide ranged variant of pagevec_lookup() and use it
in places where it makes sense.  This not only removes some common code
but is also a measurable performance win for some use cases (see patch
4/10) where radix tree is sparse and searching & grabbing of a page after
the end of the range has measurable overhead.

This patch (of 10):

The callback doesn't ever get called.  Remove it.

Link: http://lkml.kernel.org/r/20170726114704.7626-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

26b433d0

NFS: remove jiffies field from access cache · 03c6f7d6

Authored by NeilBrown on Aug 16, 2017

This field hasn't been used since commit 57b69181 ("NFS: Cache
access checks more aggressively").

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

03c6f7d6

NFS: flush data when locking a file to ensure cache coherence for mmap. · 779eafab

Authored by NeilBrown on Aug 18, 2017

When a byte range lock (or flock) is taken out on an NFS file, the
validity of the cached data is checked and the inode is marked
NFS_INODE_INVALID_DATA.  However the cached data isn't flushed from
the page cache.

This is sufficient for future read() requests or mmap() requests as
they call nfs_revalidate_mapping() which performs the flush if
necessary.

However an existing mapping is not affected.  Accessing data through
that mapping will continue to return old data even though the inode is
marked NFS_INODE_INVALID_DATA.

This can easily be confirmed using the 'nfs' tool in
  git://github.com/okirch/twopence-nfs.git
and running

   nfs coherence FILENAME
on one client, and
   nfs coherence -r FILENAME
on another client.

It appears that prior to Linux 2.6.0 this worked correctly.

However commit:

http://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ca9268fe3ddd075714005adecd4afbd7f9ab87d0

removed the call to inode_invalidate_pages() from nfs_zap_caches().  I
haven't tested this code, but inspection suggests that prior to this
commit, file locking would invalidate all inode pages.

This patch adds a call to nfs_revalidate_mapping() after a
successful SETLK so that invalid data is flushed.  With this patch the
above test passes.  To minimize impact (and possibly avoid a GETATTR
call) this only happens if the mapping might be mapped into
userspace.

Cc: Olaf Kirch <okir@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

779eafab

NFS: don't expect errors from mempool_alloc(). · 237f8306

Authored by NeilBrown on Aug 18, 2017

Commit fbe77c30 ("NFS: move rw_mode to nfs_pageio_header")
reintroduced some pointless code that commit 518662e0 ("NFS: fix
usage of mempools.") had recently removed.

Remove it again.

Cc: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

237f8306

25 Aug, 2017 1 commit

sunrpc: Const-ify struct sv_serv_ops · afea5657

Authored by Chuck Lever on Aug 01, 2017

Close an attack vector by moving the arrays of per-server methods to
read-only memory.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

afea5657

24 Aug, 2017 1 commit

block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992

Authored by Christoph Hellwig on Aug 23, 2017

This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

74d46992

21 Aug, 2017 2 commits

NFS: Fix NFSv2 security settings · 53a75f22

Authored by Chuck Lever on Aug 10, 2017

For a while now any NFSv2 mount where sec= is specified uses
AUTH_NULL. If sec= is not specified, the mount uses AUTH_UNIX.
Commit e68fd7c8 ("mount: use sec= that was specified on the
command line") attempted to address a very similar problem with
NFSv3, and should have fixed this too, but it has a bug.

The MNTv1 MNT procedure does not return a list of security flavors,
so our client makes up a list containing just AUTH_NULL. This should
enable nfs_verify_authflavors() to assign the sec= specified flavor,
but instead, it incorrectly sets it to AUTH_NULL.

I expect this would also be a problem for any NFSv3 server whose
MNTv3 MNT procedure returned a security flavor list containing only
AUTH_NULL.

Fixes: e68fd7c8 ("mount: use sec= that was specified on ... ")
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=310
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

53a75f22

NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys' · b79e87e0

Authored by NeilBrown on Aug 18, 2017

An NFSv4.1 client might close a file after the user who opened it has
logged off.  In this case the user's credentials may no longer be
valid, if they are e.g. kerberos credentials that have expired.

NFSv4.1 has a mechanism to allow the client to use machine credentials
to close a file.  However due to a short-coming in the RFC, a CLOSE
with those credentials may not be possible if the file in question
isn't exported to the same security flavor - the required PUTFH must
be rejected when this is the case.

Specifically if a server and client support kerberos in general and
have used it to form a machine credential, but the file is only
exported to "sec=sys", a PUTFH with the machine credentials will fail,
so CLOSE is not possible.

As RPC_AUTH_UNIX (used by sec=sys) credentials can never expire, there
is no value in using the machine credential in place of them.
So in that case, just use the user's credentials for CLOSE etc., as you would
in NFSv4.0.

Signed-off-by: Neil Brown <neilb@suse.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b79e87e0

20 Aug, 2017 2 commits

NFS: Remove unused parameter gfp_flags from nfs_pageio_init() · 3bde7afd

Authored by Trond Myklebust on Aug 20, 2017

Now that the mirror allocation has been moved, the parameter can go.
Also remove the redundant symbol export.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3bde7afd

NFSv4: Fix up mirror allocation · 14abcb0b

Authored by Trond Myklebust on Aug 19, 2017

There are a number of callers of nfs_pageio_complete() that want to
continue using the nfs_pageio_descriptor without needing to call
nfs_pageio_init() again. Examples include nfs_pageio_resend() and
nfs_pageio_cond_complete().

The problem is that nfs_pageio_complete() also calls
nfs_pageio_cleanup_mirroring(), which frees up the array of mirrors.
This can lead to writeback errors, in the next call to
nfs_pageio_setup_mirroring().

Fix by simply moving the allocation of the mirrors to
nfs_pageio_setup_mirroring().

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196709
Reported-by: JianhongYin <yin-jianhong@163.com>
Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

14abcb0b

15 Aug, 2017 11 commits

NFS: Wait for requests that are locked on the commit list · 2ce209c4

Authored by Trond Myklebust on Aug 01, 2017

If a request is on the commit list, but is locked, we will currently skip
it, which can lead to livelocking when the commit count doesn't reduce
to zero.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2ce209c4

NFSv4/pnfs: Replace pnfs_put_lseg_locked() with pnfs_put_lseg() · 8205b9ce

Authored by Trond Myklebust on Aug 01, 2017

Now that we no longer hold the inode->i_lock when manipulating the
commit lists, it is safe to call pnfs_put_lseg() again.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

8205b9ce

NFS: Switch to using mapping->private_lock for page writeback lookups. · 4b9bb25b

Authored by Trond Myklebust on Aug 01, 2017

Switch from using the inode->i_lock for this to avoid contention with
other metadata manipulation.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4b9bb25b

NFS: Use an atomic_long_t to count the number of commits · 5cb953d4

Authored by Trond Myklebust on Aug 01, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5cb953d4

NFS: Use an atomic_long_t to count the number of requests · a6b6d5b8

Authored by Trond Myklebust on Aug 01, 2017

Rather than forcing us to take the inode->i_lock just in order to bump
the number.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a6b6d5b8
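
The idea behind the commit above, counting without taking a lock, can be sketched in userspace with C11 atomics. This is a minimal illustration, assuming a per-inode counter that is only incremented, decremented, and read. The struct `toy_inode_stats` and its helpers are hypothetical, and the kernel uses its own atomic_long_t API rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>

/* Hypothetical per-inode bookkeeping for this illustration. */
struct toy_inode_stats {
    atomic_long nrequests;
};

/* Bump the counter without holding any lock. */
void toy_request_added(struct toy_inode_stats *st)
{
    atomic_fetch_add(&st->nrequests, 1);
}

/* Drop the counter without holding any lock. */
void toy_request_removed(struct toy_inode_stats *st)
{
    atomic_fetch_sub(&st->nrequests, 1);
}

/* Read a snapshot of the current count. */
long toy_request_count(struct toy_inode_stats *st)
{
    return atomic_load(&st->nrequests);
}
```

Each update is a single atomic read-modify-write, so threads that only need to adjust the count no longer contend on a lock that also protects unrelated state.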

NFSv4: Use a mutex to protect the per-inode commit lists · e824f99a

Authored by Trond Myklebust on Aug 01, 2017

The commit lists can get very large, so using the inode->i_lock can
end up affecting general metadata performance.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e824f99a

NFS: Refactor nfs_page_find_head_request() · b30d2f04

Authored by Trond Myklebust on Aug 01, 2017

Split out the 2 cases so that we can treat the locking differently.
The issue is that the locking in the pageswapcache cache is highly
linked to the commit list locking.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b30d2f04

NFSv4: Convert nfs_lock_and_join_requests() to use nfs_page_find_head_request() · bd37d6fc

Authored by Trond Myklebust on Aug 01, 2017

Hide the locking from nfs_lock_and_join_requests() so that we can
separate out the requirements for swapcache pages.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

bd37d6fc

NFS: Fix up nfs_page_group_covers_page() · 7e8a30f8

Authored by Trond Myklebust on Jul 17, 2017

Fix up the test in nfs_page_group_covers_page(). The simplest implementation
is to check that we have a set of intersecting or contiguous subrequests
that connect page offset 0 to nfs_page_length(req->wb_page).

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7e8a30f8
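
The coverage test described in the commit above can be sketched as an interval-chaining check. This is a minimal illustration, assuming each subrequest covers a half-open byte range of one page and that the ranges may arrive in any order. The names `toy_subreq` and `toy_group_covers` are hypothetical, and this is not the kernel's implementation of nfs_page_group_covers_page().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subrequest: covers bytes [offset, offset + bytes) of a page. */
struct toy_subreq {
    unsigned int offset;
    unsigned int bytes;
};

/*
 * Return true if the subrequests, in any order, form a chain of
 * overlapping or adjacent ranges from offset 0 up to page_len.
 */
bool toy_group_covers(const struct toy_subreq *reqs, size_t n,
                      unsigned int page_len)
{
    unsigned int pos = 0;   /* everything in [0, pos) is known covered */

    while (pos < page_len) {
        unsigned int best = pos;

        for (size_t i = 0; i < n; i++) {
            unsigned int end = reqs[i].offset + reqs[i].bytes;

            /* Range starts at or before pos and reaches beyond it. */
            if (reqs[i].offset <= pos && end > best)
                best = end;
        }
        if (best == pos)
            return false;   /* gap at pos: nothing extends coverage */
        pos = best;
    }
    return true;
}
```

The loop rescans the list on every step, which is quadratic in the worst case. That seems acceptable for a sketch because a page group holds only a handful of subrequests.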

NFS: Remove unused parameter from nfs_page_group_lock() · 1344b7ea

Authored by Trond Myklebust on Jul 17, 2017

nfs_page_group_lock() is now always called with the 'nonblock'
parameter set to 'false'.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1344b7ea

NFS: Remove unused function nfs_page_group_lock_wait() · dee83046

Authored by Trond Myklebust on Jul 17, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

dee83046
