1. 18 Nov, 2017 1 commit
    • NFSv4.1: Fix up replays of interrupted requests · 3be0f80b
      Trond Myklebust authored
      If the previous request on a slot was interrupted before it was
      processed by the server, then our slot sequence number may be out of whack,
      and so we try the next operation using the old sequence number.
      
      The problem with this, is that not all servers check to see that the
      client is replaying the same operations as previously when they decide
      to go to the replay cache, and so instead of the expected error of
      NFS4ERR_SEQ_FALSE_RETRY, we get a replay of the old reply, which could
      (if the operations match up) be mistaken by the client for a new reply.
      
      To fix this, we attempt to send a COMPOUND containing only the SEQUENCE op
      in order to resync our slot sequence number.
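
The resync idea can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel implementation: `LaxServer`, `resync_slot` and the reply strings are hypothetical names invented for illustration, and the server is deliberately modelled as one that replays its cached reply without comparing operations.

```python
"""Toy model of one NFSv4.1 session slot (hypothetical names, not kernel code)."""

SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


class LaxServer:
    """Server that replays its cached reply without comparing the operations."""

    def __init__(self):
        self.seqid = 0        # last sequence number executed on this slot
        self.cached = None    # reply cached for that sequence number

    def compound(self, seqid, ops):
        if seqid == self.seqid and self.cached is not None:
            # Same sequence number as before: answer from the replay cache,
            # even if `ops` differs from what was originally executed.
            return ("replay", self.cached)
        if seqid == self.seqid + 1:
            self.seqid = seqid
            self.cached = "result of " + "+".join(ops)
            return ("new", self.cached)
        return ("error", SEQ_MISORDERED)


def resync_slot(server, last_acked):
    """Send a SEQUENCE-only COMPOUND using the interrupted request's sequence
    number and return the sequence number to use for the next real request."""
    probe = last_acked + 1
    kind, _ = server.compound(probe, ["SEQUENCE"])
    if kind == "error":
        return probe
    # "replay": the server had already processed `probe`; discard the old reply.
    # "new": it had not, and the probe itself has now consumed `probe`.
    return probe + 1
```

In this model, reusing sequence number 1 for a different request after an interrupted one returns the stale cached reply, whereas calling `resync_slot()` first yields a sequence number that the server treats as a new request.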
      
      Cc: Olga Kornievskaia <olga.kornievskaia@gmail.com>
      [olga.kornievskaia@gmail.com: fix an Oops]
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  2. 17 Oct, 2017 4 commits
    • NFS: remove special-case revalidate in nfs_opendir() · 1fea73ac
      NeilBrown authored
      Commit f5a73672 ("NFS: allow close-to-open cache semantics to
      apply to root of NFS filesystem") added a call to
      __nfs_revalidate_inode() to nfs_opendir() as the lookup
      process wouldn't reliably do this.

      Subsequent commit a3fbbde7 ("VFS: we need to set LOOKUP_JUMPED
      on mountpoint crossing") made this unnecessary.  So remove the
      unnecessary code.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: revalidate "." etc correctly on "open". · b688741c
      NeilBrown authored
      For correct close-to-open semantics, NFS must validate
      the change attribute of a directory (or file) on open.
      
      Since commit ecf3d1f1 ("vfs: kill FS_REVAL_DOT by adding a
      d_weak_revalidate dentry op"), open() of "." or a path ending ".." is
      not revalidated reliably (except when that directory is a mount point).
      
      Prior to that commit, "." was revalidated using nfs_lookup_revalidate()
      which checks the LOOKUP_OPEN flag and forces revalidation if the flag is
      set.
      Since that commit, nfs_weak_revalidate() is used for NFSv3 (which
      ignores the flags) and nothing is used for NFSv4.
      
      This is fixed by using nfs_lookup_verify_inode() in
      nfs_weak_revalidate().  This does the revalidation exactly when needed.
      Also, add a definition of .d_weak_revalidate for NFSv4.
      
      The incorrect behavior is easily demonstrated by running "echo *" in
      some non-mountpoint NFS directory while watching network traffic.
      Without this patch, "echo *" sometimes doesn't produce any traffic.
      With the patch it always does.
      
      Fixes: ecf3d1f1 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op")
      cc: stable@vger.kernel.org (3.9+)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: Don't compare apples to elephants to determine access bits · 1750d929
      Anna Schumaker authored
      The NFS_ACCESS_* flags aren't a 1:1 mapping to the MAY_* flags, so
      checking for MAY_WHATEVER might have surprising results in
      nfs*_proc_access().  Let's simplify this check when determining which
      bits to ask for, and do it in a generic place instead of copying code
      for each NFS version.
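
To see why the two flag sets cannot be compared directly, here is a small sketch. The `MAY_*` values are the Linux VFS permission bits and the `NFS_ACCESS_*` values follow the NFS ACCESS bit definitions; the helper `may_to_nfs_access()` is a hypothetical translation written for illustration and is not the code this commit adds.

```python
# Linux VFS permission bits.
MAY_EXEC, MAY_WRITE, MAY_READ = 0x1, 0x2, 0x4

# NFS ACCESS bits (same values in NFSv3 and NFSv4).
NFS_ACCESS_READ = 0x01
NFS_ACCESS_LOOKUP = 0x02
NFS_ACCESS_MODIFY = 0x04
NFS_ACCESS_EXTEND = 0x08
NFS_ACCESS_DELETE = 0x10
NFS_ACCESS_EXECUTE = 0x20


def may_to_nfs_access(may_mask, is_dir):
    """Hypothetical helper: translate MAY_* bits into NFS_ACCESS_* bits."""
    bits = 0
    if may_mask & MAY_READ:
        bits |= NFS_ACCESS_READ
    if may_mask & MAY_WRITE:
        bits |= NFS_ACCESS_MODIFY | NFS_ACCESS_EXTEND
        if is_dir:
            bits |= NFS_ACCESS_DELETE
    if may_mask & MAY_EXEC:
        # "Execute" on a directory means lookup in NFS terms.
        bits |= NFS_ACCESS_LOOKUP if is_dir else NFS_ACCESS_EXECUTE
    return bits
```

Note that `MAY_READ` and `NFS_ACCESS_MODIFY` share the numeric value 0x4, so testing an NFS access mask against `MAY_READ` would actually test the modify bit.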
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: Create NFS_ACCESS_* flags · 3c181827
      Anna Schumaker authored
      Passing the NFS v4 flags into the v3 code seems weird to me, even if
      they are defined to the same values.  This patch adds in generic flags
      to help me feel better
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  3. 05 Oct, 2017 1 commit
    • NFSv4/pnfs: Fix an infinite layoutget loop · e8fa33a6
      Trond Myklebust authored
      Since we can now use a lock stateid or a delegation stateid, that
      differs from the context stateid, we need to change the test in
      nfs4_layoutget_handle_exception() to take this into account.
      
      This fixes an infinite layoutget loop in the NFS client whereby
      it keeps retrying the initial layoutget using the same broken
      stateid.
      
      Fixes: 70d2f7b1 ("pNFS: Use the standard I/O stateid when...")
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  4. 02 Oct, 2017 4 commits
    • nfs/filelayout: fix oops when freeing filelayout segment · 0a47df11
      Scott Mayhew authored
      Check for a NULL dsaddr in filelayout_free_lseg() before calling
      nfs4_fl_put_deviceid().  This fixes the following oops:
      
      [ 1967.645207] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      [ 1967.646010] IP: [<ffffffffc06d6aea>] nfs4_put_deviceid_node+0xa/0x90 [nfsv4]
      [ 1967.646010] PGD c08bc067 PUD 915d3067 PMD 0
      [ 1967.753036] Oops: 0000 [#1] SMP
      [ 1967.753036] Modules linked in: nfs_layout_nfsv41_files ext4 mbcache jbd2 loop rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache amd64_edac_mod ipmi_ssif edac_mce_amd edac_core kvm_amd sg kvm ipmi_si ipmi_devintf irqbypass pcspkr k8temp ipmi_msghandler i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common amdkfd amd_iommu_v2 radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops mptsas ttm scsi_transport_sas mptscsih drm mptbase serio_raw i2c_core bnx2 dm_mirror dm_region_hash dm_log dm_mod
      [ 1967.790031] CPU: 2 PID: 1370 Comm: ls Not tainted 3.10.0-709.el7.test.bz1463784.x86_64 #1
      [ 1967.790031] Hardware name: IBM BladeCenter LS21 -[7971AC1]-/Server Blade, BIOS -[BAE155AUS-1.10]- 06/03/2009
      [ 1967.790031] task: ffff8800c42a3f40 ti: ffff8800c4064000 task.ti: ffff8800c4064000
      [ 1967.790031] RIP: 0010:[<ffffffffc06d6aea>]  [<ffffffffc06d6aea>] nfs4_put_deviceid_node+0xa/0x90 [nfsv4]
      [ 1967.790031] RSP: 0000:ffff8800c4067978  EFLAGS: 00010246
      [ 1967.790031] RAX: ffffffffc062f000 RBX: ffff8801d468a540 RCX: dead000000000200
      [ 1967.790031] RDX: ffff8800c40679f8 RSI: ffff8800c4067a0c RDI: 0000000000000000
      [ 1967.790031] RBP: ffff8800c4067980 R08: ffff8801d468a540 R09: 0000000000000000
      [ 1967.790031] R10: 0000000000000000 R11: ffffffffffffffff R12: ffff8801d468a540
      [ 1967.790031] R13: ffff8800c40679f8 R14: ffff8801d5645300 R15: ffff880126f15ff0
      [ 1967.790031] FS:  00007f11053c9800(0000) GS:ffff88012bd00000(0000) knlGS:0000000000000000
      [ 1967.790031] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 1967.790031] CR2: 0000000000000030 CR3: 0000000094b55000 CR4: 00000000000007e0
      [ 1967.790031] Stack:
      [ 1967.790031]  ffff8801d468a540 ffff8800c4067990 ffffffffc062d2fe ffff8800c40679b0
      [ 1967.790031]  ffffffffc062b5b4 ffff8800c40679f8 ffff8801d468a540 ffff8800c40679d8
      [ 1967.790031]  ffffffffc06d39af ffff8800c40679f8 ffff880126f16078 0000000000000001
      [ 1967.790031] Call Trace:
      [ 1967.790031]  [<ffffffffc062d2fe>] nfs4_fl_put_deviceid+0xe/0x10 [nfs_layout_nfsv41_files]
      [ 1967.790031]  [<ffffffffc062b5b4>] filelayout_free_lseg+0x24/0x90 [nfs_layout_nfsv41_files]
      [ 1967.790031]  [<ffffffffc06d39af>] pnfs_free_lseg_list+0x5f/0x80 [nfsv4]
      [ 1967.790031]  [<ffffffffc06d5a67>] _pnfs_return_layout+0x157/0x270 [nfsv4]
      [ 1967.790031]  [<ffffffffc06c17dd>] nfs4_evict_inode+0x4d/0x70 [nfsv4]
      [ 1967.790031]  [<ffffffff8121de19>] evict+0xa9/0x180
      [ 1967.790031]  [<ffffffff8121e729>] iput+0xf9/0x190
      [ 1967.790031]  [<ffffffffc0652cea>] nfs_dentry_iput+0x3a/0x50 [nfs]
      [ 1967.790031]  [<ffffffff8121ab4f>] shrink_dentry_list+0x20f/0x490
      [ 1967.790031]  [<ffffffff8121b018>] d_invalidate+0xd8/0x150
      [ 1967.790031]  [<ffffffffc065446b>] nfs_readdir_page_filler+0x40b/0x600 [nfs]
      [ 1967.790031]  [<ffffffffc0654bbd>] nfs_readdir_xdr_to_array+0x20d/0x3b0 [nfs]
      [ 1967.790031]  [<ffffffff811f3482>] ? __mem_cgroup_commit_charge+0xe2/0x2f0
      [ 1967.790031]  [<ffffffff81183208>] ? __add_to_page_cache_locked+0x48/0x170
      [ 1967.790031]  [<ffffffffc0654d60>] ? nfs_readdir_xdr_to_array+0x3b0/0x3b0 [nfs]
      [ 1967.790031]  [<ffffffffc0654d82>] nfs_readdir_filler+0x22/0x90 [nfs]
      [ 1967.790031]  [<ffffffff8118351f>] do_read_cache_page+0x7f/0x190
      [ 1967.790031]  [<ffffffff81215d30>] ? fillonedir+0xe0/0xe0
      [ 1967.790031]  [<ffffffff8118366c>] read_cache_page+0x1c/0x30
      [ 1967.790031]  [<ffffffffc0654f9b>] nfs_readdir+0x1ab/0x6b0 [nfs]
      [ 1967.790031]  [<ffffffffc06bd1c0>] ? nfs4_xdr_dec_layoutget+0x270/0x270 [nfsv4]
      [ 1967.790031]  [<ffffffff81215d30>] ? fillonedir+0xe0/0xe0
      [ 1967.790031]  [<ffffffff81215c20>] vfs_readdir+0xb0/0xe0
      [ 1967.790031]  [<ffffffff81216045>] SyS_getdents+0x95/0x120
      [ 1967.790031]  [<ffffffff816b9449>] system_call_fastpath+0x16/0x1b
      [ 1967.790031] Code: 90 31 d2 48 89 d0 5d c3 85 f6 74 f5 8d 4e 01 89 f0 f0 0f b1 0f 39 f0 74 e2 89 c6 eb eb 0f 1f 40 00 66 66 66 66 90 55 48 89 e5 53 <48> 8b 47 30 48 89 fb a8 04 74 3b 8b 57 60 83 fa 02 74 19 8d 4a
      [ 1967.790031] RIP  [<ffffffffc06d6aea>] nfs4_put_deviceid_node+0xa/0x90 [nfsv4]
      [ 1967.790031]  RSP <ffff8800c4067978>
      [ 1967.790031] CR2: 0000000000000030
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Fixes: 1ebf9801 ("NFS/filelayout: Fix racy setting of fl->dsaddr...")
      Cc: stable@vger.kernel.org # v4.13+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • NFS: Fix uninitialized rpc_wait_queue · 68ebf8fe
      Benjamin Coddington authored
      Michael Sterrett reports a NULL pointer dereference on NFSv3 mounts when
      CONFIG_NFS_V4 is not set because the NFS UOC rpc_wait_queue has not been
      initialized.  Move the initialization of the queue out of the CONFIG_NFS_V4
      conditional section.
      
      Fixes: 7d6ddf88 ("NFS: Add an iocounter wait function for async RPC tasks")
      Cc: stable@vger.kernel.org # 4.11+
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • NFS: Cleanup error handling in nfs_idmap_request_key() · cdb2e53f
      Dan Carpenter authored
      nfs_idmap_get_desc() can't actually return zero.  But if it did then
      we would return ERR_PTR(0) which is NULL and the caller,
      nfs_idmap_get_key(), doesn't expect that so it leads to a NULL pointer
      dereference.
      
      I've cleaned this up by changing the "<=" to "<" so it's more clear that
      we don't return ERR_PTR(0).
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • nfs: RPC_MAX_AUTH_SIZE is in bytes · 35c036ef
      J. Bruce Fields authored
      RPC_MAX_AUTH_SIZE is expressed in bytes, not 4-byte words.  Treating it
      as a word count causes the client to request a larger-than-necessary
      session replay slot size.
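
A quick arithmetic sketch of the unit mix-up follows. The value 400 for `RPC_MAX_AUTH_SIZE` is an assumption taken from the Linux sunrpc headers, and `auth_overhead_bytes()` is a hypothetical helper written for illustration; it is not the code changed by this commit.

```python
XDR_UNIT = 4                # bytes per XDR word
RPC_MAX_AUTH_SIZE = 400     # bytes (assumed value, see lead-in)


def auth_overhead_bytes(treat_as_words):
    """Bytes reserved for the auth blob under each interpretation of the constant."""
    if treat_as_words:
        # Mistaken reading: the constant counts 4-byte words, so scale it up.
        return RPC_MAX_AUTH_SIZE * XDR_UNIT
    # Correct reading: the constant is already a byte count.
    return RPC_MAX_AUTH_SIZE


# Bytes over-reserved per auth blob when the constant is read as words.
excess = auth_overhead_bytes(True) - auth_overhead_bytes(False)
```

Under these assumptions each auth blob is over-sized by a factor of four, which inflates any slot size computed from it.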
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  5. 12 Sep, 2017 3 commits
    • NFS: various changes relating to reporting IO errors. · bf4b4905
      NeilBrown authored
      1/ remove 'start' and 'end' args from nfs_file_fsync_commit().
         They aren't used.
      
      2/ Make nfs_context_set_write_error() a "static inline" in internal.h
         so we can...
      
      3/ Use nfs_context_set_write_error() instead of mapping_set_error()
         if nfs_pageio_add_request() fails before sending any request.
         NFS generally keeps errors in the open_context, not the mapping,
         so this is more consistent.
      
      4/ If filemap_write_and_wait_range() reports any error, still
         check ctx->error.  The value in ctx->error is likely to be
         more useful.  As part of this, NFS_CONTEXT_ERROR_WRITE is
         cleared slightly earlier, before nfs_file_fsync_commit() is called,
         rather than at the start of that function.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • NFS: Add static NFS I/O tracepoints · 8224b273
      Chuck Lever authored
      Tools like tcpdump and rpcdebug can be very useful. But there are
      plenty of environments where they are difficult or impossible to
      use. For example, we've had customers report I/O failures during
      workloads so heavy that collecting network traffic or enabling
      RPC debugging are themselves onerous.
      
      The kernel's static tracepoints are lightweight (less likely to
      introduce timing changes) and efficient (the trace data is compact).
      They also work in scenarios where capturing network traffic is not
      possible due to lack of hardware support (some InfiniBand HCAs) or
      where data or network privacy is a concern.
      
      Introduce tracepoints that show when an NFS READ, WRITE, or COMMIT
      is initiated, and when it completes. Record the arguments and
      results of each operation, which are not shown by existing sunrpc
      module's tracepoints.
      
      For instance, the recorded offset and count can be used to match an
      "initiate" event to a "done" event. If an NFS READ result returns
      fewer bytes than requested or zero, seeing the EOF flag can be
      probative. Seeing an NFS4ERR_BAD_STATEID result is also an indication
      of a particular class of problems. The timing information attached
      to each event record can often be useful as well.
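
As a rough illustration of matching an "initiate" event to a "done" event by offset and count, here is a minimal sketch. The event dictionaries and their field names (`phase`, `op`, `offset`, `count`, `ts`) are hypothetical and do not reflect the real trace record layout; the sketch also assumes the "done" record carries the requested count.

```python
def match_events(events):
    """Pair each 'done' event with the earliest unmatched 'initiate' event
    that has the same operation, offset and count.

    Returns a list of ((op, offset, count), latency) tuples in completion order.
    """
    pending = {}   # key -> list of start timestamps still waiting for a 'done'
    pairs = []
    for ev in events:
        key = (ev["op"], ev["offset"], ev["count"])
        if ev["phase"] == "initiate":
            pending.setdefault(key, []).append(ev["ts"])
        elif ev["phase"] == "done" and pending.get(key):
            start = pending[key].pop(0)
            pairs.append((key, ev["ts"] - start))
    return pairs


# Hypothetical sample data with integer timestamps (e.g. microseconds).
sample = [
    {"phase": "initiate", "op": "read", "offset": 0, "count": 4096, "ts": 100},
    {"phase": "initiate", "op": "write", "offset": 8192, "count": 512, "ts": 120},
    {"phase": "done", "op": "read", "offset": 0, "count": 4096, "ts": 250},
    {"phase": "done", "op": "write", "offset": 8192, "count": 512, "ts": 400},
]
latencies = match_events(sample)
```

A "done" event with no matching "initiate" is simply ignored in this sketch; a real analysis would probably want to report it.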
      
      Usage example:
      
      [root@manet tmp]# trace-cmd record -e nfs:*initiate* -e nfs:*done
      /sys/kernel/debug/tracing/events/nfs/*initiate*/filter
      /sys/kernel/debug/tracing/events/nfs/*done/filter
      Hit Ctrl^C to stop recording
      ^CKernel buffer statistics:
        Note: "entries" are the entries left in the kernel ring buffer and are not
              recorded in the trace data. They should all be zero.
      
      CPU: 0
      entries: 0
      overrun: 0
      commit overrun: 0
      bytes: 3680
      oldest event ts:    78.367422
      now ts:   100.124419
      dropped events: 0
      read events: 74
      
      ... and so on.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • pNFS: Use the standard I/O stateid when calling LAYOUTGET · 70d2f7b1
      Trond Myklebust authored
      Instead of having a private method for copying the open/delegation stateid,
      use the same call that is used for standard I/O through the MDS.
      
      Note that this means we transmit the stateid with a zero seqid, avoiding
      issues with NFS4ERR_OLD_STATEID.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  6. 10 Sep, 2017 4 commits
  7. 09 Sep, 2017 1 commit
    • NFS: Fix 2 use after free issues in the I/O code · 196639eb
      Trond Myklebust authored
      The writeback code wants to send a commit after processing the pages,
      which is why we want to delay releasing the struct path until after
      that's done.
      
      Also, the layout code expects that we do not free the inode before
      we've put the layout segments in pnfs_writehdr_free() and
      pnfs_readhdr_free()
      
      Fixes: 919e3bd9 ("NFS: Ensure we commit after writeback is complete")
      Fixes: 4714fb51 ("nfs: remove pgio_header refcount, related cleanup")
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  8. 07 Sep, 2017 5 commits
  9. 25 Aug, 2017 1 commit
  10. 24 Aug, 2017 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig authored
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 21 Aug, 2017 2 commits
    • NFS: Fix NFSv2 security settings · 53a75f22
      Chuck Lever authored
      For a while now any NFSv2 mount where sec= is specified uses
      AUTH_NULL. If sec= is not specified, the mount uses AUTH_UNIX.
      Commit e68fd7c8 ("mount: use sec= that was specified on the
      command line") attempted to address a very similar problem with
      NFSv3, and should have fixed this too, but it has a bug.
      
      The MNTv1 MNT procedure does not return a list of security flavors,
      so our client makes up a list containing just AUTH_NULL. This should
      enable nfs_verify_authflavors() to assign the sec= specified flavor,
      but instead, it incorrectly sets it to AUTH_NULL.
      
      I expect this would also be a problem for any NFSv3 server whose
      MNTv3 MNT procedure returned a security flavor list containing only
      AUTH_NULL.
      
      Fixes: e68fd7c8 ("mount: use sec= that was specified on ... ")
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=310
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • NFSv4.1: don't use machine credentials for CLOSE when using 'sec=sys' · b79e87e0
      NeilBrown authored
      An NFSv4.1 client might close a file after the user who opened it has
      logged off.  In this case the user's credentials may no longer be
      valid, if they are e.g. kerberos credentials that have expired.
      
      NFSv4.1 has a mechanism to allow the client to use machine credentials
      to close a file.  However due to a short-coming in the RFC, a CLOSE
      with those credentials may not be possible if the file in question
      isn't exported to the same security flavor - the required PUTFH must
      be rejected when this is the case.
      
      Specifically if a server and client support kerberos in general and
      have used it to form a machine credential, but the file is only
      exported to "sec=sys", a PUTFH with the machine credentials will fail,
      so CLOSE is not possible.
      
      As RPC_AUTH_UNIX (used by sec=sys) credentials can never expire, there
      is no value in using the machine credential in place of them.
      So in that case, just use the user's credentials for CLOSE etc, as you would
      in NFSv4.0.
      Signed-off-by: Neil Brown <neilb@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  12. 20 Aug, 2017 2 commits
  13. 15 Aug, 2017 11 commits