1. 31 Jul 2012, 10 commits
    • NFS: Create a try_mount rpc op · ff9099f2
      Committed by Bryan Schumaker
      I'm already looking up the nfs subversion in nfs_fs_mount(), so I have
      easy access to rpc_ops that used to be difficult to reach.  This allows
      me to set up a different mount path for NFS v2/3 and NFS v4.
      Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
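      The commit moves the choice of mount path behind a per-version operation. A minimal userspace sketch of that dispatch pattern, assuming a hypothetical `struct demo_rpc_ops` with a single `try_mount` pointer (the kernel's real `struct nfs_rpc_ops` has many more members, and its signatures are not reproduced here):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes identifying which mount path ran. */
enum demo_mount_path { DEMO_PATH_V2V3 = 1, DEMO_PATH_V4 = 2 };

/* Hypothetical per-version operations table. */
struct demo_rpc_ops {
	int version;
	int (*try_mount)(const char *dev_name);
};

static int demo_v3_try_mount(const char *dev_name)
{
	assert(dev_name != NULL);
	return DEMO_PATH_V2V3; /* shared v2/v3 mount path */
}

static int demo_v4_try_mount(const char *dev_name)
{
	assert(dev_name != NULL);
	return DEMO_PATH_V4; /* separate v4 mount path */
}

static const struct demo_rpc_ops demo_v2_ops = { 2, demo_v3_try_mount };
static const struct demo_rpc_ops demo_v3_ops = { 3, demo_v3_try_mount };
static const struct demo_rpc_ops demo_v4_ops = { 4, demo_v4_try_mount };

/* Once the version's ops have been looked up, the generic mount code
 * no longer needs to branch on the version itself. */
static int demo_fs_mount(const struct demo_rpc_ops *ops, const char *dev_name)
{
	return ops->try_mount(dev_name);
}
```

      Calling `demo_fs_mount(&demo_v4_ops, "server:/export")` takes the v4 path, while the v2 and v3 tables share one implementation.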
    • NFS: Remove the NFS v4 xdev mount function · e8f25e6d
      Committed by Bryan Schumaker
      I can now share this code with the v2 and v3 code by using the NFS
      subversion structure.
      Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: Add version registering framework · ab7017a3
      Committed by Bryan Schumaker
      This patch adds in the code to track multiple versions of the NFS
      protocol.  I created default structures for v2, v3 and v4 so that each
      version can continue to work while I convert them into kernel modules.
      I also removed the const qualifier from the rpc_version array so that I
      can change it at runtime.
      Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
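      To make the registering idea concrete, here is a small sketch of a runtime version table with default entries for v2, v3 and v4. The names `demo_subversion`, `demo_register_version` and `demo_get_version` are hypothetical; the table is only an analogy for why a per-version array cannot stay `const` once entries are filled in at runtime.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_VERSION 4

/* Hypothetical stand-in for a per-version descriptor. */
struct demo_subversion {
	int version;
	const char *name;
};

/* Deliberately not const: slots are filled and cleared at runtime as
 * versions register and unregister. */
static struct demo_subversion *demo_version_table[DEMO_MAX_VERSION + 1];

static int demo_register_version(struct demo_subversion *v)
{
	assert(v != NULL);
	if (v->version < 0 || v->version > DEMO_MAX_VERSION)
		return -1;
	demo_version_table[v->version] = v;
	return 0;
}

static void demo_unregister_version(const struct demo_subversion *v)
{
	if (v->version >= 0 && v->version <= DEMO_MAX_VERSION &&
	    demo_version_table[v->version] == v)
		demo_version_table[v->version] = NULL;
}

/* Returns NULL if the requested version has not been registered. */
static struct demo_subversion *demo_get_version(int version)
{
	if (version < 0 || version > DEMO_MAX_VERSION)
		return NULL;
	return demo_version_table[version];
}

/* Default entries so each version keeps working before it is split out
 * into its own module. */
static struct demo_subversion demo_v2 = { 2, "nfs2" };
static struct demo_subversion demo_v3 = { 3, "nfs3" };
static struct demo_subversion demo_v4 = { 4, "nfs4" };

static void demo_register_defaults(void)
{
	demo_register_version(&demo_v2);
	demo_register_version(&demo_v3);
	demo_register_version(&demo_v4);
}
```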
    • NFS: Fix a number of bugs in the idmapper · a427b9ec
      Committed by David Howells
      Fix a number of bugs in the NFS idmapper code:
      
       (1) Only registered key types can be passed to the core keys code, so
           register the legacy idmapper key type.
      
           This is a requirement because the unregister function cleans up keys
           belonging to that key type so that there aren't dangling pointers to the
           module left behind - including the key->type pointer.
      
       (2) Rename the legacy key type.  You can't have two key types with the same
           name, and (1) would otherwise require that.
      
       (3) complete_request_key() must be called in the error path of
           nfs_idmap_legacy_upcall().
      
       (4) There is one idmap struct for each nfs_client struct.  This means that
           idmap->idmap_key_cons is shared without the use of a lock.  This is a
           problem because key_instantiate_and_link() - as called indirectly by
           idmap_pipe_downcall() - releases anyone waiting for the key to be
           instantiated.
      
           What happens is that idmap_pipe_downcall(), running in the rpc.idmapd
           thread, allows the NFS filesystem code to continue in whatever thread
           it is running in.  That thread may then make another idmapper call,
           overwriting idmap_key_cons before idmap_pipe_downcall() gets the
           chance to call complete_request_key().
      
           I *think* that reading idmap_key_cons only once, before
           key_instantiate_and_link() is called, and then caching the result in a
           variable is sufficient.
      
      Bug (4) is the cause of:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<          (null)>]           (null)
      PGD 0
      Oops: 0010 [#1] SMP
      CPU 1
      Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
      Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1
      RIP: 0010:[<0000000000000000>]  [<          (null)>]           (null)
      RSP: 0018:ffff8801a3645d40  EFLAGS: 00010246
      RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80
      RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50
      RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0
      R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80
      R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006
      FS:  00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710)
      Stack:
       ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90
       ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75
       ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8
      Call Trace:
       [<ffffffff81260878>] ? __key_instantiate_and_link+0x58/0x100
       [<ffffffff81260983>] key_instantiate_and_link+0x63/0xa0
       [<ffffffffa057062b>] idmap_pipe_downcall+0x1cb/0x1e0 [nfs]
       [<ffffffffa0107f57>] rpc_pipe_write+0x67/0x90 [sunrpc]
       [<ffffffff8117f833>] vfs_write+0xb3/0x180
       [<ffffffff8117fb5a>] sys_write+0x4a/0x90
       [<ffffffff81600329>] system_call_fastpath+0x16/0x1b
      Code:  Bad RIP value.
      RIP  [<          (null)>]           (null)
       RSP <ffff8801a3645d40>
      CR2: 0000000000000000
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Steve Dickson <steved@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org [>= 3.4]
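      The fix for bug (4) comes down to ordering: snapshot the shared field into a local variable before anything that can release a waiter. A single-threaded sketch of that ordering follows; the types are hypothetical, and `demo_instantiate` simulates the woken thread overwriting the shared field rather than using real threads. As the commit itself says, whether this alone is sufficient in the kernel is a judgment the author hedged on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the key construction record. */
struct demo_cons {
	int id;
};

/* One of these per client; key_cons is shared without a lock. */
struct demo_idmap {
	struct demo_cons *key_cons;
};

static struct demo_cons *demo_completed; /* last record passed to completion */

static void demo_complete_request(struct demo_cons *cons)
{
	assert(cons != NULL);
	demo_completed = cons;
}

/* Simulates instantiation: waking the waiter lets it issue another
 * upcall, which overwrites idmap->key_cons with a new record. */
static void demo_instantiate(struct demo_idmap *idmap, struct demo_cons *next)
{
	idmap->key_cons = next;
}

/* Buggy order: re-reads the shared field after waiters were released. */
static void demo_downcall_buggy(struct demo_idmap *idmap, struct demo_cons *next)
{
	demo_instantiate(idmap, next);
	demo_complete_request(idmap->key_cons);
}

/* Fixed order: read the shared field once, before instantiation. */
static void demo_downcall_fixed(struct demo_idmap *idmap, struct demo_cons *next)
{
	struct demo_cons *cons = idmap->key_cons;

	demo_instantiate(idmap, next);
	demo_complete_request(cons);
}
```

      With the buggy order the completion is applied to the new request's record; with the cached copy it is applied to the record that was actually being instantiated.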
    • nfs: skip commit in releasepage if we're freeing memory for fs-related reasons · 5cf02d09
      Committed by Jeff Layton
      We've had some reports of a deadlock where rpciod ends up with a stack
      trace like this:
      
          PID: 2507   TASK: ffff88103691ab40  CPU: 14  COMMAND: "rpciod/14"
           #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
           #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
           #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
           #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
           #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
           #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
           #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
           #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
           #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
           #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
          #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
          #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
          #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
          #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
          #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
          #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
          #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
          #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
          #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
          #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
          #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
          #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
          #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
          #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
          #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
          #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca
      
      rpciod is trying to allocate memory for a new socket to talk to the
      server. The VM ends up calling ->releasepage to get more memory, and it
      tries to do a blocking commit. That commit cannot succeed without a
      connected socket, however, so we deadlock.
      
      Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
      socket allocation, and having nfs_release_page check for that flag when
      deciding whether to do a commit call. Also, set PF_FSTRANS
      unconditionally in rpc_async_schedule since that function can also do
      allocations sometimes.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
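      A minimal userspace sketch of the flag-based fix: a per-task flag is set around the allocation, and the release path declines to issue a blocking commit while the flag is set. `DEMO_PF_FSTRANS`, `struct demo_task` and the commit counter are hypothetical stand-ins for PF_FSTRANS, the task struct and the COMMIT call; memory reclaim is simulated by calling the release function directly from the allocation path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; the kernel's PF_FSTRANS value is not reproduced. */
#define DEMO_PF_FSTRANS 0x1u

struct demo_task {
	unsigned int flags;
};

static int demo_commits; /* counts blocking commits issued */

/* Release path: skip the blocking commit when the caller is already
 * inside filesystem/transport code. Returns true if the page was released. */
static bool demo_release_page(const struct demo_task *current_task)
{
	if (current_task->flags & DEMO_PF_FSTRANS)
		return false;    /* cannot release without a commit */
	demo_commits++;          /* stands in for a blocking COMMIT */
	return true;
}

/* Socket setup: set the flag around the allocation, then restore the
 * caller's previous value of that bit. */
static void demo_setup_socket(struct demo_task *task)
{
	unsigned int saved = task->flags & DEMO_PF_FSTRANS;

	task->flags |= DEMO_PF_FSTRANS;
	(void)demo_release_page(task); /* reclaim triggered by the allocation */
	task->flags = (task->flags & ~DEMO_PF_FSTRANS) | saved;
}
```

      Restoring only the saved bit, rather than clearing it unconditionally, keeps the sketch correct if the caller already had the flag set.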
    • pnfsblock: bail out partial page IO · 159e0561
      Committed by Peng Tao
      The current block layout driver read/write code assumes page-aligned
      I/O in many places. Add a check to validate that assumption; otherwise
      data corruption can occur, for example when an application calls
      open(O_WRONLY) and then issues a write that is not page-aligned.
      Signed-off-by: Peng Tao <tao.peng@emc.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
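      The kind of check described above can be sketched as a predicate on the request's offset and length. This assumes 4 KiB pages and a hypothetical helper name; it is not the driver's actual checker, and what the driver does when the check fails is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* True only if the I/O starts on a page boundary and covers whole
 * pages, i.e. it never touches part of a page. */
static bool demo_is_page_aligned_io(uint64_t offset, uint64_t count)
{
	return (offset % DEMO_PAGE_SIZE) == 0 && (count % DEMO_PAGE_SIZE) == 0;
}
```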
    • nfs: fix fl_type tests in NFSv4 code · f44106e2
      Committed by Jeff Layton
      fl_type is not a bitmap.
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
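      Why a bitmask test on fl_type goes wrong: the lock types are enumerated values, and on Linux the read-lock value is 0, so `fl_type & F_RDLCK` can never be true. The sketch below uses a hypothetical enum whose values mirror Linux's generic F_RDLCK, F_WRLCK and F_UNLCK (0, 1, 2) so that it does not depend on the host platform's headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror Linux's generic F_RDLCK / F_WRLCK / F_UNLCK. */
enum demo_lock_type { DEMO_RDLCK = 0, DEMO_WRLCK = 1, DEMO_UNLCK = 2 };

/* Buggy: treats the lock type as a bitmap. Because DEMO_RDLCK is 0,
 * this always returns false, even for a read lock. */
static bool demo_is_read_lock_buggy(int fl_type)
{
	return (fl_type & DEMO_RDLCK) != 0;
}

/* Correct: the type is a single value, so compare for equality. */
static bool demo_is_read_lock(int fl_type)
{
	return fl_type == DEMO_RDLCK;
}
```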
    • NFS: fix pnfs regression with directio writes · c95908e4
      Committed by Fred Isaman
      Commit 57208fa7 "NFS: Create an write_pageio_init() function"
      did not modify the calls in direct.c, preventing direct io from
      using pnfs.  This reintroduces that capability.
      Signed-off-by: Fred Isaman <iisaman@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: fix pnfs regression with directio reads · 59948db3
      Committed by Fred Isaman
      Commit 1abb5088 "NFS: Create an read_pageio_init() function"
      did not modify the call in direct.c, preventing direct io from
      using pnfs.  This reintroduces that capability.
      Signed-off-by: Fred Isaman <iisaman@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs: fix stub return type warnings · 0add3e85
      Committed by Randy Dunlap
      Fix numerous repeated warnings by making the stub function
      void instead of non-void:
      
      fs/nfs/nfs4_fs.h: In function 'nfs4_unregister_sysctl':
      fs/nfs/nfs4_fs.h:385:1: warning: no return statement in function returning non-void
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
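      The general pattern behind this fix: when a feature is configured out, its stub should have the same (here, void) return type as the real function, so an empty body is legal. A sketch assuming a hypothetical `DEMO_CONFIG_SYSCTL` macro in place of the kernel's config symbol, and a hypothetical function name:

```c
#include <assert.h>

static int demo_unregistered; /* set only by the real implementation */

#ifdef DEMO_CONFIG_SYSCTL
static void demo_unregister_sysctl(void)
{
	demo_unregistered = 1; /* real cleanup would go here */
}
#else
/* Stub: declared void, so the empty body does not trigger "no return
 * statement in function returning non-void". */
static inline void demo_unregister_sysctl(void)
{
}
#endif
```

      Callers can invoke `demo_unregister_sysctl()` unconditionally; when the feature is compiled out, the call is a no-op.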
  2. 18 Jul 2012, 13 commits
  3. 17 Jul 2012, 14 commits
    • NFS: Clean up nfs4_proc_setclientid() and friends · 6bbb4ae8
      Committed by Chuck Lever
      Add documenting comments and appropriate debugging messages.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: Treat NFS4ERR_CLID_INUSE as a fatal error · de734831
      Committed by Chuck Lever
      For NFSv4 minor version 0, currently the cl_id_uniquifier allows the
      Linux client to generate a unique nfs_client_id4 string whenever a
      server replies with NFS4ERR_CLID_INUSE.
      
      This implementation seems to be based on a flawed reading of RFC
      3530.  NFS4ERR_CLID_INUSE actually means that the client has presented
      this nfs_client_id4 string with a different principal at some time in
      the past, and that lease is still in use on the server.
      
      For a Linux client this might be rather difficult to achieve: the
      authentication flavor is named right in the nfs_client_id4.id
      string.  If we change flavors, we change strings automatically.
      
      So, practically speaking, NFS4ERR_CLID_INUSE means there is some other
      client using our string.  There is not much that can be done to
      recover automatically.  Let's make it a permanent error.
      
      Remove the recovery logic in nfs4_proc_setclientid(), and remove the
      cl_id_uniquifier field from the nfs_client data structure.  And,
      remove the authentication flavor from the nfs_client_id4 string.
      
      Keeping the authentication flavor in the nfs_client_id4.id string
      means that we could have a separate lease for each authentication
      flavor used by mounts on the client.  But we want just one lease for
      all the mounts on this client.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
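      To illustrate the last point, the sketch below builds an identifier string with and without the authentication flavor: with the flavor embedded, two mounts using different flavors present different strings (and so would get separate leases); without it, they present the same string. The format string is hypothetical and is not the kernel's actual nfs_client_id4 format; 1 and 390003 are the AUTH_UNIX and krb5 flavor numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical identifier format, used only for illustration. */
static void demo_make_client_id(char *buf, size_t len, const char *client_addr,
				const char *server_addr, unsigned int flavor,
				bool include_flavor)
{
	if (include_flavor)
		snprintf(buf, len, "Linux NFSv4.0 %s/%s %u",
			 client_addr, server_addr, flavor);
	else
		snprintf(buf, len, "Linux NFSv4.0 %s/%s",
			 client_addr, server_addr);
}

/* In this sketch, identical strings stand for one shared lease and
 * different strings for separate leases. */
static bool demo_same_lease(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}
```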
    • NFS: When state recovery fails, waiting tasks should exit · 46a87b8a
      Committed by Chuck Lever
      NFSv4 state recovery is not always successful.  Failure is signalled
      by setting the nfs_client.cl_cons_state to a negative (errno) value,
      then waking waiters.
      
      Currently this can happen only during mount processing.  I'm about to
      add an explicit case where state recovery failure during normal
      operation should force all NFS requests waiting on that state recovery
      to exit.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
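      The signalling convention described above (a negative errno in cl_cons_state means recovery failed) can be sketched as the check a woken waiter performs. The state constants and the helper are hypothetical; only the negative-errno convention comes from the commit text.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical construction states; negative values are errnos. */
#define DEMO_CS_READY   0
#define DEMO_CS_INITING 1

struct demo_client {
	int cl_cons_state;
};

/* Called by a task after it is woken while waiting on state recovery.
 * Returns 1 to keep waiting, 0 to retry the request, or the negative
 * errno from cl_cons_state if the task must give up and exit. */
static int demo_after_recovery_wakeup(const struct demo_client *clp)
{
	if (clp->cl_cons_state < 0)
		return clp->cl_cons_state;
	if (clp->cl_cons_state == DEMO_CS_INITING)
		return 1;
	return 0;
}
```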
    • SUNRPC: Add rpcauth_list_flavors() · 6a1a1e34
      Committed by Chuck Lever
      The gss_mech_list_pseudoflavors() function provides a list of
      currently registered GSS pseudoflavors.  This list does not include
      any non-GSS flavors that have been registered with the RPC client.
      nfs4_find_root_sec() currently adds these extra flavors by hand.
      
      Instead, nfs4_find_root_sec() should be looking at the set of flavors
      that have been explicitly registered via rpcauth_register().  And,
      other areas of code will soon need the same kind of list that
      contains all flavors the kernel currently knows about (see below).
      
      Rather than cloning the open-coded logic in nfs4_find_root_sec() to
      those new places, introduce a generic RPC function that generates a
      full list of registered auth flavors and pseudoflavors.
      
      A new rpc_authops method is added that lists a flavor's
      pseudoflavors, if it has any.  I encountered an interesting module
      loader loop when I tried to get the RPC client to invoke
      gss_mech_list_pseudoflavors() by name.
      
      This patch is a prerequisite for server trunking discovery, and a
      prerequisite for fixing up the in-kernel mount client to do better
      automatic security flavor selection.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
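      The generic listing described above can be sketched as a walk over registered flavors, where a flavor that provides a pseudoflavor-listing method contributes its pseudoflavors and any other flavor contributes its own number. The struct and function names are hypothetical and do not claim to match `struct rpc_authops`; the numbers are the standard AUTH_NULL (0), AUTH_UNIX (1), RPCSEC_GSS (6) and krb5/krb5i/krb5p (390003 to 390005) values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a registered auth flavor. */
struct demo_authops {
	unsigned int flavor;
	int (*list_pseudoflavors)(unsigned int *array, size_t size);
};

static int demo_gss_list(unsigned int *array, size_t size)
{
	static const unsigned int krb5[] = { 390003, 390004, 390005 };
	size_t i;

	if (size < 3)
		return -1;
	for (i = 0; i < 3; i++)
		array[i] = krb5[i];
	return 3;
}

static const struct demo_authops demo_registered[] = {
	{ 0, NULL },          /* AUTH_NULL */
	{ 1, NULL },          /* AUTH_UNIX */
	{ 6, demo_gss_list }, /* RPCSEC_GSS: reports pseudoflavors instead */
};

/* Fills array with every known flavor and pseudoflavor.
 * Returns the number of entries, or -1 if array is too small. */
static int demo_list_flavors(unsigned int *array, size_t size)
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(demo_registered) / sizeof(demo_registered[0]); i++) {
		const struct demo_authops *ops = &demo_registered[i];

		if (ops->list_pseudoflavors == NULL) {
			if (n >= size)
				return -1;
			array[n++] = ops->flavor;
			continue;
		}
		int got = ops->list_pseudoflavors(array + n, size - n);
		if (got < 0)
			return -1;
		n += (size_t)got;
	}
	return (int)n;
}
```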
    • NFS: nfs_getaclargs.acl_len is a size_t · 56d08fef
      Committed by Chuck Lever
      Squelch compiler warnings:
      
      fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’:
      fs/nfs/nfs4proc.c:3811:14: warning: comparison between signed and
      	unsigned integer expressions [-Wsign-compare]
      fs/nfs/nfs4proc.c:3818:15: warning: comparison between signed and
      	unsigned integer expressions [-Wsign-compare]
      
      Introduced by commit bf118a34 "NFSv4: include bitmap in nfsv4 get
      acl data", Dec 7, 2011.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
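      Why -Wsign-compare matters here: comparing an int with a size_t converts the int to size_t, so a negative length becomes a huge value. The sketch below (hypothetical helper names, not the nfs4proc.c code) makes that conversion explicit with a cast, and shows that declaring the length as size_t removes the mismatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Signed length: the cast spells out the conversion the compiler applies
 * implicitly in `acl_len > buflen`, so -1 becomes SIZE_MAX. */
static bool demo_exceeds_signed(int acl_len, size_t buflen)
{
	return (size_t)acl_len > buflen;
}

/* With the length declared as size_t, both operands share one type. */
static bool demo_exceeds(size_t acl_len, size_t buflen)
{
	return acl_len > buflen;
}
```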
    • NFS: Clean up TEST_STATEID and FREE_STATEID error reporting · 38527b15
      Committed by Chuck Lever
      As a finishing touch, add appropriate documenting comments and some
      debugging printk's.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: Clean up nfs41_check_expired_stateid() · 3e60ffdd
      Committed by Chuck Lever
      Clean up: Instead of open-coded flag manipulation, use test_bit() and
      clear_bit() just like all other accessors of the state->flags field.
      This also eliminates several unnecessary implicit integer type
      conversions.
      
      To make it absolutely clear what is going on, a number of comments
      are introduced.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: State reclaim clears OPEN and LOCK state · eb64cf96
      Committed by Chuck Lever
      The "state->flags & flags" test in nfs41_check_expired_stateid()
      allows the state manager to squelch a TEST_STATEID operation when
      it is known for sure that a state ID is no longer valid.  If the
      lease was purged, for example, the client already knows that state
      ID is now defunct.
      
      But open recovery is still needed for that inode.
      
      To force a call to nfs4_open_expired(), change the default return
      value for nfs41_check_expired_stateid() to force open recovery, and
      the default return value for nfs41_check_locks() to force lock
      recovery, if the requested flags are clear.  Fix suggested by Bryan
      Schumaker.
      
      Also, the presence of a delegation state ID must not prevent normal
      open recovery.  The delegation state ID must be cleared if it was
      revoked, but once cleared I don't think its presence or absence has
      any bearing on whether open recovery is still needed.  So the logic
      is adjusted to ignore the TEST_STATEID result for the delegation
      state ID.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: Don't free a state ID the server does not recognize · 89af2739
      Committed by Chuck Lever
      The result of a TEST_STATEID operation can indicate a few different
      things:
      
        o If NFS_OK is returned, then the client can continue using the
          state ID under test, and skip recovery.
      
        o RFC 5661 says that if the state ID was revoked, then the client
          must perform an explicit FREE_STATEID before trying to re-open.
      
        o If the server doesn't recognize the state ID at all, then no
          FREE_STATEID is needed, and the client can immediately continue
          with open recovery.
      
      Let's err on the side of caution: if the server clearly tells us the
      state ID is unknown, we skip the FREE_STATEID.  For any other error,
      we issue a FREE_STATEID.  Sometimes that FREE_STATEID will be
      unnecessary, but leaving unused state IDs on the server needlessly
      ties up resources.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
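      The decision rule above can be written as a small predicate. This sketch takes positive NFSv4 status codes and assumes that "the server doesn't recognize the state ID" is reported as NFS4ERR_BAD_STATEID (10025 in RFC 5661); NFS4ERR_ADMIN_REVOKED (10047) stands in for "any other error". The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NFS4_OK               0
#define DEMO_NFS4ERR_BAD_STATEID   10025 /* assumed: state ID unknown */
#define DEMO_NFS4ERR_ADMIN_REVOKED 10047

/* Returns true if a FREE_STATEID should follow a TEST_STATEID that
 * produced `status`. */
static bool demo_need_free_stateid(int status)
{
	switch (status) {
	case DEMO_NFS4_OK:
		return false; /* still valid: keep using it, skip recovery */
	case DEMO_NFS4ERR_BAD_STATEID:
		return false; /* server does not know it: nothing to free */
	default:
		return true;  /* revoked or other error: err toward freeing */
	}
}
```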
    • NFS: Fix up TEST_STATEID and FREE_STATEID return code handling · 377e507d
      Committed by Chuck Lever
      The TEST_STATEID and FREE_STATEID operations can return
      -NFS4ERR_BAD_STATEID, -NFS4ERR_OLD_STATEID, or -NFS4ERR_DEADSESSION.
      
      nfs41_{test,free}_stateid() should not pass these errors to
      nfs4_handle_exception() during state recovery, since that will
      recursively kick off state recovery again, resulting in a deadlock.
      
      In particular, when the TEST_STATEID operation returns NFS4_OK,
      res.status can contain one of these errors.  _nfs41_test_stateid()
      replaces NFS4_OK with the value in res.status, which is then returned
      to callers.
      
      But res.status is not passed through nfs4_stat_to_errno(), and thus is
      a positive NFS4ERR value.  Currently callers are only interested in
      !NFS4_OK, and nfs4_handle_exception() ignores positive values.
      
      Thus the res.status values are currently ignored by
      nfs4_handle_exception() and won't cause the deadlock above.  Thanks to
      this missing negative, it is only when these operations fail (which
      is very rare) that a deadlock can occur.
      
      Bryan agrees the original intent was to return res.status as a
      negative NFS4ERR value to callers of nfs41_test_stateid().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
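      The sign issue can be seen in a few lines. In this sketch (hypothetical names, simplified signatures), a handler that only reacts to negative values never notices a positive per-state-ID status; negating it, as the commit intends, makes the error visible to callers. The other half of the fix, keeping these errors away from the exception handler during state recovery, is up to the callers and is not modelled here.

```c
#include <assert.h>

#define DEMO_NFS4_OK             0
#define DEMO_NFS4ERR_BAD_STATEID 10025

/* Stand-in for a handler that, as described above, ignores positive
 * values. Returns 1 if it would start recovery. */
static int demo_handle_exception(int errorcode)
{
	return errorcode < 0;
}

/* The compound succeeded, but the per-state-ID result in res_status is
 * a positive NFS4ERR value. */
static int demo_test_stateid_buggy(int compound_status, int res_status)
{
	if (compound_status != DEMO_NFS4_OK)
		return compound_status;
	return res_status;  /* positive: silently ignored downstream */
}

static int demo_test_stateid_fixed(int compound_status, int res_status)
{
	if (compound_status != DEMO_NFS4_OK)
		return compound_status;
	return -res_status; /* negative, like other NFS4ERR returns */
}
```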
    • NFSv4.1 do not send LAYOUTRETURN on emtpy plh_segs list · 293b3b06
      Committed by Andy Adamson
      mark_matching_lsegs_invalid() resets the mds_threshold counters and can
      dereference the layout hdr on an initially empty plh_segs list. It returns 0 both
      in the case of an initially empty list and in the case of a non-empty list that was
      cleared by calls to mark_lseg_invalid.
      
      Don't send a LAYOUTRETURN if the list was initially empty.
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
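      Because the return value of 0 cannot distinguish "was already empty" from "was cleared", the emptiness has to be recorded before the segments are marked. A sketch of that ordering with a hypothetical layout header that only counts segments and LAYOUTRETURNs:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout header with only what the sketch needs. */
struct demo_layout_hdr {
	int nsegs;        /* segments on the plh_segs list */
	int returns_sent; /* LAYOUTRETURNs issued */
};

/* Marks every segment invalid. Returns 0 when no segment remains in use,
 * both for an initially empty list and after clearing a non-empty one. */
static int demo_mark_lsegs_invalid(struct demo_layout_hdr *lo)
{
	lo->nsegs = 0;
	return 0;
}

static void demo_return_layout(struct demo_layout_hdr *lo)
{
	bool was_empty = (lo->nsegs == 0); /* record before marking */

	if (demo_mark_lsegs_invalid(lo) == 0 && !was_empty)
		lo->returns_sent++; /* stands in for sending LAYOUTRETURN */
}
```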
    • NFSv4.1 mark layout when already returned · 366d5052
      Committed by Andy Adamson
      When the file layout driver is fencing a DS, _pnfs_return_layout can be
      called multiple times per inode due to in-flight I/O referencing lsegs on its
      plh_segs list.
      
      Remember that LAYOUTRETURN has been called, and do not call it again.
      Allow LAYOUTRETURNs after a subsequent LAYOUTGET.
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
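      The "remember that LAYOUTRETURN was sent" rule can be sketched as a flag that is set on the first return and cleared by the next LAYOUTGET. The `return_sent` field and the function names are hypothetical, not the kernel's layout header flags.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_layout {
	bool return_sent; /* hypothetical "already returned" marker */
	int returns_sent; /* LAYOUTRETURNs issued */
};

static void demo_return_layout(struct demo_layout *lo)
{
	if (lo->return_sent)
		return;           /* already returned: do not send again */
	lo->return_sent = true;
	lo->returns_sent++;       /* stands in for sending LAYOUTRETURN */
}

/* A later LAYOUTGET obtains a fresh layout, so returns are allowed again. */
static void demo_layoutget(struct demo_layout *lo)
{
	lo->return_sent = false;
}
```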
    • NFSv4.1 return the LAYOUT for each file with failed DS connection I/O · 82c7c7a5
      Committed by Andy Adamson
      First mark the deviceid invalid to prevent any future use. Then fence all
      files involved in I/O to a DS with a connection error by sending a
      LAYOUTRETURN.
      Signed-off-by: Andy Adamson <andros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  4. 14 Jul 2012, 3 commits