1. 06 Dec, 2009 7 commits
  2. 05 Dec, 2009 12 commits
  3. 04 Dec, 2009 21 commits
    • Merge branch 'devel' into linux-next · 7285f2d2
      Trond Myklebust committed
    • NFS4ERR_FILE_OPEN handling in Linux/NFS · 44ed3556
      NeilBrown committed
      NFS4ERR_FILE_OPEN is returned by the server when an operation cannot be
      performed because the file is currently open and local (to the server)
      semantics prohibit the operation while the file is open.
      A typical case is a RENAME operation on an MS-Windows platform, which
      prevents rename while the file is open.
      
      While it is possible that such a condition is transitory, it is also
      very possible that the file will be held open for an extended period
      of time thus preventing the operation.
      
      The current behaviour of Linux/NFS is to retry the operation
      indefinitely.  This is not appropriate - we do not expect a rename to
      take an arbitrary amount of time to complete.
      
      Rather, an error should be returned.  The most obvious error code
      would be EBUSY, which is legal at least for 'rename' and 'unlink',
      and accurately captures the reason for the error.
      
      This patch allows a few retries until about 2 seconds have elapsed,
      then returns EBUSY.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
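The bounded-retry behaviour described in this commit message can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel's implementation: the names `retry_until_busy`, `op_fn`, `always_open`, and `closes_after`, the 100 ms starting delay, and the doubling back-off are all hypothetical. Only the roughly 2-second budget and the EBUSY result come from the text above. Delays are accounted for rather than slept, so the sketch runs instantly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* All names here are hypothetical; this is not the kernel's code. */
#define RETRY_BUDGET_MS 2000  /* "about 2 seconds", per the commit text */
#define FIRST_DELAY_MS   100  /* assumed initial delay */

/* An operation returns 0 on success, nonzero while the file is still open. */
typedef int (*op_fn)(void *ctx);

/*
 * Retry op() with a doubling delay. Once the next delay would push the
 * accumulated wait past the budget, give up and return -EBUSY.
 */
static int retry_until_busy(op_fn op, void *ctx, int *attempts)
{
    int waited_ms = 0;
    int delay_ms = FIRST_DELAY_MS;

    assert(op != NULL && attempts != NULL);
    *attempts = 0;
    for (;;) {
        (*attempts)++;
        if (op(ctx) == 0)
            return 0;
        if (waited_ms + delay_ms > RETRY_BUDGET_MS)
            return -EBUSY;
        waited_ms += delay_ms;  /* a real caller would sleep here */
        delay_ms *= 2;
    }
}

/* Simulates a file that the server never closes. */
static int always_open(void *ctx) { (void)ctx; return -1; }

/* Simulates a file that is closed after *ctx failed attempts. */
static int closes_after(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- > 0 ? -1 : 0;
}
```

With these assumed constants, `retry_until_busy(always_open, NULL, &n)` gives up after five attempts (100 + 200 + 400 + 800 ms of accounted delay), while a transitory condition such as `closes_after` with a small counter still succeeds.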
    • Merge branch 'bugfixes' into nfs-for-next · 0b08b075
      Trond Myklebust committed
    • nfs: clean up sillyrenaming in nfs_rename() · 24e93025
      Miklos Szeredi committed
      The d_instantiate(new_dentry, NULL) is superfluous, the dentry is
      already negative.  Rehashing this dummy dentry isn't needed either,
      d_move() works fine on an unhashed target.
      
      The re-checking for busy after a failed nfs_sillyrename() is bogus
      too: new_dentry->d_count < 2 would be a bug here.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs: dont unhash target if renaming a directory · 27226104
      Miklos Szeredi committed
      Move unhashing the target to after the check for existence and being a
      non-directory.
      
      If renaming a directory then the VFS already unhashes the target if it
      is not busy.  If it's busy then acquiring more references during the
      rename makes no difference.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs: fix comments in nfs_rename() · 28f79a1a
      Miklos Szeredi committed
      Comments are wrong or out of date.  In particular, d_drop() doesn't
      free the inode; it just unhashes the dentry.  And if the target is a
      directory then it is not checked for being busy.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs: remove unnecessary check from nfs_rename() · e48de5ec
      Miklos Szeredi committed
      VFS already checks if both source and target are directories.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: soft connect semantics for UDP · 3a28becc
      Chuck Lever committed
      Introduce soft connect behavior for UDP transports.  In this case, a
      major timeout returns ETIMEDOUT instead of EIO.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: Use soft connect semantics when performing RPC ping · caabea8a
      Chuck Lever committed
      Currently, if a remote RPC service is unreachable, an RPC ping will
      hang until the underlying transport connect attempt times out.  A more
      desirable behavior might be to have the ping fail immediately so upper
      layers can recover appropriately.
      
      In the case of an NFS mount, for instance, this would mean the
      mount(2) system call could fail immediately if the server isn't
      listening, rather than hanging uninterruptibly for more than 3
      minutes.
      
      Change rpc_ping() so that it fails immediately for connection-oriented
      transports.  rpc_create() will then fail immediately for such
      transports if an RPC ping was requested.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: Use soft connects for autobinding over TCP · 012da158
      Chuck Lever committed
      Autobinding is handled by the rpciod process, not in user processes
      that are generating regular RPC requests.  Thus autobinding is usually
      not affected by signals targeting user processes, such as KILL or
      timer expiration events.
      
      In addition, an RPC request generated by a user process that has
      RPC_TASK_SOFTCONN set and needs to perform an autobind will hang if
      the remote rpcbind service is not available.
      
      For rpcbind queries on connection-oriented transports, let's use the
      new soft connect semantic to return control to the user's process
      quickly, if the kernel's rpcbind client can't connect to the remote
      rpcbind service.
      
      Logic is introduced in call_bind_status() to handle connection errors
      that occurred during an asynchronous rpcbind query.  The logic
      abandons the rpcbind query if the RPC request has SOFTCONN set, and
      retries after a few seconds in the normal case.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: Use TCP for local rpcbind upcalls · 2a76b3bf
      Chuck Lever committed
      Use TCP with the soft connect semantic for local rpcbind upcalls so
      the kernel can detect immediately if the local rpcbind daemon is not
      running.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • SUNRPC: Use a cached RPC client and transport for rpcbind upcalls · c526611d
      Chuck Lever committed
      The kernel's rpcbind client creates and deletes an rpc_clnt and its
      underlying transport socket for every upcall to the local rpcbind
      daemon.
      
      When starting a typical NFS server on IPv4 and IPv6, the NFS service
      itself does three upcalls (one per version) times two upcalls (one
      per transport) times two upcalls (one per address family), making 12,
      plus another one for the initial call to unregister previous NFS
      services.  Starting the NLM service adds an additional 13 upcalls,
      for similar reasons.
      
      (Currently the NFS service doesn't start IPv6 listeners, but it will
      soon enough).
      
      Instead, let's create an rpc_clnt for rpcbind upcalls during the
      first local rpcbind query, and cache it.  This saves the overhead of
      creating and destroying an rpc_clnt and a socket for every upcall.
      
      The new logic also prevents the kernel from attempting an RPCB_SET or
      RPCB_UNSET if it knows from the start that the local portmapper does
      not support rpcbind protocol version 4.  This will cut down on the
      number of rpcbind upcalls in legacy environments.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
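The create-on-first-use caching idea in this commit message can be illustrated with a small user-space sketch. Everything in it is an assumption made for illustration: `struct local_client`, `get_local_client`, and the `clients_created` counter are hypothetical stand-ins, not the kernel's rpc_clnt API, and the sketch ignores the locking and teardown a real implementation would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an RPC client plus its transport socket. */
struct local_client { int handle; };

static struct local_client client_storage;
static struct local_client *cached_client;  /* NULL until first use */
static int clients_created;                 /* counts expensive set-ups */

/*
 * Return the cached client, creating it on the first call only.
 * Not thread-safe: a real implementation would serialize creation.
 */
static struct local_client *get_local_client(void)
{
    if (cached_client == NULL) {
        client_storage.handle = ++clients_created;
        cached_client = &client_storage;
    }
    assert(cached_client != NULL);
    return cached_client;
}
```

Under this scheme, the 13 upcalls counted above for starting the NFS service (3 versions x 2 transports x 2 address families, plus 1 unregister) would share a single client instead of creating and destroying 13 of them.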
    • SUNRPC: Simplify synopsis of rpcb_local_clnt() · 5a462115
      Chuck Lever committed
      Clean up: At one point, rpcb_local_clnt() handled IPv6 loopback
      addresses too, but it doesn't any more; only IPv4 loopback is used
      now.  Get rid of the @addr and @addrlen arguments to
      rpcb_local_clnt().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: Allow RPCs to fail quickly if the server is unreachable · 09a21c41
      Chuck Lever committed
      The kernel sometimes makes RPC calls to services that aren't running.
      Because the kernel's RPC client always assumes the hard retry semantic
      when reconnecting a connection-oriented RPC transport, the underlying
      reconnect logic takes a long while to time out, even though the remote
      may have responded immediately with ECONNREFUSED.
      
      In certain cases, like upcalls to our local rpcbind daemon, or for NFS
      mount requests, we'd like the kernel to fail immediately if the remote
      service isn't reachable.  This allows another transport to be tried
      immediately, or the pending request can be abandoned quickly.
      
      Introduce a per-request flag which controls how call_transmit_status()
      behaves when request transmission fails because the server cannot be
      reached.
      
      We don't want soft connection semantics to apply to other errors.  The
      default case of the switch statement in call_transmit_status() no
      longer falls through; the fall through code is copied to the default
      case, and a "break;" is added.
      
      The transport's connection re-establishment timeout is also ignored for
      such requests.  We want the request to fail immediately, so the
      reconnect delay is skipped.  Additionally, we don't want a connect
      failure here to further increase the reconnect timeout value, since
      this request will not be retried.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
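A per-request flag that changes how connection failures are handled, while leaving other errors alone, might look roughly like the sketch below. It is a simplified user-space model, not the body of call_transmit_status(): the flag name `REQ_SOFTCONN`, the `enum next_step` values, and the exact set of errno values treated as "server unreachable" are assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-request flag, modelled on the description above. */
#define REQ_SOFTCONN 0x1u

enum next_step {
    STEP_FAIL_NOW,       /* hand the error back to the caller at once */
    STEP_RECONNECT,      /* hard semantics: keep trying to reconnect  */
    STEP_HANDLE_STATUS   /* unrelated error: normal status handling   */
};

struct request {
    unsigned int flags;
    int status;          /* negative errno from a failed transmission */
};

static enum next_step on_transmit_error(const struct request *req)
{
    assert(req != NULL && req->status < 0);

    switch (req->status) {
    case -ECONNREFUSED:
    case -EHOSTUNREACH:
    case -ENETUNREACH:
        /* Only "cannot reach the server" errors honour the flag. */
        if (req->flags & REQ_SOFTCONN)
            return STEP_FAIL_NOW;
        return STEP_RECONNECT;
    default:
        /* Soft connect semantics do not apply to other errors. */
        return STEP_HANDLE_STATUS;
    }
}
```

The default arm returns explicitly rather than falling through, which mirrors the point made above that soft connect semantics should not leak into the handling of unrelated errors.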
    • SUNRPC: Check explicitly for tk_status == 0 in call_transmit_status() · 206a134b
      Chuck Lever committed
      The success case, where task->tk_status == 0, is by far the most
      frequent case in call_transmit_status().
      
      The default: arm of the switch statement in call_transmit_status()
      handles the 0 case.  default: was moved close to the top of the switch
      statement in call_transmit_status() under the theory that the compiler
      places object code for the earliest arms of a switch statement first,
      making the CPU do less work.
      
      The default: arm of a switch statement, however, is executed only
      after all the other cases have been checked.  Even if the compiler
      rearranges the object code, the default: arm is the "last resort",
      meaning all of the other cases have been explicitly exhausted.  That
      makes the current arrangement about as inefficient as it gets for the
      common case.
      
      To fix this, add an explicit check for zero before the switch
      statement.  That forces the compiler to do the zero check first, no
      matter what optimizations it might try to do to the switch statement.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
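The shape of the fix, an explicit test for the common zero case placed ahead of the switch, can be shown in a few lines. The function `classify_status`, the `enum outcome` values, and the choice of -EAGAIN and -ENOBUFS as the retryable cases are hypothetical; the sketch only demonstrates the control-flow pattern and makes no claim about the generated object code.

```c
#include <assert.h>
#include <errno.h>

enum outcome { OUT_SUCCESS, OUT_RETRY, OUT_OTHER };

static enum outcome classify_status(int status)
{
    assert(status <= 0);  /* zero or a negative errno value */

    /* Common case first: success is tested before the switch. */
    if (status == 0)
        return OUT_SUCCESS;

    switch (status) {
    case -EAGAIN:
    case -ENOBUFS:
        return OUT_RETRY;
    default:
        /* Reached only for errors not listed above. */
        return OUT_OTHER;
    }
}
```

Because the zero check is an ordinary `if` that precedes the switch in program order, it no longer relies on `default:` to catch the success case.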
    • NFS: Revert default r/wsize behavior · dd47f96c
      Chuck Lever committed
      When the "rsize=" or "wsize=" mount options are not specified,
      text-based mounts have slightly different behavior than legacy binary
      mounts.  Text-based mounts use the smaller of the server's maximum
      and the client's maximum, but binary mounts use the smaller of the
      server's _preferred_ size and the client's maximum.
      
      This difference is actually pretty subtle.  Most servers advertise
      the same value as their maximum and their preferred transfer size, so
      the end result is the same in most cases.
      
      The reason for this difference is that for text-based mounts, if
      r/wsize are not specified, they are set to the largest value supported
      by the client.  For legacy mounts, the values are set to zero if these
      options are not specified.
      
      nfs_server_set_fsinfo() can negotiate the transfer size defaults
      correctly in any case.  There's no need to specify any particular
      value as default in the text-based option parsing logic.
      
      Note that nfs4 doesn't use nfs_server_set_fsinfo(), but the mount.nfs4
      command does set rsize and wsize to 0 if the user didn't specify these
      options.  So, make the same change for text-based NFSv4 mounts.
      
      Thanks to James Pearson <james-p@moving-picture.com> for reporting and
      diagnosing the problem.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
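The difference between "unspecified" (zero) and "client maximum" as the starting value can be made concrete with a small model of the negotiation. This is a simplified sketch, not nfs_server_set_fsinfo() itself: the function `negotiate_size` and its parameter names are hypothetical, and real negotiation applies further rounding and limits that are omitted here.

```c
#include <assert.h>

/*
 * requested   : value from the mount options, 0 meaning "not specified"
 * srv_pref    : server's preferred transfer size
 * srv_max     : server's maximum transfer size
 * client_max  : largest size the client supports
 */
static unsigned int negotiate_size(unsigned int requested,
                                   unsigned int srv_pref,
                                   unsigned int srv_max,
                                   unsigned int client_max)
{
    unsigned int size = requested;

    assert(client_max > 0);

    if (size == 0)            /* unspecified: start from the preference */
        size = srv_pref;
    if (size > srv_max)       /* never exceed what the server allows */
        size = srv_max;
    if (size > client_max)    /* never exceed what the client supports */
        size = client_max;
    return size;
}
```

For a server that prefers 32768 but allows 1048576, passing 0 yields 32768 (the legacy binary-mount result), while pre-filling `requested` with the client maximum yields 1048576 (the text-based result before this change). When preferred and maximum are equal, both paths agree, which is why the difference is subtle.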
    • NFS: Display compressed (shorthand) IPv6 in /proc/mounts · d250e190
      Chuck Lever committed
      Recent changes to snprintf() introduced the %pI6c formatter, which can
      display an IPv6 address with standard shorthanding.  Use this new
      formatter when displaying IPv6 server addresses in /proc/mounts.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: Display compressed (shorthand) IPv6 presentation addresses · dd1fd90f
      Chuck Lever committed
      Recent changes to snprintf() introduced the %pI6c formatter, which can
      display an IPv6 address with standard shorthanding.  Using a
      shorthanded address can save us a few bytes of memory for each stored
      presentation address, or a few bytes on the wire when sending these in
      a universal address.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: reorder nfs4_sequence_regs to remove 8 bytes of padding on 64 bits · a01878aa
      Richard Kennedy committed
      Reorder nfs4_sequence_args to remove 8 bytes of padding on 64-bit
      builds.
      
      The size of this structure drops to 24 bytes from 32 and reduces the
      text size of nfs.ko.
      On my x86_64, size reports:

                        text    data    bss     dec    hex  filename
      2.6.32-rc5      200996    8512    432  209940  33414  nfs.ko
      +patch          200884    8512    432  209828  333a4  nfs.ko
      Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
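How member order affects padding can be seen with two hypothetical structs. These are not the real nfs4_sequence_args fields; `struct padded`, `struct reordered`, and `bytes_saved` are invented for illustration, and the exact sizes depend on the ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A small member ahead of a pointer forces padding before the pointer. */
struct padded {
    uint8_t  flag;
    void    *ptr;
    uint32_t count;
};

/* Same members, largest alignment first: small members share a slot. */
struct reordered {
    void    *ptr;
    uint32_t count;
    uint8_t  flag;
};

/* Bytes recovered by reordering on the current ABI (may be zero). */
static size_t bytes_saved(void)
{
    assert(sizeof(struct reordered) <= sizeof(struct padded));
    return sizeof(struct padded) - sizeof(struct reordered);
}
```

On a typical 64-bit ABI with 8-byte pointers, `struct padded` occupies 24 bytes and `struct reordered` 16, so `bytes_saved()` returns 8. On 32-bit ABIs the two layouts are often the same size.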
    • NFS: convert proto= option to use netids rather than a protoname · ee671b01
      Jeff Layton committed
      Solaris uses netids as values for the proto= option, so that when
      someone specifies "tcp6" they get traffic over TCP + IPv6. Until
      recently, this has never really been an issue for Linux since it didn't
      support NFS over IPv6. The netid and the protocol name were generally
      always the same (modulo any strange configuration in /etc/netconfig).
      
      The Solaris manpage documents their proto= option as:
      
          proto= _netid_ | rdma
      
      This patch is intended to bring Linux closer to how the Solaris proto=
      option works, by declaring a static netid mapping in the kernel and
      converting the proto= and mountproto= options to follow it and display
      the proper values in /proc/mounts.
      
      Much of this functionality will need to be provided by a userspace
      mount.nfs patch. Chuck Lever has a patch to change mount.nfs in
      the same way. In principle, we could do *all* of this in userspace but
      that would mean that the options in /proc/mounts may not match the
      options used by userspace.
      
      The alternative to the static mapping here is to add a mechanism to
      upcall to userspace for netids. I'm not opposed to that option, but
      it'll probably mean more overhead (and quite a bit more code). Rather
      than shoot for that at first, I figured it was probably better to
      start simply.
      
      Comments welcome.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
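A static netid mapping of the kind described in this commit message could be sketched as a small lookup table. This is a user-space illustration under assumptions, not the table the patch adds: `struct netid_entry`, `netid_table`, and `lookup_netid` are hypothetical names, the four entries are the conventional netids for UDP and TCP over IPv4 and IPv6, and `rdma` is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */
#include <netinet/in.h>   /* IPPROTO_TCP, IPPROTO_UDP */

/* One netid selects both an address family and a transport protocol. */
struct netid_entry {
    const char *netid;
    int family;
    int protocol;
};

static const struct netid_entry netid_table[] = {
    { "udp",  AF_INET,  IPPROTO_UDP },
    { "udp6", AF_INET6, IPPROTO_UDP },
    { "tcp",  AF_INET,  IPPROTO_TCP },
    { "tcp6", AF_INET6, IPPROTO_TCP },
};

/* Return the table entry for a netid string, or NULL if it is unknown. */
static const struct netid_entry *lookup_netid(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(netid_table) / sizeof(netid_table[0]); i++) {
        if (strcmp(netid_table[i].netid, name) == 0)
            return &netid_table[i];
    }
    return NULL;
}
```

With such a table, `proto=tcp6` resolves to TCP over IPv6 in one step, and the same table can be walked in reverse to print the matching netid in /proc/mounts.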