1. 28 5月, 2011 2 次提交
  2. 25 4月, 2011 1 次提交
  3. 18 3月, 2011 1 次提交
    • S
      RPC: killing RPC tasks races fixed · 8e26de23
      Stanislav Kinsbursky 提交于
      RPC task RPC_TASK_QUEUED bit is set must be checked before trying to wake up
      task rpc_killall_tasks() because task->tk_waitqueue can not be set (equal to
      NULL).
      Also, as Trond Myklebust mentioned, such approach (instead of checking
      tk_waitqueue to NULL) allows us to "optimise away the call to
      rpc_wake_up_queued_task() altogether for those
      tasks that aren't queued".
      
      Here is an example of dereferencing of tk_waitqueue equal to NULL:
      
      CPU 0               	CPU 1				CPU 2
      --------------------	---------------------	--------------------------
      nfs4_run_open_task
      rpc_run_task
      rpc_execute
      rpc_set_active
      rpc_make_runnable
      (waiting)
      			rpc_async_schedule
      			nfs4_open_prepare
      			nfs_wait_on_sequence
      						nfs_umount_begin
      						rpc_killall_tasks
      						rpc_wake_up_task
      						rpc_wake_up_queued_task
      						spin_lock(tk_waitqueue == NULL)
      						BUG()
      			rpc_sleep_on
      			spin_lock(&q->lock)
      			__rpc_sleep_on
      			task->tk_waitqueue = q
      Signed-off-by: NStanislav Kinsbursky <skinsbursky@openvz.org>
      Cc: stable@kernel.org
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      8e26de23
  4. 12 3月, 2011 2 次提交
  5. 22 12月, 2010 1 次提交
  6. 17 12月, 2010 2 次提交
  7. 23 11月, 2010 1 次提交
  8. 25 10月, 2010 1 次提交
    • T
      SUNRPC: After calling xprt_release(), we must restart from call_reserve · 118df3d1
      Trond Myklebust 提交于
      Rob Leslie reports seeing the following Oops after his Kerberos session
      expired.
      
      BUG: unable to handle kernel NULL pointer dereference at 00000058
      IP: [<e186ed94>] rpcauth_refreshcred+0x11/0x12c [sunrpc]
      *pde = 00000000
      Oops: 0000 [#1]
      last sysfs file: /sys/devices/platform/pc87360.26144/temp3_input
      Modules linked in: autofs4 authenc esp4 xfrm4_mode_transport ipt_LOG ipt_REJECT xt_limit xt_state ipt_REDIRECT xt_owner xt_HL xt_hl xt_tcpudp xt_mark cls_u32 cls_tcindex sch_sfq sch_htb sch_dsmark geodewdt deflate ctr twofish_generic twofish_i586 twofish_common camellia serpent blowfish cast5 cbc xcbc rmd160 sha512_generic sha1_generic hmac crypto_null af_key rpcsec_gss_krb5 nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc ip_gre sit tunnel4 dummy ext3 jbd nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables pc8736x_gpio nsc_gpio pc87360 hwmon_vid loop aes_i586 aes_generic sha256_generic dm_crypt cs5535_gpio serio_raw cs5535_mfgpt hifn_795x des_generic geode_rng rng_core led_class ext4 mbcache jbd2 crc16 dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sd_mod crc_t10dif ide_pci_generic cs5536 amd74xx ide_core pata_cs5536 ata_generic libata usb_storage via_rhine mii scsi_mod btrfs zlib_deflate crc32c libcrc32c [last unloaded: scsi_wait_scan]
      
      Pid: 12875, comm: sudo Not tainted 2.6.36-net5501 #1 /
      EIP: 0060:[<e186ed94>] EFLAGS: 00010292 CPU: 0
      EIP is at rpcauth_refreshcred+0x11/0x12c [sunrpc]
      EAX: 00000000 EBX: defb13a0 ECX: 00000006 EDX: e18683b8
      ESI: defb13a0 EDI: 00000000 EBP: 00000000 ESP: de571d58
       DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
      Process sudo (pid: 12875, ti=de570000 task=decd1430 task.ti=de570000)
      Stack:
       e186e008 00000000 defb13a0 0000000d deda6000 e1868f22 e196f12b defb13a0
      <0> defb13d8 00000000 00000000 e186e0aa 00000000 defb13a0 de571dac 00000000
      <0> e186956c de571e34 debea5c0 de571dc8 e186967a 00000000 debea5c0 de571e34
      Call Trace:
       [<e186e008>] ? rpc_wake_up_next+0x114/0x11b [sunrpc]
       [<e1868f22>] ? call_decode+0x24a/0x5af [sunrpc]
       [<e196f12b>] ? nfs4_xdr_dec_access+0x0/0xa2 [nfs]
       [<e186e0aa>] ? __rpc_execute+0x62/0x17b [sunrpc]
       [<e186956c>] ? rpc_run_task+0x91/0x97 [sunrpc]
       [<e186967a>] ? rpc_call_sync+0x40/0x5b [sunrpc]
       [<e1969ca2>] ? nfs4_proc_access+0x10a/0x176 [nfs]
       [<e19572fa>] ? nfs_do_access+0x2b1/0x2c0 [nfs]
       [<e186ed61>] ? rpcauth_lookupcred+0x62/0x84 [sunrpc]
       [<e19573b6>] ? nfs_permission+0xad/0x13b [nfs]
       [<c0177824>] ? exec_permission+0x15/0x4b
       [<c0177fbd>] ? link_path_walk+0x4f/0x456
       [<c017867d>] ? path_walk+0x4c/0xa8
       [<c0179678>] ? do_path_lookup+0x1f/0x68
       [<c017a3fb>] ? user_path_at+0x37/0x5f
       [<c016359c>] ? handle_mm_fault+0x229/0x55b
       [<c0170a2d>] ? sys_faccessat+0x93/0x146
       [<c0170aef>] ? sys_access+0xf/0x13
       [<c02cf615>] ? syscall_call+0x7/0xb
      Code: 0f 94 c2 84 d2 74 09 8b 44 24 0c e8 6a e9 8b de 83 c4 14 89 d8 5b 5e 5f 5d c3 55 57 56 53 83 ec 1c fc 89 c6 8b 40 10 89 44 24 04 <8b> 58 58 85 db 0f 85 d4 00 00 00 0f b7 46 70 8b 56 20 89 c5 83
      EIP: [<e186ed94>] rpcauth_refreshcred+0x11/0x12c [sunrpc] SS:ESP 0068:de571d58
      CR2: 0000000000000058
      
      This appears to be caused by the function rpc_verify_header() first
      calling xprt_release(), then doing a call_refresh. If we release the
      transport slot, we should _always_ jump back to call_reserve before
      calling anything else.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@kernel.org
      118df3d1
  9. 02 10月, 2010 1 次提交
  10. 13 9月, 2010 3 次提交
  11. 04 8月, 2010 4 次提交
  12. 18 5月, 2010 1 次提交
  13. 15 5月, 2010 1 次提交
  14. 22 3月, 2010 1 次提交
  15. 04 12月, 2009 5 次提交
    • C
      SUNRPC: soft connect semantics for UDP · 3a28becc
      Chuck Lever 提交于
      Introduce soft connect behavior for UDP transports.  In this case, a
      major timeout returns ETIMEDOUT instead of EIO.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      3a28becc
    • C
      SUNRPC: Use soft connect semantics when performing RPC ping · caabea8a
      Chuck Lever 提交于
      Currently, if a remote RPC service is unreachable, an RPC ping will
      hang until the underlying transport connect attempt times out.  A more
      desirable behavior might be to have the ping fail immediately so upper
      layers can recover appropriately.
      
      In the case of an NFS mount, for instance, this would mean the
      mount(2) system call could fail immediately if the server isn't
      listening, rather than hanging uninterruptibly for more than 3
      minutes.
      
      Change rpc_ping() so that it fails immediately for connection-oriented
      transports.  rpc_create() will then fail immediately for such
      transports if an RPC ping was requested.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      caabea8a
    • C
      SUNRPC: Use soft connects for autobinding over TCP · 012da158
      Chuck Lever 提交于
      Autobinding is handled by the rpciod process, not in user processes
      that are generating regular RPC requests.  Thus autobinding is usually
      not affected by signals targetting user processes, such as KILL or
      timer expiration events.
      
      In addition, an RPC request generated by a user process that has
      RPC_TASK_SOFTCONN set and needs to perform an autobind will hang if
      the remote rpcbind service is not available.
      
      For rpcbind queries on connection-oriented transports, let's use the
      new soft connect semantic to return control to the user's process
      quickly, if the kernel's rpcbind client can't connect to the remote
      rpcbind service.
      
      Logic is introduced in call_bind_status() to handle connection errors
      that occurred during an asynchronous rpcbind query.  The logic
      abandons the rpcbind query if the RPC request has SOFTCONN set, and
      retries after a few seconds in the normal case.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      012da158
    • C
      SUNRPC: Allow RPCs to fail quickly if the server is unreachable · 09a21c41
      Chuck Lever 提交于
      The kernel sometimes makes RPC calls to services that aren't running.
      Because the kernel's RPC client always assumes the hard retry semantic
      when reconnecting a connection-oriented RPC transport, the underlying
      reconnect logic takes a long while to time out, even though the remote
      may have responded immediately with ECONNREFUSED.
      
      In certain cases, like upcalls to our local rpcbind daemon, or for NFS
      mount requests, we'd like the kernel to fail immediately if the remote
      service isn't reachable.  This allows another transport to be tried
      immediately, or the pending request can be abandoned quickly.
      
      Introduce a per-request flag which controls how call_transmit_status()
      behaves when request transmission fails because the server cannot be
      reached.
      
      We don't want soft connection semantics to apply to other errors.  The
      default case of the switch statement in call_transmit_status() no
      longer falls through; the fall through code is copied to the default
      case, and a "break;" is added.
      
      The transport's connection re-establishment timeout is also ignored for
      such requests.  We want the request to fail immediately, so the
      reconnect delay is skipped.  Additionally, we don't want a connect
      failure here to further increase the reconnect timeout value, since
      this request will not be retried.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      09a21c41
    • C
      SUNRPC: Check explicitly for tk_status == 0 in call_transmit_status() · 206a134b
      Chuck Lever 提交于
      The success case, where task->tk_status == 0, is by far the most
      frequent case in call_transmit_status().
      
      The default: arm of the switch statement in call_transmit_status()
      handles the 0 case.  default: was moved close to the top of the switch
      statement in call_transmit_status() under the theory that the compiler
      places object code for the earliest arms of a switch statement first,
      making the CPU do less work.
      
      The default: arm of a switch statement, however, is executed only
      after all the other cases have been checked.  Even if the compiler
      rearranges the object code, the default: arm is the "last resort",
      meaning all of the other cases have been explicitly exhausted.  That
      makes the current arrangement about as inefficient as it gets for the
      common case.
      
      To fix this, add an explicit check for zero before the switch
      statement.  That forces the compiler to do the zero check first, no
      matter what optimizations it might try to do to the switch statement.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      206a134b
  16. 25 9月, 2009 2 次提交
  17. 14 9月, 2009 1 次提交
  18. 29 8月, 2009 1 次提交
  19. 10 8月, 2009 3 次提交
  20. 13 7月, 2009 1 次提交
  21. 18 6月, 2009 4 次提交
  22. 12 3月, 2009 1 次提交