1. 13 Sep, 2016 1 commit
    • C
      svcauth_gss: Revert 64c59a37 ("Remove unnecessary allocation") · bf2c4b6f
      Chuck Lever committed
      rsc_lookup steals the passed-in memory to avoid doing an allocation of
      its own, so we can't just pass in a pointer to memory that someone else
      is using.
      
      If we really want to avoid allocation there then maybe we should
      preallocate somewhere, or reference count these handles.
      
      For now we should revert.
      
      On occasion I see this on my server:
      
      kernel: kernel BUG at /home/cel/src/linux/linux-2.6/mm/slub.c:3851!
      kernel: invalid opcode: 0000 [#1] SMP
      kernel: Modules linked in: cts rpcsec_gss_krb5 sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd btrfs xor iTCO_wdt iTCO_vendor_support raid6_pq pcspkr i2c_i801 i2c_smbus lpc_ich mfd_core mei_me sg mei shpchp wmi ioatdma ipmi_si ipmi_msghandler acpi_pad acpi_power_meter rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb mlx4_core ahci libahci libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
      kernel: CPU: 7 PID: 145 Comm: kworker/7:2 Not tainted 4.8.0-rc4-00006-g9d06b0b #15
      kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
      kernel: Workqueue: events do_cache_clean [sunrpc]
      kernel: task: ffff8808541d8000 task.stack: ffff880854344000
      kernel: RIP: 0010:[<ffffffff811e7075>]  [<ffffffff811e7075>] kfree+0x155/0x180
      kernel: RSP: 0018:ffff880854347d70  EFLAGS: 00010246
      kernel: RAX: ffffea0020fe7660 RBX: ffff88083f9db064 RCX: 146ff0f9d5ec5600
      kernel: RDX: 000077ff80000000 RSI: ffff880853f01500 RDI: ffff88083f9db064
      kernel: RBP: ffff880854347d88 R08: ffff8808594ee000 R09: ffff88087fdd8780
      kernel: R10: 0000000000000000 R11: ffffea0020fe76c0 R12: ffff880853f01500
      kernel: R13: ffffffffa013cf76 R14: ffffffffa013cff0 R15: ffffffffa04253a0
      kernel: FS:  0000000000000000(0000) GS:ffff88087fdc0000(0000) knlGS:0000000000000000
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      kernel: CR2: 00007fed60b020c3 CR3: 0000000001c06000 CR4: 00000000001406e0
      kernel: Stack:
      kernel: ffff8808589f2f00 ffff880853f01500 0000000000000001 ffff880854347da0
      kernel: ffffffffa013cf76 ffff8808589f2f00 ffff880854347db8 ffffffffa013d006
      kernel: ffff8808589f2f20 ffff880854347e00 ffffffffa0406f60 0000000057c7044f
      kernel: Call Trace:
      kernel: [<ffffffffa013cf76>] rsc_free+0x16/0x90 [auth_rpcgss]
      kernel: [<ffffffffa013d006>] rsc_put+0x16/0x30 [auth_rpcgss]
      kernel: [<ffffffffa0406f60>] cache_clean+0x2e0/0x300 [sunrpc]
      kernel: [<ffffffffa04073ee>] do_cache_clean+0xe/0x70 [sunrpc]
      kernel: [<ffffffff8109a70f>] process_one_work+0x1ff/0x3b0
      kernel: [<ffffffff8109b15c>] worker_thread+0x2bc/0x4a0
      kernel: [<ffffffff8109aea0>] ? rescuer_thread+0x3a0/0x3a0
      kernel: [<ffffffff810a0ba4>] kthread+0xe4/0xf0
      kernel: [<ffffffff8169c47f>] ret_from_fork+0x1f/0x40
      kernel: [<ffffffff810a0ac0>] ? kthread_stop+0x110/0x110
      kernel: Code: f7 ff ff eb 3b 65 8b 05 da 30 e2 7e 89 c0 48 0f a3 05 a0 38 b8 00 0f 92 c0 84 c0 0f 85 d1 fe ff ff 0f 1f 44 00 00 e9 f5 fe ff ff <0f> 0b 49 8b 03 31 f6 f6 c4 40 0f 85 62 ff ff ff e9 61 ff ff ff
      kernel: RIP  [<ffffffff811e7075>] kfree+0x155/0x180
      kernel: RSP <ffff880854347d70>
      kernel: ---[ end trace 3fdec044969def26 ]---
      
      It seems to be most common after a server reboot where a client has been
      using a Kerberos mount, and reconnects to continue its workload.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
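The ownership problem described in this commit message can be illustrated with a small user-space sketch. Everything here (`struct handle`, `struct entry`, `handle_dup`, `entry_steal`, `entry_free`) is hypothetical and is not the auth_rpcgss code. It only shows why a routine that takes ownership of the caller's buffer is safe when the caller hands over a private copy, and unsafe when the buffer is still in use elsewhere.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a context handle: a length plus heap data. */
struct handle {
    size_t len;
    unsigned char *data;
};

/* Hypothetical cache entry that owns its handle's buffer. */
struct entry {
    struct handle h;
};

/* Deep-copy src into dst; returns 0 on success, -1 on allocation failure. */
static int handle_dup(struct handle *dst, const struct handle *src)
{
    dst->data = malloc(src->len ? src->len : 1);
    if (!dst->data)
        return -1;
    if (src->len)
        memcpy(dst->data, src->data, src->len);
    dst->len = src->len;
    return 0;
}

/* "Steals" the buffer: the entry takes ownership and the source is emptied. */
static void entry_steal(struct entry *e, struct handle *src)
{
    e->h = *src;
    src->data = NULL;
    src->len = 0;
}

/* Frees the buffer the entry owns. */
static void entry_free(struct entry *e)
{
    free(e->h.data);
    e->h.data = NULL;
    e->h.len = 0;
}
```

If `entry_steal()` were given a pointer to memory that another owner later frees, `entry_free()` would free it a second time, which is the kind of misuse that can end in a `kfree` BUG like the one in the trace above. Duplicating first with `handle_dup()` costs one allocation but keeps ownership unambiguous.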
  2. 07 Sep, 2016 2 commits
    • C
      xprtrdma: Fix receive buffer accounting · 05c97466
      Chuck Lever committed
      An RPC can terminate before its reply arrives, if a credential
      problem or a soft timeout occurs. After this happens, xprtrdma
      reports it is out of Receive buffers.
      
      A Receive buffer is posted before each RPC is sent, and returned to
      the buffer pool when a reply is received. If no reply is received
      for an RPC, that Receive buffer remains posted. But xprtrdma tries
      to post another when the next RPC is sent.
      
      If this happens a few dozen times, there are no receive buffers left
      to be posted at send time. I don't see a way for a transport
      connection to recover at that point, and it will spit warnings and
      unnecessarily delay RPCs on occasion for its remaining lifetime.
      
      Commit 1e465fd4 ("xprtrdma: Replace send and receive arrays")
      removed a little bit of logic to detect this case and not provide
      a Receive buffer so no more buffers are posted, and then transport
      operation continues correctly. We didn't understand what that logic
      did, and it wasn't commented, so it was removed as part of the
      overhaul to support backchannel requests.
      
      Restore it, but be wary of the need to keep extra Receives posted
      to deal with backchannel requests.
      
      Fixes: 1e465fd4 ("xprtrdma: Replace send and receive arrays")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • C
      xprtrdma: Revert 3d4cf35b ("xprtrdma: Reply buffer exhaustion...") · 78d506e1
      Chuck Lever committed
      Receive buffer exhaustion, if it were to actually occur, would be
      catastrophic. However, when there are no reply buffers to post, that
      means all of them have already been posted and are waiting for
      incoming replies. By design, there can never be more RPCs in flight
      than there are available receive buffers.
      
      A receive buffer can be left posted after an RPC exits without a
      received reply; say, due to a credential problem or a soft timeout.
      This does not result in fewer posted receive buffers than there are
      pending RPCs, and there is already logic in xprtrdma to deal
      appropriately with this case.
      
      It also looks like the "+ 2" that was removed was accidentally
      accommodating the number of extra receive buffers needed for
      receiving backchannel requests. That will need to be addressed by
      another patch.
      
      Fixes: 3d4cf35b ("xprtrdma: Reply buffer exhaustion can be...")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  3. 03 Sep, 2016 1 commit
  4. 25 Aug, 2016 1 commit
  5. 06 Aug, 2016 3 commits
  6. 05 Aug, 2016 2 commits
    • N
      SUNRPC: disable the use of IPv6 temporary addresses. · d88e4d82
      NeilBrown committed
      If the net.ipv6.conf.*.use_tempaddr sysctl is set to '2',
      then TCP connections over IPv6 will prefer a 'private' source
      address.
      These eventually expire and become invalid, typically after a week,
      but the time is configurable.
      
      When the local address becomes invalid the client will not be able to
      receive replies from the server.  Eventually the connection will time out
      or break and a new connection will be established, but this can take
      half an hour (typically TCP connection break time).
      
      RFC 4941, which describes private IPv6 addresses, acknowledges that some
      applications might not work well with them and that the application may
      explicitly request a non-temporary (i.e. "public") address.
      
      I believe this is correct for SUNRPC clients.  Without this change, a
      client will occasionally experience a long delay if private addresses
      have been enabled.
      
      The privacy offered by private addresses is of little value for an NFS
      server which requires client authentication.
      
      For NFSv3 this will often not be a problem because idle connections are
      closed after 5 minutes.  For NFSv4 connections never go idle due to the
      periodic RENEW (or equivalent) request.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
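RFC 5014 defines a socket option through which an application can request a public source address, as the message above mentions. The following is a minimal user-space sketch of that request on Linux, not the in-kernel SUNRPC change itself. The function name `prefer_public_source` is hypothetical, and the fallback constant values are assumed to match the Linux UAPI header `<linux/in6.h>`.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Values from the Linux UAPI header <linux/in6.h>; defined here only if the
 * libc headers did not already provide them. */
#ifndef IPV6_ADDR_PREFERENCES
#define IPV6_ADDR_PREFERENCES 72
#endif
#ifndef IPV6_PREFER_SRC_PUBLIC
#define IPV6_PREFER_SRC_PUBLIC 0x0002
#endif

/* Ask the kernel to prefer a public (non-temporary) source address for this
 * IPv6 socket. Returns 0 on success, -1 with errno set on failure. */
static int prefer_public_source(int fd)
{
    int pref = IPV6_PREFER_SRC_PUBLIC;

    return setsockopt(fd, IPPROTO_IPV6, IPV6_ADDR_PREFERENCES,
                      &pref, sizeof(pref));
}
```

The option would be set on an `AF_INET6` socket before `connect()`, so that source address selection for the connection avoids temporary addresses even when the sysctl prefers them.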
    • O
      SUNRPC: allow for upcalls for same uid but different gss service · 9130b8db
      Olga Kornievskaia committed
      It's possible to have simultaneous upcalls for the same UID but
      different GSS services. In that case, we need to allow the
      upcall to gssd to proceed so that the same context is not used
      by two different GSS services. Some servers lock the use of a context
      to the GSS service.
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Cc: stable@vger.kernel.org # v3.9+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  7. 03 Aug, 2016 1 commit
  8. 02 Aug, 2016 3 commits
    • T
      SUNRPC: Detect immediate closure of accepted sockets · c7995f8a
      Trond Myklebust committed
      This modification is useful for debugging issues that happen while
      the socket is being initialised.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • T
      SUNRPC: accept() may return sockets that are still in SYN_RECV · b2f21f7d
      Trond Myklebust committed
      We're seeing traces of the following form:
      
       [10952.396347] svc: transport ffff88042ba4a000 dequeued, inuse=2
       [10952.396351] svc: tcp_accept ffff88042ba4a000 sock ffff88042a6e4c80
       [10952.396362] nfsd: connect from 10.2.6.1, port=187
       [10952.396364] svc: svc_setup_socket ffff8800b99bcf00
       [10952.396368] setting up TCP socket for reading
       [10952.396370] svc: svc_setup_socket created ffff8803eb10a000 (inet ffff88042b75b800)
       [10952.396373] svc: transport ffff8803eb10a000 put into queue
       [10952.396375] svc: transport ffff88042ba4a000 put into queue
       [10952.396377] svc: server ffff8800bb0ec000 waiting for data (to = 3600000)
       [10952.396380] svc: transport ffff8803eb10a000 dequeued, inuse=2
       [10952.396381] svc_recv: found XPT_CLOSE
       [10952.396397] svc: svc_delete_xprt(ffff8803eb10a000)
       [10952.396398] svc: svc_tcp_sock_detach(ffff8803eb10a000)
       [10952.396399] svc: svc_sock_detach(ffff8803eb10a000)
       [10952.396412] svc: svc_sock_free(ffff8803eb10a000)
      
      i.e. an immediate close of the socket after initialisation.
      
      The culprit appears to be the test at the end of svc_tcp_init, which
      checks if the newly created socket is in the TCP_ESTABLISHED state,
      and immediately closes it if not. The evidence appears to suggest that
      the socket might still be in the SYN_RECV state at this time.
      
      The fix is to check for both states, and then to add a check in
      svc_tcp_state_change() to ensure we don't close the socket when
      it transitions into TCP_ESTABLISHED.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
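The two checks described in this fix can be sketched as a pair of predicates. This is a simplified illustration, not the kernel code: the enum and both function names are hypothetical, and the real functions operate on a `struct sock` rather than a bare state value.

```c
#include <stdbool.h>

/* Hypothetical subset of TCP connection states, named after the kernel's. */
enum conn_state {
    ST_ESTABLISHED = 1,
    ST_SYN_SENT,
    ST_SYN_RECV,
    ST_CLOSE = 7
};

/* After accept(): the socket is usable if the handshake has completed or is
 * still completing. Testing only for ESTABLISHED would reject the latter. */
static bool accepted_state_ok(enum conn_state s)
{
    return s == ST_ESTABLISHED || s == ST_SYN_RECV;
}

/* In the state-change callback: a transition into ESTABLISHED must not be
 * treated as a reason to close the transport. */
static bool state_change_should_close(enum conn_state new_state)
{
    return new_state != ST_ESTABLISHED;
}
```

With both predicates in place, a socket accepted while still in SYN_RECV survives initialisation and also survives the later SYN_RECV to ESTABLISHED transition.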
    • T
      SUNRPC: Handle EADDRNOTAVAIL on connection failures · 1f4c17a0
      Trond Myklebust committed
      If the connect attempt immediately fails with an EADDRNOTAVAIL error, then
      that means our choice of source port number was bad.
      This error is expected when we set the SO_REUSEPORT socket option and we
      have 2 sockets sharing the same source and destination address and port
      combinations.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Fixes: 402e23b4 ("SUNRPC: Fix stupid typo in xs_sock_set_reuseport")
      Cc: stable@vger.kernel.org # v4.0+
  9. 25 Jul, 2016 1 commit
  10. 20 Jul, 2016 6 commits
    • K
      xprtrdma: fix semicolon.cocci warnings · 53d78523
      kbuild test robot committed
      net/sunrpc/xprtrdma/verbs.c:798:2-3: Unneeded semicolon
      
       Remove unneeded semicolon.
      
      Generated by: scripts/coccinelle/misc/semicolon.cocci
      
      CC: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • F
      sunrpc: Prevent resvport min/max inversion via sysfs and module parameter · ffb6ca33
      Frank Sorenson committed
      The current min/max resvport settings are independently limited
      by the entire range of allowed ports, so max_resvport can be
      set to a port lower than min_resvport.
      
      Prevent inversion of min/max values when set through sysfs and
      module parameter by setting the limits dependent on each other.
      Signed-off-by: Frank Sorenson <sorenson@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
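Making the two limits depend on each other can be sketched as a pair of setters that validate against the current value of the opposite bound. The struct, the function names, and the floor and ceiling constants below are hypothetical and chosen only for illustration; the kernel's parameter handlers are structured differently.

```c
#define RESVPORT_FLOOR 1u      /* hypothetical lowest allowed port */
#define RESVPORT_CEIL  65535u  /* hypothetical highest allowed port */

/* Hypothetical holder for the two tunables. */
struct resvport_range {
    unsigned int min;
    unsigned int max;
};

/* Accept a new minimum only if it lies in [FLOOR, current max].
 * Returns 0 on success, -1 if the value would invert the range. */
static int set_min_resvport(struct resvport_range *r, unsigned int val)
{
    if (val < RESVPORT_FLOOR || val > r->max)
        return -1;
    r->min = val;
    return 0;
}

/* Accept a new maximum only if it lies in [current min, CEIL]. */
static int set_max_resvport(struct resvport_range *r, unsigned int val)
{
    if (val < r->min || val > RESVPORT_CEIL)
        return -1;
    r->max = val;
    return 0;
}
```

Because each setter is bounded by the other's current value, no sequence of successful calls can leave `max` below `min`.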
    • F
      sunrpc: Prevent resvport min/max inversion via sysctl · e08ea3a9
      Frank Sorenson committed
      The current min/max resvport settings are independently limited
      by the entire range of allowed ports, so max_resvport can be
      set to a port lower than min_resvport.
      
      Prevent inversion of min/max values when set through sysctl by
      setting the limits dependent on each other.
      Signed-off-by: Frank Sorenson <sorenson@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • F
      sunrpc: Fix reserved port range calculation · 5d71899a
      Frank Sorenson committed
      The range calculation for choosing the random reserved port will panic
      with divide-by-zero when min_resvport == max_resvport, a range of one
      port, not zero.
      
      Fix the reserved port range calculation by adding one to the difference.
      Signed-off-by: Frank Sorenson <sorenson@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
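The off-by-one described above is easy to see in a small sketch. The function name `pick_port` and the idea of passing the random value in as a parameter are assumptions made for illustration; only the "difference plus one" range comes from the commit message.

```c
/* Pick a port in the inclusive range [min, max], given a random value.
 * Assumes min <= max. With range = max - min (the old calculation), a
 * single-port range would give range == 0 and the modulo would divide by
 * zero; adding one makes the range count ports rather than gaps. */
static unsigned short pick_port(unsigned short min, unsigned short max,
                                unsigned int rnd)
{
    unsigned int range = (unsigned int)max - min + 1;  /* never zero */

    return (unsigned short)(min + rnd % range);
}
```

For `min == max == 665` the range is 1, so every random value maps to port 665. For `[665, 1023]` the range is 359, so the top port 1023 is reachable as well.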
    • F
      sunrpc: Fix bit count when setting hashtable size to power-of-two · 34ae685c
      Frank Sorenson committed
      Author: Frank Sorenson <sorenson@redhat.com>
      Date:   2016-06-27 13:55:48 -0500
      
          sunrpc: Fix bit count when setting hashtable size to power-of-two
      
          The hashtable size is incorrectly calculated as the next higher
          power-of-two when being set to a power-of-two.  fls() returns the
          bit number of the most significant set bit, with the least
          significant bit being numbered '1'.  For a power-of-two, fls()
          will return a bit number which is one higher than the number of bits
          required, leading to a hashtable which is twice the requested size.
      
          In addition, the value of (1 << nbits) will always be at least num,
          so the test will never be true.
      
          Fix the hash table size calculation to correctly set hashtable
          size, and eliminate the unnecessary check.
      Signed-off-by: Frank Sorenson <sorenson@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
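The `fls()` behaviour described above can be checked with a portable re-implementation. The helper names `fls_u32` and `hash_bits_for` are hypothetical, and computing the bit count from `num - 1` is one way to obtain the smallest sufficient power of two; it is offered as a sketch and is not quoted from the patch.

```c
/* Portable fls(): 1-based index of the most significant set bit,
 * or 0 when x == 0. For example, fls_u32(16) is 5 and fls_u32(15) is 4. */
static unsigned int fls_u32(unsigned int x)
{
    unsigned int bit = 0;

    while (x) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Smallest nbits such that (1u << nbits) >= num, for num >= 1.
 * Using fls_u32(num) directly would return one bit too many whenever
 * num is already a power of two, doubling the table size. */
static unsigned int hash_bits_for(unsigned int num)
{
    return fls_u32(num - 1);
}
```

A request for 16 buckets yields 4 bits (16 entries) rather than 5 bits (32 entries), while a request for 17 still rounds up to 5 bits.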
    • S
      sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags · ce52914e
      Scott Mayhew committed
      A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
      not really safe to use the generic_cred->acred->ac_flags to store
      the NO_CRKEY_TIMEOUT flag.  A lookup for a unx_cred triggered while the
      KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
      KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
      with the auth_cred to be in a state where they're perpetually doing 4K
      NFS_FILE_SYNC writes.
      
      This can be reproduced as follows:
      
      1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
      They do not need to be the same export, nor do they even need to be from
      the same NFS server.  Also, v3 is fine.
      $ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
      $ sudo mount -o v3,sec=sys server2:/export /mnt/sys
      
      2. As the normal user, before accessing the kerberized mount, kinit with
      a short lifetime (but not so short that renewing the ticket would leave
      you within the 4-minute window again by the time the original ticket
      expires), e.g.
      $ kinit -l 10m -r 60m
      
      3. Do some I/O to the kerberized mount and verify that the writes are
      wsize, UNSTABLE:
      $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
      
      4. Wait until you're within 4 minutes of key expiry, then do some more
      I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
      set.  Verify that the writes are 4K, FILE_SYNC:
      $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
      
      5. Now do some I/O to the sec=sys mount.  This will cause
      RPC_CRED_NO_CRKEY_TIMEOUT to be set:
      $ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1
      
      6. Writes for that user will now be permanently 4K, FILE_SYNC,
      regardless of which mount is being written to, until you reboot
      the client.  Renewing the kerberos ticket (assuming it hasn't already
      expired) will have no effect.  Grabbing a new kerberos ticket at this
      point will have no effect either.
      
      Move the flag to the auth->au_flags field (which is currently unused)
      and rename it slightly to reflect that it's no longer associated with
      the auth_cred->ac_flags.  Add the rpc_auth to the arg list of
      rpcauth_cred_key_to_expire and check the au_flags there too.  Finally,
      add the inode to the arg list of nfs_ctx_key_to_expire so we can
      determine the rpc_auth to pass to rpcauth_cred_key_to_expire.
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
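The reason for moving the flag can be sketched with two small structures: one flag word that is shared by every auth flavor a user touches, and one that belongs to a single flavor. The struct layouts, bit values, and the function `key_to_expire` below are hypothetical simplifications, not the kernel's definitions.

```c
#include <stdbool.h>

#define CRED_KEY_EXPIRE_SOON  0x1u  /* hypothetical bit in ac_flags */
#define AUTH_NO_CRKEY_TIMEOUT 0x1u  /* hypothetical bit in au_flags */

/* One instance per auth flavor (for example, one for sys, one for krb5). */
struct auth {
    unsigned int au_flags;
};

/* Shared by all flavors that look up credentials for the same user. */
struct auth_cred {
    unsigned int ac_flags;
};

/* Report an expiring key only when this flavor tracks key expiry at all.
 * Keeping the "no timeout" marker per flavor means a sec=sys lookup can no
 * longer taint the state seen by the krb5 flavor. */
static bool key_to_expire(const struct auth *a, const struct auth_cred *ac)
{
    if (a->au_flags & AUTH_NO_CRKEY_TIMEOUT)
        return false;
    return (ac->ac_flags & CRED_KEY_EXPIRE_SOON) != 0;
}
```

In this model a sec=sys flavor never reports an expiring key, while the krb5 flavor's answer depends only on the expiry bit, which can be cleared again once the ticket is renewed.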
  11. 16 Jul, 2016 1 commit
  12. 14 Jul, 2016 10 commits
  13. 12 Jul, 2016 8 commits