1. 16 4月, 2013 1 次提交
    • J
      nfsd4: implement SEQ4_STATUS_RECALLABLE_STATE_REVOKED · 3bd64a5b
      J. Bruce Fields 提交于
      A 4.1 server must notify a client that has had any state revoked using
      the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag.  The client can figure
      out exactly which state is the problem using CHECK_STATEID and then free
      it using FREE_STATEID.  The status flag will be unset once all such
      revoked stateids are freed.
      
      Our server's only recallable state is delegations.  So we keep with each
      4.1 client a list of delegations that have timed out and been recalled,
      but haven't yet been freed by FREE_STATEID.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      3bd64a5b
  2. 10 4月, 2013 4 次提交
  3. 09 4月, 2013 6 次提交
  4. 08 4月, 2013 1 次提交
    • J
      nfsd4: cleanup handling of nfsv4.0 closed stateid's · 9411b1d4
      J. Bruce Fields 提交于
      Closed stateid's are kept around a little while to handle close replays
      in the 4.0 case.  So we stash them in the last-used stateid in the
      oo_last_closed_stateid field of the open owner.  We can free that in
      encode_seqid_op_tail once the seqid on the open owner is next
      incremented.  But we don't want to do that on the close itself; so we
      set NFS4_OO_PURGE_CLOSE flag set on the open owner, skip freeing it the
      first time through encode_seqid_op_tail, then when we see that flag set
      next time we free it.
      
      This is unnecessarily baroque.
      
      Instead, just move the logic that increments the seqid out of the xdr
      code and into the operation code itself.
      
      The justification given for the current placement is that we need to
      wait till the last minute to be sure we know whether the status is a
      sequence-id-mutating error or not, but examination of the code shows
      that can't actually happen.
      Reported-by: NYanchuan Nian <ycnian@gmail.com>
      Tested-by: NYanchuan Nian <ycnian@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9411b1d4
  5. 05 4月, 2013 4 次提交
  6. 04 4月, 2013 3 次提交
  7. 03 4月, 2013 21 次提交
    • J
      nfsd4: don't destroy in-use session · 66b2b9b2
      J. Bruce Fields 提交于
      This changes session destruction to be similar to client destruction in
      that attempts to destroy a session while in use (which should be rare
      corner cases) result in DELAY.  This simplifies things somewhat and
      helps meet a coming 4.2 requirement.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      66b2b9b2
    • J
      nfsd4: don't destroy in-use clients · 221a6876
      J. Bruce Fields 提交于
      When a setclientid_confirm or create_session confirms a client after a
      client reboot, it also destroys any previous state held by that client.
      
      The shutdown of that previous state must be careful not to free the
      client out from under threads processing other requests that refer to
      the client.
      
      This is a particular problem in the NFSv4.1 case when we hold a
      reference to a session (hence a client) throughout compound processing.
      
      The server attempts to handle this by unhashing the client at the time
      it's destroyed, then delaying the final free to the end.  But this still
      leaves some races in the current code.
      
      I believe it's simpler just to fail the attempt to destroy the client by
      returning NFS4ERR_DELAY.  This is a case that should never happen
      anyway.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      221a6876
    • J
      nfsd4: simplify bind_conn_to_session locking · 4f6e6c17
      J. Bruce Fields 提交于
      The locking here is very fiddly, and there's no reason for us to be
      setting cstate->session, since this is the only op in the compound.
      Let's just take the state lock and drop the reference counting.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      4f6e6c17
    • J
      nfsd4: fix destroy_session race · abcdff09
      J. Bruce Fields 提交于
      destroy_session uses the session and client without continuously holding
      any reference or locks.
      
      Put the whole thing under the state lock for now.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      abcdff09
    • J
      nfsd4: clientid lookup cleanup · bfa85e83
      J. Bruce Fields 提交于
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      bfa85e83
    • J
      nfsd4: destroy_clientid simplification · c0293b01
      J. Bruce Fields 提交于
      I'm not sure what the check for clientid expiry was meant to do here.
      
      The check for a matching session is redundant given the previous check
      for state: a client without state is, in particular, a client without
      sessions.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      c0293b01
    • J
      nfsd4: remove some dprintk's · 1ca50792
      J. Bruce Fields 提交于
      E.g. printk's that just report the return value from an op are
      uninteresting as we already do that in the main proc_compound loop.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1ca50792
    • J
      nfsd4: STALE_STATEID cleanup · 0eb6f20a
      J. Bruce Fields 提交于
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0eb6f20a
    • J
      nfsd4: warn on odd create_session state · 78389046
      J. Bruce Fields 提交于
      This should never happen.
      
      (Note: the comparable case in setclientid_confirm *can* happen, since
      updating a client record can result in both confirmed and unconfirmed
      records with the same clientid.)
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      78389046
    • Y
      nfsd: fix bug on nfs4 stateid deallocation · 491402a7
      ycnian@gmail.com 提交于
      NFS4_OO_PURGE_CLOSE is not handled properly. To avoid memory leak, nfs4
      stateid which is pointed by oo_last_closed_stid is freed in nfsd4_close(),
      but NFS4_OO_PURGE_CLOSE isn't cleared meanwhile. So the stateid released in
      THIS close procedure may be freed immediately in the coming encoding function.
      Sorry that Signed-off-by was forgotten in last version.
      Signed-off-by: NYanchuan Nian <ycnian@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      491402a7
    • Y
      nfsd: remove unused macro in nfsv4 · 9c6bdbb8
      Yanchuan Nian 提交于
      lk_rflags is never used anywhere, and rflags is not defined in struct
      nfsd4_lock.
      Signed-off-by: NYanchuan Nian <ycnian@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9c6bdbb8
    • J
      nfsd4: fix use-after-free of 4.1 client on connection loss · 2e4b7239
      J. Bruce Fields 提交于
      Once we drop the lock here there's nothing keeping the client around:
      the only lock still held is the xpt_lock on this socket, but this socket
      no longer has any connection with the client so there's no way for other
      code to know we're still using the client.
      
      The solution is simple: all nfsd4_probe_callback does is set a few
      variables and queue some work, so there's no reason we can't just keep
      it under the lock.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      2e4b7239
    • J
      nfsd4: fix race on client shutdown · b0a9d3ab
      J. Bruce Fields 提交于
      Dropping the session's reference count after the client's means we leave
      a window where the session's se_client pointer is NULL.  An xpt_user
      callback that encounters such a session may then crash:
      
      [  303.956011] BUG: unable to handle kernel NULL pointer dereference at 0000000000000318
      [  303.959061] IP: [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
      [  303.959061] PGD 37811067 PUD 3d498067 PMD 0
      [  303.959061] Oops: 0002 [#8] PREEMPT SMP
      [  303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio
      [  303.959061] CPU 0
      [  303.959061] Pid: 264, comm: nfsd Tainted: G      D      3.8.0-ARCH+ #156 Bochs Bochs
      [  303.959061] RIP: 0010:[<ffffffff81481a8e>]  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
      [  303.959061] RSP: 0018:ffff880037877dd8  EFLAGS: 00010202
      [  303.959061] RAX: 0000000000000100 RBX: ffff880037a2b698 RCX: ffff88003d879278
      [  303.959061] RDX: ffff88003d879278 RSI: dead000000100100 RDI: 0000000000000318
      [  303.959061] RBP: ffff880037877dd8 R08: ffff88003c5a0f00 R09: 0000000000000002
      [  303.959061] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [  303.959061] R13: 0000000000000318 R14: ffff880037a2b680 R15: ffff88003c1cbe00
      [  303.959061] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
      [  303.959061] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  303.959061] CR2: 0000000000000318 CR3: 000000003d49c000 CR4: 00000000000006f0
      [  303.959061] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  303.959061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  303.959061] Process nfsd (pid: 264, threadinfo ffff880037876000, task ffff88003c1fd0a0)
      [  303.959061] Stack:
      [  303.959061]  ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278
      [  303.959061]  ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f
      [  303.959061]  0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00
      [  303.959061] Call Trace:
      [  303.959061]  [<ffffffffa03772ec>] nfsd4_conn_lost+0x3c/0xa0 [nfsd]
      [  303.959061]  [<ffffffffa0222a1f>] svc_delete_xprt+0x10f/0x180 [sunrpc]
      [  303.959061]  [<ffffffffa0223d96>] svc_recv+0xe6/0x580 [sunrpc]
      [  303.959061]  [<ffffffffa03587c5>] nfsd+0xb5/0x140 [nfsd]
      [  303.959061]  [<ffffffffa0358710>] ? nfsd_destroy+0x90/0x90 [nfsd]
      [  303.959061]  [<ffffffff8107ae00>] kthread+0xc0/0xd0
      [  303.959061]  [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100
      [  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
      [  303.959061]  [<ffffffff814898ec>] ret_from_fork+0x7c/0xb0
      [  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
      [  303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 <3e> 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f
      [  303.959061] RIP  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
      [  303.959061]  RSP <ffff880037877dd8>
      [  303.959061] CR2: 0000000000000318
      [  304.001218] ---[ end trace 2d809cd4a7931f5a ]---
      [  304.001903] note: nfsd[264] exited with preempt_count 2
      Reported-by: NBryan Schumaker <bjschuma@netapp.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      b0a9d3ab
    • J
      nfsd4: handle seqid-mutating open errors from xdr decoding · 9d313b17
      J. Bruce Fields 提交于
      If a client sets an owner (or group_owner or acl) attribute on open for
      create, and the mapping of that owner to an id fails, then we return
      BAD_OWNER.  But BAD_OWNER is a seqid-mutating error, so we can't
      shortcut the open processing that case: we have to at least look up the
      owner so we can find the seqid to bump.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9d313b17
    • J
      nfsd4: remove BUG_ON · b600de7a
      J. Bruce Fields 提交于
      This BUG_ON just crashes the thread a little earlier than it would
      otherwise--it doesn't seem useful.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      b600de7a
    • J
      nfsd: scale up the number of DRC hash buckets with cache size · 0733c7ba
      Jeff Layton 提交于
      We've now increased the size of the duplicate reply cache by quite a
      bit, but the number of hash buckets has not changed. So, we've gone from
      an average hash chain length of 16 in the old code to 4096 when the
      cache is its largest. Change the code to scale out the number of buckets
      with the max size of the cache.
      
      At the same time, we also need to fix the hash function since the
      existing one isn't really suitable when there are more than 256 buckets.
      Move instead to use the stock hash_32 function for this. Testing on a
      machine that had 2048 buckets showed that this gave a smaller
      longest:average ratio than the existing hash function:
      
      The formula here is longest hash bucket searched divided by average
      number of entries per bucket at the time that we saw that longest
      bucket:
      
          old hash: 68/(39258/2048) == 3.547404
          hash_32:  45/(33773/2048) == 2.728807
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0733c7ba
    • J
      nfsd: keep stats on worst hash balancing seen so far · 98d821bd
      Jeff Layton 提交于
      The typical case with the DRC is a cache miss, so if we keep track of
      the max number of entries that we've ever walked over in a search, then
      we should have a reasonable estimate of the longest hash chain that
      we've ever seen.
      
      With that, we'll also keep track of the total size of the cache when we
      see the longest chain. In the case of a tie, we prefer to track the
      smallest total cache size in order to properly gauge the worst-case
      ratio of max vs. avg chain length.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      98d821bd
    • J
      nfsd: add new reply_cache_stats file in nfsdfs · a2f999a3
      Jeff Layton 提交于
      For presenting statistics relating to duplicate reply cache.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      a2f999a3
    • J
      nfsd: track memory utilization by the DRC · 6c6910cd
      Jeff Layton 提交于
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      6c6910cd
    • J
      nfsd: break out comparator into separate function · 9dc56143
      Jeff Layton 提交于
      Break out the function that compares the rqstp and checksum against a
      reply cache entry. While we're at it, track the efficacy of the checksum
      over the NFS data by tracking the cases where we would have incorrectly
      matched a DRC entry if we had not tracked it or the length.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9dc56143
    • J
      nfsd: eliminate one of the DRC cache searches · 0b9ea37f
      Jeff Layton 提交于
      The most common case is to do a search of the cache, followed by an
      insert. In the case where we have to allocate an entry off the slab,
      then we end up having to redo the search, which is wasteful.
      
      Better optimize the code for the common case by eliminating the initial
      search of the cache and always preallocating an entry. In the case of a
      cache hit, we'll end up just freeing that entry but that's preferable to
      an extra search.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0b9ea37f