1. 04 4月, 2013 2 次提交
  2. 03 4月, 2013 21 次提交
    • J
      nfsd4: don't destroy in-use session · 66b2b9b2
      J. Bruce Fields 提交于
      This changes session destruction to be similar to client destruction in
      that attempts to destroy a session while in use (which should be rare
      corner cases) result in DELAY.  This simplifies things somewhat and
      helps meet a coming 4.2 requirement.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      66b2b9b2
    • J
      nfsd4: don't destroy in-use clients · 221a6876
      J. Bruce Fields 提交于
      When a setclientid_confirm or create_session confirms a client after a
      client reboot, it also destroys any previous state held by that client.
      
      The shutdown of that previous state must be careful not to free the
      client out from under threads processing other requests that refer to
      the client.
      
      This is a particular problem in the NFSv4.1 case when we hold a
      reference to a session (hence a client) throughout compound processing.
      
      The server attempts to handle this by unhashing the client at the time
      it's destroyed, then delaying the final free to the end.  But this still
      leaves some races in the current code.
      
      I believe it's simpler just to fail the attempt to destroy the client by
      returning NFS4ERR_DELAY.  This is a case that should never happen
      anyway.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      221a6876
    • J
      nfsd4: simplify bind_conn_to_session locking · 4f6e6c17
      J. Bruce Fields 提交于
      The locking here is very fiddly, and there's no reason for us to be
      setting cstate->session, since this is the only op in the compound.
      Let's just take the state lock and drop the reference counting.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      4f6e6c17
    • J
      nfsd4: fix destroy_session race · abcdff09
      J. Bruce Fields 提交于
      destroy_session uses the session and client without continuously holding
      any reference or locks.
      
      Put the whole thing under the state lock for now.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      abcdff09
    • J
      nfsd4: clientid lookup cleanup · bfa85e83
      J. Bruce Fields 提交于
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      bfa85e83
    • J
      nfsd4: destroy_clientid simplification · c0293b01
      J. Bruce Fields 提交于
      I'm not sure what the check for clientid expiry was meant to do here.
      
      The check for a matching session is redundant given the previous check
      for state: a client without state is, in particular, a client without
      sessions.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      c0293b01
    • J
      nfsd4: remove some dprintk's · 1ca50792
      J. Bruce Fields 提交于
      E.g. printk's that just report the return value from an op are
      uninteresting as we already do that in the main proc_compound loop.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1ca50792
    • J
      nfsd4: STALE_STATEID cleanup · 0eb6f20a
      J. Bruce Fields 提交于
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0eb6f20a
    • J
      nfsd4: warn on odd create_session state · 78389046
      J. Bruce Fields 提交于
      This should never happen.
      
      (Note: the comparable case in setclientid_confirm *can* happen, since
      updating a client record can result in both confirmed and unconfirmed
      records with the same clientid.)
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      78389046
    • Y
      nfsd: fix bug on nfs4 stateid deallocation · 491402a7
      ycnian@gmail.com 提交于
      NFS4_OO_PURGE_CLOSE is not handled properly. To avoid memory leak, nfs4
      stateid which is pointed by oo_last_closed_stid is freed in nfsd4_close(),
      but NFS4_OO_PURGE_CLOSE isn't cleared meanwhile. So the stateid released in
      THIS close procedure may be freed immediately in the coming encoding function.
      Sorry that Signed-off-by was forgotten in last version.
      Signed-off-by: NYanchuan Nian <ycnian@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      491402a7
    • Y
      nfsd: remove unused macro in nfsv4 · 9c6bdbb8
      Yanchuan Nian 提交于
      lk_rflags is never used anywhere, and rflags is not defined in struct
      nfsd4_lock.
      Signed-off-by: NYanchuan Nian <ycnian@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9c6bdbb8
    • J
      nfsd4: fix use-after-free of 4.1 client on connection loss · 2e4b7239
      J. Bruce Fields 提交于
      Once we drop the lock here there's nothing keeping the client around:
      the only lock still held is the xpt_lock on this socket, but this socket
      no longer has any connection with the client so there's no way for other
      code to know we're still using the client.
      
      The solution is simple: all nfsd4_probe_callback does is set a few
      variables and queue some work, so there's no reason we can't just keep
      it under the lock.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      2e4b7239
    • J
      nfsd4: fix race on client shutdown · b0a9d3ab
      J. Bruce Fields 提交于
      Dropping the session's reference count after the client's means we leave
      a window where the session's se_client pointer is NULL.  An xpt_user
      callback that encounters such a session may then crash:
      
      [  303.956011] BUG: unable to handle kernel NULL pointer dereference at 0000000000000318
      [  303.959061] IP: [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
      [  303.959061] PGD 37811067 PUD 3d498067 PMD 0
      [  303.959061] Oops: 0002 [#8] PREEMPT SMP
      [  303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio
      [  303.959061] CPU 0
      [  303.959061] Pid: 264, comm: nfsd Tainted: G      D      3.8.0-ARCH+ #156 Bochs Bochs
      [  303.959061] RIP: 0010:[<ffffffff81481a8e>]  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
      [  303.959061] RSP: 0018:ffff880037877dd8  EFLAGS: 00010202
      [  303.959061] RAX: 0000000000000100 RBX: ffff880037a2b698 RCX: ffff88003d879278
      [  303.959061] RDX: ffff88003d879278 RSI: dead000000100100 RDI: 0000000000000318
      [  303.959061] RBP: ffff880037877dd8 R08: ffff88003c5a0f00 R09: 0000000000000002
      [  303.959061] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [  303.959061] R13: 0000000000000318 R14: ffff880037a2b680 R15: ffff88003c1cbe00
      [  303.959061] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
      [  303.959061] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  303.959061] CR2: 0000000000000318 CR3: 000000003d49c000 CR4: 00000000000006f0
      [  303.959061] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  303.959061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  303.959061] Process nfsd (pid: 264, threadinfo ffff880037876000, task ffff88003c1fd0a0)
      [  303.959061] Stack:
      [  303.959061]  ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278
      [  303.959061]  ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f
      [  303.959061]  0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00
      [  303.959061] Call Trace:
      [  303.959061]  [<ffffffffa03772ec>] nfsd4_conn_lost+0x3c/0xa0 [nfsd]
      [  303.959061]  [<ffffffffa0222a1f>] svc_delete_xprt+0x10f/0x180 [sunrpc]
      [  303.959061]  [<ffffffffa0223d96>] svc_recv+0xe6/0x580 [sunrpc]
      [  303.959061]  [<ffffffffa03587c5>] nfsd+0xb5/0x140 [nfsd]
      [  303.959061]  [<ffffffffa0358710>] ? nfsd_destroy+0x90/0x90 [nfsd]
      [  303.959061]  [<ffffffff8107ae00>] kthread+0xc0/0xd0
      [  303.959061]  [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100
      [  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
      [  303.959061]  [<ffffffff814898ec>] ret_from_fork+0x7c/0xb0
      [  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
      [  303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 <3e> 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f
      [  303.959061] RIP  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
      [  303.959061]  RSP <ffff880037877dd8>
      [  303.959061] CR2: 0000000000000318
      [  304.001218] ---[ end trace 2d809cd4a7931f5a ]---
      [  304.001903] note: nfsd[264] exited with preempt_count 2
      Reported-by: NBryan Schumaker <bjschuma@netapp.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      b0a9d3ab
    • J
      nfsd4: handle seqid-mutating open errors from xdr decoding · 9d313b17
      J. Bruce Fields 提交于
      If a client sets an owner (or group_owner or acl) attribute on open for
      create, and the mapping of that owner to an id fails, then we return
      BAD_OWNER.  But BAD_OWNER is a seqid-mutating error, so we can't
      shortcut the open processing that case: we have to at least look up the
      owner so we can find the seqid to bump.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9d313b17
    • J
      nfsd4: remove BUG_ON · b600de7a
      J. Bruce Fields 提交于
      This BUG_ON just crashes the thread a little earlier than it would
      otherwise--it doesn't seem useful.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      b600de7a
    • J
      nfsd: scale up the number of DRC hash buckets with cache size · 0733c7ba
      Jeff Layton 提交于
      We've now increased the size of the duplicate reply cache by quite a
      bit, but the number of hash buckets has not changed. So, we've gone from
      an average hash chain length of 16 in the old code to 4096 when the
      cache is its largest. Change the code to scale out the number of buckets
      with the max size of the cache.
      
      At the same time, we also need to fix the hash function since the
      existing one isn't really suitable when there are more than 256 buckets.
      Move instead to use the stock hash_32 function for this. Testing on a
      machine that had 2048 buckets showed that this gave a smaller
      longest:average ratio than the existing hash function:
      
      The formula here is longest hash bucket searched divided by average
      number of entries per bucket at the time that we saw that longest
      bucket:
      
          old hash: 68/(39258/2048) == 3.547404
          hash_32:  45/(33773/2048) == 2.728807
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0733c7ba
    • J
      nfsd: keep stats on worst hash balancing seen so far · 98d821bd
      Jeff Layton 提交于
      The typical case with the DRC is a cache miss, so if we keep track of
      the max number of entries that we've ever walked over in a search, then
      we should have a reasonable estimate of the longest hash chain that
      we've ever seen.
      
      With that, we'll also keep track of the total size of the cache when we
      see the longest chain. In the case of a tie, we prefer to track the
      smallest total cache size in order to properly gauge the worst-case
      ratio of max vs. avg chain length.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      98d821bd
    • J
      nfsd: add new reply_cache_stats file in nfsdfs · a2f999a3
      Jeff Layton 提交于
      For presenting statistics relating to duplicate reply cache.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      a2f999a3
    • J
      nfsd: track memory utilization by the DRC · 6c6910cd
      Jeff Layton 提交于
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      6c6910cd
    • J
      nfsd: break out comparator into separate function · 9dc56143
      Jeff Layton 提交于
      Break out the function that compares the rqstp and checksum against a
      reply cache entry. While we're at it, track the efficacy of the checksum
      over the NFS data by tracking the cases where we would have incorrectly
      matched a DRC entry if we had not tracked it or the length.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9dc56143
    • J
      nfsd: eliminate one of the DRC cache searches · 0b9ea37f
      Jeff Layton 提交于
      The most common case is to do a search of the cache, followed by an
      insert. In the case where we have to allocate an entry off the slab,
      then we end up having to redo the search, which is wasteful.
      
      Better optimize the code for the common case by eliminating the initial
      search of the cache and always preallocating an entry. In the case of a
      cache hit, we'll end up just freeing that entry but that's preferable to
      an extra search.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0b9ea37f
  3. 27 3月, 2013 1 次提交
  4. 23 3月, 2013 1 次提交
  5. 19 3月, 2013 2 次提交
  6. 14 3月, 2013 2 次提交
  7. 04 3月, 2013 1 次提交
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman 提交于
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives.  Allowing simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as it's module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regards to the users permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesytem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Reported-by: NKees Cook <keescook@google.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      7f78e035
  8. 28 2月, 2013 1 次提交
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin 提交于
      I'm not sure why, but the hlist for each entry iterators were conceived
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
  9. 26 2月, 2013 2 次提交
  10. 24 2月, 2013 1 次提交
  11. 23 2月, 2013 1 次提交
  12. 18 2月, 2013 1 次提交
    • J
      nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum · 56edc86b
      Jeff Layton 提交于
      kbuild test robot says:
      
      tree:   git://linux-nfs.org/~bfields/linux.git for-3.9
      head:   deb4534f
      commit: 01a7decf [32/44] nfsd: keep a checksum of the first 256 bytes of request
      config: i386-randconfig-x088 (attached as .config)
      
      All warnings:
      
         fs/nfsd/nfscache.c: In function 'nfsd_cache_csum':
      >> fs/nfsd/nfscache.c:266:9: warning: comparison of distinct pointer types lacks a cast [enabled by default]
      
      vim +266 fs/nfsd/nfscache.c
      
         250		__wsum csum;
         251		struct xdr_buf *buf = &rqstp->rq_arg;
         252		const unsigned char *p = buf->head[0].iov_base;
         253		size_t csum_len = min_t(size_t, buf->head[0].iov_len + buf->page_len,
         254					RC_CSUMLEN);
         255		size_t len = min(buf->head[0].iov_len, csum_len);
         256
         257		/* rq_arg.head first */
         258		csum = csum_partial(p, len, 0);
         259		csum_len -= len;
         260
         261		/* Continue into page array */
         262		idx = buf->page_base / PAGE_SIZE;
         263		base = buf->page_base & ~PAGE_MASK;
         264		while (csum_len) {
         265			p = page_address(buf->pages[idx]) + base;
       > 266			len = min(PAGE_SIZE - base, csum_len);
         267			csum = csum_partial(p, len, csum);
         268			csum_len -= len;
         269			base = 0;
         270			++idx;
         271		}
         272		return csum;
         273	}
         274
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      56edc86b
  13. 16 2月, 2013 4 次提交