1. 04 4月, 2013 2 次提交
  2. 03 4月, 2013 21 次提交
    • J
      nfsd4: don't destroy in-use session · 66b2b9b2
      J. Bruce Fields 提交于
      This changes session destruction to be similar to client destruction in
      that attempts to destroy a session while in use (which should be rare
      corner cases) result in DELAY.  This simplifies things somewhat and
      helps meet a coming 4.2 requirement.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      66b2b9b2
    • J
      nfsd4: don't destroy in-use clients · 221a6876
      J. Bruce Fields 提交于
      When a setclientid_confirm or create_session confirms a client after a
      client reboot, it also destroys any previous state held by that client.
      
      The shutdown of that previous state must be careful not to free the
      client out from under threads processing other requests that refer to
      the client.
      
      This is a particular problem in the NFSv4.1 case when we hold a
      reference to a session (hence a client) throughout compound processing.
      
      The server attempts to handle this by unhashing the client at the time
      it's destroyed, then delaying the final free to the end.  But this still
      leaves some races in the current code.
      
      I believe it's simpler just to fail the attempt to destroy the client by
      returning NFS4ERR_DELAY.  This is a case that should never happen
      anyway.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      221a6876
    • J
      nfsd4: simplify bind_conn_to_session locking · 4f6e6c17
      J. Bruce Fields 提交于
      The locking here is very fiddly, and there's no reason for us to be
      setting cstate->session, since this is the only op in the compound.
      Let's just take the state lock and drop the reference counting.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      4f6e6c17
    • J
      nfsd4: fix destroy_session race · abcdff09
      J. Bruce Fields 提交于
      destroy_session uses the session and client without continuously holding
      any reference or locks.
      
      Put the whole thing under the state lock for now.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      abcdff09
    • J
      nfsd4: clientid lookup cleanup · bfa85e83
      J. Bruce Fields 提交于
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      bfa85e83
    • J
      nfsd4: destroy_clientid simplification · c0293b01
      J. Bruce Fields 提交于
      I'm not sure what the check for clientid expiry was meant to do here.
      
      The check for a matching session is redundant given the previous check
      for state: a client without state is, in particular, a client without
      sessions.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      c0293b01
    • J
      nfsd4: remove some dprintk's · 1ca50792
      J. Bruce Fields 提交于
      E.g. printk's that just report the return value from an op are
      uninteresting as we already do that in the main proc_compound loop.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1ca50792
    • J
      nfsd4: STALE_STATEID cleanup · 0eb6f20a
      J. Bruce Fields 提交于
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0eb6f20a
    • J
      nfsd4: warn on odd create_session state · 78389046
      J. Bruce Fields 提交于
      This should never happen.
      
      (Note: the comparable case in setclientid_confirm *can* happen, since
      updating a client record can result in both confirmed and unconfirmed
      records with the same clientid.)
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      78389046
    • Y
      nfsd: fix bug on nfs4 stateid deallocation · 491402a7
      ycnian@gmail.com 提交于
      NFS4_OO_PURGE_CLOSE is not handled properly. To avoid memory leak, nfs4
      stateid which is pointed by oo_last_closed_stid is freed in nfsd4_close(),
      but NFS4_OO_PURGE_CLOSE isn't cleared meanwhile. So the stateid released in
      THIS close procedure may be freed immediately in the coming encoding function.
      Sorry that Signed-off-by was forgotten in last version.
      Signed-off-by: NYanchuan Nian <ycnian@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      491402a7
    • Y
      nfsd: remove unused macro in nfsv4 · 9c6bdbb8
      Yanchuan Nian 提交于
      lk_rflags is never used anywhere, and rflags is not defined in struct
      nfsd4_lock.
      Signed-off-by: NYanchuan Nian <ycnian@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9c6bdbb8
    • J
      nfsd4: fix use-after-free of 4.1 client on connection loss · 2e4b7239
      J. Bruce Fields 提交于
      Once we drop the lock here there's nothing keeping the client around:
      the only lock still held is the xpt_lock on this socket, but this socket
      no longer has any connection with the client so there's no way for other
      code to know we're still using the client.
      
      The solution is simple: all nfsd4_probe_callback does is set a few
      variables and queue some work, so there's no reason we can't just keep
      it under the lock.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      2e4b7239
    • J
      nfsd4: fix race on client shutdown · b0a9d3ab
      J. Bruce Fields 提交于
      Dropping the session's reference count after the client's means we leave
      a window where the session's se_client pointer is NULL.  An xpt_user
      callback that encounters such a session may then crash:
      
      [  303.956011] BUG: unable to handle kernel NULL pointer dereference at 0000000000000318
      [  303.959061] IP: [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
      [  303.959061] PGD 37811067 PUD 3d498067 PMD 0
      [  303.959061] Oops: 0002 [#8] PREEMPT SMP
      [  303.959061] Modules linked in: md5 nfsd auth_rpcgss nfs_acl snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc microcode psmouse snd_timer serio_raw pcspkr evdev snd soundcore i2c_piix4 i2c_core intel_agp intel_gtt processor button nfs lockd sunrpc fscache ata_generic pata_acpi ata_piix uhci_hcd libata btrfs usbcore usb_common crc32c scsi_mod libcrc32c zlib_deflate floppy virtio_balloon virtio_net virtio_pci virtio_blk virtio_ring virtio
      [  303.959061] CPU 0
      [  303.959061] Pid: 264, comm: nfsd Tainted: G      D      3.8.0-ARCH+ #156 Bochs Bochs
      [  303.959061] RIP: 0010:[<ffffffff81481a8e>]  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
      [  303.959061] RSP: 0018:ffff880037877dd8  EFLAGS: 00010202
      [  303.959061] RAX: 0000000000000100 RBX: ffff880037a2b698 RCX: ffff88003d879278
      [  303.959061] RDX: ffff88003d879278 RSI: dead000000100100 RDI: 0000000000000318
      [  303.959061] RBP: ffff880037877dd8 R08: ffff88003c5a0f00 R09: 0000000000000002
      [  303.959061] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
      [  303.959061] R13: 0000000000000318 R14: ffff880037a2b680 R15: ffff88003c1cbe00
      [  303.959061] FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
      [  303.959061] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  303.959061] CR2: 0000000000000318 CR3: 000000003d49c000 CR4: 00000000000006f0
      [  303.959061] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  303.959061] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  303.959061] Process nfsd (pid: 264, threadinfo ffff880037876000, task ffff88003c1fd0a0)
      [  303.959061] Stack:
      [  303.959061]  ffff880037877e08 ffffffffa03772ec ffff88003d879000 ffff88003d879278
      [  303.959061]  ffff88003d879080 0000000000000000 ffff880037877e38 ffffffffa0222a1f
      [  303.959061]  0000000000107ac0 ffff88003c22e000 ffff88003d879000 ffff88003c1cbe00
      [  303.959061] Call Trace:
      [  303.959061]  [<ffffffffa03772ec>] nfsd4_conn_lost+0x3c/0xa0 [nfsd]
      [  303.959061]  [<ffffffffa0222a1f>] svc_delete_xprt+0x10f/0x180 [sunrpc]
      [  303.959061]  [<ffffffffa0223d96>] svc_recv+0xe6/0x580 [sunrpc]
      [  303.959061]  [<ffffffffa03587c5>] nfsd+0xb5/0x140 [nfsd]
      [  303.959061]  [<ffffffffa0358710>] ? nfsd_destroy+0x90/0x90 [nfsd]
      [  303.959061]  [<ffffffff8107ae00>] kthread+0xc0/0xd0
      [  303.959061]  [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0x50/0x100
      [  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
      [  303.959061]  [<ffffffff814898ec>] ret_from_fork+0x7c/0xb0
      [  303.959061]  [<ffffffff8107ad40>] ? kthread_freezable_should_stop+0x70/0x70
      [  303.959061] Code: ff ff 5d c3 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 65 48 8b 04 25 f0 c6 00 00 48 89 e5 83 80 44 e0 ff ff 01 b8 00 01 00 00 <3e> 66 0f c1 07 0f b6 d4 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f
      [  303.959061] RIP  [<ffffffff81481a8e>] _raw_spin_lock+0x1e/0x40
      [  303.959061]  RSP <ffff880037877dd8>
      [  303.959061] CR2: 0000000000000318
      [  304.001218] ---[ end trace 2d809cd4a7931f5a ]---
      [  304.001903] note: nfsd[264] exited with preempt_count 2
      Reported-by: NBryan Schumaker <bjschuma@netapp.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      b0a9d3ab
    • J
      nfsd4: handle seqid-mutating open errors from xdr decoding · 9d313b17
      J. Bruce Fields 提交于
      If a client sets an owner (or group_owner or acl) attribute on open for
      create, and the mapping of that owner to an id fails, then we return
      BAD_OWNER.  But BAD_OWNER is a seqid-mutating error, so we can't
      shortcut the open processing that case: we have to at least look up the
      owner so we can find the seqid to bump.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9d313b17
    • J
      nfsd4: remove BUG_ON · b600de7a
      J. Bruce Fields 提交于
      This BUG_ON just crashes the thread a little earlier than it would
      otherwise--it doesn't seem useful.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      b600de7a
    • J
      nfsd: scale up the number of DRC hash buckets with cache size · 0733c7ba
      Jeff Layton 提交于
      We've now increased the size of the duplicate reply cache by quite a
      bit, but the number of hash buckets has not changed. So, we've gone from
      an average hash chain length of 16 in the old code to 4096 when the
      cache is its largest. Change the code to scale out the number of buckets
      with the max size of the cache.
      
      At the same time, we also need to fix the hash function since the
      existing one isn't really suitable when there are more than 256 buckets.
      Move instead to use the stock hash_32 function for this. Testing on a
      machine that had 2048 buckets showed that this gave a smaller
      longest:average ratio than the existing hash function:
      
      The formula here is longest hash bucket searched divided by average
      number of entries per bucket at the time that we saw that longest
      bucket:
      
          old hash: 68/(39258/2048) == 3.547404
          hash_32:  45/(33773/2048) == 2.728807
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0733c7ba
    • J
      nfsd: keep stats on worst hash balancing seen so far · 98d821bd
      Jeff Layton 提交于
      The typical case with the DRC is a cache miss, so if we keep track of
      the max number of entries that we've ever walked over in a search, then
      we should have a reasonable estimate of the longest hash chain that
      we've ever seen.
      
      With that, we'll also keep track of the total size of the cache when we
      see the longest chain. In the case of a tie, we prefer to track the
      smallest total cache size in order to properly gauge the worst-case
      ratio of max vs. avg chain length.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      98d821bd
    • J
      nfsd: add new reply_cache_stats file in nfsdfs · a2f999a3
      Jeff Layton 提交于
      For presenting statistics relating to duplicate reply cache.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      a2f999a3
    • J
      nfsd: track memory utilization by the DRC · 6c6910cd
      Jeff Layton 提交于
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      6c6910cd
    • J
      nfsd: break out comparator into separate function · 9dc56143
      Jeff Layton 提交于
      Break out the function that compares the rqstp and checksum against a
      reply cache entry. While we're at it, track the efficacy of the checksum
      over the NFS data by tracking the cases where we would have incorrectly
      matched a DRC entry if we had not tracked it or the length.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9dc56143
    • J
      nfsd: eliminate one of the DRC cache searches · 0b9ea37f
      Jeff Layton 提交于
      The most common case is to do a search of the cache, followed by an
      insert. In the case where we have to allocate an entry off the slab,
      then we end up having to redo the search, which is wasteful.
      
      Better optimize the code for the common case by eliminating the initial
      search of the cache and always preallocating an entry. In the case of a
      cache hit, we'll end up just freeing that entry but that's preferable to
      an extra search.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0b9ea37f
  3. 27 3月, 2013 1 次提交
  4. 23 3月, 2013 1 次提交
  5. 19 3月, 2013 2 次提交
  6. 16 3月, 2013 1 次提交
    • L
      Btrfs: fix warning of free_extent_map · 3b277594
      Liu Bo 提交于
      Users report that an extent map's list is still linked when it's actually
      going to be freed from cache.
      
      The story is that
      
      a) when we're going to drop an extent map and may split this large one into
      smaller ems, and if this large one is flagged as EXTENT_FLAG_LOGGING which means
      that it's on the list to be logged, then the smaller ems split from it will also
      be flagged as EXTENT_FLAG_LOGGING, and this is _not_ expected.
      
      b) we'll keep ems from unlinking the list and freeing when they are flagged with
      EXTENT_FLAG_LOGGING, because the log code holds one reference.
      
      The end result is the warning, but the truth is that we set the flag
      EXTENT_FLAG_LOGGING only during fsync.
      
      So clear flag EXTENT_FLAG_LOGGING for extent maps split from a large one.
      Reported-by: NJohannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Reported-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      3b277594
  7. 15 3月, 2013 6 次提交
  8. 14 3月, 2013 2 次提交
  9. 13 3月, 2013 3 次提交
    • J
      ext2: Fix BUG_ON in evict() on inode deletion · c288d296
      Jan Kara 提交于
      Commit 8e3dffc6 introduced a regression where deleting inode with
      large extended attributes leads to triggering
        BUG_ON(inode->i_state != (I_FREEING | I_CLEAR))
      in fs/inode.c:evict(). That happens because freeing of xattr block
      dirtied the inode and it happened after clear_inode() has been called.
      
      Fix the issue by moving removal of xattr block into ext2_evict_inode()
      before clear_inode() call close to a place where data blocks are
      truncated. That is also more logical place and removes surprising
      requirement that ext2_free_blocks() mustn't dirty the inode.
      Reported-by: NTyler Hicks <tyhicks@canonical.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      c288d296
    • E
      fs: Readd the fs module aliases. · fa7614dd
      Eric W. Biederman 提交于
      I had assumed that the only use of module aliases for filesystems
      prior to "fs: Limit sys_mount to only request filesystem modules."
      was in request_module.  It turns out I was wrong.  At least mkinitcpio
      in Arch linux uses these aliases.
      
      So readd the preexising aliases, to keep from breaking userspace.
      
      Userspace eventually will have to follow and use the same aliases the
      kernel does.  So at some point we may be delete these aliases without
      problems.  However that day is not today.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      fa7614dd
    • M
      Fix: compat_rw_copy_check_uvector() misuse in aio, readv, writev, and security keys · 8aec0f5d
      Mathieu Desnoyers 提交于
      Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
      compat_process_vm_rw() shows that the compatibility code requires an
      explicit "access_ok()" check before calling
      compat_rw_copy_check_uvector(). The same difference seems to appear when
      we compare fs/read_write.c:do_readv_writev() to
      fs/compat.c:compat_do_readv_writev().
      
      This subtle difference between the compat and non-compat requirements
      should probably be debated, as it seems to be error-prone. In fact,
      there are two others sites that use this function in the Linux kernel,
      and they both seem to get it wrong:
      
      Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
      also ends up calling compat_rw_copy_check_uvector() through
      aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
      be missing. Same situation for
      security/keys/compat.c:compat_keyctl_instantiate_key_iov().
      
      I propose that we add the access_ok() check directly into
      compat_rw_copy_check_uvector(), so callers don't have to worry about it,
      and it therefore makes the compat call code similar to its non-compat
      counterpart. Place the access_ok() check in the same location where
      copy_from_user() can trigger a -EFAULT error in the non-compat code, so
      the ABI behaviors are alike on both compat and non-compat.
      
      While we are here, fix compat_do_readv_writev() so it checks for
      compat_rw_copy_check_uvector() negative return values.
      
      And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
      handling.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8aec0f5d
  10. 12 3月, 2013 1 次提交
    • A
      vfs: fix pipe counter breakage · a930d879
      Al Viro 提交于
      If you open a pipe for neither read nor write, the pipe code will not
      add any usage counters to the pipe, causing the 'struct pipe_inode_info"
      to be potentially released early.
      
      That doesn't normally matter, since you cannot actually use the pipe,
      but the pipe release code - particularly fasync handling - still expects
      the actual pipe infrastructure to all be there.  And rather than adding
      NULL pointer checks, let's just disallow this case, the same way we
      already do for the named pipe ("fifo") case.
      
      This is ancient going back to pre-2.4 days, and until trinity, nobody
      naver noticed.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a930d879