1. 26 1月, 2011 4 次提交
    • C
      NFS: Prevent memory allocation failure in nfsacl_encode() · f61f6da0
      Chuck Lever 提交于
      nfsacl_encode() allocates memory in certain cases.  This of course
      is not guaranteed to work.
      
      Since commit 9f06c719 "SUNRPC: New xdr_streams XDR encoder API", the
      kernel's XDR encoders can't return a result indicating possibly a
      failure, so a memory allocation failure in nfsacl_encode() has become
      fatal (ie, the XDR code Oopses) in some cases.
      
      However, the allocated memory is a tiny fixed amount, on the order
      of 40-50 bytes.  We can easily use a stack-allocated buffer for
      this, with only a wee bit of nose-holding.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      f61f6da0
    • C
      NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!" · ee5dc773
      Chuck Lever 提交于
      Milan Broz <mbroz@redhat.com> reports:
      
      > on today Linus' tree I get OOps if using nfs.
      >
      > server (2.6.36) exports dir:
      > /dir   172.16.1.0/24(rw,async,all_squash,no_subtree_check,anonuid=500,anongid=500)
      >
      > on client it is mounted  in fstab
      > server:/dir  /mnt/tst  nfs  rw,soft 0 0
      >
      > and these commands OOpses it (simplified from a configure script):
      >
      > cd /dir
      > touch x
      > install x y
      >
      > [  105.327701] ------------[ cut here ]------------
      > [  105.327979] kernel BUG at fs/nfs/nfs3xdr.c:1338!
      > [  105.328075] invalid opcode: 0000 [#1] PREEMPT SMP
      > [  105.328223] last sysfs file: /sys/devices/virtual/bdi/0:16/uevent
      > [  105.328349] Modules linked in: usbcore dm_mod
      > [  105.328553]
      > [  105.328678] Pid: 3710, comm: install Not tainted 2.6.37+ #423 440BX Desktop Reference Platform/VMware Virtual Platform
      > [  105.328853] EIP: 0060:[<c116c06c>] EFLAGS: 00010282 CPU: 0
      > [  105.329152] EIP is at nfs3_xdr_enc_setacl3args+0x61/0x98
      > [  105.329249] EAX: ffffffea EBX: ce941d98 ECX: 00000000 EDX: 00000004
      > [  105.329340] ESI: ce941cd0 EDI: 000000a4 EBP: ce941cc0 ESP: ce941cb4
      > [  105.329431]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      > [  105.329525] Process install (pid: 3710, ti=ce940000 task=ced36f20 task.ti=ce940000)
      > [  105.336600] Stack:
      > [  105.336693]  ce941cd0 ce9dc000 00000000 ce941cf8 c12ecd02 c12f43e0 c116c00b cf754158
      > [  105.336982]  ce9dc004 cf754284 ce9dc004 cf7ffee8 ceff9978 ce9dc000 cf7ffee8 ce9dc000
      > [  105.337182]  ce9dc000 ce941d14 c12e698d cf75412c ce941d98 cf7ffee8 cf7fff20 00000000
      > [  105.337405] Call Trace:
      > [  105.337695]  [<c12ecd02>] rpcauth_wrap_req+0x75/0x7f
      > [  105.337806]  [<c12f43e0>] ? xdr_encode_opaque+0x12/0x15
      > [  105.337898]  [<c116c00b>] ? nfs3_xdr_enc_setacl3args+0x0/0x98
      > [  105.337988]  [<c12e698d>] call_transmit+0x17e/0x1e8
      > [  105.338072]  [<c12ec307>] __rpc_execute+0x6d/0x1a6
      > [  105.338155]  [<c12ec474>] rpc_execute+0x34/0x37
      > [  105.338235]  [<c12e738d>] rpc_run_task+0xb5/0xbd
      > [  105.338316]  [<c12e7474>] rpc_call_sync+0x3d/0x58
      > [  105.338402]  [<c116d0c6>] nfs3_proc_setacls+0x18e/0x24f
      > [  105.338493]  [<c10b3f76>] ? __kmalloc+0x148/0x1c4
      > [  105.338579]  [<c10ecd01>] ? posix_acl_alloc+0x12/0x22
      > [  105.338665]  [<c116d5c8>] nfs3_proc_setacl+0xa0/0xca
      > [  105.338748]  [<c116d69c>] nfs3_setxattr+0x62/0x88
      > [  105.338834]  [<c1317042>] ? sub_preempt_count+0x7c/0x89
      > [  105.338926]  [<c116d63a>] ? nfs3_setxattr+0x0/0x88
      > [  105.339026]  [<c10cfa79>] __vfs_setxattr_noperm+0x26/0x95
      > [  105.339114]  [<c10cfb43>] vfs_setxattr+0x5b/0x76
      > [  105.339211]  [<c10cfbfb>] setxattr+0x9d/0xc3
      > [  105.339298]  [<c10a2ea8>] ? handle_pte_fault+0x258/0x5cb
      > [  105.339428]  [<c1091ff6>] ? __free_pages+0x1a/0x23
      > [  105.339517]  [<c10498ea>] ? up_read+0x16/0x2c
      > [  105.339599]  [<c10b8365>] ? fget+0x0/0xa3
      > [  105.339677]  [<c10b8365>] ? fget+0x0/0xa3
      > [  105.339760]  [<c1025d23>] ? get_parent_ip+0xb/0x31
      > [  105.339843]  [<c1317042>] ? sub_preempt_count+0x7c/0x89
      > [  105.339931]  [<c10cfc72>] sys_fsetxattr+0x51/0x79
      > [  105.340014]  [<c1002853>] sysenter_do_call+0x12/0x32
      > [  105.340133] Code: 2e 76 18 00 58 31 d2 8b 7f 28 f6 43 04 01 74 03 8b 53 08 6a 00 8b 46 04 6a 01 8b 0b 52 89 fa e8 85 10 f8 ff 83 c4 0c 85 c0 79 04 <0f> 0b eb fe 31 c9 f6 43 04 04 74 03 8b 4b 0c 68 00 10 00 00 8d
      > [  105.350321] EIP: [<c116c06c>] nfs3_xdr_enc_setacl3args+0x61/0x98 SS:ESP 0068:ce941cb4
      > [  105.364385] ---[ end trace 01fcfe7f0f7f6e4a ]---
      
      nfs3_xdr_enc_setacl3args() is not properly setting up the target
      buffer before nfsacl_encode() attempts to encode the ACL.
      
      Introduced by commit d9c407b1 "NFS: Introduce new-style XDR encoding
      functions for NFSv3."
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      ee5dc773
    • C
      NFS: Fix "kernel BUG at fs/aio.c:554!" · 839f7ad6
      Chuck Lever 提交于
      Nick Piggin reports:
      
      > I'm getting use after frees in aio code in NFS
      >
      > [ 2703.396766] Call Trace:
      > [ 2703.396858]  [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
      > [ 2703.396959]  [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
      > [ 2703.397058]  [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
      > [ 2703.397159]  [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
      > [ 2703.397260]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
      > [ 2703.397361]  [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
      > [ 2703.397464]  [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
      > [ 2703.397564]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
      > [ 2703.397662]  [<ffffffff811627db>] aio_put_req+0x2b/0x60
      > [ 2703.397761]  [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
      > [ 2703.397895]  [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
      > [ 2703.397995]  [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
      >
      > Adding some tracing, it is due to nfs completing the request then
      > returning something other than -EIOCBQUEUED, so aio.c
      > also completes the request.
      
      To address this, prevent the NFS direct I/O engine from completing
      async iocbs when the forward path returns an error without starting
      any I/O.
      
      This fix appears to survive ^C during both "xfstest no. 208" and "fsx
      -Z."
      
      It's likely this bug has existed for a very long while, as we are seeing
      very similar symptoms in OEL 5.  Copying stable.
      
      Cc: Stable <stable@kernel.org>
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      839f7ad6
    • J
      NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds(). · ad3d2eed
      Jesper Juhl 提交于
      On Mon, 17 Jan 2011, Mi Jinlong wrote:
      
      >
      >
      > Jesper Juhl:
      > > strrchr() can return NULL if nothing is found. If this happens we'll
      > > dereference a NULL pointer in
      > > fs/nfs/nfs4filelayoutdev.c::decode_and_add_ds().
      > >
      > > I tried to find some other code that guarantees that this can never
      > > happen but I was unsuccessful. So, unless someone else can point to some
      > > code that ensures this can never be a problem, I believe this patch is
      > > needed.
      > >
      > > While I was changing this code I also noticed that all the dprintk()
      > > statements, except one, start with "%s:". The one missing the ":" I added
      > > it to.
      >
      >   Maybe another one also should be changed at decode_and_add_ds() at line 243:
      >
      >    243  printk("%s Decoded address and port %s\n", __func__, buf);
      >
      Missed that one. Thanks.
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      ad3d2eed
  2. 20 1月, 2011 1 次提交
  3. 16 1月, 2011 3 次提交
    • D
      Unexport do_add_mount() and add in follow_automount(), not ->d_automount() · ea5b778a
      David Howells 提交于
      Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
      added rather than calling do_add_mount() itself.  follow_automount() will then
      do the addition.
      
      This slightly complicates things as ->d_automount() normally wants to add the
      new vfsmount to an expiration list and start an expiration timer.  The problem
      with that is that the vfsmount will be deleted if it has a refcount of 1 and
      the timer will not repeat if the expiration list is empty.
      
      To this end, we require the vfsmount to be returned from d_automount() with a
      refcount of (at least) 2.  One of these refs will be dropped unconditionally.
      In addition, follow_automount() must get a 3rd ref around the call to
      do_add_mount() lest it eat a ref and return an error, leaving the mount we
      have open to being expired as we would otherwise have only 1 ref on it.
      
      d_automount() should also add the the vfsmount to the expiration list (by
      calling mnt_set_expiry()) and start the expiration timer before returning, if
      this mechanism is to be used.  The vfsmount will be unlinked from the
      expiration list by follow_automount() if do_add_mount() fails.
      
      This patch also fixes the call to do_add_mount() for AFS to propagate the mount
      flags from the parent vfsmount.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ea5b778a
    • D
      NFS: Use d_automount() rather than abusing follow_link() · 36d43a43
      David Howells 提交于
      Make NFS use the new d_automount() dentry operation rather than abusing
      follow_link() on directories.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      36d43a43
    • D
      Add a dentry op to allow processes to be held during pathwalk transit · cc53ce53
      David Howells 提交于
      Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
      sleep when it tries to transit away from one of that filesystem's directories
      during a pathwalk.  The operation is keyed off a new dentry flag
      (DCACHE_MANAGE_TRANSIT).
      
      The filesystem is allowed to be selective about which processes it holds and
      which it permits to continue on or prohibits from transiting from each flagged
      directory.  This will allow autofs to hold up client processes whilst letting
      its userspace daemon through to maintain the directory or the stuff behind it
      or mounted upon it.
      
      The ->d_manage() dentry operation:
      
      	int (*d_manage)(struct path *path, bool mounting_here);
      
      takes a pointer to the directory about to be transited away from and a flag
      indicating whether the transit is undertaken by do_add_mount() or
      do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.
      
      It should return 0 if successful and to let the process continue on its way;
      -EISDIR to prohibit the caller from skipping to overmounted filesystems or
      automounting, and to use this directory; or some other error code to return to
      the user.
      
      ->d_manage() is called with namespace_sem writelocked if mounting_here is true
      and no other locks held, so it may sleep.  However, if mounting_here is true,
      it may not initiate or wait for a mount or unmount upon the parameter
      directory, even if the act is actually performed by userspace.
      
      Within fs/namei.c, follow_managed() is extended to check with d_manage() first
      on each managed directory, before transiting away from it or attempting to
      automount upon it.
      
      follow_down() is renamed follow_down_one() and should only be used where the
      filesystem deliberately intends to avoid management steps (e.g. autofs).
      
      A new follow_down() is added that incorporates the loop done by all other
      callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
      and CIFS do use it, their use is removed by converting them to use
      d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
      also takes an extra parameter to indicate if it is being called from mount code
      (with namespace_sem writelocked) which it passes to d_manage().  follow_down()
      ignores automount points so that it can be used to mount on them.
      
      __follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
      DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
      sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
      that determine whether to abort or not itself.  That would allow the autofs
      daemon to continue on in rcu-walk mode.
      
      Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
      required as every tranist from that directory will cause d_manage() to be
      invoked.  It can always be set again when necessary.
      
      ==========================
      WHAT THIS MEANS FOR AUTOFS
      ==========================
      
      Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
      trigger the automounting of indirect mounts, and both of these can be called
      with i_mutex held.
      
      autofs knows that the i_mutex will be held by the caller in lookup(), and so
      can drop it before invoking the daemon - but this isn't so for d_revalidate(),
      since the lock is only held on _some_ of the code paths that call it.  This
      means that autofs can't risk dropping i_mutex from its d_revalidate() function
      before it calls the daemon.
      
      The bug could manifest itself as, for example, a process that's trying to
      validate an automount dentry that gets made to wait because that dentry is
      expired and needs cleaning up:
      
      	mkdir         S ffffffff8014e05a     0 32580  24956
      	Call Trace:
      	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
      	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
      	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
      	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
      	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
      	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
      	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
      	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4
      
      versus the automount daemon which wants to remove that dentry, but can't
      because the normal process is holding the i_mutex lock:
      
      	automount     D ffffffff8014e05a     0 32581      1              32561
      	Call Trace:
      	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
      	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
      	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
      	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
      	 [<ffffffff8005d229>] tracesys+0x71/0xe0
      	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0
      
      which means that the system is deadlocked.
      
      This patch allows autofs to hold up normal processes whilst the daemon goes
      ahead and does things to the dentry tree behind the automouter point without
      risking a deadlock as almost no locks are held in d_manage() and none in
      d_automount().
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Was-Acked-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cc53ce53
  4. 14 1月, 2011 2 次提交
  5. 13 1月, 2011 1 次提交
  6. 12 1月, 2011 1 次提交
  7. 11 1月, 2011 1 次提交
    • T
      NFS: Don't use vm_map_ram() in readdir · 6650239a
      Trond Myklebust 提交于
      vm_map_ram() is not available on NOMMU platforms, and causes trouble
      on incoherrent architectures such as ARM when we access the page data
      through both the direct and the virtual mapping.
      
      The alternative is to use the direct mapping to access page data
      for the case when we are not crossing a page boundary, but to copy
      the data into a linear scratch buffer when we are accessing data
      that spans page boundaries.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Tested-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      Cc: stable@kernel.org  [2.6.37]
      6650239a
  8. 07 1月, 2011 27 次提交