1. 13 5月, 2010 1 次提交
  2. 12 5月, 2010 9 次提交
    • S
      ceph: preserve seq # on requeued messages after transient transport errors · e84346b7
      Sage Weil 提交于
      If the tcp connection drops and we reconnect to reestablish a stateful
      session (with the mds), we need to resend previously sent (and possibly
      received) messages with the _same_ seq # so that they can be dropped on
      the other end if needed.  Only assign a new seq once after the message is
      queued.
      Signed-off-by: NSage Weil <sage@newdream.net>
      e84346b7
    • S
      ceph: fix cap removal races · f818a736
      Sage Weil 提交于
      The iterate_session_caps helper traverses the session caps list and tries
      to grab an inode reference.  However, the __ceph_remove_cap was clearing
      the inode backpointer _before_ removing itself from the session list,
      causing a null pointer dereference.
      
      Clear cap->ci under protection of s_cap_lock to avoid the race, and to
      tightly couple the list and backpointer state.  Use a local flag to
      indicate whether we are releasing the cap, as cap->session may be modified
      by a racing thread in iterate_session_caps.
      Signed-off-by: NSage Weil <sage@newdream.net>
      f818a736
    • R
      revert "procfs: provide stack information for threads" and its fixup commits · 34441427
      Robin Holt 提交于
      Originally, commit d899bf7b ("procfs: provide stack information for
      threads") attempted to introduce a new feature for showing where the
      threadstack was located and how many pages are being utilized by the
      stack.
      
      Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
      applied to fix the NO_MMU case.
      
      Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
      64-bit") was applied to fix a bug in ia32 executables being loaded.
      
      Commit 9ebd4eba ("procfs: fix /proc/<pid>/stat stack pointer for kernel
      threads") was applied to fix a bug which had kernel threads printing a
      userland stack address.
      
      Commit 1306d603 ('proc: partially revert "procfs: provide stack
      information for threads"') was then applied to revert the stack pages
      being used to solve a significant performance regression.
      
      This patch nearly undoes the effect of all these patches.
      
      The reason for reverting these is it provides an unusable value in
      field 28.  For x86_64, a fork will result in the task->stack_start
      value being updated to the current user top of stack and not the stack
      start address.  This unpredictability of the stack_start value makes
      it worthless.  That includes the intended use of showing how much stack
      space a thread has.
      
      Other architectures will get different values.  As an example, ia64
      gets 0.  The do_fork() and copy_process() functions appear to treat the
      stack_start and stack_size parameters as architecture specific.
      
      I only partially reverted c44972f1 ("procfs: disable per-task stack usage
      on NOMMU") .  If I had completely reverted it, I would have had to change
      mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
      configured.  Since I could not test the builds without significant effort,
      I decided to not change mm/Makefile.
      
      I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
      information for threads on 64-bit") .  I left the KSTK_ESP() change in
      place as that seemed worthwhile.
      Signed-off-by: NRobin Holt <holt@sgi.com>
      Cc: Stefani Seibold <stefani@seibold.net>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      34441427
    • S
      ceph: zero unused message header, footer fields · 45c6ceb5
      Sage Weil 提交于
      We shouldn't leak any prior memory contents to other parties.  And random
      data, particularly in the 'version' field, can cause problems down the
      line.
      Signed-off-by: NSage Weil <sage@newdream.net>
      45c6ceb5
    • D
      CacheFiles: Fix occasional EIO on call to vfs_unlink() · c61ea31d
      David Howells 提交于
      Fix an occasional EIO returned by a call to vfs_unlink():
      
      	[ 4868.465413] CacheFiles: I/O Error: Unlink failed
      	[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
      	[ 4947.320011] CacheFiles: File cache on md3 unregistering
      	[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
      	[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
      	[ 5127.348716] CacheFiles: File cache on md3 registered
      	[ 7076.871081] CacheFiles: I/O Error: Unlink failed
      	[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
      	[ 7116.780891] CacheFiles: File cache on md3 unregistering
      	[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
      	[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
      	[ 7296.813432] CacheFiles: File cache on md3 registered
      
      What happens is this:
      
       (1) A cached NFS file is seen to have become out of date, so NFS retires the
           object and immediately acquires a new object with the same key.
      
       (2) Retirement of the old object is done asynchronously - so the lookup/create
           to generate the new object may be done first.
      
           This can be a problem as the old object and the new object must exist at
           the same point in the backing filesystem (i.e. they must have the same
           pathname).
      
       (3) The lookup for the new object sees that a backing file already exists,
           checks to see whether it is valid and sees that it isn't.  It then deletes
           that file and creates a new one on disk.
      
       (4) The retirement phase for the old file is then performed.  It tries to
           delete the dentry it has, but ext4_unlink() returns -EIO because the inode
           attached to that dentry no longer matches the inode number associated with
           the filename in the parent directory.
      
      The trace below shows this quite well.
      
      	[md5sum] ==> __fscache_relinquish_cookie(ffff88002d12fb58{NFS.fh,ffff88002ce62100},1)
      	[md5sum] ==> __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff88002ce62100)
      
      NFS has retired the old cookie and asked for a new one.
      
      	[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_ACTIVE,24})
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_DYING]
      	[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_INIT,0})
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_LOOKING_UP]
      	[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_DYING,24})
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_RECYCLING]
      
      The old object (OBJ52) is going through the terminal states to get rid of it,
      whilst the new object - (OBJ53) - is coming into being.
      
      	[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_LOOKING_UP,0})
      	[kslowd] ==> cachefiles_walk_to_object({ffff88003029d8b8},OBJ53,@68,)
      	[kslowd] lookup '@68'
      	[kslowd] next -> ffff88002ce41bd0 positive
      	[kslowd] advance
      	[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
      	[kslowd] next -> ffff8800369faac8 positive
      
      The new object has looked up the subdir in which the file would be in (getting
      dentry ffff88002ce41bd0) and then looked up the file itself (getting dentry
      ffff8800369faac8).
      
      	[kslowd] validate 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
      	[kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
      	[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
      	[kslowd] unlink stale object
      	[kslowd] <== cachefiles_bury_object() = 0
      
      It then checks the file's xattrs to see if it's valid.  NFS says that the
      auxiliary data indicate the file is out of date (obvious to us - that's why NFS
      ditched the old version and got a new one).  CacheFiles then deletes the old
      file (dentry ffff8800369faac8).
      
      	[kslowd] redo lookup
      	[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
      	[kslowd] next -> ffff88002cd94288 negative
      	[kslowd] create -> ffff88002cd94288{ffff88002cdaf238{ino=148247}}
      
      CacheFiles then redoes the lookup and gets a negative result in a new dentry
      (ffff88002cd94288) which it then creates a file for.
      
      	[kslowd] ==> cachefiles_mark_object_active(,OBJ53)
      	[kslowd] <== cachefiles_mark_object_active() = 0
      	[kslowd] === OBTAINED_OBJECT ===
      	[kslowd] <== cachefiles_walk_to_object() = 0 [148247]
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_AVAILABLE]
      
      The new object is then marked active and the state machine moves to the
      available state - at which point NFS can start filling the object.
      
      	[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_RECYCLING,20})
      	[kslowd] ==> fscache_release_object()
      	[kslowd] ==> cachefiles_drop_object({OBJ52,2})
      	[kslowd] ==> cachefiles_delete_object(,OBJ52{ffff8800369faac8})
      
      The old object, meanwhile, goes on with being retired.  If allocation occurs
      first, cachefiles_delete_object() has to wait for dir->d_inode->i_mutex to
      become available before it can continue.
      
      	[kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
      	[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
      	[kslowd] unlink stale object
      	EXT4-fs warning (device sda6): ext4_unlink: Inode number mismatch in unlink (148247!=148193)
      	CacheFiles: I/O Error: Unlink failed
      	FS-Cache: Cache cachefiles stopped due to I/O error
      
      CacheFiles then tries to delete the file for the old object, but the dentry it
      has (ffff8800369faac8) no longer points to a valid inode for that directory
      entry, and so ext4_unlink() returns -EIO when de->inode does not match i_ino.
      
      	[kslowd] <== cachefiles_bury_object() = -5
      	[kslowd] <== cachefiles_delete_object() = -5
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_DEAD]
      	[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_AVAILABLE,0})
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_ACTIVE]
      
      (Note that the above trace includes extra information beyond that produced by
      the upstream code).
      
      The fix is to note when an object that is being retired has had its object
      deleted preemptively by a replacement object that is being created, and to
      skip the second removal attempt in such a case.
      Reported-by: NGreg M <gregm@servu.net.au>
      Reported-by: NMark Moseley <moseleymark@gmail.com>
      Reported-by: NRomain DEGEZ <romain.degez@smartjog.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c61ea31d
    • S
      ceph: fix locking for waking session requests after reconnect · 9abf82b8
      Sage Weil 提交于
      The session->s_waiting list is protected by mdsc->mutex, not s_mutex.  This
      was causing (rare) s_waiting list corruption.
      
      Fix errors paths too, while we're here.  A more thorough cleanup of this
      function is coming soon.
      Signed-off-by: NSage Weil <sage@newdream.net>
      9abf82b8
    • S
      ceph: resubmit requests on pg mapping change (not just primary change) · d85b7056
      Sage Weil 提交于
      OSD requests need to be resubmitted on any pg mapping change, not just when
      the pg primary changes.  Resending only when the primary changes results in
      occasional 'hung' requests during osd cluster recovery or rebalancing.
      Signed-off-by: NSage Weil <sage@newdream.net>
      d85b7056
    • S
      ceph: fix open file counting on snapped inodes when mds returns no caps · 04d000eb
      Sage Weil 提交于
      It's possible the MDS will not issue caps on a snapped inode, in which case
      an open request may not __ceph_get_fmode(), botching the open file
      counting.  (This is actually a server bug, but the client shouldn't BUG out
      in this case.)
      Signed-off-by: NSage Weil <sage@newdream.net>
      04d000eb
    • S
      ceph: unregister osd request on failure · 0ceed5db
      Sage Weil 提交于
      The osd request wasn't being unregistered when the osd returned a failure
      code, even though the result was returned to the caller.  This would cause
      it to eventually time out, and then crash the kernel when it tried to
      resend the request using a stale page vector.
      Signed-off-by: NSage Weil <sage@newdream.net>
      0ceed5db
  3. 11 5月, 2010 1 次提交
  4. 06 5月, 2010 1 次提交
    • S
      ceph: don't use writeback_control in writepages completion · 54ad023b
      Sage Weil 提交于
      The ->writepages writeback_control is not still valid in the writepages
      completion.  We were touching it solely to adjust pages_skipped when there
      was a writeback error (EIO, ENOSPC, EPERM due to bad osd credentials),
      causing an oops in the writeback code shortly thereafter.  Updating
      pages_skipped on error isn't correct anyway, so let's just rip out this
      (clearly broken) code to pass the wbc to the completion.
      Signed-off-by: NSage Weil <sage@newdream.net>
      54ad023b
  5. 05 5月, 2010 1 次提交
  6. 04 5月, 2010 12 次提交
  7. 03 5月, 2010 1 次提交
    • R
      nilfs2: fix sync silent failure · 973bec34
      Ryusuke Konishi 提交于
      As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set.
      And nilfs does not set s_bdi anywhere.  I noticed this problem by the
      warning introduced by the recent commit 5129a469 ("Catch filesystem
      lacking s_bdi").
      
       WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e()
       Hardware name: PowerEdge 2850
       Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas
       Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38
       Call Trace:
        [<c1028422>] warn_slowpath_common+0x60/0x90
        [<c102845f>] warn_slowpath_null+0xd/0x10
        [<c1095936>] vfs_kern_mount+0xc5/0x14e
        [<c1095a03>] do_kern_mount+0x32/0xbd
        [<c10a811e>] do_mount+0x671/0x6d0
        [<c1073794>] ? __get_free_pages+0x1f/0x21
        [<c10a684f>] ? copy_mount_options+0x2b/0xe2
        [<c107b634>] ? strndup_user+0x48/0x67
        [<c10a81de>] sys_mount+0x61/0x8f
        [<c100280c>] sysenter_do_call+0x12/0x32
      
      This ensures to set s_bdi for nilfs and fixes the sync silent failure.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: NJens Axboe <jens.axboe@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      973bec34
  8. 02 5月, 2010 2 次提交
    • D
      NFS: Fix RCU issues in the NFSv4 delegation code · 17d2c0a0
      David Howells 提交于
      Fix a number of RCU issues in the NFSv4 delegation code.
      
       (1) delegation->cred doesn't need to be RCU protected as it's essentially an
           invariant refcounted structure.
      
           By the time we get to nfs_free_delegation(), the delegation is being
           released, so no one else should be attempting to use the saved
           credentials, and they can be cleared.
      
           However, since the list of delegations could still be under traversal at
           this point by such as nfs_client_return_marked_delegations(), the cred
           should be released in nfs_do_free_delegation() rather than in
           nfs_free_delegation().  Simply using rcu_assign_pointer() to clear it is
           insufficient as that doesn't stop the cred from being destroyed, and nor
           does calling put_rpccred() after call_rcu(), given that the latter is
           asynchronous.
      
       (2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use
           rcu_derefence_protected() because they can only be called if
           nfs_client::cl_lock is held, and that guards against anyone changing
           nfsi->delegation under it.  Furthermore, the barrier imposed by
           rcu_dereference() is superfluous, given that the spin_lock() is also a
           barrier.
      
       (3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client
           struct so that it can issue lockdep advice based on clp->cl_lock for (2).
      
       (4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation()
           should use rcu_access_pointer() outside the spinlocked region as they
           merely examine the pointer and don't follow it, thus rendering unnecessary
           the need to impose a partial ordering over the one item of interest.
      
           These result in an RCU warning like the following:
      
      [ INFO: suspicious rcu_dereference_check() usage. ]
      ---------------------------------------------------
      fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      2 locks held by mount.nfs4/2281:
       #0:  (&type->s_umount_key#34){+.+...}, at: [<ffffffff810b25b4>] deactivate_super+0x60/0x80
       #1:  (iprune_sem){+.+...}, at: [<ffffffff810c332a>] invalidate_inodes+0x39/0x13a
      
      stack backtrace:
      Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110
      Call Trace:
       [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
       [<ffffffffa00b4591>] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs]
       [<ffffffffa0095d63>] nfs4_clear_inode+0x11/0x1e [nfs]
       [<ffffffff810c2d92>] clear_inode+0x9e/0xf8
       [<ffffffff810c3028>] dispose_list+0x67/0x10e
       [<ffffffff810c340d>] invalidate_inodes+0x11c/0x13a
       [<ffffffff810b1dc1>] generic_shutdown_super+0x42/0xf4
       [<ffffffff810b1ebe>] kill_anon_super+0x11/0x4f
       [<ffffffffa009893c>] nfs4_kill_super+0x3f/0x72 [nfs]
       [<ffffffff810b25bc>] deactivate_super+0x68/0x80
       [<ffffffff810c6744>] mntput_no_expire+0xbb/0xf8
       [<ffffffff810c681b>] release_mounts+0x9a/0xb0
       [<ffffffff810c689b>] put_mnt_ns+0x6a/0x79
       [<ffffffffa00983a1>] nfs_follow_remote_path+0x5a/0x146 [nfs]
       [<ffffffffa0098334>] ? nfs_do_root_mount+0x82/0x95 [nfs]
       [<ffffffffa00985a9>] nfs4_try_mount+0x75/0xaf [nfs]
       [<ffffffffa0098874>] nfs4_get_sb+0x291/0x31a [nfs]
       [<ffffffff810b2059>] vfs_kern_mount+0xb8/0x177
       [<ffffffff810b2176>] do_kern_mount+0x48/0xe8
       [<ffffffff810c810b>] do_mount+0x782/0x7f9
       [<ffffffff810c8205>] sys_mount+0x83/0xbe
       [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b
      
      Also on:
      
      fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection!
       [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
       [<ffffffffa00b4223>] nfs_inode_set_delegation+0xfe/0x219 [nfs]
       [<ffffffffa00a9c6f>] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs]
       [<ffffffffa00aa15d>] nfs4_do_open+0x2a6/0x3a6 [nfs]
       ...
      
      And:
      
      fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection!
       [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
       [<ffffffffa00b3bef>] nfs_free_delegation+0x3d/0x6e [nfs]
       [<ffffffffa00b3e71>] nfs_do_return_delegation+0x26/0x30 [nfs]
       [<ffffffffa00b406a>] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs]
       [<ffffffffa00b448a>] nfs_client_return_marked_delegations+0xc9/0x124 [nfs]
       ...
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      17d2c0a0
    • T
      NFSv4: Fix the locking in nfs_inode_reclaim_delegation() · 8f649c37
      Trond Myklebust 提交于
      Ensure that we correctly rcu-dereference the delegation itself, and that we
      protect against removal while we're changing the contents.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      8f649c37
  9. 01 5月, 2010 2 次提交
  10. 30 4月, 2010 3 次提交
  11. 29 4月, 2010 5 次提交
  12. 28 4月, 2010 2 次提交