1. 20 11月, 2009 9 次提交
    • D
      FS-Cache: Fix lock misorder in fscache_write_op() · 1bccf513
      David Howells 提交于
      FS-Cache has two structs internally for keeping track of the internal state of
      a cached file: the fscache_cookie struct, which represents the netfs's state,
      and fscache_object struct, which represents the cache's state.  Each has a
      pointer that points to the other (when both are in existence), and each has a
      spinlock for pointer maintenance.
      
      Since netfs operations approach these structures from the cookie side, they get
      the cookie lock first, then the object lock.  Cache operations, on the other
      hand, approach from the object side, and get the object lock first.  It is not
      then permitted for a cache operation to get the cookie lock whilst it is
      holding the object lock lest deadlock occur; instead, it must do one of two
      things:
      
       (1) increment the cookie usage counter, drop the object lock and then get both
           locks in order, or
      
       (2) simply hold the object lock as certain parts of the cookie may not be
           altered whilst the object lock is held.
      
      It is also not permitted to follow either pointer without holding the lock at
      the end you start with.  To break the pointers between the cookie and the
      object, both locks must be held.
      
      fscache_write_op(), however, violates the locking rules: It attempts to get the
      cookie lock without (a) checking that the cookie pointer is a valid pointer,
      and (b) holding the object lock to protect the cookie pointer whilst it follows
      it.  This is so that it can access the pending page store tree without
      interference from __fscache_write_page().
      
      This is fixed by splitting the cookie lock, such that the page store tracking
      tree is protected by its own lock, and checking that the cookie pointer is
      non-NULL before we attempt to follow it whilst holding the object lock.
      
      The new lock is subordinate to both the cookie lock and the object lock, and so
      should be taken after those.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      1bccf513
    • D
      FS-Cache: The object-available state can't rely on the cookie to be available · 6897e3df
      David Howells 提交于
      The object-available state in the object processing state machine (as
      processed by fscache_object_available()) can't rely on the cookie to be
      available because the FSCACHE_COOKIE_CREATING bit may have been cleared by
      fscache_obtained_object() prior to the object being put into the
      FSCACHE_OBJECT_AVAILABLE state.
      
      Clearing the FSCACHE_COOKIE_CREATING bit on a cookie permits
      __fscache_relinquish_cookie() to proceed and detach the cookie from the
      object.
      
      To deal with this, we don't dereference object->cookie in
      fscache_object_available() if the object has already been detached.
      
      In addition, a couple of assertions are added into fscache_drop_object() to
      make sure the object is unbound from the cookie before it gets there.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      6897e3df
    • D
      FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phase · 5753c441
      David Howells 提交于
      Permit the operations to retrieve data from the cache or to allocate space in
      the cache for future writes to be interrupted whilst they're waiting for
      permission for the operation to proceed.  Typically this wait occurs whilst the
      cache object is being looked up on disk in the background.
      
      If an interruption occurs, and the operation has not yet been given the
      go-ahead to run, the operation is dequeued and cancelled, and control returns
      to the read operation of the netfs routine with none of the requested pages
      having been read or in any way marked as known by the cache.
      
      This means that the initial wait is done interruptibly rather than
      uninterruptibly.
      
      In addition, extra stats values are made available to show the number of ops
      cancelled and the number of cache space allocations interrupted.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5753c441
    • D
      FS-Cache: Use radix tree preload correctly in tracking of pages to be stored · b34df792
      David Howells 提交于
      __fscache_write_page() attempts to load the radix tree preallocation pool for
      the CPU it is on before calling radix_tree_insert(), as the insertion must be
      done inside a pair of spinlocks.
      
      Use of the preallocation pool, however, is contingent on the radix tree being
      initialised without __GFP_WAIT specified.  __fscache_acquire_cookie() was
      passing GFP_NOFS to INIT_RADIX_TREE() - but that includes __GFP_WAIT.
      
      The solution is to AND out __GFP_WAIT.
      
      Additionally, the banner comment to radix_tree_preload() is altered to make
      note of this prerequisite.  Possibly there should be a WARN_ON() too.
      
      Without this fix, I have seen the following recursive deadlock caused by
      radix_tree_insert() attempting to allocate memory inside the spinlocked
      region, which resulted in FS-Cache being called back into to release memory -
      which required the spinlock already held.
      
      =============================================
      [ INFO: possible recursive locking detected ]
      2.6.32-rc6-cachefs #24
      ---------------------------------------------
      nfsiod/7916 is trying to acquire lock:
       (&cookie->lock){+.+.-.}, at: [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache]
      
      but task is already holding lock:
       (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache]
      
      other info that might help us debug this:
      5 locks held by nfsiod/7916:
       #0:  (nfsiod){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2
       #1:  (&task->u.tk_work#2){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2
       #2:  (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache]
       #3:  (&object->lock#2){+.+.-.}, at: [<ffffffffa0076b07>] __fscache_write_page+0x197/0x3f3 [fscache]
       #4:  (&cookie->stores_lock){+.+...}, at: [<ffffffffa0076b0f>] __fscache_write_page+0x19f/0x3f3 [fscache]
      
      stack backtrace:
      Pid: 7916, comm: nfsiod Not tainted 2.6.32-rc6-cachefs #24
      Call Trace:
       [<ffffffff8105ac7f>] __lock_acquire+0x1649/0x16e3
       [<ffffffff81059ded>] ? __lock_acquire+0x7b7/0x16e3
       [<ffffffff8100e27d>] ? dump_trace+0x248/0x257
       [<ffffffff8105ad70>] lock_acquire+0x57/0x6d
       [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache]
       [<ffffffff8135467c>] _spin_lock+0x2c/0x3b
       [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache]
       [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache]
       [<ffffffffa0077eb7>] ? __fscache_check_page_write+0x0/0x71 [fscache]
       [<ffffffffa00b4755>] nfs_fscache_release_page+0x86/0xc4 [nfs]
       [<ffffffffa00907f0>] nfs_release_page+0x3c/0x41 [nfs]
       [<ffffffff81087ffb>] try_to_release_page+0x32/0x3b
       [<ffffffff81092c2b>] shrink_page_list+0x316/0x4ac
       [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70
       [<ffffffff8135451b>] ? _spin_unlock_irq+0x2b/0x31
       [<ffffffff81093153>] shrink_inactive_list+0x392/0x67c
       [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70
       [<ffffffff810934ca>] shrink_list+0x8d/0x8f
       [<ffffffff81093744>] shrink_zone+0x278/0x33c
       [<ffffffff81052c70>] ? ktime_get_ts+0xad/0xba
       [<ffffffff8109453b>] try_to_free_pages+0x22e/0x392
       [<ffffffff8109184c>] ? isolate_pages_global+0x0/0x212
       [<ffffffff8108e16b>] __alloc_pages_nodemask+0x3dc/0x5cf
       [<ffffffff810ae24a>] cache_alloc_refill+0x34d/0x6c1
       [<ffffffff811bcf74>] ? radix_tree_node_alloc+0x52/0x5c
       [<ffffffff810ae929>] kmem_cache_alloc+0xb2/0x118
       [<ffffffff811bcf74>] radix_tree_node_alloc+0x52/0x5c
       [<ffffffff811bcfd5>] radix_tree_insert+0x57/0x19c
       [<ffffffffa0076b53>] __fscache_write_page+0x1e3/0x3f3 [fscache]
       [<ffffffffa00b4248>] __nfs_readpage_to_fscache+0x58/0x11e [nfs]
       [<ffffffffa009bb77>] nfs_readpage_release+0x34/0x9b [nfs]
       [<ffffffffa009c0d9>] nfs_readpage_release_full+0x32/0x4b [nfs]
       [<ffffffffa0006cff>] rpc_release_calldata+0x12/0x14 [sunrpc]
       [<ffffffffa0006e2d>] rpc_free_task+0x59/0x61 [sunrpc]
       [<ffffffffa0006f03>] rpc_async_release+0x10/0x12 [sunrpc]
       [<ffffffff810482e5>] worker_thread+0x1ef/0x2e2
       [<ffffffff81048290>] ? worker_thread+0x19a/0x2e2
       [<ffffffff81352433>] ? thread_return+0x3e/0x101
       [<ffffffffa0006ef3>] ? rpc_async_release+0x0/0x12 [sunrpc]
       [<ffffffff8104bff5>] ? autoremove_wake_function+0x0/0x34
       [<ffffffff81058d25>] ? trace_hardirqs_on+0xd/0xf
       [<ffffffff810480f6>] ? worker_thread+0x0/0x2e2
       [<ffffffff8104bd21>] kthread+0x7a/0x82
       [<ffffffff8100beda>] child_rip+0xa/0x20
       [<ffffffff8100b87c>] ? restore_args+0x0/0x30
       [<ffffffff8104c2b9>] ? add_wait_queue+0x15/0x44
       [<ffffffff8104bca7>] ? kthread+0x0/0x82
       [<ffffffff8100bed0>] ? child_rip+0x0/0x20
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      b34df792
    • D
      FS-Cache: Clear netfs pointers in cookie after detaching object, not before · 7e311a20
      David Howells 提交于
      Clear the pointers from the fscache_cookie struct to netfs private data after
      clearing the pointer to the cookie from the fscache_object struct and
      releasing the object lock, rather than before.
      
      This allows the netfs private data pointers to be relied on simply by holding
      the object lock, rather than having to hold the cookie lock.  This is makes
      things simpler as the cookie lock has to be taken before the object lock, but
      sometimes the object pointer is all that the code has.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      7e311a20
    • D
      FS-Cache: Add counters for entry/exit to/from cache operation functions · 52bd75fd
      David Howells 提交于
      Count entries to and exits from cache operation table functions.  Maintain
      these as a single counter that's added to or removed from as appropriate.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      52bd75fd
    • D
      FS-Cache: Allow the current state of all objects to be dumped · 4fbf4291
      David Howells 提交于
      Allow the current state of all fscache objects to be dumped by doing:
      
      	cat /proc/fs/fscache/objects
      
      By default, all objects and all fields will be shown.  This can be restricted
      by adding a suitable key to one of the caller's keyrings (such as the session
      keyring):
      
      	keyctl add user fscache:objlist "<restrictions>" @s
      
      The <restrictions> are:
      
      	K	Show hexdump of object key (don't show if not given)
      	A	Show hexdump of object aux data (don't show if not given)
      
      And paired restrictions:
      
      	C	Show objects that have a cookie
      	c	Show objects that don't have a cookie
      	B	Show objects that are busy
      	b	Show objects that aren't busy
      	W	Show objects that have pending writes
      	w	Show objects that don't have pending writes
      	R	Show objects that have outstanding reads
      	r	Show objects that don't have outstanding reads
      	S	Show objects that have slow work queued
      	s	Show objects that don't have slow work queued
      
      If neither side of a restriction pair is given, then both are implied.  For
      example:
      
      	keyctl add user fscache:objlist KB @s
      
      shows objects that are busy, and lists their object keys, but does not dump
      their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is
      not implied.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      4fbf4291
    • D
      FS-Cache: Annotate slow-work runqueue proc lines for FS-Cache work items · 440f0aff
      David Howells 提交于
      Annotate slow-work runqueue proc lines for FS-Cache work items.  Objects
      include the object ID and the state.  Operations include the object ID, the
      operation ID and the operation type and state.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      440f0aff
    • D
      SLOW_WORK: Wait for outstanding work items belonging to a module to clear · 3d7a641e
      David Howells 提交于
      Wait for outstanding slow work items belonging to a module to clear when
      unregistering that module as a user of the facility.  This prevents the put_ref
      code of a work item from being taken away before it returns.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      3d7a641e
  2. 18 11月, 2009 4 次提交
  3. 16 11月, 2009 1 次提交
  4. 15 11月, 2009 2 次提交
  5. 13 11月, 2009 1 次提交
    • R
      nilfs2: fix lock order reversal in chcp operation · c1ea985c
      Ryusuke Konishi 提交于
      Will fix the following lock order reversal lockdep detected:
      
      =======================================================
      [ INFO: possible circular locking dependency detected ]
      2.6.32-rc6 #7
      -------------------------------------------------------
      chcp/30157 is trying to acquire lock:
       (&nilfs->ns_mount_mutex){+.+.+.}, at: [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2]
      
      but task is already holding lock:
       (&nilfs->ns_segctor_sem){++++.+}, at: [<fed7ca32>] nilfs_transaction_begin+0xba/0x110 [nilfs2]
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (&nilfs->ns_segctor_sem){++++.+}:
             [<c105799c>] __lock_acquire+0x109c/0x139d
             [<c1057d26>] lock_acquire+0x89/0xa0
             [<c14151e2>] down_read+0x31/0x45
             [<fed6d77b>] nilfs_attach_checkpoint+0x8f/0x16b [nilfs2]
             [<fed6e393>] nilfs_get_sb+0x3e7/0x653 [nilfs2]
             [<c10c0ccb>] vfs_kern_mount+0x8b/0x124
             [<c10c0db2>] do_kern_mount+0x37/0xc3
             [<c10d7517>] do_mount+0x64d/0x69d
             [<c10d75cd>] sys_mount+0x66/0x95
             [<c1002a14>] sysenter_do_call+0x12/0x32
      
      -> #1 (&type->s_umount_key#31/1){+.+.+.}:
             [<c105799c>] __lock_acquire+0x109c/0x139d
             [<c1057d26>] lock_acquire+0x89/0xa0
             [<c104c0f3>] down_write_nested+0x34/0x52
             [<c10c08fe>] sget+0x22e/0x389
             [<fed6e133>] nilfs_get_sb+0x187/0x653 [nilfs2]
             [<c10c0ccb>] vfs_kern_mount+0x8b/0x124
             [<c10c0db2>] do_kern_mount+0x37/0xc3
             [<c10d7517>] do_mount+0x64d/0x69d
             [<c10d75cd>] sys_mount+0x66/0x95
             [<c1002a14>] sysenter_do_call+0x12/0x32
      
      -> #0 (&nilfs->ns_mount_mutex){+.+.+.}:
             [<c1057727>] __lock_acquire+0xe27/0x139d
             [<c1057d26>] lock_acquire+0x89/0xa0
             [<c1414d63>] mutex_lock_nested+0x41/0x23e
             [<fed7cfcc>] nilfs_cpfile_change_cpmode+0x46/0x752 [nilfs2]
             [<fed801b2>] nilfs_ioctl+0x11a/0x7da [nilfs2]
             [<c10cca12>] vfs_ioctl+0x27/0x6e
             [<c10ccf93>] do_vfs_ioctl+0x491/0x4db
             [<c10cd022>] sys_ioctl+0x45/0x5f
             [<c1002a14>] sysenter_do_call+0x12/0x32
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      c1ea985c
  6. 12 11月, 2009 15 次提交
  7. 11 11月, 2009 3 次提交
  8. 09 11月, 2009 1 次提交
    • T
      ext4: partial revert to fix double brelse WARNING() · 1e424a34
      Theodore Ts'o 提交于
      This is a partial revert of commit 6487a9d3 (only the changes made to
      fs/ext4/namei.c), since it is causing the following brelse()
      double-free warning when running fsstress on a file system with 1k
      blocksize and we run into a block allocation failure while converting
      a single-block directory to a multi-block hash-tree indexed directory.
      
      WARNING: at fs/buffer.c:1197 __brelse+0x2e/0x33()
      Hardware name: 
      VFS: brelse: Trying to free free buffer
      Modules linked in:
      Pid: 2226, comm: jbd2/sdd-8 Not tainted 2.6.32-rc6-00577-g0003f55 #101
      Call Trace:
       [<c01587fb>] warn_slowpath_common+0x65/0x95
       [<c0158869>] warn_slowpath_fmt+0x29/0x2c
       [<c021168e>] __brelse+0x2e/0x33
       [<c0288a9f>] jbd2_journal_refile_buffer+0x67/0x6c
       [<c028a9ed>] jbd2_journal_commit_transaction+0x319/0x14d8
       [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60
       [<c0175bcc>] ? sched_clock_cpu+0x12a/0x13e
       [<c017f6b4>] ? trace_hardirqs_off+0xb/0xd
       [<c0175c1f>] ? cpu_clock+0x3f/0x5b
       [<c017f6ec>] ? lock_release_holdtime+0x36/0x137
       [<c0664ad0>] ? _spin_unlock_irqrestore+0x44/0x51
       [<c0180af3>] ? trace_hardirqs_on_caller+0x103/0x124
       [<c0180b1f>] ? trace_hardirqs_on+0xb/0xd
       [<c0164d73>] ? try_to_del_timer_sync+0x58/0x60
       [<c0290d1c>] kjournald2+0x11a/0x310
       [<c017118e>] ? autoremove_wake_function+0x0/0x38
       [<c0290c02>] ? kjournald2+0x0/0x310
       [<c0170ee6>] kthread+0x66/0x6b
       [<c0170e80>] ? kthread+0x0/0x6b
       [<c01251b3>] kernel_thread_helper+0x7/0x10
      ---[ end trace 5579351b86af61e3 ]---
      
      Commit 6487a9d3 was an attempt some buffer head leaks in an ENOSPC
      error path, but in some cases it actually results in an excess ENOSPC,
      as shown above.  Fixing this means cleaning up who is responsible for
      releasing the buffer heads from the callee to the caller of
      add_dirent_to_buf().
      
      Since that's a relatively complex change, and we're late in the rcX
      development cycle, I'm reverting this now, and holding back a more
      complete fix until after 2.6.32 ships.  We've lived with this
      buffer_head leak on ENOSPC in ext3 and ext4 for a very long time; a
      few more months won't kill us.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Curt Wohlgemuth <curtw@google.com>
      1e424a34
  9. 08 11月, 2009 2 次提交
    • R
      nilfs2: fix missing cleanup of gc cache on error cases · c083234f
      Ryusuke Konishi 提交于
      This fixes an -rc1 regression brought by the commit:
      1cf58fa8 ("nilfs2: shorten freeze
      period due to GC in write operation v3").
      
      Although the patch moved out a function call of
      nilfs_ioctl_move_blocks() to nilfs_ioctl_clean_segments() from
      nilfs_ioctl_prepare_clean_segments(), it didn't move corresponding
      cleanup job needed for the error case.
      
      This will move the missing cleanup job to the destination function.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: NJiro SEKIBA <jir@unicus.jp>
      c083234f
    • R
      nilfs2: fix kernel oops in error case of nilfs_ioctl_move_blocks · 5399dd1f
      Ryusuke Konishi 提交于
      This fixes a kernel oops reported by Markus Trippelsdorf in the email
      titled "[NILFS users] kernel Oops while running nilfs_cleanerd".
      
      The oops was caused by a bug of error path in
      nilfs_ioctl_move_blocks() function, which was inlined in
      nilfs_ioctl_clean_segments().
      
      nilfs_ioctl_move_blocks checks duplication of blocks which will be
      moved in garbage collection.  But, the check should have be done
      within nilfs_ioctl_move_inode_block() to prevent list corruption among
      buffers storing the target blocks.
      
      To fix the kernel oops, this moves forward the duplication check
      before the list insertion.
      
      I also tested this for stable trees [2.6.30, 2.6.31].
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: stable <stable@kernel.org>
      5399dd1f
  10. 07 11月, 2009 2 次提交
    • J
      cifs: don't use CIFSGetSrvInodeNumber in is_path_accessible · f475f677
      Jeff Layton 提交于
      Because it's lighter weight, CIFS tries to use CIFSGetSrvInodeNumber to
      verify the accessibility of the root inode and then falls back to doing a
      full QPathInfo if that fails with -EOPNOTSUPP. I have at least a report
      of a server that returns NT_STATUS_INTERNAL_ERROR rather than something
      that translates to EOPNOTSUPP.
      
      Rather than trying to be clever with that call, just have
      is_path_accessible do a normal QPathInfo. That call is widely
      supported and it shouldn't increase the overhead significantly.
      
      Cc: Stable <stable@kernel.org>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      f475f677
    • J
      cifs: clean up handling when server doesn't consistently support inode numbers · ec06aedd
      Jeff Layton 提交于
      It's possible that a server will return a valid FileID when we query the
      FILE_INTERNAL_INFO for the root inode, but then zeroed out inode numbers
      when we do a FindFile with an infolevel of
      SMB_FIND_FILE_ID_FULL_DIR_INFO.
      
      In this situation turn off querying for server inode numbers, generate a
      warning for the user and just generate an inode number using iunique.
      Once we generate any inode number with iunique we can no longer use any
      server inode numbers or we risk collisions, so ensure that we don't do
      that in cifs_get_inode_info either.
      
      Cc: Stable <stable@kernel.org>
      Reported-by: NTimothy Normand Miller <theosib@gmail.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <sfrench@us.ibm.com>
      ec06aedd