1. 06 May, 2012 8 commits
    • vfs: Rename end_writeback() to clear_inode() · dbd5768f
      Jan Kara committed
      After we moved inode_sync_wait() from end_writeback() it doesn't make sense
      to call the function end_writeback() anymore. Rename it to clear_inode()
      which describes well what the function really does: it sets the I_CLEAR flag.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
    • vfs: Move waiting for inode writeback from end_writeback() to evict_inode() · 7994e6f7
      Jan Kara committed
      Currently, I_SYNC can never be set when evict_inode() (and thus
      end_writeback()) is called because the flusher thread holds an inode
      reference while the inode is under writeback. As a result, inode_sync_wait()
      in those places currently does nothing. However, that is going to change, and
      the change exposes problems with calling inode_sync_wait() from
      end_writeback(). Several filesystems call end_writeback() after they have
      deleted the inode (btrfs, gfs2, ...) and other filesystems (ext3, ext4,
      reiserfs, ...) can deadlock when waiting for I_SYNC because they call
      end_writeback() from within a transaction.
      
      To avoid these issues, we move inode_sync_wait() into evict_inode() before
      calling ->evict_inode(). That way we preserve the current property that
      ->evict_inode() and writeback never run in parallel and all filesystems are
      safe.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
    • writeback: Refactor writeback_single_inode() · 4f8ad655
      Jan Kara committed
      The code in writeback_single_inode() is relatively complex. The list requeueing
      logic makes sense only for the flusher thread, not for sync_inode() or
      write_inode_now() callers. Also, when we want to get rid of the inode references
      held by the flusher thread, we will need special I_SYNC handling there.
      
      So separate the part of writeback_single_inode() that does the real writeback
      work into __writeback_single_inode() and make writeback_single_inode() do only
      what is necessary for callers writing a single inode, moving the special list
      handling into writeback_sb_inodes(). As a side effect, this fixes a possible race
      where we could skip some inode during sync(2) because another writer refiled it
      from the b_io to the b_dirty list. I_SYNC handling is also moved into the callers
      of __writeback_single_inode() to make locking easier.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
    • writeback: Remove wb->list_lock from writeback_single_inode() · f0d07b7f
      Jan Kara committed
      writeback_single_inode() doesn't need wb->list_lock for anything on entry now.
      So remove the requirement. This makes locking of writeback_single_inode()
      temporarily awkward (entering with i_lock, returning with i_lock and
      wb->list_lock) but it will be sanitized in the next patch.
      
      Also inode_wait_for_writeback() doesn't need wb->list_lock for anything. It was
      just taking it to make usage convenient for callers but with
      writeback_single_inode() changing it's not very convenient anymore. So remove
      the lock from that function.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
    • writeback: Separate inode requeueing after writeback · ccb26b5a
      Jan Kara committed
      Move inode requeueing after inode has been written out into a separate
      function.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
    • writeback: Move I_DIRTY_PAGES handling · 6290be1c
      Jan Kara committed
      Instead of clearing I_DIRTY_PAGES and resetting it when we didn't succeed in
      writing them all, just clear the bit only when we succeeded writing all the
      pages. We also move the clearing of the bit close to other i_state handling to
      separate it from writeback list handling. This is desirable because list
      handling will differ for flusher thread and other writeback_single_inode()
      callers in future. No filesystem plays any tricks with I_DIRTY_PAGES (like
      checking it in ->writepages or ->write_inode implementation) so this movement
      is safe.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
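The I_DIRTY_PAGES change described in this commit message can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: `struct toy_inode`, `toy_writepages()`, `toy_writeback()` and the flag value `TOY_I_DIRTY_PAGES` are hypothetical stand-ins and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel flag; the real value may differ. */
#define TOY_I_DIRTY_PAGES 0x4u

struct toy_inode {
	unsigned int state;        /* models inode->i_state */
	unsigned int dirty_pages;  /* pages still waiting to be written */
};

/* Write at most `budget` pages; return true if no dirty page remains. */
static bool toy_writepages(struct toy_inode *inode, unsigned int budget)
{
	unsigned int done = inode->dirty_pages < budget ? inode->dirty_pages
							: budget;

	inode->dirty_pages -= done;
	return inode->dirty_pages == 0;
}

/*
 * Scheme from the commit message: leave the flag untouched while writing
 * and clear it only once every page has been written successfully.
 */
static void toy_writeback(struct toy_inode *inode, unsigned int budget)
{
	assert(inode != NULL);
	if (toy_writepages(inode, budget))
		inode->state &= ~TOY_I_DIRTY_PAGES;
}
```

With this shape there is no window in which the flag is cleared while dirty pages still exist, so nothing has to set it again after a partial write.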
    • writeback: Move requeueing when I_SYNC set to writeback_sb_inodes() · cc1676d9
      Jan Kara committed
      When writeback_single_inode() is called on an inode that already has I_SYNC
      set while doing WB_SYNC_NONE, the inode is moved to the b_more_io list. However,
      this makes sense only if the caller is the flusher thread. For other callers of
      writeback_single_inode() it doesn't really make sense and may even be wrong
      - the flusher thread may be doing WB_SYNC_ALL writeback in parallel.
      
      So we move requeueing from writeback_single_inode() to writeback_sb_inodes().
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
    • writeback: Move clearing of I_SYNC into inode_sync_complete() · 365b94ae
      Jan Kara committed
      Move clearing of I_SYNC into inode_sync_complete().  It is more logical to have
      clearing of I_SYNC bit and waking of waiters in one place. Also later we will
      have two places needing to clear I_SYNC and wake up waiters so this allows them
      to use the common helper. Moving of I_SYNC clearing to a later stage of
      writeback_single_inode() is safe since we hold i_lock all the time.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
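The idea of keeping "clear I_SYNC" and "wake the waiters" in one helper can be illustrated with a small user-space model. Everything here is an assumption made for illustration: `struct toy_inode`, `TOY_I_SYNC`, `toy_wake_waiters()` and `toy_inode_sync_complete()` are hypothetical names, the wake-up is simulated by a counter, and the locking mentioned in the commit message (i_lock) is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel flag; the real value may differ. */
#define TOY_I_SYNC 0x8u

struct toy_inode {
	unsigned int state;    /* models inode->i_state */
	unsigned int wakeups;  /* counts simulated wake-ups of waiters */
};

/* Stand-in for waking tasks that wait for writeback to finish. */
static void toy_wake_waiters(struct toy_inode *inode)
{
	inode->wakeups++;
}

/*
 * Single helper that both clears the flag and wakes the waiters, so every
 * caller that finishes writeback performs the two steps together.
 */
static void toy_inode_sync_complete(struct toy_inode *inode)
{
	assert(inode != NULL);
	inode->state &= ~TOY_I_SYNC;
	toy_wake_waiters(inode);
}
```

A second call site that needs to end writeback can then call the same helper instead of repeating both steps.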
  2. 13 Apr, 2012 7 commits
  3. 11 Apr, 2012 2 commits
    • sysfs: handle 'parent deleted before child added' · 3a198886
      Dan Williams committed
      In scsi at least two cases of the parent device being deleted before the
      child is added have been observed.
      
      1/ scsi is performing async scans and the device is removed prior to the
         async scan thread running (can happen with an inopportune / unlikely
         unplug during initial scan).
      
      2/ libsas discovery event running after the parent port has been torn
         down (this is a bug in libsas).
      
      These result in crash signatures like:
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
       IP: [<ffffffff8115e100>] sysfs_create_dir+0x32/0xb6
       ...
       Process scsi_scan_8 (pid: 5417, threadinfo ffff88080bd16000, task ffff880801b8a0b0)
       Stack:
        00000000fffffffe ffff880813470628 ffff88080bd17cd0 ffff88080614b7e8
        ffff88080b45c108 00000000fffffffe ffff88080bd17d20 ffffffff8125e4a8
        ffff88080bd17cf0 ffffffff81075149 ffff88080bd17d30 ffff88080614b7e8
       Call Trace:
        [<ffffffff8125e4a8>] kobject_add_internal+0x120/0x1e3
        [<ffffffff81075149>] ? trace_hardirqs_on+0xd/0xf
        [<ffffffff8125e641>] kobject_add_varg+0x41/0x50
        [<ffffffff8125e70b>] kobject_add+0x64/0x66
        [<ffffffff8131122b>] device_add+0x12d/0x63a
      
      In this scenario the parent is still valid (because we have a
      reference), but it has been device_del()'d which means its kobj->sd
      pointer is NULL'd via:
      
       device_del()->kobject_del()->sysfs_remove_dir()
      
      ...and then sysfs_create_dir() (without this fix) goes ahead and
      de-references parent_sd via sysfs_ns_type():
      
       return (sd->s_flags & SYSFS_NS_TYPE_MASK) >> SYSFS_NS_TYPE_SHIFT;
      
      This scenario is being fixed in scsi/libsas, but if other subsystems
      present the same ordering the system need not immediately crash.
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: James Bottomley <JBottomley@parallels.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
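The failure mode and the kind of guard described in this commit message can be sketched in user space. This is a simplified model, not the kernel code: `struct toy_kobject`, `struct toy_dirent`, `toy_create_dir()` and `toy_remove_dir()` are hypothetical, and returning `-ENOENT` for a vanished parent is an assumption made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_dirent {
	int flags;                  /* models sd->s_flags */
};

struct toy_kobject {
	struct toy_kobject *parent;
	struct toy_dirent *sd;      /* NULL once the directory was removed */
};

static struct toy_dirent toy_root = { 0 };

/* Models the removal path: the directory entry pointer is set to NULL. */
static void toy_remove_dir(struct toy_kobject *kobj)
{
	kobj->sd = NULL;
}

/* Create a directory entry for `kobj`, using caller-provided storage. */
static int toy_create_dir(struct toy_kobject *kobj, struct toy_dirent *storage)
{
	struct toy_dirent *parent_sd;

	assert(kobj != NULL && storage != NULL);
	parent_sd = kobj->parent ? kobj->parent->sd : &toy_root;

	/* Guard: the parent was deleted before the child was added. */
	if (parent_sd == NULL)
		return -ENOENT;   /* assumed error code for this sketch */

	/* Without the guard this dereference would crash on a NULL parent. */
	storage->flags = parent_sd->flags;
	kobj->sd = storage;
	return 0;
}
```

The caller still holds a reference to the parent object, so the parent pointer itself is valid; only its directory entry is gone, which is why the check is on `parent_sd` and not on `kobj->parent`.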
    • sysfs: Prevent crash on unset sysfs group attributes · 5631f2c1
      Bruno Prémont committed
      Do not let the kernel crash when a device is registered with
      sysfs while group attributes are not set (aka NULL).
      
      Warn about the offender with some information about the offending
      device.
      
      This would warn instead of trying NULL pointer deref like:
       BUG: unable to handle kernel NULL pointer dereference at (null)
       IP: [<ffffffff81152673>] internal_create_group+0x83/0x1a0
       PGD 0
       Oops: 0000 [#1] SMP
       CPU 0
       Modules linked in:
      
       Pid: 1, comm: swapper/0 Not tainted 3.4.0-rc1-x86_64 #3 HP ProLiant DL360 G4
       RIP: 0010:[<ffffffff81152673>]  [<ffffffff81152673>] internal_create_group+0x83/0x1a0
       RSP: 0018:ffff88019485fd70  EFLAGS: 00010202
       RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000001
       RDX: ffff880192e99908 RSI: ffff880192e99630 RDI: ffffffff81a26c60
       RBP: ffff88019485fdc0 R08: 0000000000000000 R09: 0000000000000000
       R10: ffff880192e99908 R11: 0000000000000000 R12: ffffffff81a16a00
       R13: ffff880192e99908 R14: ffffffff81a16900 R15: 0000000000000000
       FS:  0000000000000000(0000) GS:ffff88019bc00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000000 CR3: 0000000001a0c000 CR4: 00000000000007f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process swapper/0 (pid: 1, threadinfo ffff88019485e000, task ffff880194878000)
       Stack:
        ffff88019485fdd0 ffff880192da9d60 0000000000000000 ffff880192e99908
        ffff880192e995d8 0000000000000001 ffffffff81a16a00 ffff880192da9d60
        0000000000000000 0000000000000000 ffff88019485fdd0 ffffffff811527be
       Call Trace:
        [<ffffffff811527be>] sysfs_create_group+0xe/0x10
        [<ffffffff81376ca6>] device_add_groups+0x46/0x80
        [<ffffffff81377d3d>] device_add+0x46d/0x6a0
        ...
      Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
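A warn-and-fail check of the kind this commit message describes might look like the following user-space sketch. The types and the function `toy_create_group()` are hypothetical, the warning goes to stderr instead of the kernel log, and `-EINVAL` as the return value is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

struct toy_attribute {
	const char *name;
};

struct toy_group {
	const char *name;
	struct toy_attribute **attrs;  /* NULL-terminated array; may be unset */
};

/*
 * Returns the number of attributes that would be created, or a negative
 * errno value when the group is unusable.
 */
static int toy_create_group(const char *dev_name, const struct toy_group *grp)
{
	int count = 0;

	assert(dev_name != NULL);
	if (grp == NULL)
		return -EINVAL;

	/* Warn about the offender instead of dereferencing a NULL array. */
	if (grp->attrs == NULL) {
		fprintf(stderr, "warning: attrs not set for group %s/%s\n",
			dev_name, grp->name ? grp->name : "");
		return -EINVAL;
	}

	while (grp->attrs[count] != NULL)
		count++;
	return count;
}
```

Naming the device and the group in the warning is what lets the misbehaving driver be identified, which a bare NULL pointer oops does not do.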
  4. 10 Apr, 2012 2 commits
  5. 09 Apr, 2012 1 commit
    • dentry leak in simple_fill_super() failure exit · 640946f2
      Al Viro committed
      d_genocide() does _not_ evict dentries; it just removes extra ref
      pinning each of those.  Normally it's followed by shrinking the
      tree (it's done just before generic_shutdown_super() by kill_litter_super()),
      but in case of simple_fill_super() nothing of that kind will follow.
      Just do shrink_dcache_parent() manually.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 07 Apr, 2012 1 commit
    • Make the "word-at-a-time" helper functions more commonly usable · f68e556e
      Linus Torvalds committed
      I have a new optimized x86 "strncpy_from_user()" that will use these
      same helper functions for all the same reasons the name lookup code uses
      them.  This is preparation for that.
      
      This moves them into an architecture-specific header file.  It's
      architecture-specific for two reasons:
      
       - some of the functions are likely to want architecture-specific
         implementations.  Even if the current code happens to be "generic" in
         the sense that it should work on any little-endian machine, it's
         likely that the "multiply by a big constant and shift" implementation
         is less than optimal for an architecture that has a guaranteed fast
         bit count instruction, for example.
      
       - I expect that if architectures like sparc want to start playing
         around with this, we'll need to abstract out a few more details (in
         particular the actual unaligned accesses).  So we're likely to have
         more architecture-specific stuff if non-x86 architectures start using
         this.
      
         (and if it turns out that non-x86 architectures don't start using
         this, then having it in an architecture-specific header is still the
         right thing to do, of course)
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
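For readers unfamiliar with the "word-at-a-time" technique and the "multiply by a big constant and shift" step mentioned in this commit message, here is a user-space sketch. It assumes a little-endian machine and 64-bit words, loads words with `memcpy()` to avoid unaligned accesses, and uses a hypothetical wrapper `toy_strlen()`. The helper names are chosen for illustration and should not be read as the contents of the kernel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replicate one byte value into every byte of a 64-bit word. */
#define REPEAT_BYTE(x) ((~(uint64_t)0 / 0xff) * (x))

/*
 * Non-zero if any byte of `a` is zero. The lowest set bit of the result
 * is the top bit of the first (lowest-addressed) zero byte.
 */
static uint64_t has_zero(uint64_t a)
{
	return (a - REPEAT_BYTE(0x01)) & ~a & REPEAT_BYTE(0x80);
}

/*
 * `mask` has its k low bytes set to 0xff (0 <= k <= 7). Multiplying by
 * the constant and taking the top byte yields k without a loop.
 */
static unsigned int count_masked_bytes(uint64_t mask)
{
	return (unsigned int)(mask * 0x0001020304050608ull >> 56);
}

/*
 * Length of the NUL-terminated string in `buf`, scanned 8 bytes at a
 * time. `buf_len` must be a multiple of 8; returns `buf_len` if no NUL
 * byte is found.
 */
static size_t toy_strlen(const char *buf, size_t buf_len)
{
	size_t off;

	assert(buf != NULL && buf_len % 8 == 0);
	for (off = 0; off + 8 <= buf_len; off += 8) {
		uint64_t word, zeros;

		memcpy(&word, buf + off, sizeof(word));
		zeros = has_zero(word);
		if (zeros) {
			/* Turn "first zero byte" into a mask of the bytes before it. */
			uint64_t mask = ((zeros - 1) & ~zeros) >> 7;

			return off + count_masked_bytes(mask);
		}
	}
	return buf_len;
}
```

On a big-endian machine the first byte in memory is the most significant one, so both the mask construction and the counting step would need a different implementation, which is one way to see why the commit places these helpers in an architecture-specific header.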
  7. 06 Apr, 2012 8 commits
  8. 05 Apr, 2012 1 commit
  9. 04 Apr, 2012 2 commits
  10. 02 Apr, 2012 2 commits
  11. 01 Apr, 2012 6 commits