1. 17 5月, 2009 1 次提交
    • L
      Fix caller information for warn_slowpath_null · 0f6f49a8
      Linus Torvalds 提交于
      Ian Campbell noticed that since "Eliminate thousands of warnings with
      gcc 3.2 build" (commit 57adc4d2) all
      WARN_ON()'s currently appear to come from warn_slowpath_null(), eg:
      
        WARNING: at kernel/softirq.c:143 warn_slowpath_null+0x1c/0x20()
      
      because now that warn_slowpath_null() is in the call path, the
      __builtin_return_address(0) returns that, rather than the place that
      caused the warning.
      
      Fix this by splitting up the warn_slowpath_null/fmt cases differently,
      using a common helper function, and getting the return address in the
      right place.  This also happens to avoid the unnecessary stack usage for
      the non-stdargs case, and just generally cleans things up.
      
      Make the function name printout use %pS while at it.
      
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0f6f49a8
  2. 15 5月, 2009 2 次提交
  3. 09 5月, 2009 2 次提交
  4. 07 5月, 2009 2 次提交
    • A
      Eliminate thousands of warnings with gcc 3.2 build · 57adc4d2
      Andi Kleen 提交于
      When building with gcc 3.2 I get thousands of warnings such as
      
      include/linux/gfp.h: In function `allocflags_to_migratetype':
      include/linux/gfp.h:105: warning: null format string
      
      due to passing a NULL format string to warn_slowpath() in
      
      #define __WARN()		warn_slowpath(__FILE__, __LINE__, NULL)
      
      Split this case out into a separate call.  This also shrinks the kernel
      slightly:
      
                text    data     bss     dec     hex filename
             4802274  707668  712704 6222646  5ef336 vmlinux
                text    data     bss     dec     hex filename
             4799027  703572  712704 6215303  5ed687 vmlinux
      
      due to removeing one argument from the commonly-called __WARN().
      
      [akpm@linux-foundation.org: reduce scope of `empty']
      Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      57adc4d2
    • W
      inotify: use GFP_NOFS in kernel_event() to work around a lockdep false-positive · 381a80e6
      Wu Fengguang 提交于
      There is what we believe to be a false positive reported by lockdep.
      
      inotify_inode_queue_event() => take inotify_mutex => kernel_event() =>
      kmalloc() => SLOB => alloc_pages_node() => page reclaim => slab reclaim =>
      dcache reclaim => inotify_inode_is_dead => take inotify_mutex => deadlock
      
      The plan is to fix this via lockdep annotation, but that is proving to be
      quite involved.
      
      The patch flips the allocation over to GFP_NFS to shut the warning up, for
      the 2.6.30 release.
      
      Hopefully we will fix this for real in 2.6.31.  I'll queue a patch in -mm
      to switch it back to GFP_KERNEL so we don't forget.
      
        =================================
        [ INFO: inconsistent lock state ]
        2.6.30-rc2-next-20090417 #203
        ---------------------------------
        inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
        kswapd0/380 [HC0[0]:SC0[0]:HE1:SE1] takes:
         (&inode->inotify_mutex){+.+.?.}, at: [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
        {RECLAIM_FS-ON-W} state was registered at:
          [<ffffffff81079188>] mark_held_locks+0x68/0x90
          [<ffffffff810792a5>] lockdep_trace_alloc+0xf5/0x100
          [<ffffffff810f5261>] __kmalloc_node+0x31/0x1e0
          [<ffffffff81130652>] kernel_event+0xe2/0x190
          [<ffffffff81130826>] inotify_dev_queue_event+0x126/0x230
          [<ffffffff8112f096>] inotify_inode_queue_event+0xc6/0x110
          [<ffffffff8110444d>] vfs_create+0xcd/0x140
          [<ffffffff8110825d>] do_filp_open+0x88d/0xa20
          [<ffffffff810f6b68>] do_sys_open+0x98/0x140
          [<ffffffff810f6c50>] sys_open+0x20/0x30
          [<ffffffff8100c272>] system_call_fastpath+0x16/0x1b
          [<ffffffffffffffff>] 0xffffffffffffffff
        irq event stamp: 690455
        hardirqs last  enabled at (690455): [<ffffffff81564fe4>] _spin_unlock_irqrestore+0x44/0x80
        hardirqs last disabled at (690454): [<ffffffff81565372>] _spin_lock_irqsave+0x32/0xa0
        softirqs last  enabled at (690178): [<ffffffff81052282>] __do_softirq+0x202/0x220
        softirqs last disabled at (690157): [<ffffffff8100d50c>] call_softirq+0x1c/0x50
      
        other info that might help us debug this:
        2 locks held by kswapd0/380:
         #0:  (shrinker_rwsem){++++..}, at: [<ffffffff810d0bd7>] shrink_slab+0x37/0x180
         #1:  (&type->s_umount_key#17){++++..}, at: [<ffffffff8110cfbf>] shrink_dcache_memory+0x11f/0x1e0
      
        stack backtrace:
        Pid: 380, comm: kswapd0 Not tainted 2.6.30-rc2-next-20090417 #203
        Call Trace:
         [<ffffffff810789ef>] print_usage_bug+0x19f/0x200
         [<ffffffff81018bff>] ? save_stack_trace+0x2f/0x50
         [<ffffffff81078f0b>] mark_lock+0x4bb/0x6d0
         [<ffffffff810799e0>] ? check_usage_forwards+0x0/0xc0
         [<ffffffff8107b142>] __lock_acquire+0xc62/0x1ae0
         [<ffffffff810f478c>] ? slob_free+0x10c/0x370
         [<ffffffff8107c0a1>] lock_acquire+0xe1/0x120
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff81562d43>] mutex_lock_nested+0x63/0x420
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff8112f1b5>] ? inotify_inode_is_dead+0x35/0xb0
         [<ffffffff81012fe9>] ? sched_clock+0x9/0x10
         [<ffffffff81077165>] ? lock_release_holdtime+0x35/0x1c0
         [<ffffffff8112f1b5>] inotify_inode_is_dead+0x35/0xb0
         [<ffffffff8110c9dc>] dentry_iput+0xbc/0xe0
         [<ffffffff8110cb23>] d_kill+0x33/0x60
         [<ffffffff8110ce23>] __shrink_dcache_sb+0x2d3/0x350
         [<ffffffff8110cffa>] shrink_dcache_memory+0x15a/0x1e0
         [<ffffffff810d0cc5>] shrink_slab+0x125/0x180
         [<ffffffff810d1540>] kswapd+0x560/0x7a0
         [<ffffffff810ce160>] ? isolate_pages_global+0x0/0x2c0
         [<ffffffff81065a30>] ? autoremove_wake_function+0x0/0x40
         [<ffffffff8107953d>] ? trace_hardirqs_on+0xd/0x10
         [<ffffffff810d0fe0>] ? kswapd+0x0/0x7a0
         [<ffffffff8106555b>] kthread+0x5b/0xa0
         [<ffffffff8100d40a>] child_rip+0xa/0x20
         [<ffffffff8100cdd0>] ? restore_args+0x0/0x30
         [<ffffffff81065500>] ? kthread+0x0/0xa0
         [<ffffffff8100d400>] ? child_rip+0x0/0x20
      
      [eparis@redhat.com: fix audit too]
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      381a80e6
  5. 03 5月, 2009 1 次提交
    • A
      mm: prevent divide error for small values of vm_dirty_bytes · 9e4a5bda
      Andrea Righi 提交于
      Avoid setting less than two pages for vm_dirty_bytes: this is necessary to
      avoid potential division by 0 (like the following) in get_dirty_limits().
      
      [   49.951610] divide error: 0000 [#1] PREEMPT SMP
      [   49.952195] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/block/sda/uevent
      [   49.952195] CPU 1
      [   49.952195] Modules linked in: pcspkr
      [   49.952195] Pid: 3064, comm: dd Not tainted 2.6.30-rc3 #1
      [   49.952195] RIP: 0010:[<ffffffff802d39a9>]  [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0
      [   49.952195] RSP: 0018:ffff88001de03a98  EFLAGS: 00010202
      [   49.952195] RAX: 00000000000000c0 RBX: ffff88001de03b80 RCX: 28f5c28f5c28f5c3
      [   49.952195] RDX: 0000000000000000 RSI: 00000000000000c0 RDI: 0000000000000000
      [   49.952195] RBP: ffff88001de03ae8 R08: 0000000000000000 R09: 0000000000000000
      [   49.952195] R10: ffff88001ddda9a0 R11: 0000000000000001 R12: 0000000000000001
      [   49.952195] R13: ffff88001fbc8218 R14: ffff88001de03b70 R15: ffff88001de03b78
      [   49.952195] FS:  00007fe9a435b6f0(0000) GS:ffff8800025d9000(0000) knlGS:0000000000000000
      [   49.952195] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   49.952195] CR2: 00007fe9a39ab000 CR3: 000000001de38000 CR4: 00000000000006e0
      [   49.952195] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   49.952195] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   49.952195] Process dd (pid: 3064, threadinfo ffff88001de02000, task ffff88001ddda250)
      [   49.952195] Stack:
      [   49.952195]  ffff88001fa0de00 ffff88001f2dbd70 ffff88001f9fe800 000080b900000000
      [   49.952195]  00000000000000c0 ffff8800027a6100 0000000000000400 ffff88001fbc8218
      [   49.952195]  0000000000000000 0000000000000600 ffff88001de03bb8 ffffffff802d3ed7
      [   49.952195] Call Trace:
      [   49.952195]  [<ffffffff802d3ed7>] balance_dirty_pages_ratelimited_nr+0x1d7/0x3f0
      [   49.952195]  [<ffffffff80368f8e>] ? ext3_writeback_write_end+0x9e/0x120
      [   49.952195]  [<ffffffff802cc7df>] generic_file_buffered_write+0x12f/0x330
      [   49.952195]  [<ffffffff802cce8d>] __generic_file_aio_write_nolock+0x26d/0x460
      [   49.952195]  [<ffffffff802cda32>] ? generic_file_aio_write+0x52/0xd0
      [   49.952195]  [<ffffffff802cda49>] generic_file_aio_write+0x69/0xd0
      [   49.952195]  [<ffffffff80365fa6>] ext3_file_write+0x26/0xc0
      [   49.952195]  [<ffffffff803034d1>] do_sync_write+0xf1/0x140
      [   49.952195]  [<ffffffff80290d1a>] ? get_lock_stats+0x2a/0x60
      [   49.952195]  [<ffffffff80280730>] ? autoremove_wake_function+0x0/0x40
      [   49.952195]  [<ffffffff8030411b>] vfs_write+0xcb/0x190
      [   49.952195]  [<ffffffff803042d0>] sys_write+0x50/0x90
      [   49.952195]  [<ffffffff8022ff6b>] system_call_fastpath+0x16/0x1b
      [   49.952195] Code: 00 00 00 2b 05 09 1c 17 01 48 89 c6 49 0f af f4 48 c1 ee 02 48 89 f0 48 f7 e1 48 89 d6 31 d2 48 c1 ee 02 48 0f af 75 d0 48 89 f0 <48> f7 f7 41 8b 95 ac 01 00 00 48 89 c7 49 0f af d4 48 c1 ea 02
      [   49.952195] RIP  [<ffffffff802d39a9>] get_dirty_limits+0xe9/0x2c0
      [   49.952195]  RSP <ffff88001de03a98>
      [   50.096523] ---[ end trace 008d7aa02f244d7b ]---
      Signed-off-by: NAndrea Righi <righi.andrea@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e4a5bda
  6. 02 5月, 2009 1 次提交
    • J
      clockevents: prevent endless loop in tick_handle_periodic() · 74a03b69
      john stultz 提交于
      tick_handle_periodic() can lock up hard when a one shot clock event
      device is used in combination with jiffies clocksource.
      
      Avoid an endless loop issue by requiring that a highres valid
      clocksource be installed before we call tick_periodic() in a loop when
      using ONESHOT mode. The result is we will only increment jiffies once
      per interrupt until a continuous hardware clocksource is available.
      
      Without this, we can run into a endless loop, where each cycle through
      the loop, jiffies is updated which increments time by tick_period or
      more (due to clock steering), which can cause the event programming to
      think the next event was before the newly incremented time and fail
      causing tick_periodic() to be called again and the whole process loops
      forever.
      
      [ Impact: prevent hard lock up ]
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
      74a03b69
  7. 01 5月, 2009 1 次提交
  8. 30 4月, 2009 1 次提交
  9. 29 4月, 2009 2 次提交
  10. 27 4月, 2009 1 次提交
  11. 25 4月, 2009 1 次提交
    • R
      PM/Hibernate: Fix waiting for image device to appear on resume · 0c8454f5
      Rafael J. Wysocki 提交于
      Commit c7510859 ("PM/Hibernate: Wait for
      SCSI devices scan to complete during resume") added a call to
      scsi_complete_async_scans() to software_resume(), so that it waited for
      the SCSI scanning to complete, but the call was added at a wrong place.
      
      Namely, it should have been added after wait_for_device_probe(), which
      is called only if the image partition hasn't been specified yet.  Also,
      it's reasonable to check if the image partition is present and only wait
      for the device probing and SCSI scanning to complete if it is not the
      case.
      
      Additionally, since noresume is checked right at the beginning of
      software_resume() and the function returns immediately if it's set, it
      doesn't make sense to check it once again later.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0c8454f5
  12. 24 4月, 2009 1 次提交
  13. 23 4月, 2009 1 次提交
    • I
      locking: clarify kernel-taint warning message · b48ccb09
      Ingo Molnar 提交于
      Andi Kleen reported this message triggering on non-lockdep kernels:
      
         Disabling lockdep due to kernel taint
      
      Clarify the message to say 'lock debugging' - debug_locks_off()
      turns off all things lock debugging, not just lockdep.
      
      [ Impact: change kernel warning message text ]
      Reported-by: NAndi Kleen <andi@firstfloor.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b48ccb09
  14. 22 4月, 2009 2 次提交
  15. 21 4月, 2009 1 次提交
  16. 20 4月, 2009 1 次提交
    • R
      PM/Suspend: Introduce two new platform callbacks to avoid breakage · 6a7c7eaf
      Rafael J. Wysocki 提交于
      Commit 900af0d9 (PM: Change suspend
      code ordering) changed the ordering of suspend code in such a way
      that the platform .prepare() callback is now executed after the
      device drivers' late suspend callbacks have run.  Unfortunately, this
      turns out to break ARM platforms that need to talk via I2C to power
      control devices during the .prepare() callback.
      
      For this reason introduce two new platform suspend callbacks,
      .prepare_late() and .wake(), that will be called just prior to
      disabling non-boot CPUs and right after bringing them back on line,
      respectively, and use them instead of .prepare() and .finish() for
      ACPI suspend.  Make the PM core execute the .prepare() and .finish()
      platform suspend callbacks where they were executed previously (that
      is, right after calling the regular suspend methods provided by
      device drivers and right before executing their regular resume
      methods, respectively).
      
      It is not necessary to make analogous changes to the hibernation
      code and data structures at the moment, because they are only used
      by ACPI platforms.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Reported-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: NLen Brown <len.brown@intel.com>
      6a7c7eaf
  17. 19 4月, 2009 1 次提交
  18. 18 4月, 2009 1 次提交
    • P
      lockdep: more robust lockdep_map init sequence · c8a25005
      Peter Zijlstra 提交于
      Steven Rostedt reported:
      
      > OK, I think I figured this bug out. This is a lockdep issue with respect
      > to tracepoints.
      >
      > The trace points in lockdep are called all the time. Outside the lockdep
      > logic. But if lockdep were to trigger an error / warning (which this run
      > did) we might be in trouble. For new locks, like the dentry->d_lock, that
      > are created, they will not get a name:
      >
      > void lockdep_init_map(struct lockdep_map *lock, const char *name,
      >                       struct lock_class_key *key, int subclass)
      > {
      >         if (unlikely(!debug_locks))
      >                 return;
      >
      > When a problem is found by lockdep, debug_locks becomes false. Thus we
      > stop allocating names for locks. This dentry->d_lock I had, now has no
      > name. Worse yet, I have CONFIG_DEBUG_VM set, that scrambles non
      > initialized memory. Thus, when the trace point was hit, it had junk for
      > the lock->name, and the machine crashed.
      
      Ah, nice catch. I think we should put at least the name in regardless.
      
      Ensure we at least initialize the trivial entries of the depmap so that
      they can be relied upon, even when lockdep itself decided to pack up and
      go home.
      
      [ Impact: fix lock tracing after lockdep warnings. ]
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1239954049.23397.4156.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c8a25005
  19. 17 4月, 2009 2 次提交
  20. 16 4月, 2009 1 次提交
    • D
      RCU: Don't try and predeclare inline funcs as it upsets some versions of gcc · 5b1d07ed
      David Howells 提交于
      Don't try and predeclare inline funcs like this:
      
      	static inline void wait_migrated_callbacks(void)
      	...
      	static void _rcu_barrier(enum rcu_barrier type)
      	{
      		...
      		wait_migrated_callbacks();
      	}
      	...
      	static inline void wait_migrated_callbacks(void)
      	{
      		wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count));
      	}
      
      as it upsets some versions of gcc under some circumstances:
      
      	kernel/rcupdate.c: In function `_rcu_barrier':
      	kernel/rcupdate.c:125: sorry, unimplemented: inlining failed in call to 'wait_migrated_callbacks': function body not available
      	kernel/rcupdate.c:152: sorry, unimplemented: called from here
      
      This can be dealt with by simply putting the static variables (rcu_migrate_*)
      at the top, and moving the implementation of the function up so that it
      replaces its forward declaration.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b1d07ed
  21. 15 4月, 2009 1 次提交
  22. 14 4月, 2009 9 次提交
    • P
      x86, irq: Remove IRQ_DISABLED check in process context IRQ move · 6ec3cfec
      Pallipadi, Venkatesh 提交于
      As discussed in the thread here:
      
        http://marc.info/?l=linux-kernel&m=123964468521142&w=2
      
      Eric W. Biederman observed:
      
      > It looks like some additional bugs have slipped in since last I looked.
      >
      > set_irq_affinity does this:
      > ifdef CONFIG_GENERIC_PENDING_IRQ
      >        if (desc->status & IRQ_MOVE_PCNTXT || desc->status & IRQ_DISABLED) {
      >                cpumask_copy(desc->affinity, cpumask);
      >                desc->chip->set_affinity(irq, cpumask);
      >        } else {
      >                desc->status |= IRQ_MOVE_PENDING;
      >                cpumask_copy(desc->pending_mask, cpumask);
      >        }
      > #else
      >
      > That IRQ_DISABLED case is a software state and as such it has nothing to
      > do with how safe it is to move an irq in process context.
      
      [...]
      
      >
      > The only reason we migrate MSIs in interrupt context today is that there
      > wasn't infrastructure for support migration both in interrupt context
      > and outside of it.
      
      Yes. The idea here was to force the MSI migration to happen in process
      context. One of the patches in the series did
      
              disable_irq(dev->irq);
              irq_set_affinity(dev->irq, cpumask_of(dev->cpu));
              enable_irq(dev->irq);
      
      with the above patch adding irq/manage code check for interrupt disabled
      and moving the interrupt in process context.
      
      IIRC, there was no IRQ_MOVE_PCNTXT when we were developing this HPET
      code and we ended up having this ugly hack. IRQ_MOVE_PCNTXT was there
      when we eventually submitted the patch upstream. But, looks like I did a
      blind rebasing instead of using IRQ_MOVE_PCNTXT in hpet MSI code.
      
      Below patch fixes this. i.e., revert commit 932775a4
      and add PCNTXT to HPET MSI setup. Also removes copying of desc->affinity
      in generic code as set_affinity routines are doing it internally.
      Reported-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Li Shaohua" <shaohua.li@intel.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: "lcm@us.ibm.com" <lcm@us.ibm.com>
      Cc: suresh.b.siddha@intel.com
      LKML-Reference: <20090413222058.GB8211@linux-os.sc.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6ec3cfec
    • P
      rcu: Make hierarchical RCU less IPI-happy · ef631b0c
      Paul E. McKenney 提交于
      This patch fixes a hierarchical-RCU performance bug located by Anton
      Blanchard.  The problem stems from a misguided attempt to provide a
      work-around for jiffies-counter failure.  This work-around uses a per-CPU
      n_rcu_pending counter, which is incremented on each call to rcu_pending(),
      which in turn is called from each scheduling-clock interrupt.  Each CPU
      then treats this counter as a surrogate for the jiffies counter, so
      that if the jiffies counter fails to advance, the per-CPU n_rcu_pending
      counter will cause RCU to invoke force_quiescent_state(), which in turn
      will (among other things) send resched IPIs to CPUs that have thus far
      failed to pass through an RCU quiescent state.
      
      Unfortunately, each CPU resets only its own counter after sending a
      batch of IPIs.  This means that the other CPUs will also (needlessly)
      send -another- round of IPIs, for a full N-squared set of IPIs in the
      worst case every three scheduler-clock ticks until the grace period
      finally ends.  It is not reasonable for a given CPU to reset each and
      every n_rcu_pending for all the other CPUs, so this patch instead simply
      disables the jiffies-counter "training wheels", thus eliminating the
      excessive IPIs.
      
      Note that the jiffies-counter IPIs do not have this problem due to
      the fact that the jiffies counter is global, so that the CPU sending
      the IPIs can easily reset things, thus preventing the other CPUs from
      sending redundant IPIs.
      
      Note also that the n_rcu_pending counter remains, as it will continue to
      be used for tracing.  It may also see use to update the jiffies counter,
      should an appropriate kick-the-jiffies-counter API appear.
      Located-by: NAnton Blanchard <anton@au1.ibm.com>
      Tested-by: NAnton Blanchard <anton@au1.ibm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: anton@samba.org
      Cc: akpm@linux-foundation.org
      Cc: dipankar@in.ibm.com
      Cc: manfred@colorfullife.com
      Cc: cl@linux-foundation.org
      Cc: josht@linux.vnet.ibm.com
      Cc: schamp@sgi.com
      Cc: niv@us.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: ego@in.ibm.com
      Cc: laijs@cn.fujitsu.com
      Cc: rostedt@goodmis.org
      Cc: peterz@infradead.org
      Cc: penberg@cs.helsinki.fi
      Cc: andi@firstfloor.org
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <12396834793575-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ef631b0c
    • Z
      tracing: Fix branch tracer header · 557055be
      Zhaolei 提交于
      Before patch:
      
        # tracer: branch
        #
        #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
        #              | |       |          |         |
                   <...>-2981  [000] 24008.872738: [  ok  ] trace_irq_handler_exit:irq_event_types.h:41
                   <...>-2981  [000] 24008.872742: [  ok  ] note_interrupt:spurious.c:229
        ...
      
      After patch:
      
        # tracer: branch
        #
        #           TASK-PID    CPU#    TIMESTAMP  CORRECT  FUNC:FILE:LINE
        #              | |       |          |         |       |
                   <...>-2985  [000] 26329.142970: [  ok  ] slab_free:slub.c:1776
                   <...>-2985  [000] 26329.142972: [  ok  ] trace_kmem_cache_free:kmem_event_types.h:191
        ...
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <49E2F19A.3040006@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      557055be
    • L
      tracing, sched: mark get_parent_ip() notrace · 132380a0
      Lai Jiangshan 提交于
      Impact: remove overly redundant tracing entries
      
      When tracer is "function" or "function_graph", way too much
      "get_parent_ip" entries are recorded in ring_buffer.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D458B1.5000703@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      132380a0
    • A
      kernel/sys.c: clean up sys_shutdown exit path · 3d26dcf7
      Andi Kleen 提交于
      Impact: cleanup, fix
      
      Clean up sys_shutdown() exit path.  Factor out common code.  Return
      correct error code instead of always 0 on failure.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3d26dcf7
    • O
      ptrace: fix exit_ptrace() vs ptrace_traceme() race · f1671f6d
      Oleg Nesterov 提交于
      Pointed out by Roland.  The bug was recently introduced by me in
      "forget_original_parent: split out the un-ptrace part", commit
      39c626ae.
      
      Since that patch we have a window after exit_ptrace() drops tasklist and
      before forget_original_parent() takes it again.  In this window the child
      can do ptrace(PTRACE_TRACEME) and nobody can untrace this child after
      that.
      
      Change ptrace_traceme() to not attach to the exiting ->real_parent.  We
      don't report the error in this case, we pretend we attach right before
      ->real_parent calls exit_ptrace() which should untrace us anyway.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NRoland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f1671f6d
    • P
      mm: move the scan_unevictable_pages sysctl to the vm table · 4be6f6bb
      Peter Zijlstra 提交于
      vm knobs should go in the vm table.  Probably too late for
      randomize_va_space though.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4be6f6bb
    • Z
      tracing: Fix power tracer header · a3d03eca
      Zhaolei 提交于
      Before patch:
        # tracer: power
        #
        #           TASK-PID    CPU#    TIMESTAMP  FUNCTION
        #              | |       |          |         |
        [  676.875865889] CSTATE: Going to C1 on cpu 0 for 0.005911463
        [  676.882938805] CSTATE: Going to C1 on cpu 0 for 0.104796532
        ...
      
      After patch:
        # tracer: power
        #
        #   TIMESTAMP      STATE  EVENT
        #       |            |      |
        [  676.875865889] CSTATE: Going to C1 on cpu 0 for 0.005911463
        [  676.882938805] CSTATE: Going to C1 on cpu 0 for 0.104796532
        ...
      
      v2: Use seq_puts instead of seq_printf
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <49E2E889.5000903@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a3d03eca
    • R
      PM/Hibernate: Wait for SCSI devices scan to complete during resume · c7510859
      Rafael J. Wysocki 提交于
      There is a race between resume from hibernation and the asynchronous
      scanning of SCSI devices and to prevent it from happening we need to
      call scsi_complete_async_scans() during resume from hibernation.
      
      In addition, if the resume from hibernation is userland-driven, it's
      better to wait for all device probes in the kernel to complete before
      attempting to open the resume device.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c7510859
  23. 12 4月, 2009 4 次提交