1. 15 Aug 2008 (2 commits)
  2. 13 Aug 2008 (2 commits)
  3. 12 Aug 2008 (4 commits)
    • generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask(), fix · c2fc1198
      Authored by Nick Piggin
      > > Nick Piggin (1):
      > >       generic-ipi: fix stack and rcu interaction bug in
      > > smp_call_function_mask()
      >
      > I'm still not 100% sure that I have this patch right... I might have seen
      > a lockup trace implicating the smp call function path... which may have
      > been due to some other problem or a different bug in the new call function
      > code, but if some more people can take a look at it before merging?
      
      OK indeed it did have a couple of bugs. Firstly, I wasn't freeing the
      data properly in the alloc && wait case. Secondly, I wasn't resetting
      CSD_FLAG_WAIT in the for each cpu loop (so only the first CPU would
      wait).
      
      After those fixes, the patch boots and runs with the kmalloc commented
      out (so it always executes the slowpath).
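      
      A hedged sketch of the second fix (the loop body and helper names below are
      illustrative, not the exact upstream code): the WAIT flag is consumed by each
      target CPU, so it has to be re-armed on every iteration, otherwise only the
      first CPU is actually waited for.
      
       /* Illustrative only: CSD_FLAG_WAIT is cleared by the target cpu when it
        * finishes, so it must be set again before queueing for the next cpu. */
       for_each_cpu_mask(cpu, mask) {
               data->flags |= CSD_FLAG_WAIT;           /* re-arm for this target */
               queue_and_send_ipi(cpu, data);          /* hypothetical helper */
               csd_flag_wait(data);                    /* hypothetical: spin until cleared */
       }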
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • stop_machine: remove unused variable · ed6d6876
      Authored by Li Zefan
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • modules: extend initcall_debug functionality to the module loader · 59f9415f
      Authored by Arjan van de Ven
      The kernel has this really nice facility where if you put "initcall_debug"
      on the kernel commandline, it'll print which function it's going to
      execute just before calling an initcall, and then after the call completes
      it will
      
      1) print if it had an error code,

      2) check for a few simple bugs (like leaving irqs off), and

      3) print how long the init call took in milliseconds.
      
      While trying to optimize the boot speed of my laptop, I have been loving
      number 3 to figure out what to optimize... and then I wished that
      the same thing was done for module loading.
      
      This patch makes the module loader use this exact same functionality; it's
      a logical extension in my view (since modules are just sort of late
      binding initcalls anyway) and so far I've found it quite useful in finding
      where things are too slow in my boot.
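      
      A hedged sketch of the idea (do_mod_initcall() and the exact message format
      are illustrative, not the code the patch adds): wrap the module's init()
      call in the same print/measure/report pattern that boot-time initcalls get
      with initcall_debug.
      
       /* Illustrative sketch: time and report a module's init function much like
        * initcall_debug reports boot-time initcalls. */
       static int do_mod_initcall(struct module *mod)          /* hypothetical helper */
       {
               ktime_t t0, t1;
               int ret;
      
               printk(KERN_DEBUG "calling  %s_init [%s]\n", mod->name, mod->name);
               t0 = ktime_get();
               ret = mod->init();
               t1 = ktime_get();
               printk(KERN_DEBUG "initcall %s_init [%s] returned %d after %lld usecs\n",
                      mod->name, mod->name, ret,
                      (long long)ktime_to_us(ktime_sub(t1, t0)));
               return ret;
       }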
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • lockdep: fix debug_lock_alloc · 0f2bc27b
      Authored by Peter Zijlstra
      When we enable DEBUG_LOCK_ALLOC but do not enable PROVE_LOCKING and/or
      LOCK_STAT, lock_acquire() and lock_release() turn into nops, even though
      we should be doing hlock checking (check=1).
      
      This causes a false warning and a lockdep self-disable.
      
      Rectify this.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 11 Aug 2008 (12 commits)
    • sched, cpu hotplug: fix set_cpus_allowed() use in hotplug callbacks · 279ef6bb
      Authored by Dmitry Adamushko
      Mark Langsdorf reported:
      
      > One of my co-workers noticed that the powernow-k8
      > driver no longer restarts when a CPU core is
      > hot-disabled and then hot-enabled on AMD quad-core
      > systems.
      >
      > The following commands work fine on 2.6.26 and fail
      > on 2.6.27-rc1:
      >
      > echo 0 > /sys/devices/system/cpu/cpu3/online
      > echo 1 > /sys/devices/system/cpu/cpu3/online
      > find /sys -name cpufreq
      >
      > For 2.6.26, the find will return a cpufreq
      > directory for each processor.  In 2.6.27-rc1,
      > the cpu3 directory is missing.
      >
      > After digging through the code, the following
      > logic is failing when the core is hot-enabled
      > at runtime.  The code works during the boot
      > sequence.
      >
      >       cpumask_t = current->cpus_allowed;
      >       set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
      >       if (smp_processor_id() != cpu)
      >               return -ENODEV;
      
      So set the CPU active before calling the CPU_ONLINE notifier chain, as
      there are a handful of notifiers that use set_cpus_allowed().
      
      This fix also solves the problem with x86-microcode. I've sent
      alternative patches for microcode, but as this "rely on
      set_cpus_allowed_ptr() being workable in cpu-hotplug(CPU_ONLINE, ...)"
      assumption seems to be more broad than what we thought, perhaps this fix
      should be applied.
      
      With this patch we define that by the moment CPU_ONLINE is being sent,
      a 'cpu' is online and ready for tasks to be migrated onto it.
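      
      A hedged sketch of the notifier pattern this change keeps working (the
      callback and its helpers are illustrative, not code from the patch): pin the
      current task to the newly onlined cpu, do per-cpu setup there, then restore
      the old mask.  This only works if the cpu is already marked active when
      CPU_ONLINE is delivered.
      
       static int example_cpu_callback(struct notifier_block *nb,      /* illustrative */
                                       unsigned long action, void *hcpu)
       {
               unsigned int cpu = (unsigned long)hcpu;
               cpumask_t saved_mask = current->cpus_allowed;
      
               if (action == CPU_ONLINE) {
                       /* Relies on 'cpu' being in the active map by now. */
                       set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu));
                       if (smp_processor_id() != cpu)
                               return NOTIFY_BAD;
                       /* ... per-cpu driver re-initialization runs here ... */
                       set_cpus_allowed_ptr(current, &saved_mask);
               }
               return NOTIFY_OK;
       }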
      Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Reported-by: Mark Langsdorf <mark.langsdorf@amd.com>
      Tested-by: Mark Langsdorf <mark.langsdorf@amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask() · cc7a486c
      Authored by Nick Piggin
      * Venki Pallipadi <venkatesh.pallipadi@intel.com> wrote:
      
      > Found an OOPS on a big SMP box during an overnight reboot test with
      > upstream git.
      >
      > Suresh and I looked at the oops and looks like the root cause is in
      > generic_smp_call_function_interrupt() and smp_call_function_mask() with
      > wait parameter.
      >
      > The actual oops looked like
      >
      > [   11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff
      > [   11.277815] IP: [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.278155] PGD 202063 PUD 0
      > [   11.278576] Oops: 0010 [1] SMP
      > [   11.279006] CPU 5
      > [   11.279336] Modules linked in:
      > [   11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f7 #290
      > [   11.280039] RIP: 0010:[<ffff8802ffffffff>]  [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.280692] RSP: 0018:ffff88027f1f7f70  EFLAGS: 00010086
      > [   11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
      > [   11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000
      > [   11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af
      > [   11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48
      > [   11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000
      > [   11.282502] FS:  0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000
      > [   11.283096] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > [   11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0
      > [   11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > [   11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > [   11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)
      > [   11.284936] Stack:  ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66
      > [   11.285802]  ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40
      > [   11.286599]  ffffffff8020bdd6 ffff88027f1f3e40 <EOI>  ffff88027f1f3ef8 0000000000000000
      > [   11.287120] Call Trace:
      > [   11.287768]  <IRQ>  [<ffffffff80250963>] ? generic_smp_call_function_interrupt+0x61/0x12c
      > [   11.288354]  [<ffffffff8021adb5>] smp_call_function_interrupt+0x17/0x27
      > [   11.288744]  [<ffffffff8020bdd6>] call_function_interrupt+0x66/0x70
      > [   11.289030]  <EOI>  [<ffffffff8024ab3b>] ? clockevents_notify+0x19/0x73
      > [   11.289380]  [<ffffffff803b9b75>] ? acpi_idle_enter_simple+0x18b/0x1fa
      > [   11.289760]  [<ffffffff803b9b6b>] ? acpi_idle_enter_simple+0x181/0x1fa
      > [   11.290051]  [<ffffffff8053aeca>] ? cpuidle_idle_call+0x70/0xa2
      > [   11.290338]  [<ffffffff80209f61>] ? cpu_idle+0x5f/0x7d
      > [   11.290723]  [<ffffffff8060224a>] ? start_secondary+0x14d/0x152
      > [   11.291010]
      > [   11.291287]
      > [   11.291654] Code:  Bad RIP value.
      > [   11.292041] RIP  [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.292380]  RSP <ffff88027f1f7f70>
      > [   11.292741] CR2: ffff8802ffffffff
      > [   11.310951] ---[ end trace 137c54d525305f1c ]---
      >
      > The problem is with the following sequence of events:
      >
      > - CPU A calls smp_call_function_mask() for CPU B with wait parameter
      > - CPU A sets up the call_function_data on the stack and does an rcu add to
      >   call_function_queue
      > - CPU A waits until the WAIT flag is cleared
      > - CPU B gets the call function interrupt and starts going through the
      >   call_function_queue
      > - CPU C also gets some other call function interrupt and starts going through
      >   the call_function_queue
      > - CPU C, which is also going through the call_function_queue, starts referencing
      >   CPU A's stack, as that element is still in call_function_queue
      > - CPU B finishes the function call that CPU A set up and as there are no other
      >   references to it, rcu deletes the call_function_data (which was from CPU A
      >   stack)
      > - CPU B sees the wait flag and just clears the flag (no call_rcu to free)
      > - CPU A which was waiting on the flag continues executing and the stack
      >   contents change
      >
      > - CPU C, which is still in the rcu_read section accessing CPU A's stack, sees
      >   inconsistent call_function_data and can try to execute a
      >   function with some random pointer, causing stack corruption for A
      >   (by clearing the bits in the mask field) and the oops.
      
      Nice debugging work.
      
      I'd suggest something like the attached (boot tested) patch as the simple
      fix for now.
      
      I expect the benefits from the less synchronized, multiple-in-flight-data
      global queue will still outweigh the costs of dynamic allocations. But
      if worst comes to worst then we just go back to a globally synchronous
      one-at-a-time implementation, but that would be pretty sad!
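      
      A hedged sketch of the resulting pattern (simplified; not the literal
      upstream fix, and on_stack_data is a made-up name): prefer heap data that
      can be freed via RCU once every CPU is done with it, and if we must fall
      back to stack data, do not return until no CPU can still be walking the
      queue element that lives in our stack frame.
      
       data = kmalloc(sizeof(*data), GFP_ATOMIC);
       if (data) {
               data->csd.flags = CSD_FLAG_ALLOC;       /* freed later via call_rcu() */
       } else {
               data = &on_stack_data;                  /* hypothetical fallback */
               data->csd.flags = CSD_FLAG_WAIT;        /* force synchronous completion */
               wait = 1;                               /* stack memory must outlive all readers */
       }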
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: fix mysql+oltp regression · 77ae6513
      Authored by Mike Galbraith
      Defer commit 6d299f1b to the next release.
      
      Testing of the tip/sched/clock tree revealed a mysql+oltp regression
      which bisection eventually traced back to this commit in mainline.
      
      Pertinent test results: three-run sysbench averages, throughput in
      read/write requests/sec.
      
      clients         1     2     4     8    16    32    64
      6e0534f2     9646 17876 34774 33868 32230 30767 29441
      2.6.26.1     9112 17936 34652 33383 31929 30665 29232
      6d299f1b     9112 14637 28370 33339 32038 30762 29204
      
      Note: subsequent commits hide the majority of this regression until you
      apply the clock fixes, at which time it reemerges at full magnitude.
      
      We cannot see anything bad about the change itself so we defer it to the
      next release until this problem is fully analysed.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Gregory Haskins <ghaskins@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: rename map_[acquire|release]() => lock_map_[acquire|release]() · 3295f0ef
      Authored by Ingo Molnar
      The names were too generic:
      
       drivers/uio/uio.c:87: error: expected identifier or '(' before 'do'
       drivers/uio/uio.c:87: error: expected identifier or '(' before 'while'
       drivers/uio/uio.c:113: error: 'map_release' undeclared here (not in a function)
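      
      For reference, a hedged sketch of what the renamed helpers look like and how
      they are typically used (the exact definitions in the tree may differ
      slightly):
      
       #define lock_map_acquire(l)    lock_acquire(l, 0, 0, 0, 2, NULL, _THIS_IP_)
       #define lock_map_release(l)    lock_release(l, 1, _THIS_IP_)
      
       lock_map_acquire(&work->lockdep_map);   /* callback "holds" the map */
       f(work);
       lock_map_release(&work->lockdep_map);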
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: handle chains involving classes defined in modules · 8bfe0298
      Authored by Rabin Vincent
      Solve this by marking the classes as unused and not printing information
      about the unused classes.
      Reported-by: Eric Sesterhenn <snakebyte@gmx.de>
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Acked-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: spin_lock_nest_lock() · b7d39aff
      Authored by Peter Zijlstra
      Expose the new lock protection lock.
      
      This can be used to annotate places where we take multiple locks of the
      same class and avoid deadlocks by always taking another (top-level) lock
      first.
      
      NOTE: we're still bound to the MAX_LOCK_DEPTH (48) limit.
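      
      A hedged usage sketch (outer_lock, obj and obj_list are made-up names): take
      the top-level protection lock first, then annotate each same-class inner
      lock with it so lockdep knows the ABBA case is excluded.
      
       /* Every path that takes more than one obj->lock must take outer_lock
        * first; the nest_lock annotation tells lockdep about that rule. */
       spin_lock(&outer_lock);                         /* top-level protection lock */
       list_for_each_entry(obj, &obj_list, node)
               spin_lock_nest_lock(&obj->lock, &outer_lock);
      
       /* ... operate on all objects ... */
      
       list_for_each_entry(obj, &obj_list, node)
               spin_unlock(&obj->lock);
       spin_unlock(&outer_lock);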
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: lock protection locks · 7531e2f3
      Authored by Peter Zijlstra
      On Fri, 2008-08-01 at 16:26 -0700, Linus Torvalds wrote:
      
      > On Fri, 1 Aug 2008, David Miller wrote:
      > >
      > > Taking more than a few locks of the same class at once is bad
      > > news and it's better to find an alternative method.
      >
      > It's not always wrong.
      >
      > If you can guarantee that anybody that takes more than one lock of a
      > particular class will always take a single top-level lock _first_, then
      > that's all good. You can obviously screw up and take the same lock _twice_
      > (which will deadlock), but at least you cannot get into ABBA situations.
      >
      > So maybe the right thing to do is to just teach lockdep about "lock
      > protection locks". That would have solved the multi-queue issues for
      > networking too - all the actual network drivers would still have taken
      > just their single queue lock, but the one case that needs to take all of
      > them would have taken a separate top-level lock first.
      >
      > Never mind that the multi-queue locks were always taken in the same order:
      > it's never wrong to just have some top-level serialization, and anybody
      > who needs to take <n> locks might as well do <n+1>, because they sure as
      > hell aren't going to be on _any_ fastpaths.
      >
      > So the simplest solution really sounds like just teaching lockdep about
      > that one special case. It's not "nesting" exactly, although it's obviously
      > related to it.
      
      Do as Linus suggested. The lock protection lock is called nest_lock.
      
      Note that we still have the MAX_LOCK_DEPTH (48) limit to consider, so
      anything that spills over that limit is still up shit creek.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: map_acquire · 4f3e7524
      Authored by Peter Zijlstra
      Most of the free-standing lock_acquire() usages look remarkably similar;
      sweep them into a new helper.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: shrink held_lock structure · f82b217e
      Authored by Dave Jones
      struct held_lock {
              u64                        prev_chain_key;       /*     0     8 */
              struct lock_class *        class;                /*     8     8 */
              long unsigned int          acquire_ip;           /*    16     8 */
              struct lockdep_map *       instance;             /*    24     8 */
              int                        irq_context;          /*    32     4 */
              int                        trylock;              /*    36     4 */
              int                        read;                 /*    40     4 */
              int                        check;                /*    44     4 */
              int                        hardirqs_off;         /*    48     4 */
      
              /* size: 56, cachelines: 1 */
              /* padding: 4 */
              /* last cacheline: 56 bytes */
      };
      
      struct held_lock {
              u64                        prev_chain_key;       /*     0     8 */
              long unsigned int          acquire_ip;           /*     8     8 */
              struct lockdep_map *       instance;             /*    16     8 */
              unsigned int               class_idx:11;         /*    24:21  4 */
              unsigned int               irq_context:2;        /*    24:19  4 */
              unsigned int               trylock:1;            /*    24:18  4 */
              unsigned int               read:2;               /*    24:16  4 */
              unsigned int               check:2;              /*    24:14  4 */
              unsigned int               hardirqs_off:1;       /*    24:13  4 */
      
              /* size: 32, cachelines: 1 */
              /* padding: 4 */
              /* bit_padding: 13 bits */
              /* last cacheline: 32 bytes */
      };
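      
      The saving comes from replacing the class pointer with an 11-bit index into
      the static lock_classes[] array; a hedged sketch of the lookup helper
      (possibly not the literal code):
      
       /* class_idx stores index+1 so that 0 can mean "no class". */
       static inline struct lock_class *hlock_class(struct held_lock *hlock)
       {
               if (!hlock->class_idx)
                       return NULL;
               return lock_classes + hlock->class_idx - 1;
       }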
      
      [mingo@elte.hu: shrunk hlock->class too]
      [peterz@infradead.org: fixup bit sizes]
      Signed-off-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    • lockdep: re-annotate scheduler runqueues · 1b12bbc7
      Authored by Peter Zijlstra
      Instead of using a per-rq lock class, use the regular nesting operations.
      
      However, take extra care with double_lock_balance() as it can release the
      already held rq->lock (and therefore change its nesting class).
      
      So what can happen is:
      
       spin_lock(rq->lock);	// this rq subclass 0
      
       double_lock_balance(rq, other_rq);
         // release rq
         // acquire other_rq->lock subclass 0
         // acquire rq->lock subclass 1
      
       spin_unlock(other_rq->lock);
      
      leaving you with rq->lock in subclass 1
      
      So a subsequent double_lock_balance() call can try to nest a subclass 1
      lock while already holding a subclass 1 lock.
      
      Fix this by introducing double_unlock_balance() which releases the other
      rq's lock, but also re-sets the subclass for this rq's lock to 0.
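      
      A hedged sketch of the new helper (close to, but not necessarily identical
      to, the upstream version):
      
       static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
               __releases(busiest->lock)
       {
               spin_unlock(&busiest->lock);
               /* rq->lock may have been re-acquired as subclass 1 inside
                * double_lock_balance(); put it back into subclass 0. */
               lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_);
       }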
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: lock_set_subclass - reset a held lock's subclass · 64aa348e
      Authored by Peter Zijlstra
      This can be used to reset a held lock's subclass, for arbitrary-depth
      iterated data structures such as trees or lists which have per-node
      locks.
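      
      A hedged sketch of the intended use (struct node and the list names are made
      up): hand-over-hand locking down a list takes the next node's lock as
      subclass 1 while the current one is still held, then resets it to subclass 0
      once the previous lock is dropped, so the nesting depth seen by lockdep
      stays bounded no matter how long the list is.
      
       struct node *prev, *next;
      
       prev = list_first_entry(&head, struct node, list);
       spin_lock(&prev->lock);
       next = prev;
       list_for_each_entry_continue(next, &head, list) {
               spin_lock_nested(&next->lock, SINGLE_DEPTH_NESTING);
               spin_unlock(&prev->lock);
               /* only next->lock is held now; drop it back to subclass 0 */
               lock_set_subclass(&next->lock.dep_map, 0, _THIS_IP_);
               prev = next;
       }
       spin_unlock(&prev->lock);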
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched_clock: delay using sched_clock() · c1955a3d
      Authored by Peter Zijlstra
      Some architectures can't handle sched_clock() being called too early;
      delay its use until sched_clock_init() has been called.
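      
      A hedged sketch of the guard (simplified; per_cpu_clock() is a stand-in for
      the real per-cpu clock path):
      
       static __read_mostly int sched_clock_running;
      
       void sched_clock_init(void)
       {
               /* ... set up per-cpu clock data ... */
               sched_clock_running = 1;
       }
      
       u64 sched_clock_cpu(int cpu)
       {
               if (unlikely(!sched_clock_running))
                       return 0ull;            /* too early: sched_clock() not safe yet */
      
               return per_cpu_clock(cpu);      /* hypothetical: the normal path */
       }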
      Reported-by: Bill Gatliff <bgat@billgatliff.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Nishanth Aravamudan <nacc@us.ibm.com>
      CC: Russell King - ARM Linux <linux@arm.linux.org.uk>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 07 Aug 2008 (1 commit)
  6. 06 Aug 2008 (5 commits)
  7. 05 Aug 2008 (1 commit)
  8. 04 Aug 2008 (1 commit)
  9. 02 Aug 2008 (6 commits)
  10. 01 Aug 2008 (4 commits)
    • kgdb: fix gdb serial thread queries · 25fc9999
      Authored by Jason Wessel
      The command "info threads" did not work correctly with kgdb.  It would
      result in a silent kernel hang if used.
      
      This patch addresses several problems.
       - Fix use of deprecated NR_CPUS
       - Fix kgdb to not walk linearly through the pid space
       - Correctly implement shadow pids
       - Change the threads per query to a #define
       - Fix kgdb_hex2long to work with negated values
      
      The threads 0 and -1 are reserved to represent the current task.  That
      means that CPU 0 will start with a shadow thread id of -2, and CPU 1
      will have a shadow thread id of -3, etc...
      
      From the debugger you can switch to a shadow thread to see what one of
      the other cpus was doing; however, it is not possible to execute run
      control operations on any other cpu except the cpu executing
      kgdb_handle_exception().
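      
      A small, hedged illustration of that shadow-id mapping (the helper name is
      made up):
      
       /* Threads 0 and -1 mean "current task", so per-cpu shadow thread ids
        * start at -2 and count downwards. */
       static inline int cpu_to_shadow_tid(int cpu)    /* hypothetical helper */
       {
               return -(cpu + 2);      /* cpu 0 -> -2, cpu 1 -> -3, ... */
       }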
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    • kgdb: fix kgdb_validate_break_address to perform a mem write · a9b60bf4
      Authored by Jason Wessel
      A regression in the kgdb core was found when using the
      CONFIG_DEBUG_RODATA kernel option.  When this option is on, a breakpoint
      cannot be written into any readonly memory page.  When an external
      debugger requests a breakpoint to be set,
      kgdb_validate_break_address() only checked that the address where the
      breakpoint would be placed was readable; it lacked a write check.
      
      This patch changes the validate routine to try reading (via the
      breakpoint set request) and also to try immediately writing the
      breakpoint.  If either fails, an error is correctly returned and the
      debugger behaves correctly.  The end user can then make the
      decision to use hardware breakpoints instead.
      
      Also update the documentation to reflect that using
      CONFIG_DEBUG_RODATA will inhibit the use of software breakpoints.
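      
      A hedged sketch of the strengthened check (not the literal patch): make sure
      the address is both readable and writable before accepting a software
      breakpoint there, so e.g. CONFIG_DEBUG_RODATA pages are rejected cleanly.
      
       int kgdb_validate_break_address(unsigned long addr)
       {
               char saved[BREAK_INSTR_SIZE];
      
               if (probe_kernel_read(saved, (char *)addr, BREAK_INSTR_SIZE))
                       return -EINVAL;                 /* not readable */
               if (probe_kernel_write((char *)addr, saved, BREAK_INSTR_SIZE))
                       return -EINVAL;                 /* read-only page: no sw breakpoint */
               return 0;
       }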
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
    • lockdep: change scheduler annotation · 5e710e37
      Authored by Peter Zijlstra
      While thinking about David's graph walk lockdep patch it _finally_
      dawned on me that there is no reason we have a lock class per cpu ...
      
      Sorry for being dense :-/
      
      The below changes the annotation from a lock class per cpu to a single
      nested lock, as the scheduler never holds more than 2 rq locks at a time
      anyway.
      
      If there was code requiring holding all rq locks this would not work and
      the original annotation would be the only option, but that not being the
      case, this is a much lighter one.
      
      Compiles and boots on a 2-way x86_64.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • lockdep: fix combinatorial explosion in lock subgraph traversal · 419ca3f1
      Authored by David Miller
      When we traverse the graph, either forwards or backwards, we
      are interested in whether a certain property exists somewhere
      in a node reachable in the graph.
      
      Therefore it is never necessary to traverse through a node more
      than once to get a correct answer to the given query.
      
      Take advantage of this property using a global ID counter so that we
      need not clear all the markers in all the lock_class entries before
      doing a traversal.  A new ID is chosen when we start to traverse, and
      we continue through a lock_class only if its ID hasn't been marked
      with the new value yet.
      
      This short-circuiting is essential especially for high CPU count
      systems.  The scheduler has a runqueue per cpu, and needs to take
      two runqueue locks at a time, which leads to long chains of
      backwards and forwards subgraphs from these runqueue lock nodes.
      Without the short-circuit implemented here, a graph traversal on
      a runqueue lock can take up to (1 << (N - 1)) checks on a system
      with N cpus.
      
      For anything more than 16 cpus or so, lockdep will eventually bring
      the machine to a complete standstill.
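      
      A hedged sketch of the marking trick (field and helper names are close to
      the idea, not necessarily the literal patch): bump a global counter once per
      traversal and stamp every visited class with it, so "already visited" checks
      need no per-traversal clearing pass.
      
       static unsigned long traversal_gen_id;          /* global generation counter */
      
       static int lockdep_visit(struct lock_class *class)
       {
               if (class->dep_gen_id == traversal_gen_id)
                       return 0;                       /* already seen in this walk */
               class->dep_gen_id = traversal_gen_id;
               return 1;                               /* first visit: keep traversing */
       }
      
       /* at the start of every forwards or backwards walk: traversal_gen_id++; */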
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 31 Jul 2008 (2 commits)