1. 30 8月, 2008 2 次提交
  2. 29 8月, 2008 1 次提交
  3. 28 8月, 2008 3 次提交
    • P
      sched: rt-bandwidth accounting fix · cc2991cf
      Peter Zijlstra 提交于
      It fixes an accounting bug where we would continue accumulating runtime
      even though the bandwidth control is disabled. This would lead to very long
      throttle periods once bandwidth control gets turned on again.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cc2991cf
    • J
      sched: fix sched_rt_rq_enqueue() resched idle · f3ade837
      John Blackwood 提交于
      When sysctl_sched_rt_runtime is set to something other than -1 and the
      CONFIG_RT_GROUP_SCHED kernel parameter is NOT enabled, we get into a state
      where we see one or more CPUs idling forvever even though there are
      real-time
      tasks in their rt runqueue that are able to run (no longer throttled).
      
      The sequence is:
      
      - A real-time task is running when the timer sets the rt runqueue
          to throttled, and the rt task is resched_task()ed and switched
          out, and idle is switched in since there are no non-rt tasks to
          run on that cpu.
      
      - Eventually the do_sched_rt_period_timer() runs and un-throttles
          the rt runqueue, but we just exit the timer interrupt and go back
          to executing the idle task in the idle loop forever.
      
      If we change the sched_rt_rq_enqueue() routine to use some of the code
      from the CONFIG_RT_GROUP_SCHED enabled version of this same routine and
      resched_task() the currently executing task (idle in our case) if it is
      a lower priority task than the higher rt task in the now un-throttled
      runqueue, the problem is no longer observed.
      Signed-off-by: NJohn Blackwood <john.blackwood@ccur.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f3ade837
    • S
      ftrace: disable tracing for suspend to ram · f42ac38c
      Steven Rostedt 提交于
      I've been painstakingly debugging the issue with suspend to ram and
      ftraced. The 2.6.28 code does not have this issue, but since the mcount
      recording is not going to be in 27, this must be solved for the ftrace
      daemon version.
      
      The resume from suspend to ram would reboot because it was triple
      faulting. Debugging further, I found that calling the mcount function
      itself was not an issue, but it would fault when it incremented
      preempt_count. preempt_count is on the tasks info structure that is on the
      low memory address of the task's stack.  For some reason, it could not
      write to it. Resuming out of suspend to ram does quite a lot of funny
      tricks to get to work, so it is not surprising at all that simply doing a
      preempt_disable() would cause a fault.
      
      Thanks to Rafael for suggesting to add a "while (1);" to find the place in
      resuming that is causing the fault. I would place the loop somewhere in
      the code, compile and reboot and see if it would either reboot (hit the
      fault) or simply hang (hit the loop).  Doing this over and over again, I
      narrowed it down that it was happening in enable_nonboot_cpus.
      
      At this point, I found that it is easier to simply disable tracing around
      the suspend code, instead of searching for the particular function that
      can not handle doing a preempt_disable.
      
      This patch disables the tracer as it suspends and reenables it on resume.
      
      I tested this patch on my Laptop, and it can resume fine with the patch.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f42ac38c
  4. 27 8月, 2008 2 次提交
    • S
      exit signals: use of uninitialized field notify_count · 2633f0e5
      Steve VanDeBogart 提交于
      task->signal->notify_count is only initialized if
      task->signal->group_exit_task is not NULL.  Reorder a conditional so
      that uninitialised memory is not used.  Found by Valgrind.
      Signed-off-by: NSteve VanDeBogart <vandebo-lkml@nerdbox.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2633f0e5
    • Z
      lockdep: fix invalid list_del_rcu in zap_class · 74870172
      Zhu Yi 提交于
      The problem is found during iwlagn driver testing on
      v2.6.27-rc4-176-gb8e6c91c kernel, but it turns out to be a lockdep bug.
      In our testing, we frequently load and unload the iwlagn driver
      (>50 times). Then the MAX_STACK_TRACE_ENTRIES is reached (expected
      behaviour?). The error message with the call trace is as below.
      
      BUG: MAX_STACK_TRACE_ENTRIES too low!
      turning off the locking correctness validator.
      Pid: 4895, comm: iwlagn Not tainted 2.6.27-rc4 #13
      
      Call Trace:
       [<ffffffff81014aa1>] save_stack_trace+0x22/0x3e
       [<ffffffff8105390a>] save_trace+0x8b/0x91
       [<ffffffff81054e60>] mark_lock+0x1b0/0x8fa
       [<ffffffff81056f71>] __lock_acquire+0x5b9/0x716
       [<ffffffffa00d818a>] ieee80211_sta_work+0x0/0x6ea [mac80211]
       [<ffffffff81057120>] lock_acquire+0x52/0x6b
       [<ffffffff81045f0e>] run_workqueue+0x97/0x1ed
       [<ffffffff81045f5e>] run_workqueue+0xe7/0x1ed
       [<ffffffff81045f0e>] run_workqueue+0x97/0x1ed
       [<ffffffff81046ae4>] worker_thread+0xd8/0xe3
       [<ffffffff81049503>] autoremove_wake_function+0x0/0x2e
       [<ffffffff81046a0c>] worker_thread+0x0/0xe3
       [<ffffffff810493ec>] kthread+0x47/0x73
       [<ffffffff8128e3ab>] trace_hardirqs_on_thunk+0x3a/0x3f
       [<ffffffff8100cea9>] child_rip+0xa/0x11
       [<ffffffff8100c4df>] restore_args+0x0/0x30
       [<ffffffff810316e1>] finish_task_switch+0x0/0xcc
       [<ffffffff810493a5>] kthread+0x0/0x73
       [<ffffffff8100ce9f>] child_rip+0x0/0x11
      
      Although the above is harmless, when the ilwagn module is removed
      later lockdep will trigger a kernel oops as below.
      
      BUG: unable to handle kernel NULL pointer dereference at
      0000000000000008
      IP: [<ffffffff810531e1>] zap_class+0x24/0x82
      PGD 73128067 PUD 7448c067 PMD 0
      Oops: 0002 [1] SMP
      CPU 0
      Modules linked in: rfcomm l2cap bluetooth autofs4 sunrpc
      nf_conntrack_ipv6 xt_state nf_conntrack xt_tcpudp ip6t_ipv6header
      ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 cpufreq_ondemand
      acpi_cpufreq dm_mirror dm_log dm_multipath dm_mod snd_hda_intel sr_mod
      snd_seq_dummy snd_seq_oss snd_seq_midi_event battery snd_seq
      snd_seq_device cdrom button snd_pcm_oss snd_mixer_oss snd_pcm
      snd_timer snd_page_alloc e1000e snd_hwdep sg iTCO_wdt
      iTCO_vendor_support ac pcspkr i2c_i801 i2c_core snd soundcore video
      output ata_piix ata_generic libata sd_mod scsi_mod ext3 jbd mbcache
      uhci_hcd ohci_hcd ehci_hcd [last unloaded: mac80211]
      Pid: 4941, comm: modprobe Not tainted 2.6.27-rc4 #10
      RIP: 0010:[<ffffffff810531e1>]  [<ffffffff810531e1>]
      zap_class+0x24/0x82
      RSP: 0000:ffff88007bcb3eb0  EFLAGS: 00010046
      RAX: 0000000000068ee8 RBX: ffffffff8192a0a0 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000001dfb RDI: ffffffff816e70b0
      RBP: ffffffffa00cd000 R08: ffffffff816818f8 R09: ffff88007c923558
      R10: ffffe20002ad2408 R11: ffffffff811028ec R12: ffffffff8192a0a0
      R13: 000000000002bd90 R14: 0000000000000000 R15: 0000000000000296
      FS:  00007f9d1cee56f0(0000) GS:ffffffff814a58c0(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000008 CR3: 0000000073047000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process modprobe (pid: 4941, threadinfo ffff88007bcb2000, task
      ffff8800758d1fc0)
      Stack:  ffffffff81057376 0000000000000000 ffffffffa00f7b00
      0000000000000000
       0000000000000080 0000000000618278 00007fff24f16720 0000000000000000
       ffffffff8105d37a ffffffffa00f7b00 ffffffff8105d591 313132303863616d
      Call Trace:
       [<ffffffff81057376>] ? lockdep_free_key_range+0x61/0xf5
       [<ffffffff8105d37a>] ? free_module+0xd4/0xe4
       [<ffffffff8105d591>] ? sys_delete_module+0x1de/0x1f9
       [<ffffffff8106dbfa>] ? audit_syscall_entry+0x12d/0x160
       [<ffffffff8100be2b>] ? system_call_fastpath+0x16/0x1b
      
      Code: b2 00 01 00 00 00 c3 31 f6 49 c7 c0 10 8a 61 81 eb 32 49 39 38
      75 26 48 98 48 6b c0 38 48 8b 90 08 8a 61 81 48 8b 88 00 8a 61 81 <48>
      89 51 08 48 89 0a 48 c7 80 08 8a 61 81 00 02 20 00 48 ff c6
      RIP  [<ffffffff810531e1>] zap_class+0x24/0x82
       RSP <ffff88007bcb3eb0>
      CR2: 0000000000000008
      ---[ end trace a1297e0c4abb0f2e ]---
      
      The root cause for this oops is in add_lock_to_list() when
      save_trace() fails due to MAX_STACK_TRACE_ENTRIES is reached,
      entry->class is assigned but entry is never added into any lock list.
      This makes the list_del_rcu() in zap_class() oops later when the
      module is unloaded. This patch fixes the problem by assigning
      entry->class after save_trace() returns success.
      Signed-off-by: NZhu Yi <yi.zhu@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      74870172
  5. 26 8月, 2008 4 次提交
    • J
      lockstat: repair erronous contention statistics · 04148b73
      Joe Korty 提交于
      Fix bad contention counting in /proc/lock_stat.
      
      /proc/lockstat tries to gather per-ip contention
      statistics per-lock.  This was failing due to
      a garbage per-ip index selector being used.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      04148b73
    • J
      lockstat: fix numerical output rounding error · 2189459d
      Joe Korty 提交于
      Fix rounding error in /proc/lock_stat numerical output.
      
      On occasion the two digit fractional part contains the three
      digit value '100'.  This is due to a bug in the rounding algorithm
      which pushes values in the range '95..99' to '100' rather than
      to '00' + an increment to the integer part.  For example,
      
      	- 123456.100      old display
      	+ 123457.00	  new display
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2189459d
    • H
      smp: have smp_call_function_single() detect invalid CPUs · f73be6de
      H. Peter Anvin 提交于
      Have smp_call_function_single() return invalid CPU indicies and return
      -ENXIO.  This function is already executed inside a
      get_cpu()..put_cpu() which locks out CPU removal, so rather than
      having the higher layers doing another layer of locking to guard
      against unplugged CPUs do the test here.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      f73be6de
    • L
      [module] Don't let gcc inline load_module() · ffb4ba76
      Linus Torvalds 提交于
      'load_module()' is a complex function that contains all the ELF section
      logic, and inlining it is utterly insane.  But gcc will do it, simply
      because there is only one call-site.  As a result, all the stack space
      that is allocated for all the work to load the module will still be
      active when we actually call the module init sequence, and the deep call
      chain makes stack overflows happen.
      
      And stack overflows are really hard to debug, because they not only
      corrupt random pages below the stack, but also corrupt the thread_info
      structure that is allocated under the stack.
      
      In this case, Alan Brunelle reported some crazy oopses at bootup, after
      loading the processor module that ends up doing complex ACPI stuff and
      has quite a deep callchain.  This should fix it, and is the sane thing
      to do regardless.
      
      Cc: Alan D. Brunelle <Alan.Brunelle@hp.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ffb4ba76
  6. 25 8月, 2008 1 次提交
    • P
      sched_clock: fix cpu_clock() · 354879bb
      Peter Zijlstra 提交于
      This patch fixes 3 issues:
      
      a) it removes the dependency on jiffies, because jiffies are incremented
         by a single CPU, and the tick is not synchronized between CPUs. Therefore
         relying on it to calculate a window to clip whacky TSC values doesn't work
         as it can drift around.
      
         So instead use [GTOD, GTOD+TICK_NSEC) as the window.
      
      b) __update_sched_clock() did (roughly speaking):
      
         delta = sched_clock() - scd->tick_raw;
         clock += delta;
      
         Which gives exponential growth, instead of linear.
      
      c) allows the sched_clock_cpu() value to warp the u64 without breaking.
      
      the results are more reliable sched_clock() deltas:
      
                 before       after   sched_clock
      
      cpu_clock: 15750        51312   51488
      cpu_clock: 59719        51052   50947
      cpu_clock: 15879        51249   51061
      cpu_clock: 1            50933   51198
      cpu_clock: 1            50931   51039
      cpu_clock: 1            51093   50981
      cpu_clock: 1            51043   51040
      cpu_clock: 1            50959   50938
      cpu_clock: 1            50981   51011
      cpu_clock: 1            51364   51212
      cpu_clock: 1            51219   51273
      cpu_clock: 1            51389   51048
      cpu_clock: 1            51285   51611
      cpu_clock: 1            50964   51137
      cpu_clock: 1            50973   50968
      cpu_clock: 1            50967   50972
      cpu_clock: 1            58910   58485
      cpu_clock: 1            51082   51025
      cpu_clock: 1            50957   50958
      cpu_clock: 1            50958   50957
      cpu_clock: 1006128      51128   50971
      cpu_clock: 1            51107   51155
      cpu_clock: 1            51371   51081
      cpu_clock: 1            51104   51365
      cpu_clock: 1            51363   51309
      cpu_clock: 1            51107   51160
      cpu_clock: 1            51139   51100
      cpu_clock: 1            51216   51136
      cpu_clock: 1            51207   51215
      cpu_clock: 1            51087   51263
      cpu_clock: 1            51249   51177
      cpu_clock: 1            51519   51412
      cpu_clock: 1            51416   51255
      cpu_clock: 1            51591   51594
      cpu_clock: 1            50966   51374
      cpu_clock: 1            50966   50966
      cpu_clock: 1            51291   50948
      cpu_clock: 1            50973   50867
      cpu_clock: 1            50970   50970
      cpu_clock: 998306       50970   50971
      cpu_clock: 1            50971   50970
      cpu_clock: 1            50970   50970
      cpu_clock: 1            50971   50971
      cpu_clock: 1            50970   50970
      cpu_clock: 1            51351   50970
      cpu_clock: 1            50970   51352
      cpu_clock: 1            50971   50970
      cpu_clock: 1            50970   50970
      cpu_clock: 1            51321   50971
      cpu_clock: 1            50974   51324
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      354879bb
  7. 24 8月, 2008 1 次提交
  8. 21 8月, 2008 4 次提交
  9. 20 8月, 2008 1 次提交
  10. 18 8月, 2008 1 次提交
  11. 16 8月, 2008 2 次提交
  12. 15 8月, 2008 8 次提交
  13. 14 8月, 2008 4 次提交
    • P
      sched: fix rt-bandwidth hotplug race · f1679d08
      Peter Zijlstra 提交于
      When we hot-unplug a cpu and rebuild the sched-domain, all cpus will be
      detatched. Alex observed the case where a runqueue was stealing bandwidth
      from an already disabled runqueue to satisfy its own needs.
      
      Stop this by skipping over already disabled runqueues.
      Reported-by: NAlex Nixon <alex.nixon@citrix.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: NAlex Nixon <alex.nixon@citrix.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f1679d08
    • D
      security: Fix setting of PF_SUPERPRIV by __capable() · 5cd9c58f
      David Howells 提交于
      Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags
      the target process if that is not the current process and it is trying to
      change its own flags in a different way at the same time.
      
      __capable() is using neither atomic ops nor locking to protect t->flags.  This
      patch removes __capable() and introduces has_capability() that doesn't set
      PF_SUPERPRIV on the process being queried.
      
      This patch further splits security_ptrace() in two:
      
       (1) security_ptrace_may_access().  This passes judgement on whether one
           process may access another only (PTRACE_MODE_ATTACH for ptrace() and
           PTRACE_MODE_READ for /proc), and takes a pointer to the child process.
           current is the parent.
      
       (2) security_ptrace_traceme().  This passes judgement on PTRACE_TRACEME only,
           and takes only a pointer to the parent process.  current is the child.
      
           In Smack and commoncap, this uses has_capability() to determine whether
           the parent will be permitted to use PTRACE_ATTACH if normal checks fail.
           This does not set PF_SUPERPRIV.
      
      Two of the instances of __capable() actually only act on current, and so have
      been changed to calls to capable().
      
      Of the places that were using __capable():
      
       (1) The OOM killer calls __capable() thrice when weighing the killability of a
           process.  All of these now use has_capability().
      
       (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see
           whether the parent was allowed to trace any process.  As mentioned above,
           these have been split.  For PTRACE_ATTACH and /proc, capable() is now
           used, and for PTRACE_TRACEME, has_capability() is used.
      
       (3) cap_safe_nice() only ever saw current, so now uses capable().
      
       (4) smack_setprocattr() rejected accesses to tasks other than current just
           after calling __capable(), so the order of these two tests have been
           switched and capable() is used instead.
      
       (5) In smack_file_send_sigiotask(), we need to allow privileged processes to
           receive SIGIO on files they're manipulating.
      
       (6) In smack_task_wait(), we let a process wait for a privileged process,
           whether or not the process doing the waiting is privileged.
      
      I've tested this with the LTP SELinux and syscalls testscripts.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Acked-by: NCasey Schaufler <casey@schaufler-ca.com>
      Acked-by: NAndrew G. Morgan <morgan@kernel.org>
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      5cd9c58f
    • Z
      sched: fix the race between walk_tg_tree and sched_create_group · 09f2724a
      Zhang, Yanmin 提交于
      With 2.6.27-rc3, I hit a kernel panic when running volanoMark on my
      new x86_64 machine. I also hit it with other 2.6.27-rc kernels.
      See below log.
      
      Basically, function walk_tg_tree and sched_create_group have a race
      between accessing and initiating tg->children. Below patch fixes it
      by moving tg->children initiation to the front of linking tg->siblings
      to parent->children.
      
      {----------------panic log------------}
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      IP: [<ffffffff802292ab>] walk_tg_tree+0x45/0x7f
      PGD 1be1c4067 PUD 1bdd8d067 PMD 0
      Oops: 0000 [1] SMP
      CPU 11
      Modules linked in: igb
      Pid: 22979, comm: java Not tainted 2.6.27-rc3 #1
      RIP: 0010:[<ffffffff802292ab>]  [<ffffffff802292ab>] walk_tg_tree+0x45/0x7f
      RSP: 0018:ffff8801bfbbbd18  EFLAGS: 00010083
      RAX: 0000000000000000 RBX: ffff8800be0dce40 RCX: ffffffffffffffc0
      RDX: ffff880102c43740 RSI: 0000000000000000 RDI: ffff8800be0dce40
      RBP: ffff8801bfbbbd48 R08: ffff8800ba437bc8 R09: 0000000000001f40
      R10: ffff8801be812100 R11: ffffffff805fdf44 R12: ffff880102c43740
      R13: 0000000000000000 R14: ffffffff8022cf0f R15: ffffffff8022749f
      FS:  00000000568ac950(0063) GS:ffff8801bfa26d00(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000000 CR3: 00000001bd848000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process java (pid: 22979, threadinfo ffff8801b145a000, task ffff8801bf18e450)
      Stack:  0000000000000001 ffff8800ba5c8d60 0000000000000001 0000000000000001
       ffff8800bad1ccb8 0000000000000000 ffff8801bfbbbd98 ffffffff8022ed37
       0000000000000001 0000000000000286 ffff8801bd5ee180 ffff8800ba437bc8
      Call Trace:
       <IRQ>  [<ffffffff8022ed37>] try_to_wake_up+0x71/0x24c
       [<ffffffff80247177>] autoremove_wake_function+0x9/0x2e
       [<ffffffff80228039>] ? __wake_up_common+0x46/0x76
       [<ffffffff802296d5>] __wake_up+0x38/0x4f
       [<ffffffff806169cc>] tcp_v4_rcv+0x380/0x62e
      Signed-off-by: NZhang Yanmin <yanmin_zhang@linux.intel.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      09f2724a
    • A
      lockdep: use WARN() in kernel/lockdep.c · 2df8b1d6
      Arjan van de Ven 提交于
      Use WARN() instead of a printk+WARN_ON() pair; this way the message
      becomes part of the warning section for better reporting/collection.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      2df8b1d6
  14. 13 8月, 2008 4 次提交
  15. 12 8月, 2008 2 次提交