1. 29 9月, 2008 1 次提交
    • B
      mm owner: fix race between swapoff and exit · 31a78f23
      Balbir Singh 提交于
      There's a race between mm->owner assignment and swapoff, more easily
      seen when task slab poisoning is turned on.  The condition occurs when
      try_to_unuse() runs in parallel with an exiting task.  A similar race
      can occur with callers of get_task_mm(), such as /proc/<pid>/<mmstats>
      or ptrace or page migration.
      
      CPU0                                    CPU1
                                              try_to_unuse
                                              looks at mm = task0->mm
                                              increments mm->mm_users
      task 0 exits
      mm->owner needs to be updated, but no
      new owner is found (mm_users > 1, but
      no other task has task->mm = task0->mm)
      mm_update_next_owner() leaves
                                              mmput(mm) decrements mm->mm_users
      task0 freed
                                              dereferencing mm->owner fails
      
      The fix is to notify the subsystem via mm_owner_changed callback(),
      if no new owner is found, by specifying the new task as NULL.
      
      Jiri Slaby:
      mm->owner was set to NULL prior to calling cgroup_mm_owner_callbacks(), but
      must be set after that, so as not to pass NULL as old owner causing oops.
      
      Daisuke Nishimura:
      mm_update_next_owner() may set mm->owner to NULL, but mem_cgroup_from_task()
      and its callers need to take account of this situation to avoid oops.
      
      Hugh Dickins:
      Lockdep warning and hang below exec_mmap() when testing these patches.
      exit_mm() up_reads mmap_sem before calling mm_update_next_owner(),
      so exec_mmap() now needs to do the same.  And with that repositioning,
      there's now no point in mm_need_new_owner() allowing for NULL mm.
      Reported-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: NJiri Slaby <jirislaby@gmail.com>
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31a78f23
  2. 26 9月, 2008 2 次提交
  3. 23 9月, 2008 7 次提交
    • J
      kexec: fix segmentation fault in kimage_add_entry · f9092f35
      Jonathan Steel 提交于
      A segmentation fault can occur in kimage_add_entry in kexec.c when loading
      a kernel image into memory.  The fault occurs because a page is requested
      by calling kimage_alloc_page with gfp_mask GFP_KERNEL and the function may
      actually return a page with gfp_mask GFP_HIGHUSER.  The high mem page is
      returned because it was swapped with the kernel page due to the kernel
      page being a page that will shortly be copied to.
      
      This patch ensures that kimage_alloc_page returns a page that was created
      with the correct gfp flags.
      
      I have verified the change and fixed the whitespace damage of the original
      patch.  Jonathan did a great job of tracking this down after he hit the
      problem.  -- Eric
      Signed-off-by: NJonathan Steel <jon.steel@esentire.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Acked-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f9092f35
    • I
      timers: fix build error in !oneshot case · f8e256c6
      Ingo Molnar 提交于
       kernel/time/tick-common.c: In function ‘tick_setup_periodic’:
       kernel/time/tick-common.c:113: error: implicit declaration of function ‘tick_broadcast_oneshot_active’
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f8e256c6
    • T
      clockevents: prevent mode mismatch on cpu online · 27ce4cb4
      Thomas Gleixner 提交于
      Impact: timer hang on CPU online observed on AMD C1E systems
      
      When a CPU is brought online then the broadcast machinery can
      be in the one shot state already. Check this and setup the timer 
      device of the new CPU in one shot mode so the broadcast code
      can pick up the next_event value correctly.
      
      Another AMD C1E oddity, as we switch to broadcast immediately and
      not after the full bring up via the ACPI cpu idle code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      27ce4cb4
    • T
      clockevents: check broadcast device not tick device · 30274569
      Thomas Gleixner 提交于
      Impact: Possible hang on CPU online observed on AMD C1E machines.
      
      The broadcast setup code looks at the mode of the tick device to
      determine whether it needs to be shut down or setup. This is wrong
      when the broadcast mode is set to one shot already. This can happen
      when a CPU is brought online as it goes through the periodic setup
      first.
      
      The problem went unnoticed as sane systems do not call into that code
      before the switch to one shot for the clock event device happens.
      The AMD C1E idle routine switches over immediately and thereby shuts
      down the just setup device before the first interrupt happens.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      30274569
    • T
      clockevents: prevent stale tick_next_period for onlining CPUs · 49d670fb
      Thomas Gleixner 提交于
      Impact: possible hang on CPU onlining in timer one shot mode.
      
      The tick_next_period variable is only used during boot on nohz/highres
      enabled systems, but for CPU onlining it needs to be maintained when
      the per cpu clock events device operates in one shot mode.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      49d670fb
    • T
      clockevents: prevent cpu online to interfere with nohz · 6441402b
      Thomas Gleixner 提交于
      Impact: rare hang which can be triggered on CPU online.
      
      tick_do_timer_cpu keeps track of the CPU which updates jiffies
      via do_timer. The value -1 is used to signal, that currently no
      CPU is doing this. There are two cases, where the variable can 
      have this state:
      
       boot:
          necessary for systems where the boot cpu id can be != 0
      
       nohz long idle sleep:
          When the CPU which did the jiffies update last goes into
          a long idle sleep it drops the update jiffies duty so
          another CPU which is not idle can pick it up and keep
          jiffies going.
      
      Using the same value for both situations is wrong, as the CPU online
      code can see the -1 state when the timer of the newly onlined CPU is
      setup. The setup for a newly onlined CPU goes through periodic mode
      and can pick up the do_timer duty without being aware of the nohz /
      highres mode of the already running system.
      
      Use two separate states and make them constants to avoid magic
      numbers confusion. 
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6441402b
    • R
      sched: fix init_hrtick() section mismatch warning · fa748203
      Rakib Mullick 提交于
      LD      kernel/built-in.o
      WARNING: kernel/built-in.o(.text+0x326): Section mismatch in reference
      from the function init_hrtick() to the variable
      .cpuinit.data:hotplug_hrtick_nb.8
      The function init_hrtick() references
      the variable __cpuinitdata hotplug_hrtick_nb.8.
      This is often because init_hrtick lacks a __cpuinitdata
      annotation or the annotation of hotplug_hrtick_nb.8 is wrong.
      Signed-off-by: NMd.Rakib H. Mullick <rakib.mullick@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fa748203
  4. 17 9月, 2008 1 次提交
    • T
      clockevents: make device shutdown robust · 2344abbc
      Thomas Gleixner 提交于
      The device shut down does not cleanup the next_event variable of the
      clock event device. So when the device is reactivated the possible
      stale next_event value can prevent the device to be reprogrammed as it
      claims to wait on a event already.
      
      This is the root cause of the resurfacing suspend/resume problem,
      where systems need key press to come back to life.
      
      Fix this by setting next_event to KTIME_MAX when the device is shut
      down. Use a separate function for shutdown which takes care of that
      and only keep the direct set mode call in the broadcast code, where we
      can not touch the next_event value.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      2344abbc
  5. 14 9月, 2008 1 次提交
  6. 11 9月, 2008 2 次提交
    • H
      sched: fix deadlock in setting scheduler parameter to zero · ec5d4989
      Hiroshi Shimamoto 提交于
      Andrei Gusev wrote:
      
      > I played witch scheduler settings. After doing something like:
      > echo -n 1000000 >sched_rt_period_us
      >
      > command is locked. I found in kernel.log:
      >
      > Sep 11 00:39:34 zaratustra
      > Sep 11 00:39:34 zaratustra Pid: 4495, comm: bash Tainted: G        W
      > (2.6.26.3 #12)
      > Sep 11 00:39:34 zaratustra EIP: 0060:[<c0213fc7>] EFLAGS: 00210246 CPU: 0
      > Sep 11 00:39:34 zaratustra EIP is at div64_u64+0x57/0x80
      > Sep 11 00:39:34 zaratustra EAX: 0000389f EBX: 00000000 ECX: 00000000
      > EDX: 00000000
      > Sep 11 00:39:34 zaratustra ESI: d9800000 EDI: d9800000 EBP: 0000389f
      > ESP: ea7a6edc
      > Sep 11 00:39:34 zaratustra DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
      > Sep 11 00:39:34 zaratustra Process bash (pid: 4495, ti=ea7a6000
      > task=ea744000 task.ti=ea7a6000)
      > Sep 11 00:39:34 zaratustra Stack: 00000000 000003e8 d9800000 0000389f
      > c0119042 00000000 00000000 00000001
      > Sep 11 00:39:34 zaratustra 00000000 00000000 ea7a6f54 00010000 00000000
      > c04d2e80 00000001 000e7ef0
      > Sep 11 00:39:34 zaratustra c01191a3 00000000 00000000 ea7a6fa0 00000001
      > ffffffff c04d2e80 ea5b2480
      > Sep 11 00:39:34 zaratustra Call Trace:
      > Sep 11 00:39:34 zaratustra [<c0119042>] __rt_schedulable+0x52/0x130
      > Sep 11 00:39:34 zaratustra [<c01191a3>] sched_rt_handler+0x83/0x120
      > Sep 11 00:39:34 zaratustra [<c01a76a6>] proc_sys_call_handler+0xb6/0xd0
      > Sep 11 00:39:34 zaratustra [<c01a76c0>] proc_sys_write+0x0/0x20
      > Sep 11 00:39:34 zaratustra [<c01a76d9>] proc_sys_write+0x19/0x20
      > Sep 11 00:39:34 zaratustra [<c016cc68>] vfs_write+0xa8/0x140
      > Sep 11 00:39:34 zaratustra [<c016cdd1>] sys_write+0x41/0x80
      > Sep 11 00:39:34 zaratustra [<c0103051>] sysenter_past_esp+0x6a/0x91
      > Sep 11 00:39:34 zaratustra =======================
      > Sep 11 00:39:34 zaratustra Code: c8 41 0f ad f3 d3 ee f6 c1 20 0f 45 de
      > 31 f6 0f ad ef d3 ed f6 c1 20 0f 45 fd 0f 45 ee 31 c9 39 eb 89 fe 89 ea
      > 77 08 89 e8 31 d2 <f7> f3 89 c1 89 f0 8b 7c 24 08 f7 f3 8b 74 24 04 89
      > ca 8b 1c 24
      > Sep 11 00:39:34 zaratustra EIP: [<c0213fc7>] div64_u64+0x57/0x80 SS:ESP
      > 0068:ea7a6edc
      > Sep 11 00:39:34 zaratustra ---[ end trace 4eaa2a86a8e2da22 ]---
      
      fix the boundary condition.
      
      sysctl_sched_rt_period=0 makes exception at to_ratio().
      Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ec5d4989
    • Z
      sched: fix 2.6.27-rc5 couldn't boot on tulsa machine randomly · baf25731
      Zhang, Yanmin 提交于
      On my tulsa x86-64 machine, kernel 2.6.25-rc5 couldn't boot randomly.
      
      Basically, function __enable_runtime forgets to reset rt_rq->rt_throttled
      to 0. When every cpu is up, per-cpu migration_thread is created and it runs
      very fast, sometimes to mark the corresponding rt_rq->rt_throttled to 1 very
      quickly. After all cpus are up, with below calling chain:
      
         sched_init_smp => arch_init_sched_domains => build_sched_domains => ...
      => cpu_attach_domain => rq_attach_root => set_rq_online => ...
      => _enable_runtime
      
      _enable_runtime is called against every rt_rq again, so rt_rq->rt_time is
      reset to 0, but rt_rq->rt_throttled might be still 1. Later on function
      do_sched_rt_period_timer couldn't reset it, and all RT tasks couldn't be
      scheduled to run on that cpu. here is RT task migration_thread which is
      woken up when a task is migrated to another cpu.
      
      Below patch fixes it against 2.6.27-rc5.
      Signed-off-by: NZhang Yanmin <yanmin_zhang@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      baf25731
  7. 10 9月, 2008 2 次提交
    • T
      clockevents: remove WARN_ON which was used to gather information · e75b986a
      Thomas Gleixner 提交于
      The issue of the endless reprogramming loop due to a too small
      min_delta_ns was fixed with the previous updates of the clock events
      code, but we had no information about the spread of this problem. I
      added a WARN_ON to get automated information via kerneloops.org and to
      get some direct reports, which allowed me to analyse the affected
      machines.
      
      The WARN_ON has served its purpose and would be annoying for a release
      kernel. Remove it and just keep the information about the increase of
      the min_delta_ns value.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      e75b986a
    • T
      clockevents: remove WARN_ON which was used to gather information · 61c22c34
      Thomas Gleixner 提交于
      The issue of the endless reprogramming loop due to a too small
      min_delta_ns was fixed with the previous updates of the clock events
      code, but we had no information about the spread of this problem. I
      added a WARN_ON to get automated information via kerneloops.org and to
      get some direct reports, which allowed me to analyse the affected
      machines.
      
      The WARN_ON has served its purpose and would be annoying for a release
      kernel. Remove it and just keep the information about the increase of
      the min_delta_ns value.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      61c22c34
  8. 07 9月, 2008 2 次提交
  9. 06 9月, 2008 4 次提交
    • M
      ntp: fix calculation of the next jiffie to trigger RTC sync · 4ff4b9e1
      Maciej W. Rozycki 提交于
      We have a bug in the calculation of the next jiffie to trigger the RTC
      synchronisation.  The aim here is to run sync_cmos_clock() as close as
      possible to the middle of a second.  Which means we want this function to
      be called less than or equal to half a jiffie away from when now.tv_nsec
      equals 5e8 (500000000).
      
      If this is not the case for a given call to the function, for this purpose
      instead of updating the RTC we calculate the offset in nanoseconds to the
      next point in time where now.tv_nsec will be equal 5e8.  The calculated
      offset is then converted to jiffies as these are the unit used by the
      timer.
      
      Hovewer timespec_to_jiffies() used here uses a ceil()-type rounding mode,
      where the resulting value is rounded up.  As a result the range of
      now.tv_nsec when the timer will trigger is from 5e8 to 5e8 + TICK_NSEC
      rather than the desired 5e8 - TICK_NSEC / 2 to 5e8 + TICK_NSEC / 2.
      
      As a result if for example sync_cmos_clock() happens to be called at the
      time when now.tv_nsec is between 5e8 + TICK_NSEC / 2 and 5e8 to 5e8 +
      TICK_NSEC, it will simply be rescheduled HZ jiffies later, falling in the
      same range of now.tv_nsec again.  Similarly for cases offsetted by an
      integer multiple of TICK_NSEC.
      
      This change addresses the problem by subtracting TICK_NSEC / 2 from the
      nanosecond offset to the next point in time where now.tv_nsec will be
      equal 5e8, effectively shifting the following rounding in
      timespec_to_jiffies() so that it produces a rounded-to-nearest result.
      Signed-off-by: NMaciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4ff4b9e1
    • T
      clockevents: broadcast fixup possible waiters · 7300711e
      Thomas Gleixner 提交于
      Until the C1E patches arrived there where no users of periodic broadcast
      before switching to oneshot mode. Now we need to trigger a possible
      waiter for a periodic broadcast when switching to oneshot mode.
      Otherwise we can starve them for ever.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      7300711e
    • B
      sched: fix process time monotonicity · 49048622
      Balbir Singh 提交于
      Spencer reported a problem where utime and stime were going negative despite
      the fixes in commit b27f03d4. The suspected
      reason for the problem is that signal_struct maintains it's own utime and
      stime (of exited tasks), these are not updated using the new task_utime()
      routine, hence sig->utime can go backwards and cause the same problem
      to occur (sig->utime, adds tsk->utime and not task_utime()). This patch
      fixes the problem
      
      TODO: using max(task->prev_utime, derived utime) works for now, but a more
      generic solution is to implement cputime_max() and use the cputime_gt()
      function for comparison.
      
      Reported-by: spencer@bluehost.com
      Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      49048622
    • P
      sched_clock: fix NOHZ interaction · 56c7426b
      Peter Zijlstra 提交于
      If HLT stops the TSC, we'll fail to account idle time, thereby inflating the
      actual process times. Fix this by re-calibrating the clock against GTOD when
      leaving nohz mode.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: NAvi Kivity <avi@qumranet.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      56c7426b
  10. 05 9月, 2008 6 次提交
    • T
      clockevents: prevent endless loop lockup · 1fb9b7d2
      Thomas Gleixner 提交于
      The C1E/HPET bug reports on AMDX2/RS690 systems where tracked down to a
      too small value of the HPET minumum delta for programming an event.
      
      The clockevents code needs to enforce an interrupt event on the clock event
      device in some cases. The enforcement code was stupid and naive, as it just
      added the minimum delta to the current time and tried to reprogram the device.
      When the minimum delta is too small, then this loops forever.
      
      Add a sanity check. Allow reprogramming to fail 3 times, then print a warning
      and double the minimum delta value to make sure, that this does not happen again.
      Use the same function for both tick-oneshot and tick-broadcast code.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1fb9b7d2
    • T
      clockevents: prevent multiple init/shutdown · 9c17bcda
      Thomas Gleixner 提交于
      While chasing the C1E/HPET bugreports I went through the clock events
      code inch by inch and found that the broadcast device can be initialized
      and shutdown multiple times. Multiple shutdowns are not critical, but
      useless waste of time. Multiple initializations are simply broken. Another
      CPU might have the device in use already after the first initialization and
      the second init could just render it unusable again.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9c17bcda
    • T
      clockevents: enforce reprogram in oneshot setup · 7205656a
      Thomas Gleixner 提交于
      In tick_oneshot_setup we program the device to the given next_event,
      but we do not check the return value. We need to make sure that the
      device is programmed enforced so the interrupt handler engine starts
      working. Split out the reprogramming function from tick_program_event()
      and call it with the device, which was handed in to tick_setup_oneshot().
      Set the force argument, so the devices is firing an interrupt.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7205656a
    • T
      clockevents: prevent endless loop in periodic broadcast handler · d4496b39
      Thomas Gleixner 提交于
      The reprogramming of the periodic broadcast handler was broken,
      when the first programming returned -ETIME. The clockevents code
      stores the new expiry value in the clock events device next_event field
      only when the programming time has not been elapsed yet. The loop in
      question calculates the new expiry value from the next_event value
      and therefor never increases.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d4496b39
    • V
      clockevents: prevent clockevent event_handler ending up handler_noop · 7c1e7689
      Venkatesh Pallipadi 提交于
      There is a ordering related problem with clockevents code, due to which
      clockevents_register_device() called after tickless/highres switch
      will not work. The new clockevent ends up with clockevents_handle_noop as
      event handler, resulting in no timer activity.
      
      The problematic path seems to be
      
      * old device already has hrtimer_interrupt as the event_handler
      * new clockevent device registers with a higher rating
      * tick_check_new_device() is called
        * clockevents_exchange_device() gets called
          * old->event_handler is set to clockevents_handle_noop
        * tick_setup_device() is called for the new device
          * which sets new->event_handler using the old->event_handler which is noop.
      
      Change the ordering so that new device inherits the proper handler.
      
      This does not have any issue in normal case as most likely all the clockevent
      devices are setup before the highres switch. But, can potentially be affecting
      some corner case where HPET force detect happens after the highres switch.
      This was a problem with HPET in MSI mode code that we have been experimenting
      with.
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7c1e7689
    • A
      forgotten refcount on sysctl root table · b380b0d4
      Al Viro 提交于
      We should've set refcount on the root sysctl table; otherwise we'll blow
      up the first time we get down to zero dynamically registered sysctl
      tables.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: NJames Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b380b0d4
  11. 03 9月, 2008 5 次提交
  12. 02 9月, 2008 1 次提交
  13. 30 8月, 2008 2 次提交
  14. 29 8月, 2008 1 次提交
  15. 28 8月, 2008 3 次提交
    • P
      sched: rt-bandwidth accounting fix · cc2991cf
      Peter Zijlstra 提交于
      It fixes an accounting bug where we would continue accumulating runtime
      even though the bandwidth control is disabled. This would lead to very long
      throttle periods once bandwidth control gets turned on again.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cc2991cf
    • J
      sched: fix sched_rt_rq_enqueue() resched idle · f3ade837
      John Blackwood 提交于
      When sysctl_sched_rt_runtime is set to something other than -1 and the
      CONFIG_RT_GROUP_SCHED kernel parameter is NOT enabled, we get into a state
      where we see one or more CPUs idling forvever even though there are
      real-time
      tasks in their rt runqueue that are able to run (no longer throttled).
      
      The sequence is:
      
      - A real-time task is running when the timer sets the rt runqueue
          to throttled, and the rt task is resched_task()ed and switched
          out, and idle is switched in since there are no non-rt tasks to
          run on that cpu.
      
      - Eventually the do_sched_rt_period_timer() runs and un-throttles
          the rt runqueue, but we just exit the timer interrupt and go back
          to executing the idle task in the idle loop forever.
      
      If we change the sched_rt_rq_enqueue() routine to use some of the code
      from the CONFIG_RT_GROUP_SCHED enabled version of this same routine and
      resched_task() the currently executing task (idle in our case) if it is
      a lower priority task than the higher rt task in the now un-throttled
      runqueue, the problem is no longer observed.
      Signed-off-by: NJohn Blackwood <john.blackwood@ccur.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f3ade837
    • S
      ftrace: disable tracing for suspend to ram · f42ac38c
      Steven Rostedt 提交于
      I've been painstakingly debugging the issue with suspend to ram and
      ftraced. The 2.6.28 code does not have this issue, but since the mcount
      recording is not going to be in 27, this must be solved for the ftrace
      daemon version.
      
      The resume from suspend to ram would reboot because it was triple
      faulting. Debugging further, I found that calling the mcount function
      itself was not an issue, but it would fault when it incremented
      preempt_count. preempt_count is on the tasks info structure that is on the
      low memory address of the task's stack.  For some reason, it could not
      write to it. Resuming out of suspend to ram does quite a lot of funny
      tricks to get to work, so it is not surprising at all that simply doing a
      preempt_disable() would cause a fault.
      
      Thanks to Rafael for suggesting to add a "while (1);" to find the place in
      resuming that is causing the fault. I would place the loop somewhere in
      the code, compile and reboot and see if it would either reboot (hit the
      fault) or simply hang (hit the loop).  Doing this over and over again, I
      narrowed it down that it was happening in enable_nonboot_cpus.
      
      At this point, I found that it is easier to simply disable tracing around
      the suspend code, instead of searching for the particular function that
      can not handle doing a preempt_disable.
      
      This patch disables the tracer as it suspends and reenables it on resume.
      
      I tested this patch on my Laptop, and it can resume fine with the patch.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f42ac38c