1. 14 September 2008, 1 commit
    • timers: fix itimer/many thread hang · f06febc9
      Committed by Frank Mayhar
      Overview
      
      This patch reworks the handling of POSIX CPU timers, including the
      ITIMER_PROF, ITIMER_VIRT timers and rlimit handling.  It was put together
      with the help of Roland McGrath, the owner and original writer of this code.
      
      The problem we ran into, and the reason for this rework, has to do with using
      a profiling timer in a process with a large number of threads.  It appears
      that the performance of the old implementation of run_posix_cpu_timers() was
      at least O(n^3) (where "n" is the number of threads in a process) or worse.
      Everything is fine with an increasing number of threads until the time taken
      for that routine to run becomes the same as or greater than the tick time, at
      which point things degrade rather quickly.
      
      This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF."
      
      Code Changes
      
      This rework corrects the implementation of run_posix_cpu_timers() to make it
      run in constant time for a particular machine.  (Performance may vary between
      one machine and another depending upon whether the kernel is built as single-
      or multiprocessor and, in the latter case, depending upon the number of
      running processors.)  To do this, at each tick we now update fields in
      signal_struct as well as task_struct.  The run_posix_cpu_timers() function
      uses those fields to make its decisions.
      
      We define a new structure, "task_cputime," to contain user, system and
      scheduler times and use these in appropriate places:
      
      struct task_cputime {
      	cputime_t utime;
      	cputime_t stime;
      	unsigned long long sum_exec_runtime;
      };
      
      This is included in the structure "thread_group_cputime," which is a new
      substructure of signal_struct and which varies for uniprocessor versus
      multiprocessor kernels.  For uniprocessor kernels, it uses "task_cputime" as
      a simple substructure, while for multiprocessor kernels it is a pointer:
      
      struct thread_group_cputime {
      	struct task_cputime totals;
      };
      
      struct thread_group_cputime {
      	struct task_cputime *totals;
      };
      
      We also add a new task_cputime substructure directly to signal_struct, to
      cache the earliest expiration of process-wide timers, and task_cputime also
      replaces the it_*_expires fields of task_struct (used for earliest expiration
      of thread timers).  The "thread_group_cputime" structure contains process-wide
      timers that are updated via account_user_time() and friends.  In the non-SMP
      case the structure is a simple aggregator; unfortunately in the SMP case that
      simplicity was not achievable due to cache-line contention between CPUs (in
      one measured case performance was actually _worse_ on a 16-cpu system than
      the same test on a 4-cpu system, due to this contention).  For SMP, the
      thread_group_cputime counters are maintained as a per-cpu structure allocated
      using alloc_percpu().  The timer functions update only the timer field in
      the structure corresponding to the running CPU, obtained using per_cpu_ptr().
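
      As a concrete illustration of that per-tick update, here is a minimal sketch
      of one of the accounting helpers.  It assumes the signal_struct member
      holding the totals is named "cputime" and that get_cpu()/put_cpu() framing
      is acceptable here; those details are assumptions, so treat this as a
      sketch rather than the literal patch:

      static inline void account_group_user_time(struct task_struct *tsk,
      					   cputime_t cputime)
      {
      	struct signal_struct *sig = tsk->signal;
      	struct task_cputime *times;

      	/* Nothing to do if the per-cpu totals were never allocated. */
      	if (unlikely(!sig) || !sig->cputime.totals)
      		return;

      	/*
      	 * Touch only the running CPU's slot, so ticks on different
      	 * CPUs never contend for the same cache line.
      	 */
      	times = per_cpu_ptr(sig->cputime.totals, get_cpu());
      	times->utime = cputime_add(times->utime, cputime);
      	put_cpu();
      }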
      
      We define a set of inline functions in sched.h that we use to maintain the
      thread_group_cputime structure and hide the differences between UP and SMP
      implementations from the rest of the kernel.  The thread_group_cputime_init()
      function initializes the thread_group_cputime structure for the given task.
      thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the
      out-of-line function thread_group_cputime_alloc_smp() to allocate and fill
      in the per-cpu structures and fields.  The thread_group_cputime_free()
      function, also a no-op for UP, frees the per-cpu structures for SMP.  The
      thread_group_cputime_clone_thread() function (also a UP no-op) calls
      thread_group_cputime_alloc() for SMP if the per-cpu structures haven't yet been
      allocated.  The thread_group_cputime() function fills the task_cputime
      structure it is passed with the contents of the thread_group_cputime fields;
      in UP it's that simple, but in SMP it must also safely check that tsk->signal
      is non-NULL (if it is not, it just uses the appropriate fields of task_struct)
      and, if it is, sum the per-cpu values for each online CPU.  Finally, the three
      functions account_group_user_time(), account_group_system_time() and
      account_group_exec_runtime() are used by timer functions to update the
      respective fields of the thread_group_cputime structure.
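
      A sketch of the SMP flavor of thread_group_cputime() along those lines
      (the field names follow the description above but are not guaranteed to
      match the patch exactly; treat it as illustrative rather than the exact
      code):

      void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
      {
      	struct signal_struct *sig = tsk->signal;
      	struct task_cputime *tot;
      	int i;

      	/*
      	 * Fall back to the task's own fields if ->signal is gone or the
      	 * per-cpu totals were never allocated.
      	 */
      	if (unlikely(!sig) || !sig->cputime.totals) {
      		times->utime = tsk->utime;
      		times->stime = tsk->stime;
      		times->sum_exec_runtime = tsk->se.sum_exec_runtime;
      		return;
      	}

      	times->utime = times->stime = cputime_zero;
      	times->sum_exec_runtime = 0;

      	/* Sum every online CPU's slot into the caller's snapshot. */
      	for_each_online_cpu(i) {
      		tot = per_cpu_ptr(sig->cputime.totals, i);
      		times->utime = cputime_add(times->utime, tot->utime);
      		times->stime = cputime_add(times->stime, tot->stime);
      		times->sum_exec_runtime += tot->sum_exec_runtime;
      	}
      }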
      
      Non-SMP operation is trivial and will not be mentioned further.
      
      The per-cpu structure is always allocated when a task creates its first new
      thread, via a call to thread_group_cputime_clone_thread() from copy_signal().
      It is freed at process exit via a call to thread_group_cputime_free() from
      cleanup_signal().
      
      All functions that formerly summed utime/stime/sum_sched_runtime values from
      all threads in the thread group now use thread_group_cputime() to
      snapshot the values in the thread_group_cputime structure or the values in
      the task structure itself if the per-cpu structure hasn't been allocated.
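
      A hypothetical before-and-after of one such conversion (the surrounding
      context and variable names are invented for illustration; this is not a
      hunk from the patch):

      	/* before: walk every thread in the group to total user time */
      	struct task_struct *t = p;
      	cputime_t utime = p->signal->utime;
      	do {
      		utime = cputime_add(utime, t->utime);
      		t = next_thread(t);
      	} while (t != p);

      	/* after: one snapshot of the shared totals */
      	struct task_cputime cputime;
      	thread_group_cputime(p, &cputime);
      	utime = cputime.utime;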
      
      Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit.
      The run_posix_cpu_timers() function has been split into a fast path and a
      slow path; the former safely checks whether there are any expired thread
      timers and, if not, just returns, while the slow path does the heavy lifting.
      With the dedicated thread group fields, timers are no longer "rebalanced" and
      the process_timer_rebalance() function and related code have gone away.  All
      summing loops are gone and all code that used them now uses the
      thread_group_cputime() inline.  When process-wide timers are set, the new
      task_cputime structure in signal_struct is used to cache the earliest
      expiration; this is checked in the fast path.
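
      The fast-path decision reduces to comparing a fresh task_cputime sample
      against the cached expirations; a sketch of such a check, using the
      task_cputime structure shown above (illustrative only, not the exact
      helper from the patch):

      static int task_cputime_expired(const struct task_cputime *sample,
      				const struct task_cputime *expires)
      {
      	/* A zero expiration means no timer is armed for that clock. */
      	if (!cputime_eq(expires->utime, cputime_zero) &&
      	    cputime_ge(sample->utime, expires->utime))
      		return 1;
      	if (!cputime_eq(expires->stime, cputime_zero) &&
      	    cputime_ge(cputime_add(sample->utime, sample->stime),
      		       expires->stime))
      		return 1;
      	if (expires->sum_exec_runtime != 0 &&
      	    sample->sum_exec_runtime >= expires->sum_exec_runtime)
      		return 1;
      	return 0;
      }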
      
      Performance
      
      The fix appears not to add significant overhead to existing operations.  It
      generally performs the same as the current code except in two cases, one in
      which it performs slightly worse (Case 5 below) and one in which it performs
      very significantly better (Case 2 below).  Overall it's a wash except in those
      two cases.
      
      I've since done somewhat more involved testing on a dual-core Opteron system.
      
      Case 1: With no itimer running, for a test with 100,000 threads, the fixed
      	kernel took 1428.5 seconds, 513 seconds more than the unfixed system,
      	all of which was spent in the system.  There were twice as many
      	voluntary context switches with the fix as without it.
      
      Case 2: With an itimer running at .01 second ticks and 4000 threads (the most
      	an unmodified kernel can handle), the fixed kernel ran the test in
      	eight percent of the time (5.8 seconds as opposed to 70 seconds) and
      	had better tick accuracy (.012 seconds per tick as opposed to .023
      	seconds per tick).
      
      Case 3: A 4000-thread test with an initial timer tick of .01 second and an
      	interval of 10,000 seconds (i.e. a timer that ticks only once) had
      	very nearly the same performance in both cases:  6.3 seconds elapsed
      	for the fixed kernel versus 5.5 seconds for the unfixed kernel.
      
      With fewer threads (eight in these tests), the Case 1 test ran in essentially
      the same time on both the modified and unmodified kernels (5.2 seconds versus
      5.8 seconds).  The Case 2 test ran in about the same time as well, 5.9 seconds
      versus 5.4 seconds but again with much better tick accuracy, .013 seconds per
      tick versus .025 seconds per tick for the unmodified kernel.
      
      Since the fix affected the rlimit code, I also tested soft and hard CPU limits.
      
      Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer
      	running), the modified kernel was very slightly favored in that while
      	it killed the process in 19.997 seconds of CPU time (5.002 seconds of
      	wall time), only .003 seconds of that was system time, the rest was
      	user time.  The unmodified kernel killed the process in 20.001 seconds
      	of CPU (5.014 seconds of wall time) of which .016 seconds was system
      	time.  Really, though, the results were too close to call.  The results
      	were essentially the same with no itimer running.
      
      Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds
      	(where the hard limit would never be reached) and an itimer running,
      	the modified kernel exhibited worse tick accuracy than the unmodified
      	kernel: .050 seconds/tick versus .028 seconds/tick.  Otherwise,
      	performance was almost indistinguishable.  With no itimer running this
      	test exhibited virtually identical behavior and times in both cases.
      
      In times past I did some limited performance testing.  Those results are below.
      
      On a four-cpu Opteron system without this fix, a sixteen-thread test executed
      in 3569.991 seconds, of which user was 3568.435s and system was 1.556s.  On
      the same system with the fix, user and elapsed time were about the same, but
      system time dropped to 0.007 seconds.  Performance with eight, four and one
      thread were comparable.  Interestingly, the timer ticks with the fix seemed
      more accurate:  The sixteen-thread test with the fix received 149543 ticks
      for 0.024 seconds per tick, while the same test without the fix received 58720 ticks
      for 0.061 seconds per tick.  Both cases were configured for an interval of
      0.01 seconds.  Again, the other tests were comparable.  Each thread in this
      test computed the primes up to 25,000,000.
      
      I also did a test with a large number of threads, 100,000 threads, which is
      impossible without the fix.  In this case each thread computed the primes only
      up to 10,000 (to make the runtime manageable).  System time dominated, at
      1546.968 seconds out of a total 2176.906 seconds (giving a user time of
      629.938s).  It received 147651 ticks for 0.015 seconds per tick, still quite
      accurate.  There is obviously no comparable test without the fix.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f06febc9
  2. 25 May 2008, 1 commit
  3. 01 May 2008, 1 commit
    • remove div_long_long_rem · f8bd2258
      Committed by Roman Zippel
      x86 is currently the only arch that provides an optimized div_long_long_rem,
      and it has the downside that one has to be very careful that the divide
      doesn't overflow.
      
      The API is a little awkward, as the arguments for the unsigned divide are
      signed.  The signed version also doesn't handle a negative divisor and
      produces worse code on 64-bit archs.
      
      There is little incentive to keep this API alive, so this converts the few
      users to the new API.
      Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f8bd2258
  4. 17 April 2008, 1 commit
  5. 09 February 2008, 1 commit
  6. 26 January 2008, 2 commits
  7. 20 October 2007, 1 commit
  8. 10 July 2007, 1 commit
  9. 09 May 2007, 1 commit
    • Introduce a handy list_first_entry macro · b5e61818
      Committed by Pavel Emelianov
      There are many places in the kernel where a construction like
      
         foo = list_entry(head->next, struct foo_struct, list);
      
      is used.
      The code might look more descriptive and neat using the macro
      
         list_first_entry(head, type, member) \
                   list_entry((head)->next, type, member)
      
      Here is the macro itself along with examples of its usage in the generic code.
      If it turns out to be useful, I can prepare a set of patches to
      inject it into arch-specific code, drivers, networking, etc.
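
      For example, a hypothetical lookup helper (the struct and function names
      below are made up for illustration, not taken from the patch) reads a
      little better after the conversion:

      struct foo_struct {
      	int data;
      	struct list_head list;
      };

      static struct foo_struct *first_foo(struct list_head *head)
      {
      	/* was: return list_entry(head->next, struct foo_struct, list); */
      	return list_first_entry(head, struct foo_struct, list);
      }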
      Signed-off-by: Pavel Emelianov <xemul@openvz.org>
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b5e61818
  10. 17 February 2007, 1 commit
  11. 17 October 2006, 1 commit
    • [PATCH] posix-cpu-timers: prevent signal delivery starvation · ac08c264
      Committed by Thomas Gleixner
      The integer divisions in the timer accounting code can round the result
      down to 0.  Adding 0 is without effect and the signal delivery stops.
      
      Clamp the division result to a minimum of 1 to avoid this.
      
      The problem was reported by Seongbae Park <spark@google.com>, who also
      provided an initial patch.
      
      Roland sayeth:
      
        I have had some more time to think about the problem, and to reproduce it
        using Toyo's test case.  For the record, if my understanding of the problem
        is correct, this happens only in one very particular case.  First, the
        expiry time has to be so soon that in cputime_t units (usually 1s/HZ ticks)
        it's < nthreads so the division yields zero.  Second, it only affects each
        thread that is so new that its CPU time accumulation is zero so now+0 is
        still zero and ->it_*_expires winds up staying zero.  For the VIRT and PROF
        clocks when cputime_t is tick granularity (or the SCHED clock on
        configurations where sched_clock's value only advances on clock ticks), this
        is not hard to arrange with new threads starting up and blocking before they
        accumulate a whole tick of CPU time.  That's what happens in Toyo's test
        case.
      
        Note that in general it is fine for that division to round down to zero,
        and set each thread's expiry time to its "now" time.  The problem only
        arises with threads whose "now" value is still zero, so that now+0 winds up
        0 and is interpreted as "not set" instead of ">= now".  So it would be a
        sufficient and more precise fix to just use max(ticks, 1) inside the loop
        when setting each it_*_expires value.
      
        But, it does no harm to round the division up to one and always advance
        every thread's expiry time.  If the thread didn't already fire timers for
        the expiry time of "now", there is no expectation that it will do so before
        the next tick anyway.  So I followed Thomas's patch in lifting the max out
        of the loops.
      
        This patch also covers the reload cases, which are harder to write a test
        for (and I didn't try).  I've tested it with Toyo's case and it fixes that.
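
      The clamp itself is simple; a standalone model of the idea in plain C
      (illustrative only, not the kernel code):

      static unsigned long per_thread_share(unsigned long remaining_ticks,
      				      unsigned long nthreads)
      {
      	unsigned long share = remaining_ticks / nthreads;

      	/*
      	 * Never hand a thread a zero increment; "now + 0" would be
      	 * misread as "timer not set" for a brand-new thread.
      	 */
      	return share ? share : 1;
      }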
      
      [toyoa@mvista.com: fix: min_t -> max_t]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Cc: Daniel Walker <dwalker@mvista.com>
      Cc: Toyo Abe <toyoa@mvista.com>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Seongbae Park <spark@google.com>
      Cc: Peter Mattis <pmattis@google.com>
      Cc: Rohit Seth <rohitseth@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ac08c264
  12. 30 September 2006, 2 commits
    • [PATCH] posix-timers: Fix the flags handling in posix_cpu_nsleep() · e4b76555
      Committed by Toyo Abe
      When a posix_cpu_nsleep() sleep is interrupted by a signal more than twice, it
      incorrectly reports the sleep time remaining to the user, because
      posix_cpu_nsleep() doesn't report back to the user when it's called from the
      restart function, due to incorrect flags handling.
      
      This patch, which applies after the previous one, moves the nanosleep() function
      from posix_cpu_nsleep() to do_cpu_nanosleep() and cleans up the flags handling
      appropriately.
      Signed-off-by: Toyo Abe <toyoa@mvista.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e4b76555
    • [PATCH] posix-timers: Fix clock_nanosleep() doesn't return the remaining time in compatibility mode · 1711ef38
      Committed by Toyo Abe
      The clock_nanosleep() function does not return the time remaining when the
      sleep is interrupted by a signal.
      
      This patch creates a new call out, compat_clock_nanosleep_restart(), which
      handles returning the remaining time after a sleep is interrupted.  It also
      revives clock_nanosleep_restart(), which is now accessed via the new call
      out; compat_clock_nanosleep_restart() is used for compatibility access.
      
      Since this is implemented in compatibility mode the normal path is
      virtually unaffected - no real performance impact.
      Signed-off-by: Toyo Abe <toyoa@mvista.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1711ef38
  13. 18 June 2006, 3 commits
    • [PATCH] arm_timer: remove a racy and obsolete PF_EXITING check · f53ae1dc
      Committed by Oleg Nesterov
      arm_timer() checks PF_EXITING to prevent BUG_ON(->exit_state)
      in run_posix_cpu_timers().
      
      However, for some reason it does so only for the CPUCLOCK_PERTHREAD
      case (which is imho wrong).
      
      Also, this check is not reliable: PF_EXITING could be set on
      another cpu, without any locks/barriers, just after the check,
      so it can't prevent the timer from being attached to the exiting
      task.
      
      The previous patch makes this check unneeded.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f53ae1dc
    • [PATCH] run_posix_cpu_timers: remove a bogus BUG_ON() · 30f1e3dd
      Committed by Oleg Nesterov
      do_exit() clears ->it_##clock##_expires, but nothing prevents
      another cpu from attaching the timer to the exiting process after that.
      arm_timer() tries to protect against this race, but the check
      is racy.
      
      After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
      before do_exit() calls schedule(), a local timer interrupt can find
      tsk->exit_state != 0.  If that state was EXIT_DEAD (or another cpu
      does sys_wait4) the interrupted task has ->signal == NULL.
      
      At this moment the exiting task has no pending cpu timers; they were
      cleaned up in __exit_signal()->posix_cpu_timers_exit{,_group}(),
      so we can just return from the irq.
      
      John Stultz recently confirmed this bug, see
      
      	http://marc.theaimsgroup.com/?l=linux-kernel&m=115015841413687
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      30f1e3dd
    • [PATCH] check_process_timers: fix possible lockup · 8f17fc20
      Committed by Oleg Nesterov
      If the local timer interrupt happens just after do_exit() sets PF_EXITING
      (and before it clears ->it_xxx_expires), run_posix_cpu_timers() will call
      check_process_timers() with tasklist_lock + ->siglock held and
      
      	check_process_timers:
      
      		t = tsk;
      		do {
      			....
      
      			do {
      				t = next_thread(t);
      			} while (unlikely(t->flags & PF_EXITING));
      		} while (t != tsk);
      
      the outer loop will never stop.
      
      Actually, the window is bigger.  Another process can attach the timer
      after ->it_xxx_expires was cleared (see the next commit) and the 'if
      (PF_EXITING)' check in arm_timer() is racy (see the one after that).
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      8f17fc20
  14. 11 January 2006, 2 commits
  15. 07 January 2006, 1 commit
    • [PATCH] Fix posix-cpu-timers sched_time accumulation · 0aec63e6
      Committed by David S. Miller
      I've spent the past 3 days digging into a glibc testsuite failure in
      current CVS, specifically libc/rt/tst-cputimer1.c.  The thr1 and thr2
      timers fire too early in the second pass of this test.  The second
      pass is noteworthy because it makes use of intervals, whereas the
      first pass does not.
      
      All throughout the posix-cpu-timers.c code, the calculation of the
      process sched_time sum is implemented roughly as:
      
      	unsigned long long sum;
      
      	sum = tsk->signal->sched_time;
      	t = tsk;
      	do {
      		sum += t->sched_time;
      		t = next_thread(t);
      	} while (t != tsk);
      
      In fact this is the exact scheme used by check_process_timers().
      
      In the case of check_process_timers(), current->sched_time has just
      been updated (via scheduler_tick(), which is invoked by
      update_process_times(), which subsequently invokes
      run_posix_cpu_timers()) So there is no special processing necessary
      wrt. that.
      
      In other contexts, we have to allow for the fact that tsk->sched_time
      might be a bit out of date if we are current.  And the
      posix-cpu-timers.c code uses current_sched_time() to deal with that.
      
      Unfortunately it does so in an erroneous and inconsistent manner in
      one spot which is what results in the early timer firing.
      
      In cpu_clock_sample_group_locked(), it does this:
      
      		cpu->sched = p->signal->sched_time;
      		/* Add in each other live thread.  */
      		while ((t = next_thread(t)) != p) {
      			cpu->sched += t->sched_time;
      		}
      		if (p->tgid == current->tgid) {
      			/*
      			 * We're sampling ourselves, so include the
      			 * cycles not yet banked.  We still omit
      			 * other threads running on other CPUs,
      			 * so the total can always be behind as
      			 * much as max(nthreads-1,ncpus) * (NSEC_PER_SEC/HZ).
      			 */
      			cpu->sched += current_sched_time(current);
      		} else {
      			cpu->sched += p->sched_time;
      		}
      
      The problem is the "p->tgid == current->tgid" test.  If "p" is
      not current, and the tgids are the same, we will add current's
      sched_time twice into cpu->sched and omit "p"'s own sched_time,
      which is very very very wrong.
      
      posix-cpu-timers.c has a helper function, sched_ns(p), which takes care
      of this, so my fix is to use that here instead of this special tgid
      test.
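
      So the offending branch above becomes, roughly (a sketch of the fix as
      described, not the exact diff):

      		cpu->sched = p->signal->sched_time;
      		/* Add in each other live thread.  */
      		while ((t = next_thread(t)) != p) {
      			cpu->sched += t->sched_time;
      		}
      		/*
      		 * sched_ns(p) is p->sched_time, plus the cycles not yet
      		 * banked when p happens to be current.
      		 */
      		cpu->sched += sched_ns(p);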
      
      The fact that current can be one of the sub-threads of "p" points out
      that we could make things a little bit more accurate, perhaps by using
      sched_ns() on every thread we process in these loops.  It also points
      out that we don't use the most accurate value for threads in the group
      actively running other cpus (and this is mentioned in the comment).
      
      But that is a future enhancement, and this fix here definitely makes
      sense.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0aec63e6
  16. 29 November 2005, 1 commit
  17. 07 November 2005, 1 commit
  18. 31 October 2005, 1 commit
  19. 28 October 2005, 2 commits
  20. 27 October 2005, 2 commits
  21. 24 October 2005, 5 commits
    • [PATCH] posix-timers: fix posix_cpu_timer_set() vs run_posix_cpu_timers() race · a69ac4a7
      Committed by Oleg Nesterov
      This might be harmless, but looks like a race from code inspection (I
      was unable to trigger it).  I must admit, I don't understand why we
      can't return TIMER_RETRY after 'spin_unlock(&p->sighand->siglock)'
      without doing bump_cpu_timer(), but this is what the original code does.
      
      posix_cpu_timer_set:
      
      	read_lock(&tasklist_lock);
      
      	spin_lock(&p->sighand->siglock);
      	list_del_init(&timer->it.cpu.entry);
      	spin_unlock(&p->sighand->siglock);
      
      We are probably deleting the timer from run_posix_cpu_timers's 'firing'
      local list_head while run_posix_cpu_timers() does list_for_each_safe.
      
      Various bad things can happen, for example we can just delete this timer
      so that list_for_each() will not notice it and run_posix_cpu_timers()
      will not reset the '->firing' flag.  In that case,
      
      	....
      
      	if (timer->it.cpu.firing) {
      		read_unlock(&tasklist_lock);
      		timer->it.cpu.firing = -1;
      		return TIMER_RETRY;
      	}
      
      sys_timer_settime() goes to 'retry:', calls posix_cpu_timer_set() again,
      it returns TIMER_RETRY ...
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a69ac4a7
    • [PATCH] posix-timers: exit path cleanup · ca531a0a
      Committed by Oleg Nesterov
      No need to rebalance when the task has exited.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ca531a0a
    • [PATCH] posix-timers: remove false BUG_ON() from run_posix_cpu_timers() · 3de463c7
      Committed by Oleg Nesterov
      do_exit() clears ->it_##clock##_expires, but nothing prevents
      another cpu from attaching the timer to the exiting process after that.
      
      After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
      before do_exit() calls schedule(), a local timer interrupt can find
      tsk->exit_state != 0.  If that state was EXIT_DEAD (or another cpu
      does sys_wait4) the interrupted task has ->signal == NULL.
      
      At this moment exiting task has no pending cpu timers, they were cleaned
      up in __exit_signal()->posix_cpu_timers_exit{,_group}(), so we can just
      return from irq.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3de463c7
    • [PATCH] posix-timers: fix cleanup_timers() and run_posix_cpu_timers() races · 108150ea
      Committed by Oleg Nesterov
      1. cleanup_timers() sets timer->task = NULL under tasklist + ->sighand locks.
         That means that this code in posix_cpu_timer_del() and posix_cpu_timer_set()
      
         		lock_timer(timer);
      		if (timer->task == NULL)
      			return;
      		read_lock(tasklist);
      		put_task_struct(timer->task)
      
         is racy.  With this patch, timer->task is modified and accessed only under
         timer->it_lock.  Sadly, this means that a dead task_struct won't be freed
         until the timer is deleted or armed.
      
      2. run_posix_cpu_timers() collects expired timers into a local list under
         tasklist + ->sighand again. That means that posix_cpu_timer_del()
         should check timer->it.cpu.firing under these locks too.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      108150ea
    • Posix timers: limit number of timers firing at once · e80eda94
      Committed by Linus Torvalds
      Bursty timers aren't good for anybody, very much including latency for
      other programs when we trigger lots of timers in interrupt context.  So
      set a random limit, after which we'll handle the rest on the next timer
      tick.
      
      Noted by Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e80eda94
  22. 22 October 2005, 1 commit
  23. 20 October 2005, 1 commit
    • [PATCH] Fix cpu timers exit deadlock and races · e03d13e9
      Committed by Roland McGrath
      Oleg Nesterov reported an SMP deadlock.  If there is a running timer
      tracking a different process's CPU time clock when the process owning
      the timer exits, we deadlock on tasklist_lock in posix_cpu_timer_del via
      exit_itimers.
      
      That code was using tasklist_lock to check for a race with __exit_signal
      being called on the timer-target task and clearing its ->signal.
      However, there is actually no such race.  __exit_signal will have called
      posix_cpu_timers_exit and posix_cpu_timers_exit_group before it does
      that.  Those will clear those k_itimer's association with the dying
      task, so posix_cpu_timer_del will return early and never reach the code
      in question.
      
      In addition, posix_cpu_timer_del called from exit_itimers during execve
      or directly from timer_delete in the process owning the timer can race
      with an exiting timer-target task to cause a double put on the timer-target
      task struct.  Make sure we always access the cpu_timers lists with the sighand
      lock held.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Chris Wright <chrisw@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e03d13e9
  24. 18 October 2005, 1 commit
  25. 17 April 2005, 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Committed by Linus Torvalds
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4