1. 23 6月, 2006 2 次提交
    • P
      [PATCH] When CONFIG_BASE_SMALL=1, cascade() may enter an infinite loop · 3439dd86
      Porpoise 提交于
      When CONFIG_BASE_SAMLL=1, cascade() in may enter the infinite loop.
      Because of CONFIG_BASE_SMALL=1(TVR_BITS=6 and TVN_BITS=4), the list
      base->tv5 may cascade into base->tv5.  So, the kernel enters the infinite
      loop in the function cascade().
      
      I created a test module to verify this bug, and a patch to fix it.
      
      #include <linux/kernel.h>
      #include <linux/module.h>
      #include <linux/init.h>
      #include <linux/timer.h>
      #if 0
      #include <linux/kdb.h>
      #else
      #define kdb_printf printk
      #endif
      
      #define TVN_BITS (CONFIG_BASE_SMALL ? 4 : 6)
      #define TVR_BITS (CONFIG_BASE_SMALL ? 6 : 8)
      #define TVN_SIZE (1 << TVN_BITS)
      #define TVR_SIZE (1 << TVR_BITS)
      #define TVN_MASK (TVN_SIZE - 1)
      #define TVR_MASK (TVR_SIZE - 1)
      
      #define TV_SIZE(N)  (N*TVN_BITS  + TVR_BITS)
      
      struct timer_list timer0;
      struct timer_list dummy_timer1;
      struct timer_list dummy_timer2;
      
      void dummy_timer_fun(unsigned long data) {
      }
      unsigned long j=0;
      void check_timer_base(unsigned long data)
      {
              kdb_printf("check_timer_base %08x\n",jiffies);
              mod_timer(&timer0,(jiffies & (~0xFFF)) + 0x1FFF);
      }
      
      int init_module(void)
      {
              init_timer(&timer0);
              timer0.data = (unsigned long)0;
              timer0.function = check_timer_base;
              mod_timer(&timer0,jiffies+1);
      
              init_timer(&dummy_timer1);
              dummy_timer1.data = (unsigned long)0;
              dummy_timer1.function = dummy_timer_fun;
      
              init_timer(&dummy_timer2);
              dummy_timer2.data = (unsigned long)0;
              dummy_timer2.function = dummy_timer_fun;
      
              j=jiffies;
              j&=(~((1<<TV_SIZE(3))-1));
              j+=(1<<TV_SIZE(3));
              j+=(1<<TV_SIZE(4));
      
              kdb_printf("mod_timer %08x\n",j);
      
              mod_timer(&dummy_timer1, j );
              mod_timer(&dummy_timer2, j );
      
              return 0;
      }
      
      void cleanup_module()
      {
              del_timer_sync(&timer0);
              del_timer_sync(&dummy_timer1);
              del_timer_sync(&dummy_timer2);
      }
      
      (Cleanups from Oleg)
      
      [oleg@tv-sign.ru: use list_replace_init()]
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3439dd86
    • O
      [PATCH] list: use list_replace_init() instead of list_splice_init() · 626ab0e6
      Oleg Nesterov 提交于
      list_splice_init(list, head) does unneeded job if it is known that
      list_empty(head) == 1.  We can use list_replace_init() instead.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      626ab0e6
  2. 22 5月, 2006 1 次提交
    • Z
      [PATCH] Fix a NO_IDLE_HZ timer bug · 0662b713
      Zachary Amsden 提交于
      Under certain timing conditions, a race during boot occurs where timer
      ticks are being processed on remote CPUs.  The remote timer ticks can
      increment jiffies, and if this happens during a window when a timeout is
      very close to expiring but a local tick has not yet been delivered, you can
      end up with
      
      1) No softirq pending
      2) A local timer wheel which is not synced to jiffies
      3) No high resolution timer active
      4) A local timer which is supposed to fire before the current jiffies value.
      
      In this circumstance, the comparison in next_timer_interrupt overflows,
      because the base of the comparison for high resolution timers is jiffies,
      but for the softirq timer wheel, it is relative the the current base of the
      wheel (jiffies_base).
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0662b713
  3. 26 4月, 2006 2 次提交
  4. 11 4月, 2006 1 次提交
    • A
      [PATCH] timer initialisation fix · ba6edfcd
      Andrew Morton 提交于
      We need the boot CPU's tvec_bases[] entry to be initialised super-early in
      boot, for early_serial_setup().  That runs within setup_arch(), before even
      per-cpu areas are initialised.
      
      The patch changes tvec_bases to use compile-time initialisation, and adds a
      separate array `tvec_base_done' to keep track of which CPU has had its
      tvec_bases[] entry initialised (because we can no longer use the zeroness of
      that tvec_bases[] entry to determine whether it has been initialised).
      
      Thanks to Eugene Surovegin <ebs@ebshome.net> for diagnosing this.
      
      Cc: Eugene Surovegin <ebs@ebshome.net>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ba6edfcd
  5. 10 4月, 2006 1 次提交
    • J
      [PATCH] x86_64: Fix drift with HPET timer enabled · b20367a6
      Jordan Hargrave 提交于
      If the HPET timer is enabled, the clock can drift by ~3 seconds a day.
      This is due to the HPET timer not being initialized with the correct
      setting (still using PIT count).
      
      If HZ changes, this drift can become even more pronounced.
      
      HPET patch initializes tick_nsec with correct tick_nsec settings for
      HPET timer.
      
      Vojtech comments:
      
        "It's not entirely correct (it assumes the HPET ticks totally
         exactly), but it's significantly better than assuming the PIT error
         there."
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b20367a6
  6. 02 4月, 2006 1 次提交
  7. 01 4月, 2006 3 次提交
  8. 26 3月, 2006 2 次提交
    • R
      [PATCH] remove pps support · 5ddcfa87
      Roman Zippel 提交于
      This removes the support for pps.  It's completely unused within the kernel
      and is basically in the way for further cleanups.  It should be easier to
      readd proper support for it after the rest has been converted to NTP4
      (where the pps mechanisms are quite different from NTP3 anyway).
      Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
      Cc: Adrian Bunk <bunk@stusta.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5ddcfa87
    • T
      [PATCH] sys_alarm() unsigned signed conversion fixup · c08b8a49
      Thomas Gleixner 提交于
      alarm() calls the kernel with an unsigend int timeout in seconds.  The
      value is stored in the tv_sec field of a struct timeval to setup the
      itimer.  The tv_sec field of struct timeval is of type long, which causes
      the tv_sec value to be negative on 32 bit machines if seconds > INT_MAX.
      
      Before the hrtimer merge (pre 2.6.16) such a negative value was converted
      to the maximum jiffies timeout by the timeval_to_jiffies conversion.  It's
      not clear whether this was intended or just happened to be done by the
      timeval_to_jiffies code.
      
      hrtimers expect a timeval in canonical form and treat a negative timeout as
      already expired.  This breaks the legitimate usage of alarm() with a
      timeout value > INT_MAX seconds.
      
      For 32 bit machines it is therefor necessary to limit the internal seconds
      value to avoid API breakage.  Instead of doing this in all implementations
      of sys_alarm the duplicated sys_alarm code is moved into a common function
      in itimer.c
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c08b8a49
  9. 24 3月, 2006 2 次提交
  10. 17 3月, 2006 1 次提交
  11. 07 3月, 2006 2 次提交
  12. 03 3月, 2006 1 次提交
  13. 18 2月, 2006 1 次提交
    • P
      [PATCH] Provide an interface for getting the current tick length · 726c14bf
      Paul Mackerras 提交于
      This provides an interface for arch code to find out how many
      nanoseconds are going to be added on to xtime by the next call to
      do_timer.  The value returned is a fixed-point number in 52.12 format
      in nanoseconds.  The reason for this format is that it gives the
      full precision that the timekeeping code is using internally.
      
      The motivation for this is to fix a problem that has arisen on 32-bit
      powerpc in that the value returned by do_gettimeofday drifts apart
      from xtime if NTP is being used.  PowerPC is now using a lockless
      do_gettimeofday based on reading the timebase register and performing
      some simple arithmetic.  (This method of getting the time is also
      exported to userspace via the VDSO.)  However, the factor and offset
      it uses were calculated based on the nominal tick length and weren't
      being adjusted when NTP varied the tick length.
      
      Note that 64-bit powerpc has had the lockless do_gettimeofday for a
      long time now.  It also had an extremely hairy routine that got called
      from the 32-bit compat routine for adjtimex, which adjusted the
      factor and offset according to what it thought the timekeeping code
      was going to do.  Not only was this only called if a 32-bit task did
      adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
      duplicating computations from kernel/timer.c and it wasn't clear that
      it was (still) correct.
      
      The simple solution is to ask the timekeeping code how long the
      current jiffy will be on each timer interrupt, after calling
      do_timer.  If this jiffy will be a different length from the last one,
      we then need to compute new values for the factor and offset used in
      the lockless do_gettimeofday.  In this way we can keep xtime and
      do_gettimeofday in sync, even when NTP is varying the tick length.
      
      Note that when adjtimex varies the tick length, it almost always
      introduces the variation from the next tick on.  The only case I could
      see where adjtimex would vary the length of the current tick is when
      an old-style adjtime adjustment is being cancelled.  (It's not clear
      to me why the adjustment has to be cancelled immediately rather than
      from the next tick on.)  Thus I don't see any real need for a hook in
      adjtimex; the rare case of an old-style adjustment being cancelled can
      be fixed up at the next tick.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: Njohn stultz <johnstul@us.ibm.com>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      726c14bf
  14. 08 2月, 2006 1 次提交
  15. 11 1月, 2006 2 次提交
  16. 09 1月, 2006 1 次提交
  17. 31 10月, 2005 5 次提交
  18. 30 10月, 2005 1 次提交
  19. 13 9月, 2005 1 次提交
  20. 11 9月, 2005 2 次提交
  21. 08 9月, 2005 2 次提交
  22. 24 8月, 2005 1 次提交
    • D
      [PATCH] preempt race in getppid · 4c5640cb
      David Meybohm 提交于
      With CONFIG_PREEMPT && !CONFIG_SMP, it's possible for sys_getppid to
      return a bogus value if the parent's task_struct gets reallocated after
      current->group_leader->real_parent is read:
      
              asmlinkage long sys_getppid(void)
              {
                      int pid;
                      struct task_struct *me = current;
                      struct task_struct *parent;
      
                      parent = me->group_leader->real_parent;
      RACE HERE =>    for (;;) {
                              pid = parent->tgid;
              #ifdef CONFIG_SMP
              {
                              struct task_struct *old = parent;
      
                              /*
                               * Make sure we read the pid before re-reading the
                               * parent pointer:
                               */
                              smp_rmb();
                              parent = me->group_leader->real_parent;
                              if (old != parent)
                                      continue;
              }
              #endif
                              break;
                      }
                      return pid;
              }
      
      If the process gets preempted at the indicated point, the parent process
      can go ahead and call exit() and then get wait()'d on to reap its
      task_struct. When the preempted process gets resumed, it will not do any
      further checks of the parent pointer on !CONFIG_SMP: it will read the
      bad pid and return.
      
      So, the same algorithm used when SMP is enabled should be used when
      preempt is enabled, which will recheck ->real_parent in this case.
      Signed-off-by: NDavid Meybohm <dmeybohmlkml@bellsouth.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4c5640cb
  23. 26 6月, 2005 1 次提交
  24. 24 6月, 2005 3 次提交
    • J
      [PATCH] preempt_count is int - remove cast and don't assign to unsigned type · be5b4fbd
      Jesper Juhl 提交于
      In kernel/sched.c the return value from preempt_count() is cast to an int.
      That made sense when preempt_count was defined as different types on is not
      needed and should go away.  The patch removes the cast.
      
      In kernel/timer.c the return value from preempt_count() is assigned to a
      variable of type u32 and then that unsigned value is later compared to
      preempt_count().  Since preempt_count() returns an int, an int is what
      should be used to store its return value.  Storing the result in an
      unsigned 32bit integer made a tiny bit of sense back when preempt_count was
      different types on different archs, but no more - let's not play signed vs
      unsigned comparison games when we don't have to.  The patch modifies the
      code to use an int to hold the value.  While I was around that bit of code
      I also made two changes to a nearby (related) printk() - I modified it to
      specify the loglevel explicitly and also broke the line into a few pieces
      to avoid it being longer than 80 chars and clarified the text a bit.
      Signed-off-by: NJesper Juhl <juhl-lkml@dif.dk>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      be5b4fbd
    • O
      [PATCH] timers: introduce try_to_del_timer_sync() · fd450b73
      Oleg Nesterov 提交于
      This patch splits del_timer_sync() into 2 functions.  The new one,
      try_to_del_timer_sync(), returns -1 when it hits executing timer.
      
      It can be used in interrupt context, or when the caller hold locks which
      can prevent completion of the timer's handler.
      
      NOTE.  Currently it can't be used in interrupt context in UP case, because
      ->running_timer is used only with CONFIG_SMP.
      
      Should the need arise, it is possible to kill #ifdef CONFIG_SMP in
      set_running_timer(), it is cheap.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fd450b73
    • O
      [PATCH] timers fixes/improvements · 55c888d6
      Oleg Nesterov 提交于
      This patch tries to solve following problems:
      
      1. del_timer_sync() is racy. The timer can be fired again after
         del_timer_sync have checked all cpus and before it will recheck
         timer_pending().
      
      2. It has scalability problems. All cpus are scanned to determine
         if the timer is running on that cpu.
      
         With this patch del_timer_sync is O(1) and no slower than plain
         del_timer(pending_timer), unless it has to actually wait for
         completion of the currently running timer.
      
         The only restriction is that the recurring timer should not use
         add_timer_on().
      
      3. The timers are not serialized wrt to itself.
      
         If CPU_0 does mod_timer(jiffies+1) while the timer is currently
         running on CPU 1, it is quite possible that local interrupt on
         CPU_0 will start that timer before it finished on CPU_1.
      
      4. The timers locking is suboptimal. __mod_timer() takes 3 locks
         at once and still requires wmb() in del_timer/run_timers.
      
         The new implementation takes 2 locks sequentially and does not
         need memory barriers.
      
      Currently ->base != NULL means that the timer is pending. In that case
      ->base.lock is used to lock the timer. __mod_timer also takes timer->lock
      because ->base can be == NULL.
      
      This patch uses timer->entry.next != NULL as indication that the timer is
      pending. So it does __list_del(), entry->next = NULL instead of list_del()
      when the timer is deleted.
      
      The ->base field is used for hashed locking only, it is initialized
      in init_timer() which sets ->base = per_cpu(tvec_bases). When the
      tvec_bases.lock is locked, it means that all timers which are tied
      to this base via timer->base are locked, and the base itself is locked
      too.
      
      So __run_timers/migrate_timers can safely modify all timers which could
      be found on ->tvX lists (pending timers).
      
      When the timer's base is locked, and the timer removed from ->entry list
      (which means that _run_timers/migrate_timers can't see this timer), it is
      possible to set timer->base = NULL and drop the lock: the timer remains
      locked.
      
      This patch adds lock_timer_base() helper, which waits for ->base != NULL,
      locks the ->base, and checks it is still the same.
      
      __mod_timer() schedules the timer on the local CPU and changes it's base.
      However, it does not lock both old and new bases at once. It locks the
      timer via lock_timer_base(), deletes the timer, sets ->base = NULL, and
      unlocks old base. Then __mod_timer() locks new_base, sets ->base = new_base,
      and adds this timer. This simplifies the code, because AB-BA deadlock is not
      possible. __mod_timer() also ensures that the timer's base is not changed
      while the timer's handler is running on the old base.
      
      __run_timers(), del_timer() do not change ->base anymore, they only clear
      pending flag.
      
      So del_timer_sync() can test timer->base->running_timer == timer to detect
      whether it is running or not.
      
      We don't need timer_list->lock anymore, this patch kills it.
      
      We also don't need barriers. del_timer() and __run_timers() used smp_wmb()
      before clearing timer's pending flag. It was needed because __mod_timer()
      did not lock old_base if the timer is not pending, so __mod_timer()->list_add()
      could race with del_timer()->list_del(). With this patch these functions are
      serialized through base->lock.
      
      One problem. TIMER_INITIALIZER can't use per_cpu(tvec_bases). So this patch
      adds global
      
              struct timer_base_s {
                      spinlock_t lock;
                      struct timer_list *running_timer;
              } __init_timer_base;
      
      which is used by TIMER_INITIALIZER. The corresponding fields in tvec_t_base_s
      struct are replaced by struct timer_base_s t_base.
      
      It is indeed ugly. But this can't have scalability problems. The global
      __init_timer_base.lock is used only when __mod_timer() is called for the first
      time AND the timer was compile time initialized. After that the timer migrates
      to the local CPU.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NRenaud Lienhart <renaud.lienhart@free.fr>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      55c888d6