1. 22 Aug, 2009 1 commit
    • time: Introduce CLOCK_REALTIME_COARSE · da15cfda
      John Stultz committed
      After talking with some application writers who want very fast, but not
      fine-grained timestamps, I decided to try to implement new clock_ids
      to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE
      which return the time at the last tick. This is very fast as we don't
      have to access any hardware (which can be very painful if you're using
      something like the acpi_pm clocksource), and we can even use the vdso
      clock_gettime() method to avoid the syscall. The only trade-off is that
      you only get low-resolution, tick-grained time.
      
      This isn't a new idea, I know Ingo has a patch in the -rt tree that made
      the vsyscall gettimeofday() return coarse grained time when the
      vsyscall64 sysctl was set to 2. However this affects all applications
      on a system.
      
      With this method, applications can choose the proper speed/granularity
      trade-off for themselves.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: nikolag@ca.ibm.com
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: arjan@infradead.org
      Cc: jonathan@jonmasters.org
      LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
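A minimal userspace sketch of the trade-off described in the commit above, assuming a Linux system whose libc exposes CLOCK_REALTIME_COARSE. The helper names read_clock_ns() and coarse_resolution_ns() are hypothetical and exist only for this illustration.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: read a clock and convert the result to nanoseconds.
 * Returns 0 on success, -1 if clock_gettime() fails. */
static int read_clock_ns(clockid_t id, int64_t *out_ns)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1;
    *out_ns = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    return 0;
}

/* Hypothetical helper: resolution reported for the coarse clock, in
 * nanoseconds, or -1 on error. */
static int64_t coarse_resolution_ns(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_REALTIME_COARSE, &res) != 0)
        return -1;
    return (int64_t)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

Calling read_clock_ns() with CLOCK_REALTIME_COARSE and then with CLOCK_REALTIME gives two readings that usually differ by less than one tick. The value from coarse_resolution_ns() shows how coarse the fast clock is on a given machine; it is typically one tick, for example 4 ms on a kernel built with HZ=250.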
  2. 04 Aug, 2009 1 commit
  3. 14 Jan, 2009 1 commit
  4. 21 Dec, 2008 1 commit
  5. 13 Dec, 2008 2 commits
    • posix-timers: check ->it_signal instead of ->it_pid to validate the timer · 89992102
      Oleg Nesterov committed
      Impact: clean up, speed up
      
      ->it_pid (was ->it_process) has also a special meaning: if it is NULL,
      the timer is under deletion or it wasn't initialized yet. We can check
      ->it_signal != NULL instead, this way we can
      
      	- simplify sys_timer_create() a bit
      
      	- remove yet another check from lock_timer()
      
      	- move put_pid(->it_pid) into release_posix_timer() which
      	  runs outside of ->it_lock
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • posix-timers: use "struct pid*" instead of "struct task_struct*" · 27af4245
      Oleg Nesterov committed
      Impact: restructure, clean up code
      
      k_itimer holds the ref to the ->it_process until sys_timer_delete(). This
      allows pinning up to RLIMIT_SIGPENDING dead task_structs. Change the code
      to use "struct pid *" instead.
      
      The patch doesn't kill ->it_process, it places ->it_pid into the union.
      ->it_process is still used by do_cpu_nanosleep() as before. It would be
      trivial to change the nanosleep code as well, but since it uses it_process
      in a special way I think it is better to keep this field for grep.
      
      The patch bloats the kernel by 104 bytes and it also adds the new pointer,
      ->it_signal, to k_itimer. It is used by lock_timer() to verify that the
      found timer was not created by another process. It is not clear why we
      use the global database (and thus the global idr_lock) for posix timers.
      We still need the signal_struct->posix_timers which contains all useable
      timers, perhaps it is better to use some form of per-process array
      instead.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 03 Oct, 2008 1 commit
  7. 24 Sep, 2008 9 commits
  8. 06 Sep, 2008 1 commit
  9. 21 Aug, 2008 1 commit
    • clocksource: introduce CLOCK_MONOTONIC_RAW · 2d42244a
      John Stultz committed
      In talking with Josip Loncaric, and his work on clock synchronization (see
      btime.sf.net), he mentioned that for really close synchronization, it is
      useful to have access to "hardware time", that is a notion of time that is
      not in any way adjusted by the clock slewing done to keep close time sync.
      
      Part of the issue is if we are using the kernel's ntp adjusted
      representation of time in order to measure how we should correct time, we
      can run into what Paul McKenney aptly described as "Painting a road using
      the lines we're painting as the guide".
      
      I had been thinking of a similar problem, and was trying to come up with a
      way to give users access to a purely hardware based time representation
      that avoided users having to know the underlying frequency and mask values
      needed to deal with the wide variety of possible underlying hardware
      counters.
      
      My solution is to introduce CLOCK_MONOTONIC_RAW.  This exposes a
      nanosecond based time value that increments starting at bootup and has no
      frequency adjustments made to it whatsoever.
      
      The time is accessed from userspace via the posix_clock_gettime() syscall,
      passing CLOCK_MONOTONIC_RAW as the clock_id.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
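One way to see what "no frequency adjustments" means from userspace is to sample CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW together at two points in time and compare how far each advanced. The sketch below assumes a Linux libc that defines CLOCK_MONOTONIC_RAW; struct clock_pair, sample_pair() and slew_ppm() are hypothetical names used only here.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* One sample of both clocks, in nanoseconds (illustrative type). */
struct clock_pair {
    int64_t mono_ns;   /* CLOCK_MONOTONIC: subject to NTP slewing */
    int64_t raw_ns;    /* CLOCK_MONOTONIC_RAW: no frequency adjustment */
};

static int clock_ns(clockid_t id, int64_t *out_ns)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1;
    *out_ns = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    return 0;
}

/* Read both clocks back to back. The two reads are not atomic, so the
 * pair carries a small sampling error. Returns 0 on success. */
static int sample_pair(struct clock_pair *p)
{
    if (clock_ns(CLOCK_MONOTONIC, &p->mono_ns) != 0)
        return -1;
    return clock_ns(CLOCK_MONOTONIC_RAW, &p->raw_ns);
}

/* Relative rate difference between the adjusted and the raw clock over
 * the interval from a to b, in parts per million. Positive means the
 * adjusted clock ran faster than the raw one. */
static double slew_ppm(const struct clock_pair *a, const struct clock_pair *b)
{
    int64_t raw = b->raw_ns - a->raw_ns;
    int64_t mono = b->mono_ns - a->mono_ns;

    if (raw <= 0)
        return 0.0;
    return (double)(mono - raw) / (double)raw * 1e6;
}
```

Because the two reads in sample_pair() are separated by a few hundred nanoseconds at best, the estimate is only meaningful when the two samples are taken seconds or minutes apart.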
  10. 26 Jul, 2008 2 commits
  11. 24 Jul, 2008 2 commits
    • posix-timers: fix posix_timer_event() vs dequeue_signal() race · ba661292
      Oleg Nesterov committed
      The bug was reported and analysed by Mark McLoughlin <markmc@redhat.com>,
      the patch is based on his and Roland's suggestions.
      
      posix_timer_event() always rewrites the pre-allocated siginfo before sending
      the signal. Most of the written info is the same all the time, but memset(0)
      is very wrong. If ->sigq is queued we can race with collect_signal() which
      can fail to find this siginfo looking at .si_signo, or copy_siginfo() can
      copy the wrong .si_code/si_tid/etc.
      
      In short, sys_timer_settime() can in fact stop the active timer, or the user
      can receive the siginfo with the wrong .si_xxx values.
      
      Move "memset(->info, 0)" from posix_timer_event() to alloc_posix_timer(),
      change send_sigqueue() to set .si_overrun = 0 when ->sigq is not queued.
      It would be nice to move the whole sigq->info initialization from send to
      create path, but this is not easy to do without uglifying timer_create()
      further.
      
      As Roland rightly pointed out, we need more cleanups/fixes here, see the
      "FIXME" comment in the patch. Hopefully this patch makes sense anyway, and
      it can mask the worst implications.
      Reported-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Cc: Oliver Pinter <oliver.pntr@gmail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: stable@kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
       kernel/posix-timers.c |   17 +++++++++++++----
       kernel/signal.c       |    1 +
       2 files changed, 14 insertions(+), 4 deletions(-)
    • posix-timers: do_schedule_next_timer: fix the setting of ->si_overrun · 54da1174
      Oleg Nesterov committed
      do_schedule_next_timer() sets info->si_overrun = timr->it_overrun_last,
      this discards the already accumulated overruns.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Cc: Oliver Pinter <oliver.pntr@gmail.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: stable@kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  12. 30 Apr, 2008 1 commit
  13. 19 Apr, 2008 1 commit
  14. 15 Feb, 2008 1 commit
  15. 10 Feb, 2008 1 commit
    • hrtimer: fix *rmtp handling in hrtimer_nanosleep() · 080344b9
      Oleg Nesterov committed
      Spotted by Pavel Emelyanov and Alexey Dobriyan.
      
      hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
      the local variable which lives in the caller's stack frame. This means that
      if sys_restart_syscall() actually happens and it is interrupted as well, we
      don't update the user-space variable, but write into the already dead stack
      frame.
      
      Introduced by commit 04c22714
      hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier
      
      Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
      hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.
      
      A small problem remains. man 2 nanosleep states that *rmtp should be written if
      nanosleep() was interrupted (it says nothing about whether it is OK to update *rmtp
      if nanosleep returns 0), but (with or without this patch) we can dirty *rem
      even if nanosleep() returns 0.
      
      NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
      bugs. Fixed by the next patch.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Pavel Emelyanov <xemul@sw.ru>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Toyo Abe <toyoa@mvista.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
       include/linux/hrtimer.h |    2 -
       kernel/hrtimer.c        |   51 +++++++++++++++++++++++++-----------------------
       kernel/posix-timers.c   |   14 +------------
       3 files changed, 30 insertions(+), 37 deletions(-)
  16. 09 Feb, 2008 1 commit
  17. 06 Feb, 2008 1 commit
    • timerfd: new timerfd API · 4d672e7a
      Davide Libenzi committed
      This is the new timerfd API as it is implemented by the following patch:
      
      int timerfd_create(int clockid, int flags);
      int timerfd_settime(int ufd, int flags,
      		    const struct itimerspec *utmr,
      		    struct itimerspec *otmr);
      int timerfd_gettime(int ufd, struct itimerspec *otmr);
      
      The timerfd_create() API creates an un-programmed timerfd fd.  The "clockid"
      parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.
      
      The timerfd_settime() API gives new settings to the timerfd fd, optionally
      retrieving the previous expiration time (in case the "otmr" parameter is not
      NULL).
      
      The time value specified in "utmr" is absolute, if the TFD_TIMER_ABSTIME bit
      is set in the "flags" parameter.  Otherwise it's a relative time.
      
      The timerfd_gettime() API returns the next expiration time of the timer, or
      {0, 0} if the timerfd has not been set yet.
      
      Like the previous timerfd API implementation, read(2) and poll(2) are
      supported (with the same interface).  Here's a simple test program I used to
      exercise the new timerfd APIs:
      
      http://www.xmailserver.org/timerfd-test2.c
      
      [akpm@linux-foundation.org: coding-style cleanups]
      [akpm@linux-foundation.org: fix ia64 build]
      [akpm@linux-foundation.org: fix m68k build]
      [akpm@linux-foundation.org: fix mips build]
      [akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
      [heiko.carstens@de.ibm.com: fix s390]
      [akpm@linux-foundation.org: fix powerpc build]
      [akpm@linux-foundation.org: fix sparc64 more]
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 03 Feb, 2008 1 commit
  19. 20 Oct, 2007 1 commit
  20. 19 Oct, 2007 1 commit
  21. 17 Oct, 2007 1 commit
  22. 15 Oct, 2007 1 commit
  23. 23 Aug, 2007 2 commits
    • posix-timers: fix creation race · d02479bd
      Oleg Nesterov committed
      sys_timer_create() sets ->it_process and unlocks ->siglock, then checks
      tmr->it_sigev_notify to define if get_task_struct() is needed.
      
      We already passed ->it_id to the caller, another thread can delete this timer
      and free its memory in between.
      
      As a minimal fix, move this code under ->siglock, sys_timer_delete() takes it
      too before calling release_posix_timer().  A proper serialization would be to
      take ->it_lock, we add a partly initialized timer on posix_timers_id, not
      good.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • posix-timers: fix deletion race · 179394af
      Thomas Gleixner committed
      timer_delete does:
      	lock_timer();
      	timer->it_process = NULL;
      	unlock_timer();
      	release_posix_timer();
      
      timer->it_process is checked in lock_timer() to prevent access to a
      timer, which is on the way to be deleted, but the check happens after
      idr_lock is dropped. This allows release_posix_timer() to delete the
      timer before the lock code can check the timer:
      
        CPU 0				CPU 1
      
        lock_timer();
        timer->it_process = NULL;
        unlock_timer();
      				lock_timer()
      					spin_lock(idr_lock);
      					timer = idr_find();
      					spin_lock(timer->lock);
      					spin_unlock(idr_lock);
        release_posix_timer();
      	spin_lock(idr_lock);
      	idr_remove(timer);
      	spin_unlock(idr_lock);
      	free_timer(timer);
      					if (timer->......)
      
      Change the locking to prevent this.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 20 Jul, 2007 1 commit
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt committed
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  25. 22 Jun, 2007 1 commit
    • posix-timers: Prevent softirq starvation by small intervals and SIG_IGN · 58229a18
      Thomas Gleixner committed
      posix-timers which deliver an ignored signal are currently rearmed in
      the timer softirq: This is necessary because the timer needs to be
      delivered again when SIG_IGN is removed. This is not a problem, when
      the interval is reasonable.
      
      With high resolution timers enabled one might arm a posix timer with a
      very small interval and ignore the signal. This might lead to a
      softirq starvation when the interval is so small that the timer is
      requeued onto the softirq pending list right away.
      
      This problem was pointed out by Jan Kiszka. Thanks, Jan!
      
      The correct solution would be to stop the timer, when the signal is
      ignored and rearm it when SIG_IGN is removed. Unfortunately this
      requires modification in sigaction and involves non trivial sighand
      locking. It's too late in the release cycle for such a change.
      
      For now we just keep the timer running and enforce that the timer only
      fires every jiffy. This does not break anything as we keep the
      overrun counter correct. It adds a little inaccuracy to the
      timer_gettime() interface, but...
      
      The more complex change is necessary anyway to fix another shortcoming
      of the current implementation, which I discovered while looking
      at this problem: A pending signal is discarded when SIG_IGN is set. In
      case that a posixtimer signal is pending then it is discarded as well,
      but when SIG_IGN is removed later nothing rearms the timer. This is
      not new, it's that way since posix timers have been merged. So nothing
      to worry about right now.
      
      I have a working solution to fix all of this, but the impact is too
      large for both stable and 2.6.22. I'm going to send it out for review
      in the next days.
      
      This should go into 2.6.21.stable as well.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Stable Team <stable@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
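The claim that firing at most once per jiffy keeps the overrun counter correct can be modelled with a little arithmetic. The sketch below is an illustrative userspace model, not the kernel code from this patch: forward_timer() is a hypothetical function that pushes an expiry time past a reference point in whole intervals and reports how many intervals were skipped. The 4 ms tick used in the note after the code assumes HZ=250.

```c
#include <stdint.h>

/* Illustrative model: advance *expires_ns in steps of interval_ns until
 * it lies strictly after now_ns. Returns the number of intervals added,
 * which is the number of expirations that are accounted as overruns
 * instead of being delivered one by one. */
static uint64_t forward_timer(int64_t *expires_ns, int64_t now_ns,
                              int64_t interval_ns)
{
    uint64_t missed;

    if (interval_ns <= 0 || now_ns < *expires_ns)
        return 0;   /* not periodic, or not yet expired */

    missed = (uint64_t)((now_ns - *expires_ns) / interval_ns) + 1;
    *expires_ns += (int64_t)missed * interval_ns;
    return missed;
}
```

With a 1 µs interval, forwarding relative to "now plus one 4 ms tick" (that is, now_ns = 4000000 for a timer that expired at 0) skips 4001 intervals in a single step and moves the expiry to 4001000 ns. The timer therefore fires once per tick while the skipped expirations are still counted.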
  26. 09 May, 2007 1 commit
  27. 17 Feb, 2007 2 commits