1. 15 Aug, 2012 1 commit
  2. 27 Jul, 2012 1 commit
    • posix_types.h: Cleanup stale __NFDBITS and related definitions · 8ded2bbc
      Committed by Josh Boyer
      Recently, glibc made a change to suppress sign-conversion warnings in
      FD_SET (glibc commit ceb9e56b3d1).  This uncovered an issue with the
      kernel's definition of __NFDBITS if applications #include
      <linux/types.h> after including <sys/select.h>.  A build failure would
      be seen when passing the -Werror=sign-compare and -D_FORTIFY_SOURCE=2
      flags to gcc.
      
      It was suggested that the kernel should either match the glibc
      definition of __NFDBITS or remove that entirely.  The current in-kernel
      uses of __NFDBITS can be replaced with BITS_PER_LONG, and there are no
      uses of the related __FDELT and __FDMASK defines.  Given that, we'll
      continue the cleanup that was started with commit 8b3d1cda
      ("posix_types: Remove fd_set macros") and drop the remaining unused
      macros.
      
      Additionally, linux/time.h has similar macros defined that expand to
      nothing so we'll remove those at the same time.
      Reported-by: Jeff Law <law@redhat.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Josh Boyer <jwboyer@redhat.com>
      [ .. and fix up whitespace as per akpm ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 22 May, 2012 1 commit
  4. 24 Mar, 2012 1 commit
  5. 20 Feb, 2012 1 commit
  6. 15 Feb, 2012 1 commit
    • posix_types: Remove fd_set macros · 8b3d1cda
      Committed by H. Peter Anvin
      <asm/posix_types.h> includes a set of macros that operate on file
      descriptors.  Way long ago those were exported to user space, but
      nowadays they are #ifdef __KERNEL__.
      
      However, they are nothing but standard (nonatomic) bit operations, and
      we already have optimized versions of bit operations in the kernel.
      We can't include <linux/bitops.h> in <asm/posix_types.h> but we can
      move the definitions to <linux/time.h> and define them there in terms
      of standard kernel bitops.
      
      [ v2: folds the following fixes in:
      
        a) Stray space in __FD_SET(), reported by Andrew Morton
        b) #include <linux/string.h> needed for memset(), reported by Tony Luck ]
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Link: http://lkml.kernel.org/r/1328677745-20121-22-git-send-email-hpa@zytor.com
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
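The point made in the commit above, that the fd_set macros are nothing but nonatomic bit operations on an array of longs, can be sketched in user space roughly as follows. This is only an illustration: `fd_bitmap`, `fd_bit_zero()`, `fd_bit_set()`, `fd_bit_clear()`, `fd_bit_test()`, `FD_BITMAP_SIZE` and `SKETCH_BITS_PER_LONG` are hypothetical names chosen for the example and are not the kernel's macros or its bitops helpers.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* User-space stand-in for the kernel's BITS_PER_LONG (assumption). */
#define SKETCH_BITS_PER_LONG	(CHAR_BIT * sizeof(unsigned long))
/* Hypothetical capacity of the descriptor set. */
#define FD_BITMAP_SIZE		1024

struct fd_bitmap {
	unsigned long bits[FD_BITMAP_SIZE / (CHAR_BIT * sizeof(unsigned long))];
};

/* Clear every bit, which is why memset() (and <string.h>) is needed. */
static void fd_bit_zero(struct fd_bitmap *s)
{
	memset(s, 0, sizeof(*s));
}

/* Nonatomic set: pick the word, then OR in the bit within that word. */
static void fd_bit_set(struct fd_bitmap *s, unsigned int fd)
{
	assert(fd < FD_BITMAP_SIZE);
	s->bits[fd / SKETCH_BITS_PER_LONG] |= 1UL << (fd % SKETCH_BITS_PER_LONG);
}

/* Nonatomic clear: mask the bit out of its word. */
static void fd_bit_clear(struct fd_bitmap *s, unsigned int fd)
{
	assert(fd < FD_BITMAP_SIZE);
	s->bits[fd / SKETCH_BITS_PER_LONG] &= ~(1UL << (fd % SKETCH_BITS_PER_LONG));
}

/* Returns 1 if the bit for fd is set, 0 otherwise. */
static int fd_bit_test(const struct fd_bitmap *s, unsigned int fd)
{
	assert(fd < FD_BITMAP_SIZE);
	return (int)((s->bits[fd / SKETCH_BITS_PER_LONG] >>
		      (fd % SKETCH_BITS_PER_LONG)) & 1UL);
}
```

Because the word index and bit offset are both derived from the width of `unsigned long`, no separate `__NFDBITS`-style constant is needed in this sketch.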
  7. 23 May, 2011 1 commit
  8. 03 May, 2011 1 commit
  9. 27 Apr, 2011 2 commits
    • timers: Posix interface for alarm-timers · 9a7adcf5
      Committed by John Stultz
      This patch exposes alarm-timers to userland via the posix clock
      and timers interface, using two new clockids: CLOCK_REALTIME_ALARM
      and CLOCK_BOOTTIME_ALARM. Both clockids behave identically to
      CLOCK_REALTIME and CLOCK_BOOTTIME, respectively, but timers
      set against the _ALARM suffixed clockids will wake the system if
      it is suspended.
      
      Some background can be found here:
      	https://lwn.net/Articles/429925/
      
      The concept for Alarm-timers was inspired by the Android Alarm
      driver (by Arve Hjønnevåg) found in the Android kernel tree.
      
      See: http://android.git.kernel.org/?p=kernel/common.git;a=blob;f=drivers/rtc/alarm.c;h=1250edfbdf3302f5e4ea6194847c6ef4bb7beb1c;hb=android-2.6.36
      
      While the in-kernel interface is pretty similar between
      alarm-timers and Android alarm driver, the user-space interface
      for the Android alarm driver is via ioctls to a new char device.
      As mentioned above, I've instead chosen to export this functionality
      via the posix interface, as it seemed a little simpler and avoids
      creating duplicate interfaces to things like CLOCK_REALTIME and
      CLOCK_MONOTONIC under alternate names (i.e. ANDROID_ALARM_RTC and
      ANDROID_ALARM_SYSTEMTIME).
      
      The semantics of the Android alarm driver are different from what
      this posix interface provides. For instance, threads other than
      the thread waiting on the Android alarm driver are able to modify
      the alarm being waited on. Also this interface does not allow
      the same wakelock semantics that the Android driver provides
      (ie: kernel takes a wakelock on RTC alarm-interrupt, and holds it
      through process wakeup, and while the process runs, until the
      process either closes the char device or calls back in to wait
      on a new alarm).
      
      One potential way to implement similar semantics may be via
      the timerfd infrastructure, but this needs more research.
      
      There may also need to be some sort of sysfs system level policy
      hooks that allow alarm timers to be disabled to keep them
      from firing at inappropriate times (ie: laptop in a well insulated
      bag, mid-flight).
      
      CC: Arve Hjønnevåg <arve@android.com>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Alessandro Zummo <a.zummo@towertech.it>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
    • time: Add timekeeping_inject_sleeptime · 304529b1
      Committed by John Stultz
      Some platforms cannot implement read_persistent_clock, as
      their RTC devices are only accessible when interrupts are enabled.
      This keeps them from being used by the timekeeping code on resume
      to measure the time in suspend.
      
      The RTC layer tries to work around this, by calling do_settimeofday
      on resume after irqs are reenabled to set the time properly. However,
      this only corrects CLOCK_REALTIME, and does not properly adjust
      the sleep time value. This causes btime in /proc/stat to be incorrect
      as well as making the new CLOCK_BOOTTIME inaccurate.
      
      This patch resolves the issue by introducing a new timekeeping hook
      to allow the RTC layer to inject the sleep time on resume.
      
      The code also checks to make sure that read_persistent_clock is
      nonfunctional before setting the sleep time, so that should the RTC's
      HCTOSYS option be configured in on a system that does support
      read_persistent_clock we will not increase the total_sleep_time twice.
      
      CC: Arve Hjønnevåg <arve@android.com>
      CC: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
  10. 22 Feb, 2011 3 commits
    • timers: Add CLOCK_BOOTTIME hrtimer base · 70a08cca
      Committed by John Stultz
      CLOCK_MONOTONIC stops while the system is in suspend. This is because
      to applications system suspend is invisible. However, there is a
      growing set of applications that are wanting to be suspend-aware,
      but do not want to deal with the complications of CLOCK_REALTIME
      (which might jump around if settimeofday is called).
      
      For these applications, I propose a new clockid: CLOCK_BOOTTIME.
      CLOCK_BOOTTIME is identical to CLOCK_MONOTONIC, except it also
      includes any time spent in suspend.
      
      This patch adds an hrtimer base for CLOCK_BOOTTIME, using
      get_monotonic_boottime/ktime_get_boottime, to allow
      in-kernel users to set timers against it.
      
      CC: Jamie Lokier <jamie@shareable.org>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Alexander Shishkin <virtuoso@slind.org>
      CC: Arve Hjønnevåg <arve@android.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
    • time: Extend get_xtime_and_monotonic_offset() to also return sleep · 314ac371
      Committed by John Stultz
      Extend get_xtime_and_monotonic_offset to
      get_xtime_and_monotonic_and_sleep_offset().
      
      CC: Jamie Lokier <jamie@shareable.org>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Alexander Shishkin <virtuoso@slind.org>
      CC: Arve Hjønnevåg <arve@android.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
    • time: Introduce get_monotonic_boottime and ktime_get_boottime · abb3a4ea
      Committed by John Stultz
      This adds new functions that return the monotonic time since boot
      (in other words, CLOCK_MONOTONIC + suspend time).
      
      CC: Jamie Lokier <jamie@shareable.org>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Alexander Shishkin <virtuoso@slind.org>
      CC: Arve Hjønnevåg <arve@android.com>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
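The relationship stated above (boot time is CLOCK_MONOTONIC plus time spent in suspend) can be observed from user space with a sketch like the one below. It assumes a Linux system whose kernel and C library expose `CLOCK_BOOTTIME` through `clock_gettime()`; the helper names `clock_ns()` and `suspend_time_ns()` are hypothetical and not part of any kernel or libc API.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7	/* value used by the Linux UAPI headers */
#endif

/* Read a clock and return its value in nanoseconds. */
static int64_t clock_ns(clockid_t id)
{
	struct timespec ts;
	int rc = clock_gettime(id, &ts);

	assert(rc == 0);
	(void)rc;
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Approximate time spent in suspend since boot: the difference between
 * the boot-time clock and the monotonic clock. The two reads are not
 * atomic, so the result also includes the few nanoseconds between them.
 */
static int64_t suspend_time_ns(void)
{
	int64_t mono = clock_ns(CLOCK_MONOTONIC);
	int64_t boot = clock_ns(CLOCK_BOOTTIME);

	return boot - mono;
}
```

On a machine that has never been suspended the result should be close to zero; after a suspend/resume cycle it should grow by roughly the time spent asleep.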
  11. 02 Feb, 2011 2 commits
  12. 01 Feb, 2011 1 commit
  13. 31 Jan, 2011 3 commits
  14. 14 Jan, 2011 1 commit
  15. 14 Aug, 2010 1 commit
  16. 13 Aug, 2010 1 commit
  17. 27 Jul, 2010 3 commits
  18. 13 Apr, 2010 1 commit
    • time: Remove xtime_cache · 6a867a39
      Committed by John Stultz
      With the earlier logarithmic time accumulation patch, xtime will now
      always be within one "tick" of the current time, instead of possibly
      half a second off.
      
      This removes the need for the xtime_cache value, which always stored the
      time at the last interrupt, so this patch cleans that up removing the
      xtime_cache related code.
      
      This patch also addresses an issue with an earlier version of this change,
      where xtime_cache was normalizing xtime, which could in some cases be
      not valid (ie: tv_nsec == NSEC_PER_SEC). This is fixed by handling
      the edge case in update_wall_time().
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Petr Titěra <P.Titera@century.cz>
      LKML-Reference: <1270589451-30773-1-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  19. 14 Nov, 2009 1 commit
    • nohz: Prevent clocksource wrapping during idle · 98962465
      Committed by Jon Hunter
      The dynamic tick allows the kernel to sleep for periods longer than a
      single tick, but it does not limit the sleep time currently. In the
      worst case the kernel could sleep longer than the wrap around time of
      the time keeping clock source which would result in losing track of
      time.
      
      Prevent this by limiting it to the safe maximum sleep time of the
      current time keeping clock source. The value is calculated when the
      clock source is registered.
      
      [ tglx: simplified the code a bit and massaged the commit msg ]
      Signed-off-by: Jon Hunter <jon-hunter@ti.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1250617512-23567-2-git-send-email-jon-hunter@ti.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
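The commit above only says that a safe maximum sleep time is calculated when the clock source is registered. One way such a limit could be derived is sketched below, assuming a counter described by a wrap-around mask and a mult/shift pair for converting cycles to nanoseconds. `struct clocksource_sketch`, `cyc2ns_sketch()` and `max_idle_ns()` are hypothetical names for this illustration, and the kernel's actual computation may differ (for example by adding a safety margin).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified description of a free-running counter. */
struct clocksource_sketch {
	uint64_t mask;	/* largest counter value before it wraps */
	uint32_t mult;	/* cycles -> ns multiplier */
	uint32_t shift;	/* cycles -> ns right shift */
};

/* Convert a cycle count to nanoseconds: ns = (cycles * mult) >> shift. */
static uint64_t cyc2ns_sketch(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/*
 * Longest interval the system may sleep without losing track of time:
 * bounded by the counter wrapping (mask) and by the 64-bit
 * multiplication in the conversion overflowing.
 */
static uint64_t max_idle_ns(const struct clocksource_sketch *cs)
{
	uint64_t max_cycles;

	assert(cs->mult != 0 && cs->shift < 64);

	max_cycles = UINT64_MAX / cs->mult;	/* cycles * mult stays in 64 bits */
	if (max_cycles > cs->mask)
		max_cycles = cs->mask;		/* counter must not wrap */

	return cyc2ns_sketch(max_cycles, cs->mult, cs->shift);
}
```

With a 32-bit counter ticking once per nanosecond (`mult = 1 << 10`, `shift = 10`), the limit is the wrap time of the counter, about 4.29 seconds.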
  20. 24 Sep, 2009 1 commit
    • time: add function to convert between calendar time and broken-down time for universal use · 57f1f087
      Committed by Zhaolei
      The kernel contains many similar pieces of code that do one job: convert
      between calendar time and broken-down time.
      
      Here are some of the source files I found:
        fs/ncpfs/dir.c
        fs/smbfs/proc.c
        fs/fat/misc.c
        fs/udf/udftime.c
        fs/cifs/netmisc.c
        net/netfilter/xt_time.c
        drivers/scsi/ips.c
        drivers/input/misc/hp_sdc_rtc.c
        drivers/rtc/rtc-lib.c
        arch/ia64/hp/sim/boot/fw-emu.c
        arch/m68k/mac/misc.c
        arch/powerpc/kernel/time.c
        arch/parisc/include/asm/rtc.h
        ...
      
      We can make a common function for this type of conversion. At least we
      get the following benefits:
      
      1: Makes the kernel simpler and more unified
      2: Makes bugs in the conversion code easier to fix
      3: Reduces future code duplication
         For example, I'm trying to make ftrace display walltime,
         and this patch will make that easier for me.
      
      This code is based on code from glibc-2.6
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
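The kind of conversion this commit centralizes, from seconds since the epoch to a broken-down date, can be sketched as below. The commit says its code is based on glibc; this sketch instead uses a plain year and month loop for readability, so it should be read as an illustration of the idea rather than the kernel's algorithm. `struct simple_tm`, `is_leap_year()` and `secs_to_tm()` are hypothetical names, and only non-negative UTC timestamps are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical broken-down time; month and day are 1-based. */
struct simple_tm {
	int year, mon, mday;
	int hour, min, sec;
};

static int is_leap_year(int y)
{
	return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Convert seconds since 1970-01-01 00:00:00 UTC to broken-down time. */
static void secs_to_tm(int64_t secs, struct simple_tm *out)
{
	static const int mdays[12] = {
		31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
	};
	int64_t days;
	int rem, year = 1970, mon = 0;

	assert(secs >= 0);	/* this sketch does not handle pre-1970 times */

	days = secs / 86400;
	rem = (int)(secs % 86400);
	out->hour = rem / 3600;
	out->min = (rem % 3600) / 60;
	out->sec = rem % 60;

	/* Peel off whole years, then whole months. */
	for (;;) {
		int ylen = is_leap_year(year) ? 366 : 365;

		if (days < ylen)
			break;
		days -= ylen;
		year++;
	}
	for (;;) {
		int mlen = mdays[mon] + (mon == 1 && is_leap_year(year));

		if (days < mlen)
			break;
		days -= mlen;
		mon++;
	}

	out->year = year;
	out->mon = mon + 1;
	out->mday = (int)days + 1;
}
```

Having one such routine in a shared place is what lets the filesystems and drivers listed in the commit message drop their private copies.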
  21. 15 Sep, 2009 1 commit
    • time: Prevent 32 bit overflow with set_normalized_timespec() · 12e09337
      Committed by Thomas Gleixner
      set_normalized_timespec() nsec argument is of type long. The recent
      timekeeping changes of ktime_get_ts() feed 
      
      	ts->tv_nsec + tomono.tv_nsec + nsecs
      
      to set_normalized_timespec(). On 32 bit machines that sum can be
      larger than (1 << 31) and therefore result in a negative value which
      screws up the result completely.
      
      Make the nsec argument of set_normalized_timespec() s64 to fix the
      problem at hand. This also prevents similar problems for future users
      of set_normalized_timespec().
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Carsten Emde <carsten.emde@osadl.org>
      LKML-Reference: <new-submission>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: John Stultz <johnstul@us.ibm.com>
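The overflow described above is easy to reproduce on paper: three nanosecond values that are each just under one second add up to almost 3,000,000,000, which exceeds the 2,147,483,647 limit of a 32-bit signed long. The sketch below shows normalization with a 64-bit nanosecond argument, which is the approach the commit takes. `struct ts_sketch` and `normalize_ts()` are hypothetical user-space names, not the kernel's `set_normalized_timespec()` itself.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical timespec-like pair with 64-bit fields. */
struct ts_sketch {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/*
 * Fold an arbitrary 64-bit nanosecond value into [0, NSEC_PER_SEC),
 * carrying whole seconds into sec. Taking nsec as a 64-bit value means
 * the sum of several sub-second values cannot wrap negative first.
 */
static struct ts_sketch normalize_ts(int64_t sec, int64_t nsec)
{
	struct ts_sketch ts;

	while (nsec >= NSEC_PER_SEC) {
		nsec -= NSEC_PER_SEC;
		++sec;
	}
	while (nsec < 0) {
		nsec += NSEC_PER_SEC;
		--sec;
	}
	assert(nsec >= 0 && nsec < NSEC_PER_SEC);

	ts.tv_sec = sec;
	ts.tv_nsec = nsec;
	return ts;
}
```

For example, `normalize_ts(0, 3 * 999999999LL)` yields 2 seconds and 999,999,997 nanoseconds, whereas the same sum stored in a 32-bit long would already have gone negative before normalization.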
  22. 22 Aug, 2009 1 commit
    • time: Introduce CLOCK_REALTIME_COARSE · da15cfda
      Committed by john stultz
      After talking with some application writers who want very fast, but not
      fine-grained timestamps, I decided to try to implement new clock_ids
      to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE
      which return the time at the last tick. This is very fast as we don't
      have to access any hardware (which can be very painful if you're using
      something like the acpi_pm clocksource), and we can even use the vdso
      clock_gettime() method to avoid the syscall. The only trade off is you
      only get low-res tick grained time resolution.
      
      This isn't a new idea, I know Ingo has a patch in the -rt tree that made
      the vsyscall gettimeofday() return coarse grained time when the
      vsyscall64 sysctl was set to 2. However this affects all applications
      on a system.
      
      With this method, applications can choose the proper speed/granularity
      trade-off for themselves.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: nikolag@ca.ibm.com
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: arjan@infradead.org
      Cc: jonathan@jonmasters.org
      LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
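The speed/granularity trade-off described above can be inspected from user space with a sketch like this one, assuming a Linux system where `CLOCK_REALTIME_COARSE` is available through `clock_gettime()` and `clock_getres()`. The helper names `ts_to_ns()`, `coarse_resolution_ns()` and `coarse_lag_ns()` are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_REALTIME_COARSE
#define CLOCK_REALTIME_COARSE 5	/* value used by the Linux UAPI headers */
#endif

static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Resolution of the coarse clock, typically one scheduler tick. */
static int64_t coarse_resolution_ns(void)
{
	struct timespec res;
	int rc = clock_getres(CLOCK_REALTIME_COARSE, &res);

	assert(rc == 0);
	(void)rc;
	return ts_to_ns(&res);
}

/*
 * How far the coarse clock lags the fine-grained one. The coarse value
 * is the time at the last tick, so the lag is expected to stay below
 * roughly one tick.
 */
static int64_t coarse_lag_ns(void)
{
	struct timespec coarse, fine;
	int rc = clock_gettime(CLOCK_REALTIME_COARSE, &coarse);

	assert(rc == 0);
	rc = clock_gettime(CLOCK_REALTIME, &fine);
	assert(rc == 0);
	(void)rc;
	return ts_to_ns(&fine) - ts_to_ns(&coarse);
}
```

An application that only needs timestamps for logging could read the coarse clock, while one that measures short intervals would stay with `CLOCK_REALTIME` or `CLOCK_MONOTONIC`.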
  23. 15 Aug, 2009 3 commits
  24. 02 May, 2009 1 commit
  25. 27 Mar, 2009 1 commit
    • make exported headers use strict posix types · 85efde6f
      Committed by Arnd Bergmann
      A number of standard posix types are used in exported headers, which
      is not allowed if __STRICT_KERNEL_NAMES is defined. In order to
      get rid of the non-__STRICT_KERNEL_NAMES part and to make sane headers
      the default, we have to change them all to safe types.
      
      There are also still some leftovers in reiserfs_fs.h, elfcore.h
      and coda.h, but these files have not compiled in user space for
      a long time.
      
      This leaves out the various integer types ({u_,u,}int{8,16,32,64}_t),
      which we take care of separately.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-ppp@vger.kernel.org
      Cc: Jaroslav Kysela <perex@perex.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  26. 31 Dec, 2008 1 commit
    • sched_clock: prevent scd->clock from moving backwards, take #2 · 1c5745aa
      Committed by Thomas Gleixner
      Redo:
      
        5b7dba4f: sched_clock: prevent scd->clock from moving backwards
      
      which had to be reverted due to s2ram hangs:
      
        ca7e716c: Revert "sched_clock: prevent scd->clock from moving backwards"
      
      ... this time with resume restoring GTOD later in the sequence
      taken into account as well.
      
      The "timekeeping_suspended" flag is not very nice but we cannot call into
      GTOD before it has been properly resumed and the scheduler will run very
      early in the resume sequence.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  27. 17 Oct, 2008 1 commit
    • compat: generic compat get/settimeofday · b418da16
      Committed by Christoph Hellwig
      Nothing arch specific in get/settimeofday.  The details of the timeval
      conversion varied a little from arch to arch, but all with the same
      results.
      
      Also add an extern declaration for sys_tz to linux/time.h because externs
      in .c files are frowned upon.  I'll kill the externs in various other files
      in a separate patch.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: David S. Miller <davem@davemloft.net> [ sparc bits ]
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  28. 14 Sep, 2008 1 commit
    • timers: fix itimer/many thread hang · f06febc9
      Committed by Frank Mayhar
      Overview
      
      This patch reworks the handling of POSIX CPU timers, including the
      ITIMER_PROF, ITIMER_VIRT timers and rlimit handling.  It was put together
      with the help of Roland McGrath, the owner and original writer of this code.
      
      The problem we ran into, and the reason for this rework, has to do with using
      a profiling timer in a process with a large number of threads.  It appears
      that the performance of the old implementation of run_posix_cpu_timers() was
      at least O(n*3) (where "n" is the number of threads in a process) or worse.
      Everything is fine with an increasing number of threads until the time taken
      for that routine to run becomes the same as or greater than the tick time, at
      which point things degrade rather quickly.
      
      This patch fixes bug 9906, "Weird hang with NPTL and SIGPROF."
      
      Code Changes
      
      This rework corrects the implementation of run_posix_cpu_timers() to make it
      run in constant time for a particular machine.  (Performance may vary between
      one machine and another depending upon whether the kernel is built as single-
      or multiprocessor and, in the latter case, depending upon the number of
      running processors.)  To do this, at each tick we now update fields in
      signal_struct as well as task_struct.  The run_posix_cpu_timers() function
      uses those fields to make its decisions.
      
      We define a new structure, "task_cputime," to contain user, system and
      scheduler times and use these in appropriate places:
      
      struct task_cputime {
      	cputime_t utime;
      	cputime_t stime;
      	unsigned long long sum_exec_runtime;
      };
      
      This is included in the structure "thread_group_cputime," which is a new
      substructure of signal_struct and which varies for uniprocessor versus
      multiprocessor kernels.  For uniprocessor kernels, it uses "task_cputime" as
      a simple substructure, while for multiprocessor kernels it is a pointer:
      
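      /* uniprocessor kernels: embedded substructure */
      struct thread_group_cputime {
      	struct task_cputime totals;
      };
      
      /* multiprocessor kernels: pointer to per-cpu data */
      struct thread_group_cputime {
      	struct task_cputime *totals;
      };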
      
      We also add a new task_cputime substructure directly to signal_struct, to
      cache the earliest expiration of process-wide timers, and task_cputime also
      replaces the it_*_expires fields of task_struct (used for earliest expiration
      of thread timers).  The "thread_group_cputime" structure contains process-wide
      timers that are updated via account_user_time() and friends.  In the non-SMP
      case the structure is a simple aggregator; unfortunately in the SMP case that
      simplicity was not achievable due to cache-line contention between CPUs (in
      one measured case performance was actually _worse_ on a 16-cpu system than
      the same test on a 4-cpu system, due to this contention).  For SMP, the
      thread_group_cputime counters are maintained as a per-cpu structure allocated
      using alloc_percpu().  The timer functions update only the timer field in
      the structure corresponding to the running CPU, obtained using per_cpu_ptr().
      
      We define a set of inline functions in sched.h that we use to maintain the
      thread_group_cputime structure and hide the differences between UP and SMP
      implementations from the rest of the kernel.  The thread_group_cputime_init()
      function initializes the thread_group_cputime structure for the given task.
      The thread_group_cputime_alloc() is a no-op for UP; for SMP it calls the
      out-of-line function thread_group_cputime_alloc_smp() to allocate and fill
      in the per-cpu structures and fields.  The thread_group_cputime_free()
      function, also a no-op for UP, in SMP frees the per-cpu structures.  The
      thread_group_cputime_clone_thread() function (also a UP no-op) for SMP calls
      thread_group_cputime_alloc() if the per-cpu structures haven't yet been
      allocated.  The thread_group_cputime() function fills the task_cputime
      structure it is passed with the contents of the thread_group_cputime fields;
      in UP it's that simple but in SMP it must also safely check that tsk->signal
      is non-NULL (if it is it just uses the appropriate fields of task_struct) and,
      if so, sums the per-cpu values for each online CPU.  Finally, the three
      functions account_group_user_time(), account_group_system_time() and
      account_group_exec_runtime() are used by timer functions to update the
      respective fields of the thread_group_cputime structure.
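
The per-cpu scheme described above (each CPU updates only its own counters, and a reader sums across CPUs) can be modelled in user space as follows. This is a simplified sketch: a fixed-size array stands in for `alloc_percpu()` and `per_cpu_ptr()`, `uint64_t` stands in for `cputime_t`, and all names with a `_sketch` suffix, along with `NR_CPUS_SKETCH`, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS_SKETCH 4	/* hypothetical CPU count */

struct task_cputime_sketch {
	uint64_t utime;				/* stand-in for cputime_t */
	uint64_t stime;				/* stand-in for cputime_t */
	unsigned long long sum_exec_runtime;
};

/* One slot per CPU, so CPUs never write to the same counters. */
struct thread_group_cputime_sketch {
	struct task_cputime_sketch totals[NR_CPUS_SKETCH];
};

static void tg_cputime_init_sketch(struct thread_group_cputime_sketch *tg)
{
	memset(tg, 0, sizeof(*tg));
}

/* Update path: touch only the slot of the CPU the caller runs on. */
static void account_user_time_sketch(struct thread_group_cputime_sketch *tg,
				     unsigned int cpu, uint64_t delta)
{
	assert(cpu < NR_CPUS_SKETCH);
	tg->totals[cpu].utime += delta;
}

static void account_system_time_sketch(struct thread_group_cputime_sketch *tg,
				       unsigned int cpu, uint64_t delta)
{
	assert(cpu < NR_CPUS_SKETCH);
	tg->totals[cpu].stime += delta;
}

/* Read path: snapshot the group totals by summing every CPU's slot. */
static void tg_cputime_sum_sketch(const struct thread_group_cputime_sketch *tg,
				  struct task_cputime_sketch *times)
{
	unsigned int cpu;

	memset(times, 0, sizeof(*times));
	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		times->utime += tg->totals[cpu].utime;
		times->stime += tg->totals[cpu].stime;
		times->sum_exec_runtime += tg->totals[cpu].sum_exec_runtime;
	}
}
```

The cost of a read grows with the number of CPUs rather than the number of threads, which is the property the rework relies on.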
      
      Non-SMP operation is trivial and will not be mentioned further.
      
      The per-cpu structure is always allocated when a task creates its first new
      thread, via a call to thread_group_cputime_clone_thread() from copy_signal().
      It is freed at process exit via a call to thread_group_cputime_free() from
      cleanup_signal().
      
      All functions that formerly summed utime/stime/sum_sched_runtime values
      from all threads in the thread group now use thread_group_cputime() to
      snapshot the values in the thread_group_cputime structure or the values in
      the task structure itself if the per-cpu structure hasn't been allocated.
      
      Finally, the code in kernel/posix-cpu-timers.c has changed quite a bit.
      The run_posix_cpu_timers() function has been split into a fast path and a
      slow path; the former safely checks whether there are any expired thread
      timers and, if not, just returns, while the slow path does the heavy lifting.
      With the dedicated thread group fields, timers are no longer "rebalanced" and
      the process_timer_rebalance() function and related code has gone away.  All
      summing loops are gone and all code that used them now uses the
      thread_group_cputime() inline.  When process-wide timers are set, the new
      task_cputime structure in signal_struct is used to cache the earliest
      expiration; this is checked in the fast path.
      
      Performance
      
      The fix appears not to add significant overhead to existing operations.  It
      generally performs the same as the current code except in two cases, one in
      which it performs slightly worse (Case 5 below) and one in which it performs
      very significantly better (Case 2 below).  Overall it's a wash except in those
      two cases.
      
      I've since done somewhat more involved testing on a dual-core Opteron system.
      
      Case 1: With no itimer running, for a test with 100,000 threads, the fixed
      	kernel took 1428.5 seconds, 513 seconds more than the unfixed system,
      	all of which was spent in the system.  There were twice as many
      	voluntary context switches with the fix as without it.
      
      Case 2: With an itimer running at .01 second ticks and 4000 threads (the most
      	an unmodified kernel can handle), the fixed kernel ran the test in
      	eight percent of the time (5.8 seconds as opposed to 70 seconds) and
      	had better tick accuracy (.012 seconds per tick as opposed to .023
      	seconds per tick).
      
      Case 3: A 4000-thread test with an initial timer tick of .01 second and an
      	interval of 10,000 seconds (i.e. a timer that ticks only once) had
      	very nearly the same performance in both cases:  6.3 seconds elapsed
      	for the fixed kernel versus 5.5 seconds for the unfixed kernel.
      
      With fewer threads (eight in these tests), the Case 1 test ran in essentially
      the same time on both the modified and unmodified kernels (5.2 seconds versus
      5.8 seconds).  The Case 2 test ran in about the same time as well, 5.9 seconds
      versus 5.4 seconds but again with much better tick accuracy, .013 seconds per
      tick versus .025 seconds per tick for the unmodified kernel.
      
      Since the fix affected the rlimit code, I also tested soft and hard CPU limits.
      
      Case 4: With a hard CPU limit of 20 seconds and eight threads (and an itimer
      	running), the modified kernel was very slightly favored in that while
      	it killed the process in 19.997 seconds of CPU time (5.002 seconds of
      	wall time), only .003 seconds of that was system time, the rest was
      	user time.  The unmodified kernel killed the process in 20.001 seconds
      	of CPU (5.014 seconds of wall time) of which .016 seconds was system
      	time.  Really, though, the results were too close to call.  The results
      	were essentially the same with no itimer running.
      
      Case 5: With a soft limit of 20 seconds and a hard limit of 2000 seconds
      	(where the hard limit would never be reached) and an itimer running,
      	the modified kernel exhibited worse tick accuracy than the unmodified
      	kernel: .050 seconds/tick versus .028 seconds/tick.  Otherwise,
      	performance was almost indistinguishable.  With no itimer running this
      	test exhibited virtually identical behavior and times in both cases.
      
      In times past I did some limited performance testing.  Those results are below.
      
      On a four-cpu Opteron system without this fix, a sixteen-thread test executed
      in 3569.991 seconds, of which user was 3568.435s and system was 1.556s.  On
      the same system with the fix, user and elapsed time were about the same, but
      system time dropped to 0.007 seconds.  Performance with eight, four and one
      thread were comparable.  Interestingly, the timer ticks with the fix seemed
      more accurate:  The sixteen-thread test with the fix received 149543 ticks
      for 0.024 seconds per tick, while the same test without the fix received 58720
      for 0.061 seconds per tick.  Both cases were configured for an interval of
      0.01 seconds.  Again, the other tests were comparable.  Each thread in this
      test computed the primes up to 25,000,000.
      
      I also did a test with a large number of threads, 100,000 threads, which is
      impossible without the fix.  In this case each thread computed the primes only
      up to 10,000 (to make the runtime manageable).  System time dominated, at
      1546.968 seconds out of a total 2176.906 seconds (giving a user time of
      629.938s).  It received 147651 ticks for 0.015 seconds per tick, still quite
      accurate.  There is obviously no comparable test without the fix.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  29. 06 Sep, 2008 1 commit
  30. 21 Aug, 2008 1 commit
    • clocksource: introduce CLOCK_MONOTONIC_RAW · 2d42244a
      Committed by John Stultz
      btime.sf.net), he mentioned that for really close synchronization, it is
      useful to have access to "hardware time", that is a notion of time that is
      not in any way adjusted by the clock slewing done to keep close time sync.
      
      Part of the issue is if we are using the kernel's ntp adjusted
      representation of time in order to measure how we should correct time, we
      can run into what Paul McKenney aptly described as "Painting a road using
      the lines we're painting as the guide".
      
      I had been thinking of a similar problem, and was trying to come up with a
      way to give users access to a purely hardware based time representation
      that avoided users having to know the underlying frequency and mask values
      needed to deal with the wide variety of possible underlying hardware
      counters.
      
      My solution is to introduce CLOCK_MONOTONIC_RAW.  This exposes a
      nanosecond based time value, that increments starting at bootup and has no
      frequency adjustments made to it whatsoever.
      
      The time is accessed from userspace via the posix_clock_gettime() syscall,
      passing CLOCK_MONOTONIC_RAW as the clock_id.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>