1. 27 7月, 2010 1 次提交
    • J
      x86: Fix vtime/file timestamp inconsistencies · 8c73626a
      John Stultz 提交于
      Due to vtime calling vgettimeofday(), its possible that an application
      could call  time();create("stuff",O_RDRW);  only to see the file's
      creation timestamp to be before the value returned by time.
      
      A similar way to reproduce the issue is to compare the vsyscall time()
      with the syscall time(), and observe ordering issues.
      
      The modified test case from Oleg Nesterov below can illustrate this:
      
      int main(void)
      {
      	time_t sec1,sec2;
      	do {
      		sec1 = time(&sec2);
      		sec2 = syscall(__NR_time, NULL);
      	} while (sec1 <= sec2);
      
      	printf("vtime: %d.000000\n", sec1);
      	printf("time: %d.000000\n", sec2);
      	return 0;
      }
      
      The proper fix is to make vtime use the same time value as
      current_kernel_time() (which is exported via update_vsyscall) instead of
      vgettime().
      
      Thanks to Jiri Olsa for bringing up the issue and catching bugs in
      earlier verisons of this fix.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      LKML-Reference: <1279068988-21864-2-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      8c73626a
  2. 01 3月, 2010 1 次提交
  3. 17 11月, 2009 1 次提交
    • L
      timekeeping: Fix clock_gettime vsyscall time warp · 0696b711
      Lin Ming 提交于
      Since commit 0a544198 "timekeeping: Move NTP adjusted clock multiplier
      to struct timekeeper" the clock multiplier of vsyscall is updated with
      the unmodified clock multiplier of the clock source and not with the
      NTP adjusted multiplier of the timekeeper.
      
      This causes user space observerable time warps:
      new CLOCK-warp maximum: 120 nsecs,  00000025c337c537 -> 00000025c337c4bf
      
      Add a new argument "mult" to update_vsyscall() and hand in the
      timekeeping internal NTP adjusted multiplier.
      Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Cc: "Zhang Yanmin" <yanmin_zhang@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Tony Luck <tony.luck@intel.com>
      LKML-Reference: <1258436990.17765.83.camel@minggr.sh.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      0696b711
  4. 12 11月, 2009 1 次提交
  5. 24 9月, 2009 1 次提交
  6. 22 8月, 2009 1 次提交
    • J
      time: Introduce CLOCK_REALTIME_COARSE · da15cfda
      john stultz 提交于
      After talking with some application writers who want very fast, but not
      fine-grained timestamps, I decided to try to implement new clock_ids
      to clock_gettime(): CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE
      which returns the time at the last tick. This is very fast as we don't
      have to access any hardware (which can be very painful if you're using
      something like the acpi_pm clocksource), and we can even use the vdso
      clock_gettime() method to avoid the syscall. The only trade off is you
      only get low-res tick grained time resolution.
      
      This isn't a new idea, I know Ingo has a patch in the -rt tree that made
      the vsyscall gettimeofday() return coarse grained time when the
      vsyscall64 sysctrl was set to 2. However this affects all applications
      on a system.
      
      With this method, applications can choose the proper speed/granularity
      trade-off for themselves.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: nikolag@ca.ibm.com
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: arjan@infradead.org
      Cc: jonathan@jonmasters.org
      LKML-Reference: <1250734414.6897.5.camel@localhost.localdomain>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      da15cfda
  7. 28 5月, 2009 1 次提交
  8. 13 11月, 2008 1 次提交
  9. 12 11月, 2008 2 次提交
    • I
      tracing: branch tracer, fix vdso crash · 2b7d0390
      Ingo Molnar 提交于
      Impact: fix bootup crash
      
      the branch tracer missed arch/x86/vdso/vclock_gettime.c from
      disabling tracing, which caused such bootup crashes:
      
        [  201.840097] init[1]: segfault at 7fffed3fe7c0 ip 00007fffed3fea2e sp 000077
      
      also clean up the ugly ifdefs in arch/x86/kernel/vsyscall_64.c by
      creating DISABLE_UNLIKELY_PROFILE facility for code to turn off
      instrumentation on a per file basis.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2b7d0390
    • S
      tracing: profile likely and unlikely annotations · 1f0d69a9
      Steven Rostedt 提交于
      Impact: new unlikely/likely profiler
      
      Andrew Morton recently suggested having an in-kernel way to profile
      likely and unlikely macros. This patch achieves that goal.
      
      When configured, every(*) likely and unlikely macro gets a counter attached
      to it. When the condition is hit, the hit and misses of that condition
      are recorded. These numbers can later be retrieved by:
      
        /debugfs/tracing/profile_likely    - All likely markers
        /debugfs/tracing/profile_unlikely  - All unlikely markers.
      
      # cat /debug/tracing/profile_unlikely | head
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2167        0   0 do_arch_prctl                  process_64.c         832
             0        0   0 do_arch_prctl                  process_64.c         804
          2670        0   0 IS_ERR                         err.h                34
         71230     5693   7 __switch_to                    process_64.c         673
         76919        0   0 __switch_to                    process_64.c         639
         43184    33743  43 __switch_to                    process_64.c         624
         12740    64181  83 __switch_to                    process_64.c         594
         12740    64174  83 __switch_to                    process_64.c         590
      
      # cat /debug/tracing/profile_unlikely | \
        awk '{ if ($3 > 25) print $0; }' |head -20
         44963    35259  43 __switch_to                    process_64.c         624
         12762    67454  84 __switch_to                    process_64.c         594
         12762    67447  84 __switch_to                    process_64.c         590
          1478      595  28 syscall_get_error              syscall.h            51
             0     2821 100 syscall_trace_leave            ptrace.c             1567
             0        1 100 native_smp_prepare_cpus        smpboot.c            1237
         86338   265881  75 calc_delta_fair                sched_fair.c         408
        210410   108540  34 calc_delta_mine                sched.c              1267
             0    54550 100 sched_info_queued              sched_stats.h        222
         51899    66435  56 pick_next_task_fair            sched_fair.c         1422
             6       10  62 yield_task_fair                sched_fair.c         982
          7325     2692  26 rt_policy                      sched.c              144
             0     1270 100 pre_schedule_rt                sched_rt.c           1261
          1268    48073  97 pick_next_task_rt              sched_rt.c           884
             0    45181 100 sched_info_dequeued            sched_stats.h        177
             0       15 100 sched_move_task                sched.c              8700
             0       15 100 sched_move_task                sched.c              8690
         53167    33217  38 schedule                       sched.c              4457
             0    80208 100 sched_info_switch              sched_stats.h        270
         30585    49631  61 context_switch                 sched.c              2619
      
      # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
         39900    36577  47 pick_next_task                 sched.c              4397
         20824    15233  42 switch_mm                      mmu_context_64.h     18
             0        7 100 __cancel_work_timer            workqueue.c          560
           617    66484  99 clocksource_adjust             timekeeping.c        456
             0   346340 100 audit_syscall_exit             auditsc.c            1570
            38   347350  99 audit_get_context              auditsc.c            732
             0   345244 100 audit_syscall_entry            auditsc.c            1541
            38     1017  96 audit_free                     auditsc.c            1446
             0     1090 100 audit_alloc                    auditsc.c            862
          2618     1090  29 audit_alloc                    auditsc.c            858
             0        6 100 move_masked_irq                migration.c          9
             1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
             2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
             0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
          4514     2090  31 __grab_cache_page              filemap.c            2149
         12882   228786  94 mapping_unevictable            pagemap.h            50
             4       11  73 __flush_cpu_slab               slub.c               1466
        627757   330451  34 slab_free                      slub.c               1731
          2959    61245  95 dentry_lru_del_init            dcache.c             153
           946     1217  56 load_elf_binary                binfmt_elf.c         904
           102       82  44 disk_put_part                  genhd.h              206
             1        1  50 dst_gc_task                    dst.c                82
             0       19 100 tcp_mss_split_point            tcp_output.c         1126
      
      As you can see by the above, there's a bit of work to do in rethinking
      the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
      that were more than 25%.
      
      Note:  After submitting my first version of this patch, Andrew Morton
        showed me a version written by Daniel Walker, where I picked up
        the following ideas from:
      
        1)  Using __builtin_constant_p to avoid profiling fixed values.
        2)  Using __FILE__ instead of instruction pointers.
        3)  Using the preprocessor to stop all profiling of likely
             annotations from vsyscall_64.c.
      
      Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
      for their feed back on this patch.
      
      (*) Not ever unlikely is recorded, those that are used by vsyscalls
       (a few of them) had to have profiling disabled.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1f0d69a9
  10. 09 11月, 2008 1 次提交
  11. 08 7月, 2008 1 次提交
  12. 26 6月, 2008 2 次提交
  13. 24 5月, 2008 1 次提交
  14. 25 4月, 2008 1 次提交
  15. 01 3月, 2008 1 次提交
  16. 26 2月, 2008 1 次提交
    • T
      x86: fix vsyscall wreckage · ce28b986
      Thomas Gleixner 提交于
      based on a report from Arne Georg Gleditsch about user-space apps
      misbehaving after toggling /proc/sys/kernel/vsyscall64, a review
      of the code revealed that the "NOP patching" done there is
      fundamentally unsafe for a number of reasons:
      
      1) the patching code runs without synchronizing other CPUs
      
      2) it inserts NOPs even if there is no clock source which provides vread
      
      3) when the clock source changes to one without vread we run in
         exactly the same problem as in #2
      
      4) if nobody toggles the proc entry from 1 to 0 and to 1 again, then
         the syscall is not patched out
      
      as a result it is possible to break user-space via this patching.
      The only safe thing for now is to remove the patching.
      
      This code was broken since v2.6.21.
      Reported-by: NArne Georg Gleditsch <arne.gleditsch@dolphinics.no>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ce28b986
  17. 30 1月, 2008 4 次提交
  18. 20 10月, 2007 3 次提交
  19. 19 10月, 2007 3 次提交
  20. 18 10月, 2007 2 次提交
    • A
      x86: remove duplicated vsyscall nsec update · c861eff8
      Andi Kleen 提交于
      Spotted by Chuck Ebbert
      
      [ tglx: arch/x86 adaptation ]
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      c861eff8
    • M
      x86: fix cpu_to_node references · 98c9e27a
      Mike Travis 提交于
      In x86_64 and i386 architectures most arrays that are sized using
      NR_CPUS lay in local memory on node 0.  Not only will most (99%?) of the
      systems not use all the slots in these arrays, particularly when NR_CPUS
      is increased to accommodate future very high cpu count systems, but a
      number of cache lines are passed unnecessarily on the system bus when
      these arrays are referenced by cpus on other nodes.
      
      Typically, the values in these arrays are referenced by the cpu
      accessing it's own values, though when passing IPI interrupts, the cpu
      does access the data relevant to the targeted cpu/node.  Of course, if
      the referencing cpu is not on node 0, then the reference will still
      require cross node exchanges of cache lines.  A common use of this is
      for an interrupt service routine to pass the interrupt to other cpus
      local to that node.
      
      Ideally, all the elements in these arrays should be moved to the per_cpu
      data area.  In some cases (such as x86_cpu_to_apicid) the array is
      referenced before the per_cpu data areas are setup.  In this case, a
      static array is declared in the __initdata area and initialized by the
      booting cpu (BSP).  The values are then moved to the per_cpu area after
      it is initialized and the original static array is freed with the rest
      of the __initdata.
      
      This patch:
      
      Fix four instances where cpu_to_node is referenced by array instead of
      via the cpu_to_node macro.  This is preparation to moving it to the
      per_cpu data area.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      98c9e27a
  21. 14 10月, 2007 1 次提交
    • D
      Delete filenames in comments. · 835c34a1
      Dave Jones 提交于
      Since the x86 merge, lots of files that referenced their own filenames
      are no longer correct.  Rather than keep them up to date, just delete
      them, as they add no real value.
      
      Additionally:
      - fix up comment formatting in scx200_32.c
      - Remove a credit from myself in setup_64.c from a time when we had no SCM
      - remove longwinded history from tsc_32.c which can be figured out from
        git.
      Signed-off-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      835c34a1
  22. 11 10月, 2007 2 次提交
  23. 22 7月, 2007 1 次提交
    • A
      x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu · 2aae950b
      Andi Kleen 提交于
      This implements new vDSO for x86-64.  The concept is similar
      to the existing vDSOs on i386 and PPC.  x86-64 has had static
      vsyscalls before,  but these are not flexible enough anymore.
      
      A vDSO is a ELF shared library supplied by the kernel that is mapped into
      user address space.  The vDSO mapping is randomized for each process
      for security reasons.
      
      Doing this was needed for clock_gettime, because clock_gettime
      always needs a syscall fallback and having one at a fixed
      address would have made buffer overflow exploits too easy to write.
      
      The vdso can be disabled with vdso=0
      
      It currently includes a new gettimeofday implemention and optimized
      clock_gettime(). The gettimeofday implementation is slightly faster
      than the one in the old vsyscall.  clock_gettime is significantly faster
      than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.
      
      The new calls are generally faster than the old vsyscall.
      
      Advantages over the old x86-64 vsyscalls:
      - Extensible
      - Randomized
      - Cleaner
      - Easier to virtualize (the old static address range previously causes
      overhead e.g. for Xen because it has to create special page tables for it)
      
      Weak points:
      - glibc support still to be written
      
      The VM interface is partly based on Ingo Molnar's i386 version.
      
      Includes compile fix from Joachim Deguara
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2aae950b
  24. 22 5月, 2007 1 次提交
    • J
      x86_64: vsyscall time() fix · d0aff6e6
      john stultz 提交于
      The vsyscall time() function basically returns the second portion of
      xtime directly.  This however means that there is about a ticks worth of
      time each second where time() will return a second value less then what
      gettimeofday() does.
      
      Additionally, this window where vtime() is behind vgettimeofday() grows
      when dynticks is enabled, so its probably good to get this in before
      dynticks lands.
      
      Big thanks to Sripathi for noticing this issue and creating a test case
      to work with!
      
      This patch changes the vtime() implemenation to call vgettimeofday(),
      much as syscall time() implementation calls gettimeofday().
      
      2.6.21 stable candidate too
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0aff6e6
  25. 10 5月, 2007 1 次提交
    • R
      Add suspend-related notifications for CPU hotplug · 8bb78442
      Rafael J. Wysocki 提交于
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8bb78442
  26. 03 5月, 2007 3 次提交
    • E
      [PATCH] x86-64: vsyscall_gtod_data diet and vgettimeofday() fix · c8118c6c
      Eric Dumazet 提交于
      Current vsyscall_gtod_data is large (3 or 4 cache lines dirtied at timer
      interrupt). We can shrink it to exactly 64 bytes (1 cache line on AMD64)
      
      Instead of copying a whole struct clocksource, we copy only needed fields.
      
      I deleted an unused field : offset_base
      
      This patch fixes one oddity in vgettimeofday(): It can returns a timeval with
      tv_usec = 1000000. Maybe not a bug, but why not doing the right thing ?
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      c8118c6c
    • E
      [PATCH] x86-64: fix vtime() vsyscall · 272a3713
      Eric Dumazet 提交于
      There is a tiny probability that the return value from vtime(time_t *t) is
      Signed-off-by: NAndi Kleen <ak@suse.de>
      
      different than the value stored in *t
      
      Using a temporary variable solves the problem and gives a faster code.
      
         17:   48 85 ff                test   %rdi,%rdi
         1a:   48 8b 05 00 00 00 00    mov    0(%rip),%rax        #
      __vsyscall_gtod_data.wall_time_tv.tv_sec
         21:   74 03                   je     26
         23:   48 89 07                mov    %rax,(%rdi)
         26:   c9                      leaveq
         27:   c3                      retq
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      272a3713
    • V
      [PATCH] x86: __pa and __pa_symbol address space separation · 0dbf7028
      Vivek Goyal 提交于
      Currently __pa_symbol is for use with symbols in the kernel address
      map and __pa is for use with pointers into the physical memory map.
      But the code is implemented so you can usually interchange the two.
      
      __pa which is much more common can be implemented much more cheaply
      if it is it doesn't have to worry about any other kernel address
      spaces.  This is especially true with a relocatable kernel as
      __pa_symbol needs to peform an extra variable read to resolve
      the address.
      
      There is a third macro that is added for the vsyscall data
      __pa_vsymbol for finding the physical addesses of vsyscall pages.
      
      Most of this patch is simply sorting through the references to
      __pa or __pa_symbol and using the proper one.  A little of
      it is continuing to use a physical address when we have it
      instead of recalculating it several times.
      
      swapper_pgd is now NULL.  leave_mm now uses init_mm.pgd
      and init_mm.pgd is initialized at boot (instead of compile time)
      to the physmem virtual mapping of init_level4_pgd.  The
      physical address changed.
      
      Except for the for EMPTY_ZERO page all of the remaining references
      to __pa_symbol appear to be during kernel initialization.  So this
      should reduce the cost of __pa in the common case, even on a relocated
      kernel.
      
      As this is technically a semantic change we need to be on the lookout
      for anything I missed.  But it works for me (tm).
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NVivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      0dbf7028
  27. 15 3月, 2007 1 次提交