1. 12 11月, 2008 1 次提交
    • S
      tracing: profile likely and unlikely annotations · 1f0d69a9
      Steven Rostedt 提交于
      Impact: new unlikely/likely profiler
      
      Andrew Morton recently suggested having an in-kernel way to profile
      likely and unlikely macros. This patch achieves that goal.
      
      When configured, every(*) likely and unlikely macro gets a counter attached
      to it. When the condition is hit, the hit and misses of that condition
      are recorded. These numbers can later be retrieved by:
      
        /debugfs/tracing/profile_likely    - All likely markers
        /debugfs/tracing/profile_unlikely  - All unlikely markers.
      
      # cat /debug/tracing/profile_unlikely | head
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2167        0   0 do_arch_prctl                  process_64.c         832
             0        0   0 do_arch_prctl                  process_64.c         804
          2670        0   0 IS_ERR                         err.h                34
         71230     5693   7 __switch_to                    process_64.c         673
         76919        0   0 __switch_to                    process_64.c         639
         43184    33743  43 __switch_to                    process_64.c         624
         12740    64181  83 __switch_to                    process_64.c         594
         12740    64174  83 __switch_to                    process_64.c         590
      
      # cat /debug/tracing/profile_unlikely | \
        awk '{ if ($3 > 25) print $0; }' |head -20
         44963    35259  43 __switch_to                    process_64.c         624
         12762    67454  84 __switch_to                    process_64.c         594
         12762    67447  84 __switch_to                    process_64.c         590
          1478      595  28 syscall_get_error              syscall.h            51
             0     2821 100 syscall_trace_leave            ptrace.c             1567
             0        1 100 native_smp_prepare_cpus        smpboot.c            1237
         86338   265881  75 calc_delta_fair                sched_fair.c         408
        210410   108540  34 calc_delta_mine                sched.c              1267
             0    54550 100 sched_info_queued              sched_stats.h        222
         51899    66435  56 pick_next_task_fair            sched_fair.c         1422
             6       10  62 yield_task_fair                sched_fair.c         982
          7325     2692  26 rt_policy                      sched.c              144
             0     1270 100 pre_schedule_rt                sched_rt.c           1261
          1268    48073  97 pick_next_task_rt              sched_rt.c           884
             0    45181 100 sched_info_dequeued            sched_stats.h        177
             0       15 100 sched_move_task                sched.c              8700
             0       15 100 sched_move_task                sched.c              8690
         53167    33217  38 schedule                       sched.c              4457
             0    80208 100 sched_info_switch              sched_stats.h        270
         30585    49631  61 context_switch                 sched.c              2619
      
      # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
         39900    36577  47 pick_next_task                 sched.c              4397
         20824    15233  42 switch_mm                      mmu_context_64.h     18
             0        7 100 __cancel_work_timer            workqueue.c          560
           617    66484  99 clocksource_adjust             timekeeping.c        456
             0   346340 100 audit_syscall_exit             auditsc.c            1570
            38   347350  99 audit_get_context              auditsc.c            732
             0   345244 100 audit_syscall_entry            auditsc.c            1541
            38     1017  96 audit_free                     auditsc.c            1446
             0     1090 100 audit_alloc                    auditsc.c            862
          2618     1090  29 audit_alloc                    auditsc.c            858
             0        6 100 move_masked_irq                migration.c          9
             1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
             2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
             0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
          4514     2090  31 __grab_cache_page              filemap.c            2149
         12882   228786  94 mapping_unevictable            pagemap.h            50
             4       11  73 __flush_cpu_slab               slub.c               1466
        627757   330451  34 slab_free                      slub.c               1731
          2959    61245  95 dentry_lru_del_init            dcache.c             153
           946     1217  56 load_elf_binary                binfmt_elf.c         904
           102       82  44 disk_put_part                  genhd.h              206
             1        1  50 dst_gc_task                    dst.c                82
             0       19 100 tcp_mss_split_point            tcp_output.c         1126
      
      As you can see by the above, there's a bit of work to do in rethinking
      the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
      that were more than 25%.
      
      Note:  After submitting my first version of this patch, Andrew Morton
        showed me a version written by Daniel Walker, where I picked up
        the following ideas from:
      
        1)  Using __builtin_constant_p to avoid profiling fixed values.
        2)  Using __FILE__ instead of instruction pointers.
        3)  Using the preprocessor to stop all profiling of likely
             annotations from vsyscall_64.c.
      
      Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
      for their feed back on this patch.
      
      (*) Not ever unlikely is recorded, those that are used by vsyscalls
       (a few of them) had to have profiling disabled.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1f0d69a9
  2. 08 7月, 2008 1 次提交
  3. 26 6月, 2008 2 次提交
  4. 24 5月, 2008 1 次提交
  5. 25 4月, 2008 1 次提交
  6. 01 3月, 2008 1 次提交
  7. 26 2月, 2008 1 次提交
    • T
      x86: fix vsyscall wreckage · ce28b986
      Thomas Gleixner 提交于
      based on a report from Arne Georg Gleditsch about user-space apps
      misbehaving after toggling /proc/sys/kernel/vsyscall64, a review
      of the code revealed that the "NOP patching" done there is
      fundamentally unsafe for a number of reasons:
      
      1) the patching code runs without synchronizing other CPUs
      
      2) it inserts NOPs even if there is no clock source which provides vread
      
      3) when the clock source changes to one without vread we run in
         exactly the same problem as in #2
      
      4) if nobody toggles the proc entry from 1 to 0 and to 1 again, then
         the syscall is not patched out
      
      as a result it is possible to break user-space via this patching.
      The only safe thing for now is to remove the patching.
      
      This code was broken since v2.6.21.
      Reported-by: NArne Georg Gleditsch <arne.gleditsch@dolphinics.no>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ce28b986
  8. 30 1月, 2008 4 次提交
  9. 20 10月, 2007 3 次提交
  10. 19 10月, 2007 3 次提交
  11. 18 10月, 2007 2 次提交
    • A
      x86: remove duplicated vsyscall nsec update · c861eff8
      Andi Kleen 提交于
      Spotted by Chuck Ebbert
      
      [ tglx: arch/x86 adaptation ]
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      c861eff8
    • M
      x86: fix cpu_to_node references · 98c9e27a
      Mike Travis 提交于
      In x86_64 and i386 architectures most arrays that are sized using
      NR_CPUS lay in local memory on node 0.  Not only will most (99%?) of the
      systems not use all the slots in these arrays, particularly when NR_CPUS
      is increased to accommodate future very high cpu count systems, but a
      number of cache lines are passed unnecessarily on the system bus when
      these arrays are referenced by cpus on other nodes.
      
      Typically, the values in these arrays are referenced by the cpu
      accessing it's own values, though when passing IPI interrupts, the cpu
      does access the data relevant to the targeted cpu/node.  Of course, if
      the referencing cpu is not on node 0, then the reference will still
      require cross node exchanges of cache lines.  A common use of this is
      for an interrupt service routine to pass the interrupt to other cpus
      local to that node.
      
      Ideally, all the elements in these arrays should be moved to the per_cpu
      data area.  In some cases (such as x86_cpu_to_apicid) the array is
      referenced before the per_cpu data areas are setup.  In this case, a
      static array is declared in the __initdata area and initialized by the
      booting cpu (BSP).  The values are then moved to the per_cpu area after
      it is initialized and the original static array is freed with the rest
      of the __initdata.
      
      This patch:
      
      Fix four instances where cpu_to_node is referenced by array instead of
      via the cpu_to_node macro.  This is preparation to moving it to the
      per_cpu data area.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      98c9e27a
  12. 14 10月, 2007 1 次提交
    • D
      Delete filenames in comments. · 835c34a1
      Dave Jones 提交于
      Since the x86 merge, lots of files that referenced their own filenames
      are no longer correct.  Rather than keep them up to date, just delete
      them, as they add no real value.
      
      Additionally:
      - fix up comment formatting in scx200_32.c
      - Remove a credit from myself in setup_64.c from a time when we had no SCM
      - remove longwinded history from tsc_32.c which can be figured out from
        git.
      Signed-off-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      835c34a1
  13. 11 10月, 2007 2 次提交
  14. 22 7月, 2007 1 次提交
    • A
      x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu · 2aae950b
      Andi Kleen 提交于
      This implements new vDSO for x86-64.  The concept is similar
      to the existing vDSOs on i386 and PPC.  x86-64 has had static
      vsyscalls before,  but these are not flexible enough anymore.
      
      A vDSO is a ELF shared library supplied by the kernel that is mapped into
      user address space.  The vDSO mapping is randomized for each process
      for security reasons.
      
      Doing this was needed for clock_gettime, because clock_gettime
      always needs a syscall fallback and having one at a fixed
      address would have made buffer overflow exploits too easy to write.
      
      The vdso can be disabled with vdso=0
      
      It currently includes a new gettimeofday implemention and optimized
      clock_gettime(). The gettimeofday implementation is slightly faster
      than the one in the old vsyscall.  clock_gettime is significantly faster
      than the syscall for CLOCK_MONOTONIC and CLOCK_REALTIME.
      
      The new calls are generally faster than the old vsyscall.
      
      Advantages over the old x86-64 vsyscalls:
      - Extensible
      - Randomized
      - Cleaner
      - Easier to virtualize (the old static address range previously causes
      overhead e.g. for Xen because it has to create special page tables for it)
      
      Weak points:
      - glibc support still to be written
      
      The VM interface is partly based on Ingo Molnar's i386 version.
      
      Includes compile fix from Joachim Deguara
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2aae950b
  15. 22 5月, 2007 1 次提交
    • J
      x86_64: vsyscall time() fix · d0aff6e6
      john stultz 提交于
      The vsyscall time() function basically returns the second portion of
      xtime directly.  This however means that there is about a ticks worth of
      time each second where time() will return a second value less then what
      gettimeofday() does.
      
      Additionally, this window where vtime() is behind vgettimeofday() grows
      when dynticks is enabled, so its probably good to get this in before
      dynticks lands.
      
      Big thanks to Sripathi for noticing this issue and creating a test case
      to work with!
      
      This patch changes the vtime() implemenation to call vgettimeofday(),
      much as syscall time() implementation calls gettimeofday().
      
      2.6.21 stable candidate too
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0aff6e6
  16. 10 5月, 2007 1 次提交
    • R
      Add suspend-related notifications for CPU hotplug · 8bb78442
      Rafael J. Wysocki 提交于
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8bb78442
  17. 03 5月, 2007 3 次提交
    • E
      [PATCH] x86-64: vsyscall_gtod_data diet and vgettimeofday() fix · c8118c6c
      Eric Dumazet 提交于
      Current vsyscall_gtod_data is large (3 or 4 cache lines dirtied at timer
      interrupt). We can shrink it to exactly 64 bytes (1 cache line on AMD64)
      
      Instead of copying a whole struct clocksource, we copy only needed fields.
      
      I deleted an unused field : offset_base
      
      This patch fixes one oddity in vgettimeofday(): It can returns a timeval with
      tv_usec = 1000000. Maybe not a bug, but why not doing the right thing ?
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      c8118c6c
    • E
      [PATCH] x86-64: fix vtime() vsyscall · 272a3713
      Eric Dumazet 提交于
      There is a tiny probability that the return value from vtime(time_t *t) is
      Signed-off-by: NAndi Kleen <ak@suse.de>
      
      different than the value stored in *t
      
      Using a temporary variable solves the problem and gives a faster code.
      
         17:   48 85 ff                test   %rdi,%rdi
         1a:   48 8b 05 00 00 00 00    mov    0(%rip),%rax        #
      __vsyscall_gtod_data.wall_time_tv.tv_sec
         21:   74 03                   je     26
         23:   48 89 07                mov    %rax,(%rdi)
         26:   c9                      leaveq
         27:   c3                      retq
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      272a3713
    • V
      [PATCH] x86: __pa and __pa_symbol address space separation · 0dbf7028
      Vivek Goyal 提交于
      Currently __pa_symbol is for use with symbols in the kernel address
      map and __pa is for use with pointers into the physical memory map.
      But the code is implemented so you can usually interchange the two.
      
      __pa which is much more common can be implemented much more cheaply
      if it is it doesn't have to worry about any other kernel address
      spaces.  This is especially true with a relocatable kernel as
      __pa_symbol needs to peform an extra variable read to resolve
      the address.
      
      There is a third macro that is added for the vsyscall data
      __pa_vsymbol for finding the physical addesses of vsyscall pages.
      
      Most of this patch is simply sorting through the references to
      __pa or __pa_symbol and using the proper one.  A little of
      it is continuing to use a physical address when we have it
      instead of recalculating it several times.
      
      swapper_pgd is now NULL.  leave_mm now uses init_mm.pgd
      and init_mm.pgd is initialized at boot (instead of compile time)
      to the physmem virtual mapping of init_level4_pgd.  The
      physical address changed.
      
      Except for the for EMPTY_ZERO page all of the remaining references
      to __pa_symbol appear to be during kernel initialization.  So this
      should reduce the cost of __pa in the common case, even on a relocated
      kernel.
      
      As this is technically a semantic change we need to be on the lookout
      for anything I missed.  But it works for me (tm).
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NVivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      0dbf7028
  18. 15 3月, 2007 1 次提交
  19. 17 2月, 2007 1 次提交
  20. 15 2月, 2007 2 次提交
  21. 11 12月, 2006 1 次提交
  22. 08 12月, 2006 2 次提交
  23. 07 12月, 2006 1 次提交
    • E
      [PATCH] x86-64: fix perms/range of vsyscall vma in /proc/*/maps · 103efcd9
      Ernie Petrides 提交于
      The final line of /proc/<pid>/maps on x86_64 for native 64-bit
      tasks shows an incorrect ending address and incorrect permissions.  There
      is only a single page mapped in this vsyscall region, and it is accessible
      for both read and execute.
      
      The patch below fixes this.  (Since 32-bit-compat tasks have a real vma
      with correct perms/range, no change is necessary for that scenario.)
      
      Before the patch, a "cat /proc/self/maps | tail -1" shows this:
      
              ffffffffff600000-ffffffffffe00000 ---p 00000000 [...]
      
      After the patch, this is the output:
      
              ffffffffff600000-ffffffffff601000 r-xp 00000000 [...]
      Signed-off-by: NErnie Petrides <petrides@redhat.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      103efcd9
  24. 17 11月, 2006 1 次提交
  25. 14 11月, 2006 1 次提交
    • A
      [PATCH] x86-64: Fix vgetcpu when CONFIG_HOTPLUG_CPU is disabled · 8c131af1
      Andi Kleen 提交于
      The vgetcpu per CPU initialization previously relied on CPU hotplug
      events for all CPUs to initialize the per CPU state. That only
      worked only on kernels with CONFIG_HOTPLUG_CPU enabled.  On the
      others some CPUs didn't get their state initialized properly
      and vgetcpu wouldn't work.
      
      Change the initialization sequence to instead run in a normal
      initcall (which runs after the normal CPU bootup) and initialize
      all running CPUs there. Later hotplug CPUs are still handled
      with an hotplug notifier.
      
      This actually simplifies the code somewhat.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      8c131af1
  26. 01 10月, 2006 1 次提交
    • A
      [PATCH] kill wall_jiffies · 8ef38609
      Atsushi Nemoto 提交于
      With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies.
      So we can kill wall_jiffies completely.
      
      This is just a cleanup and logically should not change any real behavior
      except for one thing: RTC updating code in (old) ppc and xtensa use a
      condition "jiffies - wall_jiffies == 1".  This condition is never met so I
      suppose it is just a bug.  I just remove that condition only instead of
      kill the whole "if" block.
      
      [heiko.carstens@de.ibm.com: s390 build fix and cleanup]
      Signed-off-by: NAtsushi Nemoto <anemo@mba.ocn.ne.jp>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8ef38609