1. 23 11月, 2008 2 次提交
    • S
      trace: profile all if conditionals · 2bcd521a
      Steven Rostedt 提交于
      Impact: feature to profile if statements
      
      This patch adds a branch profiler for all if () statements.
      The results will be found in:
      
        /debugfs/tracing/profile_branch
      
      For example:
      
         miss      hit    %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
             0        1 100 x86_64_start_reservations      head64.c             127
             0        1 100 copy_bootdata                  head64.c             69
             1        0   0 x86_64_start_kernel            head64.c             111
            32        0   0 set_intr_gate                  desc.h               319
             1        0   0 reserve_ebda_region            head.c               51
             1        0   0 reserve_ebda_region            head.c               47
             0        1 100 reserve_ebda_region            head.c               42
             0        0   X maxcpus                        main.c               165
      
      Miss means the branch was not taken. Hit means the branch was taken.
      The percent is the percentage the branch was taken.
      
      This adds a significant amount of overhead and should only be used
      by those analyzing their system.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2bcd521a
    • S
      trace: consolidate unlikely and likely profiler · 45b79749
      Steven Rostedt 提交于
      Impact: clean up to make one profiler of like and unlikely tracer
      
      The likely and unlikely profiler prints out the file and line numbers
      of the annotated branches that it is profiling. It shows the number
      of times it was correct or incorrect in its guess. Having two
      different files or sections for that matter to tell us if it was a
      likely or unlikely is pretty pointless. We really only care if
      it was correct or not.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      45b79749
  2. 16 11月, 2008 1 次提交
    • M
      tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() · 7e066fb8
      Mathieu Desnoyers 提交于
      Impact: API *CHANGE*. Must update all tracepoint users.
      
      Add DEFINE_TRACE() to tracepoints to let them declare the tracepoint
      structure in a single spot for all the kernel. It helps reducing memory
      consumption, especially when declaring a lot of tracepoints, e.g. for
      kmalloc tracing.
      
      *API CHANGE WARNING*: now, DECLARE_TRACE() must be used in headers for
      tracepoint declarations rather than DEFINE_TRACE(). This is the sane way
      to do it. The name previously used was misleading.
      
      Updates scheduler instrumentation to follow this API change.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7e066fb8
  3. 13 11月, 2008 1 次提交
  4. 12 11月, 2008 1 次提交
    • S
      tracing: profile likely and unlikely annotations · 1f0d69a9
      Steven Rostedt 提交于
      Impact: new unlikely/likely profiler
      
      Andrew Morton recently suggested having an in-kernel way to profile
      likely and unlikely macros. This patch achieves that goal.
      
      When configured, every(*) likely and unlikely macro gets a counter attached
      to it. When the condition is hit, the hit and misses of that condition
      are recorded. These numbers can later be retrieved by:
      
        /debugfs/tracing/profile_likely    - All likely markers
        /debugfs/tracing/profile_unlikely  - All unlikely markers.
      
      # cat /debug/tracing/profile_unlikely | head
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2167        0   0 do_arch_prctl                  process_64.c         832
             0        0   0 do_arch_prctl                  process_64.c         804
          2670        0   0 IS_ERR                         err.h                34
         71230     5693   7 __switch_to                    process_64.c         673
         76919        0   0 __switch_to                    process_64.c         639
         43184    33743  43 __switch_to                    process_64.c         624
         12740    64181  83 __switch_to                    process_64.c         594
         12740    64174  83 __switch_to                    process_64.c         590
      
      # cat /debug/tracing/profile_unlikely | \
        awk '{ if ($3 > 25) print $0; }' |head -20
         44963    35259  43 __switch_to                    process_64.c         624
         12762    67454  84 __switch_to                    process_64.c         594
         12762    67447  84 __switch_to                    process_64.c         590
          1478      595  28 syscall_get_error              syscall.h            51
             0     2821 100 syscall_trace_leave            ptrace.c             1567
             0        1 100 native_smp_prepare_cpus        smpboot.c            1237
         86338   265881  75 calc_delta_fair                sched_fair.c         408
        210410   108540  34 calc_delta_mine                sched.c              1267
             0    54550 100 sched_info_queued              sched_stats.h        222
         51899    66435  56 pick_next_task_fair            sched_fair.c         1422
             6       10  62 yield_task_fair                sched_fair.c         982
          7325     2692  26 rt_policy                      sched.c              144
             0     1270 100 pre_schedule_rt                sched_rt.c           1261
          1268    48073  97 pick_next_task_rt              sched_rt.c           884
             0    45181 100 sched_info_dequeued            sched_stats.h        177
             0       15 100 sched_move_task                sched.c              8700
             0       15 100 sched_move_task                sched.c              8690
         53167    33217  38 schedule                       sched.c              4457
             0    80208 100 sched_info_switch              sched_stats.h        270
         30585    49631  61 context_switch                 sched.c              2619
      
      # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
         39900    36577  47 pick_next_task                 sched.c              4397
         20824    15233  42 switch_mm                      mmu_context_64.h     18
             0        7 100 __cancel_work_timer            workqueue.c          560
           617    66484  99 clocksource_adjust             timekeeping.c        456
             0   346340 100 audit_syscall_exit             auditsc.c            1570
            38   347350  99 audit_get_context              auditsc.c            732
             0   345244 100 audit_syscall_entry            auditsc.c            1541
            38     1017  96 audit_free                     auditsc.c            1446
             0     1090 100 audit_alloc                    auditsc.c            862
          2618     1090  29 audit_alloc                    auditsc.c            858
             0        6 100 move_masked_irq                migration.c          9
             1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
             2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
             0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
          4514     2090  31 __grab_cache_page              filemap.c            2149
         12882   228786  94 mapping_unevictable            pagemap.h            50
             4       11  73 __flush_cpu_slab               slub.c               1466
        627757   330451  34 slab_free                      slub.c               1731
          2959    61245  95 dentry_lru_del_init            dcache.c             153
           946     1217  56 load_elf_binary                binfmt_elf.c         904
           102       82  44 disk_put_part                  genhd.h              206
             1        1  50 dst_gc_task                    dst.c                82
             0       19 100 tcp_mss_split_point            tcp_output.c         1126
      
      As you can see by the above, there's a bit of work to do in rethinking
      the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
      that were more than 25%.
      
      Note:  After submitting my first version of this patch, Andrew Morton
        showed me a version written by Daniel Walker, where I picked up
        the following ideas from:
      
        1)  Using __builtin_constant_p to avoid profiling fixed values.
        2)  Using __FILE__ instead of instruction pointers.
        3)  Using the preprocessor to stop all profiling of likely
             annotations from vsyscall_64.c.
      
      Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
      for their feed back on this patch.
      
      (*) Not ever unlikely is recorded, those that are used by vsyscalls
       (a few of them) had to have profiling disabled.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1f0d69a9
  5. 17 10月, 2008 1 次提交
    • J
      driver core: basic infrastructure for per-module dynamic debug messages · 346e15be
      Jason Baron 提交于
      Base infrastructure to enable per-module debug messages.
      
      I've introduced CONFIG_DYNAMIC_PRINTK_DEBUG, which when enabled centralizes
      control of debugging statements on a per-module basis in one /proc file,
      currently, <debugfs>/dynamic_printk/modules. When, CONFIG_DYNAMIC_PRINTK_DEBUG,
      is not set, debugging statements can still be enabled as before, often by
      defining 'DEBUG' for the proper compilation unit. Thus, this patch set has no
      affect when CONFIG_DYNAMIC_PRINTK_DEBUG is not set.
      
      The infrastructure currently ties into all pr_debug() and dev_dbg() calls. That
      is, if CONFIG_DYNAMIC_PRINTK_DEBUG is set, all pr_debug() and dev_dbg() calls
      can be dynamically enabled/disabled on a per-module basis.
      
      Future plans include extending this functionality to subsystems, that define 
      their own debug levels and flags.
      
      Usage:
      
      Dynamic debugging is controlled by the debugfs file, 
      <debugfs>/dynamic_printk/modules. This file contains a list of the modules that
      can be enabled. The format of the file is as follows:
      
      	<module_name> <enabled=0/1>
      		.
      		.
      		.
      
      	<module_name> : Name of the module in which the debug call resides
      	<enabled=0/1> : whether the messages are enabled or not
      
      For example:
      
      	snd_hda_intel enabled=0
      	fixup enabled=1
      	driver enabled=0
      
      Enable a module:
      
      	$echo "set enabled=1 <module_name>" > dynamic_printk/modules
      
      Disable a module:
      
      	$echo "set enabled=0 <module_name>" > dynamic_printk/modules
      
      Enable all modules:
      
      	$echo "set enabled=1 all" > dynamic_printk/modules
      
      Disable all modules:
      
      	$echo "set enabled=0 all" > dynamic_printk/modules
      
      Finally, passing "dynamic_printk" at the command line enables
      debugging for all modules. This mode can be turned off via the above
      disable command.
      
      [gkh: minor cleanups and tweaks to make the build work quietly]
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      
      346e15be
  6. 16 10月, 2008 3 次提交
    • T
      genirq: revert dynarray · d6c88a50
      Thomas Gleixner 提交于
      Revert the dynarray changes. They need more thought and polishing.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      d6c88a50
    • Y
      add per_cpu_dyn_array support · 1f3fcd4b
      Yinghai Lu 提交于
      allow dyn-array in per_cpu area, allocated dynamically.
      
      usage:
      
      |  /* in .h */
      | struct kernel_stat {
      |        struct cpu_usage_stat   cpustat;
      |        unsigned int *irqs;
      | };
      |
      |  /* in .c */
      | DEFINE_PER_CPU(struct kernel_stat, kstat);
      |
      | DEFINE_PER_CPU_DYN_ARRAY_ADDR(per_cpu__kstat_irqs, per_cpu__kstat.irqs, sizeof(unsigned int), nr_irqs, sizeof(unsigned long), NULL);
      
      after setup_percpu()/per_cpu_alloc_dyn_array(), the dyn_array in
      per_cpu area is ready to use.
      Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1f3fcd4b
    • Y
      generic: add dyn_array support · 3ddfda11
      Yinghai Lu 提交于
      Allow crazy big arrays via bootmem at init stage.
      Architectures use CONFIG_HAVE_DYN_ARRAY to enable it.
      
      usage:
      
      | static struct irq_desc irq_desc_init __initdata = {
      |        .status = IRQ_DISABLED,
      |        .chip = &no_irq_chip,
      |        .handle_irq = handle_bad_irq,
      |        .depth = 1,
      |        .lock = __SPIN_LOCK_UNLOCKED(irq_desc->lock),
      | #ifdef CONFIG_SMP
      |        .affinity = CPU_MASK_ALL
      | #endif
      | };
      |
      | static void __init init_work(void *data)
      | {
      |        struct dyn_array *da = data;
      |        struct  irq_desc *desc;
      |        int i;
      |
      |        desc = *da->name;
      |
      |        for (i = 0; i < *da->nr; i++)
      |                memcpy(&desc[i], &irq_desc_init, sizeof(struct irq_desc));
      | }
      |
      | struct irq_desc *irq_desc;
      | DEFINE_DYN_ARRAY(irq_desc, sizeof(struct irq_desc), nr_irqs, PAGE_SIZE, init_work);
      
      after pre_alloc_dyn_array() after setup_arch(), the array is ready to be
      used.
      
      Via this facility we can replace irq_desc[NR_IRQS] array with dyn_array
      irq_desc[nr_irqs].
      
      v2: remove _nopanic in pre_alloc_dyn_array()
      Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3ddfda11
  7. 14 10月, 2008 2 次提交
    • S
      ftrace: create __mcount_loc section · 8da3821b
      Steven Rostedt 提交于
      This patch creates a section in the kernel called "__mcount_loc".
      This will hold a list of pointers to the mcount relocation for
      each call site of mcount.
      
      For example:
      
      objdump -dr init/main.o
      [...]
      Disassembly of section .text:
      
      0000000000000000 <do_one_initcall>:
         0:   55                      push   %rbp
      [...]
      000000000000017b <init_post>:
       17b:   55                      push   %rbp
       17c:   48 89 e5                mov    %rsp,%rbp
       17f:   53                      push   %rbx
       180:   48 83 ec 08             sub    $0x8,%rsp
       184:   e8 00 00 00 00          callq  189 <init_post+0xe>
                              185: R_X86_64_PC32      mcount+0xfffffffffffffffc
      [...]
      
      We will add a section to point to each function call.
      
         .section __mcount_loc,"a",@progbits
      [...]
         .quad .text + 0x185
      [...]
      
      The offset to of the mcount call site in init_post is an offset from
      the start of the section, and not the start of the function init_post.
      The mcount relocation is at the call site 0x185 from the start of the
      .text section.
      
        .text + 0x185  == init_post + 0xa
      
      We need a way to add this __mcount_loc section in a way that we do not
      lose the relocations after final link.  The .text section here will
      be attached to all other .text sections after final link and the
      offsets will be meaningless.  We need to keep track of where these
      .text sections are.
      
      To do this, we use the start of the first function in the section.
      do_one_initcall.  We can make a tmp.s file with this function as a reference
      to the start of the .text section.
      
         .section __mcount_loc,"a",@progbits
      [...]
         .quad do_one_initcall + 0x185
      [...]
      
      Then we can compile the tmp.s into a tmp.o
      
        gcc -c tmp.s -o tmp.o
      
      And link it into back into main.o.
      
        ld -r main.o tmp.o -o tmp_main.o
        mv tmp_main.o main.o
      
      But we have a problem.  What happens if the first function in a section
      is not exported, and is a static function. The linker will not let
      the tmp.o use it.  This case exists in main.o as well.
      
      Disassembly of section .init.text:
      
      0000000000000000 <set_reset_devices>:
         0:   55                      push   %rbp
         1:   48 89 e5                mov    %rsp,%rbp
         4:   e8 00 00 00 00          callq  9 <set_reset_devices+0x9>
                              5: R_X86_64_PC32        mcount+0xfffffffffffffffc
      
      The first function in .init.text is a static function.
      
      00000000000000a8 t __setup_set_reset_devices
      000000000000105f t __setup_str_set_reset_devices
      0000000000000000 t set_reset_devices
      
      The lowercase 't' means that set_reset_devices is local and is not exported.
      If we simply try to link the tmp.o with the set_reset_devices we end
      up with two symbols: one local and one global.
      
       .section __mcount_loc,"a",@progbits
       .quad set_reset_devices + 0x10
      
      00000000000000a8 t __setup_set_reset_devices
      000000000000105f t __setup_str_set_reset_devices
      0000000000000000 t set_reset_devices
                       U set_reset_devices
      
      We still have an undefined reference to set_reset_devices, and if we try
      to compile the kernel, we will end up with an undefined reference to
      set_reset_devices, or even worst, it could be exported someplace else,
      and then we will have a reference to the wrong location.
      
      To handle this case, we make an intermediate step using objcopy.
      We convert set_reset_devices into a global exported symbol before linking
      it with tmp.o and set it back afterwards.
      
      00000000000000a8 t __setup_set_reset_devices
      000000000000105f t __setup_str_set_reset_devices
      0000000000000000 T set_reset_devices
      
      00000000000000a8 t __setup_set_reset_devices
      000000000000105f t __setup_str_set_reset_devices
      0000000000000000 T set_reset_devices
      
      00000000000000a8 t __setup_set_reset_devices
      000000000000105f t __setup_str_set_reset_devices
      0000000000000000 t set_reset_devices
      
      Now we have a section in main.o called __mcount_loc that we can place
      somewhere in the kernel using vmlinux.ld.S and access it to convert
      all these locations that call mcount into nops before starting SMP
      and thus, eliminating the need to do this with kstop_machine.
      
      Note, A well documented perl script (scripts/recordmcount.pl) is used
      to do all this in one location.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8da3821b
    • M
      tracing: Kernel Tracepoints · 97e1c18e
      Mathieu Desnoyers 提交于
      Implementation of kernel tracepoints. Inspired from the Linux Kernel
      Markers. Allows complete typing verification by declaring both tracing
      statement inline functions and probe registration/unregistration static
      inline functions within the same macro "DEFINE_TRACE". No format string
      is required. See the tracepoint Documentation and Samples patches for
      usage examples.
      
      Taken from the documentation patch :
      
      "A tracepoint placed in code provides a hook to call a function (probe)
      that you can provide at runtime. A tracepoint can be "on" (a probe is
      connected to it) or "off" (no probe is attached). When a tracepoint is
      "off" it has no effect, except for adding a tiny time penalty (checking
      a condition for a branch) and space penalty (adding a few bytes for the
      function call at the end of the instrumented function and adds a data
      structure in a separate section).  When a tracepoint is "on", the
      function you provide is called each time the tracepoint is executed, in
      the execution context of the caller. When the function provided ends its
      execution, it returns to the caller (continuing from the tracepoint
      site).
      
      You can put tracepoints at important locations in the code. They are
      lightweight hooks that can pass an arbitrary number of parameters, which
      prototypes are described in a tracepoint declaration placed in a header
      file."
      
      Addition and removal of tracepoints is synchronized by RCU using the
      scheduler (and preempt_disable) as guarantees to find a quiescent state
      (this is really RCU "classic"). The update side uses rcu_barrier_sched()
      with call_rcu_sched() and the read/execute side uses
      "preempt_disable()/preempt_enable()".
      
      We make sure the previous array containing probes, which has been
      scheduled for deletion by the rcu callback, is indeed freed before we
      proceed to the next update. It therefore limits the rate of modification
      of a single tracepoint to one update per RCU period. The objective here
      is to permit fast batch add/removal of probes on _different_
      tracepoints.
      
      Changelog :
      - Use #name ":" #proto as string to identify the tracepoint in the
        tracepoint table. This will make sure not type mismatch happens due to
        connexion of a probe with the wrong type to a tracepoint declared with
        the same name in a different header.
      - Add tracepoint_entry_free_old.
      - Change __TO_TRACE to get rid of the 'i' iterator.
      
      Masami Hiramatsu <mhiramat@redhat.com> :
      Tested on x86-64.
      
      Performance impact of a tracepoint : same as markers, except that it
      adds about 70 bytes of instructions in an unlikely branch of each
      instrumented function (the for loop, the stack setup and the function
      call). It currently adds a memory read, a test and a conditional branch
      at the instrumentation site (in the hot path). Immediate values will
      eventually change this into a load immediate, test and branch, which
      removes the memory read which will make the i-cache impact smaller
      (changing the memory read for a load immediate removes 3-4 bytes per
      site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
      also saves the d-cache hit).
      
      About the performance impact of tracepoints (which is comparable to
      markers), even without immediate values optimizations, tests done by
      Hideo Aoki on ia64 show no regression. His test case was using hackbench
      on a kernel where scheduler instrumentation (about 5 events in code
      scheduler code) was added.
      
      Quoting Hideo Aoki about Markers :
      
      I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
      tree, which includes several markers for LTTng, using an ia64 server.
      
      While the immediate trace mark feature isn't implemented on ia64, there
      is no major performance regression. So, I think that we don't have any
      issues to propose merging marker point patches into Linus's tree from
      the viewpoint of performance impact.
      
      I prepared two kernels to evaluate. The first one was compiled without
      CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
      
      I downloaded the original hackbench from the following URL:
      http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
      
      I ran hackbench 5 times in each condition and calculated the average and
      difference between the kernels.
      
          The parameter of hackbench: every 50 from 50 to 800
          The number of CPUs of the server: 2, 4, and 8
      
      Below is the results. As you can see, major performance regression
      wasn't found in any case. Even if number of processes increases,
      differences between marker-enabled kernel and marker- disabled kernel
      doesn't increase. Moreover, if number of CPUs increases, the differences
      doesn't increase either.
      
      Curiously, marker-enabled kernel is better than marker-disabled kernel
      in more than half cases, although I guess it comes from the difference
      of memory access pattern.
      
      * 2 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      4.811   |       4.872  |  +0.061  |  +1.27  |
            100 |      9.854   |      10.309  |  +0.454  |  +4.61  |
            150 |     15.602   |      15.040  |  -0.562  |  -3.6   |
            200 |     20.489   |      20.380  |  -0.109  |  -0.53  |
            250 |     25.798   |      25.652  |  -0.146  |  -0.56  |
            300 |     31.260   |      30.797  |  -0.463  |  -1.48  |
            350 |     36.121   |      35.770  |  -0.351  |  -0.97  |
            400 |     42.288   |      42.102  |  -0.186  |  -0.44  |
            450 |     47.778   |      47.253  |  -0.526  |  -1.1   |
            500 |     51.953   |      52.278  |  +0.325  |  +0.63  |
            550 |     58.401   |      57.700  |  -0.701  |  -1.2   |
            600 |     63.334   |      63.222  |  -0.112  |  -0.18  |
            650 |     68.816   |      68.511  |  -0.306  |  -0.44  |
            700 |     74.667   |      74.088  |  -0.579  |  -0.78  |
            750 |     78.612   |      79.582  |  +0.970  |  +1.23  |
            800 |     85.431   |      85.263  |  -0.168  |  -0.2   |
      --------------------------------------------------------------
      
      * 4 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.586   |       2.584  |  -0.003  |  -0.1   |
            100 |      5.254   |       5.283  |  +0.030  |  +0.56  |
            150 |      8.012   |       8.074  |  +0.061  |  +0.76  |
            200 |     11.172   |      11.000  |  -0.172  |  -1.54  |
            250 |     13.917   |      14.036  |  +0.119  |  +0.86  |
            300 |     16.905   |      16.543  |  -0.362  |  -2.14  |
            350 |     19.901   |      20.036  |  +0.135  |  +0.68  |
            400 |     22.908   |      23.094  |  +0.186  |  +0.81  |
            450 |     26.273   |      26.101  |  -0.172  |  -0.66  |
            500 |     29.554   |      29.092  |  -0.461  |  -1.56  |
            550 |     32.377   |      32.274  |  -0.103  |  -0.32  |
            600 |     35.855   |      35.322  |  -0.533  |  -1.49  |
            650 |     39.192   |      38.388  |  -0.804  |  -2.05  |
            700 |     41.744   |      41.719  |  -0.025  |  -0.06  |
            750 |     45.016   |      44.496  |  -0.520  |  -1.16  |
            800 |     48.212   |      47.603  |  -0.609  |  -1.26  |
      --------------------------------------------------------------
      
      * 8 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.094   |       2.072  |  -0.022  |  -1.07  |
            100 |      4.162   |       4.273  |  +0.111  |  +2.66  |
            150 |      6.485   |       6.540  |  +0.055  |  +0.84  |
            200 |      8.556   |       8.478  |  -0.078  |  -0.91  |
            250 |     10.458   |      10.258  |  -0.200  |  -1.91  |
            300 |     12.425   |      12.750  |  +0.325  |  +2.62  |
            350 |     14.807   |      14.839  |  +0.032  |  +0.22  |
            400 |     16.801   |      16.959  |  +0.158  |  +0.94  |
            450 |     19.478   |      19.009  |  -0.470  |  -2.41  |
            500 |     21.296   |      21.504  |  +0.208  |  +0.98  |
            550 |     23.842   |      23.979  |  +0.137  |  +0.57  |
            600 |     26.309   |      26.111  |  -0.198  |  -0.75  |
            650 |     28.705   |      28.446  |  -0.259  |  -0.9   |
            700 |     31.233   |      31.394  |  +0.161  |  +0.52  |
            750 |     34.064   |      33.720  |  -0.344  |  -1.01  |
            800 |     36.320   |      36.114  |  -0.206  |  -0.57  |
      --------------------------------------------------------------
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: N'Peter Zijlstra' <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      97e1c18e
  8. 02 8月, 2008 1 次提交
    • Y
      Missing symbol prefix on vmlinux.lds.h · c6de0026
      Yoshinori Sato 提交于
      ARCH=h8300:
      
      init/main.c:781: undefined reference to `___early_initcall_end'
      
      Same problem have
      __start___bug_table
      __stop___bug_table
      __tracedata_start
      __tracedata_end
      __per_cpu_start
      __per_cpu_end
      
      When defining a symbol in vmlinux.lds, use the VMLINUX_SYMBOL macro.
      VMLINUX_SYMBOL adds a prefix charactor.
      
      You can't just use straight symbol names in common header files as they
      dont take into consideration weird arch-specific ABI conventions.  in the
      case of Blackfin/h8300, the ABI dictates that any C-visible symbols have
      an underscore prefixed to them.  Thus all symbols in vmlinux.lds.h need to
      be wrapped in VMLINUX_SYMBOL() so that each arch can put hide this magic
      in their own files.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NYoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: "Mike Frysinger" <vapier.adi@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c6de0026
  9. 27 7月, 2008 1 次提交
  10. 26 7月, 2008 1 次提交
  11. 10 7月, 2008 1 次提交
    • D
      firmware: allow firmware files to be built into kernel image · 5658c769
      David Woodhouse 提交于
      Some drivers have their own hacks to bypass the kernel's firmware loader
      and build their firmware into the kernel; this renders those unnecessary.
      
      Other drivers don't use the firmware loader at all, because they always
      want the firmware to be available. This allows them to start using the
      firmware loader.
      
      A third set of drivers already use the firmware loader, but can't be
      used without help from userspace, which sometimes requires an initrd.
      This allows them to work in a static kernel.
      Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
      5658c769
  12. 11 6月, 2008 1 次提交
    • R
      Suspend/Resume bug in PCI layer wrt quirks · e1a2a51e
      Rafael J. Wysocki 提交于
      Some quirks should be called with interrupt disabled, we can't directly
      call them in .resume_early. Also the patch introduces
      pci_fixup_resume_early and pci_fixup_suspend, which matches current
      device core callbacks (.suspend/.resume_early).
      
      TBD: Somebody knows why we need quirk resume should double check if a
      quirk should be called in resume or resume_early. I changed some per my
      understanding, but can't make sure I fixed all.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      e1a2a51e
  13. 25 5月, 2008 3 次提交
  14. 20 2月, 2008 1 次提交
    • S
      Add missing init section definitions · 37c514e3
      Sam Ravnborg 提交于
      When adding __devinitconst etc. the __initconst variant
      were missed.
      Add this one and proper definitions for .head.text for use
      in .S files.
      The naming .head.text is preferred over .text.head as the
      latter will conflict for a function named head when introducing
      -ffunctions-sections.
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      37c514e3
  15. 30 1月, 2008 1 次提交
    • A
      x86: add testcases for RODATA and NX protections/attributes · edeed305
      Arjan van de Ven 提交于
      Latest update; I now have 4 NX tests, but 2 fail so they're #if 0'd.
      I also cleaned up the NX test code quite a bit, and got rid of the ugly
      exception table sorting stuff.
      
      From: Arjan van de Ven <arjan@linux.intel.com>
      
      This patch adds testcases for the CONFIG_DEBUG_RODATA configuration option
      as well as the NX CPU feature/mappings. Both testcases can move to tests/
      once that patch gets merged into mainline.
      (I'm half considering moving the rodata test into mm/init.c but I'll
      wait with that until init.c is unified)
      
      As part of this I had to fix a not-quite-right alignment in the vmlinux.lds.h
      for the RODATA sections, which lead to 1 page less being marked read only.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      edeed305
  16. 29 1月, 2008 4 次提交
    • S
      Introduce new section reference annotations tags: __ref, __refdata, __refconst · 312b1485
      Sam Ravnborg 提交于
      Today we have the following annotations for functions/data
      referencing __init/__exit functions / data:
      
      __init_refok     => for init functions
      __initdata_refok => for init data
      __exit_refok     => for exit functions
      
      There is really no difference between the __init and __exit
      versions and simplify it and to introduce a shorter annotation
      the following new annotations are introduced:
      
      __ref      => for functions (code) that
                    references __*init / __*exit
      __refdata  => for variables
      __refconst => for const variables
      
      Whit this annotation is it more obvious what the annotation
      is for and there is no longer the arbitary division
      between __init and __exit code.
      
      The mechanishm is the same as before - a special section
      is created which is made part of the usual sections
      in the linker script.
      
      We will start to see annotations like this:
      
      -static struct pci_serial_quirk pci_serial_quirks[] = {
      +static const struct pci_serial_quirk pci_serial_quirks[] __refconst = {
      -----------------
      -static struct notifier_block __cpuinitdata cpuid_class_cpu_notifier =
      +static struct notifier_block cpuid_class_cpu_notifier __refdata =
      ----------------
      -static int threshold_cpu_callback(struct notifier_block *nfb,
      +static int __ref threshold_cpu_callback(struct notifier_block *nfb,
      
      [The above is just random samples].
      
      Note: No modifications were needed in modpost
      to support the new sections due to the newly introduced
      blacklisting.
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      312b1485
    • A
      asm-generic/vmlix.lds.h: simplify __mem{init,exit}* dependencies · 1a3fb6d4
      Adrian Bunk 提交于
      Simplify the dependencies on __mem{init,exit}* (ACPI_HOTPLUG_MEMORY requires
      MEMORY_HOTPLUG).
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      1a3fb6d4
    • S
      Use separate sections for __dev/__cpu/__mem code/data · eb8f6890
      Sam Ravnborg 提交于
      Introducing separate sections for __dev* (HOTPLUG),
      __cpu* (HOTPLUG_CPU) and __mem* (MEMORY_HOTPLUG)
      allows us to do a much more reliable Section mismatch
      check in modpost. We are no longer dependent on the actual
      configuration of for example HOTPLUG.
      
      This has the effect that all users see much more
      Section mismatch warnings than before because they
      were almost all hidden when HOTPLUG was enabled.
      The advantage of this is that when building a piece
      of code then it is much more likely that the Section
      mismatch errors are spotted and the warnings will be
      felt less random of nature.
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Adrian Bunk <bunk@kernel.org>
      eb8f6890
    • S
      all archs: consolidate init and exit sections in vmlinux.lds.h · 01ba2bdc
      Sam Ravnborg 提交于
      This patch consolidate all definitions of .init.text, .init.data
      and .exit.text, .exit.data section definitions in
      the generic vmlinux.lds.h.
      
      This is a preparational patch - alone it does not buy
      us much good.
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      01ba2bdc
  17. 20 10月, 2007 1 次提交
  18. 14 10月, 2007 1 次提交
  19. 20 7月, 2007 2 次提交
    • R
      i386: Put allocated ELF notes in read-only data segment · cbe87121
      Roland McGrath 提交于
      This changes the i386 linker script and the asm-generic macro it uses so that
      ELF note sections with SHF_ALLOC set are linked into the kernel image along
      with other read-only data.  The PT_NOTE also points to their location.
      
      This paves the way for putting useful build-time information into ELF notes
      that can be found easily later in a kernel memory dump.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cbe87121
    • F
      define new percpu interface for shared data · 5fb7dc37
      Fenghua Yu 提交于
      per cpu data section contains two types of data.  One set which is
      exclusively accessed by the local cpu and the other set which is per cpu,
      but also shared by remote cpus.  In the current kernel, these two sets are
      not clearely separated out.  This can potentially cause the same data
      cacheline shared between the two sets of data, which will result in
      unnecessary bouncing of the cacheline between cpus.
      
      One way to fix the problem is to cacheline align the remotely accessed per
      cpu data, both at the beginning and at the end.  Because of the padding at
      both ends, this will likely cause some memory wastage and also the
      interface to achieve this is not clean.
      
      This patch:
      
      Moves the remotely accessed per cpu data (which is currently marked
      as ____cacheline_aligned_in_smp) into a different section, where all the data
      elements are cacheline aligned. And as such, this differentiates the local
      only data and remotely accessed data cleanly.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: <linux-arch@vger.kernel.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5fb7dc37
  20. 30 5月, 2007 1 次提交
    • S
      sparc64: fix alignment bug in linker definition script · 4096b46f
      Sam Ravnborg 提交于
      The RO_DATA section were hardcoded to a specific
      alignment in include/asm-generic/vmlinux.h.
      But for sparc64 this did not match the PAGE_SIZE.
      
      Introduce a new section definition named:
      RO_DATA that takes actual alignment as parameter.
      RODATA are provided for backward compatibility.
      
      On top of this avoid hardcoding alignment for
      sparc64 in reset of the script
      Fix is build-tested on sparc64 + x86_64.
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      4096b46f
  21. 19 5月, 2007 3 次提交
  22. 03 5月, 2007 1 次提交
  23. 21 12月, 2006 1 次提交
    • A
      PCI: Fix multiple problems with VIA hardware · 1597cacb
      Alan Cox 提交于
      This patch is designed to fix:
      - Disk eating corruptor on KT7 after resume from RAM
      - VIA IRQ handling
      - VIA fixups for bus lockups after resume from RAM
      
      The core of this is to add a table of resume fixups run at resume time.
      We need to do this for a variety of boards and features, but particularly
      we need to do this to get various critical VIA fixups done on resume.
      
      The second part of the problem is to handle VIA IRQ number rules which
      are a bit odd and need special handling for PIC interrupts. Various
      patches broke various boxes and while this one may not be perfect
      (hopefully it is) it ensures the workaround is applied to the right
      devices only.
      
      From: Jean Delvare <khali@linux-fr.org>
      
      Now that PCI quirks are replayed on software resume, we can safely
      re-enable the Asus SMBus unhiding quirk even when software suspend support
      is enabled.
      
      [akpm@osdl.org: fix const warning]
      Signed-off-by: NAlan Cox <alan@redhat.com>
      Cc: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      1597cacb
  24. 16 12月, 2006 1 次提交
    • L
      Remove stack unwinder for now · d1526e2c
      Linus Torvalds 提交于
      It has caused more problems than it ever really solved, and is
      apparently not getting cleaned up and fixed.  We can put it back when
      it's stable and isn't likely to make warning or bug events worse.
      
      In the meantime, enable frame pointers for more readable stack traces.
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d1526e2c
  25. 12 12月, 2006 1 次提交
  26. 09 12月, 2006 1 次提交
    • J
      [PATCH] Generic BUG implementation · 7664c5a1
      Jeremy Fitzhardinge 提交于
      This patch adds common handling for kernel BUGs, for use by architectures as
      they wish.  The code is derived from arch/powerpc.
      
      The advantages of having common BUG handling are:
       - consistent BUG reporting across architectures
       - shared implementation of out-of-line file/line data
       - implement CONFIG_DEBUG_BUGVERBOSE consistently
      
      This means that in inline impact of BUG is just the illegal instruction
      itself, which is an improvement for i386 and x86-64.
      
      A BUG is represented in the instruction stream as an illegal instruction,
      which has file/line information associated with it.  This extra information is
      stored in the __bug_table section in the ELF file.
      
      When the kernel gets an illegal instruction, it first confirms it might
      possibly be from a BUG (ie, in kernel mode, the right illegal instruction).
      It then calls report_bug().  This searches __bug_table for a matching
      instruction pointer, and if found, prints the corresponding file/line
      information.  If report_bug() determines that it wasn't a BUG which caused the
      trap, it returns BUG_TRAP_TYPE_NONE.
      
      Some architectures (powerpc) implement WARN using the same mechanism; if the
      illegal instruction was the result of a WARN, then report_bug(Q) returns
      CONFIG_DEBUG_BUGVERBOSE; otherwise it returns BUG_TRAP_TYPE_BUG.
      
      lib/bug.c keeps a list of loaded modules which can be searched for __bug_table
      entries.  The architecture must call
      module_bug_finalize()/module_bug_cleanup() from its corresponding
      module_finalize/cleanup functions.
      
      Unsetting CONFIG_DEBUG_BUGVERBOSE will reduce the kernel size by some amount.
      At the very least, filename and line information will not be recorded for each
      but, but architectures may decide to store no extra information per BUG at
      all.
      
      Unfortunately, gcc doesn't have a general way to mark an asm() as noreturn, so
      architectures will generally have to include an infinite loop (or similar) in
      the BUG code, so that gcc knows execution won't continue beyond that point.
      gcc does have a __builtin_trap() operator which may be useful to achieve the
      same effect, unfortunately it cannot be used to actually implement the BUG
      itself, because there's no way to get the instruction's address for use in
      generating the __bug_table entry.
      
      [randy.dunlap@oracle.com: Handle BUG=n, GENERIC_BUG=n to prevent build errors]
      [bunk@stusta.de: include/linux/bug.h must always #include <linux/module.h]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Hugh Dickens <hugh@veritas.com>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7664c5a1
  27. 07 12月, 2006 2 次提交
    • J
      [PATCH] unwinder: move .eh_frame to RODATA · b65780e1
      Jan Beulich 提交于
      The .eh_frame section contents is never written to, so it can as well
      benefit from CONFIG_DEBUG_RODATA.
      
      Diff-ed against firstfloor tree.
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      b65780e1
    • V
      [PATCH] i386: Distinguish absolute symbols · 6569580d
      Vivek Goyal 提交于
      Ld knows about 2 kinds of symbols,  absolute and section
      relative.  Section relative symbols symbols change value
      when a section is moved and absolute symbols do not.
      
      Currently in the linker script we have several labels
      marking the beginning and ending of sections that
      are outside of sections, making them absolute symbols.
      Having a mixture of absolute and section relative
      symbols refereing to the same data is currently harmless
      but it is confusing.
      
      This must be done carefully as newer revs of ld do not place
      symbols that appear in sections without data and instead
      ld makes those symbols global :(
      
      My ultimate goal is to build a relocatable kernel.  The
      safest and least intrusive technique is to generate
      relocation entries so the kernel can be relocated at load
      time.  The only penalty would be an increase in the size
      of the kernel binary.  The problem is that if absolute and
      relocatable symbols are not properly specified absolute symbols
      will be relocated or section relative symbols won't be, which
      is fatal.
      
      The practical motivation is that when generating kernels that
      will run from a reserved area for analyzing what caused
      a kernel panic, it is simpler if you don't need to hard code
      the physical memory location they will run at, especially
      for the distributions.
      
      [AK: and merged:]
      
      o Also put a message so that in future people can be aware of it and
        avoid introducing absolute symbols.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NVivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      6569580d