1. 14 10月, 2008 2 次提交
    • S
      ftrace: enable mcount recording for modules · 90d595fe
      Steven Rostedt 提交于
      This patch enables the loading of the __mcount_section of modules and
      changing all the callers of mcount into nops.
      
      The modification is done before the init_module function is called, so
      again, we do not need to use kstop_machine to make these changes.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      90d595fe
    • M
      tracing: Kernel Tracepoints · 97e1c18e
      Mathieu Desnoyers 提交于
      Implementation of kernel tracepoints. Inspired from the Linux Kernel
      Markers. Allows complete typing verification by declaring both tracing
      statement inline functions and probe registration/unregistration static
      inline functions within the same macro "DEFINE_TRACE". No format string
      is required. See the tracepoint Documentation and Samples patches for
      usage examples.
      
      Taken from the documentation patch :
      
      "A tracepoint placed in code provides a hook to call a function (probe)
      that you can provide at runtime. A tracepoint can be "on" (a probe is
      connected to it) or "off" (no probe is attached). When a tracepoint is
      "off" it has no effect, except for adding a tiny time penalty (checking
      a condition for a branch) and space penalty (adding a few bytes for the
      function call at the end of the instrumented function and adds a data
      structure in a separate section).  When a tracepoint is "on", the
      function you provide is called each time the tracepoint is executed, in
      the execution context of the caller. When the function provided ends its
      execution, it returns to the caller (continuing from the tracepoint
      site).
      
      You can put tracepoints at important locations in the code. They are
      lightweight hooks that can pass an arbitrary number of parameters, which
      prototypes are described in a tracepoint declaration placed in a header
      file."
      
      Addition and removal of tracepoints is synchronized by RCU using the
      scheduler (and preempt_disable) as guarantees to find a quiescent state
      (this is really RCU "classic"). The update side uses rcu_barrier_sched()
      with call_rcu_sched() and the read/execute side uses
      "preempt_disable()/preempt_enable()".
      
      We make sure the previous array containing probes, which has been
      scheduled for deletion by the rcu callback, is indeed freed before we
      proceed to the next update. It therefore limits the rate of modification
      of a single tracepoint to one update per RCU period. The objective here
      is to permit fast batch add/removal of probes on _different_
      tracepoints.
      
      Changelog :
      - Use #name ":" #proto as string to identify the tracepoint in the
        tracepoint table. This will make sure not type mismatch happens due to
        connexion of a probe with the wrong type to a tracepoint declared with
        the same name in a different header.
      - Add tracepoint_entry_free_old.
      - Change __TO_TRACE to get rid of the 'i' iterator.
      
      Masami Hiramatsu <mhiramat@redhat.com> :
      Tested on x86-64.
      
      Performance impact of a tracepoint : same as markers, except that it
      adds about 70 bytes of instructions in an unlikely branch of each
      instrumented function (the for loop, the stack setup and the function
      call). It currently adds a memory read, a test and a conditional branch
      at the instrumentation site (in the hot path). Immediate values will
      eventually change this into a load immediate, test and branch, which
      removes the memory read which will make the i-cache impact smaller
      (changing the memory read for a load immediate removes 3-4 bytes per
      site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
      also saves the d-cache hit).
      
      About the performance impact of tracepoints (which is comparable to
      markers), even without immediate values optimizations, tests done by
      Hideo Aoki on ia64 show no regression. His test case was using hackbench
      on a kernel where scheduler instrumentation (about 5 events in code
      scheduler code) was added.
      
      Quoting Hideo Aoki about Markers :
      
      I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
      tree, which includes several markers for LTTng, using an ia64 server.
      
      While the immediate trace mark feature isn't implemented on ia64, there
      is no major performance regression. So, I think that we don't have any
      issues to propose merging marker point patches into Linus's tree from
      the viewpoint of performance impact.
      
      I prepared two kernels to evaluate. The first one was compiled without
      CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
      
      I downloaded the original hackbench from the following URL:
      http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
      
      I ran hackbench 5 times in each condition and calculated the average and
      difference between the kernels.
      
          The parameter of hackbench: every 50 from 50 to 800
          The number of CPUs of the server: 2, 4, and 8
      
      Below is the results. As you can see, major performance regression
      wasn't found in any case. Even if number of processes increases,
      differences between marker-enabled kernel and marker- disabled kernel
      doesn't increase. Moreover, if number of CPUs increases, the differences
      doesn't increase either.
      
      Curiously, marker-enabled kernel is better than marker-disabled kernel
      in more than half cases, although I guess it comes from the difference
      of memory access pattern.
      
      * 2 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      4.811   |       4.872  |  +0.061  |  +1.27  |
            100 |      9.854   |      10.309  |  +0.454  |  +4.61  |
            150 |     15.602   |      15.040  |  -0.562  |  -3.6   |
            200 |     20.489   |      20.380  |  -0.109  |  -0.53  |
            250 |     25.798   |      25.652  |  -0.146  |  -0.56  |
            300 |     31.260   |      30.797  |  -0.463  |  -1.48  |
            350 |     36.121   |      35.770  |  -0.351  |  -0.97  |
            400 |     42.288   |      42.102  |  -0.186  |  -0.44  |
            450 |     47.778   |      47.253  |  -0.526  |  -1.1   |
            500 |     51.953   |      52.278  |  +0.325  |  +0.63  |
            550 |     58.401   |      57.700  |  -0.701  |  -1.2   |
            600 |     63.334   |      63.222  |  -0.112  |  -0.18  |
            650 |     68.816   |      68.511  |  -0.306  |  -0.44  |
            700 |     74.667   |      74.088  |  -0.579  |  -0.78  |
            750 |     78.612   |      79.582  |  +0.970  |  +1.23  |
            800 |     85.431   |      85.263  |  -0.168  |  -0.2   |
      --------------------------------------------------------------
      
      * 4 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.586   |       2.584  |  -0.003  |  -0.1   |
            100 |      5.254   |       5.283  |  +0.030  |  +0.56  |
            150 |      8.012   |       8.074  |  +0.061  |  +0.76  |
            200 |     11.172   |      11.000  |  -0.172  |  -1.54  |
            250 |     13.917   |      14.036  |  +0.119  |  +0.86  |
            300 |     16.905   |      16.543  |  -0.362  |  -2.14  |
            350 |     19.901   |      20.036  |  +0.135  |  +0.68  |
            400 |     22.908   |      23.094  |  +0.186  |  +0.81  |
            450 |     26.273   |      26.101  |  -0.172  |  -0.66  |
            500 |     29.554   |      29.092  |  -0.461  |  -1.56  |
            550 |     32.377   |      32.274  |  -0.103  |  -0.32  |
            600 |     35.855   |      35.322  |  -0.533  |  -1.49  |
            650 |     39.192   |      38.388  |  -0.804  |  -2.05  |
            700 |     41.744   |      41.719  |  -0.025  |  -0.06  |
            750 |     45.016   |      44.496  |  -0.520  |  -1.16  |
            800 |     48.212   |      47.603  |  -0.609  |  -1.26  |
      --------------------------------------------------------------
      
      * 8 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.094   |       2.072  |  -0.022  |  -1.07  |
            100 |      4.162   |       4.273  |  +0.111  |  +2.66  |
            150 |      6.485   |       6.540  |  +0.055  |  +0.84  |
            200 |      8.556   |       8.478  |  -0.078  |  -0.91  |
            250 |     10.458   |      10.258  |  -0.200  |  -1.91  |
            300 |     12.425   |      12.750  |  +0.325  |  +2.62  |
            350 |     14.807   |      14.839  |  +0.032  |  +0.22  |
            400 |     16.801   |      16.959  |  +0.158  |  +0.94  |
            450 |     19.478   |      19.009  |  -0.470  |  -2.41  |
            500 |     21.296   |      21.504  |  +0.208  |  +0.98  |
            550 |     23.842   |      23.979  |  +0.137  |  +0.57  |
            600 |     26.309   |      26.111  |  -0.198  |  -0.75  |
            650 |     28.705   |      28.446  |  -0.259  |  -0.9   |
            700 |     31.233   |      31.394  |  +0.161  |  +0.52  |
            750 |     34.064   |      33.720  |  -0.344  |  -1.01  |
            800 |     36.320   |      36.114  |  -0.206  |  -0.57  |
      --------------------------------------------------------------
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: N'Peter Zijlstra' <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      97e1c18e
  2. 26 8月, 2008 1 次提交
    • L
      [module] Don't let gcc inline load_module() · ffb4ba76
      Linus Torvalds 提交于
      'load_module()' is a complex function that contains all the ELF section
      logic, and inlining it is utterly insane.  But gcc will do it, simply
      because there is only one call-site.  As a result, all the stack space
      that is allocated for all the work to load the module will still be
      active when we actually call the module init sequence, and the deep call
      chain makes stack overflows happen.
      
      And stack overflows are really hard to debug, because they not only
      corrupt random pages below the stack, but also corrupt the thread_info
      structure that is allocated under the stack.
      
      In this case, Alan Brunelle reported some crazy oopses at bootup, after
      loading the processor module that ends up doing complex ACPI stuff and
      has quite a deep callchain.  This should fix it, and is the sane thing
      to do regardless.
      
      Cc: Alan D. Brunelle <Alan.Brunelle@hp.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ffb4ba76
  3. 12 8月, 2008 1 次提交
    • A
      modules: extend initcall_debug functionality to the module loader · 59f9415f
      Arjan van de Ven 提交于
      The kernel has this really nice facility where if you put "initcall_debug"
      on the kernel commandline, it'll print which function it's going to
      execute just before calling an initcall, and then after the call completes
      it will
      
      1) print if it had an error code
      
      2) checks for a few simple bugs (like leaving irqs off)
      and
      
      3) print how long the init call took in milliseconds.
      
      While trying to optimize the boot speed of my laptop, I have been loving
      number 3 to figure out what to optimize...  ...  and then I wished that
      the same thing was done for module loading.
      
      This patch makes the module loader use this exact same functionality; it's
      a logical extension in my view (since modules are just sort of late
      binding initcalls anyway) and so far I've found it quite useful in finding
      where things are too slow in my boot.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      59f9415f
  4. 28 7月, 2008 2 次提交
  5. 22 7月, 2008 5 次提交
    • R
      modules: Take a shortcut for checking if an address is in a module · 3a642e99
      Rusty Russell 提交于
      This patch keeps track of the boundaries of module allocation, in
      order to speed up module_text_address().
      
      Inspired by Arjan's version, which required arch-specific defines:
      
      	Various pieces of the kernel (lockdep, latencytop, etc) tend
      	to store backtraces, sometimes at a relatively high
      	frequency. In itself this isn't a big performance deal (after
      	all you're using diagnostics features), but there have been
      	some complaints from people who have over 100 modules loaded
      	that this is a tad too slow.
      
      	This is due to the new backtracer code which looks at every
      	slot on the stack to see if it's a kernel/module text address,
      	so that's 1024 slots.  1024 times 100 modules... that's a lot
      	of list walking.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      3a642e99
    • D
      module: turn longs into ints for module sizes · 2f0f2a33
      Denys Vlasenko 提交于
      This shrinks module.o and each *.ko file.
      
      And finally, structure members which hold length of module
      code (four such members there) and count of symbols
      are converted from longs to ints.
      
      We cannot possibly have a module where 32 bits won't
      be enough to hold such counts.
      
      For one, module loading checks module size for sanity
      before loading, so such insanely big module will fail
      that test first.
      Signed-off-by: NDenys Vlasenko <vda.linux@googlemail.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      2f0f2a33
    • D
      Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs · f7f5b675
      Denys Vlasenko 提交于
      module.c and module.h conatains code for finding
      exported symbols which are declared with EXPORT_UNUSED_SYMBOL,
      and this code is compiled in even if CONFIG_UNUSED_SYMBOLS is not set
      and thus there can be no EXPORT_UNUSED_SYMBOLs in modules anyway
      (because EXPORT_UNUSED_SYMBOL(x) are compiled out to nothing then).
      
      This patch adds required #ifdefs.
      Signed-off-by: NDenys Vlasenko <vda.linux@googlemail.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      f7f5b675
    • R
      module: generic each_symbol iterator function · dafd0940
      Rusty Russell 提交于
      Introduce an each_symbol() iterator to avoid duplicating the knowledge
      about the 5 different sections containing symbols.  Currently only
      used by find_symbol(), but will be used by symbol_put_addr() too.
      
      (Includes NULL ptr deref fix by Jiri Kosina <jkosina@suse.cz>)
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Jiri Kosina <jkosina@suse.cz>
      dafd0940
    • R
      module: don't use stop_machine for waiting rmmod · da39ba5e
      Rusty Russell 提交于
      rmmod has a little-used "-w" option, meaning that instead of failing if the
      module is in use, it should block until the module becomes unused.
      
      In this case, we don't need to use stop_machine: Max Krasnyansky
      indicated that would be useful for SystemTap which loads/unloads new
      modules frequently.
      
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      da39ba5e
  6. 23 5月, 2008 2 次提交
  7. 09 5月, 2008 2 次提交
  8. 05 5月, 2008 1 次提交
    • L
      Make forced module loading optional · 826e4506
      Linus Torvalds 提交于
      The kernel module loader used to be much too happy to allow loading of
      modules for the wrong kernel version by default.  For example, if you
      had MODVERSIONS enabled, but tried to load a module with no version
      info, it would happily load it and taint the kernel - whether it was
      likely to actually work or not!
      
      Generally, such forced module loading should be considered a really
      really bad idea, so make it conditional on a new config option
      (MODULE_FORCE_LOAD), and make it default to off.
      
      If somebody really wants to force module loads, that's their problem,
      but we should not encourage it.  Especially as it happened to me by
      mistake (ie regular unversioned Fedora modules getting loaded) causing
      lots of strange behavior.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      826e4506
  9. 01 5月, 2008 6 次提交
  10. 19 4月, 2008 1 次提交
  11. 11 3月, 2008 2 次提交
  12. 05 3月, 2008 1 次提交
  13. 22 2月, 2008 1 次提交
  14. 14 2月, 2008 1 次提交
    • M
      Linux Kernel Markers: support multiple probes · fb40bd78
      Mathieu Desnoyers 提交于
      RCU style multiple probes support for the Linux Kernel Markers.  Common case
      (one probe) is still fast and does not require dynamic allocation or a
      supplementary pointer dereference on the fast path.
      
      - Move preempt disable from the marker site to the callback.
      
      Since we now have an internal callback, move the preempt disable/enable to the
      callback instead of the marker site.
      
      Since the callback change is done asynchronously (passing from a handler that
      supports arguments to a handler that does not setup the arguments is no
      arguments are passed), we can safely update it even if it is outside the
      preempt disable section.
      
      - Move probe arm to probe connection. Now, a connected probe is automatically
        armed.
      
      Remove MARK_MAX_FORMAT_LEN, unused.
      
      This patch modifies the Linux Kernel Markers API : it removes the probe
      "arm/disarm" and changes the probe function prototype : it now expects a
      va_list * instead of a "...".
      
      If we want to have more than one probe connected to a marker at a given
      time (LTTng, or blktrace, ssytemtap) then we need this patch. Without it,
      connecting a second probe handler to a marker will fail.
      
      It allow us, for instance, to do interesting combinations :
      
      Do standard tracing with LTTng and, eventually, to compute statistics
      with SystemTAP, or to have a special trigger on an event that would call
      a systemtap script which would stop flight recorder tracing.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Mike Mason <mmlnx@us.ibm.com>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: David Smith <dsmith@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fb40bd78
  15. 09 2月, 2008 3 次提交
  16. 30 1月, 2008 1 次提交
  17. 29 1月, 2008 6 次提交
  18. 28 1月, 2008 1 次提交
  19. 26 1月, 2008 1 次提交
    • A
      debug: track and print last unloaded module in the oops trace · e14af7ee
      Arjan van de Ven 提交于
      Based on a suggestion from Andi:
      
       In various cases, the unload of a module may leave some bad state around
       that causes a kernel crash AFTER a module is unloaded; and it's then hard
       to find which module caused that.
      
      This patch tracks the last unloaded module, and prints this as part of the
      module list in the oops trace.
      
      Right now, only the last 1 module is tracked; I expect that this is enough
      for the vast majority of cases where this information matters; if it turns
      out that tracking more is important, we can always extend it to that.
      
      [ mingo@elte.hu: build fix ]
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e14af7ee