1. 14 10月, 2008 1 次提交
    • M
      tracing: Kernel Tracepoints · 97e1c18e
      Mathieu Desnoyers 提交于
      Implementation of kernel tracepoints. Inspired from the Linux Kernel
      Markers. Allows complete typing verification by declaring both tracing
      statement inline functions and probe registration/unregistration static
      inline functions within the same macro "DEFINE_TRACE". No format string
      is required. See the tracepoint Documentation and Samples patches for
      usage examples.
      
      Taken from the documentation patch :
      
      "A tracepoint placed in code provides a hook to call a function (probe)
      that you can provide at runtime. A tracepoint can be "on" (a probe is
      connected to it) or "off" (no probe is attached). When a tracepoint is
      "off" it has no effect, except for adding a tiny time penalty (checking
      a condition for a branch) and space penalty (adding a few bytes for the
      function call at the end of the instrumented function and adds a data
      structure in a separate section).  When a tracepoint is "on", the
      function you provide is called each time the tracepoint is executed, in
      the execution context of the caller. When the function provided ends its
      execution, it returns to the caller (continuing from the tracepoint
      site).
      
      You can put tracepoints at important locations in the code. They are
      lightweight hooks that can pass an arbitrary number of parameters, which
      prototypes are described in a tracepoint declaration placed in a header
      file."
      
      Addition and removal of tracepoints is synchronized by RCU using the
      scheduler (and preempt_disable) as guarantees to find a quiescent state
      (this is really RCU "classic"). The update side uses rcu_barrier_sched()
      with call_rcu_sched() and the read/execute side uses
      "preempt_disable()/preempt_enable()".
      
      We make sure the previous array containing probes, which has been
      scheduled for deletion by the rcu callback, is indeed freed before we
      proceed to the next update. It therefore limits the rate of modification
      of a single tracepoint to one update per RCU period. The objective here
      is to permit fast batch add/removal of probes on _different_
      tracepoints.
      
      Changelog :
      - Use #name ":" #proto as string to identify the tracepoint in the
        tracepoint table. This will make sure not type mismatch happens due to
        connexion of a probe with the wrong type to a tracepoint declared with
        the same name in a different header.
      - Add tracepoint_entry_free_old.
      - Change __TO_TRACE to get rid of the 'i' iterator.
      
      Masami Hiramatsu <mhiramat@redhat.com> :
      Tested on x86-64.
      
      Performance impact of a tracepoint : same as markers, except that it
      adds about 70 bytes of instructions in an unlikely branch of each
      instrumented function (the for loop, the stack setup and the function
      call). It currently adds a memory read, a test and a conditional branch
      at the instrumentation site (in the hot path). Immediate values will
      eventually change this into a load immediate, test and branch, which
      removes the memory read which will make the i-cache impact smaller
      (changing the memory read for a load immediate removes 3-4 bytes per
      site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
      also saves the d-cache hit).
      
      About the performance impact of tracepoints (which is comparable to
      markers), even without immediate values optimizations, tests done by
      Hideo Aoki on ia64 show no regression. His test case was using hackbench
      on a kernel where scheduler instrumentation (about 5 events in code
      scheduler code) was added.
      
      Quoting Hideo Aoki about Markers :
      
      I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
      tree, which includes several markers for LTTng, using an ia64 server.
      
      While the immediate trace mark feature isn't implemented on ia64, there
      is no major performance regression. So, I think that we don't have any
      issues to propose merging marker point patches into Linus's tree from
      the viewpoint of performance impact.
      
      I prepared two kernels to evaluate. The first one was compiled without
      CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
      
      I downloaded the original hackbench from the following URL:
      http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
      
      I ran hackbench 5 times in each condition and calculated the average and
      difference between the kernels.
      
          The parameter of hackbench: every 50 from 50 to 800
          The number of CPUs of the server: 2, 4, and 8
      
      Below is the results. As you can see, major performance regression
      wasn't found in any case. Even if number of processes increases,
      differences between marker-enabled kernel and marker- disabled kernel
      doesn't increase. Moreover, if number of CPUs increases, the differences
      doesn't increase either.
      
      Curiously, marker-enabled kernel is better than marker-disabled kernel
      in more than half cases, although I guess it comes from the difference
      of memory access pattern.
      
      * 2 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      4.811   |       4.872  |  +0.061  |  +1.27  |
            100 |      9.854   |      10.309  |  +0.454  |  +4.61  |
            150 |     15.602   |      15.040  |  -0.562  |  -3.6   |
            200 |     20.489   |      20.380  |  -0.109  |  -0.53  |
            250 |     25.798   |      25.652  |  -0.146  |  -0.56  |
            300 |     31.260   |      30.797  |  -0.463  |  -1.48  |
            350 |     36.121   |      35.770  |  -0.351  |  -0.97  |
            400 |     42.288   |      42.102  |  -0.186  |  -0.44  |
            450 |     47.778   |      47.253  |  -0.526  |  -1.1   |
            500 |     51.953   |      52.278  |  +0.325  |  +0.63  |
            550 |     58.401   |      57.700  |  -0.701  |  -1.2   |
            600 |     63.334   |      63.222  |  -0.112  |  -0.18  |
            650 |     68.816   |      68.511  |  -0.306  |  -0.44  |
            700 |     74.667   |      74.088  |  -0.579  |  -0.78  |
            750 |     78.612   |      79.582  |  +0.970  |  +1.23  |
            800 |     85.431   |      85.263  |  -0.168  |  -0.2   |
      --------------------------------------------------------------
      
      * 4 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.586   |       2.584  |  -0.003  |  -0.1   |
            100 |      5.254   |       5.283  |  +0.030  |  +0.56  |
            150 |      8.012   |       8.074  |  +0.061  |  +0.76  |
            200 |     11.172   |      11.000  |  -0.172  |  -1.54  |
            250 |     13.917   |      14.036  |  +0.119  |  +0.86  |
            300 |     16.905   |      16.543  |  -0.362  |  -2.14  |
            350 |     19.901   |      20.036  |  +0.135  |  +0.68  |
            400 |     22.908   |      23.094  |  +0.186  |  +0.81  |
            450 |     26.273   |      26.101  |  -0.172  |  -0.66  |
            500 |     29.554   |      29.092  |  -0.461  |  -1.56  |
            550 |     32.377   |      32.274  |  -0.103  |  -0.32  |
            600 |     35.855   |      35.322  |  -0.533  |  -1.49  |
            650 |     39.192   |      38.388  |  -0.804  |  -2.05  |
            700 |     41.744   |      41.719  |  -0.025  |  -0.06  |
            750 |     45.016   |      44.496  |  -0.520  |  -1.16  |
            800 |     48.212   |      47.603  |  -0.609  |  -1.26  |
      --------------------------------------------------------------
      
      * 8 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.094   |       2.072  |  -0.022  |  -1.07  |
            100 |      4.162   |       4.273  |  +0.111  |  +2.66  |
            150 |      6.485   |       6.540  |  +0.055  |  +0.84  |
            200 |      8.556   |       8.478  |  -0.078  |  -0.91  |
            250 |     10.458   |      10.258  |  -0.200  |  -1.91  |
            300 |     12.425   |      12.750  |  +0.325  |  +2.62  |
            350 |     14.807   |      14.839  |  +0.032  |  +0.22  |
            400 |     16.801   |      16.959  |  +0.158  |  +0.94  |
            450 |     19.478   |      19.009  |  -0.470  |  -2.41  |
            500 |     21.296   |      21.504  |  +0.208  |  +0.98  |
            550 |     23.842   |      23.979  |  +0.137  |  +0.57  |
            600 |     26.309   |      26.111  |  -0.198  |  -0.75  |
            650 |     28.705   |      28.446  |  -0.259  |  -0.9   |
            700 |     31.233   |      31.394  |  +0.161  |  +0.52  |
            750 |     34.064   |      33.720  |  -0.344  |  -1.01  |
            800 |     36.320   |      36.114  |  -0.206  |  -0.57  |
      --------------------------------------------------------------
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: N'Peter Zijlstra' <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      97e1c18e
  2. 25 7月, 2008 1 次提交
  3. 22 7月, 2008 3 次提交
  4. 01 5月, 2008 2 次提交
  5. 15 2月, 2008 1 次提交
  6. 14 2月, 2008 1 次提交
    • M
      Linux Kernel Markers: support multiple probes · fb40bd78
      Mathieu Desnoyers 提交于
      RCU style multiple probes support for the Linux Kernel Markers.  Common case
      (one probe) is still fast and does not require dynamic allocation or a
      supplementary pointer dereference on the fast path.
      
      - Move preempt disable from the marker site to the callback.
      
      Since we now have an internal callback, move the preempt disable/enable to the
      callback instead of the marker site.
      
      Since the callback change is done asynchronously (passing from a handler that
      supports arguments to a handler that does not setup the arguments is no
      arguments are passed), we can safely update it even if it is outside the
      preempt disable section.
      
      - Move probe arm to probe connection. Now, a connected probe is automatically
        armed.
      
      Remove MARK_MAX_FORMAT_LEN, unused.
      
      This patch modifies the Linux Kernel Markers API : it removes the probe
      "arm/disarm" and changes the probe function prototype : it now expects a
      va_list * instead of a "...".
      
      If we want to have more than one probe connected to a marker at a given
      time (LTTng, or blktrace, ssytemtap) then we need this patch. Without it,
      connecting a second probe handler to a marker will fail.
      
      It allow us, for instance, to do interesting combinations :
      
      Do standard tracing with LTTng and, eventually, to compute statistics
      with SystemTAP, or to have a special trigger on an event that would call
      a systemtap script which would stop flight recorder tracing.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Mike Mason <mmlnx@us.ibm.com>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: David Smith <dsmith@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fb40bd78
  7. 09 2月, 2008 1 次提交
  8. 29 1月, 2008 2 次提交
  9. 25 1月, 2008 2 次提交
  10. 20 10月, 2007 1 次提交
  11. 17 10月, 2007 2 次提交
  12. 17 7月, 2007 1 次提交
  13. 11 5月, 2007 1 次提交
  14. 10 5月, 2007 1 次提交
  15. 09 5月, 2007 4 次提交
  16. 03 5月, 2007 1 次提交
  17. 17 2月, 2007 1 次提交
  18. 08 2月, 2007 2 次提交
  19. 09 12月, 2006 1 次提交
    • J
      [PATCH] Generic BUG implementation · 7664c5a1
      Jeremy Fitzhardinge 提交于
      This patch adds common handling for kernel BUGs, for use by architectures as
      they wish.  The code is derived from arch/powerpc.
      
      The advantages of having common BUG handling are:
       - consistent BUG reporting across architectures
       - shared implementation of out-of-line file/line data
       - implement CONFIG_DEBUG_BUGVERBOSE consistently
      
      This means that in inline impact of BUG is just the illegal instruction
      itself, which is an improvement for i386 and x86-64.
      
      A BUG is represented in the instruction stream as an illegal instruction,
      which has file/line information associated with it.  This extra information is
      stored in the __bug_table section in the ELF file.
      
      When the kernel gets an illegal instruction, it first confirms it might
      possibly be from a BUG (ie, in kernel mode, the right illegal instruction).
      It then calls report_bug().  This searches __bug_table for a matching
      instruction pointer, and if found, prints the corresponding file/line
      information.  If report_bug() determines that it wasn't a BUG which caused the
      trap, it returns BUG_TRAP_TYPE_NONE.
      
      Some architectures (powerpc) implement WARN using the same mechanism; if the
      illegal instruction was the result of a WARN, then report_bug(Q) returns
      CONFIG_DEBUG_BUGVERBOSE; otherwise it returns BUG_TRAP_TYPE_BUG.
      
      lib/bug.c keeps a list of loaded modules which can be searched for __bug_table
      entries.  The architecture must call
      module_bug_finalize()/module_bug_cleanup() from its corresponding
      module_finalize/cleanup functions.
      
      Unsetting CONFIG_DEBUG_BUGVERBOSE will reduce the kernel size by some amount.
      At the very least, filename and line information will not be recorded for each
      but, but architectures may decide to store no extra information per BUG at
      all.
      
      Unfortunately, gcc doesn't have a general way to mark an asm() as noreturn, so
      architectures will generally have to include an infinite loop (or similar) in
      the BUG code, so that gcc knows execution won't continue beyond that point.
      gcc does have a __builtin_trap() operator which may be useful to achieve the
      same effect, unfortunately it cannot be used to actually implement the BUG
      itself, because there's no way to get the instruction's address for use in
      generating the __bug_table entry.
      
      [randy.dunlap@oracle.com: Handle BUG=n, GENERIC_BUG=n to prevent build errors]
      [bunk@stusta.de: include/linux/bug.h must always #include <linux/module.h]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Hugh Dickens <hugh@veritas.com>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7664c5a1
  20. 04 12月, 2006 1 次提交
  21. 02 12月, 2006 1 次提交
  22. 12 10月, 2006 1 次提交
    • F
      [PATCH] fix Module taint flags listing in Oops/panic · fa3ba2e8
      Florin Malita 提交于
      Module taint flags listing in Oops/panic has a couple of issues:
      
      * taint_flags() doesn't null-terminate the buffer after printing the flags
      
      * per-module taints are only set if the kernel is not already tainted
        (with that particular flag) => only the first offending module gets its
        taint info correctly updated
      
      Some additional changes:
      
      * 'license_gplok' is no longer needed - equivalent to !(taints &
        TAINT_PROPRIETARY_MODULE) - so we can drop it from struct module *
        exporting module taint info via /proc/module:
      
      pwc 88576 0 - Live 0xf8c32000
      evilmod 6784 1 pwc, Live 0xf8bbf000 (PF)
      Signed-off-by: NFlorin Malita <fmalita@gmail.com>
      Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fa3ba2e8
  23. 02 10月, 2006 1 次提交
    • R
      [PATCH] list module taint flags in Oops/panic · 2bc2d61a
      Randy Dunlap 提交于
      When listing loaded modules during an oops or panic, also list each
      module's Tainted flags if non-zero (P: Proprietary or F: Forced load only).
      
      If a module is did not taint the kernel, it is just listed like
      	usbcore
      but if it did taint the kernel, it is listed like
      	wizmodem(PF)
      
      Example:
      [ 3260.121718] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
      [ 3260.121729]  [<ffffffff8804c099>] :dump_test:proc_dump_test+0x99/0xc8
      [ 3260.121742] PGD fe8d067 PUD 264a6067 PMD 0
      [ 3260.121748] Oops: 0002 [1] SMP
      [ 3260.121753] CPU 1
      [ 3260.121756] Modules linked in: dump_test(P) snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device ide_cd generic ohci1394 snd_hda_intel snd_hda_codec snd_pcm snd_timer snd ieee1394 snd_page_alloc piix ide_core arcmsr aic79xx scsi_transport_spi usblp
      [ 3260.121785] Pid: 5556, comm: bash Tainted: P      2.6.18-git10 #1
      
      [Alternatively, I can look into listing tainted flags with 'lsmod',
      but that won't help in oopsen/panics so much.]
      
      [akpm@osdl.org: cleanup]
      Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2bc2d61a
  24. 30 9月, 2006 1 次提交
  25. 30 8月, 2006 1 次提交
    • J
      [SCSI] MODULE_FIRMWARE for binary firmware(s) · 187afbed
      Jon Masters 提交于
      Right now, various kernel modules are being migrated over to use
      request_firmware in order to pull in binary firmware blobs from userland
      when the module is loaded. This makes sense.
      
      However, there is right now little mechanism in place to automatically
      determine which binary firmware blobs must be included with a kernel in
      order to satisfy the prerequisites of these drivers. This affects
      vendors, but also regular users to a certain extent too.
      
      The attached patch introduces MODULE_FIRMWARE as a mechanism for
      advertising that a particular firmware file is to be loaded - it will
      then show up via modinfo and could be used e.g. when packaging a kernel.
      Signed-off-by: NJon Masters <jcm@redhat.com>
      
      Comments added in line with all the other MODULE_ tag
      Signed-off-by: NJames Bottomley <James.Bottomley@SteelEye.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      187afbed
  26. 15 7月, 2006 1 次提交
  27. 04 7月, 2006 1 次提交
  28. 29 6月, 2006 1 次提交
  29. 27 6月, 2006 1 次提交
  30. 23 6月, 2006 1 次提交