  1. 05 Jun, 2010 2 commits
    • module: fix kdb's illicit use of struct module_use. · c8e21ced
      Committed by Rusty Russell
      Linus changed the structure, and luckily this didn't compile any more.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Martin Hicks <mort@sgi.com>
    • module: Make the 'usage' lists be two-way · 2c02dfe7
      Committed by Linus Torvalds
      When adding a module that depends on another one, we used to create a
      one-way list of "modules_which_use_me", so that module unloading could
      see who needs a module.
      
      It's actually quite simple to make that list go both ways: so that we
      not only can see "who uses me", but also see a list of modules that are
      "used by me".
      
      In fact, we always wanted that list in "module_unload_free()": when we
      unload a module, we want to also release all the other modules that are
      used by that module.  But because we didn't have that list, we used to
      first iterate over all modules, and then iterate over each "used by me"
      list of that module.
      
      By making the list two-way, we simplify module_unload_free(), and it
      allows for some trivial fixes later too.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cleaned & rebased)
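The two-way idea in the commit above can be sketched in userspace C. This is only an illustration under stated assumptions: `mod_sketch`, `use_sketch`, `add_use()`, `unload_free()` and `MAX_USES` are hypothetical names, and fixed-size arrays stand in for the kernel's linked lists.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_USES 8 /* hypothetical fixed capacity, avoids allocation */

struct mod_sketch;

/* One "source uses target" edge, reachable from both ends. */
struct use_sketch {
	struct mod_sketch *source, *target;
};

struct mod_sketch {
	const char *name;
	int refcnt;
	struct use_sketch *source_list[MAX_USES]; /* edges: who uses me */
	struct use_sketch *target_list[MAX_USES]; /* edges: what I use */
	size_t nsources, ntargets;
};

/* Record that a uses b: link the edge into both modules and pin b. */
static void add_use(struct use_sketch *edge, struct mod_sketch *a,
		    struct mod_sketch *b)
{
	assert(a->ntargets < MAX_USES && b->nsources < MAX_USES);
	edge->source = a;
	edge->target = b;
	a->target_list[a->ntargets++] = edge;
	b->source_list[b->nsources++] = edge;
	b->refcnt++;
}

/* On unload, walk only the "used by me" list; no scan over all modules. */
static void unload_free(struct mod_sketch *mod)
{
	size_t i, j;

	for (i = 0; i < mod->ntargets; i++) {
		struct use_sketch *edge = mod->target_list[i];
		struct mod_sketch *t = edge->target;

		/* Unlink the edge from the target's "who uses me" list. */
		for (j = 0; j < t->nsources; j++) {
			if (t->source_list[j] == edge) {
				t->source_list[j] = t->source_list[--t->nsources];
				break;
			}
		}
		t->refcnt--;
	}
	mod->ntargets = 0;
}
```

With the edge stored on both sides, releasing a module's dependencies costs one pass over its own list, which is the simplification the commit message describes.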
  2. 06 Apr, 2010 1 commit
    • Fix up possibly racy module refcounting · 5fbfb18d
      Committed by Nick Piggin
      Module refcounting is implemented with a per-cpu counter for speed.
      However there is a race when tallying the counter where a reference may
      be taken by one CPU and released by another.  Reference count summation
      may then see the decrement without having seen the previous increment,
      leading to lower than expected count.  A module which never has its
      actual reference drop below 1 may return a reference count of 0 due to
      this race.
      
      Module removal generally runs under stop_machine, which prevents this
      race causing bugs due to removal of in-use modules.  However there are
      other real bugs in module.c code and driver code (module_refcount is
      exported) where the callers do not run under stop_machine.
      
      Fix this by maintaining running per-cpu counters for the number of
      module refcount increments and the number of refcount decrements.  The
      increments are tallied after the decrements, so any decrement seen will
      always have its corresponding increment counted.  The final refcount is
      the difference of the total increments and decrements, preventing a
      low-refcount from being returned.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
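The ordering argument in the commit above (tally decrements first, then increments) can be sketched in plain C. This is a minimal single-threaded illustration, not the kernel code: `NCPUS`, `ref_sketch` and the function names are hypothetical, and ordinary arrays stand in for per-cpu data.

```c
#include <assert.h>
#include <stddef.h>

#define NCPUS 4 /* hypothetical CPU count for the sketch */

/* Stand-in for per-cpu counters: one slot per CPU. */
struct ref_sketch {
	unsigned long incs[NCPUS]; /* running count of "get" operations */
	unsigned long decs[NCPUS]; /* running count of "put" operations */
};

static void ref_get(struct ref_sketch *r, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	r->incs[cpu]++;
}

static void ref_put(struct ref_sketch *r, int cpu)
{
	assert(cpu >= 0 && cpu < NCPUS);
	r->decs[cpu]++;
}

/*
 * Sum the decrements first, then the increments.  Any decrement that
 * was counted has a matching increment that happened earlier, so the
 * later pass over incs[] is guaranteed to include it.  A concurrent
 * version would also need a read barrier between the two loops; that
 * is omitted here because the sketch is single-threaded.
 */
static unsigned long ref_count(const struct ref_sketch *r)
{
	unsigned long incs = 0, decs = 0;
	size_t cpu;

	for (cpu = 0; cpu < NCPUS; cpu++)
		decs += r->decs[cpu];
	for (cpu = 0; cpu < NCPUS; cpu++)
		incs += r->incs[cpu];
	return incs - decs;
}
```

Note that a reference may be taken on one CPU slot and dropped on another; only the totals matter, which is why the result is the difference of the two sums.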
  3. 01 Apr, 2010 1 commit
  4. 31 Mar, 2010 1 commit
  5. 29 Mar, 2010 2 commits
  6. 13 Mar, 2010 1 commit
  7. 17 Feb, 2010 1 commit
  8. 05 Jan, 2010 2 commits
  9. 15 Dec, 2009 1 commit
    • module: make MODULE_SYMBOL_PREFIX into a CONFIG option · 9e1b9b80
      Committed by Alan Jenkins
      The next commit will require the use of MODULE_SYMBOL_PREFIX in
      .tmp_exports-asm.S.  Currently it is mixed in with C structure
      definitions in "asm/module.h".  Move the definition of this arch option
      into Kconfig, so it can be easily accessed by any code.
      
      This also lets modpost.c use the same definition.  Previously modpost
      relied on a hardcoded list of architectures in mk_elfconfig.c.
      
      A build test for blackfin, one of the two MODULE_SYMBOL_PREFIX archs,
      showed the generated code was unchanged.  vmlinux was identical save
      for build ids and an apparently randomized suffix on a single "__key"
      symbol in the kallsyms data.
      Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
      Acked-by: Mike Frysinger <vapier@gentoo.org> (blackfin)
      CC: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  10. 24 Sep, 2009 3 commits
  11. 19 Sep, 2009 1 commit
  12. 17 Aug, 2009 1 commit
    • tracing/events: Add module tracepoints · 7ead8b83
      Committed by Li Zefan
      Add trace points to trace module_load, module_free, module_get,
      module_put and module_request, and use trace_event facility to
      get the trace output.
      
      Here's the sample output:
      
           TASK-PID    CPU#    TIMESTAMP  FUNCTION
              | |       |          |         |
          <...>-42    [000]     1.758380: module_request: fb0 wait=1 call_site=fb_open
          ...
          <...>-60    [000]     3.269403: module_load: scsi_wait_scan
          <...>-60    [000]     3.269432: module_put: scsi_wait_scan call_site=sys_init_module refcnt=0
          <...>-61    [001]     3.273168: module_free: scsi_wait_scan
          ...
          <...>-1021  [000]    13.836081: module_load: sunrpc
          <...>-1021  [000]    13.840589: module_put: sunrpc call_site=sys_init_module refcnt=-1
          <...>-1027  [000]    13.848098: module_get: sunrpc call_site=try_module_get refcnt=0
          <...>-1027  [000]    13.848308: module_get: sunrpc call_site=get_filesystem refcnt=1
          <...>-1027  [000]    13.848692: module_put: sunrpc call_site=put_filesystem refcnt=0
          ...
       modprobe-2587  [001]  1088.437213: module_load: trace_events_sample F
       modprobe-2587  [001]  1088.437786: module_put: trace_events_sample call_site=sys_init_module refcnt=0
      
      Note:
      
      - the taints flag can be 'F', 'C' and/or 'P' if mod->taints != 0
      
      - the module refcnt is percpu, so it can be negative in a
        specific cpu
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <4A891B3C.5030608@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 19 Jun, 2009 1 commit
  14. 17 Jun, 2009 1 commit
  15. 12 Jun, 2009 1 commit
    • module: trim exception table on init free. · ad6561df
      Committed by Rusty Russell
      It's theoretically possible that there are exception table entries
      which point into the (freed) init text of modules.  These could cause
      future problems if other modules get loaded into that memory and cause
      an exception as we'd see the wrong fixup.  The only case I know of is
      kvm-intel.ko (when CONFIG_CC_OPTIMIZE_FOR_SIZE=n).
      
      Amerigo fixed this long-standing FIXME in the x86 version, but this
      patch is more general.
      
      This implements trim_init_extable(); most archs are simple since they
      use the standard lib/extable.c sort code.  Alpha and IA64 use relative
      addresses in their fixups, so their trimming is a slight variation.
      
      Sparc32 is unique; it doesn't seem to define ARCH_HAS_SORT_EXTABLE,
      yet it defines its own sort_extable() which overrides the one in lib.
      It doesn't sort, so we have to mark deleted entries instead of
      actually trimming them.
      Inspired-by: Amerigo Wang <amwang@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: linux-alpha@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
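For the common case described above, where the table is sorted by faulting instruction address, trimming can be done from both ends. The following is a rough sketch under that assumption; the type and function names (`extable_sketch`, `trim_init_entries()`) are hypothetical and the addresses are plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical exception table entry: faulting insn and its fixup. */
struct extable_entry_sketch {
	unsigned long insn, fixup;
};

struct extable_sketch {
	struct extable_entry_sketch *entries; /* sorted by insn */
	size_t num;
	unsigned long init_start, init_size;  /* range of the freed init text */
};

static bool in_init(const struct extable_sketch *t, unsigned long addr)
{
	return addr >= t->init_start && addr - t->init_start < t->init_size;
}

/*
 * Because the table is sorted and the init text is one contiguous
 * range, entries pointing into it sit at the start or the end of the
 * table, so they can be dropped without moving anything.
 */
static void trim_init_entries(struct extable_sketch *t)
{
	assert(t != NULL);
	while (t->num && in_init(t, t->entries[0].insn)) {
		t->entries++;
		t->num--;
	}
	while (t->num && in_init(t, t->entries[t->num - 1].insn))
		t->num--;
}
```

An unsorted table, as in the Sparc32 case the commit mentions, cannot be trimmed this way, which is why entries there are marked as deleted instead.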
  16. 17 Apr, 2009 1 commit
    • ftrace: use module notifier for function tracer · 93eb677d
      Committed by Steven Rostedt
      The hooks in the module code for the function tracer must be called
      before any of that module code runs. The function tracer hooks
      modify the module (replacing calls to mcount to nops). If the code
      is executed while the change occurs, then the CPU can take a GPF.
      
      To handle the above with a bit of paranoia, I originally implemented
      the hooks as calls directly from the module code.
      
      After examining the notifier calls, it looks as though the start up
      notify is called before any of the module's code is executed. This makes
      the use of the notify safe with ftrace.
      
      Only the startup notify is required to be "safe". The shutdown simply
      removes the entries from the ftrace function list, and does not modify
      any code.
      
      This change has another benefit. It removes an issue with a reverse dependency
      in the mutexes of ftrace_lock and module_mutex.
      
      [ Impact: fix lock dependency bug, cleanup ]
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  17. 15 Apr, 2009 1 commit
    • tracing/events: add support for modules to TRACE_EVENT · 6d723736
      Committed by Steven Rostedt
      Impact: allow modules to add TRACE_EVENTS on load
      
      This patch adds the final hooks to allow modules to use the TRACE_EVENT
      macro. A notifier and a data structure are used to link the TRACE_EVENTs
      defined in the module to connect them with the ftrace event tracing system.
      
      It also adds the necessary automated clean ups to the trace events when a
      module is removed.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  18. 31 Mar, 2009 5 commits
    • module: Export symbols needed for Ksplice · c6b37801
      Committed by Tim Abbott
      Impact: Expose some module.c symbols
      
      Ksplice uses several functions from module.c in order to resolve
      symbols and implement dependency handling.  Calling these functions
      requires holding module_mutex, so it is exported.
      
      (This is just the module part of a bigger add-exports patch from Tim).
      
      Cc: Anders Kaseorg <andersk@mit.edu>
      Cc: Jeff Arnold <jbarnold@mit.edu>
      Signed-off-by: Tim Abbott <tabbott@mit.edu>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • Ksplice: Add functions for walking kallsyms symbols · 75a66614
      Committed by Anders Kaseorg
      Impact: New API
      
      kallsyms_lookup_name only returns the first match that it finds.  Ksplice
      needs information about all symbols with a given name in order to correctly
      resolve local symbols.
      
      kallsyms_on_each_symbol provides a generic mechanism for iterating over the
      kallsyms table.
      
      Cc: Jeff Arnold <jbarnold@mit.edu>
      Cc: Tim Abbott <tabbott@mit.edu>
      Signed-off-by: Anders Kaseorg <andersk@mit.edu>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
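The difference between a first-match lookup and a callback-driven walk can be shown with a toy table. This sketch does not reproduce the kernel's kallsyms interface: `toy_syms`, `on_each_symbol_sketch()` and `count_matches()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sym_sketch {
	const char *name;
	unsigned long addr;
};

/* Hypothetical toy table; note the duplicate local name "init". */
static const struct sym_sketch toy_syms[] = {
	{ "init", 0x1000 }, { "helper", 0x1040 }, { "init", 0x2000 },
};

typedef int (*sym_fn)(void *data, const char *name, unsigned long addr);

/* Call fn for every symbol; stop early if fn returns nonzero. */
static int on_each_symbol_sketch(sym_fn fn, void *data)
{
	size_t i;

	for (i = 0; i < sizeof(toy_syms) / sizeof(toy_syms[0]); i++) {
		int ret = fn(data, toy_syms[i].name, toy_syms[i].addr);

		if (ret != 0)
			return ret;
	}
	return 0;
}

struct match_sketch {
	const char *name;  /* name being searched for */
	unsigned int hits; /* how many symbols carried that name */
};

/* Callback: count every symbol with the wanted name. */
static int count_matches(void *data, const char *name, unsigned long addr)
{
	struct match_sketch *m = data;

	(void)addr;
	assert(m != NULL);
	if (strcmp(name, m->name) == 0)
		m->hits++;
	return 0; /* keep iterating */
}
```

A first-match lookup would report only the entry at 0x1000; the walk lets the caller see both "init" symbols and decide which one it needs.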
    • module: remove module_text_address() · a6e6abd5
      Committed by Rusty Russell
      Impact: Replace and remove risky (non-EXPORTed) API
      
      module_text_address() returns a pointer to the module, which given locking
      improvements in module.c, is useless except to test for NULL:
      
      1) If the module can't go away, use __module_text_address.
      2) Otherwise, just use is_module_text_address().
      
      Cc: linux-mtd@lists.infradead.org
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • module: __module_address · e610499e
      Committed by Rusty Russell
      Impact: New API, cleanup
      
      ksplice wants to know the bounds of a module, not just the module text.
      
      It makes sense to have __module_address.  We then implement
      is_module_address and __module_text_address in terms of this (and
      change is_module_text_address() to bool while we're at it).
      
      Also, add proper kerneldoc for them all.
      
      Cc: Anders Kaseorg <andersk@mit.edu>
      Cc: Jeff Arnold <jbarnold@mit.edu>
      Cc: Tim Abbott <tabbott@mit.edu>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • param: fix charp parameters set via sysfs · e180a6b7
      Committed by Rusty Russell
      Impact: fix crash on reading from /sys/module/.../ieee80211_default_rc_algo
      
      The module_param type "charp" simply sets a char * pointer in the
      module to the parameter in the commandline string: this is why we keep
      the (mangled) module command line around.  But when set via sysfs (as
      about 11 charp parameters can be) this memory is freed on the way
      out of the write().  Future reads hit random mem.
      
      So we kstrdup instead: we have to check we're not in early commandline
      parsing, and we have to note when we've used it so we can reliably
      kfree the parameter when it's next overwritten, and also on module
      unload.
      
      (Thanks to Randy Dunlap for CONFIG_SYSFS=n fixes)
      Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
      Diagnosed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Tested-by: Frederic Weisbecker <fweisbec@gmail.com>
      Tested-by: Christof Schmitt <christof.schmitt@de.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
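The ownership rule in the commit above (copy the string unless we are in early parsing, remember whether we own it, free it on overwrite and on unload) can be sketched in userspace C. The names `charp_sketch`, `charp_set()` and `charp_release()` are hypothetical, and `malloc()`/`free()` stand in for the kernel allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct charp_sketch {
	char *value;
	bool owned; /* true once value points at memory we allocated */
};

/*
 * Set the parameter.  When "early" is true the allocator is assumed
 * to be unavailable, so we borrow the caller's pointer as before.
 * Otherwise we keep a private copy, so the caller's buffer may be
 * freed or reused.  Returns 0 on success, -1 on allocation failure.
 */
static int charp_set(struct charp_sketch *p, char *val, bool early)
{
	char *copy = NULL;

	assert(p != NULL && val != NULL);
	if (!early) {
		size_t len = strlen(val) + 1;

		copy = malloc(len);
		if (copy == NULL)
			return -1;
		memcpy(copy, val, len);
	}
	if (p->owned)
		free(p->value); /* drop the copy from the previous set */
	p->value = early ? val : copy;
	p->owned = !early;
	return 0;
}

/* Counterpart of module unload: free the copy if we made one. */
static void charp_release(struct charp_sketch *p)
{
	if (p->owned)
		free(p->value);
	p->value = NULL;
	p->owned = false;
}
```

The `owned` flag is the "note when we've used it" part of the commit message: without it, a later overwrite could not tell a borrowed command-line pointer from an allocated copy.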
  19. 07 Mar, 2009 2 commits
    • tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() · 769b0441
      Committed by Frederic Weisbecker
      Impact: faster and lighter tracing
      
      Now that we have trace_bprintk(), which is faster and consumes less
      memory than trace_printk() and serves the same purpose, we can drop
      the old implementation in favour of the binary one from trace_bprintk().
      This means we move the whole implementation of trace_bprintk() into
      trace_printk(), so the API doesn't change, except that we must now use
      trace_seq_bprintk() to print the TRACE_PRINT entries.

      Some changes result from this:

      - Previously, trace_bprintk depended on a single tracer and couldn't
        work without it. This tracer has been dropped and the whole implementation
        of trace_printk() (like the module formats management) is now integrated
        in the tracing core (comes with CONFIG_TRACING), though we keep the file
        trace_printk (previously trace_bprintk.c) where we can find the module
        management. Thus we don't overflow trace.c

      - change some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.

      - slightly change the trace_printk/trace_vprintk macros to support non-builtin
        format constants, and fix 'const' qualifier warnings. But this is all
        transparent for developers.
      
      - etc...
      
      V2:
      
      - Rebase against last changes
      - Fix misspelling in the changelog
      
      V3:
      
      - Rebase against last changes (moving trace_printk() to kernel.h)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: add trace_bprintk() · 1ba28e02
      Committed by Lai Jiangshan
      Impact: add a generic printk() for tracing, like trace_printk()
      
      trace_bprintk() uses the infrastructure to record events on ring_buffer.
      
      [ fweisbec@gmail.com: ported to latest -tip, made it work if
        !CONFIG_MODULES, never free the format strings from modules
        because we can't keep track of them and conditionally create
        the ftrace format strings section (reported by Steven Rostedt) ]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  20. 08 Feb, 2009 1 commit
  21. 03 Feb, 2009 1 commit
    • modules: Use a better scheme for refcounting · 720eba31
      Committed by Eric Dumazet
      Current refcounting for modules (done if CONFIG_MODULE_UNLOAD=y) is
      using a lot of memory.
      
      Each 'struct module' contains an [NR_CPUS] array of full cache lines.
      
      This patch uses existing infrastructure (percpu_modalloc() &
      percpu_modfree()) to allocate percpu space for the refcount storage.
      
      Instead of wasting NR_CPUS*128 bytes (on i386), we now use
      nr_cpu_ids*sizeof(local_t) bytes.
      
      On a typical distro, where NR_CPUS=8, shipping 2000 modules, we reduce
      size of module files by about 2 Mbytes. (1 KB per module)

      Instead of having all refcounters in the same memory node - with TLB misses
      because of vmalloc() - this new implementation permits better
      NUMA properties, since each CPU will use storage on its preferred node,
      thanks to percpu storage.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 07 Jan, 2009 2 commits
    • module: add within_module_core() and within_module_init() · a06f6211
      Committed by Masami Hiramatsu
      This series of patches allows kprobes to probe a module's __init and __exit
      functions.  This means you can probe driver initialization and
      termination.

      Currently, kprobes can't probe __init functions because these functions are
      freed after module initialization.  It also can't probe module __exit
      functions because kprobes increments the reference count of the target
      module, so the user can't unload it.  This means __exit functions are never
      called unless the probes are removed from the module.

      To solve both cases, this series of patches introduces a GONE flag and sets
      it when the target code is freed (for this purpose, kprobes hooks
      MODULE_STATE_* events).  It also removes the refcount increment, allowing
      the user to unload the target module.  Users can check which probes are
      GONE through the debugfs interface.  To catch the moment when a module's
      .init text is freed, the series also includes a patch which adds a module
      notifier call for the MODULE_STATE_LIVE event.
      
      This patch:
      
      Add within_module_core() and within_module_init() for checking whether an
      address is in the module .init.text section or .text section, and replace
      within() local inline functions in kernel/module.c with them.
      
      kprobes uses these functions to check where the kprobe is inserted.
      Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
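The two helpers named in this commit are range checks against a module's core and init regions. A minimal sketch of that idea follows; the struct layout and the `_sketch` names are assumptions made for illustration and do not mirror the fields of the kernel's struct module.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout: base address and size of each region. */
struct mod_layout_sketch {
	unsigned long core_base, core_size; /* stays loaded */
	unsigned long init_base, init_size; /* freed after init */
};

/* True if addr lies in [base, base + size); written to avoid overflow. */
static bool within_range(unsigned long addr, unsigned long base,
			 unsigned long size)
{
	return base <= addr && addr - base < size;
}

static bool within_core_sketch(unsigned long addr,
			       const struct mod_layout_sketch *m)
{
	assert(m != NULL);
	return within_range(addr, m->core_base, m->core_size);
}

static bool within_init_sketch(unsigned long addr,
			       const struct mod_layout_sketch *m)
{
	assert(m != NULL);
	return within_range(addr, m->init_base, m->init_size);
}
```

A caller such as a probe manager could use the init check to decide whether a probe address will disappear once the init region is freed.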
    • Remove remaining unwinder code · f1883f86
      Committed by Alexey Dobriyan
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Gabor Gombas <gombasg@sztaki.hu>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 22 Oct, 2008 3 commits
    • param: Fix duplicate module prefixes · 9b473de8
      Committed by Rusty Russell
      Instead of insisting each new module_param sysfs entry is unique,
      handle the case where it already exists (for builtin modules).
      
      The current code assumes that all identical prefixes are together in
      the section: true for normal uses, but not necessarily so if someone
      overrides MODULE_PARAM_PREFIX.  More importantly, it's not true with
      the new "core_param()" code which uses "kernel" as a prefix.
      
      This simplifies the caller for the builtin case, at a slight loss of
      efficiency (we do the lookup every time to see if the directory
      exists).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
    • module: check kernel param length at compile time, not runtime · 730b69d2
      Committed by Rusty Russell
      The kparam code tries to handle over-length parameter prefixes at
      runtime.  Not only would I bet this has never been tested, it's not
      clear that truncating names is a good idea either.
      
      So let's check at compile time.  We need to move the #define to
      moduleparam.h to do this, though.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
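Checking a string-literal prefix at compile time can be illustrated with C11's `static_assert`. This is a sketch of the general technique, not the macro the kernel uses: `MAX_PREFIX_SKETCH`, `PREFIX_SIZE()` and `CHECK_PREFIX()` are hypothetical, and the limit of 16 is arbitrary.

```c
#include <assert.h> /* static_assert (C11) */

#define MAX_PREFIX_SKETCH 16 /* hypothetical limit, including the NUL */

/*
 * Pasting "" in front forces the argument to be a string literal, so
 * sizeof yields its length plus the terminating NUL at compile time.
 */
#define PREFIX_SIZE(prefix) sizeof("" prefix)

#define CHECK_PREFIX(prefix) \
	static_assert(PREFIX_SIZE(prefix) <= MAX_PREFIX_SKETCH, \
		      "module parameter prefix too long")

CHECK_PREFIX("kernel."); /* 8 bytes with the NUL: accepted */

/*
 * CHECK_PREFIX("a_very_long_module_prefix."); would stop the build,
 * which replaces the untested runtime truncation path.
 */
```

The design point is the one the commit makes: an over-length prefix becomes a build failure that someone must fix, rather than a silently truncated name at runtime.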
    • module: simplify load_module. · 5e458cc0
      Committed by Rusty Russell
      Linus' recent catch of stack overflow in load_module led me to look
      at the code.  A couple of helpers to get a section address and get
      objects from a section can help clean things up a little.
      
      (And in case you're wondering, the stack size also dropped from 328 to
      284 bytes).
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  24. 17 Oct, 2008 1 commit
    • driver core: basic infrastructure for per-module dynamic debug messages · 346e15be
      Committed by Jason Baron
      Base infrastructure to enable per-module debug messages.
      
      I've introduced CONFIG_DYNAMIC_PRINTK_DEBUG, which when enabled centralizes
      control of debugging statements on a per-module basis in one debugfs file,
      currently <debugfs>/dynamic_printk/modules. When CONFIG_DYNAMIC_PRINTK_DEBUG
      is not set, debugging statements can still be enabled as before, often by
      defining 'DEBUG' for the proper compilation unit. Thus, this patch set has no
      effect when CONFIG_DYNAMIC_PRINTK_DEBUG is not set.
      
      The infrastructure currently ties into all pr_debug() and dev_dbg() calls. That
      is, if CONFIG_DYNAMIC_PRINTK_DEBUG is set, all pr_debug() and dev_dbg() calls
      can be dynamically enabled/disabled on a per-module basis.
      
      Future plans include extending this functionality to subsystems, that define 
      their own debug levels and flags.
      
      Usage:
      
      Dynamic debugging is controlled by the debugfs file, 
      <debugfs>/dynamic_printk/modules. This file contains a list of the modules that
      can be enabled. The format of the file is as follows:
      
      	<module_name> <enabled=0/1>
      		.
      		.
      		.
      
      	<module_name> : Name of the module in which the debug call resides
      	<enabled=0/1> : whether the messages are enabled or not
      
      For example:
      
      	snd_hda_intel enabled=0
      	fixup enabled=1
      	driver enabled=0
      
      Enable a module:
      
      	$ echo "set enabled=1 <module_name>" > dynamic_printk/modules
      
      Disable a module:
      
      	$ echo "set enabled=0 <module_name>" > dynamic_printk/modules
      
      Enable all modules:
      
      	$ echo "set enabled=1 all" > dynamic_printk/modules
      
      Disable all modules:
      
      	$ echo "set enabled=0 all" > dynamic_printk/modules
      
      Finally, passing "dynamic_printk" at the command line enables
      debugging for all modules. This mode can be turned off via the above
      disable command.
      
      [gkh: minor cleanups and tweaks to make the build work quietly]
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  25. 14 Oct, 2008 1 commit
    • tracing: Kernel Tracepoints · 97e1c18e
      Committed by Mathieu Desnoyers
      Implementation of kernel tracepoints. Inspired from the Linux Kernel
      Markers. Allows complete typing verification by declaring both tracing
      statement inline functions and probe registration/unregistration static
      inline functions within the same macro "DEFINE_TRACE". No format string
      is required. See the tracepoint Documentation and Samples patches for
      usage examples.
      
      Taken from the documentation patch :
      
      "A tracepoint placed in code provides a hook to call a function (probe)
      that you can provide at runtime. A tracepoint can be "on" (a probe is
      connected to it) or "off" (no probe is attached). When a tracepoint is
      "off" it has no effect, except for adding a tiny time penalty (checking
      a condition for a branch) and space penalty (adding a few bytes for the
      function call at the end of the instrumented function and adds a data
      structure in a separate section).  When a tracepoint is "on", the
      function you provide is called each time the tracepoint is executed, in
      the execution context of the caller. When the function provided ends its
      execution, it returns to the caller (continuing from the tracepoint
      site).
      
      You can put tracepoints at important locations in the code. They are
      lightweight hooks that can pass an arbitrary number of parameters, which
      prototypes are described in a tracepoint declaration placed in a header
      file."
      
      Addition and removal of tracepoints is synchronized by RCU using the
      scheduler (and preempt_disable) as guarantees to find a quiescent state
      (this is really RCU "classic"). The update side uses rcu_barrier_sched()
      with call_rcu_sched() and the read/execute side uses
      "preempt_disable()/preempt_enable()".
      
      We make sure the previous array containing probes, which has been
      scheduled for deletion by the rcu callback, is indeed freed before we
      proceed to the next update. It therefore limits the rate of modification
      of a single tracepoint to one update per RCU period. The objective here
      is to permit fast batch add/removal of probes on _different_
      tracepoints.
      
      Changelog :
      - Use #name ":" #proto as string to identify the tracepoint in the
        tracepoint table. This makes sure no type mismatch happens due to
        connection of a probe with the wrong type to a tracepoint declared with
        the same name in a different header.
      - Add tracepoint_entry_free_old.
      - Change __TO_TRACE to get rid of the 'i' iterator.
      
      Masami Hiramatsu <mhiramat@redhat.com> :
      Tested on x86-64.
      
      Performance impact of a tracepoint : same as markers, except that it
      adds about 70 bytes of instructions in an unlikely branch of each
      instrumented function (the for loop, the stack setup and the function
      call). It currently adds a memory read, a test and a conditional branch
      at the instrumentation site (in the hot path). Immediate values will
      eventually change this into a load immediate, test and branch, which
      removes the memory read which will make the i-cache impact smaller
      (changing the memory read for a load immediate removes 3-4 bytes per
      site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
      also saves the d-cache hit).
      
      About the performance impact of tracepoints (which is comparable to
      markers), even without immediate values optimizations, tests done by
      Hideo Aoki on ia64 show no regression. His test case was using hackbench
      on a kernel where scheduler instrumentation (about 5 events in the
      scheduler code) was added.
      
      Quoting Hideo Aoki about Markers :
      
      I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
      tree, which includes several markers for LTTng, using an ia64 server.
      
      While the immediate trace mark feature isn't implemented on ia64, there
      is no major performance regression. So, I think that we don't have any
      issues to propose merging marker point patches into Linus's tree from
      the viewpoint of performance impact.
      
      I prepared two kernels to evaluate. The first one was compiled without
      CONFIG_MARKERS. The second one was compiled with CONFIG_MARKERS enabled.
      
      I downloaded the original hackbench from the following URL:
      http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
      
      I ran hackbench 5 times in each condition and calculated the average and
      difference between the kernels.
      
          The parameter of hackbench: every 50 from 50 to 800
          The number of CPUs of the server: 2, 4, and 8
      
      Below are the results. As you can see, no major performance regression
      was found in any case. Even as the number of processes increases, the
      differences between the marker-enabled kernel and the marker-disabled
      kernel don't increase. Moreover, as the number of CPUs increases, the
      differences don't increase either.

      Curiously, the marker-enabled kernel is better than the marker-disabled
      kernel in more than half of the cases, although I guess it comes from the
      difference in memory access pattern.
      
      * 2 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      4.811   |       4.872  |  +0.061  |  +1.27  |
            100 |      9.854   |      10.309  |  +0.454  |  +4.61  |
            150 |     15.602   |      15.040  |  -0.562  |  -3.6   |
            200 |     20.489   |      20.380  |  -0.109  |  -0.53  |
            250 |     25.798   |      25.652  |  -0.146  |  -0.56  |
            300 |     31.260   |      30.797  |  -0.463  |  -1.48  |
            350 |     36.121   |      35.770  |  -0.351  |  -0.97  |
            400 |     42.288   |      42.102  |  -0.186  |  -0.44  |
            450 |     47.778   |      47.253  |  -0.526  |  -1.1   |
            500 |     51.953   |      52.278  |  +0.325  |  +0.63  |
            550 |     58.401   |      57.700  |  -0.701  |  -1.2   |
            600 |     63.334   |      63.222  |  -0.112  |  -0.18  |
            650 |     68.816   |      68.511  |  -0.306  |  -0.44  |
            700 |     74.667   |      74.088  |  -0.579  |  -0.78  |
            750 |     78.612   |      79.582  |  +0.970  |  +1.23  |
            800 |     85.431   |      85.263  |  -0.168  |  -0.2   |
      --------------------------------------------------------------
      
      * 4 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.586   |       2.584  |  -0.003  |  -0.1   |
            100 |      5.254   |       5.283  |  +0.030  |  +0.56  |
            150 |      8.012   |       8.074  |  +0.061  |  +0.76  |
            200 |     11.172   |      11.000  |  -0.172  |  -1.54  |
            250 |     13.917   |      14.036  |  +0.119  |  +0.86  |
            300 |     16.905   |      16.543  |  -0.362  |  -2.14  |
            350 |     19.901   |      20.036  |  +0.135  |  +0.68  |
            400 |     22.908   |      23.094  |  +0.186  |  +0.81  |
            450 |     26.273   |      26.101  |  -0.172  |  -0.66  |
            500 |     29.554   |      29.092  |  -0.461  |  -1.56  |
            550 |     32.377   |      32.274  |  -0.103  |  -0.32  |
            600 |     35.855   |      35.322  |  -0.533  |  -1.49  |
            650 |     39.192   |      38.388  |  -0.804  |  -2.05  |
            700 |     41.744   |      41.719  |  -0.025  |  -0.06  |
            750 |     45.016   |      44.496  |  -0.520  |  -1.16  |
            800 |     48.212   |      47.603  |  -0.609  |  -1.26  |
      --------------------------------------------------------------
      
      * 8 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.094   |       2.072  |  -0.022  |  -1.07  |
            100 |      4.162   |       4.273  |  +0.111  |  +2.66  |
            150 |      6.485   |       6.540  |  +0.055  |  +0.84  |
            200 |      8.556   |       8.478  |  -0.078  |  -0.91  |
            250 |     10.458   |      10.258  |  -0.200  |  -1.91  |
            300 |     12.425   |      12.750  |  +0.325  |  +2.62  |
            350 |     14.807   |      14.839  |  +0.032  |  +0.22  |
            400 |     16.801   |      16.959  |  +0.158  |  +0.94  |
            450 |     19.478   |      19.009  |  -0.470  |  -2.41  |
            500 |     21.296   |      21.504  |  +0.208  |  +0.98  |
            550 |     23.842   |      23.979  |  +0.137  |  +0.57  |
            600 |     26.309   |      26.111  |  -0.198  |  -0.75  |
            650 |     28.705   |      28.446  |  -0.259  |  -0.9   |
            700 |     31.233   |      31.394  |  +0.161  |  +0.52  |
            750 |     34.064   |      33.720  |  -0.344  |  -1.01  |
            800 |     36.320   |      36.114  |  -0.206  |  -0.57  |
      --------------------------------------------------------------
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
      Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
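The on/off behaviour quoted from the documentation patch can be sketched as a hook that does nothing but a cheap test until a probe is attached. This is a userspace illustration only: it omits the RCU synchronization the commit describes, and `tracepoint_sketch`, `register_probe()`, `trace_sketch()` and `MAX_PROBES` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROBES 4 /* hypothetical fixed capacity */

typedef void (*probe_fn)(int value);

struct tracepoint_sketch {
	const char *name;
	probe_fn probes[MAX_PROBES];
	size_t nprobes; /* 0 means the tracepoint is "off" */
};

/* Attach a probe; returns 0 on success, -1 if the table is full. */
static int register_probe(struct tracepoint_sketch *tp, probe_fn fn)
{
	assert(tp != NULL && fn != NULL);
	if (tp->nprobes == MAX_PROBES)
		return -1;
	tp->probes[tp->nprobes++] = fn;
	return 0;
}

/*
 * The instrumentation site: when the tracepoint is off this costs one
 * test and branch; when it is on, every probe runs in the caller's
 * context and control then returns to the caller.
 */
static void trace_sketch(const struct tracepoint_sketch *tp, int value)
{
	size_t i;

	if (tp->nprobes == 0)
		return;
	for (i = 0; i < tp->nprobes; i++)
		tp->probes[i](value);
}

/* Example probe: accumulate the traced values. */
static int seen_total;

static void sum_probe(int value)
{
	seen_total += value;
}
```

Unlike this sketch, the real tracepoints declare a typed prototype per tracepoint, which is what gives the compile-time type checking mentioned at the top of the commit message.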
  26. 25 Jul, 2008 1 commit
  27. 22 Jul, 2008 1 commit