1. 15 Dec, 2009 1 commit
    • module: make MODULE_SYMBOL_PREFIX into a CONFIG option · 9e1b9b80
      Authored by Alan Jenkins
      The next commit will require the use of MODULE_SYMBOL_PREFIX in
      .tmp_exports-asm.S.  Currently it is mixed in with C structure
      definitions in "asm/module.h".  Move the definition of this arch option
      into Kconfig, so it can be easily accessed by any code.
      
      This also lets modpost.c use the same definition.  Previously modpost
      relied on a hardcoded list of architectures in mk_elfconfig.c.
      
      A build test for blackfin, one of the two MODULE_SYMBOL_PREFIX archs,
      showed the generated code was unchanged (vmlinux was identical save
      for build ids and an apparently randomized suffix on a single "__key"
      symbol in the kallsyms data).
      Signed-off-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
      Acked-by: Mike Frysinger <vapier@gentoo.org> (blackfin)
      CC: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  2. 25 Sep, 2009 1 commit
  3. 13 Sep, 2009 1 commit
    • ftrace: __start_mcount_loc should be .init.rodata · 4b3b4c5e
      Authored by John Reiser
      __start_mcount_loc[] is unused after init, yet occupies RAM forever
      as part of .rodata.  152 KiB is typical on a 64-bit architecture.  Instead,
      __start_mcount_loc should be in the interval [__init_begin, __init_end)
      so that the space is reclaimed after init.
      
      __start_mcount_loc[] is generated during the load portion
      of kernel build, and is used only by ftrace_init().  ftrace_init is declared
      '__init' and is in .init.text, which is freed after init.
      __start_mcount_loc is placed into .rodata by a call to MCOUNT_REC inside
      the RO_DATA macro of include/asm-generic/vmlinux.lds.h.  The array *is*
      read-only, but more importantly it is not used after init.  So the call to
      MCOUNT_REC should be moved from RO_DATA to INIT_DATA.
      
      This patch has been tested on x86_64 with CONFIG_DEBUG_PAGEALLOC=y
      which verifies that the address range is never accessed after init.
      Signed-off-by: John Reiser <jreiser@BitWagon.com>
      LKML-Reference: <4A6DF0B6.7080402@bitwagon.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
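As a rough sanity check on the 152 KiB figure quoted in this commit, the sketch below converts a table size into a count of recorded call sites. It assumes each `__start_mcount_loc[]` entry is one pointer-sized address (8 bytes on a 64-bit kernel); the function name `mcount_entries` is made up for illustration and is not part of the kernel.

```python
def mcount_entries(table_bytes: int, pointer_bytes: int = 8) -> int:
    """Number of call-site addresses that fit in a table of the given size.

    Assumption: every entry is a single pointer-sized address.
    """
    if table_bytes % pointer_bytes:
        raise ValueError("table size is not a whole number of entries")
    return table_bytes // pointer_bytes


# "152 KiB is typical on a 64-bit architecture"
TABLE_BYTES = 152 * 1024
print(mcount_entries(TABLE_BYTES))  # call sites whose addresses stay resident after init
```

Under that assumption, the table holds roughly nineteen thousand addresses that are only ever read by ftrace_init().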
  4. 18 Jul, 2009 1 commit
    • vmlinux.lds.h: restructure BSS linker script macros · 04e448d9
      Authored by Tim Abbott
      The BSS section macros in vmlinux.lds.h currently place the .sbss
      input section outside the bounds of [__bss_start, __bss_end].  On all
      architectures except for microblaze that handle both .sbss and
      __bss_start/__bss_end, this is wrong: the .sbss input section is
      within the range [__bss_start, __bss_end].  Relatedly, the example
      code at the top of the file actually has __bss_start/__bss_end defined
      twice; I believe the right fix here is to define them in the
      BSS_SECTION macro but not in the BSS macro.
      
      Another problem with the current macros is that several
      architectures have an ALIGN(4) or some other small number just before
      __bss_stop in their linker scripts.  The BSS_SECTION macro currently
      hardcodes this to 4, while it should really be an argument.  It also
      ignores its sbss_align argument; fix that.
      
      mn10300 is the only user at present of any of the macros touched by
      this patch.  It looks like mn10300 actually was incorrectly converted
      to use the new BSS() macro (the alignment of 4 prior to conversion was
      a __bss_stop alignment, but the argument to the BSS macro is a start
      alignment).  So fix this as well.
      
      I'd like acks from Sam and David on this one.  Also CCing Paul, since
      he has a patch from me which will need to be updated to use
      BSS_SECTION(0, PAGE_SIZE, 4) once this gets merged.
      Signed-off-by: Tim Abbott <tabbott@ksplice.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
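The layout this commit argues for can be modelled in a few lines. The sketch below is a toy model, not the real macro: `align_up` imitates the linker-script `ALIGN(n)`, an alignment of 0 is treated as "no alignment" (an assumption based on the `BSS_SECTION(0, PAGE_SIZE, 4)` call mentioned above), and `layout_bss` with its parameter names is hypothetical.

```python
def align_up(addr: int, alignment: int) -> int:
    """Round addr up to a multiple of alignment (a power of two); 0 means no-op."""
    if alignment == 0:
        return addr
    if alignment < 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be zero or a power of two")
    return (addr + alignment - 1) & ~(alignment - 1)


def layout_bss(dot, sbss_size, bss_size, sbss_align, bss_align, stop_align):
    """Place .sbss and .bss so that both lie between __bss_start and __bss_stop."""
    symbols = {}
    dot = align_up(dot, sbss_align)
    symbols["__bss_start"] = dot      # defined before .sbss, so .sbss is in range
    symbols[".sbss"] = dot
    dot += sbss_size
    dot = align_up(dot, bss_align)    # start alignment of .bss
    symbols[".bss"] = dot
    dot += bss_size
    dot = align_up(dot, stop_align)   # stop alignment is its own argument
    symbols["__bss_stop"] = dot
    return symbols


PAGE_SIZE = 0x1000
layout = layout_bss(0x1003, 0x10, 0x35, 0, PAGE_SIZE, 4)
for name, addr in layout.items():
    print(f"{name:12s} {addr:#x}")
```

Keeping the start alignment and the stop alignment as separate arguments is the point of the mn10300 fix described above: an alignment of 4 before `__bss_stop` is not the same thing as aligning the start of `.bss` to 4.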
  5. 09 Jul, 2009 1 commit
    • linker script: unify usage of discard definition · 023bf6f1
      Authored by Tejun Heo
      Discarded sections in different archs share some commonality but have
      considerable differences.  This led to the linker script for each arch
      implementing its own /DISCARD/ definition, which makes maintenance
      tedious and adding new entries error-prone.
      
      This patch makes all linker scripts move their discard definitions to
      the end of the linker script and use the common DISCARDS macro.  As ld
      uses the first matching section definition, archs can keep sections
      that DISCARDS would otherwise drop by listing them earlier in the
      linker script.
      
      ia64 is notable because it first throws away some ia64-specific
      subsections and then includes the rest of the sections in the final
      image, so those sections must be discarded before the inclusion.
      
      defconfig compile tested for x86, x86-64, powerpc, powerpc64, ia64,
      alpha, sparc, sparc64 and s390.  Michal Simek tested microblaze.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      Tested-by: Michal Simek <monstr@monstr.eu>
      Cc: linux-arch@vger.kernel.org
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: microblaze-uclinux@itee.uq.edu.au
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Tony Luck <tony.luck@intel.com>
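The first-match rule this commit relies on can be illustrated with a small model. This is only a sketch: the rule table, the section names chosen, and `assign_sections` are hypothetical stand-ins, and real ld matching has more cases than plain glob patterns.

```python
from fnmatch import fnmatchcase


def assign_sections(input_sections, rules):
    """Map each input section to the first output rule whose pattern matches.

    rules is an ordered list of (output_name, [glob patterns]).
    Unmatched sections map to None (orphan handling is not modelled).
    """
    placement = {}
    for section in input_sections:
        placement[section] = None
        for output, patterns in rules:
            if any(fnmatchcase(section, pattern) for pattern in patterns):
                placement[section] = output
                break  # first matching definition wins
    return placement


# Hypothetical arch script: .exit.text is listed early, so the common
# discard list at the end never sees it.
RULES = [
    (".exit.text", [".exit.text"]),
    (".text", [".text", ".text.*"]),
    ("/DISCARD/", [".exit.text", ".exit.data", ".discard"]),  # stand-in for DISCARDS
]

placement = assign_sections([".text", ".exit.text", ".exit.data", ".discard"], RULES)
for section, output in placement.items():
    print(f"{section:12s} -> {output}")
```

Because the discard rule sits last, an arch only needs to mention a section earlier in its script to keep it; no per-arch /DISCARD/ list is required.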
  6. 01 Jul, 2009 1 commit
    • gcov: fix __ctors_start alignment · 2a2325e6
      Authored by Heiko Carstens
      The ctors section for each object file is eight byte aligned (on 64 bit).
      However the __ctors_start symbol starts at an arbitrary address dependent
      on the size of the previous sections.
      
      Therefore the linker may add some zeroes after __ctors_start to make sure
      the ctors contents are properly aligned.  However, the code that walks
      the function pointers in this section starts at __ctors_start and does
      not expect the extra zeroes, so it may end up making random jumps.  So
      make sure that the __ctors_start symbol is always aligned as well.
      
      Fixes this crash on an allyesconfig on s390:
      
      [    0.582482] Kernel BUG at 0000000000000012 [verbose debug info unavailable]
      [    0.582489] illegal operation: 0001 [#1] SMP DEBUG_PAGEALLOC
      [    0.582496] Modules linked in:
      [    0.582501] CPU: 0 Tainted: G        W  2.6.31-rc1-dirty #273
      [    0.582506] Process swapper (pid: 1, task: 000000003f218000, ksp: 000000003f2238e8)
      [    0.582510] Krnl PSW : 0704200180000000 0000000000000012 (0x12)
      [    0.582518]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
      [    0.582524] Krnl GPRS: 0000000000036727 0000000000000010 0000000000000001 0000000000000001
      [    0.582529]            00000000001dfefa 0000000000000000 0000000000000000 0000000000000040
      [    0.582534]            0000000001fff0f0 0000000001790628 0000000002296048 0000000002296048
      [    0.582540]            00000000020c438e 0000000001786000 0000000002014a66 000000003f223e60
      [    0.582553] Krnl Code:>0000000000000012: 0000                unknown
      [    0.582559]            0000000000000014: 0000                unknown
      [    0.582564]            0000000000000016: 0000                unknown
      [    0.582570]            0000000000000018: 0000                unknown
      [    0.582575]            000000000000001a: 0000                unknown
      [    0.582580]            000000000000001c: 0000                unknown
      [    0.582585]            000000000000001e: 0000                unknown
      [    0.582591]            0000000000000020: 0000                unknown
      [    0.582596] Call Trace:
      [    0.582599] ([<0000000002014a46>] kernel_init+0x622/0x7a0)
      [    0.582607]  [<0000000000113e22>] kernel_thread_starter+0x6/0xc
      [    0.582615]  [<0000000000113e1c>] kernel_thread_starter+0x0/0xc
      [    0.582621] INFO: lockdep is turned off.
      [    0.582624] Last Breaking-Event-Address:
      [    0.582627]  [<0000000002014a64>] kernel_init+0x640/0x7a0
      
      Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
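The misalignment described in this commit can be reproduced in a toy byte image. The sketch below is an assumption-laden model, not kernel code: it uses little-endian 8-byte pointers for simplicity (s390 is big-endian), and `build_image` and `walk_ctors` are hypothetical helpers. It shows that a walker starting at an unaligned `__ctors_start` reads padding mixed with pointer bytes, while an aligned start symbol yields the real entries.

```python
import struct

POINTER_SIZE = 8  # bytes per function pointer on a 64-bit target


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to a multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def build_image(prev_size: int, ctors: list, align_start: bool):
    """Return (image, ctors_start, ctors_end) for a preceding section of prev_size bytes."""
    image = bytearray(prev_size)
    # The start symbol follows the previous section unless explicitly aligned.
    ctors_start = align_up(prev_size, POINTER_SIZE) if align_start else prev_size
    # The linker pads with zeroes so the section contents are aligned.
    contents_at = align_up(prev_size, POINTER_SIZE)
    image.extend(bytes(contents_at - prev_size))
    image.extend(struct.pack(f"<{len(ctors)}Q", *ctors))
    return bytes(image), ctors_start, len(image)


def walk_ctors(image: bytes, start: int, end: int) -> list:
    """Read pointer-sized entries from [start, end), as a constructor walker would."""
    count = (end - start) // POINTER_SIZE
    return list(struct.unpack_from(f"<{count}Q", image, start))


CTORS = [0x2014A46, 0x113E22]  # made-up constructor addresses
broken = walk_ctors(*build_image(20, CTORS, align_start=False))
fixed = walk_ctors(*build_image(20, CTORS, align_start=True))
print([hex(p) for p in broken])
print([hex(p) for p in fixed])
```

In the unaligned case every value the walker reads straddles the padding and a real pointer, which is one way to end up jumping to an address such as the one in the oops above.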
  7. 27 Jun, 2009 2 commits
    • asm-generic/vmlinux.lds.h: shuffle INIT_TASK* macro names in vmlinux.lds.h · 39a449d9
      Authored by Tim Abbott
      We recently added an INIT_TASK(align) in include/asm-generic/vmlinux.lds.h,
      but there is already a macro INIT_TASK in include/linux/init_task.h, which
      is quite confusing.  We should switch the macro in the linker script to
      INIT_TASK_DATA. (Sorry that I missed this in reviewing the patch).  Since
      the macros are new, there is only one user of the INIT_TASK in
      vmlinux.lds.h, arch/mn10300/kernel/vmlinux.lds.S.
      
      However, we are currently using INIT_TASK_DATA for laying down an entire
      .data.init_task section.  So rename that to INIT_TASK_DATA_SECTION.
      
      I would be worried about changing the meaning of INIT_TASK_DATA, but the
      old INIT_TASK_DATA implementation had no users, and in fact if anyone had
      tried to use it, it would have failed to compile because it didn't pass
      the alignment to the old INIT_TASK.
      Signed-off-by: Tim Abbott <tabbott@ksplice.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jesper Nilsson <Jesper.Nilsson@axis.com>
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
    • asm-generic/vmlinux.lds.h: Fix up RW_DATA_SECTION definition. · 73f1d939
      Authored by Paul Mundt
      RW_DATA_SECTION is defined to take 4 different alignment parameters,
      while NOSAVE_DATA currently uses a fixed PAGE_SIZE alignment as noted
      in the comments.
      
      There are presently no in-tree users of this, and I just
      stumbled across this while implementing the simplified script on a new
      architecture port, which subsequently resulted in a syntax error.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
  8. 24 Jun, 2009 1 commit
    • linker script: throw away .discard section · 405d967d
      Authored by Tejun Heo
      x86 throws away the .discard section but no other arch does.  Also,
      .discard is not thrown away while linking modules.  Make every arch
      and module linking throw it away.  This will be used to define dummy
      variables for percpu declarations and definitions.
      
      This patch is based on Ivan Kokshaysky's alpha percpu patch.
      
      [ Impact: always throw away everything in .discard ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Bryan Wu <cooloney@kernel.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
  9. 23 Jun, 2009 1 commit
  10. 19 Jun, 2009 1 commit
  11. 15 Jun, 2009 1 commit
    • vmlinux.lds.h update · 7923f90f
      Authored by Sam Ravnborg
      Updated after review by Tim Abbott.
      - Use HEAD_TEXT_SECTION
      - Drop use of section-names.h and delete file
      - Introduce EXIT_CALL
      
      Deleting section-names.h required a few simple
      updates of init.h
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
      Cc: Tim Abbott <tabbott@ksplice.com>
  12. 10 Jun, 2009 2 commits
    • Improve vmlinux.lds.h support for arch specific linker scripts · ef53dae8
      Authored by Sam Ravnborg
      To support alignment of the individual architecture-specific linker scripts,
      provide a set of general definitions in vmlinux.lds.h.
      
      With these definitions applied, the diverse linker scripts can be reduced
      in line count and their readability is improved - IMO.
      
      A sample linker script is included to give the preferred
      order of the sections for the architectures that do not
      have any special requirements.
      
      These definitions are also a first step towards eventual
      support for -ffunction-sections.
      The definitions make it much easier to do a global
      renaming of section names - but the main purpose is
      to clean up the linker scripts.
      
      Tim Abbott has provided a lot of input to improve
      the definitions - all faults are mine.
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
      Cc: Tim Abbott <tabbott@mit.edu>
    • initconst adjustments · fd6c3a8d
      Authored by Jan Beulich
      - add .init.rodata to INIT_DATA, and group all initconst flavors
        together
      - move strings generated from __setup_param() into .init.rodata
      - add .*init.rodata to modpost's sets of init sections
      - make modpost warn about references between meminit and cpuinit
        as well as memexit and cpuexit sections (as CPU and memory
        hotplug are independently selectable features)
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
  13. 28 Apr, 2009 1 commit
  14. 27 Apr, 2009 1 commit
    • Add new HEAD_TEXT_SECTION macro. · c80d471a
      Authored by Tim Abbott
      This patch is preparation for replacing all uses of ".head.text" or
      ".text.head" in the kernel with macros, so that the section name can
      later be changed without having to touch a lot of the kernel.
      
      Since some linker scripts do more complex things than referencing
      HEAD_TEXT, we add a HEAD_TEXT_SECTION macro that just contains the
      actual name.
      
      I've defined HEAD_TEXT_SECTION in a new header,
      include/linux/section-names.h, so that this section name only needs to
      appear in one place.  I anticipate creating similar macro structures
      for a number of other section names.
      
      The long-term goal here is to be able to change the kernel's magic
      section names to those that are compatible with -ffunction-sections
      -fdata-sections.  This requires renaming all magic sections with names
      of the form ".text.foo".
      Signed-off-by: Tim Abbott <tabbott@mit.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 14 Apr, 2009 1 commit
    • tracing/infrastructure: separate event tracer from event support · 5f77a88b
      Authored by Tom Zanussi
      Add a new config option, CONFIG_EVENT_TRACING that gets selected
      when CONFIG_TRACING is selected and adds everything needed by the stuff
      in trace_export - basically all the event tracing support needed by e.g.
      bprint, minus the actual events, which are only included if
      CONFIG_EVENT_TRACER is selected.
      
      So CONFIG_EVENT_TRACER can be used to turn on or off the generated events
      (what I think of as the 'event tracer'), while CONFIG_EVENT_TRACING turns
      on or off the base event tracing support used by both the event tracer and
      the other things such as bprint that can't be configured out.
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: fweisbec@gmail.com
      LKML-Reference: <1239178441.10295.34.camel@tropicana>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  16. 25 Mar, 2009 1 commit
    • dynamic debug: combine dprintk and dynamic printk · e9d376f0
      Authored by Jason Baron
      This patch combines Greg Banks' dprintk() work with the existing dynamic
      printk patchset; we are now calling it 'dynamic debug'.
      
      The new feature of this patchset is a richer /debugfs control file interface
      (an example output from my system is at the bottom), which allows fine-grained
      control over the debug output. The output can be controlled by function,
      file, module, format string, and line number.
      
      For example, to enable all debug messages in module 'nf_conntrack':
      
      echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control
      
      to disable them:
      
      echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control
      
      A further explanation can be found in the documentation patch.
      Signed-off-by: Greg Banks <gnb@sgi.com>
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
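The control strings shown in this commit follow a simple "match terms, then flag" shape, which the sketch below assembles. Only the `module ... +p` and `module ... -p` forms appear in the commit text; the other keywords (`func`, `file`, `format`, `line`) are taken from the dynamic debug documentation and should be treated as an assumption here. `build_query` is a hypothetical helper and does not write to any control file.

```python
# Keywords in the order they are emitted; only "module" is shown in the commit text.
ALLOWED_KEYS = ("func", "file", "module", "format", "line")


def build_query(enable: bool, **matches) -> str:
    """Compose a control-file query such as 'module nf_conntrack +p'."""
    unknown = set(matches) - set(ALLOWED_KEYS)
    if unknown:
        raise ValueError(f"unsupported match keys: {sorted(unknown)}")
    parts = [f"{key} {matches[key]}" for key in ALLOWED_KEYS if key in matches]
    parts.append("+p" if enable else "-p")  # +p enables printing, -p disables it
    return " ".join(parts)


print(build_query(True, module="nf_conntrack"))
print(build_query(False, module="nf_conntrack"))
```

The resulting string is what would be passed to `echo -n` in the examples above.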
  17. 13 Mar, 2009 1 commit
    • tracing/syscalls: core infrastructure for syscalls tracing, enhancements · bed1ffca
      Authored by Frederic Weisbecker
      Impact: new feature
      
      This adds the generic support for syscalls tracing. This is
      currently exploited through a dedicated tracer but other tracing
      engines can use it. (They just have to play with
      {start,stop}_ftrace_syscalls() and use the display callbacks
      unless they want to override them.)
      
      The syscall prototype definitions are abused here to steal
      some metadata:
      
      - syscall name, param types, param names, number of params
      
      The syscall addr is not directly saved during this definition
      because we don't know if its prototype is available in the
      namespace. But we don't really need it. The arch has just to
      build a function able to resolve the syscall number to its
      metadata struct.
      
      The current tracer prints the syscall names, parameter names
      and values (and their types optionally). Currently the value is
      a raw hex but higher-level value display is on my TODO list.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1236955332-10133-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 09 Mar, 2009 1 commit
    • tracing: trace_printk() fix, move format array to data section · 8a20d84d
      Authored by Ingo Molnar
      Impact: fix kernel crash when using trace_printk()
      
      The trace_printk_fmt section is placed in the read-only section.
      But we do:
      
      	trace_printk_fmt = fmt;
      
      to fill in that table of format strings - which is not read-only.
      Under CONFIG_DEBUG_RODATA=y this crashes ...
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  19. 07 Mar, 2009 1 commit
  20. 25 Feb, 2009 1 commit
    • tracing: add event trace infrastructure · b77e38aa
      Authored by Steven Rostedt
      This patch creates the event tracing infrastructure of ftrace.
      It will create the files:
      
       /debug/tracing/available_events
       /debug/tracing/set_event
      
      The available_events will list the trace points that have been
      registered with the event tracer.
      
      set_event will allow the user to enable or disable an event hook.
      
      example:
      
       # echo sched_wakeup > /debug/tracing/set_event
      
      Will enable the sched_wakeup event (if it is registered).
      
       # echo "!sched_wakeup" >> /debug/tracing/set_event
      
      Will disable the sched_wakeup event (and only that event).
      
       # echo > /debug/tracing/set_event
      
      Will disable all events (notice the '>')
      
       # cat /debug/tracing/available_events > /debug/tracing/set_event
      
      Will enable all registered event hooks.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
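The shell examples in this commit depend on the difference between `>` (truncate, then write) and `>>` (append). The toy model below captures just that behaviour as described in the commit message; the class `SetEventModel` is hypothetical and says nothing about how the kernel implements the file.

```python
class SetEventModel:
    """Toy model of the set_event semantics described in the commit message."""

    def __init__(self, available):
        self.available = set(available)  # contents of available_events
        self.enabled = set()

    def write(self, text: str, append: bool = False) -> None:
        """Model `echo text > set_event` (append=False) or `>>` (append=True)."""
        if not append:
            self.enabled.clear()  # '>' truncates, which disables every event first
        for token in text.split():
            if token.startswith("!"):
                self.enabled.discard(token[1:])  # "!name" disables only that event
            elif token in self.available:
                self.enabled.add(token)  # enabled only if it is registered


events = SetEventModel(["sched_wakeup", "sched_switch"])
events.write("sched_wakeup")                  # echo sched_wakeup > set_event
events.write("sched_switch", append=True)     # echo sched_switch >> set_event
events.write("!sched_wakeup", append=True)    # echo "!sched_wakeup" >> set_event
print(sorted(events.enabled))
events.write("")                              # echo > set_event
print(sorted(events.enabled))
```

This is why the commit message points out the `>` in `echo > set_event`: the truncation, not the empty string, is what disables everything.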
  21. 31 Jan, 2009 1 commit
    • linker script: use separate simpler definition for PERCPU() · 3ac6cffe
      Authored by Tejun Heo
      Impact: fix linker screwup on x86_32
      
      Recent x86_64 zerobased patches introduced PERCPU_VADDR() to put
      .data.percpu to a predefined address and re-defined PERCPU() in terms
      of it.  The new macro defined one extra symbol, __per_cpu_load, for
      LMA of the section so that the init data could be accessed.  This new
      symbol introduced the following problems to x86_32.
      
      1. If __per_cpu_load is defined outside of .data.percpu as an absolute
         symbol, relocation generation for relocatable kernel fails due to
         absolute relocation.
      
      2. If __per_cpu_load is put inside .data.percpu with absolute address
         assignment to work around #1, linker gets confused and under
         certain configurations ends up relocating the symbol against
         .data.percpu such that the load address gets added on top of
         already set load address.
      
      As x86_32 doesn't use predefined address for .data.percpu, there's no
      need for it to care about the possibility of __per_cpu_load being
      different from __per_cpu_start.
      
      This patch defines PERCPU() separately so that __per_cpu_load is
      defined inside .data.percpu so that everything is ordinary
      linking-wise.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  22. 30 Jan, 2009 1 commit
    • Revert "generic, x86: fix __per_cpu_load relocation" · dba3d36b
      Authored by Ingo Molnar
      This reverts commit 5a611268.
      
      It is causing occasional boot crashes, caused by certain
      linker versions (GNU ld version 2.18.50.0.6-2 20080403) messing up:
      
       82dcc000 D __per_cpu_load
       c16e6000 A __per_cpu_load_abs
      
      The __per_cpu_load value is out of whack. Hpa noticed the following
      detail:
      
        * (gdb) p/x -(0xc16e6000-0x82dcc000)
        * $2 = 0xc16e6000
        * I.e. one is the other << 1
      
      The two symbols should be equal.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
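The relationship Hpa noticed between the two symbol values can be checked with 32-bit wrap-around arithmetic. The sketch below assumes a 32-bit address space (hence the mask); the variable names are illustrative only.

```python
MASK32 = 0xFFFFFFFF  # assumption: addresses wrap at 32 bits, as on x86_32

per_cpu_load = 0x82DCC000      # "D __per_cpu_load"
per_cpu_load_abs = 0xC16E6000  # "A __per_cpu_load_abs"

# The gdb expression from the commit: -(0xc16e6000 - 0x82dcc000)
negated_diff = -(per_cpu_load_abs - per_cpu_load) & MASK32
print(hex(negated_diff))  # 0xc16e6000

# "one is the other << 1": doubling the absolute value wraps to the broken one.
doubled = (per_cpu_load_abs << 1) & MASK32
print(hex(doubled))  # 0x82dcc000
```

So the bad `__per_cpu_load` value is exactly the correct address added to itself, which fits the "load address gets added on top of already set load address" failure described in the later PERCPU() commit.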
  23. 26 Jan, 2009 1 commit
  24. 20 Jan, 2009 2 commits
  25. 17 Jan, 2009 1 commit
  26. 16 Jan, 2009 2 commits
    • x86: fold pda into percpu area on SMP · 1a51e3a0
      Authored by Tejun Heo
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      Currently pdas and percpu areas are allocated separately.  %gs points
      to local pda and percpu area can be reached using pda->data_offset.
      This patch folds pda into percpu area.
      
      Due to a strange gcc requirement, pda needs to be at the beginning of
      the percpu area so that pda->stack_canary is at %gs:40.  To achieve
      this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
      added and used to reserve pda sized chunk at the start of the percpu
      area.
      
      After this change, for boot cpu, %gs first points to pda in the
      data.init area and later during setup_per_cpu_areas() gets updated to
      point to the actual pda.  This means that setup_per_cpu_areas() needs
      to reload %gs for CPU0 while clearing pda area for other cpus as cpu0
      already has modified it when control reaches setup_per_cpu_areas().
      
      This patch also removes now unnecessary get_local_pda() and its call
      sites.
      
      A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
      per cpu area" patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: make percpu symbols zerobased on SMP · 3e5d8f97
      Authored by Tejun Heo
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      This patch makes percpu symbols zerobased on x86_64 SMP by adding
      PERCPU_VADDR() to vmlinux.lds.h which helps setting explicit vaddr on
      the percpu output section and using it in vmlinux_64.lds.S.  A new
      PHDR is added as existing ones cannot contain sections near address
      zero.  PERCPU_VADDR() also adds a new symbol __per_cpu_load which
      always points to the vaddr of the loaded percpu data.init region.
      
      The following adjustments have been made to accommodate the address
      change.
      
      * code to locate percpu gdt_page in head_64.S is updated to add the
        load address to the gdt_page offset.
      
      * __per_cpu_load is used in places where access to the init data area
        is necessary.
      
      * pda->data_offset is initialized soon after C code is entered as zero
        value doesn't work anymore.
      
      This patch is mostly taken from Mike Travis' "x86_64: Base percpu
      variables at zero" patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  27. 12 Dec, 2008 1 commit
  28. 23 Nov, 2008 2 commits
    • trace: profile all if conditionals · 2bcd521a
      Authored by Steven Rostedt
      Impact: feature to profile if statements
      
      This patch adds a branch profiler for all if () statements.
      The results will be found in:
      
        /debugfs/tracing/profile_branch
      
      For example:
      
         miss      hit    %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
             0        1 100 x86_64_start_reservations      head64.c             127
             0        1 100 copy_bootdata                  head64.c             69
             1        0   0 x86_64_start_kernel            head64.c             111
            32        0   0 set_intr_gate                  desc.h               319
             1        0   0 reserve_ebda_region            head.c               51
             1        0   0 reserve_ebda_region            head.c               47
             0        1 100 reserve_ebda_region            head.c               42
             0        0   X maxcpus                        main.c               165
      
      Miss means the branch was not taken. Hit means the branch was taken.
      The percent is the percentage the branch was taken.
      
      This adds a significant amount of overhead and should only be used
      by those analyzing their system.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
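The percentage column in the sample output of this commit can be reproduced from the miss and hit counts. The sketch below is inferred from the table rows only: integer truncation and the `X` marker for branches that were never reached are assumptions, and `taken_percent` is a hypothetical name.

```python
def taken_percent(miss: int, hit: int) -> str:
    """Percentage of executions in which the branch was taken, as shown in the table."""
    total = miss + hit
    if total == 0:
        return "X"  # branch never executed, so there is nothing to report
    return str(hit * 100 // total)  # assumption: integer division, no rounding


# (miss, hit) pairs copied from the sample output above
for miss, hit in [(0, 1), (1, 0), (32, 0), (0, 0)]:
    print(f"{miss:7d} {hit:8d} {taken_percent(miss, hit):>3s}")
```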
    • trace: consolidate unlikely and likely profiler · 45b79749
      Authored by Steven Rostedt
      Impact: clean up to make one profiler of likely and unlikely tracer
      
      The likely and unlikely profiler prints out the file and line numbers
      of the annotated branches that it is profiling. It shows the number
      of times it was correct or incorrect in its guess. Having two
      different files or sections for that matter to tell us if it was a
      likely or unlikely is pretty pointless. We really only care if
      it was correct or not.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  29. 16 Nov, 2008 1 commit
    • tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() · 7e066fb8
      Authored by Mathieu Desnoyers
      Impact: API *CHANGE*. Must update all tracepoint users.
      
      Add DEFINE_TRACE() to tracepoints to let them declare the tracepoint
      structure in a single spot for all the kernel. It helps reduce memory
      consumption, especially when declaring a lot of tracepoints, e.g. for
      kmalloc tracing.
      
      *API CHANGE WARNING*: now, DECLARE_TRACE() must be used in headers for
      tracepoint declarations rather than DEFINE_TRACE(). This is the sane way
      to do it. The name previously used was misleading.
      
      Updates scheduler instrumentation to follow this API change.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  30. 13 Nov, 2008 1 commit
  31. 12 Nov, 2008 1 commit
    • tracing: profile likely and unlikely annotations · 1f0d69a9
      Authored by Steven Rostedt
      Impact: new unlikely/likely profiler
      
      Andrew Morton recently suggested having an in-kernel way to profile
      likely and unlikely macros. This patch achieves that goal.
      
      When configured, every(*) likely and unlikely macro gets a counter attached
      to it. Each time the condition is evaluated, the hits and misses of that
      condition are recorded. These numbers can later be retrieved by:
      
        /debugfs/tracing/profile_likely    - All likely markers
        /debugfs/tracing/profile_unlikely  - All unlikely markers.
      
      # cat /debug/tracing/profile_unlikely | head
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2167        0   0 do_arch_prctl                  process_64.c         832
             0        0   0 do_arch_prctl                  process_64.c         804
          2670        0   0 IS_ERR                         err.h                34
         71230     5693   7 __switch_to                    process_64.c         673
         76919        0   0 __switch_to                    process_64.c         639
         43184    33743  43 __switch_to                    process_64.c         624
         12740    64181  83 __switch_to                    process_64.c         594
         12740    64174  83 __switch_to                    process_64.c         590
      
      # cat /debug/tracing/profile_unlikely | \
        awk '{ if ($3 > 25) print $0; }' |head -20
         44963    35259  43 __switch_to                    process_64.c         624
         12762    67454  84 __switch_to                    process_64.c         594
         12762    67447  84 __switch_to                    process_64.c         590
          1478      595  28 syscall_get_error              syscall.h            51
             0     2821 100 syscall_trace_leave            ptrace.c             1567
             0        1 100 native_smp_prepare_cpus        smpboot.c            1237
         86338   265881  75 calc_delta_fair                sched_fair.c         408
        210410   108540  34 calc_delta_mine                sched.c              1267
             0    54550 100 sched_info_queued              sched_stats.h        222
         51899    66435  56 pick_next_task_fair            sched_fair.c         1422
             6       10  62 yield_task_fair                sched_fair.c         982
          7325     2692  26 rt_policy                      sched.c              144
             0     1270 100 pre_schedule_rt                sched_rt.c           1261
          1268    48073  97 pick_next_task_rt              sched_rt.c           884
             0    45181 100 sched_info_dequeued            sched_stats.h        177
             0       15 100 sched_move_task                sched.c              8700
             0       15 100 sched_move_task                sched.c              8690
         53167    33217  38 schedule                       sched.c              4457
             0    80208 100 sched_info_switch              sched_stats.h        270
         30585    49631  61 context_switch                 sched.c              2619
      
      # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
         39900    36577  47 pick_next_task                 sched.c              4397
         20824    15233  42 switch_mm                      mmu_context_64.h     18
             0        7 100 __cancel_work_timer            workqueue.c          560
           617    66484  99 clocksource_adjust             timekeeping.c        456
             0   346340 100 audit_syscall_exit             auditsc.c            1570
            38   347350  99 audit_get_context              auditsc.c            732
             0   345244 100 audit_syscall_entry            auditsc.c            1541
            38     1017  96 audit_free                     auditsc.c            1446
             0     1090 100 audit_alloc                    auditsc.c            862
          2618     1090  29 audit_alloc                    auditsc.c            858
             0        6 100 move_masked_irq                migration.c          9
             1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
             2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
             0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
          4514     2090  31 __grab_cache_page              filemap.c            2149
         12882   228786  94 mapping_unevictable            pagemap.h            50
             4       11  73 __flush_cpu_slab               slub.c               1466
        627757   330451  34 slab_free                      slub.c               1731
          2959    61245  95 dentry_lru_del_init            dcache.c             153
           946     1217  56 load_elf_binary                binfmt_elf.c         904
           102       82  44 disk_put_part                  genhd.h              206
             1        1  50 dst_gc_task                    dst.c                82
             0       19 100 tcp_mss_split_point            tcp_output.c         1126
      
      As the output above shows, there's a bit of work to do in rethinking
      the use of some unlikelys and likelys. Note: in the unlikely case, 71
      annotations were incorrect more than 25% of the time.
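
      The % column and the awk filter above can be reproduced with a few lines
      of C. This is a minimal sketch, assuming the percentage is incorrect hits
      over total hits, truncated toward zero, which matches the rows shown. The
      names branch_stat, percent_incorrect and needs_review are hypothetical
      and are not taken from the patch.

```c
/* One row of the profile output: hit counts for a single annotation. */
struct branch_stat {
	unsigned long correct;
	unsigned long incorrect;
};

/* Percentage of hits where the annotation was wrong, truncated toward zero. */
static unsigned int percent_incorrect(const struct branch_stat *s)
{
	unsigned long total = s->correct + s->incorrect;

	if (total == 0)		/* never hit: report 0, as in the table */
		return 0;
	return (unsigned int)(s->incorrect * 100 / total);
}

/* Same test as the awk filter: keep rows that are wrong more than 25%. */
static int needs_review(const struct branch_stat *s)
{
	return percent_incorrect(s) > 25;
}
```

      For the __switch_to row at process_64.c:624 (43184 correct, 33743
      incorrect) this gives 43, so the row passes the filter.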
      
      Note: After submitting my first version of this patch, Andrew Morton
        showed me a version written by Daniel Walker, from which I picked up
        the following ideas:
      
        1)  Using __builtin_constant_p to avoid profiling fixed values.
        2)  Using __FILE__ instead of instruction pointers.
        3)  Using the preprocessor to stop all profiling of likely
             annotations from vsyscall_64.c.
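
      The three ideas can be sketched in user space with GCC extensions. This
      is an illustration under stated assumptions, not the code from the patch:
      struct branch_site, branch_record(), profiled_unlikely() and the
      NO_BRANCH_PROFILE switch are hypothetical names, and sites are kept on a
      simple linked list here where the kernel is free to use another
      mechanism.

```c
/* Counters for one annotated call site; this layout is hypothetical. */
struct branch_site {
	const char *file;	/* from __FILE__, not an instruction pointer */
	unsigned int line;
	unsigned long correct;
	unsigned long incorrect;
	int listed;		/* set once the site is on the list */
	struct branch_site *next;
};

/* Sites that have been hit at least once, newest first. */
static struct branch_site *branch_sites;

/* Count one evaluation and pass the value through unchanged. */
static inline int branch_record(struct branch_site *site, int val, int expect)
{
	if (!site->listed) {
		site->listed = 1;
		site->next = branch_sites;
		branch_sites = site;
	}
	if (val == expect)
		site->correct++;
	else
		site->incorrect++;
	return val;
}

#ifdef NO_BRANCH_PROFILE	/* hypothetical per-file opt-out switch */
#define profiled_unlikely(x)	__builtin_expect(!!(x), 0)
#else
/* Constant conditions bypass the counters entirely. */
#define profiled_unlikely(x)						\
	(__builtin_constant_p(x) ? !!(x) : ({				\
		static struct branch_site site_ = {			\
			.file = __FILE__, .line = __LINE__,		\
		};							\
		branch_record(&site_, !!(x), 0);			\
	}))
#endif
```

      A file that must not be profiled would define the opt-out switch before
      including the header that provides the macro, so its annotations fall
      back to the plain __builtin_expect() form.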
      
      Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
      for their feedback on this patch.
      
      (*) Not every unlikely is recorded; those that are used by vsyscalls
       (a few of them) had to have profiling disabled.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1f0d69a9
  32. 17 Oct, 2008 1 commit
    • J
      driver core: basic infrastructure for per-module dynamic debug messages · 346e15be
      Jason Baron committed
      Base infrastructure to enable per-module debug messages.
      
      I've introduced CONFIG_DYNAMIC_PRINTK_DEBUG, which, when enabled, centralizes
      control of debugging statements on a per-module basis in one debugfs file,
      currently <debugfs>/dynamic_printk/modules. When CONFIG_DYNAMIC_PRINTK_DEBUG
      is not set, debugging statements can still be enabled as before, often by
      defining 'DEBUG' for the proper compilation unit. Thus, this patch set has no
      effect when CONFIG_DYNAMIC_PRINTK_DEBUG is not set.
      
      The infrastructure currently ties into all pr_debug() and dev_dbg() calls. That
      is, if CONFIG_DYNAMIC_PRINTK_DEBUG is set, all pr_debug() and dev_dbg() calls
      can be dynamically enabled/disabled on a per-module basis.
      
      Future plans include extending this functionality to subsystems that define
      their own debug levels and flags.
      
      Usage:
      
      Dynamic debugging is controlled by the debugfs file, 
      <debugfs>/dynamic_printk/modules. This file contains a list of the modules that
      can be enabled. The format of the file is as follows:
      
      	<module_name> <enabled=0/1>
      		.
      		.
      		.
      
      	<module_name> : Name of the module in which the debug call resides
      	<enabled=0/1> : whether the messages are enabled or not
      
      For example:
      
      	snd_hda_intel enabled=0
      	fixup enabled=1
      	driver enabled=0
      
      Enable a module:
      
      	$echo "set enabled=1 <module_name>" > dynamic_printk/modules
      
      Disable a module:
      
      	$echo "set enabled=0 <module_name>" > dynamic_printk/modules
      
      Enable all modules:
      
      	$echo "set enabled=1 all" > dynamic_printk/modules
      
      Disable all modules:
      
      	$echo "set enabled=0 all" > dynamic_printk/modules
      
      Finally, passing "dynamic_printk" at the command line enables
      debugging for all modules. This mode can be turned off via the above
      disable command.
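
      The command format above can be modelled in a few lines of user-space C.
      This is a sketch only, not the kernel's parser: struct module_debug and
      apply_debug_command() are hypothetical names, and the 63-character limit
      on module names is an assumption made for the example.

```c
#include <stdio.h>
#include <string.h>

/* One line of the modules file: a module name and its enabled flag. */
struct module_debug {
	const char *name;
	int enabled;
};

/*
 * Apply one "set enabled=<0|1> <module_name|all>" command to the table.
 * Returns the number of entries the command applied to, or -1 if the
 * command is malformed.
 */
static int apply_debug_command(struct module_debug *mods, size_t count,
			       const char *cmd)
{
	char name[64];
	int value;
	int matched = 0;
	size_t i;

	if (sscanf(cmd, "set enabled=%d %63s", &value, name) != 2)
		return -1;
	if (value != 0 && value != 1)
		return -1;

	for (i = 0; i < count; i++) {
		/* "all" matches every entry; otherwise require an exact name. */
		if (strcmp(name, "all") != 0 && strcmp(name, mods[i].name) != 0)
			continue;
		mods[i].enabled = value;
		matched++;
	}
	return matched;
}
```

      With the three-module example above, "set enabled=1 snd_hda_intel"
      touches one entry and "set enabled=0 all" touches all three.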
      
      [gkh: minor cleanups and tweaks to make the build work quietly]
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      
      346e15be
  33. 16 Oct, 2008 3 commits
    • T
      genirq: revert dynarray · d6c88a50
      Thomas Gleixner committed
      Revert the dynarray changes. They need more thought and polishing.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      d6c88a50
    • Y
      add per_cpu_dyn_array support · 1f3fcd4b
      Yinghai Lu committed
      allow dyn-array in per_cpu area, allocated dynamically.
      
      usage:
      
      |  /* in .h */
      | struct kernel_stat {
      |        struct cpu_usage_stat   cpustat;
      |        unsigned int *irqs;
      | };
      |
      |  /* in .c */
      | DEFINE_PER_CPU(struct kernel_stat, kstat);
      |
      | DEFINE_PER_CPU_DYN_ARRAY_ADDR(per_cpu__kstat_irqs, per_cpu__kstat.irqs, sizeof(unsigned int), nr_irqs, sizeof(unsigned long), NULL);
      
      after setup_percpu()/per_cpu_alloc_dyn_array(), the dyn_array in
      per_cpu area is ready to use.
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1f3fcd4b
    • Y
      generic: add dyn_array support · 3ddfda11
      Yinghai Lu committed
      Allow crazy big arrays via bootmem at init stage.
      Architectures use CONFIG_HAVE_DYN_ARRAY to enable it.
      
      usage:
      
      | static struct irq_desc irq_desc_init __initdata = {
      |        .status = IRQ_DISABLED,
      |        .chip = &no_irq_chip,
      |        .handle_irq = handle_bad_irq,
      |        .depth = 1,
      |        .lock = __SPIN_LOCK_UNLOCKED(irq_desc->lock),
      | #ifdef CONFIG_SMP
      |        .affinity = CPU_MASK_ALL
      | #endif
      | };
      |
      | static void __init init_work(void *data)
      | {
      |        struct dyn_array *da = data;
      |        struct  irq_desc *desc;
      |        int i;
      |
      |        desc = *da->name;
      |
      |        for (i = 0; i < *da->nr; i++)
      |                memcpy(&desc[i], &irq_desc_init, sizeof(struct irq_desc));
      | }
      |
      | struct irq_desc *irq_desc;
      | DEFINE_DYN_ARRAY(irq_desc, sizeof(struct irq_desc), nr_irqs, PAGE_SIZE, init_work);
      
      Once pre_alloc_dyn_array() has run, after setup_arch(), the array is
      ready to be used.
      
      Via this facility we can replace irq_desc[NR_IRQS] array with dyn_array
      irq_desc[nr_irqs].
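
      The move from a compile-time array to a run-time sized one can be
      sketched in user space as follows. This is an analogy under stated
      assumptions, not the patch's implementation: calloc() stands in for
      bootmem, and struct demo_desc, demo_init_work() and alloc_dyn_array()
      are hypothetical names with a simplified callback signature.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for a descriptor such as struct irq_desc; fields are illustrative. */
struct demo_desc {
	unsigned int status;
	unsigned int depth;
};

/* Template copied into every element, like irq_desc_init above. */
static const struct demo_desc demo_desc_init = {
	.status = 0,
	.depth = 1,
};

/* Per-array init hook: stamp the template into each slot. */
static void demo_init_work(void *base, size_t nr)
{
	struct demo_desc *desc = base;
	size_t i;

	for (i = 0; i < nr; i++)
		memcpy(&desc[i], &demo_desc_init, sizeof(*desc));
}

/*
 * Allocate nr elements of the given size at run time, then run the
 * optional init hook. Returns NULL if the allocation fails.
 */
static void *alloc_dyn_array(size_t nr, size_t size,
			     void (*init_work)(void *base, size_t nr))
{
	void *base = calloc(nr, size);

	if (base && init_work)
		init_work(base, nr);
	return base;
}
```

      A caller would size the array from a value known only at boot, for
      example descs = alloc_dyn_array(nr, sizeof(*descs), demo_init_work),
      instead of declaring descs[NR] with a compile-time maximum.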
      
      v2: remove _nopanic in pre_alloc_dyn_array()
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3ddfda11