1. 13 12月, 2008 1 次提交
    • R
      cpumask: centralize cpu_online_map and cpu_possible_map · 98a79d6a
      Rusty Russell 提交于
      Impact: cleanup
      
      Each SMP arch defines these themselves.  Move them to a central
      location.
      
      Twists:
      1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
         CONFIG_INIT_ALL_POSSIBLE for this rather than break them.
      
      2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
         Those archs simply have phys_cpu_present_map replaced everywhere.
      
      3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
         so I just manipulate them both in sync.
      
      4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
         declarations.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Reviewed-by: NGrant Grundler <grundler@parisc-linux.org>
      Tested-by: NTony Luck <tony.luck@intel.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Mike Travis <travis@sgi.com>
      Cc: ink@jurassic.park.msu.ru
      Cc: rmk@arm.linux.org.uk
      Cc: starvik@axis.com
      Cc: tony.luck@intel.com
      Cc: takata@linux-m32r.org
      Cc: ralf@linux-mips.org
      Cc: grundler@parisc-linux.org
      Cc: paulus@samba.org
      Cc: schwidefsky@de.ibm.com
      Cc: lethal@linux-sh.org
      Cc: wli@holomorphy.com
      Cc: davem@davemloft.net
      Cc: jdike@addtoit.com
      Cc: mingo@redhat.com
      98a79d6a
  2. 14 11月, 2008 1 次提交
  3. 13 11月, 2008 1 次提交
  4. 02 11月, 2008 1 次提交
  5. 31 10月, 2008 2 次提交
  6. 26 10月, 2008 1 次提交
    • L
      Revert "Call init_workqueues before pre smp initcalls." · 4403b406
      Linus Torvalds 提交于
      This reverts commit a802dd0e by moving
      the call to init_workqueues() back where it belongs - after SMP has been
      initialized.
      
      It also moves stop_machine_init() - which needs workqueues - to a later
      phase using a core_initcall() instead of early_initcall().  That should
      satisfy all ordering requirements, and was apparently the reason why
      init_workqueues() was moved to be too early.
      
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4403b406
  7. 23 10月, 2008 3 次提交
  8. 22 10月, 2008 2 次提交
  9. 21 10月, 2008 1 次提交
    • T
      PCI: allow quirks to be compiled out · 3d137310
      Thomas Petazzoni 提交于
      This patch adds the CONFIG_PCI_QUIRKS option which allows to remove all
      the PCI quirks, which are not necessarily used on embedded systems when
      PCI is working properly. As this is a size-reduction option, it depends
      on CONFIG_EMBEDDED. It allows to save almost 12 kilobytes of kernel
      code:
      
         text	   data	    bss	    dec	    hex	filename
      1287806	 123596	 212992	1624394	 18c94a	vmlinux.old
      1275854	 123596	 212992	1612442	 189a9a	vmlinux
       -11952       0       0  -11952   -2EB0 +/-
      
      This patch has originally been written by Zwane Mwaikambo
      <zwane@arm.linux.org.uk> and is part of the Linux Tiny project.
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      3d137310
  10. 20 10月, 2008 2 次提交
    • M
      container freezer: implement freezer cgroup subsystem · dc52ddc0
      Matt Helsley 提交于
      This patch implements a new freezer subsystem in the control groups
      framework.  It provides a way to stop and resume execution of all tasks in
      a cgroup by writing in the cgroup filesystem.
      
      The freezer subsystem in the container filesystem defines a file named
      freezer.state.  Writing "FROZEN" to the state file will freeze all tasks
      in the cgroup.  Subsequently writing "RUNNING" will unfreeze the tasks in
      the cgroup.  Reading will return the current state.
      
      * Examples of usage :
      
         # mkdir /containers/freezer
         # mount -t cgroup -ofreezer freezer  /containers
         # mkdir /containers/0
         # echo $some_pid > /containers/0/tasks
      
      to get status of the freezer subsystem :
      
         # cat /containers/0/freezer.state
         RUNNING
      
      to freeze all tasks in the container :
      
         # echo FROZEN > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         FREEZING
         # cat /containers/0/freezer.state
         FROZEN
      
      to unfreeze all tasks in the container :
      
         # echo RUNNING > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         RUNNING
      
      This is the basic mechanism which should do the right thing for user space
      task in a simple scenario.
      
      It's important to note that freezing can be incomplete.  In that case we
      return EBUSY.  This means that some tasks in the cgroup are busy doing
      something that prevents us from completely freezing the cgroup at this
      time.  After EBUSY, the cgroup will remain partially frozen -- reflected
      by freezer.state reporting "FREEZING" when read.  The state will remain
      "FREEZING" until one of these things happens:
      
      	1) Userspace cancels the freezing operation by writing "RUNNING" to
      		the freezer.state file
      	2) Userspace retries the freezing operation by writing "FROZEN" to
      		the freezer.state file (writing "FREEZING" is not legal
      		and returns EIO)
      	3) The tasks that blocked the cgroup from entering the "FROZEN"
      		state disappear from the cgroup's set of tasks.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: export thaw_process]
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
      Acked-by: NSerge E. Hallyn <serue@us.ibm.com>
      Tested-by: NMatt Helsley <matthltc@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dc52ddc0
    • N
      mm: rewrite vmap layer · db64fe02
      Nick Piggin 提交于
      Rewrite the vmap allocator to use rbtrees and lazy tlb flushing, and
      provide a fast, scalable percpu frontend for small vmaps (requires a
      slightly different API, though).
      
      The biggest problem with vmap is actually vunmap.  Presently this requires
      a global kernel TLB flush, which on most architectures is a broadcast IPI
      to all CPUs to flush the cache.  This is all done under a global lock.  As
      the number of CPUs increases, so will the number of vunmaps a scaled
      workload will want to perform, and so will the cost of a global TLB flush.
       This gives terrible quadratic scalability characteristics.
      
      Another problem is that the entire vmap subsystem works under a single
      lock.  It is a rwlock, but it is actually taken for write in all the fast
      paths, and the read locking would likely never be run concurrently anyway,
      so it's just pointless.
      
      This is a rewrite of vmap subsystem to solve those problems.  The existing
      vmalloc API is implemented on top of the rewritten subsystem.
      
      The TLB flushing problem is solved by using lazy TLB unmapping.  vmap
      addresses do not have to be flushed immediately when they are vunmapped,
      because the kernel will not reuse them again (would be a use-after-free)
      until they are reallocated.  So the addresses aren't allocated again until
      a subsequent TLB flush.  A single TLB flush then can flush multiple
      vunmaps from each CPU.
      
      XEN and PAT and such do not like deferred TLB flushing because they can't
      always handle multiple aliasing virtual addresses to a physical address.
      They now call vm_unmap_aliases() in order to flush any deferred mappings.
      That call is very expensive (well, actually not a lot more expensive than
      a single vunmap under the old scheme), however it should be OK if not
      called too often.
      
      The virtual memory extent information is stored in an rbtree rather than a
      linked list to improve the algorithmic scalability.
      
      There is a per-CPU allocator for small vmaps, which amortizes or avoids
      global locking.
      
      To use the per-CPU interface, the vm_map_ram / vm_unmap_ram interfaces
      must be used in place of vmap and vunmap.  Vmalloc does not use these
      interfaces at the moment, so it will not be quite so scalable (although it
      will use lazy TLB flushing).
      
      As a quick test of performance, I ran a test that loops in the kernel,
      linearly mapping then touching then unmapping 4 pages.  Different numbers
      of tests were run in parallel on an 4 core, 2 socket opteron.  Results are
      in nanoseconds per map+touch+unmap.
      
      threads           vanilla         vmap rewrite
      1                 14700           2900
      2                 33600           3000
      4                 49500           2800
      8                 70631           2900
      
      So with a 8 cores, the rewritten version is already 25x faster.
      
      In a slightly more realistic test (although with an older and less
      scalable version of the patch), I ripped the not-very-good vunmap batching
      code out of XFS, and implemented the large buffer mapping with vm_map_ram
      and vm_unmap_ram...  along with a couple of other tricks, I was able to
      speed up a large directory workload by 20x on a 64 CPU system.  I believe
      vmap/vunmap is actually sped up a lot more than 20x on such a system, but
      I'm running into other locks now.  vmap is pretty well blown off the
      profiles.
      
      Before:
      1352059 total                                      0.1401
      798784 _write_lock                              8320.6667 <- vmlist_lock
      529313 default_idle                             1181.5022
       15242 smp_call_function                         15.8771  <- vmap tlb flushing
        2472 __get_vm_area_node                         1.9312  <- vmap
        1762 remove_vm_area                             4.5885  <- vunmap
         316 map_vm_area                                0.2297  <- vmap
         312 kfree                                      0.1950
         300 _spin_lock                                 3.1250
         252 sn_send_IPI_phys                           0.4375  <- tlb flushing
         238 vmap                                       0.8264  <- vmap
         216 find_lock_page                             0.5192
         196 find_next_bit                              0.3603
         136 sn2_send_IPI                               0.2024
         130 pio_phys_write_mmr                         2.0312
         118 unmap_kernel_range                         0.1229
      
      After:
       78406 total                                      0.0081
       40053 default_idle                              89.4040
       33576 ia64_spinlock_contention                 349.7500
        1650 _spin_lock                                17.1875
         319 __reg_op                                   0.5538
         281 _atomic_dec_and_lock                       1.0977
         153 mutex_unlock                               1.5938
         123 iget_locked                                0.1671
         117 xfs_dir_lookup                             0.1662
         117 dput                                       0.1406
         114 xfs_iget_core                              0.0268
          92 xfs_da_hashname                            0.1917
          75 d_alloc                                    0.0670
          68 vmap_page_range                            0.0462 <- vmap
          58 kmem_cache_alloc                           0.0604
          57 memset                                     0.0540
          52 rb_next                                    0.1625
          50 __copy_user                                0.0208
          49 bitmap_find_free_region                    0.2188 <- vmap
          46 ia64_sn_udelay                             0.1106
          45 find_inode_fast                            0.1406
          42 memcmp                                     0.2188
          42 finish_task_switch                         0.1094
          42 __d_lookup                                 0.0410
          40 radix_tree_lookup_slot                     0.1250
          37 _spin_unlock_irqrestore                    0.3854
          36 xfs_bmapi                                  0.0050
          36 kmem_cache_free                            0.0256
          35 xfs_vn_getattr                             0.0322
          34 radix_tree_lookup                          0.1062
          33 __link_path_walk                           0.0035
          31 xfs_da_do_buf                              0.0091
          30 _xfs_buf_find                              0.0204
          28 find_get_page                              0.0875
          27 xfs_iread                                  0.0241
          27 __strncpy_from_user                        0.2812
          26 _xfs_buf_initialize                        0.0406
          24 _xfs_buf_lookup_pages                      0.0179
          24 vunmap_page_range                          0.0250 <- vunmap
          23 find_lock_page                             0.0799
          22 vm_map_ram                                 0.0087 <- vmap
          20 kfree                                      0.0125
          19 put_page                                   0.0330
          18 __kmalloc                                  0.0176
          17 xfs_da_node_lookup_int                     0.0086
          17 _read_lock                                 0.0885
          17 page_waitqueue                             0.0664
      
      vmap has gone from being the top 5 on the profiles and flushing the crap
      out of all TLBs, to using less than 1% of kernel time.
      
      [akpm@linux-foundation.org: cleanups, section fix]
      [akpm@linux-foundation.org: fix build on alpha]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      db64fe02
  11. 17 10月, 2008 4 次提交
    • A
      init/do_mounts_md.c must #include <linux/delay.h> · 73b4a24f
      Adrian Bunk 提交于
      This patch fixes the following compile error caused by commit
      589f800b ("fastboot: make the raid
      autodetect code wait for all devices to init"):
      
          CC      init/do_mounts_md.o
        init/do_mounts_md.c: In function 'autodetect_raid':
        init/do_mounts_md.c:285: error: implicit declaration of function 'msleep'
        make[2]: *** [init/do_mounts_md.o] Error 1
      Signed-off-by: NAdrian Bunk <bunk@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      73b4a24f
    • T
      Configure out AIO support · ebf3f09c
      Thomas Petazzoni 提交于
      This patchs adds the CONFIG_AIO option which allows to remove support
      for asynchronous I/O operations, that are not necessarly used by
      applications, particularly on embedded devices. As this is a
      size-reduction option, it depends on CONFIG_EMBEDDED. It allows to
      save ~7 kilobytes of kernel code/data:
      
         text	   data	    bss	    dec	    hex	filename
      1115067	 119180	 217088	1451335	 162547	vmlinux
      1108025	 119048	 217088	1444161	 160941	vmlinux.new
        -7042    -132       0   -7174   -1C06 +/-
      
      This patch has been originally written by Matt Mackall
      <mpm@selenic.com>, and is part of the Linux Tiny project.
      
      [randy.dunlap@oracle.com: build fix]
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Zach Brown <zach.brown@oracle.com>
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ebf3f09c
    • N
      initramfs: add option to preserve mtime from initramfs cpio images · 889d51a1
      Nye Liu 提交于
      When unpacking the cpio into the initramfs, mtimes are not preserved by
      default.  This patch adds an INITRAMFS_PRESERVE_MTIME option that allows
      mtimes stored in the cpio image to be used when constructing the
      initramfs.
      
      For embedded applications that run exclusively out of the initramfs, this
      is invaluable:
      
      When building embedded application initramfs images, its nice to know when
      the files were actually created during the build process - that makes it
      easier to see what files were modified when so we can compare the files
      that are being used on the image with the files used during the build
      process.  This might help (for example) to determine if the target system
      has all the updated files you expect to see w/o having to check MD5s etc.
      
      In our environment, the whole system runs off the initramfs partition, and
      seeing the modified times of the shared libraries (for example) helps us
      find bugs that may have been introduced by the build system incorrectly
      propogating outdated shared libraries into the image.
      
      Similarly, many of the initializion/configuration files in /etc might be
      dynamically built by the build system, and knowing when they were modified
      helps us sanity check whether the target system has the "latest" files
      etc.
      
      Finally, we might use last modified times to determine whether a hot fix
      should be applied or not to the running ramfs.
      Signed-off-by: NNye Liu <nyet@nyet.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      889d51a1
    • G
      identify_ramdisk_image(): correct typo about return value in comment · 93fd85d0
      Geert Uytterhoeven 提交于
      identify_ramdisk_image() returns 0 (not -1) if a gzipped ramdisk is found:
      
      	if (buf[0] == 037 && ((buf[1] == 0213) || (buf[1] == 0236))) {
      		printk(KERN_NOTICE
      		       "RAMDISK: Compressed image found at block %d\n",
      		       start_block);
      		nblocks = 0;
      		^^^^^^^^^^^
      		goto done;
      	}
      
      	...
      
      done:
      	sys_lseek(fd, start_block * BLOCK_SIZE, 0);
      	kfree(buf);
      	return nblocks;
      	^^^^^^^^^^^^^^
      
      Hence correct the typo in the comment, which has existed since the
      addition of compressed ramdisk support in 1.3.48.
      Signed-off-by: NGeert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      93fd85d0
  12. 16 10月, 2008 10 次提交
  13. 14 10月, 2008 10 次提交
    • T
      tracing/fastboot: add better resolution to initcall debug/tracing · ca538f6b
      Tim Bird 提交于
      Change the time resolution for initcall_debug to microseconds, from
      milliseconds.  This is handy to determine which initcalls you want to work
      on for faster booting.
      
      One one of my test machines, over 90% of the initcalls are less than a
      millisecond and (without this patch) these are all reported as 0 msecs.
      Working on the 900 us ones is more important than the 4 us ones.
      
      With 'quiet' on the kernel command line, this adds no significant overhead
      to kernel boot time.
      Signed-off-by: NTim Bird <tim.bird@am.sony.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ca538f6b
    • F
      tracing/fastboot: only trace non-module initcalls · 097d036a
      Frederic Weisbecker 提交于
      At this time, only built-in initcalls interest us.
      We can't really produce a relevant graph if we include
      the modules initcall too.
      
      I had good results after this patch (see svg in attachment).
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      097d036a
    • F
      tracing/fastboot: get the initcall name before it disappears · 5601020f
      Frederic Weisbecker 提交于
      After some initcall traces, some initcall names may be inconsistent.
      That's because these functions will disappear from the .init section
      and also their name from the symbols table.
      
      So we have to copy the name of the function in a buffer large enough
      during the trace appending. It is not costly for the ring_buffer because
      the number of initcall entries is commonly not really large.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5601020f
    • F
      tracing/fastboot: change the printing of boot tracer according to bootgraph.pl · cb5ab742
      Frederic Weisbecker 提交于
      Change the boot tracer printing to make it parsable for
      the scripts/bootgraph.pl script.
      
      We have now to output two lines for each initcall, according to the
      printk in do_one_initcall() in init/main.c
      We need now the call's time and the return's time.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cb5ab742
    • F
      tracing/ftrace: launch boot tracing after pre-smp initcalls · 3bf77af6
      Frédéric Weisbecker 提交于
      Launch the boot tracing inside the initcall_debug area. Old printk
      have not been removed to keep the old way of initcall tracing for
      backward compatibility.
      
      [ mingo@elte.hu: resolved conflicts ]
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3bf77af6
    • A
      tracing/fastboot: add a script to visualize the kernel boot process / time · aa5d9151
      Arjan van de Ven 提交于
      When optimizing the kernel boot time, it's very valuable to visualize
      what is going on at which time. In addition, with the fastboot asynchronous
      initcall level, it's very valuable to see which initcall gets run where
      and when.
      
      This patch adds a script to turn a dmesg into a SVG graph (that can be
      shown with tools such as InkScape, Gimp or Firefox) and a small change
      to the initcall code to print the PID of the thread calling the initcall
      (so that the script can work out the parallelism).
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      aa5d9151
    • S
      ftrace: mcount call site on boot nops core · 68bf21aa
      Steven Rostedt 提交于
      This is the infrastructure to the converting the mcount call sites
      recorded by the __mcount_loc section into nops on boot. It also allows
      for using these sites to enable tracing as normal. When the __mcount_loc
      section is used, the "ftraced" kernel thread is disabled.
      
      This uses the current infrastructure to record the mcount call sites
      as well as convert them to nops. The mcount function is kept as a stub
      on boot up and not converted to the ftrace_record_ip function. We use the
      ftrace_record_ip to only record from the table.
      
      This patch does not handle modules. That comes with a later patch.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      68bf21aa
    • I
      tracing: clean up tracepoints kconfig structure · 5f87f112
      Ingo Molnar 提交于
      do not expose users to CONFIG_TRACEPOINTS - tracers can select it
      just fine.
      
      update ftrace to select CONFIG_TRACEPOINTS.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5f87f112
    • I
      tracing: disable tracepoints by default · fa340d9c
      Ingo Molnar 提交于
      while it's arguably low overhead, we dont enable new features by default.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fa340d9c
    • M
      tracing: Kernel Tracepoints · 97e1c18e
      Mathieu Desnoyers 提交于
      Implementation of kernel tracepoints. Inspired from the Linux Kernel
      Markers. Allows complete typing verification by declaring both tracing
      statement inline functions and probe registration/unregistration static
      inline functions within the same macro "DEFINE_TRACE". No format string
      is required. See the tracepoint Documentation and Samples patches for
      usage examples.
      
      Taken from the documentation patch :
      
      "A tracepoint placed in code provides a hook to call a function (probe)
      that you can provide at runtime. A tracepoint can be "on" (a probe is
      connected to it) or "off" (no probe is attached). When a tracepoint is
      "off" it has no effect, except for adding a tiny time penalty (checking
      a condition for a branch) and space penalty (adding a few bytes for the
      function call at the end of the instrumented function and adds a data
      structure in a separate section).  When a tracepoint is "on", the
      function you provide is called each time the tracepoint is executed, in
      the execution context of the caller. When the function provided ends its
      execution, it returns to the caller (continuing from the tracepoint
      site).
      
      You can put tracepoints at important locations in the code. They are
      lightweight hooks that can pass an arbitrary number of parameters, which
      prototypes are described in a tracepoint declaration placed in a header
      file."
      
      Addition and removal of tracepoints is synchronized by RCU using the
      scheduler (and preempt_disable) as guarantees to find a quiescent state
      (this is really RCU "classic"). The update side uses rcu_barrier_sched()
      with call_rcu_sched() and the read/execute side uses
      "preempt_disable()/preempt_enable()".
      
      We make sure the previous array containing probes, which has been
      scheduled for deletion by the rcu callback, is indeed freed before we
      proceed to the next update. It therefore limits the rate of modification
      of a single tracepoint to one update per RCU period. The objective here
      is to permit fast batch add/removal of probes on _different_
      tracepoints.
      
      Changelog :
      - Use #name ":" #proto as string to identify the tracepoint in the
        tracepoint table. This will make sure not type mismatch happens due to
        connexion of a probe with the wrong type to a tracepoint declared with
        the same name in a different header.
      - Add tracepoint_entry_free_old.
      - Change __TO_TRACE to get rid of the 'i' iterator.
      
      Masami Hiramatsu <mhiramat@redhat.com> :
      Tested on x86-64.
      
      Performance impact of a tracepoint : same as markers, except that it
      adds about 70 bytes of instructions in an unlikely branch of each
      instrumented function (the for loop, the stack setup and the function
      call). It currently adds a memory read, a test and a conditional branch
      at the instrumentation site (in the hot path). Immediate values will
      eventually change this into a load immediate, test and branch, which
      removes the memory read which will make the i-cache impact smaller
      (changing the memory read for a load immediate removes 3-4 bytes per
      site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
      also saves the d-cache hit).
      
      About the performance impact of tracepoints (which is comparable to
      markers), even without immediate values optimizations, tests done by
      Hideo Aoki on ia64 show no regression. His test case was using hackbench
      on a kernel where scheduler instrumentation (about 5 events in code
      scheduler code) was added.
      
      Quoting Hideo Aoki about Markers :
      
      I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
      tree, which includes several markers for LTTng, using an ia64 server.
      
      While the immediate trace mark feature isn't implemented on ia64, there
      is no major performance regression. So, I think that we don't have any
      issues to propose merging marker point patches into Linus's tree from
      the viewpoint of performance impact.
      
      I prepared two kernels to evaluate. The first one was compiled without
      CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
      
      I downloaded the original hackbench from the following URL:
      http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
      
      I ran hackbench 5 times in each condition and calculated the average and
      difference between the kernels.
      
          The parameter of hackbench: every 50 from 50 to 800
          The number of CPUs of the server: 2, 4, and 8
      
      Below is the results. As you can see, major performance regression
      wasn't found in any case. Even if number of processes increases,
      differences between marker-enabled kernel and marker- disabled kernel
      doesn't increase. Moreover, if number of CPUs increases, the differences
      doesn't increase either.
      
      Curiously, marker-enabled kernel is better than marker-disabled kernel
      in more than half cases, although I guess it comes from the difference
      of memory access pattern.
      
      * 2 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      4.811   |       4.872  |  +0.061  |  +1.27  |
            100 |      9.854   |      10.309  |  +0.454  |  +4.61  |
            150 |     15.602   |      15.040  |  -0.562  |  -3.6   |
            200 |     20.489   |      20.380  |  -0.109  |  -0.53  |
            250 |     25.798   |      25.652  |  -0.146  |  -0.56  |
            300 |     31.260   |      30.797  |  -0.463  |  -1.48  |
            350 |     36.121   |      35.770  |  -0.351  |  -0.97  |
            400 |     42.288   |      42.102  |  -0.186  |  -0.44  |
            450 |     47.778   |      47.253  |  -0.526  |  -1.1   |
            500 |     51.953   |      52.278  |  +0.325  |  +0.63  |
            550 |     58.401   |      57.700  |  -0.701  |  -1.2   |
            600 |     63.334   |      63.222  |  -0.112  |  -0.18  |
            650 |     68.816   |      68.511  |  -0.306  |  -0.44  |
            700 |     74.667   |      74.088  |  -0.579  |  -0.78  |
            750 |     78.612   |      79.582  |  +0.970  |  +1.23  |
            800 |     85.431   |      85.263  |  -0.168  |  -0.2   |
      --------------------------------------------------------------
      
      * 4 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.586   |       2.584  |  -0.003  |  -0.1   |
            100 |      5.254   |       5.283  |  +0.030  |  +0.56  |
            150 |      8.012   |       8.074  |  +0.061  |  +0.76  |
            200 |     11.172   |      11.000  |  -0.172  |  -1.54  |
            250 |     13.917   |      14.036  |  +0.119  |  +0.86  |
            300 |     16.905   |      16.543  |  -0.362  |  -2.14  |
            350 |     19.901   |      20.036  |  +0.135  |  +0.68  |
            400 |     22.908   |      23.094  |  +0.186  |  +0.81  |
            450 |     26.273   |      26.101  |  -0.172  |  -0.66  |
            500 |     29.554   |      29.092  |  -0.461  |  -1.56  |
            550 |     32.377   |      32.274  |  -0.103  |  -0.32  |
            600 |     35.855   |      35.322  |  -0.533  |  -1.49  |
            650 |     39.192   |      38.388  |  -0.804  |  -2.05  |
            700 |     41.744   |      41.719  |  -0.025  |  -0.06  |
            750 |     45.016   |      44.496  |  -0.520  |  -1.16  |
            800 |     48.212   |      47.603  |  -0.609  |  -1.26  |
      --------------------------------------------------------------
      
      * 8 CPUs
      
      Number of | without      | with         | diff     | diff    |
      processes | Marker [Sec] | Marker [Sec] |   [Sec]  |   [%]   |
      --------------------------------------------------------------
             50 |      2.094   |       2.072  |  -0.022  |  -1.07  |
            100 |      4.162   |       4.273  |  +0.111  |  +2.66  |
            150 |      6.485   |       6.540  |  +0.055  |  +0.84  |
            200 |      8.556   |       8.478  |  -0.078  |  -0.91  |
            250 |     10.458   |      10.258  |  -0.200  |  -1.91  |
            300 |     12.425   |      12.750  |  +0.325  |  +2.62  |
            350 |     14.807   |      14.839  |  +0.032  |  +0.22  |
            400 |     16.801   |      16.959  |  +0.158  |  +0.94  |
            450 |     19.478   |      19.009  |  -0.470  |  -2.41  |
            500 |     21.296   |      21.504  |  +0.208  |  +0.98  |
            550 |     23.842   |      23.979  |  +0.137  |  +0.57  |
            600 |     26.309   |      26.111  |  -0.198  |  -0.75  |
            650 |     28.705   |      28.446  |  -0.259  |  -0.9   |
            700 |     31.233   |      31.394  |  +0.161  |  +0.52  |
            750 |     34.064   |      33.720  |  -0.344  |  -1.01  |
            800 |     36.320   |      36.114  |  -0.206  |  -0.57  |
      --------------------------------------------------------------
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: N'Peter Zijlstra' <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      97e1c18e
  14. 12 10月, 2008 1 次提交