1. 17 Jul, 2007 2 commits
    • vsprintf.c: optimizing, part 2: base 10 conversion speedup, v2 · 4277eedd
      Committed by Denis Vlasenko
      Optimize integer-to-string conversion in vsprintf.c for base 10.  This is
      by far the most used conversion, and in some use cases it impacts
      performance.  For example, top reads /proc/$PID/stat for every process, and
      with 4000 processes decimal conversion alone takes noticeable time.
      
      Using code from
      
      http://www.cs.uiowa.edu/~jones/bcd/decimal.html
      (with permission from the author, Douglas W. Jones)
      
      binary-to-decimal-string conversion is done in groups of five digits at
      once, using only additions/subtractions/shifts (with -O2; -Os throws in
      some multiply instructions).
      
      On i386 arch gcc 4.1.2 -O2 generates ~500 bytes of code.
      
      This patch is run-tested. A userspace benchmark/test is also attached.
      I tested it on PIII and AMD64, and the new code is generally ~2.5 times
      faster. On AMD64:
      
      # ./vsprintf_verify-O2
      Original decimal conv: .......... 151 ns per iteration
      Patched decimal conv:  .......... 62 ns per iteration
      Testing correctness
      12895992590592 ok...        [Ctrl-C]
      # ./vsprintf_verify-O2
      Original decimal conv: .......... 151 ns per iteration
      Patched decimal conv:  .......... 62 ns per iteration
      Testing correctness
      26025406464 ok...        [Ctrl-C]
      
      A more realistic test: top from the busybox project was modified to
      report how many microseconds it took to scan /proc (this does not
      account for any processing done after that, like sorting the process
      list), and then I tested it with 4000 processes:
      
      #!/bin/sh
      i=4000
      while test $i != 0; do
          sleep 30 &
          let i--
      done
      busybox top -b -n3 >/dev/null
      
      on unpatched kernel:
      
      top: 4120 processes took 102864 microseconds to scan
      top: 4120 processes took 91757 microseconds to scan
      top: 4120 processes took 92517 microseconds to scan
      top: 4120 processes took 92581 microseconds to scan
      
      on patched kernel:
      
      top: 4120 processes took 75460 microseconds to scan
      top: 4120 processes took 66451 microseconds to scan
      top: 4120 processes took 67267 microseconds to scan
      top: 4120 processes took 67618 microseconds to scan
      
      The speedup comes from much faster generation of /proc/PID/stat
      by sprintf() calls inside the kernel.
      Signed-off-by: Douglas W Jones <jones@cs.uiowa.edu>
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
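The base-10 technique described in commit 4277eedd can be illustrated with a minimal sketch. It assumes a value below 100000 (one five-digit group), splits it into hex nibbles, and redistributes each nibble's weight over the decimal places (16 = 10 + 6, 256 = 200 + 50 + 6, 4096 = 4000 + 90 + 6). The function name `dec5_to_str` and its interface are hypothetical and are not the kernel's code; the small constant multiplications are written as `*` here for readability, and a compiler may lower them to shifts and adds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the kernel function): writes the decimal digits
 * of q, most significant first, into buf and returns the digit count.
 * Requires q < 100000; buf needs room for 5 chars; no NUL is written. */
size_t dec5_to_str(unsigned q, char *buf)
{
    unsigned n0, n1, n2, n3;  /* hex nibbles of q; n3 holds all bits above 11 */
    unsigned d[5];            /* decimal digits, d[0] is the ones place */
    unsigned c;               /* carry into the next decimal place */
    size_t len = 0;
    int i;

    assert(q < 100000);

    n0 = q & 0xf;
    n1 = (q >> 4) & 0xf;
    n2 = (q >> 8) & 0xf;
    n3 = q >> 12;             /* at most 24 when q < 100000 */

    /* Ones place: every nibble weight (16, 256, 4096) ends in 6. */
    d[0] = 6 * (n3 + n2 + n1) + n0;
    c = (d[0] * 0xcd) >> 11;  /* equals d[0] / 10 for inputs below 1029 */
    d[0] -= 10 * c;

    /* Tens place: 9 from 4096, 5 from 256, 1 from 16. */
    d[1] = c + 9 * n3 + 5 * n2 + n1;
    c = (d[1] * 0xcd) >> 11;
    d[1] -= 10 * c;

    /* Hundreds place: 2 from 256 (4096 contributes 0 here). */
    d[2] = c + 2 * n2;
    c = (d[2] * 0xcd) >> 11;
    d[2] -= 10 * c;

    /* Thousands place: 4 from 4096. */
    d[3] = c + 4 * n3;
    c = (d[3] * 0xcd) >> 11;
    d[3] -= 10 * c;

    d[4] = c;                 /* ten-thousands digit */

    /* Skip leading zeros but always emit at least one digit. */
    i = 4;
    while (i > 0 && d[i] == 0)
        i--;
    for (; i >= 0; i--)
        buf[len++] = (char)('0' + d[i]);
    return len;
}
```

A caller would pass a buffer of at least five bytes and use the returned length, for example to append the digits to a larger output buffer. Larger values would need to be broken into five-digit groups first, which this sketch does not attempt.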
    • vsprintf.c: optimizing, part 1 (easy and obvious stuff) · b39a7340
      Committed by Denis Vlasenko
      * There is no point in having a full "0...9a...z" constant vector
        if we use only "0...9a...f" (and "x" for "0x").
      
      * Post-decrement usually needs a few more instructions, so use
        pre-decrement instead where it makes sense:
      -       while (i < precision--) {
      +       while (i <= --precision) {
      
      * If base != 10 (=> base 8 or 16), we can avoid using division
        in a loop and use mask/shift, obtaining much faster conversion.
        (A more complex optimization for the base 10 case is in the second
        patch.)
      
      Overall, "size vsprintf.o" shows a text section ~80 bytes smaller
      with this patch applied.
      Signed-off-by: Douglas W Jones <jones@cs.uiowa.edu>
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
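The mask/shift idea from commit b39a7340 can be sketched as follows: for base 8 or 16, each digit is just the low 3 or 4 bits of the number, so the loop needs no division. The function name `put_pow2_base`, its signature, and the NUL-terminated output are assumptions made for illustration; the digit table holds only "0...9a...f", in line with the first bullet of the commit message.

```c
#include <stddef.h>

/* Hypothetical helper (not the kernel function): converts num to base 8
 * or 16 using mask and shift only. buf must hold at least 23 bytes
 * (22 octal digits for a 64-bit value plus the terminating NUL).
 * Returns the number of digits written. */
size_t put_pow2_base(unsigned long long num, unsigned base, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[22];                        /* digits, least significant first */
    unsigned mask = base - 1;            /* 7 for octal, 15 for hex */
    unsigned shift = (base == 16) ? 4 : 3;
    size_t i = 0, len = 0;

    /* Peel off one digit per iteration; do/while so that 0 prints "0". */
    do {
        tmp[i++] = digits[num & mask];
        num >>= shift;
    } while (num != 0);

    /* Reverse into the caller's buffer. */
    while (i > 0)
        buf[len++] = tmp[--i];
    buf[len] = '\0';
    return len;
}
```

Only bases 8 and 16 are meaningful here; any other base would need the general division loop or the decimal-specific path.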
  2. 09 May, 2007 1 commit
  3. 01 May, 2007 1 commit
  4. 13 Feb, 2007 1 commit
  5. 12 Feb, 2007 1 commit
  6. 29 Jun, 2006 1 commit
  7. 26 Jun, 2006 2 commits
  8. 31 Oct, 2005 1 commit
    • [PATCH] fix missing includes · 4e57b681
      Committed by Tim Schmielau
      I recently picked up my older work to remove unnecessary #includes of
      sched.h, starting from a patch by Dave Jones to not include sched.h
      from module.h. This reduces the number of indirect includes of sched.h
      by ~300. Another ~400 pointless direct includes can be removed after
      this disentangling (patch to follow later).
      However, quite a few indirect includes need to be fixed up for this.
      
      In order to feed the patches through -mm with as little disturbance as
      possible, I've split out the fixes I accumulated up to now (complete for
      i386 and x86_64, more archs to follow later) and post them before the real
      patch.  This way this large part of the patch is kept simple with only
      adding #includes, and all hunks are independent of each other.  So if any
      hunk rejects or gets in the way of other patches, just drop it.  My scripts
      will pick it up again in the next round.
      Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  9. 24 Aug, 2005 1 commit
  10. 17 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Committed by Linus Torvalds
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!