1. 02 7月, 2012 1 次提交
  2. 21 6月, 2012 1 次提交
  3. 08 6月, 2012 2 次提交
  4. 06 6月, 2012 2 次提交
  5. 01 6月, 2012 5 次提交
    • D
      vsprintf: further optimize decimal conversion · 133fd9f5
      Denys Vlasenko 提交于
      Previous code was using optimizations which were developed to work well
      even on narrow-word CPUs (by today's standards).  But Linux runs only on
      32-bit and wider CPUs.  We can use that.
      
      First: using 32x32->64 multiply and trivial 32-bit shift, we can correctly
      divide by 10 much larger numbers, and thus we can print groups of 9 digits
      instead of groups of 5 digits.
      
      Next: there are two algorithms to print larger numbers.  One is generic:
      divide by 1000000000 and repeatedly print groups of (up to) 9 digits.
      It's conceptually simple, but requires an (unsigned long long) /
      1000000000 division.
      
      Second algorithm splits 64-bit unsigned long long into 16-bit chunks,
      manipulates them cleverly and generates groups of 4 decimal digits.  It so
      happens that it does NOT require long long division.
      
      If long is > 32 bits, division of 64-bit values is relatively easy, and we
      will use the first algorithm.  If long long is > 64 bits (strange
      architecture with VERY large long long), second algorithm can't be used,
      and we again use the first one.
      
      Else (if long is 32 bits and long long is 64 bits) we use second one.
      
      And third: there is a simple optimization which takes fast path not only
      for zero as was done before, but for all one-digit numbers.
      
      In all tested cases new code is faster than old one, in many cases by 30%,
      in few cases by more than 50% (for example, on x86-32, conversion of
      12345678).  Code growth is ~0 in 32-bit case and ~130 bytes in 64-bit
      case.
      
      This patch is based upon an original from Michal Nazarewicz.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NMichal Nazarewicz <mina86@mina86.com>
      Signed-off-by: NDenys Vlasenko <vda.linux@googlemail.com>
      Cc: Douglas W Jones <jones@cs.uiowa.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      133fd9f5
    • G
      vsprintf: correctly handle width when '#' flag used in %#p format · 725fe002
      Grant Likely 提交于
      The '%p' output of the kernel's vsprintf() uses spec.field_width to
      determine how many digits to output based on 2 * sizeof(void*) so that all
      digits of a pointer are shown.  ie.  a pointer will be output as
      "001A2B3C" instead of "1A2B3C".  However, if the '#' flag is used in the
      format (%#p), then the code doesn't take into account the width of the
      '0x' prefix and will end up outputing "0x1A2B3C" instead of "0x001A2B3C".
      
      This patch reworks the "pointer()" format hook to include 2 characters for
      the '0x' prefix if the '#' flag is included.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      725fe002
    • H
      bql: Avoid possible inconsistent calculation. · 914bec10
      Hiroaki SHIMODA 提交于
      dql->num_queued could change while processing dql_completed().
      To provide consistent calculation, added an on stack variable.
      Signed-off-by: NHiroaki SHIMODA <shimoda.hiroaki@gmail.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Denys Fedoryshchenko <denys@visp.net.lb>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      914bec10
    • H
      bql: Avoid unneeded limit decrement. · 25426b79
      Hiroaki SHIMODA 提交于
      When below pattern is observed,
      
                                                     TIME
             dql_queued()         dql_completed()     |
            a) initial state                          |
                                                      |
            b) X bytes queued                         V
      
            c) Y bytes queued
                                 d) X bytes completed
            e) Z bytes queued
                                 f) Y bytes completed
      
      a) dql->limit has already some value and there is no in-flight packet.
      b) X bytes queued.
      c) Y bytes queued and excess limit.
      d) X bytes completed and dql->prev_ovlimit is set and also
         dql->prev_num_queued is set Y.
      e) Z bytes queued.
      f) Y bytes completed. inprogress and prev_inprogress are true.
      
      At f), according to the comment, all_prev_completed becomes
      true and limit should be increased. But POSDIFF() ignores
      (completed == dql->prev_num_queued) case, so limit is decreased.
      Signed-off-by: NHiroaki SHIMODA <shimoda.hiroaki@gmail.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Denys Fedoryshchenko <denys@visp.net.lb>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      25426b79
    • H
      bql: Fix POSDIFF() to integer overflow aware. · 0cfd32b7
      Hiroaki SHIMODA 提交于
      POSDIFF() fails to take into account integer overflow case.
      Signed-off-by: NHiroaki SHIMODA <shimoda.hiroaki@gmail.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Denys Fedoryshchenko <denys@visp.net.lb>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0cfd32b7
  6. 30 5月, 2012 9 次提交
  7. 28 5月, 2012 2 次提交
  8. 27 5月, 2012 2 次提交
    • L
      lib: add generic strnlen_user() function · a08c5356
      Linus Torvalds 提交于
      This adds a new generic optimized strnlen_user() function that uses the
      <asm/word-at-a-time.h> infrastructure to portably do efficient string
      handling.
      
      In many ways, strnlen is much simpler than strncpy, and in particular we
      can always pre-align the words we load from memory.  That means that all
      the worries about alignment etc are a non-issue, so this one can easily
      be used on any architecture.  You obviously do have to do the
      appropriate word-at-a-time.h macros.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a08c5356
    • L
      word-at-a-time: make the interfaces truly generic · 36126f8f
      Linus Torvalds 提交于
      This changes the interfaces in <asm/word-at-a-time.h> to be a bit more
      complicated, but a lot more generic.
      
      In particular, it allows us to really do the operations efficiently on
      both little-endian and big-endian machines, pretty much regardless of
      machine details.  For example, if you can rely on a fast population
      count instruction on your architecture, this will allow you to make your
      optimized <asm/word-at-a-time.h> file with that.
      
      NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is
      not truly generic, it actually only works on big-endian.  Why? Because
      on little-endian the generic algorithms are wasteful, since you can
      inevitably do better. The x86 implementation is an example of that.
      
      (The only truly non-generic part of the asm-generic implementation is
      the "find_zero()" function, and you could make a little-endian version
      of it.  And if the Kbuild infrastructure allowed us to pick a particular
      header file, that would be lovely)
      
      The <asm/word-at-a-time.h> functions are as follows:
      
       - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm
         uses.
      
       - has_zero(): take a word, and determine if it has a zero byte in it.
         It gets the word, the pointer to the constant pool, and a pointer to
         an intermediate "data" field it can set.
      
         This is the "quick-and-dirty" zero tester: it's what is run inside
         the hot loops.
      
       - "prep_zero_mask()": take the word, the data that has_zero() produced,
         and the constant pool, and generate an *exact* mask of which byte had
         the first zero.  This is run directly *outside* the loop, and allows
         the "has_zero()" function to answer the "is there a zero byte"
         question without necessarily getting exactly *which* byte is the
         first one to contain a zero.
      
         If you do multiple byte lookups concurrently (eg "hash_name()", which
         looks for both NUL and '/' bytes), after you've done the prep_zero_mask()
         phase, the result of those can be or'ed together to get the "either
         or" case.
      
       - The result from "prep_zero_mask()" can then be fed into "find_zero()"
         (to find the byte offset of the first byte that was zero) or into
         "zero_bytemask()" (to find the bytemask of the bytes preceding the
         zero byte).
      
         The existence of zero_bytemask() is optional, and is not necessary
         for the normal string routines.  But dentry name hashing needs it, so
         if you enable DENTRY_WORD_AT_A_TIME you need to expose it.
      
      This changes the generic strncpy_from_user() function and the dentry
      hashing functions to use these modified word-at-a-time interfaces.  This
      gets us back to the optimized state of the x86 strncpy that we lost in
      the previous commit when moving over to the generic version.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      36126f8f
  9. 25 5月, 2012 1 次提交
  10. 22 5月, 2012 4 次提交
  11. 17 5月, 2012 1 次提交
  12. 10 5月, 2012 1 次提交
  13. 08 5月, 2012 2 次提交
  14. 07 5月, 2012 1 次提交
  15. 05 5月, 2012 1 次提交
  16. 02 5月, 2012 1 次提交
  17. 01 5月, 2012 4 次提交
    • J
      dynamic_debug: init with early_initcall, not arch_initcall · 3ec5652a
      Jim Cromie 提交于
      1- Call dynamic_debug_init() from early_initcall, not arch_initcall.
      2- Call dynamic_debug_init_debugfs() from fs_initcall, not module_init.
      
      RFC: This works for me on a 64 bit desktop and a i586 SBC, but is
      untested on other arches.  I presume there is or was a reason
      original code used arch_initcall, maybe the constraints have changed.
      
      This makes facility available as soon as possible.
      
      2nd change has a downside when dynamic_debug.verbose=1; all the
      vpr_info()s called in the proc-fs code are activated, causing
      voluminous output from dmesg.  TBD: Im unsure of this explanation, but
      the output is there.  This could be fixed by changing those callsites
      to v2pr_info(if verbose > 1).
      
      1st change is still not early enough to enable pr_debugs in
      kernel/params, so parsing of boot-args isnt logged.  The reparse of
      those args is however visible after params.dyndbg="+p" is processed.
      Signed-off-by: NJim Cromie <jim.cromie@gmail.com>
      Acked-by: NJason Baron <jbaron@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3ec5652a
    • J
      dynamic_debug: update Documentation/*, Kconfig.debug · 29e36c9f
      Jim Cromie 提交于
      In dynamic-debug-howto.txt:
      
      - add section: Debug Messages at Module Initialization Time
      - update flags indicators in example outputs to include '='
      - make flags descriptions tabular
      - add item on '_' flag-char
      - add dyndbg, boot-args examples
      - rewrap some paragraphs with long lines
      
      In Kconfig.debug, note that compiling with -DDEBUG enables all
      pr_debug()s in that code.
      
      In kernel-parameters.txt, add dyndbg and module.dyndbg items,
      and deprecate ddebug_query.
      Signed-off-by: NJim Cromie <jim.cromie@gmail.com>
      Acked-by: NJason Baron <jbaron@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29e36c9f
    • J
      dynamic_debug: add modname arg to exec_query callchain · 8e59b5cf
      Jim Cromie 提交于
      Pass module name into ddebug_exec_queries(), ddebug_exec_query(), and
      ddebug_parse_query() as separate parameter.  In ddebug_parse_query(),
      the module name is added into the query struct before the query-string
      is parsed.  This allows the query-string to be shorter:
      
      instead of:
         $modname.dyndbg="module $modname +fp"
      do this:
         $modname.dyndbg="+fp"
      
      Omitting "module $modname" from the query string is actually required
      for $modname.dyndbg rules; the set-only-once check added in a previous
      patch will throw an error if its added again.  ddebug_query="..." has
      no $modname associated with it, so the query string may include it.
      
      This also fixes redundant "module $modname" otherwise needed to handle
      multiple queries per string:
      
         $modname.dyndbg="func foo +fp; func bar +fp"
      Signed-off-by: NJim Cromie <jim.cromie@gmail.com>
      Acked-by: NJason Baron <jbaron@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8e59b5cf
    • J
      dynamic_debug: print ram usage by ddebug tables if verbose · 41076927
      Jim Cromie 提交于
      Print ram usage of dynamic-debug tables and verbose section so user
      knows cost of enabling CONFIG_DYNAMIC_DEBUG.  This only counts the
      size of the _ddebug tables for builtins and the __verbose section that
      they refer to, not those used in loadable modules.
      Signed-off-by: NJim Cromie <jim.cromie@gmail.com>
      Acked-by: NJason Baron <jbaron@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      41076927