1. 10 Feb, 2012 1 commit
    • Reduce the number of expensive division instructions done by _parse_integer() · 690d137f
      Committed by David Howells
      _parse_integer() does one or two division instructions (which are slow)
      per digit parsed to perform the overflow check.
      
      Furthermore, these are particularly expensive examples of division
      instruction as the number of clock cycles required to complete them may
      go up with the position of the most significant set bit in the dividend:
      
      	if (*res > div_u64(ULLONG_MAX - val, base))
      
      which is as maximal as possible.
      
      Worse, on 32-bit arches, more than one of these division instructions
      may be required per digit.
      
      So, assuming we don't support a base of more than 16, skip the check if the
      top nibble of the result is not set at this point.
      Signed-off-by: David Howells <dhowells@redhat.com>
      [ Changed it to not dereference the pointer all the time - even if the
        compiler can and does optimize it away, the code just looks cleaner.
        And edited the top nybble test slightly to make the code generated on
        x86-64 better in the loop - test against a hoisted constant instead of
        shifting and testing the result ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
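The idea in the commit above can be sketched in userspace C. This is a minimal illustration, not the kernel's `_parse_integer()`: the function name `parse_u64_sketch`, its signature, and the separate `overflow` flag are assumptions made for the example. It assumes a base of at most 16, so the division is only performed once the top nibble of the running result is set.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: parse digits of `s` in `base` (<= 16) into *out.
 * Returns the number of digits consumed; sets *overflow if the value
 * does not fit in 64 bits. */
static unsigned int parse_u64_sketch(const char *s, unsigned int base,
                                     uint64_t *out, bool *overflow)
{
    /* Mask for the top nibble, computed once outside the loop. */
    const uint64_t top_nibble = ~UINT64_C(0) << 60;
    uint64_t res = 0;
    unsigned int ndigits = 0;

    *overflow = false;
    for (; *s; s++, ndigits++) {
        unsigned int val;
        int c = tolower((unsigned char)*s);

        if (c >= '0' && c <= '9')
            val = (unsigned int)(c - '0');
        else if (c >= 'a' && c <= 'f')
            val = (unsigned int)(c - 'a') + 10;
        else
            break;
        if (val >= base)
            break;

        /* While res < 2^60, res * 16 + 15 still fits in 64 bits, so the
         * expensive division is needed only when the top nibble is set. */
        if ((res & top_nibble) && res > (UINT64_MAX - val) / base)
            *overflow = true;
        res = res * base + val;
    }
    *out = res;
    return ndigits;
}
```

For typical short inputs the division never executes, which is the saving the commit message describes.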
  2. 02 Feb, 2012 2 commits
  3. 01 Feb, 2012 12 commits
  4. 27 Jan, 2012 1 commit
    • bugs, x86: Fix printk levels for panic, softlockups and stack dumps · b0f4c4b3
      Committed by Prarit Bhargava
      rsyslog will display KERN_EMERG messages on a connected
      terminal.  However, these messages are useless/undecipherable
      for a general user.
      
      For example, after a softlockup we get:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
       kernel:Stack:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
       kernel:Call Trace:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
       kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89
       d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <e8> ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89
      
      This happens because the printk levels for these messages are
      incorrect. Only an informational message should be displayed on
      a terminal.
      
      I modified the printk levels for various messages in the kernel
      and tested the output by using the drivers/misc/lkdtm.c kernel
      modules (ie, softlockups, panics, hard lockups, etc.) and
      confirmed that the console output was still the same and that
      the output to the terminals was correct.
      
      For example, in the case of a softlockup we now see the much
      more informative:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ...
       BUG: soft lockup - CPU4 stuck for 60s!
      
      instead of the above confusing messages.
      
      AFAICT, the messages no longer have to be KERN_EMERG.  In the
      most important case of a panic we set console_verbose().  As for
      the other less severe cases the correct data is output to the
      console and /var/log/messages.
      
      Successfully tested by me using the drivers/misc/lkdtm.c module.
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Cc: dzickus@redhat.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 19 Jan, 2012 1 commit
  6. 18 Jan, 2012 4 commits
  7. 13 Jan, 2012 2 commits
    • unlzo: fix input buffer free · 35f15268
      Committed by Sascha Hauer
      unlzo modifies the pointer to in_buf, so we have to free the original
      buffer, not the modified pointer.
      Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
      Cc: Lasse Collin <lasse.collin@tukaani.org>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
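The bug class fixed by the unlzo commit can be shown with a small userspace sketch. The names `consume_and_free`, `buf_start`, and `cursor` are hypothetical and not taken from the kernel source; the point is only that the pointer passed to `free()` must be the one returned by the allocator, not a cursor that has since been advanced.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: allocate a buffer, walk it with a moving cursor,
 * then release it through the saved original pointer.
 * Returns the sum of the bytes read, or -1 if allocation fails. */
static long consume_and_free(size_t len)
{
    unsigned char *buf_start = malloc(len ? len : 1); /* original allocation */
    unsigned char *cursor = buf_start;                /* advanced while reading */
    long sum = 0;

    if (!buf_start)
        return -1;
    memset(buf_start, 1, len);

    while (len--)
        sum += *cursor++;

    /* free(cursor) would be undefined behaviour once cursor has moved;
     * the saved start pointer is the only valid argument. */
    free(buf_start);
    return sum;
}
```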
    • radix_tree: take radix_tree_path off stack · e2bdb933
      Committed by Hugh Dickins
      Down, down in the deepest depths of GFP_NOIO page reclaim, we have
      shrink_page_list() calling __remove_mapping() calling
      __delete_from_swap_cache() or __delete_from_page_cache().
      
      You would not expect those to need much stack, but in fact they call
      radix_tree_delete(): which declares a 192-byte radix_tree_path array on
      its stack (to record the node,offsets it visits when descending, in case
      it needs to ascend to update them).  And if any tag is still set [1],
      that calls radix_tree_tag_clear(), which declares a further such
      192-byte radix_tree_path array on the stack.  (At least we have
      interrupts disabled here, so won't then be pushing registers too.)
      
      That was probably a good choice when most users were 32-bit (array of
      half the size), and adding fields to radix_tree_node would have bloated
      it unnecessarily.  But nowadays many are 64-bit, and each
      radix_tree_node contains a struct rcu_head, which is only used when
      freeing; whereas the radix_tree_path info is only used for updating the
      tree (deleting, clearing tags or setting tags if tagged) when a lock
      must be held, of no interest when accessing the tree locklessly.
      
      So add a parent pointer to the radix_tree_node, in union with the
      rcu_head, and remove all uses of the radix_tree_path.  There would be
      space in that union to save the offset when descending as before (we can
      argue that a lock must already be held to exclude other users), but
      recalculating it when ascending is both easy (a constant shift and a
      constant mask) and uncommon, so it seems better just to do that.
      
      Two little optimizations: no need to decrement height when descending,
      adjusting shift is enough; and once radix_tree_tag_if_tagged() has set
      tag on a node and its ancestors, it need not ascend from that node
      again.
      
      perf on the radix tree test harness reports radix_tree_insert() as 2%
      slower (now having to set parent), but radix_tree_delete() 24% faster.
      Surely that's an exaggeration from rtth's artificially low map shift 3,
      but forcing it back to 6 still rates radix_tree_delete() 8% faster.
      
      [1] Can a pagecache tag (dirty, writeback or towrite) actually still be
      set at the time of radix_tree_delete()? Perhaps not if the filesystem is
      well-behaved.  But although I've not tracked any stack overflow down to
      this cause, I have observed a curious case in which a dirty tag is set
      and left set on tmpfs: page migration's migrate_page_copy() happens to
      use __set_page_dirty_nobuffers() to set PageDirty on the newpage, and
      that sets PAGECACHE_TAG_DIRTY as a side-effect - harmless to a
      filesystem which doesn't use tags, except for this stack depth issue.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nai Xia <nai.xia@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
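The parent-pointer approach from the radix_tree commit can be sketched as follows. This is a simplified userspace model, not the kernel's `struct radix_tree_node`: the type `struct node`, the `MAP_*` constants, and the function `ascend_check` are assumptions made for illustration. It shows how the slot offset at each level is recomputed from the index with a constant shift and a constant mask while ascending, so no path array is needed on the stack.

```c
#include <stddef.h>

#define MAP_SHIFT 6
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

/* Hypothetical simplified node. In the kernel change the parent pointer
 * shares a union with the rcu_head; that detail is omitted here. */
struct node {
    struct node *parent;
    void *slots[MAP_SIZE];
};

/* Climb from a leaf node to the root, recomputing at each level the slot
 * that should point back at the child. Returns the number of levels
 * climbed, or -1 if a recomputed slot does not match. */
static int ascend_check(const struct node *node, unsigned long index)
{
    int levels = 0;

    while (node->parent) {
        const struct node *parent = node->parent;
        unsigned long offset;

        index >>= MAP_SHIFT;        /* constant shift */
        offset = index & MAP_MASK;  /* constant mask */
        if (parent->slots[offset] != node)
            return -1;
        node = parent;
        levels++;
    }
    return levels;
}
```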
  8. 11 Jan, 2012 2 commits
  9. 07 Jan, 2012 1 commit
  10. 04 Jan, 2012 1 commit
  11. 22 Dec, 2011 1 commit
  12. 14 Dec, 2011 1 commit
  13. 10 Dec, 2011 1 commit
  14. 09 Dec, 2011 1 commit
    • sch_red: Adaptative RED AQM · 8af2a218
      Committed by Eric Dumazet
      Adaptative RED AQM for Linux, based on the paper by Sally Floyd,
      Ramakrishna Gummadi, and Scott Shenker, August 2001:
      
      http://icir.org/floyd/papers/adaptiveRed.pdf
      
      The goal of Adaptative RED is to make max_p a dynamic value between 1% and
      50%, so that the average queue reaches the target: (min_th + max_th) / 2
      
      Every 500 ms:
       if (avg > target and max_p <= 0.5)
        increase max_p : max_p += alpha;
       else if (avg < target and max_p >= 0.01)
        decrease max_p : max_p *= beta;
      
      target :[min_th + 0.4*(max_th - min_th),
                min_th + 0.6*(max_th - min_th)].
      alpha : min(0.01, max_p / 4)
      beta : 0.9
      max_P is a Q0.32 fixed point number (unsigned, with 32 bits mantissa)
      
      Changes against our RED implementation are :
      
      max_p is no longer a negative power of two (1/(2^Plog)), but a Q0.32
      fixed point number, to allow the full range described in the Adaptive RED paper.
      
      To deliver a random number, we now use a reciprocal divide (that's really
      a multiply), but this operation is done once per marked/dropped packet
      when in RED_BETWEEN_TRESH window, so the added cost (compared to the previous
      AND operation) is near zero.
      
      dump operation gives current max_p value in a new TCA_RED_MAX_P
      attribute.
      
      Example on a 10Mbit link :
      
      tc qdisc add dev $DEV parent 1:1 handle 10: est 1sec 8sec red \
         limit 400000 min 30000 max 90000 avpkt 1000 \
         burst 55 ecn adaptative bandwidth 10Mbit
      
      # tc -s -d qdisc show dev eth3
      ...
      qdisc red 10: parent 1:1 limit 400000b min 30000b max 90000b ecn
      adaptative ewma 5 max_p=0.113335 Scell_log 15
       Sent 50414282 bytes 34504 pkt (dropped 35, overlimits 1392 requeues 0)
       rate 9749Kbit 831pps backlog 72056b 16p requeues 0
        marked 1357 early 35 pdrop 0 other 0
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
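The periodic max_p adjustment described in the sch_red commit can be sketched in Q0.32 fixed point. This is an illustrative model, not the code from the commit: the function `adapt_max_p`, the constant names, and the choice to derive the target band from fifths of the threshold gap are assumptions. It takes the target band as min_th + 0.4*(max_th - min_th) to min_th + 0.6*(max_th - min_th), and the function would be called once per adaptation interval (500 ms in the description above).

```c
#include <stdint.h>

/* Roughly 0.01 * 2^32, i.e. 1% as a Q0.32 fixed point value. */
#define Q32_ONE_PERCENT UINT32_C(0x028f5c28)
#define MAX_P_MIN       Q32_ONE_PERCENT         /* lower bound: 1%  */
#define MAX_P_MAX       (50 * Q32_ONE_PERCENT)  /* upper bound: 50% */

/* Hypothetical sketch of one adaptation step. avg, min_th and max_th are
 * queue sizes in the same unit; max_p is Q0.32. Returns the new max_p. */
static uint32_t adapt_max_p(uint32_t max_p, uint32_t avg,
                            uint32_t min_th, uint32_t max_th)
{
    uint32_t delta = (max_th - min_th) / 5;
    uint32_t target_min = min_th + 2 * delta;  /* min_th + 0.4 * gap */
    uint32_t target_max = min_th + 3 * delta;  /* min_th + 0.6 * gap */

    if (avg > target_max && max_p <= MAX_P_MAX) {
        /* alpha = min(0.01, max_p / 4): additive increase */
        uint32_t alpha = max_p / 4 < MAX_P_MIN ? max_p / 4 : MAX_P_MIN;
        max_p += alpha;
    } else if (avg < target_min && max_p >= MAX_P_MIN) {
        /* beta = 0.9: multiplicative decrease */
        max_p = (max_p / 10) * 9;
    }
    return max_p;
}
```

With the thresholds from the tc example (min 30000, max 90000), the target band in this sketch is 54000 to 66000 bytes.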
  15. 07 Dec, 2011 1 commit
  16. 06 Dec, 2011 3 commits
  17. 03 Dec, 2011 1 commit
  18. 30 Nov, 2011 1 commit
    • dql: Dynamic queue limits · 75957ba3
      Committed by Tom Herbert
      Implementation of dynamic queue limits (dql).  This is a library which
      allows a queue limit to be dynamically managed.  The goal of dql is
      to keep the queue limit (the number of objects allowed in the queue)
      as small as possible without allowing the queue to be starved.
      
      dql would be used with a queue which has these properties:
      
      1) Objects are queued up to some limit which can be expressed as a
         count of objects.
      2) Periodically a completion process executes which retires consumed
         objects.
      3) Starvation occurs when limit has been reached, all queued data has
         actually been consumed but completion processing has not yet run,
         so queuing new data is blocked.
      4) Minimizing the amount of queued data is desirable.
      
      A canonical example of such a queue would be a NIC HW transmit queue.
      
      The queue limit is dynamic: it will increase or decrease over time
      depending on the workload.  The queue limit is recalculated each time
      completion processing is done.  Increases occur when the queue is
      starved and can exponentially increase over successive intervals.
      Decreases occur when more data is being maintained in the queue than
      needed to prevent starvation.  The number of extra objects, or "slack",
      is measured over successive intervals, and to avoid hysteresis the
      limit is only reduced by the minimum slack seen over a configurable
      time period.
      
      dql API provides routines to manage the queue:
      - dql_init is called to initialize the dql structure
      - dql_reset is called to reset dynamic values
      - dql_queued is called when objects are being enqueued
      - dql_avail returns availability in the queue
      - dql_completed is called when objects have been consumed in the queue
      
      Configuration consists of:
      - max_limit, maximum limit
      - min_limit, minimum limit
      - slack_hold_time, time to measure instances of slack before reducing
        queue limit
      Signed-off-by: Tom Herbert <therbert@google.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
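The bookkeeping side of the dql description can be illustrated with a toy model. This is not the kernel library: the `toy_dql` structure and the `toy_dql_*` functions are hypothetical, and the sketch deliberately omits the part that recalculates the limit (growth on starvation, reduction by minimum slack over slack_hold_time). It only shows how queued and completed counts combine with a limit to yield availability, with a negative result meaning the producer should stop queuing.

```c
/* Hypothetical toy model of the counters behind a dynamic queue limit. */
struct toy_dql {
    unsigned int limit;          /* current queue limit, in objects */
    unsigned int num_queued;     /* total objects ever queued */
    unsigned int num_completed;  /* total objects ever completed */
};

/* Record that `count` objects were enqueued. */
static void toy_dql_queued(struct toy_dql *q, unsigned int count)
{
    q->num_queued += count;
}

/* Record that `count` objects were consumed and retired. */
static void toy_dql_completed(struct toy_dql *q, unsigned int count)
{
    q->num_completed += count;
}

/* Remaining room under the limit; negative when the limit is exceeded.
 * Unsigned wrap-around followed by the cast gives the signed difference. */
static int toy_dql_avail(const struct toy_dql *q)
{
    return (int)(q->limit + q->num_completed - q->num_queued);
}
```

A transmit path modelled this way would queue, check availability, and pause when the result goes negative until a completion brings it back above zero.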
  19. 29 Nov, 2011 1 commit
  20. 25 Nov, 2011 1 commit
  21. 24 Nov, 2011 1 commit