1. 07 Mar, 2010: 1 commit
      lib: more scalable list_sort() · 835cc0c8
      Committed by Don Mullis
      XFS and UBIFS can pass long lists to list_sort(); this alternative
      implementation scales better, reaching a ~3x performance gain once the
      list outgrows the L2 cache.
      
      Stand-alone program timings were run on a Core 2 Duo (L1 = 32KB, L2 = 4MB)
      with gcc-4.4, using flags extracted from an Ubuntu kernel build.  Object
      size is 581 bytes, compared to 455 for Mark J. Roberts' code.
      
      For both implementations the worst case is a list length just over a
      power of two, and both degrade to roughly the same degree, so here are
      timing results for a range of 2^N+1 lengths.  List elements were 16 bytes
      each including malloc overhead; initial order was random.
      
                            time (msec)
                            Tatham-Roberts
                            |       generic-Mullis-v2
      loop_count  length    |       |    ratio (generic / Tatham)
      4000000       2     206     294    1.427
      2000000       3     176     227    1.289
      1000000       5     199     172    0.864
       500000       9     235     178    0.757
       250000      17     243     182    0.748
       125000      33     261     196    0.750
        62500      65     277     209    0.754
        31250     129     292     219    0.75
        15625     257     317     235    0.741
         7812     513     340     252    0.741
         3906    1025     362     267    0.737
         1953    2049     388     283    0.729  ~ L1 size
          976    4097     556     323    0.580
          488    8193     678     361    0.532
          244   16385     773     395    0.510
          122   32769     844     418    0.495
           61   65537     917     454    0.495
           30  131073    1128     543    0.481
           15  262145    2355     869    0.369  ~ L2 size
            7  524289    5597    1714    0.306
            3 1048577    6218    2022    0.325
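      As a rough check on the "~ L1 size" and "~ L2 size" annotations, multiply
      each list length by the stated 16 bytes per element (reading 32KB and 4MB
      as binary units is an assumption). The ratio column is the
      generic-Mullis-v2 time divided by the Tatham-Roberts time, so values
      below 1 favour the new code:

```latex
\begin{aligned}
2049 \times 16\,\text{B} &= 32\,784\,\text{B} \approx 32\,\text{KiB} = 32\,768\,\text{B}\\
262145 \times 16\,\text{B} &= 4\,194\,320\,\text{B} \approx 4\,\text{MiB} = 4\,194\,304\,\text{B}\\
\text{ratio} &= \frac{t_{\text{generic-Mullis-v2}}}{t_{\text{Tatham-Roberts}}},
\qquad \text{e.g. } \frac{1714}{5597} \approx 0.306 \;\Rightarrow\; \frac{5597}{1714} \approx 3.3\times
\end{aligned}
```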
      
      Mark's code does not actually implement the usual or generic mergesort,
      but rather a variant from Simon Tatham described here:
      
          http://www.chiark.greenend.org.uk/~sgtatham/algorithms/listsort.html
      
      Simon's algorithm performs O(log N) passes over the entire input list,
      doing merges of sublists that double in size on each pass.  The generic
      algorithm instead merges pairs of equal-length lists as early as possible,
      in recursive order.  For either algorithm, the elements that extend the
      list beyond a power-of-two length are a special case, handled as nearly as
      possible as a "rounding up" to a full power of two (POT).
      
      Some intuition for how merge order affects locality of reference can be
      gained by watching this animation:
      
          http://www.sorting-algorithms.com/merge-sort
      
      Simon's algorithm requires only O(1) extra space rather than the generic
      algorithm's O(log N), but in my non-recursive implementation the actual
      O(log N) data is merely a vector of ~20 pointers, which I've put on the
      stack.
      
      Long-running list_sort() calls: If the list passed in may be long, or the
      client's cmp() callback function is slow, the client's cmp() may
      periodically invoke cond_resched() to voluntarily yield the CPU.  All
      inner loops of list_sort() call back to cmp().
      
      Stability of the sort: distinct elements that compare equal emerge from
      the sort in the same order as with Mark's code, for simple test cases.  A
      boot-time test is provided to verify this and other correctness
      requirements.
      
      A kernel that uses drm.ko appears to run normally with this change; I have
      no suitable hardware to similarly test the use by UBIFS.
      
      [akpm@linux-foundation.org: style tweaks, fix comment, make list_sort_test __init]
      Signed-off-by: Don Mullis <don.mullis@gmail.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Artem Bityutskiy <dedekind@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 13 Jan, 2010: 1 commit