1. 26 10月, 2012 1 次提交
    • T
      genalloc: stop crashing the system when destroying a pool · eedce141
      Thadeu Lima de Souza Cascardo 提交于
      The genalloc code uses the bitmap API from include/linux/bitmap.h and
      lib/bitmap.c, which is based on long values.  Both bitmap_set from
      lib/bitmap.c and bitmap_set_ll, which is the lockless version from
      genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in
      the bitmap.
      
      That one uses (1 << bits) - 1, 0b111, if you are setting the first three
      bits.  This means that the API counts from the least significant bits
      (LSB from now on) to the MSB.  The LSB in the first long is bit 0, then.
      The same works for the lookup functions.
      
      The genalloc code uses longs for the bitmap, as it should.  In
      include/linux/genalloc.h, struct gen_pool_chunk has unsigned long
      bits[0] as its last member.  When allocating the struct, genalloc should
      reserve enough space for the bitmap.  This should be a proper number of
      longs that can fit the amount of bits in the bitmap.
      
      However, genalloc allocates an integer number of bytes that fit the
      amount of bits, but may not be an integer amount of longs.  9 bytes, for
      example, could be allocated for 70 bits.
      
      This is a problem in itself if the Least Significat Bit in a long is in
      the byte with the largest address, which happens in Big Endian machines.
      This means genalloc is not allocating the byte in which it will try to
      set or check for a bit.
      
      This may end up in memory corruption, where genalloc will try to set the
      bits it has not allocated.  In fact, genalloc may not set these bits
      because it may find them already set, because they were not zeroed since
      they were not allocated.  And that's what causes a BUG when
      gen_pool_destroy is called and check for any set bits.
      
      What really happens is that genalloc uses kmalloc_node with __GFP_ZERO
      on gen_pool_add_virt.  With SLAB and SLUB, this means the whole slab
      will be cleared, not only the requested bytes.  Since struct
      gen_pool_chunk has a size that is a multiple of 8, and slab sizes are
      multiples of 8, we get lucky and allocate and clear the right amount of
      bytes.
      
      Hower, this is not the case with SLOB or with older code that did memset
      after allocating instead of using __GFP_ZERO.
      
      So, a simple module as this (running 3.6.0), will cause a crash when
      rmmod'ed.
      
        [root@phantom-lp2 foo]# cat foo.c
        #include <linux/kernel.h>
        #include <linux/module.h>
        #include <linux/init.h>
        #include <linux/genalloc.h>
      
        MODULE_LICENSE("GPL");
        MODULE_VERSION("0.1");
      
        static struct gen_pool *foo_pool;
      
        static __init int foo_init(void)
        {
                int ret;
                foo_pool = gen_pool_create(10, -1);
                if (!foo_pool)
                        return -ENOMEM;
                ret = gen_pool_add(foo_pool, 0xa0000000, 32 << 10, -1);
                if (ret) {
                        gen_pool_destroy(foo_pool);
                        return ret;
                }
                return 0;
        }
      
        static __exit void foo_exit(void)
        {
                gen_pool_destroy(foo_pool);
        }
      
        module_init(foo_init);
        module_exit(foo_exit);
        [root@phantom-lp2 foo]# zcat /proc/config.gz | grep SLOB
        CONFIG_SLOB=y
        [root@phantom-lp2 foo]# insmod ./foo.ko
        [root@phantom-lp2 foo]# rmmod foo
        ------------[ cut here ]------------
        kernel BUG at lib/genalloc.c:243!
        cpu 0x4: Vector: 700 (Program Check) at [c0000000bb0e7960]
            pc: c0000000003cb50c: .gen_pool_destroy+0xac/0x110
            lr: c0000000003cb4fc: .gen_pool_destroy+0x9c/0x110
            sp: c0000000bb0e7be0
           msr: 8000000000029032
          current = 0xc0000000bb0e0000
          paca    = 0xc000000006d30e00   softe: 0        irq_happened: 0x01
            pid   = 13044, comm = rmmod
        kernel BUG at lib/genalloc.c:243!
        [c0000000bb0e7ca0] d000000004b00020 .foo_exit+0x20/0x38 [foo]
        [c0000000bb0e7d20] c0000000000dff98 .SyS_delete_module+0x1a8/0x290
        [c0000000bb0e7e30] c0000000000097d4 syscall_exit+0x0/0x94
        --- Exception: c00 (System Call) at 000000800753d1a0
        SP (fffd0b0e640) is in userspace
      Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Benjamin Gaignard <benjamin.gaignard@stericsson.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eedce141
  2. 20 10月, 2012 1 次提交
  3. 11 10月, 2012 1 次提交
  4. 10 10月, 2012 1 次提交
  5. 09 10月, 2012 28 次提交
    • M
      mm: add CONFIG_DEBUG_VM_RB build option · ed8ea815
      Michel Lespinasse 提交于
      Add a CONFIG_DEBUG_VM_RB build option for the previously existing
      DEBUG_MM_RB code.  Now that Andi Kleen modified it to avoid using
      recursive algorithms, we can expose it a bit more.
      
      Also extend this code to validate_mm() after stack expansion, and to check
      that the vma's start and last pgoffs have not changed since the nodes were
      inserted on the anon vma interval tree (as it is important that the nodes
      be reindexed after each such update).
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed8ea815
    • M
      mm: interval tree updates · 9826a516
      Michel Lespinasse 提交于
      Update the generic interval tree code that was introduced in "mm: replace
      vma prio_tree with an interval tree".
      
      Changes:
      
      - fixed 'endpoing' typo noticed by Andrew Morton
      
      - replaced include/linux/interval_tree_tmpl.h, which was used as a
        template (including it automatically defined the interval tree
        functions) with include/linux/interval_tree_generic.h, which only
        defines a preprocessor macro INTERVAL_TREE_DEFINE(), which itself
        defines the interval tree functions when invoked. Now that is a very
        long macro which is unfortunate, but it does make the usage sites
        (lib/interval_tree.c and mm/interval_tree.c) a bit nicer than previously.
      
      - make use of RB_DECLARE_CALLBACKS() in the INTERVAL_TREE_DEFINE() macro,
        instead of duplicating that code in the interval tree template.
      
      - replaced vma_interval_tree_add(), which was actually handling the
        nonlinear and interval tree cases, with vma_interval_tree_insert_after()
        which handles only the interval tree case and has an API that is more
        consistent with the other interval tree handling functions.
        The nonlinear case is now handled explicitly in kernel/fork.c dup_mmap().
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9826a516
    • M
      rbtree: move augmented rbtree functionality to rbtree_augmented.h · 9c079add
      Michel Lespinasse 提交于
      Provide rb_insert_augmented() and rb_erase_augmented() through a new
      rbtree_augmented.h include file.  rb_erase_augmented() is defined there as
      an __always_inline function, in order to allow inlining of augmented
      rbtree callbacks into it.  Since this generates a relatively large
      function, each augmented rbtree user should make sure to have a single
      call site.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c079add
    • M
      prio_tree: remove · 147e615f
      Michel Lespinasse 提交于
      After both prio_tree users have been converted to use red-black trees,
      there is no need to keep around the prio tree library anymore.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      147e615f
    • M
      mm: replace vma prio_tree with an interval tree · 6b2dbba8
      Michel Lespinasse 提交于
      Implement an interval tree as a replacement for the VMA prio_tree.  The
      algorithms are similar to lib/interval_tree.c; however that code can't be
      directly reused as the interval endpoints are not explicitly stored in the
      VMA.  So instead, the common algorithm is moved into a template and the
      details (node type, how to get interval endpoints from the node, etc) are
      filled in using the C preprocessor.
      
      Once the interval tree functions are available, using them as a
      replacement to the VMA prio tree is a relatively simple, mechanical job.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b2dbba8
    • M
      rbtree: add prio tree and interval tree tests · fff3fd8a
      Michel Lespinasse 提交于
      Patch 1 implements support for interval trees, on top of the augmented
      rbtree API. It also adds synthetic tests to compare the performance of
      interval trees vs prio trees. Short answers is that interval trees are
      slightly faster (~25%) on insert/erase, and much faster (~2.4 - 3x)
      on search. It is debatable how realistic the synthetic test is, and I have
      not made such measurements yet, but my impression is that interval trees
      would still come out faster.
      
      Patch 2 uses a preprocessor template to make the interval tree generic,
      and uses it as a replacement for the vma prio_tree.
      
      Patch 3 takes the other prio_tree user, kmemleak, and converts it to use
      a basic rbtree. We don't actually need the augmented rbtree support here
      because the intervals are always non-overlapping.
      
      Patch 4 removes the now-unused prio tree library.
      
      Patch 5 proposes an additional optimization to rb_erase_augmented, now
      providing it as an inline function so that the augmented callbacks can be
      inlined in. This provides an additional 5-10% performance improvement
      for the interval tree insert/erase benchmark. There is a maintainance cost
      as it exposes augmented rbtree users to some of the rbtree library internals;
      however I think this cost shouldn't be too high as I expect the augmented
      rbtree will always have much less users than the base rbtree.
      
      I should probably add a quick summary of why I think it makes sense to
      replace prio trees with augmented rbtree based interval trees now.  One of
      the drivers is that we need augmented rbtrees for Rik's vma gap finding
      code, and once you have them, it just makes sense to use them for interval
      trees as well, as this is the simpler and more well known algorithm.  prio
      trees, in comparison, seem *too* clever: they impose an additional 'heap'
      constraint on the tree, which they use to guarantee a faster worst-case
      complexity of O(k+log N) for stabbing queries in a well-balanced prio
      tree, vs O(k*log N) for interval trees (where k=number of matches,
      N=number of intervals).  Now this sounds great, but in practice prio trees
      don't realize this theorical benefit.  First, the additional constraint
      makes them harder to update, so that the kernel implementation has to
      simplify things by balancing them like a radix tree, which is not always
      ideal.  Second, the fact that there are both index and heap properties
      makes both tree manipulation and search more complex, which results in a
      higher multiplicative time constant.  As it turns out, the simple interval
      tree algorithm ends up running faster than the more clever prio tree.
      
      This patch:
      
      Add two test modules:
      
      - prio_tree_test measures the performance of lib/prio_tree.c, both for
        insertion/removal and for stabbing searches
      
      - interval_tree_test measures the performance of a library of equivalent
        functionality, built using the augmented rbtree support.
      
      In order to support the second test module, lib/interval_tree.c is
      introduced. It is kept separate from the interval_tree_test main file
      for two reasons: first we don't want to provide an unfair advantage
      over prio_tree_test by having everything in a single compilation unit,
      and second there is the possibility that the interval tree functionality
      could get some non-test users in kernel over time.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fff3fd8a
    • M
      rbtree: add RB_DECLARE_CALLBACKS() macro · 3908836a
      Michel Lespinasse 提交于
      As proposed by Peter Zijlstra, this makes it easier to define the augmented
      rbtree callbacks.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3908836a
    • M
      rbtree: remove prior augmented rbtree implementation · 9d9e6f97
      Michel Lespinasse 提交于
      convert arch/x86/mm/pat_rbtree.c to the proposed augmented rbtree api
      and remove the old augmented rbtree implementation.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d9e6f97
    • M
      rbtree: faster augmented rbtree manipulation · 14b94af0
      Michel Lespinasse 提交于
      Introduce new augmented rbtree APIs that allow minimal recalculation of
      augmented node information.
      
      A new callback is added to the rbtree insertion and erase rebalancing
      functions, to be called on each tree rotations. Such rotations preserve
      the subtree's root augmented value, but require recalculation of the one
      child that was previously located at the subtree root.
      
      In the insertion case, the handcoded search phase must be updated to
      maintain the augmented information on insertion, and then the rbtree
      coloring/rebalancing algorithms keep it up to date.
      
      In the erase case, things are more complicated since it is library
      code that manipulates the rbtree in order to remove internal nodes.
      This requires a couple additional callbacks to copy a subtree's
      augmented value when a new root is stitched in, and to recompute
      augmented values down the ancestry path when a node is removed from
      the tree.
      
      In order to preserve maximum speed for the non-augmented case,
      we provide two versions of each tree manipulation function.
      rb_insert_augmented() is the augmented equivalent of rb_insert_color(),
      and rb_erase_augmented() is the augmented equivalent of rb_erase().
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      14b94af0
    • M
      rbtree: augmented rbtree test · dadf9353
      Michel Lespinasse 提交于
      Small test to measure the performance of augmented rbtrees.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dadf9353
    • M
      rbtree: low level optimizations in rb_erase() · 4f035ad6
      Michel Lespinasse 提交于
      Various minor optimizations in rb_erase():
      - Avoid multiple loading of node->__rb_parent_color when computing parent
        and color information (possibly not in close sequence, as there might
        be further branches in the algorithm)
      - In the 1-child subcase of case 1, copy the __rb_parent_color field from
        the erased node to the child instead of recomputing it from the desired
        parent and color
      - When searching for the erased node's successor, differentiate between
        cases 2 and 3 based on whether any left links were followed. This avoids
        a condition later down.
      - In case 3, keep a pointer to the erased node's right child so we don't
        have to refetch it later to adjust its parent.
      - In the no-childs subcase of cases 2 and 3, place the rebalance assigment
        last so that the compiler can remove the following if(rebalance) test.
      
      Also, added some comments to illustrate cases 2 and 3.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4f035ad6
    • M
      rbtree: handle 1-child recoloring in rb_erase() instead of rb_erase_color() · 46b6135a
      Michel Lespinasse 提交于
      An interesting observation for rb_erase() is that when a node has
      exactly one child, the node must be black and the child must be red.
      An interesting consequence is that removing such a node can be done by
      simply replacing it with its child and making the child black,
      which we can do efficiently in rb_erase(). __rb_erase_color() then
      only needs to handle the no-childs case and can be modified accordingly.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      46b6135a
    • M
      rbtree: place easiest case first in rb_erase() · 60670b80
      Michel Lespinasse 提交于
      In rb_erase, move the easy case (node to erase has no more than
      1 child) first. I feel the code reads easier that way.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      60670b80
    • M
      rbtree: add __rb_change_child() helper function · 7abc704a
      Michel Lespinasse 提交于
      Add __rb_change_child() as an inline helper function to replace code that
      would otherwise be duplicated 4 times in the source.
      
      No changes to binary size or speed.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7abc704a
    • M
      rbtree test: fix sparse warning about 64-bit constant · 28d75309
      Michel Lespinasse 提交于
      Just a small fix to make sparse happy.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Reported-by: NFengguang Wu <wfg@linux.intel.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      28d75309
    • M
      rbtree: optimize fetching of sibling node · 59633abf
      Michel Lespinasse 提交于
      When looking to fetch a node's sibling, we went through a sequence of:
      - check if node is the parent's left child
      - if it is, then fetch the parent's right child
      
      This can be replaced with:
      - fetch the parent's right child as an assumed sibling
      - check that node is NOT the fetched child
      
      This avoids fetching the parent's left child when node is actually
      that child. Saves a bit on code size, though it doesn't seem to make
      a large difference in speed.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      59633abf
    • M
      rbtree: coding style adjustments · 7ce6ff9e
      Michel Lespinasse 提交于
      Set comment and indentation style to be consistent with linux coding style
      and the rest of the file, as suggested by Peter Zijlstra
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ce6ff9e
    • M
      rbtree: low level optimizations in __rb_erase_color() · 6280d235
      Michel Lespinasse 提交于
      In __rb_erase_color(), we often already have pointers to the nodes being
      rotated and/or know what their colors must be, so we can generate more
      efficient code than the generic __rb_rotate_left() and __rb_rotate_right()
      functions.
      
      Also when the current node is red or when flipping the sibling's color,
      the parent is already known so we can use the more efficient
      rb_set_parent_color() function to set the desired color.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6280d235
    • M
      rbtree: optimize case selection logic in __rb_erase_color() · e125d147
      Michel Lespinasse 提交于
      In __rb_erase_color(), we have to select one of 3 cases depending on the
      color on the 'other' node children.  If both children are black, we flip a
      few node colors and iterate.  Otherwise, we do either one or two tree
      rotations, depending on the color of the 'other' child opposite to 'node',
      and then we are done.
      
      The corresponding logic had duplicate checks for the color of the 'other'
      child opposite to 'node'.  It was checking it first to determine if both
      children are black, and then to determine how many tree rotations are
      required.  Rearrange the logic to avoid that extra check.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e125d147
    • M
      rbtree: adjust node color in __rb_erase_color() only when necessary · d6ff1273
      Michel Lespinasse 提交于
      In __rb_erase_color(), we were always setting a node to black after
      exiting the main loop.  And in one case, after fixing up the tree to
      satisfy all rbtree invariants, we were setting the current node to root
      just to guarantee a loop exit, at which point the root would be set to
      black.  However this is not necessary, as the root of an rbtree is already
      known to be black.  The only case where the color flip is required is when
      we exit the loop due to the current node being red, and it's easiest to
      just do the flip at that point instead of doing it after the loop.
      
      [adrian.hunter@intel.com: perf tools: fix build for another rbtree.c change]
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d6ff1273
    • M
      rbtree: low level optimizations in rb_insert_color() · 5bc9188a
      Michel Lespinasse 提交于
      - Use the newly introduced rb_set_parent_color() function to flip the color
        of nodes whose parent is already known.
      - Optimize rb_parent() when the node is known to be red - there is no need
        to mask out the color in that case.
      - Flipping gparent's color to red requires us to fetch its rb_parent_color
        field, so we can reuse it as the parent value for the next loop iteration.
      - Do not use __rb_rotate_left() and __rb_rotate_right() to handle tree
        rotations: we already have pointers to all relevant nodes, and know their
        colors (either because we want to adjust it, or because we've tested it,
        or we can deduce it as black due to the node proximity to a known red node).
        So we can generate more efficient code by making use of the node pointers
        we already have, and setting both the parent and color attributes for
        nodes all at once. Also in Case 2, some node attributes don't have to
        be set because we know another tree rotation (Case 3) will always follow
        and override them.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5bc9188a
    • M
      rbtree: adjust root color in rb_insert_color() only when necessary · 6d58452d
      Michel Lespinasse 提交于
      The root node of an rbtree must always be black.  However,
      rb_insert_color() only needs to maintain this invariant when it has been
      broken - that is, when it exits the loop due to the current (red) node
      being the root.  In all other cases (exiting after tree rotations, or
      exiting due to an existing black parent) the invariant is already
      satisfied, so there is no need to adjust the root node color.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d58452d
    • M
      rbtree: break out of rb_insert_color loop after tree rotation · 1f052865
      Michel Lespinasse 提交于
      It is a well known property of rbtrees that insertion never requires more
      than two tree rotations.  In our implementation, after one loop iteration
      identified one or two necessary tree rotations, we would iterate and look
      for more.  However at that point the node's parent would always be black,
      which would cause us to exit the loop.
      
      We can make the code flow more obvious by just adding a break statement
      after the tree rotations, where we know we are done.  Additionally, in the
      cases where two tree rotations are necessary, we don't have to update the
      'node' pointer as it wouldn't be used until the next loop iteration, which
      we now avoid due to this break statement.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f052865
    • M
      rbtree: performance and correctness test · 910a742d
      Michel Lespinasse 提交于
      This small module helps measure the performance of rbtree insert and
      erase.
      
      Additionally, we run a few correctness tests to check that the rbtrees
      have all desired properties:
      
      - contains the right number of nodes in the order desired,
      - never two consecutive red nodes on any path,
      - all paths to leaf nodes have the same number of black nodes,
      - root node is black
      
      [akpm@linux-foundation.org: fix printk warning: sparc64 cycles_t is unsigned long]
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      910a742d
    • M
      rbtree: move some implementation details from rbtree.h to rbtree.c · bf7ad8ee
      Michel Lespinasse 提交于
      rbtree users must use the documented APIs to manipulate the tree
      structure.  Low-level helpers to manipulate node colors and parenthood are
      not part of that API, so move them to lib/rbtree.c
      
      [dwmw2@infradead.org: fix jffs2 build issue due to renamed __rb_parent_color field]
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bf7ad8ee
    • M
      rbtree: empty nodes have no color · 4c199a93
      Michel Lespinasse 提交于
      Empty nodes have no color.  We can make use of this property to simplify
      the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros.  Also,
      we can get rid of the rb_init_node function which had been introduced by
      commit 88d19cf3 ("timers: Add rb_init_node() to allow for stack
      allocated rb nodes") to avoid some issue with the empty node's color not
      being initialized.
      
      I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
      doing there, though.  axboe introduced them in commit 10fd48f2
      ("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev").  The way I
      see it, the 'empty node' abstraction is only used by rbtree users to
      flag nodes that they haven't inserted in any rbtree, so asking the
      predecessor or successor of such nodes doesn't make any sense.
      
      One final rb_init_node() caller was recently added in sysctl code to
      implement faster sysctl name lookups.  This code doesn't make use of
      RB_EMPTY_NODE at all, and from what I could see it only called
      rb_init_node() under the mistaken assumption that such initialization was
      required before node insertion.
      
      [sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4c199a93
    • C
      Kconfig: clean up the long arch list for the DEBUG_BUGVERBOSE config option · 9b2a60c4
      Catalin Marinas 提交于
      Introduce HAVE_DEBUG_BUGVERBOSE config option and select it in
      corresponding architecture Kconfig files.  Architectures that already
      select GENERIC_BUG don't need to select HAVE_DEBUG_BUGVERBOSE.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9b2a60c4
    • C
      Kconfig: clean up the long arch list for the DEBUG_KMEMLEAK config option · b69ec42b
      Catalin Marinas 提交于
      Introduce HAVE_DEBUG_KMEMLEAK config option and select it in corresponding
      architecture Kconfig files.  DEBUG_KMEMLEAK now only depends on
      HAVE_DEBUG_KMEMLEAK.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b69ec42b
  6. 08 10月, 2012 6 次提交
    • D
      MPILIB: Provide a function to read raw data into an MPI · e1045992
      David Howells 提交于
      Provide a function to read raw data of a predetermined size into an MPI rather
      than expecting the size to be encoded within the data.  The data is assumed to
      represent an unsigned integer, and the resulting MPI will be positive.
      
      The function looks like this:
      
      	MPI mpi_read_raw_data(const void *, size_t);
      
      This is useful for reading ASN.1 integer primitives where the length is encoded
      in the ASN.1 metadata.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      e1045992
    • D
      X.509: Add an ASN.1 decoder · 42d5ec27
      David Howells 提交于
      Add an ASN.1 BER/DER/CER decoder.  This uses the bytecode from the ASN.1
      compiler in the previous patch to inform it as to what to expect to find in the
      encoded byte stream.  The output from the compiler also tells it what functions
      to call on what tags, thus allowing the caller to retrieve information.
      
      The decoder is called as follows:
      
      	int asn1_decoder(const struct asn1_decoder *decoder,
      			 void *context,
      			 const unsigned char *data,
      			 size_t datalen);
      
      The decoder argument points to the bytecode from the ASN.1 compiler.  context
      is the caller's context and is passed to the action functions.  data and
      datalen define the byte stream to be decoded.
      
      Note that the decoder is currently limited to datalen being less than 64K.
      This reduces the amount of stack space used by the decoder because ASN.1 is a
      nested construct.  Similarly, the decoder is limited to a maximum of 10 levels
      of constructed data outside of a leaf node also in an effort to keep stack
      usage down.
      
      These restrictions can be raised if necessary.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      42d5ec27
    • D
      X.509: Add utility functions to render OIDs as strings · 4f73175d
      David Howells 提交于
      Add a pair of utility functions to render OIDs as strings.  The first takes an
      encoded OID and turns it into a "a.b.c.d" form string:
      
      	int sprint_oid(const void *data, size_t datasize,
      		       char *buffer, size_t bufsize);
      
      The second takes an OID enum index and calls the first on the data held
      therein:
      
      	int sprint_OID(enum OID oid, char *buffer, size_t bufsize);
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      4f73175d
    • D
      X.509: Implement simple static OID registry · a77ad6ea
      David Howells 提交于
      Implement a simple static OID registry that allows the mapping of an encoded
      OID to an enum value for ease of use.
      
      The OID registry index enum appears in the:
      
      	linux/oid_registry.h
      
      header file.  A script generates the registry from lines in the header file
      that look like:
      
      	<sp*>OID_foo,<sp*>/*<sp*>1.2.3.4<sp*>*/
      
      The actual OID is taken to be represented by the numbers with interpolated
      dots in the comment.
      
      All other lines in the header are ignored.
      
      The registry is queries by calling:
      
      	OID look_up_oid(const void *data, size_t datasize);
      
      This returns a number from the registry enum representing the OID if found or
      OID__NR if not.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      a77ad6ea
    • D
      MPILIB: Reinstate mpi_cmp[_ui]() and export for RSA signature verification · 12f008b6
      David Howells 提交于
      Reinstate and export mpi_cmp() and mpi_cmp_ui() from the MPI library for use by
      RSA signature verification as per RFC3447 section 5.2.2 step 1.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      12f008b6
    • D
      MPILIB: Provide count_leading/trailing_zeros() based on arch functions · aacf29bf
      David Howells 提交于
      Provide count_leading/trailing_zeros() macros based on extant arch bit scanning
      functions rather than reimplementing from scratch in MPILIB.
      
      Whilst we're at it, turn count_foo_zeros(n, x) into n = count_foo_zeros(x).
      
      Also move the definition to asm-generic as other people may be interested in
      using it.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      aacf29bf
  7. 06 10月, 2012 2 次提交