- 18 12月, 2012 6 次提交
-
-
由 Oleg Nesterov 提交于
Add lockdep annotations. Not only this can help to find the potential problems, we do not want the false warnings if, say, the task takes two different percpu_rw_semaphore's for reading. IOW, at least ->rw_sem should not use a single class. This patch exposes this internal lock to lockdep so that it represents the whole percpu_rw_semaphore. This way we do not need to add another "fake" ->lockdep_map and lock_class_key. More importantly, this also makes the output from lockdep much more understandable if it finds the problem. In short, with this patch from lockdep pov percpu_down_read() and percpu_up_read() acquire/release ->rw_sem for reading, this matches the actual semantics. This abuses __up_read() but I hope this is fine and in fact I'd like to have down_read_no_lockdep() as well, percpu_down_read_recursive_readers() will need it. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Anton Arapov <anton@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
percpu_rw_semaphore->writer_mutex was only added to simplify the initial rewrite, the only thing it protects is clear_fast_ctr() which otherwise could be called by multiple writers. ->rw_sem is enough to serialize the writers. Kill this mutex and add "atomic_t write_ctr" instead. The writers increment/decrement this counter, the readers check it is zero instead of mutex_is_locked(). Move atomic_add(clear_fast_ctr(), slow_read_ctr) under down_write() to avoid the race with other writers. This is a bit sub-optimal, only the first writer needs this and we do not need to exclude the readers at this stage. But this is simple, we do not want another internal lock until we add more features. And this speeds up the write-contended case. Before this patch the racing writers sleep in synchronize_sched_expedited() sequentially, with this patch multiple synchronize_sched_expedited's can "overlap" with each other. Note: we can do more optimizations, this is only the first step. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Anton Arapov <anton@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Currently the writer does msleep() plus synchronize_sched() 3 times to acquire/release the semaphore, and during this time the readers are blocked completely. Even if the "write" section was not actually started or if it was already finished. With this patch down_write/up_write does synchronize_sched() twice and down_read/up_read are still possible during this time, just they use the slow path. percpu_down_write() first forces the readers to use rw_semaphore and increment the "slow" counter to take the lock for reading, then it takes that rw_semaphore for writing and blocks the readers. Also. With this patch the code relies on the documented behaviour of synchronize_sched(), it doesn't try to pair synchronize_sched() with barrier. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anton Arapov <anton@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Beulich 提交于
This is another step towards better standard conformance. Rather than adding a local buffer to store the specified portion of the string (with the need to enforce an arbitrary maximum supported width to limit the buffer size), do a maximum width conversion and then drop as much of it as is necessary to meet the caller's request. Also fail on negative field widths. Uses the deprecated simple_strto*() functions because kstrtoXX() fail on non-zero terminated strings. Signed-off-by: NJan Beulich <jbeulich@suse.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Shevchenko 提交于
Remove the custom implementation of the functionality similar to kbasename(). Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jason Gunthorpe 提交于
Documentation/printk-formats.txt says to use %zd for a ssize_t argument and some drivers do. Unfortunately this prints a positive number for negative values eg: tpm_tis 70030000.tpm_tis: tpm_transmit: tpm_send: error 4294967234 Add a case to va_args a ssize_t type if the interpretation should be signed. Tested on PPC32. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 12月, 2012 1 次提交
-
-
由 Joonsoo Kim 提交于
It is strange that alloc_bootmem() returns a virtual address and free_bootmem() requires a physical address. Anyway, free_bootmem()'s first parameter should be physical address. There are some call sites for free_bootmem() with virtual address. So fix them. [akpm@linux-foundation.org: improve free_bootmem() and free_bootmem_pate() documentation] Signed-off-by: NJoonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 12月, 2012 2 次提交
-
-
由 Nadia Yvette Chambers 提交于
I've legally changed my name with New York State, the US Social Security Administration, et al. This patch propagates the name change and change in initials and login to comments in the kernel source as well. Signed-off-by: NNadia Yvette Chambers <nyc@holomorphy.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
由 Tim Gardner 提交于
It is $(obj)/oid_registry.o that is dependent on $(obj)/oid_registry_data.c. The object file cannot be built until $(obj)/oid_registry_data.c has been generated. A periodic and hard to reproduce parallel build failure is due to this incorrect lib/Makefile dependency. The compile error is completely disingenuous. GEN lib/oid_registry_data.c Compiling 49 OIDs CC lib/oid_registry.o gcc: error: lib/oid_registry.c: No such file or directory gcc: fatal error: no input files compilation terminated. make[3]: *** [lib/oid_registry.o] Error 4 Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Michel Lespinasse <walken@google.com> Cc: David Howells <dhowells@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: NTim Gardner <tim.gardner@canonical.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 05 12月, 2012 1 次提交
-
-
由 David Howells 提交于
Fix an error in asn1_find_indefinite_length() whereby small definite length elements of size 0x7f are incorrecly classified as non-small. Without this fix, an error will be given as the length of the length will be perceived as being very much greater than the maximum supported size. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 03 12月, 2012 1 次提交
-
-
由 Masanari Iida 提交于
Correct spelling typo within various Kconfig. Signed-off-by: NMasanari Iida <standby24x7@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 29 11月, 2012 1 次提交
-
-
由 Bill Pemberton 提交于
CONFIG_HOTPLUG is being removed so kobject_uevent needs to always be part of the library. Signed-off-by: NBill Pemberton <wfp5p@virginia.edu> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 24 11月, 2012 1 次提交
-
-
由 Manuel Lauss 提交于
Since 4.4 GCC on MIPS no longer recognizes the "h" constraint, leading to this build failure: CC lib/mpi/generic_mpih-mul1.o lib/mpi/generic_mpih-mul1.c: In function 'mpihelp_mul_1': lib/mpi/generic_mpih-mul1.c:50:3: error: impossible constraint in 'asm' This patch updates MPI with the latest umul_ppm implementations for MIPS. Signed-off-by: NManuel Lauss <manuel.lauss@gmail.com> Cc: Linux-MIPS <linux-mips@linux-mips.org> Cc: Dmitry Kasatkin <dmitry.kasatkin@intel.com> Cc: James Morris <jmorris@namei.org> Patchwork: https://patchwork.linux-mips.org/patch/4612/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
- 14 11月, 2012 1 次提交
-
-
由 Paul E. McKenney 提交于
The RCU CPU stall warning timeout has defaulted to 60 seconds for some years, with almost no false positives. This commit therefore reduces the default to 21 seconds, slightly shorter than the new soft-lockup timeout. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 30 10月, 2012 7 次提交
-
-
由 Alexander Duyck 提交于
Currently swiotlb is the only consumer for swiotlb_bounce. Since that is the case it doesn't make much sense to be exporting it so make it a static function only. In addition we can save a few more lines of code by making it so that it accepts the DMA address as a physical address instead of a virtual one. This is the last piece in essentially pushing all of the DMA address values to use physical addresses in swiotlb. In order to clarify things since we now have 2 physical addresses in use inside of swiotlb_bounce I am renaming phys to orig_addr, and dma_addr to tlb_addr. This way is should be clear that orig_addr is contained within io_orig_addr and tlb_addr is an address within the io_tlb_addr buffer. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Alexander Duyck 提交于
This change makes it so that the sync functionality also uses physical addresses. This helps to further reduce the use of virt_to_phys and phys_to_virt functions. In order to clarify things since we now have 2 physical addresses in use inside of swiotlb_tbl_sync_single I am renaming phys to orig_addr, and dma_addr to tlb_addr. This way is should be clear that orig_addr is contained within io_orig_addr and tlb_addr is an address within the io_tlb_addr buffer. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Alexander Duyck 提交于
This change makes it so that the unmap functionality also uses physical addresses. This helps to further reduce the use of virt_to_phys and phys_to_virt functions. In order to clarify things since we now have 2 physical addresses in use inside of swiotlb_tbl_unmap_single I am renaming phys to orig_addr, and dma_addr to tlb_addr. This way is should be clear that orig_addr is contained within io_orig_addr and tlb_addr is an address within the io_tlb_addr buffer. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Alexander Duyck 提交于
This change makes it so that swiotlb_tbl_map_single will return a physical address instead of a virtual address when called. The advantage to this once again is that we are avoiding a number of virt_to_phys and phys_to_virt translations by working with everything as a physical address. One change I had to make in order to support using physical addresses is that I could no longer trust 0 to be a invalid physical address on all platforms. So instead I made it so that ~0 is returned on error. This should never be a valid return value as it implies that only one byte would be available for use. In order to clarify things since we now have 2 physical addresses in use inside of swiotlb_tbl_map_single I am renaming phys to orig_addr, and dma_addr to tlb_addr. This way is should be clear that orig_addr is contained within io_orig_addr and tlb_addr is an address within the io_tlb_addr buffer. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Alexander Duyck 提交于
This change makes it so that we can avoid virt_to_phys overhead when using the io_tlb_overflow_buffer. My original plan was to completely remove the value and replace it with a constant but I had seen that there were recent patches that stated this couldn't be done until all device drivers that depended on that functionality be updated. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Alexander Duyck 提交于
This change replaces all references to the virtual address for io_tlb_start with references to the physical address io_tlb_end. The main advantage of replacing the virtual address with a physical address is that we can avoid having to do multiple translations from the virtual address to the physical one needed for testing an existing DMA address. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Alexander Duyck 提交于
This change replaces all references to the virtual address for io_tlb_end with references to the physical address io_tlb_end. The main advantage of replacing the virtual address with a physical address is that we can avoid having to do multiple translations from the virtual address to the physical one needed for testing an existing DMA address. Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- 26 10月, 2012 1 次提交
-
-
The genalloc code uses the bitmap API from include/linux/bitmap.h and lib/bitmap.c, which is based on long values. Both bitmap_set from lib/bitmap.c and bitmap_set_ll, which is the lockless version from genalloc.c, use BITMAP_LAST_WORD_MASK to set the first bits in a long in the bitmap. That one uses (1 << bits) - 1, 0b111, if you are setting the first three bits. This means that the API counts from the least significant bits (LSB from now on) to the MSB. The LSB in the first long is bit 0, then. The same works for the lookup functions. The genalloc code uses longs for the bitmap, as it should. In include/linux/genalloc.h, struct gen_pool_chunk has unsigned long bits[0] as its last member. When allocating the struct, genalloc should reserve enough space for the bitmap. This should be a proper number of longs that can fit the amount of bits in the bitmap. However, genalloc allocates an integer number of bytes that fit the amount of bits, but may not be an integer amount of longs. 9 bytes, for example, could be allocated for 70 bits. This is a problem in itself if the Least Significat Bit in a long is in the byte with the largest address, which happens in Big Endian machines. This means genalloc is not allocating the byte in which it will try to set or check for a bit. This may end up in memory corruption, where genalloc will try to set the bits it has not allocated. In fact, genalloc may not set these bits because it may find them already set, because they were not zeroed since they were not allocated. And that's what causes a BUG when gen_pool_destroy is called and check for any set bits. What really happens is that genalloc uses kmalloc_node with __GFP_ZERO on gen_pool_add_virt. With SLAB and SLUB, this means the whole slab will be cleared, not only the requested bytes. Since struct gen_pool_chunk has a size that is a multiple of 8, and slab sizes are multiples of 8, we get lucky and allocate and clear the right amount of bytes. Hower, this is not the case with SLOB or with older code that did memset after allocating instead of using __GFP_ZERO. So, a simple module as this (running 3.6.0), will cause a crash when rmmod'ed. [root@phantom-lp2 foo]# cat foo.c #include <linux/kernel.h> #include <linux/module.h> #include <linux/init.h> #include <linux/genalloc.h> MODULE_LICENSE("GPL"); MODULE_VERSION("0.1"); static struct gen_pool *foo_pool; static __init int foo_init(void) { int ret; foo_pool = gen_pool_create(10, -1); if (!foo_pool) return -ENOMEM; ret = gen_pool_add(foo_pool, 0xa0000000, 32 << 10, -1); if (ret) { gen_pool_destroy(foo_pool); return ret; } return 0; } static __exit void foo_exit(void) { gen_pool_destroy(foo_pool); } module_init(foo_init); module_exit(foo_exit); [root@phantom-lp2 foo]# zcat /proc/config.gz | grep SLOB CONFIG_SLOB=y [root@phantom-lp2 foo]# insmod ./foo.ko [root@phantom-lp2 foo]# rmmod foo ------------[ cut here ]------------ kernel BUG at lib/genalloc.c:243! cpu 0x4: Vector: 700 (Program Check) at [c0000000bb0e7960] pc: c0000000003cb50c: .gen_pool_destroy+0xac/0x110 lr: c0000000003cb4fc: .gen_pool_destroy+0x9c/0x110 sp: c0000000bb0e7be0 msr: 8000000000029032 current = 0xc0000000bb0e0000 paca = 0xc000000006d30e00 softe: 0 irq_happened: 0x01 pid = 13044, comm = rmmod kernel BUG at lib/genalloc.c:243! [c0000000bb0e7ca0] d000000004b00020 .foo_exit+0x20/0x38 [foo] [c0000000bb0e7d20] c0000000000dff98 .SyS_delete_module+0x1a8/0x290 [c0000000bb0e7e30] c0000000000097d4 syscall_exit+0x0/0x94 --- Exception: c00 (System Call) at 000000800753d1a0 SP (fffd0b0e640) is in userspace Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Benjamin Gaignard <benjamin.gaignard@stericsson.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 10月, 2012 1 次提交
-
-
由 Ming Lei 提交于
If there is only one match, the unique matched entry should be returned. Without the fix, the upcoming dma debug interfaces ("dma-debug: new interfaces to debug dma mapping errors") can't work reliably because only device and dma_addr are passed to dma_mapping_error(). Signed-off-by: NMing Lei <ming.lei@canonical.com> Reported-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Tested-by: NShuah Khan <shuah.khan@hp.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Jakub Kicinski <kubakici@wp.pl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 10月, 2012 1 次提交
-
-
由 Ezequiel Garcia 提交于
Previously kvasprintf() allocation was being done through kmalloc(), thus producing an inaccurate trace report. This is a common problem: in order to get accurate callsite tracing, a lib/utils function shouldn't allocate kmalloc but instead use kmalloc_track_caller. Signed-off-by: NEzequiel Garcia <elezegarcia@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 10月, 2012 1 次提交
-
-
由 David Howells 提交于
asn1_find_indefinite_length() returns an error indicator of -1, which the caller asn1_ber_decoder() places in a size_t (which is usually unsigned) and then checks to see whether it is less than 0 (which it can't be). This can lead to the following warning: lib/asn1_decoder.c:320 asn1_ber_decoder() warn: unsigned 'len' is never less than zero. Instead, asn1_find_indefinite_length() update the caller's idea of the data cursor and length separately from returning the error code. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 09 10月, 2012 15 次提交
-
-
由 Michel Lespinasse 提交于
Add a CONFIG_DEBUG_VM_RB build option for the previously existing DEBUG_MM_RB code. Now that Andi Kleen modified it to avoid using recursive algorithms, we can expose it a bit more. Also extend this code to validate_mm() after stack expansion, and to check that the vma's start and last pgoffs have not changed since the nodes were inserted on the anon vma interval tree (as it is important that the nodes be reindexed after each such update). Signed-off-by: NMichel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Update the generic interval tree code that was introduced in "mm: replace vma prio_tree with an interval tree". Changes: - fixed 'endpoing' typo noticed by Andrew Morton - replaced include/linux/interval_tree_tmpl.h, which was used as a template (including it automatically defined the interval tree functions) with include/linux/interval_tree_generic.h, which only defines a preprocessor macro INTERVAL_TREE_DEFINE(), which itself defines the interval tree functions when invoked. Now that is a very long macro which is unfortunate, but it does make the usage sites (lib/interval_tree.c and mm/interval_tree.c) a bit nicer than previously. - make use of RB_DECLARE_CALLBACKS() in the INTERVAL_TREE_DEFINE() macro, instead of duplicating that code in the interval tree template. - replaced vma_interval_tree_add(), which was actually handling the nonlinear and interval tree cases, with vma_interval_tree_insert_after() which handles only the interval tree case and has an API that is more consistent with the other interval tree handling functions. The nonlinear case is now handled explicitly in kernel/fork.c dup_mmap(). Signed-off-by: NMichel Lespinasse <walken@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Santos <daniel.santos@pobox.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Provide rb_insert_augmented() and rb_erase_augmented() through a new rbtree_augmented.h include file. rb_erase_augmented() is defined there as an __always_inline function, in order to allow inlining of augmented rbtree callbacks into it. Since this generates a relatively large function, each augmented rbtree user should make sure to have a single call site. Signed-off-by: NMichel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
After both prio_tree users have been converted to use red-black trees, there is no need to keep around the prio tree library anymore. Signed-off-by: NMichel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Implement an interval tree as a replacement for the VMA prio_tree. The algorithms are similar to lib/interval_tree.c; however that code can't be directly reused as the interval endpoints are not explicitly stored in the VMA. So instead, the common algorithm is moved into a template and the details (node type, how to get interval endpoints from the node, etc) are filled in using the C preprocessor. Once the interval tree functions are available, using them as a replacement to the VMA prio tree is a relatively simple, mechanical job. Signed-off-by: NMichel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Patch 1 implements support for interval trees, on top of the augmented rbtree API. It also adds synthetic tests to compare the performance of interval trees vs prio trees. Short answers is that interval trees are slightly faster (~25%) on insert/erase, and much faster (~2.4 - 3x) on search. It is debatable how realistic the synthetic test is, and I have not made such measurements yet, but my impression is that interval trees would still come out faster. Patch 2 uses a preprocessor template to make the interval tree generic, and uses it as a replacement for the vma prio_tree. Patch 3 takes the other prio_tree user, kmemleak, and converts it to use a basic rbtree. We don't actually need the augmented rbtree support here because the intervals are always non-overlapping. Patch 4 removes the now-unused prio tree library. Patch 5 proposes an additional optimization to rb_erase_augmented, now providing it as an inline function so that the augmented callbacks can be inlined in. This provides an additional 5-10% performance improvement for the interval tree insert/erase benchmark. There is a maintainance cost as it exposes augmented rbtree users to some of the rbtree library internals; however I think this cost shouldn't be too high as I expect the augmented rbtree will always have much less users than the base rbtree. I should probably add a quick summary of why I think it makes sense to replace prio trees with augmented rbtree based interval trees now. One of the drivers is that we need augmented rbtrees for Rik's vma gap finding code, and once you have them, it just makes sense to use them for interval trees as well, as this is the simpler and more well known algorithm. prio trees, in comparison, seem *too* clever: they impose an additional 'heap' constraint on the tree, which they use to guarantee a faster worst-case complexity of O(k+log N) for stabbing queries in a well-balanced prio tree, vs O(k*log N) for interval trees (where k=number of matches, N=number of intervals). Now this sounds great, but in practice prio trees don't realize this theorical benefit. First, the additional constraint makes them harder to update, so that the kernel implementation has to simplify things by balancing them like a radix tree, which is not always ideal. Second, the fact that there are both index and heap properties makes both tree manipulation and search more complex, which results in a higher multiplicative time constant. As it turns out, the simple interval tree algorithm ends up running faster than the more clever prio tree. This patch: Add two test modules: - prio_tree_test measures the performance of lib/prio_tree.c, both for insertion/removal and for stabbing searches - interval_tree_test measures the performance of a library of equivalent functionality, built using the augmented rbtree support. In order to support the second test module, lib/interval_tree.c is introduced. It is kept separate from the interval_tree_test main file for two reasons: first we don't want to provide an unfair advantage over prio_tree_test by having everything in a single compilation unit, and second there is the possibility that the interval tree functionality could get some non-test users in kernel over time. Signed-off-by: NMichel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
As proposed by Peter Zijlstra, this makes it easier to define the augmented rbtree callbacks. Signed-off-by: NMichel Lespinasse <walken@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
convert arch/x86/mm/pat_rbtree.c to the proposed augmented rbtree api and remove the old augmented rbtree implementation. Signed-off-by: NMichel Lespinasse <walken@google.com> Acked-by: NRik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Introduce new augmented rbtree APIs that allow minimal recalculation of augmented node information. A new callback is added to the rbtree insertion and erase rebalancing functions, to be called on each tree rotations. Such rotations preserve the subtree's root augmented value, but require recalculation of the one child that was previously located at the subtree root. In the insertion case, the handcoded search phase must be updated to maintain the augmented information on insertion, and then the rbtree coloring/rebalancing algorithms keep it up to date. In the erase case, things are more complicated since it is library code that manipulates the rbtree in order to remove internal nodes. This requires a couple additional callbacks to copy a subtree's augmented value when a new root is stitched in, and to recompute augmented values down the ancestry path when a node is removed from the tree. In order to preserve maximum speed for the non-augmented case, we provide two versions of each tree manipulation function. rb_insert_augmented() is the augmented equivalent of rb_insert_color(), and rb_erase_augmented() is the augmented equivalent of rb_erase(). Signed-off-by: NMichel Lespinasse <walken@google.com> Acked-by: NRik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Small test to measure the performance of augmented rbtrees. Signed-off-by: NMichel Lespinasse <walken@google.com> Acked-by: NRik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Various minor optimizations in rb_erase(): - Avoid multiple loading of node->__rb_parent_color when computing parent and color information (possibly not in close sequence, as there might be further branches in the algorithm) - In the 1-child subcase of case 1, copy the __rb_parent_color field from the erased node to the child instead of recomputing it from the desired parent and color - When searching for the erased node's successor, differentiate between cases 2 and 3 based on whether any left links were followed. This avoids a condition later down. - In case 3, keep a pointer to the erased node's right child so we don't have to refetch it later to adjust its parent. - In the no-childs subcase of cases 2 and 3, place the rebalance assigment last so that the compiler can remove the following if(rebalance) test. Also, added some comments to illustrate cases 2 and 3. Signed-off-by: NMichel Lespinasse <walken@google.com> Acked-by: NRik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
An interesting observation for rb_erase() is that when a node has exactly one child, the node must be black and the child must be red. An interesting consequence is that removing such a node can be done by simply replacing it with its child and making the child black, which we can do efficiently in rb_erase(). __rb_erase_color() then only needs to handle the no-childs case and can be modified accordingly. Signed-off-by: NMichel Lespinasse <walken@google.com> Acked-by: NRik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
In rb_erase, move the easy case (node to erase has no more than 1 child) first. I feel the code reads easier that way. Signed-off-by: NMichel Lespinasse <walken@google.com> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Add __rb_change_child() as an inline helper function to replace code that would otherwise be duplicated 4 times in the source. No changes to binary size or speed. Signed-off-by: NMichel Lespinasse <walken@google.com> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
Just a small fix to make sparse happy. Signed-off-by: NMichel Lespinasse <walken@google.com> Reported-by: NFengguang Wu <wfg@linux.intel.com> Acked-by: NRik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-