1. 29 Dec, 2008 1 commit
  2. 11 Dec, 2008 1 commit
  3. 02 Dec, 2008 1 commit
    • K
      memcg: memory hotplug fix for notifier callback · dc19f9db
      KAMEZAWA Hiroyuki committed
      Fixes for memcg/memory hotplug.
      
      While memory hotplug allocates and frees memmap, the page_cgroup code does not
      free page_cgroup at OFFLINE when page_cgroup was allocated via bootmem.
      (Freeing bootmem requires special care.)
      
      Then, if page_cgroup is allocated by bootmem and memmap is freed/allocated
      by memory hotplug, page_cgroup->page == page is no longer true.
      
      But the current MEM_ONLINE handler does not check for this, and it does not
      update page_cgroup->page when it has no need to allocate page_cgroup.  (This
      was not found because memmap is not freed if SPARSEMEM_VMEMMAP is y.)
      
      I also noticed that MEM_ONLINE can be called against "part of section".
      So freeing page_cgroup at CANCEL_ONLINE will cause trouble (it would free a
      page_cgroup that is in use).  Don't roll back at CANCEL.
      
      One more thing: the current memory hotplug notifier chain is stopped by slub
      because slub sets NOTIFY_STOP_MASK in its return value.  So page_cgroup's
      callback is never called.  (It has lower priority than slub now.)
      
      I think this slub behavior is not intentional (a BUG), and this patch fixes it.
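      
      The stopped-chain problem described above can be modelled in user-space C.
      This is a simplified sketch, not the kernel's notifier implementation:
      run_chain() and the *_cb callbacks are hypothetical names, and only the
      NOTIFY_* values mirror the kernel's definitions.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Values mirror the kernel's include/linux/notifier.h. */
      #define NOTIFY_DONE      0x0000
      #define NOTIFY_OK        0x0001
      #define NOTIFY_STOP_MASK 0x8000

      typedef int (*notifier_fn)(int *calls);

      /* Hypothetical high-priority callback that sets the stop bit even on success. */
      static int slub_like_buggy_cb(int *calls)
      {
          (*calls)++;
          return NOTIFY_STOP_MASK | NOTIFY_OK;
      }

      /* Fixed variant: plain NOTIFY_OK lets the chain continue. */
      static int slub_like_fixed_cb(int *calls)
      {
          (*calls)++;
          return NOTIFY_OK;
      }

      /* Hypothetical lower-priority callback, standing in for page_cgroup's. */
      static int page_cgroup_like_cb(int *calls)
      {
          (*calls)++;
          return NOTIFY_OK;
      }

      /* Call the callbacks in priority order and stop as soon as one of them
       * sets NOTIFY_STOP_MASK.  Returns how many callbacks actually ran. */
      static int run_chain(const notifier_fn *chain, size_t n)
      {
          int calls = 0;

          for (size_t i = 0; i < n; i++) {
              int ret = chain[i](&calls);

              if (ret & NOTIFY_STOP_MASK)
                  break;
          }
          return calls;
      }

      static int run_buggy_chain(void)
      {
          const notifier_fn chain[] = { slub_like_buggy_cb, page_cgroup_like_cb };

          return run_chain(chain, 2);
      }

      static int run_fixed_chain(void)
      {
          const notifier_fn chain[] = { slub_like_fixed_cb, page_cgroup_like_cb };

          return run_chain(chain, 2);
      }
      ```

      In this model run_buggy_chain() runs only the first callback, while
      run_fixed_chain() reaches the lower-priority one as well.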
      
      Another approach to page_cgroup allocation that could be considered:
        - free page_cgroup at OFFLINE even if it's from bootmem
          and remove the special handler. But it requires more changes.
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12041
      
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 23 Oct, 2008 1 commit
  5. 15 Sep, 2008 1 commit
  6. 21 Aug, 2008 1 commit
  7. 05 Aug, 2008 1 commit
    • P
      SLUB: dynamic per-cache MIN_PARTIAL · 5595cffc
      Pekka Enberg committed
      This patch changes the static MIN_PARTIAL to a dynamic per-cache ->min_partial
      value that is calculated from object size. The bigger the object size, the more
      pages we keep on the partial list.
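      
      The message does not give the formula, so the following is only a sketch of
      how a per-cache value could be derived from the object size.  It assumes the
      value is the base-2 logarithm of the size, clamped to the range [5, 10]; the
      bounds and the names dynamic_min_partial() and ilog2_size() are assumptions
      made for illustration.

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define MIN_PARTIAL_FLOOR 5    /* assumed lower bound */
      #define MAX_PARTIAL_CEIL  10   /* assumed upper bound */

      /* Integer base-2 logarithm, rounded down (v must be non-zero). */
      static unsigned int ilog2_size(size_t v)
      {
          unsigned int r = 0;

          while (v >>= 1)
              r++;
          return r;
      }

      /* Larger objects get a larger minimum number of partial slabs,
       * within the assumed bounds. */
      static unsigned int dynamic_min_partial(size_t object_size)
      {
          unsigned int n;

          assert(object_size > 0);
          n = ilog2_size(object_size);
          if (n < MIN_PARTIAL_FLOOR)
              n = MIN_PARTIAL_FLOOR;
          if (n > MAX_PARTIAL_CEIL)
              n = MAX_PARTIAL_CEIL;
          return n;
      }
      ```

      Under these assumptions an 8-byte cache keeps the floor of 5, a 256-byte
      cache gets 8, and a page-sized (4096-byte) cache is capped at 10.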
      
      I tested SLAB, SLUB, and SLUB with this patch on Jens Axboe's 'netio' example
      script of the fio benchmarking tool. The script stresses the networking
      subsystem which should also give a fairly good beating of kmalloc() et al.
      
      To run the test yourself, first clone the fio repository:
      
        git clone git://git.kernel.dk/fio.git
      
      and then run the following command n times on your machine:
      
        time ./fio examples/netio
      
      The results on my 2-way 64-bit x86 machine are as follows:
      
        [ the minimum, maximum, and average are captured from 50 individual runs ]
      
                       real time (seconds)
                       min      max      avg      sd
        SLAB           22.76    23.38    22.98    0.17
        SLUB           22.80    25.78    23.46    0.72
        SLUB (dynamic) 22.74    23.54    23.00    0.20
      
                       sys time (seconds)
                       min      max      avg      sd
        SLAB           6.90     8.28     7.70     0.28
        SLUB           7.42     16.95    8.89     2.28
        SLUB (dynamic) 7.17     8.64     7.73     0.29
      
                       user time (seconds)
                       min      max      avg      sd
        SLAB           36.89    38.11    37.50    0.29
        SLUB           30.85    37.99    37.06    1.67
        SLUB (dynamic) 36.75    38.07    37.59    0.32
      
      As you can see from the above numbers, this patch brings SLUB to the same level
      as SLAB for this particular workload, fixing a ~2% regression in average real
      time (23.46 s vs. 22.98 s). I'd expect this change to help similar workloads
      that allocate a lot of objects that are close to the size of a page.
      
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
  8. 30 Jul, 2008 1 commit
  9. 27 Jul, 2008 1 commit
  10. 25 Jul, 2008 1 commit
  11. 19 Jul, 2008 1 commit
  12. 17 Jul, 2008 1 commit
  13. 16 Jul, 2008 2 commits
  14. 15 Jul, 2008 1 commit
  15. 11 Jul, 2008 1 commit
    • D
      slub: Fix use-after-preempt of per-CPU data structure · bdb21928
      Dmitry Adamushko committed
      Vegard Nossum reported a crash in kmem_cache_alloc():
      
      	BUG: unable to handle kernel paging request at da87d000
      	IP: [<c01991c7>] kmem_cache_alloc+0xc7/0xe0
      	*pde = 28180163 *pte = 1a87d160
      	Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      	Pid: 3850, comm: grep Not tainted (2.6.26-rc9-00059-gb190333 #5)
      	EIP: 0060:[<c01991c7>] EFLAGS: 00210203 CPU: 0
      	EIP is at kmem_cache_alloc+0xc7/0xe0
      	EAX: 00000000 EBX: da87c100 ECX: 1adad71a EDX: 6b6b6b6b
      	ESI: 00200282 EDI: da87d000 EBP: f60bfe74 ESP: f60bfe54
      	DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      
      and analyzed it:
      
        "The register %ecx looks innocent but is very important here. The disassembly:
      
             mov    %edx,%ecx
             shr    $0x2,%ecx
             rep stos %eax,%es:(%edi) <-- the fault
      
         So %ecx has been loaded from %edx... which is 0x6b6b6b6b/POISON_FREE.
         (0x6b6b6b6b >> 2 == 0x1adadada.)
      
         %ecx is the counter for the memset, from here:
      
             memset(object, 0, c->objsize);
      
        i.e. %ecx was loaded from c->objsize, so "c" must have been freed.
        Where did "c" come from? Uh-oh...
      
             c = get_cpu_slab(s, smp_processor_id());
      
        This looks like it has very much to do with CPU hotplug/unplug. Is
        there a race between SLUB/hotplug since the CPU slab is used after it
        has been freed?"
      
      Good analysis.
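      
      The register arithmetic in the quoted analysis can be re-checked with a few
      lines of C.  The helper names are hypothetical; the constants are taken from
      the oops above.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Four bytes of the 0x6b free-poison pattern, as seen in EDX. */
      #define POISON_FREE_WORD 0x6b6b6b6bu

      /* "rep stos" with a 32-bit operand stores one dword per iteration, so the
       * byte length is shifted right by 2 to obtain the iteration count. */
      static uint32_t stos_count(uint32_t byte_len)
      {
          return byte_len >> 2;
      }

      /* Bytes already written, given the initial and the remaining dword counts. */
      static uint32_t bytes_written(uint32_t initial_count, uint32_t remaining_count)
      {
          return (initial_count - remaining_count) * 4u;
      }
      ```

      stos_count(POISON_FREE_WORD) yields 0x1adadada, matching the quoted
      analysis.  With ECX at 0x1adad71a when the fault hit,
      bytes_written(0x1adadada, 0x1adad71a) is 0xf00, and 0xda87c100 + 0xf00 is
      the faulting address 0xda87d000.  That is consistent with EBX holding the
      object address, although the dump alone does not prove it.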
      
      Yeah, it's possible that a caller of kmem_cache_alloc() -> slab_alloc()
      can be migrated to another CPU right after local_irq_restore() and
      before memset().  The initial CPU can go offline in the meantime (or
      the migration is a consequence of the CPU going offline), so its
      'kmem_cache_cpu' structure gets freed (slab_cpuup_callback).
      
      At some point the caller continues on another CPU with an
      obsolete pointer...
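      
      The message describes the race but not the remedy.  One common way to avoid
      this kind of stale pointer is to copy the needed field into a local variable
      while the structure is still guaranteed to be valid, and to use only the copy
      afterwards.  The user-space model below illustrates that idea; struct
      cpu_slab_model, poison_model() and zero_object_safely() are hypothetical and
      do not reproduce the kernel's code.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical stand-in for the per-CPU structure; not the kernel's layout. */
      struct cpu_slab_model {
          size_t objsize;
      };

      /* Simulates the structure being freed and poisoned after the task has
       * left the protected section (for example once its CPU went offline). */
      static void poison_model(struct cpu_slab_model *c)
      {
          memset(c, 0x6b, sizeof(*c));
      }

      /* Zero an object using a size captured inside the protected section.
       * Returns the number of bytes zeroed. */
      static size_t zero_object_safely(struct cpu_slab_model *c, void *object)
      {
          /* Protected section (interrupts would be disabled in the kernel). */
          size_t objsize = c->objsize;   /* snapshot while c is still valid */
          /* End of protected section: c must not be dereferenced below. */

          poison_model(c);               /* worst case: c is freed right here */
          memset(object, 0, objsize);    /* uses the snapshot, not c->objsize */
          return objsize;
      }
      ```

      Even though the model structure is poisoned before the memset(), the length
      used is the value read earlier, so the write stays within the object.
      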
      Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
      Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 05 Jul, 2008 1 commit
  17. 04 Jul, 2008 1 commit
  18. 26 Jun, 2008 1 commit
  19. 23 May, 2008 1 commit
  20. 09 May, 2008 1 commit
  21. 02 May, 2008 2 commits
  22. 01 May, 2008 1 commit
    • R
      remove div_long_long_rem · f8bd2258
      Roman Zippel committed
      x86 is currently the only arch that provides an optimized
      div_long_long_rem, and it has the downside that one has to be very careful that
      the divide doesn't overflow.
      
      The API is a little awkward, as the arguments for the unsigned divide are
      signed.  The signed version also doesn't handle a negative divisor and
      produces worse code on 64-bit archs.
      
      There is little incentive to keep this API alive, so this converts the few
      users to the new API.
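      
      The message does not name the replacement.  This sketch assumes it is the
      div_u64_rem() / div_s64_rem() family declared in include/linux/math64.h,
      and shows user-space stand-ins with the same signatures (with fixed-width C
      types in place of the kernel's u64/s64/u32/s32).  Unlike the old API, the
      unsigned variant takes unsigned arguments and the signed variant accepts a
      negative divisor.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* User-space stand-in: unsigned 64-by-32 divide that also yields the
       * remainder. */
      static uint64_t div_u64_rem(uint64_t dividend, uint32_t divisor,
                                  uint32_t *remainder)
      {
          assert(divisor != 0);
          *remainder = (uint32_t)(dividend % divisor);
          return dividend / divisor;
      }

      /* User-space stand-in: signed variant.  C99 division truncates toward
       * zero, so the remainder takes the sign of the dividend. */
      static int64_t div_s64_rem(int64_t dividend, int32_t divisor,
                                 int32_t *remainder)
      {
          assert(divisor != 0);
          *remainder = (int32_t)(dividend % divisor);
          return dividend / divisor;
      }
      ```

      A typical use is splitting a nanosecond count into seconds and a
      sub-second part: div_u64_rem(5000000123ULL, 1000000000u, &rem) returns 5
      and stores 123 in rem.
      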
      Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 30 Apr, 2008 1 commit
    • T
      infrastructure to debug (dynamic) objects · 3ac7fe5a
      Thomas Gleixner committed
      We can see an ever repeating problem pattern with objects of any kind in the
      kernel:
      
      1) freeing of active objects
      2) reinitialization of active objects
      
      Both problems can be hard to debug because the crash happens at a point where
      we have no chance to decode the root cause anymore.  One problem spot is
      kernel timers, where the detection of the problem often happens in interrupt
      context and usually causes the machine to panic.
      
      While working on a timer related bug report I had to hack specialized code
      into the timer subsystem to get a reasonable hint for the root cause.  This
      debug hack was fine for temporary use, but far from a mergeable solution due
      to the intrusiveness into the timer code.
      
      The code further lacked the ability to detect and report the root cause
      instantly and keep the system operational.
      
      Keeping the system operational is important to get hold of the debug
      information without special debugging aids like serial consoles and special
      knowledge of the bug reporter.
      
      The problems described above are not restricted to timers, but timers tend to
      expose them as a full system crash.  Other objects are less explosive,
      but the symptoms caused by such mistakes can be even harder to debug.
      
      Instead of creating specialized debugging code for the timer subsystem, a
      generic infrastructure is created which allows developers to verify their code
      and provides an easy-to-enable debug facility for users in case of trouble.
      
      The debugobjects core code keeps track of operations on static and dynamic
      objects by inserting them into a hashed list and sanity checking them on
      object operations and provides additional checks whenever kernel memory is
      freed.
      
      The tracked object operations are:
      - initializing an object
      - adding an object to a subsystem list
      - deleting an object from a subsystem list
      
      Each operation is sanity checked before the operation is executed, and the
      subsystem-specific code can provide a fixup function which can prevent
      the operation from doing damage.  When the sanity check triggers, a warning
      message and a stack trace are printed.
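      
      The check-then-fixup pattern described above can be sketched as a small
      state machine.  This is an illustration only: the states, the names
      check_init(), check_free(), do_activate() and deactivate_fixup(), and the
      return conventions are assumptions, and the sketch prints a warning
      without a stack trace.

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* Hypothetical states; the real code may track more. */
      enum obj_state { OBJ_NONE, OBJ_INIT, OBJ_ACTIVE, OBJ_INACTIVE };

      struct tracked_obj {
          enum obj_state state;
          int warnings;            /* how many sanity checks have triggered */
      };

      /* Subsystem-provided hook: returns 1 if it repaired the situation. */
      typedef int (*fixup_fn)(struct tracked_obj *obj);

      /* Example fixup: take the object off its (imaginary) subsystem list. */
      static int deactivate_fixup(struct tracked_obj *obj)
      {
          obj->state = OBJ_INACTIVE;
          return 1;
      }

      /* Adding the object to a subsystem list makes it active. */
      static void do_activate(struct tracked_obj *obj)
      {
          obj->state = OBJ_ACTIVE;
      }

      /* Shared check: warn if the object is active, then try the fixup.
       * Returns 0 if nothing was wrong, 1 if fixed up, -1 if not repaired. */
      static int check_not_active(struct tracked_obj *obj, fixup_fn fixup,
                                  const char *what)
      {
          if (obj->state != OBJ_ACTIVE)
              return 0;
          obj->warnings++;
          fprintf(stderr, "warning: %s of active object\n", what);
          if (!fixup || !fixup(obj))
              return -1;
          return 1;
      }

      /* Checked (re)initialization. */
      static int check_init(struct tracked_obj *obj, fixup_fn fixup)
      {
          int ret = check_not_active(obj, fixup, "init");

          if (ret >= 0)
              obj->state = OBJ_INIT;
          return ret;
      }

      /* Checked free. */
      static int check_free(struct tracked_obj *obj, fixup_fn fixup)
      {
          int ret = check_not_active(obj, fixup, "free");

          if (ret >= 0)
              obj->state = OBJ_NONE;
          return ret;
      }
      ```

      In this model, reinitializing or freeing an active object increments the
      warning counter; with a fixup the operation then proceeds safely, and
      without one it is refused and the object stays active.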
      
      The list of operations can be extended if the need arises.  For now it's
      limited to the requirements of the first user (timers).
      
      The core code enqueues the objects into hash buckets.  The hash index is
      generated from the address of the object to simplify the lookup for the check
      on kfree/vfree.  Each bucket has its own spinlock to avoid contention on a
      global lock.
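      
      A minimal user-space sketch of address-based hashing with one lock per
      bucket follows.  The table size, the 4 KiB chunk granularity, the
      multiplicative hash and all names are assumptions chosen for
      illustration; a C11 atomic_flag stands in for a kernel spinlock.

      ```c
      #include <assert.h>
      #include <stdatomic.h>
      #include <stdint.h>

      #define HASH_BITS   6                  /* assumed: 64 buckets */
      #define HASH_SIZE   (1u << HASH_BITS)
      #define CHUNK_SHIFT 12                 /* assumed: hash per 4 KiB chunk */

      struct bucket {
          atomic_flag lock;      /* one spinlock per bucket */
          unsigned int count;    /* number of tracked objects in this bucket */
      };

      static struct bucket table[HASH_SIZE];

      static void table_init(void)
      {
          for (unsigned int i = 0; i < HASH_SIZE; i++) {
              atomic_flag_clear(&table[i].lock);
              table[i].count = 0;
          }
      }

      /* Map an address to a bucket.  All addresses in the same chunk share a
       * bucket, so a freed memory range can be checked chunk by chunk. */
      static unsigned int bucket_index(uintptr_t addr)
      {
          uint64_t chunk = (uint64_t)addr >> CHUNK_SHIFT;

          /* Multiplicative hash with a 64-bit golden-ratio constant. */
          return (unsigned int)((chunk * 0x9E3779B97F4A7C15ull) >> (64 - HASH_BITS));
      }

      /* Track an object: only its own bucket is locked, never a global lock.
       * Returns the bucket index that was used. */
      static unsigned int track_object(uintptr_t addr)
      {
          unsigned int idx = bucket_index(addr);
          struct bucket *b = &table[idx];

          while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
              ;                              /* spin until the bucket is free */
          b->count++;
          atomic_flag_clear_explicit(&b->lock, memory_order_release);
          return idx;
      }
      ```

      Because the index depends only on the address, a check at free time can
      find every tracked object in a freed range by visiting the buckets of the
      chunks that the range covers.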
      
      The debug code can be compiled in without being active.  The runtime overhead
      is minimal and could be optimized by asm alternatives.  A kernel command line
      option enables the debugging code.
      
      Thanks to Ingo Molnar for review, suggestions and cleanup patches.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 29 Apr, 2008 1 commit
  25. 28 Apr, 2008 4 commits
  26. 27 Apr, 2008 10 commits