1. 05 Jun 2014, 5 commits
  2. 07 May 2014, 2 commits
    • slub: use sysfs'es release mechanism for kmem_cache · 41a21285
      Christoph Lameter authored
      debugobjects warning during netfilter exit:
      
          ------------[ cut here ]------------
          WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G        W 3.11.0-next-20130906-sasha #3984
          Workqueue: netns cleanup_net
          Call Trace:
            dump_stack+0x52/0x87
            warn_slowpath_common+0x8c/0xc0
            warn_slowpath_fmt+0x46/0x50
            debug_print_object+0x8d/0xb0
            __debug_check_no_obj_freed+0xa5/0x220
            debug_check_no_obj_freed+0x15/0x20
            kmem_cache_free+0x197/0x340
            kmem_cache_destroy+0x86/0xe0
            nf_conntrack_cleanup_net_list+0x131/0x170
            nf_conntrack_pernet_exit+0x5d/0x70
            ops_exit_list+0x5e/0x70
            cleanup_net+0xfb/0x1c0
            process_one_work+0x338/0x550
            worker_thread+0x215/0x350
            kthread+0xe7/0xf0
            ret_from_fork+0x7c/0xb0
      
      Also during dcookie cleanup:
      
          WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408
          Call Trace:
            dump_stack (lib/dump_stack.c:52)
            warn_slowpath_common (kernel/panic.c:430)
            warn_slowpath_fmt (kernel/panic.c:445)
            debug_print_object (lib/debugobjects.c:262)
            __debug_check_no_obj_freed (lib/debugobjects.c:697)
            debug_check_no_obj_freed (lib/debugobjects.c:726)
            kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717)
            kmem_cache_destroy (mm/slab_common.c:363)
            dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343)
            event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153)
            __fput (fs/file_table.c:217)
            ____fput (fs/file_table.c:253)
            task_work_run (kernel/task_work.c:125 (discriminator 1))
            do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751)
            int_signal (arch/x86/kernel/entry_64.S:807)
      
      Sysfs has a release mechanism.  Use that to release the kmem_cache
      structure if CONFIG_SYSFS is enabled.
      
      Only slub is changed - slab currently only supports /proc/slabinfo and
      not /sys/kernel/slab/*.  We talked about adding that and someone was
      working on it.
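      
      The commit wires this up through the kobject's release callback.  A
      minimal sketch of the idea (hedged; the names follow mm/slub.c, but
      this is not the literal diff):
      
          static void kmem_cache_release(struct kobject *k)
          {
                  /*
                   * Called only when the kobject's refcount drops to zero,
                   * i.e. after the last sysfs user is gone - only then is
                   * it safe to actually free the kmem_cache.
                   */
                  slab_kmem_cache_release(to_slab(k));
          }
      
          static struct kobj_type slab_ktype = {
                  .sysfs_ops = &slab_sysfs_ops,
                  .release   = kmem_cache_release,
          };
      
      With this, kmem_cache_destroy() drops the sysfs reference instead of
      freeing the structure directly, and the free happens from ->release().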
      
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build]
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more]
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Acked-by: Greg KH <greg@kroah.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      41a21285
    • slub: fix memcg_propagate_slab_attrs · 93030d83
      Vladimir Davydov authored
      After creating a cache for a memcg we should initialize its sysfs attrs
      with the values from its parent.  That's what memcg_propagate_slab_attrs
      is for.  Currently it's broken - we clearly muddled root-vs-memcg caches
      there.  Let's fix it up.
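      
      A hedged sketch of the intended direction of propagation (the
      attribute iteration and buffer handling are simplified relative to
      mm/slub.c): each value is read from the root cache and stored into
      the new memcg cache, never the other way around:
      
          static void memcg_propagate_slab_attrs(struct kmem_cache *s)
          {
                  struct kmem_cache *root = s->memcg_params->root_cache;
                  char *buf;
                  int i;
      
                  if (is_root_cache(s))
                          return;
      
                  buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
                  if (!buf)
                          return;
      
                  /* slab_attrs is NULL-terminated, hence the "- 1" */
                  for (i = 0; i < ARRAY_SIZE(slab_attrs) - 1; i++) {
                          struct slab_attribute *attr =
                                  to_slab_attr(slab_attrs[i]);
                          ssize_t len;
      
                          if (!attr->show || !attr->store)
                                  continue;
      
                          len = attr->show(root, buf);  /* read the parent */
                          if (len > 0)
                                  attr->store(s, buf, len); /* set the child */
                  }
                  kfree(buf);
          }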
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      93030d83
  3. 08 Apr 2014, 6 commits
    • slub: use raw_cpu_inc for incrementing statistics · 88da03a6
      Christoph Lameter authored
      Statistics are not critical to the operation of the allocator, but
      they should also not cause too much overhead.
      
      When __this_cpu_inc() is altered to check whether preemption is
      disabled, those checks trigger here.  Use raw_cpu_inc() to avoid
      them.  Using this_cpu ops may cause interrupt disable/enable
      sequences on various arches, which may significantly impact
      allocator performance.
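      
      The change, sketched (the stat() helper below mirrors the one in
      mm/slub.c; hedged):
      
          static inline void stat(const struct kmem_cache *s, enum stat_item si)
          {
          #ifdef CONFIG_SLUB_STATS
                  /*
                   * raw_cpu_inc() skips the preemption checks of
                   * this_cpu_inc().  A statistic occasionally landing on
                   * the wrong CPU is an acceptable price for approximate
                   * counters that sit on the allocation fast path.
                   */
                  raw_cpu_inc(s->cpu_slab->stat[si]);
          #endif
          }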
      
      [akpm@linux-foundation.org: add comment]
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      88da03a6
    • slub: fix leak of 'name' in sysfs_slab_add · 54b6a731
      Dave Jones authored
      The failure paths of sysfs_slab_add don't release the allocation of
      'name' made by create_unique_id() a few lines above the context of the
      diff below.  Create a common exit path to make it more obvious what
      needs freeing.
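      
      The resulting shape of the error handling, sketched (hedged; the
      labels and call sequence are illustrative): every failure path
      funnels through one exit that frees 'name', and only when the cache
      was not merged, since only then did create_unique_id() allocate it:
      
          static int sysfs_slab_add(struct kmem_cache *s)
          {
                  int unmergeable = slab_unmergeable(s);
                  const char *name;
                  int err;
      
                  /* only mergeable caches get a dynamically allocated id */
                  name = unmergeable ? s->name : create_unique_id(s);
      
                  err = kobject_init_and_add(&s->kobj, &slab_ktype,
                                             NULL, "%s", name);
                  if (err)
                          goto out;
      
                  err = sysfs_create_group(&s->kobj, &slab_attr_group);
                  if (err)
                          goto out_del_kobj;
          out:
                  if (!unmergeable)
                          kfree(name);  /* from create_unique_id() */
                  return err;
          out_del_kobj:
                  kobject_del(&s->kobj);
                  goto out;
          }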
      
      [vdavydov@parallels.com: free the name only if !unmergeable]
      Signed-off-by: Dave Jones <davej@fedoraproject.org>
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54b6a731
    • slub: rework sysfs layout for memcg caches · 9a41707b
      Vladimir Davydov authored
      Currently, we try to arrange sysfs entries for memcg caches in the
      same manner as for global caches.  Apart from turning
      /sys/kernel/slab into a mess when a lot of kmem-active memcgs are
      created, this does not actually work properly - we won't create more
      than one link to a memcg cache when its parent is merged with
      another cache.  For instance, if
      A is a root cache merged with another root cache B, we will have the
      following sysfs setup:
      
        X
        A -> X
        B -> X
      
      where X is some unique id (see create_unique_id()).  Now if memcgs M and
      N start to allocate from cache A (or B, which is the same), we will get:
      
        X
        X:M
        X:N
        A -> X
        B -> X
        A:M -> X:M
        A:N -> X:N
      
      Since B is an alias for A, we won't get entries B:M and B:N, which is
      confusing.
      
      It is more logical to have the entries for memcg caches under the
      corresponding root cache's sysfs directory.  This keeps the sysfs
      layout clean and avoids inconsistencies like the one described
      above.
      
      This patch does the trick.  It creates a "cgroup" kset in each root
      cache kobject to keep its child caches there.
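      
      Sketched, with hedging (kset_create_and_add() is the stock kobject
      API; the memcg_kset field follows the patch's naming): a root cache
      grows a "cgroup" kset under its kobject, and memcg caches register
      there instead of in the global slab kset:
      
          #ifdef CONFIG_MEMCG_KMEM
                  if (is_root_cache(s)) {
                          /* /sys/kernel/slab/<cache>/cgroup/ */
                          s->memcg_kset = kset_create_and_add("cgroup",
                                                              NULL, &s->kobj);
                          if (!s->memcg_kset)
                                  return -ENOMEM;
                  } else {
                          /* child appears under its root's cgroup kset */
                          s->kobj.kset =
                                  s->memcg_params->root_cache->memcg_kset;
                  }
          #endif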
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a41707b
    • slub: adjust memcg caches when creating cache alias · 84d0ddd6
      Vladimir Davydov authored
      When creating an alias enlarges the root cache's object_size, the
      object_size of its memcg child caches must be enlarged as well;
      otherwise, kzalloc() called from a memcg won't clear the whole
      object.
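      
      A hedged sketch of the adjustment in __kmem_cache_alias() (helper
      names follow that era's mm/ code):
      
          struct kmem_cache *
          __kmem_cache_alias(const char *name, size_t size, size_t align,
                             unsigned long flags, void (*ctor)(void *))
          {
                  struct kmem_cache *s;
      
                  s = find_mergeable(size, align, flags, name, ctor);
                  if (s) {
                          int i;
                          struct kmem_cache *c;
      
                          s->refcount++;
                          /* the alias may serve larger objects now ... */
                          s->object_size = max(s->object_size, (int)size);
                          s->inuse = max_t(int, s->inuse,
                                           ALIGN(size, sizeof(void *)));
      
                          /* ... so the memcg clones must grow with it */
                          for_each_memcg_cache_index(i) {
                                  c = cache_from_memcg_idx(s, i);
                                  if (!c)
                                          continue;
                                  c->object_size = s->object_size;
                                  c->inuse = max_t(int, c->inuse,
                                           ALIGN(size, sizeof(void *)));
                          }
                  }
                  return s;
          }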
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      84d0ddd6
    • memcg, slab: never try to merge memcg caches · a44cb944
      Vladimir Davydov authored
      When a kmem cache is created (kmem_cache_create_memcg()), we first try to
      find a compatible cache that already exists and can handle requests from
      the new cache, i.e.  has the same object size, alignment, ctor, etc.  If
      there is such a cache, we do not create any new caches, instead we simply
      increment the refcount of the cache found and return it.
      
      Currently we follow this procedure not only when creating root
      caches, but also for memcg caches.  However, there is no point in
      that: every memcg cache has exactly the same parameters as its
      parent, and cache merging cannot be turned off at runtime (only at
      boot, by passing "slub_nomerge"), so the root caches of any two
      potentially mergeable memcg caches must already have been merged
      into the same root cache.  We could therefore never even reach memcg
      cache creation for such a cache, because it would already exist.
      
      The only exception is boot caches - they are explicitly forbidden to
      be merged by setting their refcount to -1.  There are currently only
      two of them - kmem_cache and kmem_cache_node, which are used in slab
      internals (I do not count kmalloc caches, as their refcount is set
      to 1 immediately after creation).  Since they are preemptively
      prevented from merging, we should avoid merging their children too.
      
      So let's remove the useless code responsible for merging memcg caches.
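      
      In sketch form (hedged; kmem_cache_create_memcg() was the common
      entry point for root and memcg cache creation at the time), the
      merge attempt simply becomes conditional on creating a root cache:
      
          /* inside kmem_cache_create_memcg(), under slab_mutex */
          if (!memcg) {
                  /* only root caches are candidates for merging */
                  s = __kmem_cache_alias(name, size, align, flags, ctor);
                  if (s)
                          goto out_unlock;
          }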
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a44cb944
    • mm, mempolicy: rename slab_node for clarity · 2a389610
      David Rientjes authored
      slab_node() is actually a mempolicy function, so rename it to
      mempolicy_slab_node() to make it clearer that it is used for
      processes with mempolicies.
      
      At the same time, cleanup its code by saving numa_mem_id() in a local
      variable (since we require a node with memory, not just any node) and
      remove an obsolete comment that assumes the mempolicy is actually passed
      into the function.
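      
      A hedged sketch of the cleaned-up function (the MPOL_BIND branch is
      elided; the point is the single numa_mem_id() local):
      
          int mempolicy_slab_node(void)
          {
                  struct mempolicy *policy;
                  int node = numa_mem_id();  /* nearest node with memory */
      
                  if (in_interrupt())
                          return node;
      
                  policy = current->mempolicy;
                  if (!policy || policy->flags & MPOL_F_LOCAL)
                          return node;
      
                  switch (policy->mode) {
                  case MPOL_PREFERRED:
                          return policy->v.preferred_node;
                  case MPOL_INTERLEAVE:
                          return interleave_nodes(policy);
                  default:
                          return node;
                  }
          }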
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Tim Hockin <thockin@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2a389610
  4. 04 Apr 2014, 2 commits
    • slub: do not drop slab_mutex for sysfs_slab_add · 421af243
      Vladimir Davydov authored
      We release the slab_mutex while calling sysfs_slab_add from
      __kmem_cache_create since commit 66c4c35c ("slub: Do not hold
      slub_lock when calling sysfs_slab_add()"), because kobject_uevent called
      by sysfs_slab_add might block waiting for the usermode helper to exec,
      which would result in a deadlock if we took the slab_mutex while
      executing it.
      
      However, apart from complicating the synchronization rules,
      releasing the slab_mutex during kmem cache creation can result in a
      kmemcg-related race.  The point is that we check whether the memcg
      cache exists before going to __kmem_cache_create, but register the
      new cache in the memcg subsystem only after it.  Since the mutex can
      be dropped in between, several threads can see that the memcg cache
      does not exist and proceed to create it, which is wrong.
      
      Fortunately, recently kobject_uevent was patched to call the usermode
      helper with the UMH_NO_WAIT flag, making the deadlock impossible.
      Therefore there is no point in releasing the slab_mutex while calling
      sysfs_slab_add, so let's simplify kmem_cache_create synchronization and
      fix the kmemcg-race mentioned above by holding the slab_mutex during the
      whole cache creation path.
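      
      The simplification, sketched as a before/after (hedged):
      
          /* before: drop the mutex around the sysfs registration */
          mutex_unlock(&slab_mutex);
          err = sysfs_slab_add(s);
          mutex_lock(&slab_mutex);
      
          /* after: kobject_uevent() uses UMH_NO_WAIT and can no longer
           * block on the usermode helper, so just hold the mutex */
          err = sysfs_slab_add(s);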
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      421af243
    • mm: optimize put_mems_allowed() usage · d26914d1
      Mel Gorman authored
      Since put_mems_allowed() is strictly optional - it is a seqcount
      retry - we don't need to evaluate the function if the allocation was
      in fact successful, saving an smp_rmb(), some loads, and comparisons
      on some relatively fast paths.
      
      Since the naming get/put_mems_allowed() suggests a mandatory
      pairing, rename the interface, as suggested by Mel, to resemble the
      seqcount interface.
      
      This gives us: read_mems_allowed_begin() and read_mems_allowed_retry(),
      where it is important to note that the return value of the latter call
      is inverted from its previous incarnation.
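      
      The resulting usage pattern, sketched (hedged; the allocation call
      is illustrative):
      
          unsigned int cpuset_mems_cookie;
          struct page *page;
      
          do {
                  cpuset_mems_cookie = read_mems_allowed_begin();
                  page = __alloc_pages_nodemask(gfp, order, zonelist,
                                                nodemask);
                  /*
                   * The retry check is only evaluated on failure, so the
                   * common successful path skips the seqcount re-read
                   * (and its smp_rmb) entirely.
                   */
          } while (!page && read_mems_allowed_retry(cpuset_mems_cookie));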
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d26914d1
  5. 27 Mar 2014, 1 commit
  6. 11 Feb 2014, 2 commits
  7. 31 Jan 2014, 2 commits
    • mm: slub: work around unneeded lockdep warning · 67b6c900
      Dave Hansen authored
      The slub code does some setup during early boot in
      early_kmem_cache_node_alloc() with some local data.  There is no
      possible way that another CPU can see this data, so the slub code
      doesn't unnecessarily lock it.  However, some new lockdep asserts
      check to make sure that add_partial() _always_ has the list_lock
      held.
      
      Just add the locking, even though it is technically unnecessary.
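      
      Sketched (hedged; based on early_kmem_cache_node_alloc() in
      mm/slub.c):
      
          /*
           * No other CPU can see this kmem_cache_node yet; the lock is
           * taken purely to satisfy lockdep's assertion in add_partial().
           */
          spin_lock(&n->list_lock);
          add_partial(n, page, DEACTIVATE_TO_HEAD);
          spin_unlock(&n->list_lock);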
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      67b6c900
    • mm/slub.c: fix page->_count corruption (again) · a0320865
      Dave Hansen authored
      Commit abca7c49 ("mm: fix slab->page _count corruption when using
      slub") notes that we can not _set_ a page->counters directly, except
      when using a real double-cmpxchg.  Doing so can lose updates to
      ->_count.
      
      That is an absolute rule:
      
              You may not *set* page->counters except via a cmpxchg.
      
      Commit abca7c49 fixed this for the folks who have the slub
      cmpxchg_double code turned off at compile time, but it left the bad case
      alone.  It can still be reached, and the same bug triggered in two
      cases:
      
      1. Turning on slub debugging at runtime, which is available on
         the distro kernels that I looked at.
      2. On 64-bit CPUs with no CMPXCHG16B (some early AMD x86-64
         cpus, evidently)
      
      There are at least 3 ways we could fix this:
      
      1. Take all of the exising calls to cmpxchg_double_slab() and
         __cmpxchg_double_slab() and convert them to take an old, new
         and target 'struct page'.
      2. Do (1), but with the newly-introduced 'slub_data'.
      3. Do some magic inside the two cmpxchg...slab() functions to
         pull the counters out of new_counters and only set those
         fields in page->{inuse,frozen,objects}.
      
      I've done (2) as well, but it's a bunch more code.  This patch is an
      attempt at (3).  It was the most straightforward and foolproof way
      that I could think of to do this.
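      
      A hedged sketch of approach (3): decode the new counters word
      through a temporary struct page, then store only the slub subfields,
      never the full word that aliases page->_count:
      
          static inline void set_page_slub_counters(struct page *page,
                                            unsigned long counters_new)
          {
                  struct page tmp;
      
                  tmp.counters = counters_new;
                  /*
                   * page->counters overlaps page->_count in struct page,
                   * and _count may only be modified via cmpxchg/atomics.
                   * Copy just the slub fields and leave _count untouched.
                   */
                  page->inuse   = tmp.inuse;
                  page->objects = tmp.objects;
                  page->frozen  = tmp.frozen;
          }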
      
      This would also technically allow us to get rid of the ugly
      
      #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
             defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
      
      in 'struct page', but leaving it alone has the added benefit that
      'counters' stays 'unsigned' instead of 'unsigned long', so all the
      copies that the slub code does stay a bit smaller.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a0320865
  8. 30 Jan 2014, 1 commit
  9. 24 Jan 2014, 1 commit
  10. 14 Jan 2014, 2 commits
  11. 29 Dec 2013, 1 commit
    • slub: Fix calculation of cpu slabs · 8afb1474
      Li Zefan authored
        /sys/kernel/slab/:t-0000048 # cat cpu_slabs
        231 N0=16 N1=215
        /sys/kernel/slab/:t-0000048 # cat slabs
        145 N0=36 N1=109
      
      See, the number of slabs is smaller than that of cpu slabs.
      
      The bug was introduced by commit 49e22585
      ("slub: per cpu cache for partial pages").
      
      We should use page->pages instead of page->pobjects when calculating
      the number of cpu partial slabs. This also fixes the mapping of slabs
      and nodes.
      
      As there's no variable storing the number of total/active objects in
      cpu partial slabs, and we don't have user interfaces requiring those
      statistics, I just add WARN_ON for those cases.
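      
      The fix lands in show_slab_objects(); a hedged sketch of the per-cpu
      partial accounting after the change:
      
          /* walking each cpu's partial list in show_slab_objects() */
          page = ACCESS_ONCE(c->partial);
          if (page) {
                  node = page_to_nid(page);
                  if (flags & SO_TOTAL)
                          WARN_ON_ONCE(1);  /* no total-object counter */
                  else if (flags & SO_OBJECTS)
                          WARN_ON_ONCE(1);  /* no active-object counter */
                  else
                          x = page->pages;  /* was page->pobjects */
                  total += x;
                  nodes[node] += x;         /* fixes the node mapping too */
          }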
      
      Cc: <stable@vger.kernel.org> # 3.2+
      Acked-by: Christoph Lameter <cl@linux.com>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      8afb1474
  12. 13 Nov 2013, 1 commit
  13. 12 Nov 2013, 2 commits
    • mm, slub: fix the typo in mm/slub.c · 721ae22a
      Zhi Yong Wu authored
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      721ae22a
    • slub: Handle NULL parameter in kmem_cache_flags · c6f58d9b
      Christoph Lameter authored
      Andreas Herrmann writes:
      
        When I've used slub_debug kernel option (e.g.
        "slub_debug=,skbuff_fclone_cache" or similar) on a debug session I've
        seen a panic like:
      
          Highbank #setenv bootargs console=ttyAMA0 root=/dev/sda2 kgdboc.kgdboc=ttyAMA0,115200 slub_debug=,kmalloc-4096 earlyprintk=ttyAMA0
          ...
          Unable to handle kernel NULL pointer dereference at virtual address 00000000
          pgd = c0004000
          [00000000] *pgd=00000000
          Internal error: Oops: 5 [#1] SMP ARM
          Modules linked in:
          CPU: 0 PID: 0 Comm: swapper Tainted: G        W    3.12.0-00048-gbe408cd3 #314
          task: c0898360 ti: c088a000 task.ti: c088a000
          PC is at strncmp+0x1c/0x84
          LR is at kmem_cache_flags.isra.46.part.47+0x44/0x60
          pc : [<c02c6da0>]    lr : [<c0110a3c>]    psr: 200001d3
          sp : c088bea8  ip : c088beb8  fp : c088beb4
          r10: 00000000  r9 : 413fc090  r8 : 00000001
          r7 : 00000000  r6 : c2984a08  r5 : c0966e78  r4 : 00000000
          r3 : 0000006b  r2 : 0000000c  r1 : 00000000  r0 : c2984a08
          Flags: nzCv  IRQs off  FIQs off  Mode SVC_32  ISA ARM  Segment kernel
          Control: 10c5387d  Table: 0000404a  DAC: 00000015
          Process swapper (pid: 0, stack limit = 0xc088a248)
          Stack: (0xc088bea8 to 0xc088c000)
          bea0:                   c088bed4 c088beb8 c0110a3c c02c6d90 c0966e78 00000040
          bec0: ef001f00 00000040 c088bf14 c088bed8 c0112070 c0110a04 00000005 c010fac8
          bee0: c088bf5c c088bef0 c010fac8 ef001f00 00000040 00000000 00000040 00000001
          bf00: 413fc090 00000000 c088bf34 c088bf18 c0839190 c0112040 00000000 ef001f00
          bf20: 00000000 00000000 c088bf54 c088bf38 c0839200 c083914c 00000006 c0961c4c
          bf40: c0961c28 00000000 c088bf7c c088bf58 c08392ac c08391c0 c08a2ed8 c0966e78
          bf60: c086b874 c08a3f50 c0961c28 00000001 c088bfb4 c088bf80 c083b258 c0839248
          bf80: 2f800000 0f000000 c08935b4 ffffffff c08cd400 ffffffff c08cd400 c0868408
          bfa0: c29849c0 00000000 c088bff4 c088bfb8 c0824974 c083b1e4 ffffffff ffffffff
          bfc0: c08245c0 00000000 00000000 c0868408 00000000 10c5387d c0892bcc c0868404
          bfe0: c0899440 0000406a 00000000 c088bff8 00008074 c0824824 00000000 00000000
          [<c02c6da0>] (strncmp+0x1c/0x84) from [<c0110a3c>] (kmem_cache_flags.isra.46.part.47+0x44/0x60)
          [<c0110a3c>] (kmem_cache_flags.isra.46.part.47+0x44/0x60) from [<c0112070>] (__kmem_cache_create+0x3c/0x410)
          [<c0112070>] (__kmem_cache_create+0x3c/0x410) from [<c0839190>] (create_boot_cache+0x50/0x74)
          [<c0839190>] (create_boot_cache+0x50/0x74) from [<c0839200>] (create_kmalloc_cache+0x4c/0x88)
          [<c0839200>] (create_kmalloc_cache+0x4c/0x88) from [<c08392ac>] (create_kmalloc_caches+0x70/0x114)
          [<c08392ac>] (create_kmalloc_caches+0x70/0x114) from [<c083b258>] (kmem_cache_init+0x80/0xe0)
          [<c083b258>] (kmem_cache_init+0x80/0xe0) from [<c0824974>] (start_kernel+0x15c/0x318)
          [<c0824974>] (start_kernel+0x15c/0x318) from [<00008074>] (0x8074)
          Code: e3520000 01a00002 089da800 e5d03000 (e5d1c000)
          ---[ end trace 1b75b31a2719ed1d ]---
          Kernel panic - not syncing: Fatal exception
      
        The problem is that the slub_debug option is not parsed before
        create_boot_cache is called.  Solve this by changing slub_debug to
        an early_param.
      
        Kernels 3.11 and 3.10 are also affected.  I am not sure about
        older kernels.
      
      Christoph Lameter explains:
      
        kmem_cache_flags() may be called with a NULL name parameter during
        early boot.  Skip the test in that case.
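      
      A hedged sketch of the guard (the surrounding flag logic is
      simplified):
      
          static unsigned long kmem_cache_flags(unsigned long object_size,
                  unsigned long flags, const char *name,
                  void (*ctor)(void *))
          {
                  /*
                   * create_boot_cache() can get here before slub_debug=
                   * is parsed and with name == NULL; the per-slab name
                   * match must be skipped in that case.
                   */
                  if (slub_debug && (!slub_debug_slabs || (name &&
                      !strncmp(slub_debug_slabs, name,
                               strlen(slub_debug_slabs)))))
                          flags |= slub_debug;
      
                  return flags;
          }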
      
      Cc: stable@vger.kernel.org # 3.10 and 3.11
      Reported-by: Andreas Herrmann <andreas.herrmann@calxeda.com>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      c6f58d9b
  14. 25 Oct 2013, 1 commit
  15. 18 Oct 2013, 1 commit
  16. 12 Sep 2013, 1 commit
  17. 05 Sep 2013, 2 commits
  18. 13 Aug 2013, 2 commits
  19. 09 Aug 2013, 1 commit
    • Revert "slub: do not put a slab to cpu partial list when cpu_partial is 0" · 37090506
      Linus Torvalds authored
      This reverts commit 318df36e.
      
      This commit caused Steven Rostedt's hackbench runs to run out of memory
      due to a leak.  As noted by Joonsoo Kim, it is buggy in the following
      scenario:
      
       "I guess, you may set 0 to all kmem caches's cpu_partial via sysfs,
        doesn't it?
      
        In this case, memory leak is possible in following case.  Code flow of
        possible leak is follwing case.
      
         * in __slab_free()
         1. (!new.inuse || !prior) && !was_frozen
         2. !kmem_cache_debug && !prior
         3. new.frozen = 1
         4. after cmpxchg_double_slab, run the (!n) case with new.frozen=1
         5. with this patch, put_cpu_partial() doesn't do anything,
            because this cache's cpu_partial is 0
         6. return
      
       In step 5, the leak occurs"
      
      And Steven does indeed have cpu_partial set to 0 due to RT testing.
      
      Joonsoo is cooking up a patch, but everybody agrees that reverting this
      for now is the right thing to do.
      Reported-and-bisected-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      37090506
  20. 17 Jul 2013, 1 commit
  21. 15 Jul 2013, 3 commits
    • mm/slub: remove 'per_cpu' which is useless variable · e35e1a97
      Chen Gang authored
      Remove the 'per_cpu' variable, which has been useless since commit
      205ab99d ("slub: Update statistics handling for variable order
      slabs"); the partial list is now handled in the same way as the
      per-cpu slab.
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      e35e1a97
    • kernel: delete __cpuinit usage from all core kernel files · 0db0628d
      Paul Gortmaker authored
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  The fix in commit
      5e427ec2 ("x86: Fix bit corruption at CPU resume time") is a good
      example of the nasty type of bugs that can be created with improper
      use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the uses of the __cpuinit macros from C files in
      the core kernel directories (kernel, init, lib, mm, and include)
      that don't really have a specific maintainer.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      0db0628d
    • slub: Check for page NULL before doing the node_match check · c25f195e
      Steven Rostedt authored
      In the -rt kernel (mrg), we hit the following dump:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
      PGD a2d39067 PUD b1641067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
      CPU 3
      Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992
      RIP: 0010:[<ffffffff811573f1>]  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
      RSP: 0018:ffff8800a9b17d70  EFLAGS: 00010213
      RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000
      RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500
      RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd
      R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500
      R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000
      FS:  00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000)
      Stack:
       ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011
       0000000001200011 0000000001200011 0000000000000000 0000000000000000
       00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd
      Call Trace:
       [<ffffffff81202e08>] ? current_has_perm+0x68/0x80
       [<ffffffff81041cbd>] copy_process+0xdd/0x15b0
       [<ffffffff810a2125>] ? rt_up_read+0x25/0x30
       [<ffffffff8104369a>] do_fork+0x5a/0x360
       [<ffffffff8107c66b>] ? migrate_enable+0xeb/0x220
       [<ffffffff8100b068>] sys_clone+0x28/0x30
       [<ffffffff81527423>] stub_clone+0x13/0x20
       [<ffffffff81527152>] ? system_call_fastpath+0x16/0x1b
      Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 <48> 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2
      RIP  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
       RSP <ffff8800a9b17d70>
      CR2: 0000000000000000
      ---[ end trace 0000000000000002 ]---
      
      Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel
      with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do
      disable migration. But the SLUB code is relatively lockless, and the
      spin_locks there are raw_spin_locks (not converted to mutexes), thus I
      believe this bug can happen in mainline without -rt features. The -rt
      patch is just good at triggering mainline bugs ;-)
      
      Anyway, looking at where this crashed, it seems that the page variable
      can be NULL when passed to the node_match() function (which does not
      check if it is NULL). When this happens we get the above panic.
      
      As page is only used in slab_alloc() to check if the node matches, if
      it's NULL I'm assuming that we can say it doesn't and call the
      __slab_alloc() code. Is this a correct assumption?
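      
      The fix, sketched (this mirrors node_match() in mm/slub.c with the
      added NULL check; hedged):
      
          static inline int node_match(struct page *page, int node)
          {
          #ifdef CONFIG_NUMA
                  /*
                   * Treat a NULL page as "no match" so the caller falls
                   * back to the slow path, __slab_alloc().
                   */
                  if (!page ||
                      (node != NUMA_NO_NODE && page_to_nid(page) != node))
                          return 0;
          #endif
                  return 1;
          }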
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c25f195e