1. 07 Aug 2014, 1 commit
  2. 05 Jun 2014, 4 commits
• memcg, slab: merge memcg_{bind,release}_pages to memcg_{un}charge_slab · c67a8a68
  Committed by Vladimir Davydov
Currently we have two pairs of kmemcg-related functions that are called on
slab alloc/free.  The first is memcg_{bind,release}_pages, which counts the
total number of pages allocated for a kmem cache.  The second is
memcg_{un}charge_slab, which {un}charges slab pages to the kmemcg resource
counter.  Let's just merge them to keep the code clean; a sketch of the
merged result follows.
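
This is a hedged sketch of the merged helpers, assuming the nr_pages
bookkeeping described later in this log; the exact field layout is
illustrative, not the literal patch:

    int memcg_charge_slab(struct kmem_cache *s, gfp_t gfp, int order)
    {
            int ret;

            if (is_root_cache(s))
                    return 0;       /* root caches are never charged */
            ret = memcg_charge_kmem(s->memcg_params->memcg, gfp,
                                    PAGE_SIZE << order);
            if (!ret)               /* absorbs memcg_bind_pages() */
                    atomic_add(1 << order, &s->memcg_params->nr_pages);
            return ret;
    }

    void memcg_uncharge_slab(struct kmem_cache *s, int order)
    {
            if (is_root_cache(s))
                    return;
            memcg_uncharge_kmem(s->memcg_params->memcg, PAGE_SIZE << order);
            /* absorbs memcg_release_pages() */
            atomic_sub(1 << order, &s->memcg_params->nr_pages);
    }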
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• memcg, slab: do not schedule cache destruction when last page goes away · 1e32e77f
  Committed by Vladimir Davydov
This patchset is part of the preparations for kmemcg re-parenting.  It
aims to simplify kmemcg work-flows and synchronization.

First, it removes async per-memcg cache destruction (see patches 1, 2).
Caches are now destroyed only on memcg offline.  That means caches that
are not empty on memcg offline will be leaked.  However, they are
already leaked, because memcg_cache_params::nr_pages normally never drops
to 0, so the destruction work is never scheduled unless kmem_cache_shrink
is called explicitly.  In the future I plan to reap such dead caches
either on vmpressure or periodically.
      
Second, it replaces the per-memcg slab_caches_mutexes with the global
memcg_slab_mutex, which must be taken across the whole per-memcg cache
creation/destruction path, before the slab_mutex (see patch 3).  This
greatly simplifies synchronization among the various per-memcg cache
creation/destruction paths.
      
I'm still not quite sure about the end picture; in particular, I don't know
whether we should reap dead memcgs' kmem caches periodically or try to
merge them with their parents (see https://lkml.org/lkml/2014/4/20/38 for
more details).  But whichever way we choose, this set looks like a
reasonable change to me, because it greatly simplifies kmemcg work-flows
and eases further development.
      
      This patch (of 3):
      
After a memcg is offlined, we mark those of its kmem caches that cannot be
deleted right away (because they still contain objects) as dead, by setting
the memcg_cache_params::dead flag.  memcg_release_pages will then schedule
cache destruction (memcg_cache_params::destroy) as soon as the last slab
of the cache is freed (i.e. memcg_cache_params::nr_pages drops to zero).
      
      I guess the idea was to destroy the caches as soon as possible, i.e.
      immediately after freeing the last object.  However, it just doesn't work
      that way, because kmem caches always preserve some pages for the sake of
      performance, so that nr_pages never gets to zero unless the cache is
shrunk explicitly using kmem_cache_shrink.  Of course, we could account
the total number of objects in the cache, or check on each kmem_cache_free
whether all the slabs allocated for the cache are empty and schedule
destruction if so, but that would be too costly.
      
Thus we have a piece of code that works only when we explicitly call
kmem_cache_shrink, yet complicates the whole picture a lot.  Moreover,
it is in fact racy: kmem_cache_shrink may free the last slab, and thus
schedule cache destruction, before it finishes checking that the cache
is empty, which can lead to a use-after-free.
      
So I propose to remove this async cache destruction from
memcg_release_pages, and instead check explicitly whether the cache is
empty after calling kmem_cache_shrink, as sketched below.  This will
simplify things a lot without introducing any functional changes.
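
A minimal sketch of that synchronous check, assuming the nr_pages counter
described above (the helper name is made up):

    static void memcg_try_destroy_cache(struct kmem_cache *s)
    {
            kmem_cache_shrink(s);   /* release empty slabs to the page allocator */
            /* only after shrinking does nr_pages == 0 really mean "empty" */
            if (!atomic_read(&s->memcg_params->nr_pages))
                    kmem_cache_destroy(s);
    }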
      
As for dead memcg caches (i.e. those that are left hanging around after
memcg offline because they still contain objects), I suppose we should
reap them either periodically or on vmpressure, as Glauber suggested
initially.  I'm going to implement this later.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• slab: get_online_mems for kmem_cache_{create,destroy,shrink} · 03afc0e2
  Committed by Vladimir Davydov
When we create a sl[au]b cache, we allocate kmem_cache_node structures
for each online NUMA node.  To handle nodes going online/offline, we
register a memory hotplug notifier and, for each kmem cache,
allocate/free the kmem_cache_node corresponding to the node that changes
state.
      
To synchronize the two paths, we hold the slab_mutex both during the
cache creation/destruction path and while tuning the per-node parts of
kmem caches in the memory hotplug handler.  But that's not quite right,
because it does not guarantee that a newly created cache will have all
of its kmem_cache_nodes initialized if it races with memory hotplug.  For
instance, in the case of slub:
      
          CPU0                            CPU1
          ----                            ----
          kmem_cache_create:              online_pages:
           __kmem_cache_create:            slab_memory_callback:
                                            slab_mem_going_online_callback:
                                             lock slab_mutex
                                             for each slab_caches list entry
                                                 allocate kmem_cache node
                                             unlock slab_mutex
            lock slab_mutex
            init_kmem_cache_nodes:
             for_each_node_state(node, N_NORMAL_MEMORY)
                 allocate kmem_cache node
            add kmem_cache to slab_caches list
            unlock slab_mutex
                                          online_pages (continued):
                                           node_states_set_node
      
As a result, we can end up with a kmem cache that does not have all of
its kmem_cache_nodes allocated.
      
To avoid issues like that, we should hold get/put_online_mems() across
the whole kmem cache creation/destruction/shrink paths, just as we do
for cpu hotplug.  This patch does the trick.
      
Note that after this is applied, there is no longer any need to take the
slab_mutex in kmem_cache_shrink, so it is removed from there.  A
simplified sketch of the resulting creation path follows.
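
The sketch below assumes a hypothetical do_kmem_cache_create() helper for
the actual work; the locking order is the point:

    struct kmem_cache *
    kmem_cache_create(const char *name, size_t size, size_t align,
                      unsigned long flags, void (*ctor)(void *))
    {
            struct kmem_cache *s;

            get_online_cpus();
            get_online_mems();      /* no node may go on/offline under us */

            mutex_lock(&slab_mutex);
            s = do_kmem_cache_create(name, size, align, flags, ctor);
            mutex_unlock(&slab_mutex);

            put_online_mems();
            put_online_cpus();

            return s;
    }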
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• sl[au]b: charge slabs to kmemcg explicitly · 5dfb4175
  Committed by Vladimir Davydov
We have only a few places where we actually want to charge kmem, so
instead of intruding into the general page allocation path with
__GFP_KMEMCG, it is better to charge kmem explicitly in those places.
All kmem charges will be easier to follow that way.
      
This is a step towards removing __GFP_KMEMCG.  It removes __GFP_KMEMCG
from memcg caches' allocflags.  Instead, it makes the slab allocation
path call memcg_charge_kmem directly, getting the memcg to charge from
the cache's memcg_params.
      
This also eliminates any possibility of misaccounting an allocation
going from one memcg's cache to another memcg, because now we always
charge slabs against the memcg the cache belongs to.  That's why this
patch removes the big comment above memcg_kmem_get_cache.  A hedged
sketch of the resulting call site follows.
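
The glue here is simplified; memcg_charge_slab()/memcg_uncharge_slab()
are the helper names this series settles on:

    static struct page *alloc_slab_page(struct kmem_cache *s, gfp_t flags,
                                        int order)
    {
            struct page *page;

            /* charge the memcg the cache belongs to, not the current task's */
            if (memcg_charge_slab(s, flags, order))
                    return NULL;

            page = alloc_pages(flags, order);
            if (!page)
                    memcg_uncharge_slab(s, order);

            return page;
    }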
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3. 07 May 2014, 1 commit
• slub: use sysfs'es release mechanism for kmem_cache · 41a21285
  Committed by Christoph Lameter
      debugobjects warning during netfilter exit:
      
          ------------[ cut here ]------------
          WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G        W 3.11.0-next-20130906-sasha #3984
          Workqueue: netns cleanup_net
          Call Trace:
            dump_stack+0x52/0x87
            warn_slowpath_common+0x8c/0xc0
            warn_slowpath_fmt+0x46/0x50
            debug_print_object+0x8d/0xb0
            __debug_check_no_obj_freed+0xa5/0x220
            debug_check_no_obj_freed+0x15/0x20
            kmem_cache_free+0x197/0x340
            kmem_cache_destroy+0x86/0xe0
            nf_conntrack_cleanup_net_list+0x131/0x170
            nf_conntrack_pernet_exit+0x5d/0x70
            ops_exit_list+0x5e/0x70
            cleanup_net+0xfb/0x1c0
            process_one_work+0x338/0x550
            worker_thread+0x215/0x350
            kthread+0xe7/0xf0
            ret_from_fork+0x7c/0xb0
      
      Also during dcookie cleanup:
      
          WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0()
          ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
          Modules linked in:
          CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408
          Call Trace:
            dump_stack (lib/dump_stack.c:52)
            warn_slowpath_common (kernel/panic.c:430)
            warn_slowpath_fmt (kernel/panic.c:445)
            debug_print_object (lib/debugobjects.c:262)
            __debug_check_no_obj_freed (lib/debugobjects.c:697)
            debug_check_no_obj_freed (lib/debugobjects.c:726)
            kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717)
            kmem_cache_destroy (mm/slab_common.c:363)
            dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343)
            event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153)
            __fput (fs/file_table.c:217)
            ____fput (fs/file_table.c:253)
            task_work_run (kernel/task_work.c:125 (discriminator 1))
            do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751)
            int_signal (arch/x86/kernel/entry_64.S:807)
      
Sysfs has a release mechanism.  Use it to release the kmem_cache
structure if CONFIG_SYSFS is enabled.

Only slub is changed; slab currently supports only /proc/slabinfo, not
/sys/kernel/slab/*.  We talked about adding that, and someone was
working on it.  A sketch of the mechanism follows.
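
This is a hedged sketch, close to what the patch does;
slab_kmem_cache_release() performs the final freeing:

    static void kmem_cache_release(struct kobject *kobj)
    {
            /* runs when the last sysfs reference is dropped */
            slab_kmem_cache_release(container_of(kobj, struct kmem_cache, kobj));
    }

    static struct kobj_type slab_ktype = {
            .sysfs_ops = &slab_sysfs_ops,
            .release   = kmem_cache_release,
    };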
      
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build]
      [akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more]
Signed-off-by: Christoph Lameter <cl@linux.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Greg KH <greg@kroah.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4. 08 Apr 2014, 1 commit
• memcg, slab: never try to merge memcg caches · a44cb944
  Committed by Vladimir Davydov
      When a kmem cache is created (kmem_cache_create_memcg()), we first try to
      find a compatible cache that already exists and can handle requests from
      the new cache, i.e.  has the same object size, alignment, ctor, etc.  If
      there is such a cache, we do not create any new caches, instead we simply
      increment the refcount of the cache found and return it.
      
Currently we follow this procedure not only when creating root caches, but
also for memcg caches.  However, there is no point in that: since every
memcg cache has exactly the same parameters as its parent, and cache
merging cannot be turned off at runtime (only at boot, by passing
"slub_nomerge"), the root caches of any two potentially mergeable memcg
caches must already have been merged, i.e. they must share the same root
cache.  Therefore we could never even reach memcg cache creation, because
the cache would already exist.
      
The only exception is boot caches; they are explicitly forbidden to be
merged by setting their refcount to -1.  There are currently only two of
them, kmem_cache and kmem_cache_node, which are used in slab internals (I
do not count kmalloc caches, as their refcount is set to 1 immediately
after creation).  Since they are prevented from merging from the start, I
guess we should avoid merging their children too.
      
      So let's remove the useless code responsible for merging memcg caches.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Glauber Costa <glommer@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5. 24 Jan 2014, 2 commits
• memcg, slab: RCU protect memcg_params for root caches · f8570263
  Committed by Vladimir Davydov
We relocate a root cache's memcg_params whenever we need to grow the
memcg_caches array to accommodate all kmem-active memory cgroups.
Currently, on relocation we free the old copy immediately, which can
lead to use-after-free, because the memcg_caches array is accessed
lock-free (see cache_from_memcg_idx()).  This patch fixes this by making
memcg_params RCU-protected for root caches, as sketched below.
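
The sketch assumes memcg_cache_params embeds a struct rcu_head for root
caches; other names are illustrative:

    static int memcg_update_cache_params(struct kmem_cache *s, int num_groups)
    {
            struct memcg_cache_params *old = s->memcg_params;
            struct memcg_cache_params *new;

            new = kzalloc(sizeof(*new) + num_groups * sizeof(void *), GFP_KERNEL);
            if (!new)
                    return -ENOMEM;
            memcpy(new->memcg_caches, old->memcg_caches,
                   memcg_limited_groups_array_size * sizeof(void *));

            rcu_assign_pointer(s->memcg_params, new); /* publish the grown array */
            kfree_rcu(old, rcu_head);  /* lock-free readers may still hold 'old' */
            return 0;
    }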
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• memcg, slab: fix barrier usage when accessing memcg_caches · 959c8963
  Committed by Vladimir Davydov
Each root kmem_cache has pointers to per-memcg caches stored in its
memcg_params::memcg_caches array.  Whenever we want to allocate a slab
for a memcg, we access this array to get the per-memcg cache to allocate
from (see memcg_kmem_get_cache()).  The access must be lock-free for
performance reasons, so we should use barriers to guarantee that the
kmem_cache we see is fully initialized.
      
First, we should place a write barrier immediately before setting the
pointer in the memcg_caches array, to make sure nobody will see a
partially initialized object.  Second, we should issue a read barrier
after loading the pointer and before dereferencing it, to pair with the
write barrier.

However, the current barrier usage looks rather strange: we have a write
barrier *after* setting the pointer and a read barrier *before* loading
it, which is backwards.  This patch fixes this, as sketched below.
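
The helper names below are made up; the reader mirrors the lock-free
lookup in memcg_kmem_get_cache():

    /* writer: called once the new cache is fully constructed */
    static void memcg_publish_cache(struct kmem_cache *root, int idx,
                                    struct kmem_cache *new)
    {
            smp_wmb();      /* initialization must be visible before the pointer */
            root->memcg_params->memcg_caches[idx] = new;
    }

    /* reader: the lock-free lookup */
    static struct kmem_cache *memcg_cache_lookup(struct kmem_cache *root, int idx)
    {
            struct kmem_cache *s = root->memcg_params->memcg_caches[idx];

            if (s)
                    smp_rmb();      /* pairs with smp_wmb() in the writer */
            return s;
    }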
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6. 13 Nov 2013, 1 commit
  7. 29 Aug 2013, 1 commit
• memcg: check that kmem_cache has memcg_params before accessing it · 6f6b8951
  Committed by Andrey Vagin
If the system has had a few memory cgroups that were all destroyed,
memcg_limited_groups_array_size keeps a non-zero value, but all new
caches are created without memcg_params, because memcg_kmem_enabled()
returns false.
      
We enumerate child caches in a few places, and all of them are
potentially dangerous.

For example, my kernel is compiled with CONFIG_SLAB, and it crashed when
I tried to mount an NFS share after a few experiments with kmemcg (a
sketch of the guard the fix adds follows the oops below).
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
        IP: [<ffffffff8118166a>] do_tune_cpucache+0x8a/0xd0
        PGD b942a067 PUD b999f067 PMD 0
        Oops: 0000 [#1] SMP
        Modules linked in: fscache(+) ip6table_filter ip6_tables iptable_filter ip_tables i2c_piix4 pcspkr virtio_net virtio_balloon i2c_core floppy
        CPU: 0 PID: 357 Comm: modprobe Not tainted 3.11.0-rc7+ #59
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        task: ffff8800b9f98240 ti: ffff8800ba32e000 task.ti: ffff8800ba32e000
        RIP: 0010:[<ffffffff8118166a>]  [<ffffffff8118166a>] do_tune_cpucache+0x8a/0xd0
        RSP: 0018:ffff8800ba32fb70  EFLAGS: 00010246
        RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
        RDX: 0000000000000000 RSI: ffff8800b9f98910 RDI: 0000000000000246
        RBP: ffff8800ba32fba0 R08: 0000000000000002 R09: 0000000000000004
        R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000010
        R13: 0000000000000008 R14: 00000000000000d0 R15: ffff8800375d0200
        FS:  00007f55f1378740(0000) GS:ffff8800bfa00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 00007f24feba57a0 CR3: 0000000037b51000 CR4: 00000000000006f0
        Call Trace:
          enable_cpucache+0x49/0x100
          setup_cpu_cache+0x215/0x280
          __kmem_cache_create+0x2fa/0x450
          kmem_cache_create_memcg+0x214/0x350
          kmem_cache_create+0x2b/0x30
          fscache_init+0x19b/0x230 [fscache]
          do_one_initcall+0xfa/0x1b0
          load_module+0x1c41/0x26d0
          SyS_finit_module+0x86/0xb0
          system_call_fastpath+0x16/0x1b
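
A hedged sketch of the guard added before walking the child caches; the
wrapper and iteration helpers follow the names used elsewhere in this log
and are illustrative:

    static int tune_all_children(struct kmem_cache *cachep, int limit,
                                 int batchcount, int shared, gfp_t gfp)
    {
            struct kmem_cache *c;
            int i;

            if (!cachep->memcg_params)
                    return 0;       /* cache created while kmemcg was off */

            for_each_memcg_cache_index(i) {
                    c = cache_from_memcg_idx(cachep, i);
                    if (c)  /* propagate the new tunables to the child cache */
                            do_tune_cpucache(c, limit, batchcount, shared, gfp);
            }
            return 0;
    }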
Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Glauber Costa <glommer@openvz.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8. 08 Jul 2013, 1 commit
  9. 07 Jul 2013, 1 commit
  10. 01 Feb 2013, 4 commits
  11. 19 Dec 2012, 6 commits
• slab: propagate tunable values · 943a451a
  Committed by Glauber Costa
SLAB allows us to tune a particular cache's behavior with tunables.  When
creating a new memcg cache copy, we'd like to preserve any tunables the
parent cache already had.
      
      This could be done by an explicit call to do_tune_cpucache() after the
      cache is created.  But this is not very convenient now that the caches are
      created from common code, since this function is SLAB-specific.
      
      Another method of doing that is taking advantage of the fact that
      do_tune_cpucache() is always called from enable_cpucache(), which is
      called at cache initialization.  We can just preset the values, and then
      things work as expected.
      
      It can also happen that a root cache has its tunables updated during
      normal system operation.  In this case, we will propagate the change to
      all caches that are already active.
      
This change requires us to move the assignment of root_cache in
memcg_params a bit earlier.  We need it to be already set (which
memcg_kmem_register_cache will do) when we reach __kmem_cache_create().
A hedged sketch follows.
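
The sketch below assumes a memcg_root_cache() helper that returns the
cache itself for root caches; setup_default_cpucache() is a made-up
stand-in for the usual size-based defaults:

    static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
    {
            struct kmem_cache *root = memcg_root_cache(cachep);

            /* child cache: inherit whatever the root has been tuned to */
            if (root != cachep && root->limit)
                    return do_tune_cpucache(cachep, root->limit,
                                            root->batchcount, root->shared, gfp);

            return setup_default_cpucache(cachep, gfp);
    }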
Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• memcg: aggregate memcg cache values in slabinfo · 749c5415
  Committed by Glauber Costa
      When we create caches in memcgs, we need to display their usage
      information somewhere.  We'll adopt a scheme similar to /proc/meminfo,
      with aggregate totals shown in the global file, and per-group information
      stored in the group itself.
      
      For the time being, only reads are allowed in the per-group cache.
Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• memcg: destroy memcg caches · 1f458cbf
  Committed by Glauber Costa
Implement destruction of memcg caches.  Right now, only caches whose
reference counter is the last one remaining are deleted.  If there are
any other reference counters around, we just leave the caches lying
around until they go away.

When that happens, a destruction function is called from the cache code.
Caches are only destroyed in process context, so in the general case we
queue them up for later processing, as sketched below.
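
The sketch assumes the destroy work member and a memcg_params_to_cache()
helper mapping the params back to their cache (both illustrative):

    static void kmem_cache_destroy_work_func(struct work_struct *w)
    {
            struct memcg_cache_params *p =
                    container_of(w, struct memcg_cache_params, destroy);

            /* we run in process context here, so destroying is safe */
            kmem_cache_destroy(memcg_params_to_cache(p));
    }

    static void mem_cgroup_destroy_cache(struct kmem_cache *cachep)
    {
            /* the trigger may fire from atomic context: defer to a worker */
            INIT_WORK(&cachep->memcg_params->destroy,
                      kmem_cache_destroy_work_func);
            schedule_work(&cachep->memcg_params->destroy);
    }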
Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• sl[au]b: always get the cache from its page in kmem_cache_free() · b9ce5ef4
  Committed by Glauber Costa
      struct page already has this information.  If we start chaining caches,
      this information will always be more trustworthy than whatever is passed
      into the function.
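
A minimal sketch of the idea; slab_free() stands in for the allocator's
internal free path:

    void kmem_cache_free(struct kmem_cache *s, void *x)
    {
            struct page *page = virt_to_head_page(x);

            /* trust the page, not the caller: pick up the owning cache */
            s = page->slab_cache;
            slab_free(s, page, x, _RET_IP_);
    }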
Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• slab/slub: consider a memcg parameter in kmem_create_cache · 2633d7a0
  Committed by Glauber Costa
Allow a memcg parameter to be passed during cache creation.  When the slub
allocator is used, only caches that belong to the same memcg are merged.
We do this by scanning the global list and then translating the cache to
a memcg-specific cache.

The default function is created as a wrapper, passing NULL to the memcg
version.
      
A helper, memcg_css_id, is provided because slub needs a unique cache name
for sysfs.  Since this is visible, but not the canonical location for slab
data, the cache name is not used; the css_id should suffice.
Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• slab/slub: struct memcg_params · ba6c496e
  Committed by Glauber Costa
      For the kmem slab controller, we need to record some extra information in
      the kmem_cache structure.
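
A hedged sketch of the record, collecting the fields that changelogs
elsewhere in this log refer to (the exact layout evolved across the
series):

    struct memcg_cache_params {
            bool is_root_cache;
            union {
                    /* root cache: index of per-memcg clones */
                    struct kmem_cache *memcg_caches[0];
                    struct {        /* per-memcg (child) cache */
                            struct mem_cgroup *memcg;       /* owner */
                            struct kmem_cache *root_cache;  /* parent */
                            bool dead;                      /* memcg offlined */
                            atomic_t nr_pages;              /* slab pages held */
                            struct work_struct destroy;     /* deferred destroy */
                    };
            };
    };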
Signed-off-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Suleiman Souhlal <suleiman@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12. 11 Dec 2012, 2 commits
  13. 31 Oct 2012, 1 commit
• slab: Ignore internal flags in cache creation · d8843922
  Committed by Glauber Costa
      Some flags are used internally by the allocators for management
      purposes. One example of that is the CFLGS_OFF_SLAB flag that slab uses
      to mark that the metadata for that cache is stored outside of the slab.
      
No cache should ever pass those as creation flags.  We can just ignore
this bit if it happens to be passed (such as when duplicating a cache in
the kmem memcg patches).
      
Because such flags can vary from allocator to allocator, we let each make
its own decision, defining SLAB_AVAILABLE_FLAGS with all flags that are
valid at creation time.  Allocators that don't have any specific flag
requirements should define it to include all flags.
      
      Common code will mask out all flags not belonging to that set.
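
A hedged sketch of the common-code masking; do_kmem_cache_create() is a
hypothetical stand-in for the per-allocator work:

    struct kmem_cache *kmem_cache_create(const char *name, size_t size,
                                         size_t align, unsigned long flags,
                                         void (*ctor)(void *))
    {
            /* drop internal/management bits the allocator did not opt into */
            flags &= SLAB_AVAILABLE_FLAGS;
            return do_kmem_cache_create(name, size, align, flags, ctor);
    }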
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
14. 24 Oct 2012, 3 commits
  15. 05 Sep 2012, 8 commits
• Revert "mm/sl[aou]b: Move sysfs_slab_add to common" · aac3a166
  Committed by Pekka Enberg
      This reverts commit 96d17b7b which
      caused the following errors at boot:
      
        [    1.114885] kobject (ffff88001a802578): tried to init an initialized object, something is seriously wrong.
        [    1.114885] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
        [    1.114885] Call Trace:
        [    1.114885]  [<ffffffff81273f37>] kobject_init+0x87/0xa0
        [    1.115555]  [<ffffffff8127426a>] kobject_init_and_add+0x2a/0x90
        [    1.115555]  [<ffffffff8127c870>] ? sprintf+0x40/0x50
        [    1.115555]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
        [    1.115555]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
        [    1.115555]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
        [    1.115555]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
        [    1.115555]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
        [    1.115836]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
        [    1.116834]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
        [    1.117835]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
        [    1.117835]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
        [    1.118401]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
        [    1.119832]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
        [    1.120325] ------------[ cut here ]------------
        [    1.120835] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xc1/0xf0()
        [    1.121437] sysfs: cannot create duplicate filename '/kernel/slab/:t-0000016'
        [    1.121831] Modules linked in:
        [    1.122138] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
        [    1.122831] Call Trace:
        [    1.123074]  [<ffffffff81195ce1>] ? sysfs_add_one+0xc1/0xf0
        [    1.123833]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
        [    1.124405]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
        [    1.124832]  [<ffffffff81195ce1>] sysfs_add_one+0xc1/0xf0
        [    1.125337]  [<ffffffff81195eb3>] create_dir+0x73/0xd0
        [    1.125832]  [<ffffffff81196221>] sysfs_create_dir+0x81/0xe0
        [    1.126363]  [<ffffffff81273d3d>] kobject_add_internal+0x9d/0x210
        [    1.126832]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
        [    1.127406]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
        [    1.127832]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
        [    1.128384]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
        [    1.128833]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
        [    1.129831]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
        [    1.130305]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
        [    1.130831]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
        [    1.131351]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
        [    1.131830]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
        [    1.132392]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
        [    1.132830]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
        [    1.133315] ---[ end trace 2703540871c8fab7 ]---
        [    1.133830] ------------[ cut here ]------------
        [    1.134274] WARNING: at lib/kobject.c:196 kobject_add_internal+0x1f5/0x210()
        [    1.134829] kobject_add_internal failed for :t-0000016 with -EEXIST, don't try to register things with the same name in the same directory.
        [    1.135829] Modules linked in:
        [    1.136135] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
        [    1.136828] Call Trace:
        [    1.137071]  [<ffffffff81273e95>] ? kobject_add_internal+0x1f5/0x210
        [    1.137830]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
        [    1.138402]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
        [    1.138830]  [<ffffffff811955a3>] ? release_sysfs_dirent+0x73/0xf0
        [    1.139419]  [<ffffffff81273e95>] kobject_add_internal+0x1f5/0x210
        [    1.139830]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
        [    1.140429]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
        [    1.140830]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
        [    1.141829]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
        [    1.142307]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
        [    1.142829]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
        [    1.143307]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
        [    1.143829]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
        [    1.144352]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
        [    1.144829]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
        [    1.145405]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
        [    1.145828]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
        [    1.146313] ---[ end trace 2703540871c8fab8 ]---
      
      Conflicts:
      
      	mm/slub.c
Signed-off-by: Pekka Enberg <penberg@kernel.org>
• mm/sl[aou]b: Shrink __kmem_cache_create() parameter lists · 8a13a4cc
  Committed by Christoph Lameter
Do the initial setup of the fields in common code.  This will allow us
to push more processing into common code later and improve readability.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
• mm/sl[aou]b: Move kmem_cache allocations into common code · 278b1bb1
  Committed by Christoph Lameter
      Shift the allocations to common code. That way the allocation and
      freeing of the kmem_cache structures is handled by common code.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
• mm/sl[aou]b: Move sysfs_slab_add to common · 96d17b7b
  Committed by Christoph Lameter
Simplify locking by moving sysfs_slab_add to after all locks have been
dropped.  This eases the upcoming move to provide sysfs support for all
allocators.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
• mm/sl[aou]b: Do slab aliasing call from common code · cbb79694
  Committed by Christoph Lameter
The slab aliasing logic causes some strange contortions in slub.  So add
a call to deal with aliases to slab_common.c, but disable it for the
other slab allocators by providing stubs that fail to create aliases.
      
      Full general support for aliases will require additional cleanup passes
      and more standardization of fields in kmem_cache.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
• mm/sl[aou]b: Get rid of __kmem_cache_destroy · 12c3667f
  Committed by Christoph Lameter
      What is done there can be done in __kmem_cache_shutdown.
      
This affects RCU handling somewhat.  On RCU free, no slab allocator
refers to management structures other than the kmem_cache structure
itself.  Therefore those other structures can be freed before the
RCU-deferred free to the page allocator occurs.
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
• mm/sl[aou]b: Use "kmem_cache" name for slab cache with kmem_cache struct · 9b030cb8
  Committed by Christoph Lameter
Make all allocators use the "kmem_cache" slab name for the "kmem_cache"
structure.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
• mm/sl[aou]b: Extract a common function for kmem_cache_destroy · 945cf2b6
  Committed by Christoph Lameter
kmem_cache_destroy does basically the same thing in all allocators.

Extract the common code; this is easy since we already have common mutex
handling.
Reviewed-by: Glauber Costa <glommer@parallels.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
16. 09 Jul 2012, 2 commits