1. 24 1月, 2014 5 次提交
    • V
      slab: do not panic if we fail to create memcg cache · f717eb3a
      Vladimir Davydov 提交于
      There is no point in flooding logs with warnings or especially crashing
      the system if we fail to create a cache for a memcg.  In this case we
      will be accounting the memcg allocation to the root cgroup until we
      succeed to create its own cache, but it isn't that critical.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f717eb3a
    • V
      memcg, slab: fix races in per-memcg cache creation/destruction · 2edefe11
      Vladimir Davydov 提交于
      We obtain a per-memcg cache from a root kmem_cache by dereferencing an
      entry of the root cache's memcg_params::memcg_caches array.  If we find
      no cache for a memcg there on allocation, we initiate the memcg cache
      creation (see memcg_kmem_get_cache()).  The cache creation proceeds
      asynchronously in memcg_create_kmem_cache() in order to avoid lock
      clashes, so there can be several threads trying to create the same
      kmem_cache concurrently, but only one of them may succeed.  However, due
      to a race in the code, it is not always true.  The point is that the
      memcg_caches array can be relocated when we activate kmem accounting for
      a memcg (see memcg_update_all_caches(), memcg_update_cache_size()).  If
      memcg_update_cache_size() and memcg_create_kmem_cache() proceed
      concurrently as described below, we can leak a kmem_cache.
      
      Asume two threads schedule creation of the same kmem_cache.  One of them
      successfully creates it.  Another one should fail then, but if
      memcg_create_kmem_cache() interleaves with memcg_update_cache_size() as
      follows, it won't:
      
        memcg_create_kmem_cache()             memcg_update_cache_size()
        (called w/o mutexes held)             (called with slab_mutex,
                                               set_limit_mutex held)
        -------------------------             -------------------------
      
        mutex_lock(&memcg_cache_mutex)
      
                                              s->memcg_params=kzalloc(...)
      
        new_cachep=cache_from_memcg_idx(cachep,idx)
        // new_cachep==NULL => proceed to creation
      
                                              s->memcg_params->memcg_caches[i]
                                                  =cur_params->memcg_caches[i]
      
        // kmem_cache_create_memcg takes slab_mutex
        // so we will hang around until
        // memcg_update_cache_size finishes, but
        // nothing will prevent it from succeeding so
        // memcg_caches[idx] will be overwritten in
        // memcg_register_cache!
      
        new_cachep = kmem_cache_create_memcg(...)
        mutex_unlock(&memcg_cache_mutex)
      
      Let's fix this by moving the check for existence of the memcg cache to
      kmem_cache_create_memcg() to be called under the slab_mutex and make it
      return NULL if so.
      
      A similar race is possible when destroying a memcg cache (see
      kmem_cache_destroy()).  Since memcg_unregister_cache(), which clears the
      pointer in the memcg_caches array, is called w/o protection, we can race
      with memcg_update_cache_size() and omit clearing the pointer.  Therefore
      memcg_unregister_cache() should be moved before we release the
      slab_mutex.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2edefe11
    • V
      memcg, slab: clean up memcg cache initialization/destruction · 1aa13254
      Vladimir Davydov 提交于
      Currently, we have rather a messy function set relating to per-memcg
      kmem cache initialization/destruction.
      
      Per-memcg caches are created in memcg_create_kmem_cache().  This
      function calls kmem_cache_create_memcg() to allocate and initialize a
      kmem cache and then "registers" the new cache in the
      memcg_params::memcg_caches array of the parent cache.
      
      During its work-flow, kmem_cache_create_memcg() executes the following
      memcg-related functions:
      
       - memcg_alloc_cache_params(), to initialize memcg_params of the newly
         created cache;
       - memcg_cache_list_add(), to add the new cache to the memcg_slab_caches
         list.
      
      On the other hand, kmem_cache_destroy() called on a cache destruction
      only calls memcg_release_cache(), which does all the work: it cleans the
      reference to the cache in its parent's memcg_params::memcg_caches,
      removes the cache from the memcg_slab_caches list, and frees
      memcg_params.
      
      Such an inconsistency between destruction and initialization paths make
      the code difficult to read, so let's clean this up a bit.
      
      This patch moves all the code relating to registration of per-memcg
      caches (adding to memcg list, setting the pointer to a cache from its
      parent) to the newly created memcg_register_cache() and
      memcg_unregister_cache() functions making the initialization and
      destruction paths look symmetrical.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1aa13254
    • V
      memcg, slab: kmem_cache_create_memcg(): fix memleak on fail path · 363a044f
      Vladimir Davydov 提交于
      We do not free the cache's memcg_params if __kmem_cache_create fails.
      Fix this.
      
      Plus, rename memcg_register_cache() to memcg_alloc_cache_params(),
      because it actually does not register the cache anywhere, but simply
      initialize kmem_cache::memcg_params.
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      363a044f
    • V
      slab: clean up kmem_cache_create_memcg() error handling · 3965fc36
      Vladimir Davydov 提交于
      Currently kmem_cache_create_memcg() backoffs on failure inside
      conditionals, without using gotos.  This results in the rollback code
      duplication, which makes the function look cumbersome even though on
      error we should only free the allocated cache.  Since in the next patch
      I am going to add yet another rollback function call on error path
      there, let's employ labels instead of conditionals for undoing any
      changes on failure to keep things clean.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Reviewed-by: NPekka Enberg <penberg@kernel.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3965fc36
  2. 13 11月, 2013 1 次提交
  3. 28 9月, 2013 1 次提交
    • C
      slab_common: Do not check for duplicate slab names · 3e374919
      Christoph Lameter 提交于
      SLUB can alias multiple slab kmem_create_requests to one slab cache to save
      memory and increase the cache hotness. As a result the name of the slab can be
      stale. Only check the name for duplicates if we are in debug mode where we do
      not merge multiple caches.
      
      This fixes the following problem reported by Jonathan Brassow:
      
        The problem with kmem_cache* is this:
      
        *) Assume CONFIG_SLUB is set
        1) kmem_cache_create(name="foo-a")
        - creates new kmem_cache structure
        2) kmem_cache_create(name="foo-b")
        - If identical cache characteristics, it will be merged with the previously
          created cache associated with "foo-a".  The cache's refcount will be
          incremented and an alias will be created via sysfs_slab_alias().
        3) kmem_cache_destroy(<ptr>)
        - Attempting to destroy cache associated with "foo-a", but instead the
          refcount is simply decremented.  I don't even think the sysfs aliases are
          ever removed...
        4) kmem_cache_create(name="foo-a")
        - This FAILS because kmem_cache_sanity_check colides with the existing
          name ("foo-a") associated with the non-removed cache.
      
        This is a problem for RAID (specifically dm-raid) because the name used
        for the kmem_cache_create is ("raid%d-%p", level, mddev).  If the cache
        persists for long enough, the memory address of an old mddev will be
        reused for a new mddev - causing an identical formulation of the cache
        name.  Even though kmem_cache_destory had long ago been used to delete
        the old cache, the merging of caches has cause the name and cache of that
        old instance to be preserved and causes a colision (and thus failure) in
        kmem_cache_create().  I see this regularly in my testing.
      Reported-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      3e374919
  4. 05 9月, 2013 1 次提交
    • C
      mm/sl[aou]b: Move kmallocXXX functions to common code · f1b6eb6e
      Christoph Lameter 提交于
      The kmalloc* functions of all slab allcoators are similar now so
      lets move them into slab.h. This requires some function naming changes
      in slob.
      
      As a results of this patch there is a common set of functions for
      all allocators. Also means that kmalloc_large() is now available
      in general to perform large order allocations that go directly
      via the page allocator. kmalloc_large() can be substituted if
      kmalloc() throws warnings because of too large allocations.
      
      kmalloc_large() has exactly the same semantics as kmalloc but
      can only used for allocations > PAGE_SIZE.
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      f1b6eb6e
  5. 14 8月, 2013 1 次提交
  6. 08 7月, 2013 1 次提交
  7. 07 7月, 2013 2 次提交
  8. 13 6月, 2013 1 次提交
    • S
      slab: prevent warnings when allocating with __GFP_NOWARN · 907985f4
      Sasha Levin 提交于
      Sasha Levin noticed that the warning introduced by commit 6286ae97
      ("slab: Return NULL for oversized allocations) is being triggered:
      
        WARNING: CPU: 15 PID: 21519 at mm/slab_common.c:376 kmalloc_slab+0x2f/0xb0()
        can: request_module (can-proto-4) failed.
        mpoa: proc_mpc_write: could not parse ''
        Modules linked in:
        CPU: 15 PID: 21519 Comm: trinity-child15 Tainted: G W    3.10.0-rc4-next-20130607-sasha-00011-gcd78395-dirty #2
         0000000000000009 ffff880020a95e30 ffffffff83ff4041 0000000000000000
         ffff880020a95e68 ffffffff8111fe12 fffffffffffffff0 00000000000082d0
         0000000000080000 0000000000080000 0000000001400000 ffff880020a95e78
        Call Trace:
         [<ffffffff83ff4041>] dump_stack+0x4e/0x82
         [<ffffffff8111fe12>] warn_slowpath_common+0x82/0xb0
         [<ffffffff8111fe55>] warn_slowpath_null+0x15/0x20
         [<ffffffff81243dcf>] kmalloc_slab+0x2f/0xb0
         [<ffffffff81278d54>] __kmalloc+0x24/0x4b0
         [<ffffffff8196ffe3>] ? security_capable+0x13/0x20
         [<ffffffff812a26b7>] ? pipe_fcntl+0x107/0x210
         [<ffffffff812a26b7>] pipe_fcntl+0x107/0x210
         [<ffffffff812b7ea0>] ? fget_raw_light+0x130/0x3f0
         [<ffffffff812aa5fb>] SyS_fcntl+0x60b/0x6a0
         [<ffffffff8403ca98>] tracesys+0xe1/0xe6
      
      Andrew Morton writes:
      
        __GFP_NOWARN is frequently used by kernel code to probe for "how big
        an allocation can I get".  That's a bit lame, but it's used on slow
        paths and is pretty simple.
      
      However, SLAB would still spew a warning when a big allocation happens
      if the __GFP_NOWARN flag is _not_ set to expose kernel bugs.
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      [ penberg@kernel.org: improve changelog ]
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      907985f4
  9. 09 5月, 2013 1 次提交
  10. 07 5月, 2013 1 次提交
  11. 06 5月, 2013 1 次提交
  12. 07 2月, 2013 1 次提交
  13. 01 2月, 2013 4 次提交
  14. 19 12月, 2012 5 次提交
    • G
      slab: propagate tunable values · 943a451a
      Glauber Costa 提交于
      SLAB allows us to tune a particular cache behavior with tunables.  When
      creating a new memcg cache copy, we'd like to preserve any tunables the
      parent cache already had.
      
      This could be done by an explicit call to do_tune_cpucache() after the
      cache is created.  But this is not very convenient now that the caches are
      created from common code, since this function is SLAB-specific.
      
      Another method of doing that is taking advantage of the fact that
      do_tune_cpucache() is always called from enable_cpucache(), which is
      called at cache initialization.  We can just preset the values, and then
      things work as expected.
      
      It can also happen that a root cache has its tunables updated during
      normal system operation.  In this case, we will propagate the change to
      all caches that are already active.
      
      This change will require us to move the assignment of root_cache in
      memcg_params a bit earlier.  We need this to be already set - which
      memcg_kmem_register_cache will do - when we reach __kmem_cache_create()
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      943a451a
    • G
      memcg: aggregate memcg cache values in slabinfo · 749c5415
      Glauber Costa 提交于
      When we create caches in memcgs, we need to display their usage
      information somewhere.  We'll adopt a scheme similar to /proc/meminfo,
      with aggregate totals shown in the global file, and per-group information
      stored in the group itself.
      
      For the time being, only reads are allowed in the per-group cache.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      749c5415
    • G
      memcg/sl[au]b: track all the memcg children of a kmem_cache · 7cf27982
      Glauber Costa 提交于
      This enables us to remove all the children of a kmem_cache being
      destroyed, if for example the kernel module it's being used in gets
      unloaded.  Otherwise, the children will still point to the destroyed
      parent.
      Signed-off-by: NSuleiman Souhlal <suleiman@google.com>
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7cf27982
    • G
      memcg: allocate memory for memcg caches whenever a new memcg appears · 55007d84
      Glauber Costa 提交于
      Every cache that is considered a root cache (basically the "original"
      caches, tied to the root memcg/no-memcg) will have an array that should be
      large enough to store a cache pointer per each memcg in the system.
      
      Theoreticaly, this is as high as 1 << sizeof(css_id), which is currently
      in the 64k pointers range.  Most of the time, we won't be using that much.
      
      What goes in this patch, is a simple scheme to dynamically allocate such
      an array, in order to minimize memory usage for memcg caches.  Because we
      would also like to avoid allocations all the time, at least for now, the
      array will only grow.  It will tend to be big enough to hold the maximum
      number of kmem-limited memcgs ever achieved.
      
      We'll allocate it to be a minimum of 64 kmem-limited memcgs.  When we have
      more than that, we'll start doubling the size of this array every time the
      limit is reached.
      
      Because we are only considering kmem limited memcgs, a natural point for
      this to happen is when we write to the limit.  At that point, we already
      have set_limit_mutex held, so that will become our natural synchronization
      mechanism.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      55007d84
    • G
      slab/slub: consider a memcg parameter in kmem_create_cache · 2633d7a0
      Glauber Costa 提交于
      Allow a memcg parameter to be passed during cache creation.  When the slub
      allocator is being used, it will only merge caches that belong to the same
      memcg.  We'll do this by scanning the global list, and then translating
      the cache to a memcg-specific cache
      
      Default function is created as a wrapper, passing NULL to the memcg
      version.  We only merge caches that belong to the same memcg.
      
      A helper is provided, memcg_css_id: because slub needs a unique cache name
      for sysfs.  Since this is visible, but not the canonical location for slab
      data, the cache name is not used, the css_id should suffice.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2633d7a0
  15. 11 12月, 2012 2 次提交
  16. 31 10月, 2012 1 次提交
    • G
      slab: Ignore internal flags in cache creation · d8843922
      Glauber Costa 提交于
      Some flags are used internally by the allocators for management
      purposes. One example of that is the CFLGS_OFF_SLAB flag that slab uses
      to mark that the metadata for that cache is stored outside of the slab.
      
      No cache should ever pass those as a creation flags. We can just ignore
      this bit if it happens to be passed (such as when duplicating a cache in
      the kmem memcg patches).
      
      Because such flags can vary from allocator to allocator, we allow them
      to make their own decisions on that, defining SLAB_AVAILABLE_FLAGS with
      all flags that are valid at creation time.  Allocators that doesn't have
      any specific flag requirement should define that to mean all flags.
      
      Common code will mask out all flags not belonging to that set.
      Acked-by: NChristoph Lameter <cl@linux.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      d8843922
  17. 24 10月, 2012 3 次提交
  18. 10 10月, 2012 1 次提交
    • J
      mm, slab: release slab_mutex earlier in kmem_cache_destroy() · 210ed9de
      Jiri Kosina 提交于
      Commit 1331e7a1 ("rcu: Remove _rcu_barrier() dependency on
      __stop_machine()") introduced slab_mutex -> cpu_hotplug.lock dependency
      through kmem_cache_destroy() -> rcu_barrier() -> _rcu_barrier() ->
      get_online_cpus().
      
      Lockdep thinks that this might actually result in ABBA deadlock,
      and reports it as below:
      
      === [ cut here ] ===
       ======================================================
       [ INFO: possible circular locking dependency detected ]
       3.6.0-rc5-00004-g0d8ee37e #143 Not tainted
       -------------------------------------------------------
       kworker/u:2/40 is trying to acquire lock:
        (rcu_sched_state.barrier_mutex){+.+...}, at: [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
      
       but task is already holding lock:
        (slab_mutex){+.+.+.}, at: [<ffffffff81176e15>] kmem_cache_destroy+0x45/0xe0
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #2 (slab_mutex){+.+.+.}:
              [<ffffffff810ae1e2>] validate_chain+0x632/0x720
              [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
              [<ffffffff810ae921>] lock_acquire+0x121/0x190
              [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
              [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
              [<ffffffff81558cb5>] cpuup_callback+0x2f/0xbe
              [<ffffffff81564b83>] notifier_call_chain+0x93/0x140
              [<ffffffff81076f89>] __raw_notifier_call_chain+0x9/0x10
              [<ffffffff8155719d>] _cpu_up+0xba/0x14e
              [<ffffffff815572ed>] cpu_up+0xbc/0x117
              [<ffffffff81ae05e3>] smp_init+0x6b/0x9f
              [<ffffffff81ac47d6>] kernel_init+0x147/0x1dc
              [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
      
       -> #1 (cpu_hotplug.lock){+.+.+.}:
              [<ffffffff810ae1e2>] validate_chain+0x632/0x720
              [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
              [<ffffffff810ae921>] lock_acquire+0x121/0x190
              [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
              [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
              [<ffffffff81049197>] get_online_cpus+0x37/0x50
              [<ffffffff810f21bb>] _rcu_barrier+0xbb/0x1e0
              [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
              [<ffffffff810f2309>] rcu_barrier+0x9/0x10
              [<ffffffff8118c129>] deactivate_locked_super+0x49/0x90
              [<ffffffff8118cc01>] deactivate_super+0x61/0x70
              [<ffffffff811aaaa7>] mntput_no_expire+0x127/0x180
              [<ffffffff811ab49e>] sys_umount+0x6e/0xd0
              [<ffffffff81569979>] system_call_fastpath+0x16/0x1b
      
       -> #0 (rcu_sched_state.barrier_mutex){+.+...}:
              [<ffffffff810adb4e>] check_prev_add+0x3de/0x440
              [<ffffffff810ae1e2>] validate_chain+0x632/0x720
              [<ffffffff810ae5d9>] __lock_acquire+0x309/0x530
              [<ffffffff810ae921>] lock_acquire+0x121/0x190
              [<ffffffff8155d4cc>] __mutex_lock_common+0x5c/0x450
              [<ffffffff8155d9ee>] mutex_lock_nested+0x3e/0x50
              [<ffffffff810f2126>] _rcu_barrier+0x26/0x1e0
              [<ffffffff810f22f0>] rcu_barrier_sched+0x10/0x20
              [<ffffffff810f2309>] rcu_barrier+0x9/0x10
              [<ffffffff81176ea1>] kmem_cache_destroy+0xd1/0xe0
              [<ffffffffa04c3154>] nf_conntrack_cleanup_net+0xe4/0x110 [nf_conntrack]
              [<ffffffffa04c31aa>] nf_conntrack_cleanup+0x2a/0x70 [nf_conntrack]
              [<ffffffffa04c42ce>] nf_conntrack_net_exit+0x5e/0x80 [nf_conntrack]
              [<ffffffff81454b79>] ops_exit_list+0x39/0x60
              [<ffffffff814551ab>] cleanup_net+0xfb/0x1b0
              [<ffffffff8106917b>] process_one_work+0x26b/0x4c0
              [<ffffffff81069f3e>] worker_thread+0x12e/0x320
              [<ffffffff8106f73e>] kthread+0x9e/0xb0
              [<ffffffff8156ab44>] kernel_thread_helper+0x4/0x10
      
       other info that might help us debug this:
      
       Chain exists of:
         rcu_sched_state.barrier_mutex --> cpu_hotplug.lock --> slab_mutex
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(slab_mutex);
                                      lock(cpu_hotplug.lock);
                                      lock(slab_mutex);
         lock(rcu_sched_state.barrier_mutex);
      
        *** DEADLOCK ***
      === [ cut here ] ===
      
      This is actually a false positive. Lockdep has no way of knowing the fact
      that the ABBA can actually never happen, because of special semantics of
      cpu_hotplug.refcount and its handling in cpu_hotplug_begin(); the mutual
      exclusion there is not achieved through mutex, but through
      cpu_hotplug.refcount.
      
      The "neither cpu_up() nor cpu_down() will proceed past cpu_hotplug_begin()
      until everyone who called get_online_cpus() will call put_online_cpus()"
      semantics is totally invisible to lockdep.
      
      This patch therefore moves the unlock of slab_mutex so that rcu_barrier()
      is being called with it unlocked. It has two advantages:
      
      - it slightly reduces hold time of slab_mutex; as it's used to protect
        the cachep list, it's not necessary to hold it over kmem_cache_free()
        call any more
      - it silences the lockdep false positive warning, as it avoids lockdep ever
        learning about slab_mutex -> cpu_hotplug.lock dependency
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      210ed9de
  19. 05 9月, 2012 7 次提交
    • P
      Revert "mm/sl[aou]b: Move sysfs_slab_add to common" · aac3a166
      Pekka Enberg 提交于
      This reverts commit 96d17b7b which
      caused the following errors at boot:
      
        [    1.114885] kobject (ffff88001a802578): tried to init an initialized object, something is seriously wrong.
        [    1.114885] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
        [    1.114885] Call Trace:
        [    1.114885]  [<ffffffff81273f37>] kobject_init+0x87/0xa0
        [    1.115555]  [<ffffffff8127426a>] kobject_init_and_add+0x2a/0x90
        [    1.115555]  [<ffffffff8127c870>] ? sprintf+0x40/0x50
        [    1.115555]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
        [    1.115555]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
        [    1.115555]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
        [    1.115555]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
        [    1.115555]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
        [    1.115836]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
        [    1.116834]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
        [    1.117835]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
        [    1.117835]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
        [    1.118401]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
        [    1.119832]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
        [    1.120325] ------------[ cut here ]------------
        [    1.120835] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xc1/0xf0()
        [    1.121437] sysfs: cannot create duplicate filename '/kernel/slab/:t-0000016'
        [    1.121831] Modules linked in:
        [    1.122138] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
        [    1.122831] Call Trace:
        [    1.123074]  [<ffffffff81195ce1>] ? sysfs_add_one+0xc1/0xf0
        [    1.123833]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
        [    1.124405]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
        [    1.124832]  [<ffffffff81195ce1>] sysfs_add_one+0xc1/0xf0
        [    1.125337]  [<ffffffff81195eb3>] create_dir+0x73/0xd0
        [    1.125832]  [<ffffffff81196221>] sysfs_create_dir+0x81/0xe0
        [    1.126363]  [<ffffffff81273d3d>] kobject_add_internal+0x9d/0x210
        [    1.126832]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
        [    1.127406]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
        [    1.127832]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
        [    1.128384]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
        [    1.128833]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
        [    1.129831]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
        [    1.130305]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
        [    1.130831]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
        [    1.131351]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
        [    1.131830]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
        [    1.132392]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
        [    1.132830]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
        [    1.133315] ---[ end trace 2703540871c8fab7 ]---
        [    1.133830] ------------[ cut here ]------------
        [    1.134274] WARNING: at lib/kobject.c:196 kobject_add_internal+0x1f5/0x210()
        [    1.134829] kobject_add_internal failed for :t-0000016 with -EEXIST, don't try to register things with the same name in the same directory.
        [    1.135829] Modules linked in:
        [    1.136135] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
        [    1.136828] Call Trace:
        [    1.137071]  [<ffffffff81273e95>] ? kobject_add_internal+0x1f5/0x210
        [    1.137830]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
        [    1.138402]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
        [    1.138830]  [<ffffffff811955a3>] ? release_sysfs_dirent+0x73/0xf0
        [    1.139419]  [<ffffffff81273e95>] kobject_add_internal+0x1f5/0x210
        [    1.139830]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
        [    1.140429]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
        [    1.140830]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
        [    1.141829]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
        [    1.142307]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
        [    1.142829]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
        [    1.143307]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
        [    1.143829]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
        [    1.144352]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
        [    1.144829]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
        [    1.145405]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
        [    1.145828]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
        [    1.146313] ---[ end trace 2703540871c8fab8 ]---
      
      Conflicts:
      
      	mm/slub.c
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      aac3a166
    • C
      mm/sl[aou]b: Move kmem_cache refcounting to common code · cce89f4f
      Christoph Lameter 提交于
      Get rid of the refcount stuff in the allocators and do that part of
      kmem_cache management in the common code.
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      cce89f4f
    • C
      mm/sl[aou]b: Shrink __kmem_cache_create() parameter lists · 8a13a4cc
      Christoph Lameter 提交于
      Do the initial settings of the fields in common code. This will allow us
      to push more processing into common code later and improve readability.
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      8a13a4cc
    • C
      mm/sl[aou]b: Move kmem_cache allocations into common code · 278b1bb1
      Christoph Lameter 提交于
      Shift the allocations to common code. That way the allocation and
      freeing of the kmem_cache structures is handled by common code.
      Reviewed-by: NGlauber Costa <glommer@parallels.com>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      278b1bb1
    • C
      mm/sl[aou]b: Move sysfs_slab_add to common · 96d17b7b
      Christoph Lameter 提交于
      Simplify locking by moving the slab_add_sysfs after all locks have been
      dropped. Eases the upcoming move to provide sysfs support for all
      allocators.
      Reviewed-by: NGlauber Costa <glommer@parallels.com>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      96d17b7b
    • C
      mm/sl[aou]b: Do slab aliasing call from common code · cbb79694
      Christoph Lameter 提交于
      The slab aliasing logic causes some strange contortions in slub. So add
      a call to deal with aliases to slab_common.c but disable it for other
      slab allocators by providng stubs that fail to create aliases.
      
      Full general support for aliases will require additional cleanup passes
      and more standardization of fields in kmem_cache.
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      cbb79694
    • C
      mm/sl[aou]b: Move duping of slab name to slab_common.c · db265eca
      Christoph Lameter 提交于
      Duping of the slabname has to be done by each slab. Moving this code to
      slab_common avoids duplicate implementations.
      
      With this patch we have common string handling for all slab allocators.
      Strings passed to kmem_cache_create() are copied internally. Subsystems
      can create temporary strings to create slab caches.
      
      Slabs allocated in early states of bootstrap will never be freed (and
      those can never be freed since they are essential to slab allocator
      operations).  During bootstrap we therefore do not have to worry about
      duping names.
      Reviewed-by: NGlauber Costa <glommer@parallels.com>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      db265eca