1. 15 April 2015 (1 commit)
  2. 14 February 2015 (1 commit)
    • mm: slub: add kernel address sanitizer support for slub allocator · 0316bec2
      Committed by Andrey Ryabinin
      With this patch KASAN is able to catch bugs in memory allocated by
      slub.  Initially, all objects in a newly allocated slab page are
      marked as redzone.  Later, when a slub object is allocated, the
      number of bytes requested by the caller is marked as accessible, and
      the rest of the object (including slub's metadata) is marked as
      redzone (inaccessible).
      
      We also mark an object as accessible when ksize is called for it.
      There are places in the kernel where ksize is called to inquire the
      size of the really allocated area.  Such callers may validly access
      the whole allocated memory, so it has to be marked as accessible.
      
      Code in the slub.c and slab_common.c files may validly access an
      object's metadata, so instrumentation for these files is disabled.
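      
      A minimal userspace sketch of the shadow-marking scheme described
      above (all names and values here are illustrative, not the kernel's
      KASAN internals):
      
      #include <string.h>
      
      #define OBJ_SIZE   64
      #define REDZONE    0xFC	/* byte is inaccessible */
      #define ACCESSIBLE 0x00	/* byte may be read/written */
      
      static unsigned char shadow[OBJ_SIZE];	/* one mark per object byte */
      
      /* freshly allocated slab page: every object starts fully redzoned */
      static void poison_new_object(void)
      {
      	memset(shadow, REDZONE, OBJ_SIZE);
      }
      
      /* allocation: only the caller-requested bytes (requested <= OBJ_SIZE)
       * become accessible; the tail, including metadata, stays redzone */
      static void unpoison_on_alloc(size_t requested)
      {
      	memset(shadow, ACCESSIBLE, requested);
      	memset(shadow + requested, REDZONE, OBJ_SIZE - requested);
      }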
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Signed-off-by: Dmitry Chernenkov <dmitryc@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0316bec2
  3. 13 February 2015 (3 commits)
    • memcg: free memcg_caches slot on css offline · 2a4db7eb
      Committed by Vladimir Davydov
      We need to look up a kmem_cache in ->memcg_params.memcg_caches arrays only
      on allocations, so there is no need to have the array entries set until
      css free - we can clear them on css offline.  This will allow us to reuse
      array entries more efficiently and avoid costly array relocations.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2a4db7eb
    • slab: link memcg caches of the same kind into a list · 426589f5
      Committed by Vladimir Davydov
      Sometimes, we need to iterate over all memcg copies of a particular root
      kmem cache.  Currently, we use memcg_cache_params->memcg_caches array for
      that, because it contains all existing memcg caches.
      
      However, it's bad practice to keep all caches, including those that
      belong to offline cgroups, in this array, because then it would grow
      beyond any bounds.  I'm going to wipe dead caches away from it to
      save space.  To still be able to iterate over all memcg caches of the
      same kind, let us link them into a list, as sketched below.
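      
      A hedged sketch of the linkage (field names approximate the patch and
      may not match it exactly): the root cache anchors a list head, and
      every memcg copy carries a node that joins it at creation time:
      
      struct memcg_cache_params {
      	bool is_root_cache;
      	union {
      		struct list_head children;	/* root cache: list head */
      		struct {
      			struct mem_cgroup *memcg;
      			struct list_head children_node;	/* memcg copy */
      			struct kmem_cache *root_cache;
      		};
      	};
      };
      
      Iterating over all memcg copies of a root cache then becomes a plain
      list_for_each_entry() walk instead of scanning the memcg_caches array.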
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      426589f5
    • slab: embed memcg_cache_params to kmem_cache · f7ce3190
      Committed by Vladimir Davydov
      Currently, kmem_cache stores a pointer to struct memcg_cache_params
      instead of embedding it.  The rationale is to save memory when kmem
      accounting is disabled.  However, the memcg_cache_params has shrivelled
      drastically since it was first introduced:
      
      * Initially:
      
      struct memcg_cache_params {
      	bool is_root_cache;
      	union {
      		struct kmem_cache *memcg_caches[0];
      		struct {
      			struct mem_cgroup *memcg;
      			struct list_head list;
      			struct kmem_cache *root_cache;
      			bool dead;
      			atomic_t nr_pages;
      			struct work_struct destroy;
      		};
      	};
      };
      
      * Now:
      
      struct memcg_cache_params {
      	bool is_root_cache;
      	union {
      		struct {
      			struct rcu_head rcu_head;
      			struct kmem_cache *memcg_caches[0];
      		};
      		struct {
      			struct mem_cgroup *memcg;
      			struct kmem_cache *root_cache;
      		};
      	};
      };
      
      So the memory saving does not seem to be a clear win anymore.
      
      OTOH, keeping a pointer to the memcg_cache_params struct instead of
      embedding it means touching one more cache line on the kmem
      alloc/free hot paths.  Besides, it makes linking kmem caches into a
      list chained by a field of struct memcg_cache_params really painful
      due to the extra level of indirection, and I want to link them that
      way in the following patch.  So let us embed it.
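      
      A hedged sketch of the resulting layout; the point is simply that the
      params now live inside kmem_cache, so the hot path dereferences one
      less pointer:
      
      struct kmem_cache {
      	/* ... allocator-specific fields ... */
      	struct memcg_cache_params memcg_params;	/* was: a pointer */
      };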
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f7ce3190
  4. 11 February 2015 (2 commits)
    • memcg: zap memcg_slab_caches and memcg_slab_mutex · d5b3cf71
      Committed by Vladimir Davydov
      mem_cgroup->memcg_slab_caches is a list of kmem caches corresponding to
      the given cgroup.  Currently, it is only used on css free in order to
      destroy all caches corresponding to the memory cgroup being freed.  The
      list is protected by memcg_slab_mutex.  The mutex is also used to protect
      kmem_cache->memcg_params->memcg_caches arrays and synchronizes
      kmem_cache_destroy vs memcg_unregister_all_caches.
      
      However, we can perfectly get on without these two.  To destroy all caches
      corresponding to a memory cgroup, we can walk over the global list of kmem
      caches, slab_caches, and we can do all the synchronization stuff using the
      slab_mutex instead of the memcg_slab_mutex.  This patch therefore gets rid
      of the memcg_slab_caches and memcg_slab_mutex.
      
      Apart from this nice cleanup, it also:
      
       - ensures that rcu_barrier() is called at most once when a root cache
         is destroyed or a memory cgroup is freed, no matter how many caches
         have the SLAB_DESTROY_BY_RCU flag set;
      
       - fixes the race between kmem_cache_destroy and kmem_cache_create:
         memcg_cleanup_cache_params, which is called from kmem_cache_destroy
         after checking that kmem_cache->refcount == 0, releases the
         slab_mutex, which gives kmem_cache_create a chance to create an
         alias to a cache doomed to be destroyed (see the sketch below).
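      
      A hedged sketch of the replacement walk (helper names illustrative):
      destroying a memcg's caches becomes a filtered pass over the global
      slab_caches list under slab_mutex:
      
      mutex_lock(&slab_mutex);
      list_for_each_entry_safe(s, tmp, &slab_caches, list) {
      	if (is_root_cache(s) || s->memcg_params->memcg != memcg)
      		continue;
      	do_kmem_cache_shutdown(s);	/* illustrative destroy helper */
      }
      mutex_unlock(&slab_mutex);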
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d5b3cf71
    • memcg: zap memcg_name argument of memcg_create_kmem_cache · 3e0350a3
      Committed by Vladimir Davydov
      Instead of passing the name of the memory cgroup which the cache is
      created for in the memcg_name argument, let's obtain it directly in
      memcg_create_kmem_cache.
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3e0350a3
  5. 14 December 2014 (1 commit)
    • memcg: fix possible use-after-free in memcg_kmem_get_cache() · 8135be5a
      Committed by Vladimir Davydov
      Suppose task @t that belongs to a memory cgroup @memcg is going to
      allocate an object from a kmem cache @c.  The copy of @c corresponding to
      @memcg, @mc, is empty.  Then if kmem_cache_alloc races with the memory
      cgroup destruction we can access the memory cgroup's copy of the cache
      after it was destroyed:
      
      CPU0				CPU1
      ----				----
      [ current=@t
        @mc->memcg_params->nr_pages=0 ]
      
      kmem_cache_alloc(@c):
        call memcg_kmem_get_cache(@c);
        proceed to allocation from @mc:
          alloc a page for @mc:
            ...
      
      				move @t from @memcg
      				destroy @memcg:
      				  mem_cgroup_css_offline(@memcg):
      				    memcg_unregister_all_caches(@memcg):
      				      kmem_cache_destroy(@mc)
      
          add page to @mc
      
      We could fix this issue by taking a reference to a per-memcg cache, but
      that would require adding a per-cpu reference counter to per-memcg caches,
      which would look cumbersome.
      
      Instead, let's take a reference to the memory cgroup, which already
      has a per-cpu reference counter, at the beginning of kmem_cache_alloc
      and drop it at the end, and move per-memcg cache destruction from css
      offline to css free.  As a side effect, per-memcg caches will be
      destroyed not one by one, but all at once, when the last page
      accounted to the memory cgroup is freed.  That doesn't sound like too
      high a price for code readability though.
      
      Note, this patch does add some overhead to the kmem_cache_alloc hot path,
      but it is pretty negligible - it's just a function call plus a per cpu
      counter decrement, which is comparable to what we already have in
      memcg_kmem_get_cache.  Besides, it's only relevant if there are memory
      cgroups with kmem accounting enabled.  I don't think we can find a way to
      handle this race w/o it, because alloc_page called from kmem_cache_alloc
      may sleep so we can't flush all pending kmallocs w/o reference counting.
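      
      A hedged sketch of the scheme (function names are illustrative; the
      real patch wires this through memcg_kmem_get_cache and a matching
      put helper):
      
      /* at the start of kmem_cache_alloc: pin the memcg via its css */
      if (!css_tryget_online(&memcg->css))
      	return cachep;		/* memcg going away: fall back to root */
      cachep = lookup_memcg_cache(memcg, cachep);	/* illustrative */
      
      /* at the end of kmem_cache_alloc: drop the reference */
      css_put(&memcg->css);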
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8135be5a
  6. 11 December 2014 (1 commit)
    • memcg: use generic slab iterators for showing slabinfo · b047501c
      Committed by Vladimir Davydov
      Let's use generic slab_start/next/stop for showing memcg caches info.  In
      contrast to the current implementation, this will work even if all memcg
      caches' info doesn't fit into a seq buffer (a page), plus it simply looks
      neater.
      
      Actually, the main reason I do this isn't mere cleanup.  I'm going to zap
      the memcg_slab_caches list, because I find it useless provided we have the
      slab_caches list, and this patch is a step in this direction.
      
      It should be noted that before this patch an attempt to read
      memory.kmem.slabinfo of a cgroup that doesn't have kmem limit set resulted
      in -EIO, while after this patch it will silently show nothing except the
      header, but I don't think it will frustrate anyone.
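      
      A hedged sketch of the wiring (the generic iterators come from
      slab_common.c; only the show callback is memcg-specific):
      
      static const struct seq_operations memcg_slabinfo_op = {
      	.start = slab_start,	/* takes slab_mutex, seeks in slab_caches */
      	.next  = slab_next,
      	.stop  = slab_stop,	/* drops slab_mutex */
      	.show  = memcg_slab_show,	/* skips other memcgs' caches */
      };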
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b047501c
  7. 10 October 2014 (2 commits)
  8. 05 June 2014 (5 commits)
    • memcg: cleanup kmem cache creation/destruction functions naming · 776ed0f0
      Committed by Vladimir Davydov
      Current names are rather inconsistent. Let's try to improve them.
      
      Brief change log:
      
      ** old name **                          ** new name **
      
      kmem_cache_create_memcg                 memcg_create_kmem_cache
      memcg_kmem_create_cache                 memcg_register_cache
      memcg_kmem_destroy_cache                memcg_unregister_cache
      
      kmem_cache_destroy_memcg_children       memcg_cleanup_cache_params
      mem_cgroup_destroy_all_caches           memcg_unregister_all_caches
      
      create_work                             memcg_register_cache_work
      memcg_create_cache_work_func            memcg_register_cache_func
      memcg_create_cache_enqueue              memcg_schedule_register_cache
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      776ed0f0
    • memcg: get rid of memcg_create_cache_name · 073ee1c6
      Committed by Vladimir Davydov
      Instead of calling back to memcontrol.c from kmem_cache_create_memcg in
      order to just create the name of a per memcg cache, let's allocate it in
      place.  We only need to pass the memcg name to kmem_cache_create_memcg for
      that - everything else can be done in slab_common.c.
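      
      A hedged sketch of the in-place construction (the format string here
      is illustrative):
      
      cache_name = kasprintf(GFP_KERNEL, "%s(%d:%s)",
      		       root_cache->name, memcg_cache_id(memcg),
      		       memcg_name);
      if (!cache_name)
      	goto out_unlock;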
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      073ee1c6
    • memcg, slab: simplify synchronization scheme · bd673145
      Committed by Vladimir Davydov
      At present, we have the following mutexes protecting data related to per
      memcg kmem caches:
      
       - slab_mutex.  This one is held during the whole kmem cache creation
         and destruction paths.  We also take it when updating per root cache
         memcg_caches arrays (see memcg_update_all_caches).  As a result, taking
         it guarantees there will be no changes to any kmem cache (including per
         memcg).  Why do we need something else then?  The point is it is
         private to slab implementation and has some internal dependencies with
         other mutexes (get_online_cpus).  So we just don't want to rely upon it
         and prefer to introduce additional mutexes instead.
      
       - activate_kmem_mutex.  Initially it was added to synchronize
         initializing kmem limit (memcg_activate_kmem).  However, since we can
         grow per root cache memcg_caches arrays only on kmem limit
         initialization (see memcg_update_all_caches), we also employ it to
         protect against memcg_caches arrays relocation (e.g.  see
         __kmem_cache_destroy_memcg_children).
      
       - We have a convention not to take slab_mutex in memcontrol.c, but we
         want to walk over per memcg memcg_slab_caches lists there (e.g.  for
         destroying all memcg caches on offline).  So we have per memcg
         slab_caches_mutex's protecting those lists.
      
      The mutexes are taken in the following order:
      
         activate_kmem_mutex -> slab_mutex -> memcg::slab_caches_mutex
      
      Such a synchronization scheme has a number of flaws, for instance:
      
       - We can't call kmem_cache_{destroy,shrink} while walking over a
         memcg::memcg_slab_caches list due to locking order.  As a result, in
         mem_cgroup_destroy_all_caches we schedule the
         memcg_cache_params::destroy work shrinking and destroying the cache.
      
       - We don't have a mutex to synchronize per memcg caches destruction
         between memcg offline (mem_cgroup_destroy_all_caches) and root cache
         destruction (__kmem_cache_destroy_memcg_children).  Currently we just
         don't bother about it.
      
      This patch simplifies it by substituting per memcg slab_caches_mutex's
      with the global memcg_slab_mutex.  It will be held whenever a new per
      memcg cache is created or destroyed, so it protects per root cache
      memcg_caches arrays and per memcg memcg_slab_caches lists.  The
      locking order is the following:
      
         activate_kmem_mutex -> memcg_slab_mutex -> slab_mutex
      
      This allows us to call kmem_cache_{create,shrink,destroy} under the
      memcg_slab_mutex.  As a result, we don't need memcg_cache_params::destroy
      work any more - we can simply destroy caches while iterating over a per
      memcg slab caches list.
      
      Also using the global mutex simplifies synchronization between concurrent
      per memcg caches creation/destruction, e.g.  mem_cgroup_destroy_all_caches
      vs __kmem_cache_destroy_memcg_children.
      
      The downside of this is that we substitute the per-memcg
      slab_caches_mutex's with a hammer-like global mutex, but since we
      already take either the slab_mutex or the cgroup_mutex along with a
      memcg::slab_caches_mutex, it shouldn't hurt concurrency a lot.
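      
      A hedged sketch of the resulting nesting, showing why
      kmem_cache_{create,shrink,destroy} may now be called directly from
      the memcg paths:
      
      mutex_lock(&activate_kmem_mutex);	/* only on kmem limit setup */
      mutex_lock(&memcg_slab_mutex);	/* per-memcg create/destroy */
      kmem_cache_destroy(memcg_cachep);	/* takes slab_mutex internally */
      mutex_unlock(&memcg_slab_mutex);
      mutex_unlock(&activate_kmem_mutex);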
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bd673145
    • memcg, slab: do not schedule cache destruction when last page goes away · 1e32e77f
      Committed by Vladimir Davydov
      This patchset is a part of preparations for kmemcg re-parenting.  It
      targets at simplifying kmemcg work-flows and synchronization.
      
      First, it removes async per memcg cache destruction (see patches 1,
      2).  Now caches are only destroyed on memcg offline.  That means the
      caches that are not empty on memcg offline will be leaked.  However,
      they are already leaked, because memcg_cache_params::nr_pages
      normally never drops to 0, so the destruction work is never scheduled
      unless kmem_cache_shrink is called explicitly.  In the future I'm
      planning to reap such dead caches on vmpressure or periodically.
      
      Second, it substitutes per memcg slab_caches_mutex's with the global
      memcg_slab_mutex, which should be taken during the whole per memcg cache
      creation/destruction path before the slab_mutex (see patch 3).  This
      greatly simplifies synchronization among various per memcg cache
      creation/destruction paths.
      
      I'm still not quite sure about the end picture, in particular I don't know
      whether we should reap dead memcgs' kmem caches periodically or try to
      merge them with their parents (see https://lkml.org/lkml/2014/4/20/38 for
      more details), but whichever way we choose, this set looks like a
      reasonable change to me, because it greatly simplifies kmemcg work-flows
      and eases further development.
      
      This patch (of 3):
      
      After a memcg is offlined, we mark its kmem caches that cannot be deleted
      right now due to pending objects as dead by setting the
      memcg_cache_params::dead flag, so that memcg_release_pages will schedule
      cache destruction (memcg_cache_params::destroy) as soon as the last slab
      of the cache is freed (memcg_cache_params::nr_pages drops to zero).
      
      I guess the idea was to destroy the caches as soon as possible, i.e.
      immediately after freeing the last object.  However, it just doesn't work
      that way, because kmem caches always preserve some pages for the sake of
      performance, so that nr_pages never gets to zero unless the cache is
      shrunk explicitly using kmem_cache_shrink.  Of course, we could account
      the total number of objects on the cache or check if all the slabs
      allocated for the cache are empty on kmem_cache_free and schedule
      destruction if so, but that would be too costly.
      
      Thus we have a piece of code that works only when we explicitly call
      kmem_cache_shrink, but complicates the whole picture a lot.  Moreover,
      it's racy in fact.  For instance, kmem_cache_shrink may free the last slab
      and thus schedule cache destruction before it finishes checking that the
      cache is empty, which can lead to use-after-free.
      
      So I propose to remove this async cache destruction from
      memcg_release_pages, and check if the cache is empty explicitly after
      calling kmem_cache_shrink instead.  This will simplify things a lot w/o
      introducing any functional changes.
      
      And regarding dead memcg caches (i.e.  those that are left hanging
      around after memcg offline because they still have objects), I
      suppose we should reap them either periodically or on vmpressure, as
      Glauber suggested initially.  I'm going to implement this later.
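      
      A hedged sketch of the synchronous replacement for the async destroy
      work (memcg_cache_empty() stands in for whatever emptiness check the
      patch actually uses):
      
      kmem_cache_shrink(cachep);	/* releases the cached empty slabs */
      if (memcg_cache_empty(cachep))	/* illustrative emptiness check */
      	kmem_cache_destroy(cachep);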
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1e32e77f
    • mm: get rid of __GFP_KMEMCG · 52383431
      Committed by Vladimir Davydov
      Currently to allocate a page that should be charged to kmemcg (e.g.
      threadinfo), we pass __GFP_KMEMCG flag to the page allocator.  The page
      allocated is then to be freed by free_memcg_kmem_pages.  Apart from
      looking asymmetrical, this also requires intrusion to the general
      allocation path.  So let's introduce separate functions that will
      alloc/free pages charged to kmemcg.
      
      The new functions are called alloc_kmem_pages and free_kmem_pages.
      They should be used when the caller actually would like to use
      kmalloc, but has to fall back to the page allocator because the
      allocation is large.  They only differ from alloc_pages and
      free_pages in that, besides allocating or freeing pages, they also
      charge them to the kmem resource counter of the current memory
      cgroup.
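      
      A hedged usage sketch (signatures as I read the patch; verify against
      the tree):
      
      struct page *page = alloc_kmem_pages(GFP_KERNEL, order);
      if (!page)
      	return -ENOMEM;
      /* ... use page_address(page) as kmalloc-style memory ... */
      __free_kmem_pages(page, order);	/* uncharges the memcg, then frees */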
      
      [sfr@canb.auug.org.au: export kmalloc_order() to modules]
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Acked-by: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      52383431
  9. 08 April 2014 (1 commit)
  10. 01 April 2014 (1 commit)
    • slab: fix wrongly used macro · 24f870d8
      Committed by Joonsoo Kim
      Commit 'slab: restrict the number of objects in a slab' uses
      __builtin_constant_p() in an #if preprocessor conditional.  That is
      wrong usage of the builtin, but it compiles on x86 without any
      problem, so I couldn't catch it before the 0-day build system found
      it.
      
      This commit fixes the situation by using KMALLOC_MIN_SIZE instead of
      KMALLOC_SHIFT_LOW.  KMALLOC_SHIFT_LOW expands to ilog2() on some
      architectures, and ilog2() uses __builtin_constant_p(), which results
      in the problem.  The problem disappears with KMALLOC_MIN_SIZE, since
      it is just a constant.
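      
      The pitfall, sketched (illustrative): the preprocessor cannot
      evaluate compiler builtins inside #if, so any macro that expands
      through ilog2() breaks there, while a plain constant is always fine:
      
      #if KMALLOC_SHIFT_LOW <= 4	/* breaks where this expands through
      				   ilog2() and __builtin_constant_p(),
      				   which cpp cannot evaluate; it happens
      				   to work on x86 */
      #endif
      
      #if KMALLOC_MIN_SIZE <= 16	/* plain integer constant: safe in cpp */
      #endif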
      Tested-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      24f870d8
  11. 11 March 2014 (1 commit)
    • mm: fix GFP_THISNODE callers and clarify · e97ca8e5
      Committed by Johannes Weiner
      GFP_THISNODE is for callers that implement their own clever fallback to
      remote nodes.  It restricts the allocation to the specified node and
      does not invoke reclaim, assuming that the caller will take care of it
      when the fallback fails, e.g.  through a subsequent allocation request
      without GFP_THISNODE set.
      
      However, many current GFP_THISNODE users only want the node exclusive
      aspect of the flag, without actually implementing their own fallback or
      triggering reclaim if necessary.  This results in things like page
      migration failing prematurely even when there is easily reclaimable
      memory available, unless kswapd happens to be running already or a
      concurrent allocation attempt triggers the necessary reclaim.
      
      Convert all callsites that don't implement their own fallback
      strategy to __GFP_THISNODE.  This restricts the allocation to a
      single node too, but at the same time allows the allocator to enter
      the slowpath, wake kswapd, and invoke direct reclaim if necessary, to
      make the allocation happen when memory is full.
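      
      The conversion pattern, sketched as a diff (call site illustrative):
      GFP_THISNODE restricts to the node and suppresses reclaim, while the
      bare __GFP_THISNODE bit keeps only the node restriction:
      
      -	page = alloc_pages_node(nid, GFP_KERNEL | GFP_THISNODE, order);
      +	page = alloc_pages_node(nid, GFP_KERNEL | __GFP_THISNODE, order);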
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Jan Stancek <jstancek@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e97ca8e5
  12. 08 February 2014 (1 commit)
    • slab: restrict the number of objects in a slab · f315e3fa
      Committed by Joonsoo Kim
      To prepare for the implementation of a byte-sized index for managing
      the freelist of a slab, we should restrict the number of objects in a
      slab to at most 256, since a byte can only represent 256 different
      values.  Setting the object size to a value greater than or equal to
      the newly introduced SLAB_OBJ_MIN_SIZE ensures that the number of
      objects in a slab is less than or equal to 256 for a slab of 1 page.
      
      If the page size is larger than 4096, the above assumption would be
      wrong, and in that case we fall back to a 2-byte index.
      
      If the minimum kmalloc size is less than 16, we use it as the minimum
      object size and give up this optimization.
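      
      The arithmetic, sketched with illustrative macro definitions: a
      one-byte index distinguishes at most 256 objects, so on 4096-byte
      pages the object size must not drop below 4096 / 256 = 16 bytes:
      
      #define SLAB_OBJ_MAX_NUM	256	/* all values of a 1-byte index */
      #define SLAB_OBJ_MIN_SIZE	(PAGE_SIZE / SLAB_OBJ_MAX_NUM)	/* 16 on 4K */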
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      f315e3fa
  13. 31 January 2014 (1 commit)
  14. 24 January 2014 (1 commit)
    • memcg, slab: RCU protect memcg_params for root caches · f8570263
      Committed by Vladimir Davydov
      We relocate root cache's memcg_params whenever we need to grow the
      memcg_caches array to accommodate all kmem-active memory cgroups.
      Currently on relocation we free the old version immediately, which can
      lead to use-after-free, because the memcg_caches array is accessed
      lock-free (see cache_from_memcg_idx()).  This patch fixes this by making
      memcg_params RCU-protected for root caches.
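      
      A hedged sketch of the reader side after the change (the writer pairs
      this with rcu_assign_pointer() on relocation and frees the old array
      only after a grace period):
      
      rcu_read_lock();
      params = rcu_dereference(cachep->memcg_params);
      memcg_cachep = params->memcg_caches[idx];	/* lock-free access */
      rcu_read_unlock();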
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f8570263
  15. 18 December 2013 (1 commit)
  16. 25 November 2013 (1 commit)
  17. 25 October 2013 (1 commit)
  18. 05 September 2013 (2 commits)
    • slab: Use correct GFP_DMA constant · 23774a2f
      Committed by Christoph Lameter
      On Thu, 5 Sep 2013, kbuild test robot wrote:
      > >> include/linux/slab.h:433:53: sparse: restricted gfp_t degrades to integer
      >    429	static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
      >    430	{
      >    431	#ifndef CONFIG_SLOB
      >    432		if (__builtin_constant_p(size) &&
      >  > 433			size <= KMALLOC_MAX_CACHE_SIZE && !(flags & SLAB_CACHE_DMA)) {
      >    434			int i = kmalloc_index(size);
      >    435
      
      flags is of type gfp_t, while SLAB_CACHE_DMA is a slab-internal flag,
      not a gfp flag.  Therefore use GFP_DMA.
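      
      The corrected test then reads (modulo surrounding context):
      
      	if (__builtin_constant_p(size) &&
      		size <= KMALLOC_MAX_CACHE_SIZE && !(flags & GFP_DMA)) {
      		int i = kmalloc_index(size);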
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      23774a2f
    • mm/sl[aou]b: Move kmallocXXX functions to common code · f1b6eb6e
      Committed by Christoph Lameter
      The kmalloc* functions of all slab allocators are similar now, so
      let's move them into slab.h.  This requires some function naming
      changes in slob.
      
      As a result of this patch there is a common set of functions for all
      allocators.  It also means that kmalloc_large() is now available in
      general to perform large order allocations that go directly via the
      page allocator.  kmalloc_large() can be substituted if kmalloc()
      throws warnings because of too large allocations.
      
      kmalloc_large() has exactly the same semantics as kmalloc but can
      only be used for allocations > PAGE_SIZE.
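      
      A hedged usage sketch:
      
      buf = kmalloc_large(2 * PAGE_SIZE, GFP_KERNEL);	/* page allocator */
      if (!buf)
      	return -ENOMEM;
      /* ... */
      kfree(buf);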
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      f1b6eb6e
  19. 08 July 2013 (1 commit)
  20. 18 June 2013 (1 commit)
  21. 07 February 2013 (1 commit)
    • slab: Handle ARCH_DMA_MINALIGN correctly · c601fd69
      Committed by Christoph Lameter
      James Hogan hit boot problems in next-20130204 on Meta:
      
        META213-Thread0 DSP [LogF] kobject (4fc03980): tried to init an initialized object, something is seriously wrong.
        META213-Thread0 DSP [LogF]
        META213-Thread0 DSP [LogF] Call trace:
        META213-Thread0 DSP [LogF] [<4000888c>] _show_stack+0x68/0x7c
        META213-Thread0 DSP [LogF] [<400088b4>] _dump_stack+0x14/0x28
        META213-Thread0 DSP [LogF] [<40103794>] _kobject_init+0x58/0x9c
        META213-Thread0 DSP [LogF] [<40103810>] _kobject_create+0x38/0x64
        META213-Thread0 DSP [LogF] [<40103eac>] _kobject_create_and_add+0x14/0x8c
        META213-Thread0 DSP [LogF] [<40190ac4>] _mnt_init+0xd8/0x220
        META213-Thread0 DSP [LogF] [<40190508>] _vfs_caches_init+0xb0/0x160
        META213-Thread0 DSP [LogF] [<401851f4>] _start_kernel+0x274/0x340
        META213-Thread0 DSP [LogF] [<40188424>] _metag_start_kernel+0x58/0x6c
        META213-Thread0 DSP [LogF] [<40000044>] __start+0x44/0x48
        META213-Thread0 DSP [LogF]
        META213-Thread0 DSP [LogF] devtmpfs: initialized
        META213-Thread0 DSP [LogF] L2 Cache: Not present
        META213-Thread0 DSP [LogF] BUG: failure at fs/sysfs/dir.c:736/sysfs_read_ns_type()!
        META213-Thread0 DSP [LogF] Kernel panic - not syncing: BUG!
        META213-Thread0 DSP [Thread Exit] Thread has exited - return code = 4294967295
      
      And bisected the problem to commit 95a05b42 ("slab: Common constants for
      kmalloc boundaries").
      
      As it turns out, a fixed KMALLOC_SHIFT_LOW does not work for arches with
      higher alignment requirements.
      
      Determine KMALLOC_SHIFT_LOW from ARCH_DMA_MINALIGN instead.
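      
      A hedged sketch of the fix (exact guards in the tree may differ):
      
      #if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
      #define KMALLOC_MIN_SIZE	ARCH_DMA_MINALIGN
      #define KMALLOC_SHIFT_LOW	ilog2(ARCH_DMA_MINALIGN)	/* not fixed */
      #else
      #define KMALLOC_MIN_SIZE	(1 << KMALLOC_SHIFT_LOW)	/* default */
      #endif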
      Reported-and-tested-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      c601fd69
  22. 01 February 2013 (4 commits)
  23. 19 December 2012 (5 commits)
    • slab: propagate tunable values · 943a451a
      Committed by Glauber Costa
      SLAB allows us to tune a particular cache behavior with tunables.  When
      creating a new memcg cache copy, we'd like to preserve any tunables the
      parent cache already had.
      
      This could be done by an explicit call to do_tune_cpucache() after the
      cache is created.  But this is not very convenient now that the caches are
      created from common code, since this function is SLAB-specific.
      
      Another method of doing that is taking advantage of the fact that
      do_tune_cpucache() is always called from enable_cpucache(), which is
      called at cache initialization.  We can just preset the values, and then
      things work as expected.
      
      It can also happen that a root cache has its tunables updated during
      normal system operation.  In this case, we will propagate the change to
      all caches that are already active.
      
      This change will require us to move the assignment of root_cache in
      memcg_params a bit earlier.  We need it to be set already - which
      memcg_kmem_register_cache will do - by the time we reach
      __kmem_cache_create().
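      
      A hedged sketch of the preset (field names per SLAB's tunables):
      
      /* before enable_cpucache() runs on the new memcg copy: */
      cachep->limit      = root_cache->limit;
      cachep->batchcount = root_cache->batchcount;
      cachep->shared     = root_cache->shared;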
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      943a451a
    • memcg: aggregate memcg cache values in slabinfo · 749c5415
      Committed by Glauber Costa
      When we create caches in memcgs, we need to display their usage
      information somewhere.  We'll adopt a scheme similar to /proc/meminfo,
      with aggregate totals shown in the global file, and per-group information
      stored in the group itself.
      
      For the time being, only reads are allowed in the per-group cache.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      749c5415
    • memcg: destroy memcg caches · 1f458cbf
      Committed by Glauber Costa
      Implement destruction of memcg caches.  Right now, only caches where our
      reference counter is the last remaining are deleted.  If there are any
      other reference counters around, we just leave the caches lying around
      until they go away.
      
      When that happens, a destruction function is called from the cache code.
      Caches are only destroyed in process context, so we queue them up for
      later processing in the general case.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1f458cbf
    • slab/slub: consider a memcg parameter in kmem_create_cache · 2633d7a0
      Committed by Glauber Costa
      Allow a memcg parameter to be passed during cache creation.  When the
      slub allocator is being used, it will only merge caches that belong
      to the same memcg.  We'll do this by scanning the global list and
      then translating the cache to a memcg-specific cache.
      
      A default function is created as a wrapper, passing NULL to the memcg
      version.  We only merge caches that belong to the same memcg.
      
      A helper, memcg_css_id, is provided because slub needs a unique cache
      name for sysfs.  Since this is visible, but not the canonical
      location for slab data, the cache name is not used; the css_id should
      suffice.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2633d7a0
    • slab/slub: struct memcg_params · ba6c496e
      Committed by Glauber Costa
      For the kmem slab controller, we need to record some extra information in
      the kmem_cache structure.
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Signed-off-by: Suleiman Souhlal <suleiman@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ba6c496e
  24. 31 October 2012 (1 commit)