1. 15 7月, 2013 1 次提交
    • S
      slub: Check for page NULL before doing the node_match check · c25f195e
      Steven Rostedt 提交于
      In the -rt kernel (mrg), we hit the following dump:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
      PGD a2d39067 PUD b1641067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: sunrpc cpufreq_ondemand ipv6 tg3 joydev sg serio_raw pcspkr k8temp amd64_edac_mod edac_core i2c_piix4 e100 mii shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom sata_svw ata_generic pata_acpi pata_serverworks radeon ttm drm_kms_helper drm hwmon i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
      CPU 3
      Pid: 20878, comm: hackbench Not tainted 3.6.11-rt25.14.el6rt.x86_64 #1 empty empty/Tyan Transport GT24-B3992
      RIP: 0010:[<ffffffff811573f1>]  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
      RSP: 0018:ffff8800a9b17d70  EFLAGS: 00010213
      RAX: 0000000000000000 RBX: 0000000001200011 RCX: ffff8800a06d8000
      RDX: 0000000004d92a03 RSI: 00000000000000d0 RDI: ffff88013b805500
      RBP: ffff8800a9b17dc0 R08: ffff88023fd14d10 R09: ffffffff81041cbd
      R10: 00007f4e3f06e9d0 R11: 0000000000000246 R12: ffff88013b805500
      R13: ffff8801ff46af40 R14: 0000000000000001 R15: 0000000000000000
      FS:  00007f4e3f06e700(0000) GS:ffff88023fd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000000 CR3: 00000000a2d3a000 CR4: 00000000000007e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process hackbench (pid: 20878, threadinfo ffff8800a9b16000, task ffff8800a06d8000)
      Stack:
       ffff8800a9b17da0 ffffffff81202e08 ffff8800a9b17de0 000000d001200011
       0000000001200011 0000000001200011 0000000000000000 0000000000000000
       00007f4e3f06e9d0 0000000000000000 ffff8800a9b17e60 ffffffff81041cbd
      Call Trace:
       [<ffffffff81202e08>] ? current_has_perm+0x68/0x80
       [<ffffffff81041cbd>] copy_process+0xdd/0x15b0
       [<ffffffff810a2125>] ? rt_up_read+0x25/0x30
       [<ffffffff8104369a>] do_fork+0x5a/0x360
       [<ffffffff8107c66b>] ? migrate_enable+0xeb/0x220
       [<ffffffff8100b068>] sys_clone+0x28/0x30
       [<ffffffff81527423>] stub_clone+0x13/0x20
       [<ffffffff81527152>] ? system_call_fastpath+0x16/0x1b
      Code: 89 fc 89 75 cc 41 89 d6 4d 8b 04 24 65 4c 03 04 25 48 ae 00 00 49 8b 50 08 4d 8b 28 49 8b 40 10 4d 85 ed 74 12 41 83 fe ff 74 27 <48> 8b 00 48 c1 e8 3a 41 39 c6 74 1b 8b 75 cc 4c 89 c9 44 89 f2
      RIP  [<ffffffff811573f1>] kmem_cache_alloc_node+0x51/0x180
       RSP <ffff8800a9b17d70>
      CR2: 0000000000000000
      ---[ end trace 0000000000000002 ]---
      
      Now, this uses SLUB pretty much unmodified, but as it is the -rt kernel
      with CONFIG_PREEMPT_RT set, spinlocks are mutexes, although they do
      disable migration. But the SLUB code is relatively lockless, and the
      spin_locks there are raw_spin_locks (not converted to mutexes), thus I
      believe this bug can happen in mainline without -rt features. The -rt
      patch is just good at triggering mainline bugs ;-)
      
      Anyway, looking at where this crashed, it seems that the page variable
      can be NULL when passed to the node_match() function (which does not
      check if it is NULL). When this happens we get the above panic.
      
      As page is only used in slab_alloc() to check if the node matches, if
      it's NULL I'm assuming that we can say it doesn't and call the
      __slab_alloc() code. Is this a correct assumption?
      Acked-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c25f195e
  2. 08 7月, 2013 1 次提交
  3. 07 7月, 2013 3 次提交
  4. 05 4月, 2013 2 次提交
  5. 02 4月, 2013 2 次提交
  6. 28 2月, 2013 1 次提交
  7. 01 2月, 2013 4 次提交
  8. 19 12月, 2012 7 次提交
    • G
      slub: drop mutex before deleting sysfs entry · 5413dfba
      Glauber Costa 提交于
      Sasha Levin recently reported a lockdep problem resulting from the new
      attribute propagation introduced by kmemcg series.  In short, slab_mutex
      will be called from within the sysfs attribute store function.  This will
      create a dependency, that will later be held backwards when a cache is
      destroyed - since destruction occurs with the slab_mutex held, and then
      calls in to the sysfs directory removal function.
      
      In this patch, I propose to adopt a strategy close to what
      __kmem_cache_create does before calling sysfs_slab_add, and release the
      lock before the call to sysfs_slab_remove.  This is pretty much the last
      operation in the kmem_cache_shutdown() path, so we could do better by
      splitting this and moving this call alone to later on.  This will fit
      nicely when sysfs handling is consistent between all caches, but will look
      weird now.
      
      Lockdep info:
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.7.0-rc4-next-20121106-sasha-00008-g353b62f #117 Tainted: G        W
        -------------------------------------------------------
        trinity-child13/6961 is trying to acquire lock:
         (s_active#43){++++.+}, at:  sysfs_addrm_finish+0x31/0x60
      
        but task is already holding lock:
         (slab_mutex){+.+.+.}, at:  kmem_cache_destroy+0x22/0xe0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
        -> #1 (slab_mutex){+.+.+.}:
                lock_acquire+0x1aa/0x240
                __mutex_lock_common+0x59/0x5a0
                mutex_lock_nested+0x3f/0x50
                slab_attr_store+0xde/0x110
                sysfs_write_file+0xfa/0x150
                vfs_write+0xb0/0x180
                sys_pwrite64+0x60/0xb0
                tracesys+0xe1/0xe6
        -> #0 (s_active#43){++++.+}:
                __lock_acquire+0x14df/0x1ca0
                lock_acquire+0x1aa/0x240
                sysfs_deactivate+0x122/0x1a0
                sysfs_addrm_finish+0x31/0x60
                sysfs_remove_dir+0x89/0xd0
                kobject_del+0x16/0x40
                __kmem_cache_shutdown+0x40/0x60
                kmem_cache_destroy+0x40/0xe0
                mon_text_release+0x78/0xe0
                __fput+0x122/0x2d0
                ____fput+0x9/0x10
                task_work_run+0xbe/0x100
                do_exit+0x432/0xbd0
                do_group_exit+0x84/0xd0
                get_signal_to_deliver+0x81d/0x930
                do_signal+0x3a/0x950
                do_notify_resume+0x3e/0x90
                int_signal+0x12/0x17
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          lock(slab_mutex);
                                       lock(s_active#43);
                                       lock(slab_mutex);
          lock(s_active#43);
      
         *** DEADLOCK ***
      
        2 locks held by trinity-child13/6961:
         #0:  (mon_lock){+.+.+.}, at:  mon_text_release+0x25/0xe0
         #1:  (slab_mutex){+.+.+.}, at:  kmem_cache_destroy+0x22/0xe0
      
        stack backtrace:
        Pid: 6961, comm: trinity-child13 Tainted: G        W    3.7.0-rc4-next-20121106-sasha-00008-g353b62f #117
        Call Trace:
          print_circular_bug+0x1fb/0x20c
          __lock_acquire+0x14df/0x1ca0
          lock_acquire+0x1aa/0x240
          sysfs_deactivate+0x122/0x1a0
          sysfs_addrm_finish+0x31/0x60
          sysfs_remove_dir+0x89/0xd0
          kobject_del+0x16/0x40
          __kmem_cache_shutdown+0x40/0x60
          kmem_cache_destroy+0x40/0xe0
          mon_text_release+0x78/0xe0
          __fput+0x122/0x2d0
          ____fput+0x9/0x10
          task_work_run+0xbe/0x100
          do_exit+0x432/0xbd0
          do_group_exit+0x84/0xd0
          get_signal_to_deliver+0x81d/0x930
          do_signal+0x3a/0x950
          do_notify_resume+0x3e/0x90
          int_signal+0x12/0x17
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5413dfba
    • G
      memcg: add comments clarifying aspects of cache attribute propagation · ebe945c2
      Glauber Costa 提交于
      This patch clarifies two aspects of cache attribute propagation.
      
      First, the expected context for the for_each_memcg_cache macro in
      memcontrol.h.  The usages already in the codebase are safe.  In mm/slub.c,
      it is trivially safe because the lock is acquired right before the loop.
      In mm/slab.c, it is less so: the lock is acquired by an outer function a
      few steps back in the stack, so a VM_BUG_ON() is added to make sure it is
      indeed safe.
      
      A comment is also added to detail why we are returning the value of the
      parent cache and ignoring the children's when we propagate the attributes.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ebe945c2
    • G
      slub: slub-specific propagation changes · 107dab5c
      Glauber Costa 提交于
      SLUB allows us to tune a particular cache behavior with sysfs-based
      tunables.  When creating a new memcg cache copy, we'd like to preserve any
      tunables the parent cache already had.
      
      This can be done by tapping into the store attribute function provided by
      the allocator.  We of course don't need to mess with read-only fields.
      Since the attributes can have multiple types and are stored internally by
      sysfs, the best strategy is to issue a ->show() in the root cache, and
      then ->store() in the memcg cache.
      
      The drawback of that, is that sysfs can allocate up to a page in buffering
      for show(), that we are likely not to need, but also can't guarantee.  To
      avoid always allocating a page for that, we can update the caches at store
      time with the maximum attribute size ever stored to the root cache.  We
      will then get a buffer big enough to hold it.  The corolary to this, is
      that if no stores happened, nothing will be propagated.
      
      It can also happen that a root cache has its tunables updated during
      normal system operation.  In this case, we will propagate the change to
      all caches that are already active.
      
      [akpm@linux-foundation.org: tweak code to avoid __maybe_unused]
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      107dab5c
    • G
      memcg: destroy memcg caches · 1f458cbf
      Glauber Costa 提交于
      Implement destruction of memcg caches.  Right now, only caches where our
      reference counter is the last remaining are deleted.  If there are any
      other reference counters around, we just leave the caches lying around
      until they go away.
      
      When that happens, a destruction function is called from the cache code.
      Caches are only destroyed in process context, so we queue them up for
      later processing in the general case.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f458cbf
    • G
      sl[au]b: allocate objects from memcg cache · d79923fa
      Glauber Costa 提交于
      We are able to match a cache allocation to a particular memcg.  If the
      task doesn't change groups during the allocation itself - a rare event,
      this will give us a good picture about who is the first group to touch a
      cache page.
      
      This patch uses the now available infrastructure by calling
      memcg_kmem_get_cache() before all the cache allocations.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d79923fa
    • G
      sl[au]b: always get the cache from its page in kmem_cache_free() · b9ce5ef4
      Glauber Costa 提交于
      struct page already has this information.  If we start chaining caches,
      this information will always be more trustworthy than whatever is passed
      into the function.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9ce5ef4
    • G
      slab/slub: consider a memcg parameter in kmem_create_cache · 2633d7a0
      Glauber Costa 提交于
      Allow a memcg parameter to be passed during cache creation.  When the slub
      allocator is being used, it will only merge caches that belong to the same
      memcg.  We'll do this by scanning the global list, and then translating
      the cache to a memcg-specific cache
      
      Default function is created as a wrapper, passing NULL to the memcg
      version.  We only merge caches that belong to the same memcg.
      
      A helper is provided, memcg_css_id: because slub needs a unique cache name
      for sysfs.  Since this is visible, but not the canonical location for slab
      data, the cache name is not used, the css_id should suffice.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <fweisbec@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: JoonSoo Kim <js1304@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Suleiman Souhlal <suleiman@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2633d7a0
  9. 12 12月, 2012 1 次提交
    • L
      slub, hotplug: ignore unrelated node's hot-adding and hot-removing · b9d5ab25
      Lai Jiangshan 提交于
      SLUB only focuses on the nodes which have normal memory and it ignores the
      other node's hot-adding and hot-removing.
      
      Aka: if some memory of a node which has no onlined memory is online, but
      this new memory onlined is not normal memory (for example, highmem), we
      should not allocate kmem_cache_node for SLUB.
      
      And if the last normal memory is offlined, but the node still has memory,
      we should remove kmem_cache_node for that node.  (The current code delays
      it when all of the memory is offlined)
      
      So we only do something when marg->status_change_nid_normal > 0.
      marg->status_change_nid is not suitable here.
      
      The same problem doesn't exist in SLAB, because SLAB allocates kmem_list3
      for every node even the node don't have normal memory, SLAB tolerates
      kmem_list3 on alien nodes.  SLUB only focuses on the nodes which have
      normal memory, it don't tolerate alien kmem_cache_node.  The patch makes
      SLUB become self-compatible and avoids WARNs and BUGs in rare conditions.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Rob Landley <rob@landley.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9d5ab25
  10. 11 12月, 2012 4 次提交
  11. 31 10月, 2012 2 次提交
  12. 24 10月, 2012 4 次提交
  13. 19 10月, 2012 1 次提交
    • J
      slub: remove one code path and reduce lock contention in __slab_free() · 837d678d
      Joonsoo Kim 提交于
      When we try to free object, there is some of case that we need
      to take a node lock. This is the necessary step for preventing a race.
      After taking a lock, then we try to cmpxchg_double_slab().
      But, there is a possible scenario that cmpxchg_double_slab() is failed
      with taking a lock. Following example explains it.
      
      CPU A               CPU B
      need lock
      ...                 need lock
      ...                 lock!!
      lock..but spin      free success
      spin...             unlock
      lock!!
      free fail
      
      In this case, retry with taking a lock is occured in CPU A.
      I think that in this case for CPU A,
      "release a lock first, and re-take a lock if necessary" is preferable way.
      
      There are two reasons for this.
      
      First, this makes __slab_free()'s logic somehow simple.
      With this patch, 'was_frozen = 1' is "always" handled without taking a lock.
      So we can remove one code path.
      
      Second, it may reduce lock contention.
      When we do retrying, status of slab is already changed,
      so we don't need a lock anymore in almost every case.
      "release a lock first, and re-take a lock if necessary" policy is
      helpful to this.
      Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      837d678d
  14. 03 10月, 2012 1 次提交
  15. 25 9月, 2012 1 次提交
  16. 19 9月, 2012 1 次提交
  17. 18 9月, 2012 1 次提交
    • J
      slub: consider pfmemalloc_match() in get_partial_node() · 8ba00bb6
      Joonsoo Kim 提交于
      get_partial() is currently not checking pfmemalloc_match() meaning that
      it is possible for pfmemalloc pages to leak to non-pfmemalloc users.
      This is a problem in the following situation.  Assume that there is a
      request from normal allocation and there are no objects in the per-cpu
      cache and no node-partial slab.
      
      In this case, slab_alloc enters the slow path and new_slab_objects() is
      called which may return a PFMEMALLOC page.  As the current user is not
      allowed to access PFMEMALLOC page, deactivate_slab() is called
      ([5091b74a: mm: slub: optimise the SLUB fast path to avoid pfmemalloc
      checks]) and returns an object from PFMEMALLOC page.
      
      Next time, when we get another request from normal allocation,
      slab_alloc() enters the slow-path and calls new_slab_objects().  In
      new_slab_objects(), we call get_partial() and get a partial slab which
      was just deactivated but is a pfmemalloc page.  We extract one object
      from it and re-deactivate.
      
        "deactivate -> re-get in get_partial -> re-deactivate" occures repeatedly.
      
      As a result, access to PFMEMALLOC page is not properly restricted and it
      can cause a performance degradation due to frequent deactivation.
      deactivation frequently.
      
      This patch changes get_partial_node() to take pfmemalloc_match() into
      account and prevents the "deactivate -> re-get in get_partial()
      scenario.  Instead, new_slab() is called.
      Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ba00bb6
  18. 10 9月, 2012 1 次提交
    • C
      slub: Zero initial memory segment for kmem_cache and kmem_cache_node · 9df53b15
      Christoph Lameter 提交于
      Tony Luck reported the following problem on IA-64:
      
        Worked fine yesterday on next-20120905, crashes today. First sign of
        trouble was an unaligned access, then a NULL dereference. SL*B related
        bits of my config:
      
        CONFIG_SLUB_DEBUG=y
        # CONFIG_SLAB is not set
        CONFIG_SLUB=y
        CONFIG_SLABINFO=y
        # CONFIG_SLUB_DEBUG_ON is not set
        # CONFIG_SLUB_STATS is not set
      
        And he console log.
      
        PID hash table entries: 4096 (order: 1, 32768 bytes)
        Dentry cache hash table entries: 262144 (order: 7, 2097152 bytes)
        Inode-cache hash table entries: 131072 (order: 6, 1048576 bytes)
        Memory: 2047920k/2086064k available (13992k code, 38144k reserved,
        6012k data, 880k init)
        kernel unaligned access to 0xca2ffc55fb373e95, ip=0xa0000001001be550
        swapper[0]: error during unaligned kernel access
         -1 [1]
        Modules linked in:
      
        Pid: 0, CPU 0, comm:              swapper
        psr : 00001010084a2018 ifs : 800000000000060f ip  :
        [<a0000001001be550>]    Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
        ip is at new_slab+0x90/0x680
        unat: 0000000000000000 pfs : 000000000000060f rsc : 0000000000000003
        rnat: 9666960159966a59 bsps: a0000001001441c0 pr  : 9666960159965a59
        ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
        csd : 0000000000000000 ssd : 0000000000000000
        b0  : a0000001001be500 b6  : a00000010112cb20 b7  : a0000001011660a0
        f6  : 0fff7f0f0f0f0e54f0000 f7  : 0ffe8c5c1000000000000
        f8  : 1000d8000000000000000 f9  : 100068800000000000000
        f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
        r1  : a00000010155eef0 r2  : 0000000000000000 r3  : fffffffffffc1638
        r8  : e0000040600081b8 r9  : ca2ffc55fb373e95 r10 : 0000000000000000
        r11 : e000004040001646 r12 : a000000101287e20 r13 : a000000101280000
        r14 : 0000000000004000 r15 : 0000000000000078 r16 : ca2ffc55fb373e75
        r17 : e000004040040000 r18 : fffffffffffc1646 r19 : e000004040001646
        r20 : fffffffffffc15f8 r21 : 000000000000004d r22 : a00000010132fa68
        r23 : 00000000000000ed r24 : 0000000000000000 r25 : 0000000000000000
        r26 : 0000000000000001 r27 : a0000001012b8500 r28 : a00000010135f4a0
        r29 : 0000000000000000 r30 : 0000000000000000 r31 : 0000000000000001
        Unable to handle kernel NULL pointer dereference (address
        0000000000000018)
        swapper[0]: Oops 11003706212352 [2]
        Modules linked in:
      
        Pid: 0, CPU 0, comm:              swapper
        psr : 0000121008022018 ifs : 800000000000cc18 ip  :
        [<a0000001004dc8f1>]    Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
        ip is at __copy_user+0x891/0x960
        unat: 0000000000000000 pfs : 0000000000000813 rsc : 0000000000000003
        rnat: 0000000000000000 bsps: 0000000000000000 pr  : 9666960159961765
        ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
        csd : 0000000000000000 ssd : 0000000000000000
        b0  : a00000010004b550 b6  : a00000010004b740 b7  : a00000010000c750
        f6  : 000000000000000000000 f7  : 1003e9e3779b97f4a7c16
        f8  : 1003e0a00000010001550 f9  : 100068800000000000000
        f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
        r1  : a00000010155eef0 r2  : a0000001012870b0 r3  : a0000001012870b8
        r8  : 0000000000000298 r9  : 0000000000000013 r10 : 0000000000000000
        r11 : 9666960159961a65 r12 : a000000101287010 r13 : a000000101280000
        r14 : a000000101287068 r15 : a000000101287080 r16 : 0000000000000298
        r17 : 0000000000000010 r18 : 0000000000000018 r19 : a000000101287310
        r20 : 0000000000000290 r21 : 0000000000000000 r22 : 0000000000000000
        r23 : a000000101386f58 r24 : 0000000000000000 r25 : 000000007fffffff
        r26 : a000000101287078 r27 : a0000001013c69b0 r28 : 0000000000000000
        r29 : 0000000000000014 r30 : 0000000000000000 r31 : 0000000000000813
      
      Sedat Dilek and Hugh Dickins reported similar problems as well.
      
      Earlier patches in the common set moved the zeroing of the kmem_cache
      structure into common code. See "Move allocation of kmem_cache into
      common code".
      
      The allocation for the two special structures is still done from SLUB
      specific code but no zeroing is done since the cache creation functions
      used to zero. This now needs to be updated so that the structures are
      zeroed during allocation in kmem_cache_init().  Otherwise random pointer
      values may be followed.
      Reported-by: NTony Luck <tony.luck@intel.com>
      Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      Reported-by: NHugh Dickins <hughd@google.com>
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      9df53b15
  19. 05 9月, 2012 2 次提交
    • P
      Revert "mm/sl[aou]b: Move sysfs_slab_add to common" · aac3a166
      Pekka Enberg 提交于
      This reverts commit 96d17b7b which
      caused the following errors at boot:
      
        [    1.114885] kobject (ffff88001a802578): tried to init an initialized object, something is seriously wrong.
        [    1.114885] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
        [    1.114885] Call Trace:
        [    1.114885]  [<ffffffff81273f37>] kobject_init+0x87/0xa0
        [    1.115555]  [<ffffffff8127426a>] kobject_init_and_add+0x2a/0x90
        [    1.115555]  [<ffffffff8127c870>] ? sprintf+0x40/0x50
        [    1.115555]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
        [    1.115555]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
        [    1.115555]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
        [    1.115555]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
        [    1.115555]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
        [    1.115836]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
        [    1.116834]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
        [    1.117835]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
        [    1.117835]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
        [    1.118401]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
        [    1.119832]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
        [    1.120325] ------------[ cut here ]------------
        [    1.120835] WARNING: at fs/sysfs/dir.c:536 sysfs_add_one+0xc1/0xf0()
        [    1.121437] sysfs: cannot create duplicate filename '/kernel/slab/:t-0000016'
        [    1.121831] Modules linked in:
        [    1.122138] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
        [    1.122831] Call Trace:
        [    1.123074]  [<ffffffff81195ce1>] ? sysfs_add_one+0xc1/0xf0
        [    1.123833]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
        [    1.124405]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
        [    1.124832]  [<ffffffff81195ce1>] sysfs_add_one+0xc1/0xf0
        [    1.125337]  [<ffffffff81195eb3>] create_dir+0x73/0xd0
        [    1.125832]  [<ffffffff81196221>] sysfs_create_dir+0x81/0xe0
        [    1.126363]  [<ffffffff81273d3d>] kobject_add_internal+0x9d/0x210
        [    1.126832]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
        [    1.127406]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
        [    1.127832]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
        [    1.128384]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
        [    1.128833]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
        [    1.129831]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
        [    1.130305]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
        [    1.130831]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
        [    1.131351]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
        [    1.131830]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
        [    1.132392]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
        [    1.132830]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
        [    1.133315] ---[ end trace 2703540871c8fab7 ]---
        [    1.133830] ------------[ cut here ]------------
        [    1.134274] WARNING: at lib/kobject.c:196 kobject_add_internal+0x1f5/0x210()
        [    1.134829] kobject_add_internal failed for :t-0000016 with -EEXIST, don't try to register things with the same name in the same directory.
        [    1.135829] Modules linked in:
        [    1.136135] Pid: 1, comm: swapper/0 Tainted: G        W    3.6.0-rc1+ #6
        [    1.136828] Call Trace:
        [    1.137071]  [<ffffffff81273e95>] ? kobject_add_internal+0x1f5/0x210
        [    1.137830]  [<ffffffff8103adfa>] warn_slowpath_common+0x7a/0xb0
        [    1.138402]  [<ffffffff8103aed1>] warn_slowpath_fmt+0x41/0x50
        [    1.138830]  [<ffffffff811955a3>] ? release_sysfs_dirent+0x73/0xf0
        [    1.139419]  [<ffffffff81273e95>] kobject_add_internal+0x1f5/0x210
        [    1.139830]  [<ffffffff812742a3>] kobject_init_and_add+0x63/0x90
        [    1.140429]  [<ffffffff81124c60>] sysfs_slab_add+0x80/0x210
        [    1.140830]  [<ffffffff81100175>] kmem_cache_create+0xa5/0x250
        [    1.141829]  [<ffffffff81cf24cd>] ? md_init+0x144/0x144
        [    1.142307]  [<ffffffff81cf25b6>] local_init+0xa4/0x11b
        [    1.142829]  [<ffffffff81cf24e1>] dm_init+0x14/0x45
        [    1.143307]  [<ffffffff810001ba>] do_one_initcall+0x3a/0x160
        [    1.143829]  [<ffffffff81cc2c90>] kernel_init+0x133/0x1b7
        [    1.144352]  [<ffffffff81cc25c4>] ? do_early_param+0x86/0x86
        [    1.144829]  [<ffffffff8171aff4>] kernel_thread_helper+0x4/0x10
        [    1.145405]  [<ffffffff81cc2b5d>] ? start_kernel+0x33f/0x33f
        [    1.145828]  [<ffffffff8171aff0>] ? gs_change+0xb/0xb
        [    1.146313] ---[ end trace 2703540871c8fab8 ]---
      
      Conflicts:
      
      	mm/slub.c
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      aac3a166
    • C
      mm/sl[aou]b: Move kmem_cache refcounting to common code · cce89f4f
      Christoph Lameter 提交于
      Get rid of the refcount stuff in the allocators and do that part of
      kmem_cache management in the common code.
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NPekka Enberg <penberg@kernel.org>
      cce89f4f