1. 13 1月, 2012 13 次提交
  2. 08 1月, 2012 1 次提交
    • G
      net: fix sock_clone reference mismatch with tcp memcontrol · f3f511e1
      Glauber Costa 提交于
      Sockets can also be created through sock_clone. Because it copies
      all data in the sock structure, it also copies the memcg-related pointer,
      and all should be fine. However, since we now use reference counts in
      socket creation, we are left with some sockets that have no reference
      counts. It matters when we destroy them, since it leads to a mismatch.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: Greg Thelen <gthelen@google.com>
      CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
      CC: Laurent Chavey <chavey@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3f511e1
  3. 23 12月, 2011 1 次提交
    • G
      Partial revert "Basic kernel memory functionality for the Memory Controller" · 65c64ce8
      Glauber Costa 提交于
      This reverts commit e5671dfa.
      
      After a follow up discussion with Michal, it was agreed it would
      be better to leave the kmem controller with just the tcp files,
      deferring the behavior of the other general memory.kmem.* files
      for a later time, when more caches are controlled. This is because
      generic kmem files are not used by tcp accounting and it is
      not clear how other slab caches would fit into the scheme.
      
      We are reverting the original commit so we can track the reference.
      Part of the patch is kept, because it was used by the later tcp
      code. Conflicts are shown in the bottom. init/Kconfig is removed from
      the revert entirely.
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      CC: Kirill A. Shutemov <kirill@shutemov.name>
      CC: Paul Menage <paul@paulmenage.org>
      CC: Greg Thelen <gthelen@google.com>
      CC: Johannes Weiner <jweiner@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      
      Conflicts:
      
      	Documentation/cgroups/memory.txt
      	mm/memcontrol.c
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65c64ce8
  4. 21 12月, 2011 1 次提交
  5. 13 12月, 2011 4 次提交
  6. 03 11月, 2011 6 次提交
  7. 01 11月, 2011 1 次提交
    • M
      mm: change isolate mode from #define to bitwise type · 4356f21d
      Minchan Kim 提交于
      Change ISOLATE_XXX macro with bitwise isolate_mode_t type.  Normally,
      macro isn't recommended as it's type-unsafe and making debugging harder as
      symbol cannot be passed throught to the debugger.
      
      Quote from Johannes
      " Hmm, it would probably be cleaner to fully convert the isolation mode
      into independent flags.  INACTIVE, ACTIVE, BOTH is currently a
      tri-state among flags, which is a bit ugly."
      
      This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h
      Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4356f21d
  8. 31 10月, 2011 1 次提交
  9. 15 9月, 2011 1 次提交
  10. 26 8月, 2011 2 次提交
  11. 10 8月, 2011 1 次提交
    • M
      Revert "memcg: get rid of percpu_charge_mutex lock" · 9f50fad6
      Michal Hocko 提交于
      This reverts commit 8521fc50.
      
      The patch incorrectly assumes that using atomic FLUSHING_CACHED_CHARGE
      bit operations is sufficient but that is not true.  Johannes Weiner has
      reported a crash during parallel memory cgroup removal:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
        IP: [<ffffffff81083b70>] css_is_ancestor+0x20/0x70
        Oops: 0000 [#1] PREEMPT SMP
        Pid: 19677, comm: rmdir Tainted: G        W   3.0.0-mm1-00188-gf38d32b #35 ECS MCP61M-M3/MCP61M-M3
        RIP: 0010:[<ffffffff81083b70>]  css_is_ancestor+0x20/0x70
        RSP: 0018:ffff880077b09c88  EFLAGS: 00010202
        Process rmdir (pid: 19677, threadinfo ffff880077b08000, task ffff8800781bb310)
        Call Trace:
         [<ffffffff810feba3>] mem_cgroup_same_or_subtree+0x33/0x40
         [<ffffffff810feccf>] drain_all_stock+0x11f/0x170
         [<ffffffff81103211>] mem_cgroup_force_empty+0x231/0x6d0
         [<ffffffff811036c4>] mem_cgroup_pre_destroy+0x14/0x20
         [<ffffffff81080559>] cgroup_rmdir+0xb9/0x500
         [<ffffffff81114d26>] vfs_rmdir+0x86/0xe0
         [<ffffffff81114e7b>] do_rmdir+0xfb/0x110
         [<ffffffff81114ea6>] sys_rmdir+0x16/0x20
         [<ffffffff8154d76b>] system_call_fastpath+0x16/0x1b
      
      We are crashing because we try to dereference cached memcg when we are
      checking whether we should wait for draining on the cache.  The cache is
      already cleaned up, though.
      
      There is also a theoretical chance that the cached memcg gets freed
      between we test for the FLUSHING_CACHED_CHARGE and dereference it in
      mem_cgroup_same_or_subtree:
      
              CPU0                    CPU1                         CPU2
        mem=stock->cached
        stock->cached=NULL
                                    clear_bit
                                                              test_and_set_bit
        test_bit()                    ...
        <preempted>             mem_cgroup_destroy
        use after free
      
      The percpu_charge_mutex protected from this race because sync draining
      is exclusive.
      
      It is safer to revert now and come up with a more parallel
      implementation later.
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Reported-by: NJohannes Weiner <jweiner@redhat.com>
      Acked-by: NJohannes Weiner <jweiner@redhat.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9f50fad6
  12. 04 8月, 2011 1 次提交
  13. 27 7月, 2011 7 次提交
    • M
      memcg: get rid of percpu_charge_mutex lock · 8521fc50
      Michal Hocko 提交于
      percpu_charge_mutex protects from multiple simultaneous per-cpu charge
      caches draining because we might end up having too many work items.  At
      least this was the case until commit 26fe6168 ("memcg: fix percpu
      cached charge draining frequency") when we introduced a more targeted
      draining for async mode.
      
      Now that also sync draining is targeted we can safely remove mutex
      because we will not send more work than the current number of CPUs.
      FLUSHING_CACHED_CHARGE protects from sending the same work multiple
      times and stock->nr_pages == 0 protects from pointless sending a work if
      there is obviously nothing to be done.  This is of course racy but we
      can live with it as the race window is really small (we would have to
      see FLUSHING_CACHED_CHARGE cleared while nr_pages would be still
      non-zero).
      
      The only remaining place where we can race is synchronous mode when we
      rely on FLUSHING_CACHED_CHARGE test which might have been set by other
      drainer on the same group but we should wait in that case as well.
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8521fc50
    • M
      memcg: add mem_cgroup_same_or_subtree() helper · 3e92041d
      Michal Hocko 提交于
      We are checking whether a given two groups are same or at least in the
      same subtree of a hierarchy at several places.  Let's make a helper for
      it to make code easier to read.
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3e92041d
    • M
      memcg: unify sync and async per-cpu charge cache draining · d38144b7
      Michal Hocko 提交于
      Currently we have two ways how to drain per-CPU caches for charges.
      drain_all_stock_sync will synchronously drain all caches while
      drain_all_stock_async will asynchronously drain only those that refer to
      a given memory cgroup or its subtree in hierarchy.  Targeted async
      draining has been introduced by 26fe6168 (memcg: fix percpu cached
      charge draining frequency) to reduce the cpu workers number.
      
      sync draining is currently triggered only from mem_cgroup_force_empty
      which is triggered only by userspace (mem_cgroup_force_empty_write) or
      when a cgroup is removed (mem_cgroup_pre_destroy).  Although these are
      not usually frequent operations it still makes some sense to do targeted
      draining as well, especially if the box has many CPUs.
      
      This patch unifies both methods to use the single code (drain_all_stock)
      which relies on the original async implementation and just adds
      flush_work to wait on all caches that are still under work for the sync
      mode.  We are using FLUSHING_CACHED_CHARGE bit check to prevent from
      waiting on a work that we haven't triggered.  Please note that both sync
      and async functions are currently protected by percpu_charge_mutex so we
      cannot race with other drainers.
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d38144b7
    • M
      memcg: do not try to drain per-cpu caches without pages · d1a05b69
      Michal Hocko 提交于
      drain_all_stock_async tries to optimize a work to be done on the work
      queue by excluding any work for the current CPU because it assumes that
      the context we are called from already tried to charge from that cache
      and it's failed so it must be empty already.
      
      While the assumption is correct we can optimize it even more by checking
      the current number of pages in the cache.  This will also reduce a work
      on other CPUs with an empty stock.
      
      For the current CPU we can simply call drain_local_stock rather than
      deferring it to the work queue.
      
      [kamezawa.hiroyu@jp.fujitsu.com: use drain_local_stock for current CPU optimization]
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d1a05b69
    • K
      memcg: add memory.vmscan_stat · 82f9d486
      KAMEZAWA Hiroyuki 提交于
      The commit log of 0ae5e89c ("memcg: count the soft_limit reclaim
      in...") says it adds scanning stats to memory.stat file.  But it doesn't
      because we considered we needed to make a concensus for such new APIs.
      
      This patch is a trial to add memory.scan_stat. This shows
        - the number of scanned pages(total, anon, file)
        - the number of rotated pages(total, anon, file)
        - the number of freed pages(total, anon, file)
        - the number of elaplsed time (including sleep/pause time)
      
        for both of direct/soft reclaim.
      
      The biggest difference with oringinal Ying's one is that this file
      can be reset by some write, as
      
        # echo 0 ...../memory.scan_stat
      
      Example of output is here. This is a result after make -j 6 kernel
      under 300M limit.
      
        [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.scan_stat
        [kamezawa@bluextal ~]$ cat /cgroup/memory/A/memory.vmscan_stat
        scanned_pages_by_limit 9471864
        scanned_anon_pages_by_limit 6640629
        scanned_file_pages_by_limit 2831235
        rotated_pages_by_limit 4243974
        rotated_anon_pages_by_limit 3971968
        rotated_file_pages_by_limit 272006
        freed_pages_by_limit 2318492
        freed_anon_pages_by_limit 962052
        freed_file_pages_by_limit 1356440
        elapsed_ns_by_limit 351386416101
        scanned_pages_by_system 0
        scanned_anon_pages_by_system 0
        scanned_file_pages_by_system 0
        rotated_pages_by_system 0
        rotated_anon_pages_by_system 0
        rotated_file_pages_by_system 0
        freed_pages_by_system 0
        freed_anon_pages_by_system 0
        freed_file_pages_by_system 0
        elapsed_ns_by_system 0
        scanned_pages_by_limit_under_hierarchy 9471864
        scanned_anon_pages_by_limit_under_hierarchy 6640629
        scanned_file_pages_by_limit_under_hierarchy 2831235
        rotated_pages_by_limit_under_hierarchy 4243974
        rotated_anon_pages_by_limit_under_hierarchy 3971968
        rotated_file_pages_by_limit_under_hierarchy 272006
        freed_pages_by_limit_under_hierarchy 2318492
        freed_anon_pages_by_limit_under_hierarchy 962052
        freed_file_pages_by_limit_under_hierarchy 1356440
        elapsed_ns_by_limit_under_hierarchy 351386416101
        scanned_pages_by_system_under_hierarchy 0
        scanned_anon_pages_by_system_under_hierarchy 0
        scanned_file_pages_by_system_under_hierarchy 0
        rotated_pages_by_system_under_hierarchy 0
        rotated_anon_pages_by_system_under_hierarchy 0
        rotated_file_pages_by_system_under_hierarchy 0
        freed_pages_by_system_under_hierarchy 0
        freed_anon_pages_by_system_under_hierarchy 0
        freed_file_pages_by_system_under_hierarchy 0
        elapsed_ns_by_system_under_hierarchy 0
      
      total_xxxx is for hierarchy management.
      
      This will be useful for further memcg developments and need to be
      developped before we do some complicated rework on LRU/softlimit
      management.
      
      This patch adds a new struct memcg_scanrecord into scan_control struct.
      sc->nr_scanned at el is not designed for exporting information.  For
      example, nr_scanned is reset frequentrly and incremented +2 at scanning
      mapped pages.
      
      To avoid complexity, I added a new param in scan_control which is for
      exporting scanning score.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Ying Han <yinghan@google.com>
      Cc: Andrew Bresticker <abrestic@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      82f9d486
    • D
      memcg: fix behavior of mem_cgroup_resize_limit() · 108b6a78
      Daisuke Nishimura 提交于
      Commit 22a668d7 ("memcg: fix behavior under memory.limit equals to
      memsw.limit") introduced "memsw_is_minimum" flag, which becomes true
      when mem_limit == memsw_limit.  The flag is checked at the beginning of
      reclaim, and "noswap" is set if the flag is true, because using swap is
      meaningless in this case.
      
      This works well in most cases, but when we try to shrink mem_limit,
      which is the same as memsw_limit now, we might fail to shrink mem_limit
      because swap doesn't used.
      
      This patch fixes this behavior by:
       - check MEM_CGROUP_RECLAIM_SHRINK at the begining of reclaim
       - If it is set, don't set "noswap" flag even if memsw_is_minimum is true.
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Ying Han <yinghan@google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      108b6a78
    • M
      memcg: change memcg_oom_mutex to spinlock · 1af8efe9
      Michal Hocko 提交于
      memcg_oom_mutex is used to protect memcg OOM path and eventfd interface
      for oom_control.  None of the critical sections which it protects sleep
      (eventfd_signal works from atomic context and the rest are simple linked
      list resp.  oom_lock atomic operations).
      
      Mutex is also too heavyweight for those code paths because it triggers a
      lot of scheduling.  It also makes makes convoying effects more visible
      when we have a big number of oom killing because we take the lock
      mutliple times during mem_cgroup_handle_oom so we have multiple places
      where many processes can sleep.
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1af8efe9