1. 01 Feb, 2009 1 commit
    • Stop playing silly games with the VM_ACCOUNT flag · fc8744ad
      Linus Torvalds committed
      The mmap_region() code would temporarily set the VM_ACCOUNT flag for
      anonymous shared mappings just to inform shmem_zero_setup() that it
      should enable accounting for the resulting shm object.  It would then
      clear the flag after calling ->mmap (for the /dev/zero case) or doing
      shmem_zero_setup() (for the MAP_ANON case).
      
      This not only resulted in vma merge issues, but also caused unnecessary
      confusion.  Use the already-existing VM_NORESERVE flag for
      this instead, and let shmem_{zero|file}_setup() just figure it out from
      that.
      
      This also happens to make it obvious that the new DRI2 GEM layer uses a
      non-reserving backing store for its object allocation - which is quite
      possibly not intentional.  But since I didn't want to change semantics
      in this patch, I left it alone, and just updated the caller to use the
      new flag semantics.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
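The flag handling described in this commit message can be pictured with a small user-space sketch. The flag values and the helper `shm_should_account()` are hypothetical and exist only for illustration; the kernel's real setup functions take more parameters and use different bit values. The point is only that the accounting decision is derived from VM_NORESERVE by the callee, so the caller never has to set and then clear a temporary flag.

```c
#include <stdbool.h>

/* Hypothetical bit values, chosen for illustration only. */
#define VM_SHARED    0x1UL
#define VM_NORESERVE 0x2UL  /* caller asked for a non-reserving mapping */

/*
 * Hypothetical stand-in for the decision made inside the shm setup code:
 * account the backing object unless the caller passed VM_NORESERVE.
 */
static bool shm_should_account(unsigned long vm_flags)
{
    return (vm_flags & VM_NORESERVE) == 0;
}
```

Because the caller's flags are only read, the flags of the new area are the same before and after the setup call, which is what merging with neighbouring areas relies on.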
  2. 31 Jan, 2009 1 commit
    • Allow opportunistic merging of VM_CAN_NONLINEAR areas · 33bfad54
      Linus Torvalds committed
      Commit de33c8db ("Fix OOPS in mmap_region() when merging adjacent
      VM_LOCKED file segments") unified
      the vma merging of anonymous and file maps to just one place, which
      simplified the code and fixed a use-after-free bug that could cause an
      oops.
      
      But by doing the merge opportunistically before even having called
      ->mmap() on the file method, it now compares two different 'vm_flags'
      values: the pre-mmap() value of the new not-yet-formed vma, and previous
      mappings of the same file around it.
      
      And in doing so, it refused to merge the common file case, which adds a
      marker to say "I can be made non-linear".
      
      This fixes it by just adding a set of flags that don't have to match,
      because we know they are ok to merge.  Currently it's only that single
      VM_CAN_NONLINEAR flag, but at least conceptually there could be others
      in the future.
      Reported-and-acked-by: Hugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg KH <gregkh@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
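The "set of flags that don't have to match" in this commit message can be sketched as a masked comparison. The bit values and the names `MERGE_IGNORED_FLAGS` and `flags_allow_merge()` below are hypothetical, picked for this example rather than taken from the kernel source.

```c
#include <stdbool.h>

/* Hypothetical bit values, chosen for illustration only. */
#define VM_READ          0x1UL
#define VM_LOCKED        0x2UL
#define VM_CAN_NONLINEAR 0x4UL

/* Flags that may differ between two areas without preventing a merge. */
#define MERGE_IGNORED_FLAGS (VM_CAN_NONLINEAR)

/*
 * XOR leaves only the bits that differ between the two flag words;
 * masking off the ignored set leaves the differences that matter.
 */
static bool flags_allow_merge(unsigned long a, unsigned long b)
{
    return ((a ^ b) & ~MERGE_IGNORED_FLAGS) == 0;
}
```

Adding another mergeable flag later would only mean OR-ing it into the mask, which matches the remark that there could conceptually be others in the future.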
  3. 30 Jan, 2009 4 commits
  4. 27 Jan, 2009 1 commit
  5. 21 Jan, 2009 1 commit
  6. 16 Jan, 2009 8 commits
    • memcg: fix a race when setting memory.swappiness · 068b38c1
      Li Zefan committed
      (suppose: memcg->use_hierarchy == 0 and memcg->swappiness == 60)
      
      echo 10 > /memcg/0/swappiness   |
        mem_cgroup_swappiness_write() |
          ...                         | echo 1 > /memcg/0/use_hierarchy
                                      | mkdir /mnt/0/1
                                      |   sub_memcg->swappiness = 60;
          memcg->swappiness = 10;     |
      
      In the above scenario, we end up having 2 different swappiness
      values in a single hierarchy.
      
      We should hold cgroup_lock() when checking the cgrp->children list.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
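The race in this commit message goes away once the "does this group have children?" check and the write to swappiness happen under the same lock that child creation takes. The following user-space model uses a pthread mutex in place of cgroup_lock(). The `struct group` layout, the function names, and the rule that a write is rejected with -EINVAL once a hierarchical group has children are assumptions made for this sketch, not the kernel code.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Simplified, hypothetical model of a memory cgroup. */
struct group {
    bool use_hierarchy;
    int nr_children;
    int swappiness;
};

/* Stands in for cgroup_lock(): serializes child creation and updates. */
static pthread_mutex_t hierarchy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Check for children and update the value in one critical section. */
static int group_set_swappiness(struct group *g, int val)
{
    int ret = 0;

    pthread_mutex_lock(&hierarchy_lock);
    if (g->use_hierarchy && g->nr_children > 0)
        ret = -EINVAL;  /* children already inherited the old value */
    else
        g->swappiness = val;
    pthread_mutex_unlock(&hierarchy_lock);
    return ret;
}

/* Child creation takes the same lock, so it cannot interleave. */
static void group_add_child(struct group *parent, struct group *child)
{
    pthread_mutex_lock(&hierarchy_lock);
    child->use_hierarchy = parent->use_hierarchy;
    child->nr_children = 0;
    child->swappiness = parent->swappiness;
    parent->nr_children++;
    pthread_mutex_unlock(&hierarchy_lock);
}
```

With both paths under one lock, a child is created either before the write (and the write is rejected) or after it (and the child inherits the new value), so a single hierarchy never ends up with two different values.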
    • memcg: fix section mismatch · 0eb253e2
      Li Zefan committed
      At system boot when creating the top cgroup, mem_cgroup_create() calls
      enable_swap_cgroup() which is marked as __init, so mark
      mem_cgroup_create() as __ref to avoid false section mismatch warning.
      Reported-by: Rakib Mullick <rakib.mullick@gmail.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • revert "mm: vmalloc use mutex for purge" · 46666d8a
      Andrew Morton committed
      Revert commit e97a630e ("mm: vmalloc use mutex for purge")
      
      Bryan Donlan reports:
      
      : After testing 2.6.29-rc1 on xen-x86 with a btrfs root filesystem, I
      : got the OOPS quoted below and a hard freeze shortly after boot.
      : Boot messages and config are attached.
      :
      : ------------[ cut here ]------------
      : Kernel BUG at c05ef80d [verbose debug info unavailable]
      : invalid opcode: 0000 [#1] SMP
      : last sysfs file: /sys/block/xvdc/size
      : Modules linked in:
      :
      : Pid: 0, comm: swapper Not tainted (2.6.29-rc1 #6)
      : EIP: 0061:[<c05ef80d>] EFLAGS: 00010087 CPU: 2
      : EIP is at schedule+0x7cd/0x950
      : EAX: d5aeca80 EBX: 00000002 ECX: 00000000 EDX: d4cb9a40
      : ESI: c12f5600 EDI: d4cb9a40 EBP: d6033fa4 ESP: d6033ef4
      :  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
      : Process swapper (pid: 0, ti=d6032000 task=d6020b70 task.ti=d6032000)
      : Stack:
      :  000d85bc 00000000 000186a0 00000000 0dd11410 c0105417 c12efe00 0dc367c3
      :  00000011 c0105d46 d5a5d310 deadbeef d4cb9a40 c07cc600 c05f1340 c12e0060
      :  deadbeef d6020b70 d6020d08 00000002 c014377d 00000000 c12f5600 00002c22
      : Call Trace:
      :  [<c0105417>] xen_force_evtchn_callback+0x17/0x30
      :  [<c0105d46>] check_events+0x8/0x12
      :  [<c05f1340>] _spin_unlock_irqrestore+0x20/0x40
      :  [<c014377d>] hrtimer_start_range_ns+0x12d/0x2e0
      :  [<c014c4f6>] tick_nohz_restart_sched_tick+0x146/0x160
      :  [<c0107485>] cpu_idle+0xa5/0xc0
      
      and bisected it to this commit.
      
      Let's remove it now while we have a think about the problem.
      Reported-by: Bryan Donlan <bdonlan@gmail.com>
      Tested-by: Christophe Saout <christophe@saout.de>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: make oom less frequently · 4d1c6273
      Daisuke Nishimura committed
      In the previous implementation, mem_cgroup_try_charge checked the return
      value of mem_cgroup_try_to_free_pages, and simply retried if some pages
      had been reclaimed.
      But now, try_charge (and mem_cgroup_hierarchical_reclaim called from it)
      only checks whether the usage is less than the limit.
      
      This patch tries to restore the previous behavior so that OOM is
      triggered less frequently.
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
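The retry policy described in this commit message can be modelled as a loop that only spends its retry budget when reclaim made no progress at all. Everything below is a hypothetical user-space sketch: `struct counter`, `MAX_RETRIES`, and the two toy reclaim callbacks are invented for illustration and are not the kernel's data structures or functions.

```c
#include <stdbool.h>

#define MAX_RETRIES 5  /* hypothetical retry budget */

struct counter {
    long usage;
    long limit;
};

/* A reclaim callback returns the number of pages it managed to free. */
typedef long (*reclaim_fn)(struct counter *c);

/* Toy reclaimer: frees one page per call while any usage remains. */
static long reclaim_one_page(struct counter *c)
{
    if (c->usage == 0)
        return 0;
    c->usage--;
    return 1;
}

/* Toy reclaimer that never makes progress. */
static long reclaim_nothing(struct counter *c)
{
    (void)c;
    return 0;
}

/* Returns false where the real code would fall back to OOM handling. */
static bool try_charge(struct counter *c, long nr, reclaim_fn reclaim)
{
    int retries = MAX_RETRIES;

    for (;;) {
        if (c->usage + nr <= c->limit) {
            c->usage += nr;
            return true;
        }
        if (reclaim(c) > 0)
            continue;       /* progress was made: retry for free */
        if (--retries == 0)
            return false;   /* repeated failure with no progress */
    }
}
```

Checking only "is usage below the limit?" would treat a reclaim pass that freed pages but did not yet free enough as a failure; retrying on any progress is what makes the out-of-memory path rarer.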
    • memcg: fix hierarchical reclaim · c268e994
      Daisuke Nishimura committed
      If root_mem has no children, last_scanned_child is set to root_mem itself.
      But after some children are added to root_mem, mem_cgroup_get_next_node can
      call mem_cgroup_put on root_mem even though mem_cgroup_get was never called on it.
      
      This patch fixes this behavior by:
      
      - Set last_scanned_child to NULL if root_mem has no children or DFS
        search has returned to root_mem itself (root_mem is not a "child" of
        root_mem).  Make mem_cgroup_get_first_node return root_mem in this case.
        There are no mem_cgroup_get/put for root_mem.
      
      - Rename mem_cgroup_get_next_node to __mem_cgroup_get_next_node, and
        mem_cgroup_get_first_node to mem_cgroup_get_next_node.  Make
        mem_cgroup_hierarchical_reclaim call only new mem_cgroup_get_next_node.
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix error path of mem_cgroup_move_parent · 40d58138
      Daisuke Nishimura committed
      There is a bug in the error path of mem_cgroup_move_parent.
      
      The extra refcnt taken by try_charge should be dropped, and the usage
      incremented by try_charge should be decremented, in both error paths:
      
          A: failure at get_page_unless_zero
          B: failure at isolate_lru_page
      
      This bug makes the parent directory unremovable.
      
      In case of A, rmdir doesn't return, because res.usage doesn't go down to 0
      at mem_cgroup_force_empty even after all the pc in lru are removed.
      
      In case of B, rmdir fails and returns -EBUSY, because it has extra ref
      counts even after res.usage goes down to 0.
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
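The two error paths named in this commit message (A and B) can be modelled with a single unwind label that gives back both things the charge step took. This is a hypothetical user-space sketch: `struct parent`, the helper names, and the boolean parameters that simulate the two failures are invented for illustration. The success path, where the usage stays and only the extra reference is dropped, is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified, hypothetical model of the parent group's counters. */
struct parent {
    long usage;   /* pages charged to the parent */
    int refcnt;   /* references held on the parent */
};

/* Charging takes both a unit of usage and an extra reference. */
static void parent_charge(struct parent *p)
{
    p->usage++;
    p->refcnt++;
}

/* Undo everything parent_charge() did. */
static void parent_cancel_charge(struct parent *p)
{
    p->usage--;
    p->refcnt--;
}

static int move_to_parent(struct parent *p, bool page_ref_ok, bool isolate_ok)
{
    parent_charge(p);

    if (!page_ref_ok)   /* case A: could not get a reference on the page */
        goto cancel;
    if (!isolate_ok)    /* case B: could not isolate the page from the LRU */
        goto cancel;

    /* Success (assumed): usage now belongs to the moved page. */
    p->refcnt--;        /* drop only the extra reference */
    return 0;

cancel:
    parent_cancel_charge(p);  /* give back usage and reference together */
    return -EBUSY;
}
```

Leaking the usage corresponds to the "rmdir doesn't return" symptom of case A, and leaking the reference corresponds to the -EBUSY symptom of case B.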
    • memcg: fix mem_cgroup_get_reclaim_stat_from_page · bd112db8
      Daisuke Nishimura committed
      In case of swapin, a new page is added to the lru before it is charged,
      so page->pc->mem_cgroup is either NULL or points to the last mem_cgroup
      the page was charged to.
      
      In the latter case, if the mem_cgroup has already been freed by rmdir,
      the area pointed to by page->pc->mem_cgroup may contain invalid data.
      
      In fact, I saw a general protection fault:
      
          general protection fault: 0000 [#1] SMP
          last sysfs file: /sys/devices/system/cpu/cpu15/cache/index1/shared_cpu_map
          CPU 4
          Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery ac lp sg ide_cd_mod cdrom button serio_raw acpi_memhotplug parport_pc e1000 rtc_cmos parport rtc_core rtc_lib i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
          Pid: 26038, comm: page01 Tainted: G        W  2.6.28-rc9-mm1-mmotm-2008-12-22-16-14-f2ab3dea #1
          RIP: 0010:[<ffffffff8028e710>]  [<ffffffff8028e710>] update_page_reclaim_stat+0x2f/0x42
          RSP: 0000:ffff8801ee457da8  EFLAGS: 00010002
          RAX: 32353438312021c8 RBX: 0000000000000000 RCX: 32353438312021c8
          RDX: 0000000000000000 RSI: ffff8800cb0b1000 RDI: ffff8801164d1d28
          RBP: ffff880110002cb8 R08: ffff88010f2eae23 R09: 0000000000000001
          R10: ffff8800bc514b00 R11: ffff880110002c00 R12: 0000000000000000
          R13: ffff88000f484100 R14: 0000000000000003 R15: 00000000001200d2
          FS:  00007f8a261726f0(0000) GS:ffff88010f2eaa80(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 00007f8a25d22000 CR3: 00000001ef18c000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          Process page01 (pid: 26038, threadinfo ffff8801ee456000, task ffff8800b585b960)
          Stack:
           ffffe200071ee568 ffff880110001f00 0000000000000000 ffffffff8028ea17
           ffff88000f484100 0000000000000000 0000000000000020 00007f8a25d22000
           ffff8800bc514b00 ffffffff8028ec34 0000000000000000 0000000000016fd8
          Call Trace:
           [<ffffffff8028ea17>] ? ____pagevec_lru_add+0xc1/0x13c
           [<ffffffff8028ec34>] ? drain_cpu_pagevecs+0x36/0x89
           [<ffffffff802a4f8c>] ? swapin_readahead+0x78/0x98
           [<ffffffff8029a37a>] ? handle_mm_fault+0x3d9/0x741
           [<ffffffff804da654>] ? do_page_fault+0x3ce/0x78c
           [<ffffffff804d7a42>] ? trace_hardirqs_off_thunk+0x3a/0x3c
           [<ffffffff804d860f>] ? page_fault+0x1f/0x30
          Code: cc 55 48 8d af b8 0d 00 00 48 89 f7 53 89 d3 e8 39 85 02 00 48 63 d3 48 ff 44 d5 10 45 85 e4 74 05 48 ff 44 d5 00 48 85 c0 74 0e <48> ff 44 d0 10 45 85 e4 74 04 48 ff 04 d0 5b 5d 41 5c c3 41 54
          RIP  [<ffffffff8028e710>] update_page_reclaim_stat+0x2f/0x42
           RSP <ffff8801ee457da8>
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: fix vmalloc breakage · 822c18f2
      Ivan Kokshaysky committed
      On alpha, we have to map some stuff in the VMALLOC space very early in the
      boot process (to make SRM console callbacks work and so on, see
      arch/alpha/mm/init.c).  With the old VM allocator, we just manually placed a
      vm_struct onto the global vmlist and this worked for ages.
      
      Unfortunately, the new allocator isn't aware of this, so it constantly
      tries to allocate the VM space which is already in use, making vmalloc on
      alpha defunct.
      
      This patch forces KVA to import vmlist entries on init.
      
      [akpm@linux-foundation.org: remove unneeded check (per Johannes)]
      Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
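The idea of importing early, manually placed areas into the allocator at init time can be shown with a toy first-fit range allocator. All types, names, addresses, and the fixed 0x1000 search step below are hypothetical and unrelated to the kernel's actual vmalloc data structures. The sketch only shows why an allocator that does not know about pre-existing areas keeps handing out space that is already in use.

```c
#include <stddef.h>

#define MAX_AREAS 16
#define STEP      0x1000UL  /* hypothetical allocation granularity */

struct area {
    unsigned long start, end;  /* half-open range [start, end) */
};

struct allocator {
    struct area used[MAX_AREAS];
    size_t nr_used;
    unsigned long base, limit;
};

static int overlaps(const struct allocator *a, unsigned long s, unsigned long e)
{
    for (size_t i = 0; i < a->nr_used; i++)
        if (s < a->used[i].end && a->used[i].start < e)
            return 1;
    return 0;
}

static int record(struct allocator *a, unsigned long s, unsigned long e)
{
    if (a->nr_used == MAX_AREAS)
        return -1;
    a->used[a->nr_used].start = s;
    a->used[a->nr_used].end = e;
    a->nr_used++;
    return 0;
}

/* Init imports every area that was placed by hand before the allocator ran. */
static int allocator_init(struct allocator *a, unsigned long base,
                          unsigned long limit,
                          const struct area *early, size_t nr_early)
{
    a->nr_used = 0;
    a->base = base;
    a->limit = limit;
    for (size_t i = 0; i < nr_early; i++)
        if (record(a, early[i].start, early[i].end))
            return -1;
    return 0;
}

/* First fit; returns 0 on failure (base is assumed to be non-zero). */
static unsigned long allocator_alloc(struct allocator *a, unsigned long size)
{
    if (size == 0)
        return 0;
    for (unsigned long s = a->base; s + size <= a->limit; s += STEP) {
        if (!overlaps(a, s, s + size))
            return record(a, s, s + size) ? 0 : s;
    }
    return 0;
}
```

Without the import loop in `allocator_init()`, the first allocation in this model would land on the early area at the base of the range.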
  7. 14 Jan, 2009 8 commits
  8. 12 Jan, 2009 1 commit
  9. 09 Jan, 2009 15 commits