1. 06 2月, 2009 1 次提交
    • C
      do_wp_page: fix regression with execute in place · ab92661d
      Carsten Otte 提交于
      Fix do_wp_page for VM_MIXEDMAP mappings.
      
      In the case where pfn_valid returns 0 for a pfn at the beginning of
      do_wp_page and the mapping is not shared writable, the code branches to
      label `gotten:' with old_page == NULL.
      
      In case the vma is locked (vma->vm_flags & VM_LOCKED), lock_page,
      clear_page_mlock, and unlock_page try to access the old_page.
      
      This patch checks whether old_page is valid before it is dereferenced.
      
      The regression was introduced by "mlock: mlocked pages are unevictable"
      (commit b291f000).
      Signed-off-by: NCarsten Otte <cotte@de.ibm.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ab92661d
  2. 04 2月, 2009 1 次提交
    • A
      write-back: fix nr_to_write counter · dcf6a79d
      Artem Bityutskiy 提交于
      Commit 05fe478d introduced some
      @wbc->nr_to_write breakage.
      
      It made the following changes:
       1. Decrement wbc->nr_to_write instead of nr_to_write
       2. Decrement wbc->nr_to_write _only_ if wbc->sync_mode == WB_SYNC_NONE
       3. If synced nr_to_write pages, stop only if if wbc->sync_mode ==
          WB_SYNC_NONE, otherwise keep going.
      
      However, according to the commit message, the intention was to only make
      change 3.  Change 1 is a bug.  Change 2 does not seem to be necessary,
      and it breaks UBIFS expectations, so if needed, it should be done
      separately later.  And change 2 does not seem to be documented in the
      commit message.
      
      This patch does the following:
       1. Undo changes 1 and 2
       2. Add a comment explaining change 3 (it very useful to have comments
          in _code_, not only in the commit).
      Signed-off-by: NArtem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dcf6a79d
  3. 02 2月, 2009 1 次提交
    • L
      Manually revert "mlock: downgrade mmap sem while populating mlocked regions" · 27421e21
      Linus Torvalds 提交于
      This essentially reverts commit 8edb08ca.
      
      It downgraded our mmap semaphore to a read-lock while mlocking pages, in
      order to allow other threads (and external accesses like "ps" et al) to
      walk the vma lists and take page faults etc.  Which is a nice idea, but
      the implementation does not work.
      
      Because we cannot upgrade the lock back to a write lock without
      releasing the mmap semaphore, the code had to release the lock entirely
      and then re-take it as a writelock.  However, that meant that the caller
      possibly lost the vma chain that it was following, since now another
      thread could come in and mmap/munmap the range.
      
      The code tried to work around that by just looking up the vma again and
      erroring out if that happened, but quite frankly, that was just a buggy
      hack that doesn't actually protect against anything (the other thread
      could just have replaced the vma with another one instead of totally
      unmapping it).
      
      The only way to downgrade to a read map _reliably_ is to do it at the
      end, which is likely the right thing to do: do all the 'vma' operations
      with the write-lock held, then downgrade to a read after completing them
      all, and then do the "populate the newly mlocked regions" while holding
      just the read lock.  And then just drop the read-lock and return to user
      space.
      
      The (perhaps somewhat simpler) alternative is to just make all the
      callers of mlock_vma_pages_range() know that the mmap lock got dropped,
      and just re-grab the mmap semaphore if it needs to mlock more than one
      vma region.
      
      So we can do this "downgrade mmap sem while populating mlocked regions"
      thing right, but the way it was done here was absolutely not correct.
      Thus the revert, in the expectation that we will do it all correctly
      some day.
      
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27421e21
  4. 01 2月, 2009 1 次提交
    • L
      Stop playing silly games with the VM_ACCOUNT flag · fc8744ad
      Linus Torvalds 提交于
      The mmap_region() code would temporarily set the VM_ACCOUNT flag for
      anonymous shared mappings just to inform shmem_zero_setup() that it
      should enable accounting for the resulting shm object.  It would then
      clear the flag after calling ->mmap (for the /dev/zero case) or doing
      shmem_zero_setup() (for the MAP_ANON case).
      
      This just resulted in vma merge issues, but also made for just
      unnecessary confusion.  Use the already-existing VM_NORESERVE flag for
      this instead, and let shmem_{zero|file}_setup() just figure it out from
      that.
      
      This also happens to make it obvious that the new DRI2 GEM layer uses a
      non-reserving backing store for its object allocation - which is quite
      possibly not intentional.  But since I didn't want to change semantics
      in this patch, I left it alone, and just updated the caller to use the
      new flag semantics.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fc8744ad
  5. 31 1月, 2009 1 次提交
    • L
      Allow opportunistic merging of VM_CAN_NONLINEAR areas · 33bfad54
      Linus Torvalds 提交于
      Commit de33c8db ("Fix OOPS in
      mmap_region() when merging adjacent VM_LOCKED file segments") unified
      the vma merging of anonymous and file maps to just one place, which
      simplified the code and fixed a use-after-free bug that could cause an
      oops.
      
      But by doing the merge opportunistically before even having called
      ->mmap() on the file method, it now compares two different 'vm_flags'
      values: the pre-mmap() value of the new not-yet-formed vma, and previous
      mappings of the same file around it.
      
      And in doing so, it refused to merge the common file case, which adds a
      marker to say "I can be made non-linear".
      
      This fixes it by just adding a set of flags that don't have to match,
      because we know they are ok to merge.  Currently it's only that single
      VM_CAN_NONLINEAR flag, but at least conceptually there could be others
      in the future.
      Reported-and-acked-by: NHugh Dickins <hugh@veritas.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg KH <gregkh@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33bfad54
  6. 30 1月, 2009 4 次提交
  7. 28 1月, 2009 1 次提交
  8. 27 1月, 2009 1 次提交
  9. 21 1月, 2009 1 次提交
  10. 16 1月, 2009 8 次提交
    • L
      memcg: fix a race when setting memory.swappiness · 068b38c1
      Li Zefan 提交于
      (suppose: memcg->use_hierarchy == 0 and memcg->swappiness == 60)
      
      echo 10 > /memcg/0/swappiness   |
        mem_cgroup_swappiness_write() |
          ...                         | echo 1 > /memcg/0/use_hierarchy
                                      | mkdir /mnt/0/1
                                      |   sub_memcg->swappiness = 60;
          memcg->swappiness = 10;     |
      
      In the above scenario, we end up having 2 different swappiness
      values in a single hierarchy.
      
      We should hold cgroup_lock() when cheking cgrp->children list.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      068b38c1
    • L
      memcg: fix section mismatch · 0eb253e2
      Li Zefan 提交于
      At system boot when creating the top cgroup, mem_cgroup_create() calls
      enable_swap_cgroup() which is marked as __init, so mark
      mem_cgroup_create() as __ref to avoid false section mismatch warning.
      Reported-by: NRakib Mullick <rakib.mullick@gmail.com>
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by; KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0eb253e2
    • A
      revert "mm: vmalloc use mutex for purge" · 46666d8a
      Andrew Morton 提交于
      Revert commit e97a630e ("mm: vmalloc use
      mutex for purge")
      
      Bryan Donlan reports:
      
      : After testing 2.6.29-rc1 on xen-x86 with a btrfs root filesystem, I
      : got the OOPS quoted below and a hard freeze shortly after boot.
      : Boot messages and config are attached.
      :
      : ------------[ cut here ]------------
      : Kernel BUG at c05ef80d [verbose debug info unavailable]
      : invalid opcode: 0000 [#1] SMP
      : last sysfs file: /sys/block/xvdc/size
      : Modules linked in:
      :
      : Pid: 0, comm: swapper Not tainted (2.6.29-rc1 #6)
      : EIP: 0061:[<c05ef80d>] EFLAGS: 00010087 CPU: 2
      : EIP is at schedule+0x7cd/0x950
      : EAX: d5aeca80 EBX: 00000002 ECX: 00000000 EDX: d4cb9a40
      : ESI: c12f5600 EDI: d4cb9a40 EBP: d6033fa4 ESP: d6033ef4
      :  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069
      : Process swapper (pid: 0, ti=d6032000 task=d6020b70 task.ti=d6032000)
      : Stack:
      :  000d85bc 00000000 000186a0 00000000 0dd11410 c0105417 c12efe00 0dc367c3
      :  00000011 c0105d46 d5a5d310 deadbeef d4cb9a40 c07cc600 c05f1340 c12e0060
      :  deadbeef d6020b70 d6020d08 00000002 c014377d 00000000 c12f5600 00002c22
      : Call Trace:
      :  [<c0105417>] xen_force_evtchn_callback+0x17/0x30
      :  [<c0105d46>] check_events+0x8/0x12
      :  [<c05f1340>] _spin_unlock_irqrestore+0x20/0x40
      :  [<c014377d>] hrtimer_start_range_ns+0x12d/0x2e0
      :  [<c014c4f6>] tick_nohz_restart_sched_tick+0x146/0x160
      :  [<c0107485>] cpu_idle+0xa5/0xc0
      
      and bisected it to this commit.
      
      Let's remove it now while we have a think about the problem.
      Reported-by: NBryan Donlan <bdonlan@gmail.com>
      Tested-by: NChristophe Saout <christophe@saout.de>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      46666d8a
    • D
      memcg: make oom less frequently · 4d1c6273
      Daisuke Nishimura 提交于
      In previous implementation, mem_cgroup_try_charge checked the return
      value of mem_cgroup_try_to_free_pages, and just retried if some pages
      had been reclaimed.
      But now, try_charge(and mem_cgroup_hierarchical_reclaim called from it)
      only checks whether the usage is less than the limit.
      
      This patch tries to change the behavior as before to cause oom less
      frequently.
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4d1c6273
    • D
      memcg: fix hierarchical reclaim · c268e994
      Daisuke Nishimura 提交于
      If root_mem has no children, last_scaned_child is set to root_mem itself.
      But after some children added to root_mem, mem_cgroup_get_next_node can
      mem_cgroup_put the root_mem although root_mem has not been mem_cgroup_get.
      
      This patch fixes this behavior by:
      
      - Set last_scanned_child to NULL if root_mem has no children or DFS
        search has returned to root_mem itself(root_mem is not a "child" of
        root_mem).  Make mem_cgroup_get_first_node return root_mem in this case.
         There are no mem_cgroup_get/put for root_mem.
      
      - Rename mem_cgroup_get_next_node to __mem_cgroup_get_next_node, and
        mem_cgroup_get_first_node to mem_cgroup_get_next_node.  Make
        mem_cgroup_hierarchical_reclaim call only new mem_cgroup_get_next_node.
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c268e994
    • D
      memcg: fix error path of mem_cgroup_move_parent · 40d58138
      Daisuke Nishimura 提交于
      There is a bug in error path of mem_cgroup_move_parent.
      
      Extra refcnt got from try_charge should be dropped, and usages incremented
      by try_charge should be decremented in both error paths:
      
          A: failure at get_page_unless_zero
          B: failure at isolate_lru_page
      
      This bug makes this parent directory unremovable.
      
      In case of A, rmdir doesn't return, because res.usage doesn't go down to 0
      at mem_cgroup_force_empty even after all the pc in lru are removed.
      
      In case of B, rmdir fails and returns -EBUSY, because it has extra ref
      counts even after res.usage goes down to 0.
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      40d58138
    • D
      memcg: fix mem_cgroup_get_reclaim_stat_from_page · bd112db8
      Daisuke Nishimura 提交于
      In case of swapin, a new page is added to lru before it is charged,
      so page->pc->mem_cgroup points to NULL or last mem_cgroup the page
      was charged before.
      
      In the latter case, if the mem_cgroup has already freed by rmdir,
      the area pointed to by page->pc->mem_cgroup may have invalid data.
      
      Actually, I saw general protection fault.
      
          general protection fault: 0000 [#1] SMP
          last sysfs file: /sys/devices/system/cpu/cpu15/cache/index1/shared_cpu_map
          CPU 4
          Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp ipv6 autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_region_hash dm_log dm_multipath dm_mod rfkill input_polldev sbs sbshc battery ac lp sg ide_cd_mod cdrom button serio_raw acpi_memhotplug parport_pc e1000 rtc_cmos parport rtc_core rtc_lib i2c_i801 i2c_core shpchp pcspkr ata_piix libata megaraid_mbox megaraid_mm sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [last unloaded: microcode]
          Pid: 26038, comm: page01 Tainted: G        W  2.6.28-rc9-mm1-mmotm-2008-12-22-16-14-f2ab3dea #1
          RIP: 0010:[<ffffffff8028e710>]  [<ffffffff8028e710>] update_page_reclaim_stat+0x2f/0x42
          RSP: 0000:ffff8801ee457da8  EFLAGS: 00010002
          RAX: 32353438312021c8 RBX: 0000000000000000 RCX: 32353438312021c8
          RDX: 0000000000000000 RSI: ffff8800cb0b1000 RDI: ffff8801164d1d28
          RBP: ffff880110002cb8 R08: ffff88010f2eae23 R09: 0000000000000001
          R10: ffff8800bc514b00 R11: ffff880110002c00 R12: 0000000000000000
          R13: ffff88000f484100 R14: 0000000000000003 R15: 00000000001200d2
          FS:  00007f8a261726f0(0000) GS:ffff88010f2eaa80(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
          CR2: 00007f8a25d22000 CR3: 00000001ef18c000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          Process page01 (pid: 26038, threadinfo ffff8801ee456000, task ffff8800b585b960)
          Stack:
           ffffe200071ee568 ffff880110001f00 0000000000000000 ffffffff8028ea17
           ffff88000f484100 0000000000000000 0000000000000020 00007f8a25d22000
           ffff8800bc514b00 ffffffff8028ec34 0000000000000000 0000000000016fd8
          Call Trace:
           [<ffffffff8028ea17>] ? ____pagevec_lru_add+0xc1/0x13c
           [<ffffffff8028ec34>] ? drain_cpu_pagevecs+0x36/0x89
           [<ffffffff802a4f8c>] ? swapin_readahead+0x78/0x98
           [<ffffffff8029a37a>] ? handle_mm_fault+0x3d9/0x741
           [<ffffffff804da654>] ? do_page_fault+0x3ce/0x78c
           [<ffffffff804d7a42>] ? trace_hardirqs_off_thunk+0x3a/0x3c
           [<ffffffff804d860f>] ? page_fault+0x1f/0x30
          Code: cc 55 48 8d af b8 0d 00 00 48 89 f7 53 89 d3 e8 39 85 02 00 48 63 d3 48 ff 44 d5 10 45 85 e4 74 05 48 ff 44 d5 00 48 85 c0 74 0e <48> ff 44 d0 10 45 85 e4 74 04 48 ff 04 d0 5b 5d 41 5c c3 41 54
          RIP  [<ffffffff8028e710>] update_page_reclaim_stat+0x2f/0x42
           RSP <ffff8801ee457da8>
      Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bd112db8
    • I
      alpha: fix vmalloc breakage · 822c18f2
      Ivan Kokshaysky 提交于
      On alpha, we have to map some stuff in the VMALLOC space very early in the
      boot process (to make SRM console callbacks work and so on, see
      arch/alpha/mm/init.c).  For old VM allocator, we just manually placed a
      vm_struct onto the global vmlist and this worked for ages.
      
      Unfortunately, the new allocator isn't aware of this, so it constantly
      tries to allocate the VM space which is already in use, making vmalloc on
      alpha defunct.
      
      This patch forces KVA to import vmlist entries on init.
      
      [akpm@linux-foundation.org: remove unneeded check (per Johannes)]
      Signed-off-by: NIvan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      822c18f2
  11. 14 1月, 2009 8 次提交
  12. 12 1月, 2009 1 次提交
  13. 09 1月, 2009 11 次提交