1. May 25, 2010 (1 commit)
  2. May 22, 2010 (9 commits)
  3. May 20, 2010 (3 commits)
  4. May 19, 2010 (2 commits)
  5. May 17, 2010 (1 commit)
    • writeback: fix WB_SYNC_NONE writeback from umount · e913fc82
      Committed by Jens Axboe
      When umount calls sync_filesystem(), we first do a WB_SYNC_NONE
      writeback to kick off writeback of pending dirty inodes, then follow
      that up with a WB_SYNC_ALL to wait for it. Since umount already holds
      the sb s_umount mutex, WB_SYNC_NONE ends up doing nothing and all
      writeback happens as WB_SYNC_ALL. This can greatly slow down umount,
      since WB_SYNC_ALL writeback is a data integrity operation and thus
      a bigger hammer than simple WB_SYNC_NONE. For barrier-aware file systems
      it's a lot slower (see the sketch after this entry).
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
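      For context, a minimal sketch of the two-pass pattern described above.
      The shape mirrors fs/sync.c of that era, but the helper signatures are
      simplified assumptions, not the exact kernel code.

          /* Sketch: one backend pass, async (wait == 0) or sync (wait == 1). */
          static int __sync_filesystem(struct super_block *sb, int wait)
          {
                  if (wait)
                          sync_inodes_sb(sb);        /* WB_SYNC_ALL: wait on everything */
                  else
                          writeback_inodes_sb(sb);   /* WB_SYNC_NONE: just kick it off */
                  if (sb->s_op->sync_fs)
                          sb->s_op->sync_fs(sb, wait);
                  return __sync_blockdev(sb->s_bdev, wait);
          }

          int sync_filesystem(struct super_block *sb)
          {
                  int ret;

                  ret = __sync_filesystem(sb, 0);    /* async pass: start writeback */
                  if (ret < 0)
                          return ret;
                  return __sync_filesystem(sb, 1);   /* sync pass: wait for it */
          }

      The fix makes the first, WB_SYNC_NONE pass actually do work during
      umount even though s_umount is held, so the second pass has much less
      left to wait on.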
  6. May 12, 2010 (4 commits)
    • memcg: fix css_is_ancestor() RCU locking · 747388d7
      Committed by KAMEZAWA Hiroyuki
      Some callers (in memcontrol.c) call css_is_ancestor() without
      rcu_read_lock.  Because css_is_ancestor() has to access RCU-protected
      data, it should be called under rcu_read_lock().
      
      This patch makes css_is_ancestor() itself do safe access to the
      RCU-protected area.  (At least, "root" can have refcnt==0 if it's not
      an ancestor of "child", so we need rcu_read_lock(); see the sketch
      after this entry.)
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
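      A minimal sketch of the pattern the fix applies: take the read lock
      inside the function so callers need not hold it.  The parent-walk body
      is a simplifying assumption for illustration, not the actual css_id
      based implementation.

          bool css_is_ancestor(struct cgroup_subsys_state *child,
                               const struct cgroup_subsys_state *root)
          {
                  bool ret = false;

                  rcu_read_lock();                 /* taken here, not by callers */
                  while (child) {
                          if (child == root) {
                                  ret = true;
                                  break;
                          }
                          /* assumed linkage; RCU-protected, hence the lock */
                          child = rcu_dereference(child->parent);
                  }
                  rcu_read_unlock();
                  return ret;
          }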
    • memcg: fix css_id() RCU locking for real · 7f0f1546
      Committed by KAMEZAWA Hiroyuki
      Commit ad4ba375 ("memcg: css_id() must be
      called under rcu_read_lock()") modified memcontrol.c to fix an RCU
      check message.  But Andrew Morton pointed out that the fix doesn't
      seem sane; it was just hiding lockdep messages.
      
      This patch does the proper thing.  Checking again, all the places that
      commit fixed were accessing without rcu_read_lock intentionally: every
      caller of css_id() holds a reference count on the css, so it's not
      necessary to be under rcu_read_lock().
      
      Considering it again, we can use rcu_dereference_check() for css_id().
      We know css->id is valid if css->refcnt > 0 (css->id never changes and
      is only freed after css->refcnt drops to 0).
      
      This patch makes use of rcu_dereference_check() in css_id/depth and
      removes the unnecessary rcu_read_lock() added by that commit (see the
      sketch after this entry).
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
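      A sketch of the rcu_dereference_check() usage described above; the
      refcnt expression is an assumption about the css layout of that era.

          unsigned short css_id(struct cgroup_subsys_state *css)
          {
                  struct css_id *cssid;

                  /* Legal either under rcu_read_lock() OR while holding a
                   * reference: css->id is never freed while refcnt > 0. */
                  cssid = rcu_dereference_check(css->id,
                                  rcu_read_lock_held() ||
                                  atomic_read(&css->refcnt));
                  return cssid ? cssid->id : 0;
          }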
    • rmap: remove anon_vma check in page_address_in_vma() · ab941e0f
      Committed by Naoya Horiguchi
      Currently page_address_in_vma() compares vma->anon_vma and
      page_anon_vma(page) as a parameter check, but since 2.6.34 a vma can
      have multiple anon_vmas through anon_vma_chain, so the current check
      does not work.  (For an anonymous page shared by multiple processes,
      some valid (page,vma) pairs wrongly return -EFAULT.)
      
      We could check all anon_vmas in the "same_vma" chain, but that would
      have to meet the locking requirements.  Instead, we can remove the
      anon_vma check safely, because page_address_in_vma() assumes that page
      and vma have already been checked to belong to the same process; what
      remains is the address computation sketched after this entry.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
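      With the anon_vma check gone, what remains is the linear address
      computation.  A sketch of that arithmetic, mirroring the kernel's
      vma_address() helper (simplified here as an assumption):

          /* Map a page's offset (page->index, in PAGE_SIZE units) to its
           * virtual address inside vma, or -EFAULT if it falls outside. */
          unsigned long page_address_in_vma_sketch(struct page *page,
                                                   struct vm_area_struct *vma)
          {
                  unsigned long address = vma->vm_start +
                          ((page->index - vma->vm_pgoff) << PAGE_SHIFT);

                  if (address < vma->vm_start || address >= vma->vm_end)
                          return -EFAULT;
                  return address;
          }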
    • hugetlbfs: kill applications that use MAP_NORESERVE with SIGBUS instead of OOM-killer · 4a6018f7
      Committed by Mel Gorman
      Ordinarily, applications using hugetlbfs will create mappings with
      reserves.  For shared mappings, the pages are reserved before mmap()
      returns success; for private mappings, the calling process is
      guaranteed its pages, and a child process that cannot get the pages is
      killed with SIGBUS.
      
      An application that uses MAP_NORESERVE gets no reservations, and
      mmap() will always succeed, at the risk that the page will not be
      available at fault time.  This might be used, for example, on very
      large sparse mappings where the developer is confident the necessary
      huge pages exist to satisfy all faults even though the whole mapping
      cannot be backed by huge pages.  Unfortunately, if an allocation does
      fail, VM_FAULT_OOM is returned to the fault handler, which proceeds to
      trigger the OOM-killer.  This is unhelpful.
      
      Even without hugetlbfs mounted, a user using mmap() can trivially
      trigger the OOM-killer because VM_FAULT_OOM is returned (will provide
      example program if desired - it's a whopping 24 lines long; a
      comparable illustration follows this entry).  It could be considered a
      DoS available to an unprivileged user.
      
      This patch alters hugetlbfs to kill a process with SIGBUS when it uses
      MAP_NORESERVE and huge pages are not available at fault time, instead
      of triggering the OOM-killer.
      
      This change affects hugetlb_cow() as well.  I feel there is a failure
      case in there, but I didn't create one.  It would need a fairly
      specific target in terms of the faulting application and the hugepage
      pool size.  The hugetlb_no_page() path is much easier to hit, but both
      might as well be closed.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
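      This is not the author's 24-line program, but a comparable
      illustration: a small user-space C sketch that creates a MAP_NORESERVE
      hugepage mapping and faults it in.  When the hugepage pool runs dry,
      this is exactly the fault the patch turns from an OOM-killer trigger
      into a SIGBUS for the faulting process.  MAP_HUGETLB assumes a kernel
      and headers that expose it; the fallback value is the x86 one.

          #include <stdio.h>
          #include <string.h>
          #include <sys/mman.h>

          #ifndef MAP_HUGETLB
          #define MAP_HUGETLB 0x40000            /* assumption: x86 value */
          #endif

          int main(void)
          {
                  size_t len = 64UL << 20;       /* 64MB of huge pages */
                  char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS |
                                 MAP_HUGETLB | MAP_NORESERVE, -1, 0);

                  if (p == MAP_FAILED) {
                          perror("mmap");        /* NORESERVE: rarely fails */
                          return 1;
                  }
                  memset(p, 0, len);             /* fault every huge page */
                  puts("all pages faulted in");
                  return 0;
          }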
  7. May 6, 2010 (1 commit)
  8. May 5, 2010 (1 commit)
  9. May 1, 2010 (5 commits)
    • percpu: implement kernel memory based chunk allocation · b0c9778b
      Committed by Tejun Heo
      Implement an alternate percpu chunk management scheme based on kernel
      memory for nommu SMP architectures.  Instead of mapping into the
      vmalloc area, chunks are allocated as contiguous kernel memory using
      alloc_pages().  As such, the percpu allocator on nommu has the
      following restrictions.
      
      * It can't fill chunks on-demand page-by-page.  It has to allocate
        each chunk fully upfront.
      
      * It can't support sparse chunks for NUMA configurations.  SMP w/o mmu
        is crazy enough.  Let's hope no one does NUMA w/o mmu.  :-P
      
      * If the chunk size isn't a power-of-two multiple of PAGE_SIZE, the
        unaligned amount will be wasted on each chunk.  So archs which use
        this had better align the chunk size (the waste is sketched after
        this entry).
      
      For instructions on how to use this, read the comment on top of
      mm/percpu-km.c.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Cc: Graff Yang <graff.yang@gmail.com>
      Cc: Sonic Zhang <sonic.adi@gmail.com>
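      A sketch of the whole-chunk allocation and the rounding waste from the
      last point above.  get_order() and alloc_pages() are the real kernel
      helpers; the wrapper name is illustrative.

          static struct page *pcpu_km_alloc_pages_sketch(size_t chunk_size)
          {
                  /* alloc_pages() works in power-of-two page orders, so a
                   * chunk size that isn't one wastes the difference. */
                  int order = get_order(chunk_size);        /* rounds up */
                  size_t wasted = (PAGE_SIZE << order) - chunk_size;

                  if (wasted)
                          printk(KERN_INFO "percpu-km: %zu bytes wasted per chunk\n",
                                 wasted);
                  return alloc_pages(GFP_KERNEL, order);
          }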
    • percpu: move vmalloc based chunk management into percpu-vm.c · 9f645532
      Committed by Tejun Heo
      Separate out and move the chunk management (creation/destruction and
      [de]population) code into percpu-vm.c, which is included by percpu.c
      and compiled together.  The interface for chunk management is defined
      as follows.
      
       * pcpu_populate_chunk		- populate the specified range of a chunk
       * pcpu_depopulate_chunk	- depopulate the specified range of a chunk
       * pcpu_create_chunk		- create a new chunk
       * pcpu_destroy_chunk		- destroy a chunk, always preceded by full depop
       * pcpu_addr_to_page		- translate address to physical address
       * pcpu_verify_alloc_info	- check alloc_info is acceptable during init
      
      Other than wrapping vmalloc_to_page() inside pcpu_addr_to_page() and
      a dummy pcpu_verify_alloc_info() implementation, this patch only moves
      code around.  This separation is to allow an alternate chunk
      management implementation (the include arrangement is sketched after
      this entry).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Cc: Graff Yang <graff.yang@gmail.com>
      Cc: Sonic Zhang <sonic.adi@gmail.com>
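      A sketch of how the split plausibly wires together; the config symbol
      comes with the companion percpu-km patch and is an assumption here.

          /* In mm/percpu.c: pick one chunk-management backend at build time
           * by textual inclusion.  Both files implement the six functions
           * listed above against the same internal API. */
          #ifdef CONFIG_NEED_PER_CPU_KM
          #include "percpu-km.c"
          #else
          #include "percpu-vm.c"
          #endif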
    • percpu: misc preparations for nommu support · 88999a89
      Committed by Tejun Heo
      Make the following misc preparations for percpu nommu support.
      
      * Remove references to vmalloc in common comments, as nommu percpu
        won't use it.
      
      * Rename chunk->vms to chunk->data and make it void *.  Its use is
        determined by the chunk management implementation (see the sketch
        after this entry).
      
      * Relocate utility functions and add __maybe_unused to functions which
        might not be used by the different chunk management implementations.
      
      This patch doesn't cause any functional change.  This is to allow an
      alternate chunk management implementation for percpu nommu support.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Cc: Graff Yang <graff.yang@gmail.com>
      Cc: Sonic Zhang <sonic.adi@gmail.com>
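      A sketch of the opaque-field idea behind the rename above (struct
      layout heavily condensed; the two interpretations of ->data are the
      point):

          struct pcpu_chunk {
                  void *base_addr;   /* beginning of the chunk's memory */
                  void *data;        /* backend-private:
                                      *   vm backend: the vm_struct area(s)
                                      *   km backend: the alloc_pages() page */
                  /* ... allocation map, free size, list linkage ... */
          };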
    • percpu: reorganize chunk creation and destruction · 6081089f
      Committed by Tejun Heo
      Reorganize alloc/free_pcpu_chunk() such that the chunk struct
      alloc/free live in pcpu_alloc/free_chunk() and the rest in
      pcpu_create/destroy_chunk().  While at it, add the missing error
      handling for chunk->map allocation failure.
      
      This is to allow an alternate chunk management implementation for
      percpu nommu support; the resulting layering is sketched after this
      entry.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Cc: Graff Yang <graff.yang@gmail.com>
      Cc: Sonic Zhang <sonic.adi@gmail.com>
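      A sketch of the resulting layering plus the added error path (bodies
      condensed; helper names follow the percpu code of that era but are
      partially assumed):

          /* Common part: only the bookkeeping struct, shared by backends. */
          static struct pcpu_chunk *pcpu_alloc_chunk(void)
          {
                  struct pcpu_chunk *chunk;

                  chunk = kzalloc(pcpu_chunk_struct_size, GFP_KERNEL);
                  if (!chunk)
                          return NULL;
                  chunk->map = pcpu_mem_alloc(PCPU_DFL_MAP_ALLOC *
                                              sizeof(chunk->map[0]));
                  if (!chunk->map) {             /* the added error handling */
                          kfree(chunk);
                          return NULL;
                  }
                  return chunk;
          }

          /* Backend part (vm flavor): common alloc, then address space. */
          static struct pcpu_chunk *pcpu_create_chunk(void)
          {
                  struct pcpu_chunk *chunk = pcpu_alloc_chunk();
                  /* ... reserve vmalloc area(s), set chunk->base_addr ... */
                  return chunk;
          }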
    • percpu: factor out pcpu_addr_in_first/reserved_chunk() and update per_cpu_ptr_to_phys() · 020ec653
      Committed by Tejun Heo
      Factor out pcpu_addr_in_first/reserved_chunk() from
      pcpu_chunk_addr_search() and use them to update per_cpu_ptr_to_phys()
      such that it handles the first chunk differently from the rest.
      
      This patch doesn't cause any functional change and prepares for percpu
      nommu support; the factored-out predicate is sketched after this
      entry.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Cc: Graff Yang <graff.yang@gmail.com>
      Cc: Sonic Zhang <sonic.adi@gmail.com>
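      A sketch of the factored-out range predicate; the globals are the
      percpu allocator's internals of that era, used here as assumptions.

          /* Does addr fall inside the statically set up first chunk? */
          static bool pcpu_addr_in_first_chunk(void *addr)
          {
                  void *first_start = pcpu_first_chunk->base_addr;

                  return addr >= first_start &&
                         addr < first_start + pcpu_unit_size;
          }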
  10. April 29, 2010 (1 commit)
  11. April 27, 2010 (1 commit)
  12. April 25, 2010 (5 commits)
  13. April 22, 2010 (1 commit)
  14. April 20, 2010 (1 commit)
    • rmap: add exclusively owned pages to the newest anon_vma · e8a03feb
      Committed by Rik van Riel
      The recent anon_vma fixes cause many anonymous pages to end up
      in the parent process's anon_vma, even when the page is exclusively
      owned by the current process.
      
      Adding exclusively owned anonymous pages to the top anon_vma
      reduces rmap scanning overhead, especially in workloads with
      forking servers.
      
      This patch adds a parameter to __page_set_anon_rmap that can
      be used to indicate whether or not the added page is exclusively
      owned by the current process (the call sites are sketched after
      this entry).
      
      Pages added through page_add_new_anon_rmap are exclusively
      owned by the current process, and can be added to the top
      anon_vma.
      
      Pages added through page_add_anon_rmap can be either shared
      or exclusively owned, so we do the conservative thing and
      add them to the oldest anon_vma.
      
      A next step would be to add the exclusive parameter to
      page_add_anon_rmap, to be used from functions where we do
      know for sure whether a page is exclusively owned.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Lightly-tested-by: Borislav Petkov <bp@alien8.de>
      Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
      [ Edited to look nicer  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
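      A sketch of the two call sites described above (condensed; the
      _mapcount handling is simplified from the real rmap code):

          /* Brand new page: only this process can map it, so it is safe
           * to use the newest (top) anon_vma. */
          void page_add_new_anon_rmap(struct page *page,
                  struct vm_area_struct *vma, unsigned long address)
          {
                  atomic_set(&page->_mapcount, 0);       /* first mapping */
                  __page_set_anon_rmap(page, vma, address, 1 /* exclusive */);
          }

          /* Possibly shared (e.g. swapin): be conservative and let
           * __page_set_anon_rmap() fall back to the oldest anon_vma. */
          void page_add_anon_rmap(struct page *page,
                  struct vm_area_struct *vma, unsigned long address)
          {
                  if (atomic_inc_and_test(&page->_mapcount))
                          __page_set_anon_rmap(page, vma, address, 0);
          }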
  15. April 15, 2010 (1 commit)
    • slab: Fix missing DEBUG_SLAB last user · 5c5e3b33
      Committed by Shiyong Li
      Even with SLAB_RED_ZONE and SLAB_STORE_USER enabled, the kernel would
      NOT store the redzone and last-user data around the allocated memory
      space if the arch cache line size is greater than sizeof(unsigned
      long long).  As a result, the last-user information is unexpectedly
      MISSING when dumping a slab corruption log.
      
      This fix makes sure that the redzone and last-user tags get stored
      unless the required alignment would break the redzone; the layout is
      sketched after this entry.
      Signed-off-by: Shiyong Li <shi-yong.li@motorola.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
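      A comment-level sketch of the debug layout in question; the offsets
      and the exact padding rule are simplified assumptions about mm/slab.c.

          /*
           * With SLAB_RED_ZONE + SLAB_STORE_USER each object is wrapped:
           *
           *   [ redzone1 ][ object payload ][ redzone2 ][ last-user addr ]
           *
           * The last-user word sits after redzone2, so the total object
           * footprint must include it before padding to the arch cache
           * line.  On arches with cache line > sizeof(unsigned long long)
           * the word landed outside the stored area and was never written,
           * which is the gap this fix closes.
           */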
  16. April 13, 2010 (3 commits)
    • anonvma: when setting up page->mapping, we need to pick the _oldest_ anonvma · ea90002b
      Committed by Linus Torvalds
      Otherwise we might be mapping in a page in a new mapping, but that page
      (through the swapcache) would later be mapped into an old mapping too.
      The page->mapping must be the case that works for everybody, not just
      the mapping that happened to page it in first.
      
      Here's the scenario:
      
       - page gets allocated/mapped by process A. Let's call the anon_vma we
         associate the page with 'A' to keep it easy to track.
      
       - Process A forks, creating process B. The anon_vma in B is 'B', and has
         a chain that looks like 'B' -> 'A'. Everything is fine.
      
       - Swapping happens. The page (with mapping pointing to 'A') gets swapped
         out (perhaps not to disk - it's enough to assume that it's just not
         mapped any more, and lives entirely in the swap-cache)
      
       - Process B pages it in, which goes like this:
      
              do_swap_page ->
                page = lookup_swap_cache(entry);
               ...
                set_pte_at(mm, address, page_table, pte);
                page_add_anon_rmap(page, vma, address);
      
         And think about what happens here!
      
         In particular, what happens is that this will now be the "first"
         mapping of that page, so page_add_anon_rmap() used to do
      
              if (first)
                      __page_set_anon_rmap(page, vma, address);
      
         and notice what anon_vma it will use? It will use the anon_vma for
         process B!
      
         What happens then? Trivial: process 'A' also pages it in (nothing
         happens, it's not the first mapping), and then process 'B' execve's
         or exits or unmaps, making anon_vma B go away.
      
         End result: process A has a page that points to anon_vma B, but
         anon_vma B does not exist any more.  This can go on forever.  Forget
         about RCU grace periods, forget about locking, forget anything like
         that.  The bug is simply that page->mapping points to an anon_vma
         that was correct at one point, but was _not_ the one that was shared
         by all users of that possible mapping.
      
      Changing it to always use the deepest anon_vma in the anonvma chain gets
      us to the safest model.
      
      This can be improved in certain cases: if we know the page is private to
      just this particular mapping (for example, it's a new page, or it is the
      only swapcache entry), we could pick the top (most specific) anon_vma.
      
      But that's a future optimization.  Make it _work_ reliably first (the
      fix is sketched after this entry).
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Borislav Petkov <bp@alien8.de> [ "What do you know, I think you fixed it!" ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
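      A sketch of the fix: pick the anon_vma at the tail of the vma's
      same_vma chain, which is the oldest one and therefore valid for every
      process that may share the page through the swapcache (field layout
      per 2.6.34, partially assumed):

          static void __page_set_anon_rmap(struct page *page,
                  struct vm_area_struct *vma, unsigned long address)
          {
                  struct anon_vma_chain *avc;
                  struct anon_vma *anon_vma;

                  /* New links go at the head, so the tail is the oldest. */
                  avc = list_entry(vma->anon_vma_chain.prev,
                                   struct anon_vma_chain, same_vma);
                  anon_vma = avc->anon_vma;

                  page->mapping = (struct address_space *)
                          ((void *)anon_vma + PAGE_MAPPING_ANON);
                  page->index = linear_page_index(vma, address);
          }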
    • anon_vma: clone the anon_vma chain in the right order · 646d87b4
      Committed by Linus Torvalds
      We want to walk the chain in reverse order when cloning it, so that the
      order of the result chain will be the same as the order in the source
      chain.  When we add entries to the chain, they go at the head of the
      chain, so we want to add the source head last (see the sketch after
      this entry).
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Borislav Petkov <bp@alien8.de> [ "No, it still oopses" ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
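      A sketch of the reordered clone loop; the helper shapes are assumed,
      the point is the _reverse iteration.

          int anon_vma_clone(struct vm_area_struct *dst,
                             struct vm_area_struct *src)
          {
                  struct anon_vma_chain *avc, *pavc;

                  /* Each link lands at the head of dst's chain, so walking
                   * src tail-first reproduces src's order in dst. */
                  list_for_each_entry_reverse(pavc, &src->anon_vma_chain,
                                              same_vma) {
                          avc = anon_vma_chain_alloc();
                          if (!avc)
                                  return -ENOMEM;   /* caller unwinds links */
                          anon_vma_chain_link(dst, avc, pavc->anon_vma);
                  }
                  return 0;
          }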
    • vma_adjust: fix the copying of anon_vma chains · 287d97ac
      Committed by Linus Torvalds
      When we move the boundaries between two vma's due to things like
      mprotect, we need to make sure that the anon_vma of the pages that got
      moved from one vma to another gets properly copied around.  And that was
      not always the case, in this rather hard-to-follow code sequence.
      
      Clarify the code, and fix it so that it copies the anon_vma from the
      right source (sketched after this entry).
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Tested-by: Borislav Petkov <bp@alien8.de> [ "Yeah, not so much this one either" ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
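      A sketch of the clarified structure, using the explicit
      importer/exporter naming the commit introduces (condensed; the
      surrounding vma_adjust() logic is omitted):

          /* One vma "exports" pages across the moved boundary, the other
           * "imports" them; the importer must take on the exporter's
           * anon_vma so the moved pages' rmap chains stay valid. */
          if (exporter && exporter->anon_vma && !importer->anon_vma) {
                  if (anon_vma_clone(importer, exporter))
                          return -ENOMEM;
                  importer->anon_vma = exporter->anon_vma;
          }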