1. 03 Apr 2009, 6 commits
    • cgroup: fix frequent -EBUSY at rmdir · ec64f515
      Authored by KAMEZAWA Hiroyuki
      In the following situation, with the memory subsystem,
      
      	/groupA use_hierarchy==1
      		/01 some tasks
      		/02 some tasks
      		/03 some tasks
      		/04 empty
      
      When tasks under 01/02/03 hit the limit on /groupA, hierarchical reclaim
      is triggered and the kernel walks the tree under groupA. In this case,
      rmdir /groupA/04 frequently fails with -EBUSY because of a temporary
      refcount taken by the kernel.
      
      In general, a cgroup can be rmdir'd if it has no child groups and no
      tasks. Frequent rmdir() failures are not useful to users (and in most
      cases the reason for the -EBUSY is unknown to them).
      
      This patch modifies the above behavior by:
      	- retrying if someone still holds a css refcount.
      	- adding a return value to pre_destroy(), allowing a subsystem to
      	  say "we're really busy!" (see the sketch below).
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • workqueue: add to_delayed_work() helper function · bf6aede7
      Authored by Jean Delvare
      It is a fairly common operation to have a pointer to a work_struct and
      to need a pointer to the delayed_work it is contained in.  In
      particular, all delayed works which want to rearm themselves will have
      to do that.  So it would seem fair to offer a helper function for this
      operation.
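
      The helper is essentially a container_of() wrapper; its shape in
      include/linux/workqueue.h:

      	static inline struct delayed_work *to_delayed_work(struct work_struct *work)
      	{
      		return container_of(work, struct delayed_work, work);
      	}

      A self-rearming handler can then use it like this (my_handler is a
      hypothetical example, not part of the patch):

      	static void my_handler(struct work_struct *work)
      	{
      		struct delayed_work *dwork = to_delayed_work(work);

      		/* ... do the periodic work, then rearm ... */
      		schedule_delayed_work(dwork, HZ);
      	}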
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: do_xip_mapping_read: fix length calculation · 58984ce2
      Authored by Martin Schwidefsky
      The calculation of the value nr in do_xip_mapping_read() is incorrect.
      If the copy required more than one iteration of the do-while loop, the
      copied variable will be non-zero.  The maximum length that may be passed
      to the call to copy_to_user(buf+copied, xip_mem+offset, nr) is
      len-copied, but the check only compares nr against len.
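
      A sketch of the fix, assuming the variable names quoted above from
      do_xip_mapping_read():

      	/* clamp the copy length to what the caller still wants,
      	 * not to the original request length */
      	if (nr > len - copied)		/* was: if (nr > len) */
      		nr = len - copied;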
      
      This bug is the cause of the heap corruption Carsten has been chasing
      for so long:
      
      *** glibc detected *** /bin/bash: free(): invalid next size (normal): 0x00000000800e39f0 ***
      ======= Backtrace: =========
      /lib64/libc.so.6[0x200000b9b44]
      /lib64/libc.so.6(cfree+0x8e)[0x200000bdade]
      /bin/bash(free_buffered_stream+0x32)[0x80050e4e]
      /bin/bash(close_buffered_stream+0x1c)[0x80050ea4]
      /bin/bash(unset_bash_input+0x2a)[0x8001c366]
      /bin/bash(make_child+0x1d4)[0x8004115c]
      /bin/bash[0x8002fc3c]
      /bin/bash(execute_command_internal+0x656)[0x8003048e]
      /bin/bash(execute_command+0x5e)[0x80031e1e]
      /bin/bash(execute_command_internal+0x79a)[0x800305d2]
      /bin/bash(execute_command+0x5e)[0x80031e1e]
      /bin/bash(reader_loop+0x270)[0x8001efe0]
      /bin/bash(main+0x1328)[0x8001e960]
      /lib64/libc.so.6(__libc_start_main+0x100)[0x200000592a8]
      /bin/bash(clearerr+0x5e)[0x8001c092]
      
      With this bug fix, commit 0e4a9b59
      ("ext2/xip: refuse to change xip flag during remount with busy inodes")
      can be reverted.
      
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: align vmstat_work's timer · 98f4ebb2
      Authored by Anton Blanchard
      Even though vmstat_work is marked deferrable, there are still benefits
      to aligning it.  For certain applications we want to keep OS jitter as
      low as possible, and aligning timers and work items so they occur
      together can reduce their overall impact.
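
      A hedged sketch of the change (the call site is assumed to be
      start_cpu_timer() in mm/vmstat.c of this era):

      	static void __cpuinit start_cpu_timer(int cpu)
      	{
      		struct delayed_work *work = &per_cpu(vmstat_work, cpu);

      		INIT_DELAYED_WORK_DEFERRABLE(work, vmstat_update);
      		/* round the per-CPU delay so these timers fire together
      		 * instead of waking the CPU at scattered jiffies */
      		schedule_delayed_work_on(cpu, work,
      					 __round_jiffies_relative(HZ, cpu));
      	}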
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nommu: fix a number of issues with the per-MM VMA patch · 33e5d769
      Authored by David Howells
      Fix a number of issues with the per-MM VMA patch:
      
       (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
           a NOMMU system with more than 2G pages.  Makes no difference on a 32-bit
           system.
      
       (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
           lest it overflow.
      
       (3) Move the allocation of the vm_area_struct slab back to fork.c.

       (4) Use KMEM_CACHE() for both the vm_area_struct and vm_region slabs
           (see the sketch after this list).
      
       (5) Use BUG_ON() rather than if () BUG().
      
       (6) Make the default validate_nommu_regions() a static inline rather than a
           #define.
      
       (7) Make free_page_series()'s objection to pages with a refcount != 1 more
           informative.
      
       (8) Adjust the __put_nommu_region() banner comment to indicate that the
           semaphore must be held for writing.
      
       (9) Limit the number of warnings about munmaps of non-mmapped regions.
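
      A hedged illustration of point (4) (the cache variable names
      vm_region_jar and vm_area_cachep are assumptions, not quoted from the
      patch):

      	/* KMEM_CACHE() derives the cache name, object size and alignment
      	 * from the struct type itself, avoiding copy-paste errors */
      	vm_region_jar = KMEM_CACHE(vm_region, SLAB_PANIC);
      	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC);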
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • generic debug pagealloc: build fix · ee3b4290
      Authored by Akinobu Mita
      This fixes a build failure with generic debug pagealloc:
      
        mm/debug-pagealloc.c: In function 'set_page_poison':
        mm/debug-pagealloc.c:8: error: 'struct page' has no member named 'debug_flags'
        mm/debug-pagealloc.c: In function 'clear_page_poison':
        mm/debug-pagealloc.c:13: error: 'struct page' has no member named 'debug_flags'
        mm/debug-pagealloc.c: In function 'page_poison':
        mm/debug-pagealloc.c:18: error: 'struct page' has no member named 'debug_flags'
        mm/debug-pagealloc.c: At top level:
        mm/debug-pagealloc.c:120: error: redefinition of 'kernel_map_pages'
        include/linux/mm.h:1278: error: previous definition of 'kernel_map_pages' was here
        mm/debug-pagealloc.c: In function 'kernel_map_pages':
        mm/debug-pagealloc.c:122: error: 'debug_pagealloc_enabled' undeclared (first use in this function)
      
      by fixing two things (a hedged sketch of the first follows below):

       - debug_flags should be a member of struct page
       - the DEBUG_PAGEALLOC config option should be defined for all
         architectures
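
      A sketch of the struct page change; the guarding config symbol
      (CONFIG_WANT_PAGE_DEBUG_FLAGS) is an assumption based on the generic
      debug-pagealloc series, not quoted from the patch:

      	/* in struct page (include/linux/mm_types.h) */
      	#ifdef CONFIG_WANT_PAGE_DEBUG_FLAGS
      		unsigned long debug_flags;	/* consumed by page poisoning */
      	#endif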
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 01 Apr 2009, 25 commits
  3. 31 Mar 2009, 1 commit
    • lockdep: annotate reclaim context (__GFP_NOFS), fix SLOB · 19cefdff
      Authored by Ingo Molnar
      Impact: build fix
      
      fix typo in mm/slob.c:
      
       mm/slob.c:469: error: ‘flags’ undeclared (first use in this function)
       mm/slob.c:469: error: (Each undeclared identifier is reported only once
       mm/slob.c:469: error: for each function it appears in.)
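
      The error shows the reclaim-context annotation using an identifier that
      does not exist in that function; a hedged sketch of the fix (slob's
      allocator takes its gfp mask as a parameter named gfp, not flags):

      	/* inside slob_alloc(size_t size, gfp_t gfp, int align, int node): */
      	lockdep_trace_alloc(gfp);	/* was: lockdep_trace_alloc(flags) */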
      
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090128135457.350751756@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 30 Mar 2009, 2 commits
  5. 27 Mar 2009, 1 commit
    • writeback: double the dirty thresholds · 1b5e62b4
      Authored by Wu Fengguang
      Enlarge the default dirty ratios from 5/10 to 10/20.  This fixes the
      iozone regression with 2.6.29-rc6 [Bug #12809].

      The iozone benchmarks are performed on a 1200M file, with 8GB RAM.
      
        iozone -i 0 -i 1 -i 2 -i 3 -i 4 -r 4k -s 64k -s 512m -s 1200m -b tmp.xls
        iozone -B -r 4k -s 64k -s 512m -s 1200m -b tmp.xls
      
      The performance regression is triggered by commit 1cf6e7d8 ("mm: task
      dirty accounting fix"), which makes dirty accounting more correct and
      thorough.
      
      The default 5/10 dirty ratios were picked (a) with the old dirty logic
      and (b) largely at random and (c) designed to be aggressive.  In
      particular, that (a) means that having fixed some of the dirty
      accounting, maybe the real bug is now that it was always too aggressive,
      just hidden by an accounting issue.
      
      The enlarged 10/20 dirty ratios are just about enough to fix the regression.
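
      A sketch of the resulting defaults in mm/page_writeback.c (these are
      the kernel's standard dirty-ratio sysctls, tunable at runtime via
      /proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_ratio):

      	int dirty_background_ratio = 10;	/* was 5 */
      	int vm_dirty_ratio = 20;		/* was 10 */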
      
      [ We will have to look at how this affects the old fsync() latency issue,
        but that probably will need independent work.  - Linus ]
      
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Reported-by: N"Lin, Ming M" <ming.m.lin@intel.com>
      Tested-by: N"Lin, Ming M" <ming.m.lin@intel.com>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
  6. 26 Mar 2009, 1 commit
  7. 23 Mar 2009, 2 commits
  8. 16 Mar 2009, 1 commit
    • highmem: atomic highmem kmap page pinning · 3297e760
      Authored by Nicolas Pitre
      Most ARM machines have a non-I/O-coherent cache, meaning that the
      dma_map_*() set of functions must clean and/or invalidate the affected
      memory manually before DMA occurs.  And because the majority of those
      machines have a VIVT cache, the cache maintenance operations must be
      performed using virtual addresses.
      
      When a highmem page is kunmap'd, its mapping (and cache) remains in
      place in case it is kmap'd again.  However, if dma_map_page() is then
      called with such a page, some cache maintenance on the remaining mapping
      must be performed.  In that case, page_address(page) is non-null and we
      can use that to synchronize the cache.
      
      It is unlikely but still possible for kmap() to race and recycle the
      virtual address obtained above, and use it for another page before some
      on-going cache invalidation loop in dma_map_page() is done. In that case,
      the new mapping could end up with dirty cache lines for another page,
      and the unsuspecting cache invalidation loop in dma_map_page() might
      simply discard those dirty cache lines resulting in data loss.
      
      For example, let's consider this sequence of events:
      
      	- dma_map_page(..., DMA_FROM_DEVICE) is called on a highmem page.
      
      	-->	- vaddr = page_address(page) is non null. In this case
      		it is likely that the page has valid cache lines
      		associated with vaddr. Remember that the cache is VIVT.
      
      		-->	for (i = vaddr; i < vaddr + PAGE_SIZE; i += 32)
      				invalidate_cache_line(i);
      
      	*** preemption occurs in the middle of the loop above ***
      
      	- kmap_high() is called for a different page.
      
      	-->	- last_pkmap_nr wraps to zero and flush_all_zero_pkmaps()
      		  is called.  The pkmap_count value for the page passed
      		  to dma_map_page() above happens to be 1, so the page
      		  is unmapped.  But prior to that, flush_cache_kmaps()
      		  cleared the cache for it.  So far so good.
      
      		- A fresh pkmap entry is assigned for this kmap request.
      		  Murphy's law says this pkmap entry will eventually
      		  happen to use the same vaddr as the one which used to
      		  belong to the other page being processed by
      		  dma_map_page() in the preempted thread above.

      	- The kmap_high() caller starts dirtying the cache using the
      	  just-assigned virtual mapping for its page.
      
      	*** the first thread is rescheduled ***
      
      			- The for(...) loop is resumed, but now the cached
      			  data belonging to a different physical page is
      			  being discarded!
      
      And this is not only a preemption issue: ARM can be SMP as well, making
      the above scenario just as likely.  Hence the need for some kind of
      pkmap page pinning that can be used in any context, primarily for the
      benefit of dma_map_page() on ARM.
      
      This provides the necessary interface to cope with the above issue if
      ARCH_NEEDS_KMAP_HIGH_GET is defined, otherwise the resulting code is
      unchanged.
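
      A hedged sketch of the new interface (the shape is inferred from the
      description above; the lock_kmap_any()/unlock_kmap_any() helpers stand
      in for a pkmap lock that must be usable from any context):

      	/* Pin the page's existing kmap so flush_all_zero_pkmaps() cannot
      	 * recycle its virtual address while cache maintenance is running.
      	 * Returns NULL if the page currently has no kmap mapping. */
      	void *kmap_high_get(struct page *page)
      	{
      		unsigned long vaddr, flags;

      		lock_kmap_any(flags);
      		vaddr = (unsigned long)page_address(page);
      		if (vaddr)
      			pkmap_count[PKMAP_NR(vaddr)]++;
      		unlock_kmap_any(flags);
      		return (void *)vaddr;
      	}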
      Signed-off-by: Nicolas Pitre <nico@marvell.com>
      Reviewed-by: MinChan Kim <minchan.kim@gmail.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
  9. 15 Mar 2009, 1 commit