1. 10 Aug 2010, 6 commits
  2. 05 Aug 2010, 1 commit
    • mm,kdb,kgdb: Add a debug reference for the kdb kmap usage · eac79005
      Jason Wessel authored
      The kdb kmap should never get used outside of the kernel debugger
      exception context.
      
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: linux-mm@kvack.org
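      The patch itself only documents this constraint with a comment; as a
      hedged sketch, the rule it states could be asserted with the
      in_dbg_master() helper from <linux/kgdb.h> (the assertion function
      here is hypothetical, not part of the patch):

          #include <linux/kgdb.h>
          #include <linux/bug.h>

          static inline void kdb_kmap_assert(void)
          {
                  /* The kdb kmap is only legal inside the debugger
                   * exception context; anywhere else is a bug. */
                  WARN_ON_ONCE(!in_dbg_master());
          }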
  3. 03 Aug 2010, 2 commits
  4. 01 Aug 2010, 2 commits
  5. 31 Jul 2010, 1 commit
    • mm: fix ia64 crash when gcore reads gate area · de51257a
      Hugh Dickins authored
      Debian's ia64 autobuilders have been seeing kernel freeze or reboot
      when running the gdb testsuite (Debian bug 588574): dannf bisected to
      2.6.32 62eede62 "mm: ZERO_PAGE without
      PTE_SPECIAL"; and reproduced it with gdb's gcore on a simple target.
      
      I'd missed updating the gate_vma handling in __get_user_pages(): that
      happens to use vm_normal_page() (nowadays failing on the zero page),
      yet reported success even when it failed to get a page - boom when
      access_process_vm() tried to copy that to its intermediate buffer.
      
      Fix this, resisting cleanups: in particular, leave it for now reporting
      success when not asked to get any pages - very probably safe to change,
      but let's not risk it without testing exposure.
      
      Why did ia64 crash with 16kB pages, but succeed with 64kB pages?
      Because setup_gate() pads each 64kB of its gate area with zero pages.
      Reported-by: Andreas Barth <aba@not.so.argh.org>
      Bisected-by: dann frazier <dannf@debian.org>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Tested-by: dann frazier <dannf@dannf.org>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
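      A hedged sketch of the fixed gate-area path in __get_user_pages()
      (simplified; the real patch also special-cases the zero pfn when
      dumping rather than always failing):

          if (pages) {
                  struct page *page = vm_normal_page(gate_vma, start, *pte);

                  if (!page) {
                          /* Previously this path still reported success. */
                          pte_unmap(pte);
                          return i ? i : -EFAULT;
                  }
                  pages[i] = page;
                  get_page(page);
          }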
  6. 29 Jul 2010, 1 commit
    • slub numa: Fix rare allocation from unexpected node · bc6488e9
      Christoph Lameter authored
      The network developers have seen sporadic allocations resulting in objects
      coming from unexpected NUMA nodes despite asking for objects from a
      specific node.
      
      This is due to get_partial() calling get_any_partial() when partial
      slabs are exhausted on a node, even if a node was specified and one
      would therefore expect allocations only from the specified node.
      
      get_any_partial() sporadically may return a slab from a foreign
      node to gradually reduce the size of partial lists on remote nodes
      and thereby reduce total memory use for a slab cache.
      
      The behavior is controlled by the remote_defrag_ratio of each cache.
      
      Strictly speaking this is permitted behavior, since __GFP_THISNODE was
      not specified for the allocation, but it is certainly surprising.
      
      This patch makes sure that the remote defrag behavior only occurs
      if no node was specified.
      Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
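      A hedged sketch of the corrected control flow in get_partial()
      (simplified from mm/slub.c):

          static struct page *get_partial(struct kmem_cache *s, gfp_t flags,
                                          int node)
          {
                  struct page *page;
                  int searchnode = (node == -1) ? numa_node_id() : node;

                  page = get_partial_node(get_node(s, searchnode));
                  if (page || node != -1)
                          return page;   /* explicit node: no remote defrag */

                  return get_any_partial(s, flags);
          }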
  7. 21 Jul 2010, 2 commits
    • x86,nobootmem: make alloc_bootmem_node fall back to other node when 32bit numa is used · b8ab9f82
      Yinghai Lu authored
      Borislav Petkov reported that his 32-bit NUMA system has a problem:
      
      [    0.000000] Reserving total of 4c00 pages for numa KVA remap
      [    0.000000] kva_start_pfn ~ 32800 max_low_pfn ~ 375fe
      [    0.000000] max_pfn = 238000
      [    0.000000] 8202MB HIGHMEM available.
      [    0.000000] 885MB LOWMEM available.
      [    0.000000]   mapped low ram: 0 - 375fe000
      [    0.000000]   low ram: 0 - 375fe000
      [    0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 1000 1000 => 34e7000
      [    0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 200 40 => 34c9d80
      [    0.000000] alloc (nid=0 100000 - 7ee00000) (1000000 - ffffffffffffffff) 180 40 => 34e6140
      [    0.000000] alloc (nid=1 80000000 - c7e60000) (1000000 - ffffffffffffffff) 240 40 => 80000000
      [    0.000000] BUG: unable to handle kernel paging request at 40000000
      [    0.000000] IP: [<c2c8cff1>] __alloc_memory_core_early+0x147/0x1d6
      [    0.000000] *pdpt = 0000000000000000 *pde = f000ff53f000ff00
      ...
      [    0.000000] Call Trace:
      [    0.000000]  [<c2c8b4f8>] ? __alloc_bootmem_node+0x216/0x22f
      [    0.000000]  [<c2c90c9b>] ? sparse_early_usemaps_alloc_node+0x5a/0x10b
      [    0.000000]  [<c2c9149e>] ? sparse_init+0x1dc/0x499
      [    0.000000]  [<c2c79118>] ? paging_init+0x168/0x1df
      [    0.000000]  [<c2c780ff>] ? native_pagetable_setup_start+0xef/0x1bb
      
      It looks like too high an address gets allocated for bootmem.
      
      Try to cut the limit with get_max_mapped().
      Reported-by: Borislav Petkov <borislav.petkov@amd.com>
      Tested-by: Conny Seidel <conny.seidel@amd.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: <stable@kernel.org>		[2.6.34.x]
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
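      A hedged sketch of the idea behind the fix (simplified, not the
      verbatim patch): clamp the early-allocation limit to the top of the
      kernel direct mapping, and fall back to any node instead of handing
      back unmapped highmem:

          u64 limit = get_max_mapped();   /* top of the direct mapping */
          void *ptr;

          ptr = __alloc_memory_core_early(nid, size, align, goal, limit);
          if (!ptr)
                  /* requested node can't satisfy it below the limit */
                  ptr = __alloc_memory_core_early(MAX_NUMNODES, size, align,
                                                  goal, limit);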
    • mm/vmscan.c: fix mapping use after free · a6aa62a0
      Nick Piggin authored
      We need lock_page_nosync() here because we have no reference to the
      mapping when taking the page lock.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
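      A hedged sketch of the fix in handle_write_error() (simplified from
      mm/vmscan.c):

          static void handle_write_error(struct address_space *mapping,
                                         struct page *page, int error)
          {
                  /* lock_page() may call sync_page() through a mapping we
                   * hold no reference to; the nosync variant avoids it. */
                  lock_page_nosync(page);
                  if (page_mapping(page) == mapping)
                          mapping_set_error(mapping, error);
                  unlock_page(page);
          }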
  8. 20 Jul 2010, 1 commit
    • slab: use deferrable timers for its periodic housekeeping · 78b43536
      Arjan van de Ven authored
      slab has a "once every 2 seconds" timer for its housekeeping.
      As the number of logical processors grows, it's more and more
      common for this 2-second timer to become the primary wakeup source.
      
      This patch turns this housekeeping timer into a deferrable timer,
      which means that the timer does not interrupt idle, but just runs
      at the next event that wakes the cpu up.
      
      The impact is that the timer likely runs a bit later, but during the
      delay no code is running, so there's little reason for the
      housekeeping outcome to differ because of this delay.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
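      A hedged sketch of the change in start_cpu_timer() (mm/slab.c): the
      reap work is initialized as a deferrable delayed work, so an idle cpu
      is not woken just to run it:

          /* was: INIT_DELAYED_WORK(reap_work, cache_reap); */
          INIT_DELAYED_WORK_DEFERRABLE(reap_work, cache_reap);
          schedule_delayed_work_on(cpu, reap_work,
                                   __round_jiffies_relative(HZ, cpu));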
  9. 19 Jul 2010, 3 commits
  10. 16 Jul 2010, 6 commits
  11. 14 Jul 2010, 1 commit
  12. 10 Jul 2010, 1 commit
    • x86, ioremap: Fix incorrect physical address handling in PAE mode · ffa71f33
      Kenji Kaneshige authored
      The current x86 ioremap() doesn't properly handle physical addresses
      wider than 32 bits in X86_32 PAE mode. When such a physical address is
      passed to ioremap(), its upper 32 bits are wrongly cleared. Due to
      this bug, ioremap() can map the wrong address into the linear address
      space.
      
      In my case, a 64-bit MMIO region was assigned to a PCI device (an ioat
      device) on my system. Because of this ioremap() bug, the wrong
      physical address (instead of the MMIO region) was mapped into the
      linear address space. Because of this, loading the ioatdma driver
      caused unexpected behavior (kernel panic, kernel hang, ...).
      Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
      LKML-Reference: <4C1AE680.7090408@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
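      A hedged sketch of the bug class (not the verbatim patch, which
      threads resource_size_t/phys_addr_t through the ioremap() internals
      instead of casting through unsigned long):

          /* On X86_32 PAE, unsigned long is 32 bits wide, but physical
           * addresses can be 36+ bits wide: */
          resource_size_t phys = 0x100000000ULL;       /* a >4GB MMIO BAR */
          unsigned long bogus = (unsigned long)phys;   /* truncates to 0 */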
  13. 06 Jul 2010, 2 commits
    • writeback: simplify the write back thread queue · 83ba7b07
      Christoph Hellwig authored
      First, remove items from work_list as soon as we start working on
      them.  This means we don't have to track any pending or visited state
      and can get rid of all the RCU magic for freeing the work items - we
      can simply free them once the operation has finished.  Second, use a
      real completion for tracking synchronous requests - if the caller sets
      the completion pointer we complete it, otherwise use it as a boolean
      indicator that we can free the work item directly.  Third, unify
      struct wb_writeback_args and struct bdi_work into a single data
      structure, wb_writeback_work.  Previously we set all parameters in a
      struct wb_writeback_args, copied it into struct bdi_work, then copied
      it again onto the stack to use it there.  Instead, just allocate one
      structure dynamically or on the stack and use it all the way through
      the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
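      A hedged sketch of the unified work item (simplified from
      fs/fs-writeback.c):

          struct wb_writeback_work {
                  long nr_pages;
                  struct super_block *sb;
                  enum writeback_sync_modes sync_mode;
                  int for_kupdate:1;
                  int range_cyclic:1;
                  int for_background:1;

                  struct list_head list;    /* pending work list */
                  struct completion *done;  /* NULL: free after the run */
          };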
    • writeback: remove writeback_inodes_wbc · 9c3a8ee8
      Christoph Hellwig authored
      This was just an odd wrapper around writeback_inodes_wb.  Removing it
      also allows us to get rid of the bdi member of struct
      writeback_control, which was rather out of place there.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  14. 30 Jun 2010, 2 commits
  15. 28 Jun 2010, 2 commits
    • percpu: allow limited allocation before slab is online · 099a19d9
      Tejun Heo authored
      This patch updates the percpu allocator so that it can serve a limited
      number of allocations before slab comes online.  This is primarily to
      allow slab to depend on a working percpu allocator.
      
      Two parameters, PERCPU_DYNAMIC_EARLY_SIZE and SLOTS, determine how
      much memory space and how many allocation map slots are reserved.  If
      this reserved area is exhausted, WARN_ON_ONCE() will trigger and
      allocation will fail until slab comes online.
      
      The following changes are made to implement early alloc.
      
      * pcpu_mem_alloc() now checks slab_is_available()
      
      * Chunks are allocated using pcpu_mem_alloc()
      
      * Init paths make sure ai->dyn_size is at least as large as
        PERCPU_DYNAMIC_EARLY_SIZE.
      
      * Initial alloc maps are allocated in __initdata and copied to
        kmalloc'd areas once slab is online.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
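      A hedged sketch of the slab_is_available() check in pcpu_mem_alloc()
      (simplified from mm/percpu.c):

          static void *pcpu_mem_alloc(size_t size)
          {
                  if (WARN_ON_ONCE(!slab_is_available()))
                          return NULL;    /* early reserve is exhausted */

                  if (size <= PAGE_SIZE)
                          return kzalloc(size, GFP_KERNEL);
                  else {
                          void *ptr = vmalloc(size);
                          if (ptr)
                                  memset(ptr, 0, size);
                          return ptr;
                  }
          }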
    • percpu: make @dyn_size always mean min dyn_size in first chunk init functions · 4ba6ce25
      Tejun Heo authored
      In pcpu_build_alloc_info() and pcpu_embed_first_chunk(), @dyn_size was
      an ssize_t: -1 meant auto-size, 0 forced zero, and a positive value
      meant minimum size.  There's no use case for forcing 0, and the
      upcoming early alloc support always requires a non-zero dynamic size.
      Make @dyn_size always mean minimum dyn_size.
      
      While at it, make pcpu_build_alloc_info() static, as suggested by
      David Rientjes, since it has no external callers.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
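      A hedged sketch of the resulting signature: @dyn_size becomes a plain
      size_t meaning "minimum dynamic size", and the function becomes
      static:

          static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
                          size_t reserved_size, size_t dyn_size,
                          size_t atom_size,
                          pcpu_fc_cpu_distance_fn_t cpu_distance_fn);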
  16. 18 Jun 2010, 1 commit
    • percpu: fix first chunk match in per_cpu_ptr_to_phys() · 9983b6f0
      Tejun Heo authored
      per_cpu_ptr_to_phys() determines whether the passed-in @addr belongs
      to the first_chunk or not by just matching the address against the
      address range of the base unit (unit0, used by cpu0).  When an address
      from another cpu is passed in, it always determines that the address
      doesn't belong to the first chunk even when it does.  This makes the
      function return a bogus physical address, which may lead to a crash.
      
      This problem was discovered by Cliff Wickman while investigating a
      crash during kdump on an SGI UV system.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Cliff Wickman <cpw@sgi.com>
      Tested-by: Cliff Wickman <cpw@sgi.com>
      Cc: stable@kernel.org
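      A hedged sketch of the corrected test (simplified from mm/percpu.c):
      match @addr against every unit's range in the first chunk, not just
      unit0's:

          bool in_first_chunk = false;
          unsigned int cpu;

          for_each_possible_cpu(cpu) {
                  void *start = per_cpu_ptr(base, cpu);

                  if (addr >= start && addr < start + pcpu_unit_size) {
                          in_first_chunk = true;
                          break;
                  }
          }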
  17. 17 Jun 2010, 1 commit
  18. 15 Jun 2010, 1 commit
  19. 11 Jun 2010, 1 commit
  20. 09 Jun 2010, 3 commits