Commits · 6ed311b282210d23d1a2cb2665aa899979993628 · OpenHarmony / kernel_linux

05 Aug, 2010 15 commits

memblock: Move functions around into a more sensible order · 6ed311b2

Committed by Benjamin Herrenschmidt on Jul 12, 2010

Some shuffling is needed for doing array resize so we may as well
put some sense into the ordering of the functions in the whole memblock.c
file. No code change. Added some comments.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

6ed311b2

memblock: split memblock_find_base() out of __memblock_alloc_base() · 7f219c73

Committed by Benjamin Herrenschmidt on Jul 12, 2010

This will be used by the array resize code and might prove useful
to some arch code as well at which point it can be made non-static.

Also add comment as to why aligning size is important
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

v2. Fix loss of size alignment
v3. Fix result code

7f219c73

memblock: Move memblock_init() to the bottom of the file · 7590abe8

Committed by Benjamin Herrenschmidt on Jul 06, 2010

It's a real PITA to have to search for it in the middle
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

7590abe8

memblock: Define MEMBLOCK_ERROR internally instead of using ~(phys_addr_t)0 · 4d629f9a

Committed by Benjamin Herrenschmidt on Jul 06, 2010

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

4d629f9a

memblock: Make memblock_find_region() out of memblock_alloc_region() · 3a9c2c81

Committed by Benjamin Herrenschmidt on Jul 12, 2010

This function will be used to locate a free area to put the new memblock
arrays when attempting to resize them. memblock_alloc_region() is gone,
the two callsites now call memblock_add_region().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
v2. Fix membase_alloc_nid_region() conversion

3a9c2c81

memblock: Add debug markers at the end of the array · 449e8df3

Committed by Benjamin Herrenschmidt on Jul 06, 2010

Since we allocate one more than needed, why not do a bit of sanity checking
here to ensure we don't walk past the end of the array?
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

449e8df3

memblock: Move memblock arrays to static storage in memblock.c and make their size a variable · bf23c51f

Committed by Benjamin Herrenschmidt on Jul 06, 2010

This is in preparation for having resizable arrays.

Note that we still allocate one more than needed, this is unchanged from
the previous implementation.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

bf23c51f

memblock: Remove memblock_type.size and add memblock.memory_size instead · 4734b594

Committed by Benjamin Herrenschmidt on Jul 28, 2010

Right now, both the "memory" and "reserved" memblock_type structures have
a "size" member. It represents the calculated memory size in the former
case and is unused in the latter.

This moves it out to the main memblock structure instead
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

4734b594

memblock: Change u64 to phys_addr_t · 2898cc4c

Committed by Benjamin Herrenschmidt on Aug 04, 2010

Let's not waste space and cycles on archs that don't support >32-bit
physical address space.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

2898cc4c

memblock: Remove rmo_size, burry it in arch/powerpc where it belongs · cd3db0c4

Committed by Benjamin Herrenschmidt on Jul 06, 2010

The RMA (RMO is a misnomer) is a concept specific to ppc64 (in fact
server ppc64 though I hijack it on embedded ppc64 for similar purposes)
and represents the area of memory that can be accessed in real mode
(aka with MMU off), or on embedded, from the exception vectors (which
is bolted in the TLB) which pretty much boils down to the same thing.

We take that out of the generic MEMBLOCK data structure and move it into
arch/powerpc where it belongs, renaming it to "RMA" while at it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

cd3db0c4

memblock: Introduce default allocation limit and use it to replace explicit ones · e63075a3

Committed by Benjamin Herrenschmidt on Jul 06, 2010

This introduces memblock.current_limit which is used to limit allocations
from memblock_alloc() or memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE).

The old MEMBLOCK_ALLOC_ANYWHERE changes value from 0 to ~(u64)0 and can still
be used with memblock_alloc_base() to allocate really anywhere.

It is -no-longer- cropped to MEMBLOCK_REAL_LIMIT which disappears.

Note to archs: I'm leaving the default limit to MEMBLOCK_ALLOC_ANYWHERE. I
strongly recommend that you ensure that you set an appropriate limit
during boot in order to guarantee that a memblock_alloc() at any time
results in something that is accessible with a simple __va().

The reason is that a subsequent patch will introduce the ability for
the array to resize itself by reallocating itself. The MEMBLOCK core will
honor the current limit when performing those allocations.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

e63075a3
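
The limit selection described in the commit above can be sketched roughly as follows. This is a user-space illustration, not the kernel code: `effective_limit()`, `set_current_limit()` and the `ALLOC_*` names are hypothetical, and the value 0 for the "accessible" marker is an assumption (the commit text only states that the "anywhere" value becomes `~(u64)0`).

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Assumed values for this sketch; only ANYWHERE == ~0 comes from the commit text. */
#define ALLOC_ANYWHERE   (~(phys_addr_t)0)
#define ALLOC_ACCESSIBLE ((phys_addr_t)0)

/* Hypothetical stand-in for memblock.current_limit; defaults to "anywhere". */
static phys_addr_t current_limit = ALLOC_ANYWHERE;

/* What an arch would call during boot to restrict default allocations. */
static void set_current_limit(phys_addr_t limit)
{
    /* 0 is taken by the "use the default" marker in this sketch. */
    assert(limit != ALLOC_ACCESSIBLE);
    current_limit = limit;
}

/* Resolve the upper bound that an allocation request may use. */
static phys_addr_t effective_limit(phys_addr_t max_addr)
{
    if (max_addr == ALLOC_ACCESSIBLE)
        return current_limit;   /* default: honor the boot-time limit */
    return max_addr;            /* explicit bound, including ANYWHERE */
}
```

With this shape, a caller that passes the "accessible" marker is capped by whatever the arch set at boot, while a caller that passes the "anywhere" value is not capped at all.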

memblock: Expose MEMBLOCK_ALLOC_ANYWHERE · 27f574c2

Committed by Benjamin Herrenschmidt on Jul 06, 2010

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

27f574c2

memblock: Factor the lowest level alloc function · c3f72b57

Committed by Benjamin Herrenschmidt on Jul 06, 2010

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

c3f72b57

memblock: Remove nid_range argument, arch provides memblock_nid_range() instead · 35a1f0bd

Committed by Benjamin Herrenschmidt on Jul 06, 2010

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

35a1f0bd

memblock: Remove memblock_find() · b693fffb

Committed by Benjamin Herrenschmidt on Aug 04, 2010

Nobody uses it anymore. Its semantics were ... weird
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

b693fffb

04 Aug, 2010 3 commits

memblock: Implement memblock_is_memory and memblock_is_region_memory · 72d4b0b4

Committed by Benjamin Herrenschmidt on Aug 04, 2010

To make it fast, we steal ARM's binary search for memblock_is_memory()
and we use that to also replace the existing implementation of
memblock_is_reserved().
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

72d4b0b4
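
A binary search of the kind the commit above mentions could look roughly like this. It is a minimal user-space sketch, assuming the region array is sorted by base address and non-overlapping; `struct region`, `region_search()` and `addr_is_in()` are illustrative names, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Simplified region descriptor; field names are assumptions. */
struct region {
    phys_addr_t base;
    phys_addr_t size;
};

/*
 * Return the index of the region containing addr, or -1 if none does.
 * Requires regions[] to be sorted by base and non-overlapping.
 */
static int region_search(const struct region *regions, size_t cnt,
                         phys_addr_t addr)
{
    size_t left = 0, right = cnt;

    while (left < right) {
        size_t mid = left + (right - left) / 2;

        if (addr < regions[mid].base)
            right = mid;                 /* addr lies below this region */
        else if (addr - regions[mid].base >= regions[mid].size)
            left = mid + 1;              /* addr lies above this region */
        else
            return (int)mid;             /* addr falls inside it */
    }
    return -1;
}

/* Boolean query built on the search, usable for any region array. */
static bool addr_is_in(const struct region *regions, size_t cnt,
                       phys_addr_t addr)
{
    assert(regions != NULL || cnt == 0);
    return region_search(regions, cnt, addr) != -1;
}
```

Because the same search works on any sorted array of regions, one helper can back both a "memory" query and a "reserved" query, which is the reuse the commit message describes.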

memblock: Rename memblock_region to memblock_type and memblock_property to memblock_region · e3239ff9

Committed by Benjamin Herrenschmidt on Aug 04, 2010

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

e3239ff9

memblock: Fix memblock_is_region_reserved() to return a boolean · f1c2c19c

Committed by Benjamin Herrenschmidt on Aug 04, 2010

All callers expect a boolean result which is true if the region
overlaps a reserved region. However, the implementation actually
returns -1 if there is no overlap, and a region index (0 based)
if there is.

Make it behave as callers (and common sense) expect.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

f1c2c19c
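
The mismatch described in the commit above is easy to reproduce in a small user-space sketch. The names `overlaps_index()` and `is_region_reserved()` are illustrative, and the overlap test assumes half-open ranges that do not wrap around the address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

struct region {
    phys_addr_t base;
    phys_addr_t size;
};

/* Index-style result: index of the first overlapping region, or -1 if none. */
static long overlaps_index(const struct region *r, size_t cnt,
                           phys_addr_t base, phys_addr_t size)
{
    assert(r != NULL || cnt == 0);
    for (size_t i = 0; i < cnt; i++) {
        /* Half-open ranges [base, base + size) intersect. */
        if (base < r[i].base + r[i].size && r[i].base < base + size)
            return (long)i;
    }
    return -1;
}

/* Boolean-style result: true only when some region overlaps. */
static bool is_region_reserved(const struct region *r, size_t cnt,
                               phys_addr_t base, phys_addr_t size)
{
    return overlaps_index(r, cnt, base, size) >= 0;
}
```

Used directly in an `if`, the index-style result is wrong in both directions: "no overlap" (-1) is truthy and "overlaps region 0" (0) is falsy. Comparing against zero in one place gives callers the boolean they expect.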

31 Jul, 2010 1 commit

mm: fix ia64 crash when gcore reads gate area · de51257a

Committed by Hugh Dickins on Jul 30, 2010

Debian's ia64 autobuilders have been seeing kernel freeze or reboot
when running the gdb testsuite (Debian bug 588574): dannf bisected to
2.6.32 62eede62 "mm: ZERO_PAGE without
PTE_SPECIAL"; and reproduced it with gdb's gcore on a simple target.

I'd missed updating the gate_vma handling in __get_user_pages(): that
happens to use vm_normal_page() (nowadays failing on the zero page),
yet reported success even when it failed to get a page - boom when
access_process_vm() tried to copy that to its intermediate buffer.

Fix this, resisting cleanups: in particular, leave it for now reporting
success when not asked to get any pages - very probably safe to change,
but let's not risk it without testing exposure.

Why did ia64 crash with 16kB pages, but succeed with 64kB pages?
Because setup_gate() pads each 64kB of its gate area with zero pages.
Reported-by: Andreas Barth <aba@not.so.argh.org>
Bisected-by: dann frazier <dannf@debian.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: dann frazier <dannf@dannf.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

de51257a

21 Jul, 2010 2 commits

x86,nobootmem: make alloc_bootmem_node fall back to other node when 32bit numa is used · b8ab9f82

Committed by Yinghai Lu on Jul 20, 2010

Borislav Petkov reported that his 32bit numa system has a problem:

[    0.000000] Reserving total of 4c00 pages for numa KVA remap
[    0.000000] kva_start_pfn ~ 32800 max_low_pfn ~ 375fe
[    0.000000] max_pfn = 238000
[    0.000000] 8202MB HIGHMEM available.
[    0.000000] 885MB LOWMEM available.
[    0.000000]   mapped low ram: 0 - 375fe000
[    0.000000]   low ram: 0 - 375fe000
[    0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 1000 1000 => 34e7000
[    0.000000] alloc (nid=8 100000 - 7ee00000) (1000000 - ffffffff) 200 40 => 34c9d80
[    0.000000] alloc (nid=0 100000 - 7ee00000) (1000000 - ffffffffffffffff) 180 40 => 34e6140
[    0.000000] alloc (nid=1 80000000 - c7e60000) (1000000 - ffffffffffffffff) 240 40 => 80000000
[    0.000000] BUG: unable to handle kernel paging request at 40000000
[    0.000000] IP: [<c2c8cff1>] __alloc_memory_core_early+0x147/0x1d6
[    0.000000] *pdpt = 0000000000000000 *pde = f000ff53f000ff00
...
[    0.000000] Call Trace:
[    0.000000]  [<c2c8b4f8>] ? __alloc_bootmem_node+0x216/0x22f
[    0.000000]  [<c2c90c9b>] ? sparse_early_usemaps_alloc_node+0x5a/0x10b
[    0.000000]  [<c2c9149e>] ? sparse_init+0x1dc/0x499
[    0.000000]  [<c2c79118>] ? paging_init+0x168/0x1df
[    0.000000]  [<c2c780ff>] ? native_pagetable_setup_start+0xef/0x1bb

It looks like it allocates too high an address for bootmem.

Try to cut the limit with get_max_mapped()
Reported-by: Borislav Petkov <borislav.petkov@amd.com>
Tested-by: Conny Seidel <conny.seidel@amd.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@kernel.org>		[2.6.34.x]
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b8ab9f82

mm/vmscan.c: fix mapping use after free · a6aa62a0

Committed by Nick Piggin on Jul 20, 2010

We need lock_page_nosync() here because we have no reference to the
mapping when taking the page lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a6aa62a0

19 Jul, 2010 3 commits

kmemleak: Add support for NO_BOOTMEM configurations · 9078370c

Committed by Catalin Marinas on Jul 19, 2010

With commits 08677214 and 59be5a8e, alloc_bootmem()/free_bootmem() and
friends use the early_res functions for memory management when
NO_BOOTMEM is enabled. This patch adds the kmemleak calls in the
corresponding code paths for bootmem allocations.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org

9078370c

kmemleak: Annotate false positive in init_section_page_cgroup() · 7952f988

Committed by Catalin Marinas on Jul 19, 2010

The pointer to the page_cgroup table allocated in
init_section_page_cgroup() is stored in section->page_cgroup as (base -
pfn). Since this value does not point to the beginning or inside the
allocated memory block, kmemleak reports a false positive.

This was reported in bugzilla.kernel.org as #16297.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Adrien Dessemond <adrien.dessemond@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>

7952f988

mm: add context argument to shrinker callback · 7f8275d0

Committed by Dave Chinner on Jul 19, 2010

The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

7f8275d0
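
The embedding pattern the commit above describes can be illustrated in user space as follows. This is a sketch under stated assumptions: `struct shrinker` here is a simplified stand-in with a made-up callback signature, `struct fs_cache` is hypothetical, and `container_of()` is re-implemented with `offsetof()` rather than taken from kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/* User-space rendition of the container_of() idea. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in: the callback receives the shrinker itself. */
struct shrinker {
    int (*shrink)(struct shrinker *s, int nr_to_scan);
};

/* A hypothetical per-filesystem cache that embeds its own shrinker. */
struct fs_cache {
    int nr_objects;
    struct shrinker shrinker;
};

/*
 * Drop up to nr_to_scan objects from the cache that owns s and
 * return how many objects remain. No global state is needed.
 */
static int fs_cache_shrink(struct shrinker *s, int nr_to_scan)
{
    struct fs_cache *cache = container_of(s, struct fs_cache, shrinker);

    assert(nr_to_scan >= 0);
    if (nr_to_scan > cache->nr_objects)
        nr_to_scan = cache->nr_objects;
    cache->nr_objects -= nr_to_scan;
    return cache->nr_objects;
}
```

Two caches can register the same callback, and each invocation recovers its own `struct fs_cache` from the shrinker pointer it was handed.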

14 Jul, 2010 1 commit

lmb: rename to memblock · 95f72d1e

Committed by Yinghai Lu on Jul 12, 2010

via the following script

      FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

      sed -i \
        -e 's/lmb/memblock/g' \
        -e 's/LMB/MEMBLOCK/g' \
        $FILES

      for N in $(find . -name lmb.[ch]); do
        M=$(echo $N | sed 's/lmb/memblock/g')
        mv $N $M
      done

and remove some wrong changes like lmbench and dlmb etc.

also move memblock.c from lib/ to mm/
Suggested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

95f72d1e

06 Jul, 2010 2 commits

writeback: simplify the write back thread queue · 83ba7b07

Committed by Christoph Hellwig on Jul 06, 2010

First remove items from work_list as soon as we start working on them. This
means we don't have to track any pending or visited state and can get
rid of all the RCU magic freeing the work items - we can simply free
them once the operation has finished. Second use a real completion for
tracking synchronous requests - if the caller sets the completion pointer
we complete it, otherwise use it as a boolean indicator that we can free
the work item directly. Third unify struct wb_writeback_args and struct
bdi_work into a single data structure, wb_writeback_work. Previously we
set all parameters into a struct wb_writeback_args, copied it into
struct bdi_work, copied it again on the stack to use it there. Instead,
just allocate one structure dynamically or on the stack and use it
all the way through the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

83ba7b07

writeback: remove writeback_inodes_wbc · 9c3a8ee8

Committed by Christoph Hellwig on Jun 10, 2010

This was just an odd wrapper around writeback_inodes_wb.  Removing this
also allows to get rid of the bdi member of struct writeback_control
which was rather out of place there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

9c3a8ee8

30 Jun, 2010 2 commits

mempolicy: fix dangling reference to tmpfs superblock mpol · 5c0c1654

Committed by Lee Schermerhorn on Jun 29, 2010

My patch to "Factor out duplicate put/frees in mpol_shared_policy_init()
to a common return path"; and Dan Carpenter's fix thereto both left a
dangling reference to the incoming tmpfs superblock mempolicy structure.
A similar leak was introduced earlier when the nodemask was moved offstack
to the scratch area despite the note in the comment block regarding the
incoming ref.

Move the remaining 'put of the incoming "mpol" to the common exit path to
drop the reference.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Dan Carpenter <error27@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5c0c1654

memcg: fix wake up in oom wait queue · 4d845ebf

Committed by KAMEZAWA Hiroyuki on Jun 29, 2010

The OOM waitqueue should be woken up when oom_disable is canceled.  This is a
fix for 3c11ecf4 ("memcg: oom kill disable and oom status").

How to test:
 Create a cgroup A...
 1. set memory.limit and memory.memsw.limit to a small value
 2. echo 1 > /cgroup/A/memory.oom_control, this disables oom-kill.
 3. run a program which must cause OOM.

A program executed in step 3 will sleep on the OOM waitqueue in memcg.  Then
the problem is how to wake it up:

 1. echo 0 > /cgroup/A/memory.oom_control (enable OOM-killer)
 2. echo big mem > /cgroup/A/memory.memsw.limit_in_bytes(allow more swap)

etc..

Without the patch, a sleeping task cannot be woken up.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4d845ebf

18 Jun, 2010 1 commit

percpu: fix first chunk match in per_cpu_ptr_to_phys() · 9983b6f0

Committed by Tejun Heo on Jun 18, 2010

per_cpu_ptr_to_phys() determines whether the passed in @addr belongs
to the first_chunk or not by just matching the address against the
address range of the base unit (unit0, used by cpu0).  When an address
from another cpu was passed in, it will always determine that the
address doesn't belong to the first chunk even when it does.  This
makes the function return a bogus physical address which may lead to
crash.

This problem was discovered by Cliff Wickman while investigating a
crash during kdump on a SGI UV system.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Cliff Wickman <cpw@sgi.com>
Tested-by: Cliff Wickman <cpw@sgi.com>
Cc: stable@kernel.org

9983b6f0

17 Jun, 2010 1 commit

percpu: fix trivial bugs in pcpu_build_alloc_info() · a92d3ff9

Committed by Pavel V. Panteleev on Jun 17, 2010

Fix the following two trivial bugs in pcpu_build_alloc_info()

* we should memset group_cnt to 0 by size of group_cnt, not size of
  group_map (both are of the same size, so the bug isn't dangerous)

* we can delete useless variable group_cnt_max.
Signed-off-by: Pavel V. Panteleev <pp_84@mail.ru>
Signed-off-by: Tejun Heo <tj@kernel.org>

a92d3ff9

11 Jun, 2010 1 commit

writeback: simplify and split bdi_start_writeback · c5444198

Committed by Christoph Hellwig on Jun 08, 2010

bdi_start_writeback now never gets a superblock passed, so we can just remove
that case.  And to further untangle the code and flatten the call stack
split it into two trivial helpers for its two callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

c5444198

09 Jun, 2010 2 commits

writeback: limit write_cache_pages integrity scanning to current EOF · d87815cb

Committed by Dave Chinner on Jun 09, 2010

sync can currently take a really long time if a concurrent writer is
extending a file. The problem is that the dirty pages on the address
space grow in the same direction as write_cache_pages scans, so if
the writer keeps ahead of writeback, the writeback will not
terminate until the writer stops adding dirty pages.

For a data integrity sync, we only need to write the pages dirty at
the time we start the writeback, so we can stop scanning once we get
to the page that was at the end of the file at the time the scan
started.

This will prevent operations like copying a large file preventing
sync from completing as it will not write back pages that were
dirtied after the sync was started. This does not impact the
existing integrity guarantees, as any dirty page (old or new)
within the EOF range at the start of the scan will still be
captured.

This patch will not prevent sync from blocking on large writes into
holes. That requires more complex intervention while this patch only
addresses the common append-case of this sync holdoff.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d87815cb
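
The idea in the commit above, snapshotting the end of file once so that a concurrent appender cannot keep the scan alive, can be modelled in a few lines. This is a toy user-space model, not the page-cache code: `struct file_state`, `integrity_scan()` and `appender()` are hypothetical, and "writing a page" is reduced to counting it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical file whose page count may grow while it is scanned. */
struct file_state {
    long nr_pages;
};

/*
 * Integrity-style scan: record EOF at the start and never scan past it.
 * on_page simulates a concurrent writer running between page writes.
 * Returns the number of pages written back.
 */
static long integrity_scan(struct file_state *f,
                           void (*on_page)(struct file_state *))
{
    long end, written = 0;

    assert(f != NULL);
    end = f->nr_pages;              /* EOF snapshot taken once */
    for (long i = 0; i < end; i++) {
        if (on_page)
            on_page(f);             /* the file may grow here */
        written++;
    }
    return written;
}

/* Simulated writer that appends two pages for every page written back. */
static void appender(struct file_state *f)
{
    f->nr_pages += 2;
}
```

If the loop bound were re-read from `f->nr_pages` on each iteration, this appender would stay ahead of the scan indefinitely; with the snapshot the scan ends after the pages that existed when it started.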

writeback: pay attention to wbc->nr_to_write in write_cache_pages · 0b564927

Committed by Dave Chinner on Jun 09, 2010

If a filesystem writes more than one page in ->writepage, write_cache_pages
fails to notice this and continues to attempt writeback when wbc->nr_to_write
has gone negative - this trace was captured from XFS:

    wbc_writeback_start: towrt=1024
    wbc_writepage: towrt=1024
    wbc_writepage: towrt=0
    wbc_writepage: towrt=-1
    wbc_writepage: towrt=-5
    wbc_writepage: towrt=-21
    wbc_writepage: towrt=-85

This has adverse effects on filesystem writeback behaviour. write_cache_pages()
needs to terminate after a certain number of pages are written, not after a
certain number of calls to ->writepage are made.  This is a regression
introduced by 17bc6c30 ("vfs: Add
no_nrwrite_index_update writeback control flag"), but cannot be reverted
directly due to subsequent bug fixes that have gone in on top of it.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0b564927
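
The termination rule argued for in the commit above (stop on pages written, not on calls made) can be sketched like this. It is a simplified user-space model: `struct wbc`, `write_pages()` and `cluster4()` are hypothetical stand-ins, and the callback is assumed to charge every page it writes to `nr_to_write` itself.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the writeback control structure. */
struct wbc {
    long nr_to_write;   /* remaining page budget */
};

/*
 * Hypothetical ->writepage-style callback. It may write several pages
 * per call, charges each to wbc->nr_to_write, and returns the count.
 */
typedef long (*writepage_fn)(struct wbc *wbc, long index);

/*
 * Walk pages [0, nr_pages) and stop as soon as the page budget is used
 * up, however many pages each call wrote. Returns the number of calls.
 */
static long write_pages(struct wbc *wbc, long nr_pages, writepage_fn writepage)
{
    long calls = 0;
    long index = 0;

    assert(wbc != NULL && writepage != NULL);
    while (index < nr_pages) {
        long written = writepage(wbc, index);

        calls++;
        index += written > 0 ? written : 1;
        if (wbc->nr_to_write <= 0)
            break;                  /* budget exhausted or overshot */
    }
    return calls;
}

/* Example callback that clusters four pages per call. */
static long cluster4(struct wbc *wbc, long index)
{
    (void)index;
    wbc->nr_to_write -= 4;
    return 4;
}
```

Checking `nr_to_write <= 0` rather than `== 0` matters here: a clustering callback can step over zero, which is the pattern visible in the XFS trace quoted in the commit message.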

05 Jun, 2010 2 commits

vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim failure · bb21c7ce

Committed by KOSAKI Motohiro on Jun 04, 2010

Greg Thelen reported that Johannes's recent stack diet patch makes the kernel
hang.  His test is the following.

  mount -t cgroup none /cgroups -o memory
  mkdir /cgroups/cg1
  echo $$ > /cgroups/cg1/tasks
  dd bs=1024 count=1024 if=/dev/null of=/data/foo
  echo $$ > /cgroups/tasks
  echo 1 > /cgroups/cg1/memory.force_empty

Actually, this OOM try-hard logic has been corrupted since the following
two-year-old patch.

	commit a41f24ea
	Author: Nishanth Aravamudan <nacc@us.ibm.com>
	Date:   Tue Apr 29 00:58:25 2008 -0700

	    page allocator: smarter retry of costly-order allocations

The original intention was "return success if the system has shrinkable zones
even though priority==0 reclaim failed".  But the above patch changed this to
"return nr_reclaimed if .....".  That overlooked that nr_reclaimed may be 0 if
priority==0 reclaim fails.

And Johannes's patch 0aeb2339 ("vmscan: remove all_unreclaimable scan
control") made it more corrupt.  Originally, priority==0 reclaim failure
on memcg returned 0, but this patch changed it to return 1.  That totally
confused memcg.

This patch fixes it completely.
Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Greg Thelen <gthelen@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bb21c7ce

fix truncate inode time modification breakage · af5a30d8

Committed by Nick Piggin on Jun 03, 2010

mtime and ctime should be changed only if the file size has actually
changed. Patches changing ext2 and tmpfs from vmtruncate to the new truncate
sequence have caused regressions where they always update timestamps.

There are some strange cases in POSIX where truncate(2) must not update
times unless the size has actually changed, see 6e656be8.

This area is all still rather buggy in different ways in a lot of
filesystems and needs a cleanup and audit (ideally the vfs will provide
a simple attribute or call to direct all filesystems exactly which
attributes to change). But coming up with the best solution will take a
while and is not appropriate for rc anyway.

So fix recent regression for now.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

af5a30d8

01 Jun, 2010 1 commit

Revert "writeback: fix WB_SYNC_NONE writeback from umount" · 0e3c9a22

Committed by Jens Axboe on Jun 01, 2010

This reverts commit e913fc82.

We are investigating a hang associated with the WB_SYNC_NONE changes,
so revert them for now.

Conflicts:

	fs/fs-writeback.c
	mm/page-writeback.c
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

0e3c9a22

28 May, 2010 3 commits

tmpfs: convert to use the new truncate convention · 3889e6e7

Committed by npiggin@suse.de on May 27, 2010

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3889e6e7

fs: introduce new truncate sequence · 7bb46a67

Committed by npiggin@suse.de on May 27, 2010

Introduce a new truncate calling sequence into fs/mm subsystems. Rather than
setattr > vmtruncate > truncate, have filesystems call their truncate sequence
from ->setattr if filesystem specific operations are required. vmtruncate is
deprecated, and truncate_pagecache and inode_newsize_ok helpers introduced
previously should be used.

simple_setattr is introduced for simple in-ram filesystems to implement
the new truncate sequence. Eventually all filesystems should be converted
to implement a setattr, and the default code in notify_change should go
away.

simple_setsize is also introduced to perform just the ATTR_SIZE portion
of simple_setattr (ie. changing i_size and trimming pagecache).

To implement the new truncate sequence:
- filesystem specific manipulations (eg freeing blocks) must be done in
  the setattr method rather than ->truncate.
- vmtruncate can not be used by core code to trim blocks past i_size in
  the event of write failure after allocation, so this must be performed
  in the fs code.
- convert usage of helpers block_write_begin, nobh_write_begin,
  cont_write_begin, and *blockdev_direct_IO* to use _newtrunc postfixed
  variants. These avoid calling vmtruncate to trim blocks (see previous).
- inode_setattr should not be used. generic_setattr is a new function
  to be used to copy simple attributes into the generic inode.
- make use of the better opportunity to handle errors with the new sequence.

Big problem with the previous calling sequence: the filesystem is not called
until i_size has already changed.  This means it is not allowed to fail the
call, and also it does not know what the previous i_size was. Also, generic
code calling vmtruncate to truncate allocated blocks in case of error had
no good way to return a meaningful error (or, for example, atomically handle
block deallocation).

Cc: Christoph Hellwig <hch@lst.de>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7bb46a67

rename the generic fsync implementations · 1b061d92

Committed by Christoph Hellwig on May 26, 2010

We don't name our generic fsync implementations very well currently.
The no-op implementation for in-memory filesystems currently is called
simple_sync_file which doesn't make too much sense to start with,
and the generic one for simple filesystems is called simple_fsync
which can lead to some confusion.

This patch renames the generic file fsync method to generic_file_fsync
to match the other generic_file_* routines it is supposed to be used
with, and the no-op implementation to noop_fsync to make it obvious
what to expect.  In addition add some documentation for both methods.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1b061d92

OpenHarmony / kernel_linux · last synced more than 3 years ago
