提交 · 8869477a49c3e99def1fcdadd6bbc407fea14b45 · openanolis / cloud-kernel

05 12月, 2007 1 次提交

security: protect from stack expantion into low vm addresses · 8869477a

由 Eric Paris 提交于 11月 26, 2007

Add security checks to make sure we are not attempting to expand the
stack into memory protected by mmap_min_addr
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

8869477a

01 12月, 2007 1 次提交

Fix kmem_cache_free performance regression in slab · 80cbd911

由 Matthew Wilcox 提交于 11月 29, 2007

The database performance group have found that half the cycles spent
in kmem_cache_free are spent in this one call to BUG_ON.  Moving it
into the CONFIG_SLAB_DEBUG-only function cache_free_debugcheck() is a
performance win of almost 0.5% on their particular benchmark.

The call was added as part of commit ddc2e812
with the comment that "overhead should be minimal".  It may have been
minimal at the time, but it isn't now.

[ Quoth Pekka Enberg: "I don't think the BUG_ON per se caused the
  performance regression but rather the virt_to_head_page() changes to
  virt_to_cache() that were added later." ]
Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
Acked-by: NPekka J Enberg <penberg@cs.helsinki.fi>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

80cbd911

30 11月, 2007 2 次提交

memory hotplug fix: fix section mismatch in vmammap_allock_block() · e0dc3a53

由 KAMEZAWA Hiroyuki 提交于 11月 28, 2007

Fixes section mismatch below.

WARNING: vmlinux.o(.text+0x946b5): Section mismatch: reference to .init.text:'
__alloc_bootmem_node (between 'vmemmap_alloc_block' and 'vmemmap_pgd_populate')
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e0dc3a53

Fix boot problem with iSeries lacking hugepage support · ba72cb8c

由 Mel Gorman 提交于 11月 28, 2007

Ordinarily the size of a pageblock is determined at compile-time based on the
hugepage size. On PPC64, the hugepage size is determined at runtime based on
what is supported by the machine. With legacy machines such as iSeries that
do not support hugepages, HPAGE_SHIFT is 0. This results in pageblock_order
being set to -PAGE_SHIFT and a crash results shortly afterwards.

This patch adds a function to select a sensible value for pageblock order by
default when HUGETLB_PAGE_SIZE_VARIABLE is set. It checks that HPAGE_SHIFT
is a sensible value before using the hugepage size; if it is not MAX_ORDER-1
is used.

This is a fix for 2.6.24.

Credit goes to Stephen Rothwell for identifying the bug and testing candidate
patches.  Additional credit goes to Andy Whitcroft for spotting a problem
with respects to IA-64 before releasing. Additional credit to David Gibson
for testing with the libhugetlbfs test suite.
Signed-off-by: NMel Gorman <mel@csn.ul.ie>
Tested-by: NStephen Rothwell <sfr@canb.auug.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ba72cb8c

29 11月, 2007 2 次提交

prep_zero_page: remove bogus BUG_ON · 09f345da

由 Hugh Dickins 提交于 11月 28, 2007

2.6.11 gave __GFP_ZERO's prep_zero_page a bogus "highmem may have to wait"
assertion. Presumably added under the misconception that clear_highpage
uses nonatomic kmap; but then and now it uses kmap_atomic, so no problem.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09f345da

tmpfs: restore missing clear_highpage · e84e2e13

由 Hugh Dickins 提交于 11月 28, 2007

tmpfs was misconverted to __GFP_ZERO in 2.6.11. There's an unusual case in
which shmem_getpage receives the page from its caller instead of allocating.
We must cover this case by clear_highpage before SetPageUptodate, as before.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e84e2e13

20 11月, 2007 1 次提交

[S390] Optimize storage key handling for anonymous pages · ce7e9fae

由 Christian Borntraeger 提交于 11月 20, 2007

page_mkclean used to call page_clear_dirty for every given page. This
is different to all other architectures, where the dirty bit in the
PTEs is only resetted, if page_mapping() returns a non-NULL pointer.
We can move the page_test_dirty/page_clear_dirty sequence into the
2nd if to avoid unnecessary iske/sske sequences, which are expensive.

This change also helps kvm for s390 as the host must transfer the
dirty bit into the guest status bits. By moving the page_clear_dirty
operation into the 2nd if, the vm will only call page_clear_dirty
for pages where it walks the mapping anyway. There it calls
ptep_clear_flush for writable ptes, so we can transfer the dirty bit
to the guest.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

ce7e9fae

16 11月, 2007 2 次提交

dirty page balancing: Get rid of broken unmapped_ratio logic · 8c086340

由 Linus Torvalds 提交于 11月 15, 2007

This code harks back to the days when we didn't count dirty mapped
pages, which led us to try to balance the number of dirty unmapped pages
by how much unmapped memory there was in the system.

That makes no sense any more, since now the dirty counts include the
mapped pages. Not to mention that the math doesn't work with HIGHMEM
machines anyway, and causes the unmapped_ratio to potentially turn
negative (which we do catch thanks to clamping it at a minimum value,
but I mention that as an indication of how broken the code is).

The code also was written at a time when the default dirty ratio was
much larger, and the unmapped_ratio logic effectively capped that large
dirty ratio a bit. Again, we've since lowered the dirty ratio rather
aggressively, further lessening the point of that code.
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8c086340

slob: fix memory corruption · d32ddd8f

由 Nick Piggin 提交于 11月 15, 2007

Previously, it would be possible for prev->next to point to
&free_slob_pages, and thus we would try to move a list onto itself, and
bad things would happen.

It seems a bit hairy to be doing list operations with the list marker as
an entry, rather than a head, but...

this resolves the following crash:

  http://bugzilla.kernel.org/show_bug.cgi?id=9379Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NMatt Mackall <mpm@selenic.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d32ddd8f

15 11月, 2007 16 次提交

Swap delay accounting, include lock_page() delays · 20a1022d

由 Balbir Singh 提交于 11月 14, 2007

The delay incurred in lock_page() should also be accounted in swap delay
accounting
Reported-by: NNick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

20a1022d

vmstat: fix section mismatch warning · 42614fcd

由 Randy Dunlap 提交于 11月 14, 2007

Mark start_cpu_timer() as __cpuinit instead of __devinit.
Fixes this section warning:

WARNING: vmlinux.o(.text+0x60e53): Section mismatch: reference to .init.text:start_cpu_timer (between 'vmstat_cpuup_callback' and 'vmstat_show')
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Acked-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

42614fcd

fix mm/util.c:krealloc() · be21f0ab

由 Adrian Bunk 提交于 11月 14, 2007

Commit ef8b4520 added one NULL check for
"p" in krealloc(), but that doesn't seem to be enough since there
doesn't seem to be any guarantee that memcpy(ret, NULL, 0) works
(spotted by the Coverity checker).

For making it clearer what happens this patch also removes the pointless
min().
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Acked-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

be21f0ab

hugetlb: fix i_blocks accounting · 45c682a6

由 Ken Chen 提交于 11月 14, 2007

For administrative purpose, we want to query actual block usage for
hugetlbfs file via fstat.  Currently, hugetlbfs always return 0.  Fix that
up since kernel already has all the information to track it properly.
Signed-off-by: NKen Chen <kenchen@google.com>
Acked-by: NAdam Litke <agl@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

45c682a6

mm/hugetlb.c: make a function static · 8cde045c

由 Adrian Bunk 提交于 11月 14, 2007

return_unused_surplus_pages() can become static.
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Acked-by: NAdam Litke <agl@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8cde045c

hugetlb: enforce quotas during reservation for shared mappings · 90d8b7e6

由 Adam Litke 提交于 11月 14, 2007

When a MAP_SHARED mmap of a hugetlbfs file succeeds, huge pages are reserved
to guarantee no problems will occur later when instantiating pages.  If quotas
are in force, page instantiation could fail due to a race with another process
or an oversized (but approved) shared mapping.

To prevent these scenarios, debit the quota for the full reservation amount up
front and credit the unused quota when the reservation is released.
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

90d8b7e6

hugetlb: allow bulk updating in hugetlb_*_quota() · 9a119c05

由 Adam Litke 提交于 11月 14, 2007

Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota to
allow bulk updating of the sbinfo->free_blocks counter.  This will be used by
the next patch in the series.
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9a119c05

hugetlb: debit quota in alloc_huge_page · 2fc39cec

由 Adam Litke 提交于 11月 14, 2007

Now that quota is credited by free_huge_page(), calls to hugetlb_get_quota()
seem out of place.  The alloc/free API is unbalanced because we handle the
hugetlb_put_quota() but expect the caller to open-code hugetlb_get_quota().
Move the get inside alloc_huge_page to clean up this disparity.

This patch has been kept apart from the previous patch because of the somewhat
dodgy ERR_PTR() use herein.  Moving the quota logic means that
alloc_huge_page() has two failure modes.  Quota failure must result in a
SIGBUS while a standard allocation failure is OOM.  Unfortunately, ERR_PTR()
doesn't like the small positive errnos we have in VM_FAULT_* so they must be
negated before they are used.

Does anyone take issue with the way I am using PTR_ERR.  If so, what are your
thoughts on how to clean this up (without needing an if,else if,else block at
each alloc_huge_page() callsite)?
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2fc39cec

hugetlb: fix quota management for private mappings · c79fb75e

由 Adam Litke 提交于 11月 14, 2007

The hugetlbfs quota management system was never taught to handle MAP_PRIVATE
mappings when that support was added.  Currently, quota is debited at page
instantiation and credited at file truncation.  This approach works correctly
for shared pages but is incomplete for private pages.  In addition to
hugetlb_no_page(), private pages can be instantiated by hugetlb_cow(); but
this function does not respect quotas.

Private huge pages are treated very much like normal, anonymous pages.  They
are not "backed" by the hugetlbfs file and are not stored in the mapping's
radix tree.  This means that private pages are invisible to
truncate_hugepages() so that function will not credit the quota.

This patch (based on a prototype provided by Ken Chen) moves quota crediting
for all pages into free_huge_page().  page->private is used to store a pointer
to the mapping to which this page belongs.  This is used to credit quota on
the appropriate hugetlbfs instance.
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c79fb75e

hugetlb: split alloc_huge_page into private and shared components · 348ea204

由 Adam Litke 提交于 11月 14, 2007

Hugetlbfs implements a quota system which can limit the amount of memory that
can be used by the filesystem.  Before allocating a new huge page for a file,
the quota is checked and debited.  The quota is then credited when truncating
the file.  I found a few bugs in the code for both MAP_PRIVATE and MAP_SHARED
mappings.  Before detailing the problems and my proposed solutions, we should
agree on a definition of quotas that properly addresses both private and
shared pages.  Since the purpose of quotas is to limit total memory
consumption on a per-filesystem basis, I argue that all pages allocated by the
fs (private and shared) should be charged against quota.

Private Mappings
================

The current code will debit quota for private pages sometimes, but will never
credit it.  At a minimum, this causes a leak in the quota accounting which
renders the accounting essentially useless as it is.  Shared pages have a one
to one mapping with a hugetlbfs file and are easy to account by debiting on
allocation and crediting on truncate.  Private pages are anonymous in nature
and have a many to one relationship with their hugetlbfs files (due to copy on
write).  Because private pages are not indexed by the mapping's radix tree,
thier quota cannot be credited at file truncation time.  Crediting must be
done when the page is unmapped and freed.

Shared Pages
============

I discovered an issue concerning the interaction between the MAP_SHARED
reservation system and quotas.  Since quota is not checked until page
instantiation, an over-quota mmap/reservation will initially succeed.  When
instantiating the first over-quota page, the program will receive SIGBUS.
This is inconsistent since the reservation is supposed to be a guarantee.  The
solution is to debit the full amount of quota at reservation time and credit
the unused portion when the reservation is released.

This patch series brings quotas back in line by making the following
modifications:
 * Private pages
   - Debit quota in alloc_huge_page()
   - Credit quota in free_huge_page()
 * Shared pages
   - Debit quota for entire reservation at mmap time
   - Credit quota for instantiated pages in free_huge_page()
   - Credit quota for unused reservation at munmap time

This patch:

The shared page reservation and dynamic pool resizing features have made the
allocation of private vs.  shared huge pages quite different.  By splitting
out the private/shared-specific portions of the process into their own
functions, readability is greatly improved.  alloc_huge_page now calls the
proper helper and performs common operations.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

348ea204

hugetlb: follow_hugetlb_page() for write access · 5b23dbe8

由 Adam Litke 提交于 11月 14, 2007

When calling get_user_pages(), a write flag is passed in by the caller to
indicate if write access is required on the faulted-in pages.  Currently,
follow_hugetlb_page() ignores this flag and always faults pages for
read-only access.  This can cause data corruption because a device driver
that calls get_user_pages() with write set will not expect COW faults to
occur on the returned pages.

This patch passes the write flag down to follow_hugetlb_page() and makes
sure hugetlb_fault() is called with the right write_access parameter.

[ezk@cs.sunysb.edu: build fix]
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Reviewed-by: NKen Chen <kenchen@google.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: NErez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5b23dbe8

Add IORESOUCE_BUSY flag for System RAM · 887c3cb1

由 Yasunori Goto 提交于 11月 14, 2007

i386 and x86-64 registers System RAM as IORESOURCE_MEM | IORESOURCE_BUSY.

But ia64 registers it as IORESOURCE_MEM only.
In addition, memory hotplug code registers new memory as IORESOURCE_MEM too.

This difference causes a failure of memory unplug of x86-64.  This patch
fixes it.

This patch adds IORESOURCE_BUSY to avoid potential overlap mapping by PCI
device.
Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
Cc: Luck, Tony" <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

887c3cb1

mm: speed up writeback ramp-up on clean systems · 5fce25a9

由 Peter Zijlstra 提交于 11月 14, 2007

We allow violation of bdi limits if there is a lot of room on the system.
Once we hit half the total limit we start enforcing bdi limits and bdi
ramp-up should happen. Doing it this way avoids many small writeouts on an
otherwise idle system and should also speed up the ramp-up.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5fce25a9

memory hotremove: unset migrate type "ISOLATE" after removal · dbc0e4ce

由 KAMEZAWA Hiroyuki 提交于 11月 14, 2007

We should unset migrate type "ISOLATE" when we successfully removed memory.
 But current code has BUG and cannot works well.

This patch also includes bugfix?  to change get_pageblock_flags to
get_pageblock_migratetype().

Thanks to Badari Pulavarty for finding this.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NBadari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dbc0e4ce

Migration: find correct vma in new_vma_page() · 3ad33b24

由 Lee Schermerhorn 提交于 11月 14, 2007

We hit the BUG_ON() in mm/rmap.c:vma_address() when trying to migrate via
mbind(MPOL_MF_MOVE) a non-anon region that spans multiple vmas. For
anon-regions, we just fail to migrate any pages beyond the 1st vma in the
range.

This occurs because do_mbind() collects a list of pages to migrate by
calling check_range(). check_range() walks the task's mm, spanning vmas as
necessary, to collect the migratable pages into a list. Then, do_mbind()
calls migrate_pages() passing the list of pages, a function to allocate new
pages based on vma policy [new_vma_page()], and a pointer to the first vma
of the range.

For each page in the list, new_vma_page() calls page_address_in_vma()
passing the page and the vma [first in range] to obtain the address to get
for alloc_page_vma(). The page address is needed to get interleaving
policy correct. If the pages in the list come from multiple vmas,
eventually, new_page_address() will pass that page to page_address_in_vma()
with the incorrect vma. For !PageAnon pages, this will result in a bug
check in rmap.c:vma_address(). For anon pages, vma_address() will just
return EFAULT and fail the migration.

This patch modifies new_vma_page() to check the return value from
page_address_in_vma(). If the return value is EFAULT, new_vma_page()
searchs forward via vm_next for the vma that maps the page--i.e., that does
not return EFAULT. This assumes that the pages in the list handed to
migrate_pages() is in address order. This is currently case. The patch
documents this assumption in a new comment block for new_vma_page().

If new_vma_page() cannot locate the vma mapping the page in a forward
search in the mm, it will pass a NULL vma to alloc_page_vma(). This will
result in the allocation using the task policy, if any, else system default
policy. This situation is unlikely, but the patch documents this behavior
with a comment.

Note, this patch results in restarting from the first vma in a multi-vma
range each time new_vma_page() is called. If this is not acceptable, we
can make the vma argument a pointer, both in new_vma_page() and it's caller
unmap_and_move() so that the value held by the loop in migrate_pages()
always passes down the last vma in which a page was found. This will
require changes to all new_page_t functions passed to migrate_pages(). Is
this necessary?

For this patch to work, we can't bug check in vma_address() for pages
outside the argument vma. This patch removes the BUG_ON(). All other
callers [besides new_vma_page()] already check the return status.

Tested on x86_64, 4 node NUMA platform.
Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3ad33b24

slab: fix typo in allocation failure handling · cc550def

由 Akinobu Mita 提交于 11月 14, 2007

This patch fixes wrong array index in allocation failure handling.

Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cc550def

13 11月, 2007 2 次提交

Revert "Bias the placement of kernel pages at lower PFNs" · 44048d70

由 Linus Torvalds 提交于 11月 12, 2007

This reverts commit 5adc5be7.

Alexey Dobriyan reports that it causes huge slowdowns under some loads,
in his case a "mkfs.ext2" on a 30G partition.  With the placement bias,
the mkfs took over four minutes, with it reverted it's back to about ten
seconds for Alexey.
Reported-and-tested-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

44048d70

SLUB: killed the unused "end" variable · efe44183

由 Denis Cheng 提交于 11月 13, 2007

Since the macro "for_each_object" introduced, the "end" variable becomes unused anymore.
Signed-off-by: NDenis Cheng <crquan@gmail.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

efe44183

06 11月, 2007 1 次提交

SLUB: Fix memory leak by not reusing cpu_slab · 05aa3450

由 Christoph Lameter 提交于 11月 05, 2007

Fix the memory leak that may occur when we attempt to reuse a cpu_slab
that was allocated while we reenabled interrupts in order to be able to
grow a slab cache.

The per cpu freelist may contain objects and in that situation we may
overwrite the per cpu freelist pointer loosing objects.  This only
occurs if we find that the concurrently allocated slab fits our
allocation needs.

If we simply always deactivate the slab then the freelist will be
properly reintegrated and the memory leak will go away.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Acked-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

05aa3450

05 11月, 2007 1 次提交

unexport access_process_vm · 02c3530d

由 Adrian Bunk 提交于 11月 02, 2007

This patch removes the no longer used EXPORT_SYMBOL_GPL(access_process_vm).
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

02c3530d

01 11月, 2007 1 次提交

Remove broken ptrace() special-case code from file mapping · 5307cc1a

由 Linus Torvalds 提交于 10月 31, 2007

The kernel has for random historical reasons allowed ptrace() accesses
to access (and insert) pages into the page cache above the size of the
file.

However, Nick broke that by mistake when doing the new fault handling in
commit 54cb8821 ("mm: merge populate and
nopage into fault (fixes nonlinear)".  The breakage caused a hang with
gdb when trying to access the invalid page.

The ptrace "feature" really isn't worth resurrecting, since it really is
wrong both from a portability _and_ from an internal page cache validity
standpoint.  So this removes those old broken remnants, and fixes the
ptrace() hang in the process.

Noticed and bisected by Duane Griffin, who also supplied a test-case
(quoth Nick: "Well that's probably the best bug report I've ever had,
thanks Duane!").

Cc: Duane Griffin <duaneg@dghda.com>
Acked-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5307cc1a

31 10月, 2007 1 次提交

dio: fix cache invalidation after sync writes · bdb76ef5

由 Zach Brown 提交于 10月 30, 2007

Commit commit 65b8291c ("dio: invalidate
clean pages before dio write") introduced a bug which stopped dio from
ever invalidating the page cache after writes.  It still invalidated it
before writes so most users were fine.

Karl Schendel reported ( http://lkml.org/lkml/2007/10/26/481 ) hitting
this bug when he had a buffered reader immediately reading file data
after an O_DIRECT wirter had written the data.  The kernel issued
read-ahead beyond the position of the reader which overlapped with the
O_DIRECT writer.  The failure to invalidate after writes caused the
reader to see stale data from the read-ahead.

The following patch is originally from Karl.  The following commentary
is his:

	The below 3rd try takes on your suggestion of just invalidating
	no matter what the retval from the direct_IO call.  I ran it
	thru the test-case several times and it has worked every time.
	The post-invalidate is probably still too early for async-directio,
	but I don't have a testcase for that;  just sync.  And, this
	won't be any worse in the async case.

I added a test to the aio-dio-regress repository which mimics Karl's IO
pattern.  It verifed the bad behaviour and that the patch fixed it.  I
agree with Karl, this still doesn't help the case where a buffered
reader follows an AIO O_DIRECT writer.  That will require a bit more
work.

This gives up on the idea of returning EIO to indicate to userspace that
stale data remains if the invalidation failed.
Signed-off-by: NZach Brown <zach.brown@oracle.com>
Cc: Karl Schendel <kschendel@datallegro.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Leonid Ananiev <leonid.i.ananiev@linux.intel.com>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bdb76ef5

30 10月, 2007 3 次提交

fix tmpfs BUG and AOP_WRITEPAGE_ACTIVATE · 487e9bf2

由 Hugh Dickins 提交于 10月 29, 2007

It's possible to provoke unionfs (not yet in mainline, though in mm and
some distros) to hit shmem_writepage's BUG_ON(page_mapped(page)).  I expect
it's possible to provoke the 2.6.23 ecryptfs in the same way (but the
2.6.24 ecryptfs no longer calls lower level's ->writepage).

This came to light with the recent find that AOP_WRITEPAGE_ACTIVATE could
leak from tmpfs via write_cache_pages and unionfs to userspace.  There's
already a fix (e4230030 - writeback: don't
propagate AOP_WRITEPAGE_ACTIVATE) in the tree for that, and it's okay so
far as it goes; but insufficient because it doesn't address the underlying
issue, that shmem_writepage expects to be called only by vmscan (relying on
backing_dev_info capabilities to prevent the normal writeback path from
ever approaching it).

That's an increasingly fragile assumption, and ramdisk_writepage (the other
source of AOP_WRITEPAGE_ACTIVATEs) is already careful to check
wbc->for_reclaim before returning it.  Make the same check in
shmem_writepage, thereby sidestepping the page_mapped BUG also.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Cc: Erez Zadok <ezk@cs.sunysb.edu>
Cc: <stable@kernel.org>
Reviewed-by: NPekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

487e9bf2

mm/sparse-vmemmap.c: make sure init_mm is included · 8bca44bb

由 Glauber de Oliveira Costa 提交于 10月 29, 2007

mm/sparse-vmemmap.c uses init_mm in some places. However, it is not
present in any of the headers currently included in the file.

init_mm is defined as extern in sched.h, so we add it to the headers list

Up to now, this problem was masked by the fact that functions like
set_pte_at() and pmd_populate_kernel() are usually macros that expand to
simpler variants that does not use the first parameter at all.
Signed-off-by: NGlauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8bca44bb

Revert "x86_64: allocate sparsemem memmap above 4G" · 6a22c57b

由 Linus Torvalds 提交于 10月 29, 2007

This reverts commit 2e1c49db.

First off, testing in Fedora has shown it to cause boot failures,
bisected down by Martin Ebourne, and reported by Dave Jobes.  So the
commit will likely be reverted in the 2.6.23 stable kernels.

Secondly, in the 2.6.24 model, x86-64 has now grown support for
SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the
bug is not visible any more, it's become invisible due to the code just
being irrelevant and no longer enabled on the only architecture that
this ever affected.
Reported-by: NDave Jones <davej@redhat.com>
Tested-by: NMartin Ebourne <fedora@ebourne.me.uk>
Cc: Zou Nan hai <nanhai.zou@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: NAndy Whitcroft <apw@shadowen.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6a22c57b

29 10月, 2007 3 次提交

NOMMU: mm/nommu.c needs linux/module.h · f2b8544f

由 David Howells 提交于 10月 29, 2007

mm/nommu.c needs to #include linux/module.h for it to understand EXPORT_*()
macros.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f2b8544f

missing atomic_read_long() in slub.c · 27bb628a

由 Al Viro 提交于 10月 29, 2007

nr_slabs is atomic_long_t, not atomic_t
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

27bb628a

Fix a build error when BLOCK=n · 3a424f2d

由 Emil Medve 提交于 10月 24, 2007

mm/filemap.c: In function '__filemap_fdatawrite_range':
mm/filemap.c:200: error: implicit declaration of function
'mapping_cap_writeback_dirty'

This happens when we don't use/have any block devices and a NFS root
filesystem is used.

mapping_cap_writeback_dirty() is defined in linux/backing-dev.h which
used to be provided in mm/filemap.c by linux/blkdev.h until commit
f5ff8422 (Fix warnings with
!CONFIG_BLOCK).
Signed-off-by: NEmil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>

3a424f2d

23 10月, 2007 1 次提交

fix mprotect vma_wants_writenotify prot · 1ddd439e

由 Hugh Dickins 提交于 10月 22, 2007

Fix mprotect bug in recent commit 3ed75eb8
(setup vma->vm_page_prot by vm_get_page_prot()): the vma_wants_writenotify
case was setting the same prot as when not.

Nothing wrong with the use of protection_map[] in mmap_region(),
but use vm_get_page_prot() there too in the same ~VM_SHARED way.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Cc: Coly Li <coyli@suse.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1ddd439e

22 10月, 2007 2 次提交

exportfs: make struct export_operations const · 39655164

由 Christoph Hellwig 提交于 10月 21, 2007

Now that nfsd has stopped writing to the find_exported_dentry member we an
mark the export_operations const
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <linux-ext4@vger.kernel.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Chris Mason <mason@suse.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: "Vladimir V. Saveliev" <vs@namesys.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

39655164

shmem: new export ops · 480b116c

由 Christoph Hellwig 提交于 10月 21, 2007

I'm not sure what people were thinking when adding support to export tmpfs,
but here's the conversion anyway:
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Cc: Neil Brown <neilb@suse.de>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

480b116c

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功