Commits · 38510754a50192a072210e24fdc4ae65592182f0 · openanolis / cloud-kernel

02 Jul, 2008 1 commit

avr32: Use a quicklist for PTE allocation as well · 38510754

Authored by Haavard Skinnemoen on Jan 14, 2008

Using a quicklist to allocate PTEs might be slightly faster than using
the page allocator directly since we might avoid zeroing the page
after each allocation.
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>

28 Apr, 2008 1 commit

PAGEFLAGS_EXTENDED and separate page flags for Head and Tail · e20b8cca

Authored by Christoph Lameter on Apr 28, 2008

Having separate page flags for the head and the tail of a compound page allows
the compiler to use bitops instead of operations on a word to check for a tail
page.  That is f.e.  important for virt_to_head_page() which is used in
various critical code paths (kfree for example):

Code for PageTail(page)

Before:

 mov    (%rdi),%rdx		page->flags
 mov    %rdx,%rax		3 bytes
 and    $0x12000,%eax		5 bytes
 cmp    $0x12000,%rax		6 bytes
 je     897 <kfree+0xa7>

After:

 mov    (%rdi),%rax
 test   $0x40,%ah			(3 bytes)
 jne    887 <kfree+0x97>

So we go from 14 bytes to 3 bytes and from 3 instructions to one.  From the
use of 2 registers we go to none.
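
The two listings correspond roughly to the C checks below. This is a minimal sketch: the mask 0x12000 and the single bit 0x4000 (0x40 tested in %ah, which holds bits 8-15 of the register) are read off the disassembly above, while the macro and function names are hypothetical and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical names; the values are read off the disassembly above. */
#define TAIL_MASK_OVERLAID 0x12000UL /* two flag bits that together mean "tail" */
#define TAIL_BIT_DEDICATED 0x4000UL  /* 0x40 in %ah is bit 14 of the flags word */

/* Before: both bits of the mask must be set, so the word is masked and compared. */
static bool page_tail_overlaid(unsigned long flags)
{
    return (flags & TAIL_MASK_OVERLAID) == TAIL_MASK_OVERLAID;
}

/* After: one dedicated bit, which a compiler can check with a single test. */
static bool page_tail_dedicated(unsigned long flags)
{
    return (flags & TAIL_BIT_DEDICATED) != 0;
}
```

With overlaid flags, a word with only one of the two mask bits set is not a tail page, which is why the compare is needed in the first variant.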

We can only use page flags for this if we have page flags available.  This
patch introduces CONFIG_PAGEFLAGS_EXTENDED that is set if pageflags are not
scarce due to SPARSEMEM using page flags for its sectionid on 32 bit NUMA
platforms.

Additional page flag definitions can be added to the CONFIG_PAGEFLAGS_EXTENDED
section in page-flags.h if the functionality depends on PAGEFLAGS_EXTENDED or
if more page flag overlapping tricks are used for the !PAGEFLAGS_EXTENDED
fallback (the upcoming virtual compound patch may hook in here and Rik's/Lee's
additional page flags to solve the reclaim issues could also be added there
[hint...  hint...  where are these patchsets?]).

Avoiding the overlaying of PG_reclaim also clears the way for possible use of
compound pages for the pagecache or on the LRU.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

28 Jan, 2008 1 commit

sh: Bump number of quicklists for SH-5. · d5f68c6d

Authored by Paul Mundt on Nov 22, 2007

Sync up with the SH definitions.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

18 Dec, 2007 1 commit

sparsemem: make SPARSEMEM_VMEMMAP selectable · a5ee6daa

Authored by Geoff Levand on Dec 17, 2007

SPARSEMEM_VMEMMAP needs to be a selectable config option to support
building the kernel both with and without sparsemem vmemmap support.  This
selection is desirable for platforms which could be configured one way for
platform specific builds and the other for multi-platform builds.
Signed-off-by: Miguel Botón <mboton@gmail.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

20 Oct, 2007 1 commit

small documentation fixes · ad3d0a38

Authored by Philipp Marek on Oct 20, 2007

Signed-off-by: Adrian Bunk <bunk@kernel.org>

17 Oct, 2007 3 commits

xen: lock pte pages while pinning/unpinning · 74260714

Authored by Jeremy Fitzhardinge on Oct 16, 2007

When a pagetable is created, it is made globally visible in the rmap
prio tree before it is pinned via arch_dup_mmap(), and remains in the
rmap tree while it is unpinned with arch_exit_mmap().

This means that other CPUs may race with the pinning/unpinning
process, and see a pte between when it gets marked RO and actually
pinned, causing any pte updates to fail with write-protect faults.

As a result, all pte pages must be properly locked, and only unlocked
once the pinning/unpinning process has finished.

In order to avoid taking spinlocks for the whole pagetable - which may
overflow the PREEMPT_BITS portion of preempt counter - it locks and pins
each pte page individually, and then finally pins the whole pagetable.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Keir Fraser <keir@xensource.com>
Cc: Jan Beulich <jbeulich@novell.com>

memory unplug: page offline · 0c0e6195

Authored by KAMEZAWA Hiroyuki on Oct 16, 2007

Logic:
 - Set all pages in [start, end) to the isolated migration type.
   This makes all free pages in the range unavailable for allocation.
 - Migrate all LRU pages in the range.
 - Test whether the refcount of every page in the range is zero.

Todo:
 - Allocate migration destination pages from a better area.
 - Confirm that a page with page_count(page) == 0 && PageReserved(page)
   is safe to free (I don't like this kind of page, but...).
 - Find out which pages cannot be migrated.
 - Run more tests.
 - Use reclaim for unplugging other memory type areas.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

vmemmap: generify initialisation via helpers · 29c71111

Authored by Andy Whitcroft on Oct 16, 2007

Convert the common vmemmap population into initialisation helpers for use by
architecture vmemmap populators.  All architectures implementing the
SPARSEMEM_VMEMMAP variant supply an architecture specific vmemmap_populate()
initialiser, which may make use of the helpers.

This allows us to clean up and remove the initialisation Kconfig entries.
With this patch there is a single SPARSEMEM_VMEMMAP_ENABLE Kconfig option to
indicate use of that variant.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

07 Oct, 2007 1 commit

xen: disable split pte locks for now · 67dd5a25

Authored by Jeremy Fitzhardinge on Oct 05, 2007

When pinning and unpinning pagetables, we must protect them against
being used by other CPUs, lest they see the pagetable in an
intermediate read-only-but-not-pinned state.

When using split pte locks, doing this properly would require taking
all the pte locks for the pagetable while pinning, but this may overflow
the PREEMPT_BITS part of the preempt counter if the process has mapped
more than about 512M of memory.
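
The 512M figure can be reproduced with a little arithmetic. The sketch below assumes an 8-bit preempt count field, 4 KiB pages and 512 entries per pte page (as with PAE). None of these values are stated in the commit message; they are assumptions chosen because they match the estimate, and the names are hypothetical.

```c
#include <stdint.h>

/* Assumed values, not taken from the commit message. */
#define PREEMPT_BITS_ASSUMED  8
#define PAGE_SIZE_ASSUMED     4096ULL
#define PTES_PER_PAGE_ASSUMED 512ULL

/* Bytes of address space mapped by one pte page. */
static uint64_t bytes_per_pte_page(void)
{
    return PTES_PER_PAGE_ASSUMED * PAGE_SIZE_ASSUMED;
}

/* Number of distinct values the preempt count field can hold. */
static uint64_t preempt_count_values(void)
{
    return 1ULL << PREEMPT_BITS_ASSUMED;
}

/* Mapped size at which holding one lock per pte page wraps the field. */
static uint64_t overflow_threshold_bytes(void)
{
    return preempt_count_values() * bytes_per_pte_page();
}
```

Under these assumptions each pte page covers 2 MiB, so 256 simultaneously held pte locks correspond to 512 MiB of mapped memory.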

However, failing to take the pte locks causes write-protect faults when
the pageout code is trying to clear the Access bit on a pte which is part
of a freshly created and still being pinned process after fork.

This is a short-term fix until the problem is solved properly.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Keir Fraser <keir@xensource.com>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

30 Jul, 2007 1 commit

Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION · b0cb1a19

Authored by Rafael J. Wysocki on Jul 29, 2007

Replace CONFIG_SOFTWARE_SUSPEND with CONFIG_HIBERNATION to avoid
confusion (among other things, with CONFIG_SUSPEND introduced in the
next patch).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

18 Jul, 2007 1 commit

CONFIG_BOUNCE to avoid useless inclusion of bounce buffer logic · 2a7326b5

Authored by Christoph Lameter on Jul 17, 2007

The bounce buffer logic is included on systems that do not need it.  If a
system does not have zones like ZONE_DMA and ZONE_HIGHMEM that can lead to
the use of bounce buffers then there is no need to reserve memory pools etc
etc.  This is true f.e.  for SGI Altix.

Also nicifies the Makefile and gets rid of the tricky "and" there.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

17 Jul, 2007 1 commit

Introduce CONFIG_VIRT_TO_BUS · f057eac0

Authored by Stephen Rothwell on Jul 15, 2007

Make some offending drivers depend on it and set CONFIG_ARCH_NO_VIRT_TO_BUS
for ppc64 so that we don't build those drivers.

This gets PowerPC allmodconfig and allyesconfig much closer to building.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Al Viro <viro@ftp.linux.org.uk>
Acked-by: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

08 Jun, 2007 1 commit

sh: memory hot-add for sparsemem users support. · 33d63bd8

Authored by Paul Mundt on Jun 07, 2007

This enables simple hotplug support for sparsemem users. Presently
this only permits memory being added in to node 0 on ZONE_NORMAL.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

14 May, 2007 1 commit

sh64: generic quicklist support. · 6c645ac7

Authored by Paul Mundt on May 14, 2007

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

09 May, 2007 1 commit

sh: generic quicklist support. · 5f8c9908

Authored by Paul Mundt on May 08, 2007

This moves SH over to the generic quicklists. As per x86_64,
we have special mappings for the PGDs, so these go on their
own list.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

08 May, 2007 1 commit

Quicklists for page table pages · 6225e937

Authored by Christoph Lameter on May 06, 2007

On x86_64 this cuts allocation overhead for page table pages down to a
fraction (kernel compile / editing load.  TSC based measurement of time spent
in each function):

no quicklist

pte_alloc               1569048 4.3s(401ns/2.7us/179.7us)
pmd_alloc                780988 2.1s(337ns/2.7us/86.1us)
pud_alloc                780072 2.2s(424ns/2.8us/300.6us)
pgd_alloc                260022 1s(920ns/4us/263.1us)

quicklist:

pte_alloc                452436 573.4ms(8ns/1.3us/121.1us)
pmd_alloc                196204 174.5ms(7ns/889ns/46.1us)
pud_alloc                195688 172.4ms(7ns/881ns/151.3us)
pgd_alloc                 65228 9.8ms(8ns/150ns/6.1us)

pgd allocations are the most complex and there we see the most dramatic
improvement (maybe we can cut down the amount of pgds cached somewhat?).  But
even the pte allocations still see a doubling of performance.

1. Proven code from the IA64 arch.

	The method used here has been fine tuned for years and
	is NUMA aware. It is based on the knowledge that accesses
	to page table pages are sparse in nature. Taking a page
	off the freelists instead of allocating a zeroed page
	allows a reduction of number of cachelines touched
	in addition to getting rid of the slab overhead. So
	performance improves. This is particularly useful if pgds
	contain standard mappings. We can save on the teardown
	and setup of such a page if we have some on the quicklists.
	This includes avoiding list operations that are otherwise
	necessary on alloc and free to track pgds.

2. Light weight alternative to use slab to manage page size pages

	Slab overhead is significant and even page allocator use
	is pretty heavy weight. The use of a per cpu quicklist
	means that we touch only two cachelines for an allocation.
	There is no need to access the page_struct (unless arch code
	needs to fiddle around with it). So the fast path just
	means bringing in one cacheline at the beginning of the
	page. That same cacheline may then be used to store the
	page table entry. Or a second cacheline may be used
	if the page table entry is not in the first cacheline of
	the page. The current code will zero the page which means
	touching 32 cachelines (assuming 128 byte). We get down
	from 32 to 2 cachelines in the fast path.

3. x86_64 gets lightweight page table page management.

	This will allow x86_64 arch code to faster repopulate pgds
	and other page table entries. The list operations for pgds
	are reduced in the same way as for i386 to the point where
	a pgd is allocated from the page allocator and when it is
	freed back to the page allocator. A pgd can pass through
	the quicklists without having to be reinitialized.

4. Consolidation of code from multiple arches

	So far arches have their own implementation of quicklist
	management. This patch moves that feature into the core allowing
	an easier maintenance and consistent management of quicklists.

Page table pages have the characteristics that they are typically zero or in a
known state when they are freed.  This is usually exactly the same state as
needed after allocation.  So it makes sense to build a list of freed page
table pages and then consume the pages already in use first.  Those pages have
already been initialized correctly (thus no need to zero them) and are likely
already cached in such a way that the MMU can use them most effectively.  Page
table pages are used in a sparse way so zeroing them on allocation is not too
useful.

Such an implementation already exists for ia64.  However, that implementation
did not support constructors and destructors as needed by i386 / x86_64.  It
also only supported a single quicklist.  The implementation here has
constructor and destructor support as well as the ability for an arch to
specify how many quicklists are needed.

Quicklists are defined by an arch defining CONFIG_QUICKLIST.  If more than one
quicklist is necessary then we can define NR_QUICK for additional lists.  F.e.
 i386 needs two and thus has

config NR_QUICK
	int
	default 2

If an arch has requested quicklist support then pages can be allocated
from the quicklist (or from the page allocator if the quicklist is
empty) via:

quicklist_alloc(<quicklist-nr>, <gfpflags>, <constructor>)

Page table pages can be freed using:

quicklist_free(<quicklist-nr>, <destructor>, <page>)

Pages must have a definite state after allocation and before
they are freed. If no constructor is specified then pages
will be zeroed on allocation and must be zeroed before they are
freed.

If a constructor is used then the constructor will establish
a definite page state. F.e. the i386 and x86_64 pgd constructors
establish certain mappings.

Constructors and destructors can also be used to track the pages.
i386 and x86_64 use a list of pgds in order to be able to dynamically
update standard mappings.
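
The core idea described above can be sketched in user space as follows. This is a simplified illustration under stated assumptions, not the kernel implementation: it keeps a single list with no per-CPU handling and no constructor or destructor support, uses calloc() in place of the page allocator, and the names ql_alloc and ql_free are hypothetical.

```c
#include <stdlib.h>

#define QL_PAGE_SIZE 4096 /* assumed page size for the sketch */

/* A quicklist: freed pages chained through their own first word. */
struct quicklist {
    void *head;             /* most recently freed page, or NULL */
    unsigned long nr_pages; /* number of pages currently cached */
};

/* Pop a cached page if one exists, else fall back to a zeroed allocation. */
static void *ql_alloc(struct quicklist *q)
{
    void **page = q->head;

    if (page) {
        q->head = page[0]; /* next page in the chain */
        page[0] = NULL;    /* restore the all-zero state of the page */
        q->nr_pages--;
        return page;
    }
    return calloc(1, QL_PAGE_SIZE);
}

/* Push a page that the caller has already returned to the zeroed state. */
static void ql_free(struct quicklist *q, void *page)
{
    *(void **)page = q->head; /* only the first word is touched */
    q->head = page;
    q->nr_pages++;
}
```

Only the first word of a cached page is written on free and on reuse, which is the reason the fast path touches so few cachelines compared with zeroing a whole page.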
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Feb, 2007 3 commits

[PATCH] Set CONFIG_ZONE_DMA for arches with GENERIC_ISA_DMA · 5ac6da66

Authored by Christoph Lameter on Feb 10, 2007

As Andi pointed out: CONFIG_GENERIC_ISA_DMA only disables the ISA DMA
channel management. Other functionality may still expect GFP_DMA to
provide memory below 16M. So we need to make sure that CONFIG_ZONE_DMA is
set independently of CONFIG_GENERIC_ISA_DMA. Undo the modifications to
mm/Kconfig where we made ZONE_DMA dependent on GENERIC_ISA_DMA and set
these explicitly in each arch's Kconfig.

Reviews must occur for each arch in order to determine if ZONE_DMA can be
switched off. It can only be switched off if we know that all devices
supported by a platform are capable of performing DMA transfers to all of
memory (Some arches already support this: uml, avr32, sh, sh64, parisc and
IA64/Altix).

In order to switch ZONE_DMA off conditionally, one would have to establish
a scheme by which one can assure that no drivers are enabled that are only
capable of doing I/O to a part of memory, or one needs to provide an
alternate means of performing an allocation from a specific range of memory
(like provided by alloc_pages_range()) and ensure that all drivers use that
call. In that case the arch's alloc_dma_coherent() may need to be modified
to call alloc_pages_range() instead of relying on GFP_DMA.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[PATCH] optional ZONE_DMA: optional ZONE_DMA in the VM · 4b51d669

Authored by Christoph Lameter on Feb 10, 2007

Make ZONE_DMA optional in core code.

- ifdef all code for ZONE_DMA and related definitions following the example
  for ZONE_DMA32 and ZONE_HIGHMEM.

- Without ZONE_DMA, ZONE_HIGHMEM and ZONE_DMA32 we get to a ZONES_SHIFT of
  0.

- Modify the VM statistics to work correctly without a DMA zone.

- Modify slab to not create DMA slabs if there is no ZONE_DMA.
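
The ZONES_SHIFT remark can be illustrated with a small helper that computes how many bits are needed to encode a zone index. This is a sketch only: the kernel derives the value at compile time, and the function name zones_shift is hypothetical.

```c
/* Smallest number of bits that can encode nr_zones distinct zone indices. */
static unsigned int zones_shift(unsigned int nr_zones)
{
    unsigned int bits = 0;

    while ((1u << bits) < nr_zones)
        bits++;
    return bits;
}
```

With only one zone left, no bits are needed for the zone index, which matches the ZONES_SHIFT of 0 mentioned in the list above.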

[akpm@osdl.org: cleanup]
[jdike@addtoit.com: build fix]
[apw@shadowen.org: Simplify calculation of the number of bits we need for ZONES_SHIFT]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[PATCH] optional ZONE_DMA: introduce CONFIG_ZONE_DMA · 66701b14

Authored by Christoph Lameter on Feb 10, 2007

This patch simply defines CONFIG_ZONE_DMA for all arches.  We later do special
things with CONFIG_ZONE_DMA after the VM and an arch are prepared to work
without ZONE_DMA.

CONFIG_ZONE_DMA can be defined in two ways depending on how an architecture
handles ISA DMA.

First if CONFIG_GENERIC_ISA_DMA is set by the arch then we know that the arch
needs ZONE_DMA because ISA DMA devices are supported.  We can catch this in
mm/Kconfig and do not need to modify arch code.

Second, arches may use ZONE_DMA in an unknown way.  We set CONFIG_ZONE_DMA for
all arches that do not set CONFIG_GENERIC_ISA_DMA in order to ensure backwards
compatibility.  The arches may later undefine ZONE_DMA if their arch code has
been verified to not depend on ZONE_DMA.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Matthew Wilcox <willy@debian.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

04 Oct, 2006 2 commits

Fix "can not" in Documentation and Kconfig · 84eb8d06

Authored by Matt LaPlante on Oct 03, 2006

Randy brought it to my attention that in proper english "can not" should always
be written "cannot". I donot see any reason to argue, even if I mightnot
understand why this rule exists.  This patch fixes "can not" in several
Documentation files as well as three Kconfigs.
Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

more misc typo fixes · 44c09201

Authored by Matt LaPlante on Oct 03, 2006

Signed-off-by: Adrian Bunk <bunk@stusta.de>

01 Oct, 2006 1 commit

[PATCH] hot-add-mem x86_64: Kconfig changes · ec69acbb

Authored by Keith Mannthey on Sep 30, 2006

Create Kconfig namespace for MEMORY_HOTPLUG_RESERVE and MEMORY_HOTPLUG_SPARSE.
This is needed to create a distinction between the 2 paths.  Selecting the
high level option of MEMORY_HOTPLUG will get you MEMORY_HOTPLUG_SPARSE if you
have sparsemem enabled or MEMORY_HOTPLUG_RESERVE if you are x86_64 with
discontig and ACPI numa support.
Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

30 Jun, 2006 1 commit

[PATCH] solve config broken: undefined reference to `online_page' · cc57637b

Authored by Yasunori Goto on Jun 29, 2006

Memory hotplug code of i386 adds memory only to highmem.  So, if
CONFIG_HIGHMEM is not set, CONFIG_MEMORY_HOTPLUG shouldn't be set.
Otherwise, it causes a compile error.

In addition, many architectures can't use the memory hotplug feature yet.  So, I
introduce CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

28 Jun, 2006 2 commits

[PATCH] sparc64: support sparsemem and !memory hotplug · 1f04bbd2

Authored by Yasunori Goto on Jun 27, 2006

Fix "undefined reference to `arch_add_memory'" on sparc64 allmodconfig.

sparc64 doesn't support memory hotplug.  But we want it to support
sparsemem.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] 64bit Resource: finally enable 64bit resource sizes · 6550e07f

Authored by Greg Kroah-Hartman on Jun 12, 2006

Introduce the Kconfig entry and actually switch to a 64bit value, if
wanted, for resource_size_t.

Based on a patch series originally from Vivek Goyal <vgoyal@in.ibm.com>

Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

23 Jun, 2006 1 commit

[PATCH] Swapless page migration: modify core logic · 6c5240ae

Authored by Christoph Lameter on Jun 23, 2006

Use the migration entries for page migration

This modifies the migration code to use the new migration entries.  It now
becomes possible to migrate anonymous pages without having to add a swap
entry.

We add a couple of new functions to replace migration entries with the proper
ptes.

We cannot take the tree_lock for migrating anonymous pages anymore.  However,
we know that we hold the only remaining reference to the page when the page
count reaches 1.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

26 Mar, 2006 1 commit

[PATCH] mm: make page migration dependent on swap and NUMA · d784124c

Authored by Christoph Lameter on Mar 25, 2006

The page migration code could function without NUMA but we currently have
no users for the non-NUMA case.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

22 Mar, 2006 1 commit

[PATCH] page migration reorg · b20a3503

Authored by Christoph Lameter on Mar 22, 2006

Centralize the page migration functions in anticipation of additional
tinkering.  Creates a new file mm/migrate.c

1. Extract buffer_migrate_page() from fs/buffer.c

2. Extract central migration code from vmscan.c

3. Extract some components from mempolicy.c

4. Export pageout() and remove_from_swap() from vmscan.c

5. Make it possible to configure NUMA systems without page migration
   and non-NUMA systems with page migration.

I had to do some #ifdeffing in mempolicy.c that may need a cleanup.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

09 Jan, 2006 1 commit

[PATCH] Swap Migration V5: Add CONFIG_MIGRATION for page migration support · 7cbe34cf

Authored by Christoph Lameter on Jan 08, 2006

Include page migration if the system is NUMA or has a memory model that
allows distinct areas of memory (SPARSEMEM, DISCONTIGMEM).

And:
- Only include lru_add_drain_per_cpu if building for an SMP system.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

07 Jan, 2006 1 commit

[PATCH] allow flatmem to be disabled when only sparsemem is implemented · c898ec16

Authored by Anton Blanchard on Jan 06, 2006

On architectures that implement sparsemem but not discontigmem we want to
be able to hide the flatmem option in some cases.  On ppc64 for example,
when we select NUMA we must not select flatmem.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

24 Nov, 2005 1 commit

[PATCH] mm: update split ptlock Kconfig · 7b6ac9df

Authored by Hugh Dickins on Nov 23, 2005

Closer attention to the arithmetic shows that neither ppc64 nor sparc really
uses one page for multiple page tables: how on earth could they, while
pte_alloc_one returns just a struct page pointer, with no offset?

Well, arm26 manages it by returning a pte_t pointer cast to a struct page
pointer, harumph, then compensating in its pmd_populate. But arm26 is never
SMP, so it's not a problem for split ptlock either.

And the PA-RISC situation has been recently improved: CONFIG_PA20 works
without the 16-byte alignment which inflated its spinlock_t. But the current
union of spinlock_t with private does make the 7xxx struct page significantly
larger, even without debug, so disable its split ptlock.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

07 Nov, 2005 1 commit

[PATCH] Suppress split ptlock on arches which may use one page for multiple page tables · 2d4b95f0

Authored by Hugh Dickins on Nov 07, 2005

Suppress split ptlock on arches which may use one page for multiple page
tables.  Reconsider what better to do (particularly on ppc64) later on.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

30 Oct, 2005 2 commits

[PATCH] memory hotplug: sysfs and add/remove functions · 3947be19

Authored by Dave Hansen on Oct 29, 2005

This adds generic memory add/remove and supporting functions for memory
hotplug into a new file as well as a memory hotplug kernel config option.

Individual architecture patches will follow.

For now, disable memory hotplug when swsusp is enabled.  There's a lot of
churn there right now.  We'll fix it up properly once it calms down.
Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] mm: split page table lock · 4c21e2f2

Authored by Hugh Dickins on Oct 29, 2005

Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.

This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)

In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.

Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.
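
The compile-time selection described above might look roughly like this. It is a sketch under stated assumptions: the lock type is a placeholder integer rather than a spinlock, the struct and macro names other than NR_CPUS are hypothetical, and the defaults (8 CPUs, threshold 4) are chosen only so that the example selects the split variant.

```c
#include <stddef.h>

/* Assumed defaults for illustration; a real build would set these elsewhere. */
#ifndef NR_CPUS
#define NR_CPUS 8
#endif
#ifndef SPLIT_PTLOCK_CPUS
#define SPLIT_PTLOCK_CPUS 4
#endif

/* The preprocessor does the comparison that the config language cannot. */
#if NR_CPUS >= SPLIT_PTLOCK_CPUS
#define USE_SPLIT_PTLOCK 1
#else
#define USE_SPLIT_PTLOCK 0
#endif

typedef int lock_t; /* placeholder standing in for a spinlock */

struct pt_page { lock_t ptl; };         /* per page-table-page state */
struct mm { lock_t page_table_lock; };  /* per address-space state */

/* Return the lock that guards the entries in the given page table page. */
static lock_t *pte_lockptr(struct mm *mm, struct pt_page *ptp)
{
#if USE_SPLIT_PTLOCK
    (void)mm;
    return &ptp->ptl;             /* one lock per page table page */
#else
    (void)ptp;
    return &mm->page_table_lock;  /* single lock for the whole mm */
#endif
}
```

With the split variant, two threads faulting in different page table pages of the same mm obtain different locks and no longer contend with each other.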

There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

18 Sep, 2005 1 commit

[PATCH] fix mm/Kconfig spelling · f3519f91

Authored by Dave Hansen on Sep 16, 2005

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

05 Sep, 2005 2 commits

[PATCH] sparsemem extreme implementation · 3e347261

Authored by Bob Picco on Sep 03, 2005

With cleanups from Dave Hansen <haveblue@us.ibm.com>

SPARSEMEM_EXTREME makes mem_section a one dimensional array of pointers to
mem_sections.  This two level layout scheme is able to achieve smaller
memory requirements for SPARSEMEM with the tradeoff of an additional shift
and load when fetching the memory section.  The current SPARSEMEM
implementation is a one dimensional array of mem_sections which is the
default SPARSEMEM configuration.  The patch attempts to isolate the
implementation details of the physical layout of the sparsemem section
array.
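
A two-level layout of this kind can be sketched as follows. It is an illustration under stated assumptions: the sizes are made up, calloc() stands in for the boot-time allocator, and the helper names nr_to_section and section_root_populate are hypothetical.

```c
#include <stdlib.h>

/* Assumed sizes for the sketch. */
#define SECTIONS_PER_ROOT 256UL
#define NR_SECTION_ROOTS  64UL

struct mem_section {
    unsigned long section_mem_map; /* per-section data, unused in this sketch */
};

/* First level: one pointer per group of SECTIONS_PER_ROOT sections. */
static struct mem_section *mem_section_roots[NR_SECTION_ROOTS];

/* Lookup costs one extra index computation and one extra load. */
static struct mem_section *nr_to_section(unsigned long nr)
{
    unsigned long root = nr / SECTIONS_PER_ROOT;

    if (root >= NR_SECTION_ROOTS || !mem_section_roots[root])
        return NULL; /* out of range, or nothing present in this group */
    return &mem_section_roots[root][nr % SECTIONS_PER_ROOT];
}

/* Allocate the second-level array covering section nr on first use. */
static int section_root_populate(unsigned long nr)
{
    unsigned long root = nr / SECTIONS_PER_ROOT;

    if (root >= NR_SECTION_ROOTS)
        return -1;
    if (!mem_section_roots[root]) {
        mem_section_roots[root] =
            calloc(SECTIONS_PER_ROOT, sizeof(struct mem_section));
        if (!mem_section_roots[root])
            return -1;
    }
    return 0;
}
```

Memory is spent only on groups that actually contain present sections, which is where the saving for very sparse physical address spaces comes from.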

SPARSEMEM_EXTREME requires bootmem to be functioning at the time of
memory_present() calls.  This is not always feasible, so architectures
which do not need it may allocate everything statically by using
SPARSEMEM_STATIC.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] SPARSEMEM EXTREME · 802f192e

Authored by Bob Picco on Sep 03, 2005

A new option for SPARSEMEM is ARCH_SPARSEMEM_EXTREME.  Architecture
platforms with a very sparse physical address space would likely want to
select this option.  For those architecture platforms that don't select the
option, the code generated is equivalent to SPARSEMEM currently in -mm.
I'll be posting a patch on ia64 ml which uses this new SPARSEMEM feature.

ARCH_SPARSEMEM_EXTREME makes mem_section a one dimensional array of
pointers to mem_sections.  This two level layout scheme is able to achieve
smaller memory requirements for SPARSEMEM with the tradeoff of an
additional shift and load when fetching the memory section.  The current
SPARSEMEM -mm implementation is a one dimensional array of mem_sections
which is the default SPARSEMEM configuration.  The patch attempts to isolate
the implementation details of the physical layout of the sparsemem section
array.

ARCH_SPARSEMEM_EXTREME depends on 64BIT and is by default boolean false.

I've boot tested under aim load ia64 configured for ARCH_SPARSEMEM_EXTREME.
 I've also boot tested a 4 way Opteron machine with !ARCH_SPARSEMEM_EXTREME
and tested with aim.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

24 Jun, 2005 3 commits

[PATCH] sparsemem memory model · d41dee36

Authored by Andy Whitcroft on Jun 23, 2005

Sparsemem abstracts the use of discontiguous mem_maps[].  This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems.  Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.

A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA.  When producing this patch, it became apparent that NUMA
and DISCONTIG are often confused.

Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous.  It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.

Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory.  This is what allows the mem_map[]
to be chopped up.
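
A per-section translation of this kind can be sketched as below. This is a simplified illustration, not the kernel's code: the section size and count are made up, the function name sparse_pfn_to_page is hypothetical, and each section simply points at its own chunk of struct page entries indexed by the offset within the section.

```c
#include <stddef.h>

/* Assumed sizes for the sketch: 16 pages per section, 8 sections. */
#define PFN_SECTION_SHIFT 4
#define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
#define NR_MEM_SECTIONS   8UL

struct page {
    unsigned long flags;
};

/* One mem_map chunk per section; NULL marks a hole with no memory. */
static struct page *section_mem_map[NR_MEM_SECTIONS];

/* Translate a page frame number via the section it falls into. */
static struct page *sparse_pfn_to_page(unsigned long pfn)
{
    unsigned long nr = pfn >> PFN_SECTION_SHIFT;

    if (nr >= NR_MEM_SECTIONS || !section_mem_map[nr])
        return NULL; /* no memory present at this pfn */
    return &section_mem_map[nr][pfn & (PAGES_PER_SECTION - 1)];
}
```

Because each section has its own chunk, the chunks do not need to be contiguous with each other, and a section can be populated or left empty independently of its neighbours.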

In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags.  Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations.  However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions.  Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.

One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags.  It might provide
speed increases on certain platforms and will be stored there if there is
room.  But, if out of room, an alternate (theoretically slower) mechanism is
used.

This patch introduces CONFIG_FLATMEM.  It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] generify memory present · af705362

Authored by Andy Whitcroft on Jun 23, 2005

Allow architectures to indicate that they will be providing hooks to indicate
installed memory areas, memory_present().  Provide prototypes for the i386
implementation.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] mm/Kconfig: give DISCONTIG more help text · 785dcd44

Authored by Dave Hansen on Jun 23, 2005

This gives DISCONTIGMEM a bit more help text to explain what it does, not just
when to choose it.
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

openanolis / cloud-kernel: synced successfully about 1 year ago