提交 · e965f9630c651fa4249039fd4b80c9392d07a856 · openeuler / raspberrypi-kernel

02 2月, 2006 12 次提交

[PATCH] Direct Migration V9: Avoid writeback / page_migrate() method · e965f963

由 Christoph Lameter 提交于 2月 01, 2006

Migrate a page with buffers without requiring writeback

This introduces a new address space operation migratepage() that may be used
by a filesystem to implement its own version of page migration.

A version is provided that migrates buffers attached to pages.  Some
filesystems (ext2, ext3, xfs) are modified to utilize this feature.

The swapper address space operation are modified so that a regular
migrate_page() will occur for anonymous pages without writeback (migrate_pages
forces every anonymous page to have a swap entry).
Signed-off-by: NMike Kravetz <kravetz@us.ibm.com>
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e965f963

[PATCH] Direct Migration V9: remove_from_swap() to remove swap ptes · a3351e52

由 Christoph Lameter 提交于 2月 01, 2006

Add remove_from_swap

remove_from_swap() allows the restoration of the pte entries that existed
before page migration occurred for anonymous pages by walking the reverse
maps.  This reduces swap use and establishes regular pte's without the need
for page faults.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a3351e52

[PATCH] Direct Migration V9: migrate_pages() extension · a48d07af

由 Christoph Lameter 提交于 2月 01, 2006

Add direct migration support with fall back to swap.

Direct migration support on top of the swap based page migration facility.

This allows the direct migration of anonymous pages and the migration of file
backed pages by dropping the associated buffers (requires writeout).

Fall back to swap out if necessary.

The patch is based on lots of patches from the hotplug project but the code
was restructured, documented and simplified as much as possible.

Note that an additional patch that defines the migrate_page() method for
filesystems is necessary in order to avoid writeback for anonymous and file
backed pages.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NMike Kravetz <kravetz@us.ibm.com>
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a48d07af

[PATCH] Reclaim slab during zone reclaim · 2a16e3f4

由 Christoph Lameter 提交于 2月 01, 2006

If large amounts of zone memory are used by empty slabs then zone_reclaim
becomes uneffective.  This patch shakes the slab a bit.

The problem with this patch is that the slab reclaim is not containable to a
zone.  Thus slab reclaim may affect the whole system and be extremely slow.
This also means that we cannot determine how many pages were freed in this
zone.  Thus we need to go off node for at least one allocation.

The functionality is disabled by default.

We could modify the shrinkers to take a zone parameter but that would be quite
invasive.  Better ideas are welcome.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

2a16e3f4

[PATCH] Zone reclaim: Allow modification of zone reclaim behavior · 1b2ffb78

由 Christoph Lameter 提交于 2月 01, 2006

In some situations one may want zone_reclaim to behave differently. For
example a process writing large amounts of memory will spew unto other nodes
to cache the writes if many pages in a zone become dirty. This may impact the
performance of processes running on other nodes.

Allowing writes during reclaim puts a stop to that behavior and throttles the
process by restricting the pages to the local zone.

Similarly one may want to contain processes to local memory by enabling
regular swap behavior during zone_reclaim. Off node memory allocation can
then be controlled through memory policies and cpusets.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

1b2ffb78

[PATCH] zone_reclaim: configurable off node allocation period. · 2a11ff06

由 Christoph Lameter 提交于 2月 01, 2006

Currently the zone_reclaim code has a fixed window of 30 seconds of off node
allocations should a local zone have no unused pagecache pages left. Reclaim
will be attempted again after this timeout period to avoid repeated useless
scans for memory. This is also useful to established sufficiently large off
node allocation chunks to relieve the local node.

It may be beneficial to adjust that time period for some special situations.
For example if memory use was exceeding node capacity one may want to give up
for longer periods of time. If memory spikes intermittendly then one may want
to shorten the time period to reduce the number of off node allocations.

This patch allows just that....
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

2a11ff06

[PATCH] zone_reclaim: partial scans instead of full scan · a92f7126

由 Christoph Lameter 提交于 2月 01, 2006

Instead of scanning all the pages in a zone, imitate real swap and scan
only a portion of the pages and gradually scan more if we do not free up
enough pages.  This avoids a zone suddenly loosing all unused pagecache
pages (we may after all access some of these again so they deserve another
chance) but it still frees up large chunks of memory if a zone only
contains unused pagecache pages.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a92f7126

[PATCH] zone_reclaim: do not unmap file backed pages · aa3f18b3

由 Christoph Lameter 提交于 2月 01, 2006

zone_reclaim should leave that to the real swapper.  We are only interested
in evicting unmapped pages.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

aa3f18b3

[PATCH] zone_reclaim: minor fixes · c84db23c

由 Christoph Lameter 提交于 2月 01, 2006

- If we only reclaim nr_pages then its okay to stay on node.
  Switch from > to >= for the comparison.

- vm_table[] entry for zone_reclaim_mode is a bit screwed up.

- Add empty lines around shrink_zone to show that this is the
  central function to be called.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c84db23c

[PATCH] mm: improve function of sc->may_writepage · 52a8363e

由 Christoph Lameter 提交于 2月 01, 2006

Make sc->may_writepage control the writeout behavior of shrink_list.

Remove the laptop_mode trick from shrink_list and instead set may_writepage
in try_to_free_pages properly.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

52a8363e

[PATCH] zone_reclaim: reclaim on memory only node support · 42c722d4

由 Christoph Lameter 提交于 2月 01, 2006

Zone reclaim is usually only run on the local node.  Headless nodes do not
have any local processors.  This patch checks for headless nodes and
performs zone reclaim on them.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

42c722d4

[PATCH] Optimize off-node performance of zone reclaim · 89288623

由 Christoph Lameter 提交于 2月 01, 2006

Ensure that the performance of off node pages stays the same as before.
Off node pagefault tests showed an 18% drop in performance without this
patch.

- Increase the timeout to 30 seconds to reduce the overhead.

- Move all code possible out of the off node hot path for zone reclaim
  (Sorry Andrew, the struct initialization had to be sacrificed).
  The read_page_state() bit us there.

- Check first for the timeout before any other checks.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

89288623

19 1月, 2006 3 次提交

[PATCH] Zone reclaim: Reclaim logic · 9eeff239

由 Christoph Lameter 提交于 1月 18, 2006

Some bits for zone reclaim exists in 2.6.15 but they are not usable. This
patch fixes them up, removes unused code and makes zone reclaim usable.

Zone reclaim allows the reclaiming of pages from a zone if the number of
free pages falls below the watermarks even if other zones still have enough
pages available. Zone reclaim is of particular importance for NUMA
machines. It can be more beneficial to reclaim a page than taking the
performance penalties that come with allocating a page on a remote zone.

Zone reclaim is enabled if the maximum distance to another node is higher
than RECLAIM_DISTANCE, which may be defined by an arch. By default
RECLAIM_DISTANCE is 20. 20 is the distance to another node in the same
component (enclosure or motherboard) on IA64. The meaning of the NUMA
distance information seems to vary by arch.

If zone reclaim is not successful then no further reclaim attempts will
occur for a certain time period (ZONE_RECLAIM_INTERVAL).

This patch was discussed before. See

http://marc.theaimsgroup.com/?l=linux-kernel&m=113519961504207&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113408418232531&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113389027420032&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113380938612205&w=2Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

9eeff239

[PATCH] Zone reclaim: resurrect may_swap · f1fd1067

由 Christoph Lameter 提交于 1月 18, 2006

Zone reclaim has a huge impact on NUMA performance (f.e.  our maximum
throughput with XFS is raised from 4GB to 6GB/sec / page cache contamination
of numa nodes destroys locality if one just does a large copy operation which
results in performance dropping for good until reboot).

This patch:

Resurrect may_swap in struct scan_control
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f1fd1067

[PATCH] mm: migration page refcounting fix · 053837fc

由 Nick Piggin 提交于 1月 18, 2006

Migration code currently does not take a reference to target page
properly, so between unlocking the pte and trying to take a new
reference to the page with isolate_lru_page, anything could happen to
it.

Fix this by holding the pte lock until we get a chance to elevate the
refcount.

Other small cleanups while we're here.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

053837fc

09 1月, 2006 10 次提交

[PATCH] SwapMig: Switch error handling in migrate_pages to use -Exx · d0d96328

由 Christoph Lameter 提交于 1月 08, 2006

Use -Exxx instead of numeric return codes and cleanup the code in
migrate_pages() using -Exx error codes.

Consolidate successful migration handling
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d0d96328

[PATCH] SwapMig: Extend parameters for migrate_pages() · d4984711

由 Christoph Lameter 提交于 1月 08, 2006

Extend the parameters of migrate_pages() to allow the caller control over the
fate of successfully migrated or impossible to migrate pages.

Swap migration and direct migration will have the same interface after this
patch so that patches can be independently applied to the policy layer and the
core migration code.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d4984711

[PATCH] SwapMig: Drop unused pages immediately · ee27497d

由 Christoph Lameter 提交于 1月 08, 2006

Drop unused pages immediately

If a page is encountered that is only referenced by the migration code then
there is no reason to swap or migrate the page.  Release the page by calling
move_to_lru().
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

ee27497d

[PATCH] SwapMig: add_to_swap() avoid atomic allocations · 1480a540

由 Christoph Lameter 提交于 1月 08, 2006

Add gfp_mask to add_to_swap

add_to_swap does allocations with GFP_ATOMIC in order not to interfere with
swapping.  During migration we may have use add_to_swap extensively which may
lead to out of memory errors.

This patch makes add_to_swap take a parameter that specifies the gfp mask.
The page migration code can then make add_to_swap use GFP_KERNEL.
Signed-off-by: NHirokazu Takahashi <taka@valinux.co.jp>
Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

1480a540

[PATCH] SwapMig: CONFIG_MIGRATION fixes · 8419c318

由 Christoph Lameter 提交于 1月 08, 2006

Move move_to_lru, putback_lru_pages and isolate_lru in section surrounded by
CONFIG_MIGRATION saving some codesize for single processor kernels.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8419c318

[PATCH] Swap Migration V5: Add CONFIG_MIGRATION for page migration support · 7cbe34cf

由 Christoph Lameter 提交于 1月 08, 2006

Include page migration if the system is NUMA or having a memory model that
allows distinct areas of memory (SPARSEMEM, DISCONTIGMEM).

And:
- Only include lru_add_drain_per_cpu if building for an SMP system.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7cbe34cf

[PATCH] Swap Migration V5: migrate_pages() function · 49d2e9cc

由 Christoph Lameter 提交于 1月 08, 2006

This adds the basic page migration function with a minimal implementation that
only allows the eviction of pages to swap space.

Page eviction and migration may be useful to migrate pages, to suspend
programs or for remapping single pages (useful for faulty pages or pages with
soft ECC failures)

The process is as follows:

The function wanting to migrate pages must first build a list of pages to be
migrated or evicted and take them off the lru lists via isolate_lru_page().
isolate_lru_page determines that a page is freeable based on the LRU bit set.

Then the actual migration or swapout can happen by calling migrate_pages().

migrate_pages does its best to migrate or swapout the pages and does multiple
passes over the list.  Some pages may only be swappable if they are not dirty.
 migrate_pages may start writing out dirty pages in the initial passes over
the pages.  However, migrate_pages may not be able to migrate or evict all
pages for a variety of reasons.

The remaining pages may be returned to the LRU lists using putback_lru_pages().

Changelog V4->V5:
- Use the lru caches to return pages to the LRU

Changelog V3->V4:
- Restructure code so that applying patches to support full migration does
  require minimal changes. Rename swapout_pages() to migrate_pages().

Changelog V2->V3:
- Extract common code from shrink_list() and swapout_pages()
Signed-off-by: NMike Kravetz <kravetz@us.ibm.com>
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Cc: "Michael Kerrisk" <mtk-manpages@gmx.net>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

49d2e9cc

[PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swap · 930d9152

由 Christoph Lameter 提交于 1月 08, 2006

Add PF_SWAPWRITE to control a processes permission to write to swap.

- Use PF_SWAPWRITE in may_write_to_queue() instead of checking for kswapd
  and pdflush

- Set PF_SWAPWRITE flag for kswapd and pdflush
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

930d9152

[PATCH] Swap Migration V5: LRU operations · 21eac81f

由 Christoph Lameter 提交于 1月 08, 2006

This is the start of the `swap migration' patch series.

Swap migration allows the moving of the physical location of pages between
nodes in a numa system while the process is running.  This means that the
virtual addresses that the process sees do not change.  However, the system
rearranges the physical location of those pages.

The main intent of page migration patches here is to reduce the latency of
memory access by moving pages near to the processor where the process
accessing that memory is running.

The patchset allows a process to manually relocate the node on which its
pages are located through the MF_MOVE and MF_MOVE_ALL options while
setting a new memory policy.

The pages of process can also be relocated from another process using the
sys_migrate_pages() function call.  Requires CAP_SYS_ADMIN.  The migrate_pages
function call takes two sets of nodes and moves pages of a process that are
located on the from nodes to the destination nodes.

Manual migration is very useful if for example the scheduler has relocated a
process to a processor on a distant node.  A batch scheduler or an
administrator can detect the situation and move the pages of the process
nearer to the new processor.

sys_migrate_pages() could be used on non-numa machines as well, to force all
of a particualr process's pages out to swap, if someone thinks that's useful.

Larger installations usually partition the system using cpusets into sections
of nodes.  Paul has equipped cpusets with the ability to move pages when a
task is moved to another cpuset.  This allows automatic control over locality
of a process.  If a task is moved to a new cpuset then also all its pages are
moved with it so that the performance of the process does not sink
dramatically (as is the case today).

Swap migration works by simply evicting the page.  The pages must be faulted
back in.  The pages are then typically reallocated by the system near the node
where the process is executing.

For swap migration the destination of the move is controlled by the allocation
policy.  Cpusets set the allocation policy before calling sys_migrate_pages()
in order to move the pages as intended.

No allocation policy changes are performed for sys_migrate_pages().  This
means that the pages may not faulted in to the specified nodes if no
allocation policy was set by other means.  The pages will just end up near the
node where the fault occurred.

There's another patch series in the pipeline which implements "direct
migration".

The direct migration patchset extends the migration functionality to avoid
going through swap.  The destination node of the relation is controllable
during the actual moving of pages.  The crutch of using the allocation policy
to relocate is not necessary and the pages are moved directly to the target.
Its also faster since swap is not used.

And sys_migrate_pages() can then move pages directly to the specified node.
Implement functions to isolate pages from the LRU and put them back later.

This patch:

An earlier implementation was provided by Hirokazu Takahashi
<taka@valinux.co.jp> and IWAMOTO Toshihiro <iwamoto@valinux.co.jp> for the
memory hotplug project.

From: Magnus

This breaks out isolate_lru_page() and putpack_lru_page().  Needed for swap
migration.
Signed-off-by: NMagnus Damm <magnus.damm@gmail.com>
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

21eac81f

[PATCH] drop-pagecache · 9d0243bc

由 Andrew Morton 提交于 1月 08, 2006

Add /proc/sys/vm/drop_caches.  When written to, this will cause the kernel to
discard as much pagecache and/or reclaimable slab objects as it can.  THis
operation requires root permissions.

It won't drop dirty data, so the user should run `sync' first.

Caveats:

a) Holds inode_lock for exorbitant amounts of time.

b) Needs to be taught about NUMA nodes: propagate these all the way through
   so the discarding can be controlled on a per-node basis.

This is a debugging feature: useful for getting consistent results between
filesystem benchmarks.  We could possibly put it under a config option, but
it's less than 300 bytes.
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

9d0243bc

07 1月, 2006 4 次提交

[PATCH] mm: page_state opt · a74609fa

由 Nick Piggin 提交于 1月 06, 2006

Optimise page_state manipulations by introducing interrupt unsafe accessors
to page_state fields.  Callers must provide their own locking (either
disable interrupts or not update from interrupt context).

Switch over the hot callsites that can easily be moved under interrupts off
sections.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a74609fa

[PATCH] mm: add populated_zone() helper · f3fe6512

由 Con Kolivas 提交于 1月 06, 2006

There are numerous places we check whether a zone is populated or not.

Provide a helper function to check for populated zones and convert all
checks for zone->present_pages.
Signed-off-by: NCon Kolivas <kernel@kolivas.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f3fe6512

[PATCH] vmscan: balancing fix · 210fe530

由 Andrew Morton 提交于 1月 06, 2006

Revert a patch which went into 2.6.8-rc1.  The changelog for that patch was:

  The shrink_zone() logic can, under some circumstances, cause far too many
  pages to be reclaimed.  Say, we're scanning at high priority and suddenly
  hit a large number of reclaimable pages on the LRU.

  Change things so we bale out when SWAP_CLUSTER_MAX pages have been
  reclaimed.

Problem is, this change caused significant imbalance in inter-zone scan
balancing by truncating scans of larger zones.

Suppose, for example, ZONE_HIGHMEM is 10x the size of ZONE_NORMAL.  The zone
balancing algorithm would require that if we're scanning 100 pages of
ZONE_HIGHMEM, we should scan 10 pages of ZONE_NORMAL.  But this logic will
cause the scanning of ZONE_HIGHMEM to bale out after only 32 pages are
reclaimed.  Thus effectively causing smaller zones to be scanned relatively
harder than large ones.

Now I need to remember what the workload was which caused me to write this
patch originally, then fix it up in a different way...
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

210fe530

[PATCH] kill last zone_reclaim() bits · 7756b9e4

由 Andrew Morton 提交于 1月 06, 2006

Remove the last bits of Martin's ill-fated sys_set_zone_reclaim().

Cc: Martin Hicks <mort@wildopensource.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7756b9e4

04 1月, 2006 1 次提交

[PATCH] add AOP_TRUNCATED_PAGE, prepend AOP_ to WRITEPAGE_ACTIVATE · 994fc28c

由 Zach Brown 提交于 12月 15, 2005

readpage(), prepare_write(), and commit_write() callers are updated to
understand the special return code AOP_TRUNCATED_PAGE in the style of
writepage() and WRITEPAGE_ACTIVATE. AOP_TRUNCATED_PAGE tells the caller that
the callee has unlocked the page and that the operation should be tried again
with a new page. OCFS2 uses this to detect and work around a lock inversion in
its aop methods. There should be no change in behaviour for methods that don't
return AOP_TRUNCATED_PAGE.

WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are
made enums so that kerneldoc can be used to document their semantics.
Signed-off-by: NZach Brown <zach.brown@oracle.com>

994fc28c

29 11月, 2005 2 次提交

[PATCH] shrinker->nr = LONG_MAX means deadlock for icache · ea164d73

由 Andrea Arcangeli 提交于 11月 28, 2005

With Andrew Morton <akpm@osdl.org>

The slab scanning code tries to balance the scanning rate of slabs versus the
scanning rate of LRU pages.  To do this, it retains state concerning how many
slabs have been scanned - if a particular slab shrinker didn't scan enough
objects, we remember that for next time, and scan more objects on the next
pass.

The problem with this is that with (say) a huge number of GFP_NOIO
direct-reclaim attempts, the number of objects which are to be scanned when we
finally get a GFP_KERNEL request can be huge.  Because some shrinker handlers
just bail out if !__GFP_FS.

So the patch clamps the number of objects-to-be-scanned to 2* the total number
of objects in the slab cache.
Signed-off-by: NAndrea Arcangeli <andrea@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

ea164d73

[PATCH] temporarily disable swap token on memory pressure · f7b7fd8f

由 Rik van Riel 提交于 11月 28, 2005

Some users (hi Zwane) have seen a problem when running a workload that
eats nearly all of physical memory - th system does an OOM kill, even
when there is still a lot of swap free.

The problem appears to be a very big task that is holding the swap
token, and the VM has a very hard time finding any other page in the
system that is swappable.

Instead of ignoring the swap token when sc->priority reaches 0, we could
simply take the swap token away from the memory hog and make sure we
don't give it back to the memory hog for a few seconds.

This patch resolves the problem Zwane ran into.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f7b7fd8f

14 11月, 2005 1 次提交

[PATCH] mm: __alloc_pages cleanup · 7fb1d9fc

由 Rohit Seth 提交于 11月 13, 2005

Clean up of __alloc_pages.

Restoration of previous behaviour, plus further cleanups by introducing an
'alloc_flags', removing the last of should_reclaim_zone.
Signed-off-by: NRohit Seth <rohit.seth@intel.com>
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7fb1d9fc

30 10月, 2005 2 次提交

[PATCH] mm: split page table lock · 4c21e2f2

由 Hugh Dickins 提交于 10月 29, 2005

Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
a many-threaded application which concurrently initializes different parts of
a large anonymous area.

This patch corrects that, by using a separate spinlock per page table page, to
guard the page table entries in that page, instead of using the mm's single
page_table_lock. (But even then, page_table_lock is still used to guard page
table allocation, and anon_vma allocation.)

In this implementation, the spinlock is tucked inside the struct page of the
page table page: with a BUILD_BUG_ON in case it overflows - which it would in
the case of 32-bit PA-RISC with spinlock debugging enabled.

Splitting the lock is not quite for free: another cacheline access. Ideally,
I suppose we would use split ptlock only for multi-threaded processes on
multi-cpu machines; but deciding that dynamically would have its own costs.
So for now enable it by config, at some number of cpus - since the Kconfig
language doesn't support inequalities, let preprocessor compare that with
NR_CPUS. But I don't think it's worth being user-configurable: for good
testing of both split and unsplit configs, split now at 4 cpus, and perhaps
change that to 8 later.

There is a benefit even for singly threaded processes: kswapd can be attacking
one part of the mm while another part is busy faulting.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4c21e2f2

[PATCH] shrink_list(): skip anon pages if not may_swap · c340010e

由 Lee Schermerhorn 提交于 10月 29, 2005

Martin Hicks' page cache reclaim patch added the 'may_swap' flag to the
scan_control struct; and modified shrink_list() not to add anon pages to
the swap cache if may_swap is not asserted.

Ref:  http://marc.theaimsgroup.com/?l=linux-mm&m=111461480725322&w=4

However, further down, if the page is mapped, shrink_list() calls
try_to_unmap() which will call try_to_unmap_one() via try_to_unmap_anon ().
 try_to_unmap_one() will BUG_ON() an anon page that is NOT in the swap
cache.  Martin says he never encountered this path in his testing, but
agrees that it might happen.

This patch modifies shrink_list() to skip anon pages that are not already
in the swap cache when !may_swap, rather than just not adding them to the
cache.
Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c340010e

28 10月, 2005 1 次提交

[PATCH] gfp_t: mm/* (easy parts) · 6daa0e28

由 Al Viro 提交于 10月 21, 2005

Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6daa0e28

17 10月, 2005 1 次提交

Fix memory ordering bug in page reclaim · 3d80636a

由 Linus Torvalds 提交于 10月 16, 2005

As noticed by Nick Piggin, we need to make sure that we check the page
count before we check for PageDirty, since the dirty check is only valid
if the count implies that we're the only possible ones holding the page.

We always did do this, but the code needs a read-memory-barrier to make
sure that the orderign is also honored by the CPU.

(The writer side is ordered due to the atomic decrement and test on the
page count, see the discussion on linux-kernel)
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

3d80636a

13 9月, 2005 1 次提交

[PATCH] vm: kswapd cleanup: use pgdat · 8d0986e2

由 Con Kolivas 提交于 9月 13, 2005

Use the pgdat pointer we've already defined in wakeup_kswapd
Signed-off-by: NCon Kolivas <kernel@kolivas.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8d0986e2

08 9月, 2005 1 次提交

[PATCH] cpusets: formalize intermediate GFP_KERNEL containment · 9bf2229f

由 Paul Jackson 提交于 9月 06, 2005

This patch makes use of the previously underutilized cpuset flag
'mem_exclusive' to provide what amounts to another layer of memory placement
resolution. With this patch, there are now the following four layers of
memory placement available:

1) The whole system (interrupt and GFP_ATOMIC allocations can use this),
2) The nearest enclosing mem_exclusive cpuset (GFP_KERNEL allocations can use),
3) The current tasks cpuset (GFP_USER allocations constrained to here), and
4) Specific node placement, using mbind and set_mempolicy.

These nest - each layer is a subset (same or within) of the previous.

Layer (2) above is new, with this patch. The call used to check whether a
zone (its node, actually) is in a cpuset (in its mems_allowed, actually) is
extended to take a gfp_mask argument, and its logic is extended, in the case
that __GFP_HARDWALL is not set in the flag bits, to look up the cpuset
hierarchy for the nearest enclosing mem_exclusive cpuset, to determine if
placement is allowed. The definition of GFP_USER, which used to be identical
to GFP_KERNEL, is changed to also set the __GFP_HARDWALL bit, in the previous
cpuset_gfp_hardwall_flag patch.

GFP_ATOMIC and GFP_KERNEL allocations will stay within the current tasks
cpuset, so long as any node therein is not too tight on memory, but will
escape to the larger layer, if need be.

The intended use is to allow something like a batch manager to handle several
jobs, each job in its own cpuset, but using common kernel memory for caches
and such. Swapper and oom_kill activity is also constrained to Layer (2). A
task in or below one mem_exclusive cpuset should not cause swapping on nodes
in another non-overlapping mem_exclusive cpuset, nor provoke oom_killing of a
task in another such cpuset. Heavy use of kernel memory for i/o caching and
such by one job should not impact the memory available to jobs in other
non-overlapping mem_exclusive cpusets.

This patch enables providing hardwall, inescapable cpusets for memory
allocations of each job, while sharing kernel memory allocations between
several jobs, in an enclosing mem_exclusive cpuset.

Like Dinakar's patch earlier to enable administering sched domains using the
cpu_exclusive flag, this patch also provides a useful meaning to a cpuset flag
that had previously done nothing much useful other than restrict what cpuset
configurations were allowed.
Signed-off-by: NPaul Jackson <pj@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

9bf2229f

05 9月, 2005 1 次提交

[PATCH] VM: zone reclaim atomic ops cleanup · 53e9a615

由 Martin Hicks 提交于 9月 03, 2005

Christoph Lameter and Marcelo Tosatti asked to get rid of the
atomic_inc_and_test() to cleanup the atomic ops in the zone reclaim code.
Signed-off-by: NMartin Hicks <mort@sgi.com>
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

53e9a615