提交 · 88356e908521a84b2220c9e8850d1a356a851aa9 · openeuler / Kernel

19 1月, 2006 1 次提交

[PATCH] Zone reclaim: Reclaim logic · 9eeff239

由 Christoph Lameter 提交于 1月 18, 2006

Some bits for zone reclaim exists in 2.6.15 but they are not usable. This
patch fixes them up, removes unused code and makes zone reclaim usable.

Zone reclaim allows the reclaiming of pages from a zone if the number of
free pages falls below the watermarks even if other zones still have enough
pages available. Zone reclaim is of particular importance for NUMA
machines. It can be more beneficial to reclaim a page than taking the
performance penalties that come with allocating a page on a remote zone.

Zone reclaim is enabled if the maximum distance to another node is higher
than RECLAIM_DISTANCE, which may be defined by an arch. By default
RECLAIM_DISTANCE is 20. 20 is the distance to another node in the same
component (enclosure or motherboard) on IA64. The meaning of the NUMA
distance information seems to vary by arch.

If zone reclaim is not successful then no further reclaim attempts will
occur for a certain time period (ZONE_RECLAIM_INTERVAL).

This patch was discussed before. See

http://marc.theaimsgroup.com/?l=linux-kernel&m=113519961504207&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113408418232531&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113389027420032&w=2
http://marc.theaimsgroup.com/?l=linux-kernel&m=113380938612205&w=2Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

9eeff239

17 1月, 2006 1 次提交

[PATCH] x86_64: add __meminit for memory hotplug · c09b4240

由 Matt Tolentino 提交于 1月 17, 2006

Add __meminit to the __init lineup to ensure functions default
to __init when memory hotplug is not enabled.  Replace __devinit
with __meminit on functions that were changed when the memory
hotplug code was introduced.
Signed-off-by: NMatt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c09b4240

13 1月, 2006 1 次提交

[PATCH] memmap_init_zone(): remove uneccesary page++ · cbe8dd4a

由 Greg Ungerer 提交于 1月 12, 2006

Remove unecessary page++ from memmap_init_zone loop.
Signed-off-by: NGreg Ungerer <gerg@uclinux.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

cbe8dd4a

12 1月, 2006 3 次提交

[PATCH] mm: gfp_atomic comments · 4eac915d

由 Paul Jackson 提交于 1月 11, 2006

Clarify in comments that GFP_ATOMIC means both "don't sleep" and "use
emergency pools", hence both ALLOC_HARDER and ALLOC_HIGH.
Signed-off-by: NPaul Jackson <pj@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4eac915d

[PATCH] Restore KERN_EMERG to each line printed by bad_page · 7365f3d1

由 Hugh Dickins 提交于 1月 11, 2006

Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

7365f3d1

[PATCH] fix/simplify mutex debugging code · a4fc7ab1

由 David Woodhouse 提交于 1月 11, 2006

Let's switch mutex_debug_check_no_locks_freed() to take (addr, len) as
arguments instead, since all its callers were just calculating the 'to'
address for themselves anyway... (and sometimes doing so badly).
Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
Acked-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a4fc7ab1

10 1月, 2006 1 次提交

[PATCH] mutex subsystem, more debugging code · de5097c2

由 Ingo Molnar 提交于 1月 09, 2006

more mutex debugging: check for held locks during memory freeing,
task exit, enable sysrq printouts, etc.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NArjan van de Ven <arjan@infradead.org>

de5097c2

09 1月, 2006 6 次提交

[PATCH] cpuset: memory pressure meter · 3e0d98b9

由 Paul Jackson 提交于 1月 08, 2006

Provide a simple per-cpuset metric of memory pressure, tracking the -rate-
that the tasks in a cpuset call try_to_free_pages(), the synchronous
(direct) memory reclaim code.

This enables batch managers monitoring jobs running in dedicated cpusets to
efficiently detect what level of memory pressure that job is causing.

This is useful both on tightly managed systems running a wide mix of
submitted jobs, which may choose to terminate or reprioritize jobs that are
trying to use more memory than allowed on the nodes assigned them, and with
tightly coupled, long running, massively parallel scientific computing jobs
that will dramatically fail to meet required performance goals if they
start to use more memory than allowed to them.

This patch just provides a very economical way for the batch manager to
monitor a cpuset for signs of memory pressure.  It's up to the batch
manager or other user code to decide what to do about it and take action.

==> Unless this feature is enabled by writing "1" to the special file
    /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
    code of __alloc_pages() for this metric reduces to simply noticing
    that the cpuset_memory_pressure_enabled flag is zero.  So only
    systems that enable this feature will compute the metric.

Why a per-cpuset, running average:

    Because this meter is per-cpuset, rather than per-task or mm, the
    system load imposed by a batch scheduler monitoring this metric is
    sharply reduced on large systems, because a scan of the tasklist can be
    avoided on each set of queries.

    Because this meter is a running average, instead of an accumulating
    counter, a batch scheduler can detect memory pressure with a single
    read, instead of having to read and accumulate results for a period of
    time.

    Because this meter is per-cpuset rather than per-task or mm, the
    batch scheduler can obtain the key information, memory pressure in a
    cpuset, with a single read, rather than having to query and accumulate
    results over all the (dynamically changing) set of tasks in the cpuset.

A per-cpuset simple digital filter (requires a spinlock and 3 words of data
per-cpuset) is kept, and updated by any task attached to that cpuset, if it
enters the synchronous (direct) page reclaim code.

A per-cpuset file provides an integer number representing the recent
(half-life of 10 seconds) rate of direct page reclaims caused by the tasks
in the cpuset, in units of reclaims attempted per second, times 1000.
Signed-off-by: NPaul Jackson <pj@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

3e0d98b9

[PATCH] mm: free_pages opt · 48db57f8

由 Nick Piggin 提交于 1月 08, 2006

Try to streamline free_pages_bulk by ensuring callers don't pass in a
'count' that exceeds the list size.

Some cleanups:
Rename __free_pages_bulk to __free_one_page.
Put the page list manipulation from __free_pages_ok into free_one_page.
Make __free_pages_ok static.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

48db57f8

[PATCH] mm: cleanup zone_pcp · 23316bc8

由 Nick Piggin 提交于 1月 08, 2006

Use zone_pcp everywhere even though NUMA code "knows" the internal details
of the zone.  Stop other people trying to copy, and it looks nicer.

Also, only print the pagesets of online cpus in zoneinfo.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

23316bc8

[PATCH] Make high and batch sizes of per_cpu_pagelists configurable · 8ad4b1fb

由 Rohit Seth 提交于 1月 08, 2006

As recently there has been lot of traffic on the right values for batch and
high water marks for per_cpu_pagelists.  This patch makes these two
variables configurable through /proc interface.

A new tunable /proc/sys/vm/percpu_pagelist_fraction is added.  This entry
controls the fraction of pages at most in each zone that are allocated for
each per cpu page list.  The min value for this is 8.  It means that we
don't allow more than 1/8th of pages in each zone to be allocated in any
single per_cpu_pagelist.

The batch value of each per cpu pagelist is also updated as a result.  It
is set to pcp->high/4.  The upper limit of batch is (PAGE_SHIFT * 8)
Signed-off-by: NRohit Seth <rohit.seth@intel.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8ad4b1fb

[PATCH] slab: remove nested #ifdef CONFIG_NUMA · bec6b0c8

由 Christoph Lameter 提交于 1月 08, 2006

For some reason there is an #ifdef CONFIG_NUMA within another #ifdef
CONFIG_NUMA in the page allocator.  Remove innermost #ifdef CONFIG_NUMA
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

bec6b0c8

[PATCH] revert "mm: page_state fixes" · 84c2008a

由 Andrew Morton 提交于 1月 08, 2006

Hugh says:

page_alloc_cpu_notify() specifically contains code to

 		/* Add dead cpu's page_states to our own. */

which handles this more efficiently.

Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

84c2008a

07 1月, 2006 18 次提交

[PATCH] mm: page_state opt · a74609fa

由 Nick Piggin 提交于 1月 06, 2006

Optimise page_state manipulations by introducing interrupt unsafe accessors
to page_state fields.  Callers must provide their own locking (either
disable interrupts or not update from interrupt context).

Switch over the hot callsites that can easily be moved under interrupts off
sections.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a74609fa

[PATCH] build_zonelists_node(): rename args · 070f8032

由 Christoph Lameter 提交于 1月 06, 2006

Give j and r meaningful names.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

070f8032

[PATCH] Fix zone policy determination · 02a68a5e

由 Christoph Lameter 提交于 1月 06, 2006

The use k in the inner loop means that the highest zone nr is always used
if any zone of a node is populated.  This means that the policy zone is not
correctly determined on arches that do no use HIGHMEM like ia64.

Change the loop to decrement k which also simplifies the BUG_ON.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

02a68a5e

[PATCH] mm: move determination of policy_zone into page allocator · 4be38e35

由 Christoph Lameter 提交于 1月 06, 2006

Currently the function to build a zonelist for a BIND policy has the side
effect to set the policy_zone.  This seems to be a bit strange.  policy
zone seems to not be initialized elsewhere and therefore 0.  Do we police
ZONE_DMA if no bind policy has been used yet?

This patch moves the determination of the zone to apply policies to into
the page allocator.  We determine the zone while building the zonelist for
nodes.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4be38e35

[PATCH] mm: simplify build_zonelists_node by removing the case statement. · 1a93205b

由 Christoph Lameter 提交于 1月 06, 2006

Simplify build_zonelists_node by removing the case statement.
Signed-off-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

1a93205b

[PATCH] mm: add populated_zone() helper · f3fe6512

由 Con Kolivas 提交于 1月 06, 2006

There are numerous places we check whether a zone is populated or not.

Provide a helper function to check for populated zones and convert all
checks for zone->present_pages.
Signed-off-by: NCon Kolivas <kernel@kolivas.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

f3fe6512

[PATCH] mm: bad_page optimisation · 224abf92

由 Nick Piggin 提交于 1月 06, 2006

Cut down size slightly by not passing bad_page the function name (it should be
able to be determined by dump_stack()).  And cut down the number of printks in
bad_page.

Also, cut down some branching in the destroy_compound_page path.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

224abf92

[PATCH] mm: dma32 zone statistics · 9328b8fa

由 Nick Piggin 提交于 1月 06, 2006

Add dma32 to zone statistics.  Also attempt to arrange struct page_state a
bit better (visually).
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

9328b8fa

[PATCH] FRV: Clean up bootmem allocator's page freeing algorithm · a226f6c8

由 David Howells 提交于 1月 06, 2006

The attached patch cleans up the way the bootmem allocator frees pages.

A new function, __free_pages_bootmem(), is provided in mm/page_alloc.c that is
called from mm/bootmem.c to turn pages over to the main allocator. All the
bits of code to initialise pages (clearing PG_reserved and setting the page
count) are moved to here. The checks on page validity are removed, on the
assumption that the struct page arrays will have been prepared correctly.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a226f6c8

[PATCH] mm: page_alloc cleanups · 085cc7d5

由 Nick Piggin 提交于 1月 06, 2006

Small cleanups that does not change generated code with the gcc's I've tested
with.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

085cc7d5

[PATCH] mm: page_state fixes · a86b1f53

由 Nick Piggin 提交于 1月 06, 2006

read_page_state and __get_page_state only traverse online CPUs, which will
cause results to fluctuate when CPUs are plugged in or out.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a86b1f53

[PATCH] mm: remove pcp low · 2d92c5c9

由 Nick Piggin 提交于 1月 06, 2006

struct per_cpu_pages.low is useless.  Remove it.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

2d92c5c9

[PATCH] mm: remove bad_range · 13e7444b

由 Nick Piggin 提交于 1月 06, 2006

bad_range is supposed to be a temporary check.  It would be a pity to throw it
out.  Make it depend on CONFIG_DEBUG_VM instead.

CONFIG_HOLES_IN_ZONE systems were relying on this to check pfn_valid in the
page allocator.  Add that to page_is_buddy instead.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

13e7444b

[PATCH] mm: microopt conditions · 92be2e33

由 Nick Piggin 提交于 1月 06, 2006

Micro optimise some conditionals where we don't need lazy evaluation.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

92be2e33

[PATCH] mm: set_page_refs opt · 77a8a788

由 Nick Piggin 提交于 1月 06, 2006

Inline set_page_refs.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

77a8a788

[PATCH] mm: pagealloc opt · c54ad30c

由 Nick Piggin 提交于 1月 06, 2006

Slightly optimise some page allocation and freeing functions by taking
advantage of knowing whether or not interrupts are disabled.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

c54ad30c

[PATCH] mm: remove arch independent NODES_SPAN_OTHER_NODES · a94b3ab7

由 Mike Kravetz 提交于 1月 06, 2006

The NODES_SPAN_OTHER_NODES config option was created so that DISCONTIGMEM
could handle pSeries numa layouts.  However, support for DISCONTIGMEM has
been replaced by SPARSEMEM on powerpc.  As a result, this config option and
supporting code is no longer needed.

I have already sent a patch to Paul that removes the option from powerpc
specific code.  This removes the arch independent piece.  Doesn't really
matter which is applied first.
Signed-off-by: NMike Kravetz <kravetz@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a94b3ab7

[PATCH] mm: fix __alloc_pages cpuset ALLOC_* flags · 47f3a867

由 Paul Jackson 提交于 1月 06, 2006

Two changes to the setting of the ALLOC_CPUSET flag in
mm/page_alloc.c:__alloc_pages()

- A bug fix - the "ignoring mins" case should not be honoring ALLOC_CPUSET.
  This case of all cases, since it is handling a request that will free up
  more memory than is asked for (exiting tasks, e.g.) should be allowed to
  escape cpuset constraints when memory is tight.

- A logic change to make it simpler.  Honor cpusets even on GFP_ATOMIC
  (!wait) requests.  With this, cpuset confinement applies to all requests
  except ALLOC_NO_WATERMARKS, so that in a subsequent cleanup patch, I can
  remove the ALLOC_CPUSET flag entirely.  Since I don't know any real reason
  this logic has to be either way, I am choosing the path of the simplest
  code.
Signed-off-by: NPaul Jackson <pj@sgi.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

47f3a867

16 12月, 2005 1 次提交

[PATCH] missing prototype (mm/page_alloc.c) · 78d9955b

由 Al Viro 提交于 12月 15, 2005

Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

78d9955b

04 12月, 2005 1 次提交

[PATCH] Fix up per-cpu page batch sizes · 0ceaacc9

由 Nick Piggin 提交于 12月 04, 2005

The code to clamp batch sizes to 2^n - 1 went missing and an extra
check got added, which must have been a hunk of the "higer order pcp
batch refills" work sneaking in.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

0ceaacc9

29 11月, 2005 1 次提交

[PATCH] mm: __alloc_pages cleanup fix · 3148890b

由 Nick Piggin 提交于 11月 28, 2005

I believe this patch is required to fix breakage in the asynch reclaim
watermark logic introduced by this patch:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7fb1d9fca5c6e3b06773b69165a73f3fb786b8ee

Just some background of the watermark logic in case it isn't clear...
Basically what we have is this:

 ---  pages_high
   |
   | (a)
   |
 ---  pages_low
   |
   | (b)
   |
 ---  pages_min
   |
   | (c)
   |
 ---  0

Now when pages_low is reached, we want to kick asynch reclaim, which gives us
an interval of "b" before we must start synch reclaim, and gives kswapd an
interval of "a" before it need go back to sleep.

When pages_min is reached, normal allocators must enter synch reclaim, but
PF_MEMALLOC, ALLOC_HARDER, and ALLOC_HIGH (ie.  atomic allocations, recursive
allocations, etc.) get access to varying amounts of the reserve "c".
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: "Seth, Rohit" <rohit.seth@intel.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

3148890b

23 11月, 2005 2 次提交

[PATCH] unpaged: PG_reserved bad_page · 689bcebf

由 Hugh Dickins 提交于 11月 21, 2005

It used to be the case that PG_reserved pages were silently never freed, but
in 2.6.15-rc1 they may be freed with a "Bad page state" message. We should
work through such cases as they appear, fixing the code; but for now it's
safer to issue the message without freeing the page, leaving PG_reserved set.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

689bcebf

[PATCH] unpaged: unifdefed PageCompound · 664beed0

由 Hugh Dickins 提交于 11月 21, 2005

It looks like snd_xxx is not the only nopage to be using PageReserved as a way
of holding a high-order page together: which no longer works, but is masked by
our failure to free from VM_RESERVED areas. We cannot fix that bug without
first substituting another way to hold the high-order page together, while
farming out the 0-order pages from within it.

That's just what PageCompound is designed for, but it's been kept under
CONFIG_HUGETLB_PAGE. Remove the #ifdefs: which saves some space (out- of-line
put_page), doesn't slow down what most needs to be fast (already using
hugetlb), and unifies the way we handle high-order pages.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

664beed0

18 11月, 2005 1 次提交

[PATCH] VM: fix zone list restart in page allocatate · 6b1de916

由 Jens Axboe 提交于 11月 17, 2005

We must reassign z before looping through the zones kicking kswapd,
since it will be NULL if we hit an OOM condition and jump back to the
beginning again. 'z' is initially assigned before the restart: label. So
move the restart label up a little.
Signed-off-by: NJens Axboe <axboe@suse.de>

6b1de916

15 11月, 2005 3 次提交

[PATCH] x86_64: Remove obsolete ARCH_HAS_ATOMIC_UNSIGNED and page_flags_t · 07808b74

由 Andi Kleen 提交于 11月 05, 2005

Has been introduced for x86-64 at some point to save memory
in struct page, but has been obsolete for some time. Just
remove it.
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

07808b74

A
[PATCH] x86_64: When cpu_up fails clean up page allocator properly · b0d41693
由 Andi Kleen 提交于 11月 05, 2005
```
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
```
b0d41693

[PATCH] x86_64: Add 4GB DMA32 zone · a2f1b424

由 Andi Kleen 提交于 11月 05, 2005

Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones.

As a bit of historical background: when the x86-64 port
was originally designed we had some discussion if we should
use a 16MB DMA zone like i386 or a 4GB DMA zone like IA64 or
both. Both was ruled out at this point because it was in early
2.4 when VM is still quite shakey and had bad troubles even
dealing with one DMA zone.  We settled on the 16MB DMA zone mainly
because we worried about older soundcards and the floppy.

But this has always caused problems since then because
device drivers had trouble getting enough DMA able memory. These days
the VM works much better and the wide use of NUMA has proven
it can deal with many zones successfully.

So this patch adds both zones.

This helps drivers who need a lot of memory below 4GB because
their hardware is not accessing more (graphic drivers - proprietary
and free ones, video frame buffer drivers, sound drivers etc.).
Previously they could only use IOMMU+16MB GFP_DMA, which
was not enough memory.

Another common problem is that hardware who has full memory
addressing for >4GB misses it for some control structures in memory
(like transmit rings or other metadata).  They tended to allocate memory
in the 16MB GFP_DMA or the IOMMU/swiotlb then using pci_alloc_consistent,
but that can tie up a lot of precious 16MB GFPDMA/IOMMU/swiotlb memory
(even on AMD systems the IOMMU tends to be quite small) especially if you have
many devices.  With the new zone pci_alloc_consistent can just put
this stuff into memory below 4GB which works better.

One argument was still if the zone should be 4GB or 2GB. The main
motivation for 2GB would be an unnamed not so unpopular hardware
raid controller (mostly found in older machines from a particular four letter
company) who has a strange 2GB restriction in firmware. But
that one works ok with swiotlb/IOMMU anyways, so it doesn't really
need GFP_DMA32. I chose 4GB to be compatible with IA64 and because
it seems to be the most common restriction.

The new zone is so far added only for x86-64.

For other architectures who don't set up this
new zone nothing changes. Architectures can set a compatibility
define in Kconfig CONFIG_DMA_IS_DMA32 that will define GFP_DMA32
as GFP_DMA. Otherwise it's a nop because on 32bit architectures
it's normally not needed because GFP_NORMAL (=0) is DMA able
enough.

One problem is still that GFP_DMA means different things on different
architectures. e.g. some drivers used to have #ifdef ia64  use GFP_DMA
(trusting it to be 4GB) #elif __x86_64__ (use other hacks like
the swiotlb because 16MB is not enough) ... . This was quite
ugly and is now obsolete.

These should be now converted to use GFP_DMA32 unconditionally. I haven't done
this yet. Or best only use pci_alloc_consistent/dma_alloc_coherent
which will use GFP_DMA32 transparently.
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a2f1b424

openeuler / Kernel 11 个月 前同步成功

openeuler / Kernel
11 个月前同步成功