- 24 June 2005, 40 commits
-
Committed by Vincent Hanquez
Make use of the user_mode macro where possible. This is useful for Xen because it will then only need to redefine the macro as a hypervisor call. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Vincent Hanquez
Add two macros to set and get debug registers (debugreg) on x86_64. This is useful for Xen because it will then only need to redefine each macro as a hypervisor call. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
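For illustration, a minimal sketch of what such accessor macros typically look like on a native (non-Xen) x86_64 kernel; this is not necessarily the exact code added by the patch, but it shows the point: a single macro wraps the movq to or from %dbN, so a hypervisor port only has to replace the macro body.

	/* Sketch only: native debug-register accessors, register is 0..7 */
	#define get_debugreg(var, register)				\
		__asm__("movq %%db" #register ", %0"			\
			: "=r" (var))

	#define set_debugreg(value, register)				\
		__asm__("movq %0, %%db" #register			\
			: /* no output */				\
			: "r" ((unsigned long)(value)))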
-
Committed by Vincent Hanquez
Use the user_mode macro where possible. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Vincent Hanquez
Rename user_mode to user_mode_vm and add a user_mode macro similar to the x86-64 one. This is useful for Xen because the Linux Xen kernel does not run at the same privilege level as a vanilla Linux kernel, and with this we just need to redefine user_mode(). Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
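As a rough sketch (not the literal patch), the i386 pair described above amounts to something like the following: user_mode() looks only at the privilege level in the saved CS selector, while user_mode_vm() additionally treats vm86 mode as user mode, so a Xen port only has to change user_mode().

	/* Sketch only: i386 pt_regs helpers, exact definitions may differ */
	#define user_mode(regs)     (((regs)->xcs & 3) != 0)
	#define user_mode_vm(regs)  (((regs)->eflags & VM_MASK) || user_mode(regs))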
-
Committed by Vincent Hanquez
Make use of the two new macros set_debugreg and get_debugreg. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Vincent Hanquez
Add two macros to set and get debug registers (debugreg) on x86. This is useful for Xen because it will then only need to redefine each macro as a hypervisor call. Signed-off-by: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk> Cc: Ian Pratt <m+Ian.Pratt@cl.cam.ac.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Natalie Protasevich
I suggest changing the way IRQs are handed out to PCI devices. Currently, each I/O APIC pin gets associated with an IRQ, whether or not the pin is used. It is expected that each pin can potentially be engaged by a device inserted into the corresponding PCI slot. However, this imposes a severe limitation on systems whose designs employ many I/O APICs while utilizing only a couple of lines of each, such as the P64H2 chipset. It is used in the ES7000, and currently there is no way to boot the system with more than 9 I/O APICs.

The simple change below allows booting a system with, say, 64 (or more) I/O APICs, each providing one slot, which is otherwise impossible because of the IRQ gaps created for unused lines on each I/O APIC. It does not resolve the problem of the number of devices exceeding the number of possible IRQs, but it eases the pressure on IRQs on any large system with a potentially large number of devices.

I only implemented this for the ACPI boot, since if the system is this big and using newer chipsets it is probably (better be!) an ACPI-based system. The change is completely "mechanical" and does not alter any internal structures or the interrupt model/implementation. The patch works for both the i386 and x86_64 archs. It works just fine with MSIs and should not interfere with implementations like shared vectors when they get worked out and incorporated.

To illustrate, below is the interrupt distribution for a 2-cell ES7000 with 20 I/O APICs and an Ethernet card in the last slot, which should be eth1 and which was not configured because its IRQ exceeded the allowable number (it actually turned out huge: 480!):

	zorro-tb2:~ # cat /proc/interrupts
	         CPU0    CPU1    CPU2    CPU3    CPU4    CPU5    CPU6    CPU7
	  0:    65716   30012   30007   30002   30009   30010   30010   30010  IO-APIC-edge   timer
	  4:      373       0     725     280       0       0       0       0  IO-APIC-edge   serial
	  8:        0       0       0       0       0       0       0       0  IO-APIC-edge   rtc
	  9:        0       0       0       0       0       0       0       0  IO-APIC-level  acpi
	 14:       39       3       0       0       0       0       0       0  IO-APIC-edge   ide0
	 16:      108      13       0       0       0       0       0       0  IO-APIC-level  uhci_hcd:usb1
	 18:        0       0       0       0       0       0       0       0  IO-APIC-level  uhci_hcd:usb3
	 19:       15       0       0       0       0       0       0       0  IO-APIC-level  uhci_hcd:usb2
	 23:        3       0       0       0       0       0       0       0  IO-APIC-level  ehci_hcd:usb4
	 96:     4240     397      18       0       0       0       0       0  IO-APIC-level  aic7xxx
	 97:       15       0       0       0       0       0       0       0  IO-APIC-level  aic7xxx
	192:      847       0       0       0       0       0       0       0  IO-APIC-level  eth0
	NMI:        0       0       0       0       0       0       0       0
	LOC:   273423  274528  272829  274228  274092  273761  273827  273694
	ERR:        7
	MIS:        0

Even though the system doesn't have that many devices, some don't get enabled only because of the IRQ numbering model. This is the IRQ picture after the patch was applied:

	zorro-tb2:~ # cat /proc/interrupts
	         CPU0    CPU1    CPU2    CPU3    CPU4    CPU5    CPU6    CPU7
	  0:    44169   10004   10004   10001   10004   10003   10004    6135  IO-APIC-edge   timer
	  4:      345       0       0       0       0     244       0       0  IO-APIC-edge   serial
	  8:        0       0       0       0       0       0       0       0  IO-APIC-edge   rtc
	  9:        0       0       0       0       0       0       0       0  IO-APIC-level  acpi
	 14:       39       0       3       0       0       0       0       0  IO-APIC-edge   ide0
	 17:     4425       0       9       0       0       0       0       0  IO-APIC-level  aic7xxx
	 18:       15       0       0       0       0       0       0       0  IO-APIC-level  aic7xxx, uhci_hcd:usb3
	 21:      231       0       0       0       0       0       0       0  IO-APIC-level  uhci_hcd:usb1
	 22:       26       0       0       0       0       0       0       0  IO-APIC-level  uhci_hcd:usb2
	 23:        3       0       0       0       0       0       0       0  IO-APIC-level  ehci_hcd:usb4
	 24:      348       0       0       0       0       0       0       0  IO-APIC-level  eth0
	 25:        6     192       0       0       0       0       0       0  IO-APIC-level  eth1
	NMI:        0       0       0       0       0       0       0       0
	LOC:   107981  107636  108899  108698  108489  108326  108331  108254
	ERR:        7
	MIS:        0

Not only do we see the card on the last I/O APIC, but we are not even close to using up the available IRQs, since we didn't waste any.

Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com> Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Jan Beulich
Eliminate the duplicate definition of rdpmc in x86-64's mtrr.h. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Roland McGrath
This is the x86_64 version of the signal fix I just posted for i386. This problem was first noticed on PPC and has already been fixed there, but the exact same issue applies to other platforms in the same way. The signal blocking for sa_mask and the handled signal takes place after the handler setup. When the stack is bogus, the handler setup forces a SIGSEGV, but then this will be blocked, and returning to user mode will fault again and iterate. This patch fixes the problem by checking whether signal handler setup failed, and not doing the signal blocking if so. This copies what was done in the ppc code. I think all architectures' signal handler setup code follows this pattern and needs the change. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
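A hedged sketch of the pattern being fixed (function names and return convention illustrative of the arch signal code, not the exact diff): the frame-setup helpers report whether they succeeded, and the sa_mask/handled-signal blocking is applied only on success, so the forced SIGSEGV from a bogus stack is not blocked.

	/* Sketch only: assumes the setup helpers return nonzero on success */
	if (ka->sa.sa_flags & SA_SIGINFO)
		ret = setup_rt_frame(sig, ka, info, oldset, regs);
	else
		ret = setup_frame(sig, ka, oldset, regs);

	if (ret) {	/* only block further signals if frame setup worked */
		spin_lock_irq(&current->sighand->siglock);
		sigorsets(&current->blocked, &current->blocked, &ka->sa.sa_mask);
		sigaddset(&current->blocked, sig);
		recalc_sigpending();
		spin_unlock_irq(&current->sighand->siglock);
	}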
-
Committed by john stultz
Currently the x86-64 HPET code assumes the entire HPET implementation from the spec is present. This breaks on boxes that do not implement the optional legacy timer replacement functionality portion of the spec. This patch fixes the issue, allowing x86-64 systems that cannot use the HPET for the timer interrupt and RTC to still use the HPET as a time source. I've tested this patch on systems without an HPET, with an HPET but without legacy timer replacement, as well as with an HPET and legacy timer replacement. This version adds a minor check to cap the HPET counter value in gettimeoffset_hpet to avoid possible time inconsistencies. Please ignore the A2 version I sent earlier. Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Alexander Nyberg
Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andrew Morton
Consolidate the MTRR sanity checking and add a dump_stack(). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andrew Morton
x86_64's cpu_khz is unsigned int, and there is no reason why x86 needs to use unsigned long, so make cpu_khz unsigned int on x86 as well. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Alexey Dobriyan
* EXPORT_SYMBOLs moved to other files
* #include <linux/config.h>, <linux/module.h> added where needed
* #includes in i386_ksyms.c cleaned up
* After the copy-paste, preprocessor directives made redundant by the Makefile rules were removed:
	#ifdef CONFIG_FOO
	EXPORT_SYMBOL(foo);
	#endif
	obj-$(CONFIG_FOO) += foo.o
* Tiny reformat to fit in 80 columns
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Alexey Dobriyan
csum_and_copy_to_user is static inline and uses VERIFY_WRITE. The patch allows removing asm/uaccess.h from i386_ksyms.c without dependency surprises. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Aleksey Gorelov
According to the VIA 82C586B datasheet (still available from http://gkernel.sourceforge.net/specs/via/586b.pdf.bz2), this chip needs a special PIRQ mapping. Signed-off-by: Karsten Keil <kkeil@suse.de> Signed-off-by: Aleksey Gorelov <aleksey_gorelov@phoenix.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Natalie Protasevich
I have submitted the patch for x86_64; this is the submission for i386. The patch changes the way IRQs are handed out to PCI devices. Currently, each I/O APIC pin gets associated with an IRQ, whether or not the pin is used. This imposes a severe limitation on systems whose designs employ many I/O APICs while utilizing only a couple of lines of each, such as the P64H2 chipset. It is used in the ES7000, and currently there is no way to boot the system with more than 9 I/O APICs. The simple change below allows booting a system with, say, 64 (or more) I/O APICs, each providing one slot, which is otherwise impossible because of the IRQ gaps created for unused lines on each I/O APIC. It does not resolve the problem of the number of devices exceeding the number of possible IRQs, but it eases the pressure on IRQs on any large system with a potentially large number of devices. Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter
This allows a selectable timer interrupt frequency of 100, 250 or 1000 Hz. Reducing the timer frequency may have important performance benefits on large systems. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter
Make the timer frequency selectable. The timer interrupt may cause bus and memory contention in large NUMA systems, since the interrupt occurs on each processor HZ times per second. Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Jan Beulich
Allow the early printk code to take advantage of the full size of the screen, not just the first 25 lines. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Jan Beulich
Get the i386 watchdog tick calculation into a state where it can also be used on CPUs with frequencies beyond 4 GHz, and consolidate the calculation into a single place (for potential future adjustments). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
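As a sketch of the consolidation (helper and variable names approximate, not the literal patch), the per-tick performance-counter reload value can be computed in one helper using 64-bit math, so that cpu_khz * 1000 cannot overflow a 32-bit value on CPUs above roughly 4.29 GHz.

	/* Sketch only: compute and program the watchdog reload value */
	static void write_watchdog_counter(unsigned int perfctr_msr,
					   unsigned int nmi_hz)
	{
		u64 count = (u64)cpu_khz * 1000;	/* CPU cycles per second */

		do_div(count, nmi_hz);			/* cycles per watchdog tick */
		wrmsrl(perfctr_msr, 0 - count);		/* overflow after 'count' cycles */
	}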
-
Committed by Natalie Protasevich
This patch is per Andi's request to remove NO_IOAPIC_CHECK from genapic and use heuristics to prevent the unique I/O APIC ID check on systems that don't need it. The patch disables the unique I/O APIC ID check for Xeon-based and other platforms that don't use the serial APIC bus for interrupt delivery. Andi stated that AMD systems don't need unique IO_APIC_IDs either. Signed-off-by: Natalie Protasevich <Natalie.Protasevich@unisys.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Roland McGrath
This problem was first noticed on PPC and has already been fixed there, but the exact same issue applies to other platforms in the same way. The signal blocking for sa_mask and the handled signal takes place after the handler setup. When the stack is bogus, the handler setup forces a SIGSEGV, but then this will be blocked, and returning to user mode will fault again and iterate. This patch fixes the problem by checking whether signal handler setup failed, and not doing the signal blocking if so. This copies what was done in the ppc code. I think all architectures' signal handler setup code follows this pattern and needs the change. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter
Patch to allocate the control structures for IDE devices on the node of the device itself (for NUMA systems). The patch depends on the Slab API change patch by Manfred and me (in mm) and the pcidev_to_node patch that I posted today. It also does some realignment. Signed-off-by: Justin M. Forbes <jmforbes@linuxtx.org> Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Pravin Shelar <pravin@calsoftinc.com> Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter
Define pcibus_to_node to be able to figure out which NUMA node contains a given PCI device. This defines pcibus_to_node(bus) in include/linux/topology.h and adjusts the macros for i386 and x86_64 that already provided a way to determine the cpumask of a PCI device. x86_64 was changed to no longer build an array of cpumasks; instead, an array of nodes is built, which can be used to generate the cpumask via node_to_cpumask. Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
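A minimal sketch of the kind of generic fallback this introduces (as it might appear in asm-generic/topology.h; exact guards and spelling may differ): architectures without NUMA information report node -1, and the cpumask helper is then derived from the node.

	/* Sketch only: generic fallbacks, overridden by NUMA-aware arches */
	#ifndef pcibus_to_node
	#define pcibus_to_node(bus)	(-1)
	#endif

	#ifndef pcibus_to_cpumask
	#define pcibus_to_cpumask(bus)					\
		(pcibus_to_node(bus) == -1 ?				\
			CPU_MASK_ALL :					\
			node_to_cpumask(pcibus_to_node(bus)))
	#endif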
-
Committed by Christoph Lameter
asm-generic/topology.h must also be included if CONFIG_NUMA is set, in order to provide the fallback pcibus_to_node function. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Hirokazu Takata
Use asm-generic/topology.h to fix yet another pcibus_to_node() build error. Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Venkatesh Pallipadi
Issue: The current TSC-based delay calibration can result in significant errors in the loops_per_jiffy count when platform events like SMIs (System Management Interrupts, which are non-maskable) are present. This could lead to a potential kernel panic(). The issue is becoming more visible with the 2.6 kernel (as the default HZ is 1000) and on platforms with higher SMI handling latencies. During boot, SMIs are mostly used by the BIOS (for things like legacy keyboard emulation).

Description: The pseudocode for the current delay calibration with a TSC-based delay looks like:

	(0) Estimate a value for loops_per_jiffy
	(1) While (the loops_per_jiffy estimate is not accurate enough)
	(2)     wait for a jiffy transition (jiffy1)
	(3)     note down the current tsc (tsc1)
	(4)     loop until tsc becomes tsc1 + loops_per_jiffy
	(5)     check whether jiffy changed since jiffy1 or not and refine the loops_per_jiffy estimate

Consider the following cases:

Case 1: If SMIs happen between (2) and (3) above, we can end up with a loops_per_jiffy value that is too low. This results in shorter delays, and the kernel can panic() during boot (mostly at IOAPIC timer initialization, in timer_irq_works(), because we don't get enough timer interrupts in the specified interval).

Case 2: If SMIs happen between (3) and (4) above, we can end up with a loops_per_jiffy value that is too high. With the current i386 code, a too-high lpj value (greater than 17M) can result in an overflow in delay.c:__const_udelay(), again resulting in shorter delays and a panic().

Solution: The patch below makes the calibration routine aware of asynchronous events like SMIs. We increase the delay calibration time and also identify any significant errors (greater than 12.5%) in the calibration and notify the user. The patch changes both the i386 and x86-64 architectures to use this new and improved calibrate_delay_direct() routine.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
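The rough idea behind the improved routine, as a hedged sketch (names and constants illustrative, not the actual calibrate_delay_direct() implementation): measure timestamp-counter cycles across a known number of jiffies instead of busy-looping on an estimate, so a single SMI-stretched run can be detected by repeating the measurement and comparing results.

	/* Sketch only: derive loops-per-jiffy from cycles counted across jiffies */
	static unsigned long calibrate_lpj_sketch(void)
	{
		const unsigned int ticks = 10;	/* calibrate across 10 jiffies */
		unsigned long long t1, t2;
		unsigned long start = jiffies;

		while (jiffies == start)	/* wait for a jiffy edge */
			cpu_relax();
		t1 = get_cycles();

		start = jiffies;
		while (jiffies < start + ticks)
			cpu_relax();
		t2 = get_cycles();

		/* cycles per jiffy; callers would repeat this and cross-check runs */
		return (unsigned long)((t2 - t1) / ticks);
	}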
-
Committed by Ian Campbell
The attached patch causes the various arch-specific install.sh scripts to look for ${CROSS_COMPILE}installkernel rather than just installkernel (in both /sbin/ and ~/bin/, where the script already did this). This allows you to have e.g. arm-linux-installkernel as a handy way to install on your cross target. It also prevents the script from picking up the host /sbin/installkernel, which causes the script to fall through and do the install itself (which is what I actually use myself, with $INSTALL_PATH set). I don't believe it causes back-compatibility problems, since calling the host installkernel was never likely to work or be what you wanted when cross compiling anyway. If $CROSS_COMPILE isn't set then nothing changes. I only use ARM and i386 myself, but I figured it couldn't hurt to do the whole lot. I've cc'd those who I hope are the arch maintainers for the files I've touched. Signed-off-by: Ian Campbell <icampbell@arcom.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by H. Peter Anvin
This allows the i386 architecture to be built on a system with a biarch compiler that defaults to x86-64, merely by specifying ARCH=i386. As previously discussed, this uses logic equivalent to the ppc port's. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Martin J. Bligh
This helps a lot when debugging out-of-memory situations; it is especially useful to see whether all the memory has been sucked into slab, etc. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Matt Tolentino
This patch adds the necessary support for sparsemem such that x86-64 kernels may use sparsemem as an alternative to discontigmem for NUMA kernels. Note that this does not preclude continuing to build NUMA kernels with discontigmem, but merely allows the option of building NUMA kernels with sparsemem. Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels results in reduced text size for otherwise equivalent kernels, as shown in the example builds below:

	   text    data     bss     dec    hex filename
	2371036  765884 1237108 4374028 42be0c vmlinux.discontig
	2366549  776484 1302772 4445805 43d66d vmlinux.sparse

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Matt Tolentino
In order to use the alternative sparsemem implementation for NUMA kernels, we need to reorganize the config options. This patch effectively abstracts the CONFIG_DISCONTIGMEM options out to CONFIG_NUMA in most cases. Thus, the discontigmem implementation may be employed as always, but the sparsemem implementation may be used as an alternative. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Matt Tolentino
Add the requisite arch-specific Kconfig options to enable the use of the sparsemem implementation for NUMA kernels on x86-64. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Matt Tolentino
This patch pulls out all remaining direct references to contig_page_data from arch/x86-64, thus saving an ifdef in one case. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andy Whitcroft
Provide the architecture-specific implementation of SPARSEMEM for PPC64 systems. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> (in part) Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andy Whitcroft
Provide hooks for PPC64 to allow memory models to be informed of installed memory areas. This allows SPARSEMEM to instantiate the mem_map for the populated areas. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andy Whitcroft
Provide an implementation of early_pfn_to_nid for PPC64. This is used by the memory models to determine the node from which to take allocations before the memory allocators are fully initialised. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Martin Bligh <mbligh@aracnet.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andy Whitcroft
Make sparse's initialization accessible at runtime. This allows sparse mappings to be created after boot, in a hotplug situation. This patch is separated from the previous one just to give an indication of how much of the sparse infrastructure is *just* for hotplug memory.

The section_mem_map doesn't really store a pointer. It stores something that is convenient to do some math against to get a pointer. It isn't valid to just do *section_mem_map, so I don't think it should be stored as a pointer.

There are a couple of things I'd like to store about a section. First of all, the fact that it is !NULL does not mean that it is present. There could be a combination where section_mem_map *is* NULL, but the math gets you properly to a real mem_map. So, I don't think that check is safe.

Since we're storing 32-bit-aligned structures, we have a few bits in the bottom of the pointer to play with. Use one bit to encode whether there's really a mem_map there, and the other one to tell whether there's a valid section there. We need to distinguish between the two because sometimes there's a gap between when a section is discovered to be present and when we can get the mem_map for it.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Bob Picco <bob.picco@hp.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
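A hedged sketch of that encoding (flag and helper names illustrative): the two low bits of section_mem_map carry the "present" and "has mem_map" state, and the remaining bits hold the value the pointer math is done against, which is why a plain NULL check says nothing by itself.

	/* Sketch only: low-bit state flags packed into section_mem_map */
	#define SECTION_MARKED_PRESENT	(1UL << 0)
	#define SECTION_HAS_MEM_MAP	(1UL << 1)
	#define SECTION_MAP_LAST_BIT	(1UL << 2)
	#define SECTION_MAP_MASK	(~(SECTION_MAP_LAST_BIT - 1))

	static inline int section_present(struct mem_section *section)
	{
		return section && (section->section_mem_map & SECTION_MARKED_PRESENT);
	}

	static inline int valid_section(struct mem_section *section)
	{
		return section && (section->section_mem_map & SECTION_HAS_MEM_MAP);
	}

	static inline struct page *__section_mem_map_addr(struct mem_section *section)
	{
		/* strip the state bits before using the value as a mem_map base */
		return (struct page *)(section->section_mem_map & SECTION_MAP_MASK);
	}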
-
Committed by Andy Whitcroft
The part of the sparsemem patch which modifies memmap_init_zone() has recently become a problem. It changes behavior so that there is a call to pfn_to_page() for each individual page inside of a node's range: node_start_pfn through node_end_pfn. It used to simply do this once, at the beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside of a node made it necessary to change.

Mike Kravetz recently wrote a patch which made the NUMA code accept some new kinds of layouts. The system's memory was laid out like this, with node 0's memory in two pieces, one before and one after node 1's memory:

	Node 0: +++++     +++++
	Node 1:      +++++

Previous behavior, before Mike's patch, was to assign nodes like this:

	Node 0: 00000     XXXXX
	Node 1:      11111

where the 'X' areas were simply thrown away. The new behavior was to make the pg_data_t span node 0 across all of its areas, including areas that are really node 1's:

	Node 0: 000000000000000
	Node 1:      11111

This wastes a little bit of mem_map space, but ends up being OK, and more fully utilizes the system's memory.

memmap_init_zone() initializes all of the "struct page"s for node 0, even for the "hole", but those never get used, because there is no pfn_to_page() that resolves to those pages. However, by only calling pfn_to_page() once, memmap_init_zone() always uses the pages that were allocated for node0->node_mem_map, because:

	struct page *start = pfn_to_page(start_pfn);
	/* effectively start = &node->node_mem_map[0] */
	for (page = start; page < (start + size); page++) {
		init_page_here();...
		page++;
	}

Slow and wasteful, but generally harmless.

But modify that to call pfn_to_page() for each loop iteration (like sparsemem does):

	for (pfn = start_pfn; pfn < (start_pfn + size); pfn++) {
		page = pfn_to_page(pfn);
	}

and you end up trying to initialize node 1's pages too early, along with bogus data from node 0. This patch checks for those weird layouts and declines to touch the pages, making the more frequent pfn_to_page() calls OK to do.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
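A hedged sketch of the resulting loop (helper names illustrative, not the exact diff): each pfn is checked against the node being initialised before its struct page is touched, which is what makes the per-iteration pfn_to_page() safe on the overlapping layouts described above.

	/* Sketch only: skip pfns that are invalid or belong to another node */
	for (pfn = start_pfn; pfn < start_pfn + size; pfn++) {
		if (!early_pfn_valid(pfn))
			continue;
		if (!early_pfn_in_nid(pfn, nid))	/* pfn is another node's */
			continue;
		page = pfn_to_page(pfn);
		/* ... per-page initialisation for this zone ... */
	}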
-