- 30 4月, 2008 2 次提交
-
-
由 Thomas Gleixner 提交于
We can see an ever repeating problem pattern with objects of any kind in the kernel: 1) freeing of active objects 2) reinitialization of active objects Both problems can be hard to debug because the crash happens at a point where we have no chance to decode the root cause anymore. One problem spot are kernel timers, where the detection of the problem often happens in interrupt context and usually causes the machine to panic. While working on a timer related bug report I had to hack specialized code into the timer subsystem to get a reasonable hint for the root cause. This debug hack was fine for temporary use, but far from a mergeable solution due to the intrusiveness into the timer code. The code further lacked the ability to detect and report the root cause instantly and keep the system operational. Keeping the system operational is important to get hold of the debug information without special debugging aids like serial consoles and special knowledge of the bug reporter. The problems described above are not restricted to timers, but timers tend to expose it usually in a full system crash. Other objects are less explosive, but the symptoms caused by such mistakes can be even harder to debug. Instead of creating specialized debugging code for the timer subsystem a generic infrastructure is created which allows developers to verify their code and provides an easy to enable debug facility for users in case of trouble. The debugobjects core code keeps track of operations on static and dynamic objects by inserting them into a hashed list and sanity checking them on object operations and provides additional checks whenever kernel memory is freed. The tracked object operations are: - initializing an object - adding an object to a subsystem list - deleting an object from a subsystem list Each operation is sanity checked before the operation is executed and the subsystem specific code can provide a fixup function which allows to prevent the damage of the operation. When the sanity check triggers a warning message and a stack trace is printed. The list of operations can be extended if the need arises. For now it's limited to the requirements of the first user (timers). The core code enqueues the objects into hash buckets. The hash index is generated from the address of the object to simplify the lookup for the check on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a global lock. The debug code can be compiled in without being active. The runtime overhead is minimal and could be optimized by asm alternatives. A kernel command line option enables the debugging code. Thanks to Ingo Molnar for review, suggestions and cleanup patches. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Samuel Thibault 提交于
This adds a minimalistic braille screen reader support. This is meant to be used by blind people e.g. on boot failures or when / cannot be mounted etc and thus the userland screen readers can not work. [akpm@linux-foundation.org: fix exports] Signed-off-by: NSamuel Thibault <samuel.thibault@ens-lyon.org> Cc: Jiri Kosina <jikos@jikos.cz> Cc: Dmitry Torokhov <dtor@mail.ru> Acked-by: NAlan Cox <alan@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 4月, 2008 2 次提交
-
-
由 Tim Gardner 提交于
Add a kernel parameter option to 'edd' to enable/disable BIOS Enhanced Disk Drive Services. CONFIG_EDD_OFF disables EDD while still compiling EDD into the kernel. Default behavior can be forced using 'edd=on' or 'edd=off' as a kernel parameter. [akpm@linux-foundation.org: fix kernel-parameters.txt] Signed-off-by: NTim Gardner <tim.gardner@canonical.com> Signed-off-by: NMatt Domsch <Matt_Domsch@dell.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andres Salomon 提交于
This adds support for OLPC XO hardware. Open Firmware on XOs don't contain the VSA, so it is necessary to emulate the PCI BARs in the kernel. This also adds functionality for running EC commands, and a CONFIG_OLPC. A number of OLPC drivers depend upon CONFIG_OLPC. olpc_ec_timeout is a hack to work around Embedded Controller bugs. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: geode_has_vsa build fix] [akpm@linux-foundation.org: olpc_register_battery_callback doesn't exist] Signed-off-by: NAndres Salomon <dilinger@debian.org> Acked-by: NIngo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@suse.de> Cc: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 4月, 2008 1 次提交
-
-
* Remove obsoleted "idex=" kernel parameters. * Make probe_* and cmd640_vlb variables static. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-
- 21 4月, 2008 2 次提交
-
-
由 mark gross 提交于
This patch is for batching up the flushing of the IOTLB for the DMAR implementation found in the Intel VT-d hardware. It works by building a list of to be flushed IOTLB entries and a bitmap list of which DMAR engine they are from. After either a high water mark (250 accessible via debugfs) or 10ms the list of iova's will be reclaimed and the DMAR engines associated are IOTLB-flushed. This approach recovers 15 to 20% of the performance lost when using the IOMMU for my netperf udp stream benchmark with small packets. It can be disabled with a kernel boot parameter "intel_iommu=strict". Its use does weaken the IOMMU protections a bit. Signed-off-by: NMark Gross <mgross@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Greg Kroah-Hartman 提交于
We currently keep 2 lists of PCI devices in the system, one in the driver core, and one all on its own. This second list is sorted at boot time, in "BIOS" order, to try to remain compatible with older kernels (2.2 and earlier days). There was also a "nosort" option to turn this sorting off, to remain compatible with even older kernel versions, but that just ends up being what we have been doing from 2.5 days... Unfortunately, the second list of devices is not really ever used to determine the probing order of PCI devices or drivers[1]. That is done using the driver core list instead. This change happened back in the early 2.5 days. Relying on BIOS ording for the binding of drivers to specific device names is problematic for many reasons, and userspace tools like udev exist to properly name devices in a persistant manner if that is needed, no reliance on the BIOS is needed. Matt Domsch and others at Dell noticed this back in 2006, and added a boot option to sort the PCI device lists (both of them) in a breadth-first manner to help remain compatible with the 2.4 order, if needed for any reason. This option is not going away, as some systems rely on them. This patch removes the sorting of the internal PCI device list in "BIOS" mode, as it's not needed at all anymore, and hasn't for many years. I've also removed the PCI flags for this from some other arches that for some reason defined them, but never used them. This should not change the ordering of any drivers or device probing. [1] The old-style pci_get_device and pci_find_device() still used this sorting order, but there are very few drivers that use these functions, as they are deprecated for use in this manner. If for some reason, a driver rely on the order and uses these functions, the breadth-first boot option will resolve any problem. Cc: Matt Domsch <Matt_Domsch@dell.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 20 4月, 2008 1 次提交
-
-
由 Jiri Slaby 提交于
- noexec32 is on by default for years already - add noexec32 to kernel-parameters and fix noexec typo in there Signed-off-by: NJiri Slaby <jirislaby@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 19 4月, 2008 1 次提交
-
-
由 Ahmed S. Darwish 提交于
Add the security= boot parameter. This is done to avoid LSM registration clashes in case of more than one bult-in module. User can choose a security module to enable at boot. If no security= boot parameter is specified, only the first LSM asking for registration will be loaded. An invalid security module name will be treated as if no module has been chosen. LSM modules must check now if they are allowed to register by calling security_module_enable(ops) first. Modify SELinux and SMACK to do so. Do not let SMACK register smackfs if it was not chosen on boot. Smackfs assumes that smack hooks are registered and the initial task security setup (swapper->security) is done. Signed-off-by: NAhmed S. Darwish <darwish.07@gmail.com> Acked-by: NJames Morris <jmorris@namei.org>
-
- 18 4月, 2008 3 次提交
-
-
* Remove obsoleted "idex=noprobe" kernel parameter. * Remove no longer needed hwif->noprobe quirk from ide_hwif_configure() and hwif->noprobe checking from cmd640.c. v2: * "ide?=noprobe" -> "ide?=ata66" in Documentation/kernel-parameters.txt. Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-
由 Greg Kroah-Hartman 提交于
This option is obsolete and can be removed safely. It allows us to remove the pci_get_device_reverse() function from the PCI core. Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de> Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-
由 Jason Wessel 提交于
document the kgdboc module/boot parameter. Signed-off-by: NJason Wessel <jason.wessel@windriver.com> Signed-off-by: NJan Kiszka <jan.kiszka@web.de> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 17 4月, 2008 2 次提交
-
-
由 Yinghai Lu 提交于
Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Pavel Machek 提交于
Fix coding style in pci-dma_64.c and add stubs for documentation. I hope someone fills the rest, I understand maybe off and soft... Signed-off-by: NPavel Machek <pavel@suse.cz> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 12 4月, 2008 1 次提交
-
-
由 J. Bruce Fields 提交于
Documentation/ is a little large, and filesystems/ seems an obvious place for this file. Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: NJonathan Corbet <corbet@lwn.net>
-
- 05 4月, 2008 2 次提交
-
-
由 Paul Menage 提交于
The effects of cgroup_disable=foo are: - foo isn't auto-mounted if you mount all cgroups in a single hierarchy - foo isn't visible as an individually mountable subsystem As a result there will only ever be one call to foo->create(), at init time; all processes will stay in this group, and the group will never be mounted on a visible hierarchy. Any additional effects (e.g. not allocating metadata) are up to the foo subsystem. This doesn't handle early_init subsystems (their "disabled" bit isn't set be, but it could easily be extended to do so if any of the early_init systems wanted it - I think it would just involve some nastier parameter processing since it would occur before the command-line argument parser had been run. Hugh said: Ballpark figures, I'm trying to get this question out rather than processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead to the affected paths, booting with cgroup_disable=memory cuts that back to 1% overhead (due to slightly bigger struct page). I'm no expert on distros, they may have no interest whatever in CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or without it, or apply the cgroup_disable=memory patches. Unix bench's execl test result on x86_64 was == just after boot without mounting any cgroup fs.== mem_cgorup=off : Execl Throughput 43.0 3150.1 732.6 mem_cgroup=on : Execl Throughput 43.0 2932.6 682.0 == [lizf@cn.fujitsu.com: fix boot option parsing] Signed-off-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Fenghua Yu 提交于
The patch defines kernel parameter "nptcg=". The parameter overrides max number of concurrent global TLB purges which is reported from either PAL_VM_SUMMARY or SAL PALO. Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 02 4月, 2008 1 次提交
-
-
由 Rafael J. Wysocki 提交于
Some time ago it turned out that our suspend code ordering broke some NVidia-based systems that hung if _PTS was executed with one of the PCI devices, specifically a USB controller, in a low power state. Then, it was noticed that the suspend code ordering was not compliant with ACPI 1.0, although it was compliant with ACPI 2.0 (and later), and it was argued that the code had to be changed for that reason (ref. http://bugzilla.kernel.org/show_bug.cgi?id=9528). So we did, but evidently we did wrong, because it's now turning out that some systems have been broken by this change. Refs: http://bugzilla.kernel.org/show_bug.cgi?id=10340 https://bugzilla.novell.com/show_bug.cgi?id=374217#c16 [ I said at that time that something like this might happend, but the majority of people involved thought that it was improbable due to the necessity to preserve the compliance of hardware with ACPI 1.0. ] This actually is a quite serious regression from 2.6.24. Moreover, the ACPI 1.0 ordering of suspend code introduced another issue that I have only noticed recently. Namely, if the suspend of one of devices fails, the already suspended devices will be resumed without executing _WAK before, which leads to problems on some systems (for example, in such situations thermal management is broken on my HP nx6325). Consequently, it also breaks suspend debugging on the affected systems. Note also, that the requirement to execute _PTS before suspending devices does not really make sense, because the device in question may be put into a low power state at run time for a reason unrelated to a system-wide suspend. For the reasons outlined above, the change of the suspend ordering should be reverted, which is done by the patch below. [ Felix Möller: "I am the reporter from the original Novell Bug: https://bugzilla.novell.com/show_bug.cgi?id=374217 I just tried current git head (two hours ago) with the patch (the one from the beginning of this thread) from Rafael and without it. With the patch my MacBook does suspend without it does not." ] Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Tested-by: NFelix Möller <felix@derklecks.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 4月, 2008 1 次提交
-
-
由 Robert Brose 提交于
Old-world powermacs don't set L2CR or L3CR on processor upgrade cards. This simple patch allows the setting of L3CR via a kernel parameter (like the existing kernel parameter to set L2CR). Signed-off-by: NRobert Brose <bob@qbjnet.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 25 3月, 2008 1 次提交
-
-
由 Pavel Machek 提交于
Provide example for memmap exclude option (it is slightly strange and non-trivial) and provide nice small HOWTO for people with bad memory. Signed-off-by: NJan-Simon Moeller <dl9pf@gmx.de> Signed-off-by: NPavel Machek <pavel@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 3月, 2008 1 次提交
-
-
由 Linus Torvalds 提交于
This essentially reverts commit 71fc47a9 ("ACPI: basic initramfs DSDT override support"), because the code simply isn't ready. It did ugly things to the init sequence to populate the rootfs image early, but that just ended up showing other problems with the whole approach. The fact is, the VFS layer simply isn't initialized this early, and the relevant ACPI code should either run much later, or this shouldn't be done at all. For 2.6.25, we'll just pick the latter option. We can revisit this concept later if necessary. Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Tilman Schmidt <tilman@imap.cc> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Renninger <trenn@suse.de> Cc: Eric Piel <eric.piel@tremplin-utc.net> Cc: Len Brown <len.brown@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Markus Gaugusch <dsdt@gaugusch.at> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 3月, 2008 1 次提交
-
-
由 Jiri Kosina 提交于
Document 'noloop' kernel parameter of i8042 controller driver. Pointed out in #10236. Signed-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NDmitry Torokhov <dtor@mail.ru>
-
- 13 3月, 2008 1 次提交
-
-
由 Randy Dunlap 提交于
Move 00-INDEX entries to power/00-INDEX (and add entry for pm_qos_interface.txt). Update references to moved filenames. Fix some trailing whitespace. Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
- 08 3月, 2008 1 次提交
-
-
由 Randy Dunlap 提交于
Fix all references to Documentation/ide/ide.txt. Add/update ide/00-INDEX file. Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-
- 21 2月, 2008 1 次提交
-
-
由 Tejun Heo 提交于
This patch implements libata.force module parameter which can selectively override ATA port, link and device configurations including cable type, SATA PHY SPD limit, transfer mode and NCQ. For example, you can say "use 1.5Gbps for all fan-out ports attached to the second port but allow 3.0Gbps for the PMP device itself, oh, the device attached to the third fan-out port chokes on NCQ and shouldn't go over UDMA4" by the following. libata.force=2:1.5g,2.15:3.0g,2.03:noncq,udma4 Signed-off-by: NTejun Heo <htejun@gmail.com> Signed-off-by: NJeff Garzik <jeff@garzik.org>
-
- 19 2月, 2008 1 次提交
-
-
由 Adrian Bunk 提交于
This patch removes the mca-pentium boot option that was a noop. besides the source code cleanup factor, this saves some text as well: arch/x86/kernel/cpu/bugs.o: text data bss dec hex filename 651 77 4 732 2dc bugs.o.before 631 53 4 688 2b0 bugs.o.after Signed-off-by: NAdrian Bunk <bunk@kernel.org> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 09 2月, 2008 1 次提交
-
-
由 Adrian Bunk 提交于
The scheduled removal of the 'time' option. Signed-off-by: NAdrian Bunk <bunk@kernel.org> Acked-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 2月, 2008 2 次提交
-
-
由 Éric Piel 提交于
The acpi_no_initrd_override parameter permits to disable the load of an ACPI table from the initramfs. Signed-off-by: NEric Piel <eric.piel@tremplin-utc.net> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Pavel Machek 提交于
Signed-off-by: NPavel Machek <pavel@suse.cz> Acked-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NLen Brown <len.brown@intel.com>
-
- 06 2月, 2008 1 次提交
-
-
由 Denis Cheng 提交于
with module_param macro, the __setup code can be killed now: const __setup("all-generic-ide", ide_generic_all_on); and the module name "generic.ko" is not descriptive to its functionality, can be changed in Makefile, the "ide-pci-generic.ko" is better. the ide-pci-generic.all-generic-ide parameter also documented in Documentation/kernel-parameters.txt Signed-off-by: NDenis Cheng <crquan@gmail.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-
- 03 2月, 2008 3 次提交
-
-
由 Robert P. J. Day 提交于
Signed-off-by: NRobert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: NAdrian Bunk <bunk@kernel.org>
-
由 Robert P. J. Day 提交于
Signed-off-by: NRobert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: NAdrian Bunk <bunk@kernel.org>
-
由 Robert P. J. Day 提交于
Signed-off-by: NRobert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: NAdrian Bunk <bunk@kernel.org>
-
- 02 2月, 2008 1 次提交
-
-
由 Rafael J. Wysocki 提交于
The ACPI 1.0 specification wants us to put devices into low power states after executing the _PTS global control method, while ACPI 2.0 and later want us to do that in the reverse order. The current suspend code follows ACPI 2.0 in that respect which causes some ACPI 1.0x systems to hang during suspend (ref. http://bugzilla.kernel.org/show_bug.cgi?id=9528). Make the suspend code execute _PTS before putting devices into low power states (ie. in accordance with ACPI 1.0x) and provide a command line option to override the default if need be. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NLen Brown <len.brown@intel.com>
-
- 30 1月, 2008 6 次提交
-
-
由 Willy Tarreau 提交于
The new "mfgptfix" boot command line option may be usd to fix MFGPT timers on AMD Geode platforms when the BIOS has incorrectly applied a workaround. TinyBIOS version 0.98 is known to be affected, 0.99 fixes the problem by letting the user disable the workaround. Signed-off-by: NWilly Tarreau <w@1wt.eu> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Yinghai Lu 提交于
when MTRRs are not covering the whole e820 table, we need to trim the RAM and need to update e820. reuse some code on 64-bit as well. here need to add early_get_cap and use it in early_cpu_detect, and move mtrr_bp_init early. The code successfully trimmed the memory map on Justin's system: from: [ 0.000000] BIOS-e820: 0000000100000000 - 000000022c000000 (usable) to: [ 0.000000] modified: 0000000100000000 - 0000000228000000 (usable) [ 0.000000] modified: 0000000228000000 - 000000022c000000 (reserved) According to Justin it makes quite a difference: | When I boot the box without any trimming it acts like a 286 or 386, | takes about 10 minutes to boot (using raptor disks). Signed-off-by: NYinghai Lu <yinghai.lu@sun.com> Tested-by: NJustin Piszcz <jpiszcz@lucidpixels.com> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Andi Kleen 提交于
Add a generic option to clear any cpuid bit. I added it because it was very easy to add with the new generic cpuid disable bitmap and perhaps it will be useful in the future. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Andi Kleen 提交于
To disable CLFLUSH usage, especially in change_page_attr(). Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Jesse Barnes 提交于
On some machines, buggy BIOSes don't properly setup WB MTRRs to cover all available RAM, meaning the last few megs (or even gigs) of memory will be marked uncached. Since Linux tends to allocate from high memory addresses first, this causes the machine to be unusably slow as soon as the kernel starts really using memory (i.e. right around init time). This patch works around the problem by scanning the MTRRs at boot and figuring out whether the current end_pfn value (setup by early e820 code) goes beyond the highest WB MTRR range, and if so, trimming it to match. A fairly obnoxious KERN_WARNING is printed too, letting the user know that not all of their memory is available due to a likely BIOS bug. Something similar could be done on i386 if needed, but the boot ordering would be slightly different, since the MTRR code on i386 depends on the boot_cpu_data structure being setup. This patch fixes a bug in the last patch that caused the code to run on non-Intel machines (AMD machines apparently don't need it and it's untested on other non-Intel machines, so best keep it off). Further enhancements and fixes from: Yinghai Lu <Yinghai.Lu@Sun.COM> Andi Kleen <ak@suse.de> Signed-off-by: NJesse Barnes <jesse.barnes@intel.com> Tested-by: NJustin Piszcz <jpiszcz@lucidpixels.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Yinghai Lu 提交于
For K8 system: 4G RAM with memory hole remapping enabled, or more than 4G RAM installed. when try to use kexec second kernel, and the first doesn't include gart_shutdown. the second kernel could have different aper position than the first kernel. and second kernel could use that hole as RAM that is still used by GART set by the first kernel. esp. when try to kexec 2.6.24 with sparse mem enable from previous kernel (from RHEL 5 or SLES 10). the new kernel will use aper by GART (set by first kernel) for vmemmap. and after new kernel setting one new GART. the position will be real RAM. the _mapcount set is lost. Bad page state in process 'swapper' page:ffffe2000e600020 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0 Trying to fix it up, but a reboot is needed Backtrace: Pid: 0, comm: swapper Not tainted 2.6.24-rc7-smp-gcdf71a10-dirty #13 Call Trace: [<ffffffff8026401f>] bad_page+0x63/0x8d [<ffffffff80264169>] __free_pages_ok+0x7c/0x2a5 [<ffffffff80ba75d1>] free_all_bootmem_core+0xd0/0x198 [<ffffffff80ba3a42>] numa_free_all_bootmem+0x3b/0x76 [<ffffffff80ba3461>] mem_init+0x3b/0x152 [<ffffffff80b959d3>] start_kernel+0x236/0x2c2 [<ffffffff80b9511a>] _sinittext+0x11a/0x121 and [ffffe2000e600000-ffffe2000e7fffff] PMD ->ffff81001c200000 on node 0 phys addr is : 0x1c200000 RHEL 5.1 kernel -53 said: PCI-DMA: aperture base @ 1c000000 size 65536 KB new kernel said: Mapping aperture over 65536 KB of RAM @ 3c000000 So could try to disable that GART if possible. According to Ingo > hm, i'm wondering, instead of modifying the GART, why dont we simply > _detect_ whatever GART settings we have inherited, and propagate that > into our e820 maps? I.e. if there's inconsistency, then punch that out > from the memory maps and just dont use that memory. > > that way it would not matter whether the GART settings came from a [old > or crashing] Linux kernel that has not called gart_iommu_shutdown(), or > whether it's a BIOS that has set up an aperture hole inconsistent with > the memory map it passed. (or the memory map we _think_ i tried to pass > us) > > it would also be more robust to only read and do a memory map quirk > based on that, than actively trying to change the GART so early in the > bootup. Later on we have to re-enable the GART _anyway_ and have to > punch a hole for it. > > and as a bonus, we would have shored up our defenses against crappy > BIOSes as well. add e820 modification for gart inconsistent setting. gart_fix_e820=off could be used to disable e820 fix. Signed-off-by: NYinghai Lu <yinghai.lu@sun.com> Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-