- 02 6月, 2009 1 次提交
-
-
由 Anton Blanchard 提交于
RTAS event scan has to run across all cpus. Right now we use a kernel thread and set_cpus_allowed but in doing so we wake up the previous cpu unnecessarily. Some ftrace output shows this: previous cpu (2): [002] 7.022331: sched_switch: task swapper:0 [140] ==> rtasd:194 [120] [002] 7.022338: sched_switch: task rtasd:194 [120] ==> migration/2:9 [0] [002] 7.022344: sched_switch: task migration/2:9 [0] ==> swapper:0 [140] next cpu (3): [003] 7.022345: sched_switch: task swapper:0 [140] ==> rtasd:194 [120] [003] 7.022371: sched_switch: task rtasd:194 [120] ==> swapper:0 [140] We can use schedule_delayed_work_on and avoid the unnecessary wakeup. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 21 5月, 2009 2 次提交
-
-
由 Kumar Gala 提交于
There doesn't appear to be any specific reason that we need to setup the pseries specific notifier in generic arch pci code. Move it into pseries land. Signed-off-by: NKumar Gala <galak@kernel.crashing.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Robert Jennings 提交于
Adds support for the "unused" page hint which can be used in shared memory partitions to flag pages not in use, which will then be stolen before active pages by the hypervisor when memory needs to be moved to LPARs in need of additional memory. Failure to mark pages as 'unused' makes the LPAR slower to give up unused memory to other partitions. This adds the kernel parameter 'cmo_free_hint' to disable this functionality. Signed-off-by: NBrian King <brking@linux.vnet.ibm.com> Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 15 4月, 2009 2 次提交
-
-
由 Sachin Sant 提交于
A randconfig build on powerpc failed with: dtl.c: In function 'dtl_init': dtl.c:238: error: implicit declaration of function 'firmware_has_feature' dtl.c:238: error: 'FW_FEATURE_SPLPAR' undeclared (first use in this function) - We need firmware.h for these definitions. Signed-off-by: NSachin Sant <sachinp@in.ibm.com> Signed-off-by: NJeremy Kerr <jk@ozlabs.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Mike Mason 提交于
While adding native EEH support to Emulex and Qlogic drivers, it was discovered that dev->error_state was set to pci_io_channel_normal too late in the recovery process. These drivers rely on error_state to determine if they can access the device in their slot_reset callback, thus error_state needs to be set to pci_io_channel_normal in eeh_report_reset(). Below is a detailed explanation (courtesy of Richard Lary) as to why this is necessary. Background: PCI MMIO or DMA accesses to a frozen slot generate additional EEH errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS the adapter will be shutdown. To avoid triggering excessive EEH errors and an undesirable adapter shutdown, some drivers use the pci_channel_offline(dev) wrapper function to return a Boolean value based on the value of pci_dev->error_state to determine if PCI MMIO or DMA accesses are safe. If the wrapper returns TRUE, drivers must not make PCI MMIO or DMA access to their hardware. The pci_dev structure member error_state reflects one of three values, 1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3) pci_channel_io_perm_failure. Function pci_channel_offline(dev) returns TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure. The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the point where the PCI slot is frozen. Currently, the EEH driver restores dev->error_state to pci_channel_io_normal in eeh_report_resume() before calling the driver's resume callback. However, when the EEH driver calls the driver's slot_reset callback() from eeh_report_reset(), it incorrectly indicates the error state is still pci_channel_io_frozen. Waiting until eeh_report_resume() to restore dev->error_state to pci_channel_io_normal is too late for Emulex and QLogic FC drivers and any other drivers which are designed to use common code paths in these two cases: i) those called after the driver's slot_reset callback() and ii) those called after the PCI slot is frozen but before the driver's slot_reset callback is called. Case i) all driver paths executed to reinitialize the hardware after a reset and case ii) all code paths executed by driver kernel threads that run asynchronous to the main driver thread, such as interrupt handlers and worker threads to process driver work queues. Emulex and QLogic FC drivers are designed with common code paths which require that pci_channel_offline(dev) reflect the true state of the hardware. The state transitions that the hardware takes from Normal Operations to Slot Frozen to Reset to Normal Operations are documented in the Power Architecture™ Platform Requirements+ (PAPR+) in Table 75. PE State Control. PAPR defines the following 3 states: 0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed (Normal Operations) 1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled 2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled (Slot Frozen) An EEH error places the slot in state 2 (Frozen) and the adapter driver is notified that an EEH error was detected. If the adapter driver returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls eeh_reset_device() to place the slot into state 1 (Reset) and eeh_reset_device completes by placing the slot into State 0 (Normal Operations). Upon return from eeh_reset_device(), the EEH driver calls eeh_report_reset, which then calls the adapter's slot_reset callback. At the time the adapter's slot_reset callback is called, the true state of the hardware is Normal Operations and should be accurately reflected by setting dev->error_state to pci_channel_io_normal. The current implementation of EEH driver does not do so and requires this change to correct this deficiency. Signed-off-by: NMike Mason <mmlnx@us.ibm.com> Acked-by: NLinas Vepstas <linasvepstas@gmail.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 27 3月, 2009 1 次提交
-
-
由 Jeremy Kerr 提交于
Currently, we don't enforce any ordering for updates to the lppaca when enabling dtl logging, so we may end up enabling logging before the index fields have been established. This change adds a smp_wmb() before doing the actual enable. Signed-off-by: NJeremy Kerr <jk@ozlabs.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 24 3月, 2009 2 次提交
-
-
由 Jeremy Kerr 提交于
pseries SPLPAR machines are able to retrieve a log of dispatch and preempt events from the hypervisor. With this information, we can see when and why each dispatch & preempt is occuring. This change adds a set of debugfs files allowing userspace to read this dispatch log. Based on initial patches from Nishanth Aravamudan <nacc@us.ibm.com>. Signed-off-by: NJeremy Kerr <jk@ozlabs.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nathan Fontenot 提交于
The return code from invoking the notifier chain when updating the ibm,dynamic-memory property is not handled properly. In failure cases (rc == NOTIFY_BAD) we should be restoring the original value of the property. In success (rc == NOTIFY_OK) we should be returning zero from the calling routine. Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 11 3月, 2009 3 次提交
-
-
由 Benjamin Herrenschmidt 提交于
CONFIG_PPC_MULTIPLATFORM is a remain of the pre-powerpc days and isn't really meaningful anymore. It was basically equivalent to PPC64 || 6xx. This removes it along with the following changes: - 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely on 6xx which is what they want anyway. - A new symbol, PPC_BOOK3S, is defined that represent compliance with the "Server" variant of the architecture. This is set when either 6xx or PPC64 is set and open the door for future BOOK3E 64-bit. - 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use PPC64 && PPC_BOOK3S - A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now used to control the use of prom_init.c Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Ellerman 提交于
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Ellerman 提交于
There's no way for us to express to firmware that we want a discontiguous, or non-zero based, range of MSI-X entries. So we must reject such requests. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 23 2月, 2009 2 次提交
-
-
由 Michael Ellerman 提交于
There are hardware limitations on the number of available MSIs, which firmware expresses using a property named "ibm,pe-total-#msi". This property tells us how many MSIs are available for devices below the point in the PCI tree where we find the property. For old firmwares which don't have the property, we assume there are 8 MSIs available per "partitionable endpoint" (PE). The PE can be found using existing EEH code, which uses the methods described in PAPR. For our purposes we want the parent of the node that's identified using this method. When a driver requests n MSIs for a device, we first establish where the "ibm,pe-total-#msi" property above that device is, or we find the PE if the property is not found. In both cases we call this node the "pe_dn". We then count all non-bridge devices below the pe_dn, to establish how many devices in total may need MSIs. The quota is then simply the total available divided by the number of devices, if the request is less than or equal to the quota, the request is fine and we're done. If the request is greater than the quota, we try to determine if there are any "spare" MSIs which we can give to this device. Spare MSIs are found by looking for other devices which can never use their full quota, because their "req#msi(-x)" property is less than the quota. If we find any spare, we divide the spares by the number of devices that could request more than their quota. This ensures the spare MSIs are spread evenly amongst all over-quota requestors. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Ellerman 提交于
If a driver asks for more MSIs than the devices "req#msi(-x)" property, we currently return -ENOSPC. This doesn't give the driver any chance to make a new request with a number that might work. So if "req#msi(-x)" is less than the request, return its value. To be 100% safe, make sure we return an error if req_msi == 0. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 11 2月, 2009 6 次提交
-
-
由 Mike Mason 提交于
The EEH code disables and enables interrupts during the device recovery process. This is unnecessary for MSI and MSI-X interrupts because they are effectively disabled by the DMA Stopped state when an EEH error occurs. The current code is also incorrect for MSI-X interrupts. It doesn't take into account that MSI-X interrupts are tracked in a different way than LSI/MSI interrupts. This patch ensures only LSI interrupts are disabled/enabled. Signed-off-by: NMike Mason <mmlnx@us.ibm.com> Acked-by: NLinas Vepstas <linasvepstas@gmail.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Ellerman 提交于
If we can't allocate the requested number of MSIs, we can still tell the generic code how many we were able to allocate. That can then be passed onto the driver, allowing it to request that many in future, and probably succeeed. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Ellerman 提交于
We also need to check that the device isn't using MSI-X in the irq fixup routine, otherwise we might leave MSI-Xs configured at boot. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Ellerman 提交于
Firmware encodes the number of MSI-X requested by a device in a different property than for MSI. Pull the property name out as a parameter and share the logic for both cases. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Ellerman 提交于
We need to increment i in the loop that queries what interrupts firmware gave us, otherwise we'll incorrectly use the first value over and over. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Milton Miller 提交于
Since we never hotplug add an isa bus, we never need to set primary. Delete this write-only variable. Signed-off-by: NMilton Miller <miltonm@bga.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 10 2月, 2009 1 次提交
-
-
由 Michael Neuling 提交于
arch/powerpc/platforms/pseries/hotplug-memory.c uses remove_section_mapping() but doesn't include sparsemem.h which defines it. This can cause compilation fails for some configs. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 28 1月, 2009 1 次提交
-
-
由 Stephen Rothwell 提交于
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 13 1月, 2009 2 次提交
-
-
由 Ingo Molnar 提交于
Convert arch/powerpc/ over to long long based u64: -#ifdef __powerpc64__ -# include <asm-generic/int-l64.h> -#else -# include <asm-generic/int-ll64.h> -#endif +#include <asm-generic/int-ll64.h> This will avoid reoccuring spurious warnings in core kernel code that comes when people test on their own hardware. (i.e. x86 in ~98% of the cases) This is what x86 uses and it generally helps keep 64-bit code 32-bit clean too. [Adjusted to not impact user mode (from paulus) - sfr] Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Mike Travis 提交于
Impact: cleanup, update to new cpumask API Irq_desc.affinity and irq_desc.pending_mask are now cpumask_var_t's so access to them should be using the new cpumask API. Signed-off-by: NMike Travis <travis@sgi.com>
-
- 23 12月, 2008 1 次提交
-
-
由 Sebastien Dugue 提交于
Currently, pseries_cpu_die() calls msleep() while polling RTAS for the status of the dying cpu. However, if the cpu that is going down also happens to be the one doing the tick then we're hosed as the tick_do_timer_cpu 'baton' is only passed later on in tick_shutdown() when _cpu_down() does the CPU_DEAD notification. Therefore jiffies won't be updated anymore. This replaces that msleep() with a cpu_relax() to make sure we're not going to schedule at that point. With this patch my test box survives a 100k iterations hotplug stress test on _all_ cpus, whereas without it, it quickly dies after ~50 iterations. Signed-off-by: NSebastien Dugue <sebastien.dugue@bull.net> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 21 12月, 2008 4 次提交
-
-
由 Brian King 提交于
When running Active Memory Sharing, pages can get marked as "loaned" with the hypervisor by the CMM driver. This state gets cleared by the system firmware when rebooting the partition. When using kexec to boot a new kernel, this state never gets cleared and the hypervisor and CMM driver can get out of sync with respect to the number of pages currently marked "loaned". Fix this by adding a reboot notifier to the CMM driver to deflate the balloon and mark all pages as active. Signed-off-by: NBrian King <brking@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Brian King 提交于
When running Active Memory Sharing, the Collaborative Memory Manager (CMM) may mark some pages as "loaned" with the hypervisor. Periodically, the CMM will query the hypervisor for a loan request, which is a single signed value. When kexec'ing into a kdump kernel, the CMM driver in the kdump kernel is not aware of the pages the previous kernel had marked as "loaned", so the hypervisor and the CMM driver are out of sync. This results in the CMM driver getting a negative loan request, which can then get treated as a large unsigned value and can cause kdump to hang due to the CMM driver inflating too large. Since there really is no clean way for the CMM driver in the kdump kernel to clean this up, simply disable CMM in the kdump kernel. This fixes hangs we were seeing doing kdump with AMS. Signed-off-by: NBrian King <brking@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Tony Breeds 提交于
ibm_configure_kernel_dump is passed as the token to rtas_call() is never initialised. This sets it to something sane. Signed-off-by: NTony Breeds <tony@bakeyournoodle.com> Acked-by: NNathan Lynch <ntl@pobox.com> Acked-by: NManish Ahuja <mahujam@gmail.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Tony Breeds 提交于
print_dump_header() will be called at least once with a NULL pointer in a normal boot sequence. If DEBUG is defined then we will dereference the pointer and crash. Add a quick fix to exit early in the NULL pointer case. Signed-off-by: NTony Breeds <tony@bakeyournoodle.com> Acked-by: NManish Ahuja <mahujam@gmail.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 19 12月, 2008 1 次提交
-
-
由 Paul E. McKenney 提交于
This patch fixes a long-standing performance bug in classic RCU that results in massive internal-to-RCU lock contention on systems with more than a few hundred CPUs. Although this patch creates a separate flavor of RCU for ease of review and patch maintenance, it is intended to replace classic RCU. This patch still handles stress better than does mainline, so I am still calling it ready for inclusion. This patch is against the -tip tree. Nevertheless, experience on an actual 1000+ CPU machine would still be most welcome. Most of the changes noted below were found while creating an rcutiny (which should permit ejecting the current rcuclassic) and while doing detailed line-by-line documentation. Updates from v9 (http://lkml.org/lkml/2008/12/2/334): o Fixes from remainder of line-by-line code walkthrough, including comment spelling, initialization, undesirable narrowing due to type conversion, removing redundant memory barriers, removing redundant local-variable initialization, and removing redundant local variables. I do not believe that any of these fixes address the CPU-hotplug issues that Andi Kleen was seeing, but please do give it a whirl in case the machine is smarter than I am. A writeup from the walkthrough may be found at the following URL, in case you are suffering from terminal insomnia or masochism: http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf o Made rcutree tracing use seq_file, as suggested some time ago by Lai Jiangshan. o Added a .csv variant of the rcudata debugfs trace file, to allow people having thousands of CPUs to drop the data into a spreadsheet. Tested with oocalc and gnumeric. Updated documentation to suit. Updates from v8 (http://lkml.org/lkml/2008/11/15/139): o Fix a theoretical race between grace-period initialization and force_quiescent_state() that could occur if more than three jiffies were required to carry out the grace-period initialization. Which it might, if you had enough CPUs. o Apply Ingo's printk-standardization patch. o Substitute local variables for repeated accesses to global variables. o Fix comment misspellings and redundant (but harmless) increments of ->n_rcu_pending (this latter after having explicitly added it). o Apply checkpatch fixes. Updates from v7 (http://lkml.org/lkml/2008/10/10/291): o Fixed a number of problems noted by Gautham Shenoy, including the cpu-stall-detection bug that he was having difficulty convincing me was real. ;-) o Changed cpu-stall detection to wait for ten seconds rather than three in order to reduce false positive, as suggested by Ingo Molnar. o Produced a design document (http://lwn.net/Articles/305782/). The act of writing this document uncovered a number of both theoretical and "here and now" bugs as noted below. o Fix dynticks_nesting accounting confusion, simplify WARN_ON() condition, fix kerneldoc comments, and add memory barriers in dynticks interface functions. o Add more data to tracing. o Remove unused "rcu_barrier" field from rcu_data structure. o Count calls to rcu_pending() from scheduling-clock interrupt to use as a surrogate timebase should jiffies stop counting. o Fix a theoretical race between force_quiescent_state() and grace-period initialization. Yes, initialization does have to go on for some jiffies for this race to occur, but given enough CPUs... Updates from v6 (http://lkml.org/lkml/2008/9/23/448): o Fix a number of checkpatch.pl complaints. o Apply review comments from Ingo Molnar and Lai Jiangshan on the stall-detection code. o Fix several bugs in !CONFIG_SMP builds. o Fix a misspelled config-parameter name so that RCU now announces at boot time if stall detection is configured. o Run tests on numerous combinations of configurations parameters, which after the fixes above, now build and run correctly. Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line): o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a changeset some time ago, and finally got around to retesting this option). o Fix some tracing bugs in rcupreempt that caused incorrect totals to be printed. o I now test with a more brutal random-selection online/offline script (attached). Probably more brutal than it needs to be on the people reading it as well, but so it goes. o A number of optimizations and usability improvements: o Make rcu_pending() ignore the grace-period timeout when there is no grace period in progress. o Make force_quiescent_state() avoid going for a global lock in the case where there is no grace period in progress. o Rearrange struct fields to improve struct layout. o Make call_rcu() initiate a grace period if RCU was idle, rather than waiting for the next scheduling clock interrupt. o Invoke rcu_irq_enter() and rcu_irq_exit() only when idle, as suggested by Andi Kleen. I still don't completely trust this change, and might back it out. o Make CONFIG_RCU_TRACE be the single config variable manipulated for all forms of RCU, instead of the prior confusion. o Document tracing files and formats for both rcupreempt and rcutree. Updates from v4 for those missing v5 given its bad subject line: o Separated dynticks interface so that NMIs and irqs call separate functions, greatly simplifying it. In particular, this code no longer requires a proof of correctness. ;-) o Separated dynticks state out into its own per-CPU structure, avoiding the duplicated accounting. o The case where a dynticks-idle CPU runs an irq handler that invokes call_rcu() is now correctly handled, forcing that CPU out of dynticks-idle mode. o Review comments have been applied (thank you all!!!). For but one example, fixed the dynticks-ordering issue that Manfred pointed out, saving me much debugging. ;-) o Adjusted rcuclassic and rcupreempt to handle dynticks changes. Attached is an updated patch to Classic RCU that applies a hierarchy, greatly reducing the contention on the top-level lock for large machines. This passes 10-hour concurrent rcutorture and online-offline testing on 128-CPU ppc64 without dynticks enabled, and exposes some timekeeping bugs in presence of dynticks (exciting working on a system where "sleep 1" hangs until interrupted...), which were fixed in the 2.6.27 kernel. It is getting more reliable than mainline by some measures, so the next version will be against -tip for inclusion. See also Manfred Spraul's recent patches (or his earlier work from 2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2). We will converge onto a common patch in the fullness of time, but are currently exploring different regions of the design space. That said, I have already gratefully stolen quite a few of Manfred's ideas. This patch provides CONFIG_RCU_FANOUT, which controls the bushiness of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on 64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT, there is no hierarchy. By default, the RCU initialization code will adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable this balancing, allowing the hierarchy to be exactly aligned to the underlying hardware. Up to two levels of hierarchy are permitted (in addition to the root node), allowing up to 16,384 CPUs on 32-bit systems and up to 262,144 CPUs on 64-bit systems. I just know that I am going to regret saying this, but this seems more than sufficient for the foreseeable future. (Some architectures might wish to set CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs. If this becomes a real problem, additional levels can be added, but I doubt that it will make a significant difference on real hardware.) In the common case, a given CPU will manipulate its private rcu_data structure and the rcu_node structure that it shares with its immediate neighbors. This can reduce both lock and memory contention by multiple orders of magnitude, which should eliminate the need for the strange manipulations that are reported to be required when running Linux on very large systems. Some shortcomings: o More bugs will probably surface as a result of an ongoing line-by-line code inspection. Patches will be provided as required. o There are probably hangs, rcutorture failures, &c. Seems quite stable on a 128-CPU machine, but that is kind of small compared to 4096 CPUs. However, seems to do better than mainline. Patches will be provided as required. o The memory footprint of this version is several KB larger than rcuclassic. A separate UP-only rcutiny patch will be provided, which will reduce the memory footprint significantly, even compared to the old rcuclassic. One such patch passes light testing, and has a memory footprint smaller even than rcuclassic. Initial reaction from various embedded guys was "it is not worth it", so am putting it aside. Credits: o Manfred Spraul for ideas, review comments, and bugs spotted, as well as some good friendly competition. ;-) o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers, Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton for reviews and comments. o Thomas Gleixner for much-needed help with some timer issues (see patches below). o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos, Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines alive despite my heavy abuse^Wtesting. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 16 12月, 2008 1 次提交
-
-
由 Nathan Lynch 提交于
Since "Factor out cpu joining/unjoining the GIQ" (b4963255) the WARN_ON in xics_set_cpu_giq() is being triggered during boot on JS20 because the GIQ indicator is not available on that platform. While the warning is harmless and the system runs normally, it's nicer to check for the existence of the indicator before trying to manipulate it. Implement rtas_indicator_present(), which searches the /rtas/rtas-indicators property for the given indicator token, and use this function in xics_set_cpu_giq(). Also use a WARN statement in xics_set_cpu_giq to get better information on failure. Signed-off-by: NNathan Lynch <ntl@pobox.com> Acked-by: NMilton Miller <miltonm@bga.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 13 12月, 2008 2 次提交
-
-
由 Rusty Russell 提交于
Impact: change existing irq_chip API Not much point with gentle transition here: the struct irq_chip's setaffinity method signature needs to change. Fortunately, not widely used code, but hits a few architectures. Note: In irq_select_affinity() I save a temporary in by mangling irq_desc[irq].affinity directly. Ingo, does this break anything? (Folded in fix from KOSAKI Motohiro) Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NMike Travis <travis@sgi.com> Reviewed-by: NGrant Grundler <grundler@parisc-linux.org> Acked-by: NIngo Molnar <mingo@redhat.com> Cc: ralf@linux-mips.org Cc: grundler@parisc-linux.org Cc: jeremy@xensource.com Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
-
由 Rusty Russell 提交于
cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers. Impact: change calling convention of existing cpumask APIs Most cpumask functions started with cpus_: these have been replaced by cpumask_ ones which take struct cpumask pointers as expected. These four functions don't have good replacement names; fortunately they're rarely used, so we just change them over. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NMike Travis <travis@sgi.com> Acked-by: NIngo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: mingo@redhat.com Cc: tony.luck@intel.com Cc: ralf@linux-mips.org Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: cl@linux-foundation.org Cc: srostedt@redhat.com
-
- 06 11月, 2008 2 次提交
-
-
由 Benjamin Herrenschmidt 提交于
The pseries PCI hotplug code has a number of issues, ranging from incorrect resource setup to crashes, depending on what is added, when, whether it contains a bridge, etc etc.... This fixes a whole bunch of these, while actually simplifying the code a bit, using more generic code in the process and factoring out common code between adding of a PHB, a slot or a device. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Benjamin Herrenschmidt 提交于
To properly fix PCI hotplug, it's useful to be able to make the fixup passes on all devices whether they were just hot plugged or already there. The EEH code however used to not be very friendly with calling eeh_add_device_late() multiple time, and not very rebust in the way it generally tests whether a device is in the expected state vs. the EEH code. This improves it, along with cleaning up a couple of debug printk's. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 05 11月, 2008 2 次提交
-
-
由 Sebastien Dugue 提交于
The 'ibm,interrupt-server#-size' properties are not in the cpu nodes, which is where we currently look for them, but rather live under the interrupt source controller nodes (which have "ibm,ppc-xics" in their compatible property). This moves the code that looks for the ibm,interrupt-server#-size properties from xics_update_irq_servers() into xics_init_IRQ(). Also this adds a check for mismatched sizes across the interrupt source controller nodes. Not sure this is necessary as in this case the firmware might be seriously busted. This property only appears on POWER6 boxes and is only used in the set-indicator(gqirm) call, and apparently firmware currently ignores the value we pass. Nevertheless we need to fix it in case future firmware versions use it. Signed-off-by: NSebastien Dugue <sebastien.dugue@bull.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: NMilton Miller <miltonm@bga.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Stephen Rothwell 提交于
This gets rid of this build warning: arch/powerpc/platforms/pseries/pci_dlpar.c: In function 'init_phb_dynamic': arch/powerpc/platforms/pseries/pci_dlpar.c:192: warning: unused variable 'b' This is one of the very few warnings left in a ppc64_defconfig build and getting rid of it will make it easier to see future introduced ones (in fact this was introduced very recently). Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 31 10月, 2008 2 次提交
-
-
由 Nathan Fontenot 提交于
Resources for PHB's that are dynamically added to a system are not properly allocated in the resource tree. Not having these resources allocated causes an oops when removing the PHB when we try to release them. The diff appears a bit messy, this is mainly due to moving everything one tab to the left in the pcibios_allocate_bus_resources routine. The functionality change in this routine is only that the list_for_each_entry() loop is pulled out and moved to the necessary calling routine. Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Milton Miller 提交于
linux/crash_dump.h defines is_kdump_kernel() to be used by code that needs to know if the previous kernel crashed instead of a (clean) boot or reboot. This updates the just added powerpc code to use it. This is needed for the next commit, which will remove __kdump_flag. Signed-off-by: NMilton Miller <miltonm@bga.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 22 10月, 2008 1 次提交
-
-
由 Mohan Kumar M 提交于
This adds relocatable kernel support for kdump. With this one can use the same regular kernel to capture the kdump. A signature (0xfeed1234) is passed in r6 from panic code to the next kernel through kexec_sequence and purgatory code. The signature is used to differentiate between kdump kernel and non-kdump kernels. The purgatory code compares the signature and sets the __kdump_flag in head_64.S. During the boot up, kernel code checks __kdump_flag and if it is set, the kernel will behave as relocatable kdump kernel. This kernel will boot at the address where it was loaded by kexec-tools ie. at the address reserved through crashkernel boot parameter. CONFIG_CRASH_DUMP depends on CONFIG_RELOCATABLE option to build kdump kernel as relocatable. So the same kernel can be used as production and kdump kernel. This patch incorporates the changes suggested by Paul Mackerras to avoid GOT use and to avoid two copies of the code. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NMohan Kumar M <mohan@in.ibm.com> Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 21 10月, 2008 1 次提交
-
-
由 Milton Miller 提交于
We used to assume that even numbered threads were the primary threads, ie those that would be listed and started as a cpu from open firmware. Replace a left over is even (% 2) check with a check for it being a primary thread and update the comments. Tested with a debug print on pseries, identical code found for cell. Signed-off-by: NMilton Miller <miltonm@bga.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-