- 19 12月, 2008 1 次提交
-
-
由 Paul E. McKenney 提交于
This patch fixes a long-standing performance bug in classic RCU that results in massive internal-to-RCU lock contention on systems with more than a few hundred CPUs. Although this patch creates a separate flavor of RCU for ease of review and patch maintenance, it is intended to replace classic RCU. This patch still handles stress better than does mainline, so I am still calling it ready for inclusion. This patch is against the -tip tree. Nevertheless, experience on an actual 1000+ CPU machine would still be most welcome. Most of the changes noted below were found while creating an rcutiny (which should permit ejecting the current rcuclassic) and while doing detailed line-by-line documentation. Updates from v9 (http://lkml.org/lkml/2008/12/2/334): o Fixes from remainder of line-by-line code walkthrough, including comment spelling, initialization, undesirable narrowing due to type conversion, removing redundant memory barriers, removing redundant local-variable initialization, and removing redundant local variables. I do not believe that any of these fixes address the CPU-hotplug issues that Andi Kleen was seeing, but please do give it a whirl in case the machine is smarter than I am. A writeup from the walkthrough may be found at the following URL, in case you are suffering from terminal insomnia or masochism: http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf o Made rcutree tracing use seq_file, as suggested some time ago by Lai Jiangshan. o Added a .csv variant of the rcudata debugfs trace file, to allow people having thousands of CPUs to drop the data into a spreadsheet. Tested with oocalc and gnumeric. Updated documentation to suit. Updates from v8 (http://lkml.org/lkml/2008/11/15/139): o Fix a theoretical race between grace-period initialization and force_quiescent_state() that could occur if more than three jiffies were required to carry out the grace-period initialization. Which it might, if you had enough CPUs. o Apply Ingo's printk-standardization patch. o Substitute local variables for repeated accesses to global variables. o Fix comment misspellings and redundant (but harmless) increments of ->n_rcu_pending (this latter after having explicitly added it). o Apply checkpatch fixes. Updates from v7 (http://lkml.org/lkml/2008/10/10/291): o Fixed a number of problems noted by Gautham Shenoy, including the cpu-stall-detection bug that he was having difficulty convincing me was real. ;-) o Changed cpu-stall detection to wait for ten seconds rather than three in order to reduce false positive, as suggested by Ingo Molnar. o Produced a design document (http://lwn.net/Articles/305782/). The act of writing this document uncovered a number of both theoretical and "here and now" bugs as noted below. o Fix dynticks_nesting accounting confusion, simplify WARN_ON() condition, fix kerneldoc comments, and add memory barriers in dynticks interface functions. o Add more data to tracing. o Remove unused "rcu_barrier" field from rcu_data structure. o Count calls to rcu_pending() from scheduling-clock interrupt to use as a surrogate timebase should jiffies stop counting. o Fix a theoretical race between force_quiescent_state() and grace-period initialization. Yes, initialization does have to go on for some jiffies for this race to occur, but given enough CPUs... Updates from v6 (http://lkml.org/lkml/2008/9/23/448): o Fix a number of checkpatch.pl complaints. o Apply review comments from Ingo Molnar and Lai Jiangshan on the stall-detection code. o Fix several bugs in !CONFIG_SMP builds. o Fix a misspelled config-parameter name so that RCU now announces at boot time if stall detection is configured. o Run tests on numerous combinations of configurations parameters, which after the fixes above, now build and run correctly. Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line): o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a changeset some time ago, and finally got around to retesting this option). o Fix some tracing bugs in rcupreempt that caused incorrect totals to be printed. o I now test with a more brutal random-selection online/offline script (attached). Probably more brutal than it needs to be on the people reading it as well, but so it goes. o A number of optimizations and usability improvements: o Make rcu_pending() ignore the grace-period timeout when there is no grace period in progress. o Make force_quiescent_state() avoid going for a global lock in the case where there is no grace period in progress. o Rearrange struct fields to improve struct layout. o Make call_rcu() initiate a grace period if RCU was idle, rather than waiting for the next scheduling clock interrupt. o Invoke rcu_irq_enter() and rcu_irq_exit() only when idle, as suggested by Andi Kleen. I still don't completely trust this change, and might back it out. o Make CONFIG_RCU_TRACE be the single config variable manipulated for all forms of RCU, instead of the prior confusion. o Document tracing files and formats for both rcupreempt and rcutree. Updates from v4 for those missing v5 given its bad subject line: o Separated dynticks interface so that NMIs and irqs call separate functions, greatly simplifying it. In particular, this code no longer requires a proof of correctness. ;-) o Separated dynticks state out into its own per-CPU structure, avoiding the duplicated accounting. o The case where a dynticks-idle CPU runs an irq handler that invokes call_rcu() is now correctly handled, forcing that CPU out of dynticks-idle mode. o Review comments have been applied (thank you all!!!). For but one example, fixed the dynticks-ordering issue that Manfred pointed out, saving me much debugging. ;-) o Adjusted rcuclassic and rcupreempt to handle dynticks changes. Attached is an updated patch to Classic RCU that applies a hierarchy, greatly reducing the contention on the top-level lock for large machines. This passes 10-hour concurrent rcutorture and online-offline testing on 128-CPU ppc64 without dynticks enabled, and exposes some timekeeping bugs in presence of dynticks (exciting working on a system where "sleep 1" hangs until interrupted...), which were fixed in the 2.6.27 kernel. It is getting more reliable than mainline by some measures, so the next version will be against -tip for inclusion. See also Manfred Spraul's recent patches (or his earlier work from 2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2). We will converge onto a common patch in the fullness of time, but are currently exploring different regions of the design space. That said, I have already gratefully stolen quite a few of Manfred's ideas. This patch provides CONFIG_RCU_FANOUT, which controls the bushiness of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on 64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT, there is no hierarchy. By default, the RCU initialization code will adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable this balancing, allowing the hierarchy to be exactly aligned to the underlying hardware. Up to two levels of hierarchy are permitted (in addition to the root node), allowing up to 16,384 CPUs on 32-bit systems and up to 262,144 CPUs on 64-bit systems. I just know that I am going to regret saying this, but this seems more than sufficient for the foreseeable future. (Some architectures might wish to set CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs. If this becomes a real problem, additional levels can be added, but I doubt that it will make a significant difference on real hardware.) In the common case, a given CPU will manipulate its private rcu_data structure and the rcu_node structure that it shares with its immediate neighbors. This can reduce both lock and memory contention by multiple orders of magnitude, which should eliminate the need for the strange manipulations that are reported to be required when running Linux on very large systems. Some shortcomings: o More bugs will probably surface as a result of an ongoing line-by-line code inspection. Patches will be provided as required. o There are probably hangs, rcutorture failures, &c. Seems quite stable on a 128-CPU machine, but that is kind of small compared to 4096 CPUs. However, seems to do better than mainline. Patches will be provided as required. o The memory footprint of this version is several KB larger than rcuclassic. A separate UP-only rcutiny patch will be provided, which will reduce the memory footprint significantly, even compared to the old rcuclassic. One such patch passes light testing, and has a memory footprint smaller even than rcuclassic. Initial reaction from various embedded guys was "it is not worth it", so am putting it aside. Credits: o Manfred Spraul for ideas, review comments, and bugs spotted, as well as some good friendly competition. ;-) o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers, Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton for reviews and comments. o Thomas Gleixner for much-needed help with some timer issues (see patches below). o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos, Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines alive despite my heavy abuse^Wtesting. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 07 10月, 2008 1 次提交
-
-
由 Vitaly Mayatskikh 提交于
rtas_log_read() doesn't check file flags for O_NONBLOCK and blocks non-blocking readers of /proc/ppc64/rtas/error_log when there is no data available. This fixes it. Also rtas_log_read() returns now with ENODATA to prevent suspending of process in wait_event_interruptible() when logging facility was switched off and log is already empty. Signed-off-by: NVitaly Mayatskikh <v.mayatskih@gmail.com> Acked-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 14 5月, 2008 1 次提交
-
-
由 Michael Ellerman 提交于
Don't return void in pseries/iommu.c Make mce_data_buf static in pseries/ras.c Make things static in pseries/rtasd.c Make things static in pseries/setup.c vtermno may as well be static in platforms/pseries/lpar.c Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 29 4月, 2008 1 次提交
-
-
由 Denis V. Lunev 提交于
Use proc_create()/proc_create_data() to make sure that ->proc_fops and ->data be setup before gluing PDE to main tree. Add correct ->owner to proc_fops to fix reading/module unloading race. Signed-off-by: NDenis V. Lunev <den@openvz.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 4月, 2008 1 次提交
-
-
由 Michael Ellerman 提交于
In pseries/lpar.c, fix some printf specifier mismatches, and add a newline to one printk. In pseries/rtasd.c add "rtasd" to some messages to make it clear where they're coming from. In pseries/scanlog.c remove the hand-rolled runtime debugging support in there. This file has been largely unchanged for eons, if we need to debug it in future we can recompile. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 26 1月, 2008 1 次提交
-
-
由 Gautham R Shenoy 提交于
Replace all lock_cpu_hotplug/unlock_cpu_hotplug from the kernel and use get_online_cpus and put_online_cpus instead as it highlights the refcount semantics in these operations. The new API guarantees protection against the cpu-hotplug operation, but it doesn't guarantee serialized access to any of the local data structures. Hence the changes needs to be reviewed. In case of pseries_add_processor/pseries_remove_processor, use cpu_maps_update_begin()/cpu_maps_update_done() as we're modifying the cpu_present_map there. Signed-off-by: NGautham R Shenoy <ego@in.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 03 10月, 2007 1 次提交
-
-
由 Tony Breeds 提交于
Some older pSeries machines were panicking in pSeries_log_error because it was getting called before it was ready. This is a result of commit "[POWERPC] pseries: Fix jumbled no_logging flag." (79c0108d). This fixes it by explicitly enabling RTAS error logging when it has been initialized, and also makes the code clearer by renaming the "no_more_logging" variable to "logging_enabled". Signed-off-by: NTony Breeds <tony@bakeyournoodle.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 17 8月, 2007 5 次提交
-
-
由 Linas Vepstas 提交于
Eliminate the use of error_log_cnt as a global var shared across different directories. Pass it as a parameter instead. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- Respin of earlier patch, with the CONFIG_PSERIES junk removed from the header file. arch/powerpc/kernel/nvram_64.c | 10 +++++----- arch/powerpc/platforms/pseries/rtasd.c | 7 ++++--- include/asm-powerpc/nvram.h | 6 ++++-- 3 files changed, 13 insertions(+), 10 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Get rid of the jumbled usage of the no_logging flag. Its use spans several directories, and is incorrectly/misleadingly documented. Instead, two changes: 1) nvram will accept error log as soon as its ready. 2) logging to nvram stops on the first fatal error reported. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/kernel/nvram_64.c | 8 -------- arch/powerpc/platforms/pseries/rtasd.c | 14 ++++++-------- 2 files changed, 6 insertions(+), 16 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
Simplify rtasd initialization code; this also fixes a buglet, where the /proc entries weren't being cleaned up in case of failure. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/rtasd.c | 53 +++++++++++---------------------- 1 file changed, 19 insertions(+), 34 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
The rtas_token() call does the same thing as this hand-rolled code. This makes the code easier to read. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- arch/powerpc/platforms/pseries/rtasd.c | 13 ++----------- 1 file changed, 2 insertions(+), 11 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Linas Vepstas 提交于
We don't need to look up the rtas event token once per cpu per second. This avoids some misc device-tree lookups and string ops and so provides some minor performance improvement. Signed-off-by: NLinas Vepstas <linas@austin.ibm.com> ---- Revised commit-log message. arch/powerpc/platforms/pseries/rtasd.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 13 4月, 2007 1 次提交
-
-
由 Stephen Rothwell 提交于
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 13 2月, 2007 1 次提交
-
-
由 Arjan van de Ven 提交于
Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. [akpm@osdl.org: sparc64 fix] Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 7月, 2006 1 次提交
-
-
由 Jeremy Kerr 提交于
Now that get_property() returns a void *, there's no need to cast its return value. Also, treat the return value as const, so we can constify get_property later. pseries platform changes. Built for pseries_defconfig Signed-off-by: NJeremy Kerr <jk@ozlabs.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 22 4月, 2006 1 次提交
-
-
由 Olof Johansson 提交于
Most users won't really know the difference between a started RTAS daemon and a missing event-scan. Move it to debug levels. Signed-off-by: NOlof Johansson <olof@lixom.net> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 14 4月, 2006 1 次提交
-
-
由 Anton Blanchard 提交于
Fix __initcall return in proc_rtas_init and rtas_init. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 28 3月, 2006 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
This removes statically assigned platform numbers and reworks the powerpc platform probe code to use a better mechanism. With this, board support files can simply declare a new machine type with a macro, and implement a probe() function that uses the flattened device-tree to detect if they apply for a given machine. We now have a machine_is() macro that replaces the comparisons of _machine with the various PLATFORM_* constants. This commit also changes various drivers to use the new macro instead of looking at _machine. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 11 11月, 2005 1 次提交
-
-
由 Paul Mackerras 提交于
We needed the VDSO symbols in the arch/ppc asm-offsets.c, and there were a few usages of _systemcfg still left lying around. Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 10 11月, 2005 1 次提交
-
-
由 Paul Mackerras 提交于
This patch merges platform codes. systemcfg->platform is no longer used, systemcfg use in general is deprecated as much as possible (and renamed _systemcfg before it gets completely moved elsewhere in a future patch), _machine is now used on ppc64 along as ppc32. Platform codes aren't gone yet but we are getting a step closer. A bunch of asm code in head[_64].S is also turned into C code. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 03 11月, 2005 1 次提交
-
-
由 Paul Mackerras 提交于
This moves rtas-proc.c and rtas_flash.c into arch/powerpc/kernel, since cell wants them as well as pseries (and chrp can use rtas-proc.c too, at least in principle). rtas_fw.c is gone, with its bits moved into rtas_flash.c and rtas.c. Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 05 9月, 2005 1 次提交
-
-
由 Nishanth Aravamudan 提交于
Use msleep_interruptible() instead of schedule_timeout() in ppc64-specific code to cleanup/simplify the sleeping logic. Change the units of the parameter of do_event_scan_all_cpus() to milliseconds from jiffies. The return value of rtas_extended_busy_delay_time() was incorrectly being used as a jiffies value (it is actually milliseconds), which is fixed by using the value as a parameter to msleep_interruptible(). Also, use rtas_extended_busy_delay_time() in another case where similar logic is duplicated. Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 20 6月, 2005 1 次提交
-
-
由 Anton Blanchard 提交于
Some rtasd printks were too loud. They would appear on a quiet boot even though they were only informational. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 17 4月, 2005 1 次提交
-
-
由 Linus Torvalds 提交于
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
-