提交 · 944916858a430a0627e483657d4cfa2cd2dfb4f7 · openeuler / raspberrypi-kernel

02 6月, 2009 1 次提交

powerpc: Convert RTAS event scan from kernel thread to workqueue · f8729e85

由 Anton Blanchard 提交于 5月 25, 2009

RTAS event scan has to run across all cpus. Right now we use a kernel
thread and set_cpus_allowed but in doing so we wake up the previous cpu
unnecessarily.

Some ftrace output shows this:

previous cpu (2):
[002]  7.022331: sched_switch: task swapper:0 [140] ==> rtasd:194 [120]
[002]  7.022338: sched_switch: task rtasd:194 [120] ==> migration/2:9 [0]
[002]  7.022344: sched_switch: task migration/2:9 [0] ==> swapper:0 [140]

next cpu (3):
[003]  7.022345: sched_switch: task swapper:0 [140] ==> rtasd:194 [120]
[003]  7.022371: sched_switch: task rtasd:194 [120] ==> swapper:0 [140]

We can use schedule_delayed_work_on and avoid the unnecessary wakeup.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

f8729e85

21 5月, 2009 2 次提交

powerpc/pci: Move pseries code into pseries platform specific area · 2eb4afb6

由 Kumar Gala 提交于 4月 30, 2009

There doesn't appear to be any specific reason that we need to setup the
pseries specific notifier in generic arch pci code.  Move it into pseries
land.
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

2eb4afb6

powerpc/pseries: CMO unused page hinting · 14f966e7

由 Robert Jennings 提交于 4月 15, 2009

Adds support for the "unused" page hint which can be used in shared
memory partitions to flag pages not in use, which will then be stolen
before active pages by the hypervisor when memory needs to be moved to
LPARs in need of additional memory.  Failure to mark pages as 'unused'
makes the LPAR slower to give up unused memory to other partitions.

This adds the kernel parameter 'cmo_free_hint' to disable this
functionality.
Signed-off-by: NBrian King <brking@linux.vnet.ibm.com>
Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

14f966e7

15 4月, 2009 2 次提交

powerpc: pseries/dtl.c should include asm/firmware.h · b71a0c29

由 Sachin Sant 提交于 4月 14, 2009

A randconfig build on powerpc failed with:

dtl.c: In function 'dtl_init':
dtl.c:238: error: implicit declaration of function 'firmware_has_feature'
dtl.c:238: error: 'FW_FEATURE_SPLPAR' undeclared (first use in this function)

- We need firmware.h for these definitions.
Signed-off-by: NSachin Sant <sachinp@in.ibm.com>
Signed-off-by: NJeremy Kerr <jk@ozlabs.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

b71a0c29

powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset() · c58dc575

由 Mike Mason 提交于 4月 10, 2009

While adding native EEH support to Emulex and Qlogic drivers, it was
discovered that dev->error_state was set to pci_io_channel_normal too
late in the recovery process. These drivers rely on error_state to
determine if they can access the device in their slot_reset callback,
thus error_state needs to be set to pci_io_channel_normal in
eeh_report_reset(). Below is a detailed explanation (courtesy of Richard
Lary) as to why this is necessary.

Background:
PCI MMIO or DMA accesses to a frozen slot generate additional EEH
errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS the
adapter will be shutdown. To avoid triggering excessive EEH errors and
an undesirable adapter shutdown, some drivers use the
pci_channel_offline(dev) wrapper function to return a Boolean value
based on the value of pci_dev->error_state to determine if PCI MMIO or
DMA accesses are safe. If the wrapper returns TRUE, drivers must not
make PCI MMIO or DMA access to their hardware.

The pci_dev structure member error_state reflects one of three values,
1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3)
pci_channel_io_perm_failure.  Function pci_channel_offline(dev) returns
TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure.

The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the
point where the PCI slot is frozen. Currently, the EEH driver restores
dev->error_state to pci_channel_io_normal in eeh_report_resume() before
calling the driver's resume callback. However, when the EEH driver calls
the driver's slot_reset callback() from eeh_report_reset(), it
incorrectly indicates the error state is still pci_channel_io_frozen.

Waiting until eeh_report_resume() to restore dev->error_state to
pci_channel_io_normal is too late for Emulex and QLogic FC drivers and
any other drivers which are designed to use common code paths in these
two cases: i) those called after the driver's slot_reset callback() and
ii) those called after the PCI slot is frozen but before the driver's
slot_reset callback is called. Case i) all driver paths executed to
reinitialize the hardware after a reset and case ii) all code paths
executed by driver kernel threads that run asynchronous to the main
driver thread, such as interrupt handlers and worker threads to process
driver work queues.

Emulex and QLogic FC drivers are designed with common code paths which
require that pci_channel_offline(dev) reflect the true state of the
hardware. The state transitions that the hardware takes from Normal
Operations to Slot Frozen to Reset to Normal Operations are documented
in the Power Architecture™ Platform Requirements+ (PAPR+) in Table 75.
PE State Control.

PAPR defines the following 3 states:

0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed
     (Normal Operations)
1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled
2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled
     (Slot Frozen)

An EEH error places the slot in state 2 (Frozen) and the adapter driver
is notified that an EEH error was detected. If the adapter driver
returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls
eeh_reset_device() to place the slot into state 1 (Reset) and
eeh_reset_device completes by placing the slot into State 0 (Normal
Operations). Upon return from eeh_reset_device(), the EEH driver calls
eeh_report_reset, which then calls the adapter's slot_reset callback. At
the time the adapter's slot_reset callback is called, the true state of
the hardware is Normal Operations and should be accurately reflected by
setting dev->error_state to pci_channel_io_normal.

The current implementation of EEH driver does not do so and requires
this change to correct this deficiency.
Signed-off-by: NMike Mason <mmlnx@us.ibm.com>
Acked-by: NLinas Vepstas <linasvepstas@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

c58dc575

27 3月, 2009 1 次提交

powerpc: Add write barrier before enabling DTL flags · 82631f5d

由 Jeremy Kerr 提交于 3月 23, 2009

Currently, we don't enforce any ordering for updates to the lppaca
when enabling dtl logging, so we may end up enabling logging before the
index fields have been established.

This change adds a smp_wmb() before doing the actual enable.
Signed-off-by: NJeremy Kerr <jk@ozlabs.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

82631f5d

24 3月, 2009 2 次提交

powerpc: Add virtual processor dispatch trace log · fc59a3fc

由 Jeremy Kerr 提交于 3月 11, 2009

pseries SPLPAR machines are able to retrieve a log of dispatch and
preempt events from the hypervisor. With this information, we can
see when and why each dispatch & preempt is occuring.

This change adds a set of debugfs files allowing userspace to read this
dispatch log.

Based on initial patches from Nishanth Aravamudan <nacc@us.ibm.com>.
Signed-off-by: NJeremy Kerr <jk@ozlabs.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fc59a3fc

powerpc/pseries: Failed reconfig notifier chain call cleanup · c5785f9e

由 Nathan Fontenot 提交于 3月 09, 2009

The return code from invoking the notifier chain when updating the
ibm,dynamic-memory property is not handled properly. In failure
cases (rc == NOTIFY_BAD) we should be restoring the original value
of the property.  In success (rc == NOTIFY_OK) we should be returning
zero from the calling routine.
Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

c5785f9e

11 3月, 2009 3 次提交

powerpc/kconfig: Kill PPC_MULTIPLATFORM · 28794d34

由 Benjamin Herrenschmidt 提交于 3月 10, 2009

CONFIG_PPC_MULTIPLATFORM is a remain of the pre-powerpc days and isn't
really meaningful anymore. It was basically equivalent to PPC64 || 6xx.

This removes it along with the following changes:

 - 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely
   on 6xx which is what they want anyway.

 - A new symbol, PPC_BOOK3S, is defined that represent compliance with
   the "Server" variant of the architecture. This is set when either 6xx
   or PPC64 is set and open the door for future BOOK3E 64-bit.

 - 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use
   PPC64 && PPC_BOOK3S

 - A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now
   used to control the use of prom_init.c
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

28794d34

powerpc/pseries: The pseries MSI code depends on EEH · 1bac0221

由 Michael Ellerman 提交于 3月 05, 2009

Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

1bac0221

powerpc/pseries: Reject discontiguous/non-zero based MSI-X requests · 94afa5a5

由 Michael Ellerman 提交于 3月 05, 2009

There's no way for us to express to firmware that we want a
discontiguous, or non-zero based, range of MSI-X entries. So we
must reject such requests.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

94afa5a5

23 2月, 2009 2 次提交

powerpc/pseries: Implement a quota system for MSIs · 448e2ca0

由 Michael Ellerman 提交于 2月 17, 2009

There are hardware limitations on the number of available MSIs,
which firmware expresses using a property named "ibm,pe-total-#msi".
This property tells us how many MSIs are available for devices below
the point in the PCI tree where we find the property.

For old firmwares which don't have the property, we assume there are
8 MSIs available per "partitionable endpoint" (PE). The PE can be
found using existing EEH code, which uses the methods described in
PAPR. For our purposes we want the parent of the node that's
identified using this method.

When a driver requests n MSIs for a device, we first establish where
the "ibm,pe-total-#msi" property above that device is, or we find the
PE if the property is not found. In both cases we call this node
the "pe_dn".

We then count all non-bridge devices below the pe_dn, to establish
how many devices in total may need MSIs. The quota is then simply the
total available divided by the number of devices, if the request is
less than or equal to the quota, the request is fine and we're done.

If the request is greater than the quota, we try to determine if there
are any "spare" MSIs which we can give to this device. Spare MSIs are
found by looking for other devices which can never use their full
quota, because their "req#msi(-x)" property is less than the quota.

If we find any spare, we divide the spares by the number of devices
that could request more than their quota. This ensures the spare
MSIs are spread evenly amongst all over-quota requestors.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

448e2ca0

powerpc/pseries: Return req#msi(-x) if request is larger · d523cc37

由 Michael Ellerman 提交于 2月 17, 2009

If a driver asks for more MSIs than the devices "req#msi(-x)" property,
we currently return -ENOSPC. This doesn't give the driver any chance to
make a new request with a number that might work.

So if "req#msi(-x)" is less than the request, return its value. To be
100% safe, make sure we return an error if req_msi == 0.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d523cc37

11 2月, 2009 6 次提交

powerpc/eeh: Only disable/enable LSI interrupts in EEH · 8535ef05

由 Mike Mason 提交于 2月 10, 2009

The EEH code disables and enables interrupts during the
device recovery process.  This is unnecessary for MSI
and MSI-X interrupts because they are effectively disabled
by the DMA Stopped state when an EEH error occurs.  The
current code is also incorrect for MSI-X interrupts.  It
doesn't take into account that MSI-X interrupts are tracked
in a different way than LSI/MSI interrupts.  This patch
ensures only LSI interrupts are disabled/enabled.
Signed-off-by: NMike Mason <mmlnx@us.ibm.com>
Acked-by: NLinas Vepstas <linasvepstas@gmail.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

8535ef05

powerpc/pseries: Return the number of MSIs we could allocate · 6071ed04

由 Michael Ellerman 提交于 1月 22, 2009

If we can't allocate the requested number of MSIs, we can still tell the
generic code how many we were able to allocate. That can then be passed
onto the driver, allowing it to request that many in future, and
probably succeeed.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

6071ed04

powerpc/pseries: Check for MSI-X also in rtas_msi_pci_irq_fixup() · 649781f8

由 Michael Ellerman 提交于 1月 22, 2009

We also need to check that the device isn't using MSI-X in the irq fixup
routine, otherwise we might leave MSI-Xs configured at boot.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

649781f8

powerpc/pseries: Add support for ibm,req#msi-x · 3a51c0cb

由 Michael Ellerman 提交于 1月 22, 2009

Firmware encodes the number of MSI-X requested by a device in a

different property than for MSI. Pull the property name out as a
parameter and share the logic for both cases.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

3a51c0cb

powerpc/pseries: Fix MSI-X interrupt querying · e27ed698

由 Michael Ellerman 提交于 1月 22, 2009

We need to increment i in the loop that queries what interrupts firmware
gave us, otherwise we'll incorrectly use the first value over and over.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

e27ed698

powerpc/pseries: Remove write only variable in PCI DLPAR · 7ce14a31

由 Milton Miller 提交于 1月 08, 2009

Since we never hotplug add an isa bus, we never need to set primary.
Delete this write-only variable.
Signed-off-by: NMilton Miller <miltonm@bga.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7ce14a31

10 2月, 2009 1 次提交

powerpc: Add missing sparsemem.h include · 0b2f8287

由 Michael Neuling 提交于 2月 08, 2009

arch/powerpc/platforms/pseries/hotplug-memory.c uses
remove_section_mapping() but doesn't include sparsemem.h which defines
it.  This can cause compilation fails for some configs.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

0b2f8287

28 1月, 2009 1 次提交

powerpc: Printing fix for l64 to ll64 conversion: phyp_dump.c · 802bdea8

由 Stephen Rothwell 提交于 1月 18, 2009

Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

802bdea8

13 1月, 2009 2 次提交

powerpc: Change u64/s64 to a long long integer type · fe333321

由 Ingo Molnar 提交于 1月 06, 2009

Convert arch/powerpc/ over to long long based u64:

 -#ifdef __powerpc64__
 -# include <asm-generic/int-l64.h>
 -#else
 -# include <asm-generic/int-ll64.h>
 -#endif
 +#include <asm-generic/int-ll64.h>

This will avoid reoccuring spurious warnings in core kernel code that
comes when people test on their own hardware. (i.e. x86 in ~98% of the
cases) This is what x86 uses and it generally helps keep 64-bit code
32-bit clean too.

[Adjusted to not impact user mode (from paulus) - sfr]
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fe333321

irq: update all arches for new irq_desc · e65e49d0

由 Mike Travis 提交于 1月 12, 2009

Impact: cleanup, update to new cpumask API

Irq_desc.affinity and irq_desc.pending_mask are now cpumask_var_t's
so access to them should be using the new cpumask API.
Signed-off-by: NMike Travis <travis@sgi.com>

e65e49d0

23 12月, 2008 1 次提交

powerpc/pseries: Fix cpu hotplug · b906cfa3

由 Sebastien Dugue 提交于 11月 27, 2008

Currently, pseries_cpu_die() calls msleep() while polling RTAS for
the status of the dying cpu.

However, if the cpu that is going down also happens to be the one
doing the tick then we're hosed as the tick_do_timer_cpu 'baton' is
only passed later on in tick_shutdown() when _cpu_down() does the
CPU_DEAD notification.  Therefore jiffies won't be updated anymore.

This replaces that msleep() with a cpu_relax() to make sure we're not
going to schedule at that point.

With this patch my test box survives a 100k iterations hotplug stress
test on _all_ cpus, whereas without it, it quickly dies after ~50
iterations.
Signed-off-by: NSebastien Dugue <sebastien.dugue@bull.net>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

b906cfa3

21 12月, 2008 4 次提交

powerpc: Add reboot notifier to Collaborative Memory Manager · fecba962

由 Brian King 提交于 12月 18, 2008

When running Active Memory Sharing, pages can get marked as
"loaned" with the hypervisor by the CMM driver. This state gets
cleared by the system firmware when rebooting the partition.
When using kexec to boot a new kernel, this state never gets
cleared and the hypervisor and CMM driver can get out of sync
with respect to the number of pages currently marked "loaned".
Fix this by adding a reboot notifier to the CMM driver to deflate
the balloon and mark all pages as active.
Signed-off-by: NBrian King <brking@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

fecba962

powerpc: Disable Collaborative Memory Manager for kdump · 2218108e

由 Brian King 提交于 12月 18, 2008

When running Active Memory Sharing, the Collaborative Memory Manager
(CMM) may mark some pages as "loaned" with the hypervisor.
Periodically, the CMM will query the hypervisor for a loan request,
which is a single signed value.  When kexec'ing into a kdump kernel,
the CMM driver in the kdump kernel is not aware of the pages the
previous kernel had marked as "loaned", so the hypervisor and the CMM
driver are out of sync.  This results in the CMM driver getting a
negative loan request, which can then get treated as a large unsigned
value and can cause kdump to hang due to the CMM driver inflating too
large.  Since there really is no clean way for the CMM driver in the
kdump kernel to clean this up, simply disable CMM in the kdump kernel.
This fixes hangs we were seeing doing kdump with AMS.
Signed-off-by: NBrian King <brking@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

2218108e

powerpc: Pass a valid token to rtas_call() in phyp-dump code · 532774ec

由 Tony Breeds 提交于 12月 15, 2008

ibm_configure_kernel_dump is passed as the token to rtas_call() is
never initialised.  This sets it to something sane.
Signed-off-by: NTony Breeds <tony@bakeyournoodle.com>
Acked-by: NNathan Lynch <ntl@pobox.com>
Acked-by: NManish Ahuja <mahujam@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

532774ec

powerpc: Protect against NULL pointer deref in phyp-dump code · 7a2eab0d

由 Tony Breeds 提交于 12月 15, 2008

print_dump_header() will be called at least once with a NULL pointer in
a normal boot sequence.  If DEBUG is defined then we will dereference
the pointer and crash.  Add a quick fix to exit early in the NULL pointer
case.
Signed-off-by: NTony Breeds <tony@bakeyournoodle.com>
Acked-by: NManish Ahuja <mahujam@gmail.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

7a2eab0d

19 12月, 2008 1 次提交

"Tree RCU": scalable classic RCU implementation · 64db4cff

由 Paul E. McKenney 提交于 12月 18, 2008

This patch fixes a long-standing performance bug in classic RCU that
results in massive internal-to-RCU lock contention on systems with
more than a few hundred CPUs. Although this patch creates a separate
flavor of RCU for ease of review and patch maintenance, it is intended
to replace classic RCU.

This patch still handles stress better than does mainline, so I am still
calling it ready for inclusion. This patch is against the -tip tree.
Nevertheless, experience on an actual 1000+ CPU machine would still be
most welcome.

Most of the changes noted below were found while creating an rcutiny
(which should permit ejecting the current rcuclassic) and while doing
detailed line-by-line documentation.

Updates from v9 (http://lkml.org/lkml/2008/12/2/334):

o Fixes from remainder of line-by-line code walkthrough,
including comment spelling, initialization, undesirable
narrowing due to type conversion, removing redundant memory
barriers, removing redundant local-variable initialization,
and removing redundant local variables.

I do not believe that any of these fixes address the CPU-hotplug
issues that Andi Kleen was seeing, but please do give it a whirl
in case the machine is smarter than I am.

A writeup from the walkthrough may be found at the following
URL, in case you are suffering from terminal insomnia or
masochism:

http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf

o Made rcutree tracing use seq_file, as suggested some time
ago by Lai Jiangshan.

o Added a .csv variant of the rcudata debugfs trace file, to allow
people having thousands of CPUs to drop the data into
a spreadsheet. Tested with oocalc and gnumeric. Updated
documentation to suit.

Updates from v8 (http://lkml.org/lkml/2008/11/15/139):

o Fix a theoretical race between grace-period initialization and
force_quiescent_state() that could occur if more than three
jiffies were required to carry out the grace-period
initialization. Which it might, if you had enough CPUs.

o Apply Ingo's printk-standardization patch.

o Substitute local variables for repeated accesses to global
variables.

o Fix comment misspellings and redundant (but harmless) increments
of ->n_rcu_pending (this latter after having explicitly added it).

o Apply checkpatch fixes.

Updates from v7 (http://lkml.org/lkml/2008/10/10/291):

o Fixed a number of problems noted by Gautham Shenoy, including
the cpu-stall-detection bug that he was having difficulty
convincing me was real. ;-)

o Changed cpu-stall detection to wait for ten seconds rather than
three in order to reduce false positive, as suggested by Ingo
Molnar.

o Produced a design document (http://lwn.net/Articles/305782/).
The act of writing this document uncovered a number of both
theoretical and "here and now" bugs as noted below.

o Fix dynticks_nesting accounting confusion, simplify WARN_ON()
condition, fix kerneldoc comments, and add memory barriers
in dynticks interface functions.

o Add more data to tracing.

o Remove unused "rcu_barrier" field from rcu_data structure.

o Count calls to rcu_pending() from scheduling-clock interrupt
to use as a surrogate timebase should jiffies stop counting.

o Fix a theoretical race between force_quiescent_state() and
grace-period initialization. Yes, initialization does have to
go on for some jiffies for this race to occur, but given enough
CPUs...

Updates from v6 (http://lkml.org/lkml/2008/9/23/448):

o Fix a number of checkpatch.pl complaints.

o Apply review comments from Ingo Molnar and Lai Jiangshan
on the stall-detection code.

o Fix several bugs in !CONFIG_SMP builds.

o Fix a misspelled config-parameter name so that RCU now announces
at boot time if stall detection is configured.

o Run tests on numerous combinations of configurations parameters,
which after the fixes above, now build and run correctly.

Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):

o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
changeset some time ago, and finally got around to retesting
this option).

o Fix some tracing bugs in rcupreempt that caused incorrect
totals to be printed.

o I now test with a more brutal random-selection online/offline
script (attached). Probably more brutal than it needs to be
on the people reading it as well, but so it goes.

o A number of optimizations and usability improvements:

o Make rcu_pending() ignore the grace-period timeout when
there is no grace period in progress.

o Make force_quiescent_state() avoid going for a global
lock in the case where there is no grace period in
progress.

o Rearrange struct fields to improve struct layout.

o Make call_rcu() initiate a grace period if RCU was
idle, rather than waiting for the next scheduling
clock interrupt.

o Invoke rcu_irq_enter() and rcu_irq_exit() only when
idle, as suggested by Andi Kleen. I still don't
completely trust this change, and might back it out.

o Make CONFIG_RCU_TRACE be the single config variable
manipulated for all forms of RCU, instead of the prior
confusion.

o Document tracing files and formats for both rcupreempt
and rcutree.

Updates from v4 for those missing v5 given its bad subject line:

o Separated dynticks interface so that NMIs and irqs call separate
functions, greatly simplifying it. In particular, this code
no longer requires a proof of correctness. ;-)

o Separated dynticks state out into its own per-CPU structure,
avoiding the duplicated accounting.

o The case where a dynticks-idle CPU runs an irq handler that
invokes call_rcu() is now correctly handled, forcing that CPU
out of dynticks-idle mode.

o Review comments have been applied (thank you all!!!).
For but one example, fixed the dynticks-ordering issue that
Manfred pointed out, saving me much debugging. ;-)

o Adjusted rcuclassic and rcupreempt to handle dynticks changes.

Attached is an updated patch to Classic RCU that applies a hierarchy,
greatly reducing the contention on the top-level lock for large machines.
This passes 10-hour concurrent rcutorture and online-offline testing on
128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
bugs in presence of dynticks (exciting working on a system where
"sleep 1" hangs until interrupted...), which were fixed in the
2.6.27 kernel. It is getting more reliable than mainline by some
measures, so the next version will be against -tip for inclusion.
See also Manfred Spraul's recent patches (or his earlier work from
2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
We will converge onto a common patch in the fullness of time, but are
currently exploring different regions of the design space. That said,
I have already gratefully stolen quite a few of Manfred's ideas.

This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on
64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
there is no hierarchy. By default, the RCU initialization code will
adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
this balancing, allowing the hierarchy to be exactly aligned to the
underlying hardware. Up to two levels of hierarchy are permitted
(in addition to the root node), allowing up to 16,384 CPUs on 32-bit
systems and up to 262,144 CPUs on 64-bit systems. I just know that I
am going to regret saying this, but this seems more than sufficient
for the foreseeable future. (Some architectures might wish to set
CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
If this becomes a real problem, additional levels can be added, but I
doubt that it will make a significant difference on real hardware.)

In the common case, a given CPU will manipulate its private rcu_data
structure and the rcu_node structure that it shares with its immediate
neighbors. This can reduce both lock and memory contention by multiple
orders of magnitude, which should eliminate the need for the strange
manipulations that are reported to be required when running Linux on
very large systems.

Some shortcomings:

o More bugs will probably surface as a result of an ongoing
line-by-line code inspection.

Patches will be provided as required.

o There are probably hangs, rcutorture failures, &c. Seems
quite stable on a 128-CPU machine, but that is kind of small
compared to 4096 CPUs. However, seems to do better than
mainline.

Patches will be provided as required.

o The memory footprint of this version is several KB larger
than rcuclassic.

A separate UP-only rcutiny patch will be provided, which will
reduce the memory footprint significantly, even compared
to the old rcuclassic. One such patch passes light testing,
and has a memory footprint smaller even than rcuclassic.
Initial reaction from various embedded guys was "it is not
worth it", so am putting it aside.

Credits:

o Manfred Spraul for ideas, review comments, and bugs spotted,
as well as some good friendly competition. ;-)

o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
for reviews and comments.

o Thomas Gleixner for much-needed help with some timer issues
(see patches below).

o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
alive despite my heavy abuse^Wtesting.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

64db4cff

16 12月, 2008 1 次提交

powerpc/pseries: Check for GIQ indicator before calling set-indicator · edc72ac4

由 Nathan Lynch 提交于 12月 11, 2008

Since "Factor out cpu joining/unjoining the GIQ"
(b4963255) the WARN_ON in
xics_set_cpu_giq() is being triggered during boot on JS20 because the
GIQ indicator is not available on that platform.  While the warning is
harmless and the system runs normally, it's nicer to check for the
existence of the indicator before trying to manipulate it.

Implement rtas_indicator_present(), which searches the
/rtas/rtas-indicators property for the given indicator token, and use
this function in xics_set_cpu_giq().

Also use a WARN statement in xics_set_cpu_giq to get better
information on failure.
Signed-off-by: NNathan Lynch <ntl@pobox.com>
Acked-by: NMilton Miller <miltonm@bga.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

edc72ac4

13 12月, 2008 2 次提交

cpumask: make irq_set_affinity() take a const struct cpumask · 0de26520

由 Rusty Russell 提交于 12月 13, 2008

Impact: change existing irq_chip API

Not much point with gentle transition here: the struct irq_chip's
setaffinity method signature needs to change.

Fortunately, not widely used code, but hits a few architectures.

Note: In irq_select_affinity() I save a temporary in by mangling
irq_desc[irq].affinity directly.  Ingo, does this break anything?

(Folded in fix from KOSAKI Motohiro)
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NMike Travis <travis@sgi.com>
Reviewed-by: NGrant Grundler <grundler@parisc-linux.org>
Acked-by: NIngo Molnar <mingo@redhat.com>
Cc: ralf@linux-mips.org
Cc: grundler@parisc-linux.org
Cc: jeremy@xensource.com
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

0de26520

cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and... · 29c0177e

由 Rusty Russell 提交于 12月 13, 2008

cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers.

Impact: change calling convention of existing cpumask APIs

Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.

These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NMike Travis <travis@sgi.com>
Acked-by: NIngo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com

29c0177e

06 11月, 2008 2 次提交

powerpc/pci: Fix various pseries PCI hotplug issues · fd6852c8

由 Benjamin Herrenschmidt 提交于 10月 27, 2008

The pseries PCI hotplug code has a number of issues, ranging from
incorrect resource setup to crashes, depending on what is added,
when, whether it contains a bridge, etc etc....

This fixes a whole bunch of these, while actually simplifying the code
a bit, using more generic code in the process and factoring out common
code between adding of a PHB, a slot or a device.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

fd6852c8

powerpc/eeh: Make EEH device add/remove more robust · 57b066ff

由 Benjamin Herrenschmidt 提交于 10月 27, 2008

To properly fix PCI hotplug, it's useful to be able to make the fixup
passes on all devices whether they were just hot plugged or already
there.

The EEH code however used to not be very friendly with calling
eeh_add_device_late() multiple time, and not very rebust in the way it
generally tests whether a device is in the expected state vs. the EEH
code.

This improves it, along with cleaning up a couple of debug printk's.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

57b066ff

05 11月, 2008 2 次提交

powerpc/pseries: Fix getting the server number size · 1ef8014d

由 Sebastien Dugue 提交于 10月 22, 2008

The 'ibm,interrupt-server#-size' properties are not in the cpu nodes,
which is where we currently look for them, but rather live under the
interrupt source controller nodes (which have "ibm,ppc-xics" in their
compatible property).

This moves the code that looks for the ibm,interrupt-server#-size
properties from xics_update_irq_servers() into xics_init_IRQ().

Also this adds a check for mismatched sizes across the interrupt
source controller nodes.  Not sure this is necessary as in this case
the firmware might be seriously busted.

This property only appears on POWER6 boxes and is only used in the
set-indicator(gqirm) call, and apparently firmware currently ignores
the value we pass.  Nevertheless we need to fix it in case future
firmware versions use it.
Signed-off-by: NSebastien Dugue <sebastien.dugue@bull.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NMilton Miller <miltonm@bga.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

1ef8014d

powerpc: Fix "unused variable" warning in pci_dlpar.c · 454666eb

由 Stephen Rothwell 提交于 11月 02, 2008

This gets rid of this build warning:

arch/powerpc/platforms/pseries/pci_dlpar.c: In function 'init_phb_dynamic':
arch/powerpc/platforms/pseries/pci_dlpar.c:192: warning: unused variable 'b'

This is one of the very few warnings left in a ppc64_defconfig build and
getting rid of it will make it easier to see future introduced ones (in
fact this was introduced very recently).
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

454666eb

31 10月, 2008 2 次提交

powerpc/pci: Properly allocate bus resources for hotplug PHBs · e90a1318

由 Nathan Fontenot 提交于 10月 27, 2008

Resources for PHB's that are dynamically added to a system are not
properly allocated in the resource tree.

Not having these resources allocated causes an oops when removing
the PHB when we try to release them.

The diff appears a bit messy, this is mainly due to moving everything
one tab to the left in the pcibios_allocate_bus_resources routine.
The functionality change in this routine is only that the
list_for_each_entry() loop is pulled out and moved to the necessary
calling routine.
Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

e90a1318

powerpc: Use is_kdump_kernel() · 62a8bd6c

由 Milton Miller 提交于 10月 22, 2008

linux/crash_dump.h defines is_kdump_kernel() to be used by code that
needs to know if the previous kernel crashed instead of a (clean) boot
or reboot.

This updates the just added powerpc code to use it.  This is needed
for the next commit, which will remove __kdump_flag.
Signed-off-by: NMilton Miller <miltonm@bga.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

62a8bd6c

22 10月, 2008 1 次提交

powerpc: Support for relocatable kdump kernel · 54622f10

由 Mohan Kumar M 提交于 10月 21, 2008

This adds relocatable kernel support for kdump. With this one can
use the same regular kernel to capture the kdump. A signature (0xfeed1234)
is passed in r6 from panic code to the next kernel through kexec_sequence
and purgatory code. The signature is used to differentiate between
kdump kernel and non-kdump kernels.

The purgatory code compares the signature and sets the __kdump_flag in
head_64.S.  During the boot up, kernel code checks __kdump_flag and if it
is set, the kernel will behave as relocatable kdump kernel. This kernel
will boot at the address where it was loaded by kexec-tools ie. at the
address reserved through crashkernel boot parameter.

CONFIG_CRASH_DUMP depends on CONFIG_RELOCATABLE option to build kdump
kernel as relocatable. So the same kernel can be used as production and
kdump kernel.

This patch incorporates the changes suggested by Paul Mackerras to avoid
GOT use and to avoid two copies of the code.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NMohan Kumar M <mohan@in.ibm.com>
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

54622f10

21 10月, 2008 1 次提交

powerpc: Use cpu_thread_in_core in smp_init for of_spin_map · 6a75a6b8

由 Milton Miller 提交于 10月 20, 2008

We used to assume that even numbered threads were the primary
threads, ie those that would be listed and started as a cpu from
open firmware. Replace a left over is even (% 2) check with a check
for it being a primary thread and update the comments.

Tested with a debug print on pseries, identical code found for cell.
Signed-off-by: NMilton Miller <miltonm@bga.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

6a75a6b8