- 09 7月, 2010 6 次提交
-
-
由 Benjamin Herrenschmidt 提交于
Our handling of debug interrupts on Book3E 64-bit is not quite the way it should be just yet. This is a workaround to let gdb work at least for now. We ensure that when context switching, we set the appropriate DBCR0 value for the new task. We also make sure that we turn off MSR[DE] within the kernel, and set it as part of the bits that get set when going back to userspace. In the long run, we will probably set the userspace DBCR0 on the exception exit code path and ensure we have some proper kernel value to set on the way into the kernel, a bit like ppc32 does, but that will take more work. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Martyn Welch 提交于
Currently the irqs for the i8042, which historically provides keyboard and mouse (aux) support, is hardwired in the driver rather than parsing the dts. This patch modifies the powerpc legacy IO code to attempt to parse the device tree for this information, failing back to the hardcoded values if it fails. Signed-off-by: NMartyn Welch <martyn.welch@ge.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Now we dynamically allocate the paca array, it takes an extra load whenever we want to access another cpu's paca. One place we do that a lot is per cpu variables. A simple example: DEFINE_PER_CPU(unsigned long, vara); unsigned long test4(int cpu) { return per_cpu(vara, cpu); } This takes 4 loads, 5 if you include the actual load of the per cpu variable: ld r11,-32760(r30) # load address of paca pointer ld r9,-32768(r30) # load link address of percpu variable sldi r3,r29,9 # get offset into paca (each entry is 512 bytes) ld r0,0(r11) # load paca pointer add r3,r0,r3 # paca + offset ld r11,64(r3) # load paca[cpu].data_offset ldx r3,r9,r11 # load per cpu variable If we remove the ppc64 specific per_cpu_offset(), we get the generic one which indexes into a statically allocated array. This removes one load and one add: ld r11,-32760(r30) # load address of __per_cpu_offset ld r9,-32768(r30) # load link address of percpu variable sldi r3,r29,3 # get offset into __per_cpu_offset (each entry 8 bytes) ldx r11,r11,r3 # load __per_cpu_offset[cpu] ldx r3,r9,r11 # load per cpu variable Having all the offsets in one array also helps when iterating over a per cpu variable across a number of cpus, such as in the scheduler. Before we would need to load one paca cacheline when calculating each per cpu offset. Now we have 16 (128 / sizeof(long)) per cpu offsets in each cacheline. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Brian King 提交于
Partition hibernation will use some of the same code as is currently used for Live Partition Migration. This function further abstracts this code such that code outside of rtas.c can utilize it. It also changes the error field in the suspend me data structure to be an atomic type, since it is set and checked on different cpus without any barriers or locking. Signed-off-by: NBrian King <brking@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Paul Mackerras 提交于
Since the decrementer and timekeeping code was moved over to using the generic clockevents and timekeeping infrastructure, several variables and functions have been obsolete and effectively unused. This deletes them. In particular, wakeup_decrementer() is no longer needed since the generic code reprograms the decrementer as part of the process of resuming the timekeeping code, which happens during sysdev resume. Thus the wakeup_decrementer calls in the suspend_enter methods for 52xx platforms have been removed. The call in the powermac cpu frequency change code has been replaced by set_dec(1), which will cause a timer interrupt as soon as interrupts are enabled, and the generic code will then reprogram the decrementer with the correct value. This also simplifies the generic_suspend_en/disable_irqs functions and makes them static since they are not referenced outside time.c. The preempt_enable/disable calls are removed because the generic code has disabled all but the boot cpu at the point where these functions are called, so we can't be moved to another cpu. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Paul Mackerras 提交于
Currently it is possible for userspace to see the result of gettimeofday() going backwards by 1 microsecond, assuming that userspace is using the gettimeofday() in the VDSO. The VDSO gettimeofday() algorithm computes the time in "xsecs", which are units of 2^-20 seconds, or approximately 0.954 microseconds, using the algorithm now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec and then converts the time in xsecs to seconds and microseconds. The kernel updates the tb_orig_stamp and stamp_xsec values every tick in update_vsyscall(). If the length of the tick is not an integer number of xsecs, then some precision is lost in converting the current time to xsecs. For example, with CONFIG_HZ=1000, the tick is 1ms long, which is 1048.576 xsecs. That means that stamp_xsec will advance by either 1048 or 1049 on each tick. With the right conditions, it is possible for userspace to get (timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is slightly late in updating the vdso_datapage, and then for stamp_xsec to advance by 1048 when the kernel does update it, and for userspace to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to integer truncation. The result is that time appears to go backwards by 1 microsecond. To fix this we change the VDSO gettimeofday to use a new field in the VDSO datapage which stores the nanoseconds part of the time as a fractional number of seconds in a 0.32 binary fraction format. (Or put another way, as a 32-bit number in units of 0.23283 ns.) This is convenient because we can use the mulhwu instruction to convert it to either microseconds or nanoseconds. Since it turns out that computing the time of day using this new field is simpler than either using stamp_xsec (as gettimeofday does) or stamp_xtime.tv_nsec (as clock_gettime does), this converts both gettimeofday and clock_gettime to use the new field. The existing __do_get_tspec function is converted to use the new field and take a parameter in r7 that indicates the desired resolution, 1,000,000 for microseconds or 1,000,000,000 for nanoseconds. The __do_get_xsec function is then unused and is deleted. The new algorithm is now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs + (stamp_xtime_seconds << 32) + stamp_sec_fraction with 'now' in units of 2^-32 seconds. That is then converted to seconds and either microseconds or nanoseconds with seconds = now >> 32 partseconds = ((now & 0xffffffff) * resolution) >> 32 The 32-bit VDSO code also makes a further simplification: it ignores the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary fraction. Doing so gets rid of 4 multiply instructions. Assuming a timebase frequency of 1GHz or less and an update interval of no more than 10ms, the upper 32 bits of tb_to_xs will be at least 4503599, so the error from ignoring the low 32 bits will be at most 2.2ns, which is more than an order of magnitude less than the time taken to do gettimeofday or clock_gettime on our fastest processors, so there is no possibility of seeing inconsistent values due to this. This also moves update_gtod() down next to its only caller, and makes update_vsyscall use the time passed in via the wall_time argument rather than accessing xtime directly. At present, wall_time always points to xtime, but that could change in future. Signed-off-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 08 7月, 2010 5 次提交
-
-
由 Paul E. McKenney 提交于
crash_kexec_wait_realmode() is defined only if CONFIG_PPC_STD_MMU_64 and CONFIG_SMP, but is called if CONFIG_PPC_STD_MMU_64 even if !CONFIG_SMP. Fix the conditional compilation around the invocation. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Johannes Berg 提交于
When SPARSE_IRQ is set, irq_to_desc() can return NULL. While the code here has a check for NULL, it's not really correct. Fix it by separating the check for it. This fixes CPU hot unplug for me. Reported-by: NAlastair Bridgewater <alastair.bridgewater@gmail.com> Cc: stable@kernel.org [2.6.32+] Signed-off-by: NJohannes Berg <johannes@sipsolutions.net> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
If we configure with CONFIG_SMP=n or set NR_CPUS less than the number of SMT threads we will set the max cores property to 0 in the ibm,client-architecture-support structure. On new versions of firmware that understand this property it obliges and terminates our partition. Use DIV_ROUND_UP so we handle not only the CONFIG_SMP=n case but also the case where NR_CPUS isn't a multiple of the number of SMT threads. Signed-off-by: NAnton Blanchard <anton@samba.org> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Stephen Rothwell 提交于
Just whitelist these extra compiler generated symbols. Fixes these errors: Error: External symbol '_restgpr0_14' referenced from prom_init.c Error: External symbol '_restgpr0_20' referenced from prom_init.c Error: External symbol '_restgpr0_22' referenced from prom_init.c Error: External symbol '_restgpr0_24' referenced from prom_init.c Error: External symbol '_restgpr0_25' referenced from prom_init.c Error: External symbol '_restgpr0_26' referenced from prom_init.c Error: External symbol '_restgpr0_27' referenced from prom_init.c Error: External symbol '_restgpr0_28' referenced from prom_init.c Error: External symbol '_restgpr0_29' referenced from prom_init.c Error: External symbol '_restgpr0_31' referenced from prom_init.c Error: External symbol '_savegpr0_14' referenced from prom_init.c Error: External symbol '_savegpr0_20' referenced from prom_init.c Error: External symbol '_savegpr0_22' referenced from prom_init.c Error: External symbol '_savegpr0_24' referenced from prom_init.c Error: External symbol '_savegpr0_25' referenced from prom_init.c Error: External symbol '_savegpr0_26' referenced from prom_init.c Error: External symbol '_savegpr0_27' referenced from prom_init.c Error: External symbol '_savegpr0_28' referenced from prom_init.c Error: External symbol '_savegpr0_29' referenced from prom_init.c Error: External symbol '_savegpr0_31' referenced from prom_init.c Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Acked-by: NSegher Boessenkool <segher@kernel.crashing.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Matt Evans 提交于
When power_pmu_disable() removes the given event from a particular index into cpuhw->event[], it shuffles down higher event[] entries. But, this array is paired with cpuhw->events[] and cpuhw->flags[] so should shuffle them similarly. If these arrays get out of sync, code such as power_check_constraints() will fail. This caused a bug where events were temporarily disabled and then failed to be re-enabled; subsequent code tried to write_pmc() with its (disabled) idx of 0, causing a message "oops trying to write PMC0". This triggers this bug on POWER7, running a miss-heavy test: perf record -e L1-dcache-load-misses -e L1-dcache-store-misses ./misstest Signed-off-by: NMatt Evans <matt@ozlabs.org> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 30 6月, 2010 1 次提交
-
-
由 Paul Mackerras 提交于
At present, hw_breakpoint_slots() returns 1 regardless of what type of breakpoint is specified in the type argument. Since we don't define CONFIG_HAVE_MIXED_BREAKPOINTS_REGS, there are separate values for TYPE_INST and TYPE_DATA, and hw_breakpoint_slots() returns 1 for both, effectively advertising instruction breakpoint support which doesn't exist. This fixes it by making hw_breakpoint_slots return 1 for TYPE_DATA and 0 for TYPE_INST. This moves hw_breakpoint_slots() from the powerpc hw_breakpoint.h to hw_breakpoint.c because the definitions of TYPE_INST and TYPE_DATA aren't available in <asm/hw_breakpoint.h>. They are defined in <linux/hw_breakpoint.h> but we can't include that header in <asm/hw_breakpoint.h>, and nor can we rely on <linux/hw_breakpoint.h> being included before <asm/hw_breakpoint.h>. Since hw_breakpoint_slots() is only called at boot time, there is no performance impact from making it a real function rather than a static inline. Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 23 6月, 2010 2 次提交
-
-
由 Paul Mackerras 提交于
The code we had to clear the MSR_SE bit was not doing anything because the caller (ultimately single_step_exception() in traps.c) had already cleared. Instead of trying to leave MSR_SE set if the TIF_SINGLESTEP flag is set (which indicates that the process is being single-stepped by ptrace), we instead return NOTIFY_DONE in that case, which means the caller will generate a SIGTRAP for the process. Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Paul Mackerras 提交于
The code would accept an access to an address one byte past the end of the requested range as legitimate, due to having a "<=" rather than a "<". This fixes that and cleans up the code a bit. Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 22 6月, 2010 4 次提交
-
-
由 K.Prasad 提交于
Many a times, the requested breakpoint length can be less than the fixed breakpoint length i.e. 8 bytes supported by PowerPC 64-bit server (Book III S) processors. This could lead to extraneous interrupts resulting in false breakpoint notifications. This detects and discards such interrupts for non-ptrace requests. We don't change ptrace behaviour to avoid breaking compatability. [Suggestion from Paul Mackerras <paulus@samba.org> to add a new flag in 'struct arch_hw_breakpoint' to identify extraneous interrupts] Signed-off-by: NK.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 K.Prasad 提交于
A signal delivered between a hw_breakpoint_handler() and the single_step_dabr_instruction() will not have the breakpoint active while the signal handler is running -- the signal delivery will set up a new MSR value which will not have MSR_SE set, so we won't get the signal step interrupt until and unless the signal handler returns (which it may never do). To fix this, we restore the breakpoint when delivering a signal -- we clear the MSR_SE bit and set the DABR again. If the signal handler returns, the DABR interrupt will occur again when the instruction that we were originally trying to single-step gets re-executed. [Paul Mackerras <paulus@samba.org> pointed out the need to do this.] Signed-off-by: NK.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 K.Prasad 提交于
If an alignment interrupt occurs on an instruction that is being single-stepped, the alignment interrupt handler currently handles the single-step condition by unconditionally sending a SIGTRAP to the process. Other synchronous interrupts that result in the instruction being emulated do likewise. With hw_breakpoint support, the hw_breakpoint code needs to be able to intercept these single-step events as well as those where the instruction executes normally and a trace interrupt happens. Fix this by making emulate_single_step() use the existing single_step_exception() function instead of calling _exception() directly. We then make single_step_exception() use the abstracted clear_single_step() rather than clearing bits in the MSR image directly so that emulate_single_step() will continue to work correctly on Book 3E processors. Signed-off-by: NK.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 K.Prasad 提交于
Implement perf-events based hw-breakpoint interfaces for PowerPC 64-bit server (Book III S) processors. This allows access to a given location to be used as an event that can be counted or profiled by the perf_events subsystem. This is done using the DABR (data breakpoint register), which can also be used for process debugging via ptrace. When perf_event hw_breakpoint support is configured in, the perf_event subsystem manages the DABR and arbitrates access to it, and ptrace then creates a perf_event when it is requested to set a data breakpoint. [Adopted suggestions from Paul Mackerras <paulus@samba.org> to - emulate_step() all system-wide breakpoints and single-step only the per-task breakpoints - perform arch-specific cleanup before unregistration through arch_unregister_hw_breakpoint() ] Signed-off-by: NK.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 15 6月, 2010 3 次提交
-
-
由 Milton Miller 提交于
When trying to flash a machine via the update_flash command, Anton received the following error: Restarting system. FLASH: kernel bug...flash list header addr above 4GB The code in question has a comment that the flash list should be in the kernel data and therefore under 4GB: /* NOTE: the "first" block list is a global var with no data * blocks in the kernel data segment. We do this because * we want to ensure this block_list addr is under 4GB. */ Unfortunately the Kconfig option is marked tristate which means the variable may not be in the kernel data and could be above 4GB. Instead of relying on the data segment being below 4GB, use the static data buffer allocated by the kernel for use by rtas. Since we don't use the header struct directly anymore, convert it to a simple pointer. Reported-By: NAnton Blanchard <anton@samba.org> Signed-Off-By: Milton Miller <miltonm@bga.com Tested-By: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Christoph Hellwig 提交于
Irq stacks provide an essential protection from stack overflows through external interrupts, at the cost of two additionals stacks per CPU. Enable them unconditionally to simplify the kernel build and prevent people from accidentally disabling them. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Matt Evans 提交于
kexec_perpare_cpus_wait() iterates i through NR_CPUS to check paca[i].kexec_state of each to make sure they have quiesced. However now we have dynamic PACA allocation, paca[NR_CPUS] is not necessarily valid and we overrun the array; spurious "cpu is not possible, ignoring" errors result. This patch iterates for_each_online_cpu so stays within the bounds of paca[] -- and every CPU is now 'possible'. Signed-off-by: NMatt Evans <matt@ozlabs.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 12 6月, 2010 1 次提交
-
-
由 Yinghai Lu 提交于
Yannick found that video does not work with 2.6.34. The cause of this bug was that the BIOS had assigned the wrong range to the PCI bridge above the video device. Before 2.6.34 the kernel would have shrunk the size of the bridge window, but since d65245c3 PCI: don't shrink bridge resources the kernel will avoid shrinking BIOS ranges. So zero out the old range if we fail to claim it at boot time; this will cause us to allocate a new range at startup, restoring the 2.6.34 behavior. Fixes regression https://bugzilla.kernel.org/show_bug.cgi?id=16009. Reported-by: NYannick <yannick.roehlly@free.fr> Acked-by: NBjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: NYinghai Lu <yinghai@kernel.org> Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>
-
- 02 6月, 2010 1 次提交
-
-
emulate_step() in kprobe_handler() would've already determined if the probed instruction can be emulated. We single-step in hardware only if the instruction couldn't be emulated. resume_execution() therefore is superfluous -- all we need is to fix up the instruction pointer after single-stepping. Thanks to Paul Mackerras for catching this. Signed-off-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 31 5月, 2010 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 28 5月, 2010 1 次提交
-
-
由 FUJITA Tomonori 提交于
sync_single_range_for_cpu and sync_single_range_for_device hooks in swiotlb_dma_ops are unnecessary because sync_single_for_cpu and sync_single_for_device are used there. Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Becky Bruce <beckyb@kernel.crashing.org> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 5月, 2010 3 次提交
-
-
This adds support kexec on FSL-BookE where the MMU can not be simply switched off. The code borrows the initial MMU-setup code to create the identical mapping mapping. The only difference to the original boot code is the size of the mapping(s) and the executeable address. The kexec code maps the first 2 GiB of memory in 256 MiB steps. This should work also on e500v1 boxes. SMP support is still not available. (Kumar: Added minor change to build to ifdef CONFIG_PPC_STD_MMU_64 some code that was PPC64 specific) Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
This patch only moves the initial entry code which setups the mapping from what ever to KERNELBASE into a seperate file. No code change has been made here. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
During boot we change the mapping a few times until we have a "defined" mapping. During this procedure a small 4KiB mapping is created and after that one a 64MiB. Currently the offset of the 4KiB page in that we run is zero because the complete startup up code is in first page which starts at RPN zero. If the code is recycled and moved to another location then its execution will fail because the start address in 64 MiB mapping is computed wrongly. It does not consider the offset to the page from the begin of the memory. This patch fixes this. Usually (system boot) r25 is zero so this does not change anything unless the code is recycled. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
- 22 5月, 2010 3 次提交
-
-
由 Grant Likely 提交于
.name, .match_table and .owner are duplicated in both of_platform_driver and device_driver. This patch is a removes the extra copies from struct of_platform_driver and converts all users to the device_driver members. This patch is a pretty mechanical change. The usage model doesn't change and if any drivers have been missed, or if anything has been fixed up incorrectly, then it will fail with a compile time error, and the fixup will be trivial. This patch looks big and scary because it touches so many files, but it should be pretty safe. Signed-off-by: NGrant Likely <grant.likely@secretlab.ca> Acked-by: NSean MacLennan <smaclennan@pikatech.com>
-
由 Grant Likely 提交于
OF-style matching can be available to any device, on any type of bus. This patch allows any driver to provide an OF match table when CONFIG_OF is enabled so that drivers can be bound against devices described in the device tree. Signed-off-by: NGrant Likely <grant.likely@secretlab.ca> Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Grant Likely 提交于
By moving dma_mask into pdev_archdata, and adding archdata to struct of_device, it makes it possible to substitute of_device with struct platform_device, which is a stepping stone to removing the of_platform bus entirely. Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
-
- 21 5月, 2010 9 次提交
-
-
由 Anton Vorontsov 提交于
This is started as swsusp_32.S modifications, but the amount of #ifdefs made the whole file horribly unreadable, so let's put the support into its own separate file. The code should be relatively easy to modify to support 44x BookEs as well, but since I don't have any 44x to test, let's confine the code to FSL BookE. (The only FSL-specific part so far is 'flush_dcache_L1'.) Signed-off-by: NAnton Vorontsov <avorontsov@mvista.com> Acked-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
由 Scott Wood 提交于
Most of the MSCR bit assigments are different in e500mc versus e500, and they are now write-one-to-clear. Some e500mc machine check conditions are made recoverable (as long as they aren't stuck on), most notably L1 instruction cache parity errors. Signed-off-by: NScott Wood <scottwood@freescale.com> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
由 FUJITA Tomonori 提交于
'protect4gb' boot parameter was introduced to avoid allocating dma space acrossing 4GB boundary in 2007 (the commit 56997559). In 2008, the IOMMU was fixed to use the boundary_mask parameter per device properly. So 'protect4gb' workaround was removed (the 383af952). But somehow I messed the 'protect4gb' boot parameter that was used to enable the workaround. Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Right now if we want to busy loop and not give up any time to the hypervisor we put a very large value into smt_snooze_delay. This is sometimes useful when running a single partition and you want to avoid any latencies due to the hypervisor or CPU power state transitions. While this works, it's a bit ugly - how big a number is enough now we have NO_HZ and can be idle for a very long time. The patch below makes smt_snooze_delay signed, and a negative value means loop forever: echo -1 > /sys/devices/system/cpu/cpu0/smt_snooze_delay This change shouldn't affect the existing userspace tools (eg ppc64_cpu), but I'm cc-ing Nathan just to be sure. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
I'm not sure why we have code for parsing an ibm,smt-snooze-delay OF property. Since we have a smt-snooze-delay= boot option and we can also set it at runtime via sysfs, it should be safe to get rid of this code. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Neuling 提交于
When we are crashing, the crashing/primary CPU IPIs the secondaries to turn off IRQs, go into real mode and wait in kexec_wait. While this is happening, the primary tears down all the MMU maps. Unfortunately the primary doesn't check to make sure the secondaries have entered real mode before doing this. On PHYP machines, the secondaries can take a long time shutting down the IRQ controller as RTAS calls are need. These RTAS calls need to be serialised which resilts in the secondaries contending in lock_rtas() and hence taking a long time to shut down. We've hit this on large POWER7 machines, where some secondaries are still waiting in lock_rtas(), when the primary tears down the HPTEs. This patch makes sure all secondaries are in real mode before the primary tears down the MMU. It uses the new kexec_state entry in the paca. It times out if the secondaries don't reach real mode after 10sec. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Neuling 提交于
In kexec_prepare_cpus, the primary CPU IPIs the secondary CPUs to kexec_smp_down(). kexec_smp_down() calls kexec_smp_wait() which sets the hw_cpu_id() to -1. The primary does this while leaving IRQs on which means the primary can take a timer interrupt which can lead to the IPIing one of the secondary CPUs (say, for a scheduler re-balance) but since the secondary CPU now has a hw_cpu_id = -1, we IPI CPU -1... Kaboom! We are hitting this case regularly on POWER7 machines. There is also a second race, where the primary will tear down the MMU mappings before knowing the secondaries have entered real mode. Also, the secondaries are clearing out any pending IPIs before guaranteeing that no more will be received. This changes kexec_prepare_cpus() so that we turn off IRQs in the primary CPU much earlier. It adds a paca flag to say that the secondaries have entered the kexec_smp_down() IPI and turned off IRQs, rather than overloading hw_cpu_id with -1. This new paca flag is again used to in indicate when the secondaries has entered real mode. It also ensures that all CPUs have their IRQs off before we clear out any pending IPI requests (in kexec_cpu_down()) to ensure there are no trailing IPIs left unacknowledged. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
Author: Milton Miller <miltonm@bga.com> On large machines we are running out of room below 256MB. In some cases we only need to ensure the allocation is in the first segment, which may be 256MB or 1TB. Add slb0_limit and use it to specify the upper limit for the irqstack and emergency stacks. On a large ppc64 box, this fixes a panic at boot when the crashkernel= option is specified (previously we would run out of memory below 256MB). Signed-off-by: NMilton Miller <miltonm@bga.com> Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Anton Blanchard 提交于
I saw this in a kdump kernel: IOMMU table initialized, virtual merging enabled Interrupt 155954 (real) is invalid, disabling it. Interrupt 155953 (real) is invalid, disabling it. ie we took some spurious interrupts. default_machine_crash_shutdown tries to disable all interrupt sources but uses chip->disable which maps to the default action of: static void default_disable(unsigned int irq) { } If we use chip->shutdown, then we actually mask the IRQ: static void default_shutdown(unsigned int irq) { struct irq_desc *desc = irq_to_desc(irq); desc->chip->mask(irq); desc->status |= IRQ_MASKED; } Not sure why we don't implement a ->disable action for xics.c, or why default_disable doesn't mask the interrupt. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-