- 14 Nov, 2016 15 commits
-
-
Committed by Michael Neuling
Load monitored is no longer supported on POWER9 so let's remove the code. This reverts commit bd3ea317 ("powerpc: Load Monitor Register Support").

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Michael Ellerman
No one uses reiserfs much these days, or is likely to in future. So drop it from pseries and powernv defconfigs to save time and space. It's still enabled in ppc64_defconfig so we get some build coverage.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Balbir Singh
In ISA v2.05, the tlbiel instruction takes two arguments, RB and L:

tlbiel RB,L

+---------+---------+----+---------+---------+---------+----+
|   31    |    /    | L  |    /    |   RB    |   274   | /  |
| 31 - 26 | 25 - 22 | 21 | 20 - 16 | 15 - 11 | 10 - 1  | 0  |
+---------+---------+----+---------+---------+---------+----+

In ISA v2.06 tlbiel takes only one argument, RB:

tlbiel RB

+---------+---------+---------+---------+---------+----+
|   31    |    /    |    /    |   RB    |   274   | /  |
| 31 - 26 | 25 - 21 | 20 - 16 | 15 - 11 | 10 - 1  | 0  |
+---------+---------+---------+---------+---------+----+

And in ISA v3.00 tlbiel takes five arguments:

tlbiel RB,RS,RIC,PRS,R

+---------+---------+----+---------+----+----+---------+---------+----+
|   31    |   RS    | /  |   RIC   |PRS | R  |   RB    |   274   | /  |
| 31 - 26 | 25 - 21 | 20 | 19 - 18 | 17 | 16 | 15 - 11 | 10 - 1  | 0  |
+---------+---------+----+---------+----+----+---------+---------+----+

However the assembler also accepts "tlbiel RB", and generates "tlbiel RB,r0,0,0,0".

As you can see above the L field from the v2.05 encoding overlaps with the reserved field of the v2.06 encoding, and the low bit of the RS field of the v3.00 encoding.

Currently in __tlbiel() we generate two tlbiel instructions manually using hex constants. In the first case, for MMU_PAGE_4K, we generate "tlbiel RB,0", which is safe in all cases, because the L bit is zero.

However in the default case we generate "tlbiel RB,1", therefore setting bit 21 to 1.

This is not an actual bug on v2.06 processors, because the CPU ignores the value of the reserved field. However software is supposed to encode the reserved fields as zero to enable forward compatibility.

On v3.00 processors setting bit 21 to 1 and no other bits of RS, means we are using r1 for the value of RS.

Although it's not obvious, the code sets the IS field (bits 10-11) to 0 (by omission), and L=1, in the va value, which is passed as RB. We also pass R=0 in the instruction.

The combination of IS=0, L=1 and R=0 means the value of RS is not used, so even on ISA v3.00 there is no actual bug.

We should still fix it, as setting a reserved bit on v2.06 is naughty, and we are only avoiding a bug on v3.00 by accident rather than design.

Use ASM_FTR_IFSET() to generate the single argument form on ISA v2.06 and later, and the two argument form on pre v2.06.

Although there may be very old toolchains which don't understand tlbiel, we have other code in the tree which has been using tlbiel for over five years, and no one has reported any build failures, so just let the assembler generate the instructions.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[mpe: Rewrite change log, use IFSET instead of IFCLR]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
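The field overlap described in this message can be checked with a little bit arithmetic. The following is a minimal sketch in Python, not kernel code: the function names are hypothetical, and the field positions are taken from the encoding tables in the commit message (bit 0 is the least significant bit).

```python
# Hypothetical helpers illustrating the tlbiel encodings from the commit message.
OPCODE = 31       # primary opcode, bits 31-26
XO_TLBIEL = 274   # extended opcode, bits 10-1


def tlbiel_v205(rb, l):
    """Encode the two-argument ISA v2.05 form: tlbiel RB,L (L is bit 21)."""
    return (OPCODE << 26) | (l << 21) | (rb << 11) | (XO_TLBIEL << 1)


def tlbiel_v300(rb, rs=0, ric=0, prs=0, r=0):
    """Encode the five-argument ISA v3.00 form: tlbiel RB,RS,RIC,PRS,R."""
    return ((OPCODE << 26) | (rs << 21) | (ric << 18) | (prs << 17)
            | (r << 16) | (rb << 11) | (XO_TLBIEL << 1))


def decode_v300(word):
    """Interpret an instruction word using the ISA v3.00 field layout."""
    return {
        "RS": (word >> 21) & 0x1F,
        "RIC": (word >> 18) & 0x3,
        "PRS": (word >> 17) & 0x1,
        "R": (word >> 16) & 0x1,
        "RB": (word >> 11) & 0x1F,
    }


# "tlbiel RB,1" in the v2.05 encoding sets bit 21 ...
word = tlbiel_v205(rb=4, l=1)
# ... which a v3.00 CPU reads as the low bit of RS, i.e. RS = r1.
fields = decode_v300(word)
```

Decoding `word` with the v3.00 layout yields RS = 1, which matches the statement that setting bit 21 selects r1 on v3.00 processors, while `tlbiel_v205(rb, 0)` and `tlbiel_v300(rb)` produce the same word.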
-
Committed by Michael Ellerman
When we're not compiling for a specific CPU, ie. none of the CONFIG_POWERx_CPU options are set, and CONFIG_GENERIC_CPU *is* set, we currently don't pass any -mcpu option to the compiler. This means the compiler builds for a "generic" Power CPU.

But back in 2014 we dropped support for pre power4 CPUs in commit 468a3302 ("powerpc: Drop support for pre-POWER4 cpus").

Given that, there's no point in building the kernel to run on pre power4 cpus. So update the flags we pass to the compiler when CONFIG_GENERIC_CPU is set, to specify -mcpu=power4.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Nicholas Piggin
This adds a config option that can help exercise the case when the kernel is not running at PAGE_OFFSET.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Anton Blanchard
An hcall was recently added that does exactly what we need during kexec - it clears the entire MMU hash table, ignoring any VRMA mappings.

Try it and fall back to the old method if we get a failure.

On a POWER8 box with 5TB of memory, this reduces the time it takes to kexec a new kernel from 4 minutes to 1 minute.

Signed-off-by: Anton Blanchard <anton@samba.org>
Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: Split into separate functions and tweak function naming]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Denis Kirjanov
It's unclear why lockdep shows the following warning, but adding a lockdep class to struct pmac_i2c_bus solves it:

[ 20.507795] ======================================================
[ 20.507796] [ INFO: possible circular locking dependency detected ]
[ 20.507800] 4.8.0-rc7-00037-gd2ffb010 #21 Not tainted
[ 20.507801] -------------------------------------------------------
[ 20.507803] swapper/0/1 is trying to acquire lock:
[ 20.507818]  (&bus->mutex){+.+.+.}, at: [<c000000000052830>] .pmac_i2c_open+0x30/0x100
[ 20.507819]
[ 20.507819] but task is already holding lock:
[ 20.507829]  (&policy->rwsem){+.+.+.}, at: [<c00000000068adcc>] .cpufreq_online+0x1ac/0x9d0
[ 20.507830]
[ 20.507830] which lock already depends on the new lock.
[ 20.507830]
[ 20.507832]
[ 20.507832] the existing dependency chain (in reverse order) is:
[ 20.507837]
[ 20.507837] -> #4 (&policy->rwsem){+.+.+.}:
[ 20.507844]        [<c00000000082385c>] .down_write+0x6c/0x110
[ 20.507849]        [<c00000000068adcc>] .cpufreq_online+0x1ac/0x9d0
[ 20.507855]        [<c0000000004d76d8>] .subsys_interface_register+0xb8/0x110
[ 20.507860]        [<c000000000689bb0>] .cpufreq_register_driver+0x1d0/0x250
[ 20.507866]        [<c000000000b4f8f4>] .g5_cpufreq_init+0x9cc/0xa28
[ 20.507872]        [<c00000000000a98c>] .do_one_initcall+0x5c/0x1d0
[ 20.507878]        [<c000000000b0f86c>] .kernel_init_freeable+0x1ac/0x28c
[ 20.507883]        [<c00000000000b3bc>] .kernel_init+0x1c/0x140
[ 20.507887]        [<c0000000000098f4>] .ret_from_kernel_thread+0x58/0x64
[ 20.507894]
[ 20.507894] -> #3 (subsys mutex#2){+.+.+.}:
[ 20.507899]        [<c000000000820448>] .mutex_lock_nested+0xa8/0x590
[ 20.507903]        [<c0000000004d7f24>] .bus_probe_device+0x44/0xe0
[ 20.507907]        [<c0000000004d5208>] .device_add+0x508/0x730
[ 20.507911]        [<c0000000004dd528>] .register_cpu+0x118/0x190
[ 20.507916]        [<c000000000b14450>] .topology_init+0x148/0x248
[ 20.507921]        [<c00000000000a98c>] .do_one_initcall+0x5c/0x1d0
[ 20.507925]        [<c000000000b0f86c>] .kernel_init_freeable+0x1ac/0x28c
[ 20.507929]        [<c00000000000b3bc>] .kernel_init+0x1c/0x140
[ 20.507934]        [<c0000000000098f4>] .ret_from_kernel_thread+0x58/0x64
[ 20.507939]
[ 20.507939] -> #2 (cpu_add_remove_lock){+.+.+.}:
[ 20.507944]        [<c000000000820448>] .mutex_lock_nested+0xa8/0x590
[ 20.507950]        [<c000000000087a9c>] .register_cpu_notifier+0x2c/0x70
[ 20.507955]        [<c000000000b267e0>] .spawn_ksoftirqd+0x18/0x4c
[ 20.507959]        [<c00000000000a98c>] .do_one_initcall+0x5c/0x1d0
[ 20.507964]        [<c000000000b0f770>] .kernel_init_freeable+0xb0/0x28c
[ 20.507968]        [<c00000000000b3bc>] .kernel_init+0x1c/0x140
[ 20.507972]        [<c0000000000098f4>] .ret_from_kernel_thread+0x58/0x64
[ 20.507978]
[ 20.507978] -> #1 (&host->mutex){+.+.+.}:
[ 20.507982]        [<c000000000820448>] .mutex_lock_nested+0xa8/0x590
[ 20.507987]        [<c0000000000527e8>] .kw_i2c_open+0x18/0x30
[ 20.507991]        [<c000000000052894>] .pmac_i2c_open+0x94/0x100
[ 20.507995]        [<c000000000b220a0>] .smp_core99_probe+0x260/0x410
[ 20.507999]        [<c000000000b185bc>] .smp_prepare_cpus+0x280/0x2ac
[ 20.508003]        [<c000000000b0f748>] .kernel_init_freeable+0x88/0x28c
[ 20.508008]        [<c00000000000b3bc>] .kernel_init+0x1c/0x140
[ 20.508012]        [<c0000000000098f4>] .ret_from_kernel_thread+0x58/0x64
[ 20.508018]
[ 20.508018] -> #0 (&bus->mutex){+.+.+.}:
[ 20.508023]        [<c0000000000ed5b4>] .lock_acquire+0x84/0x100
[ 20.508027]        [<c000000000820448>] .mutex_lock_nested+0xa8/0x590
[ 20.508032]        [<c000000000052830>] .pmac_i2c_open+0x30/0x100
[ 20.508037]        [<c000000000052e14>] .pmac_i2c_do_begin+0x34/0x120
[ 20.508040]        [<c000000000056bc0>] .pmf_call_one+0x50/0xd0
[ 20.508045]        [<c00000000068ff1c>] .g5_pfunc_switch_volt+0x2c/0xc0
[ 20.508050]        [<c00000000068fecc>] .g5_pfunc_switch_freq+0x1cc/0x1f0
[ 20.508054]        [<c00000000068fc2c>] .g5_cpufreq_target+0x2c/0x40
[ 20.508058]        [<c0000000006873ec>] .__cpufreq_driver_target+0x23c/0x840
[ 20.508062]        [<c00000000068c798>] .cpufreq_gov_performance_limits+0x18/0x30
[ 20.508067]        [<c00000000068915c>] .cpufreq_start_governor+0xac/0x100
[ 20.508071]        [<c00000000068a788>] .cpufreq_set_policy+0x208/0x260
[ 20.508076]        [<c00000000068abdc>] .cpufreq_init_policy+0x6c/0xb0
[ 20.508081]        [<c00000000068ae70>] .cpufreq_online+0x250/0x9d0
[ 20.508085]        [<c0000000004d76d8>] .subsys_interface_register+0xb8/0x110
[ 20.508090]        [<c000000000689bb0>] .cpufreq_register_driver+0x1d0/0x250
[ 20.508094]        [<c000000000b4f8f4>] .g5_cpufreq_init+0x9cc/0xa28
[ 20.508099]        [<c00000000000a98c>] .do_one_initcall+0x5c/0x1d0
[ 20.508103]        [<c000000000b0f86c>] .kernel_init_freeable+0x1ac/0x28c
[ 20.508107]        [<c00000000000b3bc>] .kernel_init+0x1c/0x140
[ 20.508112]        [<c0000000000098f4>] .ret_from_kernel_thread+0x58/0x64
[ 20.508113]
[ 20.508113] other info that might help us debug this:
[ 20.508113]
[ 20.508121] Chain exists of:
[ 20.508121]   &bus->mutex --> subsys mutex#2 --> &policy->rwsem
[ 20.508121]
[ 20.508123]  Possible unsafe locking scenario:
[ 20.508123]
[ 20.508124]        CPU0                    CPU1
[ 20.508125]        ----                    ----
[ 20.508128]   lock(&policy->rwsem);
[ 20.508132]                                lock(subsys mutex#2);
[ 20.508135]                                lock(&policy->rwsem);
[ 20.508138]   lock(&bus->mutex);
[ 20.508139]
[ 20.508139]  *** DEADLOCK ***
[ 20.508139]
[ 20.508141] 3 locks held by swapper/0/1:
[ 20.508150]  #0: (cpu_hotplug.lock){++++++}, at: [<c000000000087838>] .get_online_cpus+0x48/0xc0
[ 20.508159]  #1: (subsys mutex#2){+.+.+.}, at: [<c0000000004d7670>] .subsys_interface_register+0x50/0x110
[ 20.508168]  #2: (&policy->rwsem){+.+.+.}, at: [<c00000000068adcc>] .cpufreq_online+0x1ac/0x9d0
[ 20.508169]
[ 20.508169] stack backtrace:
[ 20.508173] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc7-00037-gd2ffb010 #21
[ 20.508175] Call Trace:
[ 20.508180] [c0000000790c2b90] [c00000000082cc70] .dump_stack+0xe0/0x14c (unreliable)
[ 20.508184] [c0000000790c2c20] [c000000000828c88] .print_circular_bug+0x350/0x388
[ 20.508188] [c0000000790c2cd0] [c0000000000ecb0c] .__lock_acquire+0x196c/0x1d30
[ 20.508192] [c0000000790c2e50] [c0000000000ed5b4] .lock_acquire+0x84/0x100
[ 20.508196] [c0000000790c2f20] [c000000000820448] .mutex_lock_nested+0xa8/0x590
[ 20.508201] [c0000000790c3030] [c000000000052830] .pmac_i2c_open+0x30/0x100
[ 20.508206] [c0000000790c30c0] [c000000000052e14] .pmac_i2c_do_begin+0x34/0x120
[ 20.508209] [c0000000790c3150] [c000000000056bc0] .pmf_call_one+0x50/0xd0
[ 20.508213] [c0000000790c31e0] [c00000000068ff1c] .g5_pfunc_switch_volt+0x2c/0xc0
[ 20.508217] [c0000000790c3250] [c00000000068fecc] .g5_pfunc_switch_freq+0x1cc/0x1f0
[ 20.508221] [c0000000790c3320] [c00000000068fc2c] .g5_cpufreq_target+0x2c/0x40
[ 20.508226] [c0000000790c3390] [c0000000006873ec] .__cpufreq_driver_target+0x23c/0x840
[ 20.508230] [c0000000790c3440] [c00000000068c798] .cpufreq_gov_performance_limits+0x18/0x30
[ 20.508235] [c0000000790c34b0] [c00000000068915c] .cpufreq_start_governor+0xac/0x100
[ 20.508239] [c0000000790c3530] [c00000000068a788] .cpufreq_set_policy+0x208/0x260
[ 20.508244] [c0000000790c35d0] [c00000000068abdc] .cpufreq_init_policy+0x6c/0xb0
[ 20.508249] [c0000000790c3940] [c00000000068ae70] .cpufreq_online+0x250/0x9d0
[ 20.508253] [c0000000790c3a30] [c0000000004d76d8] .subsys_interface_register+0xb8/0x110
[ 20.508258] [c0000000790c3ad0] [c000000000689bb0] .cpufreq_register_driver+0x1d0/0x250
[ 20.508262] [c0000000790c3b60] [c000000000b4f8f4] .g5_cpufreq_init+0x9cc/0xa28
[ 20.508267] [c0000000790c3c20] [c00000000000a98c] .do_one_initcall+0x5c/0x1d0
[ 20.508271] [c0000000790c3d00] [c000000000b0f86c] .kernel_init_freeable+0x1ac/0x28c
[ 20.508276] [c0000000790c3db0] [c00000000000b3bc] .kernel_init+0x1c/0x140
[ 20.508280] [c0000000790c3e30] [c0000000000098f4] .ret_from_kernel_thread+0x58/0x64

Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Nicholas Piggin
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Nicholas Piggin
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Nicholas Piggin
This halves the exception table size on 64-bit builds, and it allows build-time sorting of exception tables to work on relocated kernels.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Minor asm fixups and bits to keep the selftests working]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
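To see why a relative format halves the table and survives relocation, here is a minimal sketch in Python. It assumes a layout of two signed 32-bit offsets per entry, each relative to the address of its own field, versus two 64-bit absolute addresses; the function names and the example addresses are hypothetical and are not taken from the patch.

```python
import struct


def build_absolute(entries):
    """Absolute format: two 64-bit addresses (insn, fixup) per entry."""
    return b"".join(struct.pack(">QQ", insn, fixup) for insn, fixup in entries)


def build_relative(entries, table_addr):
    """Relative format: two signed 32-bit offsets per entry.

    Each offset is relative to the address of the field that stores it.
    """
    out = b""
    for i, (insn, fixup) in enumerate(entries):
        insn_field = table_addr + 8 * i
        fixup_field = insn_field + 4
        out += struct.pack(">ii", insn - insn_field, fixup - fixup_field)
    return out


def resolve_relative(table, table_addr, index):
    """Recover (insn, fixup) addresses from a relative table at table_addr."""
    insn_off, fixup_off = struct.unpack_from(">ii", table, 8 * index)
    insn_field = table_addr + 8 * index
    return insn_field + insn_off, insn_field + 4 + fixup_off


# Hypothetical addresses, for illustration only.
entries = [
    (0xC000000000010000, 0xC000000000020000),
    (0xC000000000010040, 0xC000000000020010),
]
base = 0xC000000000900000
rel = build_relative(entries, base)
```

Because every offset is measured from the table itself, moving the whole image by some delta leaves the table bytes unchanged: resolving the same bytes at `base + delta` yields the original addresses shifted by `delta`, with no run-time fixup of the table.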
-
Committed by Nicholas Piggin
This macro is taken from s390, and allows more flexibility in changing exception table format.

mpe: Put it in ppc_asm.h and only define one version using stringify_in_c(). Add some empty definitions and headers to keep the selftests happy.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Michael Ellerman
We haven't seen these before, but the soon to be merged relative exception tables support causes them to be generated.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Michael Ellerman
There's no reason to #error if we include ppc_asm.h in asm files, the ifdef already prevents any problems.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Nicholas Piggin
Exception handlers are aligned to 128 bytes (L1 cache) on 64s, which is overkill. It can reduce the icache footprint of any individual exception path. However taken as a whole, the expansion in icache footprint seems likely to be counter-productive and cause more total misses.

Create IFETCH_ALIGN_SHIFT/BYTES, which should give optimal ifetch alignment with much more reasonable alignment. This saves 1792 bytes from head_64.o text with an allmodconfig build.

Other subarchitectures should define appropriate IFETCH_ALIGN_SHIFT values if this becomes more widely used.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
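The trade-off between alignment and text size can be illustrated with a small padding calculation. This is a rough sketch in Python under stated assumptions: the handler sizes are made up, and the alternative shift of 4 (16 bytes) is an assumed value for illustration, since the message does not state what IFETCH_ALIGN_SHIFT is set to.

```python
def align_up(addr, shift):
    """Round addr up to the next multiple of 2**shift."""
    size = 1 << shift
    return (addr + size - 1) & ~(size - 1)


def total_padding(sizes, shift):
    """Sum the padding inserted when each handler starts on a 2**shift boundary."""
    addr = 0
    padding = 0
    for size in sizes:
        start = align_up(addr, shift)
        padding += start - addr
        addr = start + size
    return padding


# Hypothetical handler sizes in bytes; not measured from head_64.o.
handler_sizes = [36, 52, 20, 100]

pad_128 = total_padding(handler_sizes, 7)  # 128-byte (cache line) alignment
pad_16 = total_padding(handler_sizes, 4)   # assumed smaller ifetch alignment
```

With these made-up sizes the 128-byte alignment inserts several times more padding than the 16-byte one, which is the kind of saving the 1792-byte figure in the message reflects.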
-
Committed by Rui Teng
There are three #ifdef CONFIG_PPC_BOOK3E sections in nohash/64/pgtable.h. And there should be no configurations possible which use nohash/64/pgtable.h but don't also enable CONFIG_PPC_BOOK3E.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rui Teng <rui.teng@linux.vnet.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 30 Oct, 2016 1 commit
-
-
Committed by Ivan Vecera
Commit 01cfbad7 "ipv4: Update parameters for csum_tcpudp_magic to their original types" changed parameters for csum_tcpudp_magic and csum_tcpudp_nofold for many platforms but not for PowerPC.

Fixes: 01cfbad7 "ipv4: Update parameters for csum_tcpudp_magic to their original types"
Cc: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 28 Oct, 2016 1 commit
-
-
Committed by Jiri Olsa
The trinity syscall fuzzer triggered the following WARN() on powerpc:

WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
...
NIP [c00000000093aedc] .hw_breakpoint_handler+0x28c/0x2b0
LR [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0
Call Trace:
[c0000002f7933580] [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0 (unreliable)
[c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
[c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
[c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
[c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
[c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48

Followed by a lockdep warning:

===============================
[ INFO: suspicious RCU usage. ]
4.8.0-rc5+ #7 Tainted: G W
-------------------------------
./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side critical section!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by ls/2998:
#0: (rcu_read_lock){......}, at: [<c0000000000f6a00>] .__atomic_notifier_call_chain+0x0/0x1c0
#1: (rcu_read_lock){......}, at: [<c00000000093ac50>] .hw_breakpoint_handler+0x0/0x2b0

stack backtrace:
CPU: 9 PID: 2998 Comm: ls Tainted: G W 4.8.0-rc5+ #7
Call Trace:
[c0000002f7933150] [c00000000094b1f8] .dump_stack+0xe0/0x14c (unreliable)
[c0000002f79331e0] [c00000000013c468] .lockdep_rcu_suspicious+0x138/0x180
[c0000002f7933270] [c0000000001005d8] .___might_sleep+0x278/0x2e0
[c0000002f7933300] [c000000000935584] .mutex_lock_nested+0x64/0x5a0
[c0000002f7933410] [c00000000023084c] .perf_event_ctx_lock_nested+0x16c/0x380
[c0000002f7933500] [c000000000230a80] .perf_event_disable+0x20/0x60
[c0000002f7933580] [c00000000093aeec] .hw_breakpoint_handler+0x29c/0x2b0
[c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
[c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
[c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
[c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
[c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48

While it looks like the first WARN() is probably valid, the other one is triggered by disabling the event via perf_event_disable() from atomic context.

The event is disabled here in case we were not able to emulate the instruction that hit the breakpoint. By disabling the event we unschedule the event and make sure it's not scheduled back.

But we can't call perf_event_disable() from atomic context, instead we need to use the event's pending_disable irq_work method to disable it.

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161026094824.GA21397@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 27 Oct, 2016 3 commits
-
-
Committed by Nicholas Piggin
This patch does a couple of things. First of all, powernv immediately explodes when running a relocated kernel, because the system reset exception for handling sleeps does not do correct relocated branches.

Secondly, the sleep handling code trashes the condition and cfar registers, which we would like to preserve for debugging purposes (for the non-sleep exception case).

This patch changes the exception to use the standard format that saves registers before any tests or branches are made. It adds the test for idle-wakeup as an "extra" to break out of the normal exception path. Then it branches to a relocated idle handler that calls the various idle handling functions.

After this patch, the POWER8 CPU simulator now boots a powernv kernel that is running at a non-zero address.

Fixes: 948cf67c ("powerpc: Add NAP mode support on Power7 in HV mode")
Cc: stable@vger.kernel.org # v3.0+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Aneesh Kumar K.V
Before this patch, we used tlbiel if we had only ever run on this core. That was mostly derived from the nohash usage of the same. But that is incorrect; ISA 3.0 clarifies tlbiel such that:

"All TLB entries that have all of the following properties are made invalid on the thread executing the tlbiel instruction"

ie. tlbiel only invalidates TLB entries on the current thread. So if the mm has been used on any other thread (aka. cpu) then we must broadcast the invalidate.

This bug could lead to invalid TLB entries if a program runs on multiple threads of a core.

Hence use tlbiel only if we have only ever run on the current cpu.

Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines")
Cc: stable@vger.kernel.org # v4.7+
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
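The difference between the old and new decision can be sketched as two small predicates. This is an illustration in Python, not the kernel's implementation: the function names, the set-of-cpus representation, and the `threads_per_core` parameter are all assumptions made for the example.

```python
def old_choice(mm_cpus, current_cpu, threads_per_core=8):
    """Pre-patch behaviour (buggy): local invalidate if the mm only ran on this core."""
    core = current_cpu // threads_per_core
    if all(cpu // threads_per_core == core for cpu in mm_cpus):
        return "tlbiel"
    return "broadcast"


def new_choice(mm_cpus, current_cpu):
    """Post-patch behaviour: local invalidate only if the mm ran on this thread alone."""
    if mm_cpus <= {current_cpu}:
        return "tlbiel"
    return "broadcast"


# An mm that has run on two hardware threads (cpus 8 and 9) of the same core.
mm_cpus = {8, 9}
before = old_choice(mm_cpus, current_cpu=8)
after = new_choice(mm_cpus, current_cpu=8)
```

For the two-thread case the old predicate picks the local form, leaving a stale entry on cpu 9, while the new predicate falls back to a broadcast.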
-
Committed by Valentin Rothberg
It should be ALTIVEC, not ALIVEC.

Cyril explains:

If a thread performs a transaction with altivec and then gets preempted for whatever reason, this bug may cause the kernel to not re-enable altivec when that thread runs again. This will result in an altivec unavailable fault, when that fault happens inside a user transaction the kernel has no choice but to enable altivec and doom the transaction.

The result is that transactions using altivec may get aborted more often than they should.

The difficulty in catching this with a selftest is my deliberate use of the word may above. Optimisations to avoid FPU/altivec/VSX faults mean that the kernel will always leave them on for 255 switches. This code prevents the kernel turning it off if it got to the 256th switch (and userspace was transactional).

Fixes: dc16b553 ("powerpc: Always restore FPU/VEC/VSX if hardware transactional memory in use")
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 24 Oct, 2016 2 commits
-
-
Committed by Paul Mackerras
This fixes a race condition where one thread that is entering or leaving a power-saving state can inadvertently ignore the lock bit that was set by another thread, and potentially also clear it. The core_idle_lock_held function is called when the lock bit is seen to be set. It polls the lock bit until it is clear, then does a lwarx to load the word containing the lock bit and thread idle bits so it can be updated. However, it is possible that the value loaded with the lwarx has the lock bit set, even though an immediately preceding lwz loaded a value with the lock bit clear. If this happens then we go ahead and update the word despite the lock bit being set, and when called from pnv_enter_arch207_idle_mode, we will subsequently clear the lock bit.

No identifiable misbehaviour has been attributed to this race.

This fixes it by checking the lock bit in the value loaded by the lwarx. If it is set then we just go back and keep on polling.

Fixes: b32aadc1 ("powerpc/powernv: Fix race in updating core_idle_state")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
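The control flow of the fix (poll, reserve, then recheck the lock bit in the reserved value) can be modelled without any hardware. The sketch below is a Python simulation, not the assembly from the patch: `LOCK_BIT` is a hypothetical value, and the `reads` sequence stands in for the successive values that the plain load (lwz) and the reserving load (lwarx) would observe.

```python
LOCK_BIT = 0x100  # hypothetical bit position, for illustration only


def wait_and_reserve(reads):
    """Return the first reserved value that has the lock bit clear.

    reads: iterable of successive values observed in the shared word.
    """
    it = iter(reads)
    while True:
        # Poll with plain loads until the lock bit is seen clear.
        while next(it) & LOCK_BIT:
            pass
        # Reserving load: another thread may have taken the lock in between.
        value = next(it)
        if value & LOCK_BIT:
            continue  # the fix: go back to polling instead of updating the word
        return value


# Locked, clear, locked again at the reserving load, then clear twice.
observed = [0x101, 0x001, 0x103, 0x003, 0x003]
result = wait_and_reserve(observed)
```

Without the recheck, the function would have returned `0x103` and the caller would have gone on to update a word whose lock bit was held by another thread.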
-
Committed by Paul Mackerras
Commit 8117ac6a ("powerpc/powernv: Switch off MMU before entering nap/sleep/rvwinkle mode", 2014-12-10) fixed a race condition where one thread entering a KVM guest could switch the MMU context to the guest while another thread was still in host kernel context with the MMU on. That commit moved the point where a thread entering a power-saving mode set its kvm_hstate.hwthread_state field in its PACA to KVM_HWTHREAD_IN_IDLE from a point where the MMU was on to after the MMU had been switched off. That commit also added a comment explaining that we have to switch to real mode before setting hwthread_state to avoid this race.

Nevertheless, commit 4eae2c9a ("powerpc/powernv: Make pnv_powersave_common more generic", 2016-07-08) subsequently moved the setting of hwthread_state back to a point where the MMU is on, thus reintroducing the race, despite the comment saying that this should not be done being included in full in the context lines of the patch that did it.

This fixes the race again and adds a bigger and shoutier comment explaining the potential race condition.

Fixes: 4eae2c9a ("powerpc/powernv: Make pnv_powersave_common more generic")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Shreyas B. Prabhu <shreyasbp@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 22 Oct, 2016 2 commits
-
-
Committed by Segher Boessenkool
PowerPC's "cmp" instruction has four operands. Normally people write "cmpw" or "cmpd" for the second cmp operand 0 or 1. But, frequently people forget, and write "cmp" with just three operands.

With older binutils this is silently accepted as if this was "cmpw", while often "cmpd" is wanted. With newer binutils GAS will complain about this for 64-bit code. For 32-bit code it still silently assumes "cmpw" is what is meant.

In this instance the code comes directly from ISA v2.07, including the cmp, but cmpd is correct. Backport to stable so that new toolchains can build old kernels.

Fixes: 948cf67c ("powerpc: Add NAP mode support on Power7 in HV mode")
Cc: stable@vger.kernel.org # v3.0
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
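The only difference between "cmpw" and "cmpd" is the second operand of "cmp", the L field. A minimal sketch in Python of the encoding follows; the helper names are hypothetical, and the field positions (BF at bits 25-23, L at bit 21, RA at bits 20-16, RB at bits 15-11, with bit 0 the least significant) are stated here from the Power ISA X-form layout rather than from the commit message.

```python
def encode_cmp(bf, l, ra, rb):
    """Encode cmp BF,L,RA,RB: primary opcode 31, extended opcode 0."""
    return (31 << 26) | (bf << 23) | (l << 21) | (ra << 16) | (rb << 11)


def cmpw(bf, ra, rb):
    """cmpw is cmp with L=0 (32-bit comparison)."""
    return encode_cmp(bf, 0, ra, rb)


def cmpd(bf, ra, rb):
    """cmpd is cmp with L=1 (64-bit comparison)."""
    return encode_cmp(bf, 1, ra, rb)


word_w = cmpw(0, 3, 4)
word_d = cmpd(0, 3, 4)
```

The two words differ in exactly one bit, so an assembler that silently picks L=0 for a three-operand "cmp" produces a valid instruction that compares only the low 32 bits.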
-
Committed by Michael Ellerman
Commit 5d375199 ("KVM: PPC: Book3S HV: Set server for passed-through interrupts") broke the SMP=n build:

arch/powerpc/kvm/book3s_hv_rm_xics.c:758:2: error: implicit declaration of function 'get_hard_smp_processor_id'

That is because we lost the implicit include of asm/smp.h, so include it explicitly to get the definition for get_hard_smp_processor_id().

Fixes: 5d375199 ("KVM: PPC: Book3S HV: Set server for passed-through interrupts")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 19 Oct, 2016 6 commits
-
-
Committed by Lorenzo Stoakes
This removes the 'write' argument from access_process_vm() and replaces it with 'gup_flags' as use of this function previously silently implied FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

We make this explicit as use of FOLL_FORCE can result in surprising behaviour (and hence bugs) within the mm subsystem.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Stephen Rothwell
Eliminates warning messages:

<stdin>:1316:2: warning: #warning syscall pkey_mprotect not implemented [-Wcpp]
<stdin>:1319:2: warning: #warning syscall pkey_alloc not implemented [-Wcpp]
<stdin>:1322:2: warning: #warning syscall pkey_free not implemented [-Wcpp]

Hopefully we will remember to revert this commit if we ever implement them.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Aneesh Kumar K.V
With recent update to printk, we get console output like below:

[ 0.550639] Brought up 160 CPUs
[ 0.550718] Node 0 CPUs:
[ 0.550721] 0
[ 0.550754] -39
[ 0.550794] Node 1 CPUs:
[ 0.550798] 40
[ 0.550817] -79
[ 0.550856] Node 16 CPUs:
[ 0.550860] 80
[ 0.550880] -119
[ 0.550917] Node 17 CPUs:
[ 0.550923] 120
[ 0.550942] -159

Fix this by properly using pr_cont(), ie. KERN_CONT.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Michael Ellerman
At boot we dump the NUMA memory topology in dump_numa_memory_topology(), at KERN_DEBUG level, resulting in output like:

Node 0 Memory: 0x0-0x100000000
Node 1 Memory: 0x100000000-0x200000000

Which is nice enough, but immediately after that we iterate over each node and call setup_node_data(), which also prints out the node ranges, at KERN_INFO, giving eg:

numa: Initmem setup node 0 [mem 0x00000000-0xffffffff]
numa: Initmem setup node 1 [mem 0x100000000-0x1ffffffff]

Additionally dump_numa_memory_topology() does not use KERN_CONT correctly, resulting in split output lines on recent kernels.

So drop dump_numa_memory_topology() as superfluous chatter.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Heiner Kallweit
Commit 1b7898ee ("Use the pre-boot decompression API") broke boot on systems with an uncompressed kernel image, namely systems using a cuImage. On such systems the compressed boot image (boot wrapper, uncompressed kernel image, ..) is decompressed by u-boot already, therefore the boot wrapper code sees an uncompressed kernel image.

The old decompression code silently assumed an uncompressed kernel image if it found no valid gzip signature, whilst the new code bailed out in this case.

Fix this by re-introducing such a fallback if no valid compressed image is found.

Fixes: 1b7898ee ("Use the pre-boot decompression API")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
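The fallback described here amounts to checking for a compression signature and passing the image through unchanged when none is found. The sketch below shows that control flow in Python with the standard `gzip` module; it is not the boot wrapper's C code, `load_kernel` is a hypothetical name, and only the gzip case is modelled.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def load_kernel(image: bytes) -> bytes:
    """Decompress a gzip image, or return it as-is if it is not gzip-compressed."""
    if image[:2] == GZIP_MAGIC:
        return gzip.decompress(image)
    # Fallback: no valid signature, so assume the image is already uncompressed
    # (for example because the bootloader decompressed it beforehand).
    return image


# A stand-in payload; a real kernel image would be an ELF or raw binary.
payload = b"\x7fELF" + bytes(16)
from_compressed = load_kernel(gzip.compress(payload))
from_raw = load_kernel(payload)
```

Both calls return the same payload, which is the behaviour the old decompression code provided implicitly and the fix restores.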
-
Committed by Frederic Barrat
If a cxl adapter faults on an invalid address for a kernel context, we may enter copro_calculate_slb() with a NULL mm pointer (kernel context) and an effective address which looks like a user address. This will cause a crash when dereferencing mm. It is clearly an AFU bug, but there's no reason to crash either. So return an error, so that cxl can ack the interrupt with an address error.

Fixes: 73d16a6e ("powerpc/cell: Move data segment faulting code out of cell platform")
Cc: stable@vger.kernel.org # v3.18+
Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
- 12 Oct, 2016 3 commits
-
-
Add support for the DMA_ATTR_NO_WARN attribute on powerpc iommu code.

Link: http://lkml.kernel.org/r/1470092390-25451-3-git-send-email-mauricfo@linux.vnet.ibm.com
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Michael Ellerman
In commit 2b4e3ad8 ("powerpc/mm/hash64: Don't test for machine type to detect HEA special case") we changed the logic in might_have_hea() to check FW_FEATURE_SPLPAR rather than machine_is(pseries).

However the check was incorrectly negated, leading to crashes on machines with HEA adapters, such as:

mm: Hashing failure ! EA=0xd000080080004040 access=0x800000000000000e current=NetworkManager
trap=0x300 vsid=0x13d349c ssize=1 base psize=2 psize 2 pte=0xc0003cc033e701ae
Unable to handle kernel paging request for data at address 0xd000080080004040
Call Trace:
.ehea_create_cq+0x148/0x340 [ehea] (unreliable)
.ehea_up+0x258/0x1200 [ehea]
.ehea_open+0x44/0x1a0 [ehea]
...

Fix it by removing the negation.

Fixes: 2b4e3ad8 ("powerpc/mm/hash64: Don't test for machine type to detect HEA special case")
Cc: stable@vger.kernel.org # v4.8+
Reported-by: Denis Kirjanov <kda@linux-powerpc.org>
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Paul Mackerras
Debugging a data corruption issue with virtio-net/vhost-net led to the observation that __copy_tofrom_user was occasionally returning a value 16 larger than it should. Since the return value from __copy_tofrom_user is the number of bytes not copied, this means that __copy_tofrom_user can occasionally return a value larger than the number of bytes it was asked to copy. In turn this can cause higher-level copy functions such as copy_page_to_iter_iovec to corrupt memory by copying data into the wrong memory locations.

It turns out that the failing case involves a fault on the store at label 79, and at that point the first unmodified byte of the destination is at R3 + 16. Consequently the exception handler for that store needs to add 16 to R3 before using it to work out how many bytes were not copied, but in this one case it was not adding the offset to R3. To fix it, this moves the label 179 to the point where we add 16 to R3. I have checked manually all the exception handlers for the loads and stores in this code and the rest of them are correct (it would be excellent to have an automated test of all the exception cases).

This bug has been present since this code was initially committed in May 2002 to Linux version 2.5.20.

Cc: stable@vger.kernel.org
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
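The arithmetic behind the "16 larger" return value can be worked through with concrete numbers. The following is a small Python sketch, not the assembly handler: the function name and the example addresses are hypothetical, and it only models the subtraction the exception handler performs.

```python
def bytes_not_copied(dest, n, r3, offset):
    """Bytes left uncopied when the first unmodified destination byte is r3 + offset."""
    return (dest + n) - (r3 + offset)


# Hypothetical values: a 64-byte copy to 0x1000 that faults with R3 = 0xff8,
# where the first unmodified byte is at R3 + 16 as described in the message.
dest, n, r3 = 0x1000, 64, 0xFF8

correct = bytes_not_copied(dest, n, r3, offset=16)  # handler adds 16 to R3
buggy = bytes_not_copied(dest, n, r3, offset=0)     # handler forgets the offset
```

With these numbers the correct handler reports 56 bytes not copied, while the buggy one reports 72, which exceeds the 64 bytes requested and is how a caller can end up writing to the wrong location.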
-
- 11 Oct, 2016 3 commits
-
-
Committed by Nicholas Piggin
power4_fixup_nap is called from the "common" handlers, not the virt/real handlers, therefore it should itself be a common handler. Placing it down in the trampoline space caused it to go out of reach of its callers, requiring a trampoline inserted at the start of the text section, which breaks the fixed section address calculations. Fixes: da2bc464 ("powerpc/64s: Add new exception vector macros") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Laurent Dufour
This commit fixes a stack corruption in the pseries specific code dealing with the huge pages. In __pSeries_lpar_hugepage_invalidate() the buffer used to pass arguments to the hypervisor is not large enough. This leads to a stack corruption where a previously saved register could be corrupted leading to unexpected result in the caller, like the following panic: Oops: Kernel access of bad area, sig: 11 [#1] SMP NR_CPUS=2048 NUMA pSeries Modules linked in: virtio_balloon ip_tables x_tables autofs4 virtio_blk 8139too virtio_pci virtio_ring 8139cp virtio CPU: 11 PID: 1916 Comm: mmstress Not tainted 4.8.0 #76 task: c000000005394880 task.stack: c000000005570000 NIP: c00000000027bf6c LR: c00000000027bf64 CTR: 0000000000000000 REGS: c000000005573820 TRAP: 0300 Not tainted (4.8.0) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 84822884 XER: 20000000 CFAR: c00000000010a924 DAR: 420000000014e5e0 DSISR: 40000000 SOFTE: 1 GPR00: c00000000027bf64 c000000005573aa0 c000000000e02800 c000000004447964 GPR04: c00000000404de18 c000000004d38810 00000000042100f5 00000000f5002104 GPR08: e0000000f5002104 0000000000000001 042100f5000000e0 00000000042100f5 GPR12: 0000000000002200 c00000000fe02c00 c00000000404de18 0000000000000000 GPR16: c1ffffffffffe7ff 00003fff62000000 420000000014e5e0 00003fff63000000 GPR20: 0008000000000000 c0000000f7014800 0405e600000000e0 0000000000010000 GPR24: c000000004d38810 c000000004447c10 c00000000404de18 c000000004447964 GPR28: c000000005573b10 c000000004d38810 00003fff62000000 420000000014e5e0 NIP [c00000000027bf6c] zap_huge_pmd+0x4c/0x470 LR [c00000000027bf64] zap_huge_pmd+0x44/0x470 Call Trace: [c000000005573aa0] [c00000000027bf64] zap_huge_pmd+0x44/0x470 (unreliable) [c000000005573af0] [c00000000022bbd8] unmap_page_range+0xcf8/0xed0 [c000000005573c30] [c00000000022c2d4] unmap_vmas+0x84/0x120 [c000000005573c80] [c000000000235448] unmap_region+0xd8/0x1b0 [c000000005573d80] [c0000000002378f0] do_munmap+0x2d0/0x4c0 [c000000005573df0] [c000000000237be4] 
SyS_munmap+0x64/0xb0 [c000000005573e30] [c000000000009560] system_call+0x38/0x108 Instruction dump: fbe1fff8 fb81ffe0 7c7f1b78 7ca32b78 7cbd2b78 f8010010 7c9a2378 f821ffb1 7cde3378 4bfffea9 7c7b1b79 41820298 <e87f0000> 48000130 7fa5eb78 7fc4f378 Most of the time, the bug surfaces in a caller further up the stack from __pSeries_lpar_hugepage_invalidate(), which is quite confusing. This bug has been present since v3.11 but was hidden if a caller of the caller of __pSeries_lpar_hugepage_invalidate() had pushed the corrupted register (r18 in this case) onto the stack and did not use it until restoring it. GCC 6.2.0 seems to trigger it more frequently. This commit also changes the definition of the parameter buffer in pSeries_lpar_flush_hash_range() to rely on the global define PLPAR_HCALL9_BUFSIZE (no functional change here). Fixes: 1a527286 ("powerpc: Optimize hugepage invalidate") Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
-
Committed by Emese Revfy
This adds a new gcc plugin named "latent_entropy". It is designed to extract as much uncertainty as possible from a running system at boot time, hoping to capitalize on any possible variation in CPU operation (due to runtime data differences, hardware differences, SMP ordering, thermal timing variation, cache behavior, etc). At the very least, this plugin is a much more comprehensive example of how to manipulate kernel code using the gcc plugin internals. The need for very-early boot entropy tends to be very architecture or system design specific, so this plugin is more suited for those sorts of special cases. The existing kernel RNG already attempts to extract entropy from reliable runtime variation, but this plugin takes the idea to a logical extreme by permuting a global variable based on any variation in code execution (e.g. a different value (and permutation function) is used to permute the global based on loop count, case statement, if/then/else branching, etc). To do this, the plugin starts by inserting a local variable in every marked function. The plugin then adds logic so that the value of this variable is modified by randomly chosen operations (add, xor and rol) and random values (gcc generates separate static values for each location at compile time and also injects the stack pointer at runtime). The resulting value depends on the control flow path (e.g., loops and branches taken). Before the function returns, the plugin mixes this local variable into the latent_entropy global variable. The value of this global variable is added to the kernel entropy pool in do_one_initcall() and _do_fork(), though it does not credit any bytes of entropy to the pool; the contents of the global are just used to mix the pool. Additionally, the plugin can pre-initialize arrays with build-time random contents, so that two different kernel builds running on identical hardware will not have the same starting values.
Signed-off-by: Emese Revfy <re.emese@gmail.com> [kees: expanded commit message and code comments] Signed-off-by: Kees Cook <keescook@chromium.org>
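To make the instrumentation idea concrete, here is a hand-written C sketch of what a marked function might conceptually look like after the plugin has run. It is an illustration under stated assumptions, not the plugin's actual output: the constants are made up, the function `count_set_bits` is a hypothetical example, the final mix is shown as a plain xor, and the stack-pointer injection mentioned above is omitted.

```c
#include <stdint.h>

/* Stand-in for the global named in the commit message. */
static uint64_t latent_entropy;

/* Rotate left; the shift is masked so the expression is always defined. */
static inline uint64_t rol64(uint64_t v, unsigned int s)
{
    s &= 63;
    return s ? (v << s) | (v >> (64 - s)) : v;
}

/* Hypothetical function as it might look after instrumentation. */
static int count_set_bits(uint32_t x)
{
    /* Inserted local, seeded with a (made-up) build-time constant. */
    uint64_t local_entropy = 0x9e3779b97f4a7c15ULL;
    int n = 0;

    while (x) {
        local_entropy = rol64(local_entropy, 7);     /* per-iteration op */
        if (x & 1) {
            local_entropy ^= 0x5851f42d4c957f2dULL;  /* "then" branch op */
            n++;
        } else {
            local_entropy += 0x14057b7ef767814fULL;  /* "else" branch op */
        }
        x >>= 1;
    }

    /* Before returning, fold the path-dependent value into the global. */
    latent_entropy ^= local_entropy;
    return n;
}
```

Because every loop iteration and branch applies a different operation, the value folded into `latent_entropy` depends on the path taken through the function, which is the property the commit message describes.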
-
- 08 Oct, 2016 3 commits
-
-
Committed by Chris Metcalf
When doing an nmi backtrace of many cores, most of which are idle, the output is a little overwhelming and very uninformative. Suppress messages for cpus that are idling when they are interrupted and just emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN". We do this by grouping all the cpuidle code together into a new .cpuidle.text section, and then checking the address of the interrupted PC to see if it lies within that section. This commit suitably tags x86 and tile idle routines, and only adds in the minimal framework for other architectures. Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm] Tested-by: Petr Mladek <pmladek@suse.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Russell King <linux@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
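The section check described above boils down to a half-open range test on the interrupted PC. The following C sketch shows that test in isolation; `struct text_range`, `pc_in_range` and `should_skip_backtrace` are hypothetical names for illustration, and in a kernel the bounds would be supplied by linker-script symbols rather than by a struct filled in by hand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a text section: [start, end). */
struct text_range {
    uintptr_t start;
    uintptr_t end;   /* first address past the section */
};

/* True if pc lies inside the half-open range [start, end). */
static bool pc_in_range(const struct text_range *r, uintptr_t pc)
{
    return pc >= r->start && pc < r->end;
}

/*
 * Decide whether a CPU's backtrace can be suppressed: it can be if the
 * CPU was interrupted while executing code placed in the idle section.
 */
static bool should_skip_backtrace(const struct text_range *cpuidle_text,
                                  uintptr_t interrupted_pc)
{
    return pc_in_range(cpuidle_text, interrupted_pc);
}
```

Grouping the idle routines into one section is what makes a single range comparison sufficient; without it, the check would need a list of per-function address ranges.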
-
Committed by Vineet Gupta
This came to light when implementing native 64-bit atomics for ARCv2. The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE to check whether atomic64_dec_if_positive() is available. It seems it was needed when not every arch defined it. However, as of the current code, the Kconfig option seems needless: for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a generic definition of the API is present in lib/atomic64.c; arches with native 64-bit atomics select it in arch/*/Kconfig and define the API in their headers. So I see no point in keeping the Kconfig option. Compile tested for: blackfin (CONFIG_GENERIC_ATOMIC64), x86 (!CONFIG_GENERIC_ATOMIC64), ia64. Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ming Lin <ming.l@ssi.samsung.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Andi Kleen <ak@linux.intel.com> Cc: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Srikar Dronamraju
Currently a significant amount of memory is reserved only in a kernel booted to capture a kernel dump using the fa_dump method. Kernels compiled with CONFIG_DEFERRED_STRUCT_PAGE_INIT will initialize only a certain amount of memory per node. That amount takes into account the dentry and inode cache sizes. Currently the cache sizes are calculated based on the total system memory including the reserved memory. However such a kernel, when booting the same kernel as the fadump kernel, will not be able to allocate the amount of memory required for the dentry and inode caches. This results in crashes like the one shown below. Hence only implement arch_reserved_kernel_pages() for CONFIG_FA_DUMP configurations. The amount reserved will be subtracted when calculating the large caches, which avoids crashes like the below on large systems such as 32 TB systems. Dentry cache hash table entries: 536870912 (order: 16, 4294967296 bytes) vmalloc: allocation failure, allocated 4097114112 of 17179934720 bytes swapper/0: page allocation failure: order:0, mode:0x2080020(GFP_ATOMIC) CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6-master+ #3 Call Trace: dump_stack+0xb0/0xf0 (unreliable) warn_alloc_failed+0x114/0x160 __vmalloc_node_range+0x304/0x340 __vmalloc+0x6c/0x90 alloc_large_system_hash+0x1b8/0x2c0 inode_init+0x94/0xe4 vfs_caches_init+0x8c/0x13c start_kernel+0x50c/0x578 start_here_common+0x20/0xa8 Link: http://lkml.kernel.org/r/1472476010-4709-4-git-send-email-srikar@linux.vnet.ibm.com Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Suggested-by: Mel Gorman <mgorman@techsingularity.net> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Hari Bathini <hbathini@linux.vnet.ibm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 04 Oct, 2016 1 commit
-
-
Committed by Naveen N. Rao
In line with similar support for other architectures by Daniel Borkmann. 'MOD Default X' from test_bpf without constant blinding: 84 bytes emitted from JIT compiler (pass:3, flen:7) d0000000058a4688 + <x>: 0: nop 4: nop 8: std r27,-40(r1) c: std r28,-32(r1) 10: xor r8,r8,r8 14: xor r28,r28,r28 18: mr r27,r3 1c: li r8,66 20: cmpwi r28,0 24: bne 0x0000000000000030 28: li r8,0 2c: b 0x0000000000000044 30: divwu r9,r8,r28 34: mullw r9,r28,r9 38: subf r8,r9,r8 3c: rotlwi r8,r8,0 40: li r8,66 44: ld r27,-40(r1) 48: ld r28,-32(r1) 4c: mr r3,r8 50: blr ... and with constant blinding: 140 bytes emitted from JIT compiler (pass:3, flen:11) d00000000bd6ab24 + <x>: 0: nop 4: nop 8: std r27,-40(r1) c: std r28,-32(r1) 10: xor r8,r8,r8 14: xor r28,r28,r28 18: mr r27,r3 1c: lis r2,-22834 20: ori r2,r2,36083 24: rotlwi r2,r2,0 28: xori r2,r2,36017 2c: xoris r2,r2,42702 30: rotlwi r2,r2,0 34: mr r8,r2 38: rotlwi r8,r8,0 3c: cmpwi r28,0 40: bne 0x000000000000004c 44: li r8,0 48: b 0x000000000000007c 4c: divwu r9,r8,r28 50: mullw r9,r28,r9 54: subf r8,r9,r8 58: rotlwi r8,r8,0 5c: lis r2,-17137 60: ori r2,r2,39065 64: rotlwi r2,r2,0 68: xori r2,r2,39131 6c: xoris r2,r2,48399 70: rotlwi r2,r2,0 74: mr r8,r2 78: rotlwi r8,r8,0 7c: ld r27,-40(r1) 80: ld r28,-32(r1) 84: mr r3,r8 88: blr Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
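The two listings differ in how the constant 66 reaches r8: the blinded version loads a masked value and then xors it with a random number, so 66 never appears as an immediate in the emitted code. The C sketch below models only that arithmetic; `struct blinded_imm`, `blind_imm` and `unblind_at_runtime` are hypothetical names, and this is not the JIT's code. The values in the usage note are decoded from the first lis/ori and xori/xoris sequence in the listing.

```c
#include <stdint.h>

/* Hypothetical pair of immediates that replaces one original constant. */
struct blinded_imm {
    uint32_t masked;  /* imm ^ rnd, loaded first (lis/ori in the listing) */
    uint32_t rnd;     /* random value, applied by the following xor       */
};

/* Performed when the program is translated: hide imm behind rnd. */
static struct blinded_imm blind_imm(uint32_t imm, uint32_t rnd)
{
    struct blinded_imm b = { imm ^ rnd, rnd };
    return b;
}

/* What the emitted instructions compute when they run. */
static uint32_t unblind_at_runtime(struct blinded_imm b)
{
    uint32_t reg = b.masked;  /* load the masked constant           */
    reg ^= b.rnd;             /* xor with rnd restores the original */
    return reg;
}
```

In the listing, `lis r2,-22834; ori r2,r2,36083` loads 0xA6CE8CF3 and `xori r2,r2,36017; xoris r2,r2,42702` xors it with 0xA6CE8CB1, which yields 0x42, that is, 66. Calling `blind_imm(66, 0xA6CE8CB1u)` reproduces the same masked value.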
-