- 29 6月, 2016 1 次提交
-
-
由 Michael Neuling 提交于
Currently we have 2 segments that are bolted for the kernel linear mapping (ie 0xc000... addresses). This is 0 to 1TB and also the kernel stacks. Anything accessed outside of these regions may need to be faulted in. (In practice machines with TM always have 1T segments) If a machine has < 2TB of memory we never fault on the kernel linear mapping as these two segments cover all physical memory. If a machine has > 2TB of memory, there may be structures outside of these two segments that need to be faulted in. This faulting can occur when running as a guest as the hypervisor may remove any SLB that's not bolted. When we treclaim and trecheckpoint we have a window where we need to run with the userspace GPRs. This means that we no longer have a valid stack pointer in r1. For this window we therefore clear MSR RI to indicate that any exceptions taken at this point won't be able to be handled. This means that we can't take segment misses in this RI=0 window. In this RI=0 region, we currently access the thread_struct for the process being context switched to or from. This thread_struct access may cause a segment fault since it's not guaranteed to be covered by the two bolted segment entries described above. We've seen this with a crash when running as a guest with > 2TB of memory on PowerVM: Unrecoverable exception 4100 at c00000000004f138 Oops: Unrecoverable exception, sig: 6 [#1] SMP NR_CPUS=2048 NUMA pSeries CPU: 1280 PID: 7755 Comm: kworker/1280:1 Tainted: G X 4.4.13-46-default #1 task: c000189001df4210 ti: c000189001d5c000 task.ti: c000189001d5c000 NIP: c00000000004f138 LR: 0000000010003a24 CTR: 0000000010001b20 REGS: c000189001d5f730 TRAP: 4100 Tainted: G X (4.4.13-46-default) MSR: 8000000100001031 <SF,ME,IR,DR,LE> CR: 24000048 XER: 00000000 CFAR: c00000000004ed18 SOFTE: 0 GPR00: ffffffffc58d7b60 c000189001d5f9b0 00000000100d7d00 000000003a738288 GPR04: 0000000000002781 0000000000000006 0000000000000000 c0000d1f4d889620 GPR08: 000000000000c350 00000000000008ab 00000000000008ab 00000000100d7af0 GPR12: 00000000100d7ae8 00003ffe787e67a0 0000000000000000 0000000000000211 GPR16: 0000000010001b20 0000000000000000 0000000000800000 00003ffe787df110 GPR20: 0000000000000001 00000000100d1e10 0000000000000000 00003ffe787df050 GPR24: 0000000000000003 0000000000010000 0000000000000000 00003fffe79e2e30 GPR28: 00003fffe79e2e68 00000000003d0f00 00003ffe787e67a0 00003ffe787de680 NIP [c00000000004f138] restore_gprs+0xd0/0x16c LR [0000000010003a24] 0x10003a24 Call Trace: [c000189001d5f9b0] [c000189001d5f9f0] 0xc000189001d5f9f0 (unreliable) [c000189001d5fb90] [c00000000001583c] tm_recheckpoint+0x6c/0xa0 [c000189001d5fbd0] [c000000000015c40] __switch_to+0x2c0/0x350 [c000189001d5fc30] [c0000000007e647c] __schedule+0x32c/0x9c0 [c000189001d5fcb0] [c0000000007e6b58] schedule+0x48/0xc0 [c000189001d5fce0] [c0000000000deabc] worker_thread+0x22c/0x5b0 [c000189001d5fd80] [c0000000000e7000] kthread+0x110/0x130 [c000189001d5fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4 Instruction dump: 7cb103a6 7cc0e3a6 7ca222a6 78a58402 38c00800 7cc62838 08860000 7cc000a6 38a00006 78c60022 7cc62838 0b060000 <e8c701a0> 7ccff120 e8270078 e8a70098 ---[ end trace 602126d0a1dedd54 ]--- This fixes this by copying the required data from the thread_struct to the stack before we clear MSR RI. Then once we clear RI, we only access the stack, guaranteeing there's no segment miss. We also tighten the region over which we set RI=0 on the treclaim() path. This may have a slight performance impact since we're adding an mtmsr instruction. Fixes: 090b9284 ("powerpc/tm: Clear MSR RI in non-recoverable TM code") Signed-off-by: NMichael Neuling <mikey@neuling.org> Reviewed-by: NCyril Bur <cyrilbur@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 28 6月, 2016 1 次提交
-
-
由 Gavin Shan 提交于
When calling eeh_rmv_device() in eeh_reset_device() for partial hotplug case, @rmv_data instead of its address is the proper argument. Otherwise, the stack frame is corrupted when writing to @rmv_data (actually its address) in eeh_rmv_device(). It results in kernel crash as observed. This fixes the issue by passing @rmv_data, not its address to eeh_rmv_device() in eeh_reset_device(). Fixes: 67086e32 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE") Reported-by: NPridhiviraj Paidipeddi <ppaidipe@in.ibm.com> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 27 6月, 2016 1 次提交
-
-
由 Cyril Bur 提交于
Userspace can quite legitimately perform an exec() syscall with a suspended transaction. exec() does not return to the old process, rather it load a new one and starts that, the expectation therefore is that the new process starts not in a transaction. Currently exec() is not treated any differently to any other syscall which creates problems. Firstly it could allow a new process to start with a suspended transaction for a binary that no longer exists. This means that the checkpointed state won't be valid and if the suspended transaction were ever to be resumed and subsequently aborted (a possibility which is exceedingly likely as exec()ing will likely doom the transaction) the new process will jump to invalid state. Secondly the incorrect attempt to keep the transactional state while still zeroing state for the new process creates at least two TM Bad Things. The first triggers on the rfid to return to userspace as start_thread() has given the new process a 'clean' MSR but the suspend will still be set in the hardware MSR. The second TM Bad Thing triggers in __switch_to() as the processor is still transactionally suspended but __switch_to() wants to zero the TM sprs for the new process. This is an example of the outcome of calling exec() with a suspended transaction. Note the first 700 is likely the first TM bad thing decsribed earlier only the kernel can't report it as we've loaded userspace registers. c000000000009980 is the rfid in fast_exception_return() Bad kernel stack pointer 3fffcfa1a370 at c000000000009980 Oops: Bad kernel stack pointer, sig: 6 [#1] CPU: 0 PID: 2006 Comm: tm-execed Not tainted NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000 REGS: c00000003ffefd40 TRAP: 0700 Not tainted MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]> CR: 00000000 XER: 00000000 CFAR: c0000000000098b4 SOFTE: 0 PACATMSCRATCH: b00000010000d033 GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000 GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000 NIP [c000000000009980] fast_exception_return+0xb0/0xb8 LR [0000000000000000] (null) Call Trace: Instruction dump: f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070 e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b Kernel BUG at c000000000043e80 [verbose debug info unavailable] Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033) Oops: Unrecoverable exception, sig: 6 [#2] CPU: 0 PID: 2006 Comm: tm-execed Tainted: G D task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000 NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000 REGS: c00000003ffef7e0 TRAP: 0700 Tainted: G D MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]> CR: 28002828 XER: 00000000 CFAR: c000000000015a20 SOFTE: 0 PACATMSCRATCH: b00000010000d033 GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000 GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000 GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004 GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000 GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000 GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80 NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c LR [c000000000015a24] __switch_to+0x1f4/0x420 Call Trace: Instruction dump: 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020 This fixes CVE-2016-5828. Fixes: bc2a9408 ("powerpc: Hook in new transactional memory code") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: NCyril Bur <cyrilbur@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 23 6月, 2016 2 次提交
-
-
由 Naveen N. Rao 提交于
Classic BPF JIT was never ported completely to work on little endian powerpc. However, it can be enabled and will crash the system when used. As such, disable use of BPF JIT on ppc64le. Fixes: 7c105b63 ("powerpc: Add CONFIG_CPU_LITTLE_ENDIAN kernel config option.") Reported-by: NThadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: NThadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
As part of the Radix MMU support we added some feature sections in the SLB miss handler. These are intended to catch the case that we incorrectly take an SLB miss when Radix is enabled, and instead of crashing weirdly they bail out to a well defined exit path and trigger an oops. However the way they were written meant the bailout case was enabled by default until we did CPU feature patching. On powermacs the early debug prints in setup_system() can cause an SLB miss, which happens before code patching, and so the SLB miss handler would incorrectly bailout and crash during boot. Fix it by inverting the sense of the feature section, so that the code which is in place at boot is correct for the hash case. Once we determine we are using Radix - which will never happen on a powermac - only then do we patch in the bailout case which unconditionally jumps. Fixes: caca285e ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code") Reported-by: NDenis Kirjanov <kda@linux-powerpc.org> Tested-by: NDenis Kirjanov <kda@linux-powerpc.org> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 17 6月, 2016 3 次提交
-
-
由 Gavin Shan 提交于
The PE primary bus cannot be got from its child devices when having full hotplug in error recovery. The PE primary bus is cached, which is done in commit <05ba75f8> ("powerpc/eeh: Fix stale cached primary bus"). In eeh_reset_device(), the flag (EEH_PE_PRI_BUS) is cleared before the PCI hot remove. eeh_pe_bus_get() then returns NULL as the PE primary bus in pnv_eeh_reset() and it crashes the kernel eventually. This fixes the issue by clearing the flag (EEH_PE_PRI_BUS) before the PCI hot add. With it, the PowerNV EEH reset backend (pnv_eeh_reset()) can get valid PE primary bus through eeh_pe_bus_get(). Fixes: 67086e32 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE") Reported-by: NPridhiviraj Paidipeddi <ppaiddipe@in.ibm.com> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
ISA 3.0 updated it to be encoded as Radix tree size = 2^(RTS + 31). We have it encoded as 2^(RTS + 28). Add a helper with the correct encoding and use it instead of opencoding. Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines") Reviewed-by: NBalbir Singh <bsingharora@gmail.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
H_ENTER hcall handling in qemu had assumptions that a cache inhibited hpte entry won't have memory conference set. Also older kernel mentioned that some version of pHyp required this (the code removed by the below commit says: /* Make pHyp happy */ if ((rflags & _PAGE_NO_CACHE) && !(rflags & _PAGE_WRITETHRU)) hpte_r &= ~HPTE_R_M; But with older kernel we had some inconsistent memory conherence mapping. We always enabled memory conherence in the page fault path and removed memory conherence is _PAGE_NO_CACHE was set when we mapped the page via htab_bolt_mapping. The commit mentioned below tried to consolidate that by always enabling memory conherence. But as mentioned above that breaks Qemu H_ENTER handling. This patch update this such that we enable memory conherence only if cache inhibited is not set and bring fault handling, lpar and bolt mapping in sync. Fixes: commit 30bda41a("powerpc/mm: Drop WIMG in favour of new constant") Reported-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 14 6月, 2016 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
With commit e58e87ad "powerpc/mm: Update _PAGE_KERNEL_RO" we now use all the three PPP bits. The top bit is now used to have a PPP value of 0b110 which will be mapped to kernel read only. When updating the hpte entry use right mask such that we update the 63rd bit (top 'P' bit) too. Prior to e58e87ad we didn't support KERNEL_RO at all (it was == KERNEL_RW), so this isn't a regression as such. Fixes: e58e87ad ("powerpc/mm: Update _PAGE_KERNEL_RO") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 10 6月, 2016 3 次提交
-
-
由 Aneesh Kumar K.V 提交于
Even though a tlb_flush() does a flush with invalidate all cache, we can end up doing an RCU page table free before calling tlb_flush(). That means we can have page walk cache entries even after we free the page table pages. This can result in us doing wrong page table walk. Avoid this by doing pwc flush on every page table free. We can't batch the pwc flush, because the rcu call back function where we free the page table pages doesn't have information of the mmu gather. Thus we have to do a pwc on every page table page freed. Note: I also removed the dummy tlb_flush_pgtable call functions for hash 32. Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
Radix invalidate control (RIC) is used to control which cache to flush using tlb instructions. When doing a PID flush, we currently flush everything including page walk cache. For address range flush, we flush only the TLB. In the next patch, we add support for flushing only the page walk cache. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
Commit 74701d59 "powerpc/mm: Rename function to indicate we are allocating fragments" renamed page_table_free() to pte_fragment_free(). One occurrence was mistyped as pte_fragment_fre(). This only breaks the nohash 64K page build, which is not the default or enabled in any defconfig. Fixes: 74701d59 ("powerpc/mm: Rename function to indicate we are allocating fragments") Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 08 6月, 2016 4 次提交
-
-
由 Aneesh Kumar K.V 提交于
PowerISA 3.0 encodes the segment size in the second half of hash page table entry. Update hpte_decode() accordingly. Fixes: 50de596d ("powerpc/mm/hash: Add support for Power9 Hash") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
In some of the radix TLB flush routines, we use a local to store the mm->context.id, AKA the PID. Currently we use an int, but the PID is unsigned long, so large values of PID will be truncated. In particular MMU_NO_CONTEXT is -1, which means all our comparisons against that value can never be true. This means we'll issue TLB flushes when we shouldn't on radix enabled machines. Fix it by using an unsigned long for the local. Discovered by Coverity. Fixes: 1a472c9d ("powerpc/mm/radix: Add tlbflush routines") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: NBalbir Singh <bsingharora@gmail.com> [mpe: Write change log] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Wolfram Sang 提交于
Because of an improper dereference, a stray 'C' character was output to the modalias when no 'compatible' was specified. This is the case for some old PowerMac drivers which only set the 'name' property. Fix it to let them match again. Reported-by: NMathieu Malaterre <malat@debian.org> Signed-off-by: NWolfram Sang <wsa@the-dreams.de> Tested-by: NMathieu Malaterre <malat@debian.org> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Andreas Schwab <schwab@linux-m68k.org> Fixes: 6543becf ("mod/file2alias: make modalias generation safe for cross compiling") Cc: stable@vger.kernel.org # v3.9+ Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Michael Ellerman 提交于
The recent commit 7cc85103 ("powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call") added a new PVR mask & value to the start of the ibm_architecture_vec[] array. However it missed the fact that further down in the array, we hard code the offset of one of the fields, and then at boot use that value to patch the value in the array. This means every update to the array must also update the #define, ugh. This means that on pseries machines we will misreport to firmware the number of cores we support, by a factor of threads_per_core. Fix it for now by updating the #define. Fixes: 7cc85103 ("powerpc/pseries: Add POWER8NVL support to ibm,client-architecture-support call") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 06 6月, 2016 2 次提交
-
-
由 Gavin Shan 提交于
In commit 8445a87f "powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism", the PE address was replaced with the PCI config address in order to remove dependency on EEH. According to PAPR spec, firmware (pHyp or QEMU) should accept "xxBBSSxx" format PCI config address, not "xxxxBBSS" provided by the patch. Note that "BB" is PCI bus number and "SS" is the combination of slot and function number. This fixes the PCI address passed to DDW RTAS calls. Fixes: 8445a87f ("powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism") Cc: stable@vger.kernel.org # v3.4+ Reported-by: NGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Tested-by: NGuilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Khem Raj 提交于
gcc-6 correctly warns about a out of bounds access arch/powerpc/kernel/ptrace.c:407:24: warning: index 32 denotes an offset greater than size of 'u64[32][1] {aka long long unsigned int[32][1]}' [-Warray-bounds] offsetof(struct thread_fp_state, fpr[32][0])); ^ check the end of array instead of beginning of next element to fix this Signed-off-by: NKhem Raj <raj.khem@gmail.com> Cc: Kees Cook <keescook@chromium.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Segher Boessenkool <segher@kernel.crashing.org> Tested-by: NAaro Koskinen <aaro.koskinen@iki.fi> Acked-by: NOlof Johansson <olof@lixom.net> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 01 6月, 2016 4 次提交
-
-
由 Thomas Huth 提交于
If we do not provide the PVR for POWER8NVL, a guest on this system currently ends up in PowerISA 2.06 compatibility mode on KVM, since QEMU does not provide a generic PowerISA 2.07 mode yet. So some new instructions from POWER8 (like "mtvsrd") get disabled for the guest, resulting in crashes when using code compiled explicitly for POWER8 (e.g. with the "-mcpu=power8" option of GCC). Fixes: ddee09c0 ("powerpc: Add PVR for POWER8NVL processor") Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: NThomas Huth <thuth@redhat.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
This should not have any impact on hash, because hash does tlb invalidate with every pte update and we don't implement flush_tlb_* functions for hash. With radix we should make an explicit call to flush tlb outside pte update. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
When we converted the asm routines to C functions, we missed updating HPTE_R_R based on _PAGE_ACCESSED. ASM code used to copy over the lower bits from pte via. andi. r3,r30,0x1fe /* Get basic set of flags */ We also update the code such that we won't update the Change bit ('C' bit) always. This was added by commit c5cf0e30 ("powerpc: Fix buglet with MMU hash management"). With hash64, we need to make sure that hardware doesn't do a pte update directly. This is because we do end up with entries in TLB with no hash page table entry. This happens because when we find a hash bucket full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". For more info look at commit 0608d692("powerpc/mm: Always invalidate tlb on hpte invalidate and update"). Thus it's critical that valid hash PTEs always have reference bit set and writeable ones have change bit set. We do this by hashing a non-dirty linux PTE as read-only and always setting _PAGE_ACCESSED (and thus R) when hashing anything else in. Any attempt by Linux at clearing those bits also removes the corresponding hash entry. Commit 5cf0e30bf3d8 did that for 'C' bit by enabling 'C' bit always. We don't really need to do that because we never map a RW pte entry without setting 'C' bit. On READ fault on a RW pte entry, we still map it READ only, hence a store update in the page will still cause a hash pte fault. This patch reverts the part of commit c5cf0e30 ("[PATCH] powerpc: Fix buglet with MMU hash management") and retain the updatepp part. - If we hit the updatepp path on native, the old code without that commit, would fail to set C bcause native_hpte_updatepp() was implemented to filter the same bits as H_PROTECT and not let C through thus we would "upgrade" a RO HPTE to RW without setting C thus causing the bug. So the real fix in that commit was the change to native_hpte_updatepp Fixes: 89ff7250 ("powerpc/mm: Convert __hash_page_64K to C") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Aneesh Kumar K.V 提交于
LPCR cannot be updated when running in guest mode. Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 31 5月, 2016 2 次提交
-
-
由 Thomas Huth 提交于
We are already using the privileged versions of MMCR0, MMCR1 and MMCRA in the kernel, so for MMCR2, we should better use the privileged versions, too, to be consistent. Fixes: 240686c1 ("powerpc: Initialise PMU related regs on Power8") Cc: stable@vger.kernel.org # v3.10+ Suggested-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NThomas Huth <thuth@redhat.com> Acked-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Thomas Huth 提交于
The SIAR and SDAR registers are available twice, one time as SPRs 780 / 781 (unprivileged, but read-only), and one time as the SPRs 796 / 797 (privileged, but read and write). The Linux kernel code currently uses the unprivileged SPRs - while this is OK for reading, writing to that register of course does not work. Since the KVM code tries to write to this register, too (see the mtspr in book3s_hv_rmhandlers.S), the contents of this register sometimes get lost for the guests, e.g. during migration of a VM. To fix this issue, simply switch to the privileged SPR numbers instead. Cc: stable@vger.kernel.org Signed-off-by: NThomas Huth <thuth@redhat.com> Acked-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 30 5月, 2016 3 次提交
-
-
由 Russell Currey 提交于
The RTAS calls "ibm,configure-pe" and "ibm,configure-bridge" perform the same actions, however the former can skip configuration if unnecessary. The existing code treats them as different tokens even though only one will ever be called. Refactor this by making a single token that is assigned during init. Signed-off-by: NRussell Currey <ruscur@russell.cc> Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Russell Currey 提交于
In the "ibm,configure-pe" and "ibm,configure-bridge" RTAS calls, the spec states that values of 9900-9905 can be returned, indicating that software should delay for 10^x (where x is the last digit, i.e. 990x) milliseconds and attempt the call again. Currently, the kernel doesn't know about this, and respecting it fixes some PCI failures when the hypervisor is busy. The delay is capped at 0.2 seconds. Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: NRussell Currey <ruscur@russell.cc> Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Linus Torvalds 提交于
-
- 29 5月, 2016 13 次提交
-
-
由 George Spelvin 提交于
The self-test was updated to cover zero-length strings; the function needs to be updated, too. Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net> Fixes: fcfd2fbf ("fs/namei.c: Add hashlen_string() function") Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 George Spelvin 提交于
The original name was simply hash_string(), but that conflicted with a function with that name in drivers/base/power/trace.c, and I decided that calling it "hashlen_" was better anyway. But you have to do it in two places. [ This caused build errors for architectures that don't define CONFIG_DCACHE_WORD_ACCESS - Linus ] Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net> Reported-by: NGuenter Roeck <linux@roeck-us.net> Fixes: fcfd2fbf ("fs/namei.c: Add hashlen_string() function") Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mikulas Patocka 提交于
The HPFS filesystem used generic_show_options to produce string that is displayed in /proc/mounts. However, there is a problem that the options may disappear after remount. If we mount the filesystem with option1 and then remount it with option2, /proc/mounts should show both option1 and option2, however it only shows option2 because the whole option string is replaced with replace_mount_options in hpfs_remount_fs. To fix this bug, implement the hpfs_show_options function that prints options that are currently selected. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mikulas Patocka 提交于
Commit c8f33d0b ("affs: kstrdup() memory handling") checks if the kstrdup function returns NULL due to out-of-memory condition. However, if we are remounting a filesystem with no change to filesystem-specific options, the parameter data is NULL. In this case, kstrdup returns NULL (because it was passed NULL parameter), although no out of memory condition exists. The mount syscall then fails with ENOMEM. This patch fixes the bug. We fail with ENOMEM only if data is non-NULL. The patch also changes the call to replace_mount_options - if we didn't pass any filesystem-specific options, we don't call replace_mount_options (thus we don't erase existing reported options). Fixes: c8f33d0b ("affs: kstrdup() memory handling") Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mikulas Patocka 提交于
Commit ce657611 ("hpfs: kstrdup() out of memory handling") checks if the kstrdup function returns NULL due to out-of-memory condition. However, if we are remounting a filesystem with no change to filesystem-specific options, the parameter data is NULL. In this case, kstrdup returns NULL (because it was passed NULL parameter), although no out of memory condition exists. The mount syscall then fails with ENOMEM. This patch fixes the bug. We fail with ENOMEM only if data is non-NULL. The patch also changes the call to replace_mount_options - if we didn't pass any filesystem-specific options, we don't call replace_mount_options (thus we don't erase existing reported options). Fixes: ce657611 ("hpfs: kstrdup() out of memory handling") Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
git://git.linux-mips.org/pub/scm/ralf/upstream-linus由 Linus Torvalds 提交于
Pull more MIPS updates from Ralf Baechle: "This is the secondnd batch of MIPS patches for 4.7. Summary: CPS: - Copy EVA configuration when starting secondary VPs. EIC: - Clear Status IPL. Lasat: - Fix a few off by one bugs. lib: - Mark intrinsics notrace. Not only are the intrinsics uninteresting, it would cause infinite recursion. MAINTAINERS: - Add file patterns for MIPS BRCM device tree bindings. - Add file patterns for mips device tree bindings. MT7628: - Fix MT7628 pinmux typos. - wled_an pinmux gpio. - EPHY LEDs pinmux support. Pistachio: - Enable KASLR VDSO: - Build microMIPS VDSO for microMIPS kernels. - Fix aliasing warning by building with `-fno-strict-aliasing' for debugging but also tracing them might result in recursion. Misc: - Add missing FROZEN hotplug notifier transitions. - Fix clk binding example for varioius PIC32 devices. - Fix cpu interrupt controller node-names in the DT files. - Fix XPA CPU feature separation. - Fix write_gc0_* macros when writing zero. - Add inline asm encoding helpers. - Add missing VZ accessor microMIPS encodings. - Fix little endian microMIPS MSA encodings. - Add 64-bit HTW fields and fix its configuration. - Fix sigreturn via VDSO on microMIPS kernel. - Lots of typo fixes. - Add definitions of SegCtl registers and use them" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (49 commits) MIPS: Add missing FROZEN hotplug notifier transitions MIPS: Build microMIPS VDSO for microMIPS kernels MIPS: Fix sigreturn via VDSO on microMIPS kernel MIPS: devicetree: fix cpu interrupt controller node-names MIPS: VDSO: Build with `-fno-strict-aliasing' MIPS: Pistachio: Enable KASLR MIPS: lib: Mark intrinsics notrace MIPS: Fix 64-bit HTW configuration MIPS: Add 64-bit HTW fields MAINTAINERS: Add file patterns for mips device tree bindings MAINTAINERS: Add file patterns for mips brcm device tree bindings MIPS: Simplify DSP instruction encoding macros MIPS: Add missing tlbinvf/XPA microMIPS encodings MIPS: Fix little endian microMIPS MSA encodings MIPS: Add missing VZ accessor microMIPS encodings MIPS: Add inline asm encoding helpers MIPS: Spelling fix lets -> let's MIPS: VR41xx: Fix typo MIPS: oprofile: Fix typo MIPS: math-emu: Fix typo ...
-
由 Guenter Roeck 提交于
Various builds (such as i386:allmodconfig) fail with fs/binfmt_aout.c:133:2: error: expected identifier or '(' before 'return' fs/binfmt_aout.c:134:1: error: expected identifier or '(' before '}' token [ Oops. My bad, I had stupidly thought that "allmodconfig" covered this on x86-64 too, but it obviously doesn't. Egg on my face. - Linus ] Fixes: 5d22fc25 ("mm: remove more IS_ERR_VALUE abuses") Signed-off-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
git://ftp.sciencehorizons.net/linux由 Linus Torvalds 提交于
Pull string hash improvements from George Spelvin: "This series does several related things: - Makes the dcache hash (fs/namei.c) useful for general kernel use. (Thanks to Bruce for noticing the zero-length corner case) - Converts the string hashes in <linux/sunrpc/svcauth.h> to use the above. - Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two 32-bit multiplies will do well enough. - Rids the world of the bad hash multipliers in hash_32. This finishes the job started in commit 689de1d6 ("Minimal fix-up of bad hashing behavior of hash_64()") The vast majority of Linux architectures have hardware support for 32x32-bit multiply and so derive no benefit from "simplified" multipliers. The few processors that do not (68000, h8/300 and some models of Microblaze) have arch-specific implementations added. Those patches are last in the series. - Overhauls the dcache hash mixing. The patch in commit 0fed3ac8 ("namei: Improve hash mixing if CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion. Replaced with a much more careful design that's simultaneously faster and better. (My own invention, as there was noting suitable in the literature I could find. Comments welcome!) - Modify the hash_name() loop to skip the initial HASH_MIX(). This would let us salt the hash if we ever wanted to. - Sort out partial_name_hash(). The hash function is declared as using a long state, even though it's truncated to 32 bits at the end and the extra internal state contributes nothing to the result. And some callers do odd things: - fs/hfs/string.c only allocates 32 bits of state - fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes - Modify bytemask_from_count to handle inputs of 1..sizeof(long) rather than 0..sizeof(long)-1. This would simplify users other than full_name_hash" Special thanks to Bruce Fields for testing and finding bugs in v1. (I learned some humbling lessons about "obviously correct" code.) On the arch-specific front, the m68k assembly has been tested in a standalone test harness, I've been in contact with the Microblaze maintainers who mostly don't care, as the hardware multiplier is never omitted in real-world applications, and I haven't heard anything from the H8/300 world" * 'hash' of git://ftp.sciencehorizons.net/linux: h8300: Add <asm/hash.h> microblaze: Add <asm/hash.h> m68k: Add <asm/hash.h> <linux/hash.h>: Add support for architecture-specific functions fs/namei.c: Improve dcache hash function Eliminate bad hash multipliers from hash_32() and hash_64() Change hash_64() return value to 32 bits <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() fs/namei.c: Add hashlen_string() function Pull out string hash to <linux/stringhash.h>
-
由 George Spelvin 提交于
This will improve the performance of hash_32() and hash_64(), but due to complete lack of multi-bit shift instructions on H8, performance will still be bad in surrounding code. Designing H8-specific hash algorithms to work around that is a separate project. (But if the maintainers would like to get in touch...) Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp
-
由 George Spelvin 提交于
Microblaze is an FPGA soft core that can be configured various ways. If it is configured without a multiplier, the standard __hash_32() will require a call to __mulsi3, which is a slow software loop. Instead, use a shift-and-add sequence for the constant multiply. GCC knows how to do this, but it's not as clever as some. Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com>
-
由 George Spelvin 提交于
This provides a multiply by constant GOLDEN_RATIO_32 = 0x61C88647 for the original mc68000, which lacks a 32x32-bit multiply instruction. Yes, the amount of optimization effort put in is excessive. :-) Shift-add chain found by Yevgen Voronenko's Hcub algorithm at http://spiral.ece.cmu.edu/mcm/gen.htmlSigned-off-by: NGeorge Spelvin <linux@sciencehorizons.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Philippe De Muyter <phdm@macq.eu> Cc: linux-m68k@lists.linux-m68k.org
-
由 George Spelvin 提交于
This is just the infrastructure; there are no users yet. This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares the existence of <asm/hash.h>. That file may define its own versions of various functions, and define HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones. Included is a self-test (in lib/test_hash.c) that verifies the basics. It is NOT in general required that the arch-specific functions compute the same thing as the generic, but if a HAVE_* symbol is defined with the value 1, then equality is tested. Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Philippe De Muyter <phdm@macq.eu> Cc: linux-m68k@lists.linux-m68k.org Cc: Alistair Francis <alistai@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp
-
由 George Spelvin 提交于
Patch 0fed3ac8 improved the hash mixing, but the function is slower than necessary; there's a 7-instruction dependency chain (10 on x86) each loop iteration. Word-at-a-time access is a very tight loop (which is good, because link_path_walk() is one of the hottest code paths in the entire kernel), and the hash mixing function must not have a longer latency to avoid slowing it down. There do not appear to be any published fast hash functions that: 1) Operate on the input a word at a time, and 2) Don't need to know the length of the input beforehand, and 3) Have a single iterated mixing function, not needing conditional branches or unrolling to distinguish different loop iterations. One of the algorithms which comes closest is Yann Collet's xxHash, but that's two dependent multiplies per word, which is too much. The key insights in this design are: 1) Barring expensive ops like multiplies, to diffuse one input bit across 64 bits of hash state takes at least log2(64) = 6 sequentially dependent instructions. That is more cycles than we'd like. 2) An operation like "hash ^= hash << 13" requires a second temporary register anyway, and on a 2-operand machine like x86, it's three instructions. 3) A better use of a second register is to hold a two-word hash state. With careful design, no temporaries are needed at all, so it doesn't increase register pressure. And this gets rid of register copying on 2-operand machines, so the code is smaller and faster. 4) Using two words of state weakens the requirement for one-round mixing; we now have two rounds of mixing before cancellation is possible. 5) A two-word hash state also allows operations on both halves to be done in parallel, so on a superscalar processor we get more mixing in fewer cycles. I ended up using a mixing function inspired by the ChaCha and Speck round functions. It is 6 simple instructions and 3 cycles per iteration (assuming multiply by 9 can be done by an "lea" instruction): x ^= *input++; y ^= x; x = ROL(x, K1); x += y; y = ROL(y, K2); y *= 9; Not only is this reversible, two consecutive rounds are reversible: if you are given the initial and final states, but not the intermediate state, it is possible to compute both input words. This means that at least 3 words of input are required to create a collision. (It also has the property, used by hash_name() to avoid a branch, that it hashes all-zero to all-zero.) The rotate constants K1 and K2 were found by experiment. The search took a sample of random initial states (I used 1023) and considered the effect of flipping each of the 64 input bits on each of the 128 output bits two rounds later. Each of the 8192 pairs can be considered a biased coin, and adding up the Shannon entropy of all of them produces a score. The best-scoring shifts also did well in other tests (flipping bits in y, trying 3 or 4 rounds of mixing, flipping all 64*63/2 pairs of input bits), so the choice was made with the additional constraint that the sum of the shifts is odd and not too close to the word size. The final state is then folded into a 32-bit hash value by a less carefully optimized multiply-based scheme. This also has to be fast, as pathname components tend to be short (the most common case is one iteration!), but there's some room for latency, as there is a fair bit of intervening logic before the hash value is used for anything. (Performance verified with "bonnie++ -s 0 -n 1536:-2" on tmpfs. I need a better benchmark; the numbers seem to show a slight dip in performance between 4.6.0 and this patch, but they're too noisy to quote.) Special thanks to Bruce fields for diligent testing which uncovered a nasty fencepost error in an earlier version of this patch. [checkpatch.pl formatting complaints noted and respectfully disagreed with.] Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net> Tested-by: NJ. Bruce Fields <bfields@redhat.com>
-