- 15 10月, 2019 1 次提交
-
-
由 Greg Kurz 提交于
Connecting a vCPU to a XIVE KVM device means establishing a 1:1 association between a vCPU id and the offset (VP id) of a VP structure within a fixed size block of VPs. We currently try to enforce the 1:1 relationship by checking that a vCPU with the same id isn't already connected. This is good but unfortunately not enough because we don't map VP ids to raw vCPU ids but to packed vCPU ids, and the packing function kvmppc_pack_vcpu_id() isn't bijective by design. We got away with it because QEMU passes vCPU ids that fit well in the packing pattern. But nothing prevents userspace to come up with a forged vCPU id resulting in a packed id collision which causes the KVM device to associate two vCPUs to the same VP. This greatly confuses the irq layer and ultimately crashes the kernel, as shown below. Example: a guest with 1 guest thread per core, a core stride of 8 and 300 vCPUs has vCPU ids 0,8,16...2392. If QEMU is patched to inject at some point an invalid vCPU id 348, which is the packed version of itself and 2392, we get: genirq: Flags mismatch irq 199. 00010000 (kvm-2-2392) vs. 00010000 (kvm-2-348) CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38 Call Trace: [c000003f7f9937e0] [c000000000c0110c] dump_stack+0xb0/0xf4 (unreliable) [c000003f7f993820] [c0000000001cb480] __setup_irq+0xa70/0xad0 [c000003f7f9938d0] [c0000000001cb75c] request_threaded_irq+0x13c/0x260 [c000003f7f993940] [c00800000d44e7ac] kvmppc_xive_attach_escalation+0x104/0x270 [kvm] [c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm] [c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm] [c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm] [c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30 [c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120 [c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80 [c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68 xive-kvm: Failed to request escalation interrupt for queue 0 of VCPU 2392 ------------[ cut here ]------------ remove_proc_entry: removing non-empty directory 'irq/199', leaking at least 'kvm-2-348' WARNING: CPU: 24 PID: 88176 at /home/greg/Work/linux/kernel-kvm-ppc/fs/proc/generic.c:684 remove_proc_entry+0x1ec/0x200 Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm] CPU: 24 PID: 88176 Comm: qemu-system-ppc Not tainted 5.3.0-xive-nr-servers-5.3-gku+ #38 NIP: c00000000053b0cc LR: c00000000053b0c8 CTR: c0000000000ba3b0 REGS: c000003f7f9934b0 TRAP: 0700 Not tainted (5.3.0-xive-nr-servers-5.3-gku+) MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 48228222 XER: 20040000 CFAR: c000000000131a50 IRQMASK: 0 GPR00: c00000000053b0c8 c000003f7f993740 c0000000015ec500 0000000000000057 GPR04: 0000000000000001 0000000000000000 000049fb98484262 0000000000001bcf GPR08: 0000000000000007 0000000000000007 0000000000000001 9000000000001033 GPR12: 0000000000008000 c000003ffffeb800 0000000000000000 000000012f4ce5a1 GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000 GPR20: 000000012f45d918 c000003f863758b0 c000003f86375870 0000000000000006 GPR24: c000003f86375a30 0000000000000007 c0002039373d9020 c0000000014c4a48 GPR28: 0000000000000001 c000003fe62a4f6b c00020394b2e9fab c000003fe62a4ec0 NIP [c00000000053b0cc] remove_proc_entry+0x1ec/0x200 LR [c00000000053b0c8] remove_proc_entry+0x1e8/0x200 Call Trace: [c000003f7f993740] [c00000000053b0c8] remove_proc_entry+0x1e8/0x200 (unreliable) [c000003f7f9937e0] [c0000000001d3654] unregister_irq_proc+0x114/0x150 [c000003f7f993880] [c0000000001c6284] free_desc+0x54/0xb0 [c000003f7f9938c0] [c0000000001c65ec] irq_free_descs+0xac/0x100 [c000003f7f993910] [c0000000001d1ff8] irq_dispose_mapping+0x68/0x80 [c000003f7f993940] [c00800000d44e8a4] kvmppc_xive_attach_escalation+0x1fc/0x270 [kvm] [c000003f7f9939d0] [c00800000d45013c] kvmppc_xive_connect_vcpu+0x424/0x620 [kvm] [c000003f7f993ac0] [c00800000d444428] kvm_arch_vcpu_ioctl+0x260/0x448 [kvm] [c000003f7f993b90] [c00800000d43593c] kvm_vcpu_ioctl+0x154/0x7c8 [kvm] [c000003f7f993d00] [c0000000004840f0] do_vfs_ioctl+0xe0/0xc30 [c000003f7f993db0] [c000000000484d44] ksys_ioctl+0x104/0x120 [c000003f7f993e00] [c000000000484d88] sys_ioctl+0x28/0x80 [c000003f7f993e20] [c00000000000b278] system_call+0x5c/0x68 Instruction dump: 2c230000 41820008 3923ff78 e8e900a0 3c82ff69 3c62ff8d 7fa6eb78 7fc5f378 3884f080 3863b948 4bbf6925 60000000 <0fe00000> 4bffff7c fba10088 4bbf6e41 ---[ end trace b925b67a74a1d8d1 ]--- BUG: Kernel NULL pointer dereference at 0x00000010 Faulting instruction address: 0xc00800000d44fc04 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: kvm_hv kvm dm_mod vhost_net vhost tap xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter squashfs loop fuse i2c_dev sg ofpart ocxl powernv_flash at24 xts mtd uio_pdrv_genirq vmx_crypto opal_prd ipmi_powernv uio ipmi_devintf ipmi_msghandler ibmpowernv ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables ext4 mbcache jbd2 raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq libcrc32c raid1 raid0 linear sd_mod ast i2c_algo_bit drm_vram_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci libata tg3 drm_panel_orientation_quirks [last unloaded: kvm] CPU: 24 PID: 88176 Comm: qemu-system-ppc Tainted: G W 5.3.0-xive-nr-servers-5.3-gku+ #38 NIP: c00800000d44fc04 LR: c00800000d44fc00 CTR: c0000000001cd970 REGS: c000003f7f9938e0 TRAP: 0300 Tainted: G W (5.3.0-xive-nr-servers-5.3-gku+) MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 24228882 XER: 20040000 CFAR: c0000000001cd9ac DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0 GPR00: c00800000d44fc00 c000003f7f993b70 c00800000d468300 0000000000000000 GPR04: 00000000000000c7 0000000000000000 0000000000000000 c000003ffacd06d8 GPR08: 0000000000000000 c000003ffacd0738 0000000000000000 fffffffffffffffd GPR12: 0000000000000040 c000003ffffeb800 0000000000000000 000000012f4ce5a1 GPR16: 000000012ef5a0c8 0000000000000000 000000012f113bb0 0000000000000000 GPR20: 000000012f45d918 00007ffffe0d9a80 000000012f4f5df0 000000012ef8c9f8 GPR24: 0000000000000001 0000000000000000 c000003fe4501ed0 c000003f8b1d0000 GPR28: c0000033314689c0 c000003fe4501c00 c000003fe4501e70 c000003fe4501e90 NIP [c00800000d44fc04] kvmppc_xive_cleanup_vcpu+0xfc/0x210 [kvm] LR [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm] Call Trace: [c000003f7f993b70] [c00800000d44fc00] kvmppc_xive_cleanup_vcpu+0xf8/0x210 [kvm] (unreliable) [c000003f7f993bd0] [c00800000d450bd4] kvmppc_xive_release+0xdc/0x1b0 [kvm] [c000003f7f993c30] [c00800000d436a98] kvm_device_release+0xb0/0x110 [kvm] [c000003f7f993c70] [c00000000046730c] __fput+0xec/0x320 [c000003f7f993cd0] [c000000000164ae0] task_work_run+0x150/0x1c0 [c000003f7f993d30] [c000000000025034] do_notify_resume+0x304/0x440 [c000003f7f993e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74 Instruction dump: 3bff0008 7fbfd040 419e0054 847e0004 2fa30000 419effec e93d0000 8929203c 2f890000 419effb8 4800821d e8410018 <e9230010> e9490008 9b2a0039 7c0004ac ---[ end trace b925b67a74a1d8d2 ]--- Kernel panic - not syncing: Fatal exception This affects both XIVE and XICS-on-XIVE devices since the beginning. Check the VP id instead of the vCPU id when a new vCPU is connected. The allocation of the XIVE CPU structure in kvmppc_xive_connect_vcpu() is moved after the check to avoid the need for rollback. Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: NGreg Kurz <groug@kaod.org> Reviewed-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 11 10月, 2019 1 次提交
-
-
由 Emmanuel Nicolet 提交于
The spu_fs_context was not set in fc->fs_private, this caused a crash when accessing ctx->mode in spufs_create_root(). Fixes: d2e0981c ("vfs: Convert spufs to use the new mount API") Signed-off-by: NEmmanuel Nicolet <emmanuel.nicolet@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Acked-by: NArnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20191008141342.GA266797@gmail.com
-
- 09 10月, 2019 3 次提交
-
-
由 Jordan Niethe 提交于
kvmhv_switch_to_host() in arch/powerpc/kvm/book3s_hv_rmhandlers.S needs to set kvmppc_vcore->in_guest to 0 to signal secondary CPUs to continue. This happens after resetting the PCR. Before commit 13c7bb3c ("powerpc/64s: Set reserved PCR bits"), r0 would always be 0 before it was stored to kvmppc_vcore->in_guest. However because of this change in the commit: /* Reset PCR */ ld r0, VCORE_PCR(r5) - cmpdi r0, 0 + LOAD_REG_IMMEDIATE(r6, PCR_MASK) + cmpld r0, r6 beq 18f - li r0, 0 - mtspr SPRN_PCR, r0 + mtspr SPRN_PCR, r6 18: /* Signal secondary CPUs to continue */ stb r0,VCORE_IN_GUEST(r5) We are no longer comparing r0 against 0 and loading it with 0 if it contains something else. Hence when we store r0 to kvmppc_vcore->in_guest, it might not be 0. This means that secondary CPUs will not be signalled to continue. Those CPUs get stuck and errors like the following are logged: KVM: CPU 1 seems to be stuck KVM: CPU 2 seems to be stuck KVM: CPU 3 seems to be stuck KVM: CPU 4 seems to be stuck KVM: CPU 5 seems to be stuck KVM: CPU 6 seems to be stuck KVM: CPU 7 seems to be stuck This can be reproduced with: $ for i in `seq 1 7` ; do chcpu -d $i ; done ; $ taskset -c 0 qemu-system-ppc64 -smp 8,threads=8 \ -M pseries,accel=kvm,kvm-type=HV -m 1G -nographic -vga none \ -kernel vmlinux -initrd initrd.cpio.xz Fix by making sure r0 is 0 before storing it to kvmppc_vcore->in_guest. Fixes: 13c7bb3c ("powerpc/64s: Set reserved PCR bits") Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NJordan Niethe <jniethe5@gmail.com> Reviewed-by: NAlistair Popple <alistair@popple.id.au> Tested-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191004025317.19340-1-jniethe5@gmail.com
-
由 Laurent Dufour 提交于
Since commit 1211ee61 ("powerpc/pseries: Read TLB Block Invalidate Characteristics"), a warning message is displayed when booting a guest on top of KVM: lpar: arch/powerpc/platforms/pseries/lpar.c pseries_lpar_read_hblkrm_characteristics Error calling get-system-parameter (0xfffffffd) This message is displayed because this hypervisor is not supporting the H_BLOCK_REMOVE hcall and thus is not exposing the corresponding feature. Reading the TLB Block Invalidate Characteristics should not be done if the feature is not exposed. Fixes: 1211ee61 ("powerpc/pseries: Read TLB Block Invalidate Characteristics") Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NLaurent Dufour <ldufour@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191001132928.72555-1-ldufour@linux.ibm.com
-
由 Stephen Rothwell 提交于
After merging the powerpc tree, today's linux-next build (powerpc64 allnoconfig) failed like this: arch/powerpc/mm/book3s64/pgtable.c:216:3: error: implicit declaration of function 'radix__flush_all_lpid_guest' radix__flush_all_lpid_guest() is only declared for CONFIG_PPC_RADIX_MMU which is not set for this build. Fix it by adding an empty version for the RADIX_MMU=n case, which should never be called. Fixes: 99161de3 ("powerpc/64s/radix: tidy up TLB flushing code") Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> [mpe: Munge change log] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190930101342.36c1afa0@canb.auug.org.au
-
- 01 10月, 2019 2 次提交
-
-
由 Masahiro Yamada 提交于
Commit 40df759e ("kbuild: Fix build with binutils <= 2.19") introduced ar-option and KBUILD_ARFLAGS to deal with old binutils. According to Documentation/process/changes.rst, the current minimal supported version of binutils is 2.21 so you can assume the 'D' option is always supported. Not only GNU ar but also llvm-ar supports it. With the 'D' option hard-coded, there is no more user of ar-option or KBUILD_ARFLAGS. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Tested-by: NNick Desaulniers <ndesaulniers@google.com>
-
由 Paolo Bonzini 提交于
The largepages debugfs entry is incremented/decremented as shadow pages are created or destroyed. Clearing it will result in an underflow, which is harmless to KVM but ugly (and could be misinterpreted by tools that use debugfs information), so make this particular statistic read-only. Cc: kvm-ppc@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 9月, 2019 2 次提交
-
-
由 Oliver O'Halloran 提交于
s/CONFIG_IOV/CONFIG_PCI_IOV/ Whoops. Fixes: bd6461cc ("powerpc/eeh: Add a eeh_dev_break debugfs interface") Signed-off-by: NOliver O'Halloran <oohall@gmail.com> [mpe: Fixup the #endif comment as well] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190926122502.14826-1-oohall@gmail.com
-
由 Mark Rutland 提交于
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few people, and until recently arm64 used these erroneously/pointlessly for other levels of page table. To make it incredibly clear that these only apply to the PTE level, and to align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them to pgtable_pte_page_{ctor,dtor}(). These changes were generated with the following shell script: ---- git grep -lw 'pgtable_page_.tor' | while read FILE; do sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE; sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE; done ---- ... with the documentation re-flowed to remain under 80 columns, and whitespace fixed up in macros to keep backslashes aligned. There should be no functional change as a result of this patch. Link: http://lkml.kernel.org/r/20190722141133.3116-1-mark.rutland@arm.comSigned-off-by: NMark Rutland <mark.rutland@arm.com> Reviewed-by: NMike Rapoport <rppt@linux.ibm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 9月, 2019 9 次提交
-
-
由 Kefeng Wang 提交于
According to 78ddc534 ("thp: rename split_huge_page_pmd() to split_huge_pmd()"), update related comment. Link: http://lkml.kernel.org/r/20190731033406.185285-1-wangkefeng.wang@huawei.comSigned-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem cache for page table allocations on several architectures that do not use PAGE_SIZE tables for one or more levels of the page table hierarchy. Most architectures do not implement these functions and use __weak default NOP implementation of pgd_cache_init(). Since there is no such default for pgtable_cache_init(), its empty stub is duplicated among most architectures. Rename the definitions of pgd_cache_init() to pgtable_cache_init() and drop empty stubs of pgtable_cache_init(). Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com> Acked-by: Will Deacon <will@kernel.org> [arm64] Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nicholas Piggin 提交于
Patch series "mm: remove quicklist page table caches". A while ago Nicholas proposed to remove quicklist page table caches [1]. I've rebased his patch on the curren upstream and switched ia64 and sh to use generic versions of PTE allocation. [1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com This patch (of 3): Remove page table allocator "quicklists". These have been around for a long time, but have not got much traction in the last decade and are only used on ia64 and sh architectures. The numbers in the initial commit look interesting but probably don't apply anymore. If anybody wants to resurrect this it's in the git history, but it's unhelpful to have this code and divergent allocator behaviour for minor archs. Also it might be better to instead make more general improvements to page allocator if this is still so slow. Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.comSigned-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Replace 1 << compound_order(page) with compound_nr(page). Minor improvements in readability. Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Replace PAGE_SHIFT + compound_order(page) with the new page_shift() function. Minor improvements in readability. [akpm@linux-foundation.org: fix build in tce_page_is_contained()] Link: http://lkml.kernel.org/r/201907241853.yNQTrJWd%25lkp@intel.com Link: http://lkml.kernel.org/r/20190721104612.19120-3-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NIra Weiny <ira.weiny@intel.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
Right now we force an unbind of SCM memory at drcindex on H_OVERLAP error. This really slows down operations like kexec where we get the H_OVERLAP error because we don't go through a full hypervisor re init. H_OVERLAP error for a H_SCM_BIND_MEM hcall indicates that SCM memory at drc index is already bound. Since we don't specify a logical memory address for bind hcall, we can use the H_SCM_QUERY hcall to query the already bound logical address. Boot time difference with and without patch is: [ 5.583617] IOMMU table initialized, virtual merging enabled [ 5.603041] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Retrying bind after unbinding [ 301.514221] papr_scm ibm,persistent-memory:ibm,pmemory@44108001: Retrying bind after unbinding [ 340.057238] hv-24x7: read 1530 catalog entries, created 537 event attrs (0 failures), 275 descs after fix [ 5.101572] IOMMU table initialized, virtual merging enabled [ 5.116984] papr_scm ibm,persistent-memory:ibm,pmemory@44104001: Querying SCM details [ 5.117223] papr_scm ibm,persistent-memory:ibm,pmemory@44108001: Querying SCM details [ 5.120530] hv-24x7: read 1530 catalog entries, created 537 event attrs (0 failures), 275 descs Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190903123452.28620-2-aneesh.kumar@linux.ibm.com
-
由 Aneesh Kumar K.V 提交于
This simplifies the error handling and also enable us to switch to H_SCM_QUERY hcall in a later patch on H_OVERLAP error. We also do some kernel print formatting fixup in this patch. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190903123452.28620-1-aneesh.kumar@linux.ibm.com
-
由 Aneesh Kumar K.V 提交于
With PFN_MODE_PMEM namespace, the memmap area is allocated from the device area. Some architectures map the memmap area with large page size. On architectures like ppc64, 16MB page for memap mapping can map 262144 pfns. This maps a namespace size of 16G. When populating memmap region with 16MB page from the device area, make sure the allocated space is not used to map resources outside this namespace. Such usage of device area will prevent a namespace destroy. Add resource end pnf in altmap and use that to check if the memmap area allocation can map pfn outside the namespace. On ppc64 in such case we fallback to allocation from memory. This fix kernel crash reported below: [ 132.034989] WARNING: CPU: 13 PID: 13719 at mm/memremap.c:133 devm_memremap_pages_release+0x2d8/0x2e0 [ 133.464754] BUG: Unable to handle kernel data access at 0xc00c00010b204000 [ 133.464760] Faulting instruction address: 0xc00000000007580c [ 133.464766] Oops: Kernel access of bad area, sig: 11 [#1] [ 133.464771] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries ..... [ 133.464901] NIP [c00000000007580c] vmemmap_free+0x2ac/0x3d0 [ 133.464906] LR [c0000000000757f8] vmemmap_free+0x298/0x3d0 [ 133.464910] Call Trace: [ 133.464914] [c000007cbfd0f7b0] [c0000000000757f8] vmemmap_free+0x298/0x3d0 (unreliable) [ 133.464921] [c000007cbfd0f8d0] [c000000000370a44] section_deactivate+0x1a4/0x240 [ 133.464928] [c000007cbfd0f980] [c000000000386270] __remove_pages+0x3a0/0x590 [ 133.464935] [c000007cbfd0fa50] [c000000000074158] arch_remove_memory+0x88/0x160 [ 133.464942] [c000007cbfd0fae0] [c0000000003be8c0] devm_memremap_pages_release+0x150/0x2e0 [ 133.464949] [c000007cbfd0fb70] [c000000000738ea0] devm_action_release+0x30/0x50 [ 133.464955] [c000007cbfd0fb90] [c00000000073a5a4] release_nodes+0x344/0x400 [ 133.464961] [c000007cbfd0fc40] [c00000000073378c] device_release_driver_internal+0x15c/0x250 [ 133.464968] [c000007cbfd0fc80] [c00000000072fd14] unbind_store+0x104/0x110 [ 133.464973] [c000007cbfd0fcd0] [c00000000072ee24] drv_attr_store+0x44/0x70 [ 133.464981] [c000007cbfd0fcf0] [c0000000004a32bc] sysfs_kf_write+0x6c/0xa0 [ 133.464987] [c000007cbfd0fd10] [c0000000004a1dfc] kernfs_fop_write+0x17c/0x250 [ 133.464993] [c000007cbfd0fd60] [c0000000003c348c] __vfs_write+0x3c/0x70 [ 133.464999] [c000007cbfd0fd80] [c0000000003c75d0] vfs_write+0xd0/0x250 djbw: Aneesh notes that this crash can likely be triggered in any kernel that supports 'papr_scm', so flagging that commit for -stable consideration. Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions") Cc: <stable@vger.kernel.org> Reported-by: NSachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reviewed-by: NPankaj Gupta <pagupta@redhat.com> Tested-by: NSantosh Sivaraj <santosh@fossix.org> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Link: https://lore.kernel.org/r/20190910062826.10041-1-aneesh.kumar@linux.ibm.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Aneesh Kumar K.V 提交于
In later patch, we want to use hash_transparent_hugepage() in a kernel module. Export two related functions. Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Link: https://lore.kernel.org/r/20190924042440.27946-1-aneesh.kumar@linux.ibm.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 24 9月, 2019 6 次提交
-
-
由 Aneesh Kumar K.V 提交于
On POWER9, under some circumstances, a broadcast TLB invalidation will fail to invalidate the ERAT cache on some threads when there are parallel mtpidr/mtlpidr happening on other threads of the same core. This can cause stores to continue to go to a page after it's unmapped. The workaround is to force an ERAT flush using PID=0 or LPID=0 tlbie flush. This additional TLB flush will cause the ERAT cache invalidation. Since we are using PID=0 or LPID=0, we don't get filtered out by the TLB snoop filtering logic. We need to still follow this up with another tlbie to take care of store vs tlbie ordering issue explained in commit: a5d4b589 ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9"). The presence of ERAT cache implies we can still get new stores and they may miss store queue marking flush. Cc: stable@vger.kernel.org Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-3-aneesh.kumar@linux.ibm.com
-
由 Aneesh Kumar K.V 提交于
Rename the #define to indicate this is related to store vs tlbie ordering issue. In the next patch, we will be adding another feature flag that is used to handles ERAT flush vs tlbie ordering issue. Fixes: a5d4b589 ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-2-aneesh.kumar@linux.ibm.com
-
由 Aneesh Kumar K.V 提交于
The store ordering vs tlbie issue mentioned in commit a5d4b589 ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") is fixed for Nimbus 2.3 and Cumulus 1.3 revisions. We don't need to apply the fixup if we are running on them We can only do this on PowerNV. On pseries guest with KVM we still don't support redoing the feature fixup after migration. So we should be enabling all the workarounds needed, because whe can possibly migrate between DD 2.3 and DD 2.2 Fixes: a5d4b589 ("powerpc/mm: Fixup tlbie vs store ordering issue on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190924035254.24612-1-aneesh.kumar@linux.ibm.com
-
由 Laurent Dufour 提交于
Depending on the hardware and the hypervisor, the hcall H_BLOCK_REMOVE may not be able to process all the page sizes for a segment base page size, as reported by the TLB Invalidate Characteristics. For each pair of base segment page size and actual page size, this characteristic tells us the size of the block the hcall supports. In the case, the hcall is not supporting a pair of base segment page size, actual page size, it is returning H_PARAM which leads to a panic like this: kernel BUG at /home/srikar/work/linux.git/arch/powerpc/platforms/pseries/lpar.c:466! Oops: Exception in kernel mode, sig: 5 [#1] BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 28 PID: 583 Comm: modprobe Not tainted 5.2.0-master #5 NIP: c0000000000be8dc LR: c0000000000be880 CTR: 0000000000000000 REGS: c0000007e77fb130 TRAP: 0700 Not tainted (5.2.0-master) MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 42224824 XER: 20000000 CFAR: c0000000000be8fc IRQMASK: 0 GPR00: 0000000022224828 c0000007e77fb3c0 c000000001434d00 0000000000000005 GPR04: 9000000004fa8c00 0000000000000000 0000000000000003 0000000000000001 GPR08: c0000007e77fb450 0000000000000000 0000000000000001 ffffffffffffffff GPR12: c0000007e77fb450 c00000000edfcb80 0000cd7d3ea30000 c0000000016022b0 GPR16: 00000000000000b0 0000cd7d3ea30000 0000000000000001 c080001f04f00105 GPR20: 0000000000000003 0000000000000004 c000000fbeb05f58 c000000001602200 GPR24: 0000000000000000 0000000000000004 8800000000000000 c000000000c5d148 GPR28: c000000000000000 8000000000000000 a000000000000000 c0000007e77fb580 NIP [c0000000000be8dc] .call_block_remove+0x12c/0x220 LR [c0000000000be880] .call_block_remove+0xd0/0x220 Call Trace: 0xc000000fb8c00240 (unreliable) .pSeries_lpar_flush_hash_range+0x578/0x670 .flush_hash_range+0x44/0x100 .__flush_tlb_pending+0x3c/0xc0 .zap_pte_range+0x7ec/0x830 .unmap_page_range+0x3f4/0x540 .unmap_vmas+0x94/0x120 .exit_mmap+0xac/0x1f0 .mmput+0x9c/0x1f0 .do_exit+0x388/0xd60 .do_group_exit+0x54/0x100 .__se_sys_exit_group+0x14/0x20 system_call+0x5c/0x70 Instruction dump: 39400001 38a00000 4800003c 60000000 60420000 7fa9e800 38e00000 419e0014 7d29d278 7d290074 7929d182 69270001 <0b070000> 7d495378 394a0001 7fa93040 The call to H_BLOCK_REMOVE should only be made for the supported pair of base segment page size, actual page size and using the correct maximum block size. Due to the required complexity in do_block_remove() and call_block_remove(), and the fact that currently a block size of 8 is returned by the hypervisor, we are only supporting 8 size block to the H_BLOCK_REMOVE hcall. In order to identify this limitation easily in the code, a local define HBLKR_SUPPORTED_SIZE defining the currently supported block size, and a dedicated checking helper is_supported_hlbkr() are introduced. For regular pages and hugetlb, the assumption is made that the page size is equal to the base page size. For THP the page size is assumed to be 16M. Fixes: ba2dd8a2 ("powerpc/pseries/mm: call H_BLOCK_REMOVE") Signed-off-by: NLaurent Dufour <ldufour@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190920130523.20441-3-ldufour@linux.ibm.com
-
由 Laurent Dufour 提交于
The PAPR document specifies the TLB Block Invalidate Characteristics which tells for each pair of segment base page size, actual page size, the size of the block the hcall H_BLOCK_REMOVE supports. These characteristics are loaded at boot time in a new table hblkr_size. The table is separate from the mmu_psize_def because this is specific to the pseries platform. A new init function, pseries_lpar_read_hblkrm_characteristics() is added to read the characteristics. It is called from pSeries_setup_arch(). Fixes: ba2dd8a2 ("powerpc/pseries/mm: call H_BLOCK_REMOVE") Signed-off-by: NLaurent Dufour <ldufour@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190920130523.20441-2-ldufour@linux.ibm.com
-
由 Michael Roth 提交于
On a 2-socket Power9 system with 32 cores/128 threads (SMT4) and 1TB of memory running the following guest configs: guest A: - 224GB of memory - 56 VCPUs (sockets=1,cores=28,threads=2), where: VCPUs 0-1 are pinned to CPUs 0-3, VCPUs 2-3 are pinned to CPUs 4-7, ... VCPUs 54-55 are pinned to CPUs 108-111 guest B: - 4GB of memory - 4 VCPUs (sockets=1,cores=4,threads=1) with the following workloads (with KSM and THP enabled in all): guest A: stress --cpu 40 --io 20 --vm 20 --vm-bytes 512M guest B: stress --cpu 4 --io 4 --vm 4 --vm-bytes 512M host: stress --cpu 4 --io 4 --vm 2 --vm-bytes 256M the below soft-lockup traces were observed after an hour or so and persisted until the host was reset (this was found to be reliably reproducible for this configuration, for kernels 4.15, 4.18, 5.0, and 5.3-rc5): [ 1253.183290] rcu: INFO: rcu_sched self-detected stall on CPU [ 1253.183319] rcu: 124-....: (5250 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=1941 [ 1256.287426] watchdog: BUG: soft lockup - CPU#105 stuck for 23s! [CPU 52/KVM:19709] [ 1264.075773] watchdog: BUG: soft lockup - CPU#24 stuck for 23s! [worker:19913] [ 1264.079769] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [worker:20331] [ 1264.095770] watchdog: BUG: soft lockup - CPU#45 stuck for 23s! [worker:20338] [ 1264.131773] watchdog: BUG: soft lockup - CPU#64 stuck for 23s! [avocado:19525] [ 1280.408480] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791] [ 1316.198012] rcu: INFO: rcu_sched self-detected stall on CPU [ 1316.198032] rcu: 124-....: (21003 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=8243 [ 1340.411024] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791] [ 1379.212609] rcu: INFO: rcu_sched self-detected stall on CPU [ 1379.212629] rcu: 124-....: (36756 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=14714 [ 1404.413615] watchdog: BUG: soft lockup - CPU#124 stuck for 22s! [ksmd:791] [ 1442.227095] rcu: INFO: rcu_sched self-detected stall on CPU [ 1442.227115] rcu: 124-....: (52509 ticks this GP) idle=10a/1/0x4000000000000002 softirq=5408/5408 fqs=21403 [ 1455.111787] INFO: task worker:19907 blocked for more than 120 seconds. [ 1455.111822] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.111833] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.111884] INFO: task worker:19908 blocked for more than 120 seconds. [ 1455.111905] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.111925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.111966] INFO: task worker:20328 blocked for more than 120 seconds. [ 1455.111986] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.111998] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112048] INFO: task worker:20330 blocked for more than 120 seconds. [ 1455.112068] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112097] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112138] INFO: task worker:20332 blocked for more than 120 seconds. [ 1455.112159] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112179] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112210] INFO: task worker:20333 blocked for more than 120 seconds. [ 1455.112231] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112242] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112282] INFO: task worker:20335 blocked for more than 120 seconds. [ 1455.112303] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 [ 1455.112332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1455.112372] INFO: task worker:20336 blocked for more than 120 seconds. [ 1455.112392] Tainted: G L 5.3.0-rc5-mdr-vanilla+ #1 CPUs 45, 24, and 124 are stuck on spin locks, likely held by CPUs 105 and 31. CPUs 105 and 31 are stuck in smp_call_function_many(), waiting on target CPU 42. For instance: # CPU 105 registers (via xmon) R00 = c00000000020b20c R16 = 00007d1bcd800000 R01 = c00000363eaa7970 R17 = 0000000000000001 R02 = c0000000019b3a00 R18 = 000000000000006b R03 = 000000000000002a R19 = 00007d537d7aecf0 R04 = 000000000000002a R20 = 60000000000000e0 R05 = 000000000000002a R21 = 0801000000000080 R06 = c0002073fb0caa08 R22 = 0000000000000d60 R07 = c0000000019ddd78 R23 = 0000000000000001 R08 = 000000000000002a R24 = c00000000147a700 R09 = 0000000000000001 R25 = c0002073fb0ca908 R10 = c000008ffeb4e660 R26 = 0000000000000000 R11 = c0002073fb0ca900 R27 = c0000000019e2464 R12 = c000000000050790 R28 = c0000000000812b0 R13 = c000207fff623e00 R29 = c0002073fb0ca808 R14 = 00007d1bbee00000 R30 = c0002073fb0ca800 R15 = 00007d1bcd600000 R31 = 0000000000000800 pc = c00000000020b260 smp_call_function_many+0x3d0/0x460 cfar= c00000000020b270 smp_call_function_many+0x3e0/0x460 lr = c00000000020b20c smp_call_function_many+0x37c/0x460 msr = 900000010288b033 cr = 44024824 ctr = c000000000050790 xer = 0000000000000000 trap = 100 CPU 42 is running normally, doing VCPU work: # CPU 42 stack trace (via xmon) [link register ] c00800001be17188 kvmppc_book3s_radix_page_fault+0x90/0x2b0 [kvm_hv] [c000008ed3343820] c000008ed3343850 (unreliable) [c000008ed33438d0] c00800001be11b6c kvmppc_book3s_hv_page_fault+0x264/0xe30 [kvm_hv] [c000008ed33439d0] c00800001be0d7b4 kvmppc_vcpu_run_hv+0x8dc/0xb50 [kvm_hv] [c000008ed3343ae0] c00800001c10891c kvmppc_vcpu_run+0x34/0x48 [kvm] [c000008ed3343b00] c00800001c10475c kvm_arch_vcpu_ioctl_run+0x244/0x420 [kvm] [c000008ed3343b90] c00800001c0f5a78 kvm_vcpu_ioctl+0x470/0x7c8 [kvm] [c000008ed3343d00] c000000000475450 do_vfs_ioctl+0xe0/0xc70 [c000008ed3343db0] c0000000004760e4 ksys_ioctl+0x104/0x120 [c000008ed3343e00] c000000000476128 sys_ioctl+0x28/0x80 [c000008ed3343e20] c00000000000b388 system_call+0x5c/0x70 --- Exception: c00 (System Call) at 00007d545cfd7694 SP (7d53ff7edf50) is in userspace It was subsequently found that ipi_message[PPC_MSG_CALL_FUNCTION] was set for CPU 42 by at least 1 of the CPUs waiting in smp_call_function_many(), but somehow the corresponding call_single_queue entries were never processed by CPU 42, causing the callers to spin in csd_lock_wait() indefinitely. Nick Piggin suggested something similar to the following sequence as a possible explanation (interleaving of CALL_FUNCTION/RESCHEDULE IPI messages seems to be most common, but any mix of CALL_FUNCTION and !CALL_FUNCTION messages could trigger it): CPU X: smp_muxed_ipi_set_message(): X: smp_mb() X: message[RESCHEDULE] = 1 X: doorbell_global_ipi(42): X: kvmppc_set_host_ipi(42, 1) X: ppc_msgsnd_sync()/smp_mb() X: ppc_msgsnd() -> 42 42: doorbell_exception(): // from CPU X 42: ppc_msgsync() 105: smp_muxed_ipi_set_message(): 105: smb_mb() // STORE DEFERRED DUE TO RE-ORDERING --105: message[CALL_FUNCTION] = 1 | 105: doorbell_global_ipi(42): | 105: kvmppc_set_host_ipi(42, 1) | 42: kvmppc_set_host_ipi(42, 0) | 42: smp_ipi_demux_relaxed() | 42: // returns to executing guest | // RE-ORDERED STORE COMPLETES ->105: message[CALL_FUNCTION] = 1 105: ppc_msgsnd_sync()/smp_mb() 105: ppc_msgsnd() -> 42 42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored 105: // hangs waiting on 42 to process messages/call_single_queue This can be prevented with an smp_mb() at the beginning of kvmppc_set_host_ipi(), such that stores to message[<type>] (or other state indicated by the host_ipi flag) are ordered vs. the store to to host_ipi. However, doing so might still allow for the following scenario (not yet observed): CPU X: smp_muxed_ipi_set_message(): X: smp_mb() X: message[RESCHEDULE] = 1 X: doorbell_global_ipi(42): X: kvmppc_set_host_ipi(42, 1) X: ppc_msgsnd_sync()/smp_mb() X: ppc_msgsnd() -> 42 42: doorbell_exception(): // from CPU X 42: ppc_msgsync() // STORE DEFERRED DUE TO RE-ORDERING -- 42: kvmppc_set_host_ipi(42, 0) | 42: smp_ipi_demux_relaxed() | 105: smp_muxed_ipi_set_message(): | 105: smb_mb() | 105: message[CALL_FUNCTION] = 1 | 105: doorbell_global_ipi(42): | 105: kvmppc_set_host_ipi(42, 1) | // RE-ORDERED STORE COMPLETES -> 42: kvmppc_set_host_ipi(42, 0) 42: // returns to executing guest 105: ppc_msgsnd_sync()/smp_mb() 105: ppc_msgsnd() -> 42 42: local_paca->kvm_hstate.host_ipi == 0 // IPI ignored 105: // hangs waiting on 42 to process messages/call_single_queue Fixing this scenario would require an smp_mb() *after* clearing host_ipi flag in kvmppc_set_host_ipi() to order the store vs. subsequent processing of IPI messages. To handle both cases, this patch splits kvmppc_set_host_ipi() into separate set/clear functions, where we execute smp_mb() prior to setting host_ipi flag, and after clearing host_ipi flag. These functions pair with each other to synchronize the sender and receiver sides. With that change in place the above workload ran for 20 hours without triggering any lock-ups. Fixes: 755563bc ("powerpc/powernv: Fixes for hypervisor doorbell handling") # v4.0 Signed-off-by: NMichael Roth <mdroth@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@ozlabs.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190911223155.16045-1-mdroth@linux.vnet.ibm.com
-
- 21 9月, 2019 5 次提交
-
-
由 Christophe Leroy 提交于
Uncompressing Kernel Image ... OK Loading Device Tree to 01ff7000, end 01fff74f ... OK [ 0.000000] printk: bootconsole [udbg0] enabled [ 0.000000] BUG: Unable to handle kernel data access at 0xf818c000 [ 0.000000] Faulting instruction address: 0xc0013c7c [ 0.000000] Thread overran stack, or stack corrupted [ 0.000000] Oops: Kernel access of bad area, sig: 11 [#1] [ 0.000000] BE PAGE_SIZE=16K PREEMPT [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.3.0-rc4-s3k-dev-00743-g5abe4a3e8fd3-dirty #2080 [ 0.000000] NIP: c0013c7c LR: c0013310 CTR: 00000000 [ 0.000000] REGS: c0c5ff38 TRAP: 0300 Not tainted (5.3.0-rc4-s3k-dev-00743-g5abe4a3e8fd3-dirty) [ 0.000000] MSR: 00001032 <ME,IR,DR,RI> CR: 99033955 XER: 80002100 [ 0.000000] DAR: f818c000 DSISR: 82000000 [ 0.000000] GPR00: c0013310 c0c5fff0 c0ad6ac0 c0c600c0 f818c031 82000000 00000000 ffffffff [ 0.000000] GPR08: 00000000 f1f1f1f1 c0013c2c c0013304 99033955 00400008 00000000 07ff9598 [ 0.000000] GPR16: 00000000 07ffb94c 00000000 00000000 00000000 00000000 00000000 f818cfb2 [ 0.000000] GPR24: 00000000 00000000 00001000 ffffffff 00000000 c07dbf80 00000000 f818c000 [ 0.000000] NIP [c0013c7c] do_page_fault+0x50/0x904 [ 0.000000] LR [c0013310] handle_page_fault+0xc/0x38 [ 0.000000] Call Trace: [ 0.000000] Instruction dump: [ 0.000000] be010080 91410014 553fe8fe 3d40c001 3d20f1f1 7d800026 394a3c2c 3fffe000 [ 0.000000] 6129f1f1 900100c4 9181007c 91410018 <913f0000> 3d2001f4 6129f4f4 913f0004 Don't map the early shadow page read-only yet when creating the new page tables for the real shadow memory, otherwise the memblock allocations that immediately follows to create the real shadow pages that are about to replace the early shadow page trigger a page fault if they fall into the region being worked on at the moment. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Fixes: 2edb16ef ("powerpc/32: Add KASAN support") Cc: stable@vger.kernel.org Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fe86886fb8db44360417cee0dc515ad47ca6ef72.1566382750.git.christophe.leroy@c-s.fr
-
由 Christophe Leroy 提交于
In a couple of places there is a need to select whether read-only protection of shadow pages is performed with PAGE_KERNEL_RO or with PAGE_READONLY. Add a helper to avoid duplicating the choice. Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Cc: stable@vger.kernel.org Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/9f33f44b9cd741c4a02b3dce7b8ef9438fe2cd2a.1566382750.git.christophe.leroy@c-s.fr
-
由 Jordan Niethe 提交于
Currently the reserved bits of the Processor Compatibility Register (PCR) are cleared as per the Programming Note in Section 1.3.3 of version 3.0B of the Power ISA. This causes all new architecture features to be made available when running on newer processors with new architecture features added to the PCR as bits must be set to disable a given feature. For example to disable new features added as part of Version 2.07 of the ISA the corresponding bit in the PCR needs to be set. As new processor features generally require explicit kernel support they should be disabled until such support is implemented. Therefore kernels should set all unknown/reserved bits in the PCR such that any new architecture features which the kernel does not currently know about get disabled. An update is planned to the ISA to clarify that the PCR is an exception to the Programming Note on reserved bits in Section 1.3.3. Signed-off-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NJordan Niethe <jniethe5@gmail.com> Tested-by: NJoel Stanley <joel@jms.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190917004605.22471-2-alistair@popple.id.au
-
由 Alistair Popple 提交于
Commit 388cc6e1 ("KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7") introduced new macros defining the PCR bits. When used from assembly files these definitions lead to build errors using older versions of binutils that don't support the 'ul' suffix. This fixes the build errors by updating the definitions to use the __MASK() macro which selects the appropriate suffix. Signed-off-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190917004605.22471-1-alistair@popple.id.au
-
由 Aneesh Kumar K.V 提交于
On failed task initialization due to memory allocation failures, we can call into destroy_context() with process_tb entry already populated. This patch forces the process_tb entry to zero in destroy_context(). With this patch, we lose the ability to track if we are destroying a context without flushing the process table entry. WARNING: CPU: 4 PID: 6368 at arch/powerpc/mm/mmu_context_book3s64.c:246 destroy_context+0x58/0x340 NIP [c0000000000875f8] destroy_context+0x58/0x340 LR [c00000000013da18] __mmdrop+0x78/0x270 Call Trace: [c000000f7db77c80] [c00000000013da18] __mmdrop+0x78/0x270 [c000000f7db77cf0] [c0000000004d6a34] __do_execve_file.isra.13+0xbd4/0x1000 [c000000f7db77e00] [c0000000004d7428] sys_execve+0x58/0x70 [c000000f7db77e30] [c00000000000b388] system_call+0x5c/0x70 Reported-by: NPriya M.A <priyama2@in.ibm.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Reformat/tweak comment wording] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190918140103.24395-1-aneesh.kumar@linux.ibm.com
-
- 19 9月, 2019 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
__find_linux_mm_pte() returns a page table entry pointer after walking the page table without holding locks. To make it safe against a THP split and/or collapse, we disable interrupts around the lockless page table walk. However we need to keep interrupts disabled as long as we use the page table entry pointer that is returned. Fix addr_to_pfn() to do that. Fixes: ba41e1e1 ("powerpc/mce: Hookup derror (load/store) UE errors") Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Rearrange code slightly and tweak change log wording] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190918145328.28602-1-aneesh.kumar@linux.ibm.com
-
由 David Howells 提交于
Convert the spufs filesystem to the new internal mount API as the old one will be obsoleted and removed. This allows greater flexibility in communication of mount parameters between userspace, the VFS and the filesystem. See Documentation/filesystems/mount_api.txt for more information. Signed-off-by: NDavid Howells <dhowells@redhat.com> cc: Jeremy Kerr <jk@ozlabs.org> cc: Arnd Bergmann <arnd@arndb.de> cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 18 9月, 2019 2 次提交
-
-
由 Naveen N. Rao 提交于
With support for HAVE_FUNCTION_GRAPH_RET_ADDR_PTR, ftrace_graph_ret_addr() provides more robust unwinding when function graph is in use. Update show_stack() to use the same. With dump_stack() added to sysrq_sysctl_handler(), before this patch: root@(none):/sys/kernel/debug/tracing# cat /proc/sys/kernel/sysrq CPU: 0 PID: 218 Comm: cat Not tainted 5.3.0-rc7-00868-g8453ad4a078c-dirty #20 Call Trace: [c0000000d1e13c30] [c00000000006ab98] return_to_handler+0x0/0x40 (dump_stack+0xe8/0x164) (unreliable) [c0000000d1e13c80] [c000000000145680] sysrq_sysctl_handler+0x48/0xb8 [c0000000d1e13cd0] [c00000000006ab98] return_to_handler+0x0/0x40 (proc_sys_call_handler+0x274/0x2a0) [c0000000d1e13d60] [c00000000006ab98] return_to_handler+0x0/0x40 (return_to_handler+0x0/0x40) [c0000000d1e13d80] [c00000000006ab98] return_to_handler+0x0/0x40 (__vfs_read+0x3c/0x70) [c0000000d1e13dd0] [c00000000006ab98] return_to_handler+0x0/0x40 (vfs_read+0xb8/0x1b0) [c0000000d1e13e20] [c00000000006ab98] return_to_handler+0x0/0x40 (ksys_read+0x7c/0x140) After this patch: Call Trace: [c0000000d1e33c30] [c00000000006ab58] return_to_handler+0x0/0x40 (dump_stack+0xe8/0x164) (unreliable) [c0000000d1e33c80] [c000000000145680] sysrq_sysctl_handler+0x48/0xb8 [c0000000d1e33cd0] [c00000000006ab58] return_to_handler+0x0/0x40 (proc_sys_call_handler+0x274/0x2a0) [c0000000d1e33d60] [c00000000006ab58] return_to_handler+0x0/0x40 (__vfs_read+0x3c/0x70) [c0000000d1e33d80] [c00000000006ab58] return_to_handler+0x0/0x40 (vfs_read+0xb8/0x1b0) [c0000000d1e33dd0] [c00000000006ab58] return_to_handler+0x0/0x40 (ksys_read+0x7c/0x140) [c0000000d1e33e20] [c00000000006ab58] return_to_handler+0x0/0x40 (system_call+0x5c/0x68) Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/dc89c9a887121342d9c7819482c3dabdece2a323.1567707399.git.naveen.n.rao@linux.vnet.ibm.com
-
由 Naveen N. Rao 提交于
This associates entries in the ftrace_ret_stack with corresponding stack frames, enabling more robust stack unwinding. Also update the only user of ftrace_graph_ret_addr() to pass the stack pointer. Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/0224f2d0971b069c678e2ff678cfc2cd1e114cfe.1567707399.git.naveen.n.rao@linux.vnet.ibm.com
-
- 17 9月, 2019 1 次提交
-
-
由 Ganesh Goudar 提交于
Since commit 4388c9b3 ("powerpc: Do not send system reset request through the oops path"), pstore dmesg file is not updated when dump is triggered from HMC. This commit modified system reset (sreset) handler to invoke fadump or kdump (if configured), without pushing dmesg to pstore. This leaves pstore to have old dmesg data which won't be much of a help if kdump fails to capture the dump. This patch fixes that by calling kmsg_dump() before heading to fadump ot kdump. Fixes: 4388c9b3 ("powerpc: Do not send system reset request through the oops path") Reviewed-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Reviewed-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NGanesh Goudar <ganeshgr@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190904075949.15607-1-ganeshgr@linux.ibm.com
-
- 13 9月, 2019 6 次提交
-
-
由 Cédric Le Goater 提交于
When dumping the XIVE state of an CPU IPI, xmon does not check if the CPU is started or not which can cause an error. Add a check for that and change the output to be on one line just as the XIVE interrupts of the machine. Signed-off-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190910081850.26038-3-clg@kaod.org
-
由 Cédric Le Goater 提交于
When looping on the list of interrupts, add the current value of the PQ bits with a load on the ESB page. This has the side effect of faulting the ESB page of all interrupts. Signed-off-by: NCédric Le Goater <clg@kaod.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190910081850.26038-2-clg@kaod.org
-
由 Qian Cai 提交于
Booting a POWER9 PowerNV system generates a few messages below with "____ptrval____" due to the pointers printed without a specifier extension (i.e unadorned %p) are hashed to prevent leaking information about the kernel memory layout. radix-mmu: Initializing Radix MMU radix-mmu: Partition table (____ptrval____) radix-mmu: Mapped 0x0000000000000000-0x0000000040000000 with 1.00 GiB pages (exec) radix-mmu: Mapped 0x0000000040000000-0x0000002000000000 with 1.00 GiB pages radix-mmu: Mapped 0x0000200000000000-0x0000202000000000 with 1.00 GiB pages radix-mmu: Process table (____ptrval____) and radix root for kernel: (____ptrval____) Signed-off-by: NQian Cai <cai@lca.pw> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1566570120-16529-1-git-send-email-cai@lca.pw
-
由 Hari Bathini 提交于
With support to copy multiple kernel boot memory regions owing to copy size limitation, also handle holes in the memory area to be preserved. Support as many as 128 kernel boot memory regions. This allows having an adequate FADump capture kernel size for different scenarios. Signed-off-by: NHari Bathini <hbathini@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821385448.5656.6124791213910877759.stgit@hbathini.in.ibm.com
-
由 Hari Bathini 提交于
RMA_START is defined as '0' and there is even a BUILD_BUG_ON() to make sure it is never anything else. Remove this macro and use '0' instead as code change is needed anyway when it has to be something else. Also, remove unused RMA_END macro. Signed-off-by: NHari Bathini <hbathini@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821384096.5656.15026984053970204652.stgit@hbathini.in.ibm.com
-
由 Hari Bathini 提交于
OPAL loads kernel & initrd at 512MB offset (256MB size), also exported as ibm,opal/dump/fw-load-area. So, if boot memory size of FADump is less than 768MB, kernel memory to be exported as '/proc/vmcore' would be overwritten by f/w while loading kernel & initrd. To avoid such a scenario, enforce a minimum boot memory size of 768MB on OPAL platform and skip using FADump if a newer F/W version loads kernel & initrd above 768MB. Also, irrespective of RMA size, set the minimum boot memory size expected on pseries platform at 320MB. This is to avoid inflating the minimum memory requirements on systems with 512M/1024M RMA size. Signed-off-by: NHari Bathini <hbathini@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/156821381414.5656.1592867278535469652.stgit@hbathini.in.ibm.com
-