- 22 Feb 2019, 1 commit
-
-
Submitted by Christian Borntraeger

While we will not implement interception for query functions yet, we can and should disable functions that have a control bit based on the given CPU model. Let us start with enabling the subfunction interface.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
-
- 21 Feb 2019, 1 commit
-
-
Submitted by Sean Christopherson

kvm_arch_memslots_updated() is at this point in time an x86-specific hook for handling MMIO generation wraparound. x86 stashes 19 bits of the memslots generation number in its MMIO sptes in order to avoid full page fault walks for repeat faults on emulated MMIO addresses. Because only 19 bits are used, wrapping the MMIO generation number is possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that the generation has changed so that it can invalidate all MMIO sptes in case the effective MMIO generation has wrapped, so as to avoid using a stale spte, e.g. a (very) old spte that was created with generation==0.

Given that the purpose of kvm_arch_memslots_updated() is to prevent consuming stale entries, it needs to be called before the new generation is propagated to memslots. Invalidating the MMIO sptes after updating memslots means that there is a window where a vCPU could dereference the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO spte that was created with (pre-wrap) generation==0.

Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
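To make the required ordering concrete, here is a simplified sketch of what the fix amounts to; this is illustrative only, not the exact upstream kvm_main.c code, and the SRCU and generation-flag details are elided:

static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
                                                 struct kvm_memslots *slots)
{
    struct kvm_memslots *old_memslots = kvm_memslots(kvm);
    u64 gen = old_memslots->generation;

    rcu_assign_pointer(kvm->memslots, slots);
    synchronize_srcu_expedited(&kvm->srcu);

    /* Let arch code zap stale MMIO sptes on a wrap ... */
    kvm_arch_memslots_updated(kvm, gen);

    /* ... before the new generation becomes visible to vCPUs. */
    slots->generation = gen + 1;

    return old_memslots;
}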
-
- 05 Feb 2019, 8 commits
-
-
Submitted by Michael Mueller

The patch implements a handler for GIB alert interruptions on the host. Its task is to alert guests that interrupts are pending for them. A GIB alert interrupt statistic counter is added as well:

$ cat /proc/interrupts
          CPU0       CPU1
...
GAL:        23         37   [I/O] GIB Alert
...

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20190131085247.13826-14-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Submitted by Michael Mueller

Add the Interruption Alert Mask (IAM) to the architecture-specific kvm struct. This mask in the GISA is used to define for which ISC a GIB alert will be issued.

The functions kvm_s390_gisc_register() and kvm_s390_gisc_unregister() are used to (un)register a GISC (guest ISC) with a virtual machine and its GISA. Upon successful completion, kvm_s390_gisc_register() returns the ISC to be used for GIB alert interruptions. A negative return code indicates an error during registration. These functions will be used by other adapter types like AP and PCI to request pass-through interruption support.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190131085247.13826-12-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
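The calling convention lends itself to a short usage sketch; the surrounding adapter functions here are hypothetical, and only the two kvm_s390_gisc_* calls and their return convention come from the patch description:

/* Hypothetical pass-through adapter setup. */
static int my_adapter_enable_gib_alert(struct kvm *kvm, u32 guest_isc)
{
    int host_isc = kvm_s390_gisc_register(kvm, guest_isc);

    if (host_isc < 0)
        return host_isc;    /* registration failed */
    /* GIB alert interruptions for guest_isc now arrive on host_isc. */
    return 0;
}

static void my_adapter_disable_gib_alert(struct kvm *kvm, u32 guest_isc)
{
    kvm_s390_gisc_unregister(kvm, guest_isc);
}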
-
Submitted by Michael Mueller

Adding the kvm reference to struct sie_page2 makes it possible to determine which kvm a given gisa belongs to:

container_of(gisa, struct sie_page2, gisa)->kvm

This functionality will be required to process a gisa in GIB alert interruption context.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-11-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
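As a minimal sketch of the layout this relies on (other sie_page2 members are omitted; only the gisa and kvm members are implied by the text above):

struct sie_page2 {
    /* ... facility lists, CRYCB, etc. ... */
    struct kvm_s390_gisa gisa;
    struct kvm *kvm;    /* back-pointer added by this patch */
};

static struct kvm *gisa_to_kvm(struct kvm_s390_gisa *gisa)
{
    return container_of(gisa, struct sie_page2, gisa)->kvm;
}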
-
Submitted by Michael Mueller

The Guest Information Block (GIB) links the GISA of all guests that have adapter interrupts pending. These interrupts cannot be delivered because all vcpus of these guests are currently in WAIT state or have masked the respective Interruption Sub Class (ISC). If enabled, a GIB alert is issued on the host to schedule these guests to run suitable vcpus to consume the pending interruptions. This mechanism allows adapter interrupts to be processed for guests that are currently not running.

The GIB is created during host initialization and associated with the Adapter Interruption Facility in case an Adapter Interruption Virtualization Facility is available. The GIB initialization, and thus the activation of the related code, will be done in an upcoming patch of this series.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-10-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Submitted by Michael Mueller

This patch implements the Set Guest Information Block operation to request association or disassociation of a Guest Information Block (GIB) with the Adapter Interruption Facility. The operation is required to receive GIB alert interrupts for guest adapters in conjunction with AIV and GISA.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190131085247.13826-9-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Submitted by Michael Mueller

This struct is used analogously to the kvm interruption structs for kvm-emulated floating and local interruptions. GIB handling will add further fields to this structure as required.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-8-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Submitted by Michael Mueller

The vcpu idle_mask state is used by, but is not specific to, the emulated floating interruptions. The state is relevant to gisa-related interruptions as well.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-4-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Submitted by Michael Mueller

Use a consistent bitmap declaration throughout the code.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-3-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
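The pattern in question, as a before/after sketch (assuming the mask is the vcpu idle_mask sized by KVM_MAX_VCPUS; DECLARE_BITMAP is the standard helper from linux/types.h):

/* Open-coded declaration replaced by the patch: */
unsigned long idle_mask[BITS_TO_LONGS(KVM_MAX_VCPUS)];

/* Consistent declaration via the kernel helper: */
DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS);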
-
- 12 Jan 2019, 2 commits
-
-
Submitted by Vasily Gorbik

While "s390/vdso: avoid 64-bit vdso mapping for compat tasks" fixed the 64-bit vdso mapping for compat tasks under gdb, it introduced another problem: the "compat_mm" flag is not inherited during fork, so when a 31-bit process forks a child (but does not perform exec), the child ends up with a 64-bit vdso. To address that, init_new_context (which is called during both fork and exec) now initializes compat_mm based on the thread's TIF_31BIT flag. Later, compat_mm is adjusted in arch_setup_additional_pages, which is called during exec.

Fixes: d1befa65 ("s390/vdso: avoid 64-bit vdso mapping for compat tasks")
Reported-by: Stefan Liebler <stli@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@vger.kernel.org> # v4.20+
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Martin Schwidefsky

The ASCE of an mm_struct can be modified after a task has been created, e.g. via crst_table_downgrade for a compat process. The active_mm logic that avoids the switch_mm call if the next task is a kernel thread can lead to a situation where switch_mm is called with 'prev == next' being true but 'prev->context.asce == next->context.asce' being false. This can lead to a CPU using the outdated ASCE to run a task. The result can be a crash, endless loops, and really subtle problems due to TLBs being created with an invalid ASCE.

Cc: stable@kernel.org # v3.15+
Fixes: 53e857f3 ("s390/mm,tlb: race of lazy TLB flush vs. recreation")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
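Illustratively, the hazard is that a lazy-TLB shortcut of the following shape stops being safe once an mm's ASCE can change after task creation. This is a sketch of the failure mode, not the actual fix, and lctlg_asce() is a hypothetical helper standing in for the control-register reload:

static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
                             struct task_struct *tsk)
{
    /* Comparing only the mm pointers is not enough: */
    if (prev == next && prev->context.asce == next->context.asce)
        return;    /* only now is skipping the reload truly safe */

    /* Otherwise the (possibly downgraded) ASCE must be reloaded. */
    lctlg_asce(next->context.asce);    /* hypothetical helper */
}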
-
- 06 Jan 2019, 2 commits
-
-
Submitted by Masahiro Yamada

Now that Kbuild automatically creates asm-generic wrappers for missing mandatory headers, it is redundant to list the same headers in both generic-y and mandatory-y.

Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
-
Submitted by Masahiro Yamada

These comments are leftovers from commit fcc8487d ("uapi: export all headers under uapi directories"). Prior to that commit, exported headers had to be explicitly added to header-y. Now, all headers under the uapi/ directories are exported.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
- 05 Jan 2019, 2 commits
-
-
Submitted by Joel Fernandes (Google)

Patch series "Add support for fast mremap".

This series speeds up the mremap(2) syscall by copying page tables at the PMD level even for non-THP systems. There is concern that the extra 'address' argument that mremap passes to pte_alloc may do something subtle and architecture-related in the future that may make the scheme not work. Also, we find that there is no point in passing the 'address' to pte_alloc since it is unused. This patch therefore removes this argument tree-wide, resulting in a nice negative diff as well, while also ensuring along the way that the enabled architectures do not do anything funky with the 'address' argument that goes unnoticed by the optimization.

Build and boot tested on x86-64. Build tested on arm64. The config enablement patch for arm64 will be posted in the future after more testing.

The changes were obtained by applying the following Coccinelle script (thanks Julia for answering all Coccinelle questions!). The following fix-ups were done manually:

* Removal of the address argument from pte_fragment_alloc
* Removal of the pte_alloc_one_fast definitions from m68k and microblaze.

// Options: --include-headers --no-includes
// Note: I split the 'identifier fn' line, so if you are manually
// running it, please unsplit it so it runs for you.

virtual patch

@pte_alloc_func_def depends on patch exists@
identifier E2;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
type T2;
@@

fn(...
- , T2 E2
 )
{ ... }

@pte_alloc_func_proto_noarg depends on patch exists@
type T1, T2, T3, T4;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

(
- T3 fn(T1, T2);
+ T3 fn(T1);
|
- T3 fn(T1, T2, T4);
+ T3 fn(T1, T2);
)

@pte_alloc_func_proto depends on patch exists@
identifier E1, E2, E4;
type T1, T2, T3, T4;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

(
- T3 fn(T1 E1, T2 E2);
+ T3 fn(T1 E1);
|
- T3 fn(T1 E1, T2 E2, T4 E4);
+ T3 fn(T1 E1, T2 E2);
)

@pte_alloc_func_call depends on patch exists@
expression E2;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

fn(...
-, E2
 )

@pte_alloc_macro depends on patch exists@
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
identifier a, b, c;
expression e;
position p;
@@

(
- #define fn(a, b, c) e
+ #define fn(a, b) e
|
- #define fn(a, b) e
+ #define fn(a) e
)

Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Matthew Wilcox

When testing in userspace, UBSAN pointed out that shifting into the sign bit is undefined behaviour. It doesn't really make sense to ask for the highest set bit of a negative value, so just turn the argument type into an unsigned int. Some architectures (e.g. ppc) already had it declared as an unsigned int, so I don't expect too many problems.

Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
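For reference, the asm-generic fls() this change makes well-defined has the following shape (it returns 0 for x == 0 and 32 for the top bit). With a signed argument, the left shifts below could move a set bit into the sign bit, which UBSAN flags as undefined:

static inline int fls(unsigned int x)
{
    int r = 32;

    if (!x)
        return 0;
    if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }
    if (!(x & 0xff000000u)) { x <<= 8;  r -= 8;  }
    if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4;  }
    if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2;  }
    if (!(x & 0x80000000u)) { x <<= 1;  r -= 1;  }
    return r;
}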
-
- 04 Jan 2019, 1 commit
-
-
Submitted by Linus Torvalds

Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact.

A discussion about extending 'user_access_begin()' to do the range checking resulted in this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all.

This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases:

 - csky still had the old "verify_area()" name as an alias.
 - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it)
 - microblaze used the type argument for a debug printout

but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
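The conversion itself is mechanical; a representative before/after:

/* Before: */
if (!access_ok(VERIFY_READ, from, n))
    return -EFAULT;

/* After (the type argument is dropped everywhere): */
if (!access_ok(from, n))
    return -EFAULT;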
-
- 07 Dec 2018, 1 commit
-
-
Submitted by Will Deacon

PREEMPT_NEED_RESCHED is never used directly, so move it into the arch code where it can potentially be implemented using either a different bit in the preempt count or as an entirely separate entity.

Cc: Robert Love <rml@tech9.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
- 30 Nov 2018, 2 commits
-
-
Submitted by Harald Freudenberger

There exist very few ap messages which need to have the 'special' flag enabled. This flag tells the firmware layer to do some pre- and maybe post-processing. However, it may happen that this special flag is enabled but the firmware is unable to deal with this kind of message and thus returns with reply code 0x41. For example, older firmware may not know the newest messages triggered by the zcrypt device driver and thus reacts with a reject and the named reply code. Unfortunately this reply code is not known to the zcrypt error routines, and so the default behavior is to switch the ap queue offline.

This patch now makes the ap error routine aware of the reply code, so userspace is informed about the bad processing result but the queue is not switched to the offline state any more.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Harald Freudenberger

The inline assembler functions ap_aqic() and ap_qact() used two variables declared on the very same register, one for input only and the other for output. Newer versions of gcc do not accept this. In any case, it is better coding practice to use one variable (which may have a union data type) on one register for both input and output. So this patch introduces unions and now uses only one variable for the input and output of GR1 for the PQAP(QACT) and PQAP(QIC) invocations.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
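A sketch of the resulting pattern for PQAP(AQIC), close to but not necessarily verbatim the driver code:

static inline struct ap_queue_status ap_aqic(ap_qid_t qid,
                                             struct ap_qirq_ctrl qirqctrl,
                                             void *ind)
{
    register unsigned long reg0 asm ("0") = qid | (3UL << 24);
    register union {
        unsigned long value;
        struct ap_qirq_ctrl qirqctrl;   /* input view of GR1 */
        struct ap_queue_status status;  /* output view of GR1 */
    } reg1 asm ("1");
    register void *reg2 asm ("2") = ind;

    reg1.qirqctrl = qirqctrl;
    asm volatile(
        ".long 0xb2af0000"              /* PQAP(AQIC) */
        : "+d" (reg1)
        : "d" (reg0), "d" (reg2)
        : "cc");
    return reg1.status;
}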
-
- 06 Nov 2018, 1 commit
-
-
Submitted by Martin Schwidefsky

The __no_sanitize_address_or_inline and __no_kasan_or_inline defines are almost identical. The only difference is that __no_kasan_or_inline does not have the 'notrace' attribute. To be able to replace __no_sanitize_address_or_inline with the older definition, add 'notrace' to __no_kasan_or_inline and change the two users of __no_sanitize_address_or_inline in the s390 code.

The 'notrace' option is necessary for e.g. the __load_psw_mask function in arch/s390/include/asm/processor.h. Without the option it is possible to trace __load_psw_mask, which leads to kernel stack overflow.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pointed-out-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
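After the change, the definition reads roughly as follows (a sketch of include/linux/compiler.h at the time):

#ifdef CONFIG_KASAN
/* notrace keeps e.g. __load_psw_mask() out of the function tracer,
 * which otherwise can overflow the kernel stack. */
#define __no_kasan_or_inline __no_sanitize_address notrace __maybe_unused
#else
#define __no_kasan_or_inline __always_inline
#endif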
-
- 02 Nov 2018, 2 commits
-
-
Submitted by Vasily Gorbik

Increase the kasan instrumented kernel stack size from 32k to 64k. Other architectures seem to get away with just doubling the kernel stack size under kasan, but on s390 this appears to be not enough due to the bigger frame size. The particular pain point is kasan inlined checks (CONFIG_KASAN_INLINE vs CONFIG_KASAN_OUTLINE). With inlined checks, one particular case hitting stack overflow is fs sync on an xfs filesystem:

#0 [9a0681e8]  704 bytes  check_usage at 34b1fc
#1 [9a0684a8]  432 bytes  check_usage at 34c710
#2 [9a068658] 1048 bytes  validate_chain at 35044a
#3 [9a068a70]  312 bytes  __lock_acquire at 3559fe
#4 [9a068ba8]  440 bytes  lock_acquire at 3576ee
#5 [9a068d60]  104 bytes  _raw_spin_lock at 21b44e0
#6 [9a068dc8] 1992 bytes  enqueue_entity at 2dbf72
#7 [9a069590] 1496 bytes  enqueue_task_fair at 2df5f0
#8 [9a069b68]   64 bytes  ttwu_do_activate at 28f438
#9 [9a069ba8]  552 bytes  try_to_wake_up at 298c4c
#10 [9a069dd0]  168 bytes  wake_up_worker at 23f97c
#11 [9a069e78]  200 bytes  insert_work at 23fc2e
#12 [9a069f40]  648 bytes  __queue_work at 2487c0
#13 [9a06a1c8]  200 bytes  __queue_delayed_work at 24db28
#14 [9a06a290]  248 bytes  mod_delayed_work_on at 24de84
#15 [9a06a388]   24 bytes  kblockd_mod_delayed_work_on at 153e2a0
#16 [9a06a3a0]  288 bytes  __blk_mq_delay_run_hw_queue at 158168c
#17 [9a06a4c0]  192 bytes  blk_mq_run_hw_queue at 1581a3c
#18 [9a06a580]  184 bytes  blk_mq_sched_insert_requests at 15a2192
#19 [9a06a638] 1024 bytes  blk_mq_flush_plug_list at 1590f3a
#20 [9a06aa38]  704 bytes  blk_flush_plug_list at 1555028
#21 [9a06acf8]  320 bytes  schedule at 219e476
#22 [9a06ae38]  760 bytes  schedule_timeout at 21b0aac
#23 [9a06b130]  408 bytes  wait_for_common at 21a1706
#24 [9a06b2c8]  360 bytes  xfs_buf_iowait at fa1540
#25 [9a06b430]  256 bytes  __xfs_buf_submit at fadae6
#26 [9a06b530]  264 bytes  xfs_buf_read_map at fae3f6
#27 [9a06b638]  656 bytes  xfs_trans_read_buf_map at 10ac9a8
#28 [9a06b8c8]  304 bytes  xfs_btree_kill_root at e72426
#29 [9a06b9f8]  288 bytes  xfs_btree_lookup_get_block at e7bc5e
#30 [9a06bb18]  624 bytes  xfs_btree_lookup at e7e1a6
#31 [9a06bd88] 2664 bytes  xfs_alloc_ag_vextent_near at dfa070
#32 [9a06c7f0]  144 bytes  xfs_alloc_ag_vextent at dff3ca
#33 [9a06c880] 1128 bytes  xfs_alloc_vextent at e05fce
#34 [9a06cce8]  584 bytes  xfs_bmap_btalloc at e58342
#35 [9a06cf30] 1336 bytes  xfs_bmapi_write at e618de
#36 [9a06d468]  776 bytes  xfs_iomap_write_allocate at ff678e
#37 [9a06d770]  720 bytes  xfs_map_blocks at f82af8
#38 [9a06da40]  928 bytes  xfs_writepage_map at f83cd6
#39 [9a06dde0]  320 bytes  xfs_do_writepage at f85872
#40 [9a06df20] 1320 bytes  write_cache_pages at 73dfe8
#41 [9a06e448]  208 bytes  xfs_vm_writepages at f7f892
#42 [9a06e518]   88 bytes  do_writepages at 73fe6a
#43 [9a06e570]  872 bytes  __writeback_single_inode at a20cb6
#44 [9a06e8d8]  664 bytes  writeback_sb_inodes at a23be2
#45 [9a06eb70]  296 bytes  __writeback_inodes_wb at a242e0
#46 [9a06ec98]  928 bytes  wb_writeback at a2500e
#47 [9a06f038]  848 bytes  wb_do_writeback at a260ae
#48 [9a06f388]  536 bytes  wb_workfn at a28228
#49 [9a06f5a0] 1088 bytes  process_one_work at 24a234
#50 [9a06f9e0] 1120 bytes  worker_thread at 24ba26
#51 [9a06fe40]  104 bytes  kthread at 26545a
#52 [9a06fea8]             kernel_thread_starter at 21b6b62

To be able to increase the stack size to 64k, reuse the LLILL instruction in the __switch_to function to load the value 64k - STACK_FRAME_OVERHEAD - __PT_SIZE (65192) as an unsigned.

Reported-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
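In page-allocator terms, the KASAN stack order thus becomes 4 (sixteen 4k pages = 64k) versus the usual order 2 (16k); a sketch of the thread_info definitions this corresponds to:

#ifdef CONFIG_KASAN
#define THREAD_SIZE_ORDER 4    /* 64k kernel stacks */
#else
#define THREAD_SIZE_ORDER 2    /* 16k kernel stacks */
#endif
#define STACK_SIZE (PAGE_SIZE << THREAD_SIZE_ORDER)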
-
Submitted by Martin Schwidefsky

In case a fork or clone system call fails in copy_process and the error handling does the mmput() at the bad_fork_cleanup_mm label, the following warning message will appear on the console:

BUG: non-zero pgtables_bytes on freeing mm: 16384

The reason for that is the tricks we play with mm_inc_nr_puds() and mm_inc_nr_pmds() in init_new_context(). A normal 64-bit process has 3 levels of page table; the p4d level and the pud level are folded. On process termination the free_pud_range() function in mm/memory.c will subtract 16KB from pgtable_bytes with a mm_dec_nr_puds() call, but there actually is not really a pud table.

One issue with this is the fact that pgtable_bytes is usually off by a few kilobytes, but the more severe problem is that for a failed fork or clone the free_pgtables() function is not called. In this case there is no mm_dec_nr_puds() or mm_dec_nr_pmds() to go together with the mm_inc_nr_puds() and mm_inc_nr_pmds() in init_new_context(). The pgtable_bytes will be off by 16384 or 32768 bytes and we get the BUG message. The message itself is purely cosmetic, but annoying.

To fix this, override the mm_pmd_folded, mm_pud_folded and mm_p4d_folded functions to check for the true size of the address space.

Reported-by: Li Wang <liwang@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
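The override boils down to comparing the mm's address-space limit against the region sizes; a sketch close to the s390 definitions, shown for the pud level (pmd and p4d are analogous with _REGION3_SIZE and _REGION1_SIZE):

static inline bool mm_pud_folded(struct mm_struct *mm)
{
    /* A pud level only really exists for address spaces that
     * extend beyond the reach of a region-3 top-level table. */
    return mm->context.asce_limit <= _REGION2_SIZE;
}
#define mm_pud_folded(mm) mm_pud_folded(mm)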
-
- 31 Oct 2018, 1 commit
-
-
Submitted by Nick Desaulniers

Prefer _THIS_IP_ defined in linux/kernel.h. Most definitions of current_text_addr were the same as _THIS_IP_, but a few archs had inline assembly instead. This patch removes the final call site of current_text_addr, making all of the definitions dead code.

[akpm@linux-foundation.org: fix arch/csky/include/asm/processor.h]
Link: http://lkml.kernel.org/r/20180911182413.180715-1-ndesaulniers@google.com
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
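No per-arch code is needed for the replacement; _THIS_IP_ is defined generically in include/linux/kernel.h as:

#define _THIS_IP_  ({ __label__ __here; __here: (unsigned long)&&__here; })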
-
- 22 Oct 2018, 1 commit
-
-
Submitted by Vasily Gorbik

When the kernel is built with CONFIG_PREEMPT=y and CONFIG_PREEMPT_COUNT=y, the "stfle" function used by the kasan initialization code makes additional calls to preempt_count_add/preempt_count_sub. To avoid removing kasan instrumentation from the sched code where those functions live, split the stfle function and provide a __stfle variant without preemption handling to be used by kasan.

Reported-by: Benjamin Block <bblock@linux.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
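A sketch of the resulting split (the stfl/stfle inline assembly body is elided here):

static inline void __stfle(u64 *stfle_fac_list, int size)
{
    /* stfl/stfle inline asm; no preempt accounting, so this is
     * callable from early, uninstrumented kasan init code. */
}

static inline void stfle(u64 *stfle_fac_list, int size)
{
    preempt_disable();
    __stfle(stfle_fac_list, size);
    preempt_enable();
}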
-
- 10 Oct 2018, 2 commits
-
-
Submitted by Ingo Franzki

Introduce a new ioctl API and in-kernel API to transform a variable length key blob of any supported type into a protected key. Transforming a secure key blob uses the already existing function pkey_sec2protk(). Transforming a protected key blob also verifies if the protected key is still valid. If not, -ENODEV is returned.

Both APIs are described in detail in the header files arch/s390/include/asm/pkey.h and arch/s390/include/uapi/asm/pkey.h.

Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Ingo Franzki

Introduce a new ioctl API and in-kernel API to verify if a random protected key is still valid. A protected key is invalid when its wrapping key verification pattern does not match the verification pattern of the LPAR. Each time an LPAR is activated, a new LPAR wrapping key is generated and the wrapping key verification pattern is updated.

Both APIs are described in detail in the header files arch/s390/include/asm/pkey.h and arch/s390/include/uapi/asm/pkey.h.

Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
- 09 Oct 2018, 13 commits
-
-
Submitted by Ingo Franzki

This patch introduces a new ioctl API and in-kernel API to generate a random protected key. The protected key is generated in a way that the effective clear key is never exposed in clear.

Both APIs are described in detail in the header files arch/s390/include/asm/pkey.h and arch/s390/include/uapi/asm/pkey.h.

Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
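A hypothetical userspace invocation, using the ioctl and struct names exported in arch/s390/include/uapi/asm/pkey.h (error handling kept minimal):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <asm/pkey.h>

int main(void)
{
    struct pkey_genprotk genprotk = { .keytype = PKEY_KEYTYPE_AES_256 };
    int fd = open("/dev/pkey", O_RDWR);

    if (fd < 0 || ioctl(fd, PKEY_GENPROTK, &genprotk) < 0) {
        perror("pkey");
        return 1;
    }
    /* genprotk.protkey now holds a freshly generated protected key. */
    return 0;
}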
-
Submitted by Harald Freudenberger

Some cleanup in the s390 zcrypt device driver:

 - Removed fragments of pcixx crypto card code. This code can't be reached anymore because the hardware detection function has not recognized crypto cards < CEX2 since commit f56545430736 ("s390/zcrypt: Introduce QACT support for AP bus devices.")
 - Renamed some files and driver names which were still reflecting pcixx support to cex2a/cex2c.
 - Removed all the zcrypt version strings in the file headers. The zcrypt.h header file is now the only place for zcrypt device driver version info.
 - Bumped the zcrypt version from 2.2.0 to 2.2.1.

Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Vasily Gorbik

The early boot stack uses 4 predefined pages of memory at 0x8000-0xC000. This stack is used to run the non-instrumented decompressor/facilities verification C code. It doesn't make sense to double its size when the kernel is built with KASAN support. BOOT_STACK_ORDER is introduced to avoid that.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Vasily Gorbik

By default, 3-level paging is used when the kernel is compiled with kasan support. Add a 4-level paging option to support systems with more than 3 TB of physical memory and to cover 4-level paging specific code with kasan as well.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Vasily Gorbik

The kasan initialization code is changed to populate the persistent shadow first, save the allocator position in pgalloc_freeable, and proceed with the early identity mapping creation. This way the early identity mapping paging structures can be freed at once after switching to swapper_pg_dir, when the early identity mapping is not needed anymore.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Vasily Gorbik

By defining KASAN_SHADOW_OFFSET in Kconfig, memory access check instrumentation of stack and global variables is enabled. gcc version 4.9.2 or newer is also required.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Vasily Gorbik

Some functions from both arch/s390/kernel/ipl.c and arch/s390/kernel/machine_kexec.c are called without DAT enabled (or have code paths with and without DAT enabled). There is no easy way to partially disable kasan for those files without a substantial rework, so disable kasan for both files for now.

To avoid disabling kasan for arch/s390/kernel/diag.c, the DAT flag is enabled in the diag308 call. pcpu_delegate, which disables DAT, is marked with __no_sanitize_address to disable instrumentation for that one function.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Vasily Gorbik

The smp_start_secondary function is called without DAT enabled. To avoid disabling kasan instrumentation for the entire arch/s390/kernel/smp.c, smp_start_secondary has been split into two parts. smp_start_secondary has instrumentation disabled; it does minimal setup and enables DAT. Then the instrumented __smp_start_secondary is called to do the rest. Instrumentation of the __load_psw_mask function has been disabled as well, to be able to call it from smp_start_secondary.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Vasily Gorbik

To lower the memory footprint and speed up kasan initialisation, detect EDAT availability and use large pages if possible. As we know how much memory is needed for initialisation, another simplistic large page allocator is introduced to avoid memory fragmentation. Since the facilities list is retrieved anyhow, detect noexec support and adjust page attributes. Handle the noexec kernel option to avoid inconsistent kasan shadow memory page flags.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Vasily Gorbik

Kasan stack instrumentation pads stack variables with redzones, which increases stack frame sizes significantly. Stack sizes are increased from 16k to 32k in the code, as well as for the kernel stack overflow detection option (CHECK_STACK).

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Vasily Gorbik

Kasan needs 1/8 of the kernel virtual address space to be reserved as the shadow area. And eventually it requires the shadow memory offset to be known at compile time (passed to the compiler when full instrumentation is enabled). Any value picked as the shadow area offset for 3-level paging would eat up identity mapping on 4-level paging (with a 1PB shadow area size). So, the kernel sticks to 3-level paging when kasan is enabled. The 3TB border is picked as the shadow offset. The memory layout is adjusted so that the physical memory border does not exceed KASAN_SHADOW_START and vmemmap does not go below KASAN_SHADOW_END.

Due to the fact that on s390 paging is set up very late, and to cover more code with kasan instrumentation, the temporary identity mapping and the final shadow memory are set up early. The shadow memory mapping is later carried over to init_mm.pgd during paging_init. For the needs of paging structure allocation and shadow memory population, a primitive allocator is used which simply chops off memory blocks from the end of the physical memory.

Kasan currently doesn't track the vmemmap and vmalloc areas.

Current memory layout (for 3-level paging, 2GB physical memory):

---[ Identity Mapping ]---
0x0000000000000000-0x0000000000100000
---[ Kernel Image Start ]---
0x0000000000100000-0x0000000002b00000
---[ Kernel Image End ]---
0x0000000002b00000-0x0000000080000000         2G <- physical memory border
0x0000000080000000-0x0000030000000000      3070G PUD I
---[ Kasan Shadow Start ]---
0x0000030000000000-0x0000030010000000       256M PMD RW X <- shadow for 2G memory
0x0000030010000000-0x0000037ff0000000    523776M PTE RO NX <- kasan zero ro page
0x0000037ff0000000-0x0000038000000000       256M PMD RW X <- shadow for 2G modules
---[ Kasan Shadow End ]---
0x0000038000000000-0x000003d100000000       324G PUD I
---[ vmemmap Area ]---
0x000003d100000000-0x000003e080000000
---[ vmalloc Area ]---
0x000003e080000000-0x000003ff80000000
---[ Modules Area ]---
0x000003ff80000000-0x0000040000000000         2G

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Vasily Gorbik

Add the pgd_page primitive, which is required by kasan common code. Also fix a typo in the p4d_page definition.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
-
Submitted by Vasily Gorbik

Kasan common code requires the MAX_PTRS_PER_P4D definition, which in the case of s390 is always PTRS_PER_P4D.

Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
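The addition itself is a one-liner in the s390 pgtable header:

#define MAX_PTRS_PER_P4D PTRS_PER_P4D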
-