- 24 10月, 2013 12 次提交
-
-
由 Heiko Carstens 提交于
Just like all other architectures we should use out-of-line find bit operations, since the inline variant bloat the size of the kernel image. And also like all other architecures we should only supply optimized variants of the __ffs, ffs, etc. primitives. Therefore this patch removes the inlined s390 find bit functions and uses the generic out-of-line variants instead. The optimization of the primitives follows with the next patch. With this patch also the functions find_first_bit_left() and find_next_bit_left() have been reimplemented, since logically, they are nothing else but a find_first_bit()/find_next_bit() implementation that use an inverted __fls() instead of __ffs(). Also the restriction that these functions only work on machines which support the "flogr" instruction is gone now. This reduces the size of the kernel image (defconfig, -march=z9-109) by 144,482 bytes. Alone the size of the function build_sched_domains() gets reduced from 7 KB to 3,5 KB. We also git rid of unused functions like find_first_bit_le()... Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Hendrik Brueckner 提交于
Add the debug_level_enabled() function to check if debug events for a particular level would be logged. This might help to save cycles for debug events that require additional information collection. Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Since zEC12 we have the interlocked-access facility 2 which allows to use the instructions ni/oi/xi to update a single byte in storage with compare-and-swap semantics. So change set_bit(), clear_bit() and change_bit() to generate such code instead of a compare-and-swap loop (or using the load-and-* instruction family), if possible. This reduces the text segment by yet another 8KB (defconfig). Alternatively the long displacement variants niy/oiy/xiy could have been used, but the extended displacement field is usually not needed and therefore would only increase the size of the text segment again. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Remove CONFIG_SMP from bitops code. This reduces the C code significantly but also generates better code for the SMP case. This means that for !CONFIG_SMP set_bit() and friends now also have compare and swap semantics (read: more code). However nobody really cares for !CONFIG_SMP and this is the trade-off to simplify the SMP code which we do care about. The non-atomic bitops like __set_bit() now generate also better code because the old code did not have a __builtin_contant_p() check for the CONFIG_SMP case and therefore always generated the inline assembly variant. However the inline assemblies for the non-atomic case now got completely removed since gcc can produce better code, which accesses less memory operands. test_bit() got also a bit simplified since it did have a __builtin_constant_p() check, however two identical code pathes for each case (written differently). In result this mainly reduces the to be maintained code but is not very relevant for code generation, since there are not many non-atomic bitops usages that we care about. (code reduction defconfig kernel image before/after: 560 bytes). Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
- add a typecheck to the defines to make sure they operate on an atomic_t - simplify inline assembly constraints - keep variable names common between functions Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
If the interlocked-access facility 1 is available we can use the asi and agsi instructions for interlocked updates if the to be added value is a contanst and small (in the range of -128..127). asi and agsi do not not return the old or new value, therefore these instructions can only be used for atomic_(add|sub|inc|dec)[64]. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Since we have an in-kernel disassembler we can make sure that there won't be any kprobes set on random data. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Now that the in-kernel disassembler has an own header file move the disassembler related function prototypes to that header file. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Suzuki K. Poulose 提交于
The patch moves some of the definitions to a header file. No functional changes involved. I have retained the Copyright Statement from the original file. Signed-off-by: NSuzuki K Poulose <suzuki@in.ibm.com> [Heiko Carstens: rename s390-dis.h to dis.h] Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Same as for bitops: make use of the interlocked-access facility 1 instructions which allow to atomically update storage locations without a compare-and-swap loop. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Get rid of the own atomic_sub_return() implementation. Otherwise we can't make use of the interlocked-access facility 1 instructions for atomic_sub_return(), since there is no "load and subtract" instruction available. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Make use of the interlocked-access facility 1 that got added with the z196 architecure. This facilility added new instructions which can atomically update a storage location without a compare-and-swap loop. E.g. setting a bit within a "long" can be done with a single instruction. The size of the kernel image gets ~30kb smaller. Considering that there are appr. 1900 bitops call sites this means that each one saves about 15-16 bytes per call site which is expected. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 22 10月, 2013 1 次提交
-
-
由 Martin Schwidefsky 提交于
The result of the store-clock-fast (STCKF) instruction is a bit fuzzy. It can happen that the value stored on one CPU is smaller than the value stored on another CPU, although the order of the stores is the other way around. This can cause deltas of get_tod_clock() values to become negative when they should not be. We need to be more careful with store-clock-fast, this patch partially reverts git commit e4b7b4238e666682555461fa52eecd74652f36bb "time: always use stckf instead of stck if available". The get_tod_clock() function now uses the store-clock-extended (STCKE) instruction. get_tod_clock_fast() can be used if the fuzziness of store-clock-fast is acceptable e.g. for wait loops local to a CPU. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 15 10月, 2013 1 次提交
-
-
由 Martin Schwidefsky 提交于
For machines without enhanced supression on protection the software dirty bit code forces the pte dirty bit and clears the page protection bit in pgste_set_pte. This is done for all pte types, the check for present ptes is missing. As a result swap ptes and other not-present ptes can get corrupted. Add a check for the _PAGE_PRESENT bit to pgste_set_pte before modifying the pte value. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 11 10月, 2013 1 次提交
-
-
由 Ingo Molnar 提交于
Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto' constructs, as outlined here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 Implement a workaround suggested by Jakub Jelinek. Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com> Reported-by: NOleg Nesterov <oleg@redhat.com> Reported-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Suggested-by: NJakub Jelinek <jakub@redhat.com> Reviewed-by: NRichard Henderson <rth@twiddle.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 9月, 2013 2 次提交
-
-
由 Heiko Carstens 提交于
Enable ARCH_USE_CMPXCHG_LOCKREF since it shows performance improvements with Linus' simple stat() test case of up to 50% on a 30 cpu system. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
由 Heiko Carstens 提交于
Linus suggested to replace #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX #define arch_mutex_cpu_relax() cpu_relax() #endif with just a simple #ifndef arch_mutex_cpu_relax # define arch_mutex_cpu_relax() cpu_relax() #endif to get rid of CONFIG_HAVE_CPU_RELAX_SIMPLE. So architectures can simply define arch_mutex_cpu_relax if they want an architecture specific function instead of having to add a select statement in their Kconfig in addition. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
- 12 9月, 2013 2 次提交
-
-
由 Michael Holzheu 提交于
Modify the s390 copy_oldmem_page() and remap_oldmem_pfn_range() function for zfcpdump to read from the HSA memory if memory below HSA_SIZE bytes is requested. Otherwise real memory is used. Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com> Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Jan Willeke <willeke@de.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Heiko Carstens 提交于
With the general-instruction extension facility (z10) a couple of instructions with a pc-relative long displacement were introduced. The kprobes support for these instructions however was never implemented. In result, if anybody ever put a probe on any of these instructions the result would have been random behaviour after the instruction got executed within the insn slot. So lets add the missing handling for these instructions. Since all of the new instructions have 32 bit signed displacement the easiest solution is to allocate an insn slot that is within the same 2GB area like the original instruction and patch the displacement field. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 9月, 2013 1 次提交
-
-
由 Heiko Carstens 提交于
Let's not add a function for every external interrupt subclass for which we need reference counting. Just have two register/unregister functions which have a subclass parameter: void irq_subclass_register(enum irq_subclass subclass); void irq_subclass_unregister(enum irq_subclass subclass); Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
- 30 8月, 2013 4 次提交
-
-
由 Sebastian Ott 提交于
Function handles may change while the system was in hibernation use list pci functions and update the function handles. Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Sebastian Ott 提交于
List pci functions is used to query and iterate over pci functions. This function currently has 2 users - initial device discovery and rescan after a machine check. Instead of having a multipurpose function pass a callback which gets called for each pci function. Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Sebastian Ott 提交于
Some functions that do arch specific resume actions are called directly from swsusp_asm64.S . Before we add another function call provide a generic s390_early_resume function which can be used for this purpose. Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Sebastian Ott 提交于
Convert s390' pci hotplug to be builtin only, with no module option. Suggested-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 29 8月, 2013 1 次提交
-
-
由 Martin Schwidefsky 提交于
The last remaining use for the storage key of the s390 architecture is reference counting. The alternative is to make page table entries invalid while they are old. On access the fault handler marks the pte/pmd as young which makes the pte/pmd valid if the access rights allow read access. The pte/pmd invalidations required for software managed reference bits cost a bit of performance, on the other hand the RRBE/RRBM instructions to read and reset the referenced bits are quite expensive as well. Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 28 8月, 2013 1 次提交
-
-
由 Martin Schwidefsky 提交于
For a single-threaded KVM guest ptep_modify_prot_start will not use IPTE, the invalid bit will therefore not be set. If DEBUG_VM is set pgste_set_key called by ptep_modify_prot_commit will complain about the missing invalid bit. ptep_modify_prot_start should set the invalid bit in all cases. Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 22 8月, 2013 11 次提交
-
-
由 Heiko Carstens 提交于
Fix broken contraints for both save_access_regs() and restore_access_regs(). The constraints are incorrect since they tell the compiler that the inline assemblies only access the first element of an array of 16 elements. Therefore the compiler could generate incorrect code. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Fix inline assembly contraints for non atomic bitops functions. This is broken since 2.6.34 987bcdac "[S390] use inline assembly contraints available with gcc 3.3.3". Reported-by: NAndreas Krebbel <krebbel@linux.vnet.ibm.com> Reported-by: NPeter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Isolate the logic of IDTE vs. IPTE flushing of ptes in two functions, ptep_flush_lazy and __tlb_flush_mm_lazy. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
ptep_modify_prot_start uses the pgste_set helper to store the pgste, while ptep_modify_prot_commit uses its own pointer magic to retrieve the value again. Add the pgste_get help function to keep things symmetrical and improve readability. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
On process exit there is no more need for the pgste information. Skip the expensive storage key operations which should speed up termination of KVM processes. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Improve the encoding of the different pte types and the naming of the page, segment table and region table bits. Due to the different pte encoding the hugetlbfs primitives need to be adapted as well. To improve compatability with common code make the huge ptes use the encoding of normal ptes. The conversion between the pte and pmd encoding for a huge pte is done with set_huge_pte_at and huge_ptep_get. Overall the code is now easier to understand. Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
With the introduction of PCI it became apparent that s390 should convert to generic hardirqs as too many drivers do not have the correct dependency for GENERIC_HARDIRQS. On the architecture level s390 does not have irq lines. It has external interrupts, I/O interrupts and adapter interrupts. This patch hard-codes all external interrupts as irq #1, all I/O interrupts as irq #2 and all adapter interrupts as irq #3. The additional information from the lowcore associated with the interrupt is stored in the pt_regs of the interrupt frame, where the interrupt handler can pick it up. For PCI/MSI interrupts the adapter interrupt handler scans the relevant bit fields and calls generic_handle_irq with the virtual irq number for the MSI interrupt. Reviewed-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Make use of the adapter interrupt helpers in the PCI code. This is the first step to convert the MSI interrupt code to PCI domains. The patch removes the limitation of 64 adapter interrupts per PCI function. Reviewed-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Rename s390pci_xyz to zpci_xxz and set_irq_ctrl to zpci_set_irq_ctrl. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
The PCI code is the first user of adapter interrupts vectors. Add a set of helpers to airq.c to separate the adatper interrupt code from the PCI bits. The helpers allow for adapter interrupt vectors of any size. Reviewed-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 17 8月, 2013 1 次提交
-
-
由 Guenter Roeck 提交于
Fix this build error: In file included from fs/exec.c:61:0: arch/s390/include/asm/tlb.h:35:23: error: expected identifier or '(' before 'unsigned' arch/s390/include/asm/tlb.h:36:1: warning: no semicolon at end of struct or union [enabled by default] arch/s390/include/asm/tlb.h: In function 'tlb_gather_mmu': arch/s390/include/asm/tlb.h:57:5: error: 'struct mmu_gather' has no member named 'end' Broken due to commit 2b047252 ("Fix TLB gather virtual address range invalidation corner cases"). Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Signed-off-by: NGuenter Roeck <linux@roeck-us.net> [ Oh well. We had build testing for ppc amd um, but no s390 - Linus ] Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 8月, 2013 1 次提交
-
-
由 Linus Torvalds 提交于
Ben Tebulin reported: "Since v3.7.2 on two independent machines a very specific Git repository fails in 9/10 cases on git-fsck due to an SHA1/memory failures. This only occurs on a very specific repository and can be reproduced stably on two independent laptops. Git mailing list ran out of ideas and for me this looks like some very exotic kernel issue" and bisected the failure to the backport of commit 53a59fc6 ("mm: limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT"). That commit itself is not actually buggy, but what it does is to make it much more likely to hit the partial TLB invalidation case, since it introduces a new case in tlb_next_batch() that previously only ever happened when running out of memory. The real bug is that the TLB gather virtual memory range setup is subtly buggered. It was introduced in commit 597e1c35 ("mm/mmu_gather: enable tlb flush range in generic mmu_gather"), and the range handling was already fixed at least once in commit e6c495a9 ("mm: fix the TLB range flushed when __tlb_remove_page() runs out of slots"), but that fix was not complete. The problem with the TLB gather virtual address range is that it isn't set up by the initial tlb_gather_mmu() initialization (which didn't get the TLB range information), but it is set up ad-hoc later by the functions that actually flush the TLB. And so any such case that forgot to update the TLB range entries would potentially miss TLB invalidates. Rather than try to figure out exactly which particular ad-hoc range setup was missing (I personally suspect it's the hugetlb case in zap_huge_pmd(), which didn't have the same logic as zap_pte_range() did), this patch just gets rid of the problem at the source: make the TLB range information available to tlb_gather_mmu(), and initialize it when initializing all the other tlb gather fields. This makes the patch larger, but conceptually much simpler. And the end result is much more understandable; even if you want to play games with partial ranges when invalidating the TLB contents in chunks, now the range information is always there, and anybody who doesn't want to bother with it won't introduce subtle bugs. Ben verified that this fixes his problem. Reported-bisected-and-tested-by: NBen Tebulin <tebulin@googlemail.com> Build-testing-by: NStephen Rothwell <sfr@canb.auug.org.au> Build-testing-by: NRichard Weinberger <richard.weinberger@gmail.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 8月, 2013 1 次提交
-
-
由 Frederic Weisbecker 提交于
If the arch overrides some generic vtime APIs, let it describe these on a dedicated and standalone header. This way it becomes convenient to include it in vtime generic headers without irrelevant stuff in such a low level header. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Kevin Hilman <khilman@linaro.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
-