- 16 10月, 2015 1 次提交
-
-
由 Hendrik Brueckner 提交于
Split the API and FPU type definitions into separate header files similar to "x86/fpu: Rename fpu-internal.h to fpu/internal.h" (78f7f1e5). The new header files and their meaning are: asm/fpu/types.h: FPU related data types, needed for 'struct thread_struct' and 'struct task_struct'. asm/fpu/api.h: FPU related 'public' functions for other subsystems and device drivers. asm/fpu/internal.h: FPU internal functions mainly used to convert FPU register contents in signal handling. Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 14 10月, 2015 27 次提交
-
-
由 Christian Borntraeger 提交于
the kernel locks have aqcuire/release semantics. No operation done after the lock can be "moved" before the lock and no operation before the unlock can be moved after the unlock. But it is perfectly fine that memory accesses which happen code wise after unlock are performed within the critical section. On s390x, reads are in-order with other reads (PoP section "Storage-Operand Fetch References") and writes are in-order with other writes (PoP section "Storage-Operand Store References"). Writes are also in-order with reads to the same memory location (PoP section "Storage-Operand Store References"). To other CPUs (and the channel subsystem), reads additionally appear to be performed prior to reads or writes that happen after them in the conceptual sequence (PoP section "Relation between Operand Accesses"). So at least as observed by other CPUs and the channel subsystem, reads inside the critical sections will not happen after unlock (and writes are in-order anyway). That's exactly what we need for "RELEASE operations" (memory-barriers.txt): "It guarantees that all memory operations before the RELEASE operation will appear to happen before the RELEASE operation with respect to the other components of the system." Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Reviewed-By: NSascha Silbe <silbe@linux.vnet.ibm.com> [cross-reading and lot of improvements for the patch description] Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
The first level machine check handler for etr and stp machine checks may call queue_work() while in nmi context. This may deadlock e.g. if the machine check happened when the interrupted context did hold a lock, that also will be acquired by queue_work(). Therefore split etr and stp machine check handling into first and second level handling. The second level handling will then issue the queue_work() call in process context which avoids the potential deadlock. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Sebastian Ott 提交于
zpci_err_insn writes stale stack content to the debugfs. Ensure that the struct in zpci_err_insn is ordered in a way that we don't have uninitialized holes in it. In addition to that add the packed attribute. Fixes: 3d8258e4 (s390/pci: move debug messages to debugfs) Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
With the removal of 31 bit support a couple of defines became unused. Remove them. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
For some unknown reason the mcck_interruption_code field is defined as array of two 32 bit values. Given that this actually is a 64 bit field according to the architecture, change the type to u64. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
The defines that are used in entry.S have been partially converted to use the _BITUL macro (setup.h). This patch converts the rest. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
The cpu flags and pt_regs flags fields are each 64 bits in size. A flag can be set with helper functions like set_cpu_flags(). These functions create a mask using "1U << flag". This doesn't work if flag is larger than 31, since 1U << 32 == 0. So fix this in case we ever will have such flag numbers. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
When using systemtap it was observed that our udelay implementation is rather suboptimal if being called from a kprobe handler installed by systemtap. The problem observed when a kprobe was installed on lock_acquired(). When the probe was hit the kprobe handler did call udelay, which set up an (internal) timer and reenabled interrupts (only the clock comparator interrupt) and waited for the interrupt. This is an optimization to avoid that the cpu is busy looping while waiting that enough time passes. The problem is that the interrupt handler still does call irq_enter()/irq_exit() which then again can lead to a deadlock, since some accounting functions may take locks as well. If one of these locks is the same, which caused lock_acquired() to be called, we have a nice deadlock. This patch reworks the udelay code for the interrupts disabled case to immediately leave the low level interrupt handler when the clock comparator interrupt happens. That way no C code is being called and the deadlock cannot happen anymore. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Christian Borntraeger 提交于
The program parameter can be used to mark hardware samples with some token. Previously, it was used to mark guest samples only. Improve the program parameter doubleword by combining two parts, the leftmost LPP part and the rightmost PID part. Set the PID part for processes by using the task PID. To distinguish host and guest samples for the kernel (PID part is zero), the guest must always set the program paramater to a non-zero value. Use the leftmost bit in the LPP part of the program parameter to be able to detect guest kernel samples. [brueckner@linux.vnet.ibm.com]: Split __LC_CURRENT and introduced __LC_LPP. Corrected __LC_CURRENT users and adjusted assembler parts. And updated the commit message accordingly. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
The use of OFFSET instead of DEFINE makes the definitions in asm-offsets.c more readable. While we are at it sort the defines for struct _lowcore according to the field order and remove some unneeded defines. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Hendrik Brueckner 提交于
Various functions in entry.S perform test-under-mask instructions to test for particular bits in memory. Because test-under-mask uses a mask value of one byte, the mask value and the offset into the memory must be calculated manually. This easily introduces errors and is hard to review and read. Introduce the TSTMSK assembler macro to specify a mask constant and let the macro calculate the offset and the byte mask to generate a test-under-mask instruction. The benefit is that existing symbolic constants can now be used for tests. Also the macro checks for zero mask values and mask values that consist of multiple bytes. Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Hendrik Brueckner 提交于
Previously, the init task did not have an allocated FPU save area and saving an FPU state was not possible. Now if the vector extension is always enabled, provide a static FPU save area to save FPU states of vector instructions that can be executed quite early. Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Hendrik Brueckner 提交于
If the kernel detects that the s390 hardware supports the vector facility, it is enabled by default at an early stage. To force it off, use the novx kernel parameter. Note that there is a small time window, where the vector facility is enabled before it is forced to be off. With enabling the vector facility by default, the FPU save and restore functions can be improved. They do not longer require to manage expensive control register updates to enable or disable the vector enablement control for particular processes. Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com> Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
The call to pgste_set_key in ptep_set_access_flags can be avoided if the old pte is found to be valid at the time the new access rights are set. The function that created the old, valid pte already completed the required storage key operation. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
The principles of operation states reads are in order, writes are in order, writes can be reordered after reads, but no reads can be reordered after writes. The atomic and bitops variantes for z196 use the interlocked-access facility instructions with a memory barrier before and after the instruction. Because of the memory ordering the first barrier is unnecessary and can be removed. Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
To be able to analyse problems in regard to hypervisor overhead add a tracepoing for diagnose calls. It reports the number of the diagnose issued, e.g. sshd-1385 [002] .... 42.701431: diagnose: nr=0x9c <idle>-0 [001] ..s. 43.587528: diagnose: nr=0x9c Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Introduce /sys/debug/kernel/diag_stat with a statistic how many diagnose calls have been done by each CPU in the system. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
The generic implementation for test_and_set_bit_lock in include/asm-generic uses the standard test_and_set_bit operation. This is done with either a 'csg' or a 'loag' instruction. For both version the cache line is fetched exclusively, even if the bit is already set. The result is an increase in cache traffic, for a contented lock this is a bad idea. Acked-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
Use bit 2**1 of the pte and bit 2**14 of the pmd for the soft dirty bit. The fault mechanism to do dirty tracking is already in place. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Sebastian Ott 提交于
We often need to correlate an 8 bit path mask with the position in a channel path array. Introduce and use pathmask_to_pos for that task. Reviewed-by: NPeter Oberparleiter <oberpar@linux.vnet.ibm.com> Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Sebastian Ott 提交于
Hold the device_lock during [de]activation of the channel measurement block to synchronize concurrent usage of these functions. Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Alexander Kuleshov 提交于
The <linux/memblock.h> already provides for_each_mem_range() macro that iterates through memblock areas from type_a and not included in type_b. We can remove custom for_each_dump_mem_range() macro and use the for_each_mem_range() instead. Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
由 Christian Borntraeger 提交于
The principles of operation says: The storage-operand fetch references of one instruction occur after those of all preceding instructions and before those of subsequent instructions, as observed by other CPUs and by channel programs. [...] The CPU may fetch the operands of instructions before the instructions are executed. [...] The CPU may delay placing results in storage. [...] the results of one instruction are placed in storage after the results of all preceding instructions have been placed in storage and before any results of the succeeding instructions are stored, as observed by other CPUs and by the channel subsystem. which boils down to: - reads are in order - writes are in order - reads can happen earlier - writes can happen later By definition (see memory-barrier.txt) read barriers orders reads vs reads and write barriers orders writes agains writes. but neither of these orders reads vs. writes. That means we can implement smp_wmb,smp_rmb,wmb and rmb as simple compiler barriers. To avoid reviewing all driver code for correct barrier usage we keep dma_[rw]mb as serialization for now. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Christian Borntraeger 提交于
By definition smp_wmb only orders writes against writes. (Finish all previous writes, and do not start any future write). To protect the vdso init code against early reads on other CPUs, let's use a full smp_mb at the end of vdso init. As right now smp_wmb is implemented as full serialization, this needs no stable backport, but this change will be necessary if we reimplement smp_wmb. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Christian Borntraeger 提交于
_raw_write_lock_wait first sets the high order bit to indicate a pending writer and then waits for the reader to drop to zero. smp_rmb by definition only orders reads against reads. Let's use a full smp_mb instead. As right now smp_rmb is implemented as full serialization, this needs no stable backport, but this patch will be necessary if we reimplement smp_rmb. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Michael Holzheu 提交于
Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 01 10月, 2015 1 次提交
-
-
由 Sebastian Ott 提交于
Fix this warning: arch/s390/configs/performance_defconfig:380:warning: symbol value 'm' invalid for SCSI_DH Introduced via 086b91d0 (scsi_dh: integrate into the core SCSI code) Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 30 9月, 2015 1 次提交
-
-
由 Martin Schwidefsky 提交于
The calculation for the SMT scaling factor for a hardware thread which has been partially idle needs to disregard the cycles spent by the other threads of the core while the thread is idle. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 29 9月, 2015 1 次提交
-
-
由 Christian Borntraeger 提交于
my gcc 5.1 used an ldgr instruction with a register != 0,2,4,6 for spilling/filling into a floating point register in our decompressor. This will cause an AFP-register data exception as the decompressor did not setup the additional floating point registers via cr0. That causes a program check loop that looked like a hang with one "Uncompressing Linux... " message (directly booted via kvm) or a loop of "Uncompressing Linux... " messages (when booted via zipl boot loader). The offending code in my build was 48e400: e3 c0 af ff ff 71 lay %r12,-1(%r10) -->48e406: b3 c1 00 1c ldgr %f1,%r12 48e40a: ec 6c 01 22 02 7f clij %r6,2,12,0x48e64e but gcc could do spilling into an fpr at any function. We can simply disable floating point support at that early stage. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: stable@vger.kernel.org
-
- 25 9月, 2015 1 次提交
-
-
由 David Hildenbrand 提交于
We observed some performance degradation on s390x with dynamic halt polling. Until we can provide a proper fix, let's enable halt_poll_ns as default only for supported architectures. Architectures are now free to set their own halt_poll_ns default value. Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 9月, 2015 1 次提交
-
-
由 Martin Schwidefsky 提交于
With CONFIG_CPUMASK_OFFSTACK=y cpumask_var_t is a pointer to a CPU mask. Replace the incorrect type for node_to_cpumask_map with cpumask_t. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 18 9月, 2015 3 次提交
-
-
由 Heiko Carstens 提交于
As discussed on linux-arch all architectures should wire up the separate system calls that are hidden behind the socketcall multiplexer system call. It's just a couple more system calls and gives us a very small performance improvement. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
A couple of compat wrapper functions are simply trampolines to the real system call. This happened because the compat wrapper defines will only sign and zero extend system call parameters which are of different size on s390/s390x (longs and pointers). All other parameters will be correctly sign and zero extended by the normal system call wrappers. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Add notrace to the compat wrapper define to disable tracing of compat wrapper functions. These are supposed to be very small and more or less just a trampoline to the real system call. Also fix indentation. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 17 9月, 2015 4 次提交
-
-
由 Mathieu Desnoyers 提交于
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: linux-api@vger.kernel.org CC: Heiko Carstens <heiko.carstens@de.ibm.com> CC: linux-s390@vger.kernel.org Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Michael Holzheu 提交于
This config option is completely irrelevant for zfcpdump and unfortunately causes a kernel panic on recent kernels in "mspro_block_init()/driver_register()". Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Heiko Carstens 提交于
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Martin Schwidefsky 提交于
The scaled cputime is supposed to be derived from the normal per-thread cputime by dividing it with the average thread density in the last interval. The calculation of the scaling values for the average thread density is incorrect. The current, incorrect calculation: Ci = cycle count with i active threads T = unscaled cputime, sT = scaled cputime sT = T * (C1 + C2 + ... + Cn) / (1*C1 + 2*C2 + ... + n*Cn) The calculation happens to yield the correct numbers for the simple cases with only one Ci value not zero. But for cases with multiple Ci values not zero it fails. E.g. on a SMT-2 system with one thread active half the time and two threads active for the other half of the time it fails, the scaling factor should be 3/4 but the formula gives 2/3. The correct formula is sT = T * (C1/1 + C2/2 + ... + Cn/n) / (C1 + C2 + ... + Cn) Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-