提交 · b0753902d42f5cb01c33f0dec47ba2aa7ecfbb3f · openanolis / cloud-kernel

16 10月, 2015 1 次提交

s390/fpu: split fpu-internal.h into fpu internals, api, and type headers · b0753902

由 Hendrik Brueckner 提交于 10月 06, 2015

Split the API and FPU type definitions into separate header files
similar to "x86/fpu: Rename fpu-internal.h to fpu/internal.h" (78f7f1e5).

The new header files and their meaning are:

asm/fpu/types.h:
	FPU related data types, needed for 'struct thread_struct' and
	'struct task_struct'.

asm/fpu/api.h:
	FPU related 'public' functions for other subsystems and device
	drivers.

asm/fpu/internal.h:
	FPU internal functions mainly used to convert
	FPU register contents in signal handling.
Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

b0753902

14 10月, 2015 27 次提交

s390/spinlock: remove unneeded serializations at unlock · fdbbe8e7

由 Christian Borntraeger 提交于 10月 09, 2015

the kernel locks have aqcuire/release semantics. No operation done
after the lock can be "moved" before the lock and no operation before
the unlock can be moved after the unlock. But it is perfectly fine
that memory accesses which happen code wise after unlock are performed
within the critical section.
On s390x, reads are in-order with other reads (PoP section
"Storage-Operand Fetch References") and writes are in-order with
other writes (PoP section "Storage-Operand Store References"). Writes
are also in-order with reads to the same memory location (PoP section
"Storage-Operand Store References"). To other CPUs (and the channel
subsystem), reads additionally appear to be performed prior to reads or
writes that happen after them in the conceptual sequence (PoP section
"Relation between Operand Accesses").
So at least as observed by other CPUs and the channel subsystem, reads
inside the critical sections will not happen after unlock (and writes
are in-order anyway). That's exactly what we need for "RELEASE
operations" (memory-barriers.txt): "It guarantees that all memory
operations before the RELEASE operation will appear to happen before the
RELEASE operation with respect to the other components of the system."
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Reviewed-By: NSascha Silbe <silbe@linux.vnet.ibm.com>
[cross-reading and lot of improvements for the patch description]
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

fdbbe8e7

s390/etr,stp: fix possible deadlock on machine check · 29b0a825

由 Heiko Carstens 提交于 10月 09, 2015

The first level machine check handler for etr and stp machine checks may
call queue_work() while in nmi context. This may deadlock e.g. if the
machine check happened when the interrupted context did hold a lock, that
also will be acquired by queue_work().
Therefore split etr and stp machine check handling into first and second
level handling. The second level handling will then issue the queue_work()
call in process context which avoids the potential deadlock.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

29b0a825

s390/pci: reshuffle struct used to write debug data · 7cc8944e

由 Sebastian Ott 提交于 10月 09, 2015

zpci_err_insn writes stale stack content to the debugfs.

Ensure that the struct in zpci_err_insn is ordered in a way that
we don't have uninitialized holes in it. In addition to that
add the packed attribute.

Fixes: 3d8258e4 (s390/pci: move debug messages to debugfs)
Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

7cc8944e

s390/bitops: remove 31 bit related comments · 48002bd5

由 Heiko Carstens 提交于 10月 08, 2015

Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

48002bd5

s390/cmpxchg: remove dead code · e4165dcb

由 Heiko Carstens 提交于 10月 08, 2015

With the removal of 31 bit support a couple of defines became unused.
Remove them.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

e4165dcb

s390/nmi: change type of mcck_interruption_code lowcore field · 004f0bba

由 Heiko Carstens 提交于 10月 06, 2015

For some unknown reason the mcck_interruption_code field is defined
as array of two 32 bit values. Given that this actually is a 64 bit
field according to the architecture, change the type to u64.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

004f0bba

s390/flags: use _BITUL macro · 92778b99

由 Heiko Carstens 提交于 10月 06, 2015

The defines that are used in entry.S have been partially converted to
use the _BITUL macro (setup.h). This patch converts the rest.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

92778b99

s390/flags: fix flag handling · ac25e790

由 Heiko Carstens 提交于 10月 06, 2015

The cpu flags and pt_regs flags fields are each 64 bits in size. A flag can
be set with helper functions like set_cpu_flags().

These functions create a mask using "1U << flag". This doesn't work if flag
is larger than 31, since 1U << 32 == 0.

So fix this in case we ever will have such flag numbers.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

ac25e790

s390/udelay: make udelay have busy loop semantics · db7e007f

由 Heiko Carstens 提交于 8月 15, 2015

When using systemtap it was observed that our udelay implementation is
rather suboptimal if being called from a kprobe handler installed by
systemtap.

The problem observed when a kprobe was installed on lock_acquired().
When the probe was hit the kprobe handler did call udelay, which set
up an (internal) timer and reenabled interrupts (only the clock comparator
interrupt) and waited for the interrupt.
This is an optimization to avoid that the cpu is busy looping while waiting
that enough time passes. The problem is that the interrupt handler still
does call irq_enter()/irq_exit() which then again can lead to a deadlock,
since some accounting functions may take locks as well.

If one of these locks is the same, which caused lock_acquired() to be
called, we have a nice deadlock.

This patch reworks the udelay code for the interrupts disabled case to
immediately leave the low level interrupt handler when the clock
comparator interrupt happens. That way no C code is being called and the
deadlock cannot happen anymore.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

db7e007f

s390/cpumf: rework program parameter setting to detect guest samples · e22cf8ca

由 Christian Borntraeger 提交于 10月 06, 2015

The program parameter can be used to mark hardware samples with
some token.  Previously, it was used to mark guest samples only.

Improve the program parameter doubleword by combining two parts,
the leftmost LPP part and the rightmost PID part.  Set the PID
part for processes by using the task PID.
To distinguish host and guest samples for the kernel (PID part
is zero), the guest must always set the program paramater to a
non-zero value.  Use the leftmost bit in the LPP part of the
program parameter to be able to detect guest kernel samples.

[brueckner@linux.vnet.ibm.com]: Split __LC_CURRENT and introduced
__LC_LPP. Corrected __LC_CURRENT users and adjusted assembler parts.
And updated the commit message accordingly.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

e22cf8ca

s390/asm: make use of the OFFSET macro to define assember constants · 6a62b485

由 Martin Schwidefsky 提交于 10月 05, 2015

The use of OFFSET instead of DEFINE makes the definitions in asm-offsets.c
more readable. While we are at it sort the defines for struct _lowcore
according to the field order and remove some unneeded defines.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

6a62b485

s390/entry: add assembler macro to conveniently tests under mask · 83abeffb

由 Hendrik Brueckner 提交于 10月 01, 2015

Various functions in entry.S perform test-under-mask instructions
to test for particular bits in memory. Because test-under-mask uses
a mask value of one byte, the mask value and the offset into the
memory must be calculated manually. This easily introduces errors
and is hard to review and read.

Introduce the TSTMSK assembler macro to specify a mask constant and
let the macro calculate the offset and the byte mask to generate a
test-under-mask instruction. The benefit is that existing symbolic
constants can now be used for tests. Also the macro checks for
zero mask values and mask values that consist of multiple bytes.
Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

83abeffb

s390/fpu: add static FPU save area for init_task · 0ac27779

由 Hendrik Brueckner 提交于 9月 29, 2015

Previously, the init task did not have an allocated FPU save area and
saving an FPU state was not possible. Now if the vector extension is
always enabled, provide a static FPU save area to save FPU states of
vector instructions that can be executed quite early.
Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

0ac27779

s390/fpu: always enable the vector facility if it is available · b5510d9b

由 Hendrik Brueckner 提交于 9月 29, 2015

If the kernel detects that the s390 hardware supports the vector
facility, it is enabled by default at an early stage.  To force
it off, use the novx kernel parameter.  Note that there is a small
time window, where the vector facility is enabled before it is
forced to be off.

With enabling the vector facility by default, the FPU save and
restore functions can be improved.  They do not longer require
to manage expensive control register updates to enable or disable
the vector enablement control for particular processes.
Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

b5510d9b

s390/mm: try to avoid storage key operation in ptep_set_access_flags · 395e6aa1

由 Martin Schwidefsky 提交于 9月 29, 2015

The call to pgste_set_key in ptep_set_access_flags can be avoided
if the old pte is found to be valid at the time the new access
rights are set. The function that created the old, valid pte already
completed the required storage key operation.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

395e6aa1

s390/barrier: remove unnecessary serialization in atomics and bitops · 5da7667c

由 Martin Schwidefsky 提交于 9月 28, 2015

The principles of operation states reads are in order, writes are in
order, writes can be reordered after reads, but no reads can be
reordered after writes.

The atomic and bitops variantes for z196 use the interlocked-access
facility instructions with a memory barrier before and after the
instruction. Because of the memory ordering the first barrier is
unnecessary and can be removed.
Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

5da7667c

s390/diag: add tracepoint for diagnose calls · b5a6b71b

由 Martin Schwidefsky 提交于 8月 21, 2015

To be able to analyse problems in regard to hypervisor overhead
add a tracepoing for diagnose calls. It reports the number of
the diagnose issued, e.g.

            sshd-1385  [002] ....    42.701431: diagnose: nr=0x9c
          <idle>-0     [001] ..s.    43.587528: diagnose: nr=0x9c
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

b5a6b71b

s390/diag: add a statistic for diagnose calls · 1ec2772e

由 Martin Schwidefsky 提交于 8月 20, 2015

Introduce /sys/debug/kernel/diag_stat with a statistic how many diagnose
calls have been done by each CPU in the system.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

1ec2772e

s390/bitops: implement cache friendly test_and_set_bit_lock · acdc9fc9

由 Martin Schwidefsky 提交于 8月 20, 2015

The generic implementation for test_and_set_bit_lock in include/asm-generic
uses the standard test_and_set_bit operation. This is done with either a
'csg' or a 'loag' instruction. For both version the cache line is fetched
exclusively, even if the bit is already set. The result is an increase in
cache traffic, for a contented lock this is a bad idea.
Acked-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

acdc9fc9

s390/mm: implement soft-dirty bits for user memory change tracking · 5614dd92

由 Martin Schwidefsky 提交于 4月 22, 2015

Use bit 2**1 of the pte and bit 2**14 of the pmd for the soft dirty
bit. The fault mechanism to do dirty tracking is already in place.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

5614dd92

s390/cio: introduce pathmask_to_pos · 9d49f86d

由 Sebastian Ott 提交于 9月 21, 2015

We often need to correlate an 8 bit path mask with the position
in a channel path array. Introduce and use pathmask_to_pos for
that task.
Reviewed-by: NPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

9d49f86d

s390/cio: use device_lock during cmb activation · 1bc6664b

由 Sebastian Ott 提交于 9月 15, 2015

Hold the device_lock during [de]activation of the channel measurement
block to synchronize concurrent usage of these functions.
Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

1bc6664b

s390/crash_dump: use for_each_mem_range · 3c4aac86

由 Alexander Kuleshov 提交于 9月 16, 2015

The <linux/memblock.h> already provides for_each_mem_range() macro that
iterates through memblock areas from type_a and not included in type_b.
We can remove custom for_each_dump_mem_range() macro and use the
for_each_mem_range() instead.
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>

3c4aac86

s390/barrier: avoid serialization in [smp_]rmb and [smp_]wmb · 1afc82ae

由 Christian Borntraeger 提交于 9月 11, 2015

The principles of operation says:

The storage-operand fetch references of one instruction
occur after those of all preceding instructions and
before those of subsequent instructions, as observed
by other CPUs and by channel programs.
[...]
The CPU may fetch the operands of instructions before the
instructions are executed.
[...]
The CPU may delay placing results in storage.
[...]
the results of one instruction are placed in storage after
the results of all preceding instructions have been placed
in storage and before any results of the succeeding
instructions are stored, as observed by other CPUs and by
the channel subsystem.

which boils down to:
- reads are in order
- writes are in order
- reads can happen earlier
- writes can happen later

By definition (see memory-barrier.txt) read barriers orders
reads vs reads and write barriers orders writes agains writes.
but neither of these orders reads vs. writes.

That means we can implement smp_wmb,smp_rmb,wmb and rmb as
simple compiler barriers. To avoid reviewing all driver code
for correct barrier usage we keep dma_[rw]mb as serialization
for now.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

1afc82ae

s390/vdso: use correct memory barrier · 33b5881d

由 Christian Borntraeger 提交于 9月 11, 2015

By definition smp_wmb only orders writes against writes. (Finish all
previous writes, and do not start any future write). To protect the
vdso init code against early reads on other CPUs, let's use a full
smp_mb at the end of vdso init. As right now smp_wmb is implemented
as full serialization, this needs no stable backport, but this change
will be necessary if we reimplement smp_wmb.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

33b5881d

s390/spinlock: use correct barriers · e0af21c5

由 Christian Borntraeger 提交于 9月 11, 2015

_raw_write_lock_wait first sets the high order bit to indicate a
pending writer and then waits for the reader to drop to zero.
smp_rmb by definition only orders reads against reads. Let's use
a full smp_mb instead. As right now smp_rmb is implemented
as full serialization, this needs no stable backport, but this
patch will be necessary if we reimplement smp_rmb.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

e0af21c5

s390/numa: write kernel message when emu_size has been increased · b02064a9

由 Michael Holzheu 提交于 9月 03, 2015

Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

b02064a9

01 10月, 2015 1 次提交

s390/defconfig: set SCSI_DH=y · daad0bf1

由 Sebastian Ott 提交于 9月 30, 2015

Fix this warning:
arch/s390/configs/performance_defconfig:380:warning: symbol value 'm' invalid for SCSI_DH

Introduced via 086b91d0
(scsi_dh: integrate into the core SCSI code)
Signed-off-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

daad0bf1

30 9月, 2015 1 次提交

s390/vtime: correct scaled cputime of partially idle CPUs · 72d38b19

由 Martin Schwidefsky 提交于 9月 18, 2015

The calculation for the SMT scaling factor for a hardware thread
which has been partially idle needs to disregard the cycles spent
by the other threads of the core while the thread is idle.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

72d38b19

29 9月, 2015 1 次提交

s390/boot/decompression: disable floating point in decompressor · adc0b7fb

由 Christian Borntraeger 提交于 9月 28, 2015

my gcc 5.1 used an ldgr instruction with a register != 0,2,4,6 for
spilling/filling into a floating point register in our decompressor.

This will cause an AFP-register data exception as the decompressor
did not setup the additional floating point registers via cr0.
That causes a program check loop that looked like a hang with
one "Uncompressing Linux... " message (directly booted via kvm)
or a loop of "Uncompressing Linux... " messages (when booted via
zipl boot loader).

The offending code in my build was

   48e400:       e3 c0 af ff ff 71       lay     %r12,-1(%r10)
-->48e406:       b3 c1 00 1c             ldgr    %f1,%r12
   48e40a:       ec 6c 01 22 02 7f       clij    %r6,2,12,0x48e64e

but gcc could do spilling into an fpr at any function. We can
simply disable floating point support at that early stage.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Cc: stable@vger.kernel.org

adc0b7fb

25 9月, 2015 1 次提交

KVM: disable halt_poll_ns as default for s390x · 920552b2

由 David Hildenbrand 提交于 9月 18, 2015

We observed some performance degradation on s390x with dynamic
halt polling. Until we can provide a proper fix, let's enable
halt_poll_ns as default only for supported architectures.

Architectures are now free to set their own halt_poll_ns
default value.
Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

920552b2

23 9月, 2015 1 次提交

s390/numa: use correct type for node_to_cpumask_map · 22be9cd9

由 Martin Schwidefsky 提交于 9月 22, 2015

With CONFIG_CPUMASK_OFFSTACK=y cpumask_var_t is a pointer to a CPU mask.
Replace the incorrect type for node_to_cpumask_map with cpumask_t.
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

22be9cd9

18 9月, 2015 3 次提交

s390: wire up separate socketcalls system calls · 977108f8

由 Heiko Carstens 提交于 9月 17, 2015

As discussed on linux-arch all architectures should wire up the separate
system calls that are hidden behind the socketcall multiplexer system call.

It's just a couple more system calls and gives us a very small performance
improvement.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

977108f8

s390/compat: remove superfluous compat wrappers · 7681df45

由 Heiko Carstens 提交于 9月 17, 2015

A couple of compat wrapper functions are simply trampolines to the real
system call. This happened because the compat wrapper defines will only
sign and zero extend system call parameters which are of different size
on s390/s390x (longs and pointers).
All other parameters will be correctly sign and zero extended by the
normal system call wrappers.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

7681df45

s390/compat: do not trace compat wrapper functions · a55b2ae7

由 Heiko Carstens 提交于 9月 17, 2015

Add notrace to the compat wrapper define to disable tracing of compat
wrapper functions. These are supposed to be very small and more or less
just a trampoline to the real system call.

Also fix indentation.
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

a55b2ae7

17 9月, 2015 4 次提交

s390/s390x: allocate sys_membarrier system call number · 96d4ee8a

由 Mathieu Desnoyers 提交于 9月 07, 2015

Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: linux-api@vger.kernel.org
CC: Heiko Carstens <heiko.carstens@de.ibm.com>
CC: linux-s390@vger.kernel.org
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

96d4ee8a

s390/configs//zfcpdump_defconfig: Remove CONFIG_MEMSTICK · 3dc636b2

由 Michael Holzheu 提交于 9月 15, 2015

This config option is completely irrelevant for zfcpdump and
unfortunately causes a kernel panic on recent kernels in
"mspro_block_init()/driver_register()".
Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

3dc636b2

s390: wire up userfaultfd system call · 02243571

由 Heiko Carstens 提交于 9月 09, 2015

Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

02243571

s390/vtime: correct scaled cputime for SMT · 61cc3790

由 Martin Schwidefsky 提交于 9月 10, 2015

The scaled cputime is supposed to be derived from the normal per-thread
cputime by dividing it with the average thread density in the last interval.

The calculation of the scaling values for the average thread density is
incorrect. The current, incorrect calculation:

    Ci = cycle count with i active threads
    T = unscaled cputime, sT = scaled cputime
    sT = T * (C1 + C2 + ... + Cn) / (1*C1 + 2*C2 + ... + n*Cn)

The calculation happens to yield the correct numbers for the simple cases
with only one Ci value not zero. But for cases with multiple Ci values not
zero it fails. E.g. on a SMT-2 system with one thread active half the time
and two threads active for the other half of the time it fails, the scaling
factor should be 3/4 but the formula gives 2/3.

The correct formula is

    sT = T * (C1/1 + C2/2 + ... + Cn/n) / (C1 + C2 + ... + Cn)
Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>

61cc3790

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功