- 14 August 2014, 12 commits
-
Committed by Peter Zijlstra

Many of the atomic op implementations are the same except for one instruction; fold the lot into a few CPP macros and reduce LoC. This also prepares for easy addition of new ops.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: James Hogan <james.hogan@imgtec.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-metag@vger.kernel.org
Link: http://lkml.kernel.org/r/20140508135852.453864110@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
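The macro-folding pattern this series introduces can be illustrated in plain C. This is only a hedged userspace sketch built on GCC's __atomic builtins rather than any architecture's assembly; the macro names mirror the series, everything else here is illustrative.

    typedef struct { int counter; } atomic_t;

    /* Each architecture writes the per-op body once, then instantiates it
     * for add, sub, and, or, xor with one line per op. */
    #define ATOMIC_OP(op)                                                   \
    static inline void atomic_##op(int i, atomic_t *v)                      \
    {                                                                       \
            (void)__atomic_fetch_##op(&v->counter, i, __ATOMIC_RELAXED);    \
    }

    #define ATOMIC_OP_RETURN(op)                                            \
    static inline int atomic_##op##_return(int i, atomic_t *v)              \
    {                                                                       \
            return __atomic_##op##_fetch(&v->counter, i, __ATOMIC_SEQ_CST); \
    }

    ATOMIC_OP(add)
    ATOMIC_OP_RETURN(add)
    ATOMIC_OP(sub)
    ATOMIC_OP_RETURN(sub)
    ATOMIC_OP(xor)          /* a new op is now a single instantiation line */

With the body shared, adding another operation is one line per variant instead of another copy of the loop, which is the "easy addition of new ops" the message refers to.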
-
Committed by Peter Zijlstra

Many of the atomic op implementations are the same except for one instruction; fold the lot into a few CPP macros and reduce LoC. This also prepares for easy addition of new ops. Requires asm_op due to eor.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: linux-m68k@lists.linux-m68k.org
Link: http://lkml.kernel.org/r/20140509091646.GO30445@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra

Many of the atomic op implementations are the same except for one instruction; fold the lot into a few CPP macros and reduce LoC. This also prepares for easy addition of new ops.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: linux-m32r-ja@ml.linux-m32r.org
Cc: linux-m32r@ml.linux-m32r.org
Link: http://lkml.kernel.org/r/20140508135852.318635136@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra

Many of the atomic op implementations are the same except for one instruction; fold the lot into a few CPP macros and reduce LoC. This also prepares for easy addition of new ops.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Link: http://lkml.kernel.org/r/20140508135852.245224472@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra

OK, no LoC saved in this case because the !return variants were defined in terms of the return ops. Still do it because this also prepares for easy addition of new ops.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Richard Kuo <rkuo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-hexagon@vger.kernel.org
Link: http://lkml.kernel.org/r/20140508135852.171567636@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra

Many of the atomic op implementations are the same except for one instruction; fold the lot into a few CPP macros and reduce LoC. This also prepares for easy addition of new ops.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: linux-cris-kernel@axis.com
Link: http://lkml.kernel.org/r/20140508135852.104572724@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra

Many of the atomic op implementations are the same except for one instruction; fold the lot into a few CPP macros and reduce LoC. This also prepares for easy addition of new ops. Requires the asm_op because of eor. AVR32 is a bit special in that its ADD/SUB instructions are not symmetric: its SUB instruction allows for a 21-bit immediate.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20140531141445.GD16155@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra

Many of the atomic op implementations are the same except for one instruction; fold the lot into a few CPP macros and reduce LoC. This also prepares for easy addition of new ops. Requires the asm_op due to eor.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Gang <gang.chen@asianux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20140508135851.995123148@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra

Many of the atomic op implementations are the same except for one instruction; fold the lot into a few CPP macros and reduce LoC. This also prepares for easy addition of new ops. Requires the asm_op because of eor.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Chen Gang <gang.chen@asianux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Albin Tonnerre <albin.tonnerre@arm.com>
Cc: Victor Kamensky <victor.kamensky@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20140508135851.939725247@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra

Many of the atomic op implementations are the same except for one instruction; fold the lot into a few CPP macros and reduce LoC. This also prepares for easy addition of new ops.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Link: http://lkml.kernel.org/r/20140508135851.886055622@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra

Many of the atomic op implementations are the same except for one instruction; fold the lot into a few CPP macros and reduce LoC. This also prepares for easy addition of new ops.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: linux-alpha@vger.kernel.org
Link: http://lkml.kernel.org/r/20140508135851.832107183@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Peter Zijlstra

There are no users, kill it.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20140508135851.768177189@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 03 August 2014, 1 commit
-
Committed by Dan Carpenter

I don't know if we really need 64 bits here, but these variables are declared as u64 and it can't hurt to cast this so we prevent any shift wrapping.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Aubrey Li <aubrey.li@linux.intel.com>
Link: http://lkml.kernel.org/r/20140801082715.GE28869@mwanda
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
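The class of bug being defended against looks like this in isolation (the function name and the shift amount are made up for illustration, not taken from the patched code):

    #include <stdint.h>

    /* If the operand is a 32-bit type, the shift is evaluated in 32 bits and
     * can wrap before the result is widened to 64 bits; casting first keeps
     * all 64 bits. */
    uint64_t mb_to_bytes_buggy(uint32_t mb) { return mb << 20; }           /* may wrap */
    uint64_t mb_to_bytes_fixed(uint32_t mb) { return (uint64_t)mb << 20; } /* safe     */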
-
- 02 August 2014, 2 commits
-
Committed by Omar Sandoval

The kgdb breakpoint hooks (kgdb_brk_fn and kgdb_compiled_brk_fn) should only be entered when a kgdb break instruction is executed from the kernel. Otherwise, if kgdb is enabled, a userspace program can cause the kernel to drop into the debugger by executing either KGDB_BREAKINST or KGDB_COMPILED_BREAK.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Committed by Russell King

Add a note about the usage of the identity mapping; we do not support accesses outside of the identity map region and kernel image while a CPU is using the identity map. This is because the identity mapping may overwrite vmalloc space, IO mappings, the vectors pages, etc.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
- 01 August 2014, 1 commit
-
Committed by Mark Rutland

Due to a missing newline in the I-cache policy detection log output, it's possible to get some rather unfortunate output at boot time:

    CPU1: Booted secondary processor
    Detected VIPT I-cache on CPU1CPU2: Booted secondary processor
    Detected VIPT I-cache on CPU2CPU3: Booted secondary processor
    Detected VIPT I-cache on CPU3CPU4: Booted secondary processor
    Detected PIPT I-cache on CPU4CPU5: Booted secondary processor
    Detected PIPT I-cache on CPU5Brought up 6 CPUs
    SMP: Total of 6 processors activated.

This patch adds the missing newline to the format string, cleaning up the output.

Fixes: 59ccc0d4 ("arm64: cachetype: report weakest cache policy")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
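In isolation the bug class is just a format string whose last character should be '\n'; a small userspace illustration (the strings are modelled on the log above, the function is hypothetical):

    #include <stdio.h>

    static void report_icache(int cpu, const char *policy)
    {
            /* Without the trailing \n, the next boot message is appended to
             * this one, producing "...on CPU1CPU2: Booted secondary processor". */
            printf("Detected %s I-cache on CPU%d\n", policy, cpu);
    }

    int main(void)
    {
            report_icache(1, "VIPT");
            printf("CPU2: Booted secondary processor\n");
            return 0;
    }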
-
- 31 July 2014, 13 commits
-
Committed by Dave Hansen

This has been run through Intel's LKP tests across a wide range of modern systems and workloads and it wasn't shown to make a measurable performance difference, positive or negative.

Now that we have some shiny new tracepoints, we can actually figure out what the heck is going on. During a kernel compile, 60% of the flush_tlb_mm_range() calls are for a single page. It breaks down like this:

    size     percent  percent<=
    GLOBAL:  2.20%  2.20%  avg cycles: 2283
    1:      56.92% 59.12%  avg cycles: 1276
    2:      13.78% 72.90%  avg cycles: 1505
    3:       8.26% 81.16%  avg cycles: 1880
    4:       7.41% 88.58%  avg cycles: 2447
    5:       1.73% 90.31%  avg cycles: 2358
    6:       1.32% 91.63%  avg cycles: 2563
    7:       1.14% 92.77%  avg cycles: 2862
    8:       0.62% 93.39%  avg cycles: 3542
    9:       0.08% 93.47%  avg cycles: 3289
    10:      0.43% 93.90%  avg cycles: 3570
    11:      0.20% 94.10%  avg cycles: 3767
    12:      0.08% 94.18%  avg cycles: 3996
    13:      0.03% 94.20%  avg cycles: 4077
    14:      0.02% 94.23%  avg cycles: 4836
    15:      0.04% 94.26%  avg cycles: 5699
    16:      0.06% 94.32%  avg cycles: 5041
    17:      0.57% 94.89%  avg cycles: 5473
    18:      0.02% 94.91%  avg cycles: 5396
    19:      0.03% 94.95%  avg cycles: 5296
    20:      0.02% 94.96%  avg cycles: 6749
    21:      0.18% 95.14%  avg cycles: 6225
    22:      0.01% 95.15%  avg cycles: 6393
    23:      0.01% 95.16%  avg cycles: 6861
    24:      0.12% 95.28%  avg cycles: 6912
    25:      0.05% 95.32%  avg cycles: 7190
    26:      0.01% 95.33%  avg cycles: 7793
    27:      0.01% 95.34%  avg cycles: 7833
    28:      0.01% 95.35%  avg cycles: 8253
    29:      0.08% 95.42%  avg cycles: 8024
    30:      0.03% 95.45%  avg cycles: 9670
    31:      0.01% 95.46%  avg cycles: 8949
    32:      0.01% 95.46%  avg cycles: 9350
    33:      3.11% 98.57%  avg cycles: 8534
    34:      0.02% 98.60%  avg cycles: 10977
    35:      0.02% 98.62%  avg cycles: 11400

We get into diminishing returns pretty quickly. On pre-IvyBridge CPUs, we used to set the limit at 8 pages, and it was set at 128 on IvyBridge. That 128 number looks pretty silly considering that less than 0.5% of the flushes are that large.

The previous code tried to size this number based on the size of the TLB. Good idea, but it's error-prone, needs maintenance (which it didn't get up to now), and probably would not matter much in practice. Setting it to 33 means that we cover the mallopt M_TRIM_THRESHOLD, which is the most universally common size to do flushes.

That's the short version. Here's the long one for why I chose 33:

1. These numbers have a constant bias in the timestamps from the tracing. Probably counts for a couple hundred cycles in each of these tests, but it should be fairly _even_ across all of them. The smallest delta between the tracepoints I have ever seen is 335 cycles. This is one reason the cycles/page cost goes down in general as the flushes get larger. The true cost is nearer to 100 cycles.
2. A full flush is more expensive than a single invlpg, but not by much (single percentages).
3. A dtlb miss is 17.1ns (~45 cycles) and an itlb miss is 13.0ns (~34 cycles). At those rates, refilling the 512-entry dTLB takes 22,000 cycles.
4. 22,000 cycles is approximately the equivalent of doing 85 invlpg operations. But the odds are that the TLB can actually be filled up faster than that, because TLB misses that are close in time also tend to leverage the same caches.
6. ~98% of flushes are <=33 pages. There are a lot of flushes of 33 pages, probably because libc's M_TRIM_THRESHOLD is set to 128k (32 pages).
7. I've found no consistent data to support changing the IvyBridge vs. SandyBridge tunable by a factor of 16.

I used the performance counters on this hardware (IvyBridge i5-3320M) to figure out the tlb miss costs:

    ocperf.py stat -e dtlb_load_misses.walk_duration,dtlb_load_misses.walk_completed,dtlb_store_misses.walk_duration,dtlb_store_misses.walk_completed,itlb_misses.walk_duration,itlb_misses.walk_completed,itlb.itlb_flush

    7,720,030,970 dtlb_load_misses_walk_duration    [57.13%]
    169,856,353   dtlb_load_misses_walk_completed   [57.15%]
    708,832,859   dtlb_store_misses_walk_duration   [57.17%]
    19,346,823    dtlb_store_misses_walk_completed  [57.17%]
    2,779,687,402 itlb_misses_walk_duration         [57.15%]
    82,241,148    itlb_misses_walk_completed        [57.13%]
    770,717       itlb_itlb_flush                   [57.11%]

These show that a dtlb miss is 17.1ns (~45 cycles) and an itlb miss is 13.0ns (~34 cycles). At those rates, refilling the 512-entry dTLB takes 22,000 cycles. On a SandyBridge system with more cores and larger caches, those are dtlb=13.4ns and itlb=9.5ns.

    cat perf.stat.txt | perl -pe 's/,//g' | awk '/itlb_misses_walk_duration/ { icyc+=$1 } /itlb_misses_walk_completed/ { imiss+=$1 } /dtlb_.*_walk_duration/ { dcyc+=$1 } /dtlb_.*.*completed/ { dmiss+=$1 } END {print "itlb cyc/miss: ", icyc/imiss, " dtlb cyc/miss: ", dcyc/dmiss, " ----- ", icyc,imiss, dcyc,dmiss }'

On Westmere CPUs, the counters to use are: itlb_flush,itlb_misses.walk_cycles,itlb_misses.any,dtlb_misses.walk_cycles,dtlb_misses.any

The assumptions that this code went in under (https://lkml.org/lkml/2012/6/12/119) say that a flush and a refill are about 100ns. Being generous, that is over by a factor of 6 on the refill side, although it is fairly close on the cost of an invlpg. An increase of a single invlpg operation seems to lengthen the flush range operation by about 200 cycles. Here is one example of the data collected for flushing 10 and 11 pages (full data are below):

    10: 0.43% 93.90% avg cycles: 3570 cycles/page: 357 samples: 4714
    11: 0.20% 94.10% avg cycles: 3767 cycles/page: 342 samples: 2145

How to generate this table:

    echo 10000 > /sys/kernel/debug/tracing/buffer_size_kb
    echo x86-tsc > /sys/kernel/debug/tracing/trace_clock
    echo 'reason != 0' > /sys/kernel/debug/tracing/events/tlb/tlb_flush/filter
    echo 1 > /sys/kernel/debug/tracing/events/tlb/tlb_flush/enable

Pipe the trace output into this script: http://sr71.net/~dave/intel/201402-tlb/trace-time-diff-process.pl.txt

Note that these data were gathered with the invlpg threshold set to 150 pages. Only data points with >=50 samples were printed:

    Flush pages   % of flushes   %<= this size
    ------------------------------------------------------------------------------
    -1:   2.20%  2.20% avg cycles:  2283 cycles/page: xxxx samples: 23960
    1:   56.92% 59.12% avg cycles:  1276 cycles/page: 1276 samples: 620895
    2:   13.78% 72.90% avg cycles:  1505 cycles/page:  752 samples: 150335
    3:    8.26% 81.16% avg cycles:  1880 cycles/page:  626 samples: 90131
    4:    7.41% 88.58% avg cycles:  2447 cycles/page:  611 samples: 80877
    5:    1.73% 90.31% avg cycles:  2358 cycles/page:  471 samples: 18885
    6:    1.32% 91.63% avg cycles:  2563 cycles/page:  427 samples: 14397
    7:    1.14% 92.77% avg cycles:  2862 cycles/page:  408 samples: 12441
    8:    0.62% 93.39% avg cycles:  3542 cycles/page:  442 samples: 6721
    9:    0.08% 93.47% avg cycles:  3289 cycles/page:  365 samples: 917
    10:   0.43% 93.90% avg cycles:  3570 cycles/page:  357 samples: 4714
    11:   0.20% 94.10% avg cycles:  3767 cycles/page:  342 samples: 2145
    12:   0.08% 94.18% avg cycles:  3996 cycles/page:  333 samples: 864
    13:   0.03% 94.20% avg cycles:  4077 cycles/page:  313 samples: 289
    14:   0.02% 94.23% avg cycles:  4836 cycles/page:  345 samples: 236
    15:   0.04% 94.26% avg cycles:  5699 cycles/page:  379 samples: 390
    16:   0.06% 94.32% avg cycles:  5041 cycles/page:  315 samples: 643
    17:   0.57% 94.89% avg cycles:  5473 cycles/page:  321 samples: 6229
    18:   0.02% 94.91% avg cycles:  5396 cycles/page:  299 samples: 224
    19:   0.03% 94.95% avg cycles:  5296 cycles/page:  278 samples: 367
    20:   0.02% 94.96% avg cycles:  6749 cycles/page:  337 samples: 185
    21:   0.18% 95.14% avg cycles:  6225 cycles/page:  296 samples: 1964
    22:   0.01% 95.15% avg cycles:  6393 cycles/page:  290 samples: 83
    23:   0.01% 95.16% avg cycles:  6861 cycles/page:  298 samples: 61
    24:   0.12% 95.28% avg cycles:  6912 cycles/page:  288 samples: 1307
    25:   0.05% 95.32% avg cycles:  7190 cycles/page:  287 samples: 533
    26:   0.01% 95.33% avg cycles:  7793 cycles/page:  299 samples: 94
    27:   0.01% 95.34% avg cycles:  7833 cycles/page:  290 samples: 66
    28:   0.01% 95.35% avg cycles:  8253 cycles/page:  294 samples: 73
    29:   0.08% 95.42% avg cycles:  8024 cycles/page:  276 samples: 846
    30:   0.03% 95.45% avg cycles:  9670 cycles/page:  322 samples: 296
    31:   0.01% 95.46% avg cycles:  8949 cycles/page:  288 samples: 79
    32:   0.01% 95.46% avg cycles:  9350 cycles/page:  292 samples: 60
    33:   3.11% 98.57% avg cycles:  8534 cycles/page:  258 samples: 33936
    34:   0.02% 98.60% avg cycles: 10977 cycles/page:  322 samples: 268
    35:   0.02% 98.62% avg cycles: 11400 cycles/page:  325 samples: 177
    36:   0.01% 98.63% avg cycles: 11504 cycles/page:  319 samples: 161
    37:   0.02% 98.65% avg cycles: 11596 cycles/page:  313 samples: 182
    38:   0.02% 98.66% avg cycles: 11850 cycles/page:  311 samples: 195
    39:   0.01% 98.68% avg cycles: 12158 cycles/page:  311 samples: 128
    40:   0.01% 98.68% avg cycles: 11626 cycles/page:  290 samples: 78
    41:   0.04% 98.73% avg cycles: 11435 cycles/page:  278 samples: 477
    42:   0.01% 98.73% avg cycles: 12571 cycles/page:  299 samples: 74
    43:   0.01% 98.74% avg cycles: 12562 cycles/page:  292 samples: 78
    44:   0.01% 98.75% avg cycles: 12991 cycles/page:  295 samples: 108
    45:   0.01% 98.76% avg cycles: 13169 cycles/page:  292 samples: 78
    46:   0.02% 98.78% avg cycles: 12891 cycles/page:  280 samples: 261
    47:   0.01% 98.79% avg cycles: 13099 cycles/page:  278 samples: 67
    48:   0.01% 98.80% avg cycles: 13851 cycles/page:  288 samples: 77
    49:   0.01% 98.80% avg cycles: 13749 cycles/page:  280 samples: 66
    50:   0.01% 98.81% avg cycles: 13949 cycles/page:  278 samples: 73
    52:   0.00% 98.82% avg cycles: 14243 cycles/page:  273 samples: 52
    54:   0.01% 98.83% avg cycles: 15312 cycles/page:  283 samples: 87
    55:   0.01% 98.84% avg cycles: 15197 cycles/page:  276 samples: 109
    56:   0.02% 98.86% avg cycles: 15234 cycles/page:  272 samples: 208
    57:   0.00% 98.86% avg cycles: 14888 cycles/page:  261 samples: 53
    58:   0.01% 98.87% avg cycles: 15037 cycles/page:  259 samples: 59
    59:   0.01% 98.87% avg cycles: 15752 cycles/page:  266 samples: 63
    62:   0.00% 98.89% avg cycles: 16222 cycles/page:  261 samples: 54
    64:   0.02% 98.91% avg cycles: 17179 cycles/page:  268 samples: 248
    65:   0.12% 99.03% avg cycles: 18762 cycles/page:  288 samples: 1324
    85:   0.00% 99.10% avg cycles: 21649 cycles/page:  254 samples: 50
    127:  0.01% 99.18% avg cycles: 32397 cycles/page:  255 samples: 75
    128:  0.13% 99.31% avg cycles: 31711 cycles/page:  247 samples: 1466
    129:  0.18% 99.49% avg cycles: 33017 cycles/page:  255 samples: 1927
    181:  0.33% 99.84% avg cycles:  2489 cycles/page:   13 samples: 3547
    256:  0.05% 99.91% avg cycles:  2305 cycles/page:    9 samples: 550
    512:  0.03% 99.95% avg cycles:  2133 cycles/page:    4 samples: 304
    1512: 0.01% 99.99% avg cycles:  3038 cycles/page:    2 samples: 65

Here are the tlb counters during a 10-second slice of a kernel compile for a SandyBridge system. It's better than IvyBridge, but probably due to the larger caches since this was one of the 'X' extreme parts.

    10,873,007,282 dtlb_load_misses_walk_duration
    250,711,333    dtlb_load_misses_walk_completed
    1,212,395,865  dtlb_store_misses_walk_duration
    31,615,772     dtlb_store_misses_walk_completed
    5,091,010,274  itlb_misses_walk_duration
    163,193,511    itlb_misses_walk_completed
    1,321,980      itlb_itlb_flush

    10.008045158 seconds time elapsed

    # cat perf.stat.1392743721.txt | perl -pe 's/,//g' | awk '/itlb_misses_walk_duration/ { icyc+=$1 } /itlb_misses_walk_completed/ { imiss+=$1 } /dtlb_.*_walk_duration/ { dcyc+=$1 } /dtlb_.*.*completed/ { dmiss+=$1 } END {print "itlb cyc/miss: ", icyc/imiss/3.3, " dtlb cyc/miss: ", dcyc/dmiss/3.3, " ----- ", icyc,imiss, dcyc,dmiss }'
    itlb ns/miss: 9.45338 dtlb ns/miss: 12.9716

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154103.10C1115E@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
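The break-even point quoted above (a full refill being worth roughly 85 invlpg operations) can be re-derived from the numbers in the message. A throwaway calculation, assuming the ~2.6 GHz clock of the i5-3320M and a per-invlpg slope of roughly 260 cycles (the value implied by "22,000 cycles is approximately 85 invlpg operations"):

    #include <stdio.h>

    int main(void)
    {
            const double dtlb_entries   = 512;
            const double cyc_per_miss   = 45;   /* 17.1 ns at ~2.6 GHz, per the message  */
            const double cyc_per_invlpg = 260;  /* rough per-page slope, an assumption   */

            double refill = dtlb_entries * cyc_per_miss;        /* ~23,000 cycles        */
            printf("full refill ~ %.0f cycles, ~%.0f invlpg equivalents\n",
                   refill, refill / cyc_per_invlpg);
            return 0;
    }

With a ceiling of 33 pages, a ranged flush is always well below that crossover, which is why the tunable errs toward per-page flushes.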
-
Committed by Dave Hansen

Most of the logic here is in the documentation file. Please take a look at it. I know we've come full-circle here back to a tunable, but this new one is *WAY* simpler. I challenge anyone to describe in one sentence how the old one worked. Here's the way the new one works: if we are flushing more pages than the ceiling, we use the full flush, otherwise we use per-page flushes.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154101.12B52CAF@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
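The heuristic is simple enough to state in a few lines of C. This is a hedged sketch of the decision only; the default of 33 and the general shape come from the patches above, while the helper name and surrounding details are illustrative:

    #include <stdbool.h>

    #define PAGE_SHIFT     12
    #define TLB_FLUSH_ALL  (~0UL)

    /* Default ceiling chosen in this series; runtime-tunable in the real code. */
    static unsigned long tlb_single_page_flush_ceiling = 33;

    static bool want_full_flush(unsigned long start, unsigned long end)
    {
            if (end == TLB_FLUSH_ALL)
                    return true;
            return ((end - start) >> PAGE_SHIFT) > tlb_single_page_flush_ceiling;
    }

Anything past the ceiling falls back to a single full flush; everything else is flushed page by page with invlpg.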
-
Committed by Dave Hansen

We don't have any good way to figure out what kinds of flushes are being attempted. Right now, we can try to use the vm counters, but those only tell us what we actually did with the hardware (one-by-one vs full) and don't tell us what was actually _requested_. This allows us to select out "interesting" TLB flushes that we might want to optimize (like the ranged ones) and ignore the ones that we have very little control over (the ones at context switch).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154059.4C96CBA5@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Committed by Dave Hansen

There are currently three paths through the remote flush code:

1. full invalidation
2. single page invalidation using invlpg
3. ranged invalidation using invlpg

This takes 2 and 3 and combines them into a single path by making the single-page case a range whose end is start plus a single page. This makes placement of our tracepoint easier.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154058.E0F90408@viggo.jf.intel.com
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Committed by Dave Hansen

If we take the

    if (end == TLB_FLUSH_ALL || vmflag & VM_HUGETLB) {
            local_flush_tlb();
            goto out;
    }

path out of flush_tlb_mm_range(), we will have flushed the tlb, but not incremented NR_TLB_LOCAL_FLUSH_ALL. This unifies the way out of the function so that we always take a single path when doing a full tlb flush.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154056.FF763B76@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Committed by Dave Hansen

I think the flush_tlb_mm_range() code that tries to tune the flush sizes based on the CPU needs to get ripped out for several reasons:

1. It is obviously buggy. It uses mm->total_vm to judge the task's footprint in the TLB. It should certainly be using some measure of RSS, *NOT* ->total_vm, since only resident memory can populate the TLB.
2. Haswell and several other CPUs are missing from the intel_tlb_flushall_shift_set() function. Thus, it has been demonstrated to bitrot quickly in practice.
3. It is plain wrong in my vm:

    [ 0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
    [ 0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
    [ 0.037444] tlb_flushall_shift: 6

   which leads it to never use invlpg.
4. The assumptions about TLB refill costs are wrong: http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com (more on this in later patches)
5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59 I believe the sample times were too short. Running the benchmark in a loop yields times that vary quite a bit.

Note that this leaves us with a static ceiling of 1 page. This is a conservative, dumb setting, and will be revised in a later patch. This also removes the code which attempts to predict whether we are flushing data or instructions. We expect instruction flushes to be relatively rare and not worth tuning for explicitly.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Committed by Dave Hansen

The

    if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids)

line of code is not exactly the easiest to audit, especially when it ends up at two different indentation levels. This eliminates one of the copy-n-paste versions. It also gives us a unified exit point for each path through this function. We need this in a minute for our tracepoint.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/r/20140731154054.44F1CDDC@viggo.jf.intel.com
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Committed by Mark D Rustad

Resolve shadow warnings that appear in W=2 builds. Instead of using ret to hold the return pointer, save the length in a new variable saved_len and compute the pointer on exit. This also resolves a very technical error, in that ret was declared as a const char *, when it really was a char * const.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
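The distinction being corrected is easy to get backwards; a standalone illustration (variable names are invented, only the qualifier placement matters):

    #include <stdio.h>

    int main(void)
    {
            char buf[] = "hello";

            const char *p = buf;   /* pointer to const data: *p may not be assigned */
            char *const q = buf;   /* const pointer: q may not be reseated          */

            /* *p = 'H';              error: the target is read-only through p */
            q[0] = 'H';            /* fine: the data itself is writable         */
            /* q = buf + 1;           error: q itself is const                  */

            printf("%s %s\n", p, q);
            return 0;
    }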
-
Committed by Will Deacon

This reverts commit a28e3f4b. Ard and Yi Li report that this patch is broken by design, so revert it and let them sort it out for 3.18 instead.

Reported-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Committed by byungchul.park

Commit 190f1ca8 ("arm64: add support for kernel mode NEON in interrupt context") introduced a typing error in the fpsimd_save_partial_state ENDPROC. This patch fixes the typing error.

Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: byungchul.park <byungchul.park@lge.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Committed by Will Deacon

Our break hooks are used to handle brk exceptions from kgdb (and potentially kprobes if that code ever resurfaces), so don't bother calling them if the BRK exception comes from userspace. This prevents userspace from trapping to a kdb shell on systems where kgdb is enabled and active.

Cc: <stable@vger.kernel.org>
Reported-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Committed by David Hildenbrand

A VCPU might never stop if it intercepts (for whatever reason) between "fake interrupt delivery" and execution of the stop function. The heart of the problem is that SIGP STOP is an interrupt that has to be processed on every SIE entry until the VCPU finally executes the stop function. This problem was made apparent by commit 7dfc63cf (KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time). With the old code, the guest could (incorrectly) inject SIGP STOPs multiple times. The bug of losing a sigp stop exists in KVM before 7dfc63cf, but it was hidden by Linux guests doing a sigp stop loop. The new code (rightfully) returns CC=2 and does not queue a new interrupt. This patch is a simple fix of the problem. Longterm we are going to rework that code - e.g. get rid of the action bits and so on.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[some additional patch description]
-
Committed by Linus Walleij

The GPIO pin connected to card detect was inverted twice: once by the argument to the GPIO line itself, where it was magically marked as active low by the flag GPIO_ACTIVE_LOW (0x01) in the third cell, and also marked active low AGAIN by explicitly stating "cd-inverted" (a deprecated method). After commit 78f87df2 "mmc: mmci: Use the common mmc DT parser" this results in the line being inverted twice, so it was effectively uninverted, while the old code would not have this effect, instead disregarding the flag on the GPIO line altogether, which is a bug. I admit the semantics may be unclear, but inverting twice is as good a definition as any of how this should work. So fix up the buggy device tree. Use proper #includes so the DTS is clear and readable.

Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
-
- 30 July 2014, 8 commits
-
Committed by Robert Richter

This matches x86 and makes it more convenient to run the arm64 default kernel, as distros like Ubuntu need this option.

Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Committed by Chris J Arges

Remove a prototype which was added by both 93c4adc7 and 36be0b9d.

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Arun Chandran

Building a kernel with CPU_BIG_ENDIAN fails if there are stale objects from a !CPU_BIG_ENDIAN build. Due to a missing FORCE prerequisite on an if_changed rule in the VDSO Makefile, we attempt to link a stale LE object into the new BE kernel. According to Documentation/kbuild/makefiles.txt, FORCE is required for if_changed rules and forgetting it is a common mistake, so fix it by 'Forcing' the build of vdso. This patch fixes build errors like these:

    arch/arm64/kernel/vdso/note.o: compiled for a little endian system and target is big endian
    failed to merge target specific data of file arch/arm64/kernel/vdso/note.o
    arch/arm64/kernel/vdso/sigreturn.o: compiled for a little endian system and target is big endian
    failed to merge target specific data of file arch/arm64/kernel/vdso/sigreturn.o

Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Arun Chandran <achandran@mvista.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Committed by Christian Borntraeger

Commit 7dfc63cf (KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time) introduced a memory leak if a sigp stop is already pending. Free the allocated inti structure.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
-
Committed by David Vrabel

arch_gnttab_map_frames() and arch_gnttab_unmap_frames() are called in atomic context but were calling alloc_vm_area(), which might sleep. Also, if a driver attempts to allocate a grant ref from an interrupt and the table needs expanding, then the CPU may already be in lazy MMU mode and apply_to_page_range() will BUG when it tries to re-enable lazy MMU mode. These two functions are only used in PV guests. Introduce arch_gnttab_init() to allocate the virtual address space in advance. Avoid the use of apply_to_page_range() by saving and using the array of PTE addresses from the alloc_vm_area() call (which ensures that the required page tables are pre-allocated).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Committed by Laura Abbott

Commit 1c2f87c2 (ARM: 8025/1: Get rid of meminfo) dropped the upper bound on the number of memory banks that can be added, as there was no technical need for it in the kernel. It turns out, though, that some bootloaders (specifically on the arndale-octa exynos boards) may pass invalid memory information and rely on the kernel not to parse this data. This is a bug in the bootloader, but we still need to work around it. Do so by introducing a dt_fixup function. This function gets called before the flattened devicetree is scanned for memory and the like. In this fixup function for exynos, limit the maximum number of memory regions in the devicetree.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Tested-by: Andreas Färber <afaerber@suse.de>
[glikely: Added a comment and fixed up function name]
Signed-off-by: Grant Likely <grant.likely@linaro.org>
-
Committed by Thierry Reding

The nasid_to_try variable is an array of integers, so plain integers can be used when assigning values to the elements, rather than casting a NULL pointer to an integer, which results in the following warning from GCC:

    arch/ia64/sn/kernel/bte.c:117:22: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      nasid_to_try[1] = (int)NULL;
                        ^
    arch/ia64/sn/kernel/bte.c:125:22: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      nasid_to_try[1] = (int)NULL;
                        ^

Replace (int)NULL with a simple 0 to silence these warnings.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
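A minimal reproduction of the warning and the fix (the array name is kept from the message, everything else is trimmed down):

    #include <stdio.h>

    int main(void)
    {
            int nasid_to_try[2] = { 0, 0 };

            /* nasid_to_try[1] = (int)NULL;   warns on LP64 targets: an 8-byte
             *                                pointer is truncated to a 4-byte int */
            nasid_to_try[1] = 0;            /* same value, no cast, no warning     */

            printf("%d\n", nasid_to_try[1]);
            return 0;
    }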
-
Committed by Thierry Reding

The code uses the following to zero out a PDA:

    memset(pda, 0, sizeof(pda));

But sizeof(pda) will return the size of a pointer rather than the size of the structure pointed to. This triggers the following warning from GCC:

    arch/ia64/sn/kernel/setup.c:582:23: warning: argument to 'sizeof' in 'memset' call is the same pointer type 'struct pda_s *' as the destination; expected 'struct pda_s' or an explicit length [-Wsizeof-pointer-memaccess]
      memset(pda, 0, sizeof(pda));
                     ^

Fix this by passing in the size of the structure using sizeof(*pda) instead.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
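The bug pattern in isolation (the struct contents are invented; only the sizeof misuse matters):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct pda_s { long fields[64]; };

    int main(void)
    {
            struct pda_s *pda = calloc(1, sizeof(*pda));

            memset(pda, 0, sizeof(pda));    /* BUG: clears only sizeof(void *) bytes */
            memset(pda, 0, sizeof(*pda));   /* correct: clears the whole structure   */

            printf("%zu vs %zu bytes\n", sizeof(pda), sizeof(*pda));
            free(pda);
            return 0;
    }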
-
- 29 July 2014, 3 commits
-
Committed by Will Deacon

When running as a kvm guest on a para-virtualised platform, it is useful to have virtio implementations of console, 9pfs and network. This adds these options to the arm64 defconfig, so we can easily run a defconfig kernel build both as host and as a kvm guest.

Signed-off-by: Will Deacon <will.deacon@arm.com>
-
Committed by Konstantin Khlebnikov

On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2 (pmd) entries map 2MiB. When the identity mapping is created on LPAE, the pgd pointers are copied from the swapper_pg_dir. If we find that we need to modify the contents of a pmd, we allocate a new empty pmd table and insert it into the appropriate 1GB slot, before then filling it with the identity mapping. However, if the 1GB slot covers the kernel lowmem mappings, we obliterate those mappings. When replacing a PMD, first copy the old PMD contents to the new PMD, so that we preserve the existing mappings, particularly the mappings of the kernel itself.

[rewrote commit message and added code comment -- rmk]
Fixes: ae2de101 ("ARM: LPAE: Add identity mapping support for the 3-level page table format")
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
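The shape of the fix, reduced to a generic allocate-and-copy helper; the type and names here are stand-ins rather than the ARM code:

    #include <stdlib.h>
    #include <string.h>

    #define PTRS_PER_PMD 512
    typedef unsigned long long pmd_t;      /* stand-in for the real entry type */

    /* Start the replacement table as a copy of the old one, so the existing
     * mappings in that 1GiB slot (including the kernel's own lowmem mapping)
     * survive before the identity entries are layered on top. */
    static pmd_t *clone_pmd_table(const pmd_t *old_pmd)
    {
            pmd_t *new_pmd = malloc(PTRS_PER_PMD * sizeof(pmd_t));

            if (new_pmd)
                    memcpy(new_pmd, old_pmd, PTRS_PER_PMD * sizeof(pmd_t));
            return new_pmd;
    }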
-
Committed by Russell King

If init_mm.brk is not section aligned, the LPAE fixup code will miss updating the final PMD. Fix this by aligning map_end.

Fixes: a77e0c7b ("ARM: mm: Recreate kernel mappings in early_paging_init()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
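The fix amounts to rounding the end of the range up to a section boundary before walking it; a hedged sketch with a generic align helper (the macro and the sample address are mine, the arithmetic is the same):

    #include <stdio.h>

    #define SECTION_SIZE    (2UL * 1024 * 1024)             /* 2 MiB sections with LPAE */
    #define ALIGN_UP(x, a)  (((x) + (a) - 1) & ~((a) - 1))

    int main(void)
    {
            unsigned long brk     = 0xc0a12345UL;           /* hypothetical, unaligned  */
            unsigned long map_end = ALIGN_UP(brk, SECTION_SIZE);

            /* Rounding up ensures the PMD covering the tail of the kernel image
             * is still visited by the remapping loop. */
            printf("map_end = %#lx\n", map_end);
            return 0;
    }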
-