提交 · aa5b395b69b65450e008b95ec623b4fc4b175f9f · openeuler / Kernel

01 2月, 2020 3 次提交

lib/zlib: add s390 hardware support for kernel zlib_deflate · aa5b395b

由 Mikhail Zaslonko 提交于 1月 30, 2020

Patch series "S390 hardware support for kernel zlib", v3.

With IBM z15 mainframe the new DFLTCC instruction is available.  It
implements deflate algorithm in hardware (Nest Acceleration Unit - NXU)
with estimated compression and decompression performance orders of
magnitude faster than the current zlib.

This patchset adds s390 hardware compression support to kernel zlib.
The code is based on the userspace zlib implementation:

	https://github.com/madler/zlib/pull/410

The coding style is also preserved for future maintainability.  There is
only limited set of userspace zlib functions represented in kernel.
Apart from that, all the memory allocation should be performed in
advance.  Thus, the workarea structures are extended with the parameter
lists required for the DEFLATE CONVENTION CALL instruction.

Since kernel zlib itself does not support gzip headers, only Adler-32
checksum is processed (also can be produced by DFLTCC facility).  Like
it was implemented for userspace, kernel zlib will compress in hardware
on level 1, and in software on all other levels.  Decompression will
always happen in hardware (when enabled).

Two DFLTCC compression calls produce the same results only when they
both are made on machines of the same generation, and when the
respective buffers have the same offset relative to the start of the
page.  Therefore care should be taken when using hardware compression
when reproducible results are desired.  However it does always produce
the standard conform output which can be inflated anyway.

The new kernel command line parameter 'dfltcc' is introduced to
configure s390 zlib hardware support:

    Format: { on | off | def_only | inf_only | always }
     on:       s390 zlib hardware support for compression on
               level 1 and decompression (default)
     off:      No s390 zlib hardware support
     def_only: s390 zlib hardware support for deflate
               only (compression on level 1)
     inf_only: s390 zlib hardware support for inflate
               only (decompression)
     always:   Same as 'on' but ignores the selected compression
               level always using hardware support (used for debugging)

The main purpose of the integration of the NXU support into the kernel
zlib is the use of hardware deflate in btrfs filesystem with on-the-fly
compression enabled.  Apart from that, hardware support can also be used
during boot for decompressing the kernel or the ramdisk image

With the patch for btrfs expanding zlib buffer from 1 to 4 pages (patch
6) the following performance results have been achieved using the
ramdisk with btrfs.  These are relative numbers based on throughput rate
and compression ratio for zlib level 1:

  Input data              Deflate rate   Inflate rate   Compression ratio
                          NXU/Software   NXU/Software   NXU/Software
  stream of zeroes        1.46           1.02           1.00
  random ASCII data       10.44          3.00           0.96
  ASCII text (dickens)    6,21           3.33           0.94
  binary data (vmlinux)   8,37           3.90           1.02

This means that s390 hardware deflate can provide up to 10 times faster
compression (on level 1) and up to 4 times faster decompression (refers
to all compression levels) for btrfs zlib.

Disclaimer: Performance results are based on IBM internal tests using DD
command-line utility on btrfs on a Fedora 30 based internal driver in
native LPAR on a z15 system.  Results may vary based on individual
workload, configuration and software levels.

This patch (of 9):

Create zlib_dfltcc library with the s390 DEFLATE CONVERSION CALL
implementation and related compression functions.  Update zlib_deflate
functions with the hooks for s390 hardware support and adjust workspace
structures with extra parameter lists required for hardware deflate.

Link: http://lkml.kernel.org/r/20200103223334.20669-2-zaslonko@linux.ibm.comSigned-off-by: NIlya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: NMikhail Zaslonko <zaslonko@linux.ibm.com>
Co-developed-by: NIlya Leoshkevich <iii@linux.ibm.com>
Cc: Chris Mason <clm@fb.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Eduard Shishkin <edward6@linux.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aa5b395b

lib/test_kasan.c: fix memory leak in kmalloc_oob_krealloc_more() · 3e21d9a5

由 Gustavo A. R. Silva 提交于 1月 30, 2020

In case memory resources for _ptr2_ were allocated, release them before
return.

Notice that in case _ptr1_ happens to be NULL, krealloc() behaves
exactly like kmalloc().

Addresses-Coverity-ID: 1490594 ("Resource leak")
Link: http://lkml.kernel.org/r/20200123160115.GA4202@embeddedor
Fixes: 3f15801c ("lib: add kasan test module")
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: NDmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3e21d9a5

lib/test_bitmap: correct test data offsets for 32-bit · 69334ca5

由 Andy Shevchenko 提交于 1月 30, 2020

On 32-bit platform the size of long is only 32 bits which makes wrong
offset in the array of 64 bit size.

Calculate offset based on BITS_PER_LONG.

Link: http://lkml.kernel.org/r/20200109103601.45929-1-andriy.shevchenko@linux.intel.com
Fixes: 30544ed5 ("lib/bitmap: introduce bitmap_replace() helper")
Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

69334ca5

27 1月, 2020 1 次提交

bitmap: Introduce bitmap_cut(): cut bits and shift remaining · 20927671

由 Stefano Brivio 提交于 1月 22, 2020

The new bitmap function bitmap_cut() copies bits from source to
destination by removing the region specified by parameters first
and cut, and remapping the bits above the cut region by right
shifting them.
Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

20927671

25 1月, 2020 1 次提交

lib: Reduce user_access_begin() boundaries in strncpy_from_user() and strnlen_user() · ab10ae1c

由 Christophe Leroy 提交于 1月 23, 2020

The range passed to user_access_begin() by strncpy_from_user() and
strnlen_user() starts at 'src' and goes up to the limit of userspace
although reads will be limited by the 'count' param.

On 32 bits powerpc (book3s/32) access has to be granted for each
256Mbytes segment and the cost increases with the number of segments to
unlock.

Limit the range with 'count' param.

Fixes: 594cc251 ("make 'user_access_begin()' do 'access_ok()'")
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ab10ae1c

24 1月, 2020 1 次提交

lib: crc64: include <linux/crc64.h> for 'crc64_be' · 0e0c1231

由 Ben Dooks (Codethink) 提交于 1月 24, 2020

The crc64_be() is declared in <linux/crc64.h> so include
this where the symbol is defined to avoid the following
warning:

lib/crc64.c:43:12: warning: symbol 'crc64_be' was not declared. Should it be static?
Signed-off-by: NBen Dooks (Codethink) <ben.dooks@codethink.co.uk>
Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: NColy Li <colyli@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

0e0c1231

22 1月, 2020 1 次提交

crypto: chacha20poly1305 - add back missing test vectors and test chunking · 72c79437

由 Jason A. Donenfeld 提交于 1月 16, 2020

When this was originally ported, the 12-byte nonce vectors were left out
to keep things simple. I agree that we don't need nor want a library
interface for 12-byte nonces. But these test vectors were specially
crafted to look at issues in the underlying primitives and related
interactions. Therefore, we actually want to keep around all of the
test vectors, and simply have a helper function to test them with.

Secondly, the sglist-based chunking code in the library interface is
rather complicated, so this adds a developer-only test for ensuring that
all the book keeping is correct, across a wide array of possibilities.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

72c79437

18 1月, 2020 3 次提交

XArray: Fix xas_find returning too many entries · c44aa5e8

由 Matthew Wilcox (Oracle) 提交于 1月 17, 2020

If you call xas_find() with the initial index > max, it should have
returned NULL but was returning the entry at index.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Cc: stable@vger.kernel.org

c44aa5e8

XArray: Fix xa_find_after with multi-index entries · 19c30f4d

由 Matthew Wilcox (Oracle) 提交于 1月 17, 2020

If the entry is of an order which is a multiple of XA_CHUNK_SIZE,
the current detection of sibling entries does not work.  Factor out
an xas_sibling() function to make xa_find_after() a little more
understandable, and write a new implementation that doesn't suffer from
the same bug.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Cc: stable@vger.kernel.org

19c30f4d

XArray: Fix infinite loop with entry at ULONG_MAX · 430f24f9

由 Matthew Wilcox (Oracle) 提交于 1月 17, 2020

If there is an entry at ULONG_MAX, xa_for_each() will overflow the
'index + 1' in xa_find_after() and wrap around to 0.  Catch this case
and terminate the loop by returning NULL.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Cc: stable@vger.kernel.org

430f24f9

17 1月, 2020 4 次提交

debugobjects: Fix various data races · 35fd7a63

由 Marco Elver 提交于 1月 16, 2020

The counters obj_pool_free, and obj_nr_tofree, and the flag obj_freeing are
read locklessly outside the pool_lock critical sections. If read with plain
accesses, this would result in data races.

This is addressed as follows:

 * reads outside critical sections become READ_ONCE()s (pairing with
   WRITE_ONCE()s added);

 * writes become WRITE_ONCE()s (pairing with READ_ONCE()s added); since
   writes happen inside critical sections, only the write and not the read
   of RMWs needs to be atomic, thus WRITE_ONCE(var, var +/- X) is
   sufficient.

The data races were reported by KCSAN:

  BUG: KCSAN: data-race in __free_object / fill_pool

  write to 0xffffffff8beb04f8 of 4 bytes by interrupt on cpu 1:
   __free_object+0x1ee/0x8e0 lib/debugobjects.c:404
   __debug_check_no_obj_freed+0x199/0x330 lib/debugobjects.c:969
   debug_check_no_obj_freed+0x3c/0x44 lib/debugobjects.c:994
   slab_free_hook mm/slub.c:1422 [inline]

  read to 0xffffffff8beb04f8 of 4 bytes by task 1 on cpu 2:
   fill_pool+0x3d/0x520 lib/debugobjects.c:135
   __debug_object_init+0x3c/0x810 lib/debugobjects.c:536
   debug_object_init lib/debugobjects.c:591 [inline]
   debug_object_activate+0x228/0x320 lib/debugobjects.c:677
   debug_rcu_head_queue kernel/rcu/rcu.h:176 [inline]

  BUG: KCSAN: data-race in __debug_object_init / fill_pool

  read to 0xffffffff8beb04f8 of 4 bytes by task 10 on cpu 6:
   fill_pool+0x3d/0x520 lib/debugobjects.c:135
   __debug_object_init+0x3c/0x810 lib/debugobjects.c:536
   debug_object_init_on_stack+0x39/0x50 lib/debugobjects.c:606
   init_timer_on_stack_key kernel/time/timer.c:742 [inline]

  write to 0xffffffff8beb04f8 of 4 bytes by task 1 on cpu 3:
   alloc_object lib/debugobjects.c:258 [inline]
   __debug_object_init+0x717/0x810 lib/debugobjects.c:544
   debug_object_init lib/debugobjects.c:591 [inline]
   debug_object_activate+0x228/0x320 lib/debugobjects.c:677
   debug_rcu_head_queue kernel/rcu/rcu.h:176 [inline]

  BUG: KCSAN: data-race in free_obj_work / free_object

  read to 0xffffffff9140c190 of 4 bytes by task 10 on cpu 6:
   free_object+0x4b/0xd0 lib/debugobjects.c:426
   debug_object_free+0x190/0x210 lib/debugobjects.c:824
   destroy_timer_on_stack kernel/time/timer.c:749 [inline]

  write to 0xffffffff9140c190 of 4 bytes by task 93 on cpu 1:
   free_obj_work+0x24f/0x480 lib/debugobjects.c:313
   process_one_work+0x454/0x8d0 kernel/workqueue.c:2264
   worker_thread+0x9a/0x780 kernel/workqueue.c:2410
Reported-by: NQian Cai <cai@lca.pw>
Signed-off-by: NMarco Elver <elver@google.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200116185529.11026-1-elver@google.com

35fd7a63

livepatch/samples/selftest: Use klp_shadow_alloc() API correctly · be6da984

由 Petr Mladek 提交于 1月 16, 2020

The commit e91c2518 ("livepatch: Initialize shadow variables
safely by a custom callback") leads to the following static checker
warning:

  samples/livepatch/livepatch-shadow-fix1.c:86 livepatch_fix1_dummy_alloc()
  error: 'klp_shadow_alloc()' 'leak' too small (4 vs 8)

It is because klp_shadow_alloc() is used a wrong way:

  int *leak;
  shadow_leak = klp_shadow_alloc(d, SV_LEAK, sizeof(leak), GFP_KERNEL,
				 shadow_leak_ctor, leak);

The code is supposed to store the "leak" pointer into the shadow variable.
3rd parameter correctly passes size of the data (size of pointer). But
the 5th parameter is wrong. It should pass pointer to the data (pointer
to the pointer) but it passes the pointer directly.

It works because shadow_leak_ctor() handle "ctor_data" as the data
instead of pointer to the data. But it is semantically wrong and
confusing.

The same problem is also in the module used by selftests. In this case,
"pvX" variables are introduced. They represent the data stored in
the shadow variables.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NPetr Mladek <pmladek@suse.com>
Reviewed-by: NJoe Lawrence <joe.lawrence@redhat.com>
Acked-by: NMiroslav Benes <mbenes@suse.cz>
Reviewed-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

be6da984

livepatch/selftest: Clean up shadow variable names and type · c24c57a4

由 Petr Mladek 提交于 1月 16, 2020

The shadow variable selftest is quite tricky. Especially it is problematic
to understand what values are stored, returned, and printed.

Make it easier to understand by using "int *var, **sv" variables
consistently everywhere instead of the generic "void *", "ret",
and "ctor_data".
Signed-off-by: NPetr Mladek <pmladek@suse.com>
Reviewed-by: NJoe Lawrence <joe.lawrence@redhat.com>
Acked-by: NMiroslav Benes <mbenes@suse.cz>
Reviewed-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: NJiri Kosina <jkosina@suse.cz>

c24c57a4

lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres() · 49a101d7

由 Christophe Leroy 提交于 1月 16, 2020

Only perform READ_ONCE(vd[CS_HRES_COARSE].hrtimer_res) for
HRES and RAW clocks.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/7ac2f0d21652f95e2bbdfa6bd514ae6c7caf53ab.1579196675.git.christophe.leroy@c-s.fr

49a101d7

16 1月, 2020 3 次提交

crypto: curve25519 - Fix selftest build error · a8bdf2c4

由 Herbert Xu 提交于 1月 08, 2020

If CRYPTO_CURVE25519 is y, CRYPTO_LIB_CURVE25519_GENERIC will be
y, but CRYPTO_LIB_CURVE25519 may be set to m, this causes build
errors:

lib/crypto/curve25519-selftest.o: In function `curve25519':
curve25519-selftest.c:(.text.unlikely+0xc): undefined reference to `curve25519_arch'
lib/crypto/curve25519-selftest.o: In function `curve25519_selftest':
curve25519-selftest.c:(.init.text+0x17e): undefined reference to `curve25519_base_arch'

This is because the curve25519 self-test code is being controlled
by the GENERIC option rather than the overall CURVE25519 option,
as is the case with blake2s.  To recap, the GENERIC and ARCH options
for CURVE25519 are internal only and selected by users such as
the Crypto API, or the externally visible CURVE25519 option which
in turn is selected by wireguard.  The self-test is specific to the
the external CURVE25519 option and should not be enabled by the
Crypto API.

This patch fixes this by splitting the GENERIC module from the
CURVE25519 module with the latter now containing just the self-test.
Reported-by: NHulk Robot <hulkci@huawei.com>
Fixes: aa127963 ("crypto: lib/curve25519 - re-add selftests")
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

a8bdf2c4

crypto: x86/poly1305 - wire up faster implementations for kernel · d7d7b853

由 Jason A. Donenfeld 提交于 1月 05, 2020

These x86_64 vectorized implementations support AVX, AVX-2, and AVX512F.
The AVX-512F implementation is disabled on Skylake, due to throttling,
but it is quite fast on >= Cannonlake.

On the left is cycle counts on a Core i7 6700HQ using the AVX-2
codepath, comparing this implementation ("new") to the implementation in
the current crypto api ("old"). On the right are benchmarks on a Xeon
Gold 5120 using the AVX-512 codepath. The new implementation is faster
on all benchmarks.

        AVX-2                  AVX-512
      ---------              -----------

    size    old     new      size   old     new
    ----    ----    ----     ----   ----    ----
    0       70      68       0      74      70
    16      92      90       16     96      92
    32      134     104      32     136     106
    48      172     120      48     184     124
    64      218     136      64     218     138
    80      254     158      80     260     160
    96      298     174      96     300     176
    112     342     192      112    342     194
    128     388     212      128    384     212
    144     428     228      144    420     226
    160     466     246      160    464     248
    176     510     264      176    504     264
    192     550     282      192    544     282
    208     594     302      208    582     300
    224     628     316      224    624     318
    240     676     334      240    662     338
    256     716     354      256    708     358
    272     764     374      272    748     372
    288     802     352      288    788     358
    304     420     366      304    422     370
    320     428     360      320    432     364
    336     484     378      336    486     380
    352     426     384      352    434     390
    368     478     400      368    480     408
    384     488     394      384    490     398
    400     542     408      400    542     412
    416     486     416      416    492     426
    432     534     430      432    538     436
    448     544     422      448    546     432
    464     600     438      464    600     448
    480     540     448      480    548     456
    496     594     464      496    594     476
    512     602     456      512    606     470
    528     656     476      528    656     480
    544     600     480      544    606     498
    560     650     494      560    652     512
    576     664     490      576    662     508
    592     714     508      592    716     522
    608     656     514      608    664     538
    624     708     532      624    710     552
    640     716     524      640    720     516
    656     770     536      656    772     526
    672     716     548      672    722     544
    688     770     562      688    768     556
    704     774     552      704    778     556
    720     826     568      720    832     568
    736     768     574      736    780     584
    752     822     592      752    826     600
    768     830     584      768    836     560
    784     884     602      784    888     572
    800     828     610      800    838     588
    816     884     628      816    884     604
    832     888     618      832    894     598
    848     942     632      848    946     612
    864     884     644      864    896     628
    880     936     660      880    942     644
    896     948     652      896    952     608
    912     1000    664      912    1004    616
    928     942     676      928    954     634
    944     994     690      944    1000    646
    960     1002    680      960    1008    646
    976     1054    694      976    1062    658
    992     1002    706      992    1012    674
    1008    1052    720      1008   1058    690

This commit wires in the prior implementation from Andy, and makes the
following changes to be suitable for kernel land.

  - Some cosmetic and structural changes, like renaming labels to
    .Lname, constants, and other Linux conventions, as well as making
    the code easy for us to maintain moving forward.

  - CPU feature checking is done in C by the glue code.

  - We avoid jumping into the middle of functions, to appease objtool,
    and instead parameterize shared code.

  - We maintain frame pointers so that stack traces make sense.

  - We remove the dependency on the perl xlate code, which transforms
    the output into things that assemblers we don't care about use.

Importantly, none of our changes affect the arithmetic or core code, but
just involve the differing environment of kernel space.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NSamuel Neves <sneves@dei.uc.pt>
Co-developed-by: NSamuel Neves <sneves@dei.uc.pt>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

d7d7b853

crypto: poly1305 - add new 32 and 64-bit generic versions · 1c08a104

由 Jason A. Donenfeld 提交于 1月 05, 2020

These two C implementations from Zinc -- a 32x32 one and a 64x64 one,
depending on the platform -- come from Andrew Moon's public domain
poly1305-donna portable code, modified for usage in the kernel. The
precomputation in the 32-bit version and the use of 64x64 multiplies in
the 64-bit version make these perform better than the code it replaces.
Moon's code is also very widespread and has received many eyeballs of
scrutiny.

There's a bit of interference between the x86 implementation, which
relies on internal details of the old scalar implementation. In the next
commit, the x86 implementation will be replaced with a faster one that
doesn't rely on this, so none of this matters much. But for now, to keep
this passing the tests, we inline the bits of the old implementation
that the x86 implementation relied on. Also, since we now support a
slightly larger key space, via the union, some offsets had to be fixed
up.

Nonce calculation was folded in with the emit function, to take
advantage of 64x64 arithmetic. However, Adiantum appeared to rely on no
nonce handling in emit, so this path was conditionalized. We also
introduced a new struct, poly1305_core_key, to represent the precise
amount of space that particular implementation uses.

Testing with kbench9000, depending on the CPU, the update function for
the 32x32 version has been improved by 4%-7%, and for the 64x64 by
19%-30%. The 32x32 gains are small, but I think there's great value in
having a parallel implementation to the 64x64 one so that the two can be
compared side-by-side as nice stand-alone units.
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

1c08a104

14 1月, 2020 9 次提交

lib/vdso: Prepare for time namespace support · 660fd04f

由 Thomas Gleixner 提交于 11月 12, 2019

To support time namespaces in the vdso with a minimal impact on regular non
time namespace affected tasks, the namespace handling needs to be hidden in
a slow path.

The most obvious place is vdso_seq_begin(). If a task belongs to a time
namespace then the VVAR page which contains the system wide vdso data is
replaced with a namespace specific page which has the same layout as the
VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
namespace handling path.

The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the vdso data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.

If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.

If VDSO time namespace support is disabled the whole magic is compiled out.

Initial testing shows that the disabled case is almost identical to the
host case which does not take the slow timens path. With the special timens
page installed the performance hit is constant time and in the range of
5-7%.

For the vdso functions which are not using the sequence count an
unconditional check for vdso_data->clock_mode is added which switches to
the real vdso when the clock_mode is VCLOCK_TIMENS.

[avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data
 pointer by CS_RAW offset.]
Suggested-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrei Vagin <avagin@gmail.com>
Signed-off-by: NDmitry Safonov <dima@arista.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com

660fd04f

lib/vdso: Mark do_hres() and do_coarse() as __always_inline · c966533f

由 Andrei Vagin 提交于 11月 12, 2019

Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
(more clock_gettime() cycles - the better):

clock            | before     | after      | diff
----------------------------------------------------------
monotonic        |  153222105 |  166775025 | 8.8%
monotonic-coarse |  671557054 |  691513017 | 3.0%
monotonic-raw    |  147116067 |  161057395 | 9.5%
boottime         |  153446224 |  166962668 | 9.1%

The improvement for arm64 for monotonic and boottime is around 3.5%.

clock            | before     | after      | diff
==================================================
monotonic          17326692     17951770     3.6%
monotonic-coarse   43624027     44215292     1.3%
monotonic-raw      17541809     17554932     0.1%
boottime           17334982     17954361     3.5%

[ tglx: Avoid the goto ]
Signed-off-by: NAndrei Vagin <avagin@gmail.com>
Signed-off-by: NDmitry Safonov <dima@arista.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-3-dima@arista.com

c966533f

lib/vdso: Avoid duplication in __cvdso_clock_getres() · cdb7c5a9

由 Christophe Leroy 提交于 12月 23, 2019

VDSO_HRES and VDSO_RAW clocks are handled the same way.

Avoid the code duplication.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NAndy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/fdf1a968a8f7edd61456f1689ac44082ebb19c15.1577111367.git.christophe.leroy@c-s.fr

cdb7c5a9

lib/vdso: Let do_coarse() return 0 to simplify the callsite · 8463cf80

由 Christophe Leroy 提交于 12月 23, 2019

do_coarse() is similar to do_hres() except that it never fails.

Change its type to int instead of void and let it always return success (0)
to simplify the call site.
Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/21e8afa38c02ca8672c2690307383507fe63b454.1577111367.git.christophe.leroy@c-s.fr

8463cf80

lib/vdso: Remove checks on return value for 32 bit vDSO · a279235d

由 Vincenzo Frascino 提交于 8月 30, 2019

Since all the architectures that support the generic vDSO library have
been converted to support the 32 bit fallbacks it is not required
anymore to check the return value of __cvdso_clock_get*time32_common()
before updating the old_timespec fields.

Remove the related checks from the generic vdso library.

References: c60a32ea ("lib/vdso/32: Provide legacy syscall fallbacks")
Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20190830135902.20861-6-vincenzo.frascino@arm.com

a279235d

lib/vdso: Remove VDSO_HAS_32BIT_FALLBACK · b767081c

由 Vincenzo Frascino 提交于 8月 30, 2019

VDSO_HAS_32BIT_FALLBACK was introduced to address a regression which
caused seccomp to deny access to the applications to clock_gettime64()
and clock_getres64() because they are not enabled in the existing
filters.

The purpose of VDSO_HAS_32BIT_FALLBACK was to simplify the conditional
implementation of __cvdso_clock_get*time32() variants.

Now that all the architectures that support the generic vDSO library
have been converted to support the 32 bit fallbacks the conditional
can be removed.
Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20190830135902.20861-5-vincenzo.frascino@arm.com

References: c60a32ea ("lib/vdso/32: Provide legacy syscall fallbacks")

b767081c

lib/vdso: Build 32 bit specific functions in the right context · bf279849

由 Vincenzo Frascino 提交于 8月 30, 2019

clock_gettime32 and clock_getres_time32 should be compiled only with a
32 bit vdso library.

Exclude these symbols when BUILD_VDSO32 is not defined.
Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NAndy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20190830135902.20861-3-vincenzo.frascino@arm.com

bf279849

md/raid6: fix algorithm choice under larger PAGE_SIZE · f591df3c

由 Zhengyuan Liu 提交于 12月 20, 2019

There are several algorithms available for raid6 to generate xor and syndrome
parity, including basic int1, int2 ... int32 and SIMD optimized implementation
like sse and neon. To test and choose the best algorithms at the initial
stage, we need provide enough disk data to feed the algorithms. However, the
disk number we provided depends on page size and gfmul table, seeing bellow:

const int disks = (65536/PAGE_SIZE) + 2;

So when come to 64K PAGE_SIZE, there is only one data disk plus 2 parity disk,
as a result the chosed algorithm is not reliable. For example, on my arm64
machine with 64K page enabled, it will choose intx32 as the best one, although
the NEON implementation is better.

This patch tries to fix the problem by defining a constant raid6 disk number to
supporting arbitrary page size.
Suggested-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NZhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: NSong Liu <songliubraving@fb.com>

f591df3c

raid6/test: fix a compilation warning · 5e5ac01c

由 Zhengyuan Liu 提交于 12月 20, 2019

The compilation warning is redefination showed as following:

        In file included from tables.c:2:
        ../../../include/linux/export.h:180: warning: "EXPORT_SYMBOL" redefined
         #define EXPORT_SYMBOL(sym)  __EXPORT_SYMBOL(sym, "")

        In file included from tables.c:1:
        ../../../include/linux/raid/pq.h:61: note: this is the location of the previous definition
         #define EXPORT_SYMBOL(sym)

Fixes: 69a94abb ("export.h, genksyms: do not make genksyms calculate CRC of trimmed symbols")
Signed-off-by: NZhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: NSong Liu <songliubraving@fb.com>

5e5ac01c

11 1月, 2020 1 次提交

lib/vdso: Make __cvdso_clock_getres() static · ffd08731

由 Vincenzo Frascino 提交于 11月 28, 2019

Fix the following sparse warning in the generic vDSO library:

  linux/lib/vdso/gettimeofday.c:224:5: warning: symbol
  '__cvdso_clock_getres' was not declared. Should it be static?

Make it static and also mark it __maybe_unsed.

Fixes: 502a590a ("lib/vdso: Move fallback invocation to the callers")
Reported-by: NMarc Gonzalez <marc.w.gonzalez@free.fr>
Signed-off-by: NVincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191128111719.8282-1-vincenzo.frascino@arm.com

ffd08731

10 1月, 2020 5 次提交

kunit: allow kunit to be loaded as a module · 9fe124bf