1. 10 Jun, 2017 1 commit
  2. 11 Feb, 2017 2 commits
    • crypto: improve gcc optimization flags for serpent and wp512 · 7d6e9105
      Committed by Arnd Bergmann
      An ancient gcc bug (first reported in 2003) has apparently resurfaced
      on MIPS, where kernelci.org reports an overly large stack frame in the
      whirlpool hash algorithm:
      
      crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      With some testing in different configurations, I'm seeing large
      variations in stack frame size, up to 1500 bytes for what should be
      around 300 bytes at most. I also checked the reference implementation,
      which is essentially the same code but also comes with some test and
      benchmarking infrastructure.
      
      It seems that recent compiler versions on at least arm, arm64 and powerpc
      have a partial fix for this problem by enabling "-fsched-pressure", but
      even with that fix they suffer from the issue to a certain degree. Some
      testing on arm64 shows that the time needed to hash a given amount of
      data is roughly proportional to the stack frame size here, which makes
      sense given that the wp512 implementation is doing lots of loads for
      table lookups, and the problem with the overly large stack is a result
      of doing a lot more loads and stores for spilled registers (as seen from
      inspecting the object code).
      
      Disabling -fschedule-insns consistently fixes the problem for wp512.
      In my collection of cross-compilers, the results are consistently better
      or identical when comparing the stack sizes in this function, though
      some architectures (notably x86) have schedule-insns disabled by
      default.
      
      The four columns are:
      default: -O2
      press:	 -O2 -fsched-pressure
      nopress: -O2 -fschedule-insns -fno-sched-pressure
      nosched: -O2 -fno-schedule-insns (disables sched-pressure)
      
      				default	press	nopress	nosched
      alpha-linux-gcc-4.9.3		1136	848	1136	176
      am33_2.0-linux-gcc-4.9.3	2100	2076	2100	2104
      arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
      cris-linux-gcc-4.9.3		272	272	272	272
      frv-linux-gcc-4.9.3		1128	1000	1128	280
      hppa64-linux-gcc-4.9.3		1128	336	1128	184
      hppa-linux-gcc-4.9.3		644	308	644	276
      i386-linux-gcc-4.9.3		352	352	352	352
      m32r-linux-gcc-4.9.3		720	656	720	268
      microblaze-linux-gcc-4.9.3	1108	604	1108	256
      mips64-linux-gcc-4.9.3		1328	592	1328	208
      mips-linux-gcc-4.9.3		1096	624	1096	240
      powerpc64-linux-gcc-4.9.3	1088	432	1088	160
      powerpc-linux-gcc-4.9.3		1080	584	1080	224
      s390-linux-gcc-4.9.3		456	456	624	360
      sh3-linux-gcc-4.9.3		292	292	292	292
      sparc64-linux-gcc-4.9.3		992	240	992	208
      sparc-linux-gcc-4.9.3		680	592	680	312
      x86_64-linux-gcc-4.9.3		224	240	272	224
      xtensa-linux-gcc-4.9.3		1152	704	1152	304
      
      aarch64-linux-gcc-7.0.0		224	224	1104	208
      arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
      mips-linux-gcc-7.0.0		1120	648	1120	272
      x86_64-linux-gcc-7.0.1		240	240	304	240
      
      arm-linux-gnueabi-gcc-4.4.7	840			392
      arm-linux-gnueabi-gcc-4.5.4	784	728	784	320
      arm-linux-gnueabi-gcc-4.6.4	736	728	736	304
      arm-linux-gnueabi-gcc-4.7.4	944	784	944	352
      arm-linux-gnueabi-gcc-4.8.5	464	464	760	352
      arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
      arm-linux-gnueabi-gcc-5.3.1	824	824	1064	336
      arm-linux-gnueabi-gcc-6.1.1	808	808	1056	344
      arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
      
      Trying the same test for serpent-generic, the picture is a bit different,
      and while -fno-schedule-insns is generally better here than the default,
      -fsched-pressure wins overall, so I picked that instead.
      
      				default	press	nopress	nosched
      alpha-linux-gcc-4.9.3		1392	864	1392	960
      am33_2.0-linux-gcc-4.9.3	536	524	536	528
      arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
      cris-linux-gcc-4.9.3		528	528	528	528
      frv-linux-gcc-4.9.3		536	400	536	504
      hppa64-linux-gcc-4.9.3		524	208	524	480
      hppa-linux-gcc-4.9.3		768	472	768	508
      i386-linux-gcc-4.9.3		564	564	564	564
      m32r-linux-gcc-4.9.3		712	576	712	532
      microblaze-linux-gcc-4.9.3	724	392	724	512
      mips64-linux-gcc-4.9.3		720	384	720	496
      mips-linux-gcc-4.9.3		728	384	728	496
      powerpc64-linux-gcc-4.9.3	704	304	704	480
      powerpc-linux-gcc-4.9.3		704	296	704	480
      s390-linux-gcc-4.9.3		560	560	592	536
      sh3-linux-gcc-4.9.3		540	540	540	540
      sparc64-linux-gcc-4.9.3		544	352	544	496
      sparc-linux-gcc-4.9.3		544	344	544	496
      x86_64-linux-gcc-4.9.3		528	536	576	528
      xtensa-linux-gcc-4.9.3		752	544	752	544
      
      aarch64-linux-gcc-7.0.0		432	432	656	480
      arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
      mips-linux-gcc-7.0.0		720	464	720	488
      x86_64-linux-gcc-7.0.1		536	528	600	536
      
      arm-linux-gnueabi-gcc-4.4.7	592			440
      arm-linux-gnueabi-gcc-4.5.4	776	448	776	544
      arm-linux-gnueabi-gcc-4.6.4	776	448	776	544
      arm-linux-gnueabi-gcc-4.7.4	768	448	768	544
      arm-linux-gnueabi-gcc-4.8.5	488	488	776	544
      arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
      arm-linux-gnueabi-gcc-5.3.1	552	552	776	536
      arm-linux-gnueabi-gcc-6.1.1	560	560	776	536
      arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
      
      I did not do any runtime tests with serpent, so it is possible that stack
      frame size does not directly correlate with runtime performance here and
      it actually makes things worse, but it's more likely to help here, and
      the reduced stack frame size is probably enough reason to apply the patch,
      especially given that the crypto code is often used in deep call chains.
      
      Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
      Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: aes - add generic time invariant AES cipher · b5e0b032
      Committed by Ard Biesheuvel
      Lookup table based AES is sensitive to timing attacks, which is due to
      the fact that such table lookups are data dependent, and the fact that
      8 KB worth of tables covers a significant number of cachelines on any
      architecture, resulting in an exploitable correlation between the key
      and the processing time for known plaintexts.
      
      For network facing algorithms such as CTR, CCM or GCM, this presents a
      security risk, which is why arch specific AES ports are typically time
      invariant, either through the use of special instructions, or by using
      SIMD algorithms that don't rely on table lookups.
      
      For generic code, this is difficult to achieve without losing too much
      performance, but we can improve the situation significantly by switching
      to an implementation that only needs 256 bytes of table data (the actual
      S-box itself), which can be prefetched at the start of each block to
      eliminate data dependent latencies.
      
      This code encrypts at ~25 cycles per byte on ARM Cortex-A57 (while the
      ordinary generic AES driver manages 18 cycles per byte on this
      hardware). Decryption is substantially slower.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
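The prefetch idea in the time-invariant AES commit can be illustrated with a minimal sketch. This is not the code from the commit: `table_prefetch` and `ASSUMED_CACHELINE` are hypothetical names, and the 64-byte line size is an assumption. The sketch touches one byte per assumed cache line of a small table, so that later data-dependent lookups are more likely to be served from cache regardless of the index.

```c
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_CACHELINE 64	/* assumption: real cache line sizes vary */

/*
 * Hypothetical helper: read one byte from every assumed cache line of
 * `table`. The volatile qualifier keeps the compiler from dropping the
 * loads; the XOR of the touched bytes is returned so they have a use.
 */
uint8_t table_prefetch(const volatile uint8_t *table, size_t len)
{
	uint8_t acc = 0;
	size_t i;

	for (i = 0; i < len; i += ASSUMED_CACHELINE)
		acc ^= table[i];

	return acc;
}
```

For a 256-byte S-box this is only four loads per block under the assumed line size, which is why shrinking the table from 8 KB to 256 bytes makes such a prefetch cheap. It reduces, but does not by itself prove the absence of, data-dependent latency.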
  3. 30 Nov, 2016 1 commit
  4. 28 Nov, 2016 1 commit
  5. 01 Nov, 2016 1 commit
  6. 25 Oct, 2016 2 commits
  7. 18 Jul, 2016 1 commit
    • crypto: skcipher - Remove top-level givcipher interface · 3a01d0ee
      Committed by Herbert Xu
      This patch removes the old crypto_grab_skcipher helper and replaces
      it with crypto_grab_skcipher2.
      
      As this is the final entry point into givcipher this patch also
      removes all traces of the top-level givcipher interface, including
      all implicit IV generators such as chainiv.
      
      The bottom-level givcipher interface remains until the drivers
      using it are converted.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  8. 23 Jun, 2016 3 commits
  9. 20 Jun, 2016 1 commit
  10. 01 Feb, 2016 1 commit
  11. 30 Jan, 2016 1 commit
  12. 27 Jan, 2016 1 commit
  13. 09 Dec, 2015 1 commit
    • crypto: rsa - RSA padding algorithm · 3d5b1ecd
      Committed by Andrzej Zaborowski
      This patch adds PKCS#1 v1.5 standard RSA padding as a separate template.
      This way an RSA cipher with padding can be obtained by instantiating
      "pkcs1pad(rsa)".  The reason for adding this is that RSA is almost
      never used without this padding (or OAEP) so it will be needed for
      either certificate work in the kernel or the userspace, and I also hear
      that it is likely implemented by hardware RSA in which case hardware
      implementations of the whole of pkcs1pad(rsa) can be provided.
      Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
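To make the padding layout concrete, here is a minimal sketch of PKCS#1 v1.5 block type 1 padding as specified in RFC 8017: EM = 0x00 || 0x01 || PS || 0x00 || data, where PS is at least eight 0xFF bytes. `pkcs1_type1_pad` is a hypothetical name and this is not the kernel's pkcs1pad template; for encryption the standard uses block type 2 with random nonzero padding bytes instead, which is not shown here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: fill `em` (em_len bytes, normally the modulus
 * size) with 0x00 || 0x01 || 0xFF...0xFF || 0x00 || data.
 * Returns 0 on success, -1 if data does not leave room for the
 * 3 framing bytes plus at least 8 bytes of 0xFF padding.
 */
int pkcs1_type1_pad(unsigned char *em, size_t em_len,
		    const unsigned char *data, size_t data_len)
{
	size_t ps_len;

	if (em_len < 11 || data_len > em_len - 11)
		return -1;

	ps_len = em_len - data_len - 3;

	em[0] = 0x00;
	em[1] = 0x01;
	memset(em + 2, 0xFF, ps_len);
	em[2 + ps_len] = 0x00;
	memcpy(em + 3 + ps_len, data, data_len);

	return 0;
}
```

The padded block is what would then be handed to the raw RSA operation, which is the split the "pkcs1pad(rsa)" template name expresses.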
  14. 15 Oct, 2015 1 commit
  15. 14 Oct, 2015 1 commit
  16. 21 Aug, 2015 1 commit
    • crypto: skcipher - Add top-level skcipher interface · 7a7ffe65
      Committed by Herbert Xu
      This patch introduces the crypto skcipher interface which aims
      to replace both blkcipher and ablkcipher.
      
      It's very similar to the existing ablkcipher interface.  The
      main difference is the removal of the givcrypt interface.  In
      order to make the transition easier for blkcipher users, there
      is a helper SKCIPHER_REQUEST_ON_STACK which can be used to place
      a request on the stack for synchronous transforms.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  17. 17 Aug, 2015 1 commit
  18. 25 Jun, 2015 1 commit
    • crypto: jitterentropy - avoid compiler warnings · dfc9fa91
      Committed by Stephan Mueller
      The core of the Jitter RNG is intended to be compiled with -O0. To
      ensure that the Jitter RNG can be compiled on all architectures,
      separate out the RNG core into a stand-alone C file that can be compiled
      with -O0 which does not depend on any kernel include file.
      
      As no kernel includes can be used in the C file implementing the core
      RNG, any dependencies on kernel code must be extracted.
      
      A second file provides the link to the kernel and the kernel crypto API
      that can be compiled with the regular compile options of the kernel.
      Signed-off-by: Stephan Mueller <smueller@chronox.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  19. 17 Jun, 2015 2 commits
  20. 09 Jun, 2015 1 commit
  21. 04 Jun, 2015 4 commits
  22. 27 May, 2015 1 commit
    • crypto: jitterentropy - add jitterentropy RNG · bb5530e4
      Committed by Stephan Mueller
      The CPU Jitter RNG provides a source of good entropy by
      collecting CPU execution time jitter. The entropy in the CPU
      execution time jitter is magnified by the CPU Jitter Random
      Number Generator. The CPU Jitter Random Number Generator uses
      the CPU execution timing jitter to generate a bit stream
      which complies with different statistical measurements that
      determine the bit stream is random.
      
      The CPU Jitter Random Number Generator delivers entropy which
      follows information theoretical requirements. Based on these
      studies and the implementation, the caller can assume that
      one bit of data extracted from the CPU Jitter Random Number
      Generator holds one bit of entropy.
      
      The CPU Jitter Random Number Generator provides a decentralized
      source of entropy, i.e. every caller can operate on a private
      state of the entropy pool.
      
      The RNG does not have any dependencies on any other service
      in the kernel. The RNG only needs a high-resolution time
      stamp.
      
      Further design details, the cryptographic assessment and
      large array of test results are documented at
      http://www.chronox.de/jent.html.
      
      CC: Andreas Steffen <andreas.steffen@strongswan.org>
      CC: Theodore Ts'o <tytso@mit.edu>
      CC: Sandy Harris <sandyinchina@gmail.com>
      Signed-off-by: Stephan Mueller <smueller@chronox.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  23. 22 May, 2015 1 commit
    • crypto: echainiv - Add encrypted chain IV generator · a10f554f
      Committed by Herbert Xu
      This patch adds a new AEAD IV generator echainiv.  It is intended
      to replace the existing skcipher IV generator eseqiv.
      
      If the underlying AEAD algorithm is using the old AEAD interface,
      then echainiv will simply use its IV generator.
      
      Otherwise, echainiv will encrypt a counter just like eseqiv but
      it'll first xor it against a previously stored IV similar to
      chainiv.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  24. 04 Mar, 2015 1 commit
  25. 29 Dec, 2014 1 commit
  26. 25 Aug, 2014 1 commit
    • crypto: sha-mb - multibuffer crypto infrastructure · 1e65b81a
      Committed by Tim Chen
      This patch introduces the multi-buffer crypto daemon which is responsible
      for submitting crypto jobs in a work queue to the responsible multi-buffer
      crypto algorithm.  The idea of the multi-buffer algorithm is to put
      data streams from multiple jobs in a wide (AVX2) register and then
      take advantage of SIMD instructions to do crypto computation on several
      buffers simultaneously.
      
      The multi-buffer crypto daemon is also responsible for flushing the
      remaining buffers to complete the computation if no new buffers arrive
      for a while.
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  27. 04 Jul, 2014 1 commit
  28. 20 Jun, 2014 1 commit
  29. 25 Feb, 2014 1 commit
  30. 05 Dec, 2013 1 commit
    • crypto: more robust crypto_memneq · fe8c8a12
      Committed by Cesar Eduardo Barros
      Disabling compiler optimizations can be fragile, since a new
      optimization could be added to -O0 or -Os that breaks the assumptions
      the code is making.
      
      Instead of disabling compiler optimizations, use a dummy inline assembly
      (based on RELOC_HIDE) to block the problematic kinds of optimization,
      while still allowing other optimizations to be applied to the code.
      
      The dummy inline assembly is added after every OR, and has the
      accumulator variable as its input and output. The compiler is forced to
      assume that the dummy inline assembly could both depend on the
      accumulator variable and change the accumulator variable, so it is
      forced to compute the value correctly before the inline assembly, and
      cannot assume anything about its value after the inline assembly.
      
      This change should be enough to make crypto_memneq work correctly (with
      data-independent timing) even if it is inlined at its call sites. That
      can be done later in a followup patch.
      
      Compile-tested on x86_64.
      Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.eti.br>
      Acked-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
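The dummy inline assembly described in this commit can be sketched as follows. This is an illustration under assumptions, not the kernel source: `HIDE_VAR` and `memneq_barrier` are hypothetical names, and GCC/Clang extended asm syntax is assumed. The empty asm statement names the accumulator as both input and output, so the compiler must materialize its value before the statement and can assume nothing about it afterwards.

```c
#include <stddef.h>

/*
 * Hypothetical macro: empty asm with `var` as input and output ("0"
 * ties the input to operand 0). It emits no instructions but hides
 * the value of `var` from the optimizer at this point.
 */
#define HIDE_VAR(var) __asm__ __volatile__("" : "=r"(var) : "0"(var))

/* Returns 0 if the buffers are equal, nonzero otherwise. */
unsigned long memneq_barrier(const void *a, const void *b, size_t size)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;
	unsigned long neq = 0;

	while (size-- > 0) {
		neq |= (unsigned long)(*pa++ ^ *pb++);
		HIDE_VAR(neq);	/* barrier after every OR */
	}

	return neq;
}
```

Because the compiler cannot prove that `neq` is still nonzero after the barrier, it has no basis for leaving the loop early once a mismatch is seen, while other optimizations on the surrounding code remain available.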
  31. 26 Oct, 2013 1 commit
  32. 07 Oct, 2013 1 commit
    • crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks · 6bf37e5a
      Committed by James Yonan
      When comparing MAC hashes, AEAD authentication tags, or other hash
      values in the context of authentication or integrity checking, it
      is important not to leak timing information to a potential attacker,
      i.e. when communication happens over a network.
      
      Bytewise memory comparisons (such as memcmp) are usually optimized so
      that they return a nonzero value as soon as a mismatch is found. E.g.,
      on x86_64/i5 for 512 bytes this can be ~50 cyc for a full mismatch
      and up to ~850 cyc for a full match (cold). This early-return behavior
      can leak timing information as a side channel, allowing an attacker to
      iteratively guess the correct result.
      
      This patch adds a new method crypto_memneq ("memory not equal to each
      other") to the crypto API that compares memory areas of the same length
      in roughly "constant time" (cache misses could change the timing, but
      since they don't reveal information about the content of the strings
      being compared, they are effectively benign). In other words, best and worst case
      behaviour take the same amount of time to complete (in contrast to
      memcmp).
      
      Note that crypto_memneq (unlike memcmp) can only be used to test for
      equality or inequality, NOT for lexicographical order. This, however,
      is not an issue for its use-cases within the crypto API.
      
      We tried to locate all of the places in the crypto API where memcmp was
      being used for authentication or integrity checking, and convert them
      over to crypto_memneq.
      
      crypto_memneq is declared noinline, placed in its own source file,
      and compiled with optimizations that might increase code size disabled
      ("Os") because a smart compiler (or LTO) might notice that the return
      value is always compared against zero/nonzero, and might then
      reintroduce the same early-return optimization that we are trying to
      avoid.
      
      Using #pragma or __attribute__ optimization annotations of the code
      for disabling optimization was avoided as it seems to be considered
      broken or unmaintained for long time in GCC [1]. Therefore, we work
      around that by specifying the compile flag for memneq.o directly in
      the Makefile. We found that this seems to be most appropriate.
      
      As we use ("Os"), this patch also provides a loop-free "fast-path" for
      frequently used 16 byte digests. Similarly to kernel library string
      functions, leave an option for future even further optimized architecture
      specific assembler implementations.
      
      This was a joint work of James Yonan and Daniel Borkmann. Also thanks
      for feedback from Florian Weimer on this and earlier proposals [2].
      
        [1] http://gcc.gnu.org/ml/gcc/2012-07/msg00211.html
        [2] https://lkml.org/lkml/2013/2/10/131

      Signed-off-by: James Yonan <james@openvpn.net>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Florian Weimer <fw@deneb.enyo.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
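The comparison strategy described in this commit can be sketched in a few lines. This is a minimal illustration, not the kernel's crypto_memneq: `ct_memneq` is a hypothetical name, and the volatile pointers are an assumption made here to discourage the compiler from short-circuiting the loop. Every byte pair is XORed and ORed into an accumulator, so the loop never branches on the data.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 0 if the two buffers are equal and a
 * nonzero value otherwise. It cannot be used for ordering, only for
 * equality or inequality.
 */
unsigned long ct_memneq(const void *a, const void *b, size_t size)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned long neq = 0;

	/* Visit every byte; differences are accumulated, never branched on. */
	while (size-- > 0)
		neq |= (unsigned long)(*pa++ ^ *pb++);

	return neq;
}
```

A caller would use it as `if (ct_memneq(tag, expected, len)) reject();`, in the same position where memcmp would otherwise compare an authentication tag. Whether a given compiler preserves the constant-time property still has to be checked in the generated object code.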