1. 10 Apr 2015: 3 commits
  2. 03 Apr 2015: 1 commit
    • crypto: arm/sha256 - Add optimized SHA-256/224 · f2f770d7
      Committed by Sami Tolvanen
      Add Andy Polyakov's optimized assembly and NEON implementations for
      SHA-256/224.
      
      The sha256-armv4.pl script for generating the assembly code is from
      OpenSSL commit 51f8d095562f36cdaa6893597b5c609e943b0565.
      
      Compared to sha256-generic, these implementations show the following
      tcrypt speed improvements on a Motorola Nexus 6 (Snapdragon 805):
      
        bs    b/u      sha256-neon  sha256-asm
        16    16       x1.32        x1.19
        64    16       x1.27        x1.15
        64    64       x1.36        x1.20
        256   16       x1.22        x1.11
        256   64       x1.36        x1.19
        256   256      x1.59        x1.23
        1024  16       x1.21        x1.10
        1024  256      x1.65        x1.23
        1024  1024     x1.76        x1.25
        2048  16       x1.21        x1.10
        2048  256      x1.66        x1.23
        2048  1024     x1.78        x1.25
        2048  2048     x1.79        x1.25
        4096  16       x1.20        x1.09
        4096  256      x1.66        x1.23
        4096  1024     x1.79        x1.26
        4096  4096     x1.82        x1.26
        8192  16       x1.20        x1.09
        8192  256      x1.67        x1.23
        8192  1024     x1.80        x1.26
        8192  4096     x1.85        x1.28
        8192  8192     x1.85        x1.27
      
      Where bs refers to block size and b/u to bytes per update.
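
      The table can be checked mechanically; a small sketch (data transcribed
      from a subset of the table above; the variable names are my own)
      confirming that the NEON implementation never trails the scalar asm one:

```python
# Speedup ratios versus sha256-generic, transcribed from the tcrypt table
# above: (block size, bytes per update) -> (sha256-neon, sha256-asm).
ratios = {
    (16, 16): (1.32, 1.19),
    (64, 16): (1.27, 1.15),
    (64, 64): (1.36, 1.20),
    (256, 256): (1.59, 1.23),
    (1024, 1024): (1.76, 1.25),
    (2048, 2048): (1.79, 1.25),
    (4096, 4096): (1.82, 1.26),
    (8192, 8192): (1.85, 1.27),
}

# NEON is at least as fast as the scalar asm version at every sampled point,
# and the gap widens as more bytes are hashed per update call.
assert all(neon >= asm for neon, asm in ratios.values())
best = max(ratios.items(), key=lambda kv: kv[1][0])
print(best)  # the (bs, b/u) point with the largest NEON speedup
```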
      Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
      Cc: Andy Polyakov <appro@openssl.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  3. 31 Mar 2015: 3 commits
  4. 24 Mar 2015: 1 commit
  5. 12 Mar 2015: 5 commits
  6. 02 Dec 2014: 1 commit
  7. 24 Nov 2014: 1 commit
  8. 27 Aug 2014: 1 commit
  9. 02 Aug 2014: 3 commits
  10. 28 Jul 2014: 1 commit
  11. 18 Jul 2014: 1 commit
    • ARM: convert all "mov.* pc, reg" to "bx reg" for ARMv6+ · 6ebbf2ce
      Committed by Russell King
      ARMv6 and greater introduced a new instruction ("bx") which can be used
      to return from function calls.  Recent CPUs perform better when the
      "bx lr" instruction is used rather than the "mov pc, lr" instruction,
      and the ARM architecture manual (section A.4.1.1) strongly recommends
      this sequence.
      
      We provide a new macro, "ret", with variants for each condition code,
      which resolves to the appropriate instruction.
      
      Rather than doing this piecemeal, and missing some instances, change
      all the "mov pc" instances to use the new macro, with the exception of
      the "movs" instruction and the kprobes code.  This allows us to detect
      the "mov pc, lr" case and fix it up - and also gives us the possibility
      of deploying this for other registers depending on the CPU selection.
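
      The selection described above can be modelled outside assembly; a
      hedged Python sketch (my own model, not the kernel macro, whose exact
      form lives in arch/arm/include/asm/assembler.h) of how a given return
      would resolve:

```python
def resolve_ret(arch: int, reg: str, cond: str = "") -> str:
    """Toy model of how a 'ret<cond> <reg>' use resolves (a sketch under
    the rules the commit message describes, not the actual macro):
    ARMv6+ returns through lr become 'bx', which recent cores predict
    better; everything else stays as 'mov pc'."""
    if arch >= 6 and reg == "lr":
        return f"bx{cond} {reg}"
    return f"mov{cond} pc, {reg}"

print(resolve_ret(7, "lr"))        # bx lr
print(resolve_ret(4, "lr"))        # mov pc, lr  (below the ARMv6 cutoff)
print(resolve_ret(7, "r3", "eq"))  # moveq pc, r3
```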
      Reported-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Stephen Warren <swarren@nvidia.com> # Tegra Jetson TK1
      Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> # mioa701_bootresume.S
      Tested-by: Andrew Lunn <andrew@lunn.ch> # Kirkwood
      Tested-by: Shawn Guo <shawn.guo@freescale.com>
      Tested-by: Tony Lindgren <tony@atomide.com> # OMAPs
      Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> # Armada XP, 375, 385
      Acked-by: Sekhar Nori <nsekhar@ti.com> # DaVinci
      Acked-by: Christoffer Dall <christoffer.dall@linaro.org> # kvm/hyp
      Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com> # PXA3xx
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> # Xen
      Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> # ARMv7M
      Tested-by: Simon Horman <horms+renesas@verge.net.au> # Shmobile
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  12. 05 Jan 2014: 1 commit
    • CRYPTO: Fix more AES build errors · d2eca20d
      Committed by Russell King
      Building a multi-arch kernel results in:
      
      arch/arm/crypto/built-in.o: In function `aesbs_xts_decrypt':
      sha1_glue.c:(.text+0x15c8): undefined reference to `bsaes_xts_decrypt'
      arch/arm/crypto/built-in.o: In function `aesbs_xts_encrypt':
      sha1_glue.c:(.text+0x1664): undefined reference to `bsaes_xts_encrypt'
      arch/arm/crypto/built-in.o: In function `aesbs_ctr_encrypt':
      sha1_glue.c:(.text+0x184c): undefined reference to `bsaes_ctr32_encrypt_blocks'
      arch/arm/crypto/built-in.o: In function `aesbs_cbc_decrypt':
      sha1_glue.c:(.text+0x19b4): undefined reference to `bsaes_cbc_encrypt'
      
      This code is already runtime-conditional on NEON being supported, so
      there's no point compiling it out depending on the minimum build
      architecture.
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  13. 07 Oct 2013: 1 commit
  14. 05 Oct 2013: 1 commit
    • ARM: add support for bit sliced AES using NEON instructions · e4e7f10b
      Committed by Ard Biesheuvel
      Bit sliced AES gives around 45% speedup on Cortex-A15 for encryption
      and around 25% for decryption. This implementation of the AES algorithm
      does not rely on any lookup tables so it is believed to be invulnerable
      to cache timing attacks.
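
      The bit-slicing idea behind this can be sketched in a few lines (a toy
      model of my own, not the OpenSSL code): bit i of several blocks is
      gathered into one machine word, so a single boolean operation acts on
      all blocks at once and no data-dependent table lookup is ever issued.

```python
def bitslice(blocks):
    """Transpose 8 one-byte 'blocks' into 8 lanes: lane i holds bit i of
    every block. A toy stand-in for the wide NEON registers."""
    return [sum(((b >> i) & 1) << j for j, b in enumerate(blocks))
            for i in range(8)]

def unbitslice(lanes):
    """Inverse transpose: recover the original blocks from the lanes."""
    return [sum(((lane >> j) & 1) << i for i, lane in enumerate(lanes))
            for j in range(8)]

blocks = [0x00, 0xFF, 0x3C, 0xA5, 0x01, 0x80, 0x55, 0xAA]
lanes = bitslice(blocks)

# One constant-time boolean op per lane applies NOT to all 8 blocks at
# once -- no lookup table, hence no cache-timing side channel.
inverted = [lane ^ 0xFF for lane in lanes]
assert unbitslice(inverted) == [b ^ 0xFF for b in blocks]
```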
      
      This algorithm processes up to 8 blocks in parallel in constant time. This
      means that it is not usable by chaining modes that are strictly sequential
      in nature, such as CBC encryption. CBC decryption, however, can benefit from
      this implementation and runs about 25% faster. The other chaining modes
      implemented in this module, XTS and CTR, can execute fully in parallel in
      both directions.
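
      Why CBC encryption cannot use the 8-way parallelism while CTR can is
      visible even with a toy cipher (a sketch of my own; XOR with the key
      stands in for AES purely to show the data dependencies):

```python
# Toy single-block 'cipher': XOR with the key. A stand-in for AES, used
# only to expose the dependency structure of the chaining modes.
def enc(key, block):
    return key ^ block

def cbc_encrypt(key, iv, blocks):
    # Each ciphertext block feeds the next: strictly sequential, so an
    # 8-blocks-at-a-time implementation cannot help here.
    out, prev = [], iv
    for b in blocks:
        prev = enc(key, b ^ prev)
        out.append(prev)
    return out

def ctr_encrypt(key, nonce, blocks):
    # Keystream block i depends only on (nonce + i): every block can be
    # computed independently, i.e. fully in parallel, in both directions.
    return [b ^ enc(key, nonce + i) for i, b in enumerate(blocks)]

key, iv, blocks = 0x5A, 0x13, [7, 42, 99, 7]
# CTR round-trips: applying it twice with the same counters is identity.
assert ctr_encrypt(key, iv, ctr_encrypt(key, iv, blocks)) == blocks
```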
      
      The core code has been adapted from the OpenSSL project (in collaboration
      with the original author, on cc). For ease of maintenance, this version is
      identical to the upstream OpenSSL code, i.e., all modifications that were
      required to make it suitable for inclusion into the kernel have been made
      upstream. The original can be found here:
      
          http://git.openssl.org/gitweb/?p=openssl.git;a=commit;h=6f6a6130
      
      Note to integrators:
      While this implementation is significantly faster than the existing table
      based ones (generic or ARM asm), especially in CTR mode, the effects on
      power efficiency are unclear as of yet. This code does fundamentally more
      work, calculating values that the table based code obtains by a simple
      lookup; only by doing all of that work in a SIMD fashion does it manage
      to perform better.
      
      Cc: Andy Polyakov <appro@openssl.org>
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
  15. 04 Oct 2013: 1 commit
  16. 22 Sep 2013: 1 commit
  17. 23 May 2013: 1 commit
  18. 13 Jan 2013: 1 commit
    • ARM: 7626/1: arm/crypto: Make asm SHA-1 and AES code Thumb-2 compatible · 638591cd
      Committed by Dave Martin
      This patch fixes aes-armv4.S and sha1-armv4-large.S to work
      natively in Thumb.  This allows ARM/Thumb interworking workarounds
      to be removed.
      
      I also take the opportunity to convert some explicit assembler
      directives for exported functions to the standard
      ENTRY()/ENDPROC().
      
      For the code itself:
      
        * In sha1_block_data_order, use of TEQ with sp is deprecated in
          ARMv7 and not supported in Thumb.  For the branches back to
          .L_00_15 and .L_40_59, the TEQ is converted to a CMP, under the
          assumption that clobbering the C flag here will not cause
          incorrect behaviour.
      
          For the first branch back to .L_20_39_or_60_79 the C flag is
          important, so sp is moved temporarily into another register so
          that TEQ can be used for the comparison.
      
        * In the AES code, most forms of register-indexed addressing with
          shifts and rotates are not permitted for loads and stores in
          Thumb, so the address calculation is done using a separate
          instruction for the Thumb case.
      
      The resulting code is unlikely to be optimally scheduled, but it
      should not have a large impact given the overall size of the code.
      I haven't run any benchmarks.
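
      The C-flag reasoning above can be made concrete with a toy flags model
      (my own sketch; 32-bit values, only the behaviour relevant here):

```python
MASK = 0xFFFFFFFF

def teq(a, b, flags):
    """TEQ sets N and Z from a ^ b and leaves C (and V) alone -- which is
    why it mattered for the branch where the C flag is still live."""
    r = (a ^ b) & MASK
    return {**flags, "N": r >> 31, "Z": int(r == 0)}

def cmp(a, b, flags):
    """CMP sets N, Z, C, V from a - b; the carry-out clobbers C, which is
    the assumption the TEQ -> CMP conversion relies on being harmless."""
    r = (a - b) & MASK
    return {"N": r >> 31, "Z": int(r == 0), "C": int(a >= b),
            "V": int(((a ^ b) & (a ^ r)) >> 31)}

flags = {"N": 0, "Z": 0, "C": 1, "V": 0}
assert teq(5, 5, flags)["C"] == 1   # C preserved across TEQ
assert cmp(5, 5, flags)["C"] == 1   # equal operands: no borrow, C set
assert cmp(4, 5, flags)["C"] == 0   # a borrow clobbers C
```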
      Signed-off-by: Dave Martin <dave.martin@linaro.org>
      Tested-by: David McCullough <ucdevel@gmail.com> (ARM only)
      Acked-by: David McCullough <ucdevel@gmail.com>
      Acked-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  19. 07 Sep 2012: 1 commit