- 26 Nov, 2017 1 commit
-
-
Committed by Andy Polyakov
Convert AVX512F+VL+BW code path to pure AVX512F, so that it can be executed even on Knights Landing. Trigger for modification was observation that AVX512 code paths can negatively affect overall Skylake-X system performance. Since we are likely to suppress AVX512F capability flag [at least on Skylake-X], conversion serves as kind of "investment protection". Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.com/openssl/openssl/pull/4758)
-
- 12 Nov, 2017 1 commit
-
-
Committed by Josh Soref
Around 138 distinct errors found and fixed; thanks! Reviewed-by: Kurt Roeckx <kurt@roeckx.be> Reviewed-by: Tim Hudson <tjh@openssl.org> Reviewed-by: Rich Salz <rsalz@openssl.org> (Merged from https://github.com/openssl/openssl/pull/3459)
-
- 21 Jul, 2017 1 commit
-
-
Committed by Andy Polyakov
"Optimize" is in quotes because it's rather a "salvage operation" for now. Idea is to identify processor capability flags that drive Knights Landing to suboptimal code paths and mask them. Two flags were identified, XSAVE and ADCX/ADOX. Former affects choice of AES-NI code path specific for Silvermont (Knights Landing is of Silvermont "ancestry"). And 64-bit ADCX/ADOX instructions are effectively mishandled at decode time. In both cases we are looking at ~2x improvement. AVX-512 results cover even Skylake-X :-) Hardware used for benchmarking courtesy of Atos, experiments run by Romain Dolbeau <romain.dolbeau@atos.net>. Kudos! Reviewed-by: Rich Salz <rsalz@openssl.org>
-
- 04 Jul, 2017 1 commit
-
-
Committed by Andy Polyakov
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
-
- 22 Mar, 2017 2 commits
-
-
Committed by Andy Polyakov
As hinted by its name, new subroutine processes 8 input blocks in parallel by loading data to 512-bit registers. It still needs more work, as it needs to handle some specific input lengths better. In this sense it's yet another intermediate step... Reviewed-by: Rich Salz <rsalz@openssl.org>
-
Committed by Andy Polyakov
Reviewed-by: Tim Hudson <tjh@openssl.org>
-
- 14 Mar, 2017 1 commit
-
-
Committed by Andy Polyakov
As hinted by its name, new subroutine processes 4 input blocks in parallel. It still operates on 256-bit registers and is just another step toward full-blown AVX512IFMA procedure. Reviewed-by: Rich Salz <rsalz@openssl.org>
-
- 27 Feb, 2017 2 commits
-
-
Committed by Andy Polyakov
Reviewed-by: Rich Salz <rsalz@openssl.org>
-
Committed by Andy Polyakov
Reviewed-by: Rich Salz <rsalz@openssl.org>
-
- 26 Feb, 2017 3 commits
-
-
Committed by Andy Polyakov
This is initial and minimal single-block implementation. Reviewed-by: Rich Salz <rsalz@openssl.org>
-
Committed by Andy Polyakov
Effectively it's minor size optimization, 5-6% per affected subroutine. Reviewed-by: Rich Salz <rsalz@openssl.org>
-
Committed by Andy Polyakov
On pre-Skylake best optimization strategy was balancing port-specific instructions, while on Skylake minimizing the sheer amount appears more sensible. Reviewed-by: Rich Salz <rsalz@openssl.org>
-
- 16 Dec, 2016 1 commit
-
-
Committed by Andy Polyakov
chacha/asm/chacha-x86_64.pl: refine nasm version detection logic. Reviewed-by: Richard Levitte <levitte@openssl.org>
-
- 12 Dec, 2016 1 commit
-
-
Committed by Andy Polyakov
Reviewed-by: Rich Salz <rsalz@openssl.org>
-
- 24 Oct, 2016 1 commit
-
-
Committed by Andy Polyakov
Reviewed-by: Richard Levitte <levitte@openssl.org>
-
- 29 May, 2016 1 commit
-
-
Committed by Andy Polyakov
[as it is now, quoting $output is not required, but done just in case] Reviewed-by: Richard Levitte <levitte@openssl.org>
-
- 21 May, 2016 1 commit
-
-
Committed by Rich Salz
Reviewed-by: Richard Levitte <levitte@openssl.org>
-
- 06 May, 2016 2 commits
-
-
Committed by Andy Polyakov
We don't need it, but external users might find it handy. Reviewed-by: Richard Levitte <levitte@openssl.org>
-
Committed by Andy Polyakov
Reviewed-by: Richard Levitte <levitte@openssl.org>
-
- 20 Apr, 2016 1 commit
-
-
Committed by Andy Polyakov
Reviewed-by: Richard Levitte <levitte@openssl.org>
-
- 04 Apr, 2016 1 commit
-
-
Committed by Andy Polyakov
RT#4483 [poly1305-armv4.pl: remove redundant #ifdef __thumb2__] [poly1305-ppc*.pl: presumably more accurate benchmark results] Reviewed-by: Richard Levitte <levitte@openssl.org>
-
- 16 Mar, 2016 1 commit
-
-
Committed by Andy Polyakov
Reviewed-by: Richard Levitte <levitte@openssl.org>
-
- 02 Mar, 2016 1 commit
-
-
Committed by Andy Polyakov
Formally only 32-bit AVX2 code path needs this, but I chose to harmonize all vector code paths. RT#4346 Reviewed-by: Richard Levitte <levitte@openssl.org>
-
- 12 Feb, 2016 2 commits
-
-
Committed by Andy Polyakov
Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
-
Committed by Andy Polyakov
Reviewed-by: Tim Hudson <tjh@openssl.org>
-
- 10 Feb, 2016 1 commit
-
-
Committed by Andy Polyakov
Reviewed-by: Rich Salz <rsalz@openssl.org>
-