- 06 4月, 2019 1 次提交
-
-
由 Nathan Chancellor 提交于
[ Upstream commit de9c0d49d85dc563549972edc5589d195cd5e859 ] While building arm32 allyesconfig, I ran into the following errors: arch/arm/lib/xor-neon.c:17:2: error: You should compile this file with '-mfloat-abi=softfp -mfpu=neon' In file included from lib/raid6/neon1.c:27: /home/nathan/cbl/prebuilt/lib/clang/8.0.0/include/arm_neon.h:28:2: error: "NEON support not enabled" Building V=1 showed NEON_FLAGS getting passed along to Clang but __ARM_NEON__ was not getting defined. Ultimately, it boils down to Clang only defining __ARM_NEON__ when targeting armv7, rather than armv6k, which is the '-march' value for allyesconfig. >From lib/Basic/Targets/ARM.cpp in the Clang source: // This only gets set when Neon instructions are actually available, unlike // the VFP define, hence the soft float and arch check. This is subtly // different from gcc, we follow the intent which was that it should be set // when Neon instructions are actually available. if ((FPU & NeonFPU) && !SoftFloat && ArchVersion >= 7) { Builder.defineMacro("__ARM_NEON", "1"); Builder.defineMacro("__ARM_NEON__"); // current AArch32 NEON implementations do not support double-precision // floating-point even when it is present in VFP. Builder.defineMacro("__ARM_NEON_FP", "0x" + Twine::utohexstr(HW_FP & ~HW_FP_DP)); } Ard Biesheuvel recommended explicitly adding '-march=armv7-a' at the beginning of the NEON_FLAGS definitions so that __ARM_NEON__ always gets definined by Clang. This doesn't functionally change anything because that code will only run where NEON is supported, which is implicitly armv7. Link: https://github.com/ClangBuiltLinux/linux/issues/287Suggested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NNathan Chancellor <natechancellor@gmail.com> Acked-by: NNicolas Pitre <nico@linaro.org> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Reviewed-by: NStefan Agner <stefan@agner.ch> Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 13 1月, 2019 1 次提交
-
-
由 Joel Stanley 提交于
commit e213574a449f7a57d4202c1869bbc7680b6b5521 upstream. We cannot build these files with clang as it does not allow altivec instructions in assembly when -msoft-float is passed. Jinsong Ji <jji@us.ibm.com> wrote: > We currently disable Altivec/VSX support when enabling soft-float. So > any usage of vector builtins will break. > > Enable Altivec/VSX with soft-float may need quite some clean up work, so > I guess this is currently a limitation. > > Removing -msoft-float will make it work (and we are lucky that no > floating point instructions will be generated as well). This is a workaround until the issue is resolved in clang. Link: https://bugs.llvm.org/show_bug.cgi?id=31177 Link: https://github.com/ClangBuiltLinux/linux/issues/239Signed-off-by: NJoel Stanley <joel@jms.id.au> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NNathan Chancellor <natechancellor@gmail.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 27 11月, 2018 1 次提交
-
-
由 Jeremy Linton 提交于
[ Upstream commit 313a06e636808387822af24c507cba92703568b1 ] The lib/raid6/test fails to build the neon objects on arm64 because the correct machine type is 'aarch64'. Once this is correctly enabled, the neon recovery objects need to be added to the build. Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NJeremy Linton <jeremy.linton@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 04 7月, 2018 1 次提交
-
-
由 Kees Cook 提交于
In the quest to remove all stack VLA usage from the kernel[1], this moves the "$#" replacement from being an argument to being inside the function, which avoids generating VLAs. [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.comSigned-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 26 3月, 2018 2 次提交
-
-
由 Arnd Bergmann 提交于
The Tile architecture is getting removed, so we no longer need this either. Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NArnd Bergmann <arnd@arndb.de>
-
由 Joe Perches 提交于
Some functions definitions have either the initial open brace and/or the closing brace outside of column 1. Move those braces to column 1. This allows various function analyzers like gnu complexity to work properly for these modified functions. Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Acked-by: NPaul Moore <paul@paul-moore.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Acked-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Acked-by: NAlexandre Belloni <alexandre.belloni@free-electrons.com> Acked-by: NMartin K. Petersen <martin.petersen@oracle.com> Acked-by: NTakashi Iwai <tiwai@suse.de> Acked-by: NMauro Carvalho Chehab <mchehab@s-opensource.com> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NNicolin Chen <nicoleotsuka@gmail.com> Acked-by: NMartin K. Petersen <martin.petersen@oracle.com> Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 20 3月, 2018 2 次提交
-
-
由 Matt Brown 提交于
Previously the raid6 test Makefile did not build the POWER specific files (altivec and vpermxor). This patch fixes the bug, so that all appropriate files for powerpc are built. This patch also fixes the missing and mismatched ifdef statements to allow the altivec.uc file to be built correctly. Signed-off-by: NMatt Brown <matthew.brown.dev@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Matt Brown 提交于
This patch uses the vpermxor instruction to optimise the raid6 Q syndrome. This instruction was made available with POWER8, ISA version 2.07. It allows for both vperm and vxor instructions to be done in a single instruction. This has been tested for correctness on a ppc64le vm with a basic RAID6 setup containing 5 drives. The performance benchmarks are from the raid6test in the /lib/raid6/test directory. These results are from an IBM Firestone machine with ppc64le architecture. The benchmark results show a 35% speed increase over the best existing algorithm for powerpc (altivec). The raid6test has also been run on a big-endian ppc64 vm to ensure it also works for big-endian architectures. Performance benchmarks: raid6: altivecx4 gen() 18773 MB/s raid6: altivecx8 gen() 19438 MB/s raid6: vpermxor4 gen() 25112 MB/s raid6: vpermxor8 gen() 26279 MB/s Signed-off-by: NMatt Brown <matthew.brown.dev@gmail.com> Reviewed-by: NDaniel Axtens <dja@axtens.net> [mpe: Add VPERMXOR macro so we can build with old binutils] Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 02 11月, 2017 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org> Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 26 8月, 2017 1 次提交
-
-
由 Denys Vlasenko 提交于
Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: mingo@redhat.com Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Megha Dey <megha.dey@linux.intel.com> Cc: Gayatri Kammela <gayatri.kammela@intel.com> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NShaohua Li <shli@fb.com>
-
- 10 8月, 2017 2 次提交
-
-
由 Ard Biesheuvel 提交于
Provide a NEON accelerated implementation of the recovery algorithm, which supersedes the default byte-by-byte one. Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Ard Biesheuvel 提交于
The P/Q left side optimization in the delta syndrome simply involves repeatedly multiplying a value by polynomial 'x' in GF(2^8). Given that 'x * x * x * x' equals 'x^4' even in the polynomial world, we can accelerate this substantially by performing up to 4 such operations at once, using the NEON instructions for polynomial multiplication. Results on a Cortex-A57 running in 64-bit mode: Before: ------- raid6: neonx1 xor() 1680 MB/s raid6: neonx2 xor() 2286 MB/s raid6: neonx4 xor() 3162 MB/s raid6: neonx8 xor() 3389 MB/s After: ------ raid6: neonx1 xor() 2281 MB/s raid6: neonx2 xor() 3362 MB/s raid6: neonx4 xor() 3787 MB/s raid6: neonx8 xor() 4239 MB/s While we're at it, simplify MASK() by using a signed shift rather than a vector compare involving a temp register. Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
- 16 5月, 2017 1 次提交
-
-
由 Anup Patel 提交于
The raid6_gfexp table represents {2}^n values for 0 <= n < 256. The Linux async_tx framework pass values from raid6_gfexp as coefficients for each source to prep_dma_pq() callback of DMA channel with PQ capability. This creates problem for RAID6 offload engines (such as Broadcom SBA) which take disk position (i.e. log of {2}) instead of multiplicative cofficients from raid6_gfexp table. This patch adds raid6_gflog table having log-of-2 value for any given x such that 0 <= x < 256. For any given disk coefficient x, the corresponding disk position is given by raid6_gflog[x]. The RAID6 offload engine driver can use this newly added raid6_gflog table to get disk position from multiplicative coefficient. Signed-off-by: NAnup Patel <anup.patel@broadcom.com> Reviewed-by: NScott Branden <scott.branden@broadcom.com> Reviewed-by: NRay Jui <ray.jui@broadcom.com> Acked-by: NShaohua Li <shli@fb.com> Signed-off-by: NVinod Koul <vinod.koul@intel.com>
-
- 08 11月, 2016 1 次提交
-
-
由 Gayatri Kammela 提交于
Implement the AVX2 optimization of RAID6 xor_syndrome functions which is simply based on sse2.c written by hpa. Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Yuanhan Liu <yuanhan.liu@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: NGayatri Kammela <gayatri.kammela@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 27 9月, 2016 1 次提交
-
-
由 Gayatri Kammela 提交于
Specifying the aligned attributes to the char data[NDISKS][PAGE_SIZE], char recovi[PAGE_SIZE] and char recovi[PAGE_SIZE] arrays, so that all malloc memory is page boundary aligned. Without these alignment attributes, the test causes a segfault in userspace when the NDISKS are changed to 4 from 16. The RAID stripes will be page aligned anyway, so we want to test what the kernel actually will execute. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: NGayatri Kammela <gayatri.kammela@intel.com> Reviewed-by: NH. Peter Anvin <hpa@linux.intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 22 9月, 2016 4 次提交
-
-
由 Gayatri Kammela 提交于
Optimize RAID6 xor_syndrome functions to take advantage of the 512-bit ZMM integer instructions introduced in AVX512. AVX512 optimized xor_syndrome functions, which is simply based on sse2.c written by hpa. The patch was tested and benchmarked before submission on a hardware that has AVX512 flags to support such instructions Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Megha Dey <megha.dey@linux.intel.com> Signed-off-by: NGayatri Kammela <gayatri.kammela@intel.com> Reviewed-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Gayatri Kammela 提交于
Adding avx512 gen_syndrome and recovery functions so as to allow code to be compiled and tested successfully in userspace. This patch is tested in userspace and improvement in performace is observed. Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: NMegha Dey <megha.dey@linux.intel.com> Signed-off-by: NGayatri Kammela <gayatri.kammela@intel.com> Reviewed-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Gayatri Kammela 提交于
Optimize RAID6 recovery functions to take advantage of the 512-bit ZMM integer instructions introduced in AVX512. AVX512 optimized recovery functions, which is simply based on recov_avx2.c written by Jim Kukunas This patch was tested and benchmarked before submission on a hardware that has AVX512 flags to support such instructions Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: NMegha Dey <megha.dey@linux.intel.com> Signed-off-by: NGayatri Kammela <gayatri.kammela@intel.com> Reviewed-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
由 Gayatri Kammela 提交于
Optimize RAID6 gen_syndrom functions to take advantage of the 512-bit ZMM integer instructions introduced in AVX512. AVX512 optimized gen_syndrom functions, which is simply based on avx2.c written by Yuanhan Liu and sse2.c written by hpa. The patch was tested and benchmarked before submission on a hardware that has AVX512 flags to support such instructions Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: NMegha Dey <megha.dey@linux.intel.com> Signed-off-by: NGayatri Kammela <gayatri.kammela@intel.com> Reviewed-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NShaohua Li <shli@fb.com>
-
- 01 9月, 2016 1 次提交
-
-
由 Martin Schwidefsky 提交于
The XC instruction can be used to improve the speed of the raid6 recovery. The loops now operate on blocks of 256 bytes. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 29 8月, 2016 1 次提交
-
-
由 Martin Schwidefsky 提交于
Using vector registers is slightly faster: raid6: vx128x8 gen() 19705 MB/s raid6: vx128x8 xor() 11886 MB/s raid6: using algorithm vx128x8 gen() 19705 MB/s raid6: .... xor() 11886 MB/s, rmw enabled vs the software algorithms: raid6: int64x1 gen() 3018 MB/s raid6: int64x1 xor() 1429 MB/s raid6: int64x2 gen() 4661 MB/s raid6: int64x2 xor() 3143 MB/s raid6: int64x4 gen() 5392 MB/s raid6: int64x4 xor() 3509 MB/s raid6: int64x8 gen() 4441 MB/s raid6: int64x8 xor() 3207 MB/s raid6: using algorithm int64x4 gen() 5392 MB/s raid6: .... xor() 3509 MB/s, rmw enabled Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
- 01 12月, 2015 1 次提交
-
-
由 Anton Blanchard 提交于
The enable_kernel_*() functions leave the relevant MSR bits enabled until we exit the kernel sometime later. Create disable versions that wrap the kernel use of FP, Altivec VSX or SPE. While we don't want to disable it normally for performance reasons (MSR writes are slow), it will be used for a debug boot option that does this and catches bad uses in other areas of the kernel. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 01 9月, 2015 1 次提交
-
-
由 Ard Biesheuvel 提交于
This implements XOR syndrome calculation using NEON intrinsics. As before, the module can be built for ARM and arm64 from the same source. Relative performance on a Cortex-A57 based system: raid6: int64x1 gen() 905 MB/s raid6: int64x1 xor() 881 MB/s raid6: int64x2 gen() 1343 MB/s raid6: int64x2 xor() 1286 MB/s raid6: int64x4 gen() 1896 MB/s raid6: int64x4 xor() 1321 MB/s raid6: int64x8 gen() 1773 MB/s raid6: int64x8 xor() 1165 MB/s raid6: neonx1 gen() 1834 MB/s raid6: neonx1 xor() 1278 MB/s raid6: neonx2 gen() 2528 MB/s raid6: neonx2 xor() 1942 MB/s raid6: neonx4 gen() 2888 MB/s raid6: neonx4 xor() 2334 MB/s raid6: neonx8 gen() 2957 MB/s raid6: neonx8 xor() 2232 MB/s raid6: using algorithm neonx8 gen() 2957 MB/s raid6: .... xor() 2232 MB/s, rmw enabled Cc: Markus Stockhausen <stockhausen@collogia.de> Cc: Neil Brown <neilb@suse.de> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NNeilBrown <neilb@suse.com>
-
- 11 6月, 2015 1 次提交
-
-
由 Anton Blanchard 提交于
The -mabi=altivec option is not recognised on LLVM, so use call cc-option to check for support. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
- 19 5月, 2015 1 次提交
-
-
由 Ingo Molnar 提交于
We already have fpu/types.h, move i387.h to fpu/api.h. The file name has become a misnomer anyway: it offers generic FPU APIs, but is not limited to i387 functionality. Reviewed-by: NBorislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 22 4月, 2015 4 次提交
-
-
由 Markus Stockhausen 提交于
The second and (last) optimized XOR syndrome calculation. This version supports right and left side optimization. All CPUs with architecture older than Haswell will benefit from it. It should be noted that SSE2 movntdq kills performance for memory areas that are read and written simultaneously in chunks smaller than cache line size. So use movdqa instead for P/Q writes in sse21 and sse22 XOR functions. Signed-off-by: NMarkus Stockhausen <stockhausen@collogia.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Markus Stockhausen 提交于
Start the algorithms with the very basic one. It is left and right optimized. That means we can avoid all calculations for unneeded pages above the right stop offset. For pages below the left start offset we still need the syndrome multiplication but without reading data pages. Signed-off-by: NMarkus Stockhausen <stockhausen@collogia.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Markus Stockhausen 提交于
It is always helpful to have a test tool in place if we implement new data critical algorithms. So add some test routines to the raid6 checker that can prove if the new xor_syndrome() works as expected. Run through all permutations of start/stop pages per algorithm and simulate a xor_syndrome() assisted rmw run. After each rmw check if the recovery algorithm still confirms that the stripe is fine. Signed-off-by: NMarkus Stockhausen <stockhausen@collogia.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Markus Stockhausen 提交于
v3: s-o-b comment, explanation of performance and descision for the start/stop implementation Implementing rmw functionality for RAID6 requires optimized syndrome calculation. Up to now we can only generate a complete syndrome. The target P/Q pages are always overwritten. With this patch we provide a framework for inplace P/Q modification. In the first place simply fill those functions with NULL values. xor_syndrome() has two additional parameters: start & stop. These will indicate the first and last page that are changing during a rmw run. That makes it possible to avoid several unneccessary loops and speed up calculation. The caller needs to implement the following logic to make the functions work. 1) xor_syndrome(disks, start, stop, ...): "Remove" all data of source blocks inside P/Q between (and including) start and end. 2) modify any block with start <= block <= stop 3) xor_syndrome(disks, start, stop, ...): "Reinsert" all data of source blocks into P/Q between (and including) start and end. Pages between start and stop that won't be changed should be filled with a pointer to the kernel zero page. The reasons for not taking NULL pages are: 1) Algorithms cross the whole source data line by line. Thus avoid additional branches. 2) Having a NULL page avoids calculating the XOR P parity but still need calulation steps for the Q parity. Depending on the algorithm unrolling that might be only a difference of 2 instructions per loop. The benchmark numbers of the gen_syndrome() functions are displayed in the kernel log. Do the same for the xor_syndrome() functions. This will help to analyze performance problems and give an rough estimate how well the algorithm works. The choice of the fastest algorithm will still depend on the gen_syndrome() performance. With the start/stop page implementation the speed can vary a lot in real life. E.g. a change of page 0 & page 15 on a stripe will be harder to compute than the case where page 0 & page 1 are XOR candidates. To be not to enthusiatic about the expected speeds we will run a worse case test that simulates a change on the upper half of the stripe. So we do: 1) calculation of P/Q for the upper pages 2) continuation of Q for the lower (empty) pages Signed-off-by: NMarkus Stockhausen <stockhausen@collogia.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 04 2月, 2015 1 次提交
-
-
由 Jan Beulich 提交于
Just like for AVX2 (which simply needs an #if -> #ifdef conversion), SSSE3 assembler support should be checked for before using it. Signed-off-by: NJan Beulich <jbeulich@suse.com> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 14 10月, 2014 1 次提交
-
-
由 Anton Blanchard 提交于
Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 27 8月, 2013 2 次提交
-
-
由 Max Filippov 提交于
-e is a non-standard echo option, echo output is implementation-dependent when it is used. Replace echo -e with printf as suggested by POSIX echo manual. Cc: NeilBrown <neilb@suse.de> Cc: Jim Kukunas <james.t.kukunas@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com> Acked-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Ken Steele 提交于
This change adds TILE-Gx SIMD instructions to the software raid (md), modeling the Altivec implementation. This is only for Syndrome generation; there is more that could be done to improve recovery, as in the recent Intel SSE3 recovery implementation. The code unrolls 8 times; this turns out to be the best on tilegx hardware among the set 1, 2, 4, 8 or 16. The code reads one cache-line of data from each disk, stores P and Q then goes to the next cache-line. The test code in sys/linux/lib/raid6/test reports 2008 MB/s data read rate for syndrome generation using 18 disks (16 data and 2 parity). It was 1512 MB/s before this SIMD optimizations. This is running on 1 core with all the data in cache. This is based on the paper The Mathematics of RAID-6. (http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf). Signed-off-by: NKen Steele <ken@tilera.com> Signed-off-by: NChris Metcalf <cmetcalf@tilera.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 09 7月, 2013 1 次提交
-
-
由 Ard Biesheuvel 提交于
Rebased/reworked a patch contributed by Rob Herring that uses NEON intrinsics to perform the RAID-6 syndrome calculations. It uses the existing unroll.awk code to generate several unrolled versions of which the best performing one is selected at boot time. Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: NNicolas Pitre <nico@linaro.org> Cc: hpa@linux.intel.com
-
- 13 12月, 2012 3 次提交
-
-
由 Yuanhan Liu 提交于
sse and avx2 stuff only exist on x86 arch, and we don't need to build altivec on x86. And we can do that at lib/raid6/Makefile. Proposed-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NYuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NJim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Yuanhan Liu 提交于
Add AVX2 optimized gen_syndrom functions, which is simply based on sse2.c written by hpa. Signed-off-by: NYuanhan Liu <yuanhan.liu@linux.intel.com> Reviewed-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NJim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Jim Kukunas 提交于
Optimize RAID6 recovery functions to take advantage of the 256-bit YMM integer instructions introduced in AVX2. The patch was tested and benchmarked before submission. However hardware is not yet released so benchmark numbers cannot be reported. Acked-by: N"H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NJim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 28 5月, 2012 1 次提交
-
-
由 Jim Kukunas 提交于
Make the recovery functions static to fix the following sparse warnings: lib/raid6/recov.c:25:6: warning: symbol 'raid6_2data_recov_intx1' was not declared. Should it be static? lib/raid6/recov.c:69:6: warning: symbol 'raid6_datap_recov_intx1' was not declared. Should it be static? lib/raid6/recov_ssse3.c:22:6: warning: symbol 'raid6_2data_recov_ssse3' was not declared. Should it be static? lib/raid6/recov_ssse3.c:197:6: warning: symbol 'raid6_datap_recov_ssse3' was not declared. Should it be static? Reported-by: NFengguang Wu <fengguang.wu@intel.com> Signed-off-by: NJim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 22 5月, 2012 2 次提交
-
-
由 Jim Kukunas 提交于
Reorders functions in raid6_algos as well as the preference check to reduce the number of functions tested on initialization. Also, creates symmetry between choosing the gen_syndrome functions and choosing the recovery functions. Signed-off-by: NJim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Jim Kukunas 提交于
Test each combination of recovery and syndrome generation functions. Signed-off-by: NJim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-