1. 03 4月, 2019 2 次提交
    • W
      locking/rwsem: Remove rwsem-spinlock.c & use rwsem-xadd.c for all archs · 390a0c62
      Waiman Long 提交于
      Currently, we have two different implementation of rwsem:
      
       1) CONFIG_RWSEM_GENERIC_SPINLOCK (rwsem-spinlock.c)
       2) CONFIG_RWSEM_XCHGADD_ALGORITHM (rwsem-xadd.c)
      
      As we are going to use a single generic implementation for rwsem-xadd.c
      and no architecture-specific code will be needed, there is no point
      in keeping two different implementations of rwsem. In most cases, the
      performance of rwsem-spinlock.c will be worse. It also doesn't get all
      the performance tuning and optimizations that had been implemented in
      rwsem-xadd.c over the years.
      
      For simplication, we are going to remove rwsem-spinlock.c and make all
      architectures use a single implementation of rwsem - rwsem-xadd.c.
      
      All references to RWSEM_GENERIC_SPINLOCK and RWSEM_XCHGADD_ALGORITHM
      in the code are removed.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-c6x-dev@linux-c6x.org
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: linux-riscv@lists.infradead.org
      Cc: linux-um@lists.infradead.org
      Cc: linux-xtensa@linux-xtensa.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: openrisc@lists.librecores.org
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      Link: https://lkml.kernel.org/r/20190322143008.21313-3-longman@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      390a0c62
    • P
      arch/tlb: Clean up simple architectures · 6137fed0
      Peter Zijlstra 提交于
      For the architectures that do not implement their own tlb_flush() but
      do already use the generic mmu_gather, there are two options:
      
       1) the platform has an efficient flush_tlb_range() and
          asm-generic/tlb.h doesn't need any overrides at all.
      
       2) the platform lacks an efficient flush_tlb_range() and
          we select MMU_GATHER_NO_RANGE to minimize full invalidates.
      
      Convert all 'simple' architectures to one of these two forms.
      
      alpha:	    has no range invalidate -> 2
      arc:	    already used flush_tlb_range() -> 1
      c6x:	    has no range invalidate -> 2
      hexagon:    has an efficient flush_tlb_range() -> 1
                  (flush_tlb_mm() is in fact a full range invalidate,
      	     so no need to shoot down everything)
      m68k:	    has inefficient flush_tlb_range() -> 2
      microblaze: has no flush_tlb_range() -> 2
      mips:	    has efficient flush_tlb_range() -> 1
      	    (even though it currently seems to use flush_tlb_mm())
      nds32:	    already uses flush_tlb_range() -> 1
      nios2:	    has inefficient flush_tlb_range() -> 2
      	    (no limit on range iteration)
      openrisc:   has inefficient flush_tlb_range() -> 2
      	    (no limit on range iteration)
      parisc:	    already uses flush_tlb_range() -> 1
      sparc32:    already uses flush_tlb_range() -> 1
      unicore32:  has inefficient flush_tlb_range() -> 2
      	    (no limit on range iteration)
      xtensa:	    has efficient flush_tlb_range() -> 1
      
      Note this also fixes a bug in the existing code for a number
      platforms. Those platforms that did:
      
        tlb_end_vma() -> if (!full_mm) flush_tlb_*()
        tlb_flush -> if (full_mm) flush_tlb_mm()
      
      missed the case of shift_arg_pages(), which doesn't have @fullmm set,
      nor calls into tlb_*vma(), but still frees page-tables and thus needs
      an invalidate. The new code handles this by detecting a non-empty
      range, and either issuing the matching range invalidate or a full
      invalidate, depending on the capabilities.
      
      No change in behavior intended.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Piggin <npiggin@gmail.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6137fed0
  2. 14 12月, 2018 1 次提交
  3. 06 12月, 2018 1 次提交
  4. 23 11月, 2018 5 次提交
  5. 31 10月, 2018 2 次提交
  6. 27 10月, 2018 1 次提交
  7. 02 8月, 2018 3 次提交
  8. 13 6月, 2018 1 次提交
    • S
      alpha: Remove custom dec_and_lock() implementation · f2ae6794
      Sebastian Andrzej Siewior 提交于
      Alpha provides a custom implementation of dec_and_lock(). The functions
      is split into two parts:
      - atomic_add_unless() + return 0 (fast path in assembly)
      - remaining part including locking (slow path in C)
      
      Comparing the result of the alpha implementation with the generic
      implementation compiled by gcc it looks like the fast path is optimized
      by avoiding a stack frame (and reloading the GP), register store and all
      this. This is only done in the slowpath.
      After marking the slowpath (atomic_dec_and_lock_1()) as "noinline" and
      doing the slowpath in C (the atomic_add_unless(atomic, -1, 1) part) I
      noticed differences in the resulting assembly:
      - the GP is still reloaded
      - atomic_add_unless() adds more memory barriers compared to the custom
        assembly
      - the custom assembly here does "load, sub, beq" while
        atomic_add_unless() does "load, cmpeq, add, bne". This is okay because
        it compares against zero after subtraction while the generic code
        compares against 1 before.
      
      I'm not sure if avoiding the stack frame (and GP reloading) brings a lot
      in terms of performance. Regarding the different barriers, Peter
      Zijlstra says:
      
      |refcount decrement needs to be a RELEASE operation, such that all the
      |load/stores to the object happen before we decrement the refcount.
      |
      |Otherwise things like:
      |
      |        obj->foo = 5;
      |        refcnt_dec(&obj->ref);
      |
      |can be re-ordered, which then allows fun scenarios like:
      |
      |        CPU0                            CPU1
      |
      |        refcnt_dec(&obj->ref);
      |                                        if (dec_and_test(&obj->ref))
      |                                                free(obj);
      |        obj->foo = 5; // oops UaF
      |
      |
      |This means (for alpha) that there should be a memory barrier _before_
      |the decrement, however the dec_and_lock asm thing only has one _after_,
      |which, per the above, is too late.
      |
      |The generic version using add_unless will result in memory barrier
      |before and after (because that is the rule for atomic ops with a return
      |value) which is strictly too many barriers for the refcount story, but
      |who knows what other ordering requirements code has.
      
      Remove the custom alpha implementation of dec_and_lock() and if it is an
      issue (performance wise) then the fast path could still be inlined.
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: linux-alpha@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180606115918.GG12198@hirez.programming.kicks-ass.net
      Link: https://lkml.kernel.org/r20180612161621.22645-2-bigeasy@linutronix.de
      f2ae6794
  9. 23 5月, 2018 1 次提交
  10. 09 5月, 2018 4 次提交
  11. 17 4月, 2018 1 次提交
  12. 08 4月, 2018 1 次提交
    • M
      alpha: Implement CPU vulnerabilities sysfs functions. · 6fd16ce5
      Michael Cree 提交于
      Implement the CPU vulnerabilty show functions for meltdown, spectre_v1
      and spectre_v2 on Alpha.
      
      Tests on XP1000 (EV67/667MHz) and ES45 (EV68CB/1.25GHz) show them
      to be vulnerable to Meltdown and Spectre V1.  In the case of
      Meltdown I saw a 1 to 2% success rate in reading bytes on the
      XP1000 and 50 to 60% success rate on the ES45. (This compares to
      99.97% success reported for Intel CPUs.)  Report EV6 and later
      CPUs as vulnerable.
      
      Tests on PWS600au (EV56/600MHz) for Spectre V1 attack were
      unsuccessful (though I did not try particularly hard) so mark EV4
      through to EV56 as not vulnerable.
      Signed-off-by: NMichael Cree <mcree@orcon.net.nz>
      Signed-off-by: NMatt Turner <mattst88@gmail.com>
      6fd16ce5
  13. 09 1月, 2018 1 次提交
  14. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  15. 12 10月, 2017 1 次提交
  16. 27 4月, 2017 1 次提交
  17. 29 3月, 2017 1 次提交
  18. 08 10月, 2016 1 次提交
    • V
      atomic64: no need for CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE · 51a02124
      Vineet Gupta 提交于
      This came to light when implementing native 64-bit atomics for ARCv2.
      
      The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
      to check whether atomic64_dec_if_positive() is available.  It seems it
      was needed when not every arch defined it.  However as of current code
      the Kconfig option seems needless
      
       - for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a
         generic definition of API is present lib/atomic64.c
       - arches with native 64-bit atomics select it in arch/*/Kconfig and
         define the API in their headers
      
      So I see no point in keeping the Kconfig option
      
      Compile tested for:
       - blackfin (CONFIG_GENERIC_ATOMIC64)
       - x86 (!CONFIG_GENERIC_ATOMIC64)
       - ia64
      
      Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.comSigned-off-by: NVineet Gupta <vgupta@synopsys.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ming Lin <ming.l@ssi.samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      51a02124
  19. 21 5月, 2016 1 次提交
    • Z
      lib/GCD.c: use binary GCD algorithm instead of Euclidean · fff7fb0b
      Zhaoxiu Zeng 提交于
      The binary GCD algorithm is based on the following facts:
      	1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
      	2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
      	3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)
      
      Even on x86 machines with reasonable division hardware, the binary
      algorithm runs about 25% faster (80% the execution time) than the
      division-based Euclidian algorithm.
      
      On platforms like Alpha and ARMv6 where division is a function call to
      emulation code, it's even more significant.
      
      There are two variants of the code here, depending on whether a fast
      __ffs (find least significant set bit) instruction is available.  This
      allows the unpredictable branches in the bit-at-a-time shifting loop to
      be eliminated.
      
      If fast __ffs is not available, the "even/odd" GCD variant is used.
      
      I use the following code to benchmark:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <stdint.h>
      	#include <string.h>
      	#include <time.h>
      	#include <unistd.h>
      
      	#define swap(a, b) \
      		do { \
      			a ^= b; \
      			b ^= a; \
      			a ^= b; \
      		} while (0)
      
      	unsigned long gcd0(unsigned long a, unsigned long b)
      	{
      		unsigned long r;
      
      		if (a < b) {
      			swap(a, b);
      		}
      
      		if (b == 0)
      			return a;
      
      		while ((r = a % b) != 0) {
      			a = b;
      			b = r;
      		}
      
      		return b;
      	}
      
      	unsigned long gcd1(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		b >>= __builtin_ctzl(b);
      
      		for (;;) {
      			a >>= __builtin_ctzl(a);
      			if (a == b)
      				return a << __builtin_ctzl(r);
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      		}
      	}
      
      	unsigned long gcd2(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		r &= -r;
      
      		while (!(b & r))
      			b >>= 1;
      
      		for (;;) {
      			while (!(a & r))
      				a >>= 1;
      			if (a == b)
      				return a;
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      			a >>= 1;
      			if (a & r)
      				a += b;
      			a >>= 1;
      		}
      	}
      
      	unsigned long gcd3(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		b >>= __builtin_ctzl(b);
      		if (b == 1)
      			return r & -r;
      
      		for (;;) {
      			a >>= __builtin_ctzl(a);
      			if (a == 1)
      				return r & -r;
      			if (a == b)
      				return a << __builtin_ctzl(r);
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      		}
      	}
      
      	unsigned long gcd4(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		r &= -r;
      
      		while (!(b & r))
      			b >>= 1;
      		if (b == r)
      			return r;
      
      		for (;;) {
      			while (!(a & r))
      				a >>= 1;
      			if (a == r)
      				return r;
      			if (a == b)
      				return a;
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      			a >>= 1;
      			if (a & r)
      				a += b;
      			a >>= 1;
      		}
      	}
      
      	static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
      		gcd0, gcd1, gcd2, gcd3, gcd4,
      	};
      
      	#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))
      
      	#if defined(__x86_64__)
      
      	#define rdtscll(val) do { \
      		unsigned long __a,__d; \
      		__asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
      		(val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \
      	} while(0)
      
      	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
      								unsigned long a, unsigned long b, unsigned long *res)
      	{
      		unsigned long long start, end;
      		unsigned long long ret;
      		unsigned long gcd_res;
      
      		rdtscll(start);
      		gcd_res = gcd(a, b);
      		rdtscll(end);
      
      		if (end >= start)
      			ret = end - start;
      		else
      			ret = ~0ULL - start + 1 + end;
      
      		*res = gcd_res;
      		return ret;
      	}
      
      	#else
      
      	static inline struct timespec read_time(void)
      	{
      		struct timespec time;
      		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
      		return time;
      	}
      
      	static inline unsigned long long diff_time(struct timespec start, struct timespec end)
      	{
      		struct timespec temp;
      
      		if ((end.tv_nsec - start.tv_nsec) < 0) {
      			temp.tv_sec = end.tv_sec - start.tv_sec - 1;
      			temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
      		} else {
      			temp.tv_sec = end.tv_sec - start.tv_sec;
      			temp.tv_nsec = end.tv_nsec - start.tv_nsec;
      		}
      
      		return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
      	}
      
      	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
      								unsigned long a, unsigned long b, unsigned long *res)
      	{
      		struct timespec start, end;
      		unsigned long gcd_res;
      
      		start = read_time();
      		gcd_res = gcd(a, b);
      		end = read_time();
      
      		*res = gcd_res;
      		return diff_time(start, end);
      	}
      
      	#endif
      
      	static inline unsigned long get_rand()
      	{
      		if (sizeof(long) == 8)
      			return (unsigned long)rand() << 32 | rand();
      		else
      			return rand();
      	}
      
      	int main(int argc, char **argv)
      	{
      		unsigned int seed = time(0);
      		int loops = 100;
      		int repeats = 1000;
      		unsigned long (*res)[TEST_ENTRIES];
      		unsigned long long elapsed[TEST_ENTRIES];
      		int i, j, k;
      
      		for (;;) {
      			int opt = getopt(argc, argv, "n:r:s:");
      			/* End condition always first */
      			if (opt == -1)
      				break;
      
      			switch (opt) {
      			case 'n':
      				loops = atoi(optarg);
      				break;
      			case 'r':
      				repeats = atoi(optarg);
      				break;
      			case 's':
      				seed = strtoul(optarg, NULL, 10);
      				break;
      			default:
      				/* You won't actually get here. */
      				break;
      			}
      		}
      
      		res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
      		memset(elapsed, 0, sizeof(elapsed));
      
      		srand(seed);
      		for (j = 0; j < loops; j++) {
      			unsigned long a = get_rand();
      			/* Do we have args? */
      			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
      			unsigned long long min_elapsed[TEST_ENTRIES];
      			for (k = 0; k < repeats; k++) {
      				for (i = 0; i < TEST_ENTRIES; i++) {
      					unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
      					if (k == 0 || min_elapsed[i] > tmp)
      						min_elapsed[i] = tmp;
      				}
      			}
      			for (i = 0; i < TEST_ENTRIES; i++)
      				elapsed[i] += min_elapsed[i];
      		}
      
      		for (i = 0; i < TEST_ENTRIES; i++)
      			printf("gcd%d: elapsed %llu\n", i, elapsed[i]);
      
      		k = 0;
      		srand(seed);
      		for (j = 0; j < loops; j++) {
      			unsigned long a = get_rand();
      			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
      			for (i = 1; i < TEST_ENTRIES; i++) {
      				if (res[j][i] != res[j][0])
      					break;
      			}
      			if (i < TEST_ENTRIES) {
      				if (k == 0) {
      					k = 1;
      					fprintf(stderr, "Error:\n");
      				}
      				fprintf(stderr, "gcd(%lu, %lu): ", a, b);
      				for (i = 0; i < TEST_ENTRIES; i++)
      					fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
      			}
      		}
      
      		if (k == 0)
      			fprintf(stderr, "PASS\n");
      
      		free(res);
      
      		return 0;
      	}
      
      Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:
      
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 10174
        gcd1: elapsed 2120
        gcd2: elapsed 2902
        gcd3: elapsed 2039
        gcd4: elapsed 2812
        PASS
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 9309
        gcd1: elapsed 2280
        gcd2: elapsed 2822
        gcd3: elapsed 2217
        gcd4: elapsed 2710
        PASS
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 9589
        gcd1: elapsed 2098
        gcd2: elapsed 2815
        gcd3: elapsed 2030
        gcd4: elapsed 2718
        PASS
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 9914
        gcd1: elapsed 2309
        gcd2: elapsed 2779
        gcd3: elapsed 2228
        gcd4: elapsed 2709
        PASS
      
      [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
      Signed-off-by: NZhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
      Signed-off-by: NGeorge Spelvin <linux@horizon.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fff7fb0b
  20. 26 4月, 2016 1 次提交
  21. 21 1月, 2016 1 次提交
    • C
      dma-mapping: always provide the dma_map_ops based implementation · e1c7e324
      Christoph Hellwig 提交于
      Move the generic implementation to <linux/dma-mapping.h> now that all
      architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now
      that everyone supports them.
      
      [valentinrothberg@gmail.com: remove leftovers in Kconfig]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: NValentin Rothberg <valentinrothberg@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e1c7e324
  22. 31 8月, 2015 1 次提交
    • M
      alpha: select CONFIG_ARCH_USE_CMPXCHG_LOCKREF. · 8f8dcb3f
      Matt Turner 提交于
      On Alpha we have spinlocks that are 32b in size and an efficient
      cmpxchg64 implementation, so we qualify to make use of cmpxchg backed
      lockrefs. Select the ARCH_USE_CMPXCHG_LOCKREF Kconfig symbol and provide
      a trivial implementation of arch_spin_value_unlocked to satisfy the
      lockref code.
      
      Using Linus' simple testcase from
      http://article.gmane.org/gmane.linux.file-systems/77466 on a dual CPU
      ES47 system I see around an 8% gain:
      
          N       Min       Max    Median       Avg      Stddev
      x  30   6194580   6295654   6272504   6272514   17694.232
      +  30   6731164   6786334   6767982d   6764274   13738.863
      Difference at 95.0% confidence
              491760 +/- 8188.17
              7.83992% +/- 0.130541%
              (Student's t, pooled s = 15840.5)
      Signed-off-by: NMatt Turner <mattst88@gmail.com>
      8f8dcb3f
  23. 15 4月, 2015 1 次提交
    • K
      alpha: expose number of page table levels on Kconfig level · f5d4547a
      Kirill A. Shutemov 提交于
      I've implemented accounting for pmd page tables as we have for pte (see
      mm->nr_ptes).  It's requires a new counter in mm_struct: mm->nr_pmds.
      
      But the feature doesn't make any sense if an architecture has PMD level
      folded and it would be nice get rid of the counter in this case.
      
      The problem is that we cannot use __PAGETABLE_PMD_FOLDED in
      <linux/mm_types.h> due to circular dependencies:
      
      <linux/mm_types> -> <asm/pgtable.h> -> <linux/mm_types.h>
      
      In most cases <asm/pgtable.h> wants <linux/mm_types.h> to get definition
      of struct page and struct vm_area_struct.  I've tried to split mm_struct
      into separate header file to be able to user <asm/pgtable.h> there.
      
      But it doesn't fly on some architectures, like ARM: it wants mm_struct
      <asm/pgtable.h> to implement tlb flushing.  I don't see how to fix it
      without massive de-inlining or coverting a lot for inline functions to
      macros.
      
      This is other approach: expose number of page tables in use via Kconfig
      and use it in <linux/mm_types.h> instead of __PAGETABLE_PMD_FOLDED from
      <asm/pgtable.h>.
      
      This patch (of 19):
      
      We would want to use number of page table level to define mm_struct.
      Let's expose it as CONFIG_PGTABLE_LEVELS.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NRichard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f5d4547a
  24. 20 3月, 2014 2 次提交
  25. 01 2月, 2014 1 次提交
  26. 24 1月, 2014 1 次提交
  27. 03 1月, 2014 1 次提交
  28. 17 11月, 2013 1 次提交