1. 02 8月, 2016 1 次提交
    • P
      MIPS: Use per-mm page to execute branch delay slot instructions · 432c6bac
      Paul Burton 提交于
      In some cases the kernel needs to execute an instruction from the delay
      slot of an emulated branch instruction. These cases include:
      
        - Emulated floating point branch instructions (bc1[ft]l?) for systems
          which don't include an FPU, or upon which the kernel is run with the
          "nofpu" parameter.
      
        - MIPSr6 systems running binaries targeting older revisions of the
          architecture, which may include branch instructions whose encodings
          are no longer valid in MIPSr6.
      
      Executing instructions from such delay slots is done by writing the
      instruction to memory followed by a trap, as part of an "emuframe", and
      executing it. This avoids the requirement of an emulator for the entire
      MIPS instruction set. Prior to this patch such emuframes are written to
      the user stack and executed from there.
      
      This patch moves FP branch delay emuframes off of the user stack and
      into a per-mm page. Allocating a page per-mm leaves userland with access
      to only what it had access to previously, and compared to other
      solutions is relatively simple.
      
      When a thread requires a delay slot emulation, it is allocated a frame.
      A thread may only have one frame allocated at any one time, since it may
      only ever be executing one instruction at any one time. In order to
      ensure that we can free up allocated frame later, its index is recorded
      in struct thread_struct. In the typical case, after executing the delay
      slot instruction we'll execute a break instruction with the BRK_MEMU
      code. This traps back to the kernel & leads to a call to do_dsemulret
      which frees the allocated frame & moves the user PC back to the
      instruction that would have executed following the emulated branch.
      In some cases the delay slot instruction may be invalid, such as a
      branch, or may trigger an exception. In these cases the BRK_MEMU break
      instruction will not be hit. In order to ensure that frames are freed
      this patch introduces dsemul_thread_cleanup() and calls it to free any
      allocated frame upon thread exit. If the instruction generated an
      exception & leads to a signal being delivered to the thread, or indeed
      if a signal simply happens to be delivered to the thread whilst it is
      executing from the struct emuframe, then we need to take care to exit
      the frame appropriately. This is done by either rolling back the user PC
      to the branch or advancing it to the continuation PC prior to signal
      delivery, using dsemul_thread_rollback(). If this were not done then a
      sigreturn would return to the struct emuframe, and if that frame had
      meanwhile been used in response to an emulated branch instruction within
      the signal handler then we would execute the wrong user code.
      
      Whilst a user could theoretically place something like a compact branch
      to self in a delay slot and cause their thread to become stuck in an
      infinite loop with the frame never being deallocated, this would:
      
        - Only affect the users single process.
      
        - Be architecturally invalid since there would be a branch in the
          delay slot, which is forbidden.
      
        - Be extremely unlikely to happen by mistake, and provide a program
          with no more ability to harm the system than a simple infinite loop
          would.
      
      If a thread requires a delay slot emulation & no frame is available to
      it (ie. the process has enough other threads that all frames are
      currently in use) then the thread joins a waitqueue. It will sleep until
      a frame is freed by another thread in the process.
      
      Since we now know whether a thread has an allocated frame due to our
      tracking of its index, the cookie field of struct emuframe is removed as
      we can be more certain whether we have a valid frame. Since a thread may
      only ever have a single frame at any given time, the epc field of struct
      emuframe is also removed & the PC to continue from is instead stored in
      struct thread_struct. Together these changes simplify & shrink struct
      emuframe somewhat, allowing twice as many frames to fit into the page
      allocated for them.
      
      The primary benefit of this patch is that we are now free to mark the
      user stack non-executable where that is possible.
      Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
      Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
      Cc: Maciej Rozycki <maciej.rozycki@imgtec.com>
      Cc: Faraz Shahbazker <faraz.shahbazker@imgtec.com>
      Cc: Raghu Gandham <raghu.gandham@imgtec.com>
      Cc: Matthew Fortune <matthew.fortune@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/13764/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      432c6bac
  2. 29 7月, 2016 3 次提交
    • J
      MIPS: c-r4k: Exclude sibling CPUs in SMP calls · 640511ae
      James Hogan 提交于
      When performing SMP calls to foreign cores, exclude sibling CPUs from
      the provided map, as we already handle the local core on the current
      CPU. This prevents an SMP call from for example core 0, VPE 1 to VPE 0
      on the same core.
      
      In the process the cpu_foreign_map cpumask is turned into an array of
      cpumasks, so that each CPU has its own version of it which excludes
      sibling CPUs. r4k_op_needs_ipi() is also updated to reflect that cache
      management SMP calls are not needed when all CPUs are siblings (i.e.
      there are no foreign CPUs according to the new cpu_foreign_map[]
      semantics which exclude siblings).
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
      Cc: Felix Fietkau <nbd@nbd.name>
      Cc: Jayachandran C. <jchandra@broadcom.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/13801/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      640511ae
    • J
      MIPS: c-r4k: Fix protected_writeback_scache_line for EVA · 0758b116
      James Hogan 提交于
      The protected_writeback_scache_line() function is used by
      local_r4k_flush_cache_sigtramp() to flush an FPU delay slot emulation
      trampoline on the userland stack from the caches so it is visible to
      subsequent instruction fetches.
      
      Commit de8974e3 ("MIPS: asm: r4kcache: Add EVA cache flushing
      functions") updated some protected_ cache flush functions to use EVA
      CACHEE instructions via protected_cachee_op(), and commit 83fd4344
      ("MIPS: r4kcache: Add EVA case for protected_writeback_dcache_line") did
      the same thing for protected_writeback_dcache_line(), but
      protected_writeback_scache_line() never got updated. Lets fix that now
      to flush the right user address from the secondary cache rather than
      some arbitrary kernel unmapped address.
      
      This issue was spotted through code inspection, and it seems unlikely to
      be possible to hit this in practice. It theoretically affect EVA kernels
      on EVA capable cores with an L2 cache, where the icache fetches straight
      from RAM (cpu_icache_snoops_remote_store == 0), running a hard float
      userland with FPU disabled (nofpu). That both Malta and Boston platforms
      override cpu_icache_snoops_remote_store to 1 suggests that all MIPS
      cores fetch instructions into icache straight from L2 rather than RAM.
      
      Fixes: de8974e3 ("MIPS: asm: r4kcache: Add EVA cache flushing functions")
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Leonid Yegoshin <leonid.yegoshin@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/13800/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      0758b116
    • J
      MIPS: SMP: Update cpu_foreign_map on CPU disable · 826e99be
      James Hogan 提交于
      When a CPU is disabled via CPU hotplug, cpu_foreign_map is not updated.
      This could result in cache management SMP calls being sent to offline
      CPUs instead of online siblings in the same core.
      
      Add a call to calculate_cpu_foreign_map() in the various MIPS cpu
      disable callbacks after set_cpu_online(). All cases are updated for
      consistency and to keep cpu_foreign_map strictly up to date, not just
      those which may support hardware multithreading.
      
      Fixes: cccf34e9 ("MIPS: c-r4k: Fix cache flushing for MT cores")
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: Kevin Cernekee <cernekee@gmail.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Hongliang Tao <taohl@lemote.com>
      Cc: Hua Yan <yanh@lemote.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/13799/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      826e99be
  3. 28 7月, 2016 2 次提交
  4. 12 7月, 2016 1 次提交
  5. 06 7月, 2016 1 次提交
    • D
      MIPS: Fix page table corruption on THP permission changes. · acd168c0
      David Daney 提交于
      When the core THP code is modifying the permissions of a huge page it
      calls pmd_modify(), which unfortunately was clearing the _PAGE_HUGE bit
      of the page table entry.  The result can be kernel messages like:
      
      mm/memory.c:397: bad pmd 000000040080004d.
      mm/memory.c:397: bad pmd 00000003ff00004d.
      mm/memory.c:397: bad pmd 000000040100004d.
      
      or:
      
      ------------[ cut here ]------------
      WARNING: at mm/mmap.c:3200 exit_mmap+0x150/0x158()
      Modules linked in: ipv6 at24 octeon3_ethernet octeon_srio_nexus m25p80
      CPU: 12 PID: 1295 Comm: pmderr Not tainted 3.10.87-rt80-Cavium-Octeon #4
      Stack : 0000000040808000 0000000014009ce1 0000000000400004 ffffffff81076ba0
                0000000000000000 0000000000000000 ffffffff85110000 0000000000000119
                0000000000000004 0000000000000000 0000000000000119 43617669756d2d4f
                0000000000000000 ffffffff850fda40 ffffffff85110000 0000000000000000
                0000000000000000 0000000000000009 ffffffff809207a0 0000000000000c80
                ffffffff80f1bf20 0000000000000001 000000ffeca36828 0000000000000001
                0000000000000000 0000000000000001 000000ffeca7e700 ffffffff80886924
                80000003fd7a0000 80000003fd7a39b0 80000003fdea8000 ffffffff80885780
                80000003fdea8000 ffffffff80f12218 000000000000000c 000000000000050f
                0000000000000000 ffffffff80865c4c 0000000000000000 0000000000000000
                ...
      Call Trace:
      [<ffffffff80865c4c>] show_stack+0x6c/0xf8
      [<ffffffff80885780>] warn_slowpath_common+0x78/0xa8
      [<ffffffff809207a0>] exit_mmap+0x150/0x158
      [<ffffffff80882d44>] mmput+0x5c/0x110
      [<ffffffff8088b450>] do_exit+0x230/0xa68
      [<ffffffff8088be34>] do_group_exit+0x54/0x1d0
      [<ffffffff8088bfc0>] __wake_up_parent+0x0/0x18
      
      ---[ end trace c7b38293191c57dc ]---
      BUG: Bad rss-counter state mm:80000003fa168000 idx:1 val:1536
      
      Fix by not clearing _PAGE_HUGE bit.
      Signed-off-by: NDavid Daney <david.daney@cavium.com>
      Tested-by: NAaro Koskinen <aaro.koskinen@nokia.com>
      Cc: stable@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/13687/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      acd168c0
  6. 02 7月, 2016 1 次提交
    • R
      MIPS: Fix possible corruption of cache mode by mprotect. · 6d037de9
      Ralf Baechle 提交于
      The following testcase may result in a page table entries with a invalid
      CCA field being generated:
      
      static void *bindstack;
      
      static int sysrqfd;
      
      static void protect_low(int protect)
      {
      	mprotect(bindstack, BINDSTACK_SIZE, protect);
      }
      
      static void sigbus_handler(int signal, siginfo_t * info, void *context)
      {
      	void *addr = info->si_addr;
      
      	write(sysrqfd, "x", 1);
      
      	printf("sigbus, fault address %p (should not happen, but might)\n",
      	       addr);
      	abort();
      }
      
      static void run_bind_test(void)
      {
      	unsigned int *p = bindstack;
      
      	p[0] = 0xf001f001;
      
      	write(sysrqfd, "x", 1);
      
      	/* Set trap on access to p[0] */
      	protect_low(PROT_NONE);
      
      	write(sysrqfd, "x", 1);
      
      	/* Clear trap on access to p[0] */
      	protect_low(PROT_READ | PROT_WRITE | PROT_EXEC);
      
      	write(sysrqfd, "x", 1);
      
      	/* Check the contents of p[0] */
      	if (p[0] != 0xf001f001) {
      		write(sysrqfd, "x", 1);
      
      		/* Reached, but shouldn't be */
      		printf("badness, shouldn't happen but does\n");
      		abort();
      	}
      }
      
      int main(void)
      {
      	struct sigaction sa;
      
      	sysrqfd = open("/proc/sysrq-trigger", O_WRONLY);
      
      	if (sigprocmask(SIG_BLOCK, NULL, &sa.sa_mask)) {
      		perror("sigprocmask");
      		return 0;
      	}
      
      	sa.sa_sigaction = sigbus_handler;
      	sa.sa_flags = SA_SIGINFO | SA_NODEFER | SA_RESTART;
      	if (sigaction(SIGBUS, &sa, NULL)) {
      		perror("sigaction");
      		return 0;
      	}
      
      	bindstack = mmap(NULL,
      			 BINDSTACK_SIZE,
      			 PROT_READ | PROT_WRITE | PROT_EXEC,
      			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      	if (bindstack == MAP_FAILED) {
      		perror("mmap bindstack");
      		return 0;
      	}
      
      	printf("bindstack: %p\n", bindstack);
      
      	run_bind_test();
      
      	printf("done\n");
      
      	return 0;
      }
      
      There are multiple ingredients for this:
      
       1) PAGE_NONE is defined to _CACHE_CACHABLE_NONCOHERENT, which is CCA 3
          on all platforms except SB1 where it's CCA 5.
       2) _page_cachable_default must have bits set which are not set
          _CACHE_CACHABLE_NONCOHERENT.
       3) Either the defective version of pte_modify for XPA or the standard
          version must be in used.  However pte_modify for the 36 bit address
          space support is no affected.
      
      In that case additional bits in the final CCA mode may generate an invalid
      value for the CCA field.  On the R10000 system where this was tracked
      down for example a CCA 7 has been observed, which is Uncached Accelerated.
      
      Fixed by:
      
       1) Using the proper CCA mode for PAGE_NONE just like for all the other
          PAGE_* pte/pmd bits.
       2) Fix the two affected variants of pte_modify.
      
      Further code inspection also shows the same issue to exist in pmd_modify
      which would affect huge page systems.
      
      Issue in pte_modify tracked down by Alastair Bridgewater, PAGE_NONE
      and pmd_modify issue found by me.
      
      The history of this goes back beyond Linus' git history.  Chris Dearman's
      commit 35133692 ("[MIPS] Allow setting of
      the cache attribute at run time.") missed the opportunity to fix this
      but it was originally introduced in lmo commit
      d523832cf12007b3242e50bb77d0c9e63e0b6518 ("Missing from last commit.")
      and 32cc38229ac7538f2346918a09e75413e8861f87 ("New configuration option
      CONFIG_MIPS_UNCACHED.")
      Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      Reported-by: NAlastair Bridgewater <alastair.bridgewater@gmail.com>
      6d037de9
  7. 25 6月, 2016 1 次提交
  8. 14 6月, 2016 2 次提交
    • J
      MIPS: KVM: Include bit 31 in segment matches · 7f5a1ddc
      James Hogan 提交于
      When faulting guest addresses are matched against guest segments with
      the KVM_GUEST_KSEGX() macro, change the mask to 0xe0000000 so as to
      include bit 31.
      
      This is mainly for safety's sake, as it prevents a rogue BadVAddr in the
      host kseg2/kseg3 segments (e.g. 0xC*******) after a TLB exception from
      matching the guest kseg0 segment (e.g. 0x4*******), triggering an
      internal KVM error instead of allowing the corresponding guest kseg0
      page to be mapped into the host vmalloc space.
      
      Such a rogue BadVAddr was observed to happen with the host MIPS kernel
      running under QEMU with KVM built as a module, due to a not entirely
      transparent optimisation in the QEMU TLB handling. This has already been
      worked around properly in a previous commit.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7f5a1ddc
    • J
      MIPS: KVM: Fix modular KVM under QEMU · 797179bc
      James Hogan 提交于
      Copy __kvm_mips_vcpu_run() into unmapped memory, so that we can never
      get a TLB refill exception in it when KVM is built as a module.
      
      This was observed to happen with the host MIPS kernel running under
      QEMU, due to a not entirely transparent optimisation in the QEMU TLB
      handling where TLB entries replaced with TLBWR are copied to a separate
      part of the TLB array. Code in those pages continue to be executable,
      but those mappings persist only until the next ASID switch, even if they
      are marked global.
      
      An ASID switch happens in __kvm_mips_vcpu_run() at exception level after
      switching to the guest exception base. Subsequent TLB mapped kernel
      instructions just prior to switching to the guest trigger a TLB refill
      exception, which enters the guest exception handlers without updating
      EPC. This appears as a guest triggered TLB refill on a host kernel
      mapped (host KSeg2) address, which is not handled correctly as user
      (guest) mode accesses to kernel (host) segments always generate address
      error exceptions.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.10.x-
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      797179bc
  9. 28 5月, 2016 18 次提交
  10. 21 5月, 2016 1 次提交
    • Z
      lib/GCD.c: use binary GCD algorithm instead of Euclidean · fff7fb0b
      Zhaoxiu Zeng 提交于
      The binary GCD algorithm is based on the following facts:
      	1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
      	2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
      	3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)
      
      Even on x86 machines with reasonable division hardware, the binary
      algorithm runs about 25% faster (80% the execution time) than the
      division-based Euclidian algorithm.
      
      On platforms like Alpha and ARMv6 where division is a function call to
      emulation code, it's even more significant.
      
      There are two variants of the code here, depending on whether a fast
      __ffs (find least significant set bit) instruction is available.  This
      allows the unpredictable branches in the bit-at-a-time shifting loop to
      be eliminated.
      
      If fast __ffs is not available, the "even/odd" GCD variant is used.
      
      I use the following code to benchmark:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <stdint.h>
      	#include <string.h>
      	#include <time.h>
      	#include <unistd.h>
      
      	#define swap(a, b) \
      		do { \
      			a ^= b; \
      			b ^= a; \
      			a ^= b; \
      		} while (0)
      
      	unsigned long gcd0(unsigned long a, unsigned long b)
      	{
      		unsigned long r;
      
      		if (a < b) {
      			swap(a, b);
      		}
      
      		if (b == 0)
      			return a;
      
      		while ((r = a % b) != 0) {
      			a = b;
      			b = r;
      		}
      
      		return b;
      	}
      
      	unsigned long gcd1(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		b >>= __builtin_ctzl(b);
      
      		for (;;) {
      			a >>= __builtin_ctzl(a);
      			if (a == b)
      				return a << __builtin_ctzl(r);
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      		}
      	}
      
      	unsigned long gcd2(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		r &= -r;
      
      		while (!(b & r))
      			b >>= 1;
      
      		for (;;) {
      			while (!(a & r))
      				a >>= 1;
      			if (a == b)
      				return a;
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      			a >>= 1;
      			if (a & r)
      				a += b;
      			a >>= 1;
      		}
      	}
      
      	unsigned long gcd3(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		b >>= __builtin_ctzl(b);
      		if (b == 1)
      			return r & -r;
      
      		for (;;) {
      			a >>= __builtin_ctzl(a);
      			if (a == 1)
      				return r & -r;
      			if (a == b)
      				return a << __builtin_ctzl(r);
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      		}
      	}
      
      	unsigned long gcd4(unsigned long a, unsigned long b)
      	{
      		unsigned long r = a | b;
      
      		if (!a || !b)
      			return r;
      
      		r &= -r;
      
      		while (!(b & r))
      			b >>= 1;
      		if (b == r)
      			return r;
      
      		for (;;) {
      			while (!(a & r))
      				a >>= 1;
      			if (a == r)
      				return r;
      			if (a == b)
      				return a;
      
      			if (a < b)
      				swap(a, b);
      			a -= b;
      			a >>= 1;
      			if (a & r)
      				a += b;
      			a >>= 1;
      		}
      	}
      
      	static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
      		gcd0, gcd1, gcd2, gcd3, gcd4,
      	};
      
      	#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))
      
      	#if defined(__x86_64__)
      
      	#define rdtscll(val) do { \
      		unsigned long __a,__d; \
      		__asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
      		(val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \
      	} while(0)
      
      	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
      								unsigned long a, unsigned long b, unsigned long *res)
      	{
      		unsigned long long start, end;
      		unsigned long long ret;
      		unsigned long gcd_res;
      
      		rdtscll(start);
      		gcd_res = gcd(a, b);
      		rdtscll(end);
      
      		if (end >= start)
      			ret = end - start;
      		else
      			ret = ~0ULL - start + 1 + end;
      
      		*res = gcd_res;
      		return ret;
      	}
      
      	#else
      
      	static inline struct timespec read_time(void)
      	{
      		struct timespec time;
      		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
      		return time;
      	}
      
      	static inline unsigned long long diff_time(struct timespec start, struct timespec end)
      	{
      		struct timespec temp;
      
      		if ((end.tv_nsec - start.tv_nsec) < 0) {
      			temp.tv_sec = end.tv_sec - start.tv_sec - 1;
      			temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
      		} else {
      			temp.tv_sec = end.tv_sec - start.tv_sec;
      			temp.tv_nsec = end.tv_nsec - start.tv_nsec;
      		}
      
      		return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
      	}
      
      	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
      								unsigned long a, unsigned long b, unsigned long *res)
      	{
      		struct timespec start, end;
      		unsigned long gcd_res;
      
      		start = read_time();
      		gcd_res = gcd(a, b);
      		end = read_time();
      
      		*res = gcd_res;
      		return diff_time(start, end);
      	}
      
      	#endif
      
      	static inline unsigned long get_rand()
      	{
      		if (sizeof(long) == 8)
      			return (unsigned long)rand() << 32 | rand();
      		else
      			return rand();
      	}
      
      	int main(int argc, char **argv)
      	{
      		unsigned int seed = time(0);
      		int loops = 100;
      		int repeats = 1000;
      		unsigned long (*res)[TEST_ENTRIES];
      		unsigned long long elapsed[TEST_ENTRIES];
      		int i, j, k;
      
      		for (;;) {
      			int opt = getopt(argc, argv, "n:r:s:");
      			/* End condition always first */
      			if (opt == -1)
      				break;
      
      			switch (opt) {
      			case 'n':
      				loops = atoi(optarg);
      				break;
      			case 'r':
      				repeats = atoi(optarg);
      				break;
      			case 's':
      				seed = strtoul(optarg, NULL, 10);
      				break;
      			default:
      				/* You won't actually get here. */
      				break;
      			}
      		}
      
      		res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
      		memset(elapsed, 0, sizeof(elapsed));
      
      		srand(seed);
      		for (j = 0; j < loops; j++) {
      			unsigned long a = get_rand();
      			/* Do we have args? */
      			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
      			unsigned long long min_elapsed[TEST_ENTRIES];
      			for (k = 0; k < repeats; k++) {
      				for (i = 0; i < TEST_ENTRIES; i++) {
      					unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
      					if (k == 0 || min_elapsed[i] > tmp)
      						min_elapsed[i] = tmp;
      				}
      			}
      			for (i = 0; i < TEST_ENTRIES; i++)
      				elapsed[i] += min_elapsed[i];
      		}
      
      		for (i = 0; i < TEST_ENTRIES; i++)
      			printf("gcd%d: elapsed %llu\n", i, elapsed[i]);
      
      		k = 0;
      		srand(seed);
      		for (j = 0; j < loops; j++) {
      			unsigned long a = get_rand();
      			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
      			for (i = 1; i < TEST_ENTRIES; i++) {
      				if (res[j][i] != res[j][0])
      					break;
      			}
      			if (i < TEST_ENTRIES) {
      				if (k == 0) {
      					k = 1;
      					fprintf(stderr, "Error:\n");
      				}
      				fprintf(stderr, "gcd(%lu, %lu): ", a, b);
      				for (i = 0; i < TEST_ENTRIES; i++)
      					fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
      			}
      		}
      
      		if (k == 0)
      			fprintf(stderr, "PASS\n");
      
      		free(res);
      
      		return 0;
      	}
      
      Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:
      
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 10174
        gcd1: elapsed 2120
        gcd2: elapsed 2902
        gcd3: elapsed 2039
        gcd4: elapsed 2812
        PASS
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 9309
        gcd1: elapsed 2280
        gcd2: elapsed 2822
        gcd3: elapsed 2217
        gcd4: elapsed 2710
        PASS
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 9589
        gcd1: elapsed 2098
        gcd2: elapsed 2815
        gcd3: elapsed 2030
        gcd4: elapsed 2718
        PASS
        zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
        gcd0: elapsed 9914
        gcd1: elapsed 2309
        gcd2: elapsed 2779
        gcd3: elapsed 2228
        gcd4: elapsed 2709
        PASS
      
      [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
      Signed-off-by: NZhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
      Signed-off-by: NGeorge Spelvin <linux@horizon.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fff7fb0b
  11. 20 5月, 2016 1 次提交
    • H
      arch: fix has_transparent_hugepage() · fd8cfd30
      Hugh Dickins 提交于
      I've just discovered that the useful-sounding has_transparent_hugepage()
      is actually an architecture-dependent minefield: on some arches it only
      builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
      not, but on some of those (arm and arm64) it then gives the wrong
      answer; and on mips alone it's marked __init, which would crash if
      called later (but so far it has not been called later).
      
      Straighten this out: make it available to all configs, with a sensible
      default in asm-generic/pgtable.h, removing its definitions from those
      arches (arc, arm, arm64, sparc, tile) which are served by the default,
      adding #define has_transparent_hugepage has_transparent_hugepage to
      those (mips, powerpc, s390, x86) which need to override the default at
      runtime, and removing the __init from mips (but maybe that kind of code
      should be avoided after init: set a static variable the first time it's
      called).
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Yang Shi <yang.shi@linaro.org>
      Cc: Ning Qu <quning@gmail.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arch/arc]
      Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[arch/s390]
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fd8cfd30
  12. 17 5月, 2016 2 次提交
    • J
      MIPS: Fix VZ probe gas errors with binutils <2.24 · bad50d79
      James Hogan 提交于
      The VZ guest register & TLB access macros introduced in commit "MIPS:
      Add guest CP0 accessors" use VZ ASE specific instructions that aren't
      understood by versions of binutils prior to 2.24.
      
      Add a check for whether the toolchain supports the -mvirt option,
      similar to the MSA toolchain check, and implement the accessors using
      .word if not.
      
      Due to difficulty in converting compiler specified registers (e.g. "$3")
      to usable numbers (e.g. "3") in inline asm, we need to copy to/from a
      temporary register, namely the assembler temporary (at/$1), and specify
      guest CP0 registers numerically in the gc0 macros.
      
      Fixes: 7eb91118 ("MIPS: Add guest CP0 accessors")
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-next@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/13255/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      bad50d79
    • M
      MIPS: MSA: Fix a link error on `_init_msa_upper' with older GCC · e49d3848
      Maciej W. Rozycki 提交于
      Fix a build regression from commit c9017757 ("MIPS: init upper 64b
      of vector registers when MSA is first used"):
      
      arch/mips/built-in.o: In function `enable_restore_fp_context':
      traps.c:(.text+0xbb90): undefined reference to `_init_msa_upper'
      traps.c:(.text+0xbb90): relocation truncated to fit: R_MIPS_26 against `_init_msa_upper'
      traps.c:(.text+0xbef0): undefined reference to `_init_msa_upper'
      traps.c:(.text+0xbef0): relocation truncated to fit: R_MIPS_26 against `_init_msa_upper'
      
      to !CONFIG_CPU_HAS_MSA configurations with older GCC versions, which are
      unable to figure out that calls to `_init_msa_upper' are indeed dead.
      Of the many ways to tackle this failure choose the approach we have
      already taken in `thread_msa_context_live'.
      
      [ralf@linux-mips.org: Drop patch segment to junk file.]
      Signed-off-by: NMaciej W. Rozycki <macro@imgtec.com>
      Cc: stable@vger.kernel.org # v3.16+
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/13271/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      e49d3848
  13. 14 5月, 2016 2 次提交
  14. 13 5月, 2016 4 次提交