- 28 7月, 2018 1 次提交
-
-
由 Eugeniy Paltsev 提交于
As for today we don't setup SMP_CACHE_BYTES and cache_line_size for ARC, so they are set to L1_CACHE_BYTES by default. L1 line length (L1_CACHE_BYTES) might be easily smaller than L2 line (which is usually the case BTW). This breaks code. For example this breaks ethernet infrastructure on HSDK/AXS103 boards with IOC disabled, involving manual cache flushes Functions which alloc and manage sk_buff packet data area rely on SMP_CACHE_BYTES define. In the result we can share last L2 cache line in sk_buff linear packet data area between DMA buffer and some useful data in other structure. So we can lose this data when we invalidate DMA buffer. sk_buff linear packet data area | | | skb->end skb->tail V | | V V ----------------------------------------------. packet data | <tail padding> | <useful data in other struct> ----------------------------------------------. ---------------------.--------------------------------------------------. SLC line | SLC (L2 cache) line (128B) | ---------------------.--------------------------------------------------. ^ ^ | | These cache lines will be invalidated when we invalidate skb linear packet data area before DMA transaction starting. This leads to issues painful to debug as it reproduces only if (sk_buff->end - sk_buff->tail) < SLC_LINE_SIZE and if we have some useful data right after sk_buff->end. Fix that by hardcode SMP_CACHE_BYTES to max line length we may have. Signed-off-by: NEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 20 7月, 2018 1 次提交
-
-
由 Vineet Gupta 提交于
This manifsted as strace segfaulting on HSDK because gcc was targetting the accumulator registers as GPRs, which kernek was not saving/restoring by default. Cc: stable@vger.kernel.org #4.14+ Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 08 6月, 2018 1 次提交
-
-
由 Laurent Dufour 提交于
Currently the PTE special supports is turned on in per architecture header files. Most of the time, it is defined in arch/*/include/asm/pgtable.h depending or not on some other per architecture static definition. This patch introduce a new configuration variable to manage this directly in the Kconfig files. It would later replace __HAVE_ARCH_PTE_SPECIAL. Here notes for some architecture where the definition of __HAVE_ARCH_PTE_SPECIAL is not obvious: arm __HAVE_ARCH_PTE_SPECIAL which is currently defined in arch/arm/include/asm/pgtable-3level.h which is included by arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set. So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE. powerpc __HAVE_ARCH_PTE_SPECIAL is defined in 2 files: - arch/powerpc/include/asm/book3s/64/pgtable.h - arch/powerpc/include/asm/pte-common.h The first one is included if (PPC_BOOK3S & PPC64) while the second is included in all the other cases. So select ARCH_HAS_PTE_SPECIAL all the time. sparc: __HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) && defined(__arch64__) which are defined through the compiler in sparc/Makefile if !SPARC32 which I assume to be if SPARC64. So select ARCH_HAS_PTE_SPECIAL if SPARC64 There is no functional change introduced by this patch. Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.comSigned-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com> Suggested-by: NJerome Glisse <jglisse@redhat.com> Reviewed-by: NJerome Glisse <jglisse@redhat.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <albert@sifive.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Christophe LEROY <christophe.leroy@c-s.fr> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 5月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
Switch to the generic noncoherent direct mapping implementation. Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NAlexey Brodkin <abrodkin@synopsys.com> Acked-by: NVineet Gupta <vgupta@synopsys.com>
-
- 09 5月, 2018 2 次提交
-
-
由 Christoph Hellwig 提交于
Define this symbol if the architecture either uses 64-bit pointers or the PHYS_ADDR_T_64BIT is set. This covers 95% of the old arch magic. We only need an additional select for Xen on ARM (why anyway?), and we now always set ARCH_DMA_ADDR_T_64BIT on mips boards with 64-bit physical addressing instead of only doing it when highmem is set. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NJames Hogan <jhogan@kernel.org>
-
由 Christoph Hellwig 提交于
Instead select the PHYS_ADDR_T_64BIT for 32-bit architectures that need a 64-bit phys_addr_t type directly. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NJames Hogan <jhogan@kernel.org>
-
- 06 2月, 2018 1 次提交
-
-
由 Ulf Magnusson 提交于
'default N' should be 'default n', though they happen to have the same effect here, due to undefined symbols (N in this case) evaluating to n in a tristate sense. Remove the default from ARC_EMUL_UNALIGNED instead of changing it. bool and tristate symbols implicitly default to n. Discovered with the https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ulfalizer_Kconfiglib_blob_master_examples_list-5Fundefined.py&d=DwIBAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=c14YS-cH-kdhTOW89KozFhBtBJgs1zXscZojEZQ0THs&m=WxxD8ozR7QQUVzNCBksiznaisBGO_crN7PBOvAoju8s&s=1LmxsNqxwT-7wcInVpZ6Z1J27duZKSoyKxHIJclXU_M&e= script. Signed-off-by: NUlf Magnusson <ulfalizer@gmail.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 09 1月, 2018 1 次提交
-
-
由 Christoph Hellwig 提交于
We always use the stub definitions, so remove the unused other code. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NVineet Gupta <vgupta@synopsys.com>
-
- 22 11月, 2017 1 次提交
-
-
由 Vineet Gupta 提交于
For non-alising Dcache, vmalloc is not needed. vmalloc triggers additonal D-TLB Misses in the perf interrupt code path making it slightly inefficient as evident from hackbench runs below. | [ARCLinux]# perf stat -e dTLB-load-misses --repeat 5 hackbench | Running with 10*40 (== 400) tasks. | Time: 35.060 | ... | Performance counter stats for 'hackbench' (5 runs): Before: 399235 dTLB-load-misses ( +- 2.08% ) After : 397676 dTLB-load-misses ( +- 2.27% ) Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 12 10月, 2017 1 次提交
-
-
由 Masanari Iida 提交于
This patch fixes some spelling typos found in Kconfig files. Signed-off-by: NMasanari Iida <standby24x7@gmail.com> Acked-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 04 10月, 2017 1 次提交
-
-
由 Vineet Gupta 提交于
Reported-by: NDmitrii Kolesnichenko <dmitrii@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 02 9月, 2017 2 次提交
-
-
由 Alexey Brodkin 提交于
This initial port adds support of ARC HS Development Kit board with some basic features such serial port, USB, SD/MMC and Ethernet. Essentially we run Linux kernel on all 4 cores (i.e. utilize SMP) and heavily use IO Coherency for speeding-up DMA-aware peripherals. Note as opposed to other ARC boards we link Linux kernel to 0x9000_0000 intentionally because cores 1 and 3 configured with DCCM situated at our more usual link base 0x8000_0000. We still can use memory region starting at 0x8000_0000 as we reallocate DCCM in our platform code. Note that PAE remapping for DMA clients does not work due to an RTL bug, so CREG_PAE register must be programmed to all zeroes, otherwise it will cause problems with DMA to/from peripherals even if PAE40 is not used. Acked-by: NRob Herring <robh@kernel.org> Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com> Signed-off-by: NEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Eugeniy Paltsev 提交于
[Needed for HSDK] Currently the first page of system (hence RAM base) is assumed to be @ CONFIG_LINUX_LINK_BASE, where kernel itself is linked. However is case of HSDK platform, for reasons explained in that patch, this is not true. kernel needs to be linked @ 0x9000_0000 while DDR is still wired at 0x8000_0000. To properly account for this 256M of RAM, we need to introduce a new option and base page frame accountiing off of it. Signed-off-by: NEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com> [vgupta: renamed CONFIG_KERNEL_RAM_BASE_ADDRESS => CONFIG_LINUX_RAM_BASE : simplified changelog]
-
- 04 8月, 2017 1 次提交
-
-
由 Vineet Gupta 提交于
Essentially remove CONFIG_ARC_PLAT_SIM There is no need for any platform specific code, just the board DTS match strings which we can include unconditionally Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 06 5月, 2017 1 次提交
-
-
由 Alexey Brodkin 提交于
This reverts commit 7cab91b8. Now when we have a real hardware platform with PAE40 enabled (here I mean axs103 with firmware v1.2) and 1 Gb of DDR mapped to 0x1_a000_0000-0x1_ffff_ffff we're really targeting memory above 4Gb when PAE40 is enabled. This in its turn requires HIGHMEM to be enabled otherwise user won't see any difference with enabling PAE in kernel configuration as only lowmem will be used anyways. Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 27 4月, 2017 1 次提交
-
-
由 Al Viro 提交于
all architectures converted Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 21 4月, 2017 1 次提交
-
-
由 Vineet Gupta 提交于
Accumulator is present in configs with FPU and/or DSP MPY (mpy > 6) Instead of doing this in pt_regs (and thus every kernel entry/exit), this could have been done in context switch (and for user task only) as currently kernel doesn't clobber these registers for its own accord. However we will soon start using 64-bit multiply instructions for kernel which can clobber these. Also gcc folks also plan to start using these as GPRs, hence better to always save/restore them Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 29 3月, 2017 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 07 2月, 2017 2 次提交
-
-
由 Vineet Gupta 提交于
A typical SMP system expects cache coherency. Initial NPS platform support was slated to be SMP w/o cache coherency. However it seems the platform now selects that option, so there is no point in keeping it around. Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Yuriy Kolerov 提交于
Currently Kconfig knob ARC_NUMBER_OF_INTERRUPTS is used as indicator of hard irq count. But it is flawed that it doesn't affect - NR_IRQS : for number of virtual interrupts - NR_CPU_IRQS : for number of hardware interrupts Moreover the actual hardware irq count might still not be same as ARC_NUMBER_OF_INTERRUPTS. So use the information availble in the Build Configuration Registers and get rid of the Kconfig option. We still need "some" build time info about irq count to set up sufficient number of vector table entries. This is done with a sufficiently large NR_CPU_IRQS which will eventually be used soley for that purpose (subsequent patches will remove its usage elsewhere) So to summarize what this patch does: * NR_CPU_IRQS defines a maximum number of hardware interrupts. * Remove ARC_NUMBER_OF_INTERRUPTS option and create interrupts table for all possible hardware interrupts. * Increase a maximum number of virtual IRQs to 512. ARCv2 can support 240 interrupts in the core interrupts controllers and 128 interrupts in IDU. Thus 512 virtual IRQs must be enough for most configurations of boards. This patch leads to NR_CPU_IRQS in 2 places, to reduce the overall churn. The next patch will remove the 2nd definition anyways. Signed-off-by: NYuriy Kolerov <yuriy.kolerov@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com> [vgupta: reworked the changelog a bit]
-
- 19 1月, 2017 1 次提交
-
-
由 Vineet Gupta 提交于
commit d65283f7 added mod->arch.secstr under CONFIG_ARC_DW2_UNWIND, but used it unconditionally which broke builds when the option was disabled. Fix that by adjusting the #ifdef guard. And while at it add a missing guard (for unwinder) in module.c as well Reported-by: NWaldemar Brodkorb <wbx@openadk.org> Cc: stable@vger.kernel.org #4.9 Fixes: d65283f7 ("ARC: module: elide loop to save reference to .eh_frame") Tested-by: NAnton Kolesov <akolesov@synopsys.com> Reviewed-by: NAlexey Brodkin <abrodkin@synopsys.com> [abrodkin: provided fixlet to Kconfig per failure in allnoconfig build] Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 19 12月, 2016 1 次提交
-
-
由 Vladimir Kondratiev 提交于
Signed-off-by: NVladimir Kondratiev <vladimir.kondratiev@intel.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 01 12月, 2016 2 次提交
-
-
由 Vineet Gupta 提交于
This adds support for - CONFIG_ARC_TIMERS : legacy 32-bit TIMER0 and TIMER1 which count UP from @CNT to @LIMIT, before optionally triggering an interrupt. These are programmed using ARC auxiliary register interface. These are present in all ARC cores (ARC700 and ARC HS38) TIMER0 serves as clockevent for all ARC linux builds. TIMER1 is used for clocksource in arc700 builds. - CONFIG_ARC_TIMERS_64BIT: 64-bit counters, RTC and GFRC found in ARC HS38 cores. These are independnet IP blocks with different programming model respectively. Link: http://lkml.kernel.org/r/20161111231132.GA4186@maiAcked-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
The original distinction was done as they were developed at different times and primarily because they are specific to UP (RTC) and SMP (GFRC). But given that driver handles that at runtime, (i.e. not allowing RTC as clocksource in SMP), we can simplify things a bit. Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 29 10月, 2016 1 次提交
-
-
由 Vineet Gupta 提交于
... given that we have perf counters abel to do the same thing non intrusively Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 17 10月, 2016 2 次提交
-
-
由 Daniel Mentz 提交于
Add support for lzma compressed uImage. Support for gzip was already available but could not be enabled because we were missing CONFIG_HAVE_KERNEL_GZIP in arch/arc/Kconfig. Signed-off-by: NDaniel Mentz <danielmentz@google.com> Cc: linux-snps-arc@lists.infradead.org Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
The IDU intc is technically part of MCIP (Multi-core IP) hence historically was only available in a SMP hardware build (and thus only in a SMP kernel build). Now that hardware restriction has been lifted, so a UP kernel needs to support it. This requires breaking mcip.c into parts which are strictly SMP (inter-core interrupts) and IDU which in reality is just another intc and thus has no bearing on SMP. This change allows IDU in UP builds and with a suitable device tree, we can have the cascaded intc system ARCv2 core intc <---> ARCv2 IDU intc <---> periperals Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 01 10月, 2016 2 次提交
-
-
由 Noam Camus 提交于
Seem like values assigned as absolute number and not and shift value, i.e. should be 0 for one node (2^0) and 1 for couple of nodes (2^1) Signed-off-by: NNoam Camus <noamca@mellanox.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
ARCv2 ISA provides 64-bit exclusive load/stores so use them to implement the 64-bit atomics and elide the spinlock based generic 64-bit atomics boot tested with atomic64 self-test (and GOD bless the person who wrote them, I realized my inline assmebly is sloppy as hell) Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-snps-arc@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 02 6月, 2016 1 次提交
-
-
由 Vineet Gupta 提交于
This reverts commit e78fdfef. The issue was fixed in hardware in HS2.1C release and there are no known external users of affected RTL so revert the whole delayed retry series ! Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 31 5月, 2016 2 次提交
-
-
由 Vineet Gupta 提交于
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
ARC700 support for 2 interrupt priorities historically allowed even slow perpherals such as emac and uart to setup high priority interrupts which was wrong from the beginning as they could possibly delay the more critical timer interrupt. The hardware support for 2 level interrupts in ARCompact is less than ideal anyways (judging from the "hacks" in low level entry code and thus is not used in productions systems I know of. So reduce the scope of this to timer only, thereby reducing a bunch of complexity. Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 21 5月, 2016 1 次提交
-
-
由 Zhaoxiu Zeng 提交于
The binary GCD algorithm is based on the following facts: 1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2) 2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b) 3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b) Even on x86 machines with reasonable division hardware, the binary algorithm runs about 25% faster (80% the execution time) than the division-based Euclidian algorithm. On platforms like Alpha and ARMv6 where division is a function call to emulation code, it's even more significant. There are two variants of the code here, depending on whether a fast __ffs (find least significant set bit) instruction is available. This allows the unpredictable branches in the bit-at-a-time shifting loop to be eliminated. If fast __ffs is not available, the "even/odd" GCD variant is used. I use the following code to benchmark: #include <stdio.h> #include <stdlib.h> #include <stdint.h> #include <string.h> #include <time.h> #include <unistd.h> #define swap(a, b) \ do { \ a ^= b; \ b ^= a; \ a ^= b; \ } while (0) unsigned long gcd0(unsigned long a, unsigned long b) { unsigned long r; if (a < b) { swap(a, b); } if (b == 0) return a; while ((r = a % b) != 0) { a = b; b = r; } return b; } unsigned long gcd1(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; b >>= __builtin_ctzl(b); for (;;) { a >>= __builtin_ctzl(a); if (a == b) return a << __builtin_ctzl(r); if (a < b) swap(a, b); a -= b; } } unsigned long gcd2(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; r &= -r; while (!(b & r)) b >>= 1; for (;;) { while (!(a & r)) a >>= 1; if (a == b) return a; if (a < b) swap(a, b); a -= b; a >>= 1; if (a & r) a += b; a >>= 1; } } unsigned long gcd3(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; b >>= __builtin_ctzl(b); if (b == 1) return r & -r; for (;;) { a >>= __builtin_ctzl(a); if (a == 1) return r & -r; if (a == b) return a << __builtin_ctzl(r); if (a < b) swap(a, b); a -= b; } } unsigned long gcd4(unsigned long a, unsigned long b) { unsigned long r = a | b; if (!a || !b) return r; r &= -r; while (!(b & r)) b >>= 1; if (b == r) return r; for (;;) { while (!(a & r)) a >>= 1; if (a == r) return r; if (a == b) return a; if (a < b) swap(a, b); a -= b; a >>= 1; if (a & r) a += b; a >>= 1; } } static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = { gcd0, gcd1, gcd2, gcd3, gcd4, }; #define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0])) #if defined(__x86_64__) #define rdtscll(val) do { \ unsigned long __a,__d; \ __asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \ (val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \ } while(0) static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long), unsigned long a, unsigned long b, unsigned long *res) { unsigned long long start, end; unsigned long long ret; unsigned long gcd_res; rdtscll(start); gcd_res = gcd(a, b); rdtscll(end); if (end >= start) ret = end - start; else ret = ~0ULL - start + 1 + end; *res = gcd_res; return ret; } #else static inline struct timespec read_time(void) { struct timespec time; clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time); return time; } static inline unsigned long long diff_time(struct timespec start, struct timespec end) { struct timespec temp; if ((end.tv_nsec - start.tv_nsec) < 0) { temp.tv_sec = end.tv_sec - start.tv_sec - 1; temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec; } else { temp.tv_sec = end.tv_sec - start.tv_sec; temp.tv_nsec = end.tv_nsec - start.tv_nsec; } return temp.tv_sec * 1000000000ULL + temp.tv_nsec; } static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long), unsigned long a, unsigned long b, unsigned long *res) { struct timespec start, end; unsigned long gcd_res; start = read_time(); gcd_res = gcd(a, b); end = read_time(); *res = gcd_res; return diff_time(start, end); } #endif static inline unsigned long get_rand() { if (sizeof(long) == 8) return (unsigned long)rand() << 32 | rand(); else return rand(); } int main(int argc, char **argv) { unsigned int seed = time(0); int loops = 100; int repeats = 1000; unsigned long (*res)[TEST_ENTRIES]; unsigned long long elapsed[TEST_ENTRIES]; int i, j, k; for (;;) { int opt = getopt(argc, argv, "n:r:s:"); /* End condition always first */ if (opt == -1) break; switch (opt) { case 'n': loops = atoi(optarg); break; case 'r': repeats = atoi(optarg); break; case 's': seed = strtoul(optarg, NULL, 10); break; default: /* You won't actually get here. */ break; } } res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops); memset(elapsed, 0, sizeof(elapsed)); srand(seed); for (j = 0; j < loops; j++) { unsigned long a = get_rand(); /* Do we have args? */ unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand(); unsigned long long min_elapsed[TEST_ENTRIES]; for (k = 0; k < repeats; k++) { for (i = 0; i < TEST_ENTRIES; i++) { unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]); if (k == 0 || min_elapsed[i] > tmp) min_elapsed[i] = tmp; } } for (i = 0; i < TEST_ENTRIES; i++) elapsed[i] += min_elapsed[i]; } for (i = 0; i < TEST_ENTRIES; i++) printf("gcd%d: elapsed %llu\n", i, elapsed[i]); k = 0; srand(seed); for (j = 0; j < loops; j++) { unsigned long a = get_rand(); unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand(); for (i = 1; i < TEST_ENTRIES; i++) { if (res[j][i] != res[j][0]) break; } if (i < TEST_ENTRIES) { if (k == 0) { k = 1; fprintf(stderr, "Error:\n"); } fprintf(stderr, "gcd(%lu, %lu): ", a, b); for (i = 0; i < TEST_ENTRIES; i++) fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n"); } } if (k == 0) fprintf(stderr, "PASS\n"); free(res); return 0; } Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got: zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 10174 gcd1: elapsed 2120 gcd2: elapsed 2902 gcd3: elapsed 2039 gcd4: elapsed 2812 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9309 gcd1: elapsed 2280 gcd2: elapsed 2822 gcd3: elapsed 2217 gcd4: elapsed 2710 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9589 gcd1: elapsed 2098 gcd2: elapsed 2815 gcd3: elapsed 2030 gcd4: elapsed 2718 PASS zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10 gcd0: elapsed 9914 gcd1: elapsed 2309 gcd2: elapsed 2779 gcd3: elapsed 2228 gcd4: elapsed 2709 PASS [akpm@linux-foundation.org: avoid #defining a CONFIG_ variable] Signed-off-by: NZhaoxiu Zeng <zhaoxiu.zeng@gmail.com> Signed-off-by: NGeorge Spelvin <linux@horizon.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 5月, 2016 4 次提交
-
-
由 Noam Camus 提交于
This commit should be left last since only now eznps platform is in state which one can actually use. Signed-off-by: NNoam Camus <noamc@ezchip.com>
-
由 Noam Camus 提交于
On ARC, lower 2G of address space is translated and used for - user vaddr space (region 0 to 5) - unused kernel-user gutter (region 6) - kernel vaddr space (region 7) where each region simply represents 256MB of address space. The kernel vaddr space of 256MB is used to implement vmalloc, modules So far this was enough, but not on EZChip system with 4K CPUs (given that per cpu mechanism uses vmalloc for allocating chunks) So allow VMALLOC_SIZE to be configurable by expanding down into the unused kernel-user gutter region which at default 256M was excessive anyways. Also use _BITUL() to fix a build error since PGDIR_SIZE cannot use "1UL" as called from assembly code in mm/tlbex.S Signed-off-by: NNoam Camus <noamc@ezchip.com> [vgupta: rewrote changelog, debugged bootup crash due to int vs. hex] Acked-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
The primary interrupt handler arch_do_IRQ() was passing hwirq as linux virq to core code. This was fragile and worked so far as we only had legacy/linear domains. This came out of a rant by Marc Zyngier. http://lists.infradead.org/pipermail/linux-snps-arc/2015-December/000298.html Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Noam Camus <noamc@ezchip.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Noam Camus 提交于
- call clocksource_probe() - This in turns needs of_clk_init() to be called earlier Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NNoam Camus <noamc@ezchip.com> [vgupta: broken off from a bigger patch] Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 05 5月, 2016 1 次提交
-
-
由 Vineet Gupta 提交于
Initial HIGHMEM support on ARC was introduced for PAE40 where the low memory (0x8000_0000 based) and high memory (0x1_0000_0000) were physically contiguous. So CONFIG_FLATMEM sufficed (despite a peipheral hole in the middle, which wasted a bit of struct page memory, but things worked). However w/o PAE, highmem was not possible and we could only reach ~1.75GB of DDR. Now there is a use case to access ~4GB of DDR w/o PAE40 The idea is to have low memory at canonical 0x8000_0000 and highmem at 0 so enire 4GB address space is available for physical addressing This needs additional platform/interconnect mapping to convert the non contiguous physical addresses into linear bus adresses. From Linux point of view, non contiguous divide means FLATMEM no longer works and DISCONTIGMEM is needed to track the pfns in the 2 regions. This scheme would also work for PAE40, only better in that we don't waste struct page memory for the peripheral hole. The DT description will be something like memory { ... reg = <0x80000000 0x200000000 /* 512MB: lowmem */ 0x00000000 0x10000000>; /* 256MB: highmem */ } Signed-off-by: NNoam Camus <noamc@ezchip.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 27 4月, 2016 2 次提交
-
-
由 Alexey Brodkin 提交于
Enable reserved memory initialization from device tree. Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Alexey Brodkin 提交于
Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-