Commits · eb2777397fd83a4a7eaa26984d09d3babb845d2a · openeuler / Kernel

28 Jul, 2018 1 commit

ARC: dma [non-IOC] setup SMP_CACHE_BYTES and cache_line_size · eb277739

Committed by Eugeniy Paltsev on Jul 26, 2018

As of today we don't set up SMP_CACHE_BYTES and cache_line_size for
ARC, so they default to L1_CACHE_BYTES. The L1 line length
(L1_CACHE_BYTES) can easily be smaller than the L2 line length (which is
usually the case, BTW). This breaks code.

For example, this breaks the ethernet infrastructure on HSDK/AXS103 boards
with IOC disabled, which involves manual cache flushes.
Functions which allocate and manage the sk_buff packet data area rely on
the SMP_CACHE_BYTES define. As a result, the last L2 cache line of the
sk_buff linear packet data area can be shared between the DMA buffer and
some useful data in another structure, so we can lose this data when
we invalidate the DMA buffer.

   sk_buff linear packet data area
                |
                |
                |         skb->end        skb->tail
                V            |                |
                             V                V
----------------------------------------------.
      packet data            | <tail padding> |  <useful data in other struct>
----------------------------------------------.

---------------------.--------------------------------------------------.
     SLC line        |             SLC (L2 cache) line (128B)           |
---------------------.--------------------------------------------------.
        ^                                     ^
        |                                     |
     These cache lines will be invalidated when we invalidate skb
     linear packet data area before DMA transaction starting.

This leads to issues painful to debug as it reproduces only if
(sk_buff->end - sk_buff->tail) < SLC_LINE_SIZE and
if we have some useful data right after sk_buff->end.

Fix that by hardcoding SMP_CACHE_BYTES to the maximum line length we may have.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

eb277739
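
The failure condition described above can be sketched in plain C. This is only an illustration: the helper names `align_up` and `tail_shares_line` are hypothetical, the 128-byte line comes from the diagram, and the 64-byte padding stands in for an assumed L1 line size. The real sk_buff allocation path is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to the next multiple of a (a must be a power of two). */
static uintptr_t align_up(uintptr_t x, uintptr_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/*
 * Hypothetical helper: a buffer starts at `start` and its length is padded
 * to a multiple of `pad` bytes. Returns true when the padded buffer ends in
 * the middle of a cache line of `line` bytes, i.e. when invalidating the
 * buffer would also invalidate unrelated data sharing that last line.
 */
static bool tail_shares_line(uintptr_t start, uintptr_t len,
			     uintptr_t pad, uintptr_t line)
{
	uintptr_t end = start + align_up(len, pad);

	return (end & (line - 1)) != 0;
}
```

With a 128-byte-aligned start, padding a 1600-byte buffer to 64 bytes leaves its end in the middle of a 128-byte line, while padding to 128 bytes does not; that is the effect of raising SMP_CACHE_BYTES to the largest line length.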

20 Jul, 2018 1 commit

ARCv2: [plat-hsdk]: Save accl reg pair by default · af1fc5ba

Committed by Vineet Gupta on Jul 17, 2018

This manifested as strace segfaulting on HSDK because gcc was targeting
the accumulator registers as GPRs, which the kernel was not saving/restoring
by default.

Cc: stable@vger.kernel.org   #4.14+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

af1fc5ba

08 Jun, 2018 1 commit

mm: introduce ARCH_HAS_PTE_SPECIAL · 3010a5ea

Committed by Laurent Dufour on Jun 07, 2018

Currently the PTE special support is turned on in per-architecture
header files.  Most of the time, it is defined in
arch/*/include/asm/pgtable.h, depending or not on some other
per-architecture static definition.

This patch introduces a new configuration variable to manage this
directly in the Kconfig files.  It will later replace
__HAVE_ARCH_PTE_SPECIAL.

Here are notes for some architectures where the definition of
__HAVE_ARCH_PTE_SPECIAL is not obvious:

arm
 __HAVE_ARCH_PTE_SPECIAL which is currently defined in
arch/arm/include/asm/pgtable-3level.h which is included by
arch/arm/include/asm/pgtable.h when CONFIG_ARM_LPAE is set.
So select ARCH_HAS_PTE_SPECIAL if ARM_LPAE.

powerpc
__HAVE_ARCH_PTE_SPECIAL is defined in 2 files:
 - arch/powerpc/include/asm/book3s/64/pgtable.h
 - arch/powerpc/include/asm/pte-common.h
The first one is included if (PPC_BOOK3S & PPC64) while the second is
included in all the other cases.
So select ARCH_HAS_PTE_SPECIAL all the time.

sparc:
__HAVE_ARCH_PTE_SPECIAL is defined if defined(__sparc__) &&
defined(__arch64__) which are defined through the compiler in
sparc/Makefile if !SPARC32 which I assume to be if SPARC64.
So select ARCH_HAS_PTE_SPECIAL if SPARC64

There is no functional change introduced by this patch.

Link: http://lkml.kernel.org/r/1523433816-14460-2-git-send-email-ldufour@linux.vnet.ibm.com
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Suggested-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <albert@sifive.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Christophe LEROY <christophe.leroy@c-s.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3010a5ea

19 May, 2018 1 commit

arc: use generic dma_noncoherent_ops · 6c3e71dd

Committed by Christoph Hellwig on May 18, 2018

Switch to the generic noncoherent direct mapping implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>

6c3e71dd

09 May, 2018 2 commits

arch: define the ARCH_DMA_ADDR_T_64BIT config symbol in lib/Kconfig · 4965a687

Committed by Christoph Hellwig on Apr 03, 2018

Define this symbol if the architecture either uses 64-bit pointers or
PHYS_ADDR_T_64BIT is set.  This covers 95% of the old arch magic.  We only
need an additional select for Xen on ARM (why anyway?), and we now always
set ARCH_DMA_ADDR_T_64BIT on mips boards with 64-bit physical addressing
instead of only doing it when highmem is set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Hogan <jhogan@kernel.org>

4965a687

arch: remove the ARCH_PHYS_ADDR_T_64BIT config symbol · d4a451d5

Committed by Christoph Hellwig on Apr 03, 2018

Instead, select PHYS_ADDR_T_64BIT directly for 32-bit architectures that
need a 64-bit phys_addr_t type.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: James Hogan <jhogan@kernel.org>

d4a451d5

06 Feb, 2018 1 commit

ARC: Fix malformed ARC_EMUL_UNALIGNED default · 827cc2fa

Committed by Ulf Magnusson on Feb 05, 2018

'default N' should be 'default n', though they happen to have the same
effect here, due to undefined symbols (N in this case) evaluating to n
in a tristate sense.

Remove the default from ARC_EMUL_UNALIGNED instead of changing it. bool
and tristate symbols implicitly default to n.

Discovered with the
https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py
script.
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

827cc2fa

09 Jan, 2018 1 commit

arc: remove CONFIG_ARC_PLAT_NEEDS_PHYS_TO_DMA · 57723cb3

Committed by Christoph Hellwig on Dec 20, 2017

We always use the stub definitions, so remove the unused other code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Vineet Gupta <vgupta@synopsys.com>

57723cb3

22 Nov, 2017 1 commit

ARC: perf: avoid vmalloc backed mmap · 82385732

Committed by Vineet Gupta on Sep 28, 2016

For non-aliasing Dcache, vmalloc is not needed.

vmalloc triggers additional D-TLB misses in the perf interrupt code path,
making it slightly inefficient, as evident from the hackbench runs below.

| [ARCLinux]# perf stat -e dTLB-load-misses --repeat 5 hackbench
| Running with 10*40 (== 400) tasks.
| Time: 35.060
| ...
| Performance counter stats for 'hackbench' (5 runs):

Before:      399235      dTLB-load-misses ( +-  2.08% )
After :      397676      dTLB-load-misses ( +-  2.27% )
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

82385732

12 Oct, 2017 1 commit

treewide: Fix typos in Kconfig · 83fc61a5

Committed by Masanari Iida on Sep 26, 2017

This patch fixes some spelling typos found in Kconfig files.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

83fc61a5

04 Oct, 2017 1 commit

ARC: fix allnoconfig build warning · 5464d03d

Committed by Vineet Gupta on Sep 29, 2017

Reported-by: Dmitrii Kolesnichenko <dmitrii@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

5464d03d

02 Sep, 2017 2 commits

ARC: [plat-hsdk] initial port for HSDK board · a518d637

Committed by Alexey Brodkin on Aug 15, 2017

This initial port adds support for the ARC HS Development Kit board with some
basic features such as serial port, USB, SD/MMC and Ethernet.

Essentially we run the Linux kernel on all 4 cores (i.e. utilize SMP) and
heavily use IO Coherency for speeding up DMA-aware peripherals.

Note that, as opposed to other ARC boards, we intentionally link the Linux
kernel to 0x9000_0000 because cores 1 and 3 are configured with DCCM
situated at our more usual link base 0x8000_0000. We can still use the
memory region starting at 0x8000_0000 as we reallocate DCCM in our
platform code.

Note that PAE remapping for DMA clients does not work due to an RTL bug,
so CREG_PAE register must be programmed to all zeroes, otherwise it will
cause problems with DMA to/from peripherals even if PAE40 is not used.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

a518d637

ARC: mm: Decouple RAM base address from kernel link address · 9ed68785

Committed by Eugeniy Paltsev on Aug 15, 2017

	[Needed for HSDK]

Currently the first page of the system (hence RAM base) is assumed to be
@ CONFIG_LINUX_LINK_BASE, where the kernel itself is linked.

However, in the case of the HSDK platform, for reasons explained in that patch,
this is not true. The kernel needs to be linked @ 0x9000_0000 while DDR
is still wired at 0x8000_0000. To properly account for this 256M of RAM,
we need to introduce a new option and base page frame accounting off of
it.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: renamed  CONFIG_KERNEL_RAM_BASE_ADDRESS => CONFIG_LINUX_RAM_BASE
       : simplified changelog]

9ed68785
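
The accounting problem can be sketched with a little arithmetic. This is a simplified userspace illustration, not the kernel code: the macro and function names are hypothetical, the two addresses are the ones quoted in the commit message, and the 8K page size (PAGE_SHIFT of 13) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      13           /* assumption: 8K pages */
#define LINUX_LINK_BASE 0x90000000u  /* where the kernel image is linked (HSDK) */
#define LINUX_RAM_BASE  0x80000000u  /* where DDR actually starts */

/* First page frame number tracked when page accounting starts at `base`. */
static uint32_t pfn_offset(uint32_t base)
{
	return base >> PAGE_SHIFT;
}

/*
 * Bytes of RAM that sit below the link address. If page frame accounting
 * were based on the link address, this memory would not be tracked.
 */
static uint32_t ram_below_link_base(void)
{
	assert(LINUX_LINK_BASE >= LINUX_RAM_BASE);
	return LINUX_LINK_BASE - LINUX_RAM_BASE;
}
```

Basing the offset on the RAM base instead of the link base is what lets the 256M between 0x8000_0000 and 0x9000_0000 be accounted for.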

04 Aug, 2017 1 commit

ARC: [plat-sim] Include this platform unconditionally · 33460f86

Committed by Vineet Gupta on Jul 28, 2017

Essentially remove CONFIG_ARC_PLAT_SIM

There is no need for any platform specific code, just the board DTS
match strings, which we can include unconditionally.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

33460f86

06 May, 2017 1 commit

Revert "ARCv2: Allow enabling PAE40 w/o HIGHMEM" · cf4100d1

Committed by Alexey Brodkin on May 05, 2017

This reverts commit 7cab91b8.

Now that we have a real hardware platform with PAE40 enabled
(here I mean axs103 with firmware v1.2) and 1 GB of DDR mapped to
0x1_a000_0000-0x1_ffff_ffff, we're really targeting memory above 4 GB
when PAE40 is enabled. This in turn requires HIGHMEM to be enabled;
otherwise the user won't see any difference from enabling PAE in the
kernel configuration, as only lowmem will be used anyway.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

cf4100d1

27 Apr, 2017 1 commit

CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now · 701cac61

Committed by Al Viro on Apr 05, 2017

all architectures converted
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

701cac61

21 Apr, 2017 1 commit

ARCv2: entry: save Accumulator register pair (r58:59) if present · 3d5e8012

Committed by Vineet Gupta on Apr 20, 2017

Accumulator is present in configs with FPU and/or DSP MPY (mpy > 6)

Instead of doing this in pt_regs (and thus on every kernel entry/exit),
this could have been done in context switch (and for user tasks only), as
currently the kernel doesn't clobber these registers of its own accord.
However, we will soon start using 64-bit multiply instructions in the kernel,
which can clobber these. The gcc folks also plan to start using these
as GPRs, hence it is better to always save/restore them.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

3d5e8012

29 Mar, 2017 1 commit

arc: switch to RAW_COPY_USER · 839cc295

Committed by Al Viro on Mar 21, 2017

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

839cc295

07 Feb, 2017 2 commits

ARC: [plat-*] ARC_HAS_COH_CACHES no longer relevant · 8ba605b6

Committed by Vineet Gupta on Feb 02, 2017

A typical SMP system expects cache coherency. Initial NPS platform
support was slated to be SMP w/o cache coherency.

However it seems the platform now selects that option, so there is no
point in keeping it around.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

8ba605b6

ARCv2: intc: Rework the build time irq count information · f33b8cdd

Committed by Yuriy Kolerov on Jan 31, 2017

Currently the Kconfig knob ARC_NUMBER_OF_INTERRUPTS is used as an indicator of
the hard irq count. But it is flawed in that it doesn't affect
 - NR_IRQS     : for number of virtual interrupts
 - NR_CPU_IRQS : for number of hardware interrupts

Moreover, the actual hardware irq count might still not be the same as
ARC_NUMBER_OF_INTERRUPTS. So use the information available in the
Build Configuration Registers and get rid of the Kconfig option.

We still need "some" build time info about irq count to set up
a sufficient number of vector table entries. This is done with a
sufficiently large NR_CPU_IRQS which will eventually be used solely for
that purpose (subsequent patches will remove its usage elsewhere).

So to summarize what this patch does:

  * NR_CPU_IRQS defines a maximum number of hardware interrupts.
  * Remove ARC_NUMBER_OF_INTERRUPTS option and create interrupts
    table for all possible hardware interrupts.
  * Increase the maximum number of virtual IRQs to 512. ARCv2 can
    support 240 interrupts in the core interrupt controller
    and 128 interrupts in the IDU. Thus 512 virtual IRQs should be
    enough for most board configurations.

This patch leads to NR_CPU_IRQS being defined in 2 places, to reduce the
overall churn. The next patch will remove the 2nd definition anyway.
Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: reworked the changelog a bit]

f33b8cdd
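
The sizing argument for 512 virtual IRQs is simple arithmetic, sketched below. The enumerator and function names are hypothetical; only the three numbers come from the commit message.

```c
#include <assert.h>

enum {
	CORE_INTC_IRQS  = 240,  /* core interrupt controller, per the text */
	IDU_IRQS        = 128,  /* IDU, per the text */
	VIRQ_LIMIT      = 512,  /* new maximum number of virtual IRQs */
};

/* Worst-case number of hardware interrupt lines that may need a virq. */
static int worst_case_hw_irqs(void)
{
	return CORE_INTC_IRQS + IDU_IRQS;
}

/* Headroom left in the virtual IRQ space after the worst case. */
static int virq_headroom(void)
{
	assert(worst_case_hw_irqs() <= VIRQ_LIMIT);
	return VIRQ_LIMIT - worst_case_hw_irqs();
}
```

The worst case of 368 hardware lines fits within 512 with 144 numbers to spare, which is why the message calls the limit enough for most configurations.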

19 Jan, 2017 1 commit

ARC: module: Fix !CONFIG_ARC_DW2_UNWIND builds · eb1357d9

Committed by Vineet Gupta on Jan 16, 2017

commit d65283f7 added mod->arch.secstr under
CONFIG_ARC_DW2_UNWIND, but used it unconditionally which broke builds
when the option was disabled. Fix that by adjusting the #ifdef guard.

And while at it, add a missing guard (for the unwinder) in module.c as well.
Reported-by: Waldemar Brodkorb <wbx@openadk.org>
Cc: stable@vger.kernel.org    #4.9
Fixes: d65283f7 ("ARC: module: elide loop to save reference to .eh_frame")
Tested-by: Anton Kolesov <akolesov@synopsys.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
[abrodkin: provided fixlet to Kconfig per failure in allnoconfig build]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

eb1357d9

19 Dec, 2016 1 commit

ARC: enable SG chaining · 983eeba7

Committed by Vladimir Kondratiev on Dec 14, 2016

Signed-off-by: Vladimir Kondratiev <vladimir.kondratiev@intel.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

983eeba7

01 Dec, 2016 2 commits

clocksource: import ARC timer driver · c4c9a040

Committed by Vineet Gupta on Oct 31, 2016

This adds support for

 - CONFIG_ARC_TIMERS : legacy 32-bit TIMER0 and TIMER1 which count UP
   from @CNT to @LIMIT, before optionally triggering an interrupt.
   These are programmed using ARC auxiliary register interface.
   These are present in all ARC cores (ARC700 and ARC HS38)
   TIMER0 serves as clockevent for all ARC linux builds.
   TIMER1 is used for clocksource in arc700 builds.

 - CONFIG_ARC_TIMERS_64BIT: 64-bit counters, RTC and GFRC found in
   ARC HS38 cores. These are independent IP blocks, each with a different
   programming model.

Link: http://lkml.kernel.org/r/20161111231132.GA4186@mai
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

c4c9a040

ARC: timer: gfrc, rtc: build under same option (64-bit timers) · 04421420

Committed by Vineet Gupta on Oct 31, 2016

The original distinction was made because they were developed at different
times, and primarily because they are specific to UP (RTC) and SMP (GFRC).

But given that the driver handles that at runtime (i.e. not allowing
RTC as clocksource in SMP), we can simplify things a bit.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

04421420

29 Oct, 2016 1 commit

ARC: mm: retire ARC_DBG_TLB_MISS_COUNT... · f644e368

Committed by Vineet Gupta on Oct 25, 2016

... given that we have perf counters able to do the same thing
non-intrusively
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

f644e368

17 Oct, 2016 2 commits

ARC: [build] Support gz, lzma compressed uImage · 27f3d2a3

Committed by Daniel Mentz on Oct 04, 2016

Add support for lzma compressed uImage.

Support for gzip was already available but could not be enabled because
we were missing CONFIG_HAVE_KERNEL_GZIP in arch/arc/Kconfig.
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

27f3d2a3

ARCv2: intc: untangle SMP, MCIP and IDU · 3ce0fefc

Committed by Vineet Gupta on Sep 29, 2016

The IDU intc is technically part of MCIP (Multi-core IP), hence
historically it was only available in an SMP hardware build (and thus only
in an SMP kernel build). That hardware restriction has now been lifted,
so a UP kernel needs to support it.

This requires breaking mcip.c into parts which are strictly SMP
(inter-core interrupts) and IDU, which in reality is just another
intc and thus has no bearing on SMP.

This change allows IDU in UP builds, and with a suitable device tree we
can have the cascaded intc system

    ARCv2 core intc <---> ARCv2 IDU intc <---> peripherals
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

3ce0fefc

01 Oct, 2016 2 commits

ARC: CONFIG_NODES_SHIFT fix default values · 3528f84f

Committed by Noam Camus on Sep 21, 2016

It seems the values were assigned as absolute numbers and not as
shift values, i.e. they should be 0 for one node (2^0) and 1 for
a couple of nodes (2^1).
Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

3528f84f
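
The difference between a node count and a shift value can be shown with a short sketch. The function name `max_numnodes` is hypothetical; it mirrors the usual convention that the maximum node count is `1 << NODES_SHIFT`.

```c
#include <assert.h>

/* Number of nodes representable with a given shift value: 2^nodes_shift. */
static unsigned int max_numnodes(unsigned int nodes_shift)
{
	assert(nodes_shift < 32);
	return 1u << nodes_shift;
}
```

Writing the absolute number 2 where a shift is expected would therefore allow for 4 nodes rather than 2, which is the kind of mistake the commit corrects.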

ARCv2: Implement atomic64 based on LLOCKD/SCONDD instructions · ce636527

Committed by Vineet Gupta on Jul 27, 2015

ARCv2 ISA provides 64-bit exclusive load/stores, so use them to implement
the 64-bit atomics and elide the spinlock-based generic 64-bit atomics.

Boot tested with the atomic64 self-test (and GOD bless the person who wrote
them, I realized my inline assembly is sloppy as hell)

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

ce636527
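
The shape of an atomic built on exclusive load/store can be sketched portably. This is not the ARC inline assembly from the patch: it is a C11 analogue in which a weak compare-exchange stands in for the LLOCKD/SCONDD pair, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Add `delta` to *v and return the new value. The loop has the same shape
 * as a load-locked/store-conditional sequence: read the current value,
 * compute the result, and retry if another writer got in between.
 */
static int64_t atomic64_add_return_sketch(_Atomic int64_t *v, int64_t delta)
{
	int64_t old = atomic_load_explicit(v, memory_order_relaxed);
	int64_t next;

	do {
		next = old + delta;
		/* On failure, `old` is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(v, &old, next,
							memory_order_seq_cst,
							memory_order_relaxed));

	return next;
}
```

No lock is taken at any point, which is the advantage over the spinlock-based generic implementation mentioned in the message.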

02 Jun, 2016 1 commit

Revert "ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff" · ed6aefed

Committed by Vineet Gupta on May 31, 2016

This reverts commit e78fdfef.

The issue was fixed in hardware in the HS2.1C release and there are no known
external users of the affected RTL, so revert the whole delayed retry series!
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

ed6aefed

31 May, 2016 2 commits

ARC: don't enable DISCONTIGMEM unconditionally · d140b9bf

Committed by Vineet Gupta on May 31, 2016

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

d140b9bf

ARC: [intc-compact] simplify code for 2 priority levels · 60f2b4b8

Committed by Vineet Gupta on May 30, 2016

ARC700 support for 2 interrupt priorities historically allowed even slow
peripherals such as emac and uart to set up high priority interrupts,
which was wrong from the beginning as they could possibly delay the more
critical timer interrupt.

The hardware support for 2 level interrupts in ARCompact is less than
ideal anyway (judging from the "hacks" in low level entry code) and thus
is not used in any production systems I know of.

So reduce the scope of this to timer only, thereby reducing a bunch of
complexity.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

60f2b4b8

21 May, 2016 1 commit

lib/GCD.c: use binary GCD algorithm instead of Euclidean · fff7fb0b

Committed by Zhaoxiu Zeng on May 20, 2016

The binary GCD algorithm is based on the following facts:
	1. If a and b are both even, then gcd(a,b) = 2 * gcd(a/2, b/2)
	2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
	3. If a and b are both odd, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

Even on x86 machines with reasonable division hardware, the binary
algorithm runs about 25% faster (80% of the execution time) than the
division-based Euclidean algorithm.

On platforms like Alpha and ARMv6 where division is a function call to
emulation code, it's even more significant.

There are two variants of the code here, depending on whether a fast
__ffs (find least significant set bit) instruction is available.  This
allows the unpredictable branches in the bit-at-a-time shifting loop to
be eliminated.

If fast __ffs is not available, the "even/odd" GCD variant is used.

I use the following code to benchmark:

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <string.h>
	#include <time.h>
	#include <unistd.h>

	#define swap(a, b) \
		do { \
			a ^= b; \
			b ^= a; \
			a ^= b; \
		} while (0)

	unsigned long gcd0(unsigned long a, unsigned long b)
	{
		unsigned long r;

		if (a < b) {
			swap(a, b);
		}

		if (b == 0)
			return a;

		while ((r = a % b) != 0) {
			a = b;
			b = r;
		}

		return b;
	}

	unsigned long gcd1(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b >>= __builtin_ctzl(b);

		for (;;) {
			a >>= __builtin_ctzl(a);
			if (a == b)
				return a << __builtin_ctzl(r);

			if (a < b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd2(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &= -r;

		while (!(b & r))
			b >>= 1;

		for (;;) {
			while (!(a & r))
				a >>= 1;
			if (a == b)
				return a;

			if (a < b)
				swap(a, b);
			a -= b;
			a >>= 1;
			if (a & r)
				a += b;
			a >>= 1;
		}
	}

	unsigned long gcd3(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b >>= __builtin_ctzl(b);
		if (b == 1)
			return r & -r;

		for (;;) {
			a >>= __builtin_ctzl(a);
			if (a == 1)
				return r & -r;
			if (a == b)
				return a << __builtin_ctzl(r);

			if (a < b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd4(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &= -r;

		while (!(b & r))
			b >>= 1;
		if (b == r)
			return r;

		for (;;) {
			while (!(a & r))
				a >>= 1;
			if (a == r)
				return r;
			if (a == b)
				return a;

			if (a < b)
				swap(a, b);
			a -= b;
			a >>= 1;
			if (a & r)
				a += b;
			a >>= 1;
		}
	}

	static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
		gcd0, gcd1, gcd2, gcd3, gcd4,
	};

	#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

	#if defined(__x86_64__)

	#define rdtscll(val) do { \
		unsigned long __a,__d; \
		__asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
		(val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \
	} while(0)

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		unsigned long long start, end;
		unsigned long long ret;
		unsigned long gcd_res;

		rdtscll(start);
		gcd_res = gcd(a, b);
		rdtscll(end);

		if (end >= start)
			ret = end - start;
		else
			ret = ~0ULL - start + 1 + end;

		*res = gcd_res;
		return ret;
	}

	#else

	static inline struct timespec read_time(void)
	{
		struct timespec time;
		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
		return time;
	}

	static inline unsigned long long diff_time(struct timespec start, struct timespec end)
	{
		struct timespec temp;

		if ((end.tv_nsec - start.tv_nsec) < 0) {
			temp.tv_sec = end.tv_sec - start.tv_sec - 1;
			temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
		} else {
			temp.tv_sec = end.tv_sec - start.tv_sec;
			temp.tv_nsec = end.tv_nsec - start.tv_nsec;
		}

		return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
	}

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		struct timespec start, end;
		unsigned long gcd_res;

		start = read_time();
		gcd_res = gcd(a, b);
		end = read_time();

		*res = gcd_res;
		return diff_time(start, end);
	}

	#endif

	static inline unsigned long get_rand()
	{
		if (sizeof(long) == 8)
			return (unsigned long)rand() << 32 | rand();
		else
			return rand();
	}

	int main(int argc, char **argv)
	{
		unsigned int seed = time(0);
		int loops = 100;
		int repeats = 1000;
		unsigned long (*res)[TEST_ENTRIES];
		unsigned long long elapsed[TEST_ENTRIES];
		int i, j, k;

		for (;;) {
			int opt = getopt(argc, argv, "n:r:s:");
			/* End condition always first */
			if (opt == -1)
				break;

			switch (opt) {
			case 'n':
				loops = atoi(optarg);
				break;
			case 'r':
				repeats = atoi(optarg);
				break;
			case 's':
				seed = strtoul(optarg, NULL, 10);
				break;
			default:
				/* You won't actually get here. */
				break;
			}
		}

		res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
		memset(elapsed, 0, sizeof(elapsed));

		srand(seed);
		for (j = 0; j < loops; j++) {
			unsigned long a = get_rand();
			/* Do we have args? */
			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			unsigned long long min_elapsed[TEST_ENTRIES];
			for (k = 0; k < repeats; k++) {
				for (i = 0; i < TEST_ENTRIES; i++) {
					unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
					if (k == 0 || min_elapsed[i] > tmp)
						min_elapsed[i] = tmp;
				}
			}
			for (i = 0; i < TEST_ENTRIES; i++)
				elapsed[i] += min_elapsed[i];
		}

		for (i = 0; i < TEST_ENTRIES; i++)
			printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

		k = 0;
		srand(seed);
		for (j = 0; j < loops; j++) {
			unsigned long a = get_rand();
			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			for (i = 1; i < TEST_ENTRIES; i++) {
				if (res[j][i] != res[j][0])
					break;
			}
			if (i < TEST_ENTRIES) {
				if (k == 0) {
					k = 1;
					fprintf(stderr, "Error:\n");
				}
				fprintf(stderr, "gcd(%lu, %lu): ", a, b);
				for (i = 0; i < TEST_ENTRIES; i++)
					fprintf(stderr, "%lu%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
			}
		}

		if (k == 0)
			fprintf(stderr, "PASS\n");

		free(res);

		return 0;
	}

Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:

  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 10174
  gcd1: elapsed 2120
  gcd2: elapsed 2902
  gcd3: elapsed 2039
  gcd4: elapsed 2812
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9309
  gcd1: elapsed 2280
  gcd2: elapsed 2822
  gcd3: elapsed 2217
  gcd4: elapsed 2710
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9589
  gcd1: elapsed 2098
  gcd2: elapsed 2815
  gcd3: elapsed 2030
  gcd4: elapsed 2718
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9914
  gcd1: elapsed 2309
  gcd2: elapsed 2779
  gcd3: elapsed 2228
  gcd4: elapsed 2709
  PASS

[akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
Signed-off-by: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fff7fb0b

09 May, 2016 4 commits

ARC: Add eznps platform to Kconfig and Makefile · 96665789

Committed by Noam Camus on Oct 16, 2015

This commit should be left for last, since only now is the eznps platform
in a state where one can actually use it.
Signed-off-by: Noam Camus <noamc@ezchip.com>

96665789

ARC: Make vmalloc size configurable · 15ca68a9

Committed by Noam Camus on Sep 07, 2014

On ARC, lower 2G of address space is translated and used for
 - user vaddr space (region 0 to 5)
 - unused kernel-user gutter (region 6)
 - kernel vaddr space (region 7)

where each region simply represents 256MB of address space.

The kernel vaddr space of 256MB is used to implement vmalloc and modules.
So far this was enough, but not on an EZChip system with 4K CPUs (given
that the per-cpu mechanism uses vmalloc for allocating chunks).

So allow VMALLOC_SIZE to be configurable by expanding down into the unused
kernel-user gutter region, which at the default 256M was excessive anyway.

Also use _BITUL() to fix a build error, since PGDIR_SIZE cannot use "1UL"
as it is referenced from assembly code in mm/tlbex.S.
Signed-off-by: Noam Camus <noamc@ezchip.com>
[vgupta: rewrote changelog, debugged bootup crash due to int vs. hex]
Acked-by: Vineet Gupta <vgupta@synopsys.com>

15ca68a9
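
The address arithmetic behind "expanding down into the gutter" can be sketched as follows. This is an illustration under stated assumptions, not the kernel's definitions: the names are hypothetical, the vmalloc area is assumed to end at the 2G mark, and the upper bound of two regions (region 7 plus the region 6 gutter) is assumed from the layout described above.

```c
#include <assert.h>
#include <stdint.h>

#define REGION_SIZE_MB   256u         /* each region is 256MB, per the text */
#define KERNEL_VADDR_END 0x80000000u  /* top of the translated lower 2G */

/* Start address of a vmalloc area of size_mb megabytes ending at the 2G mark. */
static uint32_t vmalloc_start(uint32_t size_mb)
{
	/* Region 7 plus the gutter (region 6) is the most that can be claimed. */
	assert(size_mb >= 1 && size_mb <= 2 * REGION_SIZE_MB);
	return KERNEL_VADDR_END - (size_mb << 20);
}
```

With the default 256MB the area starts at 0x7000_0000 (region 7 only); at 512MB it starts at 0x6000_0000 and consumes the whole gutter.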

ARC: [intc-*] Do a domain lookup in primary handler for hwirq -> linux virq · 1b0ccb8a

Committed by Vineet Gupta on Jan 01, 2016

The primary interrupt handler arch_do_IRQ() was passing hwirq as linux
virq to core code. This was fragile and worked so far as we only had legacy/linear
domains.

This came out of a rant by Marc Zyngier.
http://lists.infradead.org/pipermail/linux-snps-arc/2015-December/000298.html

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

1b0ccb8a

ARC: clockevent: Prepare for DT based probe · 69fbd098

Committed by Noam Camus on Jan 14, 2016

 - call clocksource_probe()
 - This in turn needs of_clk_init() to be called earlier

Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Noam Camus <noamc@ezchip.com>
[vgupta: broken off from a bigger patch]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

69fbd098

05 May, 2016 1 commit

ARC: support HIGHMEM even without PAE40 · 26f9d5fd

Committed by Vineet Gupta on Apr 18, 2016

Initial HIGHMEM support on ARC was introduced for PAE40 where the low
memory (0x8000_0000 based) and high memory (0x1_0000_0000) were
physically contiguous. So CONFIG_FLATMEM sufficed (despite a peripheral
hole in the middle, which wasted a bit of struct page memory, but things
worked).

However w/o PAE, highmem was not possible and we could only reach
~1.75GB of DDR. Now there is a use case to access ~4GB of DDR w/o PAE40.
The idea is to have low memory at canonical 0x8000_0000 and highmem
at 0 so the entire 4GB address space is available for physical addressing.
This needs additional platform/interconnect mapping to convert
the non-contiguous physical addresses into linear bus addresses.

From the Linux point of view, the non-contiguous divide means FLATMEM no
longer works and DISCONTIGMEM is needed to track the pfns in the 2
regions.

This scheme would also work for PAE40, only better in that we don't
waste struct page memory for the peripheral hole.

The DT description will be something like

    memory {
        ...
        reg = <0x80000000 0x20000000   /* 512MB: lowmem */
               0x00000000 0x10000000>;  /* 256MB: highmem */
    };
Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

26f9d5fd
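
Tracking pfns in two regions amounts to mapping each pfn to one of two memory nodes. The sketch below illustrates that idea only: the function name is hypothetical, the 8K page size is an assumption, and the node numbering (0 for lowmem, 1 for highmem) is chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  13             /* assumption: 8K pages */
#define LOWMEM_BASE 0x80000000ull  /* canonical low memory base */

/*
 * Hypothetical helper: node 0 holds low memory (0x8000_0000 up to 4GB),
 * node 1 holds high memory, whether it is placed at 0 (no PAE40) or
 * above 4GB (PAE40).
 */
static int pfn_to_nid_sketch(uint64_t pfn)
{
	const uint64_t lowmem_first_pfn = LOWMEM_BASE >> PAGE_SHIFT;
	const uint64_t end_of_4g_pfn = (1ull << 32) >> PAGE_SHIFT;

	if (pfn >= lowmem_first_pfn && pfn < end_of_4g_pfn)
		return 0;
	return 1;
}
```

Because each node only describes the memory it actually contains, no struct page entries are spent on the hole between the two regions.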

27 Apr, 2016 2 commits

ARC: add support for reserved memory defined by device tree · 1b10cb21

Committed by Alexey Brodkin on Apr 26, 2016

Enable reserved memory initialization from device tree.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

1b10cb21

ARC: support generic per-device coherent dma mem · 32ed9a0e

Committed by Alexey Brodkin on Apr 26, 2016

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

32ed9a0e

openeuler / Kernel: synced successfully more than 1 year ago
