1. 27 Aug 2015 (4 commits)
  2. 20 Aug 2015 (7 commits)
    • 09074950
    • ARC: change some branches to jumps to resolve linkage errors · 6de6066c
      Committed by Yuriy Kolerov
      When the kernel binary becomes large enough (32M and more), errors
      may occur during the final link stage. This happens because the
      build system uses short relocations for ARC by default. The
      problem is easily resolved by passing the -mlong-calls option to
      GCC, which emits long absolute jumps (j) instead of short
      relative branches (b).
      
      But there are fragments of pure assembler code which use branches
      in inappropriate places and cause a linkage error because of
      relocation overflow.
      
      The first of these fragments is the .fixup insertion in futex.h
      and unaligned.c. It inserts code containing a branch instruction
      into a separate section (.fixup), which leads to a linkage error
      when the kernel becomes large.
      
      The second of these fragments is the calls to scheduler functions
      (common kernel code) from ARC's entry.S. When the kernel binary
      becomes large, this may lead to a linkage error because the
      scheduler may end up far enough from ARC's code in the final
      binary.
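
      A minimal sketch of the resulting .fixup shape, assuming a
      futex-style user access helper (the macro name and operands are
      illustrative, not the literal kernel diff): the fault handler
      placed in .fixup now returns with an absolute jump (j) instead of
      a pc-relative branch (b), so the distance between .fixup and
      .text can no longer overflow the relocation:

      | #define __ARC_USER_OP(insn, ret, oldval, uaddr)		\
      | 	__asm__ __volatile__(				\
      | 	"1:	" insn "			\n"	\
      | 	"2:					\n"	\
      | 	"	.section .fixup,\"ax\"		\n"	\
      | 	"	.align 4			\n"	\
      | 	"3:	mov	%0, %3			\n"	\
      | 	"	j	2b			\n"	/* was: b 2b */ \
      | 	"	.previous			\n"	\
      | 	"	.section __ex_table,\"a\"	\n"	\
      | 	"	.word	1b, 3b			\n"	\
      | 	"	.previous			\n"	\
      | 	: "=&r" (ret), "=&r" (oldval)			\
      | 	: "r" (uaddr), "ir" (-EFAULT)			\
      | 	: "cc", "memory")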
      Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: ensure futex ops are atomic in !LLSC config · eb2cd8b7
      Committed by Vineet Gupta
      Without hardware-assisted atomic r-m-w, the best we can do is to
      disable preemption.
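
      A minimal sketch of the !CONFIG_ARC_HAS_LLSC path per this
      description (the body of the futex op itself is elided):

      | #ifndef CONFIG_ARC_HAS_LLSC
      | 	preempt_disable();	/* to guarantee atomic r-m-w */
      | #endif
      | 	pagefault_disable();
      |
      | 	/* plain LD, modify, ST sequence on the user futex word here */
      |
      | 	pagefault_enable();
      | #ifndef CONFIG_ARC_HAS_LLSC
      | 	preempt_enable();
      | #endif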
      
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: make futex_atomic_cmpxchg_inatomic() return bimodal · 882a95ae
      Committed by Vineet Gupta
      Callers of cmpxchg_futex_value_locked() in the futex code expect a
      bimodal return value:
        !0 (essentially -EFAULT as failure)
         0 (success)
      
      Before this patch, the success return value was the old value of
      the futex, which could very well be non-zero, causing the caller
      to possibly take the failure path erroneously.
      
      Fix that by returning 0 for success.
      
      (This fix was done back in 2011 for all upstream arches, which ARC
      obviously missed)
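
      A sketch of the bimodal contract (simplified; __cmpxchg_user is a
      stand-in for the real ll/sc user-access loop with fault handling):

      | static inline int
      | futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      | 			      u32 expval, u32 newval)
      | {
      | 	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
      | 		return -EFAULT;
      |
      | 	/* the old value goes out-of-band via *uval ... */
      | 	*uval = __cmpxchg_user(uaddr, expval, newval);
      |
      | 	return 0;	/* ... so the return is strictly 0 / -EFAULT */
      | }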
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: futex cosmetics · ed574e2b
      Committed by Vineet Gupta
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: add barriers to futex code · 31d30c82
      Committed by Vineet Gupta
      The atomic ops on futex need to provide the full barrier, just
      like regular atomics in the kernel.
      
      Also remove pagefault_enable/disable in
      futex_atomic_cmpxchg_inatomic(), as core code already does that.
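
      A sketch of the barrier placement, assuming an xchg-style ll/sc
      helper (fault handling elided):

      | static inline void futex_xchg(u32 __user *uaddr, u32 newval, u32 *oldval)
      | {
      | 	u32 val;
      |
      | 	smp_mb();			/* full barrier before the op */
      |
      | 	__asm__ __volatile__(
      | 	"1:	llock	%0, [%1]	\n"
      | 	"	scond	%2, [%1]	\n"
      | 	"	bnz	1b		\n"
      | 	: "=&r" (val)
      | 	: "r" (uaddr), "r" (newval)
      | 	: "cc", "memory");
      |
      | 	smp_mb();			/* full barrier after the op */
      |
      | 	*oldval = val;
      | }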
      
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARCv2: Support IO Coherency and permutations involving L1 and L2 caches · f2b0b25a
      Committed by Alexey Brodkin
      In case of an ARCv2 CPU, there could be the following
      configurations that affect cache handling for data exchanged with
      peripherals via DMA:
       [1] Only L1 cache exists
       [2] Both L1 and L2 exist, but no IO coherency unit
       [3] L1, L2 caches and IO coherency unit exist
      
      The current implementation takes care of [1] and [2]. Moreover,
      support for [2] is implemented with a run-time check for SLC
      existence, which is not optimal.
      
      This patch introduces support for [3] and a rework of DMA ops
      usage. Instead of doing a run-time check every time a particular
      DMA op is executed, we'll have 3 different implementations of the
      DMA ops and select the appropriate one during init (see the
      sketch after the list below).
      
      As for IOC support, we need to:
       [a] Implement empty DMA ops, because the IOC takes care of cache
           coherency with DMAed data
       [b] Route dma_alloc_coherent() via dma_alloc_noncoherent().
           This is required to make the IOC work in the first place,
           and also serves as an optimization, as LD/ST to coherent
           buffers can be serviced from caches w/o going all the way
           to memory
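
      A sketch of the init-time selection (function and variable names
      are illustrative, not the exact kernel symbols):

      | void __init arc_dma_init(void)
      | {
      | 	if (ioc_exists)			/* [3] IOC snoops DMA, so  */
      | 		dma_cache_ops = &dma_ops_ioc;	/* cache ops are empty */
      | 	else if (l2_line_sz)		/* [2] L1 + SLC maintenance */
      | 		dma_cache_ops = &dma_ops_slc;
      | 	else				/* [1] L1-only maintenance */
      | 		dma_cache_ops = &dma_ops_l1;
      | }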
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      [vgupta:
        -Added some comments about IOC gains
        -Marked dma ops as static,
        -Massaged changelog a bit]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  3. 07 Aug 2015 (1 commit)
  4. 05 Aug 2015 (1 commit)
    • ARC: Make pt_regs regs unsigned · 87ce6280
      Committed by Vineet Gupta
      KGDB fails to build after f51e2f19 ("ARC: make sure
      instruction_pointer() returns unsigned value").
      
      The hack to force one specific reg to unsigned backfired. There's
      no reason to keep the regs signed after all.
      
      |  CC      arch/arc/kernel/kgdb.o
      | ../arch/arc/kernel/kgdb.c: In function 'kgdb_trap':
      | ../arch/arc/kernel/kgdb.c:180:29: error: lvalue required as left operand of assignment
      |   instruction_pointer(regs) -= BREAK_INSTR_SIZE;
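
      A sketch of why unsigned regs fix this (abridged pt_regs; the
      real struct has many more fields):

      | struct pt_regs {
      | 	unsigned long ret;	/* was: long ret */
      | 	/* ... remaining regs, all unsigned long now ... */
      | };
      |
      | /* stays a plain lvalue macro, so kgdb can assign to it: */
      | #define instruction_pointer(regs)	((regs)->ret)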
      Reported-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
      Fixes: f51e2f19 ("ARC: make sure instruction_pointer() returns unsigned value")
      Cc: Alexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  5. 04 Aug 2015 (6 commits)
  6. 03 Aug 2015 (1 commit)
  7. 18 Jul 2015 (1 commit)
  8. 13 Jul 2015 (1 commit)
    • ARC: make sure instruction_pointer() returns unsigned value · f51e2f19
      Committed by Alexey Brodkin
      Currently instruction_pointer() returns pt_regs->ret, so the
      return value is of type "long", which implicitly stands for
      "signed long".
      
      While that's perfectly fine when dealing with 32-bit values, if
      the return value of instruction_pointer() gets assigned to a
      64-bit variable, sign extension may happen.
      
      And in at least one real use case it already happens.
      In perf_prepare_sample() the return value of
      perf_instruction_pointer() (which is an alias of
      instruction_pointer() in the case of ARC) is assigned to
      (struct perf_sample_data)->ip (whose type is "u64").
      
      What we see is this: if the instruction pointer points to a
      user-space application, which on ARC lies below 0x8000_0000, "ip"
      gets set properly, with 32 leading zeros. But if the instruction
      pointer points to kernel address space, which starts at
      0x8000_0000, then "ip" is set with 32 leading "f"s. I.e. if
      instruction_pointer() returns 0x8100_0000, "ip" will be assigned
      0xffff_ffff__8100_0000. Which is obviously wrong.
      
      In particular, that issue broke the output of perf, because perf
      was unable to associate addresses like 0xffff_ffff__8100_0000
      with anything in /proc/kallsyms.
      
      That's what we used to see:
       ----------->8----------
        6.27%  ls       [unknown]                [k] 0xffffffff8046c5cc
        2.96%  ls       libuClibc-0.9.34-git.so  [.] memcpy
        2.25%  ls       libuClibc-0.9.34-git.so  [.] memset
        1.66%  ls       [unknown]                [k] 0xffffffff80666536
        1.54%  ls       libuClibc-0.9.34-git.so  [.] 0x000224d6
        1.18%  ls       libuClibc-0.9.34-git.so  [.] 0x00022472
       ----------->8----------
      
      With this change the perf output looks much better now:
       ----------->8----------
        8.21%  ls       [kernel.kallsyms]        [k] memset
        3.52%  ls       libuClibc-0.9.34-git.so  [.] memcpy
        2.11%  ls       libuClibc-0.9.34-git.so  [.] malloc
        1.88%  ls       libuClibc-0.9.34-git.so  [.] memset
        1.64%  ls       [kernel.kallsyms]        [k] _raw_spin_unlock_irqrestore
        1.41%  ls       [kernel.kallsyms]        [k] __d_lookup_rcu
       ----------->8----------
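
      A small user-space sketch of the underlying sign-extension hazard
      (int32_t stands in for ARC's 32-bit "long"):

      | #include <stdint.h>
      | #include <stdio.h>
      |
      | int main(void)
      | {
      | 	int32_t ip = (int32_t)0x81000000;  /* kernel address, sign bit set */
      | 	uint64_t sample_ip = ip;           /* widens with sign extension */
      |
      | 	printf("%#llx\n", (unsigned long long)sample_ip);
      | 	/* prints 0xffffffff81000000; an unsigned 32-bit source
      | 	 * value would widen to 0x81000000 instead */
      | 	return 0;
      | }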
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      Cc: arc-linux-dev@synopsys.com
      Cc: stable@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  9. 09 Jul 2015 (2 commits)
    • ARC: Add llock/scond to futex backend · 9138d413
      Committed by Vineet Gupta
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: Make ARC bitops "safer" (add anti-optimization) · 80f42084
      Committed by Vineet Gupta
      The ARCompact/ARCv2 ISAs provide that any instruction which deals
      with a bitpos/count operand (ASL, LSL, BSET, BCLR, BMSK, ...)
      will only consider the lower 5 bits, i.e. auto-clamp the position
      to 0-31.
      
      ARC Linux bitops exploited this fact by NOT explicitly masking
      out the upper bits of the @nr operand in general, saving a bunch
      of AND/BMSK instructions in the code generated around bitops.
      
      While this micro-optimization has worked well over the years, it
      is NOT safe, as shifting a number by a value greater than the
      native word size is "undefined" per the "C" spec.
      
      So as it turns outm EZChip ran into this eventually, in their massive
      muti-core SMP build with 64 cpus. There was a test_bit() inside a loop
      from 63 to 0 and gcc was weirdly optimizing away the first iteration
      (so it was really adhering to standard by implementing undefined behaviour
      vs. removing all the iterations which were phony i.e. (1 << [63..32])
      
      | for i = 63 to 0
      |    X = ( 1 << i )
      |    if X == 0
      |       continue
      
      So fix the code to do the explicit masking, at the expense of
      generating additional instructions. Fortunately, this can be
      mitigated to a large extent, as gcc has SHIFT_COUNT_TRUNCATED,
      which allows the combiner to fold the masking into the shift
      operation itself. It is currently not enabled in the ARC gcc
      backend, but could be after a bit of testing.
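
      A C-level sketch of the fix (the real kernel code is an
      LLOCK/SCOND inline-asm loop; this shows just the added clamp):

      | static inline void arc_set_bit(unsigned long nr, volatile unsigned long *m)
      | {
      | 	m += nr >> 5;	/* select the 32-bit word */
      | 	nr &= 0x1f;	/* explicit clamp: the shift below is now
      | 			 * well-defined C, matching what BSET would
      | 			 * have done in hardware anyway */
      | 	*m |= 1UL << nr;
      | }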
      
      Fixes STAR 9000866918 ("unsafe "undefined behavior" code in kernel")
      Reported-by: Noam Camus <noamc@ezchip.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  10. 01 Jul 2015 (1 commit)
  11. 25 Jun 2015 (7 commits)
    • mm: new mm hook framework · 2ae416b1
      Committed by Laurent Dufour
      CRIU recreates the process memory layout by remapping the
      checkpointee memory areas on top of the current process (criu).
      This includes remapping the vDSO to the place it had at
      checkpoint time.
      
      However some architectures, like powerpc, keep a reference to the
      vDSO base address to build the signal return stack frame by
      calling the vDSO sigreturn service. So once the vDSO has been
      moved, this reference is no longer valid and the signal frames
      built later are not usable.
      
      This patch series introduces a new mm hook framework, and a new
      arch_remap hook which is called when mremap is done and the mm
      lock is still held. The next patch adds the vDSO remap and unmap
      tracking to the powerpc architecture.
      
      This patch (of 3):
      
      This patch introduces a new set of header files to manage mm hooks:
      - a per-architecture empty header file (arch/x/include/asm/mm-arch-hooks.h)
      - a generic header (include/linux/mm-arch-hooks.h)
      
      An architecture which needs to override a hook has to redefine it
      in its header file, while an architecture which doesn't need to
      has nothing to do.
      
      The default hooks are defined in the generic header and are used
      in case the architecture does not define them.
      
      In a next step, the mm hooks defined in
      include/asm-generic/mm_hooks.h should be moved here.
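
      A sketch of the override pattern described above (arch_remap is
      the hook added later in this series):

      | /* include/linux/mm-arch-hooks.h */
      | #include <asm/mm-arch-hooks.h>
      |
      | #ifndef arch_remap
      | static inline void arch_remap(struct mm_struct *mm,
      | 			      unsigned long old_start, unsigned long old_end,
      | 			      unsigned long new_start, unsigned long new_end)
      | {
      | }
      | #define arch_remap arch_remap
      | #endif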
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ARCv2: SLC: Handle explicit flush for DMA ops (w/o IO-coherency) · 795f4558
      Committed by Vineet Gupta
      The L2 cache on ARCHS processors is called SLC (System Level
      Cache). For working DMA (in the absence of hardware-assisted IO
      coherency) we need to manage the SLC explicitly when buffers
      transition between the CPU and controllers.
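
      A sketch of the resulting DMA cache maintenance (helper names are
      illustrative):

      | void dma_cache_wback_inv(phys_addr_t start, unsigned long sz)
      | {
      | 	__dc_line_op(start, sz, OP_FLUSH_N_INV);  /* L1 writeback + inv */
      | 	slc_op(start, sz, OP_FLUSH_N_INV);        /* ditto, on the SLC  */
      | }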
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock · a5c8b52a
      Committed by Vineet Gupta
      A quad-core SMP build could get into a hardware livelock with
      concurrent LLOCK/SCOND. Work around that by adding a PREFETCHW,
      which is serialized by the SCU (System Coherency Unit). It brings
      the cache line into Exclusive state and makes others invalidate
      their lines. This gives the winner enough time to complete the
      LLOCK/SCOND before the others can get the line back.
      
      The prefetchw in the ll/sc loop is not nice, but this is the only
      software workaround for the current version of the RTL.
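
      A sketch of the workaround in an atomic op (operands are
      illustrative; the PREFETCHW is only needed for SMP):

      | static inline void atomic_add(int i, atomic_t *v)
      | {
      | 	unsigned int val;
      |
      | 	__asm__ __volatile__(
      | #ifdef CONFIG_SMP
      | 	"	prefetchw [%1]		\n" /* line to Exclusive, SCU-serialized */
      | #endif
      | 	"1:	llock	%0, [%1]	\n"
      | 	"	add	%0, %0, %2	\n"
      | 	"	scond	%0, [%1]	\n"
      | 	"	bnz	1b		\n"
      | 	: "=&r" (val)
      | 	: "r" (&v->counter), "ir" (i)
      | 	: "cc");
      | }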
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: Reduce bitops lines of code using macros · 04e2eee4
      Committed by Vineet Gupta
      No semantic changes!
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARCv2: barriers · b8a03302
      Committed by Vineet Gupta
      ARCv2-based HS38 cores are weakly ordered and thus need explicit
      barriers in the kernel proper.
      
      The SMP barrier is provided by the DMB instruction, which also
      guarantees a local barrier, hence it is used as the backend of
      the smp_*mb() as well as the *mb() APIs.
      
      Also hook barriers into the MMIO accessors to avoid ordering
      issues in IO.
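
      A sketch of the DMB backends (the operand encodings, 1 = read,
      2 = write, 3 = full, are assumed from the description):

      | #define mb()	asm volatile("dmb 3\n" : : : "memory")
      | #define rmb()	asm volatile("dmb 1\n" : : : "memory")
      | #define wmb()	asm volatile("dmb 2\n" : : : "memory")
      |
      | /* DMB guarantees the local ordering too, so the smp_* flavours
      |  * map to the same instruction */
      | #define smp_mb()	mb()
      | #define smp_rmb()	rmb()
      | #define smp_wmb()	wmb()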
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: add smp barriers around atomics per Documentation/atomic_ops.txt · 2576c28e
      Committed by Vineet Gupta
       - arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE
         barriers. Since ARCv2 only provides load/load, store/store and
         all/all, we need the full barrier (see the sketch after this
         list).
      
       - LLOCK/SCOND based atomics, bitops and cmpxchg, which return
         modified values, were lacking the explicit smp barriers.
      
       - Non-LLOCK/SCOND variants don't need the explicit barriers
         since those are implicitly provided by the spin locks used to
         implement the critical section (the spin lock barriers are in
         turn also fixed in this commit, as explained above).
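
      A sketch of the lock-side fix (the EX-based busy-wait loop is
      elided):

      | static inline void arch_spin_lock(arch_spinlock_t *lock)
      | {
      | 	/* ... EX-based busy-wait until the lock slot is acquired ... */
      |
      | 	smp_mb();	/* ACQUIRE: keeps the critical section below */
      | }
      |
      | static inline void arch_spin_unlock(arch_spinlock_t *lock)
      | {
      | 	smp_mb();	/* RELEASE: keeps the critical section above */
      |
      | 	lock->slock = __ARCH_SPIN_LOCK_UNLOCKED__;
      | }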
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: add compiler barrier to LLSC based cmpxchg · d57f7272
      Committed by Vineet Gupta
      When auditing cmpxchg call sites, Chuck noted that gcc was optimizing
      away some of the desired LDs.
      
      |	do {
      |		new = old = *ipi_data_ptr;
      |		new |= 1U << msg;
      |	} while (cmpxchg(ipi_data_ptr, old, new) != old);
      
      was generating the code below:
      
      | 8015cef8:	ld         r2,[r4,0]  <-- First LD
      | 8015cefc:	bset       r1,r2,r1
      |
      | 8015cf00:	llock      r3,[r4]  <-- atomic op
      | 8015cf04:	brne       r3,r2,8015cf10
      | 8015cf08:	scond      r1,[r4]
      | 8015cf0c:	bnz        8015cf00
      |
      | 8015cf10:	brne       r3,r2,8015cf00  <-- Branch doesn't go to orig LD
      
      Although this was fixed by adding an ACCESS_ONCE at that call
      site, it seems safer (for now at least) to add a compiler barrier
      to the LLSC-based cmpxchg (see the sketch below).
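
      A sketch of the change in __cmpxchg form; the one-line fix is the
      "memory" clobber:

      | static inline unsigned long
      | __cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
      | {
      | 	unsigned long prev;
      |
      | 	__asm__ __volatile__(
      | 	"1:	llock	%0, [%1]	\n"
      | 	"	brne	%0, %2, 2f	\n"
      | 	"	scond	%3, [%1]	\n"
      | 	"	bnz	1b		\n"
      | 	"2:				\n"
      | 	: "=&r" (prev)
      | 	: "r" (ptr), "ir" (expected), "r" (new)
      | 	: "cc", "memory");	/* "memory" is the added compiler barrier */
      |
      | 	return prev;
      | }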
      Reported-by: Chuck Jordan <cjordan@synopsys.com>
      Cc: <stable@vger.kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  12. 22 Jun 2015 (8 commits)