- 28 10月, 2015 11 次提交
-
-
由 Vineet Gupta 提交于
That way a single flip of phys_addr_t to 64 bit ensures all places dealing with physical addresses get correct data Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
Implement kmap* API for ARC. This enables - permanent kernel maps (pkmaps): :kmap() API - fixmap : kmap_atomic() We use a very simple/uniform approach for both (unlike some of the other arches). So fixmap doesn't use the customary compile time address stuff. The important semantic is sleep'ability (pkmap) vs. not (fixmap) which the API guarantees. Note that this patch only enables highmem for subsequent PAE40 support as there is no real highmem for ARC in pure 32-bit paradigm as explained below. ARC has 2:2 address split of the 32-bit address space with lower half being translated (virtual) while upper half unstranslated (0x8000_0000 to 0xFFFF_FFFF). kernel itself is linked at base of unstranslated space (i.e. 0x8000_0000 onwards), which is mapped to say DDR 0x0 by external Bus Glue logic (outside the core). So kernel can potentially access 1.75G worth of memory directly w/o need for highmem. (the top 256M is taken by uncached peripheral space from 0xF000_0000 to 0xFFFF_FFFF) In PAE40, hardware can address memory beyond 4G (0x1_0000_0000) while the logical/virtual addresses remain 32-bits. Thus highmem is required for kernel proper to be able to access these pages for it's own purposes (user space is agnostic to this anyways). Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
Before we plug in highmem support, some of code needs to be ready for it - copy_user_highpage() needs to be using the kmap_atomic API - mk_pte() can't assume page_address() - do_page_fault() can't assume VMALLOC_END is end of kernel vaddr space Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Alexey Brodkin 提交于
Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
MCIP now registers it's own per cpu setup routine (for IPI IRQ request) using smp_ops.init_irq_cpu(). So no need for platforms to do that. This now completely decouples platforms from MCIP. Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
Note this is not part of platform owned static machine_desc, but more of device owned plat_smp_ops (rather misnamed) which a IPI provider or some such typically defines. This will help us seperate out the IPI registration from platform specific init_cpu_smp() into device specific init_irq_cpu() Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
This conveys better that it is called for each cpu Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
MCIP now registers it's own probe callback with smp_ops.init_early_smp() which is called by ARC common code, so no need for platforms to do that. This decouples the platforms and MCIP and helps confine MCIP details to it's own file. Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
This adds a platform agnostic early SMP init hook which is called on Master core before calling setup_processor() setup_arch() smp_init_cpus() smp_ops.init_early_smp() ... setup_processor() How this helps: - Used for one time init of certain SMP centric IP blocks, before calling setup_processor() which probes various bits of core, possibly including this block - Currently platforms need to call this IP block init from their init routines, which doesn't make sense as this is specific to ARC core and not platform and otherwise requires copy/paste in all (and hence a possible point of failure) e.g. MCIP init is called from 2 platforms currently (axs10x and sim) which will go away once we have this. This change only adds the hooks but they are empty for now. Next commit will populate them and remove the explicit init calls from platforms. Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
These are not in use for ARC platforms. Moreover DT mechanims exist to probe them w/o explicit platform calls. - clocksource drivers can use CLOCKSOURCE_OF_DECLARE() - intc IRQCHIP_DECLARE() calls + cascading inside DT allows external intc to be probed automatically Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
The reason this was not done so far was lack of genuine IPI_IRQ for ARC700, as we don't have a SMP version of core yet (which might change soon thx to EZChip). Nevertheles to increase the build coverage, we need to allow CONFIG_SMP for ARC700 and still be able to run it on a UP platform (nsim or AXS101) with a UP Device Tree (SMP-on-UP) The build itself requires some define for IPI_IRQ and even a dummy value is fine since that code won't run anyways. Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 17 10月, 2015 9 次提交
-
-
由 Vineet Gupta 提交于
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
This frees up some bits to hold more high level info such as PAE being present, w/o increasing the size of already bloated cpuinfo struct Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
It was generating warnings when called as write_aux_reg(x, paddr >> 32) Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
The requirement is to - Reenable Exceptions (AE cleared) - Reenable Interrupts (E1/E2 set) We need to do wiggle these bits into ERSTATUS and call RTIE. Prev version used the pre-exception STATUS32 as starting point for what goes into ERSTATUS. This required explicit fixups of U/DE/L bits. Instead, use the current (in-exception) STATUS32 as starting point. Being in exception handler U/DE/L can be safely assumed to be correct. Only AE/E1/E2 need to be fixed. So the new implementation is slightly better -Avoids read form memory -Is 4 bytes smaller for the typical 1 level of intr configuration -Depicts the semantics more clearly Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
Historically this was done by ARC IDE driver, which is long gone. IRQ core is pretty robust now and already checks if IRQs are enabled in hard ISRs. Thus no point in checking this in arch code, for every call of irq enabled. Further if some driver does do that - let it bring down the system so we notice/fix this sooner than covering up for sucker This makes local_irq_enable() - for L1 only case atleast simple enough so we can inline it. Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
Implement the TLB flush routine to evict a sepcific Super TLB entry, vs. moving to a new ASID on every such flush. Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
MMUv4 in HS38x cores supports Super Pages which are basis for Linux THP support. Normal and Super pages can co-exist (ofcourse not overlap) in TLB with a new bit "SZ" in TLB page desciptor to distinguish between them. Super Page size is configurable in hardware (4K to 16M), but fixed once RTL builds. The exact THP size a Linx configuration will support is a function of: - MMU page size (typical 8K, RTL fixed) - software page walker address split between PGD:PTE:PFN (typical 11:8:13, but can be changed with 1 line) So for above default, THP size supported is 8K * 256 = 2M Default Page Walker is 2 levels, PGD:PTE:PFN, which in THP regime reduces to 1 level (as PTE is folded into PGD and canonically referred to as PMD). Thus thp PMD accessors are implemented in terms of PTE (just like sparc) Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 09 10月, 2015 3 次提交
-
-
由 Vineet Gupta 提交于
Needed for THP, but will also come in handy for fast GUP later Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
No semantical changes Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
ARC is the only arch with unsigned long type (vs. struct page *). Historically this was done to avoid the page_address() calls in various arch hooks which need to get the virtual/logical address of the table. Some arches alternately define it as pte_t *, and is as efficient as unsigned long (generated code doesn't change) Suggested-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 27 8月, 2015 4 次提交
-
-
由 Vineet Gupta 提交于
With all features in place, the ARC HS pct block can now be effectively allowed to be probed/used Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Alexey Brodkin 提交于
Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Alexey Brodkin 提交于
In times of ARC 700 performance counters didn't have support of interrupt an so for ARC we only had support of non-sampling events. Put simply only "perf stat" was functional. Now with ARC HS we have support of interrupts in performance counters which this change introduces support of. ARC performance counters act in the following way in regard of interrupts generation. [1] A counter counts starting from value set in PCT_COUNT register pair [2] Once counter reaches value set in PCT_INT_CNT interrupt is raised Basic setup look like this: [1] PCT_COUNT = 0; [2] PCT_INT_CNT = __limit_value__; [3] Enable interrupts for that counter and let it run [4] Let counter reach its limit [5] Handle interrupt when it happens Note that PCT HW block is build in CPU core and so ints interrupt line (which is basically OR of all counters IRQs) is wired directly to top-level IRQC. That means do de-assert PCT interrupt it's required to reset IRQs from all counters that have reached their limit values. Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
The number of counters in PCT can never be more than 32 (while countable conditions could be 100+) for both ARCompact and ARCv2 And while at it update copyright dates. Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 20 8月, 2015 7 次提交
-
-
由 Vineet Gupta 提交于
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Yuriy Kolerov 提交于
When kernel's binary becomes large enough (32M and more) errors may occur during the final linkage stage. It happens because the build system uses short relocations for ARC by default. This problem may be easily resolved by passing -mlong-calls option to GCC to use long absolute jumps (j) instead of short relative branchs (b). But there are fragments of pure assembler code exist which use branchs in inappropriate places and cause a linkage error because of relocations overflow. First of these fragments is .fixup insertion in futex.h and unaligned.c. It inserts a code in the separate section (.fixup) with branch instruction. It leads to the linkage error when kernel becomes large. Second of these fragments is calling scheduler's functions (common kernel code) from entry.S of ARC's code. When kernel's binary becomes large it may lead to the linkage error because scheduler may occur far enough from ARC's code in the final binary. Signed-off-by: NYuriy Kolerov <yuriy.kolerov@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
W/o hardware assisted atomic r-m-w the best we can do is to disable preemption. Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
Callers of cmpxchg_futex_value_locked() in futex code expect bimodal return value: !0 (essentially -EFAULT as failure) 0 (success) Before this patch, the success return value was old value of futex, which could very well be non zero, causing caller to possibly take the failure path erroneously. Fix that by returning 0 for success (This fix was done back in 2011 for all upstream arches, which ARC obviously missed) Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
The atomic ops on futex need to provide the full barrier just like regular atomics in kernel. Also remove pagefault_enable/disable in futex_atomic_cmpxchg_inatomic() as core code already does that Cc: David Hildenbrand <dahi@linux.vnet.ibm.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Alexey Brodkin 提交于
In case of ARCv2 CPU there're could be following configurations that affect cache handling for data exchanged with peripherals via DMA: [1] Only L1 cache exists [2] Both L1 and L2 exist, but no IO coherency unit [3] L1, L2 caches and IO coherency unit exist Current implementation takes care of [1] and [2]. Moreover support of [2] is implemented with run-time check for SLC existence which is not super optimal. This patch introduces support of [3] and rework of DMA ops usage. Instead of doing run-time check every time a particular DMA op is executed we'll have 3 different implementations of DMA ops and select appropriate one during init. As for IOC support for it we need: [a] Implement empty DMA ops because IOC takes care of cache coherency with DMAed data [b] Route dma_alloc_coherent() via dma_alloc_noncoherent() This is required to make IOC work in first place and also serves as optimization as LD/ST to coherent buffers can be srviced from caches w/o going all the way to memory Signed-off-by: NAlexey Brodkin <abrodkin@synopsys.com> [vgupta: -Added some comments about IOC gains -Marked dma ops as static, -Massaged changelog a bit] Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 07 8月, 2015 1 次提交
-
-
由 Vineet Gupta 提交于
The increment of delay counter was 2 instructions: Arithmatic Shfit Left (ASL) + set to 1 on overflow This can be done in 1 using ROtate Left (ROL) Suggested-by: NNigel Topham <ntopham@synopsys.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 05 8月, 2015 1 次提交
-
-
由 Vineet Gupta 提交于
KGDB fails to build after f51e2f19 ("ARC: make sure instruction_pointer() returns unsigned value") The hack to force one specific reg to unsigned backfired. There's no reason to keep the regs signed after all. | CC arch/arc/kernel/kgdb.o |../arch/arc/kernel/kgdb.c: In function 'kgdb_trap': | ../arch/arc/kernel/kgdb.c:180:29: error: lvalue required as left operand of assignment | instruction_pointer(regs) -= BREAK_INSTR_SIZE; Reported-by: NYuriy Kolerov <yuriy.kolerov@synopsys.com> Fixes: f51e2f19 ("ARC: make sure instruction_pointer() returns unsigned value") Cc: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
- 04 8月, 2015 4 次提交
-
-
由 Vineet Gupta 提交于
The previous commit for delayed retry of SCOND needs some fine tuning for spin locks. The backoff from delayed retry in conjunction with spin looping of lock itself can potentially cause the delay counter to reach high values. So to provide fairness to any lock operation, after a lock "seems" available (i.e. just before first SCOND try0, reset the delay counter back to starting value of 1 Essentially reset delay to 1 for a new spin-wait-loop-acquire cycle. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
This is to workaround the llock/scond livelock HS38x4 could get into a LLOCK/SCOND livelock in case of multiple overlapping coherency transactions in the SCU. The exclusive line state keeps rotating among contenting cores leading to a never ending cycle. So break the cycle by deferring the retry of failed exclusive access (SCOND). The actual delay needed is function of number of contending cores as well as the unrelated coherency traffic from other cores. To keep the code simple, start off with small delay of 1 which would suffice most cases and in case of contention double the delay. Eventually the delay is sufficient such that the coherency pipeline is drained, thus a subsequent exclusive access would succeed. Link: http://lkml.kernel.org/r/1438612568-28265-1-git-send-email-vgupta@synopsys.comAcked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
With LLOCK/SCOND, the rwlock counter can be atomically updated w/o need for a guarding spin lock. This in turn elides the EXchange instruction based spinning which causes the cacheline transition to exclusive state and concurrent spinning across cores would cause the line to keep bouncing around. LLOCK/SCOND based implementation is superior as spinning on LLOCK keeps the cacheline in shared state. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-
由 Vineet Gupta 提交于
Current spin_lock uses EXchange instruction to implement the atomic test and set of lock location (reads orig value and ST 1). This however forces the cacheline into exclusive state (because of the ST) and concurrent loops in multiple cores will bounce the line around between cores. Instead, use LLOCK/SCOND to implement the atomic test and set which is better as line is in shared state while lock is spinning on LLOCK The real motivation of this change however is to make way for future changes in atomics to implement delayed retry (with backoff). Initial experiment with delayed retry in atomics combined with orig EX based spinlock was a total disaster (broke even LMBench) as struct sock has a cache line sharing an atomic_t and spinlock. The tight spinning on lock, caused the atomic retry to keep backing off such that it would never finish. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
-