1. 27 Aug 2015 (4 commits)
  2. 20 Aug 2015 (7 commits)
    • 09074950
    • ARC: change some branches to jumps to resolve linkage errors · 6de6066c
      Committed by Yuriy Kolerov
      When the kernel binary becomes large enough (32M and more), errors
      may occur during the final link stage. This happens because the
      build system uses short relocations for ARC by default. The
      problem is easily resolved by passing the -mlong-calls option to
      GCC, which emits long absolute jumps (j) instead of short
      relative branches (b).
      
      But there are fragments of pure assembler code which use branches
      in inappropriate places and cause a linkage error because of
      relocation overflow.
      
      The first of these fragments is the .fixup insertion in futex.h
      and unaligned.c. It inserts code containing a branch instruction
      into a separate section (.fixup), which leads to a linkage error
      when the kernel becomes large.
      
      The second of these fragments is the calls to scheduler functions
      (common kernel code) from ARC's entry.S. When the kernel binary
      becomes large, this may lead to a linkage error because the
      scheduler may end up far enough from ARC's code in the final
      binary.
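
      A minimal sketch of the resulting .fixup shape, assuming a
      futex-style user access helper (the macro name and operands are
      illustrative, not the literal kernel diff): the fault handler
      placed in .fixup now returns with an absolute jump (j) instead of
      a pc-relative branch (b), so the distance between .fixup and
      .text can no longer overflow the relocation:

      | #define __ARC_USER_OP(insn, ret, oldval, uaddr)		\
      | 	__asm__ __volatile__(				\
      | 	"1:	" insn "			\n"	\
      | 	"2:					\n"	\
      | 	"	.section .fixup,\"ax\"		\n"	\
      | 	"	.align 4			\n"	\
      | 	"3:	mov	%0, %3			\n"	\
      | 	"	j	2b			\n"	/* was: b 2b */ \
      | 	"	.previous			\n"	\
      | 	"	.section __ex_table,\"a\"	\n"	\
      | 	"	.word	1b, 3b			\n"	\
      | 	"	.previous			\n"	\
      | 	: "=&r" (ret), "=&r" (oldval)			\
      | 	: "r" (uaddr), "ir" (-EFAULT)			\
      | 	: "cc", "memory")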
      Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: ensure futex ops are atomic in !LLSC config · eb2cd8b7
      Committed by Vineet Gupta
      Without hardware-assisted atomic r-m-w, the best we can do is to
      disable preemption.
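
      A minimal sketch of the !CONFIG_ARC_HAS_LLSC path per this
      description (the body of the futex op itself is elided):

      | #ifndef CONFIG_ARC_HAS_LLSC
      | 	preempt_disable();	/* to guarantee atomic r-m-w */
      | #endif
      | 	pagefault_disable();
      |
      | 	/* plain LD, modify, ST sequence on the user futex word here */
      |
      | 	pagefault_enable();
      | #ifndef CONFIG_ARC_HAS_LLSC
      | 	preempt_enable();
      | #endif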
      
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: make futex_atomic_cmpxchg_inatomic() return bimodal · 882a95ae
      Committed by Vineet Gupta
      Callers of cmpxchg_futex_value_locked() in the futex code expect a
      bimodal return value:
        !0 (essentially -EFAULT as failure)
         0 (success)
      
      Before this patch, the success return value was the old value of
      the futex, which could very well be non-zero, causing the caller
      to possibly take the failure path erroneously.
      
      Fix that by returning 0 for success.
      
      (This fix was done back in 2011 for all upstream arches, which ARC
      obviously missed)
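
      A sketch of the bimodal contract (simplified; __cmpxchg_user is a
      stand-in for the real ll/sc user-access loop with fault handling):

      | static inline int
      | futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
      | 			      u32 expval, u32 newval)
      | {
      | 	if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
      | 		return -EFAULT;
      |
      | 	/* the old value goes out-of-band via *uval ... */
      | 	*uval = __cmpxchg_user(uaddr, expval, newval);
      |
      | 	return 0;	/* ... so the return is strictly 0 / -EFAULT */
      | }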
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: futex cosmetics · ed574e2b
      Committed by Vineet Gupta
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: add barriers to futex code · 31d30c82
      Committed by Vineet Gupta
      The atomic ops on futex need to provide the full barrier, just
      like regular atomics in the kernel.
      
      Also remove pagefault_enable/disable in
      futex_atomic_cmpxchg_inatomic(), as core code already does that.
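
      A sketch of the barrier placement, assuming an xchg-style ll/sc
      helper (fault handling elided):

      | static inline void futex_xchg(u32 __user *uaddr, u32 newval, u32 *oldval)
      | {
      | 	u32 val;
      |
      | 	smp_mb();			/* full barrier before the op */
      |
      | 	__asm__ __volatile__(
      | 	"1:	llock	%0, [%1]	\n"
      | 	"	scond	%2, [%1]	\n"
      | 	"	bnz	1b		\n"
      | 	: "=&r" (val)
      | 	: "r" (uaddr), "r" (newval)
      | 	: "cc", "memory");
      |
      | 	smp_mb();			/* full barrier after the op */
      |
      | 	*oldval = val;
      | }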
      
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARCv2: Support IO Coherency and permutations involving L1 and L2 caches · f2b0b25a
      Committed by Alexey Brodkin
      In case of an ARCv2 CPU, there could be the following
      configurations that affect cache handling for data exchanged with
      peripherals via DMA:
       [1] Only L1 cache exists
       [2] Both L1 and L2 exist, but no IO coherency unit
       [3] L1, L2 caches and IO coherency unit exist
      
      The current implementation takes care of [1] and [2]. Moreover,
      support for [2] is implemented with a run-time check for SLC
      existence, which is not optimal.
      
      This patch introduces support for [3] and a rework of DMA ops
      usage. Instead of doing a run-time check every time a particular
      DMA op is executed, we'll have 3 different implementations of the
      DMA ops and select the appropriate one during init (see the
      sketch after the list below).
      
      As for IOC support, we need to:
       [a] Implement empty DMA ops, because the IOC takes care of cache
           coherency with DMAed data
       [b] Route dma_alloc_coherent() via dma_alloc_noncoherent().
           This is required to make the IOC work in the first place,
           and also serves as an optimization, as LD/ST to coherent
           buffers can be serviced from caches w/o going all the way
           to memory
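
      A sketch of the init-time selection (function and variable names
      are illustrative, not the exact kernel symbols):

      | void __init arc_dma_init(void)
      | {
      | 	if (ioc_exists)			/* [3] IOC snoops DMA, so  */
      | 		dma_cache_ops = &dma_ops_ioc;	/* cache ops are empty */
      | 	else if (l2_line_sz)		/* [2] L1 + SLC maintenance */
      | 		dma_cache_ops = &dma_ops_slc;
      | 	else				/* [1] L1-only maintenance */
      | 		dma_cache_ops = &dma_ops_l1;
      | }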
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      [vgupta:
        -Added some comments about IOC gains
        -Marked dma ops as static,
        -Massaged changelog a bit]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  3. 07 Aug 2015 (1 commit)
  4. 05 Aug 2015 (1 commit)
    • ARC: Make pt_regs regs unsigned · 87ce6280
      Committed by Vineet Gupta
      KGDB fails to build after f51e2f19 ("ARC: make sure
      instruction_pointer() returns unsigned value").
      
      The hack to force one specific reg to unsigned backfired. There's
      no reason to keep the regs signed after all.
      
      |  CC      arch/arc/kernel/kgdb.o
      | ../arch/arc/kernel/kgdb.c: In function 'kgdb_trap':
      | ../arch/arc/kernel/kgdb.c:180:29: error: lvalue required as left operand of assignment
      |   instruction_pointer(regs) -= BREAK_INSTR_SIZE;
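
      A sketch of why unsigned regs fix this (abridged pt_regs; the
      real struct has many more fields):

      | struct pt_regs {
      | 	unsigned long ret;	/* was: long ret */
      | 	/* ... remaining regs, all unsigned long now ... */
      | };
      |
      | /* stays a plain lvalue macro, so kgdb can assign to it: */
      | #define instruction_pointer(regs)	((regs)->ret)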
      Reported-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
      Fixes: f51e2f19 ("ARC: make sure instruction_pointer() returns unsigned value")
      Cc: Alexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  5. 04 Aug 2015 (6 commits)
  6. 03 Aug 2015 (1 commit)
  7. 18 Jul 2015 (1 commit)
  8. 13 Jul 2015 (1 commit)
    • ARC: make sure instruction_pointer() returns unsigned value · f51e2f19
      Committed by Alexey Brodkin
      Currently instruction_pointer() returns pt_regs->ret, so the
      return value is of type "long", which implicitly stands for
      "signed long".
      
      While that's perfectly fine when dealing with 32-bit values, if
      the return value of instruction_pointer() gets assigned to a
      64-bit variable, sign extension may happen.
      
      And in at least one real use case it already happens.
      In perf_prepare_sample() the return value of
      perf_instruction_pointer() (which is an alias of
      instruction_pointer() in the case of ARC) is assigned to
      (struct perf_sample_data)->ip (whose type is "u64").
      
      What we see is this: if the instruction pointer points to a
      user-space application, which on ARC lies below 0x8000_0000, "ip"
      gets set properly, with 32 leading zeros. But if the instruction
      pointer points to kernel address space, which starts at
      0x8000_0000, then "ip" is set with 32 leading "f"s. I.e. if
      instruction_pointer() returns 0x8100_0000, "ip" will be assigned
      0xffff_ffff__8100_0000. Which is obviously wrong.
      
      In particular, that issue broke the output of perf, because perf
      was unable to associate addresses like 0xffff_ffff__8100_0000
      with anything in /proc/kallsyms.
      
      That's what we used to see:
       ----------->8----------
        6.27%  ls       [unknown]                [k] 0xffffffff8046c5cc
        2.96%  ls       libuClibc-0.9.34-git.so  [.] memcpy
        2.25%  ls       libuClibc-0.9.34-git.so  [.] memset
        1.66%  ls       [unknown]                [k] 0xffffffff80666536
        1.54%  ls       libuClibc-0.9.34-git.so  [.] 0x000224d6
        1.18%  ls       libuClibc-0.9.34-git.so  [.] 0x00022472
       ----------->8----------
      
      With this change the perf output looks much better now:
       ----------->8----------
        8.21%  ls       [kernel.kallsyms]        [k] memset
        3.52%  ls       libuClibc-0.9.34-git.so  [.] memcpy
        2.11%  ls       libuClibc-0.9.34-git.so  [.] malloc
        1.88%  ls       libuClibc-0.9.34-git.so  [.] memset
        1.64%  ls       [kernel.kallsyms]        [k] _raw_spin_unlock_irqrestore
        1.41%  ls       [kernel.kallsyms]        [k] __d_lookup_rcu
       ----------->8----------
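
      A small user-space sketch of the underlying sign-extension hazard
      (int32_t stands in for ARC's 32-bit "long"):

      | #include <stdint.h>
      | #include <stdio.h>
      |
      | int main(void)
      | {
      | 	int32_t ip = (int32_t)0x81000000;  /* kernel address, sign bit set */
      | 	uint64_t sample_ip = ip;           /* widens with sign extension */
      |
      | 	printf("%#llx\n", (unsigned long long)sample_ip);
      | 	/* prints 0xffffffff81000000; an unsigned 32-bit source
      | 	 * value would widen to 0x81000000 instead */
      | 	return 0;
      | }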
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      Cc: arc-linux-dev@synopsys.com
      Cc: stable@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  9. 09 Jul 2015 (2 commits)
    • ARC: Add llock/scond to futex backend · 9138d413
      Committed by Vineet Gupta
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: Make ARC bitops "safer" (add anti-optimization) · 80f42084
      Committed by Vineet Gupta
      The ARCompact/ARCv2 ISAs provide that any instruction which deals
      with a bitpos/count operand (ASL, LSL, BSET, BCLR, BMSK, ...)
      will only consider the lower 5 bits, i.e. auto-clamp the position
      to 0-31.
      
      ARC Linux bitops exploited this fact by NOT explicitly masking
      out the upper bits of the @nr operand in general, saving a bunch
      of AND/BMSK instructions in the code generated around bitops.
      
      While this micro-optimization has worked well over the years, it
      is NOT safe, as shifting a number by a value greater than the
      native word size is "undefined" per the "C" spec.
      
      So as it turns outm EZChip ran into this eventually, in their massive
      muti-core SMP build with 64 cpus. There was a test_bit() inside a loop
      from 63 to 0 and gcc was weirdly optimizing away the first iteration
      (so it was really adhering to standard by implementing undefined behaviour
      vs. removing all the iterations which were phony i.e. (1 << [63..32])
      
      | for i = 63 to 0
      |    X = ( 1 << i )
      |    if X == 0
      |       continue
      
      So fix the code to do the explicit masking, at the expense of
      generating additional instructions. Fortunately, this can be
      mitigated to a large extent, as gcc has SHIFT_COUNT_TRUNCATED,
      which allows the combiner to fold the masking into the shift
      operation itself. It is currently not enabled in the ARC gcc
      backend, but could be after a bit of testing.
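
      A C-level sketch of the fix (the real kernel code is an
      LLOCK/SCOND inline-asm loop; this shows just the added clamp):

      | static inline void arc_set_bit(unsigned long nr, volatile unsigned long *m)
      | {
      | 	m += nr >> 5;	/* select the 32-bit word */
      | 	nr &= 0x1f;	/* explicit clamp: the shift below is now
      | 			 * well-defined C, matching what BSET would
      | 			 * have done in hardware anyway */
      | 	*m |= 1UL << nr;
      | }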
      
      Fixes STAR 9000866918 ("unsafe "undefined behavior" code in kernel")
      Reported-by: Noam Camus <noamc@ezchip.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  10. 01 Jul 2015 (1 commit)
  11. 25 Jun 2015 (7 commits)
    • mm: new mm hook framework · 2ae416b1
      Committed by Laurent Dufour
      CRIU recreates the process memory layout by remapping the
      checkpointee memory areas on top of the current process (criu).
      This includes remapping the vDSO to the place it had at
      checkpoint time.
      
      However some architectures, like powerpc, keep a reference to the
      vDSO base address to build the signal return stack frame by
      calling the vDSO sigreturn service. So once the vDSO has been
      moved, this reference is no longer valid and the signal frames
      built later are not usable.
      
      This patch series introduces a new mm hook framework, and a new
      arch_remap hook which is called when mremap is done and the mm
      lock is still held. The next patch adds the vDSO remap and unmap
      tracking to the powerpc architecture.
      
      This patch (of 3):
      
      This patch introduces a new set of header files to manage mm hooks:
      - a per-architecture empty header file (arch/x/include/asm/mm-arch-hooks.h)
      - a generic header (include/linux/mm-arch-hooks.h)
      
      An architecture which needs to override a hook has to redefine it
      in its header file, while an architecture which doesn't need to
      has nothing to do.
      
      The default hooks are defined in the generic header and are used
      in case the architecture does not define them.
      
      In a next step, the mm hooks defined in
      include/asm-generic/mm_hooks.h should be moved here.
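
      A sketch of the override pattern described above (arch_remap is
      the hook added later in this series):

      | /* include/linux/mm-arch-hooks.h */
      | #include <asm/mm-arch-hooks.h>
      |
      | #ifndef arch_remap
      | static inline void arch_remap(struct mm_struct *mm,
      | 			      unsigned long old_start, unsigned long old_end,
      | 			      unsigned long new_start, unsigned long new_end)
      | {
      | }
      | #define arch_remap arch_remap
      | #endif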
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ARCv2: SLC: Handle explicit flush for DMA ops (w/o IO-coherency) · 795f4558
      Committed by Vineet Gupta
      The L2 cache on ARCHS processors is called SLC (System Level
      Cache). For working DMA (in the absence of hardware-assisted IO
      coherency) we need to manage the SLC explicitly when buffers
      transition between the CPU and controllers.
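
      A sketch of the resulting DMA cache maintenance (helper names are
      illustrative):

      | void dma_cache_wback_inv(phys_addr_t start, unsigned long sz)
      | {
      | 	__dc_line_op(start, sz, OP_FLUSH_N_INV);  /* L1 writeback + inv */
      | 	slc_op(start, sz, OP_FLUSH_N_INV);        /* ditto, on the SLC  */
      | }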
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock · a5c8b52a
      Committed by Vineet Gupta
      A quad-core SMP build could get into a hardware livelock with
      concurrent LLOCK/SCOND. Work around that by adding a PREFETCHW,
      which is serialized by the SCU (System Coherency Unit). It brings
      the cache line into Exclusive state and makes others invalidate
      their lines. This gives the winner enough time to complete the
      LLOCK/SCOND before the others can get the line back.
      
      The prefetchw in the ll/sc loop is not nice, but this is the only
      software workaround for the current version of the RTL.
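
      A sketch of the workaround in an atomic op (operands are
      illustrative; the PREFETCHW is only needed for SMP):

      | static inline void atomic_add(int i, atomic_t *v)
      | {
      | 	unsigned int val;
      |
      | 	__asm__ __volatile__(
      | #ifdef CONFIG_SMP
      | 	"	prefetchw [%1]		\n" /* line to Exclusive, SCU-serialized */
      | #endif
      | 	"1:	llock	%0, [%1]	\n"
      | 	"	add	%0, %0, %2	\n"
      | 	"	scond	%0, [%1]	\n"
      | 	"	bnz	1b		\n"
      | 	: "=&r" (val)
      | 	: "r" (&v->counter), "ir" (i)
      | 	: "cc");
      | }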
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: Reduce bitops lines of code using macros · 04e2eee4
      Committed by Vineet Gupta
      No semantic changes!
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARCv2: barriers · b8a03302
      Committed by Vineet Gupta
      ARCv2-based HS38 cores are weakly ordered and thus need explicit
      barriers in the kernel proper.
      
      The SMP barrier is provided by the DMB instruction, which also
      guarantees a local barrier, hence it is used as the backend of
      the smp_*mb() as well as the *mb() APIs.
      
      Also hook barriers into the MMIO accessors to avoid ordering
      issues in IO.
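
      A sketch of the DMB backends (the operand encodings, 1 = read,
      2 = write, 3 = full, are assumed from the description):

      | #define mb()	asm volatile("dmb 3\n" : : : "memory")
      | #define rmb()	asm volatile("dmb 1\n" : : : "memory")
      | #define wmb()	asm volatile("dmb 2\n" : : : "memory")
      |
      | /* DMB guarantees the local ordering too, so the smp_* flavours
      |  * map to the same instruction */
      | #define smp_mb()	mb()
      | #define smp_rmb()	rmb()
      | #define smp_wmb()	wmb()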
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: add smp barriers around atomics per Documentation/atomic_ops.txt · 2576c28e
      Committed by Vineet Gupta
       - arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE
         barriers. Since ARCv2 only provides load/load, store/store and
         all/all, we need the full barrier (see the sketch after this
         list).
      
       - LLOCK/SCOND based atomics, bitops and cmpxchg, which return
         modified values, were lacking the explicit smp barriers.
      
       - Non-LLOCK/SCOND variants don't need the explicit barriers
         since those are implicitly provided by the spin locks used to
         implement the critical section (the spin lock barriers are in
         turn also fixed in this commit, as explained above).
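
      A sketch of the lock-side fix (the EX-based busy-wait loop is
      elided):

      | static inline void arch_spin_lock(arch_spinlock_t *lock)
      | {
      | 	/* ... EX-based busy-wait until the lock slot is acquired ... */
      |
      | 	smp_mb();	/* ACQUIRE: keeps the critical section below */
      | }
      |
      | static inline void arch_spin_unlock(arch_spinlock_t *lock)
      | {
      | 	smp_mb();	/* RELEASE: keeps the critical section above */
      |
      | 	lock->slock = __ARCH_SPIN_LOCK_UNLOCKED__;
      | }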
      
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
    • ARC: add compiler barrier to LLSC based cmpxchg · d57f7272
      Committed by Vineet Gupta
      When auditing cmpxchg call sites, Chuck noted that gcc was optimizing
      away some of the desired LDs.
      
      |	do {
      |		new = old = *ipi_data_ptr;
      |		new |= 1U << msg;
      |	} while (cmpxchg(ipi_data_ptr, old, new) != old);
      
      was generating the code below:
      
      | 8015cef8:	ld         r2,[r4,0]  <-- First LD
      | 8015cefc:	bset       r1,r2,r1
      |
      | 8015cf00:	llock      r3,[r4]  <-- atomic op
      | 8015cf04:	brne       r3,r2,8015cf10
      | 8015cf08:	scond      r1,[r4]
      | 8015cf0c:	bnz        8015cf00
      |
      | 8015cf10:	brne       r3,r2,8015cf00  <-- Branch doesn't go to orig LD
      
      Although this was fixed by adding an ACCESS_ONCE at that call
      site, it seems safer (for now at least) to add a compiler barrier
      to the LLSC-based cmpxchg (see the sketch below).
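
      A sketch of the change in __cmpxchg form; the one-line fix is the
      "memory" clobber:

      | static inline unsigned long
      | __cmpxchg(volatile void *ptr, unsigned long expected, unsigned long new)
      | {
      | 	unsigned long prev;
      |
      | 	__asm__ __volatile__(
      | 	"1:	llock	%0, [%1]	\n"
      | 	"	brne	%0, %2, 2f	\n"
      | 	"	scond	%3, [%1]	\n"
      | 	"	bnz	1b		\n"
      | 	"2:				\n"
      | 	: "=&r" (prev)
      | 	: "r" (ptr), "ir" (expected), "r" (new)
      | 	: "cc", "memory");	/* "memory" is the added compiler barrier */
      |
      | 	return prev;
      | }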
      Reported-by: Chuck Jordan <cjordan@synopsys.com>
      Cc: <stable@vger.kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
  12. 22 Jun 2015 (8 commits)