1. 28 Oct 2015, 11 commits
  2. 17 Oct 2015, 9 commits
  3. 09 Oct 2015, 3 commits
  4. 27 Aug 2015, 4 commits
  5. 20 Aug 2015, 7 commits
    • V
      09074950
    • Y
      ARC: change some branches to jumps to resolve linkage errors · 6de6066c
      Committed by Yuriy Kolerov
      When the kernel binary becomes large enough (32M and more), errors
      may occur during the final linkage stage because the build system
      uses short relocations for ARC by default. This is easily resolved
      by passing the -mlong-calls option to GCC, which uses long absolute
      jumps (j) instead of short relative branches (b).

      However, some fragments of pure assembler code still use branches
      in places where they cause a linkage error due to relocation
      overflow.

      The first such fragment is the .fixup insertion in futex.h and
      unaligned.c: it emits code into a separate section (.fixup) with a
      branch instruction, which breaks linkage once the kernel grows
      large enough. (A simplified illustration of this pattern follows
      this entry.)

      The second is the calls into scheduler functions (common kernel
      code) from ARC's entry.S. In a large kernel binary the scheduler
      may end up far enough away from ARC's code that the relocation
      overflows.
      Signed-off-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      6de6066c
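      A minimal sketch of the .fixup pattern described above, assuming
      GNU-as inline assembly; the macro name, operands and exact
      instructions are illustrative and not the actual
      arch/arc/include/asm/futex.h code:

        /*
         * Hypothetical illustration only: exception-fixup code is emitted
         * into the separate .fixup section.  If that code returns with a
         * short relative branch ("b"), its relocation can overflow once
         * the image grows past ~32M; an absolute jump ("j") has no such
         * range limit.
         */
        #define DEMO_USER_OP(insn, ret)                                     \
                __asm__ __volatile__(                                       \
                "1:     " insn "                        \n"                 \
                "2:                                     \n"                 \
                "       .section .fixup, \"ax\"         \n"                 \
                "3:     mov     %0, -14                 \n" /* -EFAULT */   \
                "       j       2b                      \n" /* was: b 2b */ \
                "       .previous                       \n"                 \
                "       .section __ex_table, \"a\"      \n"                 \
                "       .word   1b, 3b                  \n"                 \
                "       .previous                       \n"                 \
                : "+r" (ret) : : "memory")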
    • V
      ARC: ensure futex ops are atomic in !LLSC config · eb2cd8b7
      Committed by Vineet Gupta
      Without hardware-assisted atomic r-m-w, the best we can do is
      disable preemption. (A minimal sketch of this idea follows this
      entry.)
      
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      eb2cd8b7
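      A minimal sketch of the !LLSC fallback idea, using kernel-style
      user-access helpers; the function name and the specific operation
      are invented for illustration and this is not the actual ARC futex
      code:

        #include <linux/types.h>
        #include <linux/uaccess.h>      /* __get_user() / __put_user() */
        #include <linux/preempt.h>      /* preempt_disable() / preempt_enable() */

        /*
         * Illustrative only: without LLOCK/SCOND (or another hardware-
         * assisted atomic r-m-w) the read-modify-write is made safe by
         * disabling preemption, so no other task on this CPU can
         * interleave with it.
         */
        static inline int demo_futex_add(u32 __user *uaddr, int oparg, int *oldval)
        {
                u32 val;
                int ret;

                preempt_disable();              /* keep the r-m-w uninterrupted */

                ret = __get_user(val, uaddr);   /* read the current futex value */
                if (!ret) {
                        *oldval = val;
                        ret = __put_user(val + oparg, uaddr);   /* write it back */
                }

                preempt_enable();
                return ret;                     /* 0 on success, -EFAULT on fault */
        }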
    • V
      ARC: make futex_atomic_cmpxchg_inatomic() return bimodal · 882a95ae
      Committed by Vineet Gupta
      Callers of cmpxchg_futex_value_locked() in futex code expect a
      bimodal return value:
        !0 (essentially -EFAULT, i.e. failure)
         0 (success)

      Before this patch, the success return value was the old value of
      the futex, which could very well be non-zero, causing the caller to
      possibly take the failure path erroneously.

      Fix that by returning 0 for success. (The expected convention is
      sketched after this entry.)

      (This fix was done back in 2011 for all upstream arches, which ARC
      obviously missed.)
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      882a95ae
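      A hedged sketch of the calling convention described above; the
      function below is a plain-C stand-in, not the real (atomic)
      implementation:

        #include <stdint.h>

        /*
         * Bimodal contract expected by the futex core: return 0 on
         * success or a negative errno (-EFAULT) on failure; the old
         * futex value is reported through *uval, never through the
         * return value itself.
         */
        static int demo_cmpxchg_futex_value(uint32_t *uval, uint32_t *uaddr,
                                            uint32_t expected, uint32_t newval)
        {
                uint32_t old = *uaddr;          /* real code does this atomically */

                if (old == expected)
                        *uaddr = newval;

                *uval = old;                    /* old value goes out via *uval   */
                return 0;                       /* 0 == success, not the old value */
        }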
    • V
      ARC: futex cosmetics · ed574e2b
      Committed by Vineet Gupta
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      ed574e2b
    • V
      ARC: add barriers to futex code · 31d30c82
      Committed by Vineet Gupta
      The atomic ops on futexes need to provide a full barrier, just like
      regular atomics in the kernel. (A minimal sketch of the required
      ordering follows this entry.)

      Also remove pagefault_enable/disable in futex_atomic_cmpxchg_inatomic(),
      as the core code already does that.
      
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Michel Lespinasse <walken@google.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      31d30c82
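      A minimal sketch of the ordering requirement, assuming kernel
      barrier primitives (smp_mb()); the simple load/store below stands
      in for the actual LLOCK/SCOND retry loop:

        /*
         * Illustrative only (kernel context): like regular kernel
         * atomics, a futex atomic op must act as a full barrier, so the
         * real LLOCK/SCOND sequence is bracketed by smp_mb() on both
         * sides.
         */
        static inline int demo_futex_xchg(int *uaddr, int newval)
        {
                int old;

                smp_mb();               /* order earlier accesses before the op */

                old = *uaddr;           /* stand-in for the ll/sc retry loop */
                *uaddr = newval;

                smp_mb();               /* order the op before later accesses */
                return old;
        }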
    • A
      ARCv2: Support IO Coherency and permutations involving L1 and L2 caches · f2b0b25a
      Committed by Alexey Brodkin
      An ARCv2 CPU may come in the following configurations, which affect
      cache handling for data exchanged with peripherals via DMA:
       [1] Only an L1 cache exists
       [2] Both L1 and L2 exist, but there is no IO coherency unit
       [3] L1 and L2 caches and an IO coherency unit all exist

      The current implementation takes care of [1] and [2]; moreover,
      support for [2] is implemented with a run-time check for SLC
      existence, which is not optimal.

      This patch introduces support for [3] and reworks DMA ops usage.
      Instead of doing a run-time check every time a particular DMA op is
      executed, we now have three different implementations of the DMA
      ops and select the appropriate one during init. (A hedged sketch of
      this selection follows this entry.)

      For IOC support we need to:
       [a] Implement empty DMA ops, because the IOC takes care of cache
           coherency for DMAed data
       [b] Route dma_alloc_coherent() via dma_alloc_noncoherent().
           This is required to make the IOC work in the first place and
           also serves as an optimization, since LD/ST to coherent
           buffers can be serviced from the caches without going all the
           way to memory
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      [vgupta:
        - Added some comments about IOC gains
        - Marked dma ops as static
        - Massaged changelog a bit]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      f2b0b25a
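      A hedged sketch of the init-time selection idea; the struct,
      function and flag names below are invented for illustration and do
      not match the actual ARC symbols:

        #include <stddef.h>

        /* One table of callbacks per cache configuration. */
        struct demo_dma_ops {
                void (*sync_for_device)(void *vaddr, size_t size);
        };

        static void demo_sync_l1(void *vaddr, size_t size)     { /* flush/inv L1 only  */ }
        static void demo_sync_l1_slc(void *vaddr, size_t size) { /* flush/inv L1 + SLC */ }
        static void demo_sync_ioc(void *vaddr, size_t size)    { /* no-op: IOC snoops  */ }

        static const struct demo_dma_ops demo_l1_ops  = { demo_sync_l1 };
        static const struct demo_dma_ops demo_slc_ops = { demo_sync_l1_slc };
        static const struct demo_dma_ops demo_ioc_ops = { demo_sync_ioc };

        static const struct demo_dma_ops *demo_dma_ops;

        /*
         * Pick the op table once at init instead of re-checking for an
         * SLC or IOC on every DMA operation.
         */
        static void demo_dma_init(int has_slc, int has_ioc)
        {
                if (has_ioc)
                        demo_dma_ops = &demo_ioc_ops;   /* config [3] */
                else if (has_slc)
                        demo_dma_ops = &demo_slc_ops;   /* config [2] */
                else
                        demo_dma_ops = &demo_l1_ops;    /* config [1] */
        }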
  6. 07 Aug 2015, 1 commit
  7. 05 Aug 2015, 1 commit
    • V
      ARC: Make pt_regs regs unsigned · 87ce6280
      Committed by Vineet Gupta
      KGDB fails to build after f51e2f19 ("ARC: make sure instruction_pointer()
      returns unsigned value").

      The hack of forcing one specific reg to unsigned backfired: there is
      no reason to keep the regs signed after all. (The failing construct
      is illustrated after this entry.)
      
      |  CC      arch/arc/kernel/kgdb.o
      |../arch/arc/kernel/kgdb.c: In function 'kgdb_trap':
      | ../arch/arc/kernel/kgdb.c:180:29: error: lvalue required as left operand of assignment
      |   instruction_pointer(regs) -= BREAK_INSTR_SIZE;
      Reported-by: Yuriy Kolerov <yuriy.kolerov@synopsys.com>
      Fixes: f51e2f19 ("ARC: make sure instruction_pointer() returns unsigned value")
      Cc: Alexey Brodkin <abrodkin@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      87ce6280
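      A small illustration of the lvalue problem behind the build error
      above; the struct and macro are simplified stand-ins for pt_regs
      and instruction_pointer():

        /*
         * Illustrative only.  When the macro expands to a plain struct
         * member it is an lvalue, so KGDB's
         * "instruction_pointer(regs) -= BREAK_INSTR_SIZE;" compiles.
         * Wrapping the expansion in a cast to force it unsigned (the
         * earlier hack) makes it a non-lvalue and breaks that assignment;
         * making the field unsigned in the first place avoids the cast.
         */
        struct demo_pt_regs {
                unsigned long ret;      /* program counter, now unsigned */
        };

        #define demo_instruction_pointer(regs) ((regs)->ret)

        static void demo_kgdb_step_back(struct demo_pt_regs *regs,
                                        unsigned long break_size)
        {
                demo_instruction_pointer(regs) -= break_size;   /* valid: lvalue */
        }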
  8. 04 Aug 2015, 4 commits
    • V
      ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle · b89aa12c
      Committed by Vineet Gupta
      The previous commit for delayed retry of SCOND needs some fine-tuning
      for spin locks.

      The backoff from the delayed retry, in conjunction with the spin
      looping on the lock itself, can potentially cause the delay counter
      to reach high values. So, to provide fairness to any lock operation,
      reset the delay counter back to its starting value of 1 once a lock
      "seems" available (i.e. just before the first SCOND try).

      Essentially, reset the delay to 1 for each new spin-wait-loop-acquire
      cycle. (A plain-C sketch of this follows the entry.)
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      b89aa12c
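      A plain-C sketch of the spin-wait cycle with the delay reset; GCC's
      __sync builtins and a busy-wait stand in for the actual LLOCK/SCOND
      assembly, and everything here is illustrative:

        /*
         * Spin on a plain load while the lock is held; every time the
         * lock is observed held and then looks free again, reset the
         * retry delay to 1 before the first exclusive-store (SCOND)
         * attempt of that new acquire cycle.
         */
        static void demo_spin_lock(volatile int *lock)
        {
                unsigned int delay = 1;

                for (;;) {
                        if (*lock) {
                                while (*lock)
                                        ;       /* wait until it "seems" free */
                                delay = 1;      /* new cycle: reset the backoff */
                        }

                        if (__sync_bool_compare_and_swap(lock, 0, 1))
                                return;         /* stand-in for a successful SCOND */

                        /* exclusive store failed under contention: back off */
                        for (volatile unsigned int i = 0; i < delay; i++)
                                ;
                        if (delay < (1u << 16))
                                delay <<= 1;    /* exponential backoff (capped for the sketch) */
                }
        }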
    • V
      ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff · e78fdfef
      Committed by Vineet Gupta
      This works around the llock/scond livelock.

      HS38x4 could get into a LLOCK/SCOND livelock in case of multiple
      overlapping coherency transactions in the SCU: the exclusive line
      state keeps rotating among the contending cores, leading to a
      never-ending cycle. So break the cycle by deferring the retry of a
      failed exclusive access (SCOND). The actual delay needed is a
      function of the number of contending cores as well as the unrelated
      coherency traffic from other cores. To keep the code simple, start
      off with a small delay of 1, which suffices in most cases, and
      double the delay on contention. Eventually the delay is large
      enough that the coherency pipeline drains and a subsequent
      exclusive access succeeds. (A simplified sketch follows this entry.)
      
      Link: http://lkml.kernel.org/r/1438612568-28265-1-git-send-email-vgupta@synopsys.com
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      e78fdfef
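      A plain-C sketch of the backoff scheme applied to an atomic op;
      GCC's __sync compare-and-swap builtin stands in for the LLOCK/SCOND
      pair and the names are illustrative:

        /*
         * Retry a failed exclusive store after a delay that starts at 1
         * and doubles on every failure, so that under heavy contention
         * the cores back off long enough for the coherency pipeline to
         * drain and a subsequent exclusive access to succeed.
         */
        static int demo_atomic_add_return(volatile int *v, int a)
        {
                unsigned int delay = 1;
                int old;

                for (;;) {
                        old = *v;                       /* stand-in for LLOCK */
                        if (__sync_bool_compare_and_swap(v, old, old + a))
                                return old + a;         /* stand-in for a successful SCOND */

                        for (volatile unsigned int i = 0; i < delay; i++)
                                ;                       /* back off before retrying */
                        if (delay < (1u << 16))
                                delay <<= 1;            /* exponential backoff */
                }
        }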
    • V
      ARC: LLOCK/SCOND based rwlock · 69cbe630
      Committed by Vineet Gupta
      With LLOCK/SCOND, the rwlock counter can be atomically updated
      without the need for a guarding spin lock.

      This in turn elides the EXchange-instruction-based spinning, which
      forces the cacheline into the exclusive state, so that concurrent
      spinning across cores keeps the line bouncing around. The
      LLOCK/SCOND-based implementation is superior, as spinning on LLOCK
      keeps the cacheline in the shared state. (A hedged sketch follows
      this entry.)
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      69cbe630
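      A hedged sketch of the lock-free counter update; the counter
      convention and GCC's __sync compare-and-swap builtin are stand-ins
      for the actual ARC rwlock and its LLOCK/SCOND loop:

        /*
         * Illustrative only: the reader/writer state lives in a single
         * counter updated with one compare-and-swap (LLOCK/SCOND in the
         * real code), so no guarding spin lock is needed, and waiters
         * spin on plain loads that keep the line in the shared state.
         */
        #define DEMO_RW_UNLOCKED        0x01000000      /* no readers, no writer */

        static int demo_read_trylock(volatile int *counter)
        {
                int old = *counter;

                if (old <= 0)
                        return 0;       /* a writer holds the lock */

                /* one CAS both re-checks the state and takes a reader slot */
                return __sync_bool_compare_and_swap(counter, old, old - 1);
        }

        static int demo_write_trylock(volatile int *counter)
        {
                /* the writer needs the whole counter: swing it from UNLOCKED to 0 */
                return __sync_bool_compare_and_swap(counter, DEMO_RW_UNLOCKED, 0);
        }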
    • V
      ARC: LLOCK/SCOND based spin_lock · ae7eae9e
      Committed by Vineet Gupta
      The current spin_lock uses the EXchange instruction to implement
      the atomic test-and-set of the lock location (it reads the original
      value and stores 1). This, however, forces the cacheline into the
      exclusive state (because of the store), and concurrent loops on
      multiple cores bounce the line around between cores.

      Instead, use LLOCK/SCOND to implement the atomic test-and-set,
      which is better because the line stays in the shared state while
      the lock spins on LLOCK. (The two approaches are contrasted in the
      sketch after this entry.)

      The real motivation for this change, however, is to make way for
      future changes in atomics to implement delayed retry (with backoff).
      An initial experiment with delayed retry in atomics combined with
      the original EX-based spinlock was a total disaster (it broke even
      LMBench), as struct sock has a cache line shared by an atomic_t and
      a spinlock. The tight spinning on the lock caused the atomic retry
      to keep backing off such that it would never finish.
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      ae7eae9e
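      A hedged sketch contrasting the two approaches in plain C; GCC's
      exchange and compare-and-swap builtins stand in for the ARC EX and
      LLOCK/SCOND instructions:

        /*
         * Old scheme: spin on an atomic exchange.  Every iteration
         * performs a store, so each spinning core keeps pulling the
         * cacheline into the exclusive state.
         */
        static void demo_spin_lock_ex(volatile int *lock)
        {
                while (__sync_lock_test_and_set(lock, 1))       /* stand-in for EX */
                        ;
        }

        /*
         * New scheme: spin on a plain load (the line stays shared) and
         * only attempt the store once the lock looks free, which is what
         * the LLOCK/SCOND sequence achieves.
         */
        static void demo_spin_lock_llsc(volatile int *lock)
        {
                for (;;) {
                        while (*lock)
                                ;               /* read-only spin: line stays shared */
                        if (__sync_bool_compare_and_swap(lock, 0, 1))
                                return;         /* stand-in for a successful SCOND */
                }
        }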