1. 13 May, 2016 1 commit
  2. 09 May, 2016 14 commits
    • N
      ARC: [plat-eznps] Use dedicated COMMAND_LINE_SIZE · 085572f3
      Committed by Noam Camus
      The default 256 bytes is sometimes just not enough.
      We usually provide earlycon=... and console=... and ip=...
      All of this and more may need more room.
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      085572f3
    • T
      ARC: [plat-eznps] Use dedicated cpu_relax() · 46c3e6b8
      Committed by Tal Zilcer
      Since the CTOP is SMT hardware multi-threaded, we need to hint
      the HW that now is a very good time to do a hardware
      thread context switch. This is done by issuing the schd.rw
      instruction (binary coded here so as to not require a specific
      revision of GCC to build the kernel).
      schd.rw means that the thread becomes eligible for execution by
      the threads scheduler after all pending read/write
      transactions have completed.

      cpu_relax_lowlatency() is implemented with barrier(),
      since with the current semantics of cpu_relax() it may take a
      while until the yielded CPU gets back.
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      46c3e6b8
    • N
      ARC: [plat-eznps] Use dedicated identity auxiliary register. · 86c25466
      Committed by Noam Camus
      With the generic "identity" register, the number of CPUs is limited to 256 (8 bits).
      We use our alternative AUX register GLOBAL_ID (12 bits).
      Now we can support up to 4096 CPUs.
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      86c25466
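As a rough illustration of the limit described in the commit above, the number of addressable CPUs is 2 raised to the width of the ID field. The helper name `max_cpus_for_id_bits` is hypothetical and does not exist in the kernel; this is only a sketch of the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many distinct CPU ids fit in an id field
 * that is `bits` wide (8 for the generic "identity" field, 12 for
 * GLOBAL_ID, per the commit message). */
uint32_t max_cpus_for_id_bits(unsigned int bits)
{
    assert(bits < 32);              /* keep the shift well defined */
    return UINT32_C(1) << bits;
}
```

With an 8-bit field this yields 256 ids, and with a 12-bit field 4096.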
    • N
      ARC: [plat-eznps] Use dedicated SMP barriers · b1f2f6f3
      Committed by Noam Camus
      An NPS device has 256 cores and each has 16 HW threads (SMT).
      We use the EZchip dedicated ISA to trigger the HW scheduler of the
      core that the current HW thread belongs to.
      This scheduling makes sure that data beyond the barrier is available
      to all HW threads in the core and by that to all in the device (4K).
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      b1f2f6f3
    • N
      ARC: [plat-eznps] Use dedicated atomic/bitops/cmpxchg · a5a10d99
      Committed by Noam Camus
      We need our own implementations since we lack LLSC support.
      Our extended ISA provides optimized solutions for all the 32-bit
      operations we see in these three headers.
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      a5a10d99
    • N
      ARC: [plat-eznps] Use dedicated user stack top · 8bcf2c48
      Committed by Noam Camus
      NPS uses a special mapping right below TASK_SIZE.
      Hence we need to lower STACK_TOP so that the user stack won't
      overlap the NPS special mapping.
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      8bcf2c48
    • N
      ARC: rwlock: disable interrupts in !LLSC variant · 2a1021fc
      Committed by Noam Camus
      If we hold the rwlock and an interrupt occurs, we may
      end up spinning on it forever during softirq.
      Note that this lock is an internal lock
      and since the lock is free to be used from any context,
      the lock needs to be IRQ-safe.

      Below is an example of an interrupt taken while
      nl_table_lock was holding its rw->lock_mutex, after which we
      spun on it forever.

      The concept for the fix was taken from SPARC.
      
      [2015-05-12 19:16:12] Stack Trace:
      [2015-05-12 19:16:12]   arc_unwind_core+0xb8/0x11c
      [2015-05-12 19:16:12]   dump_stack+0x68/0xac
      [2015-05-12 19:16:12]   _raw_read_lock+0xa8/0xac
      [2015-05-12 19:16:12]   netlink_broadcast_filtered+0x56/0x35c
      [2015-05-12 19:16:12]   nlmsg_notify+0x42/0xa4
      [2015-05-12 19:16:13]   neigh_update+0x1fe/0x44c
      [2015-05-12 19:16:13]   neigh_event_ns+0x40/0xa4
      [2015-05-12 19:16:13]   arp_process+0x46e/0x5a8
      [2015-05-12 19:16:13]   __netif_receive_skb_core+0x358/0x500
      [2015-05-12 19:16:13]   process_backlog+0x92/0x154
      [2015-05-12 19:16:13]   net_rx_action+0xb8/0x188
      [2015-05-12 19:16:13]   __do_softirq+0xda/0x1d8
      [2015-05-12 19:16:14]   irq_exit+0x8a/0x8c
      [2015-05-12 19:16:14]   arch_do_IRQ+0x6c/0xa8
      [2015-05-12 19:16:14]   handle_interrupt_level1+0xe4/0xf0
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      2a1021fc
    • N
      ARC: Make vmalloc size configurable · 15ca68a9
      Committed by Noam Camus
      On ARC, lower 2G of address space is translated and used for
       - user vaddr space (region 0 to 5)
       - unused kernel-user gutter (region 6)
       - kernel vaddr space (region 7)
      
      where each region simply represents 256MB of address space.
      
      The kernel vaddr space of 256MB is used to implement vmalloc and modules.
      So far this was enough, but not on an EZChip system with 4K CPUs (given
      that the per cpu mechanism uses vmalloc for allocating chunks).
      
      So allow VMALLOC_SIZE to be configurable by expanding down into the unused
      kernel-user gutter region which at default 256M was excessive anyways.
      
      Also use _BITUL() to fix a build error since PGDIR_SIZE cannot use "1UL"
      as called from assembly code in mm/tlbex.S
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      [vgupta: rewrote changelog, debugged bootup crash due to int vs. hex]
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      15ca68a9
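The region layout in the commit above can be sketched with a little address arithmetic. This is a simplified model, not the kernel's definitions: it assumes the vmalloc area ends at the top of the translated lower 2G and grows downward, and the names `region_base` and `vmalloc_start` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define REGION_SIZE    UINT32_C(0x10000000)  /* 256MB per region */
#define TRANSLATED_END UINT32_C(0x80000000)  /* top of the lower 2G */

/* Base address of region n (0..7) in the translated lower 2G. */
uint32_t region_base(unsigned int n)
{
    assert(n < 8);
    return n * REGION_SIZE;
}

/* Assumed model: a vmalloc area of size_mb megabytes that ends at the
 * top of region 7 and expands down into gutter region 6. */
uint32_t vmalloc_start(uint32_t size_mb)
{
    assert(size_mb >= 1 && size_mb <= 512);  /* region 7 plus region 6 */
    return TRANSLATED_END - (size_mb << 20);
}
```

Under these assumptions the default 256MB starts exactly at the base of region 7, while 512MB would consume the whole gutter and start at the base of region 6.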
    • N
      ARC: clean out UAPI byteorder.h clean off Kconfig symbol · 4bb40c6d
      Committed by Noam Camus
      UAPI header should not use Kconfig items
      
      Use __BIG_ENDIAN__ defined as a compiler intrinsic
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      [vgupta: fix changelog]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      4bb40c6d
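A minimal sketch of the idea in the commit above: user-visible code keys off the compiler-provided `__BIG_ENDIAN__` macro rather than a `CONFIG_*` symbol, which user space cannot see. The names `HOST_IS_BIG_ENDIAN`, `runtime_is_big_endian` and `check_byteorder_macro` are hypothetical, and the sketch assumes the toolchain defines `__BIG_ENDIAN__` on big-endian targets.

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time view: rely only on the compiler-provided macro. */
#ifdef __BIG_ENDIAN__
#define HOST_IS_BIG_ENDIAN 1
#else
#define HOST_IS_BIG_ENDIAN 0
#endif

/* Run-time probe: look at the first byte of a known 16-bit value. */
int runtime_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    return *(const unsigned char *)&probe == 0x01;
}

/* Cross-check: abort if the macro and the probe disagree. */
void check_byteorder_macro(void)
{
    assert(runtime_is_big_endian() == HOST_IS_BIG_ENDIAN);
}
```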
    • A
      ARC: RIP arc_{get|set}_core_freq() clk API · 6e9318d1
      Committed by Alexey Brodkin
      There are no more users of this - so RIP!
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      [vgupta: update changelog]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      6e9318d1
    • V
      ARC: irq: export some IRQs again · 88555cc5
      Committed by Vineet Gupta
      This will be needed for switching to a linear irq domain, as
      irq_create_mapping() called by intr code needs the IRQ numbers,
      in addition to the existing usage in mcip.c for requesting the irq
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      88555cc5
    • V
      ARC: clockevent: DT based probe · 77c8d0d6
      Committed by Vineet Gupta
       - timer frequency is derived from DT (no longer rely on top level
         DT "clock-frequency" probed early and exported by asm/clk.h)
      
       - TIMER0_IRQ need not be exported across arch code, confined to intc as
         it is property of same
      
       - Any failures in clockevent setup are considered pedantic and system
         panic()'s as there is no generic fallback (unlike clocksource where
         a jiffies based soft clocksource always exists)
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      77c8d0d6
    • N
      ARC: clockevent: switch to cpu notifier for clockevent setup · eec3c58e
      Committed by Noam Camus
      ARC Timers so far have been handled as "legacy" w/o explicit description
      in DT. This poses a challenge for newer platforms wanting to use them.
      This series will eventually help move timers over to DT.

      This patch makes a small change of using a CPU notifier to set clockevent
      on non-boot CPUs. So explicit setup is done only on boot CPU (which will
      later be done by DT)
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      [vgupta: broken off from a bigger patch]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      eec3c58e
    • V
      ARC: opencode arc_request_percpu_irq · 56957940
      Committed by Vineet Gupta
      - The idea is to remove the API usage since it has a subtle
        design flaw - it relies on being called on cpu0 first. This is true for
        some early per cpu irqs such as TIMER/IPI, but not for late probed
        per cpu peripherals such as perf. And its usage in perf has already
        bitten us once: see c6317bc7
        ("ARCv2: perf: Ensure perf intr gets enabled on all cores") where we
        ended up open coding it anyways

      - The seeming duplication will go away once we start using cpu notifier
        for timer setup
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      56957940
  3. 05 May, 2016 3 commits
    • V
      ARC: support HIGHMEM even without PAE40 · 26f9d5fd
      Committed by Vineet Gupta
      Initial HIGHMEM support on ARC was introduced for PAE40 where the low
      memory (0x8000_0000 based) and high memory (0x1_0000_0000) were
      physically contiguous. So CONFIG_FLATMEM sufficed (despite a peripheral
      hole in the middle, which wasted a bit of struct page memory, but things
      worked).
      
      However w/o PAE, highmem was not possible and we could only reach
      ~1.75GB of DDR. Now there is a use case to access ~4GB of DDR w/o PAE40.
      The idea is to have low memory at canonical 0x8000_0000 and highmem
      at 0 so the entire 4GB address space is available for physical addressing.
      This needs additional platform/interconnect mapping to convert
      the non contiguous physical addresses into linear bus addresses.
      
      From Linux point of view, non contiguous divide means FLATMEM no
      longer works and DISCONTIGMEM is needed to track the pfns in the 2
      regions.
      
      This scheme would also work for PAE40, only better in that we don't
      waste struct page memory for the peripheral hole.
      
      The DT description will be something like
      
          memory {
              ...
              reg = <0x80000000 0x20000000   /* 512MB: lowmem */
                     0x00000000 0x10000000>; /* 256MB: highmem */
          }
      Signed-off-by: Noam Camus <noamc@ezchip.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      26f9d5fd
    • V
      ARC: Fix PAE40 boot failures due to PTE truncation · 2519d753
      Committed by Vineet Gupta
      So a benign looking cleanup which macro'ized PAGE_SHIFT shifts turned
      out to be bad (since it was done non-sensically across the board).

      It caused boot failures with PAE40 as the forced cast to (unsigned long)
      from the newly introduced virt_to_pfn() was causing truncation of the
      (long long) pte/paddr values.

      It is OK to use this in accessors dealing with kernel virtual addresses,
      pointers etc, but not for PTE values themselves.

      Fixes: c2ff5cf2735c ("ARC: mm: Use virt_to_pfn() for addr >> PAGE_SHIFT pattern")
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      2519d753
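The truncation described in the commit above can be reproduced in plain C. This is a sketch under stated assumptions, not the kernel code: `ulong32` stands in for a 32-bit `unsigned long`, `pte64_t` for a PAE40-capable PTE, the 8K page size is assumed, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 13           /* assumption: 8K pages */

typedef uint32_t ulong32;       /* stand-in for a 32-bit unsigned long */
typedef uint64_t pte64_t;       /* stand-in for a PTE wider than 32 bits */

/* Buggy pattern: the cast drops physical address bits above bit 31. */
uint64_t pte_pfn_truncating(pte64_t pte)
{
    return (ulong32)pte >> PAGE_SHIFT;
}

/* Safe pattern: shift the full-width value. */
uint64_t pte_pfn_full(pte64_t pte)
{
    return pte >> PAGE_SHIFT;
}

/* The two only agree while the address fits in 32 bits. */
void pte_demo(void)
{
    assert(pte_pfn_truncating(0x80002000ULL) == pte_pfn_full(0x80002000ULL));
    assert(pte_pfn_truncating(0x100002000ULL) != pte_pfn_full(0x100002000ULL));
}
```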
    • V
      ARC: Add missing io barriers to io{read,write}{16,32}be() · e5bc0478
      Committed by Vineet Gupta
      While reviewing a different change to asm-generic/io.h Arnd spotted that
      ARC ioread32 and ioread32be both of which come from asm-generic versions
      are not symmetrical in terms of calling the io barriers.
      
      generic ioread32   -> ARC readl()                  [ has barriers]
      generic ioread32be -> __be32_to_cpu(__raw_readl()) [ lacks barriers]
      
      While generic ioread32be is being remediated to call readl(), that involves
      a swab32(), causing double swaps on ioread32be() on Big Endian systems.
      
      So provide our versions of big endian IO accessors to ensure io barrier
      calls while also keeping them optimal
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Cc: stable@vger.kernel.org  [4.2+]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      e5bc0478
  4. 22 April, 2016 1 commit
    • E
      ARCv2: Enable LOCKDEP · d9676fa1
      Committed by Evgeny Voevodin
      - The asm helpers for calling into irq tracer were missing
      
      - Add calls to above helpers in low level assembly entry code for ARCv2
      
      - irq_save() uses CLRI to disable interrupts and returns the prev interrupt
        state (in STATUS32) in a specific encoding (and not the raw value of
        STATUS32). This is usable with SETI in irq_restore(). However
        save_flags() reads the raw value of STATUS32 which doesn't pair with
        irq_save/restore() and thus needs fixing.
      Signed-off-by: Evgeny Voevodin <evgeny.voevodin@intel.com>
      [vgupta: updated changelog and also added some comments]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      d9676fa1
  5. 07 April, 2016 1 commit
    • A
      arc: Add our own implementation of fb_pgprotect() · e5e0a65c
      Committed by Alexey Brodkin
      During mmapping of frame-buffer pages to user-space
      fb_pgprotect() is called to set proper page settings.

      In case of ARC we need to mark pages that are mmapped to
      user as uncached for 2 reasons:
       * A huge amount of data passing through the data cache will
         thrash the cache a lot, making it almost useless for other
         less traffic hungry processes.
       * Data written by the user to the FB will be immediately available to
         hardware (such as PGU etc) without the need to flush the data
         cache regularly.
      Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
      Cc: linux-snps-arc@lists.infradead.org
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      e5e0a65c
  6. 19 March, 2016 3 commits
  7. 18 March, 2016 1 commit
  8. 17 March, 2016 1 commit
    • V
      ARC: thp: unbork !CONFIG_TRANSPARENT_HUGEPAGE build · c511eaaa
      Committed by Vineet Gupta
      linux-next for the 4.6-rc1 timeline reported ARC build failures with !THP
      
      | arch/arc/include/asm/tlbflush.h:29:0: warning: "flush_pmd_tlb_range" redefined [enabled by default]
      | arch/arc/include/asm/tlbflush.h:29:0: warning: "flush_pmd_tlb_range" redefined [enabled by default]
      | arch/arc/include/asm/tlbflush.h:29:0: warning: "flush_pmd_tlb_range" redefined [enabled by default]
      
      Turns out that commit ("mm/thp/migration: switch from flush_tlb_range
      to flush_pmd_tlb_range") triggered the issue while the problem was in
      ARC code where THP specific helpers were not guarded with #ifdef.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      c511eaaa
  9. 14 March, 2016 1 commit
    • A
      ipv4: Update parameters for csum_tcpudp_magic to their original types · 01cfbad7
      Committed by Alexander Duyck
      This patch updates all instances of csum_tcpudp_magic and
      csum_tcpudp_nofold to reflect the types that are usually used as the source
      inputs.  For example the protocol field is populated based on nexthdr which
      is actually an unsigned 8 bit value.  The length is usually populated based
      on skb->len which is an unsigned integer.
      
      This addresses an issue in which the IPv6 function csum_ipv6_magic was
      generating a checksum using the full 32b of skb->len while
      csum_tcpudp_magic was only using the lower 16 bits.  As a result we could
      run into issues when attempting to adjust the checksum as there was no
      protocol agnostic way to update it.
      
      With this change the value is still truncated as many architectures use
      "(len + proto) << 8", however this truncation only occurs for values
      greater than 16776960 in length and as such is unlikely to occur as we stop
      the inner headers at ~64K in size.
      
      I did have to make a few minor changes in the arm, mn10300, nios2, and
      score versions of the function in order to support these changes as they
      were either using things such as an OR to combine the protocol and length,
      or were using ntohs to convert the length which would have truncated the
      value.
      
      I also updated a few spots in terms of whitespace and type differences for
      the addresses.  Most of this was just to make sure all of the definitions
      were in sync going forward.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      01cfbad7
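The truncation threshold quoted in the commit above follows from the `(len + proto) << 8` expression being evaluated in 32 bits. A small sketch of that arithmetic, using a hypothetical helper `len_proto_fits` that is not part of the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when "(len + proto) << 8" still fits in 32 bits, i.e. when no
 * high bits would be lost by a 32-bit implementation. */
bool len_proto_fits(uint32_t len, uint8_t proto)
{
    uint64_t wide = ((uint64_t)len + proto) << 8;   /* compute without loss */
    return wide <= UINT32_MAX;
}

/* Typical packet lengths (~64K) are far below the threshold. */
void len_proto_demo(void)
{
    assert(len_proto_fits(65535u, 17));
}
```

With the largest 8-bit protocol value (255), a length of 16776960 still fits, and one byte more does not.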
  10. 12 March, 2016 3 commits
    • V
      c2ff5cf2
    • V
      ARC: build: Better way to detect ISA compatible toolchain · 20d78037
      Committed by Vineet Gupta
      ARC architecture has 2 instruction sets: ARCompact/ARCv2.
      While the same gcc supports compiling for either (using appropriate toggles),
      we can't use the same toolchain to build the kernel because libgcc needs
      to be unique and the toolchain (uClibc based) is not multilibed.
      
      uClibc toolchain is convenient since it allows all userspace and
      kernel to be built with a single install for an ISA.
      
      This however means 2 gnu installs (with same triplet prefix) are needed
      for building for 2 ISA and need to be in PATH.
      As developers we keep switching the builds, but would occasionally fail
      to update the PATH leading to usage of wrong tools. And this would only
      show up at the end of kernel build when linking incompatible libgcc.
      
      So the initial solution was to have gcc define a special preprocessor macro
      DEFAULT_CPU_xxx which is unique for default toolchain configuration.
      Claudiu proposed using grep for an existing preprocessor macro which is
      again uniquely defined per ISA.
      
      Cc: Michal Marek <mmarek@suse.cz>
      Suggested-by: Claudiu Zissulescu <claziss@synopsys.com>
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      20d78037
    • L
      ARC: [BE] readl()/writel() to work in Big Endian CPU configuration · f778cc65
      Committed by Lada Trimasova
      read{l,w}() write{l,w}() primitives should use le{16,32}_to_cpu() and
      cpu_to_le{16,32}() respectively to ensure device registers are read
      correctly in Big Endian CPU configuration.
      
      Per Arnd Bergmann
      | Most drivers using readl() or readl_relaxed() expect those to perform byte
      | swaps on big-endian architectures, as the registers tend to be fixed endian
      
      This was needed for getting UART to work correctly on a Big Endian ARC.
      
      The ARC accessors originally were fine, and the bug got introduced
      inadvertently by commit b8a03302 ("ARCv2: barriers")
      
      Fixes: b8a03302 ("ARCv2: barriers")
      Link: http://lkml.kernel.org/r/201603100845.30602.arnd@arndb.de
      Cc: Alexey Brodkin <abrodkin@synopsys.com>
      Cc: stable@vger.kernel.org  [4.2+]
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Lada Trimasova <ltrimas@synopsys.com>
      [vgupta: beefed up changelog, added Fixes/stable tags]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      f778cc65
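The point of the commit above is that a fixed little-endian device register must decode to the same CPU value on any host. The sketch below models that in portable C. `le32_bytes_to_cpu` and `model_readl` are hypothetical names, and this is not the ARC `readl()` implementation, which also involves I/O barriers.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a little-endian 32-bit register image into a CPU value.
 * Works unchanged on little- and big-endian hosts. */
uint32_t le32_bytes_to_cpu(const uint8_t b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Hypothetical model of readl(): the register layout is fixed little
 * endian, so the byte order is resolved here, not by the caller. */
uint32_t model_readl(const volatile uint8_t *reg)
{
    assert(reg);
    uint8_t raw[4] = { reg[0], reg[1], reg[2], reg[3] };
    return le32_bytes_to_cpu(raw);
}
```

A register holding the bytes 78 56 34 12 in ascending address order reads back as 0x12345678 regardless of the CPU's endianness.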
  11. 11 March, 2016 3 commits
  12. 26 February, 2016 1 commit
  13. 24 February, 2016 3 commits
    • V
      ARCv2: SMP: Push IPI_IRQ into IPI provider · 96817879
      Committed by Vineet Gupta
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      96817879
    • V
      ARC: [intc-compact] Remove IPI setup from ARCompact port · dbcbc7e7
      Committed by Vineet Gupta
      There is no real ARC700 based SMP SoC so remove IPI definition.
      EZChip's SMP ARC700 is going to use a different intc and IPI provider
      anyways.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      dbcbc7e7
    • V
      ARCv2: SMP: Emulate IPI to self using software triggered interrupt · bb143f81
      Committed by Vineet Gupta
      ARConnect/MCIP Inter-Core-Interrupt module can't send interrupt to
      local core. So use core intc capability to trigger software
      interrupt to self, using an unused IRQ #21.
      
      This showed up as csd deadlock with LTP trace_sched on a dual core
      system. This test acts as scheduler fuzzer, triggering all sorts of
      scheduling activity. Trouble starts with IPI to self, which doesn't get
      delivered (effectively lost due to H/w capability), but the msg intended
      to be sent remain enqueued in per-cpu @ipi_data.
      
      All subsequent IPIs to this core from other cores get elided due to the
      IPI coalescing optimization in ipi_send_msg_one() where a pending msg
      implies an IPI already sent and assumes other core is yet to ack it.
      After the elided IPI, other core simply goes into csd_lock_wait()
      but never comes out as this core never sees the interrupt.
      
      Fixes STAR 9001008624
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>        [4.2]
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      bb143f81
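The deadlock in the commit above can be followed with a single-threaded toy model of the coalescing logic. It is not the kernel's `ipi_send_msg_one()`: `struct cpu_model`, `model_ipi_send` and the `hw_delivers` flag are hypothetical, and real code would use atomic operations on the per-cpu bitmask.

```c
#include <assert.h>
#include <stdbool.h>

struct cpu_model {
    unsigned long pending;  /* stand-in for the per-cpu ipi_data bitmask */
    int irqs_raised;        /* interrupts that actually reached this CPU */
};

/* Sender side: enqueue msg, then raise an interrupt only if nothing
 * was already pending (the coalescing optimization). `hw_delivers`
 * models whether the hardware can deliver to this target at all. */
void model_ipi_send(struct cpu_model *target, unsigned int msg,
                    bool hw_delivers)
{
    assert(msg < 32);
    unsigned long old = target->pending;

    target->pending |= 1UL << msg;
    if (old)
        return;             /* elided: assume an IPI is already in flight */
    if (hw_delivers)
        target->irqs_raised++;
}
```

In this model, a lost IPI to self (`hw_delivers` false) leaves a bit set in `pending`, so every later send from another core is elided and the target never sees an interrupt. Delivering the self-IPI by other means, as the commit does with a software triggered interrupt, avoids the stuck state.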
  14. 18 February, 2016 1 commit
  15. 12 February, 2016 1 commit
    • V
      ARC: mm: Introduce explicit super page size support · 37eda9df
      Committed by Vineet Gupta
      MMUv4 supports 2 concurrent page sizes: Normal and Super [4K to 16M]
      
      So far Linux supported a single super page size for a given Normal page,
      depending on the software page walking address split.
      e.g. we had 11:8:13 address split for 8K page, which meant super page
      was 2 ^(8+13) = 2M (given that THP size has to be PMD_SHIFT)
      
      Now we turn this around, by allowing multiple Super Pages in Kconfig
      (currently 2M and 16M only) and forcing page walker address split to
      PGDIR_SHIFT and PAGE_SHIFT
      
      For configs without Super page, things are same as before and
      PGDIR_SHIFT can be hacked to get non default address split
      
      The motivation for this change is a customer who needs 16M super page
      and a 8K Normal page combo.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      37eda9df
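The address-split arithmetic in the commit above is easy to check in isolation. In this sketch, `super_page_size` is a hypothetical helper: given the width of the middle field and the page shift, it returns the span covered by one middle-level entry, which is the super page size in the old scheme.

```c
#include <assert.h>
#include <stdint.h>

/* For an a:b:c address split (e.g. 11:8:13), one middle-level entry
 * covers 2^(b + c) bytes. */
uint32_t super_page_size(unsigned int mid_bits, unsigned int page_shift)
{
    assert(mid_bits + page_shift < 32);     /* keep the shift well defined */
    return UINT32_C(1) << (mid_bits + page_shift);
}
```

For the 11:8:13 split this gives 2M. Getting a 16M super page with the same 8K normal page needs 11 middle bits, which leaves 8 bits for the top level.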
  16. 10 February, 2016 1 commit
    • V
      ARCv2: intc: Allow interruption by lowest priority interrupt · dec2b284
      Committed by Vineet Gupta
      ARC HS Cores support configurable multiple interrupt priorities of up to
      16 levels.

      There is a processor "interrupt preemption threshold" in STATUS32.E[4:1]
      And several places need to set this up:
      1. seed value as kernel is booting
      2. seed value for user space programs
      3. Arg to SLEEP instruction in idle task (what interrupt prio can wake)
      4. Per-IRQ line priority (i.e. what is the priority of the interrupt
         raised by a peripheral or timer or perf counter...)
      
      Currently above sites use the highest priority 0. This can be potential
      problem when multiple priorities are supported. e.g. user space could
      only be interrupted by P0 interrupt, not others...
      So turn this over and instead make default interruption level to be
      the lowest priority possible 15. This should be fine even if there are
      fewer priority levels configured (say two: P0 HIGH, P1 LOW)
      
      This feature also effectively disables FIRQ feature if present in
      hardware config. With old code, a P0 interrupt would be FIRQ, needing
      special handling (ISR or Register Banks) which is NOT supported yet.
      Now it will not be P0 (P15 or whatever is lowest prio) so FIRQ is not
      triggered.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      dec2b284
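As a sketch of what setting the threshold amounts to, the helper below places a priority value into bits [4:1] of a status word, the field position given in the commit message above. The macro and function names are hypothetical, and real kernel code sets this up in low-level assembly rather than with a C helper like this.

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_E_SHIFT 1                          /* field starts at bit 1 */
#define STATUS_E_MASK  (UINT32_C(0xF) << STATUS_E_SHIFT)  /* bits [4:1] */

/* Return status32 with its interrupt preemption threshold replaced by
 * prio (0 = highest priority, 15 = lowest). Other bits are preserved. */
uint32_t status32_set_threshold(uint32_t status32, unsigned int prio)
{
    assert(prio <= 15);
    return (status32 & ~STATUS_E_MASK) | ((uint32_t)prio << STATUS_E_SHIFT);
}
```

Using 15 as the default, as the commit proposes, sets all four bits of the field.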
  17. 29 January, 2016 1 commit