1. 04 11月, 2017 1 次提交
    • C
      arch/tile: Implement ->set_state_oneshot_stopped() · 777a45b4
      Chris Metcalf 提交于
      set_state_oneshot_stopped() is called by the clkevt core, when the
      next event is required at an expiry time of 'KTIME_MAX'. This normally
      happens with NO_HZ_{IDLE|FULL} in both LOWRES/HIGHRES modes.
      
      This patch makes the clockevent device to stop on such an event, to
      avoid spurious interrupts, as explained by: commit 8fff52fd
      ("clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state").
      Signed-off-by: NChris Metcalf <cmetcalf@mellanox.com>
      777a45b4
  2. 15 4月, 2017 1 次提交
    • N
      tile/time: Set ->min_delta_ticks and ->max_delta_ticks · 45b586ef
      Nicolai Stange 提交于
      In preparation for making the clockevents core NTP correction aware,
      all clockevent device drivers must set ->min_delta_ticks and
      ->max_delta_ticks rather than ->min_delta_ns and ->max_delta_ns: a
      clockevent device's rate is going to change dynamically and thus, the
      ratio of ns to ticks ceases to stay invariant.
      
      Currently, the tile's timer clockevent device is initialized as follows:
      
        evt->max_delta_ns = clockevent_delta2ns(MAX_TICK, evt);
      
      and
      
        .min_delta_ns = 1000,
      
      The first one translates to a ->max_delta_ticks value of MAX_TICK.
      For the latter, note that the clockevent core will superimpose a
      minimum of 1us by itself -- setting ->min_delta_ticks to 1 is safe here.
      
      Initialize ->min_delta_ticks and ->max_delta_ticks with these values.
      
      This patch alone doesn't introduce any change in functionality as the
      clockevents core still looks exclusively at the (untouched) ->min_delta_ns
      and ->max_delta_ns. As soon as this has changed, a followup patch will
      purge the initialization of ->min_delta_ns and ->max_delta_ns from this
      driver.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      45b586ef
  3. 02 3月, 2017 1 次提交
  4. 17 12月, 2016 1 次提交
    • C
      tile: use __ro_after_init instead of tile-specific __write_once · 14e73e78
      Chris Metcalf 提交于
      The semantics of the old tile __write_once are the same as the
      newer generic __ro_after_init, so rename them all and get rid
      of the tile-specific version.
      
      This does not enable actual support for __ro_after_init,
      which had been dropped from the tile architecture before the
      initial upstreaming was done, since we had at that time switched
      to using 16MB huge pages to map the kernel.
      Signed-off-by: NChris Metcalf <cmetcalf@mellanox.com>
      14e73e78
  5. 24 11月, 2016 1 次提交
    • C
      tile: avoid using clocksource_cyc2ns with absolute cycle count · e658a6f1
      Chris Metcalf 提交于
      For large values of "mult" and long uptimes, the intermediate
      result of "cycles * mult" can overflow 64 bits.  For example,
      the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
      we have mult = 853, and after 208.5 days, we overflow 64 bits.
      
      Since clocksource_cyc2ns() is intended to be used for relative
      cycle counts, not absolute cycle counts, performance is more
      importance than accepting a wider range of cycle values.  So,
      just use mult_frac() directly in tile's sched_clock().
      
      Commit 4cecf6d4 ("sched, x86: Avoid unnecessary overflow
      in sched_clock") by Salman Qazi results in essentially the same
      generated code for x86 as this change does for tile.  In fact,
      a follow-on change by Salman introduced mult_frac() and switched
      to using it, so the C code was largely identical at that point too.
      
      Peter Zijlstra then added mul_u64_u32_shr() and switched x86
      to use it.  This is, in principle, better; by optimizing the
      64x64->64 multiplies to be 32x32->64 multiplies we can potentially
      save some time.  However, the compiler piplines the 64x64->64
      multiplies pretty well, and the conditional branch in the generic
      mul_u64_u32_shr() causes some bubbles in execution, with the
      result that it's pretty much a wash.  If tilegx provided its own
      implementation of mul_u64_u32_shr() without the conditional branch,
      we could potentially save 3 cycles, but that seems like small gain
      for a fair amount of additional build scaffolding; no other platform
      currently provides a mul_u64_u32_shr() override, and tile doesn't
      currently have an <asm/div64.h> header to put the override in.
      
      Additionally, gcc currently has an optimization bug that prevents
      it from recognizing the opportunity to use a 32x32->64 multiply,
      and so the result would be no better than the existing mult_frac()
      until such time as the compiler is fixed.
      
      For now, just using mult_frac() seems like the right answer.
      
      Cc: stable@kernel.org [v3.4+]
      Signed-off-by: NChris Metcalf <cmetcalf@mellanox.com>
      e658a6f1
  6. 31 7月, 2015 1 次提交
  7. 27 3月, 2015 1 次提交
  8. 12 11月, 2014 1 次提交
  9. 03 10月, 2014 1 次提交
  10. 02 10月, 2014 1 次提交
  11. 27 8月, 2014 1 次提交
    • C
      tile: Replace __get_cpu_var uses · b4f50191
      Christoph Lameter 提交于
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout then specialized macros can be defined in non -x86
      arches as well in order to optimize per cpu access by f.e.  using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      Acked-by: NChris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      b4f50191
  12. 24 7月, 2014 3 次提交
  13. 07 3月, 2014 1 次提交
  14. 14 8月, 2013 1 次提交
    • C
      tile: implement gettimeofday() via vDSO · 4a556f4f
      Chris Metcalf 提交于
      This change creates the framework for vDSO calls, makes the existing
      rt_sigreturn() mechanism use it, and adds a fast gettimeofday().
      Now that we need to expose the vDSO address to userspace, we add
      AT_SYSINFO_EHDR to the set of aux entries provided to userspace.
      (You can disable any extra vDSO support by booting with vdso=0,
      but the rt_sigreturn vDSO page will still be provided.)
      
      Note that glibc has supported the tile vDSO since release 2.17.
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      4a556f4f
  15. 15 7月, 2013 1 次提交
    • P
      tile: delete __cpuinit usage from all tile files · 18f894c1
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/tile uses of the __cpuinit macros from
      all C files.  Currently tile does not have any __CPUINIT used in
      assembly files.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      18f894c1
  16. 27 3月, 2013 1 次提交
    • H
      tile: ns2cycles should use __raw_get_cpu_var · 39e8202b
      Henrik Austad 提交于
      ns2cycles use per_cpu variables, and will, eventually, find its way into
      smp_processor_id(). This is not safe in a preemptible kernel;
      preemption should ideally be disabled.
      
      BUG: using smp_processor_id() in preemptible [00000000] code:
      systemd-modules/367
      caller is ns2cycles+0x40/0xb8
      
      Starting stack dump of tid 367, pid 367 (systemd-modules) on cpu 2 at
      cycle 20969956421
       frame 0: 0xfffffff70004b860 dump_stack+0x0/0x20 (sp 0xfffffe407993fa90)
       frame 1: 0xfffffff7006abc28 debug_smp_processor_id+0x1a8/0x1e0 (sp
      0xfffffe407993fa90)
       frame 2: 0xfffffff7004d7b40 ns2cycles+0x40/0xb8 (sp 0xfffffe407993fab8)
       frame 3: 0xfffffff7004dc578 __ndelay+0x38/0x80 (sp 0xfffffe407993fae0)
      
      However, in this case:
      
      - the frequency is the same accross all cores
      - we use the data read-only
      - we do not scale the frequency
      
      Which means that we can use the __raw_get_cpu_var instead.
      Signed-off-by: NHenrik Austad <haustad@cisco.com>
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      39e8202b
  17. 04 6月, 2011 1 次提交
  18. 05 5月, 2011 1 次提交
    • C
      arch/tile: various header improvements for building drivers · 28d71741
      Chris Metcalf 提交于
      This change adds a number of missing headers in asm (fb.h, parport.h,
      serial.h, and vga.h) using the minimal generic versions.
      
      It also adds a number of missing interfaces that showed up as build
      failures when trying to build various drivers not normally included in the
      "tile" distribution: ioremap_wc(), memset_io(), io{read,write}{16,32}be(),
      virt_to_bus(), bus_to_virt(), irq_canonicalize(), __pte(), __pgd(),
      and __pmd().  I also added a cast in virt_to_page() since not all callers
      pass a pointer.
      
      I fixed <asm/stat.h> to properly include a __KERNEL__ guard for the
      __ARCH_WANT_STAT64 symbol, and <asm/swab.h> to use __builtin_bswap32()
      even for our 64-bit architecture, since the same code is produced.
      
      I added an export for get_cycles(), since it's used in some modules.
      
      And I made <arch/spr_def.h> properly include the __KERNEL__ guard,
      even though it's not yet exported, since it likely will be soon.
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      28d71741
  19. 02 3月, 2011 1 次提交
    • C
      arch/tile: fix __ndelay etc to work better · 13371731
      Chris Metcalf 提交于
      The current implementations of __ndelay and __udelay call a hypervisor
      service to delay, but the hypervisor service isn't actually implemented
      very well, and the consensus is that Linux should handle figuring this
      out natively and not use a hypervisor service.
      
      By converting nanoseconds to cycles, and then spinning until the
      cycle counter reaches the desired cycle, we get several benefits:
      first, we are sensitive to the actual clock speed; second, we use
      less power by issuing a slow SPR read once every six cycles while
      we delay; and third, we properly handle the case of an interrupt by
      exiting at the target time rather than after some number of cycles.
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      13371731
  20. 02 11月, 2010 1 次提交
  21. 13 8月, 2010 1 次提交
    • C
      arch/tile: Use separate, better minsec values for clocksource and sched_clock. · 749dc6f2
      Chris Metcalf 提交于
      We were using the same 5-sec minsec for the clocksource and sched_clock
      that we were using for the clock_event_device.  For the clock_event_device
      that's exactly right since it has a short maximum countdown time.
      But for sched_clock we want to avoid wraparound when converting from
      ticks to nsec over a much longer window, so we force a shift of 10.
      And for clocksource it seems dodgy to use a 5-sec minsec as well, so we
      copy some other platforms and force a shift of 22.
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      749dc6f2
  22. 07 7月, 2010 1 次提交
    • C
      arch/tile: Miscellaneous cleanup changes. · 0707ad30
      Chris Metcalf 提交于
      This commit is primarily changes caused by reviewing "sparse"
      and "checkpatch" output on our sources, so is somewhat noisy, since
      things like "printk() -> pr_err()" (or whatever) throughout the
      codebase tend to get tedious to read.  Rather than trying to tease
      apart precisely which things changed due to which type of code
      review, this commit includes various cleanups in the code:
      
      - sparse: Add declarations in headers for globals.
      - sparse: Fix __user annotations.
      - sparse: Using gfp_t consistently instead of int.
      - sparse: removing functions not actually used.
      - checkpatch: Clean up printk() warnings by using pr_info(), etc.;
        also avoid partial-line printks except in bootup code.
        - checkpatch: Use exposed structs rather than typedefs.
        - checkpatch: Change some C99 comments to C89 comments.
      
      In addition, a couple of minor other changes are rolled in
      to this commit:
      
      - Add support for a "raise" instruction to cause SIGFPE, etc., to be raised.
      - Remove some compat code that is unnecessary when we fully eliminate
        some of the deprecated syscalls from the generic syscall ABI.
      - Update the tile_defconfig to reflect current config contents.
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      0707ad30
  23. 05 6月, 2010 1 次提交