1. 19 Nov 2012, 2 commits
    • vtime: Explicitly account pending user time on process tick · bcebdf84
      Committed by Frederic Weisbecker
      All vtime implementations just flush the user time on process
      tick. Consolidate that in generic code by calling a user time
      accounting helper. This avoids an indirect call in ia64 and
      prepares for consolidating the vtime context switch code as well.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    • vtime: Remove the underscore prefix invasion · fd25b4c2
      Committed by Frederic Weisbecker
      Prepending irq-unsafe vtime APIs with underscores was actually
      a bad idea as the result is a big mess in the API namespace that
      is even waiting to be further extended. Also these helpers
      are always called from irq safe callers except kvm. Just
      provide a vtime_account_system_irqsafe() for this specific
      case so that we can remove the underscore prefix on other
      vtime functions.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
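      The pattern this commit settles on is an irq-unsafe accounting helper
      plus a single irqsafe wrapper for the kvm caller. It can be sketched in
      user space as below. This is a minimal model that assumes a boolean
      stands in for the cpu's interrupt-enable state. Only the names
      vtime_account_system() and vtime_account_system_irqsafe() come from the
      commit. Their bodies, model_irq_save()/model_irq_restore() and the
      system_time counter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one flag stands in for the cpu's interrupt-enable
 * state. These helpers are NOT the kernel's local_irq_save/restore. */
static bool irqs_enabled = true;

static bool model_irq_save(void)
{
    bool flags = irqs_enabled;   /* remember the previous state */
    irqs_enabled = false;        /* "disable interrupts" */
    return flags;
}

static void model_irq_restore(bool flags)
{
    irqs_enabled = flags;        /* put back whatever the caller had */
}

static uint64_t system_time;     /* hypothetical accumulated system time */

/* irq-unsafe variant: callers must already run with interrupts disabled. */
static void vtime_account_system(uint64_t delta)
{
    assert(!irqs_enabled);
    system_time += delta;
}

/* irqsafe variant for the one caller (kvm) that may run with interrupts
 * enabled: disable, account, then restore the previous state. */
static void vtime_account_system_irqsafe(uint64_t delta)
{
    bool flags = model_irq_save();
    vtime_account_system(delta);
    model_irq_restore(flags);
}
```

      Keeping only one wrapper means every other helper can drop the
      underscore prefix, which is the point of the commit.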
  2. 30 Oct 2012, 1 commit
    • vtime: Make vtime_account_system() irqsafe · 11113334
      Committed by Frederic Weisbecker
      vtime_account_system() currently has only one caller, vtime_account(),
      which is irq safe.
      
      Now we are going to call it from other places like kvm where
      irqs are not always disabled by the time we account the cputime.
      
      So let's make it irqsafe. The arch implementation part is now
      prefixed with "__".
      
      vtime_account_idle() arch implementation is prefixed accordingly
      to stay consistent.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
  3. 26 Sep 2012, 1 commit
  4. 25 Sep 2012, 1 commit
    • cputime: Use a proper subsystem naming for vtime related APIs · bf9fae9f
      Committed by Frederic Weisbecker
      Use a naming based on vtime as a prefix for virtual based
      cputime accounting APIs:
      
      - account_system_vtime() -> vtime_account()
      - account_switch_vtime() -> vtime_task_switch()
      
      It makes it easier to allow for further declension such
      as vtime_account_system(), vtime_account_idle(), ... if we
      want to find out the context we account to from generic code.
      
      This also makes it clearer which subsystem these APIs
      belong to.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
  5. 20 Aug 2012, 1 commit
    • cputime: Consolidate vtime handling on context switch · baa36046
      Committed by Frederic Weisbecker
      The archs that implement virtual cputime accounting all
      flush the cputime of a task when it gets descheduled
      and sometimes set up some ground initialization for the
      next task to account its cputime.
      
      These archs all put their own hooks in their context
      switch callbacks and handle the off-case themselves.
      
      Consolidate this by creating a new account_switch_vtime()
      callback, called from generic code right after a context switch,
      that these archs must implement to flush the prev task's
      cputime and initialize the next task's cputime-related state.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
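      Below is a minimal sketch of the consolidated context-switch hook. It
      assumes a single cpu, an explicit "now" timestamp and a hypothetical
      struct task with one cputime counter. account_switch_vtime() is the
      name used in the commit. The extra now argument and the body are
      illustrative only. They are not any architecture's real
      implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified task state. */
struct task {
    uint64_t vtime;              /* cputime charged to this task so far */
};

/* Hypothetical per-cpu state: start of the current task's slice. */
static uint64_t cpu_last_stamp;

/* Called by generic code right after a context switch: flush the cputime
 * of prev and start a fresh accounting period for next. */
static void account_switch_vtime(struct task *prev, struct task *next,
                                 uint64_t now)
{
    prev->vtime += now - cpu_last_stamp;  /* flush prev's elapsed time */
    cpu_last_stamp = now;                 /* next is accounted from now */
    (void)next;  /* an arch could initialize next's per-task state here */
}
```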
  6. 20 Jul 2012, 1 commit
  7. 11 Mar 2012, 3 commits
    • [S390] irq: external interrupt code passing · fde15c3a
      Committed by Heiko Carstens
      The external interrupt handlers have a parameter called ext_int_code.
      Despite its name, this parameter contains not only the ext_int_code
      but also the "cpu address" (POP) of the cpu which caused the external
      interrupt.
      To make the code a bit more obvious pass a struct instead so the called
      function can easily distinguish between external interrupt code and
      cpu address. The cpu address field however is named "subcode" since
      some external interrupt sources do not pass a cpu address but a
      different parameter (or none at all).
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
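      The sketch below shows how the old ext_int_code could be split into two
      named fields. It assumes the 32-bit value carries the subcode (for
      example the cpu address) in its upper halfword and the external
      interrupt code in its lower halfword. The struct and helper are
      hypothetical, and the kernel's actual definition may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-field replacement for the old combined parameter. */
struct ext_code {
    uint16_t subcode;   /* cpu address, another parameter, or unused */
    uint16_t code;      /* the external interrupt code itself */
};

/* Assumed packing: subcode in bits 31..16, code in bits 15..0. */
static struct ext_code ext_code_from_raw(uint32_t raw)
{
    struct ext_code ec = {
        .subcode = (uint16_t)(raw >> 16),
        .code = (uint16_t)(raw & 0xffffu),
    };
    return ec;
}
```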
    • [S390] rework idle code · 4c1051e3
      Committed by Martin Schwidefsky
      Whenever the cpu loads an enabled wait PSW it will appear as idle to the
      underlying host system. The code in default_idle calls vtime_stop_cpu
      which does the necessary voodoo to get the cpu time accounting right.
      The udelay code just loads an enabled wait PSW. To correct this, rework
      the vtime_stop_cpu/vtime_start_cpu logic and move the difficult parts
      to entry[64].S. vtime_stop_cpu can now be called from anywhere, and
      vtime_start_cpu is gone. The correction of the cpu time during wakeup
      from an enabled wait PSW is done with a critical section in entry[64].S.
      As vtime_start_cpu is gone, s390_idle_check can be removed as well.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [S390] rework smp code · 8b646bd7
      Committed by Martin Schwidefsky
      Define struct pcpu and merge some of the NR_CPUS arrays into it, including
      __cpu_logical_map, current_set and smp_cpu_state. Split smp related
      functions to those operating on physical cpus and the functions operating
      on a logical cpu number. Make the functions for physical cpus use a
      pointer to a struct pcpu. This hides the knowledge about cpu addresses in
      smp.c, entry[64].S and swsusp_asm64.S, thus remove the sigp.h header.
      
      The PSW restart mechanism is used to start secondary cpus, calling a
      function on an online cpu, calling a function on the ipl cpu, and for
      the nmi signal. Replace the different assembler functions with a
      single function restart_int_handler. The new entry point calls a function
      whose pointer is stored in the lowcore of the target cpu and it can wait
      for the source cpu to stop. This covers all existing use cases.
      
      Overall the code is now simpler and there are ~380 fewer lines of code.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
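      The struct pcpu consolidation and the single restart entry point can
      be modeled as follows. This is a hypothetical single-threaded sketch.
      struct model_lowcore, pcpu_devices, pcpu_delegate() and set_flag() are
      illustrative names. The restart interrupt is simulated by a direct
      call, and waiting for the source cpu to stop is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical lowcore fields used by the restart mechanism. */
struct model_lowcore {
    void (*restart_fn)(void *data);
    void *restart_data;
};

/* One struct per cpu instead of several parallel NR_CPUS arrays. */
struct pcpu {
    uint16_t address;              /* physical cpu address */
    int state;                     /* e.g. configured or standby */
    struct model_lowcore lowcore;
};

static struct pcpu pcpu_devices[MODEL_NR_CPUS];

/* Single entry point: run whatever function the source cpu stored in
 * the target cpu's lowcore. */
static void restart_int_handler(struct pcpu *target)
{
    struct model_lowcore *lc = &target->lowcore;
    if (lc->restart_fn)
        lc->restart_fn(lc->restart_data);
}

/* Store fn/data in the target's lowcore, then "send" the restart. */
static void pcpu_delegate(struct pcpu *target, void (*fn)(void *), void *data)
{
    target->lowcore.restart_fn = fn;
    target->lowcore.restart_data = data;
    restart_int_handler(target);   /* stands in for the restart signal */
}

static void set_flag(void *data)
{
    *(int *)data = 1;
}
```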
  8. 30 Oct 2011, 2 commits
  9. 26 May 2011, 1 commit
  10. 31 Mar 2011, 1 commit
  11. 05 Jan 2011, 2 commits
  12. 01 Dec 2010, 1 commit
    • [S390] nohz/s390: fix arch_needs_cpu() return value on offline cpus · 39881215
      Committed by Heiko Carstens
      This fixes the same problem as described in the patch "nohz: fix
      printk_needs_cpu() return value on offline cpus" for the arch_needs_cpu()
      primitive:
      
      arch_needs_cpu() may return 1 if called on offline cpus. When a cpu gets
      offlined it schedules the idle process which, before killing its own cpu,
      will call tick_nohz_stop_sched_tick().
      That function in turn will call arch_needs_cpu() in order to check if the
      local tick can be disabled. On offline cpus this function should naturally
      return 0, since the cpu will be dead shortly afterwards regardless of
      whether the tick gets disabled or not. That is apart from the fact that
      __cpu_disable() should already have made sure that no interrupts will be
      delivered to the offlined cpu anyway.
      
      In this case it prevents tick_nohz_stop_sched_tick() from calling
      select_nohz_load_balancer(). No idea if that really is a problem. However what
      made me debug this is that on 2.6.32 the function get_nohz_load_balancer() is
      used within __mod_timer() to select a cpu on which a timer gets enqueued.
      If arch_needs_cpu() returns 1 then the nohz_load_balancer cpu doesn't get
      updated when a cpu gets offlined. It may contain the cpu number of an offline
      cpu. In turn timers get enqueued on an offline cpu and not very surprisingly
      they never expire and cause system hangs.
      
      This has been observed on 2.6.32 kernels. On current kernels __mod_timer() uses
      get_nohz_timer_target(), which doesn't have that problem. However, there might
      be other problems because of the too-early exit from tick_nohz_stop_sched_tick()
      in case a cpu goes offline.
      
      This specific bug was introduced with 3c5d92a0 "nohz: Introduce
      arch_needs_cpu".
      
      In this case a cpu hotplug notifier is used to fix the issue in order to keep
      the normal/fast path small. All we need to do is to clear the condition that
      makes arch_needs_cpu() return 1 since it is just a performance improvement
      which is supposed to keep the local tick running for a short period if a cpu
      goes idle. Nothing special needs to be done except for clearing the condition.
      
      Cc: stable@kernel.org
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
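      Here is a minimal model of the fix. It assumes a per-cpu boolean records
      the "keep the local tick running" condition. The flag name, the event
      enum and model_cpu_notify() are hypothetical stand-ins for the arch's
      per-cpu idle data and its cpu hotplug notifier. They show only that
      clearing the condition on the dying cpu makes arch_needs_cpu() return
      0 without touching its fast path.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical per-cpu condition: keep the local tick running for a short
 * while after the cpu goes idle (a pure performance optimization). */
static bool idle_wakeup_pending[MODEL_NR_CPUS];

/* Fast path stays as small as before: just report the condition. */
static int arch_needs_cpu(int cpu)
{
    return idle_wakeup_pending[cpu] ? 1 : 0;
}

/* Hypothetical hotplug events for this model. */
enum model_hotplug_event { MODEL_CPU_ONLINE, MODEL_CPU_DYING };

/* Notifier-style callback: when a cpu is going away, clearing the
 * condition is all that is needed. */
static void model_cpu_notify(enum model_hotplug_event ev, int cpu)
{
    if (ev == MODEL_CPU_DYING)
        idle_wakeup_pending[cpu] = false;
}
```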
  13. 25 Oct 2010, 1 commit
  14. 17 May 2010, 1 commit
    • [S390] idle time accounting vs. machine checks · 6377981f
      Committed by Martin Schwidefsky
      A machine check can interrupt the i/o and external interrupt handler
      anytime. If the machine check occurs while the interrupt handler is
      waking up from idle vtime_start_cpu can get executed a second time
      and the int_clock / async_enter_timer values in the lowcore get
      clobbered. This can confuse the cpu time accounting.
      To fix this problem two changes are needed. First the machine check
      handler has to use its own copies of int_clock and async_enter_timer,
      named mcck_clock and mcck_enter_timer. Second the nested execution
      of vtime_start_cpu has to be prevented. This is done in s390_idle_check
      by checking the wait bit in the program status word.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
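      The guard against nested execution can be sketched as a check of the
      interrupted PSW's wait bit before the idle-exit accounting runs. The
      mask value and all function names here are assumptions for
      illustration. The sketch also leaves out the separate
      mcck_clock/mcck_enter_timer copies.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the wait bit in a 64-bit PSW mask (model value). */
#define MODEL_PSW_MASK_WAIT 0x0002000000000000ULL

static unsigned int idle_exits_accounted;

/* Stand-in for the idle-exit accounting done by vtime_start_cpu. */
static void model_vtime_start_cpu(void)
{
    idle_exits_accounted++;
}

/* Only treat the interrupt as a wakeup from idle if the interrupted PSW
 * was a wait PSW. A machine check that hits while the interrupt handler
 * itself is running interrupts a non-wait PSW, so in this model the
 * accounting is not executed a second time. */
static void model_idle_check(uint64_t interrupted_psw_mask)
{
    if (interrupted_psw_mask & MODEL_PSW_MASK_WAIT)
        model_vtime_start_cpu();
}
```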
  15. 05 Nov 2009, 1 commit
    • nohz: Introduce arch_needs_cpu · 3c5d92a0
      Committed by Martin Schwidefsky
      Allow the architecture to request a normal jiffy tick when the system
      goes idle and tick_nohz_stop_sched_tick is called. On s390 the hook is
      used to prevent the system from going fully idle if there has been an
      interrupt other than a clock comparator interrupt since the last wakeup.
      
      On s390 the HiperSockets response time for 1 connection ping-pong goes
      down from 42 to 34 microseconds. The CPU cost decreases by 27%.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      LKML-Reference: <20090929122533.402715150@de.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
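      The hook's intended effect on the idle path can be modeled on a single
      cpu as follows. The model makes two assumptions. First, one boolean
      (nohz_delay here, a hypothetical name) records that a
      non-clock-comparator interrupt arrived. Second, that boolean is cleared
      once the cpu enters its wait state. model_stop_sched_tick() is a
      stand-in for the decision inside tick_nohz_stop_sched_tick, not its
      real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-cpu state. */
static bool nohz_delay;

enum irq_kind { CLOCK_COMPARATOR, OTHER_IRQ };

/* Any interrupt other than the clock comparator requests a normal tick. */
static void on_interrupt(enum irq_kind kind)
{
    if (kind != CLOCK_COMPARATOR)
        nohz_delay = true;
}

static int arch_needs_cpu(void)
{
    return nohz_delay ? 1 : 0;
}

/* Assumed: entering the wait state drops the request, so the next idle
 * period may stop the tick. */
static void model_enter_wait(void)
{
    nohz_delay = false;
}

/* Returns true if the periodic tick would be stopped. */
static bool model_stop_sched_tick(void)
{
    return !arch_needs_cpu();
}
```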
  16. 22 Jun 2009, 2 commits
  17. 12 Jun 2009, 1 commit
  18. 23 Apr 2009, 1 commit
    • [S390] /proc/stat idle field for idle cpus · e1c80530
      Committed by Martin Schwidefsky
      The cpu idle field in the output of /proc/stat is too small for cpus
      that have been idle for more than a tick. Add the architecture hook
      arch_idle_time, which allows adding the not-yet-accounted idle time of a
      sleeping cpu without waking that cpu.
      
      The s390 implementation of arch_idle_time uses the already existing
      s390_idle_data per_cpu variable to find the sleep time of a neighboring
      idle cpu.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
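      The sketch below computes the not-yet-accounted idle time of a sleeping
      cpu from another cpu. The struct idle_data fields are hypothetical and
      only loosely inspired by the s390_idle_data variable named in the
      commit. The sleeper may wake up while another cpu reads these fields.
      A real implementation would therefore need a consistent snapshot of
      them (for example via a sequence counter), which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cpu idle bookkeeping (arbitrary time units). */
struct idle_data {
    bool idle;              /* cpu is currently sleeping */
    uint64_t idle_enter;    /* timestamp when it went to sleep */
    uint64_t idle_time;     /* idle time already accounted */
};

/* Idle time not yet accounted for a sleeping cpu, computed from the
 * outside without waking it. */
static uint64_t arch_idle_time(const struct idle_data *d, uint64_t now)
{
    if (!d->idle || now < d->idle_enter)
        return 0;
    return now - d->idle_enter;
}

/* What a /proc/stat-style reader would report for that cpu. */
static uint64_t reported_idle(const struct idle_data *d, uint64_t now)
{
    return d->idle_time + arch_idle_time(d, now);
}
```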
  19. 14 Apr 2009, 3 commits
  20. 23 Jan 2009, 1 commit
    • [S390] cputime: fix lowcore initialization on cpu hotplug · f9a2f797
      Committed by Heiko Carstens
      On (initial) cpu hotplug the lowcore values for user_timer and
      system_timer don't get initialized like they would get on each
      process schedule.
      On initial start of secondary cpus this leads to the situation
      where per thread user/system_timer values are larger than the
      corresponding contents of the lowcore. When later calculating
      time spent in user/system context the result can be negative.
      
      So for cpu hotplug we should manually initialize lowcore values.
      
      Fixes this bug:
      
      Kernel BUG at 000ec080 [verbose debug info unavailable]
      fixpoint divide exception: 0009 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 10 Not tainted 2.6.28 #4
      Process sysctl (pid: 975, task: 3fa752e0, ksp: 3fbebca0)
      Krnl PSW : 070c1000 800ec080 (show_stat+0x390/0x5fc)
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0
      Krnl GPRS: 7fffffff fefc7ce5 3faec080 003879ae
                 00000001 01388000 7fffffff 01388000
                 00000000 00000000 0049ad50 3fbebcf8
                 01388000 002f51a8 800ec1fe 3fbebcf8
      Krnl Code: 800ec076: 9001b188           stm     %r0,%r1,392(%r11)
                 800ec07a: 9801b0c0           lm      %r0,%r1,192(%r11)
                 800ec07e: 1d05               dr      %r0,%r5
                >800ec080: 9001b0c0           stm     %r0,%r1,192(%r11)
                 800ec084: 5860b0c4           l       %r6,196(%r11)
                 800ec088: 1806               lr      %r0,%r6
                 800ec08a: 8c800001           srdl    %r8,1
                 800ec08e: 1d87               dr      %r8,%r7
      Call Trace:
      ([<00000000000ec1ee>] show_stat+0x4fe/0x5fc)
       [<00000000000c13e8>] seq_read+0xc4/0x3ac
       [<00000000000e4796>] proc_reg_read+0x6e/0x9c
       [<00000000000a6a44>] vfs_read+0x78/0x100
       [<00000000000a6ba8>] sys_read+0x40/0x80
       [<00000000000234a8>] sysc_do_restart+0x1a/0x1e
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
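      The failure mode and the fix can be reproduced with a small model. In
      this model, user time is the difference between a per-cpu lowcore
      counter and a per-thread snapshot. The field layout and helper names
      are hypothetical. Signed integers are used so that the negative result
      is visible. The kernel's actual update rules differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified counters. */
struct lowcore_model { int64_t user_timer; };
struct thread_model  { int64_t user_timer; };

/* User time accumulated since the thread's last snapshot. */
static int64_t user_delta(const struct lowcore_model *lc,
                          const struct thread_model *ti)
{
    return lc->user_timer - ti->user_timer;
}

/* The fix, modeled: when a cpu comes up, seed its lowcore from the
 * running thread, just as a normal process switch would. */
static void init_lowcore_on_hotplug(struct lowcore_model *lc,
                                    const struct thread_model *ti)
{
    lc->user_timer = ti->user_timer;
}
```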
  21. 31 Dec 2008, 5 commits
    • [PATCH] improve idle cputime accounting · 9cfb9b3c
      Committed by Martin Schwidefsky
      Distinguish the cputime of the idle process where idle is actually using
      cpu cycles from the cputime where idle is sleeping on an enabled wait psw.
      The former is accounted as system time, the latter as idle time.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [PATCH] improve precision of idle time detection. · 6f430924
      Committed by Martin Schwidefsky
      Increase the precision of the idle time calculation that is exported
      to user space via /sys/devices/system/cpu/cpu<x>/idle_time_us
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [PATCH] improve precision of process accounting. · aa5e97ce
      Committed by Martin Schwidefsky
      The unit of the cputime accounting values that are stored per process is
      currently a microsecond. The CPU timer has a maximum granularity of
      2**-12 microseconds. There is no benefit in storing the per process values
      in the lesser precision and there is the disadvantage that the backend
      has to do the rounding to microseconds. The better solution is to use
      the maximum granularity of the CPU timer as cputime unit.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
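      Here is a worked example of why the coarser microsecond unit loses
      precision. It uses the CPU timer granularity of 2**-12 microseconds
      stated above, which is 4096 timer units per microsecond. The helper
      names and the three 6000-unit intervals are made up for illustration.
      Summing in native units gives 18000 / 4096 = 4 us after truncation.
      Rounding each interval to microseconds first gives only 1 + 1 + 1 = 3 us.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One CPU timer unit is 2**-12 microseconds: 4096 units per us. */
#define CPUTIMER_UNITS_PER_US 4096ULL

static uint64_t units_to_us(uint64_t units)
{
    return units / CPUTIMER_UNITS_PER_US;   /* equivalent to units >> 12 */
}

/* Keep values in native CPU timer units, convert once at the end. */
static uint64_t sum_then_convert(const uint64_t *d, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += d[i];
    return units_to_us(total);
}

/* Store per-interval values in microseconds: each conversion truncates. */
static uint64_t convert_each_then_sum(const uint64_t *d, size_t n)
{
    uint64_t total_us = 0;
    for (size_t i = 0; i < n; i++)
        total_us += units_to_us(d[i]);
    return total_us;
}
```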
    • [PATCH] idle cputime accounting · 79741dd3
      Committed by Martin Schwidefsky
      The cpu time spent by the idle process actually doing something is
      currently accounted as idle time. This is plain wrong; the architectures
      that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
      time spent doing nothing and the time spent by idle doing work. The first
      is accounted with account_idle_time and the second with account_system_time.
      The architectures that use the account_xxx_time interface directly and not
      the account_xxx_ticks interface now need to do the check for the idle
      process in their arch code. In particular to improve the system vs true
      idle time accounting the arch code needs to measure the true idle time
      instead of just testing for the idle process.
      To improve the tick based accounting as well we would need an architecture
      primitive that can tell us if the pt_regs of the interrupted context
      points to the magic instruction that halts the cpu.
      
      In addition, idle time is no longer added to the stime of the idle process.
      This field now contains the system time of the idle process as it should
      be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
      every tick that occurs while idle is running will be accounted as idle
      time.
      
      This patch contains the necessary common code changes to be able to
      distinguish idle system time and true idle time. The architectures with
      support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
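      The split between idle time and system time for the idle task can be
      sketched as a classification over the periods during which the idle
      task held the cpu. struct idle_slice and account_idle_task() are
      hypothetical. The comments name account_idle_time() and
      account_system_time() only to show which interface from the commit
      each branch corresponds to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cpustat_model {
    uint64_t system;
    uint64_t idle;
};

/* One stretch of time during which the idle task was on the cpu. */
struct idle_slice {
    uint64_t len;
    bool in_wait;    /* cpu halted on an enabled wait PSW */
};

static void account_idle_task(struct cpustat_model *st,
                              const struct idle_slice *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i].in_wait)
            st->idle += s[i].len;    /* true idle: account_idle_time() */
        else
            st->system += s[i].len;  /* idle task working: account_system_time() */
    }
}
```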
    • [PATCH] fix scaled & unscaled cputime accounting · 457533a7
      Committed by Martin Schwidefsky
      The utimescaled / stimescaled fields in the task structure and the
      global cpustat should be set on all architectures. On s390 the calls
      to account_user_time_scaled and account_system_time_scaled have never
      been added. In addition, system time that is accounted as guest time
      to the user time of a process is accounted to the scaled system time
      instead of the scaled user time.
      To fix the bugs and to prevent future forgetfulness this patch merges
      account_system_time_scaled into account_system_time and
      account_user_time_scaled into account_user_time.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Michael Neuling <mikey@neuling.org>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
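      Below is a sketch of the merged interface, assuming trimmed-down task
      fields. One call updates both the unscaled and the scaled counter, so
      neither half can be forgotten. Guest time charged to user time also
      lands in utimescaled. The model_ functions mirror the merged
      account_user_time()/account_system_time() described in the commit, but
      their signatures are simplified and hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down task fields. */
struct task_model {
    uint64_t utime, utimescaled;
    uint64_t stime, stimescaled;
};

/* One call updates both unscaled and scaled user time. */
static void model_account_user_time(struct task_model *p, uint64_t cputime,
                                    uint64_t cputime_scaled)
{
    p->utime += cputime;
    p->utimescaled += cputime_scaled;
}

/* One call updates both unscaled and scaled system time. */
static void model_account_system_time(struct task_model *p, uint64_t cputime,
                                      uint64_t cputime_scaled)
{
    p->stime += cputime;
    p->stimescaled += cputime_scaled;
}

/* Guest time is charged to the task's user time; its scaled value must
 * follow it to utimescaled, not to stimescaled. */
static void model_account_guest_time(struct task_model *p, uint64_t cputime,
                                     uint64_t cputime_scaled)
{
    model_account_user_time(p, cputime, cputime_scaled);
}
```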
  22. 25 Dec 2008, 1 commit
  23. 14 Jul 2008, 2 commits
  24. 27 Apr 2008, 1 commit
    • KVM: s390: arch backend for the kvm kernel module · b0c632db
      Committed by Heiko Carstens
      This patch contains the port of Qumranet's kvm kernel module to IBM zSeries
       (aka s390x, mainframe) architecture. It uses the mainframe's virtualization
      instruction SIE to run virtual machines with up to 64 virtual CPUs each.
      This port is only usable on 64bit host kernels, and can only run 64bit guest
      kernels. However, running 31bit applications in guest userspace is possible.
      
      The following source files are introduced by this patch
      arch/s390/kvm/kvm-s390.c    similar to arch/x86/kvm/x86.c, this implements all
                                  arch callbacks for kvm. __vcpu_run calls back into
                                  sie64a to enter the guest machine context
      arch/s390/kvm/sie64a.S      assembler function sie64a, which enters guest
                                  context via SIE, and switches world before and after that
      include/asm-s390/kvm_host.h contains all vital data structures needed to run
                                  virtual machines on the mainframe
      include/asm-s390/kvm.h      defines kvm_regs and friends for user access to
                                  guest register content
      arch/s390/kvm/gaccess.h     functions similar to uaccess to access guest memory
      arch/s390/kvm/kvm-s390.h    header file for kvm-s390 internals, extended by
                                  later patches
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
  25. 10 Nov 2007, 1 commit
    • sched: restore deterministic CPU accounting on powerpc · fa13a5a1
      Committed by Paul Mackerras
      Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the
      deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been
      broken on powerpc, because we end up counting user time twice: once in
      timer_interrupt() and once in update_process_times().
      
      This fixes the problem by pulling the code in update_process_times
      that updates utime and stime into a separate function called
      account_process_tick.  If CONFIG_VIRT_CPU_ACCOUNTING is not defined,
      there is a version of account_process_tick in kernel/timer.c that
      simply accounts a whole tick to either utime or stime as before.  If
      CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to
      implement account_process_tick.
      
      This also lets us simplify the s390 code a bit; it means that the s390
      timer interrupt can now call update_process_times even when
      CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a
      suitable account_process_tick().
      
      account_process_tick() now takes the task_struct * as an argument.
      Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
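      The tick-based fallback described above charges a whole tick to either
      utime or stime. It can be sketched as follows. MODEL_TICK_LEN and struct
      task_model are hypothetical. The signature is simplified from the
      commit's description, which takes a task pointer plus a user/system
      indication. With CONFIG_VIRT_CPU_ACCOUNTING, an architecture would
      supply its own, more precise version instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_TICK_LEN 10000ULL   /* hypothetical length of one tick */

struct task_model {
    uint64_t utime;
    uint64_t stime;
};

/* Tick-based accounting: the whole tick goes to whichever mode the
 * timer interrupt found the task in. */
static void account_process_tick(struct task_model *p, bool user_tick)
{
    if (user_tick)
        p->utime += MODEL_TICK_LEN;
    else
        p->stime += MODEL_TICK_LEN;
}
```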
  26. 27 Jul 2007, 1 commit
  27. 10 Jul 2007, 1 commit