1. 07 11月, 2013 1 次提交
  2. 29 6月, 2013 3 次提交
    • D
      x86: xen: Sync the CMOS RTC as well as the Xen wallclock · 47433b8c
      David Vrabel 提交于
      Adjustments to Xen's persistent clock via update_persistent_clock()
      don't actually persist, as the Xen wallclock is a software only clock
      and modifications to it do not modify the underlying CMOS RTC.
      
      The x86_platform.set_wallclock hook is there to keep the hardware RTC
      synchronized. On a guest this is pointless.
      
      On Dom0 we can use the native implementaion which actually updates the
      hardware RTC, but we still need to keep the software emulation of RTC
      for the guests up to date. The subscription to the pvclock_notifier
      allows us to emulate this easily. The notifier is called at every tick
      and when the clock was set.
      
      Right now we only use that notifier when the clock was set, but due to
      the fact that it is called periodically from the timekeeping update
      code, we can utilize it to emulate the NTP driven drift compensation
      of update_persistant_clock() for the Xen wall (software) clock.
      
      Add a 11 minutes periodic update to the pvclock_gtod notifier callback
      to achieve that. The static variable 'next' which maintains that 11
      minutes update cycle is protected by the core code serialization so
      there is no need to add a Xen specific serialization mechanism.
      
      [ tglx: Massaged changelog and added a few comments ]
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-6-git-send-email-david.vrabel@citrix.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      47433b8c
    • D
      x86: xen: Sync the wallclock when the system time is set · 5584880e
      David Vrabel 提交于
      Currently the Xen wallclock is only updated every 11 minutes if NTP is
      synchronized to its clock source (using the sync_cmos_clock() work).
      If a guest is started before NTP is synchronized it may see an
      incorrect wallclock time.
      
      Use the pvclock_gtod notifier chain to receive a notification when the
      system time has changed and update the wallclock to match.
      
      This chain is called on every timer tick and we want to avoid an extra
      (expensive) hypercall on every tick.  Because dom0 has historically
      never provided a very accurate wallclock and guests do not expect one,
      we can do this simply: the wallclock is only updated if the clock was
      set.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-5-git-send-email-david.vrabel@citrix.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5584880e
    • L
      xen/time: remove blocked time accounting from xen "clockchip" · 0b0c002c
      Laszlo Ersek 提交于
      ... because the "clock_event_device framework" already accounts for idle
      time through the "event_handler" function pointer in
      xen_timer_interrupt().
      
      The patch is intended as the completion of [1]. It should fix the double
      idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to
      stolen time accounting (the removed code seems to be isolated).
      
      The approach may be completely misguided.
      
      [1] https://lkml.org/lkml/2011/10/6/10
      [2] http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01068.html
      
      John took the time to retest this patch on top of v3.10 and reported:
      "idle time is correctly incremented for pv and hvm for the normal
      case, nohz=off and nohz=idle." so lets put this patch in.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NLaszlo Ersek <lersek@redhat.com>
      Signed-off-by: NJohn Haxby <john.haxby@oracle.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      0b0c002c
  3. 10 6月, 2013 4 次提交
    • K
      xen/time: Free onlined per-cpu data structure if we want to online it again. · 09e99da7
      Konrad Rzeszutek Wilk 提交于
      If the per-cpu time data structure has been onlined already and
      we are trying to online it again, then free the previous copy
      before blindly over-writting it.
      
      A developer naturally should not call this function multiple times
      but just in case.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      09e99da7
    • K
      xen/time: Check that the per_cpu data structure has data before freeing. · a05e2c37
      Konrad Rzeszutek Wilk 提交于
      We don't check whether the per_cpu data structure has actually
      been freed in the past. This checks it and if it has been freed
      in the past then just continues on without double-freeing.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      a05e2c37
    • K
      xen/time: Don't leak interrupt name when offlining. · c9d76a24
      Konrad Rzeszutek Wilk 提交于
      When the user does:
          echo 0 > /sys/devices/system/cpu/cpu1/online
          echo 1 > /sys/devices/system/cpu/cpu1/online
      
      kmemleak reports:
      kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      One of the leaks is from xen/time:
      
      unreferenced object 0xffff88003fa51280 (size 32):
        comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
        hex dump (first 32 bytes):
          74 69 6d 65 72 31 00 00 00 00 00 00 00 00 00 00  timer1..........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81660721>] kmemleak_alloc+0x21/0x50
          [<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
          [<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
          [<ffffffff812fe228>] kasprintf+0x38/0x40
          [<ffffffff81041ec1>] xen_setup_timer+0x51/0xf0
          [<ffffffff8166339f>] xen_cpu_up+0x5f/0x3e8
          [<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
          [<ffffffff8166bd48>] cpu_up+0xd9/0xec
          [<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
          [<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
          [<ffffffff8165ce39>] kernel_init+0x9/0xf0
          [<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This patch fixes it by stashing away the 'name' in the per-cpu
      data structure and freeing it when offlining the CPU.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      c9d76a24
    • K
      xen/time: Encapsulate the struct clock_event_device in another structure. · 31620a19
      Konrad Rzeszutek Wilk 提交于
      We don't do any code movement. We just encapsulate the struct clock_event_device
      in a new structure which contains said structure and a pointer to
      a char *name. The 'name' will be used in 'xen/time: Don't leak interrupt
      name when offlining'.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      31620a19
  4. 29 5月, 2013 1 次提交
    • D
      x86: Increase precision of x86_platform.get/set_wallclock() · 3565184e
      David Vrabel 提交于
      All the virtualized platforms (KVM, lguest and Xen) have persistent
      wallclocks that have more than one second of precision.
      
      read_persistent_wallclock() and update_persistent_wallclock() allow
      for nanosecond precision but their implementation on x86 with
      x86_platform.get/set_wallclock() only allows for one second precision.
      This means guests may see a wallclock time that is off by up to 1
      second.
      
      Make set_wallclock() and get_wallclock() take a struct timespec
      parameter (which allows for nanosecond precision) so KVM and Xen
      guests may start with a more accurate wallclock time and a Xen dom0
      can maintain a more accurate wallclock for guests.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      3565184e
  5. 17 4月, 2013 2 次提交
    • K
      xen/time: Add default value of -1 for IRQ and check for that. · ef35a4e6
      Konrad Rzeszutek Wilk 提交于
      If the timer interrupt has been de-init or is just now being
      initialized, the default value of -1 should be preset as
      interrupt line. Check for that and if something is odd
      WARN us.
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ef35a4e6
    • K
      xen/time: Fix kasprintf splat when allocating timer%d IRQ line. · 7918c92a
      Konrad Rzeszutek Wilk 提交于
      When we online the CPU, we get this splat:
      
      smpboot: Booting Node 0 Processor 1 APIC 0x2
      installing Xen timer for CPU 1
      BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
      in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
      Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
      Call Trace:
       [<ffffffff810c1fea>] __might_sleep+0xda/0x100
       [<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
       [<ffffffff81303758>] ? kasprintf+0x38/0x40
       [<ffffffff813036eb>] kvasprintf+0x5b/0x90
       [<ffffffff81303758>] kasprintf+0x38/0x40
       [<ffffffff81044510>] xen_setup_timer+0x30/0xb0
       [<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
       [<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
      
      The solution to that is use kasprintf in the CPU hotplug path
      that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify,
      and remove the call to in xen_hvm_setup_cpu_clockevents.
      
      Unfortunatly the later is not a good idea as the bootup path
      does not use xen_hvm_cpu_notify so we would end up never allocating
      timer%d interrupt lines when booting. As such add the check for
      atomic() to continue.
      
      CC: stable@vger.kernel.org
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      7918c92a
  6. 27 9月, 2011 1 次提交
  7. 25 8月, 2011 1 次提交
  8. 19 5月, 2011 1 次提交
  9. 04 3月, 2011 1 次提交
  10. 22 2月, 2011 1 次提交
    • J
      x86: Convert remaining x86 clocksources to clocksource_register_hz/khz · b01cc1b0
      John Stultz 提交于
      This converts the remaining x86 clocksources to use
      clocksource_register_hz/khz.
      
      CC: jacob.jun.pan@intel.com
      CC: Glauber Costa <glommer@redhat.com>
      CC: Dimitri Sivanich <sivanich@sgi.com>
      CC: Rusty Russell <rusty@rustcorp.com.au>
      CC: Jeremy Fitzhardinge <jeremy@xensource.com>
      CC: Chris McDermott <lcm@us.ibm.com>
      CC: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen]
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      b01cc1b0
  11. 17 12月, 2010 1 次提交
  12. 28 11月, 2010 1 次提交
  13. 05 10月, 2010 1 次提交
  14. 05 8月, 2010 1 次提交
    • J
      xen: drop xen_sched_clock in favour of using plain wallclock time · 8a22b999
      Jeremy Fitzhardinge 提交于
      xen_sched_clock only counts unstolen time.  In principle this should
      be useful to the Linux scheduler so that it knows how much time a process
      actually consumed.  But in practice this doesn't work very well as the
      scheduler expects the sched_clock time to be synchronized between
      cpus.  It also uses sched_clock to measure the time a task spends
      sleeping, in which case "unstolen time" isn't meaningful.
      
      So just use plain xen_clocksource_read to return wallclock nanoseconds
      for sched_clock.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      8a22b999
  15. 30 7月, 2010 1 次提交
  16. 27 7月, 2010 1 次提交
  17. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  18. 13 3月, 2010 1 次提交
  19. 04 12月, 2009 2 次提交
  20. 29 10月, 2009 1 次提交
    • T
      percpu: make percpu symbols in xen unique · c6e22f9e
      Tejun Heo 提交于
      This patch updates percpu related symbols in xen such that percpu
      symbols are unique and don't clash with local symbols.  This serves
      two purposes of decreasing the possibility of global percpu symbol
      collision and allowing dropping per_cpu__ prefix from percpu symbols.
      
      * arch/x86/xen/smp.c, arch/x86/xen/time.c, arch/ia64/xen/irq_xen.c:
        add xen_ prefix to percpu variables
      
      * arch/ia64/xen/time.c: add xen_ prefix to percpu variables, drop
        processed_ prefix and make them static
      
      Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
      which cause name clashes" patch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      c6e22f9e
  21. 22 4月, 2009 1 次提交
  22. 31 12月, 2008 1 次提交
    • M
      [PATCH] idle cputime accounting · 79741dd3
      Martin Schwidefsky 提交于
      The cpu time spent by the idle process actually doing something is
      currently accounted as idle time. This is plain wrong, the architectures
      that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
      time spent doing nothing and the time spent by idle doing work. The first
      is accounted with account_idle_time and the second with account_system_time.
      The architectures that use the account_xxx_time interface directly and not
      the account_xxx_ticks interface now need to do the check for the idle
      process in their arch code. In particular to improve the system vs true
      idle time accounting the arch code needs to measure the true idle time
      instead of just testing for the idle process.
      To improve the tick based accounting as well we would need an architecture
      primitive that can tell us if the pt_regs of the interrupted context
      points to the magic instruction that halts the cpu.
      
      In addition idle time is no more added to the stime of the idle process.
      This field now contains the system time of the idle process as it should
      be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
      every tick that occurs while idle is running will be accounted as idle
      time.
      
      This patch contains the necessary common code changes to be able to
      distinguish idle system time and true idle time. The architectures with
      support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      79741dd3
  23. 13 12月, 2008 1 次提交
  24. 15 10月, 2008 1 次提交
  25. 25 8月, 2008 1 次提交
    • A
      xen: implement CPU hotplugging · d68d82af
      Alex Nixon 提交于
      Note the changes from 2.6.18-xen CPU hotplugging:
      
      A vcpu_down request from the remote admin via Xenbus both hotunplugs the
      CPU, and disables it by removing it from the cpu_present map, and removing
      its entry in /sys.
      
      A vcpu_up request from the remote admin only re-enables the CPU, and does
      not immediately bring the CPU up. A udev event is emitted, which can be
      caught by the user if he wishes to automatically re-up CPUs when available,
      or implement a more complex policy.
      Signed-off-by: NAlex Nixon <alex.nixon@citrix.com>
      Acked-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d68d82af
  26. 22 8月, 2008 1 次提交
  27. 09 7月, 2008 1 次提交
  28. 25 6月, 2008 1 次提交
  29. 12 6月, 2008 1 次提交
    • J
      common implementation of iterative div/mod · f595ec96
      Jeremy Fitzhardinge 提交于
      We have a few instances of the open-coded iterative div/mod loop, used
      when we don't expcet the dividend to be much bigger than the divisor.
      Unfortunately modern gcc's have the tendency to strength "reduce" this
      into a full mod operation, which isn't necessarily any faster, and
      even if it were, doesn't exist if gcc implements it in libgcc.
      
      The workaround is to put a dummy asm statement in the loop to prevent
      gcc from performing the transformation.
      
      This patch creates a single implementation of this loop, and uses it
      to replace the open-coded versions I know about.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: Christian Kujau <lists@nerdbynature.de>
      Cc: Robert Hancock <hancockr@shaw.ca>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f595ec96
  30. 02 6月, 2008 1 次提交
  31. 27 5月, 2008 2 次提交
    • J
      xen: maintain clock offset over save/restore · 359cdd3f
      Jeremy Fitzhardinge 提交于
      Hook into the device model to make sure that timekeeping's resume handler
      is called.  This deals with our clocksource's non-monotonicity over the
      save/restore.  Explicitly call clock_has_changed() to make sure that
      all the timers get retriggered properly.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      359cdd3f
    • J
      xen: implement save/restore · 0e91398f
      Jeremy Fitzhardinge 提交于
      This patch implements Xen save/restore and migration.
      
      Saving is triggered via xenbus, which is polled in
      drivers/xen/manage.c.  When a suspend request comes in, the kernel
      prepares itself for saving by:
      
      1 - Freeze all processes.  This is primarily to prevent any
          partially-completed pagetable updates from confusing the suspend
          process.  If CONFIG_PREEMPT isn't defined, then this isn't necessary.
      
      2 - Suspend xenbus and other devices
      
      3 - Stop_machine, to make sure all the other vcpus are quiescent.  The
          Xen tools require the domain to run its save off vcpu0.
      
      4 - Within the stop_machine state, it pins any unpinned pgds (under
          construction or destruction), performs canonicalizes various other
          pieces of state (mostly converting mfns to pfns), and finally
      
      5 - Suspend the domain
      
      Restore reverses the steps used to save the domain, ending when all
      the frozen processes are thawed.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      0e91398f
  32. 10 2月, 2008 1 次提交