1. 05 Jun, 2014 1 commit
    • sched: Rename capacity related flags · 5d4dfddd
      Nicolas Pitre committed
      It is better not to think about compute capacity as being equivalent
      to "CPU power".  The upcoming "power aware" scheduler work may create
      confusion with the notion of energy consumption if "power" is used too
      liberally.
      
      Let's rename the following feature flags since they do relate to capacity:
      
      	SD_SHARE_CPUPOWER  -> SD_SHARE_CPUCAPACITY
      	ARCH_POWER         -> ARCH_CAPACITY
      	NONTASK_POWER      -> NONTASK_CAPACITY
      Signed-off-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: linaro-kernel@lists.linaro.org
      Cc: Andy Fleming <afleming@freescale.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: devicetree@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Link: http://lkml.kernel.org/n/tip-e93lpnxb87owfievqatey6b5@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 07 May, 2014 1 commit
  3. 05 Mar, 2014 2 commits
  4. 13 Dec, 2013 1 commit
  5. 02 Dec, 2013 1 commit
  6. 21 Nov, 2013 1 commit
    • powerpc: Make cpu_to_chip_id() available when SMP=n · 3eb906c6
      Michael Ellerman committed
      Up until now we have only used cpu_to_chip_id() in the topology code,
      which is only used on SMP builds. However my recent commit a4da0d50
      "Implement arch_get_random_long/int() for powernv" added a usage when
      SMP=n, breaking the build.
      
      Move cpu_to_chip_id() into prom.c so it is available for SMP=n builds.
      
      We would move the extern to prom.h, but that breaks the include in
      topology.h. Instead we leave it in smp.h, but move it out of the
      CONFIG_SMP #ifdef. We also need to include asm/smp.h in rng.c, because
      the linux version skips asm/smp.h on UP. What a mess.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  7. 01 Oct, 2013 1 commit
    • hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock() · 6dedcca6
      Toshi Kani committed
      cpu_hotplug_driver_lock() serializes CPU online/offline operations
      when ARCH_CPU_PROBE_RELEASE is set.  This lock interface is no longer
      necessary for the following reasons:
      
       - lock_device_hotplug() now protects CPU online/offline operations,
         including the probe & release interfaces enabled by
         ARCH_CPU_PROBE_RELEASE.  The use of cpu_hotplug_driver_lock() is
         redundant.
       - cpu_hotplug_driver_lock() is only valid when ARCH_CPU_PROBE_RELEASE
         is defined, which is misleading and is only enabled on powerpc.
      
      This patch removes the cpu_hotplug_driver_lock() interface.  As
      a result, ARCH_CPU_PROBE_RELEASE only enables / disables the cpu
      probe & release interface as intended.  There is no functional change
      in this patch.
      Signed-off-by: Toshi Kani <toshi.kani@hp.com>
      Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  8. 11 Sep, 2013 1 commit
  9. 14 Aug, 2013 6 commits
  10. 01 Jul, 2013 2 commits
    • powerpc: Set cpu sibling mask before online cpu · cce606fe
      Li Zhong committed
      It seems the following race is possible:
      
      	cpu0					cpux
      smp_init->cpu_up->_cpu_up
      	__cpu_up
      		kick_cpu(1)
      -------------------------------------------------------------------------
      		waiting online			...
      		...				notify CPU_STARTING
      							set cpux active
      						set cpux online
      -------------------------------------------------------------------------
      		finish waiting online
      		...
      sched_init_smp
      	init_sched_domains(cpu_active_mask)
      		build_sched_domains
      						set cpux sibling info
      -------------------------------------------------------------------------
      
      Execution on cpu0 and cpux can be concurrent between two separator
      lines.
      
      So if the cpux sibling information is set too late (normally
      impossible, but it can be triggered by adding some delay in
      start_secondary, after setting the cpu online), build_sched_domains()
      running on cpu0 might see cpux active with an empty sibling mask, and
      then access a bad address, as in the following oops:
      
      [    0.099855] Unable to handle kernel paging request for data at address 0xc00000038518078f
      [    0.099868] Faulting instruction address: 0xc0000000000b7a64
      [    0.099883] Oops: Kernel access of bad area, sig: 11 [#1]
      [    0.099895] PREEMPT SMP NR_CPUS=16 DEBUG_PAGEALLOC NUMA pSeries
      [    0.099922] Modules linked in:
      [    0.099940] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-rc1-00120-gb973425c-dirty #16
      [    0.099956] task: c0000001fed80000 ti: c0000001fed7c000 task.ti: c0000001fed7c000
      [    0.099971] NIP: c0000000000b7a64 LR: c0000000000b7a40 CTR: c0000000000b4934
      [    0.099985] REGS: c0000001fed7f760 TRAP: 0300   Not tainted  (3.10.0-rc1-00120-gb973425c-dirty)
      [    0.099997] MSR: 8000000000009032 <SF,EE,ME,IR,DR,RI>  CR: 24272828  XER: 20000003
      [    0.100045] SOFTE: 1
      [    0.100053] CFAR: c000000000445ee8
      [    0.100064] DAR: c00000038518078f, DSISR: 40000000
      [    0.100073]
      GPR00: 0000000000000080 c0000001fed7f9e0 c000000000c84d48 0000000000000010
      GPR04: 0000000000000010 0000000000000000 c0000001fc55e090 0000000000000000
      GPR08: ffffffffffffffff c000000000b80b30 c000000000c962d8 00000003845ffc5f
      GPR12: 0000000000000000 c00000000f33d000 c00000000000b9e4 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000000000
      GPR20: c000000000ccf750 0000000000000000 c000000000c94d48 c0000001fc504000
      GPR24: c0000001fc504000 c0000001fecef848 c000000000c94d48 c000000000ccf000
      GPR28: c0000001fc522090 0000000000000010 c0000001fecef848 c0000001fed7fae0
      [    0.100293] NIP [c0000000000b7a64] .get_group+0x84/0xc4
      [    0.100307] LR [c0000000000b7a40] .get_group+0x60/0xc4
      [    0.100318] Call Trace:
      [    0.100332] [c0000001fed7f9e0] [c0000000000dbce4] .lock_is_held+0xa8/0xd0 (unreliable)
      [    0.100354] [c0000001fed7fa70] [c0000000000bf62c] .build_sched_domains+0x728/0xd14
      [    0.100375] [c0000001fed7fbe0] [c000000000af67bc] .sched_init_smp+0x4fc/0x654
      [    0.100394] [c0000001fed7fce0] [c000000000adce24] .kernel_init_freeable+0x17c/0x30c
      [    0.100413] [c0000001fed7fdb0] [c00000000000ba08] .kernel_init+0x24/0x12c
      [    0.100431] [c0000001fed7fe30] [c000000000009f74] .ret_from_kernel_thread+0x5c/0x68
      [    0.100445] Instruction dump:
      [    0.100456] 38800010 38a00000 4838e3f5 60000000 7c6307b4 2fbf0000 419e0040 3d220001
      [    0.100496] 78601f24 39491590 e93e0008 7d6a002a <7d69582a> f97f0000 7d4a002a e93e0010
      [    0.100559] ---[ end trace 31fd0ba7d8756001 ]---
      
      This patch moves the sibling map updates before
      notify_cpu_starting() and setting the cpu online, and adds a write
      barrier there to make sure the sibling maps are updated before the
      active and online masks.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
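The ordering described in this commit message (fill in the sibling information first, then publish the CPU as online, with a write barrier in between) can be illustrated in user space. The following is only a minimal sketch, not the kernel code: it uses a C11 release store and acquire load as a stand-in for the kernel's barrier, and the names `sibling_mask`, `online_mask`, `secondary_start()` and `read_siblings_if_online()` as well as the limit `NR_CPUS_SKETCH` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 16u               /* hypothetical limit for this sketch */

static uint32_t sibling_mask[NR_CPUS_SKETCH]; /* plain data, one mask per cpu */
static _Atomic uint32_t online_mask;          /* bit n set => cpu n is online */

/* Secondary CPU: write the sibling mask first, then publish "online". */
static void secondary_start(unsigned cpu, uint32_t siblings)
{
    assert(cpu < NR_CPUS_SKETCH);
    sibling_mask[cpu] = siblings;
    /* The release ordering stands in for the write barrier: the store to
     * sibling_mask[cpu] cannot become visible after the online bit. */
    atomic_fetch_or_explicit(&online_mask, UINT32_C(1) << cpu,
                             memory_order_release);
}

/* Boot CPU: only trust the sibling mask once the cpu is seen online. */
static bool read_siblings_if_online(unsigned cpu, uint32_t *out)
{
    assert(cpu < NR_CPUS_SKETCH);
    uint32_t online = atomic_load_explicit(&online_mask,
                                           memory_order_acquire);
    if (!(online & (UINT32_C(1) << cpu)))
        return false;
    *out = sibling_mask[cpu];
    return true;
}
```

With this ordering, a reader that observes the online bit also observes the sibling mask written before it, which is the property the patch is after for build_sched_domains().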
    • powerpc: Delete __cpuinit usage from all users · 061d19f2
      Paul Gortmaker committed
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  The fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      This removes all the powerpc uses of the __cpuinit macros.  There
      are no __CPUINIT users in assembly files in powerpc.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Josh Boyer <jwboyer@gmail.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  11. 08 Apr, 2013 1 commit
  12. 08 Feb, 2013 1 commit
    • powerpc: fix ics_rtas_init and start_secondary section mismatch · 174ea471
      Daniel Borkmann committed
      It seems we're fine with just annotating the two functions.
      This fixes the following build warnings on ppc64:
      
      WARNING: arch/powerpc/sysdev/xics/built-in.o(.text+0x1664):
      The function .ics_rtas_init() references
      the function __init .xics_register_ics().
      This is often because .ics_rtas_init lacks a __init
      annotation or the annotation of .xics_register_ics is wrong.
      
      WARNING: arch/powerpc/sysdev/built-in.o(.text+0x6044):
      The function .ics_rtas_init() references
      the function __init .xics_register_ics().
      This is often because .ics_rtas_init lacks a __init
      annotation or the annotation of .xics_register_ics is wrong.
      
      WARNING: arch/powerpc/kernel/built-in.o(.text+0x2db30):
      The function .start_secondary() references
      the function __cpuinit .vdso_getcpu_init().
      This is often because .start_secondary lacks a __cpuinit
      annotation or the annotation of .vdso_getcpu_init is wrong.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  13. 04 Jan, 2013 1 commit
    • POWERPC: drivers: remove __dev* attributes. · cad5cef6
      Greg Kroah-Hartman committed
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitdata,
      __devinitconst, and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  14. 30 Oct, 2012 1 commit
    • KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online · 512691d4
      Paul Mackerras committed
      When a Book3S HV KVM guest is running, we need the host to be in
      single-thread mode, that is, all of the cores (or at least all of
      the cores where the KVM guest could run) to be running only one
      active hardware thread.  This is because of the hardware restriction
      in POWER processors that all of the hardware threads in the core
      must be in the same logical partition.  Complying with this restriction
      is much easier if, from the host kernel's point of view, only one
      hardware thread is active.
      
      This adds two hooks in the SMP hotplug code to allow the KVM code to
      make sure that secondary threads (i.e. hardware threads other than
      thread 0) cannot come online while any KVM guest exists.  The KVM
      code still has to check that any core where it runs a guest has the
      secondary threads offline, but having done that check it can now be
      sure that they will not come online while the guest is running.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
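The idea of the two hooks can be sketched as a guest counter that the hotplug path consults before letting a secondary thread come online. This is an illustrative sketch only: the function names, the `THREADS_PER_CORE` value and the absence of locking are assumptions made for the example and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define THREADS_PER_CORE 4u   /* hypothetical SMT width for this sketch */

static unsigned guest_count;  /* number of guests that currently exist */

/* Hook called when a guest is created: secondaries must stay offline. */
static void guest_created(void)
{
    guest_count++;
}

/* Hook called when a guest goes away. */
static void guest_destroyed(void)
{
    assert(guest_count > 0);
    guest_count--;
}

/* Hotplug-side check: thread 0 of a core may always come online;
 * other threads only while no guest exists. */
static bool may_come_online(unsigned cpu)
{
    bool is_secondary = (cpu % THREADS_PER_CORE) != 0;
    return !(is_secondary && guest_count > 0);
}
```

A real implementation would also need to serialize the counter against concurrent hotplug operations, which this sketch leaves out.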
  15. 19 Sep, 2012 1 commit
  16. 13 Sep, 2012 1 commit
  17. 05 Sep, 2012 1 commit
    • powerpc: Make sure IPI handlers see data written by IPI senders · 9fb1b36c
      Paul Mackerras committed
      We have been observing hangs, both of KVM guest vcpu tasks and more
      generally, where a process that is woken doesn't properly wake up and
      continue to run, but instead sticks in TASK_WAKING state.  This
      happens because the update of rq->wake_list in ttwu_queue_remote()
      is not ordered with the update of ipi_message in
      smp_muxed_ipi_message_pass(), and the reading of rq->wake_list in
      scheduler_ipi() is not ordered with the reading of ipi_message in
      smp_ipi_demux().  Thus it is possible for the IPI receiver not to see
      the updated rq->wake_list and therefore conclude that there is nothing
      for it to do.
      
      In order to make sure that anything done before smp_send_reschedule()
      is ordered before anything done in the resulting call to scheduler_ipi(),
      this adds barriers in smp_muxed_ipi_message_pass() and smp_ipi_demux().
      The barrier in smp_muxed_ipi_message_pass() is a full barrier to ensure that
      there is a full ordering between the smp_send_reschedule() caller and
      scheduler_ipi().  In smp_ipi_demux(), we use xchg() rather than
      xchg_local() because xchg() includes release and acquire barriers.
      Using xchg() rather than xchg_local() makes sense given that
      ipi_message is not just accessed locally.
      
      This moves the barrier between setting the message and calling the
      cause_ipi() function into the individual cause_ipi implementations.
      Most of them -- those that used outb, out_8 or similar -- already had
      a full barrier because out_8 etc. include a sync before the MMIO
      store.  This adds an explicit barrier in the two remaining cases.
      
      These changes made no measurable difference to the speed of IPIs as
      measured using a simple ping-pong latency test across two CPUs on
      different cores of a POWER7 machine.
      
      The analysis of the reason why processes were not waking up properly
      is due to Milton Miller.
      
      Cc: stable@vger.kernel.org # v3.0+
      Reported-by: Milton Miller <miltonm@bga.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
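The sender/receiver ordering this commit message describes can be modelled in user space with C11 atomics. The sketch below is an analogy under stated assumptions, not the powerpc implementation: `wake_list_len`, `send_message()`, `demux_messages()` and the message numbers are hypothetical, and sequentially consistent atomic operations stand in for the kernel barriers and xchg().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical message numbers for this sketch. */
enum { MSG_RESCHEDULE = 0, MSG_CALL_FUNCTION = 1 };

static int wake_list_len;                  /* plain data written before the IPI */
static _Atomic unsigned long ipi_message;  /* one bit per pending message */

/* Sender: data written before this call must be visible to the receiver
 * once it sees the message bit. */
static void send_message(int msg)
{
    assert(msg >= 0 && msg < 8);
    /* Sequentially consistent read-modify-write: earlier stores are
     * ordered before the message bit becomes visible. */
    atomic_fetch_or(&ipi_message, 1ul << msg);
    /* A real implementation would trigger the interrupt here. */
}

/* Receiver: fetch and clear all pending messages in one atomic step. */
static unsigned long demux_messages(void)
{
    /* The exchange is ordered like an acquire on the read side, so reads
     * after it see the data the sender wrote before setting the bit. */
    return atomic_exchange(&ipi_message, 0ul);
}
```

Using a non-atomic or purely local exchange on the receiver side would lose that guarantee, which matches the commit's reasoning for preferring xchg() over xchg_local().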
  18. 11 Jul, 2012 1 commit
    • powerpc: Add VDSO version of getcpu · 18ad51dd
      Anton Blanchard committed
      We have a request for a fast method of getting CPU and NUMA node IDs
      from userspace. This patch implements a getcpu VDSO function,
      similar to x86.
      
      Ben suggested we use SPRG3 which is userspace readable. SPRG3 can be
      modified by a KVM guest, so we save the SPRG3 value in the paca and
      restore it when transitioning from the guest to the host.
      
      I have a glibc patch that implements sched_getcpu on top of this.
      Testing on a POWER7:
      
      baseline: 538 cycles
      vdso:      30 cycles
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
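A fast getcpu of this kind works by keeping the CPU and NUMA node IDs in one value that userspace can read without a system call. The commit message does not state how the two IDs are laid out in SPRG3, so the following sketch assumes a hypothetical layout (CPU in the low 16 bits, node in the bits above) purely to show the pack/unpack step; the helper names are also made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define CPU_BITS 16u                      /* hypothetical field width */
#define CPU_MASK ((UINT32_C(1) << CPU_BITS) - 1u)

/* Kernel side (sketch): combine cpu and node into one register value. */
static uint32_t pack_cpu_node(uint32_t cpu, uint32_t node)
{
    assert(cpu <= CPU_MASK && node <= CPU_MASK);
    return cpu | (node << CPU_BITS);
}

/* Userspace side (sketch): decode the value; either pointer may be NULL,
 * mirroring the optional arguments of getcpu(). */
static void unpack_cpu_node(uint32_t val, uint32_t *cpu, uint32_t *node)
{
    if (cpu)
        *cpu = val & CPU_MASK;
    if (node)
        *node = val >> CPU_BITS;
}
```

Because decoding is a mask and a shift on an already-readable value, the cost stays far below that of entering the kernel, which is consistent with the cycle counts quoted above.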
  19. 03 Jul, 2012 1 commit
    • powerpc/smp: remove call to ipi_call_lock()/ipi_call_unlock() · e250d4bc
      Yong Zhang committed
      1) call_function.lock used in smp_call_function_many() is just to protect
         call_function.queue and &data->refs; cpu_online_mask is outside of the
         lock. It is not necessary to protect cpu_online_mask,
         because data->cpumask is pre-calculated, and even if a cpu is brought up
         when calling arch_send_call_function_ipi_mask(), it's harmless because
         the validation test in generic_smp_call_function_interrupt() will take care
         of it.
      
      2) For the cpu down issue, stop_machine() will guarantee that no concurrent
         smp_call_function() is processing.
      Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  20. 05 Jun, 2012 1 commit
  21. 26 Apr, 2012 2 commits
    • powerpc: Use generic idle thread allocation · 17e32eac
      Thomas Gleixner committed
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Link: http://lkml.kernel.org/r/20120420124557.311212868@linutronix.de
    • smp: Add task_struct argument to __cpu_up() · 8239c25f
      Thomas Gleixner committed
      Preparatory patch to make the idle thread allocation for secondary
      cpus generic.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James E.J. Bottomley <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: x86@kernel.org
      Link: http://lkml.kernel.org/r/20120420124556.964170564@linutronix.de
  22. 29 Mar, 2012 1 commit
  23. 22 Dec, 2011 1 commit
    • cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem · 8a25a2fd
      Kay Sievers committed
      This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
      and converts the devices to regular devices. The sysdev drivers are
      implemented as subsystem interfaces now.
      
      After all sysdev classes are ported to regular driver core entities, the
      sysdev implementation will be entirely removed from the kernel.
      
      Userspace relies on events and generic sysfs subsystem infrastructure
      from sysdev devices, which are made available with this conversion.
      
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  24. 25 Nov, 2011 1 commit
  25. 08 Nov, 2011 1 commit
  26. 01 Nov, 2011 1 commit
  27. 20 Sep, 2011 1 commit
  28. 27 Jul, 2011 1 commit
  29. 12 Jul, 2011 1 commit
    • KVM: PPC: Add support for Book3S processors in hypervisor mode · de56a948
      Paul Mackerras committed
      This adds support for KVM running on 64-bit Book 3S processors,
      specifically POWER7, in hypervisor mode.  Using hypervisor mode means
      that the guest can use the processor's supervisor mode.  That means
      that the guest can execute privileged instructions and access privileged
      registers itself without trapping to the host.  This gives excellent
      performance, but does mean that KVM cannot emulate a processor
      architecture other than the one that the hardware implements.
      
      This code assumes that the guest is running paravirtualized using the
      PAPR (Power Architecture Platform Requirements) interface, which is the
      interface that IBM's PowerVM hypervisor uses.  That means that existing
      Linux distributions that run on IBM pSeries machines will also run
      under KVM without modification.  In order to communicate the PAPR
      hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
      to include/linux/kvm.h.
      
      Currently the choice between book3s_hv support and book3s_pr support
      (i.e. the existing code, which runs the guest in user mode) has to be
      made at kernel configuration time, so a given kernel binary can only
      do one or the other.
      
      This new book3s_hv code doesn't support MMIO emulation at present.
      Since we are running paravirtualized guests, this isn't a serious
      restriction.
      
      With the guest running in supervisor mode, most exceptions go straight
      to the guest.  We will never get data or instruction storage or segment
      interrupts, alignment interrupts, decrementer interrupts, program
      interrupts, single-step interrupts, etc., coming to the hypervisor from
      the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
      exception entry path so that we don't have to do the KVM test on entry
      to those exception handlers.
      
      We do however get hypervisor decrementer, hypervisor data storage,
      hypervisor instruction storage, and hypervisor emulation assist
      interrupts, so we have to handle those.
      
      In hypervisor mode, real-mode accesses can access all of RAM, not just
      a limited amount.  Therefore we put all the guest state in the vcpu.arch
      and use the shadow_vcpu in the PACA only for temporary scratch space.
      We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
      anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
      We don't have a shared page with the guest, but we still need a
      kvm_vcpu_arch_shared struct to store the values of various registers,
      so we include one in the vcpu_arch struct.
      
      The POWER7 processor has a restriction that all threads in a core have
      to be in the same partition.  MMU-on kernel code counts as a partition
      (partition 0), so we have to do a partition switch on every entry to and
      exit from the guest.  At present we require the host and guest to run
      in single-thread mode because of this hardware restriction.
      
      This code allocates a hashed page table for the guest and initializes
      it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
      require that the guest memory is allocated using 16MB huge pages, in
      order to simplify the low-level memory management.  This also means that
      we can get away without tracking paging activity in the host for now,
      since huge pages can't be paged or swapped.
      
      This also adds a few new exports needed by the book3s_hv code.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  30. 08 Jul, 2011 1 commit
  31. 29 Jun, 2011 1 commit
  32. 20 Jun, 2011 1 commit