1. 26 9月, 2011 1 次提交
    • P
      KVM: PPC: Assemble book3s{,_hv}_rmhandlers.S separately · 177339d7
      Paul Mackerras 提交于
      This makes arch/powerpc/kvm/book3s_rmhandlers.S and
      arch/powerpc/kvm/book3s_hv_rmhandlers.S be assembled as
      separate compilation units rather than having them #included in
      arch/powerpc/kernel/exceptions-64s.S.  We no longer have any
      conditional branches between the exception prologs in
      exceptions-64s.S and the KVM handlers, so there is no need to
      keep their contents close together in the vmlinux image.
      
      In their current location, they are using up part of the limited
      space between the first-level interrupt handlers and the firmware
      NMI data area at offset 0x7000, and with some kernel configurations
      this area will overflow (e.g. allyesconfig), leading to an
      "attempt to .org backwards" error when compiling exceptions-64s.S.
      
      Moving them out requires that we add some #includes that the
      book3s_{,hv_}rmhandlers.S code was previously getting implicitly
      via exceptions-64s.S.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      177339d7
  2. 12 7月, 2011 6 次提交
    • P
      KVM: PPC: book3s_hv: Add support for PPC970-family processors · 9e368f29
      Paul Mackerras 提交于
      This adds support for running KVM guests in supervisor mode on those
      PPC970 processors that have a usable hypervisor mode.  Unfortunately,
      Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to
      1), but the YDL PowerStation does have a usable hypervisor mode.
      
      There are several differences between the PPC970 and POWER7 in how
      guests are managed.  These differences are accommodated using the
      CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature
      bits.  Notably, on PPC970:
      
      * The LPCR, LPID or RMOR registers don't exist, and the functions of
        those registers are provided by bits in HID4 and one bit in HID0.
      
      * External interrupts can be directed to the hypervisor, but unlike
        POWER7 they are masked by MSR[EE] in non-hypervisor modes and use
        SRR0/1 not HSRR0/1.
      
      * There is no virtual RMA (VRMA) mode; the guest must use an RMO
        (real mode offset) area.
      
      * The TLB entries are not tagged with the LPID, so it is necessary to
        flush the whole TLB on partition switch.  Furthermore, when switching
        partitions we have to ensure that no other CPU is executing the tlbie
        or tlbsync instructions in either the old or the new partition,
        otherwise undefined behaviour can occur.
      
      * The PMU has 8 counters (PMC registers) rather than 6.
      
      * The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist.
      
      * The SLB has 64 entries rather than 32.
      
      * There is no mediated external interrupt facility, so if we switch to
        a guest that has a virtual external interrupt pending but the guest
        has MSR[EE] = 0, we have to arrange to have an interrupt pending for
        it so that we can get control back once it re-enables interrupts.  We
        do that by sending ourselves an IPI with smp_send_reschedule after
        hard-disabling interrupts.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      9e368f29
    • P
      powerpc, KVM: Split HVMODE_206 cpu feature bit into separate HV and architecture bits · 969391c5
      Paul Mackerras 提交于
      This replaces the single CPU_FTR_HVMODE_206 bit with two bits, one to
      indicate that we have a usable hypervisor mode, and another to indicate
      that the processor conforms to PowerISA version 2.06.  We also add
      another bit to indicate that the processor conforms to ISA version 2.01
      and set that for PPC970 and derivatives.
      
      Some PPC970 chips (specifically those in Apple machines) have a
      hypervisor mode in that MSR[HV] is always 1, but the hypervisor mode
      is not useful in the sense that there is no way to run any code in
      supervisor mode (HV=0 PR=0).  On these processors, the LPES0 and LPES1
      bits in HID4 are always 0, and we use that as a way of detecting that
      hypervisor mode is not useful.
      
      Where we have a feature section in assembly code around code that
      only applies on POWER7 in hypervisor mode, we use a construct like
      
      END_FTR_SECTION_IFSET(CPU_FTR_HVMODE | CPU_FTR_ARCH_206)
      
      The definition of END_FTR_SECTION_IFSET is such that the code will
      be enabled (not overwritten with nops) only if all bits in the
      provided mask are set.
      
      Note that the CPU feature check in __tlbie() only needs to check the
      ARCH_206 bit, not the HVMODE bit, because __tlbie() can only get called
      if we are running bare-metal, i.e. in hypervisor mode.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      969391c5
    • P
      KVM: PPC: Allow book3s_hv guests to use SMT processor modes · 371fefd6
      Paul Mackerras 提交于
      This lifts the restriction that book3s_hv guests can only run one
      hardware thread per core, and allows them to use up to 4 threads
      per core on POWER7.  The host still has to run single-threaded.
      
      This capability is advertised to qemu through a new KVM_CAP_PPC_SMT
      capability.  The return value of the ioctl querying this capability
      is the number of vcpus per virtual CPU core (vcore), currently 4.
      
      To use this, the host kernel should be booted with all threads
      active, and then all the secondary threads should be offlined.
      This will put the secondary threads into nap mode.  KVM will then
      wake them from nap mode and use them for running guest code (while
      they are still offline).  To wake the secondary threads, we send
      them an IPI using a new xics_wake_cpu() function, implemented in
      arch/powerpc/sysdev/xics/icp-native.c.  In other words, at this stage
      we assume that the platform has a XICS interrupt controller and
      we are using icp-native.c to drive it.  Since the woken thread will
      need to acknowledge and clear the IPI, we also export the base
      physical address of the XICS registers using kvmppc_set_xics_phys()
      for use in the low-level KVM book3s code.
      
      When a vcpu is created, it is assigned to a virtual CPU core.
      The vcore number is obtained by dividing the vcpu number by the
      number of threads per core in the host.  This number is exported
      to userspace via the KVM_CAP_PPC_SMT capability.  If qemu wishes
      to run the guest in single-threaded mode, it should make all vcpu
      numbers be multiples of the number of threads per core.
      
      We distinguish three states of a vcpu: runnable (i.e., ready to execute
      the guest), blocked (that is, idle), and busy in host.  We currently
      implement a policy that the vcore can run only when all its threads
      are runnable or blocked.  This way, if a vcpu needs to execute elsewhere
      in the kernel or in qemu, it can do so without being starved of CPU
      by the other vcpus.
      
      When a vcore starts to run, it executes in the context of one of the
      vcpu threads.  The other vcpu threads all go to sleep and stay asleep
      until something happens requiring the vcpu thread to return to qemu,
      or to wake up to run the vcore (this can happen when another vcpu
      thread goes from busy in host state to blocked).
      
      It can happen that a vcpu goes from blocked to runnable state (e.g.
      because of an interrupt), and the vcore it belongs to is already
      running.  In that case it can start to run immediately as long as
      the none of the vcpus in the vcore have started to exit the guest.
      We send the next free thread in the vcore an IPI to get it to start
      to execute the guest.  It synchronizes with the other threads via
      the vcore->entry_exit_count field to make sure that it doesn't go
      into the guest if the other vcpus are exiting by the time that it
      is ready to actually enter the guest.
      
      Note that there is no fixed relationship between the hardware thread
      number and the vcpu number.  Hardware threads are assigned to vcpus
      as they become runnable, so we will always use the lower-numbered
      hardware threads in preference to higher-numbered threads if not all
      the vcpus in the vcore are runnable, regardless of which vcpus are
      runnable.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      371fefd6
    • P
      KVM: PPC: Add support for Book3S processors in hypervisor mode · de56a948
      Paul Mackerras 提交于
      This adds support for KVM running on 64-bit Book 3S processors,
      specifically POWER7, in hypervisor mode.  Using hypervisor mode means
      that the guest can use the processor's supervisor mode.  That means
      that the guest can execute privileged instructions and access privileged
      registers itself without trapping to the host.  This gives excellent
      performance, but does mean that KVM cannot emulate a processor
      architecture other than the one that the hardware implements.
      
      This code assumes that the guest is running paravirtualized using the
      PAPR (Power Architecture Platform Requirements) interface, which is the
      interface that IBM's PowerVM hypervisor uses.  That means that existing
      Linux distributions that run on IBM pSeries machines will also run
      under KVM without modification.  In order to communicate the PAPR
      hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
      to include/linux/kvm.h.
      
      Currently the choice between book3s_hv support and book3s_pr support
      (i.e. the existing code, which runs the guest in user mode) has to be
      made at kernel configuration time, so a given kernel binary can only
      do one or the other.
      
      This new book3s_hv code doesn't support MMIO emulation at present.
      Since we are running paravirtualized guests, this isn't a serious
      restriction.
      
      With the guest running in supervisor mode, most exceptions go straight
      to the guest.  We will never get data or instruction storage or segment
      interrupts, alignment interrupts, decrementer interrupts, program
      interrupts, single-step interrupts, etc., coming to the hypervisor from
      the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
      exception entry path so that we don't have to do the KVM test on entry
      to those exception handlers.
      
      We do however get hypervisor decrementer, hypervisor data storage,
      hypervisor instruction storage, and hypervisor emulation assist
      interrupts, so we have to handle those.
      
      In hypervisor mode, real-mode accesses can access all of RAM, not just
      a limited amount.  Therefore we put all the guest state in the vcpu.arch
      and use the shadow_vcpu in the PACA only for temporary scratch space.
      We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
      anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
      We don't have a shared page with the guest, but we still need a
      kvm_vcpu_arch_shared struct to store the values of various registers,
      so we include one in the vcpu_arch struct.
      
      The POWER7 processor has a restriction that all threads in a core have
      to be in the same partition.  MMU-on kernel code counts as a partition
      (partition 0), so we have to do a partition switch on every entry to and
      exit from the guest.  At present we require the host and guest to run
      in single-thread mode because of this hardware restriction.
      
      This code allocates a hashed page table for the guest and initializes
      it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
      require that the guest memory is allocated using 16MB huge pages, in
      order to simplify the low-level memory management.  This also means that
      we can get away without tracking paging activity in the host for now,
      since huge pages can't be paged or swapped.
      
      This also adds a few new exports needed by the book3s_hv code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      de56a948
    • P
      KVM: PPC: Split host-state fields out of kvmppc_book3s_shadow_vcpu · 3c42bf8a
      Paul Mackerras 提交于
      There are several fields in struct kvmppc_book3s_shadow_vcpu that
      temporarily store bits of host state while a guest is running,
      rather than anything relating to the particular guest or vcpu.
      This splits them out into a new kvmppc_host_state structure and
      modifies the definitions in asm-offsets.c to suit.
      
      On 32-bit, we have a kvmppc_host_state structure inside the
      kvmppc_book3s_shadow_vcpu since the assembly code needs to be able
      to get to them both with one pointer.  On 64-bit they are separate
      fields in the PACA.  This means that on 64-bit we don't need to
      copy the kvmppc_host_state in and out on vcpu load/unload, and
      in future will mean that the book3s_hv code doesn't need a
      shadow_vcpu struct in the PACA at all.  That does mean that we
      have to be careful not to rely on any values persisting in the
      hstate field of the paca across any point where we could block
      or get preempted.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3c42bf8a
    • P
      powerpc, KVM: Rework KVM checks in first-level interrupt handlers · b01c8b54
      Paul Mackerras 提交于
      Instead of branching out-of-line with the DO_KVM macro to check if we
      are in a KVM guest at the time of an interrupt, this moves the KVM
      check inline in the first-level interrupt handlers.  This speeds up
      the non-KVM case and makes sure that none of the interrupt handlers
      are missing the check.
      
      Because the first-level interrupt handlers are now larger, some things
      had to be move out of line in exceptions-64s.S.
      
      This all necessitated some minor changes to the interrupt entry code
      in KVM.  This also streamlines the book3s_32 KVM test.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b01c8b54
  3. 19 5月, 2011 1 次提交
    • A
      powerpc: Improve scheduling of system call entry instructions · f5f0307f
      Anton Blanchard 提交于
      After looking at our system call path, Mary Brown suggested that we
      should put all mfspr SRR* instructions before any mtspr SRR*.
      
      To test this I used a very simple null syscall (actually getppid)
      testcase at http://ozlabs.org/~anton/junkcode/null_syscall.c
      
      I tested with the following changes against the pseries_defconfig:
      
      CONFIG_VIRT_CPU_ACCOUNTING=n
      CONFIG_AUDIT=n
      
      to remove the overhead of virtual CPU accounting and syscall
      auditing.
      
      POWER6:
      baseline:       mean = 757.2 cycles       sd = 2.108
      modified:       mean = 759.1 cycles       sd = 2.020
      
      POWER7:
      baseline:       mean = 411.4 cycles       sd = 0.138
      modified:       mean = 404.1 cycles       sd = 0.109
      
      So we have 1.77% improvement on POWER7 which looks significant. The
      POWER6 suggest a 0.25% slowdown, but the results are within 1
      standard deviation and may be in the noise.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f5f0307f
  4. 04 5月, 2011 2 次提交
    • P
      powerpc: Save Come-From Address Register (CFAR) in exception frame · 48404f2e
      Paul Mackerras 提交于
      Recent 64-bit server processors (POWER6 and POWER7) have a "Come-From
      Address Register" (CFAR), that records the address of the most recent
      branch or rfid (return from interrupt) instruction for debugging purposes.
      
      This saves the value of the CFAR in the exception entry code and stores
      it in the exception frame.  We also make xmon print the CFAR value in
      its register dump code.
      
      Rather than extend the pt_regs struct at this time, we steal the orig_gpr3
      field, which is only used for system calls, and use it for the CFAR value
      for all exceptions/interrupts other than system calls.  This means we
      don't save the CFAR on system calls, which is not a great problem since
      system calls tend not to happen unexpectedly, and also avoids adding the
      overhead of reading the CFAR to the system call entry path.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      48404f2e
    • P
      powerpc: Save register r9-r13 values accurately on interrupt with bad stack · 1977b502
      Paul Mackerras 提交于
      When we take an interrupt or exception from kernel mode and the stack
      pointer is obviously not a kernel address (i.e. the top bit is 0), we
      switch to an emergency stack, save register values and panic.  However,
      on 64-bit server machines, we don't actually save the values of r9 - r13
      at the time of the interrupt, but rather values corrupted by the
      exception entry code for r12-r13, and nothing at all for r9-r11.
      
      This fixes it by passing a pointer to the register save area in the paca
      through to the bad_stack code in r3.  The register values are saved in
      one of the paca register save areas (depending on which exception this
      is).  Using the pointer in r3, the bad_stack code now retrieves the
      saved values of r9 - r13 and stores them in the exception frame on the
      emergency stack.  This also stores the normal exception frame marker
      ("regshere") in the exception frame.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1977b502
  5. 27 4月, 2011 1 次提交
  6. 20 4月, 2011 5 次提交
  7. 31 3月, 2011 1 次提交
  8. 30 3月, 2011 1 次提交
  9. 29 11月, 2010 1 次提交
  10. 24 10月, 2010 1 次提交
  11. 07 10月, 2010 1 次提交
    • D
      Fix IRQ flag handling naming · df9ee292
      David Howells 提交于
      Fix the IRQ flag handling naming.  In linux/irqflags.h under one configuration,
      it maps:
      
      	local_irq_enable() -> raw_local_irq_enable()
      	local_irq_disable() -> raw_local_irq_disable()
      	local_irq_save() -> raw_local_irq_save()
      	...
      
      and under the other configuration, it maps:
      
      	raw_local_irq_enable() -> local_irq_enable()
      	raw_local_irq_disable() -> local_irq_disable()
      	raw_local_irq_save() -> local_irq_save()
      	...
      
      This is quite confusing.  There should be one set of names expected of the
      arch, and this should be wrapped to give another set of names that are expected
      by users of this facility.
      
      Change this to have the arch provide:
      
      	flags = arch_local_save_flags()
      	flags = arch_local_irq_save()
      	arch_local_irq_restore(flags)
      	arch_local_irq_disable()
      	arch_local_irq_enable()
      	arch_irqs_disabled_flags(flags)
      	arch_irqs_disabled()
      	arch_safe_halt()
      
      Then linux/irqflags.h wraps these to provide:
      
      	raw_local_save_flags(flags)
      	raw_local_irq_save(flags)
      	raw_local_irq_restore(flags)
      	raw_local_irq_disable()
      	raw_local_irq_enable()
      	raw_irqs_disabled_flags(flags)
      	raw_irqs_disabled()
      	raw_safe_halt()
      
      with type checking on the flags 'arguments', and then wraps those to provide:
      
      	local_save_flags(flags)
      	local_irq_save(flags)
      	local_irq_restore(flags)
      	local_irq_disable()
      	local_irq_enable()
      	irqs_disabled_flags(flags)
      	irqs_disabled()
      	safe_halt()
      
      with tracing included if enabled.
      
      The arch functions can now all be inline functions rather than some of them
      having to be macros.
      
      Signed-off-by: David Howells <dhowells@redhat.com> [X86, FRV, MN10300]
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> [Tile]
      Signed-off-by: Michal Simek <monstr@monstr.eu> [Microblaze]
      Tested-by: Catalin Marinas <catalin.marinas@arm.com> [ARM]
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> [AVR]
      Acked-by: Tony Luck <tony.luck@intel.com> [IA-64]
      Acked-by: Hirokazu Takata <takata@linux-m32r.org> [M32R]
      Acked-by: Greg Ungerer <gerg@uclinux.org> [M68K/M68KNOMMU]
      Acked-by: Ralf Baechle <ralf@linux-mips.org> [MIPS]
      Acked-by: Kyle McMartin <kyle@mcmartin.ca> [PA-RISC]
      Acked-by: Paul Mackerras <paulus@samba.org> [PowerPC]
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [S390]
      Acked-by: Chen Liqin <liqin.chen@sunplusct.com> [Score]
      Acked-by: Matt Fleming <matt@console-pimps.org> [SH]
      Acked-by: David S. Miller <davem@davemloft.net> [Sparc]
      Acked-by: Chris Zankel <chris@zankel.net> [Xtensa]
      Reviewed-by: Richard Henderson <rth@twiddle.net> [Alpha]
      Reviewed-by: Yoshinori Sato <ysato@users.sourceforge.jp> [H8300]
      Cc: starvik@axis.com [CRIS]
      Cc: jesper.nilsson@axis.com [CRIS]
      Cc: linux-cris-kernel@axis.com
      df9ee292
  12. 22 6月, 2010 1 次提交
    • K
      powerpc, hw_breakpoints: Implement hw_breakpoints for 64-bit server processors · 5aae8a53
      K.Prasad 提交于
      Implement perf-events based hw-breakpoint interfaces for PowerPC
      64-bit server (Book III S) processors.  This allows access to a
      given location to be used as an event that can be counted or
      profiled by the perf_events subsystem.
      
      This is done using the DABR (data breakpoint register), which can
      also be used for process debugging via ptrace.  When perf_event
      hw_breakpoint support is configured in, the perf_event subsystem
      manages the DABR and arbitrates access to it, and ptrace then
      creates a perf_event when it is requested to set a data breakpoint.
      
      [Adopted suggestions from Paul Mackerras <paulus@samba.org> to
      - emulate_step() all system-wide breakpoints and single-step only the
        per-task breakpoints
      - perform arch-specific cleanup before unregistration through
        arch_unregister_hw_breakpoint()
      ]
      Signed-off-by: NK.Prasad <prasad@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      5aae8a53
  13. 07 4月, 2010 1 次提交
  14. 05 11月, 2009 1 次提交
  15. 28 10月, 2009 1 次提交
  16. 20 8月, 2009 3 次提交
    • B
      powerpc: Remove use of a second scratch SPRG in STAB code · c5a8c0c9
      Benjamin Herrenschmidt 提交于
      The STAB code used on Power3 and RS/64 uses a second scratch SPRG to
      save a GPR in order to decide whether to go to do_stab_bolted_* or
      to handle a normal data access exception.
      
      This prevents our scheme of freeing SPRG3 which is user visible for
      user uses since we cannot use SPRG0 which, on RS/64, seems to be
      read-only for supervisor mode (like POWER4).
      
      This reworks the STAB exception entry to use the PACA as temporary
      storage instead.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      c5a8c0c9
    • B
      powerpc: Use names rather than numbers for SPRGs (v2) · ee43eb78
      Benjamin Herrenschmidt 提交于
      The kernel uses SPRG registers for various purposes, typically in
      low level assembly code as scratch registers or to hold per-cpu
      global infos such as the PACA or the current thread_info pointer.
      
      We want to be able to easily shuffle the usage of those registers
      as some implementations have specific constraints realted to some
      of them, for example, some have userspace readable aliases, etc..
      and the current choice isn't always the best.
      
      This patch should not change any code generation, and replaces the
      usage of SPRN_SPRGn everywhere in the kernel with a named replacement
      and adds documentation next to the definition of the names as to
      what those are used for on each processor family.
      
      The only parts that still use the original numbers are bits of KVM
      or suspend/resume code that just blindly needs to save/restore all
      the SPRGs.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ee43eb78
    • B
      powerpc: Rename exception.h to exception-64s.h · 8aa34ab8
      Benjamin Herrenschmidt 提交于
      The file include/asm/exception.h contains definitions
      that are specific to exception handling on 64-bit server
      type processors.
      
      This renames the file to exception-64s.h to reflect that
      fact and avoid confusion.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8aa34ab8
  17. 18 8月, 2009 1 次提交
    • P
      powerpc: Allow perf_counters to access user memory at interrupt time · 9c1e1052
      Paul Mackerras 提交于
      This provides a mechanism to allow the perf_counters code to access
      user memory in a PMU interrupt routine.  Such an access can cause
      various kinds of interrupt: SLB miss, MMU hash table miss, segment
      table miss, or TLB miss, depending on the processor.  This commit
      only deals with 64-bit classic/server processors, which use an MMU
      hash table.  32-bit processors are already able to access user memory
      at interrupt time.  Since we don't soft-disable on 32-bit, we avoid
      the possibility of reentering hash_page or the TLB miss handlers,
      since they run with interrupts disabled.
      
      On 64-bit processors, an SLB miss interrupt on a user address will
      update the slb_cache and slb_cache_ptr fields in the paca.  This is
      OK except in the case where a PMU interrupt occurs in switch_slb,
      which also accesses those fields.  To prevent this, we hard-disable
      interrupts in switch_slb.  Interrupts are already soft-disabled at
      this point, and will get hard-enabled when they get soft-enabled
      later.
      
      This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice,
      and to make sure that it clears the slb_cache_ptr when called from
      other callers than switch_slb, the existing routine is renamed to
      __slb_flush_and_rebolt, which is called by switch_slb and the new
      version of slb_flush_and_rebolt.
      
      Similarly, switch_stab (used on POWER3 and RS64 processors) gets a
      hard_irq_disable() to protect the per-cpu variables used there and
      in ste_allocate.
      
      If a MMU hashtable miss interrupt occurs, normally we would call
      hash_page to look up the Linux PTE for the address and create a HPTE.
      However, hash_page is fairly complex and takes some locks, so to
      avoid the possibility of deadlock, we check the preemption count
      to see if we are in a (pseudo-)NMI handler, and if so, we don't call
      hash_page but instead treat it like a bad access that will get
      reported up through the exception table mechanism.  An interrupt
      whose handler runs even though the interrupt occurred when
      soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI
      handler, which should use nmi_enter()/nmi_exit() rather than
      irq_enter()/irq_exit().
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      9c1e1052
  18. 09 6月, 2009 2 次提交
  19. 11 3月, 2009 1 次提交
    • B
      powerpc/kconfig: Kill PPC_MULTIPLATFORM · 28794d34
      Benjamin Herrenschmidt 提交于
      CONFIG_PPC_MULTIPLATFORM is a remain of the pre-powerpc days and isn't
      really meaningful anymore. It was basically equivalent to PPC64 || 6xx.
      
      This removes it along with the following changes:
      
       - 32-bit platforms that relied on PPC32 && PPC_MULTIPLATFORM now rely
         on 6xx which is what they want anyway.
      
       - A new symbol, PPC_BOOK3S, is defined that represent compliance with
         the "Server" variant of the architecture. This is set when either 6xx
         or PPC64 is set and open the door for future BOOK3E 64-bit.
      
       - 64-bit platforms that relied on PPC64 && PPC_MULTIPLATFORM now use
         PPC64 && PPC_BOOK3S
      
       - A separate and selectable CONFIG_PPC_OF_BOOT_TRAMPOLINE option is now
         used to control the use of prom_init.c
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      28794d34
  20. 13 1月, 2009 1 次提交
    • B
      powerpc/powermac: Fix occasional SMP boot failure · c478b581
      Benjamin Herrenschmidt 提交于
      The PowerMac kernel occasionally fails to bring up the secondary CPUs on
      SMP, the trigger factor seem to be fairly random and related to location
      of code and data.
      
      This appears to be due to the initial loading of the TOC value by the
      secondary processor which now happens before we clear HID4:RM_CI (Real
      Mode Cache Invalidate). This bit should really be cleared before we do
      any load or store other than fetching code.
      
      This fix works based on the assumption that all SMP 64-bit PowerMacs use
      variants of the 970, which fortunately is true, by explicitely clearing
      that bit, adding an slbia for good measure as RM_CI mode is known to
      create bogus ERAT entries.
      
      I also removed some spurrious debug output that was left enabled by
      mistake while at it.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      c478b581
  21. 31 10月, 2008 1 次提交
    • M
      powerpc/ppc64/kdump: Better flag for running relocatable · 8b8b0cc1
      Milton Miller 提交于
      The __kdump_flag ABI is overly constraining for future development.
      
      As of 2.6.27, the kernel entry point has 4 constraints:  Offset 0 is
      the starting point for the master (boot) cpu (entered with r3 pointing
      to the device tree structure), offset 0x60 is code for the slave cpus
      (entered with r3 set to their device tree physical id), offset 0x20 is
      used by the iseries hypervisor, and secondary cpus must be well behaved
      when the first 256 bytes are copied to address 0.
      
      Placing the __kdump_flag at 0x18 is bad because:
      
      - It was taking the last 8 bytes before the iseries hypervisor data.
      - It was 8 bytes for a boolean flag
      - It had no way of identifying that the flag was present
      - It does leave any room for the master to add any additional code
        before branching, which hurts debug.
      - It will be unnecessarily hard for 32 bit code to be common (8 bytes)
      
      Now that we have eliminated the use of __kdump_flag in favor of
      the standard is_kdump_kernel(), this flag only controls run without
      relocating the kernel to PHYSICAL_START (0), so rename it __run_at_load.
      
      Move the flag to 0x5c, 1 word before the secondary cpu entry point at
      0x60.  Initialize it with "run0" to say it will run at 0 unless it is
      set to 1.  It only exists if we are relocatable.
      Signed-off-by: NMilton Miller <miltonm@bga.com>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      8b8b0cc1
  22. 22 10月, 2008 1 次提交
    • M
      powerpc: Support for relocatable kdump kernel · 54622f10
      Mohan Kumar M 提交于
      This adds relocatable kernel support for kdump. With this one can
      use the same regular kernel to capture the kdump. A signature (0xfeed1234)
      is passed in r6 from panic code to the next kernel through kexec_sequence
      and purgatory code. The signature is used to differentiate between
      kdump kernel and non-kdump kernels.
      
      The purgatory code compares the signature and sets the __kdump_flag in
      head_64.S.  During the boot up, kernel code checks __kdump_flag and if it
      is set, the kernel will behave as relocatable kdump kernel. This kernel
      will boot at the address where it was loaded by kexec-tools ie. at the
      address reserved through crashkernel boot parameter.
      
      CONFIG_CRASH_DUMP depends on CONFIG_RELOCATABLE option to build kdump
      kernel as relocatable. So the same kernel can be used as production and
      kdump kernel.
      
      This patch incorporates the changes suggested by Paul Mackerras to avoid
      GOT use and to avoid two copies of the code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMohan Kumar M <mohan@in.ibm.com>
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      54622f10
  23. 16 9月, 2008 4 次提交
    • P
      powerpc: Make the 64-bit kernel as a position-independent executable · 549e8152
      Paul Mackerras 提交于
      This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as
      a position-independent executable (PIE) when it is set.  This involves
      processing the dynamic relocations in the image in the early stages of
      booting, even if the kernel is being run at the address it is linked at,
      since the linker does not necessarily fill in words in the image for
      which there are dynamic relocations.  (In fact the linker does fill in
      such words for 64-bit executables, though not for 32-bit executables,
      so in principle we could avoid calling relocate() entirely when we're
      running a 64-bit kernel at the linked address.)
      
      The dynamic relocations are processed by a new function relocate(addr),
      where the addr parameter is the virtual address where the image will be
      run.  In fact we call it twice; once before calling prom_init, and again
      when starting the main kernel.  This means that reloc_offset() returns
      0 in prom_init (since it has been relocated to the address it is running
      at), which necessitated a few adjustments.
      
      This also changes __va and __pa to use an equivalent definition that is
      simpler.  With the relocatable kernel, PAGE_OFFSET and MEMORY_START are
      constants (for 64-bit) whereas PHYSICAL_START is a variable (and
      KERNELBASE ideally should be too, but isn't yet).
      
      With this, relocatable kernels still copy themselves down to physical
      address 0 and run there.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      549e8152
    • P
      powerpc: Use LOAD_REG_IMMEDIATE only for constants on 64-bit · e31aa453
      Paul Mackerras 提交于
      Using LOAD_REG_IMMEDIATE to get the address of kernel symbols
      generates 5 instructions where LOAD_REG_ADDR can do it in one,
      and will generate R_PPC64_ADDR16_* relocations in the output when
      we get to making the kernel as a position-independent executable,
      which we'd rather not have to handle.  This changes various bits
      of assembly code to use LOAD_REG_ADDR when we need to get the
      address of a symbol, or to use suitable position-independent code
      for cases where we can't access the TOC for various reasons, or
      if we're not running at the address we were linked at.
      
      It also cleans up a few minor things; there's no reason to save and
      restore SRR0/1 around RTAS calls, __mmu_off can get the return
      address from LR more conveniently than the caller can supply it in
      R4 (and we already assume elsewhere that EA == RA if the MMU is on
      in early boot), and enable_64b_mode was using 5 instructions where
      2 would do.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      e31aa453
    • P
      powerpc: Make it possible to move the interrupt handlers away from the kernel · 1f6a93e4
      Paul Mackerras 提交于
      This changes the way that the exception prologs transfer control to
      the handlers in 64-bit kernels with the aim of making it possible to
      have the prologs separate from the main body of the kernel.  Now,
      instead of computing the address of the handler by taking the top
      32 bits of the paca address (to get the 0xc0000000........ part) and
      ORing in something in the bottom 16 bits, we get the base address of
      the kernel by doing a load from the paca and add an offset.
      
      This also replaces an mfmsr and an ori to compute the MSR value for
      the handler with a load from the paca.  That makes it unnecessary to
      have a separate version of EXCEPTION_PROLOG_PSERIES that forces 64-bit
      mode.
      
      We can no longer use a direct branches in the exception prolog code,
      which means that the SLB miss handlers can't branch directly to
      .slb_miss_realmode any more.  Instead we have to compute the address
      and do an indirect branch.  This is conditional on CONFIG_RELOCATABLE;
      for non-relocatable kernels we use a direct branch as before.  (A later
      change will allow CONFIG_RELOCATABLE to be set on 64-bit powerpc.)
      
      Since the secondary CPUs on pSeries start execution in the first 0x100
      bytes of real memory and then have to get to wherever the kernel is,
      we can't use a direct branch to get there.  Instead this changes
      __secondary_hold_spinloop from a flag to a function pointer.  When it
      is set to a non-NULL value, the secondary CPUs jump to the function
      pointed to by that value.
      
      Finally this eliminates one code difference between 32-bit and 64-bit
      by making __secondary_hold be the text address of the secondary CPU
      spinloop rather than a function descriptor for it.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      1f6a93e4
    • P
      powerpc: Rearrange head_64.S to move interrupt handler code to the beginning · 9a955167
      Paul Mackerras 提交于
      This rearranges head_64.S so that we have all the first-level exception
      prologs together starting at 0x100, followed by all the second-level
      handlers that are invoked from the first-level prologs, followed by
      other code.  This doesn't make any functional change but will make
      following changes for relocatable kernel support easier.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      9a955167
  24. 15 7月, 2008 1 次提交