1. 17 5月, 2010 2 次提交
  2. 01 3月, 2010 3 次提交
    • A
      KVM: PPC: Keep SRR1 flags around in shadow_msr · f7adbba1
      Alexander Graf 提交于
      SRR1 stores more information that just the MSR value. It also stores
      valuable information about the type of interrupt we received, for
      example whether the storage interrupt we just got was because of a
      missing htab entry or not.
      
      We use that information to speed up the exit path.
      
      Now if we get preempted before we can interpret the shadow_msr values,
      we get into vcpu_put which then calls the MSR handler, which then sets
      all the SRR1 information bits in shadow_msr to 0. Great.
      
      So let's preserve the SRR1 specific bits in shadow_msr whenever we set
      the MSR. They don't hurt.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      f7adbba1
    • A
      KVM: PPC: Call SLB patching code in interrupt safe manner · 021ec9c6
      Alexander Graf 提交于
      Currently we're racy when doing the transition from IR=1 to IR=0, from
      the module memory entry code to the real mode SLB switching code.
      
      To work around that I took a look at the RTAS entry code which is faced
      with a similar problem and did the same thing:
      
        A small helper in linear mapped memory that does mtmsr with IR=0 and
        then RFIs info the actual handler.
      
      Thanks to that trick we can safely take page faults in the entry code
      and only need to be really wary of what to do as of the SLB switching
      part.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      021ec9c6
    • A
      KVM: PPC: Use PACA backed shadow vcpu · 7e57cba0
      Alexander Graf 提交于
      We're being horribly racy right now. All the entry and exit code hijacks
      random fields from the PACA that could easily be used by different code in
      case we get interrupted, for example by a #MC or even page fault.
      
      After discussing this with Ben, we figured it's best to reserve some more
      space in the PACA and just shove off some vcpu state to there.
      
      That way we can drastically improve the readability of the code, make it
      less racy and less complex.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      7e57cba0
  3. 21 11月, 2009 1 次提交
  4. 05 11月, 2009 2 次提交
  5. 21 9月, 2009 1 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  6. 20 8月, 2009 2 次提交
  7. 18 8月, 2009 1 次提交
    • P
      powerpc: Allow perf_counters to access user memory at interrupt time · 9c1e1052
      Paul Mackerras 提交于
      This provides a mechanism to allow the perf_counters code to access
      user memory in a PMU interrupt routine.  Such an access can cause
      various kinds of interrupt: SLB miss, MMU hash table miss, segment
      table miss, or TLB miss, depending on the processor.  This commit
      only deals with 64-bit classic/server processors, which use an MMU
      hash table.  32-bit processors are already able to access user memory
      at interrupt time.  Since we don't soft-disable on 32-bit, we avoid
      the possibility of reentering hash_page or the TLB miss handlers,
      since they run with interrupts disabled.
      
      On 64-bit processors, an SLB miss interrupt on a user address will
      update the slb_cache and slb_cache_ptr fields in the paca.  This is
      OK except in the case where a PMU interrupt occurs in switch_slb,
      which also accesses those fields.  To prevent this, we hard-disable
      interrupts in switch_slb.  Interrupts are already soft-disabled at
      this point, and will get hard-enabled when they get soft-enabled
      later.
      
      This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice,
      and to make sure that it clears the slb_cache_ptr when called from
      other callers than switch_slb, the existing routine is renamed to
      __slb_flush_and_rebolt, which is called by switch_slb and the new
      version of slb_flush_and_rebolt.
      
      Similarly, switch_stab (used on POWER3 and RS64 processors) gets a
      hard_irq_disable() to protect the per-cpu variables used there and
      in ste_allocate.
      
      If a MMU hashtable miss interrupt occurs, normally we would call
      hash_page to look up the Linux PTE for the address and create a HPTE.
      However, hash_page is fairly complex and takes some locks, so to
      avoid the possibility of deadlock, we check the preemption count
      to see if we are in a (pseudo-)NMI handler, and if so, we don't call
      hash_page but instead treat it like a bad access that will get
      reported up through the exception table mechanism.  An interrupt
      whose handler runs even though the interrupt occurred when
      soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI
      handler, which should use nmi_enter()/nmi_exit() rather than
      irq_enter()/irq_exit().
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      9c1e1052
  8. 09 6月, 2009 1 次提交
  9. 24 3月, 2009 1 次提交
  10. 11 3月, 2009 1 次提交
  11. 09 1月, 2009 1 次提交
    • P
      powerpc: Provide a way to defer perf counter work until interrupts are enabled · 93a6d3ce
      Paul Mackerras 提交于
      Because 64-bit powerpc uses lazy (soft) interrupt disabling, it is
      possible for a performance monitor exception to come in when the
      kernel thinks interrupts are disabled (i.e. when they are
      soft-disabled but hard-enabled).  In such a situation the performance
      monitor exception handler might have some processing to do (such as
      process wakeups) which can't be done in what is effectively an NMI
      handler.
      
      This provides a way to defer that work until interrupts get enabled,
      either in raw_local_irq_restore() or by returning from an interrupt
      handler to code that had interrupts enabled.  We have a per-processor
      flag that indicates that there is work pending to do when interrupts
      subsequently get re-enabled.  This flag is checked in the interrupt
      return path and in raw_local_irq_restore(), and if it is set,
      perf_counter_do_pending() is called to do the pending work.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      93a6d3ce
  12. 08 1月, 2009 1 次提交
  13. 31 12月, 2008 4 次提交
    • H
      KVM: ppc: Implement in-kernel exit timing statistics · 73e75b41
      Hollis Blanchard 提交于
      Existing KVM statistics are either just counters (kvm_stat) reported for
      KVM generally or trace based aproaches like kvm_trace.
      For KVM on powerpc we had the need to track the timings of the different exit
      types. While this could be achieved parsing data created with a kvm_trace
      extension this adds too much overhead (at least on embedded PowerPC) slowing
      down the workloads we wanted to measure.
      
      Therefore this patch adds a in-kernel exit timing statistic to the powerpc kvm
      code. These statistic is available per vm&vcpu under the kvm debugfs directory.
      As this statistic is low, but still some overhead it can be enabled via a
      .config entry and should be off by default.
      
      Since this patch touched all powerpc kvm_stat code anyway this code is now
      merged and simplified together with the exit timing statistic code (still
      working with exit timing disabled in .config).
      Signed-off-by: NChristian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
      Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      73e75b41
    • H
      KVM: ppc: directly insert shadow mappings into the hardware TLB · 7924bd41
      Hollis Blanchard 提交于
      Formerly, we used to maintain a per-vcpu shadow TLB and on every entry to the
      guest would load this array into the hardware TLB. This consumed 1280 bytes of
      memory (64 entries of 16 bytes plus a struct page pointer each), and also
      required some assembly to loop over the array on every entry.
      
      Instead of saving a copy in memory, we can just store shadow mappings directly
      into the hardware TLB, accepting that the host kernel will clobber these as
      part of the normal 440 TLB round robin. When we do that we need less than half
      the memory, and we have decreased the exit handling time for all guest exits,
      at the cost of increased number of TLB misses because the host overwrites some
      guest entries.
      
      These savings will be increased on processors with larger TLBs or which
      implement intelligent flush instructions like tlbivax (which will avoid the
      need to walk arrays in software).
      
      In addition to that and to the code simplification, we have a greater chance of
      leaving other host userspace mappings in the TLB, instead of forcing all
      subsequent tasks to re-fault all their mappings.
      Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      7924bd41
    • H
      KVM: ppc: create struct kvm_vcpu_44x and introduce container_of() accessor · db93f574
      Hollis Blanchard 提交于
      This patch doesn't yet move all 44x-specific data into the new structure, but
      is the first step down that path. In the future we may also want to create a
      struct kvm_vcpu_booke.
      
      Based on patch from Liu Yu <yu.liu@freescale.com>.
      Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      db93f574
    • H
      KVM: ppc: Rename "struct tlbe" to "struct kvmppc_44x_tlbe" · 0f55dc48
      Hollis Blanchard 提交于
      This will ease ports to other cores.
      
      Also remove unused "struct kvm_tlb" while we're at it.
      Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      0f55dc48
  14. 29 12月, 2008 1 次提交
  15. 21 12月, 2008 1 次提交
  16. 06 11月, 2008 1 次提交
    • P
      powerpc: Improve resolution of VDSO clock_gettime · 597bc5c0
      Paul Mackerras 提交于
      Currently the clock_gettime implementation in the VDSO produces a
      result with microsecond resolution for the cases that are handled
      without a system call, i.e. CLOCK_REALTIME and CLOCK_MONOTONIC.  The
      nanoseconds field of the result is obtained by computing a
      microseconds value and multiplying by 1000.
      
      This changes the code in the VDSO to do the computation for
      clock_gettime with nanosecond resolution.  That means that the
      resolution of the result will ultimately depend on the timebase
      frequency.
      
      Because the timestamp in the VDSO datapage (stamp_xsec, the real time
      corresponding to the timebase count in tb_orig_stamp) is in units of
      2^-20 seconds, it doesn't have sufficient resolution for computing a
      result with nanosecond resolution.  Therefore this adds a copy of
      xtime to the VDSO datapage and updates it in update_gtod() along with
      the other time-related fields.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      597bc5c0
  17. 15 10月, 2008 3 次提交
  18. 25 9月, 2008 1 次提交
    • B
      POWERPC: Allow 32-bit hashed pgtable code to support 36-bit physical · 4ee7084e
      Becky Bruce 提交于
      This rearranges a bit of code, and adds support for
      36-bit physical addressing for configs that use a
      hashed page table.  The 36b physical support is not
      enabled by default on any config - it must be
      explicitly enabled via the config system.
      
      This patch *only* expands the page table code to accomodate
      large physical addresses on 32-bit systems and enables the
      PHYS_64BIT config option for 86xx.  It does *not*
      allow you to boot a board with more than about 3.5GB of
      RAM - for that, SWIOTLB support is also required (and
      coming soon).
      Signed-off-by: NBecky Bruce <becky.bruce@freescale.com>
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      4ee7084e
  19. 16 9月, 2008 1 次提交
    • P
      powerpc: Make it possible to move the interrupt handlers away from the kernel · 1f6a93e4
      Paul Mackerras 提交于
      This changes the way that the exception prologs transfer control to
      the handlers in 64-bit kernels with the aim of making it possible to
      have the prologs separate from the main body of the kernel.  Now,
      instead of computing the address of the handler by taking the top
      32 bits of the paca address (to get the 0xc0000000........ part) and
      ORing in something in the bottom 16 bits, we get the base address of
      the kernel by doing a load from the paca and add an offset.
      
      This also replaces an mfmsr and an ori to compute the MSR value for
      the handler with a load from the paca.  That makes it unnecessary to
      have a separate version of EXCEPTION_PROLOG_PSERIES that forces 64-bit
      mode.
      
      We can no longer use a direct branches in the exception prolog code,
      which means that the SLB miss handlers can't branch directly to
      .slb_miss_realmode any more.  Instead we have to compute the address
      and do an indirect branch.  This is conditional on CONFIG_RELOCATABLE;
      for non-relocatable kernels we use a direct branch as before.  (A later
      change will allow CONFIG_RELOCATABLE to be set on 64-bit powerpc.)
      
      Since the secondary CPUs on pSeries start execution in the first 0x100
      bytes of real memory and then have to get to wherever the kernel is,
      we can't use a direct branch to get there.  Instead this changes
      __secondary_hold_spinloop from a flag to a function pointer.  When it
      is set to a non-NULL value, the secondary CPUs jump to the function
      pointed to by that value.
      
      Finally this eliminates one code difference between 32-bit and 64-bit
      by making __secondary_hold be the text address of the secondary CPU
      spinloop rather than a function descriptor for it.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      1f6a93e4
  20. 01 7月, 2008 1 次提交
    • M
      powerpc: Introduce VSX thread_struct and CONFIG_VSX · c6e6771b
      Michael Neuling 提交于
      The layout of the new VSR registers and how they overlap on top of the
      legacy FPR and VR registers is:
      
                         VSR doubleword 0               VSR doubleword 1
                ----------------------------------------------------------------
        VSR[0]  |             FPR[0]            |                              |
                ----------------------------------------------------------------
        VSR[1]  |             FPR[1]            |                              |
                ----------------------------------------------------------------
                |              ...              |                              |
                |              ...              |                              |
                ----------------------------------------------------------------
        VSR[30] |             FPR[30]           |                              |
                ----------------------------------------------------------------
        VSR[31] |             FPR[31]           |                              |
                ----------------------------------------------------------------
        VSR[32] |                             VR[0]                            |
                ----------------------------------------------------------------
        VSR[33] |                             VR[1]                            |
                ----------------------------------------------------------------
                |                              ...                             |
                |                              ...                             |
                ----------------------------------------------------------------
        VSR[62] |                             VR[30]                           |
                ----------------------------------------------------------------
        VSR[63] |                             VR[31]                           |
                ----------------------------------------------------------------
      
      VSX has 64 128bit registers.  The first 32 regs overlap with the FP
      registers and hence extend them with and additional 64 bits.  The
      second 32 regs overlap with the VMX registers.
      
      This commit introduces the thread_struct changes required to reflect
      this register layout.  Ptrace and signals code is updated so that the
      floating point registers are correctly accessed from the thread_struct
      when CONFIG_VSX is enabled.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      c6e6771b
  21. 03 6月, 2008 1 次提交
    • K
      [POWERPC] 40x/Book-E: Save/restore volatile exception registers · fca622c5
      Kumar Gala 提交于
      On machines with more than one exception level any system register that
      might be modified by the "normal" exception level needs to be saved and
      restored on taking a higher level exception.  We already are saving
      and restoring ESR and DEAR.
      
      For critical level add SRR0/1.
      For debug level add CSRR0/1 and SRR0/1.
      For machine check level add DSRR0/1, CSRR0/1, and SRR0/1.
      
      On FSL Book-E parts we always save/restore the MAS registers for critical,
      debug, and machine check level exceptions.  On 44x we always save/restore
      the MMUCR.
      
      Additionally, we save and restore the ksp_limit since we have to adjust it
      for each exception level.
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      fca622c5
  22. 29 4月, 2008 2 次提交
  23. 27 4月, 2008 1 次提交
  24. 24 4月, 2008 1 次提交
  25. 15 4月, 2008 1 次提交
  26. 26 3月, 2008 1 次提交
  27. 08 2月, 2008 1 次提交
  28. 06 2月, 2008 1 次提交
  29. 07 12月, 2007 1 次提交