1. 29 7月, 2010 1 次提交
    • P
      powerpc: Rework VDSO gettimeofday to prevent time going backwards · 0e469db8
      Paul Mackerras 提交于
      Currently it is possible for userspace to see the result of
      gettimeofday() going backwards by 1 microsecond, assuming that
      userspace is using the gettimeofday() in the VDSO.  The VDSO
      gettimeofday() algorithm computes the time in "xsecs", which are
      units of 2^-20 seconds, or approximately 0.954 microseconds,
      using the algorithm
      
      	now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec
      
      and then converts the time in xsecs to seconds and microseconds.
      
      The kernel updates the tb_orig_stamp and stamp_xsec values every
      tick in update_vsyscall().  If the length of the tick is not an
      integer number of xsecs, then some precision is lost in converting
      the current time to xsecs.  For example, with CONFIG_HZ=1000, the
      tick is 1ms long, which is 1048.576 xsecs.  That means that
      stamp_xsec will advance by either 1048 or 1049 on each tick.
      With the right conditions, it is possible for userspace to get
      (timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is
      slightly late in updating the vdso_datapage, and then for stamp_xsec
      to advance by 1048 when the kernel does update it, and for userspace
      to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to
      integer truncation.  The result is that time appears to go backwards
      by 1 microsecond.
      
      To fix this we change the VDSO gettimeofday to use a new field in the
      VDSO datapage which stores the nanoseconds part of the time as a
      fractional number of seconds in a 0.32 binary fraction format.
      (Or put another way, as a 32-bit number in units of 0.23283 ns.)
      This is convenient because we can use the mulhwu instruction to
      convert it to either microseconds or nanoseconds.
      
      Since it turns out that computing the time of day using this new field
      is simpler than either using stamp_xsec (as gettimeofday does) or
      stamp_xtime.tv_nsec (as clock_gettime does), this converts both
      gettimeofday and clock_gettime to use the new field.  The existing
      __do_get_tspec function is converted to use the new field and take
      a parameter in r7 that indicates the desired resolution, 1,000,000
      for microseconds or 1,000,000,000 for nanoseconds.  The __do_get_xsec
      function is then unused and is deleted.
      
      The new algorithm is
      
      	now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs
      		+ (stamp_xtime_seconds << 32) + stamp_sec_fraction
      
      with 'now' in units of 2^-32 seconds.  That is then converted to
      seconds and either microseconds or nanoseconds with
      
      	seconds = now >> 32
      	partseconds = ((now & 0xffffffff) * resolution) >> 32
      
      The 32-bit VDSO code also makes a further simplification: it ignores
      the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary
      fraction.  Doing so gets rid of 4 multiply instructions.  Assuming
      a timebase frequency of 1GHz or less and an update interval of no
      more than 10ms, the upper 32 bits of tb_to_xs will be at least
      4503599, so the error from ignoring the low 32 bits will be at most
      2.2ns, which is more than an order of magnitude less than the time
      taken to do gettimeofday or clock_gettime on our fastest processors,
      so there is no possibility of seeing inconsistent values due to this.
      
      This also moves update_gtod() down next to its only caller, and makes
      update_vsyscall use the time passed in via the wall_time argument rather
      than accessing xtime directly.  At present, wall_time always points to
      xtime, but that could change in future.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      0e469db8
  2. 09 7月, 2010 2 次提交
    • A
      powerpc: Optimise per cpu accesses on 64bit · ae01f84b
      Anton Blanchard 提交于
      Now we dynamically allocate the paca array, it takes an extra load
      whenever we want to access another cpu's paca. One place we do that a lot
      is per cpu variables. A simple example:
      
      DEFINE_PER_CPU(unsigned long, vara);
      unsigned long test4(int cpu)
      {
      	return per_cpu(vara, cpu);
      }
      
      This takes 4 loads, 5 if you include the actual load of the per cpu variable:
      
          ld r11,-32760(r30)  # load address of paca pointer
          ld r9,-32768(r30)   # load link address of percpu variable
          sldi r3,r29,9       # get offset into paca (each entry is 512 bytes)
          ld r0,0(r11)        # load paca pointer
          add r3,r0,r3        # paca + offset
          ld r11,64(r3)       # load paca[cpu].data_offset
      
          ldx r3,r9,r11       # load per cpu variable
      
      If we remove the ppc64 specific per_cpu_offset(), we get the generic one
      which indexes into a statically allocated array. This removes one load and
      one add:
      
          ld r11,-32760(r30)  # load address of __per_cpu_offset
          ld r9,-32768(r30)   # load link address of percpu variable
          sldi r3,r29,3       # get offset into __per_cpu_offset (each entry 8 bytes)
          ldx r11,r11,r3      # load __per_cpu_offset[cpu]
      
          ldx r3,r9,r11       # load per cpu variable
      
      Having all the offsets in one array also helps when iterating over a per cpu
      variable across a number of cpus, such as in the scheduler. Before we would
      need to load one paca cacheline when calculating each per cpu offset. Now we
      have 16 (128 / sizeof(long)) per cpu offsets in each cacheline.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ae01f84b
    • P
      powerpc: Rework VDSO gettimeofday to prevent time going backwards · 8fd63a9e
      Paul Mackerras 提交于
      Currently it is possible for userspace to see the result of
      gettimeofday() going backwards by 1 microsecond, assuming that
      userspace is using the gettimeofday() in the VDSO.  The VDSO
      gettimeofday() algorithm computes the time in "xsecs", which are
      units of 2^-20 seconds, or approximately 0.954 microseconds,
      using the algorithm
      
      	now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec
      
      and then converts the time in xsecs to seconds and microseconds.
      
      The kernel updates the tb_orig_stamp and stamp_xsec values every
      tick in update_vsyscall().  If the length of the tick is not an
      integer number of xsecs, then some precision is lost in converting
      the current time to xsecs.  For example, with CONFIG_HZ=1000, the
      tick is 1ms long, which is 1048.576 xsecs.  That means that
      stamp_xsec will advance by either 1048 or 1049 on each tick.
      With the right conditions, it is possible for userspace to get
      (timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is
      slightly late in updating the vdso_datapage, and then for stamp_xsec
      to advance by 1048 when the kernel does update it, and for userspace
      to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to
      integer truncation.  The result is that time appears to go backwards
      by 1 microsecond.
      
      To fix this we change the VDSO gettimeofday to use a new field in the
      VDSO datapage which stores the nanoseconds part of the time as a
      fractional number of seconds in a 0.32 binary fraction format.
      (Or put another way, as a 32-bit number in units of 0.23283 ns.)
      This is convenient because we can use the mulhwu instruction to
      convert it to either microseconds or nanoseconds.
      
      Since it turns out that computing the time of day using this new field
      is simpler than either using stamp_xsec (as gettimeofday does) or
      stamp_xtime.tv_nsec (as clock_gettime does), this converts both
      gettimeofday and clock_gettime to use the new field.  The existing
      __do_get_tspec function is converted to use the new field and take
      a parameter in r7 that indicates the desired resolution, 1,000,000
      for microseconds or 1,000,000,000 for nanoseconds.  The __do_get_xsec
      function is then unused and is deleted.
      
      The new algorithm is
      
      	now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs
      		+ (stamp_xtime_seconds << 32) + stamp_sec_fraction
      
      with 'now' in units of 2^-32 seconds.  That is then converted to
      seconds and either microseconds or nanoseconds with
      
      	seconds = now >> 32
      	partseconds = ((now & 0xffffffff) * resolution) >> 32
      
      The 32-bit VDSO code also makes a further simplification: it ignores
      the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary
      fraction.  Doing so gets rid of 4 multiply instructions.  Assuming
      a timebase frequency of 1GHz or less and an update interval of no
      more than 10ms, the upper 32 bits of tb_to_xs will be at least
      4503599, so the error from ignoring the low 32 bits will be at most
      2.2ns, which is more than an order of magnitude less than the time
      taken to do gettimeofday or clock_gettime on our fastest processors,
      so there is no possibility of seeing inconsistent values due to this.
      
      This also moves update_gtod() down next to its only caller, and makes
      update_vsyscall use the time passed in via the wall_time argument rather
      than accessing xtime directly.  At present, wall_time always points to
      xtime, but that could change in future.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8fd63a9e
  3. 21 5月, 2010 1 次提交
    • M
      powerpc/kexec: Fix race in kexec shutdown · 1fc711f7
      Michael Neuling 提交于
      In kexec_prepare_cpus, the primary CPU IPIs the secondary CPUs to
      kexec_smp_down().  kexec_smp_down() calls kexec_smp_wait() which sets
      the hw_cpu_id() to -1.  The primary does this while leaving IRQs on
      which means the primary can take a timer interrupt which can lead to
      the IPIing one of the secondary CPUs (say, for a scheduler re-balance)
      but since the secondary CPU now has a hw_cpu_id = -1, we IPI CPU
      -1... Kaboom!
      
      We are hitting this case regularly on POWER7 machines.
      
      There is also a second race, where the primary will tear down the MMU
      mappings before knowing the secondaries have entered real mode.
      
      Also, the secondaries are clearing out any pending IPIs before
      guaranteeing that no more will be received.
      
      This changes kexec_prepare_cpus() so that we turn off IRQs in the
      primary CPU much earlier.  It adds a paca flag to say that the
      secondaries have entered the kexec_smp_down() IPI and turned off IRQs,
      rather than overloading hw_cpu_id with -1.  This new paca flag is
      again used to in indicate when the secondaries has entered real mode.
      
      It also ensures that all CPUs have their IRQs off before we clear out
      any pending IPI requests (in kexec_cpu_down()) to ensure there are no
      trailing IPIs left unacknowledged.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1fc711f7
  4. 17 5月, 2010 5 次提交
  5. 14 5月, 2010 1 次提交
  6. 12 5月, 2010 1 次提交
    • P
      powerpc/perf_event: Fix oops due to perf_event_do_pending call · 0fe1ac48
      Paul Mackerras 提交于
      Anton Blanchard found that large POWER systems would occasionally
      crash in the exception exit path when profiling with perf_events.
      The symptom was that an interrupt would occur late in the exit path
      when the MSR[RI] (recoverable interrupt) bit was clear.  Interrupts
      should be hard-disabled at this point but they were enabled.  Because
      the interrupt was not recoverable the system panicked.
      
      The reason is that the exception exit path was calling
      perf_event_do_pending after hard-disabling interrupts, and
      perf_event_do_pending will re-enable interrupts.
      
      The simplest and cleanest fix for this is to use the same mechanism
      that 32-bit powerpc does, namely to cause a self-IPI by setting the
      decrementer to 1.  This means we can remove the tests in the exception
      exit path and raw_local_irq_restore.
      
      This also makes sure that the call to perf_event_do_pending from
      timer_interrupt() happens within irq_enter/irq_exit.  (Note that
      calling perf_event_do_pending from timer_interrupt does not mean that
      there is a possible 1/HZ latency; setting the decrementer to 1 ensures
      that the timer interrupt will happen immediately, i.e. within one
      timebase tick, which is a few nanoseconds or 10s of nanoseconds.)
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: stable@kernel.org
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      0fe1ac48
  7. 01 3月, 2010 3 次提交
    • A
      KVM: PPC: Keep SRR1 flags around in shadow_msr · f7adbba1
      Alexander Graf 提交于
      SRR1 stores more information that just the MSR value. It also stores
      valuable information about the type of interrupt we received, for
      example whether the storage interrupt we just got was because of a
      missing htab entry or not.
      
      We use that information to speed up the exit path.
      
      Now if we get preempted before we can interpret the shadow_msr values,
      we get into vcpu_put which then calls the MSR handler, which then sets
      all the SRR1 information bits in shadow_msr to 0. Great.
      
      So let's preserve the SRR1 specific bits in shadow_msr whenever we set
      the MSR. They don't hurt.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      f7adbba1
    • A
      KVM: PPC: Call SLB patching code in interrupt safe manner · 021ec9c6
      Alexander Graf 提交于
      Currently we're racy when doing the transition from IR=1 to IR=0, from
      the module memory entry code to the real mode SLB switching code.
      
      To work around that I took a look at the RTAS entry code which is faced
      with a similar problem and did the same thing:
      
        A small helper in linear mapped memory that does mtmsr with IR=0 and
        then RFIs info the actual handler.
      
      Thanks to that trick we can safely take page faults in the entry code
      and only need to be really wary of what to do as of the SLB switching
      part.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      021ec9c6
    • A
      KVM: PPC: Use PACA backed shadow vcpu · 7e57cba0
      Alexander Graf 提交于
      We're being horribly racy right now. All the entry and exit code hijacks
      random fields from the PACA that could easily be used by different code in
      case we get interrupted, for example by a #MC or even page fault.
      
      After discussing this with Ben, we figured it's best to reserve some more
      space in the PACA and just shove off some vcpu state to there.
      
      That way we can drastically improve the readability of the code, make it
      less racy and less complex.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      7e57cba0
  8. 21 11月, 2009 1 次提交
  9. 05 11月, 2009 2 次提交
  10. 21 9月, 2009 1 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  11. 20 8月, 2009 2 次提交
  12. 18 8月, 2009 1 次提交
    • P
      powerpc: Allow perf_counters to access user memory at interrupt time · 9c1e1052
      Paul Mackerras 提交于
      This provides a mechanism to allow the perf_counters code to access
      user memory in a PMU interrupt routine.  Such an access can cause
      various kinds of interrupt: SLB miss, MMU hash table miss, segment
      table miss, or TLB miss, depending on the processor.  This commit
      only deals with 64-bit classic/server processors, which use an MMU
      hash table.  32-bit processors are already able to access user memory
      at interrupt time.  Since we don't soft-disable on 32-bit, we avoid
      the possibility of reentering hash_page or the TLB miss handlers,
      since they run with interrupts disabled.
      
      On 64-bit processors, an SLB miss interrupt on a user address will
      update the slb_cache and slb_cache_ptr fields in the paca.  This is
      OK except in the case where a PMU interrupt occurs in switch_slb,
      which also accesses those fields.  To prevent this, we hard-disable
      interrupts in switch_slb.  Interrupts are already soft-disabled at
      this point, and will get hard-enabled when they get soft-enabled
      later.
      
      This also reworks slb_flush_and_rebolt: to avoid hard-disabling twice,
      and to make sure that it clears the slb_cache_ptr when called from
      other callers than switch_slb, the existing routine is renamed to
      __slb_flush_and_rebolt, which is called by switch_slb and the new
      version of slb_flush_and_rebolt.
      
      Similarly, switch_stab (used on POWER3 and RS64 processors) gets a
      hard_irq_disable() to protect the per-cpu variables used there and
      in ste_allocate.
      
      If a MMU hashtable miss interrupt occurs, normally we would call
      hash_page to look up the Linux PTE for the address and create a HPTE.
      However, hash_page is fairly complex and takes some locks, so to
      avoid the possibility of deadlock, we check the preemption count
      to see if we are in a (pseudo-)NMI handler, and if so, we don't call
      hash_page but instead treat it like a bad access that will get
      reported up through the exception table mechanism.  An interrupt
      whose handler runs even though the interrupt occurred when
      soft-disabled (such as the PMU interrupt) is considered a pseudo-NMI
      handler, which should use nmi_enter()/nmi_exit() rather than
      irq_enter()/irq_exit().
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      9c1e1052
  13. 09 6月, 2009 1 次提交
  14. 24 3月, 2009 1 次提交
  15. 11 3月, 2009 1 次提交
  16. 09 1月, 2009 1 次提交
    • P
      powerpc: Provide a way to defer perf counter work until interrupts are enabled · 93a6d3ce
      Paul Mackerras 提交于
      Because 64-bit powerpc uses lazy (soft) interrupt disabling, it is
      possible for a performance monitor exception to come in when the
      kernel thinks interrupts are disabled (i.e. when they are
      soft-disabled but hard-enabled).  In such a situation the performance
      monitor exception handler might have some processing to do (such as
      process wakeups) which can't be done in what is effectively an NMI
      handler.
      
      This provides a way to defer that work until interrupts get enabled,
      either in raw_local_irq_restore() or by returning from an interrupt
      handler to code that had interrupts enabled.  We have a per-processor
      flag that indicates that there is work pending to do when interrupts
      subsequently get re-enabled.  This flag is checked in the interrupt
      return path and in raw_local_irq_restore(), and if it is set,
      perf_counter_do_pending() is called to do the pending work.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      93a6d3ce
  17. 08 1月, 2009 1 次提交
  18. 31 12月, 2008 4 次提交
    • H
      KVM: ppc: Implement in-kernel exit timing statistics · 73e75b41
      Hollis Blanchard 提交于
      Existing KVM statistics are either just counters (kvm_stat) reported for
      KVM generally or trace based aproaches like kvm_trace.
      For KVM on powerpc we had the need to track the timings of the different exit
      types. While this could be achieved parsing data created with a kvm_trace
      extension this adds too much overhead (at least on embedded PowerPC) slowing
      down the workloads we wanted to measure.
      
      Therefore this patch adds a in-kernel exit timing statistic to the powerpc kvm
      code. These statistic is available per vm&vcpu under the kvm debugfs directory.
      As this statistic is low, but still some overhead it can be enabled via a
      .config entry and should be off by default.
      
      Since this patch touched all powerpc kvm_stat code anyway this code is now
      merged and simplified together with the exit timing statistic code (still
      working with exit timing disabled in .config).
      Signed-off-by: NChristian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
      Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      73e75b41
    • H
      KVM: ppc: directly insert shadow mappings into the hardware TLB · 7924bd41
      Hollis Blanchard 提交于
      Formerly, we used to maintain a per-vcpu shadow TLB and on every entry to the
      guest would load this array into the hardware TLB. This consumed 1280 bytes of
      memory (64 entries of 16 bytes plus a struct page pointer each), and also
      required some assembly to loop over the array on every entry.
      
      Instead of saving a copy in memory, we can just store shadow mappings directly
      into the hardware TLB, accepting that the host kernel will clobber these as
      part of the normal 440 TLB round robin. When we do that we need less than half
      the memory, and we have decreased the exit handling time for all guest exits,
      at the cost of increased number of TLB misses because the host overwrites some
      guest entries.
      
      These savings will be increased on processors with larger TLBs or which
      implement intelligent flush instructions like tlbivax (which will avoid the
      need to walk arrays in software).
      
      In addition to that and to the code simplification, we have a greater chance of
      leaving other host userspace mappings in the TLB, instead of forcing all
      subsequent tasks to re-fault all their mappings.
      Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      7924bd41
    • H
      KVM: ppc: create struct kvm_vcpu_44x and introduce container_of() accessor · db93f574
      Hollis Blanchard 提交于
      This patch doesn't yet move all 44x-specific data into the new structure, but
      is the first step down that path. In the future we may also want to create a
      struct kvm_vcpu_booke.
      
      Based on patch from Liu Yu <yu.liu@freescale.com>.
      Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      db93f574
    • H
      KVM: ppc: Rename "struct tlbe" to "struct kvmppc_44x_tlbe" · 0f55dc48
      Hollis Blanchard 提交于
      This will ease ports to other cores.
      
      Also remove unused "struct kvm_tlb" while we're at it.
      Signed-off-by: NHollis Blanchard <hollisb@us.ibm.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      0f55dc48
  19. 29 12月, 2008 1 次提交
  20. 21 12月, 2008 1 次提交
  21. 06 11月, 2008 1 次提交
    • P
      powerpc: Improve resolution of VDSO clock_gettime · 597bc5c0
      Paul Mackerras 提交于
      Currently the clock_gettime implementation in the VDSO produces a
      result with microsecond resolution for the cases that are handled
      without a system call, i.e. CLOCK_REALTIME and CLOCK_MONOTONIC.  The
      nanoseconds field of the result is obtained by computing a
      microseconds value and multiplying by 1000.
      
      This changes the code in the VDSO to do the computation for
      clock_gettime with nanosecond resolution.  That means that the
      resolution of the result will ultimately depend on the timebase
      frequency.
      
      Because the timestamp in the VDSO datapage (stamp_xsec, the real time
      corresponding to the timebase count in tb_orig_stamp) is in units of
      2^-20 seconds, it doesn't have sufficient resolution for computing a
      result with nanosecond resolution.  Therefore this adds a copy of
      xtime to the VDSO datapage and updates it in update_gtod() along with
      the other time-related fields.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      597bc5c0
  22. 15 10月, 2008 3 次提交
  23. 25 9月, 2008 1 次提交
    • B
      POWERPC: Allow 32-bit hashed pgtable code to support 36-bit physical · 4ee7084e
      Becky Bruce 提交于
      This rearranges a bit of code, and adds support for
      36-bit physical addressing for configs that use a
      hashed page table.  The 36b physical support is not
      enabled by default on any config - it must be
      explicitly enabled via the config system.
      
      This patch *only* expands the page table code to accomodate
      large physical addresses on 32-bit systems and enables the
      PHYS_64BIT config option for 86xx.  It does *not*
      allow you to boot a board with more than about 3.5GB of
      RAM - for that, SWIOTLB support is also required (and
      coming soon).
      Signed-off-by: NBecky Bruce <becky.bruce@freescale.com>
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      4ee7084e
  24. 16 9月, 2008 1 次提交
    • P
      powerpc: Make it possible to move the interrupt handlers away from the kernel · 1f6a93e4
      Paul Mackerras 提交于
      This changes the way that the exception prologs transfer control to
      the handlers in 64-bit kernels with the aim of making it possible to
      have the prologs separate from the main body of the kernel.  Now,
      instead of computing the address of the handler by taking the top
      32 bits of the paca address (to get the 0xc0000000........ part) and
      ORing in something in the bottom 16 bits, we get the base address of
      the kernel by doing a load from the paca and add an offset.
      
      This also replaces an mfmsr and an ori to compute the MSR value for
      the handler with a load from the paca.  That makes it unnecessary to
      have a separate version of EXCEPTION_PROLOG_PSERIES that forces 64-bit
      mode.
      
      We can no longer use a direct branches in the exception prolog code,
      which means that the SLB miss handlers can't branch directly to
      .slb_miss_realmode any more.  Instead we have to compute the address
      and do an indirect branch.  This is conditional on CONFIG_RELOCATABLE;
      for non-relocatable kernels we use a direct branch as before.  (A later
      change will allow CONFIG_RELOCATABLE to be set on 64-bit powerpc.)
      
      Since the secondary CPUs on pSeries start execution in the first 0x100
      bytes of real memory and then have to get to wherever the kernel is,
      we can't use a direct branch to get there.  Instead this changes
      __secondary_hold_spinloop from a flag to a function pointer.  When it
      is set to a non-NULL value, the secondary CPUs jump to the function
      pointed to by that value.
      
      Finally this eliminates one code difference between 32-bit and 64-bit
      by making __secondary_hold be the text address of the secondary CPU
      spinloop rather than a function descriptor for it.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      1f6a93e4
  25. 01 7月, 2008 1 次提交
    • M
      powerpc: Introduce VSX thread_struct and CONFIG_VSX · c6e6771b
      Michael Neuling 提交于
      The layout of the new VSR registers and how they overlap on top of the
      legacy FPR and VR registers is:
      
                         VSR doubleword 0               VSR doubleword 1
                ----------------------------------------------------------------
        VSR[0]  |             FPR[0]            |                              |
                ----------------------------------------------------------------
        VSR[1]  |             FPR[1]            |                              |
                ----------------------------------------------------------------
                |              ...              |                              |
                |              ...              |                              |
                ----------------------------------------------------------------
        VSR[30] |             FPR[30]           |                              |
                ----------------------------------------------------------------
        VSR[31] |             FPR[31]           |                              |
                ----------------------------------------------------------------
        VSR[32] |                             VR[0]                            |
                ----------------------------------------------------------------
        VSR[33] |                             VR[1]                            |
                ----------------------------------------------------------------
                |                              ...                             |
                |                              ...                             |
                ----------------------------------------------------------------
        VSR[62] |                             VR[30]                           |
                ----------------------------------------------------------------
        VSR[63] |                             VR[31]                           |
                ----------------------------------------------------------------
      
      VSX has 64 128bit registers.  The first 32 regs overlap with the FP
      registers and hence extend them with and additional 64 bits.  The
      second 32 regs overlap with the VMX registers.
      
      This commit introduces the thread_struct changes required to reflect
      this register layout.  Ptrace and signals code is updated so that the
      floating point registers are correctly accessed from the thread_struct
      when CONFIG_VSX is enabled.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      c6e6771b
  26. 03 6月, 2008 1 次提交
    • K
      [POWERPC] 40x/Book-E: Save/restore volatile exception registers · fca622c5
      Kumar Gala 提交于
      On machines with more than one exception level any system register that
      might be modified by the "normal" exception level needs to be saved and
      restored on taking a higher level exception.  We already are saving
      and restoring ESR and DEAR.
      
      For critical level add SRR0/1.
      For debug level add CSRR0/1 and SRR0/1.
      For machine check level add DSRR0/1, CSRR0/1, and SRR0/1.
      
      On FSL Book-E parts we always save/restore the MAS registers for critical,
      debug, and machine check level exceptions.  On 44x we always save/restore
      the MMUCR.
      
      Additionally, we save and restore the ksp_limit since we have to adjust it
      for each exception level.
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      fca622c5