1. 08 Jul, 2013 2 commits
  2. 30 Jun, 2013 9 commits
    • G
      powerpc/eeh: Fix fetching bus for single-dev-PE · ea461abf
      Gavin Shan committed
      While running Linux as a guest on top of phyp, we possibly have
      a PE that includes a single PCI device. However, we didn't return
      its PCI bus correctly, which leads to failure on recovery from
      EEH errors for single-dev-PE. The patch fixes the issue.
      
      Cc: <stable@vger.kernel.org> # v3.7+
      Cc: Steve Best <sbest@us.ibm.com>
      Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • A
      KVM: PPC: Ignore PIR writes · a3ff5fbc
      Alexander Graf committed
      While technically it's legal to write to PIR and have the identifier changed,
      we don't implement logic to do so because we simply expose vcpu_id to the guest.
      
      So instead, let's ignore writes to PIR. This ensures that we don't inject faults
      into the guest for something the guest is allowed to do. While at it, we cross
      our fingers hoping that it also doesn't mind that we broke its PIR read values.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • P
      KVM: PPC: Book3S PR: Invalidate SLB entries properly · 681562cd
      Paul Mackerras committed
      At present, if the guest creates a valid SLB (segment lookaside buffer)
      entry with the slbmte instruction, then invalidates it with the slbie
      instruction, then reads the entry with the slbmfee/slbmfev instructions,
      the result of the slbmfee will have the valid bit set, even though the
      entry is not actually considered valid by the host.  This is confusing,
      if not worse.  This fixes it by zeroing out the orige and origv fields
      of the SLB entry structure when the entry is invalidated.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
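The SLB invalidation fix above can be pictured with a minimal sketch. The struct and helper names here are hypothetical stand-ins rather than the kernel's own definitions, and the position of the valid bit (`SLB_ESID_V`) is an assumption made for illustration. The point is only that a later read returns the stored `orige`, so it has to be cleared at invalidation time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the valid bit in the ESID doubleword (illustrative). */
#define SLB_ESID_V (1ULL << 27)

/* Hypothetical, simplified shadow copy of one guest SLB entry. */
struct slb_entry {
    uint64_t orige; /* ESID doubleword as written by the guest (slbmte) */
    uint64_t origv; /* VSID doubleword as written by the guest (slbmte) */
    bool valid;     /* host's view of whether the entry is live */
};

/* Emulates slbmte: remember what the guest wrote. */
static void slb_write(struct slb_entry *e, uint64_t rb, uint64_t rs)
{
    e->orige = rb;
    e->origv = rs;
    e->valid = (rb & SLB_ESID_V) != 0;
}

/* Emulates slbie: clear the stored doublewords as well as the valid flag,
 * so that a later read does not report a stale valid bit. */
static void slb_invalidate(struct slb_entry *e)
{
    e->valid = false;
    e->orige = 0;
    e->origv = 0;
}

/* Emulates slbmfee: return the stored ESID doubleword. */
static uint64_t slb_read_esid(const struct slb_entry *e)
{
    return e->orige;
}
```

Without the two zeroing stores in `slb_invalidate()`, `slb_read_esid()` would still return a value with the valid bit set after invalidation, which is the confusing behaviour the commit describes.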
    • P
      KVM: PPC: Book3S PR: Allow guest to use 1TB segments · 0f296829
      Paul Mackerras committed
      With this, the guest can use 1TB segments as well as 256MB segments.
      Since we now have the situation where a single emulated guest segment
      could correspond to multiple shadow segments (as the shadow segments
      are still 256MB segments), this adds a new kvmppc_mmu_flush_segment()
      to scan for all shadow segments that need to be removed.
      
      This restructures the guest HPT (hashed page table) lookup code to
      use the correct hashing and matching functions for HPTEs within a
      1TB segment.  We use the standard hpt_hash() function instead of
      open-coding the hash calculation, and we use HPTE_V_COMPARE() with
      an AVPN value that has the B (segment size) field included.  The
      calculation of avpn is done a little earlier since it doesn't change
      in the loop starting at the do_second label.
      
      The computation in kvmppc_mmu_book3s_64_esid_to_vsid() changes so that
      it returns a 256MB VSID even if the guest SLB entry is a 1TB entry.
      This is because the users of this function are creating 256MB SLB
      entries.  We set a new VSID_1T flag so that entries created from 1T
      segments don't collide with entries from 256MB segments.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • P
      KVM: PPC: Book3S PR: Don't keep scanning HPTEG after we find a match · 6ed1485f
      Paul Mackerras committed
      The loop in kvmppc_mmu_book3s_64_xlate() that looks up a translation
      in the guest hashed page table (HPT) keeps going if it finds an
      HPTE that matches but doesn't allow access.  This is incorrect; it
      is different from what the hardware does, and there should never be
      more than one matching HPTE anyway.  This fixes it to stop when any
      matching HPTE is found.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
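The corrected lookup behaviour, stopping at the first matching HPTE whether or not it grants access, might look roughly like the sketch below. The `struct hpte` layout, the result enum, and `lookup_group()` are hypothetical simplifications; this is not the code in `kvmppc_mmu_book3s_64_xlate()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HPTES_PER_GROUP 8 /* one hashed page table group holds 8 entries */

/* Hypothetical, simplified page table entry. */
struct hpte {
    uint64_t avpn;      /* abbreviated virtual page number to compare */
    bool valid;
    bool allows_access; /* outcome of the permission check for this access */
};

enum xlate_result { XLATE_OK, XLATE_NO_ACCESS, XLATE_NOT_FOUND };

/* Scan one group and stop at the first match. A matching entry that
 * denies access ends the search instead of letting the loop continue. */
static enum xlate_result lookup_group(const struct hpte *group,
                                      uint64_t avpn, size_t *index)
{
    for (size_t i = 0; i < HPTES_PER_GROUP; i++) {
        if (!group[i].valid || group[i].avpn != avpn)
            continue;
        *index = i;
        return group[i].allows_access ? XLATE_OK : XLATE_NO_ACCESS;
    }
    return XLATE_NOT_FOUND;
}
```

If a group (incorrectly) contained two matching entries, the first denying access and a later one allowing it, this version reports `XLATE_NO_ACCESS` for the first one, which is the behaviour the commit says matches the hardware.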
    • P
      KVM: PPC: Book3S PR: Fix invalidation of SLB entry 0 on guest entry · bc1bc4e3
      Paul Mackerras committed
      On entering a PR KVM guest, we invalidate the whole SLB before loading
      up the guest entries.  We do this using an slbia instruction, which
      invalidates all entries except entry 0, followed by an slbie to
      invalidate entry 0.  However, the slbie turns out to be ineffective
      in some circumstances (specifically when the host linear mapping uses
      64k pages) because of errors in computing the parameter to the slbie.
      The result is that the guest kernel hangs very early in boot because
      it takes a DSI the first time it tries to access kernel data using
      a linear mapping address in real mode.
      
      Currently we construct bits 36 - 43 (big-endian numbering) of the slbie
      parameter by taking bits 56 - 63 of the SLB VSID doubleword.  These bits
      for the slbie are C (class, 1 bit), B (segment size, 2 bits) and 5
      reserved bits.  For the SLB VSID doubleword these are C (class, 1 bit),
      reserved (1 bit), LP (large page size, 2 bits), and 4 reserved bits.
      Thus we are not setting the B field correctly, and when LP = 01 as
      it is for 64k pages, we are setting a reserved bit.
      
      Rather than add more instructions to calculate the slbie parameter
      correctly, this takes a simpler approach, which is to set entry 0 to
      zeroes explicitly.  Normally slbmte should not be used to invalidate
      an entry, since it doesn't invalidate the ERATs, but it is OK to use
      it to invalidate an entry if it is immediately followed by slbia,
      which does invalidate the ERATs.  (This has been confirmed with the
      Power architects.)  This approach takes fewer instructions and will
      work whatever the contents of entry 0.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
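The field mismatch described in this commit can be checked with a little bit arithmetic. The sketch below models only the one byte in question, using the layouts given in the message; the macro and function names are made up for illustration. In big-endian numbering the first bit listed is the most significant bit of the byte.

```c
#include <stdint.h>

/* Low byte of the SLB VSID doubleword (bits 56-63):
 * C (1 bit), reserved (1 bit), LP (2 bits), 4 reserved bits. */
#define VSID_BYTE_C        0x80u
#define VSID_BYTE_LP_SHIFT 4

/* The same byte reinterpreted as bits 36-43 of the slbie parameter:
 * C (1 bit), B (2 bits), 5 reserved bits. */
#define SLBIE_BYTE_B_MASK    0x60u
#define SLBIE_BYTE_RSVD_MASK 0x1fu

/* Build the low byte of a VSID doubleword from its C and LP fields. */
static uint8_t vsid_low_byte(unsigned c, unsigned lp)
{
    return (uint8_t)((c ? VSID_BYTE_C : 0u) |
                     ((lp & 3u) << VSID_BYTE_LP_SHIFT));
}
```

With LP = 01 (64k pages) and C = 0, `vsid_low_byte()` yields a byte whose only set bit falls inside `SLBIE_BYTE_RSVD_MASK`, while the B field stays zero regardless of the actual segment size. That is the incorrect slbie parameter the message refers to.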
    • P
      KVM: PPC: Book3S PR: Fix proto-VSID calculations · 8ed7b7e9
      Paul Mackerras committed
      This makes sure the calculation of the proto-VSIDs used by PR KVM
      is done with 64-bit arithmetic.  Since vcpu3s->context_id[] is int,
      when we do vcpu3s->context_id[0] << ESID_BITS the shift will be done
      with 32-bit instructions, possibly leading to significant bits
      getting lost, as the context id can be up to 524283 and ESID_BITS is
      18.  To fix this we cast the context id to u64 before shifting.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Alexander Graf <agraf@suse.de>
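The lost-bits problem is easy to reproduce in isolation. The sketch below uses the two numbers from the commit message (a context id of 524283 and `ESID_BITS` of 18); the function names are hypothetical. It uses unsigned 32-bit values so that the truncation is well defined in standard C, whereas the kernel field is a plain `int`.

```c
#include <stdint.h>

#define ESID_BITS 18

/* Shift carried out in 32 bits: the high bits of the result are lost. */
static uint32_t proto_vsid_32(uint32_t context_id)
{
    return context_id << ESID_BITS;
}

/* Widen first, then shift: all significant bits are kept. */
static uint64_t proto_vsid_64(uint32_t context_id)
{
    return (uint64_t)context_id << ESID_BITS;
}
```

For a context id of 524283 (0x7FFFB), the 64-bit version gives 0x1FFFEC0000, while the 32-bit version keeps only 0xFFEC0000. The cast to a 64-bit type before the shift is the whole fix.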
    • T
      KVM: PPC: Guard doorbell exception with CONFIG_PPC_DOORBELL · 5f17ce8b
      Tiejun Chen committed
      Availability of the doorbell_exception function is guarded by
      CONFIG_PPC_DOORBELL. Use the same define to guard our caller
      of it.
      Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
      [agraf: improve patch description]
      Signed-off-by: Alexander Graf <agraf@suse.de>
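The guarding pattern is the usual one of wrapping the caller in the same preprocessor condition as the callee. The following is a stand-alone sketch: `doorbell_exception_stub()` and `handle_doorbell_exit()` are hypothetical names, and in a real kernel build `CONFIG_PPC_DOORBELL` would come from the configuration rather than from the source file.

```c
/* The callee only exists when the option is enabled. */
#ifdef CONFIG_PPC_DOORBELL
static int doorbell_exception_stub(void)
{
    return 1; /* pretend the doorbell was handled */
}
#endif

/* The caller uses the same guard, so the build does not break
 * when the option is disabled and the callee is absent. */
static int handle_doorbell_exit(void)
{
#ifdef CONFIG_PPC_DOORBELL
    return doorbell_exception_stub();
#else
    return 0; /* nothing to call without doorbell support */
#endif
}
```

Compiled without `-DCONFIG_PPC_DOORBELL`, `handle_doorbell_exit()` returns 0 and never references the missing function; compiled with it, the call goes through.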
    • G
      powerpc/pci: Improve device hotplug initialization · 7846de40
      Guenter Roeck committed
      Commit 37f02195 (powerpc/pci: fix PCI-e devices rescan issue on powerpc
      platform) fixes a problem with interrupt and DMA initialization on hot
      plugged devices. With this commit, interrupt and DMA initialization for
      hot plugged devices is handled in the pci device enable function.
      
      This approach has a couple of drawbacks. First, it creates two code paths
      for device initialization, one for hot plugged devices and another for devices
      known during the initial PCI scan. Second, the initialization code for hot
      plugged devices is only called when the device is enabled, ie typically
      in the probe function. Also, the platform specific setup code is called each
      time pci_enable_device() is called, not only once during device discovery,
      meaning it is actually called multiple times, once for devices discovered
      during the initial scan and again each time a driver is re-loaded.
      
      The visible result is that interrupt pins are only assigned to hot plugged
      devices when the device driver is loaded. Effectively this changes the PCI
      probe API, since pci_dev->irq and the device's dma configuration will now
      only be valid after pci_enable_device() was called at least once. A more subtle
      change is that platform specific PCI device setup is moved from device
      discovery into the driver's probe function, more specifically into the
      pci_enable_device() call.
      
      To fix the inconsistencies, add new function pcibios_add_device.
      Call pcibios_setup_device from pcibios_setup_bus_devices if device setup
      is not complete, and from pcibios_add_device if bus setup is complete.
      
      With this change, device setup code is moved back into device initialization,
      and called exactly once for both static and hot plugged devices.
      
      [ This also fixes a regression introduced by the above patch which
        causes dev->irq to be overwritten under some circumstances after
        MSIs have been enabled for the device which leads to crashes due
        to the MSI core "hijacking" dev->irq to store the base MSI number
        and not the LSI. --BenH
      ]
      
      Cc: Yuanquan Chen <Yuanquan.Chen@freescale.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Hiroo Matsumoto <matsumoto.hiroo@jp.fujitsu.com>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
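The "exactly once for both static and hot plugged devices" rule can be modelled with two small flags. This is a toy model under stated assumptions, not the powerpc PCI code: the structs, the `setup_calls` counter, and the function names only mirror the roles of `pcibios_setup_device`, `pcibios_setup_bus_devices`, and `pcibios_add_device` described in the message.

```c
#include <stdbool.h>

/* Hypothetical, minimal stand-ins for a PCI bus and device. */
struct toy_bus { bool setup_done; };
struct toy_dev { struct toy_bus *bus; int setup_calls; };

/* Platform-specific setup (IRQ, DMA); counted so we can check "once". */
static void toy_setup_device(struct toy_dev *d)
{
    d->setup_calls++;
}

/* Initial scan path: set up every device that is not set up yet. */
static void toy_setup_bus_devices(struct toy_bus *b,
                                  struct toy_dev **devs, int n)
{
    for (int i = 0; i < n; i++)
        if (devs[i]->setup_calls == 0)
            toy_setup_device(devs[i]);
    b->setup_done = true;
}

/* Device-add path: only acts once the bus itself has been set up,
 * which is the hot plug case. */
static void toy_add_device(struct toy_dev *d)
{
    if (d->bus->setup_done)
        toy_setup_device(d);
}
```

During the initial scan the add hook runs before the bus is marked as set up and does nothing, so the bus pass performs the setup. For a hot plugged device the bus is already set up, so the add hook performs it instead.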
  3. 29 Jun, 2013 3 commits
  4. 28 Jun, 2013 1 commit
  5. 26 Jun, 2013 1 commit
  6. 25 Jun, 2013 1 commit
  7. 20 Jun, 2013 1 commit
  8. 19 Jun, 2013 2 commits
  9. 15 Jun, 2013 3 commits
    • B
      powerpc: Fix missing/delayed calls to irq_work · 230b3034
      Benjamin Herrenschmidt committed
      When replaying interrupts (as a result of the interrupt occurring
      while soft-disabled), in the case of the decrementer, we are exclusively
      testing for a pending timer target. However we also use decrementer
      interrupts to trigger the new "irq_work", which in this case would
      be missed.
      
      This changes the logic to force a replay in both cases of a timer
      boundary reached and a decrementer interrupt having actually occurred
      while disabled. The former test is still useful to catch cases where
      a CPU having been hard-disabled for a long time completely misses the
      interrupt due to a decrementer rollover.
      
      CC: <stable@vger.kernel.org> [v3.4+]
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
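The changed replay condition boils down to an OR of two tests. The helper below is a hypothetical illustration of that decision only; the parameter names are invented and the real code works on the paca and timebase rather than on plain arguments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decide whether the decrementer interrupt must be replayed when
 * interrupts are re-enabled.
 *
 * now                      - current timebase value
 * next_timer               - timebase value of the next timer target
 * dec_fired_while_disabled - a decrementer interrupt occurred while
 *                            soft-disabled (it may carry irq_work)
 */
static bool must_replay_decrementer(uint64_t now, uint64_t next_timer,
                                    bool dec_fired_while_disabled)
{
    /* Old behaviour: only the timer-target test. New behaviour: also
     * replay when the interrupt itself was seen, so irq_work is not
     * missed or delayed. */
    return dec_fired_while_disabled || now >= next_timer;
}
```

The timer-target test stays in place because, as the message notes, it still catches a CPU that was hard-disabled long enough to miss the interrupt through a decrementer rollover.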
    • P
      powerpc: Fix emulation of illegal instructions on PowerNV platform · bf593907
      Paul Mackerras committed
      Normally, the kernel emulates a few instructions that are unimplemented
      on some processors (e.g. the old dcba instruction), or privileged (e.g.
      mfpvr).  The emulation of unimplemented instructions is currently not
      working on the PowerNV platform.  The reason is that on these machines,
      unimplemented and illegal instructions cause a hypervisor emulation
      assist interrupt, rather than a program interrupt as on older CPUs.
      Our vector for the emulation assist interrupt just calls
      program_check_exception() directly, without setting the bit in SRR1
      that indicates an illegal instruction interrupt.  This fixes it by
      making the emulation assist interrupt set that bit before calling
      program_check_exception().  With this, old programs that use no-longer
      implemented instructions such as dcba now work again.
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • M
      powerpc: Fix stack overflow crash in resume_kernel when ftracing · 0e37739b
      Michael Ellerman committed
      It's possible for us to crash when running with ftrace enabled, eg:
      
        Bad kernel stack pointer bffffd12 at c00000000000a454
        cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
            pc: c00000000000a454: resume_kernel+0x34/0x60
            lr: c00000000000335c: performance_monitor_common+0x15c/0x180
            sp: bffffd12
           msr: 8000000000001032
           dar: bffffd12
         dsisr: 42000000
      
      If we look at current's stack (paca->__current->stack) we see it is
      equal to c0000002ecab0000. Our stack is 16K, and comparing to
      paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
      kernel stack. This leads to us writing over our struct thread_info, and
      in this case we have corrupted thread_info->flags and set
      _TIF_EMULATE_STACK_STORE.
      
      Dumping the stack we see:
      
        3:mon> t c0000002ecab0000
        [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
        [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
        --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
        [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
        [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
        [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
        [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
        [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
        [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
        [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
        --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
        [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
        [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
        [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
        [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
        [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
        [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
        --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
      
        ... and so on
      
      __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
      path. At that point the irq state is not consistent, ie. interrupts are
      hard disabled (by the exception entry), but the paca soft-enabled flag
      may be out of sync.
      
      This leads to the local_irq_restore() in trace_graph_entry() actually
      enabling interrupts, which we do not want. Because we have not yet
      reprogrammed the decrementer we immediately take another decrementer
      exception, and recurse.
      
      The fix is twofold. Firstly make sure we call DISABLE_INTS before
      calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
      the irq state in the paca with the hardware, making it safe again to
      call local_irq_save/restore().
      
      Although that should be sufficient to fix the bug, we also mark the
      runlatch routines as notrace. They are called very early in the
      exception entry and we are asking for trouble tracing them. They are
      also fairly uninteresting and tracing them just adds unnecessary
      overhead.
      
      [ This regression was introduced by fe1952fc
        "powerpc: Rework runlatch code" by myself --BenH
      ]
      
      CC: <stable@vger.kernel.org> [v3.4+]
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  10. 11 Jun, 2013 5 commits
  11. 10 Jun, 2013 8 commits
  12. 04 Jun, 2013 2 commits
  13. 01 Jun, 2013 2 commits