1. 11 Jun, 2014: 7 commits
  2. 07 Jun, 2014: 1 commit
  3. 05 Jun, 2014: 3 commits
  4. 30 May, 2014: 8 commits
    • KVM: PPC: Book3S PR: Rework SLB switching code · d8d164a9
      Committed by Alexander Graf
      On LPAR guest systems Linux enables the shadow SLB to indicate to the
      hypervisor a number of SLB entries that always have to be available.
      
      Today we go through this shadow SLB and disable all ESIDs' valid bits.
      However, pHyp doesn't like this approach very much and honors us with
      fancy machine checks.
      
      Fortunately the shadow SLB descriptor also has an entry that indicates
      the number of valid entries following. During the lifetime of a guest
      we can just swap that value to 0 and don't have to worry about the
      SLB restoration magic.
      
      While we're touching the code, let's also make it more readable (get
      rid of rldicl), allow it to deal with a dynamic number of bolted
      SLB entries and only do shadow SLB swizzling on LPAR systems.
      Signed-off-by: Alexander Graf <agraf@suse.de>
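The core trick can be sketched in user-space C, assuming a simplified shadow SLB descriptor: save the count of valid entries, set it to 0 for the guest's lifetime, and put it back afterwards, leaving the entries themselves untouched. The struct layout, `SHADOW_SLB_MAX`, and both helper names are hypothetical, and the big-endian encoding of the real descriptor is ignored here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified shadow SLB descriptor: a count of valid
 * (bolted) entries followed by the entries themselves. */
#define SHADOW_SLB_MAX 3

struct slb_entry {
    uint64_t esid;
    uint64_t vsid;
};

struct slb_shadow {
    uint32_t persistent;                     /* number of valid entries that follow */
    struct slb_entry save_area[SHADOW_SLB_MAX];
};

/* Guest entry: hide all bolted entries from the hypervisor by zeroing
 * the count; the entries are not modified. Returns the old count. */
static uint32_t shadow_slb_disable(struct slb_shadow *s)
{
    uint32_t saved = s->persistent;

    assert(saved <= SHADOW_SLB_MAX);
    s->persistent = 0;
    return saved;
}

/* Guest exit: restoring the count makes the untouched entries valid again. */
static void shadow_slb_restore(struct slb_shadow *s, uint32_t saved)
{
    assert(saved <= SHADOW_SLB_MAX);
    s->persistent = saved;
}
```

Because only the count changes, no per-entry valid bits are flipped, which is the behaviour the commit says pHyp objected to.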
    • PPC: ePAPR: Fix hypercall on LE guest · 235959be
      Committed by Alexander Graf
      We get an array of instructions from the hypervisor via device tree that
      we write into a buffer that gets executed whenever we want to make an
      ePAPR compliant hypercall.
      
      However, the hypervisor passes us these instructions in BE order which
      we have to manually convert to LE when we want to run them in LE mode.
      
      With this fixup in place, I can successfully run LE kernels with KVM
      PV enabled on PR KVM.
      Signed-off-by: Alexander Graf <agraf@suse.de>
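A minimal sketch of the fixup, assuming the hypercall sequence arrives as an array of 32-bit words in big-endian order: each word is byte-swapped before it is stored when the kernel runs little-endian. `swab32()` and `copy_hcall_insns()` are hypothetical helpers written for illustration; kernel code would normally use its own conversion helpers such as `be32_to_cpu()` rather than an open-coded swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit instruction word. */
static uint32_t swab32(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) << 8)  |
           ((x & 0x00ff0000u) >> 8)  |
           ((x & 0xff000000u) >> 24);
}

/* Copy n hypercall instruction words supplied in big-endian order into
 * buf, converting them when the kernel executes in little-endian mode. */
static void copy_hcall_insns(uint32_t *buf, const uint32_t *be_insns,
                             size_t n, int kernel_is_le)
{
    assert(buf != NULL && be_insns != NULL);
    for (size_t i = 0; i < n; i++)
        buf[i] = kernel_is_le ? swab32(be_insns[i]) : be_insns[i];
}
```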
    • KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler · ddca156a
      Committed by Aneesh Kumar K.V
      Use make_dsisr instead of open coding it. This also has
      the added benefit of handling alignment interrupts on additional
      instructions.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • PPC: KVM: Make NX bit available with magic page · 5c165aec
      Committed by Alexander Graf
      Because old kernels enable the magic page and then choke on NXed trampoline
      code, we have to disable NX by default in KVM when we use the magic page.
      
      However, since commit b18db0b8 we have successfully fixed that and can now
      leave NX enabled, so tell the hypervisor about this.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S PR: Expose TAR facility to guest · e14e7a1e
      Committed by Alexander Graf
      POWER8 implements a new register called TAR. This register has to be
      enabled in FSCR; from KVM's point of view it is then mere storage.
      
      This patch enables the guest to use TAR.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR · 616dff86
      Committed by Alexander Graf
      POWER8 introduced a new interrupt type called "Facility unavailable interrupt"
      which contains its status message in a new register called FSCR.
      
      Handle these exits and try to emulate instructions for unhandled facilities.
      Follow-on patches enable KVM to expose specific facilities into the guest.
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • KVM: PPC: Make shared struct aka magic page guest endian · 5deb8e7a
      Committed by Alexander Graf
      The shared (magic) page is a data structure that contains frequently used,
      supervisor-privileged SPRs, made accessible to the user via memory to reduce
      the number of exits we have to take to read/write them.
      
      When we actually share this structure with the guest we have to maintain
      it in guest endianness, because some of the patch tricks only work with
      native endian load/store operations.
      
      Since we only share the structure with either host or guest in little
      endian on book3s_64 pr mode, we don't have to worry about booke or book3s hv.
      
      For booke, the shared struct stays big endian. For book3s_64 hv we maintain
      the struct in host native endian, since it never gets shared with the guest.
      
      For book3s_64 pr we introduce a variable that tells us which endianness the
      shared struct is in and route every access to it through helper inline
      functions that evaluate this variable.
      Signed-off-by: Alexander Graf <agraf@suse.de>
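The accessor approach for book3s_64 PR can be illustrated with a hypothetical one-field shared page: a flag records the byte order the guest expects, and every read or write is routed through helpers that honor it. The struct, the choice of an `srr0` field, and the helper names are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared page holding one 64-bit SPR value as raw bytes,
 * plus a flag recording which byte order those bytes are kept in. */
struct shared_page {
    bool big_endian;      /* endianness the guest expects */
    uint8_t srr0[8];      /* raw storage as the guest sees it */
};

/* Read: assemble the value most-significant byte first. */
static uint64_t shared_get_srr0(const struct shared_page *p)
{
    uint64_t v = 0;

    assert(p != NULL);
    for (int i = 0; i < 8; i++) {
        int idx = p->big_endian ? i : 7 - i;   /* where the i-th MSB lives */
        v = (v << 8) | p->srr0[idx];
    }
    return v;
}

/* Write: store each byte at the position the guest's byte order requires. */
static void shared_set_srr0(struct shared_page *p, uint64_t v)
{
    assert(p != NULL);
    for (int i = 0; i < 8; i++) {
        uint8_t b = (uint8_t)(v >> (8 * (7 - i)));   /* i-th most significant byte */
        p->srr0[p->big_endian ? i : 7 - i] = b;
    }
}
```

Callers only ever see host values, so the same code works whichever endianness the guest runs in.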
    • KVM: PPC: BOOK3S: PR: Enable Little Endian PR guest · e5ee5422
      Committed by Aneesh Kumar K.V
      This patch makes sure we inherit the LE bit correctly in different cases
      so that we can run a Little Endian distro in PR mode.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  5. 28 May, 2014: 11 commits
    • powerpc: Fix regression of per-CPU DSCR setting · 1739ea9e
      Committed by Sam Bobroff
      Since commit "efcac658 powerpc: Per process DSCR + some fixes (try#4)"
      it is no longer possible to set the DSCR on a per-CPU basis.
      
      The old behaviour was to manipulate the DSCR SPR directly, but this is no
      longer sufficient: the value is quickly overwritten by context switching.
      
      This patch stores the per-CPU DSCR value in a kernel variable rather than
      directly in the SPR and it is used whenever a process has not set the DSCR
      itself. The sysfs interface (/sys/devices/system/cpu/cpuN/dscr) is unchanged.
      
      Writes to the old global default (/sys/devices/system/cpu/dscr_default)
      now set all of the per-CPU values and reads return the last written value.
      
      The new per-CPU default is added to the paca_struct and is used everywhere
      outside of sysfs.c instead of the old global default.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
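A simplified model of the new DSCR behaviour, assuming a small fixed CPU count and a plain array standing in for the per-CPU value kept in the paca: a task that set its own DSCR keeps it, and otherwise the current CPU's default is loaded. All names here, including `NR_CPUS`, `struct task`, and the write helpers, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4   /* illustrative */

/* Per-CPU default DSCR, kept in a variable rather than only in the SPR. */
static uint64_t cpu_dscr_default[NR_CPUS];
/* Last value written to the global dscr_default file. */
static uint64_t dscr_default;

struct task {
    bool dscr_inherit;   /* true once the task has set its own DSCR */
    uint64_t dscr;
};

/* Writing the global default updates every per-CPU value. */
static void write_dscr_default(uint64_t val)
{
    dscr_default = val;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        cpu_dscr_default[cpu] = val;
}

/* Writing cpuN/dscr changes only that CPU's default. */
static void write_cpu_dscr(int cpu, uint64_t val)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    cpu_dscr_default[cpu] = val;
}

/* Value loaded into the DSCR SPR when task t is switched in on cpu. */
static uint64_t dscr_on_switch_in(const struct task *t, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return t->dscr_inherit ? t->dscr : cpu_dscr_default[cpu];
}
```

Reading `dscr_default` still returns the last globally written value, even after individual CPUs have been changed.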
    • powerpc: Split __SYSFS_SPRSETUP macro · 39a360ef
      Committed by Sam Bobroff
      Split the __SYSFS_SPRSETUP macro into two parts so that registers requiring
      custom read and write functions can use common code for their show and store
      functions.
      Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • arch: powerpc/fadump: Cleaning up inconsistent NULL checks · b717d985
      Committed by Rickard Strandqvist
      Clean up inconsistent NULL checks.
      Otherwise there is a risk of a possible NULL pointer dereference.

      This was largely found using a static code analysis program called cppcheck.
      Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Check cpu_thread_in_subcore() in __cpu_up() · 6f5e40a3
      Committed by Michael Ellerman
      To support split core we need to change the check in __cpu_up() that
      determines if a cpu is allowed to come online.
      
      Currently we refuse to online cpus which are not the primary thread
      within their core.
      
      On POWER8 with split core support this check needs to instead refuse to
      online cpus which are not the primary thread within their *sub* core.
      
      On POWER7 and other systems that do not support split core,
      threads_per_subcore == threads_per_core and so the check is equivalent.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
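The check can be sketched as below, assuming thread counts are powers of two so that a mask yields the thread index. When `threads_per_subcore` equals `threads_per_core`, the result matches the old per-core test, as the commit notes. The variables and `cpu_up_check()` are a hypothetical simplification of `__cpu_up()`, not its actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int threads_per_core = 8;
static int threads_per_subcore = 8;   /* may change at runtime on POWER8 */

/* Thread index of cpu within its subcore (power-of-two thread counts). */
static int cpu_thread_in_subcore(int cpu)
{
    assert(threads_per_subcore > 0 &&
           (threads_per_subcore & (threads_per_subcore - 1)) == 0);
    return cpu & (threads_per_subcore - 1);
}

/* While secondaries are inhibited, only the primary thread of each
 * subcore may come online; everything else is refused with -EBUSY. */
static int cpu_up_check(int cpu, bool secondaries_inhibited)
{
    if (threads_per_core > 1 && secondaries_inhibited &&
        cpu_thread_in_subcore(cpu) != 0)
        return -EBUSY;
    return 0;
}
```

With an unsplit 8-thread core, CPU 4 is a secondary and is refused; after splitting into 4-thread subcores, CPU 4 becomes a subcore primary and is allowed.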
    • powerpc: Add threads_per_subcore · 5853aef1
      Committed by Michael Ellerman
      On POWER8 we have a new concept of a subcore. This is what happens when
      you take a regular core and split it. A subcore is a grouping of two or
      four SMT threads, as well as a handful of SPRs, which allow the subcore
      to appear as if it were a core from the point of view of a guest.
      
      Unlike threads_per_core which is fixed at boot, threads_per_subcore can
      change while the system is running. Most code will not want to use
      threads_per_subcore.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/powernv: Make it possible to skip the IRQHAPPENED check in power7_nap() · 8d6f7c5a
      Committed by Michael Ellerman
      To support split core we need to be able to force all secondaries into
      nap, so the core can detect they are idle and do an unsplit.
      
      Currently power7_nap() will return without napping if there is an irq
      pending. We want to ignore the pending irq and nap anyway; we will deal
      with the interrupt later.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
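A minimal sketch of the behaviour change, assuming a per-CPU flag that records a pending interrupt and a counter standing in for actually entering nap. The `check_irq` parameter mirrors the idea of making the IRQHAPPENED check optional; its name, `struct cpu_state`, and `nap_sketch()` are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state. */
struct cpu_state {
    bool irq_happened;   /* an interrupt arrived while soft-disabled */
    int naps;            /* number of times this CPU entered nap */
};

/* With check_irq set, keep the old behaviour and bail out if an interrupt
 * is pending; with check_irq clear (the split-core path), nap anyway and
 * leave the pending interrupt to be handled later. */
static void nap_sketch(struct cpu_state *cs, bool check_irq)
{
    assert(cs != NULL);
    if (check_irq && cs->irq_happened)
        return;
    cs->naps++;          /* stands in for executing the nap instruction */
}
```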
    • powerpc/kvm/book3s_hv: Rework the secondary inhibit code · 441c19c8
      Committed by Michael Ellerman
      As part of the support for split core on POWER8, we want to be able to
      block splitting of the core while KVM VMs are active.
      
      The logic to do that would be exactly the same as the code we currently
      have for inhibiting onlining of secondaries.
      
      Instead of adding an identical mechanism to block split core, rework the
      secondary inhibit code to be a "HV KVM is active" check. We can then use
      that in both the cpu hotplug code and the upcoming split core code.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Acked-by: Alexander Graf <agraf@suse.de>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
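The reworked check could look like the following sketch, assuming a simple counter of running HV KVM guests that both the CPU hotplug path and the upcoming split-core path consult. The function names are modeled loosely on the commit's description and should be treated as illustrative; C11 atomics stand in for kernel atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Count of active HV KVM guests; non-zero means "HV KVM is active". */
static atomic_int hv_vm_count;

static void kvm_hv_vm_activated(void)
{
    atomic_fetch_add(&hv_vm_count, 1);
}

static void kvm_hv_vm_deactivated(void)
{
    int old = atomic_fetch_sub(&hv_vm_count, 1);
    assert(old > 0);     /* catch unbalanced deactivation */
}

static bool kvm_hv_mode_active(void)
{
    return atomic_load(&hv_vm_count) != 0;
}

/* One check, two users: CPU hotplug and core splitting. */
static bool may_online_secondary(void) { return !kvm_hv_mode_active(); }
static bool may_split_core(void)       { return !kvm_hv_mode_active(); }
```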
    • powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES · 64bb80d8
      Committed by Nishanth Aravamudan
      Based off fd1197f1 for ia64, enable CONFIG_HAVE_MEMORYLESS_NODES if
      NUMA. Initialize the local memory node in start_secondary.
      
      With this commit and the preceding one to enable
      CONFIG_USE_PERCPU_NUMA_NODE_ID, which is a prerequisite, in a PowerKVM
      guest with the following topology:
      
      numactl --hardware
      available: 3 nodes (0-2)
      node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
      23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
      47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
      71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
      95 96 97 98 99
      node 0 size: 1998 MB
      node 0 free: 521 MB
      node 1 cpus: 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
      115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
      133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
      151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
      169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186
      187 188 189 190 191 192 193 194 195 196 197 198 199
      node 1 size: 0 MB
      node 1 free: 0 MB
      node 2 cpus:
      node 2 size: 2039 MB
      node 2 free: 1739 MB
      node distances:
      node   0   1   2
        0:  10  40  40
        1:  40  10  40
        2:  40  40  10
      
      the unreclaimable slab is reduced by close to 130M:
      
      Before:
              Slab:             418176 kB
              SReclaimable:      26624 kB
              SUnreclaim:       391552 kB
      
      After:
              Slab:             298944 kB
              SReclaimable:      31744 kB
              SUnreclaim:       267200 kB
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID · 8c272261
      Committed by Nishanth Aravamudan
      Based off 3bccd996 for ia64, convert powerpc to use the generic per-CPU
      topology tracking, specifically:
      
          initialize per cpu numa_node entry in start_secondary
          remove the powerpc cpu_to_node()
          define CONFIG_USE_PERCPU_NUMA_NODE_ID if NUMA
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
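A sketch of the generic per-CPU tracking the commit switches to, assuming a plain array in place of the kernel's per-CPU variable: `start_secondary()` records the CPU's node early, and `cpu_to_node()` becomes a simple lookup. The `set_cpu_numa_node()` and `cpu_to_node()` here are stand-ins named after the generic helpers, and `start_secondary_sketch()` is hypothetical.

```c
#include <assert.h>

#define NR_CPUS 8   /* illustrative */

/* One numa_node entry per CPU (a per-CPU variable in the kernel). */
static int cpu_numa_node[NR_CPUS];

static void set_cpu_numa_node(int cpu, int node)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    cpu_numa_node[cpu] = node;
}

/* No architecture-specific mapping: just read the per-CPU entry. */
static int cpu_to_node(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return cpu_numa_node[cpu];
}

/* What a secondary CPU does early in bring-up (cf. start_secondary()). */
static void start_secondary_sketch(int cpu, int node_from_firmware)
{
    set_cpu_numa_node(cpu, node_from_firmware);
}
```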
    • powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode · 011e4b02
      Committed by Srivatsa S. Bhat
      If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
      (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
      get the following messages during boot:
      
      [    0.089866] POWER8 performance monitor hardware support registered
      [    0.089985] power8-pmu: PMAO restore workaround active.
      [    5.095419] Processor 1 is stuck.
      [   10.097933] Processor 2 is stuck.
      [   15.100480] Processor 3 is stuck.
      [   20.102982] Processor 4 is stuck.
      [   25.105489] Processor 5 is stuck.
      [   30.108005] Processor 6 is stuck.
      [   35.110518] Processor 7 is stuck.
      [   40.113369] Processor 9 is stuck.
      [   45.115879] Processor 10 is stuck.
      [   50.118389] Processor 11 is stuck.
      [   55.120904] Processor 12 is stuck.
      [   60.123425] Processor 13 is stuck.
      [   65.125970] Processor 14 is stuck.
      [   70.128495] Processor 15 is stuck.
      [   75.131316] Processor 17 is stuck.
      
      Note that only the sibling threads are stuck, while the primary threads (0, 8,
      16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
      that kexec tries to wake up (bring online) the sibling threads of all the cores,
      before performing kexec:
      
      [ 9464.131231] Starting new kernel
      [ 9464.148507] kexec: Waking offline cpu 1.
      [ 9464.148552] kexec: Waking offline cpu 2.
      [ 9464.148600] kexec: Waking offline cpu 3.
      [ 9464.148636] kexec: Waking offline cpu 4.
      [ 9464.148671] kexec: Waking offline cpu 5.
      [ 9464.148708] kexec: Waking offline cpu 6.
      [ 9464.148743] kexec: Waking offline cpu 7.
      [ 9464.148779] kexec: Waking offline cpu 9.
      [ 9464.148815] kexec: Waking offline cpu 10.
      [ 9464.148851] kexec: Waking offline cpu 11.
      [ 9464.148887] kexec: Waking offline cpu 12.
      [ 9464.148922] kexec: Waking offline cpu 13.
      [ 9464.148958] kexec: Waking offline cpu 14.
      [ 9464.148994] kexec: Waking offline cpu 15.
      [ 9464.149030] kexec: Waking offline cpu 17.
      
      Instrumenting this piece of code revealed that the cpu_up() operation actually
      fails with -EBUSY. Thus, only the primary threads of all the cores are online
      during kexec, and hence this is a sure-shot recipe for disaster, as explained
      in commit e8e5c215 (powerpc/kexec: Fix orphaned offline CPUs across kexec),
      as well as in the comment above wake_offline_cpus().
      
      It turns out that cpu_up() was returning -EBUSY because the variable
      'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
      by migrate_to_reboot_cpu() inside kernel_kexec().
      
      Now, migrate_to_reboot_cpu() was originally written with the assumption that
      any further code will not need to perform CPU hotplug, since we are anyway in
      the reboot path. However, kexec is clearly not such a case, since we depend on
      onlining CPUs, at least on powerpc.
      
      So re-enable CPU hotplug after returning from migrate_to_reboot_cpu() in the
      kexec path, to fix this regression in kexec on powerpc.
      
      Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
      can catch such issues more easily in the future.
      
      Fixes: c97102ba (kexec: migrate to reboot cpu)
      Cc: stable@vger.kernel.org
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
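The failure and the fix can be modeled in a few lines of user-space C, assuming a single global flag for `cpu_hotplug_disabled` and four CPUs of which only CPU 0 is online. The `_sim` functions and `kexec_wake_offline_cpus()` are hypothetical stand-ins; the model only shows why `cpu_up()` returns `-EBUSY` until hotplug is re-enabled after `migrate_to_reboot_cpu()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4

static int cpu_hotplug_disabled;
static bool cpu_online[NR_CPUS] = { true };   /* only CPU 0 online (ST mode) */

/* Simplified cpu_up(): refuses while hotplug is disabled. */
static int cpu_up_sim(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (cpu_hotplug_disabled)
        return -EBUSY;
    cpu_online[cpu] = true;
    return 0;
}

/* Disables hotplug, assuming nothing later in the reboot path needs it. */
static void migrate_to_reboot_cpu_sim(void)
{
    cpu_hotplug_disabled = 1;
}

/* kexec path: optionally re-enable hotplug, then wake offline CPUs.
 * Returns how many cpu_up() calls failed (the fix wraps them in WARN_ON). */
static int kexec_wake_offline_cpus(bool apply_fix)
{
    int failures = 0;

    migrate_to_reboot_cpu_sim();
    if (apply_fix)
        cpu_hotplug_disabled = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (!cpu_online[cpu] && cpu_up_sim(cpu) != 0)
            failures++;
    return failures;
}
```

Without the fix all three sibling threads stay offline across kexec, which is what later shows up as "Processor N is stuck".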
    • powerpc/PCI: Use pci_is_bridge() to simplify code · c888770e
      Committed by Yijing Wang
      Use pci_is_bridge() to simplify code.  No functional change.
      
      Requires: 326c1cda PCI: Rename pci_is_bridge() to pci_has_subordinate()
      Requires: 1c86438c PCI: Add new pci_is_bridge() interface
      Signed-off-by: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
  6. 23 May, 2014: 4 commits
  7. 20 May, 2014: 4 commits
  8. 12 May, 2014: 1 commit
    • powerpc: irq work racing with timer interrupt can result in timer interrupt hang · 8050936c
      Committed by Anton Blanchard
      I am seeing an issue where a CPU running perf eventually hangs.
      Traces show timer interrupts happening every 4 seconds even
      when a userspace task is running on the CPU. /proc/timer_list
      also shows pending hrtimers have not run in over an hour,
      including the scheduler.
      
      Looking closer, decrementers_next_tb is getting set to
      0xffffffffffffffff, and at that point we will never take
      a timer interrupt again.
      
      In __timer_interrupt() we set decrementers_next_tb to
      0xffffffffffffffff and rely on ->event_handler to update it:
      
              *next_tb = ~(u64)0;
              if (evt->event_handler)
                      evt->event_handler(evt);
      
      In this case ->event_handler is hrtimer_interrupt. This will eventually
      call back through the clockevents code with the next event to be
      programmed:
      
      static int decrementer_set_next_event(unsigned long evt,
                                            struct clock_event_device *dev)
      {
              /* Don't adjust the decrementer if some irq work is pending */
              if (test_irq_work_pending())
                      return 0;
              __get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
      
      If irq work came in between these two points, we will return
      before updating decrementers_next_tb and we never process a timer
      interrupt again.
      
      This looks to have been introduced by 0215f7d8 (powerpc: Fix races
      with irq_work). Fix it by removing the early exit and relying on
      code later on in the function to force an early decrementer:
      
             /* We may have raced with new irq work */
             if (test_irq_work_pending())
                     set_dec(1);
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
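The race and the fix can be contrasted in a user-space model, assuming a struct that holds the timebase, `decrementers_next_tb`, the irq-work flag, and the last value passed to `set_dec()`. Both functions are hypothetical simplifications of `decrementer_set_next_event()`: the buggy one returns before updating the next event, while the fixed one always updates it and then forces an early decrementer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the per-CPU state involved. */
struct dec_state {
    uint64_t tb;             /* current timebase */
    uint64_t next_tb;        /* decrementers_next_tb */
    bool irq_work_pending;
    uint64_t dec;            /* value last written with set_dec() */
};

/* Before the fix: bails out early, so next_tb can stay at ~0 forever. */
static int set_next_event_buggy(struct dec_state *s, unsigned long evt)
{
    assert(s != NULL);
    /* Don't adjust the decrementer if some irq work is pending */
    if (s->irq_work_pending)
        return 0;
    s->next_tb = s->tb + evt;
    s->dec = evt;
    return 0;
}

/* After the fix: always record the next event, then force an early
 * decrementer if irq work raced in. */
static int set_next_event_fixed(struct dec_state *s, unsigned long evt)
{
    assert(s != NULL);
    s->next_tb = s->tb + evt;
    s->dec = evt;
    /* We may have raced with new irq work */
    if (s->irq_work_pending)
        s->dec = 1;
    return 0;
}
```

Starting from `next_tb == ~0`, as `__timer_interrupt()` leaves it, only the fixed version records a real next event.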
  9. 07 May, 2014: 1 commit