1. 28 5月, 2014 9 次提交
    • M
      powerpc: Add threads_per_subcore · 5853aef1
      Michael Ellerman 提交于
      On POWER8 we have a new concept of a subcore. This is what happens when
      you take a regular core and split it. A subcore is a grouping of two or
      four SMT threads, as well as a handfull of SPRs which allows the subcore
      to appear as if it were a core from the point of view of a guest.
      
      Unlike threads_per_core which is fixed at boot, threads_per_subcore can
      change while the system is running. Most code will not want to use
      threads_per_subcore.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      5853aef1
    • M
      powerpc/powernv: Make it possible to skip the IRQHAPPENED check in power7_nap() · 8d6f7c5a
      Michael Ellerman 提交于
      To support split core we need to be able to force all secondaries into
      nap, so the core can detect they are idle and do an unsplit.
      
      Currently power7_nap() will return without napping if there is an irq
      pending. We want to ignore the pending irq and nap anyway, we will deal
      with the interrupt later.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8d6f7c5a
    • M
      powerpc/kvm/book3s_hv: Rework the secondary inhibit code · 441c19c8
      Michael Ellerman 提交于
      As part of the support for split core on POWER8, we want to be able to
      block splitting of the core while KVM VMs are active.
      
      The logic to do that would be exactly the same as the code we currently
      have for inhibiting onlining of secondaries.
      
      Instead of adding an identical mechanism to block split core, rework the
      secondary inhibit code to be a "HV KVM is active" check. We can then use
      that in both the cpu hotplug code and the upcoming split core code.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Acked-by: NAlexander Graf <agraf@suse.de>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      441c19c8
    • N
      powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES · 64bb80d8
      Nishanth Aravamudan 提交于
      Based off fd1197f1 for ia64, enable CONFIG_HAVE_MEMORYLESS_NODES if
      NUMA. Initialize the local memory node in start_secondary.
      
      With this commit and the preceding to enable
      CONFIG_USER_PERCPU_NUMA_NODE_ID, which is a prerequisite, in a PowerKVM
      guest with the following topology:
      
      numactl --hardware
      available: 3 nodes (0-2)
      node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
      23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
      47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
      71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
      95 96 97 98 99
      node 0 size: 1998 MB
      node 0 free: 521 MB
      node 1 cpus: 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114
      115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
      133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
      151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168
      169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186
      187 188 189 190 191 192 193 194 195 196 197 198 199
      node 1 size: 0 MB
      node 1 free: 0 MB
      node 2 cpus:
      node 2 size: 2039 MB
      node 2 free: 1739 MB
      node distances:
      node   0   1   2
        0:  10  40  40
        1:  40  10  40
        2:  40  40  10
      
      the unreclaimable slab is reduced by close to 130M:
      
      Before:
              Slab:             418176 kB
              SReclaimable:      26624 kB
              SUnreclaim:       391552 kB
      
      After:
              Slab:             298944 kB
              SReclaimable:      31744 kB
              SUnreclaim:       267200 kB
      Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      64bb80d8
    • N
      powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID · 8c272261
      Nishanth Aravamudan 提交于
      Based off 3bccd996 for ia64, convert powerpc to use the generic per-CPU
      topology tracking, specifically:
      
          initialize per cpu numa_node entry in start_secondary
          remove the powerpc cpu_to_node()
          define CONFIG_USE_PERCPU_NUMA_NODE_ID if NUMA
      Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8c272261
    • B
      Merge branch 'merge' into next · 86969cf7
      Benjamin Herrenschmidt 提交于
      Merge the binutils and kexec fixes.
      86969cf7
    • S
      powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode · 011e4b02
      Srivatsa S. Bhat 提交于
      If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
      (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
      get the following messages during boot:
      
      [    0.089866] POWER8 performance monitor hardware support registered
      [    0.089985] power8-pmu: PMAO restore workaround active.
      [    5.095419] Processor 1 is stuck.
      [   10.097933] Processor 2 is stuck.
      [   15.100480] Processor 3 is stuck.
      [   20.102982] Processor 4 is stuck.
      [   25.105489] Processor 5 is stuck.
      [   30.108005] Processor 6 is stuck.
      [   35.110518] Processor 7 is stuck.
      [   40.113369] Processor 9 is stuck.
      [   45.115879] Processor 10 is stuck.
      [   50.118389] Processor 11 is stuck.
      [   55.120904] Processor 12 is stuck.
      [   60.123425] Processor 13 is stuck.
      [   65.125970] Processor 14 is stuck.
      [   70.128495] Processor 15 is stuck.
      [   75.131316] Processor 17 is stuck.
      
      Note that only the sibling threads are stuck, while the primary threads (0, 8,
      16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
      that kexec tries to wakeup (bring online) the sibling threads of all the cores,
      before performing kexec:
      
      [ 9464.131231] Starting new kernel
      [ 9464.148507] kexec: Waking offline cpu 1.
      [ 9464.148552] kexec: Waking offline cpu 2.
      [ 9464.148600] kexec: Waking offline cpu 3.
      [ 9464.148636] kexec: Waking offline cpu 4.
      [ 9464.148671] kexec: Waking offline cpu 5.
      [ 9464.148708] kexec: Waking offline cpu 6.
      [ 9464.148743] kexec: Waking offline cpu 7.
      [ 9464.148779] kexec: Waking offline cpu 9.
      [ 9464.148815] kexec: Waking offline cpu 10.
      [ 9464.148851] kexec: Waking offline cpu 11.
      [ 9464.148887] kexec: Waking offline cpu 12.
      [ 9464.148922] kexec: Waking offline cpu 13.
      [ 9464.148958] kexec: Waking offline cpu 14.
      [ 9464.148994] kexec: Waking offline cpu 15.
      [ 9464.149030] kexec: Waking offline cpu 17.
      
      Instrumenting this piece of code revealed that the cpu_up() operation actually
      fails with -EBUSY. Thus, only the primary threads of all the cores are online
      during kexec, and hence this is a sure-shot receipe for disaster, as explained
      in commit e8e5c215 (powerpc/kexec: Fix orphaned offline CPUs across kexec),
      as well as in the comment above wake_offline_cpus().
      
      It turns out that cpu_up() was returning -EBUSY because the variable
      'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
      by migrate_to_reboot_cpu() inside kernel_kexec().
      
      Now, migrate_to_reboot_cpu() was originally written with the assumption that
      any further code will not need to perform CPU hotplug, since we are anyway in
      the reboot path. However, kexec is clearly not such a case, since we depend on
      onlining CPUs, atleast on powerpc.
      
      So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
      kexec path, to fix this regression in kexec on powerpc.
      
      Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
      can catch such issues more easily in the future.
      
      Fixes: c97102ba (kexec: migrate to reboot cpu)
      Cc: stable@vger.kernel.org
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      011e4b02
    • G
      powerpc: Fix 64 bit builds with binutils 2.24 · 7998eb3d
      Guenter Roeck 提交于
      With binutils 2.24, various 64 bit builds fail with relocation errors
      such as
      
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_base_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_end_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      
      The assembler maintainer says:
      
       I changed the ABI, something that had to be done but unfortunately
       happens to break the booke kernel code.  When building up a 64-bit
       value with lis, ori, shl, oris, ori or similar sequences, you now
       should use @high and @higha in place of @h and @ha.  @h and @ha
       (and their associated relocs R_PPC64_ADDR16_HI and R_PPC64_ADDR16_HA)
       now report overflow if the value is out of 32-bit signed range.
       ie. @h and @ha assume you're building a 32-bit value. This is needed
       to report out-of-range -mcmodel=medium toc pointer offsets in @toc@h
       and @toc@ha expressions, and for consistency I did the same for all
       other @h and @ha relocs.
      
      Replacing @h with @high in one strategic location fixes the relocation
      errors. This has to be done conditionally since the assembler either
      supports @h or @high but not both.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      7998eb3d
    • B
      Merge remote-tracking branch 'scott/next' into next · b9d80095
      Benjamin Herrenschmidt 提交于
      <<
      Highlights include a few new boards, a device tree binding for CCF
      (including backwards-compatible device tree updates to distinguish
      incompatible versions), and some fixes.
      >>
      b9d80095
  2. 23 5月, 2014 21 次提交
  3. 20 5月, 2014 9 次提交
  4. 12 5月, 2014 1 次提交
    • A
      powerpc: irq work racing with timer interrupt can result in timer interrupt hang · 8050936c
      Anton Blanchard 提交于
      I am seeing an issue where a CPU running perf eventually hangs.
      Traces show timer interrupts happening every 4 seconds even
      when a userspace task is running on the CPU. /proc/timer_list
      also shows pending hrtimers have not run in over an hour,
      including the scheduler.
      
      Looking closer, decrementers_next_tb is getting set to
      0xffffffffffffffff, and at that point we will never take
      a timer interrupt again.
      
      In __timer_interrupt() we set decrementers_next_tb to
      0xffffffffffffffff and rely on ->event_handler to update it:
      
              *next_tb = ~(u64)0;
              if (evt->event_handler)
                      evt->event_handler(evt);
      
      In this case ->event_handler is hrtimer_interrupt. This will eventually
      call back through the clockevents code with the next event to be
      programmed:
      
      static int decrementer_set_next_event(unsigned long evt,
                                            struct clock_event_device *dev)
      {
              /* Don't adjust the decrementer if some irq work is pending */
              if (test_irq_work_pending())
                      return 0;
              __get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
      
      If irq work came in between these two points, we will return
      before updating decrementers_next_tb and we never process a timer
      interrupt again.
      
      This looks to have been introduced by 0215f7d8 (powerpc: Fix races
      with irq_work). Fix it by removing the early exit and relying on
      code later on in the function to force an early decrementer:
      
             /* We may have raced with new irq work */
             if (test_irq_work_pending())
                     set_dec(1);
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8050936c