1. 14 12月, 2014 1 次提交
  2. 08 12月, 2014 1 次提交
    • P
      powerpc/powernv: Return to cpu offline loop when finished in KVM guest · 56548fc0
      Paul Mackerras 提交于
      When a secondary hardware thread has finished running a KVM guest, we
      currently put that thread into nap mode using a nap instruction in
      the KVM code.  This changes the code so that instead of doing a nap
      instruction directly, we instead cause the call to power7_nap() that
      put the thread into nap mode to return.  The reason for doing this is
      to avoid having the KVM code having to know what low-power mode to
      put the thread into.
      
      In the case of a secondary thread used to run a KVM guest, the thread
      will be offline from the point of view of the host kernel, and the
      relevant power7_nap() call is the one in pnv_smp_cpu_disable().
      In this case we don't want to clear pending IPIs in the offline loop
      in that function, since that might cause us to miss the wakeup for
      the next time the thread needs to run a guest.  To tell whether or
      not to clear the interrupt, we use the SRR1 value returned from
      power7_nap(), and check if it indicates an external interrupt.  We
      arrange that the return from power7_nap() when we have finished running
      a guest returns 0, so pending interrupts don't get flushed in that
      case.
      
      Note that it is important a secondary thread that has finished
      executing in the guest, or that didn't have a guest to run, should
      not return to power7_nap's caller while the kvm_hstate.hwthread_req
      flag in the PACA is non-zero, because the return from power7_nap
      will reenable the MMU, and the MMU might still be in guest context.
      In this situation we spin at low priority in real mode waiting for
      hwthread_req to become zero.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      56548fc0
  3. 05 12月, 2014 1 次提交
    • A
      powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault · aefa5688
      Aneesh Kumar K.V 提交于
      upatepp can get called for a nohpte fault when we find from the linux
      page table that the translation was hashed before. In that case
      we are sure that there is no existing translation, hence we could
      avoid doing tlbie.
      
      We could possibly race with a parallel fault filling the TLB. But
      that should be ok because updatepp is only ever relaxing permissions.
      We also look at linux pte permission bits when filling hash pte
      permission bits. We also hold the linux pte busy bits while
      inserting/updating a hashpte entry, hence a paralle update of
      linux pte is not possible. On the other hand mprotect involves
      ptep_modify_prot_start which cause a hpte invalidate and not updatepp.
      
      Performance number:
      We use randbox_access_bench written by Anton.
      
      Kernel with THP disabled and smaller hash page table size.
      
          86.60%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_updatepp
           2.10%  random_access_b  random_access_bench              [.] doit
           1.99%  random_access_b  [kernel.kallsyms]                [k] .do_raw_spin_lock
           1.85%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           1.26%  random_access_b  [kernel.kallsyms]                [k] .native_flush_hash_range
           1.18%  random_access_b  [kernel.kallsyms]                [k] .__delay
           0.69%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           0.37%  random_access_b  [kernel.kallsyms]                [k] .clear_user_page
           0.34%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           0.32%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           0.30%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
      
      With Fix:
      
          27.54%  random_access_b  random_access_bench              [.] doit
          22.90%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           5.76%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           5.20%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           5.12%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           4.80%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
           3.31%  random_access_b  [kernel.kallsyms]                [k] data_access_common
           1.84%  random_access_b  [kernel.kallsyms]                [k] .trace_hardirqs_on_caller
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      aefa5688
  4. 02 12月, 2014 4 次提交
  5. 19 11月, 2014 2 次提交
    • M
      powerpc: Remove more traces of bootmem · e39f223f
      Michael Ellerman 提交于
      Although we are now selecting NO_BOOTMEM, we still have some traces of
      bootmem lying around. That is because even with NO_BOOTMEM there is
      still a shim that converts bootmem calls into memblock calls, but
      ultimately we want to remove all traces of bootmem.
      
      Most of the patch is conversions from alloc_bootmem() to
      memblock_virt_alloc(). In general a call such as:
      
        p = (struct foo *)alloc_bootmem(x);
      
      Becomes:
      
        p = memblock_virt_alloc(x, 0);
      
      We don't need the cast because memblock_virt_alloc() returns a void *.
      The alignment value of zero tells memblock to use the default alignment,
      which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses.
      
      We remove a number of NULL checks on the result of
      memblock_virt_alloc(). That is because memblock_virt_alloc() will panic
      if it can't allocate, in exactly the same way as alloc_bootmem(), so the
      NULL checks are and always have been redundant.
      
      The memory returned by memblock_virt_alloc() is already zeroed, so we
      remove several memsets of the result of memblock_virt_alloc().
      
      Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS)
      to just plain memblock_virt_alloc(). We don't use memblock_alloc_base()
      because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation
      to that is pointless, 16XB ought to be enough for anyone.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      e39f223f
    • L
      powerpc/pseries: Initialise nvram_pstore_info's buf_lock · a49ab6ee
      Li Zhong 提交于
      nvram_pstore_info's buf_lock is not initialized before registering,
      which is clearly incorrect.
      
      It causes some strange behavior when trying to obtain the lock during
      kdump process.
      
      On a UP configuration, the console stopped for a couple of seconds, then
      "lockup suspected" warning printed out, but then it continued to run.
      
      So try lock fails, and lockup reported, but then arch_spin_lock()
      passes.
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      [mpe: Edited changelog]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a49ab6ee
  6. 17 11月, 2014 1 次提交
    • N
      rtc/tpo: Driver to support rtc and wakeup on PowerNV platform · 16b1d26e
      Neelesh Gupta 提交于
      The patch implements the OPAL rtc driver that binds with the rtc
      driver subsystem. The driver uses the platform device infrastructure
      to probe the rtc device and register it to rtc class framework. The
      'wakeup' is supported depending upon the property 'has-tpo' present
      in the OF node. It provides a way to load the generic rtc driver in
      in the absence of an OPAL driver.
      
      The patch also moves the existing OPAL rtc get/set time interfaces to the
      new driver and exposes the necessary OPAL calls using EXPORT_SYMBOL_GPL.
      
      Test results:
      -------------
      Host:
      [root@tul169p1 ~]# ls -l /sys/class/rtc/
      total 0
      lrwxrwxrwx 1 root root 0 Oct 14 03:07 rtc0 -> ../../devices/opal-rtc/rtc/rtc0
      [root@tul169p1 ~]# cat /sys/devices/opal-rtc/rtc/rtc0/time
      08:10:07
      [root@tul169p1 ~]# echo `date '+%s' -d '+ 2 minutes'` > /sys/class/rtc/rtc0/wakealarm
      [root@tul169p1 ~]# cat /sys/class/rtc/rtc0/wakealarm
      1413274345
      [root@tul169p1 ~]#
      
      FSP:
      $ smgr mfgState
      standby
      $ rtim timeofday
      
      System time is valid: 2014/10/14 08:12:04.225115
      
      $ smgr mfgState
      ipling
      $
      
      CC: devicetree@vger.kernel.org
      CC: tglx@linutronix.de
      CC: rtc-linux@googlegroups.com
      CC: a.zummo@towertech.it
      Signed-off-by: NNeelesh Gupta <neelegup@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      16b1d26e
  7. 14 11月, 2014 8 次提交
  8. 12 11月, 2014 1 次提交
  9. 10 11月, 2014 7 次提交
  10. 08 11月, 2014 1 次提交
  11. 05 11月, 2014 1 次提交
  12. 03 11月, 2014 2 次提交
    • A
      powerpc: Convert power off logic to pm_power_off · 9178ba29
      Alexander Graf 提交于
      The generic Linux framework to power off the machine is a function pointer
      called pm_power_off. The trick about this pointer is that device drivers can
      potentially implement it rather than board files.
      
      Today on powerpc we set pm_power_off to invoke our generic full machine power
      off logic which then calls ppc_md.power_off to invoke machine specific power
      off.
      
      However, when we want to add a power off GPIO via the "gpio-poweroff" driver,
      this card house falls apart. That driver only registers itself if pm_power_off
      is NULL to ensure it doesn't override board specific logic. However, since we
      always set pm_power_off to the generic power off logic (which will just not
      power off the machine if no ppc_md.power_off call is implemented), we can't
      implement power off via the generic GPIO power off driver.
      
      To fix this up, let's get rid of the ppc_md.power_off logic and just always use
      pm_power_off as was intended. Then individual drivers such as the GPIO power off
      driver can implement power off logic via that function pointer.
      
      With this patch set applied and a few patches on top of QEMU that implement a
      power off GPIO on the virt e500 machine, I can successfully turn off my virtual
      machine after halt.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      [mpe: Squash into one patch and update changelog based on cover letter]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9178ba29
    • C
      powerpc: Replace __get_cpu_var uses · 69111bac
      Christoph Lameter 提交于
      This still has not been merged and now powerpc is the only arch that does
      not have this change. Sorry about missing linuxppc-dev before.
      
      V2->V2
        - Fix up to work against 3.18-rc1
      
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout then specialized macros can be defined in non -x86
      arches as well in order to optimize per cpu access by f.e.  using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      [mpe: Fix build errors caused by set/or_softirq_pending(), and rework
            assignment in __set_breakpoint() to use memcpy().]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      69111bac
  13. 02 11月, 2014 1 次提交
    • D
      powerpc: use device_online/offline() instead of cpu_up/down() · 10ccaf17
      Dan Streetman 提交于
      In powerpc pseries platform dlpar operations, use device_online() and
      device_offline() instead of cpu_up() and cpu_down().
      
      Calling cpu_up/down() directly does not update the cpu device offline
      field, which is used to online/offline a cpu from sysfs. Calling
      device_online/offline() instead keeps the sysfs cpu online value
      correct. The hotplug lock, which is required to be held when calling
      device_online/offline(), is already held when dlpar_online/offline_cpu()
      are called, since they are called only from cpu_probe|release_store().
      
      This patch fixes errors on phyp (PowerVM) systems that have cpu(s)
      added/removed using dlpar operations; without this patch, the
      /sys/devices/system/cpu/cpuN/online nodes do not correctly show the
      online state of added/removed cpus.
      Signed-off-by: NDan Streetman <ddstreet@ieee.org>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Fixes: 0902a904 ("Driver core: Use generic offline/online for CPU offline/online")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      10ccaf17
  14. 31 10月, 2014 1 次提交
  15. 30 10月, 2014 1 次提交
    • H
      powerpc/fadump: Fix endianess issues in firmware assisted dump handling · 408cddd9
      Hari Bathini 提交于
      Firmware-assisted dump (fadump) kernel code is not endian safe. The
      below patch fixes this issue. Tested this patch with upstream kernel.
      Below output shows crash tool successfully opening LE fadump vmcore.
      
          # crash vmlinux vmcore
          GNU gdb (GDB) 7.6
          This GDB was configured as "powerpc64le-unknown-linux-gnu"...
      
                KERNEL: vmlinux
              DUMPFILE: vmcore
          	CPUS: 16
          	DATE: Wed Dec 31 19:00:00 1969
                UPTIME: 00:03:28
          LOAD AVERAGE: 0.46, 0.86, 0.41
                 TASKS: 268
              NODENAME: linux-dhr2
               RELEASE: 3.17.0-rc5-7-default
               VERSION: #6 SMP Tue Sep 30 01:06:34 EDT 2014
               MACHINE: ppc64le  (4116 Mhz)
                MEMORY: 40 GB
                 PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
          	 PID: 6223
               COMMAND: "bash"
          	TASK: c0000009661b2500  [THREAD_INFO: c000000967ac0000]
          	 CPU: 2
                 STATE: TASK_RUNNING (PANIC)
      Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
      [mpe: Make the comment in pSeries_lpar_hptab_clear() clearer]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      408cddd9
  16. 28 10月, 2014 1 次提交
  17. 23 10月, 2014 1 次提交
    • J
      powernv: Use _GLOBAL_TOC for opal wrappers · 19d36c21
      Jeremy Kerr 提交于
      Currently, we can't call opal wrappers from modules when using the LE
      ABIv2, which requires a TOC init. If we do we'll try and load the opal
      entry point using the wrong toc and probably explode or worse jump to
      the wrong address.
      
      Nothing in upstream is making opal calls from a module, but we do export
      one of the wrappers so we should fix this anyway.
      
      This change uses the _GLOBAL_TOC() macro (rather than _GLOBAL) for the
      opal wrappers, so that we can do non-local calls to them.
      Signed-off-by: NJeremy Kerr <jk@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      19d36c21
  18. 15 10月, 2014 5 次提交