1. 17 Mar, 2015 3 commits
    •
      powerpc/pseries: Implement memory hotplug add in the kernel · 5f97b2a0
      Nathan Fontenot committed
      This patch adds the ability to do memory hotplug add in the kernel.
      
      Currently the operation to hotplug add memory is handled by the drmgr
      command which performs the operation by performing some work in user-space
      and making requests to the kernel to handle other pieces. By moving all
      of the work to the kernel we can do the add faster, and provide a common
      code path to do memory hotplug for both the PowerVM and PowerKVM environments.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      5f97b2a0
    •
      powerpc/pseries: Create new device hotplug entry point · 999e2dad
      Nathan Fontenot committed
      The current hotplug (or dlpar) of devices (the process is generally the
      same for memory, cpu, and pci) on PowerVM systems is initiated
      from the HMC, which communicates the request to the partitions through
      the RSCT framework. The RSCT framework then invokes the drmgr command.
      The drmgr command performs the hotplug operation by doing some pieces,
      such as most of the rtas calls and device tree parsing, in userspace
      and making requests to the kernel to online/offline the device, update the
      device tree and add/remove the device.
      
      For PowerKVM the approach for device hotplug is to follow what is currently
      being done for pci hotplug. A hotplug request is initiated from the host.
      QEMU then generates an EPOW interrupt to the guest which causes the guest
      to make the rtas,check-exception call. In QEMU, the rtas,check-exception call
      returns a rtas hotplug event to the guest.
      
      Please note that the current pci hotplug path for PowerKVM involves the
      kernel receiving the rtas hotplug event, passing it to rtas_errd in
      userspace, and having rtas_errd invoke drmgr. The drmgr command then
      handles the request as described above for PowerVM systems.
      
      There is no need for this circuitous route; we should just handle the entire
      hotplug of devices in the kernel. What I am planning is to enable this
      by moving the code to handle hotplug from drmgr into the kernel to
      provide a single path for handling device hotplug for both PowerVM and
      PowerKVM systems. This patch provides the common framework and entry point.
      For PowerKVM a future update to the kernel rtas code will recognize rtas
      hotplug events returned from rtas,check-exception calls and use the common
      entry point to handle hotplug of the device.
      
      For PowerVM systems, this patch creates /sys/kernel/dlpar that can be
      used by the drmgr command to initiate hotplug requests. In order to do
      this a string of the format "<resource> <action> <id_type> <id>" is
      written to this file. The string consists of a resource (cpu, memory, pci,
      phb), an action (add or remove), an id_type (count, drc index, drc name),
      and the corresponding id. The kernel will parse the string and create a
      rtas hotplug section that can be passed to the common entry point for
      handling hotplug requests.
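
      The request format described above can be illustrated with a small
      userspace sketch. This is not the kernel's parser: the enum and struct
      names, the function parse_dlpar_request(), and the treatment of id_type
      as a single opaque token are all assumptions made for the example.

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      /* Hypothetical types for this sketch; not the kernel's definitions. */
      enum dlpar_resource { RES_CPU, RES_MEMORY, RES_PCI, RES_PHB };
      enum dlpar_action { ACT_ADD, ACT_REMOVE };

      struct dlpar_request {
          enum dlpar_resource resource;
          enum dlpar_action action;
          char id_type[16];   /* kept as an opaque token; spelling is assumed */
          char id[32];
      };

      /* Parse "<resource> <action> <id_type> <id>". Returns 0 on success, -1 on error. */
      static int parse_dlpar_request(const char *buf, struct dlpar_request *req)
      {
          /* Order must match enum dlpar_resource. */
          static const char *const resources[] = { "cpu", "memory", "pci", "phb" };
          const size_t nres = sizeof(resources) / sizeof(resources[0]);
          char resource[16], action[16];
          size_t i;

          assert(buf != NULL && req != NULL);

          /* All four whitespace-separated fields must be present. */
          if (sscanf(buf, "%15s %15s %15s %31s", resource, action,
                     req->id_type, req->id) != 4)
              return -1;

          for (i = 0; i < nres; i++)
              if (strcmp(resource, resources[i]) == 0)
                  break;
          if (i == nres)
              return -1;
          req->resource = (enum dlpar_resource)i;

          if (strcmp(action, "add") == 0)
              req->action = ACT_ADD;
          else if (strcmp(action, "remove") == 0)
              req->action = ACT_REMOVE;
          else
              return -1;

          return 0;
      }
      ```

      Under these assumptions, a string such as "memory add count 4" would
      parse into a memory/add request, while an unknown resource or a
      missing field is rejected.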
      
      It should be noted that there is no chance of updating how we receive
      hotplug (dlpar) requests from the HMC on PowerVM systems.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      999e2dad
    •
      powerpc/pseries: Declare the acquire/release drc index routines · 5e51d3c2
      Nathan Fontenot committed
      Add declarations for dlpar_{acquire,release}_drc(...)
      
      They are already marked non-static but were missing a prototype.
      
      [BenH: Added extern to be consistent with the rest of the file]
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      5e51d3c2
  2. 04 Mar, 2015 1 commit
  3. 27 Jan, 2015 1 commit
    •
      powerpc/pseries: Fix endian problems with LE migration · 3df76a9d
      Cyril Bur committed
      RTAS events require arguments be passed in big endian while hypercalls
      have their arguments passed in registers and the values should therefore
      be in CPU endian.
      
      The "ibm,suspend_me" 'RTAS' call makes a sequence of hypercalls to set up
      one true RTAS call. This means that "ibm,suspend_me" is handled
      specially in the ppc_rtas() syscall.
      
      The ppc_rtas() syscall has its arguments in big endian and can therefore
      pass these arguments directly to the RTAS call. "ibm,suspend_me" is
      handled specially from within ppc_rtas() (by calling rtas_ibm_suspend_me())
      which has left an endian bug on little endian systems due to the
      requirement of hypercalls. The return value from rtas_ibm_suspend_me()
      gets returned in cpu endian, and is left unconverted, also a bug on
      little endian systems.
      
      rtas_ibm_suspend_me() does not actually make use of the rtas_args that
      it is passed. This patch removes the convoluted use of the rtas_args
      struct to pass params to rtas_ibm_suspend_me() in favour of passing what
      it needs as actual arguments. This patch also ensures the two callers of
      rtas_ibm_suspend_me() pass function parameters in cpu endian and in the
      case of ppc_rtas(), converts the return value.
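
      The conversion pattern described above can be sketched in portable
      userspace C. This is only an illustration: fake_suspend() and
      call_with_be_buffer() are hypothetical stand-ins, and the layout of the
      handle as two consecutive 32-bit big-endian words is an assumption made
      for the example.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Read a 32-bit big-endian value from memory into CPU byte order. */
      static uint32_t load_be32(const unsigned char *p)
      {
          return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];
      }

      /* Store a CPU-endian 32-bit value into memory as big-endian. */
      static void store_be32(unsigned char *p, uint32_t v)
      {
          p[0] = (unsigned char)(v >> 24);
          p[1] = (unsigned char)(v >> 16);
          p[2] = (unsigned char)(v >> 8);
          p[3] = (unsigned char)v;
      }

      /* Hypothetical callee: takes its parameter in CPU endian, like a register value. */
      static int fake_suspend(uint64_t handle)
      {
          return handle == 0 ? -1 : 0;
      }

      /*
       * Hypothetical caller: args holds two big-endian words forming a 64-bit
       * handle. The words are converted before the call, and the status is
       * converted back to big-endian before it is stored in ret.
       */
      static uint64_t call_with_be_buffer(const unsigned char args[8],
                                          unsigned char ret[4])
      {
          uint64_t handle;

          assert(args != 0 && ret != 0);
          handle = ((uint64_t)load_be32(args) << 32) | load_be32(args + 4);
          store_be32(ret, (uint32_t)fake_suspend(handle));
          return handle;
      }
      ```

      Because the helpers work on individual bytes, the sketch behaves the
      same on little endian and big endian hosts.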
      
      migrate_store() (the other caller of rtas_ibm_suspend_me()) is from a
      sysfs file which deals with everything in cpu endian so this function
      only underwent cleanup.
      
      This patch has been tested with KVM both LE and BE and on PowerVM both
      LE and BE. Under QEMU/KVM the migration happens without touching these
      code paths.
      
      For PowerVM there is no obvious regression on BE and the LE code path
      now provides the correct parameters to the hypervisor.
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3df76a9d
  4. 23 Jan, 2015 2 commits
  5. 29 Dec, 2014 2 commits
    •
      powerpc/pseries: relocate "config DTL" so kconfig nests properly · e3a8446a
      Cody P Schafer committed
      Moving config DTL up so it is below config PPC_SPLPAR means that
      menuconfig will show config DTL nicely indented right below config
      PPC_SPLPAR when PPC_SPLPAR is enabled.
      
      To contrast that, right now if I enable PPC_SPLPAR in menuconfig, all I
      can immediately tell is that "something showed up further down the list
      where I wasn't looking", and I end up having to toggle the option a few
      times to figure out what showed up, or look at the Kconfig to find out
      that config DTL depends on config PPC_SPLPAR.
      Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      e3a8446a
    •
      powerpc/kdump: Ignore failure in enabling big endian exception during crash · c1caae3d
      Hari Bathini committed
      In an LE kernel, we currently have a hack for kexec that resets the exception
      endian before starting a new kernel, as the kernel that is loaded could be a
      big endian or a little endian kernel. In the kdump case, resetting the exception
      endian fails when one or more cpus are disabled. But we can ignore the failure
      and still go ahead, as in most cases the crash kernel will be of the same endianness
      as the primary kernel and resetting endianness is not even needed in those cases.
      This patch adds a new inline function to say if this is the kdump path. This
      function is used at places where such a check is needed.
      Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
      [mpe: Rename to kdump_in_progress(), use bool, and edit comment]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c1caae3d
  6. 05 Dec, 2014 1 commit
    •
      powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault · aefa5688
      Aneesh Kumar K.V committed
      updatepp can get called for a nohpte fault when we find from the linux
      page table that the translation was hashed before. In that case
      we are sure that there is no existing translation, hence we could
      avoid doing tlbie.
      
      We could possibly race with a parallel fault filling the TLB. But
      that should be ok because updatepp is only ever relaxing permissions.
      We also look at linux pte permission bits when filling hash pte
      permission bits. We also hold the linux pte busy bits while
      inserting/updating a hashpte entry, hence a parallel update of
      linux pte is not possible. On the other hand mprotect involves
      ptep_modify_prot_start which causes a hpte invalidate and not updatepp.
      
      Performance numbers:
      We use random_access_bench written by Anton.
      
      Kernel with THP disabled and smaller hash page table size.
      
          86.60%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_updatepp
           2.10%  random_access_b  random_access_bench              [.] doit
           1.99%  random_access_b  [kernel.kallsyms]                [k] .do_raw_spin_lock
           1.85%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           1.26%  random_access_b  [kernel.kallsyms]                [k] .native_flush_hash_range
           1.18%  random_access_b  [kernel.kallsyms]                [k] .__delay
           0.69%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           0.37%  random_access_b  [kernel.kallsyms]                [k] .clear_user_page
           0.34%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           0.32%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           0.30%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
      
      With Fix:
      
          27.54%  random_access_b  random_access_bench              [.] doit
          22.90%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           5.76%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           5.20%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           5.12%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           4.80%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
           3.31%  random_access_b  [kernel.kallsyms]                [k] data_access_common
           1.84%  random_access_b  [kernel.kallsyms]                [k] .trace_hardirqs_on_caller
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      aefa5688
  7. 02 Dec, 2014 1 commit
  8. 25 Nov, 2014 1 commit
    •
      of/reconfig: Always use the same structure for notifiers · f5242e5a
      Grant Likely committed
      The OF_RECONFIG notifier callback uses a different structure depending
      on whether it is a node change or a property change. This is silly, and
      not very safe. Rework the code to use the same data structure regardless
      of the type of notifier.
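
      The idea of one payload type for every notifier action can be sketched
      as follows. The types and names here (struct node, struct prop,
      struct reconfig_data, the action enum and both helpers) are mock-ups
      invented for the example and are not the kernel's definitions.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical mock types; the kernel's device tree structures differ. */
      struct node { const char *name; };
      struct prop { const char *name; };

      enum reconfig_action { NODE_ATTACH, NODE_DETACH, PROP_ADD, PROP_REMOVE, PROP_UPDATE };

      /* One payload for every action; fields that do not apply stay NULL. */
      struct reconfig_data {
          struct node *dn;
          struct prop *prop;
          struct prop *old_prop;
      };

      /* Every handler can read the node the same way, whatever the action. */
      static const char *event_node_name(const struct reconfig_data *rd)
      {
          assert(rd != NULL && rd->dn != NULL);
          return rd->dn->name;
      }

      /* Property name for property events, NULL for node events. */
      static const char *event_prop_name(enum reconfig_action action,
                                         const struct reconfig_data *rd)
      {
          switch (action) {
          case PROP_ADD:
          case PROP_REMOVE:
          case PROP_UPDATE:
              return rd->prop ? rd->prop->name : NULL;
          default:
              return NULL;
          }
      }
      ```

      With a single struct, a callback never has to cast its argument to a
      different type depending on the action, which is the safety concern the
      commit message raises.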
      Signed-off-by: Grant Likely <grant.likely@linaro.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Pantelis Antoniou <pantelis.antoniou@konsulko.com>
      Cc: <linuxppc-dev@lists.ozlabs.org>
      f5242e5a
  9. 24 Nov, 2014 1 commit
  10. 23 Nov, 2014 1 commit
  11. 19 Nov, 2014 1 commit
  12. 10 Nov, 2014 4 commits
  13. 05 Nov, 2014 1 commit
  14. 03 Nov, 2014 2 commits
    •
      powerpc: Convert power off logic to pm_power_off · 9178ba29
      Alexander Graf committed
      The generic Linux framework to power off the machine is a function pointer
      called pm_power_off. The trick about this pointer is that device drivers can
      potentially implement it rather than board files.
      
      Today on powerpc we set pm_power_off to invoke our generic full machine power
      off logic which then calls ppc_md.power_off to invoke machine specific power
      off.
      
      However, when we want to add a power off GPIO via the "gpio-poweroff" driver,
      this card house falls apart. That driver only registers itself if pm_power_off
      is NULL to ensure it doesn't override board specific logic. However, since we
      always set pm_power_off to the generic power off logic (which will just not
      power off the machine if no ppc_md.power_off call is implemented), we can't
      implement power off via the generic GPIO power off driver.
      
      To fix this up, let's get rid of the ppc_md.power_off logic and just always use
      pm_power_off as was intended. Then individual drivers such as the GPIO power off
      driver can implement power off logic via that function pointer.
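
      The registration rule described above (a driver only installs itself
      when the hook is still NULL) can be modelled in a few lines of userspace
      C. The pm_power_off variable below is a local stand-in for the kernel
      hook, and gpio_power_off(), board_power_off(), register_gpio_poweroff()
      and machine_power_off() are hypothetical names for this sketch only.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Local stand-in for the global hook; NULL means nobody has registered. */
      static void (*pm_power_off)(void);

      static int gpio_fired;  /* counts how often the mock GPIO handler ran */

      static void gpio_power_off(void) { gpio_fired++; }
      static void board_power_off(void) { }

      /* Mimics a driver that refuses to override an existing handler. */
      static bool register_gpio_poweroff(void)
      {
          if (pm_power_off != NULL)
              return false;
          pm_power_off = gpio_power_off;
          return true;
      }

      /* Generic path: call whatever handler is installed, if any. */
      static void machine_power_off(void)
      {
          if (pm_power_off)
              pm_power_off();
      }
      ```

      If the hook is always pre-populated with generic logic, as the message
      says was the case on powerpc, register_gpio_poweroff() in this model
      can never succeed. Leaving the hook NULL until a board or driver claims
      it avoids that.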
      
      With this patch set applied and a few patches on top of QEMU that implement a
      power off GPIO on the virt e500 machine, I can successfully turn off my virtual
      machine after halt.
      Signed-off-by: Alexander Graf <agraf@suse.de>
      [mpe: Squash into one patch and update changelog based on cover letter]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9178ba29
    •
      powerpc: Replace __get_cpu_var uses · 69111bac
      Christoph Lameter committed
      This still has not been merged and now powerpc is the only arch that does
      not have this change. Sorry about missing linuxppc-dev before.
      
      V2->V2
        - Fix up to work against 3.18-rc1
      
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processor's percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and fewer registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout then specialized macros can be defined in non-x86
      arches as well in order to optimize per cpu access by e.g. using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processor's instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y);
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++;
      
         Converts to
      
      	__this_cpu_inc(y);
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      [mpe: Fix build errors caused by set/or_softirq_pending(), and rework
            assignment in __set_breakpoint() to use memcpy().]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      69111bac
  15. 02 Nov, 2014 1 commit
    •
      powerpc: use device_online/offline() instead of cpu_up/down() · 10ccaf17
      Dan Streetman committed
      In powerpc pseries platform dlpar operations, use device_online() and
      device_offline() instead of cpu_up() and cpu_down().
      
      Calling cpu_up/down() directly does not update the cpu device offline
      field, which is used to online/offline a cpu from sysfs. Calling
      device_online/offline() instead keeps the sysfs cpu online value
      correct. The hotplug lock, which is required to be held when calling
      device_online/offline(), is already held when dlpar_online/offline_cpu()
      are called, since they are called only from cpu_probe|release_store().
      
      This patch fixes errors on phyp (PowerVM) systems that have cpu(s)
      added/removed using dlpar operations; without this patch, the
      /sys/devices/system/cpu/cpuN/online nodes do not correctly show the
      online state of added/removed cpus.
      Signed-off-by: Dan Streetman <ddstreet@ieee.org>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Fixes: 0902a904 ("Driver core: Use generic offline/online for CPU offline/online")
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      10ccaf17
  16. 30 Oct, 2014 1 commit
    •
      powerpc/fadump: Fix endianess issues in firmware assisted dump handling · 408cddd9
      Hari Bathini committed
      Firmware-assisted dump (fadump) kernel code is not endian safe. The
      below patch fixes this issue. Tested this patch with upstream kernel.
      Below output shows crash tool successfully opening LE fadump vmcore.
      
          # crash vmlinux vmcore
          GNU gdb (GDB) 7.6
          This GDB was configured as "powerpc64le-unknown-linux-gnu"...
      
                KERNEL: vmlinux
              DUMPFILE: vmcore
          	CPUS: 16
          	DATE: Wed Dec 31 19:00:00 1969
                UPTIME: 00:03:28
          LOAD AVERAGE: 0.46, 0.86, 0.41
                 TASKS: 268
              NODENAME: linux-dhr2
               RELEASE: 3.17.0-rc5-7-default
               VERSION: #6 SMP Tue Sep 30 01:06:34 EDT 2014
               MACHINE: ppc64le  (4116 Mhz)
                MEMORY: 40 GB
                 PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
          	 PID: 6223
               COMMAND: "bash"
          	TASK: c0000009661b2500  [THREAD_INFO: c000000967ac0000]
          	 CPU: 2
                 STATE: TASK_RUNNING (PANIC)
      Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
      [mpe: Make the comment in pSeries_lpar_hptab_clear() clearer]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      408cddd9
  17. 15 Oct, 2014 2 commits
  18. 03 Oct, 2014 1 commit
    •
      powerpc/iommu/ddw: Fix endianness · 9410e018
      Alexey Kardashevskiy committed
      rtas_call() accepts and returns values in CPU endianness.
      The ddw_query_response and ddw_create_response struct members are
      defined and treated as BE, but because they are passed to rtas_call() as
      (u32 *) they get byteswapped automatically, so the data is CPU-endian.
      This fixes ddw_query_response and ddw_create_response definitions and use.
      
      of_read_number() is designed to work with device tree cells - it assumes
      the input is big-endian and returns data in CPU-endian. However due
      to the ddw_create_response struct fix, create.addr_hi/lo are already
      CPU-endian so do not byteswap them.
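
      The difference between reading big-endian device tree cells and joining
      values that are already CPU-endian can be sketched in userspace C. The
      helpers read_be_cells() and join_cpu_words() are hypothetical names for
      this example; read_be_cells() only imitates the behaviour the message
      attributes to of_read_number() and is not the kernel's code.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Combine `cells` 32-bit big-endian cells into one CPU-endian value. */
      static uint64_t read_be_cells(const unsigned char *p, int cells)
      {
          uint64_t r = 0;

          assert(p != 0 && cells >= 0 && cells <= 2);
          while (cells--) {
              uint32_t cell = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                              ((uint32_t)p[2] << 8) | (uint32_t)p[3];
              r = (r << 32) | cell;
              p += 4;
          }
          return r;
      }

      /* Words that are already CPU-endian are joined without any byte swap. */
      static uint64_t join_cpu_words(uint32_t hi, uint32_t lo)
      {
          return ((uint64_t)hi << 32) | lo;
      }
      ```

      Running a byte-swapping reader over words that are already CPU-endian
      would swap them a second time on little endian hosts, which is the kind
      of bug the paragraph above describes.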
      
      ddw_avail is a pointer to the "ibm,ddw-applicable" property which contains
      3 cells which are big-endian as it is a device tree. rtas_call() accepts
      a RTAS token in CPU-endian. This makes use of of_property_read_u32_array
      to byte swap and avoid the need for a number of be32_to_cpu calls.
      
      Cc: stable@vger.kernel.org # v3.13+
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [aik: folded Anton's patch with of_property_read_u32_array]
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      9410e018
  19. 02 Oct, 2014 2 commits
  20. 30 Sep, 2014 3 commits
  21. 25 Sep, 2014 6 commits
  22. 23 Sep, 2014 1 commit
  23. 09 Sep, 2014 1 commit