1. 03 11月, 2014 1 次提交
    • A
      powerpc: Convert power off logic to pm_power_off · 9178ba29
      Alexander Graf 提交于
      The generic Linux framework to power off the machine is a function pointer
      called pm_power_off. The trick about this pointer is that device drivers can
      potentially implement it rather than board files.
      
      Today on powerpc we set pm_power_off to invoke our generic full machine power
      off logic which then calls ppc_md.power_off to invoke machine specific power
      off.
      
      However, when we want to add a power off GPIO via the "gpio-poweroff" driver,
      this card house falls apart. That driver only registers itself if pm_power_off
      is NULL to ensure it doesn't override board specific logic. However, since we
      always set pm_power_off to the generic power off logic (which will just not
      power off the machine if no ppc_md.power_off call is implemented), we can't
      implement power off via the generic GPIO power off driver.
      
      To fix this up, let's get rid of the ppc_md.power_off logic and just always use
      pm_power_off as was intended. Then individual drivers such as the GPIO power off
      driver can implement power off logic via that function pointer.
      
      With this patch set applied and a few patches on top of QEMU that implement a
      power off GPIO on the virt e500 machine, I can successfully turn off my virtual
      machine after halt.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      [mpe: Squash into one patch and update changelog based on cover letter]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9178ba29
  2. 17 10月, 2014 1 次提交
  3. 16 10月, 2014 2 次提交
    • A
      2c186e05
    • M
      powerpc/pci: Fix IO space breakage after of_pci_range_to_resource() change · aeba3731
      Michael Ellerman 提交于
      Commit 0b0b0893 "of/pci: Fix the conversion of IO ranges into IO
      resources" changed the behaviour of of_pci_range_to_resource().
      
      Previously it simply populated the resource based on the arguments. Now
      it calls pci_register_io_range() and pci_address_to_pio(). These both
      have two implementations depending on whether PCI_IOBASE is defined,
      which it is not for powerpc.
      
      Further complicating matters, both routines are weak, and powerpc
      implements it's own version of one - pci_address_to_pio(). However
      powerpc's implementation depends on other initialisations which are done
      later in boot.
      
      The end result is incorrectly initialised IO space. Often we can get
      away with that, because we don't make much use of IO space. However
      virtio requires it, so we see eg:
      
        pci_bus 0000:00: root bus resource [io  0xffff] (bus address [0xffffffffffffffff-0xffffffffffffffff])
        PCI: Cannot allocate resource region 0 of device 0000:00:01.0, will remap
        virtio-pci 0000:00:01.0: can't enable device: BAR 0 [io  size 0x0020] not assigned
      
      The simplest fix for now is to just stop using of_pci_range_to_resource(),
      and open-code the original implementation, that's all we want it to do.
      
      Fixes: 0b0b0893 ("of/pci: Fix the conversion of IO ranges into IO resources")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      aeba3731
  4. 15 10月, 2014 7 次提交
    • G
      powerpc/eeh: Don't collect logs on PE with blocked config space · c59004cc
      Gavin Shan 提交于
      When the PE's config space is marked as blocked, PCI config read
      requests always return 0xFF's. It's pointless to collect logs in
      this case.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c59004cc
    • G
      powerpc/eeh: Block PCI config access upon frozen PE · b6541db1
      Gavin Shan 提交于
      The problem was found when I tried to inject PCI config error by
      PHB3 PAPR error injection registers into Broadcom Austin 4-ports
      NIC adapter. The frozen PE was reported successfully and EEH core
      started to recover it. However, I run into fenced PHB when dumping
      PCI config space as EEH logs. I was told that PCI config requests
      should not be progagated to the adapter until PE reset is done
      successfully. Otherise, we would run out of PHB internal credits
      and trigger PCT (PCIE Completion Timeout), which leads to the
      fenced PHB.
      
      The patch introduces another PE flag EEH_PE_CFG_RESTRICTED, which
      is set during PE initialization time if the PE includes the specific
      PCI devices that need block PCI config access until PE reset is done.
      When the PE becomes frozen for the first time, EEH_PE_CFG_BLOCKED is
      set if the PE has flag EEH_PE_CFG_RESTRICTED. Then the PCI config
      access to the PE will be dropped by platform PCI accessors until
      PE reset is done successfully. The mechanism is shared by PowerNV
      platform owned PE or userland owned ones. It's not used on pSeries
      platform yet.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b6541db1
    • G
      powerpc/pseries: Drop config requests in EEH accessors · 3409eb4e
      Gavin Shan 提交于
      The pSeires EEH config accessors rely on rtas_{read, write}_config()
      and the condition to check if the PE's config space is blocked
      should be moved to those 2 functions so that config requests from
      kernel, userland, EEH core can be dropped to avoid recursive EEH error
      if necessary.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3409eb4e
    • G
      powerpc/eeh: Rename flag EEH_PE_RESET to EEH_PE_CFG_BLOCKED · 8a6b3710
      Gavin Shan 提交于
      The flag EEH_PE_RESET indicates blocking config space of the PE
      during reset time. We potentially need block PE's config space
      other than reset time. So it's reasonable to replace it with
      EEH_PE_CFG_BLOCKED to indicate its usage.
      
      There are no substantial code or logic changes in this patch.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8a6b3710
    • G
      powerpc/eeh: Fix condition for isolated state · 8315070c
      Gavin Shan 提交于
      Function eeh_pe_state_mark() could possibly have combination of
      multiple EEH PE state as its argument. The patch fixes the condition
      used to check if EEH_PE_ISOLATED is included.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8315070c
    • A
      powerpc: Rename __get_SP() to current_stack_pointer() · acf620ec
      Anton Blanchard 提交于
      Michael points out that __get_SP() is a pretty horrible
      function name. Let's give it a better name.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      acf620ec
    • A
      powerpc: Reimplement __get_SP() as a function not a define · bfe9a2cf
      Anton Blanchard 提交于
      Li Zhong points out an issue with our current __get_SP()
      implementation. If ftrace function tracing is enabled (ie -pg
      profiling using _mcount) we spill a stack frame on 64bit all the
      time.
      
      If a function calls __get_SP() and later calls a function that is
      tail call optimised, we will pop the stack frame and the value
      returned by __get_SP() is no longer valid. An example from Li can
      be found in save_stack_trace -> save_context_stack:
      
      c0000000000432c0 <.save_stack_trace>:
      c0000000000432c0:       mflr    r0
      c0000000000432c4:       std     r0,16(r1)
      c0000000000432c8:       stdu    r1,-128(r1) <-- stack frame for _mcount
      c0000000000432cc:       std     r3,112(r1)
      c0000000000432d0:       bl      <._mcount>
      c0000000000432d4:       nop
      
      c0000000000432d8:       mr      r4,r1 <-- __get_SP()
      
      c0000000000432dc:       ld      r5,632(r13)
      c0000000000432e0:       ld      r3,112(r1)
      c0000000000432e4:       li      r6,1
      
      c0000000000432e8:       addi    r1,r1,128 <-- pop stack frame
      
      c0000000000432ec:       ld      r0,16(r1)
      c0000000000432f0:       mtlr    r0
      c0000000000432f4:       b       <.save_context_stack> <-- tail call optimized
      
      save_context_stack ends up with a stack pointer below the current
      one, and it is likely to be scribbled over.
      
      Fix this by making __get_SP() a function which returns the
      callers stack frame. Also replace inline assembly which grabs
      the stack pointer in save_stack_trace and show_stack with
      __get_SP().
      
      This also fixes an issue with perf_arch_fetch_caller_regs().
      It currently unwinds the stack once, which will skip a
      valid stack frame on a leaf function. With the __get_SP() fixes
      in this patch, we never need to unwind the stack frame to get
      to the first interesting frame.
      
      We have to export __get_SP() because perf_arch_fetch_caller_regs()
      (which is used in modules) calls it from a header file.
      Reported-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bfe9a2cf
  5. 10 10月, 2014 2 次提交
  6. 03 10月, 2014 2 次提交
  7. 02 10月, 2014 4 次提交
  8. 01 10月, 2014 1 次提交
  9. 30 9月, 2014 14 次提交
  10. 25 9月, 2014 6 次提交
    • W
      powerpc/eeh: Fix kernel crash when passing through VF · 2a58222f
      Wei Yang 提交于
      When doing vfio passthrough a VF, the kernel will crash with following
      message:
      
      [  442.656459] Unable to handle kernel paging request for data at address 0x00000060
      [  442.656593] Faulting instruction address: 0xc000000000038b88
      [  442.656706] Oops: Kernel access of bad area, sig: 11 [#1]
      [  442.656798] SMP NR_CPUS=1024 NUMA PowerNV
      [  442.656890] Modules linked in: vfio_pci mlx4_core nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack bnep bluetooth rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw tg3 nfsd be2net nfs_acl ses lockd ptp enclosure pps_core kvm_hv kvm_pr shpchp binfmt_misc kvm sunrpc uinput lpfc scsi_transport_fc ipr scsi_tgt [last unloaded: mlx4_core]
      [  442.658152] CPU: 40 PID: 14948 Comm: qemu-system-ppc Not tainted 3.10.42yw-pkvm+ #37
      [  442.658219] task: c000000f7e2a9a00 ti: c000000f6dc3c000 task.ti: c000000f6dc3c000
      [  442.658287] NIP: c000000000038b88 LR: c0000000004435a8 CTR: c000000000455bc0
      [  442.658352] REGS: c000000f6dc3f580 TRAP: 0300   Not tainted  (3.10.42yw-pkvm+)
      [  442.658419] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28004882  XER: 20000000
      [  442.658577] CFAR: c00000000000908c DAR: 0000000000000060 DSISR: 40000000 SOFTE: 1
      GPR00: c0000000004435a8 c000000f6dc3f800 c0000000012b1c10 c00000000da24000
      GPR04: 0000000000000003 0000000000001004 00000000000015b3 000000000000ffff
      GPR08: c00000000127f5d8 0000000000000000 000000000000ffff 0000000000000000
      GPR12: c000000000068078 c00000000fdd6800 000001003c320c80 000001003c3607f0
      GPR16: 0000000000000001 00000000105480c8 000000001055aaa8 000001003c31ab18
      GPR20: 000001003c10fb40 000001003c360ae8 000000001063bcf0 000000001063bdb0
      GPR24: 000001003c15ed70 0000000010548f40 c000001fe5514c88 c000001fe5514cb0
      GPR28: c00000000da24000 0000000000000000 c00000000da24000 0000000000000003
      [  442.659471] NIP [c000000000038b88] .pcibios_set_pcie_reset_state+0x28/0x130
      [  442.659530] LR [c0000000004435a8] .pci_set_pcie_reset_state+0x28/0x40
      [  442.659585] Call Trace:
      [  442.659610] [c000000f6dc3f800] [00000000000719e0] 0x719e0 (unreliable)
      [  442.659677] [c000000f6dc3f880] [c0000000004435a8] .pci_set_pcie_reset_state+0x28/0x40
      [  442.659757] [c000000f6dc3f900] [c000000000455bf8] .reset_fundamental+0x38/0x80
      [  442.659835] [c000000f6dc3f980] [c0000000004562a8] .pci_dev_specific_reset+0xa8/0xf0
      [  442.659913] [c000000f6dc3fa00] [c0000000004448c4] .__pci_dev_reset+0x44/0x430
      [  442.659980] [c000000f6dc3fab0] [c000000000444d5c] .pci_reset_function+0x7c/0xc0
      [  442.660059] [c000000f6dc3fb30] [d00000001c141ab8] .vfio_pci_open+0xe8/0x2b0 [vfio_pci]
      [  442.660139] [c000000f6dc3fbd0] [c000000000586c30] .vfio_group_fops_unl_ioctl+0x3a0/0x630
      [  442.660219] [c000000f6dc3fc90] [c000000000255fbc] .do_vfs_ioctl+0x4ec/0x7c0
      [  442.660286] [c000000f6dc3fd80] [c000000000256364] .SyS_ioctl+0xd4/0xf0
      [  442.660354] [c000000f6dc3fe30] [c000000000009e54] syscall_exit+0x0/0x98
      [  442.660420] Instruction dump:
      [  442.660454] 4bfffce9 4bfffee4 7c0802a6 fbc1fff0 fbe1fff8 f8010010 f821ff81 7c7e1b78
      [  442.660566] 7c9f2378 60000000 60000000 e93e02c8 <e8690060> 2fa30000 41de00c4 2b9f0002
      [  442.660679] ---[ end trace a64ac9546bcf0328 ]---
      [  442.660724]
      
      The reason is current VF is not EEH enabled.
      
      This patch introduces a macro to convert eeh_dev to eeh_pe. By doing so, it
      will prevent converting with NULL pointer.
      Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
      Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      
      V3 -> V4:
         1. move the macro definition from include/linux/pci.h to
            arch/powerpc/include/asm/eeh.h
      
      V2 -> V3:
         1. rebased on 3.17-rc4
         2. introduce a macro
         3. use this macro in several other places
      
      V1 -> V2:
         1. code style and patch subject adjustment
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2a58222f
    • M
      powerpc/ppc64: Print CPU/MMU/FW features at boot · 87d99c0e
      Michael Ellerman 提交于
      "Helps debug funky firmware issues".
      
      After:
        Starting Linux PPC64 #108 SMP Wed Aug 6 19:04:51 EST 2014
        -----------------------------------------------------
        ppc64_pft_size    = 0x1a
        phys_mem_size     = 0x200000000
        cpu_features      = 0x17fc7a6c18500249
          possible        = 0x1fffffff18700649
          always          = 0x0000000000000040
        cpu_user_features = 0xdc0065c2 0xee000000
        mmu_features      = 0x5a000001
        firmware_features = 0x00000001405a440b
        htab_hash_mask    = 0x7ffff
        -----------------------------------------------------
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      87d99c0e
    • M
      powerpc/ppc64: Clean up the boot-time settings display · bdce97e9
      Michael Ellerman 提交于
      At boot we display a bunch of low level settings which can be useful to
      know, and can help to spot bugs when things are fundamentally
      misconfigured.
      
      At the moment they are very widely spaced, so that we can accommodate
      the line:
      
        ppc64_caches.dcache_line_size = 0xYY
      
      But we only print that line when the cache line size is not 128, ie.
      almost never, so it just makes the display look odd usually.
      
      The ppc64_caches prefix is redundant so remove it, which means we can
      align things a bit closer for the common case. While we're there
      replace the last use of camelCase (physicalMemorySize), and use
      phys_mem_size.
      
      Before:
        Starting Linux PPC64 #104 SMP Wed Aug 6 18:41:34 EST 2014
        -----------------------------------------------------
        ppc64_pft_size                = 0x1a
        physicalMemorySize            = 0x200000000
        ppc64_caches.dcache_line_size = 0xf0
        ppc64_caches.icache_line_size = 0xf0
        htab_address                  = 0xdeadbeef
        htab_hash_mask                = 0x7ffff
        physical_start                = 0xf000bar
        -----------------------------------------------------
      
      After:
        Starting Linux PPC64 #103 SMP Wed Aug 6 18:38:04 EST 2014
        -----------------------------------------------------
        ppc64_pft_size    = 0x1a
        phys_mem_size     = 0x200000000
        dcache_line_size  = 0xf0
        icache_line_size  = 0xf0
        htab_address      = 0xdeadbeef
        htab_hash_mask    = 0x7ffff
        physical_start    = 0xf000bar
        -----------------------------------------------------
      
      This patch is final, no bike shedding ;)
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bdce97e9
    • L
      powerpc: Only set numa node information for present cpus at boottime · bc3c4327
      Li Zhong 提交于
      As Nish suggested, it makes more sense to init the numa node informatiion
      for present cpus at boottime, which could also avoid WARN_ON(1) in
      numa_setup_cpu().
      
      With this change, we also need to change the smp_prepare_cpus() to set up
      numa information only on present cpus.
      
      For those possible, but not present cpus, their numa information
      will be set up after they are started, as the original code did before commit
      2fabf084.
      
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Acked-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Tested-by: NCyril Bur <cyril.bur@au1.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      bc3c4327
    • M
      powerpc: Check flat device tree version at boot · ad72a279
      Michael Ellerman 提交于
      In commit e6a6928c "of/fdt: Convert FDT functions to use libfdt",
      the kernel stopped supporting old flat device tree formats. The minimum
      supported version is now 0x10.
      
      There was a checking function added, early_init_dt_verify(), but it's
      not called on powerpc.
      
      The result is, if you boot with an old flat device tree, the kernel will
      fail to parse it correctly, think you have no memory etc. and hilarity
      ensues.
      
      We can't really fix it, but we can at least catch the fact that the
      device tree is in an unsupported format and panic(). We can't call
      BUG(), it's too early.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ad72a279
    • P
      powerpc/powernv: Don't call generic code on offline cpus · d6a4f709
      Paul Mackerras 提交于
      On PowerNV platforms, when a CPU is offline, we put it into nap mode.
      It's possible that the CPU wakes up from nap mode while it is still
      offline due to a stray IPI.  A misdirected device interrupt could also
      potentially cause it to wake up.  In that circumstance, we need to clear
      the interrupt so that the CPU can go back to nap mode.
      
      In the past the clearing of the interrupt was accomplished by briefly
      enabling interrupts and allowing the normal interrupt handling code
      (do_IRQ() etc.) to handle the interrupt.  This has the problem that
      this code calls irq_enter() and irq_exit(), which call functions such
      as account_system_vtime() which use RCU internally.  Use of RCU is not
      permitted on offline CPUs and will trigger errors if RCU checking is
      enabled.
      
      To avoid calling into any generic code which might use RCU, we adopt
      a different method of clearing interrupts on offline CPUs.  Since we
      are on the PowerNV platform, we know that the system interrupt
      controller is a XICS being driven directly (i.e. not via hcalls) by
      the kernel.  Hence this adds a new icp_native_flush_interrupt()
      function to the native-mode XICS driver and arranges to call that
      when an offline CPU is woken from nap.  This new function reads the
      interrupt from the XICS.  If it is an IPI, it clears the IPI; if it
      is a device interrupt, it prints a warning and disables the source.
      Then it does the end-of-interrupt processing for the interrupt.
      
      The other thing that briefly enabling interrupts did was to check and
      clear the irq_happened flag in this CPU's PACA.  Therefore, after
      flushing the interrupt from the XICS, we also clear all bits except
      the PACA_IRQ_HARD_DIS (interrupts are hard disabled) bit from the
      irq_happened flag.  The PACA_IRQ_HARD_DIS flag is set by power7_nap()
      and is left set to indicate that interrupts are hard disabled.  This
      means we then have to ignore that flag in power7_nap(), which is
      reasonable since it doesn't indicate that any interrupt event needs
      servicing.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      d6a4f709