1. 09 3月, 2016 1 次提交
  2. 15 2月, 2016 1 次提交
  3. 27 1月, 2016 1 次提交
  4. 18 8月, 2015 1 次提交
    • G
      powerpc/eeh: Disable automatically blocked PCI config · 39bfd715
      Gavin Shan 提交于
      pcibios_set_pcie_reset_state() could be called to complete
      reset request when passing through PCI device, flag
      EEH_PE_ISOLATED is set before saving the PCI config sapce.
      On some Broadcom adapters, EEH_PE_CFG_BLOCKED is automatically
      set when the flag EEH_PE_ISOLATED is marked. It caused bogus
      data saved from the PCI config space, which will be restored
      to the PCI adapter after the reset. Eventually, the hardware
      can't work with corrupted data in PCI config space.
      
      The patch fixes the issue with eeh_pe_state_mark_no_cfg(), which
      doesn't set EEH_PE_CFG_BLOCKED when seeing EEH_PE_ISOLATED on the
      PE, in order to avoid the bogus data saved and restored to the PCI
      config space.
      Reported-by: NRajanikanth H. Adaveeshaiah <rajanikanth.ha@in.ibm.com>
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      39bfd715
  5. 31 3月, 2015 1 次提交
    • G
      powerpc/eeh: Fix PE#0 check in eeh_add_to_parent_pe() · 433185d2
      Gavin Shan 提交于
      The function eeh_add_parent_pe() is used to create a PE or add one
      edev to its parent PE. Current code checks if PE#0 is valid for the
      later case. Actually, we should validate PE#0 for both cases when
      EEH core regards PE#0 as invalid one (without flag EEH_VALID_PE_ZERO).
      Otherwise, not all EEH devices can be added to its parent PE#0 for
      EEH on P7IOC.
      
      The patch fixes the issue by validating PE#0 for the two cases. So far,
      we don't have PE#0 for EEH on P7IOC, but it will show up when we enable
      M64 for P7IOC. The patch also makes the error message more meaningful.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      433185d2
  6. 24 3月, 2015 2 次提交
  7. 23 1月, 2015 2 次提交
    • G
      powerpc/eeh: Introduce flag EEH_PE_REMOVED · 432227e9
      Gavin Shan 提交于
      The conditions that one specific PE's frozen count exceeds the maximal
      allowed times (EEH_MAX_ALLOWED_FREEZES) and it's in isolated or recovery
      state indicate the PE was removed permanently implicitly. The patch
      introduces flag EEH_PE_REMOVED to indicate that explicitly so that we
      don't depend on the fixed maximal allowed times, which can be varied as
      we do in subsequent patch.
      
      Flag EEH_PE_REMOVED is expected to be marked for the PE whose frozen
      count exceeds the maximal allowed times, or just failed from recovery.
      Requested-by: NRyan Grimm <grimm@linux.vnet.ibm.com>
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      432227e9
    • G
      powerpc/eeh: Fix missed PE#0 on P7IOC · 2aa5cf9e
      Gavin Shan 提交于
      PE#0 should be regarded as valid for P7IOC, while it's invalid for
      PHB3. The patch adds flag EEH_VALID_PE_ZERO to differentiate those
      two cases. Without the patch, we possibly see frozen PE#0 state is
      cleared without EEH recovery taken on P7IOC as following kernel logs
      indicate:
      
      [root@ltcfbl8eb ~]# dmesg
             :
      pci 0000:00     : [PE# 000] Secondary bus 0 associated with PE#0
      pci 0000:01     : [PE# 001] Secondary bus 1 associated with PE#1
      pci 0001:00     : [PE# 000] Secondary bus 0 associated with PE#0
      pci 0001:01     : [PE# 001] Secondary bus 1 associated with PE#1
      pci 0002:00     : [PE# 000] Secondary bus 0 associated with PE#0
      pci 0002:01     : [PE# 001] Secondary bus 1 associated with PE#1
      pci 0003:00     : [PE# 000] Secondary bus 0 associated with PE#0
      pci 0003:01     : [PE# 001] Secondary bus 1 associated with PE#1
      pci 0003:20     : [PE# 002] Secondary bus 32..63 associated with PE#2
             :
      EEH: Clear non-existing PHB#3-PE#0
      EEH: PHB location: U78AE.001.WZS00M9-P1-002
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2aa5cf9e
  8. 15 10月, 2014 2 次提交
    • G
      powerpc/eeh: Block PCI config access upon frozen PE · b6541db1
      Gavin Shan 提交于
      The problem was found when I tried to inject PCI config error by
      PHB3 PAPR error injection registers into Broadcom Austin 4-ports
      NIC adapter. The frozen PE was reported successfully and EEH core
      started to recover it. However, I run into fenced PHB when dumping
      PCI config space as EEH logs. I was told that PCI config requests
      should not be progagated to the adapter until PE reset is done
      successfully. Otherise, we would run out of PHB internal credits
      and trigger PCT (PCIE Completion Timeout), which leads to the
      fenced PHB.
      
      The patch introduces another PE flag EEH_PE_CFG_RESTRICTED, which
      is set during PE initialization time if the PE includes the specific
      PCI devices that need block PCI config access until PE reset is done.
      When the PE becomes frozen for the first time, EEH_PE_CFG_BLOCKED is
      set if the PE has flag EEH_PE_CFG_RESTRICTED. Then the PCI config
      access to the PE will be dropped by platform PCI accessors until
      PE reset is done successfully. The mechanism is shared by PowerNV
      platform owned PE or userland owned ones. It's not used on pSeries
      platform yet.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b6541db1
    • G
      powerpc/eeh: Fix condition for isolated state · 8315070c
      Gavin Shan 提交于
      Function eeh_pe_state_mark() could possibly have combination of
      multiple EEH PE state as its argument. The patch fixes the condition
      used to check if EEH_PE_ISOLATED is included.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8315070c
  9. 30 9月, 2014 1 次提交
    • G
      powerpc/eeh: Clear frozen device state in time · 22fca179
      Gavin Shan 提交于
      The problem was reported by Carol: In the scenario of passing mlx4
      adapter to guest, EEH error could be recovered successfully. When
      returning the device back to host, the driver (mlx4_core.ko)
      couldn't be loaded successfully because of error number -5 (-EIO)
      returned from mlx4_get_ownership(), which hits offlined PCI device.
      The root cause is that we missed to put the affected devices into
      normal state on clearing PE isolated state right after PE reset.
      
      The patch fixes above issue by putting the affected devices to
      normal state when clearing PE isolated state in eeh_pe_state_clear().
      
      Cc: stable@vger.kernel.org
      Reported-by: NCarol L. Soto <clsoto@us.ibm.com>
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      22fca179
  10. 25 9月, 2014 1 次提交
    • W
      powerpc/eeh: Fix kernel crash when passing through VF · 2a58222f
      Wei Yang 提交于
      When doing vfio passthrough a VF, the kernel will crash with following
      message:
      
      [  442.656459] Unable to handle kernel paging request for data at address 0x00000060
      [  442.656593] Faulting instruction address: 0xc000000000038b88
      [  442.656706] Oops: Kernel access of bad area, sig: 11 [#1]
      [  442.656798] SMP NR_CPUS=1024 NUMA PowerNV
      [  442.656890] Modules linked in: vfio_pci mlx4_core nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack bnep bluetooth rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw tg3 nfsd be2net nfs_acl ses lockd ptp enclosure pps_core kvm_hv kvm_pr shpchp binfmt_misc kvm sunrpc uinput lpfc scsi_transport_fc ipr scsi_tgt [last unloaded: mlx4_core]
      [  442.658152] CPU: 40 PID: 14948 Comm: qemu-system-ppc Not tainted 3.10.42yw-pkvm+ #37
      [  442.658219] task: c000000f7e2a9a00 ti: c000000f6dc3c000 task.ti: c000000f6dc3c000
      [  442.658287] NIP: c000000000038b88 LR: c0000000004435a8 CTR: c000000000455bc0
      [  442.658352] REGS: c000000f6dc3f580 TRAP: 0300   Not tainted  (3.10.42yw-pkvm+)
      [  442.658419] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28004882  XER: 20000000
      [  442.658577] CFAR: c00000000000908c DAR: 0000000000000060 DSISR: 40000000 SOFTE: 1
      GPR00: c0000000004435a8 c000000f6dc3f800 c0000000012b1c10 c00000000da24000
      GPR04: 0000000000000003 0000000000001004 00000000000015b3 000000000000ffff
      GPR08: c00000000127f5d8 0000000000000000 000000000000ffff 0000000000000000
      GPR12: c000000000068078 c00000000fdd6800 000001003c320c80 000001003c3607f0
      GPR16: 0000000000000001 00000000105480c8 000000001055aaa8 000001003c31ab18
      GPR20: 000001003c10fb40 000001003c360ae8 000000001063bcf0 000000001063bdb0
      GPR24: 000001003c15ed70 0000000010548f40 c000001fe5514c88 c000001fe5514cb0
      GPR28: c00000000da24000 0000000000000000 c00000000da24000 0000000000000003
      [  442.659471] NIP [c000000000038b88] .pcibios_set_pcie_reset_state+0x28/0x130
      [  442.659530] LR [c0000000004435a8] .pci_set_pcie_reset_state+0x28/0x40
      [  442.659585] Call Trace:
      [  442.659610] [c000000f6dc3f800] [00000000000719e0] 0x719e0 (unreliable)
      [  442.659677] [c000000f6dc3f880] [c0000000004435a8] .pci_set_pcie_reset_state+0x28/0x40
      [  442.659757] [c000000f6dc3f900] [c000000000455bf8] .reset_fundamental+0x38/0x80
      [  442.659835] [c000000f6dc3f980] [c0000000004562a8] .pci_dev_specific_reset+0xa8/0xf0
      [  442.659913] [c000000f6dc3fa00] [c0000000004448c4] .__pci_dev_reset+0x44/0x430
      [  442.659980] [c000000f6dc3fab0] [c000000000444d5c] .pci_reset_function+0x7c/0xc0
      [  442.660059] [c000000f6dc3fb30] [d00000001c141ab8] .vfio_pci_open+0xe8/0x2b0 [vfio_pci]
      [  442.660139] [c000000f6dc3fbd0] [c000000000586c30] .vfio_group_fops_unl_ioctl+0x3a0/0x630
      [  442.660219] [c000000f6dc3fc90] [c000000000255fbc] .do_vfs_ioctl+0x4ec/0x7c0
      [  442.660286] [c000000f6dc3fd80] [c000000000256364] .SyS_ioctl+0xd4/0xf0
      [  442.660354] [c000000f6dc3fe30] [c000000000009e54] syscall_exit+0x0/0x98
      [  442.660420] Instruction dump:
      [  442.660454] 4bfffce9 4bfffee4 7c0802a6 fbc1fff0 fbe1fff8 f8010010 f821ff81 7c7e1b78
      [  442.660566] 7c9f2378 60000000 60000000 e93e02c8 <e8690060> 2fa30000 41de00c4 2b9f0002
      [  442.660679] ---[ end trace a64ac9546bcf0328 ]---
      [  442.660724]
      
      The reason is current VF is not EEH enabled.
      
      This patch introduces a macro to convert eeh_dev to eeh_pe. By doing so, it
      will prevent converting with NULL pointer.
      Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
      Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      CC: Michael Ellerman <mpe@ellerman.id.au>
      
      V3 -> V4:
         1. move the macro definition from include/linux/pci.h to
            arch/powerpc/include/asm/eeh.h
      
      V2 -> V3:
         1. rebased on 3.17-rc4
         2. introduce a macro
         3. use this macro in several other places
      
      V1 -> V2:
         1. code style and patch subject adjustment
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2a58222f
  11. 05 8月, 2014 4 次提交
    • G
      powerpc/eeh: Aux PE data for error log · bb593c00
      Gavin Shan 提交于
      The patch allows PE (struct eeh_pe) instance to have auxillary data,
      whose size is configurable on basis of platform. For PowerNV, the
      auxillary data will be used to cache PHB diag-data for that PE
      (frozen PE or fenced PHB). In turn, we can retrieve the diag-data
      at any later points.
      
      It's useful for the case of VFIO PCI devices where the error log
      should be cached, and then be retrieved by the guest at later point.
      Also, it can avoid PHB diag-data overwritting if another frozen PE
      reported and the previous diag-data isn't fetched by guest.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      bb593c00
    • G
      powerpc/eeh: Replace pr_warning() with pr_warn() · 0dae2743
      Gavin Shan 提交于
      pr_warn() is equal to pr_warning(), but the former is a bit more
      formal according to commit fc62f2f1 ("kernel.h: add pr_warn for
      symmetry to dev_warn, netdev_warn").
      
      The patch replaces pr_warning() with pr_warn().
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      0dae2743
    • M
      powerpc/eeh: Wrong place to call pci_get_slot() · 9e5c6e5a
      Mike Qiu 提交于
      pci_get_slot() is called with hold of PCI bus semaphore and it's not
      safe to be called in interrupt context. However, we possibly checks
      EEH error and calls the function in interrupt context. To avoid using
      pci_get_slot(), we turn into device tree for fetching location code.
      Otherwise, we might run into WARN_ON() as following messages indicate:
      
       WARNING: at drivers/pci/search.c:223
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.16.0-rc3+ #72
       task: c000000001367af0 ti: c000000001444000 task.ti: c000000001444000
       NIP: c000000000497b70 LR: c000000000037530 CTR: 000000003003d114
       REGS: c000000001446fa0 TRAP: 0700   Not tainted  (3.16.0-rc3+)
       MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 48002422  XER: 20000000
       CFAR: c00000000003752c SOFTE: 0
         :
       NIP [c000000000497b70] .pci_get_slot+0x40/0x110
       LR [c000000000037530] .eeh_pe_loc_get+0x150/0x190
       Call Trace:
         .of_get_property+0x30/0x60 (unreliable)
         .eeh_pe_loc_get+0x150/0x190
         .eeh_dev_check_failure+0x1b4/0x550
         .eeh_check_failure+0x90/0xf0
         .lpfc_sli_check_eratt+0x504/0x7c0 [lpfc]
         .lpfc_poll_eratt+0x64/0x100 [lpfc]
         .call_timer_fn+0x64/0x190
         .run_timer_softirq+0x2cc/0x3e0
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMike Qiu <qiudayu@linux.vnet.ibm.com>
      Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      9e5c6e5a
    • M
      powerpc/eeh: sysfs entries lost · dadcd6d6
      Mike Qiu 提交于
      The sysfs entries are lost because of commit 2213fb14 ("powerpc/eeh:
      Skip eeh sysfs when eeh is disabled"). That commit added condition
      to create sysfs entries with EEH_ENABLED, which isn't populated
      when trying to create sysfs entries on PowerNV platform during system
      boot time. The patch fixes the issue by:
      
         * Reoder EEH initialization functions so that they're same on
           PowerNV/pSeries.
         * Cache PE's primary bus by PowerNV platform instead of EEH core
           to avoid kernel crash caused by the function reorder. Another
           benefit with this is to avoid one eeh_probe_mode_dev() in EEH
           core.
      Signed-off-by: NMike Qiu <qiudayu@linux.vnet.ibm.com>
      Acked-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      dadcd6d6
  12. 11 6月, 2014 1 次提交
    • G
      powerpc/eeh: Dump PE location code · 357b2f3d
      Gavin Shan 提交于
      As Ben suggested, it's meaningful to dump PE's location code
      for site engineers when hitting EEH errors. The patch introduces
      function eeh_pe_loc_get() to retireve the location code from
      dev-tree so that we can output it when hitting EEH errors.
      
      If primary PE bus is root bus, the PHB's dev-node would be tried
      prior to root port's dev-node. Otherwise, the upstream bridge's
      dev-node of the primary PE bus will be check for the location code
      directly.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      357b2f3d
  13. 28 4月, 2014 1 次提交
    • G
      powerpc/eeh: No hotplug on permanently removed dev · d2b0f6f7
      Gavin Shan 提交于
      The issue was detected in a bit complicated test case where
      we have multiple hierarchical PEs shown as following figure:
      
                      +-----------------+
                      | PE#3     p2p#0  |
                      |          p2p#1  |
                      +-----------------+
                              |
                      +-----------------+
                      | PE#4     pdev#0 |
                      |          pdev#1 |
                      +-----------------+
      
      PE#4 (have 2 PCI devices) is the child of PE#3, which has 2 p2p
      bridges. We accidentally had less-known scenario: PE#4 was removed
      permanently from the system because of permanent failure (e.g.
      exceeding the max allowd failure times in last hour), then we detects
      EEH errors on PE#3 and tried to recover it. However, eeh_dev instances
      for pdev#0/1 were not detached from PE#4, which was still connected to
      PE#3. All of that was because of the fact that we rely on count-based
      pcibios_release_device(), which isn't reliable enough. When doing
      recovery for PE#3, we still apply hotplug on PE#4 and pdev#0/1, which
      are not valid any more. Eventually, we run into kernel crash.
      
      The patch fixes above issue from two aspects. For unplug, we simply
      skip those permanently removed PE, whose state is (EEH_PE_STATE_ISOLATED
      && !EEH_PE_STATE_RECOVERING) and its frozen count should be greater
      than EEH_MAX_ALLOWED_FREEZES. For plug, we marked all permanently
      removed EEH devices with EEH_DEV_REMOVED and return 0xFF's on read
      its PCI config so that PCI core will omit them.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      d2b0f6f7
  14. 15 1月, 2014 2 次提交
  15. 24 7月, 2013 4 次提交
    • G
      powerpc/eeh: Don't use pci_dev during BAR restore · 4b83bd45
      Gavin Shan 提交于
      While restoring BARs for one specific PCI device, the pci_dev
      instance should have been released. So it's not reliable to use
      the pci_dev instance on restoring BARs. However, we still need
      some information (e.g. PCIe capability position, header type) from
      the pci_dev instance. So we have to store those information to
      EEH device in advance.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      4b83bd45
    • G
      powerpc/eeh: Use partial hotplug for EEH unaware drivers · f5c57710
      Gavin Shan 提交于
      When EEH error happens to one specific PE, some devices with drivers
      supporting EEH won't except hotplug on the device. However, there
      might have other deivces without driver, or with driver without EEH
      support. For the case, we need do partial hotplug in order to make
      sure that the PE becomes absolutely quite during reset. Otherise,
      the PE reset might fail and leads to failure of error recovery.
      
      The current code doesn't handle that 'mixed' case properly, it either
      uses the error callbacks to the drivers, or tries hotplug, but doesn't
      handle a PE (EEH domain) composed of a combination of the two.
      
      The patch intends to support so-called "partial" hotplug for EEH:
      Before we do reset, we stop and remove those PCI devices without
      EEH sensitive driver. The corresponding EEH devices are not detached
      from its PE, but with special flag. After the reset is done, those
      EEH devices with the special flag will be scanned one by one.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      f5c57710
    • G
      powerpc/eeh: Use safe list traversal when walking EEH devices · 9feed42e
      Gavin Shan 提交于
      Currently, we're trasversing the EEH devices list using list_for_each_entry().
      That's not safe enough because the EEH devices might be removed from
      its parent PE while doing iteration. The patch replaces that with
      list_for_each_entry_safe().
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      9feed42e
    • G
      powerpc/eeh: Keep PE during hotplug · 807a827d
      Gavin Shan 提交于
      When we do normal hotplug, the PE (shadow EEH structure) shouldn't be
      kept around.
      
      However, we need to keep it if the hotplug an artifial one caused by
      EEH errors recovery.
      
      Since we remove EEH device through the PCI hook pcibios_release_device(),
      the flag "purge_pe" passed to various functions is meaningless. So the patch
      removes the meaningless flag and introduce new flag "EEH_PE_KEEP"
      to save the PE while doing hotplug during EEH error recovery.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      807a827d
  16. 01 7月, 2013 1 次提交
  17. 25 6月, 2013 1 次提交
  18. 20 6月, 2013 6 次提交
  19. 04 1月, 2013 1 次提交
    • G
      POWERPC: drivers: remove __dev* attributes. · cad5cef6
      Greg Kroah-Hartman 提交于
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitdata,
      __devinitconst, and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cad5cef6
  20. 26 11月, 2012 1 次提交
  21. 27 9月, 2012 1 次提交
    • A
      powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get · 7844663a
      Aneesh Kumar K.V 提交于
      Acked-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      
      =====================================
      [ BUG: bad unlock balance detected! ]
      3.6.0-rc5-00338-gcaa1d631-dirty #6 Not tainted
      -------------------------------------
      swapper/0/1 is trying to release lock (eeh_mutex) at:
      [<c000000000058218>] .eeh_add_to_parent_pe+0x318/0x410
      but there are no more locks to release!
      
      other info that might help us debug this:
      no locks held by swapper/0/1.
      
      stack backtrace:
      Call Trace:
      [c00000003e483870] [c000000000013310] .show_stack+0x70/0x1c0 (unreliable)
      [c00000003e483920] [c0000000000d8310] .print_unlock_inbalance_bug+0x110/0x120
      [c00000003e4839b0] [c0000000000d9a50] .lock_release+0x1d0/0x240
      [c00000003e483a60] [c000000000778064] .__mutex_unlock_slowpath+0xb4/0x250
      [c00000003e483b10] [c000000000058218] .eeh_add_to_parent_pe+0x318/0x410
      [c00000003e483bc0] [c00000000005a118] .pseries_eeh_of_probe+0x258/0x2f0
      [c00000003e483cc0] [c000000000032528] .traverse_pci_devices+0xa8/0x150
      [c00000003e483d70] [c000000000aa7288] .eeh_init+0xd4/0x140
      [c00000003e483e00] [c00000000000abc4] .do_one_initcall+0x64/0x1e0
      [c00000003e483ec0] [c000000000a90418] .kernel_init+0x1e8/0x2bc
      [c00000003e483f90] [c00000000002048c] .kernel_thread+0x54/0x70
      EEH: PCI Enhanced I/O Error Handling Enabled
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      7844663a
  22. 18 9月, 2012 3 次提交
    • G
      powerpc/eeh: Global mutex to protect PE tree · ea81245c
      Gavin Shan 提交于
      We have missed lots of situations where the PE hierarchy tree need
      protection through the EEH global mutex. The patch fixes that for
      those public APIs implemented in eeh_pe.c. The only exception is
      eeh_pe_restore_bars() because it calls eeh_pe_dev_traverse(), which
      has been protected by the mutex.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ea81245c
    • G
      powerpc/eeh: Remove EEH PE for normal PCI hotplug · 20ee6a97
      Gavin Shan 提交于
      Function eeh_rmv_from_parent_pe() could be called by the path of
      either normal PCI hotplug, or EEH recovery. For the former case,
      we need purge the corresponding PE on removal of the associated
      PE bus.
      
      The patch tries to cover that by passing more information to function
      pcibios_remove_pci_devices() so that we know if the corresponding PE
      needs to be purged or be marked as "invalid".
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      20ee6a97
    • G
      powerpc/eeh: Introduce EEH_PE_INVALID type PE · 5efc3ad7
      Gavin Shan 提交于
      When EEH error happens on the PE whose PCI devices don't have
      attached drivers. In function eeh_handle_event(), the default
      value PCI_ERS_RESULT_NONE will be returned after iterating all
      drivers of those PCI devices belonging to the PE. Actually, we
      don't have installed drivers for the PCI devices. Under the
      circumstance, we will remove the corresponding PCI bus of the PE,
      including the associated EEH devices and PE instance. However,
      we still need the information stored in the PE instance to do PE
      reset after that. So it's unsafe to free the PE instance.
      
      The patch introduces EEH_PE_INVALID type PE to address the issue.
      When the PCI bus and the corresponding attached EEH devices are
      removed, we will mark the PE as EEH_PE_INVALID. At later point,
      the PE will be changed to EEH_PE_DEVICE or EEH_PE_BUS when the
      corresponding EEH devices are attached again.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      5efc3ad7
  23. 10 9月, 2012 1 次提交