1. 20 6月, 2013 7 次提交
  2. 10 1月, 2013 1 次提交
  3. 18 9月, 2012 2 次提交
    • G
      powerpc/eeh: Fix crash on converting OF node to edev · 1e38b714
      Gavin Shan 提交于
      The kernel crash was reported by Alexy. He was testing some feature
      with private kernel, in which Alexy added some code in pci_pm_reset()
      to read the CSR after writting it. The bug could be reproduced on
      Fiber Channel card (Fibre Channel: Emulex Corporation Saturn-X:
      LightPulse Fibre Channel Host Adapter (rev 03)) by the following
      commands.
      
      	# echo 1 > /sys/devices/pci0004:01/0004:01:00.0/reset
      	# rmmod lpfc
      	# modprobe lpfc
      
      The history behind the test case is that those additional config
      space reading operations in pci_pm_reset() would cause EEH error,
      but we didn't detect EEH error until "modprobe lpfc". For the case,
      all the PCI devices on PCI bus (0004:01) were removed and added after
      PE reset. Then the EEH devices would be figured out again based on
      the OF nodes. Unfortunately, there were some child OF nodes under
      PCI device (0004:01:00.0), but they didn't have attached PCI_DN since
      they're invisible from PCI domain. However, we were still trying to
      convert OF node to EEH device without checking on the attached PCI_DN.
      Eventually, it caused the kernel crash as follows:
      
      Unable to handle kernel paging request for data at address 0x00000030
      Faulting instruction address: 0xc00000000004d888
      cpu 0x0: Vector: 300 (Data Access) at [c000000fc797b950]
          pc: c00000000004d888: .eeh_add_device_tree_early+0x78/0x140
          lr: c00000000004d880: .eeh_add_device_tree_early+0x70/0x140
          sp: c000000fc797bbd0
         msr: 8000000000009032
         dar: 30
       dsisr: 40000000
        current = 0xc000000fc78d9f70
        paca    = 0xc00000000edb0000   softe: 0        irq_happened: 0x00
          pid   = 2951, comm = eehd
      enter ? for help
      [c000000fc797bc50] c00000000004d848 .eeh_add_device_tree_early+0x38/0x140
      [c000000fc797bcd0] c00000000004d848 .eeh_add_device_tree_early+0x38/0x140
      [c000000fc797bd50] c000000000051b54 .pcibios_add_pci_devices+0x34/0x190
      [c000000fc797bde0] c00000000004fb10 .eeh_reset_device+0x100/0x160
      [c000000fc797be70] c0000000000502dc .eeh_handle_event+0x19c/0x300
      [c000000fc797bf00] c000000000050570 .eeh_event_handler+0x130/0x1a0
      [c000000fc797bf90] c000000000020138 .kernel_thread+0x54/0x70
      
      The patch changes of_node_to_eeh_dev() and just returns NULL if the
      passed OF node doesn't have attached PCI_DN.
      
      Cc: stable@vger.kernel.org
      Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1e38b714
    • G
      powerpc/eeh: Remove EEH PE for normal PCI hotplug · 20ee6a97
      Gavin Shan 提交于
      Function eeh_rmv_from_parent_pe() could be called by the path of
      either normal PCI hotplug, or EEH recovery. For the former case,
      we need purge the corresponding PE on removal of the associated
      PE bus.
      
      The patch tries to cover that by passing more information to function
      pcibios_remove_pci_devices() so that we know if the corresponding PE
      needs to be purged or be marked as "invalid".
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      20ee6a97
  4. 10 9月, 2012 14 次提交
  5. 30 4月, 2012 1 次提交
    • A
      powerpc: Use WARN instead of dump_stack when printing EEH error backtrace · 14fb1fa6
      Anton Blanchard 提交于
      When we get an EEH error we just print a backtrace with dump_stack
      which is rather cryptic. We really should print something before
      spewing out the backtrace.
      
      Also switch from dump_stack to WARN so we get more information about
      the fail - what modules were loaded, what process was running etc.
      This was useful information when debugging a recent EEH subsystem bug.
      
      The standard WARN output should also get picked up by monitoring
      tools like kerneloops.
      
      The register dump is of questionable value here but I figured it was
      better to use something standard and not roll my own.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      14fb1fa6
  6. 23 4月, 2012 1 次提交
    • G
      powerpc/eeh: Fix crash caused by null eeh_dev · 2ef822c5
      Gavin Shan 提交于
      The problem was reported by Anton Blanchard. While EEH error
      happened to the PCI device without the corresponding device
      driver, kernel crash was seen. Eventually, I successfully
      reproduced the problem on Firebird-L machine with utility
      "errinjct". Initially, the device driver for Emulex ethernet
      MAC has been disabled from .config and force data parity on
      the Emulex ethernet MAC with help of "errinjct". Eventually,
      I saw the kernel crash after issueing couple of "lspci -v"
      command.
      
      The root cause behind is that the PCI device, including the
      reference to the corresponding eeh device, will be removed
      from the system while EEH does recovery. Afterwards, the
      PCI device will be probed again and added into the system
      accordingly. So it's not safe to retrieve the eeh device from
      the corresponding PCI device after the PCI device has been removed
      and not added again.
      
      The patch fixes the issue and retrieve the eeh device from OF node
      instead of PCI device after the PCI device has been removed.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Tested-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2ef822c5
  7. 28 3月, 2012 1 次提交
    • G
      powerpc/eeh: Retrieve PHB from global list · 1a5c2e63
      Gavin Shan 提交于
      Currently, the existing PHBs are retrieved from the FDT (Flat
      Device Tree) based on the name of FDT node. Specificly, those
      FDT nodes whose names have prefix "pci" are regarded as PHBs.
      That's inappropriate because some PCI bridges possibilly have
      names leading with "pci". It caused EEH is enabled on same
      PCI devices for towice.
      
      The patch fixes the above issue. Besides, the PHBs are expected
      to be figured out from FDT before enable EEH on them. Therefore,
      it's resonable to retrieve the PHBs from the global linked list
      traced by variable "hose_list" insteading poking them from FDT.
      
      For the EEH implementation on pSeries platform, RTAS is critical
      because all low-level functions are implemented based on RTAS.
      Therefore, we should make sure "/rtas" OF node is available and
      ready before to enable EEH core. However, it actually introduced
      duplicate since the previous pSeries platform dependent initialization
      function already do the check. Besides, we want to make eeh core
      platform independent, so RTAS related staff should be removed there.
      The patch removes the duplicate check on "/rtas" OF node for eeh
      core.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1a5c2e63
  8. 09 3月, 2012 13 次提交