1. 17 Jun, 2009 1 commit
  2. 15 Apr, 2009 1 commit
    • powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset() · c58dc575
      Committed by Mike Mason
      While adding native EEH support to the Emulex and QLogic drivers, it was
      discovered that dev->error_state was set to pci_channel_io_normal too
      late in the recovery process. These drivers rely on error_state to
      determine whether they can access the device in their slot_reset callback,
      so error_state needs to be set to pci_channel_io_normal in
      eeh_report_reset(). Below is a detailed explanation (courtesy of Richard
      Lary) of why this is necessary.
      
      Background:
      PCI MMIO or DMA accesses to a frozen slot generate additional EEH
      errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS, the
      adapter will be shut down. To avoid triggering excessive EEH errors and
      an undesirable adapter shutdown, some drivers use the
      pci_channel_offline(dev) wrapper function, which returns a Boolean value
      based on pci_dev->error_state, to determine whether PCI MMIO or
      DMA accesses are safe. If the wrapper returns TRUE, drivers must not
      make PCI MMIO or DMA accesses to their hardware.
      
      The pci_dev structure member error_state reflects one of three values,
      1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3)
      pci_channel_io_perm_failure.  Function pci_channel_offline(dev) returns
      TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure.
      
      The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the
      point where the PCI slot is frozen. Currently, the EEH driver restores
      dev->error_state to pci_channel_io_normal in eeh_report_resume() before
      calling the driver's resume callback. However, when the EEH driver calls
      the driver's slot_reset() callback from eeh_report_reset(), error_state
      incorrectly indicates that the slot is still pci_channel_io_frozen.
      
      Waiting until eeh_report_resume() to restore dev->error_state to
      pci_channel_io_normal is too late for the Emulex and QLogic FC drivers and
      any other drivers that are designed to use common code paths in these
      two cases: i) paths called after the driver's slot_reset() callback and
      ii) paths called after the PCI slot is frozen but before the driver's
      slot_reset() callback is called. Case i) covers all driver paths executed
      to reinitialize the hardware after a reset; case ii) covers all code paths
      executed by driver kernel threads that run asynchronously to the main
      driver thread, such as interrupt handlers and worker threads that process
      driver work queues.
      
      The Emulex and QLogic FC drivers are designed with common code paths which
      require that pci_channel_offline(dev) reflect the true state of the
      hardware. The state transitions that the hardware takes from Normal
      Operations to Slot Frozen to Reset to Normal Operations are documented
      in the Power Architecture™ Platform Requirements+ (PAPR+), Table 75,
      "PE State Control".
      
      PAPR defines the following 3 states:
      
      0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed
           (Normal Operations)
      1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled
      2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled
           (Slot Frozen)
      
      An EEH error places the slot in state 2 (Slot Frozen) and the adapter
      driver is notified that an EEH error was detected. If the adapter driver
      returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls
      eeh_reset_device() to place the slot into state 1 (Reset), and
      eeh_reset_device() completes by placing the slot into state 0 (Normal
      Operations). Upon return from eeh_reset_device(), the EEH driver calls
      eeh_report_reset(), which then calls the adapter's slot_reset callback. At
      the time the adapter's slot_reset callback is called, the true state of
      the hardware is Normal Operations, and this should be accurately reflected
      by setting dev->error_state to pci_channel_io_normal.
      
      The current implementation of the EEH driver does not do so; this change
      corrects that deficiency.
      Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
      Acked-by: Linas Vepstas <linasvepstas@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
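The ordering problem described in commit c58dc575 can be illustrated with a small userspace model. This is only a sketch, not kernel code: every `model_*` name and the `CHANNEL_IO_*` enum values are hypothetical stand-ins for `struct pci_dev`, `pci_channel_offline()`, `eeh_report_reset()` and a driver's slot_reset callback, and the "register read" is simulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the three error_state values named in the commit message.
 * The numeric values are arbitrary in this model. */
enum channel_state {
    CHANNEL_IO_NORMAL = 1,
    CHANNEL_IO_FROZEN,
    CHANNEL_IO_PERM_FAILURE,
};

/* Hypothetical miniature of struct pci_dev: only error_state is modelled. */
struct model_dev {
    enum channel_state error_state;
};

/* Models the described behaviour of pci_channel_offline(): true when the
 * state is frozen or permanently failed. */
static bool model_channel_offline(const struct model_dev *dev)
{
    assert(dev != NULL);
    return dev->error_state == CHANNEL_IO_FROZEN ||
           dev->error_state == CHANNEL_IO_PERM_FAILURE;
}

/* Hypothetical common driver path, shared by reinitialization code and by
 * asynchronous threads. It refuses to touch the hardware while offline. */
static int model_read_register(const struct model_dev *dev, unsigned int *out)
{
    if (model_channel_offline(dev))
        return -1;          /* skip MMIO so no further EEH error is raised */
    *out = 0xabcdu;         /* simulated MMIO read */
    return 0;
}

/* Hypothetical slot_reset callback: reinitializes through the common path. */
static int model_slot_reset(const struct model_dev *dev)
{
    unsigned int value;
    return model_read_register(dev, &value);
}

/* Models the reset-reporting step. With set_normal_first == false the
 * callback still sees the frozen state (the old ordering); with true the
 * state is restored before the callback runs (the ordering the commit
 * message asks for). */
static int model_report_reset(struct model_dev *dev, bool set_normal_first)
{
    if (set_normal_first)
        dev->error_state = CHANNEL_IO_NORMAL;
    return model_slot_reset(dev);
}
```

In this model, calling `model_report_reset()` on a frozen device with `set_normal_first` false makes the callback fail, because the shared path believes the slot is still offline. With `set_normal_first` true the same callback succeeds.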
  3. 11 Feb, 2009 1 commit
  4. 20 Aug, 2008 1 commit
  5. 16 Jun, 2008 1 commit
  6. 11 Dec, 2007 1 commit
  7. 03 Dec, 2007 1 commit
  8. 08 Nov, 2007 2 commits
  9. 14 Jun, 2007 1 commit
  10. 10 May, 2007 1 commit
  11. 09 May, 2007 2 commits
  12. 13 Apr, 2007 1 commit
  13. 22 Mar, 2007 6 commits
  14. 24 Jan, 2007 1 commit
  15. 08 Dec, 2006 1 commit
  16. 21 Sep, 2006 2 commits
  17. 31 Jul, 2006 1 commit
  18. 01 Jul, 2006 1 commit
  19. 21 Jun, 2006 1 commit
  20. 19 May, 2006 1 commit
  21. 22 Apr, 2006 1 commit
    • [PATCH] powerpc/pseries: clear PCI failure counter if no new failures · ac325acd
      Committed by Linas Vepstas
      The current PCI error recovery system keeps track of the number of PCI card
      resets, and refuses to bring a card back up if this number is too large.
      The goal of doing this was to avoid an infinite loop of resets if a card is
      obviously dead.  However, if the failures are rare, but the machine has a
      high uptime, this mechanism might still be triggered; this is too harsh.
      
      This patch avoids this problem by decrementing the fail count after an
      hour.  Thus, as long as a PCI card BSODs fewer than 6 times an hour, it
      will continue to be reset indefinitely.  If its failure rate is greater
      than that, it will be taken off-line permanently.
      
      This patch is larger than it might otherwise be because it changes
      indentation by removing a pointless while-loop.  The while-loop is not
      needed, as the handler is invoked once for each event (by schedule_work());
      the loop is leftover cruft from an earlier implementation.
      Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
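The decaying failure counter described in commit ac325acd can be sketched in userspace as follows. This is a rough model under stated assumptions, not the patch itself: the `model_*` names are hypothetical, the limit of 5 is inferred from the "less than 6 times an hour" wording, and time is passed in as plain seconds rather than taken from a kernel timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FREEZES   5       /* assumed limit: a 6th failure is fatal */
#define MODEL_DECAY_SECONDS 3600L   /* one decrement per failure-free hour */

/* Hypothetical per-slot bookkeeping. */
struct model_slot {
    int  freeze_count;  /* failures not yet aged out */
    long last_event;    /* seconds: time of last failure or last decrement */
    bool offline;       /* true once the slot is permanently disabled */
};

/* Age the counter: remove one recorded failure for every full hour that
 * has passed without a new failure. */
static void model_decay(struct model_slot *s, long now)
{
    assert(s != NULL);
    while (s->freeze_count > 0 &&
           now - s->last_event >= MODEL_DECAY_SECONDS) {
        s->freeze_count--;
        s->last_event += MODEL_DECAY_SECONDS;
    }
}

/* Record a failure at time `now`. Returns true if the slot may be reset
 * and brought back up, false if it is taken off-line permanently. */
static bool model_record_failure(struct model_slot *s, long now)
{
    assert(s != NULL);
    if (s->offline)
        return false;

    model_decay(s, now);
    s->freeze_count++;
    s->last_event = now;

    if (s->freeze_count > MODEL_MAX_FREEZES) {
        s->offline = true;
        return false;
    }
    return true;
}
```

In this sketch a card that fails once an hour never accumulates more than one recorded failure, so it keeps being reset, while a burst of six failures within an hour takes it off-line.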
  22. 01 Apr, 2006 2 commits
  23. 27 Mar, 2006 1 commit
  24. 28 Feb, 2006 1 commit
    • [PATCH] powerpc: fix NULL pointer in handle_eeh_events · 273d2803
      Committed by Olaf Hering
      This patch fixes a crash in handle_eeh_events,
      but ethtool -t still doesn't work right.
      
      ...
      pepino:~ # cpu 0x3: Vector: 300 (Data Access) at [c00000005192bbe0]
          pc: c00000000004a380: .handle_eeh_events+0xe0/0x23c
          lr: c00000000004a374: .handle_eeh_events+0xd4/0x23c
          sp: c00000005192be60
         msr: 9000000000009032
         dar: 268
       dsisr: 40000000
        current = 0xc0000001fe7bf1a0
        paca    = 0xc00000000048b280
          pid   = 16322, comm = eehd
      enter ? for help
      [c00000005192bf00] c00000000004a808 .eeh_event_handler+0xcc/0x130
      [c00000005192bf90] c000000000025e00 .kernel_thread+0x4c/0x68
      
      ...
      
      (none):/# /usr/sbin/ethtool -i eth0
      driver: e100
      version: 3.5.10-k2-NAPI
      firmware-version: N/A
      bus-info: 0000:21:01.0
      (none):/# /usr/sbin/ethtool -t eth0
      Call Trace:
      [C00000000F8DEFF0] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable)
      [C00000000F8DF0A0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8
      [C00000000F8DF150] [C000000000049E58] .eeh_check_failure+0x10c/0x138
      [C00000000F8DF1E0] [C0000000002DFDB0] .e100_hw_reset+0x70/0xf4
      [C00000000F8DF270] [C0000000002E1BBC] .e100_hw_init+0x2c/0x260
      [C00000000F8DF310] [C0000000002E2464] .e100_loopback_test+0x8c/0x220
      [C00000000F8DF3C0] [C0000000002E28DC] .e100_diag_test+0xdc/0x16c
      [C00000000F8DF490] [C000000000420BE0] .dev_ethtool+0xf24/0x14f8
      [C00000000F8DF8F0] [C00000000041F4A8] .dev_ioctl+0x5cc/0x740
      [C00000000F8DFA20] [C00000000040FEFC] .sock_ioctl+0x3d0/0x404
      [C00000000F8DFAC0] [C0000000000D513C] .do_ioctl+0x68/0x108
      [C00000000F8DFB50] [C0000000000D56B0] .vfs_ioctl+0x4d4/0x510
      [C00000000F8DFC10] [C0000000000D5740] .sys_ioctl+0x54/0x94
      [C00000000F8DFCC0] [C0000000000FB6EC] .ethtool_ioctl+0x11c/0x150
      [C00000000F8DFD60] [C0000000000F7E40] .compat_sys_ioctl+0x338/0x3bc
      [C00000000F8DFE30] [C00000000000871C] syscall_exit+0x0/0x40
      EEH: Detected PCI bus error on device 0000:21:01.0
      EEH: This PCI device has failed 1 times since last reboot: <NULL> -
      
      modprobe: FATAL: Could not load /lib/modules/2.6.16-rc4-git7/modules.dep: No such file or directory
      
      Cannot get strings: No such device
      (none):/#
      (none):/# EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2
      
      (none):/# Call Trace:
      [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable)
      [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8
      [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154
      [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc
      [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c
      [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c
      [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8
      [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110
      [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250
      [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140
      [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68
      EEH: Detected PCI bus error on device <NULL>
      EEH: This PCI device has failed 1 times since last reboot: <NULL> -
      EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2
      Call Trace:
      [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable)
      [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8
      [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154
      [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc
      [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c
      [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c
      [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8
      [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110
      [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250
      [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140
      [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68
      EEH: Detected PCI bus error on device <NULL>
      EEH: This PCI device has failed 1 times since last reboot: <NULL> -
      EEH: Unable to configure device bridge (-3) for /pci@400000000110/pci@2,2
      Call Trace:
      [C00000000FA17940] [C00000000000F270] .show_stack+0x74/0x1b4 (unreliable)
      [C00000000FA179F0] [C000000000049D04] .eeh_dn_check_failure+0x290/0x2d8
      [C00000000FA17AA0] [C00000000001E114] .rtas_read_config+0x120/0x154
      [C00000000FA17B40] [C000000000049664] .early_enable_eeh+0x274/0x2bc
      [C00000000FA17C00] [C000000000049708] .eeh_add_device_early+0x5c/0x6c
      [C00000000FA17C90] [C000000000049748] .eeh_add_device_tree_early+0x30/0x5c
      [C00000000FA17D20] [C000000000046568] .pcibios_add_pci_devices+0x8c/0x1f8
      [C00000000FA17DD0] [C00000000004A528] .eeh_reset_device+0xe0/0x110
      [C00000000FA17E60] [C00000000004A698] .handle_eeh_events+0x140/0x250
      [C00000000FA17F00] [C00000000004AC7C] .eeh_event_handler+0xe8/0x140
      [C00000000FA17F90] [C000000000025784] .kernel_thread+0x4c/0x68
      EEH: Detected PCI bus error on device <NULL>
      and so on
      Signed-off-by: Olaf Hering <olh@suse.de>
      Acked-by: Linas Vepstas <linas@austin.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  25. 08 Feb, 2006 1 commit
  26. 10 Jan, 2006 5 commits