1. 15 3月, 2014 2 次提交
    • S
      cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes · 05eb2389
      Steve Wise 提交于
      The current logic suffers from a slow response time to disable user DB
      usage, and also fails to avoid DB FIFO drops under heavy load. This commit
      fixes these deficiencies and makes the avoidance logic more optimal.
      This is done by more efficiently notifying the ULDs of potential DB
      problems, and implements a smoother flow control algorithm in iw_cxgb4,
      which is the ULD that puts the most load on the DB fifo.
      
      Design:
      
      cxgb4:
      
      Direct ULD callback from the DB FULL/DROP interrupt handler.  This allows
      the ULD to stop doing user DB writes as quickly as possible.
      
      While user DB usage is disabled, the LLD will accumulate DB write events
      for its queues.  Then once DB usage is reenabled, a single DB write is
      done for each queue with its accumulated write count.  This reduces the
      load put on the DB fifo when reenabling.
      
      iw_cxgb4:
      
      Instead of marking each qp to indicate DB writes are disabled, we create
      a device-global status page that each user process maps.  This allows
      iw_cxgb4 to only set this single bit to disable all DB writes for all
      user QPs vs traversing the idr of all the active QPs.  If the libcxgb4
      doesn't support this, then we fall back to the old approach of marking
      each QP.  Thus we allow the new driver to work with an older libcxgb4.
      
      When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
      via the status page and transition the DB state to STOPPED.  As user
      processes see that DB writes are disabled, they call into iw_cxgb4
      to submit their DB write events.  Since the DB state is in STOPPED,
      the QP trying to write gets enqueued on a new DB "flow control" list.
      As subsequent DB writes are submitted for this flow controlled QP, the
      amount of writes are accumulated for each QP on the flow control list.
      So all the user QPs that are actively ringing the DB get put on this
      list and the number of writes they request are accumulated.
      
      When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
      context, we change the DB state to FLOW_CONTROL, and begin resuming all
      the QPs that are on the flow control list.  This logic runs on until
      the flow control list is empty or we exit FLOW_CONTROL mode (due to
      a DB DROP upcall, for example).  QPs are removed from this list, and
      their accumulated DB write counts written to the DB FIFO.  Sets of QPs,
      called chunks in the code, are removed at one time. The chunk size is 64.
      So 64 QPs are resumed at a time, and before the next chunk is resumed, the
      logic waits (blocks) for the DB FIFO to drain.  This prevents resuming to
      quickly and overflowing the FIFO.  Once the flow control list is empty,
      the db state transitions back to NORMAL and user QPs are again allowed
      to write directly to the user DB register.
      
      The algorithm is designed such that if the DB write load is high enough,
      then all the DB writes get submitted by the kernel using this flow
      controlled approach to avoid DB drops.  As the load lightens though, we
      resume to normal DB writes directly by user applications.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05eb2389
    • S
      cxgb4/iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice · 7a2cea2a
      Steve Wise 提交于
      Based on original work by Anand Priyadarshee <anandp@chelsio.com>.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7a2cea2a
  2. 14 3月, 2014 5 次提交
  3. 26 2月, 2014 1 次提交
  4. 25 2月, 2014 1 次提交
  5. 19 2月, 2014 9 次提交
  6. 25 1月, 2014 1 次提交
  7. 24 1月, 2014 2 次提交
    • G
      net/cxgb4: Don't retrieve stats during recovery · 9fe6cb58
      Gavin Shan 提交于
      We possibly retrieve the adapter's statistics during EEH recovery
      and that should be disallowed. Otherwise, it would possibly incur
      replicate EEH error and EEH recovery is going to fail eventually.
      
      The patch reuses statistics lock and checks net_device is attached
      before going to retrieve statistics, so that the problem can be
      avoided.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fe6cb58
    • G
      net/cxgb4: Avoid disabling PCI device for towice · 144be3d9
      Gavin Shan 提交于
      If we have EEH error happens to the adapter and we have to remove
      it from the system for some reasons (e.g. more than 5 EEH errors
      detected from the device in last hour), the adapter will be disabled
      for towice separately by eeh_err_detected() and remove_one(), which
      will incur following unexpected backtrace. The patch tries to avoid
      it.
      
      WARNING: at drivers/pci/pci.c:1431
      CPU: 12 PID: 121 Comm: eehd Not tainted 3.13.0-rc7+ #1
      task: c0000001823a3780 ti: c00000018240c000 task.ti: c00000018240c000
      NIP: c0000000003c1e40 LR: c0000000003c1e3c CTR: 0000000001764c5c
      REGS: c00000018240f470 TRAP: 0700   Not tainted  (3.13.0-rc7+)
      MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 28000024  XER: 00000004
      CFAR: c000000000706528 SOFTE: 1
      GPR00: c0000000003c1e3c c00000018240f6f0 c0000000010fe1f8 0000000000000035
      GPR04: 0000000000000000 0000000000000000 00000000003ae509 0000000000000000
      GPR08: 000000000000346f 0000000000000000 0000000000000000 0000000000003fef
      GPR12: 0000000028000022 c00000000ec93000 c0000000000c11b0 c000000184ac3e40
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR24: 0000000000000000 c0000000009398d8 c00000000101f9c0 c0000001860ae000
      GPR28: c000000182ba0000 00000000000001f0 c0000001860ae6f8 c0000001860ae000
      NIP [c0000000003c1e40] .pci_disable_device+0xd0/0xf0
      LR [c0000000003c1e3c] .pci_disable_device+0xcc/0xf0
      Call Trace:
      [c0000000003c1e3c] .pci_disable_device+0xcc/0xf0 (unreliable)
      [d0000000073881c4] .remove_one+0x174/0x320 [cxgb4]
      [c0000000003c57e0] .pci_device_remove+0x60/0x100
      [c00000000046396c] .__device_release_driver+0x9c/0x120
      [c000000000463a20] .device_release_driver+0x30/0x60
      [c0000000003bcdb4] .pci_stop_bus_device+0x94/0xd0
      [c0000000003bcf48] .pci_stop_and_remove_bus_device+0x18/0x30
      [c00000000003f548] .pcibios_remove_pci_devices+0xa8/0x140
      [c000000000035c00] .eeh_handle_normal_event+0xa0/0x3c0
      [c000000000035f50] .eeh_handle_event+0x30/0x2b0
      [c0000000000362c4] .eeh_event_handler+0xf4/0x1b0
      [c0000000000c12b8] .kthread+0x108/0x130
      [c00000000000a168] .ret_from_kernel_thread+0x5c/0x74
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      144be3d9
  8. 17 1月, 2014 1 次提交
  9. 14 1月, 2014 1 次提交
  10. 10 1月, 2014 1 次提交
  11. 04 1月, 2014 1 次提交
    • T
      cxgb4: allow large buffer size to have page size · 940d9d34
      Thadeu Lima de Souza Cascardo 提交于
      Since commit 52367a76
      ("cxgb4/cxgb4vf: Code cleanup to enable T4 Configuration File support"),
      we have failures like this during cxgb4 probe:
      
      cxgb4 0000:01:00.4: bad SGE FL page buffer sizes [65536, 65536]
      cxgb4: probe of 0000:01:00.4 failed with error -22
      
      This happens whenever software parameters are used, without a
      configuration file. That happens when the hardware was already
      initialized (after kexec, or after csiostor is loaded).
      
      It happens that these values are acceptable, rendering fl_pg_order equal
      to 0, which is the case of a hard init when the page size is equal or
      larger than 65536.
      
      Accepting fl_large_pg equal to fl_small_pg solves the issue, and
      shouldn't cause any trouble besides a possible performance reduction
      when smaller pages are used. And that can be fixed by a configuration
      file.
      Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      940d9d34
  12. 23 12月, 2013 6 次提交
  13. 19 12月, 2013 1 次提交
  14. 04 12月, 2013 3 次提交
  15. 22 10月, 2013 1 次提交
  16. 18 10月, 2013 1 次提交
  17. 27 9月, 2013 1 次提交
  18. 16 9月, 2013 1 次提交
  19. 20 8月, 2013 1 次提交