1. 03 7月, 2014 1 次提交
  2. 02 7月, 2014 4 次提交
  3. 25 6月, 2014 1 次提交
    • L
      cxgb4: Not need to hold the adap_rcu_lock lock when read adap_rcu_list · ee9a33b2
      Li RongQing 提交于
      cxgb4_netdev maybe lead to dead lock, since it uses a spin lock, and be called
      in both thread and softirq context, but not disable BH, the lockdep report is
      below; In fact, cxgb4_netdev only reads adap_rcu_list with RCU protection, so
      not need to hold spin lock again.
      	=================================
      	[ INFO: inconsistent lock state ]
      	3.14.7+ #24 Tainted: G         C O
      	---------------------------------
      	inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      	radvd/3794 [HC0[0]:SC1[1]:HE1:SE0] takes:
      	 (adap_rcu_lock){+.?...}, at: [<ffffffffa09989ea>] clip_add+0x2c/0x116 [cxgb4]
      	{SOFTIRQ-ON-W} state was registered at:
      	  [<ffffffff810fca81>] __lock_acquire+0x34a/0xe48
      	  [<ffffffff810fd98b>] lock_acquire+0x82/0x9d
      	  [<ffffffff815d6ff8>] _raw_spin_lock+0x34/0x43
      	  [<ffffffffa09989ea>] clip_add+0x2c/0x116 [cxgb4]
      	  [<ffffffffa0998beb>] cxgb4_inet6addr_handler+0x117/0x12c [cxgb4]
      	  [<ffffffff815da98b>] notifier_call_chain+0x32/0x5c
      	  [<ffffffff815da9f9>] __atomic_notifier_call_chain+0x44/0x6e
      	  [<ffffffff815daa32>] atomic_notifier_call_chain+0xf/0x11
      	  [<ffffffff815b1356>] inet6addr_notifier_call_chain+0x16/0x18
      	  [<ffffffffa01f72e5>] ipv6_add_addr+0x404/0x46e [ipv6]
      	  [<ffffffffa01f8df0>] addrconf_add_linklocal+0x5f/0x95 [ipv6]
      	  [<ffffffffa01fc3e9>] addrconf_notify+0x632/0x841 [ipv6]
      	  [<ffffffff815da98b>] notifier_call_chain+0x32/0x5c
      	  [<ffffffff810e09a1>] __raw_notifier_call_chain+0x9/0xb
      	  [<ffffffff810e09b2>] raw_notifier_call_chain+0xf/0x11
      	  [<ffffffff8151b3b7>] call_netdevice_notifiers_info+0x4e/0x56
      	  [<ffffffff8151b3d0>] call_netdevice_notifiers+0x11/0x13
      	  [<ffffffff8151c0a6>] netdev_state_change+0x1f/0x38
      	  [<ffffffff8152f004>] linkwatch_do_dev+0x3b/0x49
      	  [<ffffffff8152f184>] __linkwatch_run_queue+0x10b/0x144
      	  [<ffffffff8152f1dd>] linkwatch_event+0x20/0x27
      	  [<ffffffff810d7bc0>] process_one_work+0x1cb/0x2ee
      	  [<ffffffff810d7e3b>] worker_thread+0x12e/0x1fc
      	  [<ffffffff810dd391>] kthread+0xc4/0xcc
      	  [<ffffffff815dc48c>] ret_from_fork+0x7c/0xb0
      	irq event stamp: 3388
      	hardirqs last  enabled at (3388): [<ffffffff810c6c85>]
      	__local_bh_enable_ip+0xaa/0xd9
      	hardirqs last disabled at (3387): [<ffffffff810c6c2d>]
      	__local_bh_enable_ip+0x52/0xd9
      	softirqs last  enabled at (3288): [<ffffffffa01f1d5b>]
      	rcu_read_unlock_bh+0x0/0x2f [ipv6]
      	softirqs last disabled at (3289): [<ffffffff815ddafc>]
      	do_softirq_own_stack+0x1c/0x30
      
      	other info that might help us debug this:
      	 Possible unsafe locking scenario:
      
      	       CPU0
      	       ----
      	  lock(adap_rcu_lock);
      	  <Interrupt>
      	    lock(adap_rcu_lock);
      
      	 *** DEADLOCK ***
      
      	5 locks held by radvd/3794:
      	 #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffffa020b85a>]
      	rawv6_sendmsg+0x74b/0xa4d [ipv6]
      	 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8151ac6b>]
      	rcu_lock_acquire+0x0/0x29
      	 #2:  (rcu_read_lock){.+.+..}, at: [<ffffffffa01f4cca>]
      	rcu_lock_acquire.constprop.16+0x0/0x30 [ipv6]
      	 #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff810e09b4>]
      	rcu_lock_acquire+0x0/0x29
      	 #4:  (rcu_read_lock){.+.+..}, at: [<ffffffffa0998782>]
      	rcu_lock_acquire.constprop.40+0x0/0x30 [cxgb4]
      
      	stack backtrace:
      	CPU: 7 PID: 3794 Comm: radvd Tainted: G         C O 3.14.7+ #24
      	Hardware name: Supermicro X7DBU/X7DBU, BIOS 6.00 12/03/2007
      	 ffffffff81f15990 ffff88012fdc36a8 ffffffff815d0016 0000000000000006
      	 ffff8800c80dc2a0 ffff88012fdc3708 ffffffff815cc727 0000000000000001
      	 0000000000000001 ffff880100000000 ffffffff81015b02 ffff8800c80dcb58
      	Call Trace:
      	 <IRQ>  [<ffffffff815d0016>] dump_stack+0x4e/0x71
      	 [<ffffffff815cc727>] print_usage_bug+0x1ec/0x1fd
      	 [<ffffffff81015b02>] ? save_stack_trace+0x27/0x44
      	 [<ffffffff810fbfaa>] ? check_usage_backwards+0xa0/0xa0
      	 [<ffffffff810fc640>] mark_lock+0x11b/0x212
      	 [<ffffffff810fca0b>] __lock_acquire+0x2d4/0xe48
      	 [<ffffffff810fbfaa>] ? check_usage_backwards+0xa0/0xa0
      	 [<ffffffff810fbff6>] ? check_usage_forwards+0x4c/0xa6
      	 [<ffffffff810c6c8a>] ? __local_bh_enable_ip+0xaf/0xd9
      	 [<ffffffff810fd98b>] lock_acquire+0x82/0x9d
      	 [<ffffffffa09989ea>] ? clip_add+0x2c/0x116 [cxgb4]
      	 [<ffffffffa0998782>] ? rcu_read_unlock+0x23/0x23 [cxgb4]
      	 [<ffffffff815d6ff8>] _raw_spin_lock+0x34/0x43
      	 [<ffffffffa09989ea>] ? clip_add+0x2c/0x116 [cxgb4]
      	 [<ffffffffa09987b0>] ? rcu_lock_acquire.constprop.40+0x2e/0x30 [cxgb4]
      	 [<ffffffffa0998782>] ? rcu_read_unlock+0x23/0x23 [cxgb4]
      	 [<ffffffffa09989ea>] clip_add+0x2c/0x116 [cxgb4]
      	 [<ffffffffa0998beb>] cxgb4_inet6addr_handler+0x117/0x12c [cxgb4]
      	 [<ffffffff810fd99d>] ? lock_acquire+0x94/0x9d
      	 [<ffffffff810e09b4>] ? raw_notifier_call_chain+0x11/0x11
      	 [<ffffffff815da98b>] notifier_call_chain+0x32/0x5c
      	 [<ffffffff815da9f9>] __atomic_notifier_call_chain+0x44/0x6e
      	 [<ffffffff815daa32>] atomic_notifier_call_chain+0xf/0x11
      	 [<ffffffff815b1356>] inet6addr_notifier_call_chain+0x16/0x18
      	 [<ffffffffa01f72e5>] ipv6_add_addr+0x404/0x46e [ipv6]
      	 [<ffffffff810fde6a>] ? trace_hardirqs_on+0xd/0xf
      	 [<ffffffffa01fb634>] addrconf_prefix_rcv+0x385/0x6ea [ipv6]
      	 [<ffffffffa0207950>] ndisc_rcv+0x9d3/0xd76 [ipv6]
      	 [<ffffffffa020d536>] icmpv6_rcv+0x592/0x67b [ipv6]
      	 [<ffffffff810c6c85>] ? __local_bh_enable_ip+0xaa/0xd9
      	 [<ffffffff810c6c85>] ? __local_bh_enable_ip+0xaa/0xd9
      	 [<ffffffff810fd8dc>] ? lock_release+0x14e/0x17b
      	 [<ffffffffa020df97>] ? rcu_read_unlock+0x21/0x23 [ipv6]
      	 [<ffffffff8150df52>] ? rcu_read_unlock+0x23/0x23
      	 [<ffffffffa01f4ede>] ip6_input_finish+0x1e4/0x2fc [ipv6]
      	 [<ffffffffa01f540b>] ip6_input+0x33/0x38 [ipv6]
      	 [<ffffffffa01f5557>] ip6_mc_input+0x147/0x160 [ipv6]
      	 [<ffffffffa01f4ba3>] ip6_rcv_finish+0x7c/0x81 [ipv6]
      	 [<ffffffffa01f5397>] ipv6_rcv+0x3a1/0x3e2 [ipv6]
      	 [<ffffffff8151ef96>] __netif_receive_skb_core+0x4ab/0x511
      	 [<ffffffff810fdc94>] ? mark_held_locks+0x71/0x99
      	 [<ffffffff8151f0c0>] ? process_backlog+0x69/0x15e
      	 [<ffffffff8151f045>] __netif_receive_skb+0x49/0x5b
      	 [<ffffffff8151f0cf>] process_backlog+0x78/0x15e
      	 [<ffffffff8151f571>] ? net_rx_action+0x1a2/0x1cc
      	 [<ffffffff8151f47b>] net_rx_action+0xac/0x1cc
      	 [<ffffffff810c69b7>] ? __do_softirq+0xad/0x218
      	 [<ffffffff810c69ff>] __do_softirq+0xf5/0x218
      	 [<ffffffff815ddafc>] do_softirq_own_stack+0x1c/0x30
      	 <EOI>  [<ffffffff810c6bb6>] do_softirq+0x38/0x5d
      	 [<ffffffffa01f1d5b>] ? ip6_copy_metadata+0x156/0x156 [ipv6]
      	 [<ffffffff810c6c78>] __local_bh_enable_ip+0x9d/0xd9
      	 [<ffffffffa01f1d88>] rcu_read_unlock_bh+0x2d/0x2f [ipv6]
      	 [<ffffffffa01f28b4>] ip6_finish_output2+0x381/0x3d8 [ipv6]
      	 [<ffffffffa01f49ef>] ip6_finish_output+0x6e/0x73 [ipv6]
      	 [<ffffffffa01f4a70>] ip6_output+0x7c/0xa8 [ipv6]
      	 [<ffffffff815b1bfa>] dst_output+0x18/0x1c
      	 [<ffffffff815b1c9e>] ip6_local_out+0x1c/0x21
      	 [<ffffffffa01f2489>] ip6_push_pending_frames+0x37d/0x427 [ipv6]
      	 [<ffffffff81558af8>] ? skb_orphan+0x39/0x39
      	 [<ffffffffa020b85a>] ? rawv6_sendmsg+0x74b/0xa4d [ipv6]
      	 [<ffffffffa020ba51>] rawv6_sendmsg+0x942/0xa4d [ipv6]
      	 [<ffffffff81584cd2>] inet_sendmsg+0x3d/0x66
      	 [<ffffffff81508930>] __sock_sendmsg_nosec+0x25/0x27
      	 [<ffffffff8150b0d7>] sock_sendmsg+0x5a/0x7b
      	 [<ffffffff810fd8dc>] ? lock_release+0x14e/0x17b
      	 [<ffffffff8116d756>] ? might_fault+0x9e/0xa5
      	 [<ffffffff8116d70d>] ? might_fault+0x55/0xa5
      	 [<ffffffff81508cb1>] ? copy_from_user+0x2a/0x2c
      	 [<ffffffff8150b70c>] ___sys_sendmsg+0x226/0x2d9
      	 [<ffffffff810fcd25>] ? __lock_acquire+0x5ee/0xe48
      	 [<ffffffff810fde01>] ? trace_hardirqs_on_caller+0x145/0x1a1
      	 [<ffffffff8118efcb>] ? slab_free_hook.isra.71+0x50/0x59
      	 [<ffffffff8115c81f>] ? release_pages+0xbc/0x181
      	 [<ffffffff810fd99d>] ? lock_acquire+0x94/0x9d
      	 [<ffffffff81115e97>] ? read_seqcount_begin.constprop.25+0x73/0x90
      	 [<ffffffff8150c408>] __sys_sendmsg+0x3d/0x5b
      	 [<ffffffff8150c433>] SyS_sendmsg+0xd/0x19
      	 [<ffffffff815dc53d>] system_call_fastpath+0x1a/0x1f
      Reported-by: NBen Greear <greearb@candelatech.com>
      Cc: Casey Leedom <leedom@chelsio.com>
      Cc: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Acked-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee9a33b2
  4. 23 6月, 2014 2 次提交
  5. 11 6月, 2014 3 次提交
  6. 03 6月, 2014 1 次提交
  7. 14 5月, 2014 1 次提交
  8. 13 5月, 2014 2 次提交
  9. 01 5月, 2014 1 次提交
  10. 29 3月, 2014 1 次提交
  11. 27 3月, 2014 1 次提交
  12. 15 3月, 2014 1 次提交
    • S
      cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes · 05eb2389
      Steve Wise 提交于
      The current logic suffers from a slow response time to disable user DB
      usage, and also fails to avoid DB FIFO drops under heavy load. This commit
      fixes these deficiencies and makes the avoidance logic more optimal.
      This is done by more efficiently notifying the ULDs of potential DB
      problems, and implements a smoother flow control algorithm in iw_cxgb4,
      which is the ULD that puts the most load on the DB fifo.
      
      Design:
      
      cxgb4:
      
      Direct ULD callback from the DB FULL/DROP interrupt handler.  This allows
      the ULD to stop doing user DB writes as quickly as possible.
      
      While user DB usage is disabled, the LLD will accumulate DB write events
      for its queues.  Then once DB usage is reenabled, a single DB write is
      done for each queue with its accumulated write count.  This reduces the
      load put on the DB fifo when reenabling.
      
      iw_cxgb4:
      
      Instead of marking each qp to indicate DB writes are disabled, we create
      a device-global status page that each user process maps.  This allows
      iw_cxgb4 to only set this single bit to disable all DB writes for all
      user QPs vs traversing the idr of all the active QPs.  If the libcxgb4
      doesn't support this, then we fall back to the old approach of marking
      each QP.  Thus we allow the new driver to work with an older libcxgb4.
      
      When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
      via the status page and transition the DB state to STOPPED.  As user
      processes see that DB writes are disabled, they call into iw_cxgb4
      to submit their DB write events.  Since the DB state is in STOPPED,
      the QP trying to write gets enqueued on a new DB "flow control" list.
      As subsequent DB writes are submitted for this flow controlled QP, the
      amount of writes are accumulated for each QP on the flow control list.
      So all the user QPs that are actively ringing the DB get put on this
      list and the number of writes they request are accumulated.
      
      When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
      context, we change the DB state to FLOW_CONTROL, and begin resuming all
      the QPs that are on the flow control list.  This logic runs on until
      the flow control list is empty or we exit FLOW_CONTROL mode (due to
      a DB DROP upcall, for example).  QPs are removed from this list, and
      their accumulated DB write counts written to the DB FIFO.  Sets of QPs,
      called chunks in the code, are removed at one time. The chunk size is 64.
      So 64 QPs are resumed at a time, and before the next chunk is resumed, the
      logic waits (blocks) for the DB FIFO to drain.  This prevents resuming to
      quickly and overflowing the FIFO.  Once the flow control list is empty,
      the db state transitions back to NORMAL and user QPs are again allowed
      to write directly to the user DB register.
      
      The algorithm is designed such that if the DB write load is high enough,
      then all the DB writes get submitted by the kernel using this flow
      controlled approach to avoid DB drops.  As the load lightens though, we
      resume to normal DB writes directly by user applications.
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05eb2389
  13. 26 2月, 2014 1 次提交
  14. 25 2月, 2014 1 次提交
  15. 19 2月, 2014 7 次提交
  16. 25 1月, 2014 1 次提交
  17. 24 1月, 2014 2 次提交
    • G
      net/cxgb4: Don't retrieve stats during recovery · 9fe6cb58
      Gavin Shan 提交于
      We possibly retrieve the adapter's statistics during EEH recovery
      and that should be disallowed. Otherwise, it would possibly incur
      replicate EEH error and EEH recovery is going to fail eventually.
      
      The patch reuses statistics lock and checks net_device is attached
      before going to retrieve statistics, so that the problem can be
      avoided.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fe6cb58
    • G
      net/cxgb4: Avoid disabling PCI device for towice · 144be3d9
      Gavin Shan 提交于
      If we have EEH error happens to the adapter and we have to remove
      it from the system for some reasons (e.g. more than 5 EEH errors
      detected from the device in last hour), the adapter will be disabled
      for towice separately by eeh_err_detected() and remove_one(), which
      will incur following unexpected backtrace. The patch tries to avoid
      it.
      
      WARNING: at drivers/pci/pci.c:1431
      CPU: 12 PID: 121 Comm: eehd Not tainted 3.13.0-rc7+ #1
      task: c0000001823a3780 ti: c00000018240c000 task.ti: c00000018240c000
      NIP: c0000000003c1e40 LR: c0000000003c1e3c CTR: 0000000001764c5c
      REGS: c00000018240f470 TRAP: 0700   Not tainted  (3.13.0-rc7+)
      MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI>  CR: 28000024  XER: 00000004
      CFAR: c000000000706528 SOFTE: 1
      GPR00: c0000000003c1e3c c00000018240f6f0 c0000000010fe1f8 0000000000000035
      GPR04: 0000000000000000 0000000000000000 00000000003ae509 0000000000000000
      GPR08: 000000000000346f 0000000000000000 0000000000000000 0000000000003fef
      GPR12: 0000000028000022 c00000000ec93000 c0000000000c11b0 c000000184ac3e40
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR24: 0000000000000000 c0000000009398d8 c00000000101f9c0 c0000001860ae000
      GPR28: c000000182ba0000 00000000000001f0 c0000001860ae6f8 c0000001860ae000
      NIP [c0000000003c1e40] .pci_disable_device+0xd0/0xf0
      LR [c0000000003c1e3c] .pci_disable_device+0xcc/0xf0
      Call Trace:
      [c0000000003c1e3c] .pci_disable_device+0xcc/0xf0 (unreliable)
      [d0000000073881c4] .remove_one+0x174/0x320 [cxgb4]
      [c0000000003c57e0] .pci_device_remove+0x60/0x100
      [c00000000046396c] .__device_release_driver+0x9c/0x120
      [c000000000463a20] .device_release_driver+0x30/0x60
      [c0000000003bcdb4] .pci_stop_bus_device+0x94/0xd0
      [c0000000003bcf48] .pci_stop_and_remove_bus_device+0x18/0x30
      [c00000000003f548] .pcibios_remove_pci_devices+0xa8/0x140
      [c000000000035c00] .eeh_handle_normal_event+0xa0/0x3c0
      [c000000000035f50] .eeh_handle_event+0x30/0x2b0
      [c0000000000362c4] .eeh_event_handler+0xf4/0x1b0
      [c0000000000c12b8] .kthread+0x108/0x130
      [c00000000000a168] .ret_from_kernel_thread+0x5c/0x74
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      144be3d9
  18. 23 12月, 2013 5 次提交
  19. 04 12月, 2013 2 次提交
  20. 18 10月, 2013 1 次提交
  21. 27 9月, 2013 1 次提交