1. 13 1月, 2017 4 次提交
    • F
      IB/ipoib: Add detailed error message to dev_queue_xmit call · d32b9a81
      Feras Daoud 提交于
      Add a detailed return code to dev_queue_xmit function when
      calling to requeue packet via __skb_dequeue.
      Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      d32b9a81
    • F
      IB/ipoib: Fix deadlock between rmmod and set_mode · 0a0007f2
      Feras Daoud 提交于
      When calling set_mode from sys/fs, the call flow locks the sys/fs lock
      first and then tries to lock rtnl_lock (when calling ipoib_set_mod).
      On the other hand, the rmmod call flow takes the rtnl_lock first
      (when calling unregister_netdev) and then tries to take the sys/fs
      lock. Deadlock a->b, b->a.
      
      The problem starts when ipoib_set_mod frees it's rtnl_lck and tries
      to get it after that.
      
          set_mod:
          [<ffffffff8104f2bd>] ? check_preempt_curr+0x6d/0x90
          [<ffffffff814fee8e>] __mutex_lock_slowpath+0x13e/0x180
          [<ffffffff81448655>] ? __rtnl_unlock+0x15/0x20
          [<ffffffff814fed2b>] mutex_lock+0x2b/0x50
          [<ffffffff81448675>] rtnl_lock+0x15/0x20
          [<ffffffffa02ad807>] ipoib_set_mode+0x97/0x160 [ib_ipoib]
          [<ffffffffa02b5f5b>] set_mode+0x3b/0x80 [ib_ipoib]
          [<ffffffff8134b840>] dev_attr_store+0x20/0x30
          [<ffffffff811f0fe5>] sysfs_write_file+0xe5/0x170
          [<ffffffff8117b068>] vfs_write+0xb8/0x1a0
          [<ffffffff8117ba81>] sys_write+0x51/0x90
          [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      
          rmmod:
          [<ffffffff81279ffc>] ? put_dec+0x10c/0x110
          [<ffffffff8127a2ee>] ? number+0x2ee/0x320
          [<ffffffff814fe6a5>] schedule_timeout+0x215/0x2e0
          [<ffffffff8127cc04>] ? vsnprintf+0x484/0x5f0
          [<ffffffff8127b550>] ? string+0x40/0x100
          [<ffffffff814fe323>] wait_for_common+0x123/0x180
          [<ffffffff81060250>] ? default_wake_function+0x0/0x20
          [<ffffffff8119661e>] ? ifind_fast+0x5e/0xb0
          [<ffffffff814fe43d>] wait_for_completion+0x1d/0x20
          [<ffffffff811f2e68>] sysfs_addrm_finish+0x228/0x270
          [<ffffffff811f2fb3>] sysfs_remove_dir+0xa3/0xf0
          [<ffffffff81273f66>] kobject_del+0x16/0x40
          [<ffffffff8134cd14>] device_del+0x184/0x1e0
          [<ffffffff8144e59b>] netdev_unregister_kobject+0xab/0xc0
          [<ffffffff8143c05e>] rollback_registered+0xae/0x130
          [<ffffffff8143c102>] unregister_netdevice+0x22/0x70
          [<ffffffff8143c16e>] unregister_netdev+0x1e/0x30
          [<ffffffffa02a91b0>] ipoib_remove_one+0xe0/0x120 [ib_ipoib]
          [<ffffffffa01ed95f>] ib_unregister_device+0x4f/0x100 [ib_core]
          [<ffffffffa021f5e1>] mlx4_ib_remove+0x41/0x180 [mlx4_ib]
          [<ffffffffa01ab771>] mlx4_remove_device+0x71/0x90 [mlx4_core]
      
      Fixes: 862096a8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
      Cc: <stable@vger.kernel.org> # v3.6+
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      0a0007f2
    • F
      IB/ipoib: Set device connection mode only when needed · 80b5b35a
      Feras Daoud 提交于
      When changing the connection mode, the ipoib_set_mode function
      did not check if the previous connection mode equals to the
      new one. This commit adds the required check and return 0 if the new
      mode equals to the previous one.
      
      Fixes: 839fcaba ("IPoIB: Connected mode experimental support")
      Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Reviewed-by: NAlex Vesker <valex@mellanox.com>
      Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      80b5b35a
    • F
      IB/ipoib: When given an invalid UD MTU, give debug msg · 29da686d
      Feras Daoud 提交于
      In datagram mode, the IB UD (Unreliable Datagram) transport is used
      so the MTU of the interface is equal to the IB L2 MTU minus the
      IPoIB encapsulation header. Any request to change the MTU value
      above the maximum range will change the MTU to the max allowed, but
      will not show any warning message. An ipoib_warn is issued in such
      cases, letting the user know that even though the value is legal,
      it can't be currently applied.
      Signed-off-by: NFeras Daoud <ferasda@mellanox.com>
      Signed-off-by: NNoa Osherovich <noaos@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      29da686d
  2. 04 12月, 2016 1 次提交
  3. 21 10月, 2016 1 次提交
    • J
      net: use core MTU range checking in misc drivers · b3e3893e
      Jarod Wilson 提交于
      firewire-net:
      - set min/max_mtu
      - remove fwnet_change_mtu
      
      nes:
      - set max_mtu
      - clean up nes_netdev_change_mtu
      
      xpnet:
      - set min/max_mtu
      - remove xpnet_dev_change_mtu
      
      hippi:
      - set min/max_mtu
      - remove hippi_change_mtu
      
      batman-adv:
      - set max_mtu
      - remove batadv_interface_change_mtu
      - initialization is a little async, not 100% certain that max_mtu is set
        in the optimal place, don't have hardware to test with
      
      rionet:
      - set min/max_mtu
      - remove rionet_change_mtu
      
      slip:
      - set min/max_mtu
      - streamline sl_change_mtu
      
      um/net_kern:
      - remove pointless ndo_change_mtu
      
      hsi/clients/ssi_protocol:
      - use core MTU range checking
      - remove now redundant ssip_pn_set_mtu
      
      ipoib:
      - set a default max MTU value
      - Note: ipoib's actual max MTU can vary, depending on if the device is in
        connected mode or not, so we'll just set the max_mtu value to the max
        possible, and let the ndo_change_mtu function continue to validate any new
        MTU change requests with checks for CM or not. Note that ipoib has no
        min_mtu set, and thus, the network core's mtu > 0 check is the only lower
        bounds here.
      
      mptlan:
      - use net core MTU range checking
      - remove now redundant mpt_lan_change_mtu
      
      fddi:
      - min_mtu = 21, max_mtu = 4470
      - remove now redundant fddi_change_mtu (including export)
      
      fjes:
      - min_mtu = 8192, max_mtu = 65536
      - The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to
        get past the core net MTU range checks so fjes_change_mtu can validate a
        new MTU against what it supports (see fjes_support_mtu in fjes_hw.c)
      
      hsr:
      - min_mtu = 0 (calls ether_setup, max_mtu is 1500)
      
      f_phonet:
      - min_mtu = 6, max_mtu = 65541
      
      u_ether:
      - min_mtu = 14, max_mtu = 15412
      
      phonet/pep-gprs:
      - min_mtu = 576, max_mtu = 65530
      - remove redundant gprs_set_mtu
      
      CC: netdev@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: Stefan Richter <stefanr@s5r6.in-berlin.de>
      CC: Faisal Latif <faisal.latif@intel.com>
      CC: linux-rdma@vger.kernel.org
      CC: Cliff Whickman <cpw@sgi.com>
      CC: Robin Holt <robinmholt@gmail.com>
      CC: Jes Sorensen <jes@trained-monkey.org>
      CC: Marek Lindner <mareklindner@neomailbox.ch>
      CC: Simon Wunderlich <sw@simonwunderlich.de>
      CC: Antonio Quartulli <a@unstable.cc>
      CC: Sathya Prakash <sathya.prakash@broadcom.com>
      CC: Chaitra P B <chaitra.basappa@broadcom.com>
      CC: Suganath Prabu Subramani <suganath-prabu.subramani@broadcom.com>
      CC: MPT-FusionLinux.pdl@broadcom.com
      CC: Sebastian Reichel <sre@kernel.org>
      CC: Felipe Balbi <balbi@kernel.org>
      CC: Arvid Brodin <arvid.brodin@alten.se>
      CC: Remi Denis-Courmont <courmisch@gmail.com>
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b3e3893e
  4. 18 10月, 2016 1 次提交
  5. 14 10月, 2016 1 次提交
    • P
      IB/ipoib: move back IB LL address into the hard header · fc791b63
      Paolo Abeni 提交于
      After the commit 9207f9d4 ("net: preserve IP control block
      during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
      That destroy the IPoIB address information cached there,
      causing a severe performance regression, as better described here:
      
      http://marc.info/?l=linux-kernel&m=146787279825501&w=2
      
      This change moves the data cached by the IPoIB driver from the
      skb control lock into the IPoIB hard header, as done before
      the commit 936d7de3 ("IPoIB: Stop lying about hard_header_len
      and use skb->cb to stash LL addresses").
      In order to avoid GRO issue, on packet reception, the IPoIB driver
      stash into the skb a dummy pseudo header, so that the received
      packets have actually a hard header matching the declared length.
      To avoid changing the connected mode maximum mtu, the allocated
      head buffer size is increased by the pseudo header length.
      
      After this commit, IPoIB performances are back to pre-regression
      value.
      
      v2 -> v3: rebased
      v1 -> v2: avoid changing the max mtu, increasing the head buf size
      
      Fixes: 9207f9d4 ("net: preserve IP control block during GSO segmentation")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc791b63
  6. 08 10月, 2016 1 次提交
  7. 03 9月, 2016 1 次提交
    • E
      IB/ipoib: Fix memory corruption in ipoib cm mode connect flow · 546481c2
      Erez Shitrit 提交于
      When a new CM connection is being requested, ipoib driver copies data
      from the path pointer in the CM/tx object, the path object might be
      invalid at the point and memory corruption will happened later when now
      the CM driver will try using that data.
      
      The next scenario demonstrates it:
      	neigh_add_path --> ipoib_cm_create_tx -->
      	queue_work (pointer to path is in the cm/tx struct)
      	#while the work is still in the queue,
      	#the port goes down and causes the ipoib_flush_paths:
      	ipoib_flush_paths --> path_free --> kfree(path)
      	#at this point the work scheduled starts.
      	ipoib_cm_tx_start --> copy from the (invalid)path pointer:
      	(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
      	 -> memory corruption.
      
      To fix that the driver now starts the CM/tx connection only if that
      specific path exists in the general paths database.
      This check is protected with the relevant locks, and uses the gid from
      the neigh member in the CM/tx object which is valid according to the ref
      count that was taken by the CM/tx.
      
      Fixes: 839fcaba ('IPoIB: Connected mode experimental support')
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      546481c2
  8. 04 8月, 2016 1 次提交
  9. 07 6月, 2016 3 次提交
    • E
      IB/IPoIB: Don't update neigh validity for unresolved entries · 61c78eea
      Erez Shitrit 提交于
      ipoib_neigh_get unconditionally updates the "alive" variable member on
      any packet send.  This prevents the neighbor garbage collection from
      cleaning out a dead neighbor entry if we are still queueing packets
      for it.  If the queue for this neighbor is full, then don't update the
      alive timestamp.  That way the neighbor can time out even if packets
      are still being queued as long as none of them are being sent.
      
      Fixes: b63b70d8 ("IPoIB: Use a private hash table for path lookup in xmit path")
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      61c78eea
    • M
      IB/IPoIB: Disable bottom half when dealing with device address · 9b29953b
      Mark Bloch 提交于
      Align locking usage when touching device address with rest
      of the kernel. Lock the bottom half when doing so using
      netif_addr_lock_bh.
      
      This also solves the following case as reported by lockdep:
      	CPU0                    CPU1
      	----                    ----
      lock(_xmit_INFINIBAND);
      				local_irq_disable();
      				lock(&(&mc->mca_lock)->rlock);
      				lock(_xmit_INFINIBAND);
      <Interrupt>
      lock(&(&mc->mca_lock)->rlock);
      
      *** DEADLOCK ***
      
      Fixes: 492a7e67 ("IB/IPoIB: Allow setting the device address")
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      9b29953b
    • E
      IB/IPoIB: Fix race between ipoib_remove_one to sysfs functions · 198b12f7
      Erez Shitrit 提交于
      In ipoib_remove_one the driver holds the rtnl_lock and tries to do some
      operation like dev_change_flags or unregister_netdev, while sysfs
      callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the
      rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not
      free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to
      free its sysfs directory, and we will get  a->b, b->a deadlock.
      
          Trace like the following:
      
              schedule+0x37/0x80
              schedule_preempt_disabled+0xe/0x10
              __mutex_lock_slowpath+0xb5/0x120
              mutex_lock+0x23/0x40
              rtnl_lock+0x15/0x20
              netdev_run_todo+0x17c/0x320
              rtnl_unlock+0xe/0x10
              ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib]
              delete_child+0x54/0x80 [ib_ipoib]
              dev_attr_store+0x18/0x30
              sysfs_kf_write+0x37/0x40
              mutex_lock+0x16/0x40
              SyS_write+0x55/0xc0
              entry_SYSCALL_64_fastpath+0x16/0x75
          And
              schedule+0x37/0x80
              __kernfs_remove+0x1a8/0x260
              ? wake_atomic_t_function+0x60/0x60
              kernfs_remove+0x25/0x40
              sysfs_remove_dir+0x50/0x80
              kobject_del+0x18/0x50
              device_del+0x19f/0x260
              netdev_unregister_kobject+0x6a/0x80
              rollback_registered_many+0x1fd/0x340
              rollback_registered+0x3c/0x70
              unregister_netdevice_queue+0x55/0xc0
              unregister_netdev+0x20/0x30
              ipoib_remove_one+0x114/0x1b0 [ib_ipoib]
              ib_unregister_client+0x4a/0x170 [ib_core]
              ? find_module_all+0x71/0xa0
              ipoib_cleanup_module+0x10/0x94 [ib_ipoib]
              SyS_delete_module+0x1b5/0x210
              entry_SYSCALL_64_fastpath+0x16/0x75
      
      The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to
      get out from the sysfs function.
      
      Fixes: 862096a8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
      Fixes: 9baa0b03 ("IB/ipoib: Add rtnl_link_ops support")
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      198b12f7
  10. 26 5月, 2016 2 次提交
    • M
      IB/IPoIB: Allow setting the device address · 492a7e67
      Mark Bloch 提交于
      In IB networks, and specifically in IPoIB/rdmacm traffic, the device
      address of an IPoIB interface is used as a means to exchange information
      between nodes needed for communication.
      
      Currently an IPoIB interface will always be created with a device
      address based on its node GUID without a way to change that.
      
      This change adds the ability to set the device address of an IPoIB
      interface by value. We use the set mac address ndo to do that.
      
      The flow should be broken down to two:
      1) The GID value is already in the GID table,
         in this case the interface will be able to set carrier up.
      
      2) The GID value is not yet in the GID table,
         in this case the interface won't try to join the multicast group
         and will wait (listen on GID_CHANGE event) until the GID is inserted.
      
      In order to track those changes, we add a new flag:
      * IPOIB_FLAG_DEV_ADDR_SET.
      
      When set, it means the dev_addr is a based on a value in the gid
      table. this bit will be cleared upon a dev_addr change triggered
      by the user and set after validation.
      
      Per IB spec the port GUID can't change if the module is loaded.
      port GUID is the basis for GID at index 0 which is the basis for
      the default device address of a ipoib interface.
      
      The issue is that there are devices that don't follow the spec,
      they change the port GUID while HCA is powered on, so in order
      not to break userspace applications. We need to check if the
      user wanted to control the device address and we assume that
      if he sets the device address back to be based on GID index 0,
      he no longer wishs to control it.
      
      In order to track this, we add an additional flag:
      * IPOIB_FLAG_DEV_ADDR_CTRL
      
      When setting the device address, there is no validation of the upper
      twelve bytes of the device address (flags, qpn, subnet prefix) as those
      bytes are not under the control of the user.
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      492a7e67
    • E
      IB/ipoib: Support SendOnlyFullMember MCG for SendOnly join · 3b561130
      Erez Shitrit 提交于
      Check (via an SA query) if the SM supports the new option for SendOnly
      multicast joins.
      If the SM supports that option it will use the new join state to create
      such multicast group.
      If SendOnlyFullMember is supported, we wouldn't use faked FullMember state
      join for SendOnly MCG, use the correct state if supported.
      
      This check is performed at every invocation of mcast_restart task, to be
      sure that the driver stays in sync with the current state of the SM.
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      3b561130
  11. 05 5月, 2016 1 次提交
    • F
      drivers: replace dev->trans_start accesses with dev_trans_start · 4d0e9657
      Florian Westphal 提交于
      a trans_start struct member exists twice:
      - in struct net_device (legacy)
      - in struct netdev_queue
      
      Instead of open-coding dev->trans_start usage to obtain the current
      trans_start value, use dev_trans_start() instead.
      
      This is not exactly the same, as dev_trans_start also considers
      the trans_start values of the netdev queues owned by the device
      and provides the most recent one.
      
      For legacy devices this doesn't matter as dev_trans_start can cope
      with netdev trans_start values of 0 (they are ignored).
      
      This is a prerequisite to eventual removal of dev->trans_start.
      
      Cc: linux-rdma@vger.kernel.org
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d0e9657
  12. 22 3月, 2016 1 次提交
  13. 20 1月, 2016 1 次提交
    • E
      IB/IPoIB: Fix kernel panic on multicast flow · 50be28de
      Erez Shitrit 提交于
      ipoib_mcast_restart_task calls ipoib_mcast_remove_list with the
      parameter mcast->dev. That mcast is a temporary (used as an iterator)
      variable that may be uninitialized.
      There is no need to send the variable dev to the function, as each mcast
      has its dev as a member in the mcast struct.
      
      This causes the next panic:
      RIP: 0010: ipoib_mcast_leave+0x6d/0xf0 [ib_ipoib]
      RSP: 0018: EFLAGS: 00010246
      RAX: f0201 RBX: 24e00 RCX: 00000
      ....
      ....
      Stack:
      Call Trace:
      	ipoib_mcast_remove_list+0x3a/0x70 [ib_ipoib]
      	ipoib_mcast_restart_task+0x3bb/0x520 [ib_ipoib]
      	process_one_work+0x164/0x470
      	worker_thread+0x11d/0x420
      	...
      
      Fixes: 5a0e81f6 ('IB/IPoIB: factor out common multicast list removal code')
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Reported-by: NDoron Tsur <doront@mellanox.com>
      Reviewed-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      50be28de
  14. 24 12月, 2015 2 次提交
  15. 23 12月, 2015 1 次提交
  16. 22 10月, 2015 1 次提交
  17. 14 10月, 2015 1 次提交
  18. 08 10月, 2015 1 次提交
    • C
      IB: split struct ib_send_wr · e622f2f4
      Christoph Hellwig 提交于
      This patch split up struct ib_send_wr so that all non-trivial verbs
      use their own structure which embedds struct ib_send_wr.  This dramaticly
      shrinks the size of a WR for most common operations:
      
      sizeof(struct ib_send_wr) (old):	96
      
      sizeof(struct ib_send_wr):		48
      sizeof(struct ib_rdma_wr):		64
      sizeof(struct ib_atomic_wr):		96
      sizeof(struct ib_ud_wr):		88
      sizeof(struct ib_fast_reg_wr):		88
      sizeof(struct ib_bind_mw_wr):		96
      sizeof(struct ib_sig_handover_wr):	80
      
      And with Sagi's pending MR rework the fast registration WR will also be
      down to a reasonable size:
      
      sizeof(struct ib_fastreg_wr):		64
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
      Tested-by: NHaggai Eran <haggaie@mellanox.com>
      Tested-by: NSagi Grimberg <sagig@mellanox.com>
      Tested-by: NSteve Wise <swise@opengridcomputing.com>
      e622f2f4
  19. 26 9月, 2015 1 次提交
  20. 31 8月, 2015 2 次提交
  21. 15 7月, 2015 4 次提交
  22. 02 6月, 2015 1 次提交
  23. 19 5月, 2015 1 次提交
  24. 18 4月, 2015 1 次提交
  25. 16 4月, 2015 4 次提交
    • E
      IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's · 1e85b806
      Erez Shitrit 提交于
      Whenever there is no path->ah to the destination, keep only defined
      number of skb's. Otherwise there are cases that the driver can keep
      infinite list of skb's.
      
      For example, when one device want to send unicast arp to the destination,
      and from some reason the SM doesn't respond, the driver currently keeps
      all the skb's. If that unicast arp traffic stopped, all  these skb's
      are kept by the path object till the interface is down.
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      1e85b806
    • D
      IB/ipoib: No longer use flush as a parameter · efc82eee
      Doug Ledford 提交于
      Various places in the IPoIB code had a deadlock related to flushing
      the ipoib workqueue.  Now that we have per device workqueues and a
      specific flush workqueue, there is no longer a deadlock issue with
      flushing the device specific workqueues and we can do so unilaterally.
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      efc82eee
    • D
      IB/ipoib: Use dedicated workqueues per interface · 0b39578b
      Doug Ledford 提交于
      During my recent work on the rtnl lock deadlock in the IPoIB driver, I
      saw that even once I fixed the apparent races for a single device, as
      soon as that device had any children, new races popped up.  It turns
      out that this is because no matter how well we protect against races
      on a single device, the fact that all devices use the same workqueue,
      and flush_workqueue() flushes *everything* from that workqueue means
      that we would also have to prevent all races between different devices
      (for instance, ipoib_mcast_restart_task on interface ib0 can race with
      ipoib_mcast_flush_dev on interface ib0.8002, resulting in a deadlock on
      the rtnl_lock).
      
      There are several possible solutions to this problem:
      
      Make carrier_on_task and mcast_restart_task try to take the rtnl for
      some set period of time and if they fail, then bail.  This runs the
      real risk of dropping work on the floor, which can end up being its
      own separate kind of deadlock.
      
      Set some global flag in the driver that says some device is in the
      middle of going down, letting all tasks know to bail.  Again, this can
      drop work on the floor.
      
      Or the method this patch attempts to use, which is when we bring an
      interface up, create a workqueue specifically for that interface, so
      that when we take it back down, we are flushing only those tasks
      associated with our interface.  In addition, keep the global
      workqueue, but now limit it to only flush tasks.  In this way, the
      flush tasks can always flush the device specific work queues without
      having deadlock issues.
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      0b39578b
    • D
      IB/ipoib: change init sequence ordering · be7aa663
      Doug Ledford 提交于
      In preparation for using per device work queues, we need to move the
      start of the neighbor thread task to after ipoib_ib_dev_init and move
      the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
      Otherwise we will end up freeing our workqueue with work possibly
      still on it.
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      be7aa663
  26. 03 4月, 2015 1 次提交