1. 06 9月, 2018 1 次提交
    • A
      IB/ipoib: Avoid a race condition between start_xmit and cm_rep_handler · 816e846c
      Aaron Knister 提交于
      Inside of start_xmit() the call to check if the connection is up and the
      queueing of the packets for later transmission is not atomic which leaves
      a window where cm_rep_handler can run, set the connection up, dequeue
      pending packets and leave the subsequently queued packets by start_xmit()
      sitting on neigh->queue until they're dropped when the connection is torn
      down. This only applies to connected mode. These dropped packets can
      really upset TCP, for example, and cause multi-minute delays in
      transmission for open connections.
      
      Here's the code in start_xmit where we check to see if the connection is
      up:
      
             if (ipoib_cm_get(neigh)) {
                     if (ipoib_cm_up(neigh)) {
                             ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
                             goto unref;
                     }
             }
      
      The race occurs if cm_rep_handler execution occurs after the above
      connection check (specifically if it gets to the point where it acquires
      priv->lock to dequeue pending skb's) but before the below code snippet in
      start_xmit where packets are queued.
      
             if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
                     push_pseudo_header(skb, phdr->hwaddr);
                     spin_lock_irqsave(&priv->lock, flags);
                     __skb_queue_tail(&neigh->queue, skb);
                     spin_unlock_irqrestore(&priv->lock, flags);
             } else {
                     ++dev->stats.tx_dropped;
                     dev_kfree_skb_any(skb);
             }
      
      The patch acquires the netif tx lock in cm_rep_handler for the section
      where it sets the connection up and dequeues and retransmits deferred
      skb's.
      
      Fixes: 839fcaba ("IPoIB: Connected mode experimental support")
      Cc: stable@vger.kernel.org
      Signed-off-by: NAaron Knister <aaron.s.knister@nasa.gov>
      Tested-by: NIra Weiny <ira.weiny@intel.com>
      Reviewed-by: NIra Weiny <ira.weiny@intel.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      816e846c
  2. 03 8月, 2018 2 次提交
    • J
      IB/ipoib: Get rid of the sysfs_mutex · ee190ab7
      Jason Gunthorpe 提交于
      This mutex was introduced to deal with the deadlock formed by calling
      unregister_netdev from within the sysfs callback of a netdev.
      
      Now that we have priv_destructor and needs_free_netdev we can switch
      to the more targeted solution of running the unregister from a
      work queue. This avoids the deadlock and gets rid of the mutex.
      
      The next patch in the series needs this mutex eliminated to create
      atomicity of unregisteration.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      ee190ab7
    • J
      IB/ipoib: Get rid of IPOIB_FLAG_GOING_DOWN · 577e07ff
      Jason Gunthorpe 提交于
      This essentially duplicates the netdev's reg_state, so just use that
      directly. The reg_state is updated under the rntl_lock, and all places
      using GOING_DOWN already acquire the rtnl_lock so checking is safe.
      
      Since the only place we use GOING_DOWN is for the parent device this
      does not fix any bugs, but it is a step to tidy up the unregister flow
      so that after later patches the flow is uniform and sane.
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      577e07ff
  3. 31 7月, 2018 2 次提交
  4. 25 7月, 2018 1 次提交
  5. 10 7月, 2018 3 次提交
  6. 19 6月, 2018 1 次提交
  7. 13 6月, 2018 1 次提交
    • K
      treewide: Use array_size() in vzalloc() · fad953ce
      Kees Cook 提交于
      The vzalloc() function has no 2-factor argument form, so multiplication
      factors need to be wrapped in array_size(). This patch replaces cases of:
      
              vzalloc(a * b)
      
      with:
              vzalloc(array_size(a, b))
      
      as well as handling cases of:
      
              vzalloc(a * b * c)
      
      with:
      
              vzalloc(array3_size(a, b, c))
      
      This does, however, attempt to ignore constant size factors like:
      
              vzalloc(4 * 1024)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        vzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        vzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        vzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
        vzalloc(
      -	SIZE * COUNT
      +	array_size(COUNT, SIZE)
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        vzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        vzalloc(C1 * C2 * C3, ...)
      |
        vzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants.
      @@
      expression E1, E2;
      constant C1, C2;
      @@
      
      (
        vzalloc(C1 * C2, ...)
      |
        vzalloc(
      -	E1 * E2
      +	array_size(E1, E2)
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      fad953ce
  8. 26 1月, 2018 1 次提交
  9. 04 1月, 2018 1 次提交
  10. 14 12月, 2017 1 次提交
  11. 12 12月, 2017 1 次提交
  12. 26 10月, 2017 2 次提交
  13. 29 9月, 2017 1 次提交
  14. 23 9月, 2017 1 次提交
  15. 25 8月, 2017 1 次提交
    • E
      IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock · 69956d83
      Erez Shitrit 提交于
      In order to avoid deadlock between sysfs functions (like create/delete
      child) and remove_one (both of them are using the sysfs lock and
      rtnl_lock) the driver will use a state mutex for sync.
      
      That will fix traces as the following:
      schedule+0x3e/0x90
      kernfs_drain+0x75/0xf0
      ? wait_woken+0x90/0x90
      __kernfs_remove+0x12e/0x1c0
      kernfs_remove+0x25/0x40
      sysfs_remove_dir+0x57/0x90
      kobject_del+0x22/0x60
      device_del+0x195/0x230
       pm_runtime_set_memalloc_noio+0xac/0xf0
      netdev_unregister_kobject+0x71/0x80
      rollback_registered_many+0x205/0x2f0
      rollback_registered+0x31/0x40
      unregister_netdevice_queue+0x58/0xb0
      unregister_netdev+0x20/0x30
      ipoib_remove_one+0xb7/0x240 [ib_ipoib]
      ib_unregister_device+0xbc/0x1b0 [ib_core]
      ib_unregister_mad_agent+0x29/0x30 [ib_core]
      mlx4_ib_remove+0x67/0x280 [mlx4_ib]
      INFO: task echo:24082 blocked for more than 120 seconds.
      Tainted: G           OE   4.1.12-37.5.1.el6uek.x86_64 #2
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
      message.
      Call Trace:
      schedule+0x3e/0x90
      schedule_preempt_disabled+0xe/0x10
      __mutex_lock_slowpath+0x95/0x110
      ? _rcu_barrier+0x177/0x220
      mutex_lock+0x23/0x40
      rtnl_lock+0x15/0x20
      netdev_run_todo+0x81/0x1f0
      rtnl_unlock+0xe/0x10
      ipoib_vlan_delete+0x12f/0x1c0 [ib_ipoib]
      delete_child+0x69/0x80 [ib_ipoib]
      dev_attr_store+0x20/0x30
      sysfs_kf_write+0x41/0x50
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Reviewed-by: NAlex Vesker <valex@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      69956d83
  16. 23 7月, 2017 1 次提交
  17. 18 7月, 2017 2 次提交
  18. 02 5月, 2017 1 次提交
  19. 21 4月, 2017 1 次提交
  20. 02 3月, 2017 1 次提交
  21. 19 2月, 2017 2 次提交
  22. 13 1月, 2017 4 次提交
  23. 04 12月, 2016 1 次提交
  24. 17 11月, 2016 1 次提交
  25. 14 10月, 2016 1 次提交
    • P
      IB/ipoib: move back IB LL address into the hard header · fc791b63
      Paolo Abeni 提交于
      After the commit 9207f9d4 ("net: preserve IP control block
      during GSO segmentation"), the GSO CB and the IPoIB CB conflict.
      That destroy the IPoIB address information cached there,
      causing a severe performance regression, as better described here:
      
      http://marc.info/?l=linux-kernel&m=146787279825501&w=2
      
      This change moves the data cached by the IPoIB driver from the
      skb control lock into the IPoIB hard header, as done before
      the commit 936d7de3 ("IPoIB: Stop lying about hard_header_len
      and use skb->cb to stash LL addresses").
      In order to avoid GRO issue, on packet reception, the IPoIB driver
      stash into the skb a dummy pseudo header, so that the received
      packets have actually a hard header matching the declared length.
      To avoid changing the connected mode maximum mtu, the allocated
      head buffer size is increased by the pseudo header length.
      
      After this commit, IPoIB performances are back to pre-regression
      value.
      
      v2 -> v3: rebased
      v1 -> v2: avoid changing the max mtu, increasing the head buf size
      
      Fixes: 9207f9d4 ("net: preserve IP control block during GSO segmentation")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc791b63
  26. 03 9月, 2016 1 次提交
    • E
      IB/ipoib: Fix memory corruption in ipoib cm mode connect flow · 546481c2
      Erez Shitrit 提交于
      When a new CM connection is being requested, ipoib driver copies data
      from the path pointer in the CM/tx object, the path object might be
      invalid at the point and memory corruption will happened later when now
      the CM driver will try using that data.
      
      The next scenario demonstrates it:
      	neigh_add_path --> ipoib_cm_create_tx -->
      	queue_work (pointer to path is in the cm/tx struct)
      	#while the work is still in the queue,
      	#the port goes down and causes the ipoib_flush_paths:
      	ipoib_flush_paths --> path_free --> kfree(path)
      	#at this point the work scheduled starts.
      	ipoib_cm_tx_start --> copy from the (invalid)path pointer:
      	(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
      	 -> memory corruption.
      
      To fix that the driver now starts the CM/tx connection only if that
      specific path exists in the general paths database.
      This check is protected with the relevant locks, and uses the gid from
      the neigh member in the CM/tx object which is valid according to the ref
      count that was taken by the CM/tx.
      
      Fixes: 839fcaba ('IPoIB: Connected mode experimental support')
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      546481c2
  27. 07 6月, 2016 1 次提交
    • E
      IB/IPoIB: Fix race between ipoib_remove_one to sysfs functions · 198b12f7
      Erez Shitrit 提交于
      In ipoib_remove_one the driver holds the rtnl_lock and tries to do some
      operation like dev_change_flags or unregister_netdev, while sysfs
      callback like ipoib_vlan_delete holds sysfs mutex and tries to hold the
      rtnl_lock via rtnl_trylock() and restart_syscall() if the lock is not
      free, meanwhile ipoib_remove_one tries to get the sysfs lock in order to
      free its sysfs directory, and we will get  a->b, b->a deadlock.
      
          Trace like the following:
      
              schedule+0x37/0x80
              schedule_preempt_disabled+0xe/0x10
              __mutex_lock_slowpath+0xb5/0x120
              mutex_lock+0x23/0x40
              rtnl_lock+0x15/0x20
              netdev_run_todo+0x17c/0x320
              rtnl_unlock+0xe/0x10
              ipoib_vlan_delete+0x11b/0x1b0 [ib_ipoib]
              delete_child+0x54/0x80 [ib_ipoib]
              dev_attr_store+0x18/0x30
              sysfs_kf_write+0x37/0x40
              mutex_lock+0x16/0x40
              SyS_write+0x55/0xc0
              entry_SYSCALL_64_fastpath+0x16/0x75
          And
              schedule+0x37/0x80
              __kernfs_remove+0x1a8/0x260
              ? wake_atomic_t_function+0x60/0x60
              kernfs_remove+0x25/0x40
              sysfs_remove_dir+0x50/0x80
              kobject_del+0x18/0x50
              device_del+0x19f/0x260
              netdev_unregister_kobject+0x6a/0x80
              rollback_registered_many+0x1fd/0x340
              rollback_registered+0x3c/0x70
              unregister_netdevice_queue+0x55/0xc0
              unregister_netdev+0x20/0x30
              ipoib_remove_one+0x114/0x1b0 [ib_ipoib]
              ib_unregister_client+0x4a/0x170 [ib_core]
              ? find_module_all+0x71/0xa0
              ipoib_cleanup_module+0x10/0x94 [ib_ipoib]
              SyS_delete_module+0x1b5/0x210
              entry_SYSCALL_64_fastpath+0x16/0x75
      
      The fix is by checking the flag IPOIB_FLAG_INTF_ON_DESTROY in order to
      get out from the sysfs function.
      
      Fixes: 862096a8 ("IB/ipoib: Add more rtnl_link_ops callbacks")
      Fixes: 9baa0b03 ("IB/ipoib: Add rtnl_link_ops support")
      Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      198b12f7
  28. 05 5月, 2016 1 次提交
  29. 03 3月, 2016 1 次提交
  30. 23 12月, 2015 1 次提交