1. 19 Dec, 2017 1 commit
    • IB/{core/cm}: Fix generating a return AH for RoCEE · 1060f865
      Parav Pandit committed
      When computing a UD reverse path (return AH) from a WC the code was not
      doing a route lookup anchored in a specific netdevice. This caused several
      bugs, including broken IPv6 link-local address support in RoCEv2. [1]
      
      This fixes the lookup by determining the GID table entry that the HW
      matched to the SGID for the WC and then using the netdevice from that
      entry to perform the route and ND lookup for the 'DGID' to build a return
      AH.
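
      Why the anchoring matters can be sketched with a small userspace model.
      This is only an illustration, not the kernel code: the NEIGHBOURS table,
      the interface names, the MAC addresses and resolve_dmac() are all
      hypothetical. It shows that a link-local DGID cannot be resolved unless
      the lookup is tied to one netdevice.

```python
import ipaddress

# Hypothetical per-netdevice neighbour tables. Interface names and MAC
# addresses are made up; the same link-local address exists on both links.
NEIGHBOURS = {
    "ens2f0": {"fe80::1": "02:00:00:00:00:01"},
    "ens2f1": {"fe80::1": "02:00:00:00:00:02"},
}


def resolve_dmac(dgid, netdev=None):
    """Return the MAC for dgid, or None when it cannot be resolved."""
    addr = ipaddress.IPv6Address(dgid)
    key = str(addr)
    if netdev is None:
        if addr.is_link_local:
            # A link-local address is only meaningful on one link, so an
            # unanchored lookup has no way to pick the right table.
            return None
        matches = [table[key] for table in NEIGHBOURS.values() if key in table]
        return matches[0] if len(matches) == 1 else None
    # Anchored lookup: only the table of the given netdevice is consulted.
    return NEIGHBOURS.get(netdev, {}).get(key)
```

      In this model, resolve_dmac("fe80::1") gives None, while passing the
      netdevice taken from the matched GID entry selects exactly one
      neighbour table.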
      
      RoCE GID table management ensures that the right upper netdevices of the
      physical netdevices are added. Therefore init_ah_from_wc doesn't need to
      perform such a check.
      
      Now that the route lookup is based on the netdevice of the GID entry,
      simplify the code to not have ifindex and vlan pointers.  As part of that,
      refactor to take the netdevice as an input parameter.  This was already
      discussed at [2].
      
      Finally, ib_init_ah_from_wc resolves the dmac for a unicast GID in a way
      similar to ib_resolve_eth_dmac(). So ib_resolve_eth_dmac is refactored and
      split for unicast and non-unicast GIDs, so that it can be reused by
      ib_init_ah_from_wc.
      
      While refactoring ib_resolve_eth_dmac(), it is further simplified:

      (a) hoplimit is no longer an optional parameter, as there is only one
          user, who always queries hoplimit.
      (b) an empty line is cleaned up.
      (c) zero initialization of ret is avoided.
      (d) it is no longer an exported symbol, as only ib core uses it.
      
      For IPv6, this was tested using a simple rping test, as below.
       rping -sv -a ::0
       rping -c -a fe80::268a:7ff:fe55:4661%ens2f1 -C 1 -v -d
      
      [1] https://www.spinics.net/lists/linux-rdma/msg45690.html
      [2] https://www.spinics.net/lists/linux-rdma/msg45710.html

      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Matan Barak <matanb@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Reported-by: Roland Dreier <roland@purestorage.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
  2. 14 Nov, 2017 1 commit
  3. 19 Oct, 2017 1 commit
  4. 10 Aug, 2017 4 commits
  5. 05 Aug, 2017 1 commit
    • IB/core: Fix race condition in resolving IP to MAC · 5fff41e1
      Parav Pandit committed
      Currently, while resolving IP addresses to MAC addresses, a single delayed
      work item is used for multiple resolve requests. This single work item
      essentially performs two tasks:
      (a) any retry needed to resolve, and
      (b) executing the callback function for all completed requests.

      While the work is executing callbacks, any new work scheduled on this
      workqueue is lost, because the work item has finished looking at all
      pending requests and is now running callbacks, but is still under
      execution. Any further retry to look at pending requests in
      process_req() after executing callbacks would lead to a similar race
      condition (it may reduce the probability further but doesn't eliminate it).
      Retrying to enqueue the work from queue_req() context is not something
      the rest of the kernel modules have followed.
      
      Therefore the fix in this patch uses the kernel facility to enqueue
      multiple work items to a workqueue. This ensures that no such request
      gets lost in synchronization. The request list is still maintained so that
      rdma_cancel_addr() can unlink the request and get the completion with
      an error sooner. Neighbour update event handling continues to work in the
      same way as before.
      Additionally, the process_req() work entry cancels any pending work for a
      request that gets completed while processing those requests.
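
      The loss described above can be sketched with a toy, single-threaded
      model. It is a simplification made under stated assumptions, not the
      workqueue implementation: shared_pass(), per_request_items() and the
      arrivals mapping are hypothetical names, and the "arrival during a
      callback" is simulated rather than concurrent.

```python
from collections import deque


def shared_pass(pending, arrivals_during_callbacks):
    """One pass of a single shared work item: scan first, then run callbacks."""
    scanned = list(pending)      # phase 1: look at everything pending right now
    pending.clear()
    completed = []
    for req in scanned:          # phase 2: run the callbacks
        completed.append(req)
        # Requests queued now were not part of the scan above; in this model
        # nothing re-runs the pass, so they stay stranded in `pending`.
        pending.extend(arrivals_during_callbacks.pop(req, []))
    return completed


def per_request_items(initial, arrivals_during_callbacks):
    """Each request is its own work item, so late arrivals are simply queued."""
    queue = deque(initial)
    completed = []
    while queue:
        req = queue.popleft()
        completed.append(req)
        queue.extend(arrivals_during_callbacks.pop(req, []))
    return completed
```

      With request "b" arriving while the callback for "a" runs, the shared
      pass completes only "a" and leaves "b" waiting, whereas the per-request
      variant completes both.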
      
      Originally ib_addr was a single-threaded (ST) workqueue, but it became a
      multi-threaded (MT) workqueue with the patch in [1]. This patch again
      makes it similar to ST so that the neighbour update event handler work
      item doesn't race with other work items.
      
      In one such trace below (though on a 4.5-based kernel), it can be seen
      that process_req() never executed the callback, which is likely for an
      event that was scheduled by queue_req() while a previous callback was
      being executed by the workqueue.
      
       [<ffffffff816b0dde>] schedule+0x3e/0x90
       [<ffffffff816b3c45>] schedule_timeout+0x1b5/0x210
       [<ffffffff81618c37>] ? ip_route_output_flow+0x27/0x70
       [<ffffffffa027f9c9>] ? addr_resolve+0x149/0x1b0 [ib_addr]
       [<ffffffff816b228f>] wait_for_completion+0x10f/0x170
       [<ffffffff810b6140>] ? try_to_wake_up+0x210/0x210
       [<ffffffffa027f220>] ? rdma_copy_addr+0xa0/0xa0 [ib_addr]
       [<ffffffffa0280120>] rdma_addr_find_l2_eth_by_grh+0x1d0/0x278 [ib_addr]
       [<ffffffff81321297>] ? sub_alloc+0x77/0x1c0
       [<ffffffffa02943b7>] ib_init_ah_from_wc+0x3a7/0x5a0 [ib_core]
       [<ffffffffa0457aba>] cm_req_handler+0xea/0x580 [ib_cm]
       [<ffffffff81015982>] ? __switch_to+0x212/0x5e0
       [<ffffffffa04582fd>] cm_work_handler+0x6d/0x150 [ib_cm]
       [<ffffffff810a14c1>] process_one_work+0x151/0x4b0
       [<ffffffff810a1940>] worker_thread+0x120/0x480
       [<ffffffff816b074b>] ? __schedule+0x30b/0x890
       [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
       [<ffffffff810a1820>] ? process_one_work+0x4b0/0x4b0
       [<ffffffff810a6b1e>] kthread+0xce/0xf0
       [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
       [<ffffffff816b53a2>] ret_from_fork+0x42/0x70
       [<ffffffff810a6a50>] ? kthread_freezable_should_stop+0x70/0x70
      INFO: task kworker/u144:1:156520 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
      message.
      kworker/u144:1  D ffff883ffe1d7600     0 156520      2 0x00000080
      Workqueue: ib_addr process_req [ib_addr]
       ffff883f446fbbd8 0000000000000046 ffff881f95280000 ffff881ff24de200
       ffff883f66120000 ffff883f446f8008 ffff881f95280000 ffff883f6f9208c4
       ffff883f6f9208c8 00000000ffffffff ffff883f446fbbf8 ffffffff816b0dde
      
      [1] http://lkml.iu.edu/hypermail/linux/kernel/1608.1/05834.html

      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  6. 17 Jul, 2017 2 commits
  7. 16 Jun, 2017 1 commit
    • networking: make skb_put & friends return void pointers · 4df864c1
      Johannes Berg committed
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions (skb_put, __skb_put and pskb_put) return void *
      and remove all the casts across the tree, adding a (u8 *) cast only
      where the unsigned char pointer was used directly, all done with the
      following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_put, __skb_put };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_put, __skb_put };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
      which actually doesn't cover pskb_put since there are only three
      users overall.
      
      A handful of stragglers were converted manually, notably a macro in
      drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
      instances in net/bluetooth/hci_sock.c. In the former file, I also
      had to fix one whitespace problem spatch introduced.

      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 08 Jun, 2017 1 commit
    • IB/addr: Fix setting source address in addr6_resolve() · 79e25959
      Roland Dreier committed
      Commit eea40b8f ("infiniband: call ipv6 route lookup via the stub
      interface") introduced a regression in address resolution when connecting
      to IPv6 destination addresses.  The old code called ip6_route_output(),
      while the new code calls ipv6_stub->ipv6_dst_lookup().  The two are almost
      the same, except that ipv6_dst_lookup() also calls ip6_route_get_saddr()
      if the source address is in6addr_any.
      
      This means that the test of ipv6_addr_any(&fl6.saddr) now never succeeds,
      and so we never copy the source address out.  This ends up causing
      rdma_resolve_addr() to fail, because without a resolved source address,
      cma_acquire_dev() will fail to find an RDMA device to use.  For me, this
      causes connecting to an NVMe over Fabrics target via RoCE / IPv6 to fail.
      
      Fix this by copying out fl6.saddr if ipv6_addr_any() is true for the original
      source address passed into addr6_resolve().  We can drop our call to
      ipv6_dev_get_saddr() because ipv6_dst_lookup() already does that work.
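
      The regression and the fix can be modelled in a few lines. This is a
      sketch under assumptions, not the kernel code: dst_lookup() stands in
      for a lookup that fills in the source address, and resolve_before_fix()
      and resolve_after_fix() are hypothetical names for the two versions of
      the check.

```python
import ipaddress

ANY = ipaddress.IPv6Address("::")  # models in6addr_any


def dst_lookup(saddr, route_saddr):
    """Model of a lookup that fills in the source address when it is unspecified."""
    return route_saddr if saddr.is_unspecified else saddr


def resolve_before_fix(src, route_saddr):
    fl_saddr = dst_lookup(src, route_saddr)
    # Tests the flow's source address AFTER the lookup has already filled it
    # in, so this branch is never taken and src is never updated.
    if fl_saddr.is_unspecified:
        src = route_saddr
    return src


def resolve_after_fix(src, route_saddr):
    fl_saddr = dst_lookup(src, route_saddr)
    # Tests the ORIGINAL source address passed in by the caller instead.
    if src.is_unspecified:
        src = fl_saddr
    return src
```

      Called with an unspecified source, the first variant still returns
      "::", while the second returns the address chosen by the lookup.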
      
      Fixes: eea40b8f ("infiniband: call ipv6 route lookup via the stub interface")
      Cc: <stable@vger.kernel.org> # 3.12+
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  9. 02 May, 2017 1 commit
  10. 29 Apr, 2017 1 commit
    • infiniband: call ipv6 route lookup via the stub interface · eea40b8f
      Paolo Abeni committed
      The infiniband address handle can be triggered to resolve an ipv6
      address in response to MAD packets, regardless of the ipv6
      module being disabled via the kernel command line argument.
      
      That will cause a call into the ipv6 routing code, which is not
      initialized, and a consequent oops.

      This commit addresses the above issue by replacing the direct lookup
      call with an indirect one via the ipv6 stub, which is properly
      initialized according to the ipv6 status (e.g. if ipv6 is
      disabled, the routing lookup fails gracefully).
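
      The stub pattern itself can be sketched as a table of callables whose
      default entry fails cleanly. This is a generic illustration only:
      Ipv6Stub, enable_ipv6(), resolve() and the choice of EAFNOSUPPORT as
      the error are assumptions made for the example, not the kernel's
      definitions.

```python
import errno


def _lookup_unavailable(dst):
    """Default entry: fail gracefully instead of touching uninitialized state."""
    return -errno.EAFNOSUPPORT, None


class Ipv6Stub:
    """Hypothetical stub table; callers always go through dst_lookup."""

    def __init__(self):
        self.dst_lookup = _lookup_unavailable


stub = Ipv6Stub()


def enable_ipv6(real_lookup):
    """Called once the (modelled) ipv6 code is initialized."""
    stub.dst_lookup = real_lookup


def resolve(dst):
    # Indirect call: safe whether or not the real implementation is present.
    return stub.dst_lookup(dst)
```

      Before enable_ipv6() is called, resolve() returns an error code rather
      than crashing; afterwards the same call site reaches the real lookup.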
      
      Cc: stable@vger.kernel.org # 3.12+
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  11. 14 Apr, 2017 1 commit
  12. 17 Nov, 2016 1 commit
  13. 08 Oct, 2016 1 commit
  14. 25 May, 2016 2 commits
  15. 20 Jan, 2016 3 commits
  16. 23 Dec, 2015 2 commits
  17. 29 Oct, 2015 1 commit
  18. 22 Oct, 2015 1 commit
    • IB/core: Use GID table in AH creation and dmac resolution · dbf727de
      Matan Barak committed
      Previously, vlan id and source MAC were used from QP attributes. Since
      the net device is now stored in the GID attributes, they could be used
      instead of getting this information from the QP attributes.
      
      IB_QP_SMAC, IB_QP_ALT_SMAC, IB_QP_VID and IB_QP_ALT_VID were removed
      because there is no known libibverbs that uses them.
      
      This commit also modifies the vendor drivers (mlx4, ocrdma) in order
      to use the new approach.

      ocrdma driver changes were done by Somnath Kotur <Somnath.Kotur@Avagotech.Com>

      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  19. 02 Jun, 2015 1 commit
  20. 06 May, 2015 1 commit
  21. 16 Dec, 2014 1 commit
    • IB/addr: Improve address resolution callback scheduling · 346f98b4
      Or Kehati committed
      Address resolution always does a context switch to a workqueue to
      deliver the address resolution event.  When the IP address is already
      cached in the system ARP table, we go through the following
      chain:
      
          rdma_resolve_ip --> addr_resolve (cache hit) -->
      
      which ends up with:
      
          queue_req --> set_timeout (now) --> mod_delayed_work(,, delay=1)
      
      We actually do realize that the timeout should be zero, but the code
      forces it to a minimum of one jiffy.

      Using one jiffy as the minimum delay value results in sub-optimal
      scheduling of this work item by the workqueue, which on the
      below testbed costs about 3-4ms out of 12ms total time.

      To fix that, we allow the minimum delay to be zero.  Note that the
      connect step times change too, as there are address resolution calls
      from that flow.
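
      The change in clamping can be sketched as two small functions. This is
      a model of the behaviour described above, with times as plain integers
      in jiffies; delay_before() and delay_after() are hypothetical names,
      not the kernel's.

```python
def delay_before(deadline, now):
    """Old behaviour: a due or overdue request still waits at least one jiffy."""
    delay = deadline - now
    return 1 if delay <= 0 else delay


def delay_after(deadline, now):
    """New behaviour: a due or overdue request is scheduled with zero delay."""
    return max(deadline - now, 0)
```

      For a cache hit, where the deadline equals the current time, the first
      function yields a delay of 1 and the second a delay of 0; future
      deadlines are unaffected.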
      
      The results were taken from running both client and server on the
      same node, over mlx4 RoCE port.
      
      before -->
      step              total ms     max ms     min us  us / conn
      create id    :        0.01       0.01       6.00       6.00
      resolve addr :        4.02       4.01    4013.00    4016.00
      resolve route:        0.18       0.18     182.00     183.00
      create qp    :        1.15       1.15    1150.00    1150.00
      connect      :        6.73       6.73    6730.00    6731.00
      disconnect   :        0.55       0.55     549.00     550.00
      destroy      :        0.01       0.01       9.00       9.00
      
      after -->
      step              total ms     max ms     min us  us / conn
      create id    :        0.01       0.01       6.00       6.00
      resolve addr :        0.05       0.05      49.00      52.00
      resolve route:        0.21       0.21     207.00     208.00
      create qp    :        1.10       1.10    1104.00    1104.00
      connect      :        1.22       1.22    1220.00    1221.00
      disconnect   :        0.71       0.71     713.00     713.00
      destroy      :        0.01       0.01       9.00       9.00

      Signed-off-by: Or Kehati <ork@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Acked-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  22. 15 Jan, 2014 1 commit
    • IB/core: Ethernet L2 attributes in verbs/cm structures · dd5f03be
      Matan Barak committed
      This patch adds support for Ethernet L2 attributes in the
      verbs/cm/cma structures.
      
      When dealing with L2 Ethernet, we should use smac, dmac, vlan ID and priority
      in a similar manner that the IB L2 (and the L4 PKEY) attributes are used.
      
      Thus, those attributes were added to the following structures:
      
      * ib_ah_attr - added dmac
      * ib_qp_attr - added smac and vlan_id, (sl remains vlan priority)
      * ib_wc - added smac, vlan_id
      * ib_sa_path_rec - added smac, dmac, vlan_id
      * cm_av - added smac and vlan_id
      
      For the path record structure, extra care was taken to avoid the new
      fields when packing it into wire format, so we don't break the IB CM
      and SA wire protocol.
      
      On the active side, the CM fills its internal structures from the
      path provided by the ULP.  There, we add taking the ETH L2 attributes
      and placing them into the CM Address Handle (struct cm_av).

      On the passive side, the CM fills its internal structures from the WC
      associated with the REQ message.  There, we add taking the ETH L2
      attributes from the WC.
      
      When the HW driver provides the required ETH L2 attributes in the WC,
      it sets the IB_WC_WITH_SMAC and IB_WC_WITH_VLAN flags. The IB core
      code checks for the presence of these flags, and in their absence does
      address resolution from the ib_init_ah_from_wc() helper function.
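
      The flag check can be sketched as a simple bitmask test. The flag
      names come from the text above, but the numeric values and the helper
      needs_address_resolution() are placeholders chosen for illustration;
      they are not the kernel's definitions.

```python
# Placeholder bit values, for illustration only.
IB_WC_WITH_SMAC = 1 << 0
IB_WC_WITH_VLAN = 1 << 1


def needs_address_resolution(wc_flags):
    """True when the WC lacks either L2 attribute and the core must resolve it."""
    required = IB_WC_WITH_SMAC | IB_WC_WITH_VLAN
    return (wc_flags & required) != required
```

      Only a WC carrying both flags skips resolution in this model; a WC
      with one or neither falls back to the slower path.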
      
      ib_modify_qp_is_ok is also updated to consider the link layer. Some
      parameters are mandatory for Ethernet link layer, while they are
      irrelevant for IB.  Vendor drivers are modified to support the new
      function signature.

      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
  23. 21 Jun, 2013 1 commit
  24. 14 Aug, 2012 1 commit
    • workqueue: use mod_delayed_work() instead of cancel + queue · 41f63c53
      Tejun Heo committed
      Convert delayed_work users doing cancel_delayed_work() followed by
      queue_delayed_work() to mod_delayed_work().
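
      The equivalence behind the conversion can be sketched with a toy
      model of a delayed work item. It is single-threaded and ignores the
      locking that makes the real call atomic; the DelayedWork class and the
      `now` parameter are assumptions of this sketch, and the functions only
      borrow the kernel names to show the intended end state.

```python
class DelayedWork:
    """Toy delayed work item: `expires` is None while the item is idle."""

    def __init__(self):
        self.expires = None


def cancel_delayed_work(work):
    was_pending = work.expires is not None
    work.expires = None
    return was_pending


def queue_delayed_work(work, now, delay):
    if work.expires is not None:
        return False              # already pending: the timer is left alone
    work.expires = now + delay
    return True


def mod_delayed_work(work, now, delay):
    was_pending = work.expires is not None
    work.expires = now + delay    # (re)arm regardless of the previous state
    return was_pending
```

      In this model, cancel followed by queue and a single mod call leave
      the item with the same expiry, which is why the two-step sequence can
      be collapsed into one call.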
      
      Most conversions are straight-forward.  Ones worth mentioning are,
      
      * drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
        use mod_delayed_work() and cancel loop in
        edac_mc_reset_delay_period() is dropped.
      
      * drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
        watchdog is active or not.  @fan_watchdog_active and related code
        dropped.
      
      * drivers/power/charger-manager.c: Seemingly a lot of
        delayed_work_pending() abuse going on here.
        [delayed_]work_pending() are unsynchronized and racy when used like
        this.  I converted one instance in fullbatt_handler().  Please
        convert the rest so that it invokes workqueue APIs for the intended
        target state rather than trying to game work item pending state
        transitions.  e.g. if timer should be modified - call
        mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().
      
      * drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
        simplified.  Note that round_jiffies() calls in this function are
        meaningless.  round_jiffies() works on absolute jiffies, not the delta
        delay used by delayed_work.
      
      v2: Tomi pointed out that __cancel_delayed_work() users can't be
          safely converted to mod_delayed_work().  They could be calling it
          from irq context and if that happens while delayed_work_timer_fn()
          is running, it could deadlock.  __cancel_delayed_work() users are
          dropped.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Doug Thompson <dougthompson@xmission.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
  25. 09 Jul, 2012 1 commit
  26. 26 Jan, 2012 1 commit
  27. 06 Dec, 2011 2 commits
  28. 30 Nov, 2011 1 commit
  29. 23 Nov, 2011 1 commit
  30. 01 Nov, 2011 1 commit
  31. 18 Jul, 2011 1 commit