1. 31 Aug, 2015 2 commits
    • M
      IB/core: Add RoCE table bonding support · 238fdf48
      Matan Barak committed
      Handling bonding and other devices requires us to add all GIDs of the
      net-devices which are upper-devices of the RoCE port related
      net-device.
      
      Active-backup configurations impose even more challenges, as the
      default GID should only be set on the active devices (this is
      necessary as otherwise the same MAC could be used for several
      slaves and thus several slaves would have identical GIDs).
      
      Managing these configurations is done by listening to:
      (a) NETDEV_CHANGEUPPER:
      	(1) if a related net-device is linked, delete all inactive
      	    slaves' default GIDs and add the upper device GIDs.
      	(2) if a related net-device is unlinked, delete all upper GIDs
      	    and add the default GIDs.
      (b) NETDEV_BONDING_FAILOVER:
      	(1) delete the bond GIDs from inactive slaves
      	(2) delete the inactive slave's default GIDs
      	(3) Add the bond GIDs to the active slave.
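
The event handling above can be sketched as a small user-space model. This is only an illustration, not the kernel code: `struct slave_port`, `enum bond_event`, and `handle_bond_event` are hypothetical names, and the model assumes an active-backup bond in which the upper (bond) GIDs and the default GID end up only on the active slave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one RoCE port whose net-device is a bond slave. */
struct slave_port {
    bool active;           /* is this the bond's active slave? */
    bool has_default_gid;  /* default GID present in the port's table */
    bool has_upper_gids;   /* bond (upper device) GIDs present */
};

/* Stand-ins for NETDEV_CHANGEUPPER (linked/unlinked) and
 * NETDEV_BONDING_FAILOVER; these are not the kernel's constants. */
enum bond_event { BOND_LINKED, BOND_UNLINKED, BOND_FAILOVER };

static void handle_bond_event(struct slave_port *slaves, size_t n,
                              enum bond_event ev)
{
    assert(slaves != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct slave_port *s = &slaves[i];

        switch (ev) {
        case BOND_LINKED:
        case BOND_FAILOVER:
            /* Inactive slaves lose their default and bond GIDs;
             * the active slave carries both (assumption of this model). */
            s->has_default_gid = s->active;
            s->has_upper_gids = s->active;
            break;
        case BOND_UNLINKED:
            /* Upper GIDs go away, default GIDs come back. */
            s->has_upper_gids = false;
            s->has_default_gid = true;
            break;
        }
    }
}
```

In this model a failover is handled by flipping the `active` flags first and then calling `handle_bond_event(slaves, n, BOND_FAILOVER)`.
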
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • M
      IB/core: Add RoCE GID table management · 03db3a2d
      Matan Barak committed
      RoCE GIDs are based on IP addresses configured on Ethernet net-devices
      which relate to the RDMA (RoCE) device port.
      
      Currently, each of the low-level drivers that support RoCE (ocrdma,
      mlx4) manages its own RoCE port GID table. As there's nothing which is
      essentially vendor specific, we generalize that, and enhance the RDMA
      core GID cache to do this job.
      
      In order to populate the GID table, we listen for events:
      
      (a) netdev up/down/change_addr events - if a netdev is built onto
          our RoCE device, we need to add/delete its IPs. This involves
          adding all GIDs related to this ndev, adding default GIDs, etc.
      
      (b) inet events - add new GIDs (according to the IP addresses)
          to the table.
      
      For programming the port RoCE GID table, providers must implement
      the add_gid and del_gid callbacks.
      
      RoCE GID management requires us to state the associated net_device
      alongside the GID. This information is necessary in order to manage
      the GID table. For example, when a net_device is removed, its
      associated GIDs need to be removed as well.
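
To illustrate why each GID carries its net_device, here is a minimal user-space sketch. It is not the RDMA core's data structure: `struct gid_entry`, `table_add`, and `table_del_netdev` are hypothetical names, and an interface index stands in for the net_device pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GID_LEN    16
#define TABLE_SIZE 8

/* Hypothetical table entry: a GID plus the net-device it belongs to. */
struct gid_entry {
    unsigned char gid[GID_LEN];
    int ndev_ifindex;  /* 0 marks a free slot in this model */
};

/* Store a GID for the given net-device; returns the slot or -1 if full. */
static int table_add(struct gid_entry *t, const unsigned char *gid, int ifindex)
{
    assert(t != NULL && gid != NULL && ifindex > 0);

    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (t[i].ndev_ifindex == 0) {
            memcpy(t[i].gid, gid, GID_LEN);
            t[i].ndev_ifindex = ifindex;
            return (int)i;
        }
    }
    return -1;
}

/* Remove every GID associated with a departing net-device;
 * returns how many entries were freed. */
static size_t table_del_netdev(struct gid_entry *t, int ifindex)
{
    size_t removed = 0;

    assert(t != NULL && ifindex > 0);

    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (t[i].ndev_ifindex == ifindex) {
            memset(&t[i], 0, sizeof(t[i]));
            removed++;
        }
    }
    return removed;
}
```

Without the per-entry device association, `table_del_netdev` would have no way to tell which GIDs belong to the removed device.
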
      
      RoCE mandates generating a default GID for each port, based on the
      related net-device's IPv6 link local. In contrast to the GID based on
      the regular IPv6 link-local (as we generate GID per IP address),
      the default GID is also available when the net device is down (in
      order to support loopback).
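
As an illustration of a default GID, the sketch below builds an IPv6 link-local address from a MAC address. It assumes the link-local address is formed with the modified EUI-64 scheme (fe80::/64 prefix, ff:fe inserted in the middle of the MAC, universal/local bit flipped); the commit message itself only says the default GID is based on the IPv6 link-local. The function name `default_gid_from_mac` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build a 16-byte GID from a 6-byte MAC, assuming modified EUI-64. */
static void default_gid_from_mac(const uint8_t mac[6], uint8_t gid[16])
{
    assert(mac != NULL && gid != NULL);

    memset(gid, 0, 16);
    gid[0] = 0xfe;            /* fe80::/64 link-local prefix */
    gid[1] = 0x80;

    gid[8]  = mac[0] ^ 0x02;  /* flip the universal/local bit */
    gid[9]  = mac[1];
    gid[10] = mac[2];
    gid[11] = 0xff;           /* ff:fe filler between the MAC halves */
    gid[12] = 0xfe;
    gid[13] = mac[3];
    gid[14] = mac[4];
    gid[15] = mac[5];
}
```

Because this value depends only on the MAC address, it can be computed while the net device is down, which is what makes it usable for loopback.
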
      
      Locking is done as follows:
      The patch modifies the GID table code both for new RoCE drivers
      implementing the add_gid/del_gid callbacks and for current RoCE and
      IB drivers that do not. The flows for updating the table are
      different, so the locking requirements are too.
      
      While updating the RoCE GID table, protection against multiple writers
      is achieved via mutex_lock(&table->lock). Since writing to a table
      requires us to find an entry (possibly a free entry) in the table and
      then modify it, this mutex protects both the find_gid and write_gid,
      ensuring the atomicity of the action.
      Each entry in the GID cache is protected by an rwlock. In RoCE, writing
      (which usually results from a netdev notifier) involves invoking the
      vendor's add_gid and del_gid callbacks, which could sleep.
      Therefore, an invalid flag is added for each entry. Updates for RoCE are
      done via a workqueue, thus sleeping is permitted.
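
The writer-side locking can be sketched in user space with a pthread mutex standing in for the kernel mutex. This is a simplified model under stated assumptions, not the patch's code: `struct gid_table`, `find_slot`, and `table_write` are hypothetical names, the per-entry rwlock is left out, and the point where a provider's add_gid callback would run is only marked by a comment.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define GID_LEN 16
#define SLOTS   4

struct slot {
    unsigned char gid[GID_LEN];
    bool used;
    bool invalid;  /* set while the entry is being rewritten */
};

struct gid_table {
    pthread_mutex_t lock;  /* serializes writers (find + write) */
    struct slot slots[SLOTS];
};

/* Return the slot already holding gid, else the first free slot, else -1. */
static int find_slot(const struct gid_table *t, const unsigned char *gid)
{
    int free_idx = -1;

    for (int i = 0; i < SLOTS; i++) {
        if (t->slots[i].used) {
            if (memcmp(t->slots[i].gid, gid, GID_LEN) == 0)
                return i;
        } else if (free_idx < 0) {
            free_idx = i;
        }
    }
    return free_idx;
}

/* Find and write under one mutex, so that two writers cannot both
 * pick the same free slot. */
static int table_write(struct gid_table *t, const unsigned char *gid)
{
    int idx;

    assert(t != NULL && gid != NULL);

    pthread_mutex_lock(&t->lock);
    idx = find_slot(t, gid);
    if (idx >= 0 && !t->slots[idx].used) {
        t->slots[idx].invalid = true;   /* readers should skip this entry */
        /* A provider callback that may sleep would be invoked here. */
        memcpy(t->slots[idx].gid, gid, GID_LEN);
        t->slots[idx].used = true;
        t->slots[idx].invalid = false;  /* entry is consistent again */
    }
    pthread_mutex_unlock(&t->lock);
    return idx;
}
```

Holding the mutex across both steps is what makes the find-then-write sequence atomic with respect to other writers; a mutex (rather than a spinlock) is usable here because the updater is allowed to sleep.
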
      
      In IB, updates are done in write_lock_irq(&device->cache.lock), thus
      write_gid isn't allowed to sleep and add_gid/del_gid are not called.
      
      When passing a net-device into/out-of the GID cache, the device
      is always passed with a reference held (dev_hold).
      
      The code uses a single work item for updating all RDMA devices,
      following a netdev or inet notifier.
      
      The patch moves the cache from being a client (which was incorrect,
      as the cache is part of the IB infrastructure) to being explicitly
      initialized/freed when a device is registered/removed.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>