1. 09 Dec, 2016 3 commits
  2. 07 Dec, 2016 6 commits
  3. 04 Dec, 2016 4 commits
    • ipv4: fib: Replay events when registering FIB notifier · c3852ef7
      Committed by Ido Schimmel
      Commit b90eb754 ("fib: introduce FIB notification infrastructure")
      introduced a new notification chain to notify listeners (e.g., switchdev
      drivers) about the addition and deletion of routes.
      
      However, upon registration to the chain the FIB tables can already be
      populated, which means potential listeners will have an incomplete view
      of the tables.
      
      Solve that by dumping the FIB tables and replaying the events to the
      passed notification block. The dump itself is done using RCU in order
      not to starve consumers that need RTNL to make progress.
      
      The integrity of the dump is ensured by reading the FIB change sequence
      counter before and after the dump under RTNL. This allows us to avoid
      the problematic situation in which the dumping process sends an ENTRY_ADD
      notification following an ENTRY_DEL generated by another process holding
      RTNL.
      
      Callers of the registration function may pass a callback that is
      executed in case the dump was inconsistent with current FIB tables.
      
      The number of retries until a consistent dump is achieved is set to a
      fixed number to prevent callers from looping for long periods of time.
      If the current limit proves to be problematic in the future, it can
      easily be made configurable using a sysctl.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
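The replay-and-verify scheme described in this commit can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code: every `toy_` name, the retry limit of 5, and the `racing_writes` test knob are hypothetical, and plain function calls stand in for RCU and RTNL. It shows the idea only: snapshot the change counter, replay existing routes as additions, and retry (after letting the listener flush) if the counter moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ROUTES  16
#define TOY_MAX_RETRIES 5       /* hypothetical fixed retry limit */

enum toy_event { TOY_ENTRY_ADD, TOY_ENTRY_DEL };

/* Toy FIB: a flat array of prefixes plus a change sequence counter. */
struct toy_fib {
	unsigned int routes[TOY_MAX_ROUTES];
	size_t n_routes;
	unsigned int seq;            /* bumped on every successful change */
	unsigned int racing_writes;  /* test knob: writers racing with dumps */
};

/* Listener state: how many routes the listener currently believes exist. */
struct toy_view {
	size_t n_seen;
};

typedef void (*toy_notify_fn)(enum toy_event ev, unsigned int prefix, void *priv);
typedef void (*toy_flush_fn)(void *priv);

static void toy_fib_add(struct toy_fib *fib, unsigned int prefix)
{
	if (fib->n_routes == TOY_MAX_ROUTES)
		return;
	fib->routes[fib->n_routes++] = prefix;
	fib->seq++;
}

/* Example listener: count replayed additions. */
static void toy_count_route(enum toy_event ev, unsigned int prefix, void *priv)
{
	struct toy_view *view = priv;

	(void)prefix;
	if (ev == TOY_ENTRY_ADD)
		view->n_seen++;
}

/* Example flush callback: forget everything learned from a stale dump. */
static void toy_reset_view(void *priv)
{
	struct toy_view *view = priv;

	view->n_seen = 0;
}

/* Replay the table; return true if it did not change during the walk. */
static bool toy_fib_dump(struct toy_fib *fib, toy_notify_fn nb, void *priv)
{
	unsigned int seq_before = fib->seq;
	size_t i;

	for (i = 0; i < fib->n_routes; i++)
		nb(TOY_ENTRY_ADD, fib->routes[i], priv);

	/* Simulate another process changing the table behind our back. */
	if (fib->racing_writes) {
		fib->racing_writes--;
		toy_fib_add(fib, 0xdead);
	}

	return fib->seq == seq_before;
}

/* Returns 0 once a consistent replay was delivered, -1 after too many tries. */
static int toy_register_notifier(struct toy_fib *fib, toy_notify_fn nb,
				 toy_flush_fn flush, void *priv)
{
	int tries;

	for (tries = 0; tries < TOY_MAX_RETRIES; tries++) {
		if (toy_fib_dump(fib, nb, priv))
			return 0;
		if (flush)
			flush(priv);
	}
	return -1;
}
```

In this model a single racing write costs one extra pass, while a writer that races with every pass makes registration fail after the fixed number of tries instead of looping indefinitely.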
    • mlxsw: spectrum_router: Implement FIB offload in deferred work · 3057224e
      Committed by Ido Schimmel
      FIB offload is currently done in process context with RTNL held, but
      we're about to dump the FIB tables in an RCU critical section, where we
      can no longer sleep.
      
      Instead, defer the operation to process context using deferred work. Make
      sure fib info isn't freed while the work is queued by taking a reference
      on it and releasing it after the operation is done.
      
      Deferring the operation is valid because the upper layers always assume
      the operation was successful. If it's not, then the driver-specific
      abort mechanism is called and all routed traffic is directed to the slow
      path.
      
      The work items are submitted to an ordered workqueue to prevent a
      mismatch between the kernel's FIB table and the device's.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
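The deferral pattern in this commit (record the event where sleeping is forbidden, hold a reference on the route's info, apply later in submission order, then release) can be sketched in userspace C. All names here are hypothetical; a fixed-size FIFO stands in for an ordered workqueue and an integer for the reference count, so this models the bookkeeping only, not the kernel's workqueue API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_LEN 16

/* Stand-in for the route's shared info object, with a reference count. */
struct toy_fib_info {
	int refcnt;
};

/* One deferred FIB event. */
struct toy_fib_work {
	struct toy_fib_info *fi;
	unsigned int prefix;
};

/* Strict FIFO: items are consumed in exactly the order they were queued. */
struct toy_queue {
	struct toy_fib_work items[TOY_QUEUE_LEN];
	size_t head;
	size_t tail;
};

/* Atomic-context side: no device access here, only record the event. */
static int toy_fib_event_defer(struct toy_queue *q, unsigned int prefix,
			       struct toy_fib_info *fi)
{
	if (q->tail == TOY_QUEUE_LEN)
		return -1;

	fi->refcnt++;                 /* keep fi alive while work is queued */
	q->items[q->tail].fi = fi;
	q->items[q->tail].prefix = prefix;
	q->tail++;
	return 0;
}

/*
 * Process-context side: apply queued events in order and drop the
 * reference taken at submission time. 'applied' records the order in
 * which prefixes reached the (imaginary) device.
 */
static size_t toy_fib_work_run(struct toy_queue *q, unsigned int *applied,
			       size_t max)
{
	size_t n = 0;

	while (q->head < q->tail && n < max) {
		struct toy_fib_work *w = &q->items[q->head++];

		applied[n++] = w->prefix;   /* stand-in for programming hardware */
		w->fi->refcnt--;            /* operation done, release reference */
	}
	return n;
}
```

Because the queue is a strict FIFO, an add followed by a delete of the same prefix can never be applied in the opposite order, which is the mismatch the ordered workqueue is meant to prevent.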
    • mlxsw: core: Create an ordered workqueue for FIB offload · a3832b31
      Committed by Ido Schimmel
      We're going to start processing FIB entry addition / deletion events
      in deferred work. These work items must be processed in the order they
      were submitted; otherwise we can have differences between the kernel's
      FIB table and the device's.
      
      Solve this by creating an ordered workqueue to which these work items
      will be submitted. Note that we can't simply convert the current
      workqueue to be ordered, as EMAD retransmissions are also processed in
      deferred work.
      
      Later on, we can migrate other work items to this workqueue, such as FDB
      notification processing and nexthop resolution, since they all take the
      same lock anyway.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx4: use reset to set mac header · 69029109
      Committed by Zhang Shengju
      Since the offset is zero, it is not necessary to use the set function.
      The reset function is more straightforward and avoids the unnecessary
      add operation in the set function.
      Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
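The commit message does not name the two helpers. Assuming they behave like the kernel's `skb_set_mac_header()` and `skb_reset_mac_header()` in their offset-based form, the equivalence at offset zero can be shown with a toy model. The `toy_skb` structure and both functions below are illustrative stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a socket buffer: only the fields needed here. */
struct toy_skb {
	unsigned char *head;        /* start of the allocated buffer */
	unsigned char *data;        /* start of the current packet data */
	unsigned short mac_header;  /* MAC header position, as offset from head */
};

/* Point the MAC header at the current data pointer. */
static void toy_reset_mac_header(struct toy_skb *skb)
{
	skb->mac_header = (unsigned short)(skb->data - skb->head);
}

/* Point the MAC header 'offset' bytes past the current data pointer. */
static void toy_set_mac_header(struct toy_skb *skb, int offset)
{
	toy_reset_mac_header(skb);
	skb->mac_header += offset;  /* a no-op addition when offset is 0 */
}
```

With an offset of zero the set variant does everything the reset variant does plus an addition of zero, so calling reset directly yields the same result with one operation fewer.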
  4. 03 Dec, 2016 6 commits
  5. 02 Dec, 2016 7 commits
  6. 01 Dec, 2016 2 commits
  7. 30 Nov, 2016 6 commits
    • mlxsw: core: Change order of operations in removal path · 523779c7
      Committed by Ido Schimmel
      We call bus->init() before allocating 'lag.mapping'. Change the order of
      operations in the removal path to reflect that.
      
      This makes the error path of mlxsw_core_bus_device_register() symmetric
      with mlxsw_core_bus_device_unregister().
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: core: Add missing rollback in error path · 81d4d728
      Committed by Ido Schimmel
      Without this rollback, the thermal zone is still registered during the
      error path, whereas its private data is freed upon the destruction of
      the underlying bus device due to the use of devm_kzalloc(). This results
      in a use-after-free.
      
      Fix this by calling mlxsw_thermal_fini() from the appropriate place in
      the error path.
      
      Fixes: a50c1e35 ("mlxsw: core: Implement thermal zone")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
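Both this fix and the removal-path reordering in the previous entry come down to one rule: the error path and the unregister path should undo the initialization steps in exact reverse order, skipping none. The following is a minimal sketch of that goto-unwind pattern. The step names, the bitmask, and the `fail_at` injection knob are hypothetical and do not mirror the actual sequence in `mlxsw_core_bus_device_register()`.

```c
#include <assert.h>

/* Hypothetical init steps, tracked as bits so leaks are easy to detect. */
enum {
	TOY_BUS     = 1 << 0,
	TOY_LAG     = 1 << 1,
	TOY_THERMAL = 1 << 2,
	TOY_DRIVER  = 1 << 3,
};

struct toy_core {
	unsigned int up;       /* bitmask of steps currently initialized */
	unsigned int fail_at;  /* test knob: step that should fail, or 0 */
};

static int toy_step_init(struct toy_core *c, unsigned int bit)
{
	if (c->fail_at == bit)
		return -1;
	c->up |= bit;
	return 0;
}

static void toy_step_fini(struct toy_core *c, unsigned int bit)
{
	c->up &= ~bit;
}

static int toy_core_register(struct toy_core *c)
{
	int err;

	err = toy_step_init(c, TOY_BUS);
	if (err)
		return err;
	err = toy_step_init(c, TOY_LAG);
	if (err)
		goto err_lag;
	err = toy_step_init(c, TOY_THERMAL);
	if (err)
		goto err_thermal;
	err = toy_step_init(c, TOY_DRIVER);
	if (err)
		goto err_driver;
	return 0;

	/* Each label undoes the steps that succeeded before the failure. */
err_driver:
	toy_step_fini(c, TOY_THERMAL);  /* the rollback that is easy to forget */
err_thermal:
	toy_step_fini(c, TOY_LAG);
err_lag:
	toy_step_fini(c, TOY_BUS);
	return err;
}

/* Mirror image of the success path: last initialized, first torn down. */
static void toy_core_unregister(struct toy_core *c)
{
	toy_step_fini(c, TOY_DRIVER);
	toy_step_fini(c, TOY_THERMAL);
	toy_step_fini(c, TOY_LAG);
	toy_step_fini(c, TOY_BUS);
}
```

If the `err_driver` label skipped the thermal teardown, the `up` mask would still have the thermal bit set after a failed registration, which is the toy equivalent of a thermal zone that stays registered while its private data is freed.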
    • mlxsw: spectrum_buffers: Limit size of pools · 87259f18
      Committed by Ido Schimmel
      The shared buffer pools are containers whose size is used to calculate
      the maximum usage for packets from / to a specific port / {port, PG/TC},
      when a dynamic threshold is employed.
      
      While it's perfectly fine for the sum of the pools to exceed the maximum
      size of the shared buffer, a single pool cannot.
      
      Add a check when the pool size is set and forbid sizes larger than the
      maximum size of the shared buffer.
      
      Without the patch:
      $ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype dynamic
      // No error is returned
      
      With the patch:
      $ devlink sb pool set pci/0000:03:00.0 pool 0 size 999999999 thtype dynamic
      devlink answers: Invalid argument
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
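The added check amounts to a bounds test performed before the new size is stored. Below is a minimal sketch, assuming the maximum shared buffer size is already known. The function name, its parameters, and the 16 MiB limit used in the usage note are illustrative and are not taken from the driver.

```c
#include <assert.h>
#include <errno.h>

/*
 * Set a pool's size, refusing anything larger than the whole shared
 * buffer. The pool is left untouched when the request is rejected.
 */
static int toy_sb_pool_size_set(unsigned int *pool_size, unsigned int size,
				unsigned int max_buff_size)
{
	if (size > max_buff_size)
		return -EINVAL;   /* reported to the user as "Invalid argument" */

	*pool_size = size;
	return 0;
}
```

With a hypothetical 16 MiB shared buffer, a request for 999999999 bytes is rejected, while two pools of 12 MiB each are both accepted even though their sum exceeds the buffer, matching the rule that only a single pool is bounded.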
    • mlxsw: resources: Add maximum buffer size · f414b48e
      Committed by Ido Schimmel
      We need to be able to limit the size of shared buffer pools, so query
      the maximum size from the device during init.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: switchib: add MLXSW_PCI dependency · 67ea7ef1
      Committed by Arnd Bergmann
      The newly added switchib driver fails to link if MLXSW_PCI=m:
      
      drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function `mlxsw_sib_module_exit':
      switchib.c:(.exit.text+0x8): undefined reference to `mlxsw_pci_driver_unregister'
      switchib.c:(.exit.text+0x10): undefined reference to `mlxsw_pci_driver_unregister'
      drivers/net/ethernet/mellanox/mlxsw/mlxsw_switchib.o: In function `mlxsw_sib_module_init':
      switchib.c:(.init.text+0x28): undefined reference to `mlxsw_pci_driver_register'
      switchib.c:(.init.text+0x38): undefined reference to `mlxsw_pci_driver_register'
      switchib.c:(.init.text+0x48): undefined reference to `mlxsw_pci_driver_unregister'
      
      The other two such sub-drivers have a dependency, so add the same one
      here. In theory we could allow this driver if MLXSW_PCI is disabled,
      but it's probably not worth it.
      
      Fixes: d1ba5263 ("mlxsw: switchib: Introduce SwitchIB and SwitchIB silicon driver")
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlx4: give precise rx/tx bytes/packets counters · 40931b85
      Committed by Eric Dumazet
      mlx4 stats are chaotic because a deferred work queue is responsible
      for updating them every 250 ms.
      
      Even sampling stats every second with "sar -n DEV 1" gives
      variations like the following:
      
      lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
      07:39:22         eth0 146877.00 3265554.00   9467.15 4828168.50
      07:39:23         eth0 146587.00 3260329.00   9448.15 4820445.98
      07:39:24         eth0 146894.00 3259989.00   9468.55 4819943.26
      07:39:25         eth0 110368.00 2454497.00   7113.95 3629012.17  <<>>
      07:39:26         eth0 146563.00 3257502.00   9447.25 4816266.23
      07:39:27         eth0 145678.00 3258292.00   9389.79 4817414.39
      07:39:28         eth0 145268.00 3253171.00   9363.85 4809852.46
      07:39:29         eth0 146439.00 3262185.00   9438.97 4823172.48
      07:39:30         eth0 146758.00 3264175.00   9459.94 4826124.13
      07:39:31         eth0 146843.00 3256903.00   9465.44 4815381.97
      Average:         eth0 142827.50 3179259.70   9206.30 4700578.16
      
      This patch allows the rx/tx bytes/packets counters to be folded at
      the time we need stats.
      
      We now can fetch stats every 1 ms if we want to check NIC behavior
      on a small time window. It is also easier to detect anomalies.
      
      lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
      07:42:50         eth0 142915.00 3177696.00   9212.06 4698270.42
      07:42:51         eth0 143741.00 3200232.00   9265.15 4731593.02
      07:42:52         eth0 142781.00 3171600.00   9202.92 4689260.16
      07:42:53         eth0 143835.00 3192932.00   9271.80 4720761.39
      07:42:54         eth0 141922.00 3165174.00   9147.64 4679759.21
      07:42:55         eth0 142993.00 3207038.00   9216.78 4741653.05
      07:42:56         eth0 141394.06 3154335.64   9113.85 4663731.73
      07:42:57         eth0 141850.00 3161202.00   9144.48 4673866.07
      07:42:58         eth0 143439.00 3180736.00   9246.05 4702755.35
      07:42:59         eth0 143501.00 3210992.00   9249.99 4747501.84
      Average:         eth0 142835.66 3182165.93   9206.98 4704874.08
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
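Folding counters "at the time we need stats" means summing the per-ring counters on every read instead of serving a copy refreshed by a periodic worker. The sketch below illustrates that idea with hypothetical structures. It ignores locking and the difference between hardware and software counters, so it should not be read as the mlx4 implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Per-ring counters, updated by the datapath as packets are handled. */
struct toy_ring {
	unsigned long packets;
	unsigned long bytes;
};

/* Device-wide totals handed to whoever asks for statistics. */
struct toy_stats {
	unsigned long packets;
	unsigned long bytes;
};

/* Sum all rings at read time, so the result is never a stale snapshot. */
static void toy_fold_stats(const struct toy_ring *rings, size_t n_rings,
			   struct toy_stats *out)
{
	size_t i;

	out->packets = 0;
	out->bytes = 0;
	for (i = 0; i < n_rings; i++) {
		out->packets += rings[i].packets;
		out->bytes += rings[i].bytes;
	}
}
```

A reader that polls twice in quick succession sees every packet counted in between, whereas a cached copy refreshed every 250 ms could attribute a burst to the wrong sampling interval, as in the dip visible in the first `sar` listing.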
  8. 29 Nov, 2016 6 commits