1. 22 4月, 2020 2 次提交
  2. 21 4月, 2020 9 次提交
    • Z
      net/mlx5e: Get the latest values from counters in switchdev mode · dcdf4ce0
      Zhu Yanjun 提交于
      In the switchdev mode, when running "cat
      /sys/class/net/NIC/statistics/tx_packets", the ppcnt register is
      accessed to get the latest values. But currently this command can
      not get the correct values from ppcnt.
      
      From firmware manual, before getting the 802_3 counters, the 802_3
      data layout should be set to the ppcnt register.
      
      When the command "cat /sys/class/net/NIC/statistics/tx_packets" is
      run, before updating 802_3 data layout with ppcnt register, the
      monitor counters are tested. The test result will decide the
      802_3 data layout is updated or not.
      
      Actually the monitor counters do not support to monitor rx/tx
      stats of 802_3 in switchdev mode. So the rx/tx counters change
      will not trigger monitor counters. So the 802_3 data layout will
      not be updated in ppcnt register. Finally this command can not get
      the latest values from ppcnt register with 802_3 data layout.
      
      Fixes: 5c7e8bbb ("net/mlx5e: Use monitor counters for update stats")
      Signed-off-by: NZhu Yanjun <yanjunz@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      dcdf4ce0
    • S
      net/mlx5: Kconfig: convert imply usage to weak dependency · 96c34151
      Saeed Mahameed 提交于
      MLX5_CORE uses the 'imply' keyword to depend on VXLAN, PTP_1588_CLOCK,
      MLXFW and PCI_HYPERV_INTERFACE.
      
      This was useful to force vxlan, ptp, etc.. to be reachable to mlx5
      regardless of their config states.
      
      Due to the changes in the cited commit below, the semantics of 'imply'
      was changed to not force any restriction on the implied config.
      
      As a result of this change, the compilation of MLX5_CORE=y and VXLAN=m
      would result in undefined references, as VXLAN now would stay as 'm'.
      
      To fix this we change MLX5_CORE to have a weak dependency on
      these modules/configs and make sure they are reachable, by adding:
      depend on symbol || !symbol.
      
      For example: VXLAN=m MLX5_CORE=y, this will force MLX5_CORE to m
      
      Fixes: def2fbff ("kconfig: allow symbols implied by y to become m")
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Nicolas Pitre <nico@fluxnic.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Reported-by: NRandy Dunlap <rdunlap@infradead.org>
      96c34151
    • M
      net/mlx5e: Don't trigger IRQ multiple times on XSK wakeup to avoid WQ overruns · e7e0004a
      Maxim Mikityanskiy 提交于
      XSK wakeup function triggers NAPI by posting a NOP WQE to a special XSK
      ICOSQ. When the application floods the driver with wakeup requests by
      calling sendto() in a certain pattern that ends up in mlx5e_trigger_irq,
      the XSK ICOSQ may overflow.
      
      Multiple NOPs are not required and won't accelerate the process, so
      avoid posting a second NOP if there is one already on the way. This way
      we also avoid increasing the queue size (which might not help anyway).
      
      Fixes: db05815b ("net/mlx5e: Add XSK zero-copy support")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      e7e0004a
    • P
      net/mlx5: CT: Change idr to xarray to protect parallel tuple id allocation · 70840b66
      Paul Blakey 提交于
      After allowing parallel tuple insertion, we get the following trace:
      
      [ 5505.142249] ------------[ cut here ]------------
      [ 5505.148155] WARNING: CPU: 21 PID: 13313 at lib/radix-tree.c:581 delete_node+0x16c/0x180
      [ 5505.295553] CPU: 21 PID: 13313 Comm: kworker/u50:22 Tainted: G           OE     5.6.0+ #78
      [ 5505.304824] Hardware name: Supermicro Super Server/X10DRT-P, BIOS 2.0b 03/30/2017
      [ 5505.313740] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_flow_table]
      [ 5505.323257] RIP: 0010:delete_node+0x16c/0x180
      [ 5505.349862] RSP: 0018:ffffb19184eb7b30 EFLAGS: 00010282
      [ 5505.356785] RAX: 0000000000000000 RBX: ffff904ac95b86d8 RCX: ffff904b6f938838
      [ 5505.365190] RDX: 0000000000000000 RSI: ffff904ac954b908 RDI: ffff904ac954b920
      [ 5505.373628] RBP: ffff904b4ac13060 R08: 0000000000000001 R09: 0000000000000000
      [ 5505.382155] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000000
      [ 5505.390527] R13: ffffb19184eb7bfc R14: ffff904b6bef5800 R15: ffff90482c1203c0
      [ 5505.399246] FS:  0000000000000000(0000) GS:ffff904c2fc80000(0000) knlGS:0000000000000000
      [ 5505.408621] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5505.415739] CR2: 00007f5d27006010 CR3: 0000000058c10006 CR4: 00000000001626e0
      [ 5505.424547] Call Trace:
      [ 5505.428429]  idr_alloc_u32+0x7b/0xc0
      [ 5505.433803]  mlx5_tc_ct_entry_add_rule+0xbf/0x950 [mlx5_core]
      [ 5505.441354]  ? mlx5_fc_create+0x23c/0x370 [mlx5_core]
      [ 5505.448225]  mlx5_tc_ct_block_flow_offload+0x874/0x10b0 [mlx5_core]
      [ 5505.456278]  ? mlx5_tc_ct_block_flow_offload+0x63d/0x10b0 [mlx5_core]
      [ 5505.464532]  nf_flow_offload_tuple.isra.21+0xc5/0x140 [nf_flow_table]
      [ 5505.472286]  ? __kmalloc+0x217/0x2f0
      [ 5505.477093]  ? flow_rule_alloc+0x1c/0x30
      [ 5505.482117]  flow_offload_work_handler+0x1d0/0x290 [nf_flow_table]
      [ 5505.489674]  ? process_one_work+0x17c/0x580
      [ 5505.494922]  process_one_work+0x202/0x580
      [ 5505.500082]  ? process_one_work+0x17c/0x580
      [ 5505.505696]  worker_thread+0x4c/0x3f0
      [ 5505.510458]  kthread+0x103/0x140
      [ 5505.514989]  ? process_one_work+0x580/0x580
      [ 5505.520616]  ? kthread_bind+0x10/0x10
      [ 5505.525837]  ret_from_fork+0x3a/0x50
      [ 5505.570841] ---[ end trace 07995de9c56d6831 ]---
      
      This happens from parallel deletes/adds to idr, as idr isn't protected.
      Fix that by using xarray as the tuple_ids allocator instead of idr.
      
      Fixes: 7da182a9 ("netfilter: flowtable: Use work entry per offload command")
      Reviewed-by: NRoi Dayan <roid@mellanox.com>
      Signed-off-by: NPaul Blakey <paulb@mellanox.com>
      Reviewed-by: NOz Shlomo <ozsh@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      70840b66
    • N
      net/mlx5: Fix failing fw tracer allocation on s390 · a019b361
      Niklas Schnelle 提交于
      On s390 FORCE_MAX_ZONEORDER is 9 instead of 11, thus a larger kzalloc()
      allocation as done for the firmware tracer will always fail.
      
      Looking at mlx5_fw_tracer_save_trace(), it is actually the driver itself
      that copies the debug data into the trace array and there is no need for
      the allocation to be contiguous in physical memory. We can therefor use
      kvzalloc() instead of kzalloc() and get rid of the large contiguous
      allcoation.
      
      Fixes: f53aaa31 ("net/mlx5: FW tracer, implement tracer logic")
      Signed-off-by: NNiklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      a019b361
    • T
      team: fix hang in team_mode_get() · 1c30fbc7
      Taehee Yoo 提交于
      When team mode is changed or set, the team_mode_get() is called to check
      whether the mode module is inserted or not. If the mode module is not
      inserted, it calls the request_module().
      In the request_module(), it creates a child process, which is
      the "modprobe" process and waits for the done of the child process.
      At this point, the following locks were used.
      down_read(&cb_lock()); by genl_rcv()
          genl_lock(); by genl_rcv_msc()
              rtnl_lock(); by team_nl_cmd_options_set()
                  mutex_lock(&team->lock); by team_nl_team_get()
      
      Concurrently, the team module could be removed by rmmod or "modprobe -r"
      The __exit function of team module is team_module_exit(), which calls
      team_nl_fini() and it tries to acquire following locks.
      down_write(&cb_lock);
          genl_lock();
      Because of the genl_lock() and cb_lock, this process can't be finished
      earlier than request_module() routine.
      
      The problem secenario.
      CPU0                                     CPU1
      team_mode_get
          request_module()
                                               modprobe -r team_mode_roundrobin
                                                           team <--(B)
              modprobe team <--(A)
                  team_mode_roundrobin
      
      By request_module(), the "modprobe team_mode_roundrobin" command
      will be executed. At this point, the modprobe process will decide
      that the team module should be inserted before team_mode_roundrobin.
      Because the team module is being removed.
      
      By the module infrastructure, the same module insert/remove operations
      can't be executed concurrently.
      So, (A) waits for (B) but (B) also waits for (A) because of locks.
      So that the hang occurs at this point.
      
      Test commands:
          while :
          do
              teamd -d &
      	killall teamd &
      	modprobe -rv team_mode_roundrobin &
          done
      
      The approach of this patch is to hold the reference count of the team
      module if the team module is compiled as a module. If the reference count
      of the team module is not zero while request_module() is being called,
      the team module will not be removed at that moment.
      So that the above scenario could not occur.
      
      Fixes: 3d249d4c ("net: introduce ethernet teaming device")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c30fbc7
    • R
      cxgb4: fix large delays in PTP synchronization · bd019427
      Rahul Lakkireddy 提交于
      Fetching PTP sync information from mailbox is slow and can take
      up to 10 milliseconds. Reduce this unnecessary delay by directly
      reading the information from the corresponding registers.
      
      Fixes: 9c33e420 ("cxgb4: Add PTP Hardware Clock (PHC) support")
      Signed-off-by: NManoj Malviya <manojmalviya@chelsio.com>
      Signed-off-by: NRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd019427
    • M
      net: stmmac: dwmac-meson8b: Add missing boundary to RGMII TX clock array · f0212a5e
      Marc Zyngier 提交于
      Running with KASAN on a VIM3L systems leads to the following splat
      when probing the Ethernet device:
      
      ==================================================================
      BUG: KASAN: global-out-of-bounds in _get_maxdiv+0x74/0xd8
      Read of size 4 at addr ffffa000090615f4 by task systemd-udevd/139
      CPU: 1 PID: 139 Comm: systemd-udevd Tainted: G            E     5.7.0-rc1-00101-g8624b7577b9c #781
      Hardware name: amlogic w400/w400, BIOS 2020.01-rc5 03/12/2020
      Call trace:
       dump_backtrace+0x0/0x2a0
       show_stack+0x20/0x30
       dump_stack+0xec/0x148
       print_address_description.isra.12+0x70/0x35c
       __kasan_report+0xfc/0x1d4
       kasan_report+0x4c/0x68
       __asan_load4+0x9c/0xd8
       _get_maxdiv+0x74/0xd8
       clk_divider_bestdiv+0x74/0x5e0
       clk_divider_round_rate+0x80/0x1a8
       clk_core_determine_round_nolock.part.9+0x9c/0xd0
       clk_core_round_rate_nolock+0xf0/0x108
       clk_hw_round_rate+0xac/0xf0
       clk_factor_round_rate+0xb8/0xd0
       clk_core_determine_round_nolock.part.9+0x9c/0xd0
       clk_core_round_rate_nolock+0xf0/0x108
       clk_core_round_rate_nolock+0xbc/0x108
       clk_core_set_rate_nolock+0xc4/0x2e8
       clk_set_rate+0x58/0xe0
       meson8b_dwmac_probe+0x588/0x72c [dwmac_meson8b]
       platform_drv_probe+0x78/0xd8
       really_probe+0x158/0x610
       driver_probe_device+0x140/0x1b0
       device_driver_attach+0xa4/0xb0
       __driver_attach+0xcc/0x1c8
       bus_for_each_dev+0xf4/0x168
       driver_attach+0x3c/0x50
       bus_add_driver+0x238/0x2e8
       driver_register+0xc8/0x1e8
       __platform_driver_register+0x88/0x98
       meson8b_dwmac_driver_init+0x28/0x1000 [dwmac_meson8b]
       do_one_initcall+0xa8/0x328
       do_init_module+0xe8/0x368
       load_module+0x3300/0x36b0
       __do_sys_finit_module+0x120/0x1a8
       __arm64_sys_finit_module+0x4c/0x60
       el0_svc_common.constprop.2+0xe4/0x268
       do_el0_svc+0x98/0xa8
       el0_svc+0x24/0x68
       el0_sync_handler+0x12c/0x318
       el0_sync+0x158/0x180
      
      The buggy address belongs to the variable:
       div_table.63646+0x34/0xfffffffffffffa40 [dwmac_meson8b]
      
      Memory state around the buggy address:
       ffffa00009061480: fa fa fa fa 00 00 00 01 fa fa fa fa 00 00 00 00
       ffffa00009061500: 05 fa fa fa fa fa fa fa 00 04 fa fa fa fa fa fa
      >ffffa00009061580: 00 03 fa fa fa fa fa fa 00 00 00 00 00 00 fa fa
                                                                   ^
       ffffa00009061600: fa fa fa fa 00 01 fa fa fa fa fa fa 01 fa fa fa
       ffffa00009061680: fa fa fa fa 00 01 fa fa fa fa fa fa 04 fa fa fa
      ==================================================================
      
      Digging into this indeed shows that the clock divider array is
      lacking a final fence, and that the clock subsystems goes in the
      weeds. Oh well.
      
      Let's add the empty structure that indicates the end of the array.
      
      Fixes: bd6f4854 ("net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs")
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: NMartin Blumenstingl <martin.blumenstingl@googlemail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f0212a5e
    • T
      net: systemport: Omit superfluous error message in bcm_sysport_probe() · bdbe05b3
      Tang Bin 提交于
      In the function bcm_sysport_probe(), when get irq failed, the function
      platform_get_irq() logs an error message, so remove redundant message
      here.
      Signed-off-by: NTang Bin <tangbin@cmss.chinamobile.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdbe05b3
  3. 19 4月, 2020 4 次提交
    • E
      net/mlx4_en: avoid indirect call in TX completion · 310660a1
      Eric Dumazet 提交于
      Commit 9ecc2d86 ("net/mlx4_en: add xdp forwarding and data write support")
      brought another indirect call in fast path.
      
      Use INDIRECT_CALL_2() helper to avoid the cost of the indirect call
      when/if CONFIG_RETPOLINE=y
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      310660a1
    • J
      net: stmmac: Fix sub-second increment · 91a2559c
      Julien Beraud 提交于
      In fine adjustement mode, which is the current default, the sub-second
          increment register is the number of nanoseconds that will be added to
          the clock when the accumulator overflows. At each clock cycle, the
          value of the addend register is added to the accumulator.
          Currently, we use 20ns = 1e09ns / 50MHz as this value whatever the
          frequency of the ptp clock actually is.
          The adjustment is then done on the addend register, only incrementing
          every X clock cycles X being the ratio between 50MHz and ptp_clock_rate
          (addend = 2^32 * 50MHz/ptp_clock_rate).
          This causes the following issues :
          - In case the frequency of the ptp clock is inferior or equal to 50MHz,
            the addend value calculation will overflow and the default
            addend value will be set to 0, causing the clock to not work at
            all. (For instance, for ptp_clock_rate = 50MHz, addend = 2^32).
          - The resolution of the timestamping clock is limited to 20ns while it
            is not needed, thus limiting the accuracy of the timestamping to
            20ns.
      
          Fix this by setting sub-second increment to 2e09ns / ptp_clock_rate.
          It will allow to reach the minimum possible frequency for
          ptp_clk_ref, which is 5MHz for GMII 1000Mps Full-Duplex by setting the
          sub-second-increment to a higher value. For instance, for 25MHz, it
          gives ssinc = 80ns and default_addend = 2^31.
          It will also allow to use a lower value for sub-second-increment, thus
          improving the timestamping accuracy with frequencies higher than
          100MHz, for instance, for 200MHz, ssinc = 10ns and default_addend =
          2^31.
      
      v1->v2:
       - Remove modifications to the calculation of default addend, which broke
       compatibility with clock frequencies for which 2000000000 / ptp_clk_freq
       is not an integer.
       - Modify description according to discussions.
      Signed-off-by: NJulien Beraud <julien.beraud@orolia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91a2559c
    • J
      net: stmmac: fix enabling socfpga's ptp_ref_clock · 15ce3060
      Julien Beraud 提交于
      There are 2 registers to write to enable a ptp ref clock coming from the
      fpga.
      One that enables the usage of the clock from the fpga for emac0 and emac1
      as a ptp ref clock, and the other to allow signals from the fpga to reach
      emac0 and emac1.
      Currently, if the dwmac-socfpga has phymode set to PHY_INTERFACE_MODE_MII,
      PHY_INTERFACE_MODE_GMII, or PHY_INTERFACE_MODE_SGMII, both registers will
      be written and the ptp ref clock will be set as coming from the fpga.
      Separate the 2 register writes to only enable signals from the fpga to
      reach emac0 or emac1 when ptp ref clock is not coming from the fpga.
      Signed-off-by: NJulien Beraud <julien.beraud@orolia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15ce3060
    • X
      wimax/i2400m: Fix potential urb refcnt leak · 7717cbec
      Xiyu Yang 提交于
      i2400mu_bus_bm_wait_for_ack() invokes usb_get_urb(), which increases the
      refcount of the "notif_urb".
      
      When i2400mu_bus_bm_wait_for_ack() returns, local variable "notif_urb"
      becomes invalid, so the refcount should be decreased to keep refcount
      balanced.
      
      The issue happens in all paths of i2400mu_bus_bm_wait_for_ack(), which
      forget to decrease the refcnt increased by usb_get_urb(), causing a
      refcnt leak.
      
      Fix this issue by calling usb_put_urb() before the
      i2400mu_bus_bm_wait_for_ack() returns.
      Signed-off-by: NXiyu Yang <xiyuyang19@fudan.edu.cn>
      Signed-off-by: NXin Tan <tanxin.ctf@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7717cbec
  4. 17 4月, 2020 4 次提交
  5. 16 4月, 2020 2 次提交
    • J
      net: tulip: make early_486_chipsets static · ae5a44bb
      Jason Yan 提交于
      Fix the following sparse warning:
      
      drivers/net/ethernet/dec/tulip/tulip_core.c:1280:28: warning: symbol
      'early_486_chipsets' was not declared. Should it be static?
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae5a44bb
    • V
      net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge · 87b0f983
      Vladimir Oltean 提交于
      To rehash a previous explanation given in commit 1c44ce56 ("net:
      mscc: ocelot: fix vlan_filtering when enslaving to bridge before link is
      up"), the switch driver operates the in a mode where a single VLAN can
      be transmitted as untagged on a particular egress port. That is the
      "native VLAN on trunk port" use case.
      
      The configuration for this native VLAN is driven in 2 ways:
       - Set the egress port rewriter to strip the VLAN tag for the native
         VID (as it is egress-untagged, after all).
       - Configure the ingress port to drop untagged and priority-tagged
         traffic, if there is no native VLAN. The intention of this setting is
         that a trunk port with no native VLAN should not accept untagged
         traffic.
      
      Since both of the above configurations for the native VLAN should only
      be done if VLAN awareness is requested, they are actually done from the
      ocelot_port_vlan_filtering function, after the basic procedure of
      toggling the VLAN awareness flag of the port.
      
      But there's a problem with that simplistic approach: we are trying to
      juggle with 2 independent variables from a single function:
       - Native VLAN of the port - its value is held in port->vid.
       - VLAN awareness state of the port - currently there are some issues
         here, more on that later*.
      The actual problem can be seen when enslaving the switch ports to a VLAN
      filtering bridge:
       0. The driver configures a pvid of zero for each port, when in
          standalone mode. While the bridge configures a default_pvid of 1 for
          each port that gets added as a slave to it.
       1. The bridge calls ocelot_port_vlan_filtering with vlan_aware=true.
          The VLAN-filtering-dependent portion of the native VLAN
          configuration is done, considering that the native VLAN is 0.
       2. The bridge calls ocelot_vlan_add with vid=1, pvid=true,
          untagged=true. The native VLAN changes to 1 (change which gets
          propagated to hardware).
       3. ??? - nobody calls ocelot_port_vlan_filtering again, to reapply the
          VLAN-filtering-dependent portion of the native VLAN configuration,
          for the new native VLAN of 1. One can notice that after toggling "ip
          link set dev br0 type bridge vlan_filtering 0 && ip link set dev br0
          type bridge vlan_filtering 1", the new native VLAN finally makes it
          through and untagged traffic finally starts flowing again. But
          obviously that shouldn't be needed.
      
      So it is clear that 2 independent variables need to both re-trigger the
      native VLAN configuration. So we introduce the second variable as
      ocelot_port->vlan_aware.
      
      *Actually both the DSA Felix driver and the Ocelot driver already had
      each its own variable:
       - Ocelot: ocelot_port_private->vlan_aware
       - Felix: dsa_port->vlan_filtering
      but the common Ocelot library needs to work with a single, common,
      variable, so there is some refactoring done to move the vlan_aware
      property from the private structure into the common ocelot_port
      structure.
      
      Fixes: 97bb69e1 ("net: mscc: ocelot: break apart ocelot_vlan_port_apply")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NHoratiu Vultur <horatiu.vultur@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87b0f983
  6. 15 4月, 2020 8 次提交
  7. 14 4月, 2020 7 次提交
  8. 13 4月, 2020 4 次提交