1. 24 May 2020, 5 commits
    • net/mlx4_core: fix a memory leak bug. · febfd9d3
      Qiushi Wu committed
      In mlx4_opreq_action(), the pointer "mailbox" is not released when
      mlx4_cmd_box() returns an error, causing a memory leak. Fix this by
      jumping to the "out" label, where mlx4_free_cmd_mailbox() frees the
      pointer.
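      The fix follows the usual kernel pattern of sending every error that
      occurs after an allocation through a single cleanup label. The
      user-space sketch below models that pattern under stated assumptions.
      struct mailbox, alloc_mailbox(), free_mailbox(), fake_cmd_box() and
      opreq_action() are hypothetical stand-ins for the mlx4 mailbox
      helpers, mlx4_cmd_box() and mlx4_opreq_action(). The sketch uses
      malloc()/free() and a live-object counter so that a leak is
      observable.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdlib.h>

      /* Hypothetical stand-in for the command mailbox. */
      struct mailbox {
      	char buf[64];
      };

      /* Number of mailboxes currently allocated; non-zero after a call means a leak. */
      static int live_mailboxes;

      static struct mailbox *alloc_mailbox(void)
      {
      	struct mailbox *mb = malloc(sizeof(*mb));

      	if (mb)
      		live_mailboxes++;
      	return mb;
      }

      static void free_mailbox(struct mailbox *mb)
      {
      	if (!mb)
      		return;
      	free(mb);
      	live_mailboxes--;
      }

      /* Hypothetical firmware command that fails on request. */
      static int fake_cmd_box(struct mailbox *mb, int fail)
      {
      	(void)mb;
      	return fail ? -EIO : 0;
      }

      /* Every error after the allocation jumps to "out", so the
       * mailbox is released exactly once on all paths. */
      static int opreq_action(int fail)
      {
      	struct mailbox *mb;
      	int err;

      	mb = alloc_mailbox();
      	if (!mb)
      		return -ENOMEM;

      	err = fake_cmd_box(mb, fail);
      	if (err)
      		goto out;	/* a plain "return err" here would leak mb */

      	/* ... process the mailbox contents here ... */

      out:
      	free_mailbox(mb);
      	return err;
      }
      ```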
      
      Fixes: fe6f700d ("net/mlx4_core: Respond to operation request by firmware")
      Signed-off-by: Qiushi Wu <wu000273@umn.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: ti: cpsw: fix ASSERT_RTNL() warning during suspend · 4c64b83d
      Grygorii Strashko committed
      vlan_for_each() must be called with rtnl_lock held, otherwise an
      ASSERT_RTNL() warning is triggered, which currently happens during
      system resume from suspend:
        cpsw_suspend()
        |- cpsw_ndo_stop()
          |- __hw_addr_ref_unsync_dev()
            |- cpsw_purge_all_mc()
               |- vlan_for_each()
                  |- ASSERT_RTNL();
      
      Fix it by wrapping the cpsw_ndo_stop() call in rtnl_lock()/rtnl_unlock().
      
      Fixes: 15180eca ("net: ethernet: ti: cpsw: fix vlan mcast")
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: mscc: fix initialization of the MACsec protocol mode · 0ddfee1f
      Antoine Tenart committed
      At the very end of the MACsec block initialization in the MSCC PHY
      driver, the MACsec "protocol mode" is set. This setting must be
      written based on the PHY id within the package, because the bank used
      to access the register depends on it. This was not done correctly:
      only the first bank was used, which made the two upper PHYs unstable
      on the VSC8584. This patch fixes it.
      
      Fixes: 1bbe0ecc ("net: phy: mscc: macsec initialization")
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: don't attach interface until resume finishes · 31096c3e
      Leon Yu committed
      Commit 14b41a29 ("net: stmmac: Delete txtimer in suspend") was the
      first attempt to fix a race between mod_timer() and setup_timer()
      during stmmac_resume(). However, the issue persists because that
      commit addressed only half of it.

      The same race can still happen because stmmac_resume() re-attaches the
      interface far too early, before the hardware is fully initialized.
      Worse, doing so lets network traffic restart, so stmmac_tx_timer_arm()
      can be called in the middle of stmmac_resume(), which re-initializes
      the TX timers in stmmac_init_coalesce(). The timer_list is then
      corrupted and the system crashes as a result of the race between
      mod_timer() and setup_timer().
      
        systemd--1995    2.... 552950018us : stmmac_suspend: 4994
        ksoftirq-9       0..s2 553123133us : stmmac_tx_timer_arm: 2276
        systemd--1995    0.... 553127896us : stmmac_resume: 5101
        systemd--320     7...2 553132752us : stmmac_tx_timer_arm: 2276
        (sd-exec-1999    5...2 553135204us : stmmac_tx_timer_arm: 2276
        ---------------------------------
        pc : run_timer_softirq+0x468/0x5e0
        lr : run_timer_softirq+0x570/0x5e0
        Call trace:
         run_timer_softirq+0x468/0x5e0
         __do_softirq+0x124/0x398
         irq_exit+0xd8/0xe0
         __handle_domain_irq+0x6c/0xc0
         gic_handle_irq+0x60/0xb0
         el1_irq+0xb8/0x180
         arch_cpu_idle+0x38/0x230
         default_idle_call+0x24/0x3c
         do_idle+0x1e0/0x2b8
         cpu_startup_entry+0x28/0x48
         secondary_start_kernel+0x1b4/0x208
      
      Fix this by deferring netif_device_attach() to the end of
      stmmac_resume().
      Signed-off-by: Leon Yu <leoyu@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix return value about devm_platform_ioremap_resource() · ef24d6c3
      Tiezhu Yang committed
      When calling devm_platform_ioremap_resource(), use IS_ERR() to check
      the return value and return PTR_ERR() on failure.
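      A NULL check is not enough here, because the kernel encodes the errno
      in the returned pointer itself. The user-space sketch below rebuilds
      simplified ERR_PTR()/IS_ERR()/PTR_ERR() helpers. They follow the
      idea of reserving the top MAX_ERRNO addresses for error codes, but
      they are not copied from the kernel headers. fake_ioremap_resource()
      and fake_probe() are hypothetical stand-ins for
      devm_platform_ioremap_resource() and a driver probe function.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* The top MAX_ERRNO pointer values are reserved for encoded errors. */
      #define MAX_ERRNO 4095

      static inline void *ERR_PTR(long error)
      {
      	return (void *)(intptr_t)error;
      }

      static inline long PTR_ERR(const void *ptr)
      {
      	return (long)(intptr_t)ptr;
      }

      static inline bool IS_ERR(const void *ptr)
      {
      	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
      }

      static char fake_regs[16];

      /* Hypothetical mapper: returns ERR_PTR(), never NULL, on failure. */
      static void *fake_ioremap_resource(int present)
      {
      	return present ? (void *)fake_regs : ERR_PTR(-ENODEV);
      }

      /* Probe-style caller using the correct check. */
      static int fake_probe(int present, void **regs_out)
      {
      	void *regs = fake_ioremap_resource(present);

      	if (IS_ERR(regs))
      		return (int)PTR_ERR(regs);	/* propagate the encoded errno */
      	*regs_out = regs;
      	return 0;
      }
      ```

      Note that the failing call returns a non-NULL pointer, so an
      "if (!regs)" check would silently accept it.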
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 23 May 2020, 21 commits
    • Merge tag 'rxrpc-fixes-20200523-v2' of... · d04322a0
      David S. Miller committed
      Merge tag 'rxrpc-fixes-20200523-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      David Howells says:
      
      ====================
      rxrpc: Fix a warning and a leak [ver #2]
      
      Here are a couple of fixes for AF_RXRPC:
      
       (1) Fix an uninitialised variable warning.
      
       (2) Fix a leak of the ticket on error in rxkad.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Fix a memory leak in rxkad_verify_response() · f45d01f4
      Qiushi Wu committed
      A ticket was not released when the call to rxkad_decrypt_ticket()
      failed. Fix this by replacing the jump target
      "temporary_error_free_resp" with "temporary_error_free_ticket".
      
      Fixes: 8c2f826d ("rxrpc: Don't put crypto buffers on the stack")
      Signed-off-by: Qiushi Wu <wu000273@umn.edu>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Markus Elfring <Markus.Elfring@web.de>
    • rxrpc: Fix a warning · 8a1d24e1
      David Howells committed
      Fix a warning due to an uninitialised variable.
      
      In file included from ../fs/afs/fs_probe.c:11:
      ../fs/afs/fs_probe.c: In function 'afs_fileserver_probe_result':
      ../fs/afs/internal.h:1453:2: warning: 'rtt_us' may be used uninitialized in this function [-Wmaybe-uninitialized]
       1453 |  printk("[%-6.6s] "FMT"\n", current->comm ,##__VA_ARGS__)
            |  ^~~~~~
      ../fs/afs/fs_probe.c:35:15: note: 'rtt_us' was declared here
      Signed-off-by: David Howells <dhowells@redhat.com>
    • net: sun: fix missing release regions in cas_init_one(). · 5a730153
      Qiushi Wu committed
      In cas_init_one(), the regions of "pdev" are requested by
      pci_request_regions(), but they were not released when the call to
      pci_write_config_byte() failed. Fix this by replacing the jump target
      "err_write_cacheline" with "err_out_free_res".
      
      Fixes: 1f26dac3 ("[NET]: Add Sun Cassini driver.")
      Signed-off-by: Qiushi Wu <wu000273@umn.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mscc: ocelot: fix address ageing time (again) · bf655ba2
      Vladimir Oltean committed
      ocelot_set_ageing_time has 2 callers:
       - felix_set_ageing_time: from drivers/net/dsa/ocelot/felix.c
       - ocelot_port_attr_ageing_set: from drivers/net/ethernet/mscc/ocelot.c
      
      The issue described in the commit fixed below actually affected only
      the felix_set_ageing_time code path, since ocelot_port_attr_ageing_set
      was already dividing by 1000. To make both paths symmetrical (and to
      fix addresses being aged out far too quickly on Ocelot), stop dividing
      by 1000 on the caller side altogether.
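      The fix amounts to doing the millisecond-to-second conversion in
      exactly one place. The sketch below is a hypothetical model, not the
      ocelot code. set_ageing_time_ms() plays the role of
      ocelot_set_ageing_time() and is assumed to take milliseconds and
      store whole seconds. The callers model the felix path, the old
      ocelot path and the fixed ocelot path. With the bridge's default
      300 s ageing time, dividing twice yields 0.

      ```c
      #include <assert.h>

      /* Hypothetical model of the programmed ageing period, in seconds. */
      static unsigned int age_period_s;

      /* Single conversion point: callers pass milliseconds. */
      static void set_ageing_time_ms(unsigned int msecs)
      {
      	age_period_s = msecs / 1000;
      }

      /* DSA-style caller: already passes milliseconds. */
      static void dsa_path(unsigned int msecs)
      {
      	set_ageing_time_ms(msecs);
      }

      /* Buggy switchdev-style caller: converts to seconds itself,
       * so the value ends up divided by 1000 twice. */
      static void switchdev_path_buggy(unsigned int msecs)
      {
      	set_ageing_time_ms(msecs / 1000);
      }

      /* Fixed caller: no division on the caller side. */
      static void switchdev_path_fixed(unsigned int msecs)
      {
      	set_ageing_time_ms(msecs);
      }
      ```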
      
      Fixes: c0d7eccb ("net: mscc: ocelot: ANA_AUTOAGE_AGE_PERIOD holds a value in seconds, not ms")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: fix OCP access on RTL8117 · 561535b0
      Heiner Kallweit committed
      According to the r8168 vendor driver, DASHv3 chips such as
      RTL8168fp/RTL8117 need special addressing for OCP access.
      The fix is compile-tested only, due to lack of test hardware.
      
      Fixes: 1287723a ("r8169: add support for RTL8117")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlxsw-Various-fixes' · 156ee62b
      David S. Miller committed
      Ido Schimmel says:
      
      ====================
      mlxsw: Various fixes
      
      Patch #1 from Jiri fixes a use-after-free discovered while fuzzing mlxsw
      / devlink with syzkaller.
      
      Patch #2 from Amit works around a limitation in new versions of arping,
      which is used in several selftests.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: mlxsw: qos_mc_aware: Specify arping timeout as an integer · 46ca1117
      Amit Cohen committed
      Starting from iputils s20190709 (used in Fedora 31), arping no longer
      accepts a timeout specified as a decimal:
      
      $ arping -c 1 -I swp1 -b 192.0.2.66 -q -w 0.1
      arping: invalid argument: '0.1'
      
      Previously, such timeouts were rounded to an integer.
      
      Fix this by specifying the timeout as an integer.
      
      Fixes: a5ee171d ("selftests: mlxsw: qos_mc_aware: Add a test for UC awareness")
      Signed-off-by: Amit Cohen <amitc@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Fix use-after-free of split/unsplit/type_set in case reload fails · 4340f42f
      Jiri Pirko committed
      If reload fails, mlxsw_sp->ports contains a pointer to freed memory
      (freed either by reload_down() or by the reload_up() error path).
      Fix this by initializing the pointer to NULL and checking it before
      dereferencing it in the split/unsplit/type_set call paths.
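      The fix is the common idiom of clearing a pointer when its memory is
      freed and checking it before use. Below is a hypothetical user-space
      model. struct drv, ports_create(), ports_remove() and port_split()
      stand in for mlxsw_sp, the port creation and removal paths and the
      split callback. The -EOPNOTSUPP return value is an assumption, not
      taken from the driver.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdlib.h>

      /* Hypothetical, heavily simplified driver state. */
      struct port {
      	int split;
      };

      struct drv {
      	struct port *ports;	/* NULL whenever no ports exist */
      	int nports;
      };

      static int ports_create(struct drv *d, int n)
      {
      	d->ports = calloc(n, sizeof(*d->ports));
      	if (!d->ports)
      		return -ENOMEM;
      	d->nports = n;
      	return 0;
      }

      /* Used by normal teardown and by a failed reload alike. */
      static void ports_remove(struct drv *d)
      {
      	free(d->ports);
      	d->ports = NULL;	/* never leave a dangling pointer behind */
      	d->nports = 0;
      }

      /* Split-style callback: refuses to run if a reload left no ports. */
      static int port_split(struct drv *d, int idx)
      {
      	if (!d->ports)
      		return -EOPNOTSUPP;
      	if (idx < 0 || idx >= d->nports)
      		return -EINVAL;
      	d->ports[idx].split = 1;
      	return 0;
      }
      ```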
      
      Fixes: 24cc68ad ("mlxsw: core: Add support for reload")
      Reported-by: Danielle Ratson <danieller@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: stmmac: Enable interface clocks on probe for IPQ806x · a96ac8a0
      Jonathan McDowell committed
      The ipq806x_gmac_probe() function enables the PTP clock but not the
      appropriate interface clocks. This means that if the bootloader has
      not done so, attempting to bring up the interface fails with an error
      like:
      
      [   59.028131] ipq806x-gmac-dwmac 37600000.ethernet: Failed to reset the dma
      [   59.028196] ipq806x-gmac-dwmac 37600000.ethernet eth1: stmmac_hw_setup: DMA engine initialization failed
      [   59.034056] ipq806x-gmac-dwmac 37600000.ethernet eth1: stmmac_open: Hw setup failed
      
      This patch, a slightly cleaned-up version of one posted by Sergey
      Sergeev at:

      https://forum.openwrt.org/t/support-for-mikrotik-rb3011uias-rm/4064/257

      correctly enables the clock; the clock source has already been
      configured just before this.
      
      Tested on a MikroTik RB3011.
      Signed-off-by: Jonathan McDowell <noodles@earth.li>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'netdevsim-Two-small-fixes' · 7a40a2d2
      David S. Miller committed
      Ido Schimmel says:
      
      ====================
      netdevsim: Two small fixes
      
      Fix two bugs observed while analyzing regression failures.
      
      Patch #1 fixes a bug where sometimes the drop counter of a packet trap
      policer would not increase.
      
      Patch #2 adds a missing initialization of a variable in a related
      selftest.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • selftests: netdevsim: Always initialize 'RET' variable · 4d59e59c
      Ido Schimmel committed
      The variable is used by log_test() to check whether the test case
      completed successfully. If it is not initialized at the start of a
      test case, the test case can fail despite not encountering any
      errors.
      
      Example:
      
      ```
      ...
      TEST: Trap group statistics                                         [ OK ]
      TEST: Trap policer                                                  [FAIL]
      	Policer drop counter was not incremented
      TEST: Trap policer binding                                          [FAIL]
      	Policer drop counter was not incremented
      ```
      
      Failure of trap_policer_test() caused trap_policer_bind_test() to fail
      as well.
      
      Fix by adding missing initialization of the variable.
      
      Fixes: 5fbff58e ("selftests: netdevsim: Add test cases for devlink-trap policers")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdevsim: Ensure policer drop counter always increases · be43224f
      Ido Schimmel committed
      If the policer drop counter is retrieved when the jiffies value is a
      multiple of 64, the counter is not incremented.

      This randomly breaks a selftest [1] that reads the counter twice and
      checks that it was incremented:
      
      ```
      TEST: Trap policer                                                  [FAIL]
      	Policer drop counter was not incremented
      ```
      
      Fix by always incrementing the counter by 1.
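      The message does not quote the pre-fix expression. The sketch below
      therefore assumes the counter was advanced by jiffies % 64, which is
      one plausible way to get the behaviour described. Both functions are
      hypothetical models, not netdevsim code. They show why two reads at
      jiffies values that are multiples of 64 return the same count, and
      why adding 1 makes the counter strictly increasing.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Assumed pre-fix behaviour: adds 0 whenever jiffies % 64 == 0. */
      static uint64_t drops_buggy(uint64_t *cnt, unsigned long jiffies)
      {
      	*cnt += jiffies % 64;
      	return *cnt;
      }

      /* Fixed behaviour: every read advances the counter by 1. */
      static uint64_t drops_fixed(uint64_t *cnt)
      {
      	*cnt += 1;
      	return *cnt;
      }
      ```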
      
      [1] tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh
      
      Fixes: ad188458 ("netdevsim: Add devlink-trap policer support")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'rxrpc-fixes-20200520' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 4629ed2e
      David S. Miller committed
      David Howells says:
      
      ====================
      rxrpc: Fix retransmission timeout and ACK discard
      
      Here are a couple of fixes and an extra tracepoint for AF_RXRPC:
      
       (1) Calculate the RTO pretty much as TCP does, rather than making
           something up, including an initial 4s timeout (which causes return
           probes from the fileserver to fail if a packet goes missing), and add
           backoff.
      
       (2) Fix the discarding of out-of-order received ACKs.  We mustn't let the
           hard-ACK point regress, nor do we want to do unnecessary
           retransmission because the soft-ACK list regresses.  This is not
           trivial, however: due to some loose wording in various old
           protocol specs, the ACK field that should be used for this
           sometimes contains the wrong information.
      
       (3) Add a tracepoint to log a discarded ACK.
      ====================
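      Item (1) refers to TCP's retransmission timeout calculation. For
      reference, here is a sketch of the standard estimator from RFC 6298,
      in integer microseconds. It uses SRTT/RTTVAR with alpha = 1/8,
      beta = 1/4 and K = 4, an initial RTO of 1 s, and exponential backoff.
      It is not the AF_RXRPC implementation. The clock granularity and the
      60 s cap are assumed values, and the RFC's recommended 1 s lower bound
      on the RTO is omitted.

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define RTO_INITIAL_US 1000000	/* RFC 6298 initial RTO: 1 s */
      #define RTO_MAX_US 60000000	/* assumed cap (RFC allows >= 60 s) */
      #define CLOCK_G_US 1000		/* assumed clock granularity */

      struct rto_state {
      	int64_t srtt_us;	/* smoothed round-trip time */
      	int64_t rttvar_us;	/* round-trip time variation */
      	int64_t rto_us;		/* current retransmission timeout */
      	int have_sample;
      };

      static void rto_init(struct rto_state *s)
      {
      	s->srtt_us = 0;
      	s->rttvar_us = 0;
      	s->rto_us = RTO_INITIAL_US;
      	s->have_sample = 0;
      }

      /* Feed one RTT measurement r_us and recompute the RTO. */
      static void rto_update(struct rto_state *s, int64_t r_us)
      {
      	if (!s->have_sample) {
      		s->srtt_us = r_us;
      		s->rttvar_us = r_us / 2;
      		s->have_sample = 1;
      	} else {
      		int64_t delta = s->srtt_us - r_us;

      		if (delta < 0)
      			delta = -delta;
      		/* RTTVAR must use the old SRTT, so update it first. */
      		s->rttvar_us = (3 * s->rttvar_us + delta) / 4;	/* beta = 1/4 */
      		s->srtt_us = (7 * s->srtt_us + r_us) / 8;	/* alpha = 1/8 */
      	}

      	int64_t var = 4 * s->rttvar_us;	/* K = 4 */
      	s->rto_us = s->srtt_us + (var > CLOCK_G_US ? var : CLOCK_G_US);
      	if (s->rto_us > RTO_MAX_US)
      		s->rto_us = RTO_MAX_US;
      }

      /* On retransmission timeout: double the RTO, up to the cap. */
      static void rto_backoff(struct rto_state *s)
      {
      	s->rto_us *= 2;
      	if (s->rto_us > RTO_MAX_US)
      		s->rto_us = RTO_MAX_US;
      }
      ```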
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ethernet/freescale: rework quiesce/activate for ucc_geth · 79dde73c
      Valentin Longchamp committed
      ugeth_quiesce/activate are used to halt the controller when there is a
      link change that requires reconfiguring the MAC.
      
      The previous implementation called netif_device_detach(). This however
      causes the initial activation of the netdevice to fail precisely because
      it's detached. For details, see [1].
      
      A possible workaround was to revert commit "net: linkwatch: add check
      for netdevice being present to linkwatch_do_dev". However, the check
      introduced in that commit is correct and shall be kept.

      The netif_device_detach() call is thus replaced with
      netif_tx_stop_all_queues(), which prevents any transmission. This
      allows the MAC configuration change required by the link change to be
      performed without detaching the corresponding netdevice, and thus
      without preventing its initial activation.
      
      [1] https://lists.openwall.net/netdev/2020/01/08/201

      Signed-off-by: Valentin Longchamp <valentin@longchamp.me>
      Acked-by: Matteo Ghidoni <matteo.ghidoni@ch.abb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Start shutdown on association restart if in SHUTDOWN-SENT state and socket is closed · d3e8e4c1
      Jere Leppänen committed
      Commit bdf6fa52 ("sctp: handle association restarts when the
      socket is closed.") starts shutdown when an association is restarted,
      if in SHUTDOWN-PENDING state and the socket is closed. However, the
      rationale stated in that commit applies also when in SHUTDOWN-SENT
      state - we don't want to move an association to ESTABLISHED state when
      the socket has been closed, because that results in an association
      that is unreachable from user space.
      
      The problem scenario:
      
      1.  Client crashes and/or restarts.
      
      2.  Server (using one-to-one socket) calls close(). SHUTDOWN is lost.
      
      3.  Client reconnects using the same addresses and ports.
      
      4.  Server's association is restarted. The association and the socket
          move to ESTABLISHED state, even though the server process has
          closed its descriptor.
      
      Also, after step 4 when the server process exits, some resources are
      leaked in an attempt to release the underlying inet sock structure in
      ESTABLISHED state:
      
          IPv4: Attempt to release TCP socket in state 1 00000000377288c7
      
      Fix by acting the same way as in SHUTDOWN-PENDING state. That is, if
      an association is restarted in SHUTDOWN-SENT state and the socket is
      closed, then start shutdown and don't move the association or the
      socket to ESTABLISHED state.
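      The rule described above can be written as a small decision function.
      The enum names and on_restart() are hypothetical. They model the rule
      only and are not the net/sctp state machine code.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical subset of association states. */
      enum assoc_state {
      	STATE_ESTABLISHED,
      	STATE_SHUTDOWN_PENDING,
      	STATE_SHUTDOWN_SENT,
      };

      enum restart_action {
      	MOVE_TO_ESTABLISHED,
      	START_SHUTDOWN,
      };

      /* Decide what to do when the peer restarts the association.
       * Before the fix, only SHUTDOWN-PENDING was treated this way. */
      static enum restart_action on_restart(enum assoc_state state,
      				      bool sock_closed)
      {
      	if (sock_closed &&
      	    (state == STATE_SHUTDOWN_PENDING || state == STATE_SHUTDOWN_SENT))
      		return START_SHUTDOWN;
      	return MOVE_TO_ESTABLISHED;
      }
      ```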
      
      Fixes: bdf6fa52 ("sctp: handle association restarts when the socket is closed.")
      Signed-off-by: Jere Leppänen <jere.leppanen@nokia.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: block BH before using dst_cache · 13788174
      Eric Dumazet committed
      The dst_cache_get() documentation states that it must be called with
      BH disabled.

      syzbot reported:
      
      BUG: using smp_processor_id() in preemptible [00000000] code: /21697
      caller is dst_cache_get+0x3a/0xb0 net/core/dst_cache.c:68
      CPU: 0 PID: 21697 Comm:  Not tainted 5.7.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       check_preemption_disabled lib/smp_processor_id.c:47 [inline]
       debug_smp_processor_id.cold+0x88/0x9b lib/smp_processor_id.c:57
       dst_cache_get+0x3a/0xb0 net/core/dst_cache.c:68
       tipc_udp_xmit.isra.0+0xb9/0xad0 net/tipc/udp_media.c:164
       tipc_udp_send_msg+0x3e6/0x490 net/tipc/udp_media.c:244
       tipc_bearer_xmit_skb+0x1de/0x3f0 net/tipc/bearer.c:526
       tipc_enable_bearer+0xb2f/0xd60 net/tipc/bearer.c:331
       __tipc_nl_bearer_enable+0x2bf/0x390 net/tipc/bearer.c:995
       tipc_nl_bearer_enable+0x1e/0x30 net/tipc/bearer.c:1003
       genl_family_rcv_msg_doit net/netlink/genetlink.c:673 [inline]
       genl_family_rcv_msg net/netlink/genetlink.c:718 [inline]
       genl_rcv_msg+0x627/0xdf0 net/netlink/genetlink.c:735
       netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:746
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
       ___sys_sendmsg+0x100/0x170 net/socket.c:2416
       __sys_sendmsg+0xec/0x1b0 net/socket.c:2449
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x45ca29
      
      Fixes: e9c1a793 ("tipc: add dst_cache support for udp media")
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: fix RX hashing for non-10G ports · 3138a07c
      Russell King committed
      When rxhash is enabled on any ethernet port except the first in each CP
      block, traffic flow is prevented.  The analysis is below:
      
      I've been investigating this afternoon, and what I've found, comparing
      a kernel without 895586d5 and with 895586d5 applied is:
      
      - The table programmed into the hardware via mvpp22_rss_fill_table()
        appears to be identical with or without the commit.
      
      - When rxhash is enabled on eth2, mvpp2_rss_port_c2_enable() reports
        that c2.attr[0] and c2.attr[2] are written back containing:
      
         - with 895586d5, failing:    00200000 40000000
         - without 895586d5, working: 04000000 40000000
      
      - When disabling rxhash, c2.attr[0] and c2.attr[2] are written back as:
      
         04000000 00000000
      
      The second value represents the MVPP22_CLS_C2_ATTR2_RSS_EN bit, the
      first value is the queue number, which comprises two fields. The high
      5 bits are 24:29 and the low three are 21:23 inclusive. This comes
      from:
      
             c2.attr[0] = MVPP22_CLS_C2_ATTR0_QHIGH(qh) |
                           MVPP22_CLS_C2_ATTR0_QLOW(ql);
      
      So, the working case gives eth2 a queue id of 4.0, or 32 as per
      port->first_rxq, and the non-working case a queue id of 0.1, or 1.
      The allocation of queue IDs seems to be in mvpp2_port_probe():
      
              if (priv->hw_version == MVPP21)
                      port->first_rxq = port->id * port->nrxqs;
              else
                      port->first_rxq = port->id * priv->max_port_rxqs;
      
      Where:
      
              if (priv->hw_version == MVPP21)
                      priv->max_port_rxqs = 8;
              else
                      priv->max_port_rxqs = 32;
      
      Making the port 0 (eth0 / eth1) have port->first_rxq = 0, and port 1
      (eth2) be 32. It seems the idea is that the first 32 queues belong to
      port 0, the second 32 queues belong to port 1, etc.
      
      mvpp2_rss_port_c2_enable() gets the queue number from its parameter,
      'ctx', which comes from mvpp22_rss_ctx(port, 0). This returns
      port->rss_ctx[0].
      
      mvpp22_rss_context_create() is responsible for allocating that, which
      it does by looking for an unallocated priv->rss_tables[] pointer. This
      table is shared amongst all ports on the CP silicon.
      
      When we write the tables in mvpp22_rss_fill_table(), the RSS table
      entry is defined by:
      
                      u32 sel = MVPP22_RSS_INDEX_TABLE(rss_ctx) |
                                MVPP22_RSS_INDEX_TABLE_ENTRY(i);
      
      where rss_ctx is the context ID (queue number) and i is the index in
      the table.
      
      If we look at what is written:
      
      - The first table to be written has "sel" values of 00000000..0000001f,
        containing values 0..3. This appears to be for eth1. This is table 0,
        RX queue number 0.
      - The second table has "sel" values of 00000100..0000011f, and appears
        to be for eth2.  These contain values 0x20..0x23. This is table 1,
        RX queue number 0.
      - The third table has "sel" values of 00000200..0000021f, and appears
        to be for eth3.  These contain values 0x40..0x43. This is table 2,
        RX queue number 0.
      
      How do queue numbers translate to the RSS table?  There is another
      table - the RXQ2RSS table, indexed by the MVPP22_RSS_INDEX_QUEUE field
      of MVPP22_RSS_INDEX and accessed through the MVPP22_RXQ2RSS_TABLE
      register. Before 895586d5, it was:
      
             mvpp2_write(priv, MVPP22_RSS_INDEX,
                         MVPP22_RSS_INDEX_QUEUE(port->first_rxq));
             mvpp2_write(priv, MVPP22_RXQ2RSS_TABLE,
                         MVPP22_RSS_TABLE_POINTER(port->id));
      
      and after:
      
             mvpp2_write(priv, MVPP22_RSS_INDEX, MVPP22_RSS_INDEX_QUEUE(ctx));
             mvpp2_write(priv, MVPP22_RXQ2RSS_TABLE, MVPP22_RSS_TABLE_POINTER(ctx));
      
      Before the commit, for eth2, that would've contained '32' for the
      index and '1' for the table pointer - mapping queue 32 to table 1.
      Remember that this is queue-high.queue-low of 4.0.
      
      After the commit, we appear to map queue 1 to table 1. That again
      looks fine on the face of it.
      
      Section 9.3.1 of the A8040 manual seems to indicate the reason that the
      queue number is separated. queue-low seems to always come from the
      classifier, whereas queue-high can be from the ingress physical port
      number or the classifier depending on the MVPP2_CLS_SWFWD_PCTRL_REG.
      
      We set the port bit in MVPP2_CLS_SWFWD_PCTRL_REG, meaning that queue-high
      comes from the MVPP2_CLS_SWFWD_P2HQ_REG() register... and this seems to
      be where our bug comes from.
      
      mvpp2_cls_oversize_rxq_set() sets this up as:
      
              mvpp2_write(port->priv, MVPP2_CLS_SWFWD_P2HQ_REG(port->id),
                          (port->first_rxq >> MVPP2_CLS_OVERSIZE_RXQ_LOW_BITS));
      
              val = mvpp2_read(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG);
              val |= MVPP2_CLS_SWFWD_PCTRL_MASK(port->id);
              mvpp2_write(port->priv, MVPP2_CLS_SWFWD_PCTRL_REG, val);
      
      Setting the MVPP2_CLS_SWFWD_PCTRL_MASK bit means that the queue-high
      for eth2 is _always_ 4, so only queues 32 through 39 inclusive are
      available to eth2. Yet, we're trying to tell the classifier to set
      queue-high, which will be ignored, to zero. Hence, the queue-high
      field (MVPP22_CLS_C2_ATTR0_QHIGH()) from the classifier will be
      ignored.
      
      This means we end up directing traffic from eth2 not to queue 1, but
      to queue 33, and then we tell it to look up queue 33 in the RSS table.
      However, RSS table has not been programmed for queue 33, and so it ends
      up (presumably) dropping the packets.
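      The queue arithmetic above can be checked with a short sketch. It
      assumes the layout described in the text: a 5-bit queue-high field
      starting at bit 24 and a 3-bit queue-low field at bits 21:23. It also
      assumes the port override shifts first_rxq right by 3, i.e. that
      MVPP2_CLS_OVERSIZE_RXQ_LOW_BITS is 3. The helper names are
      hypothetical and the code is not taken from the mvpp2 driver.

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define QHIGH_SHIFT 24
      #define QHIGH_MASK  0x1fu	/* 5 bits */
      #define QLOW_SHIFT  21
      #define QLOW_MASK   0x7u	/* 3 bits */
      #define QLOW_BITS   3

      static unsigned int attr0_qhigh(uint32_t attr0)
      {
      	return (attr0 >> QHIGH_SHIFT) & QHIGH_MASK;
      }

      static unsigned int attr0_qlow(uint32_t attr0)
      {
      	return (attr0 >> QLOW_SHIFT) & QLOW_MASK;
      }

      /* RX queue actually used. With the per-port override set,
       * queue-high comes from first_rxq and the classifier's
       * queue-high is ignored; queue-low always comes from attr0. */
      static unsigned int effective_rxq(uint32_t attr0, int port_override,
      				  unsigned int first_rxq)
      {
      	unsigned int qh = port_override ? first_rxq >> QLOW_BITS
      					: attr0_qhigh(attr0);

      	return (qh << QLOW_BITS) | attr0_qlow(attr0);
      }
      ```

      With eth2's first_rxq of 32, the classifier value 00200000 selects
      queue 1 when the override is clear, but queue 33 when it is set.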
      
      It seems that mvpp22_rss_context_create() doesn't take account of the
      fact that the upper 5 bits of the queue ID can't actually be changed
      due to the settings in mvpp2_cls_oversize_rxq_set(), _or_ it seems that
      mvpp2_cls_oversize_rxq_set() has been missed in this commit. Either
      way, these two functions mutually disagree with what queue number
      should be used.
      
      Looking deeper into what mvpp2_cls_oversize_rxq_set() and the MTU
      validation is doing, it seems that MVPP2_CLS_SWFWD_P2HQ_REG() is used
      for over-sized packets attempting to egress through this port. With
      the classifier having had RSS enabled and directing eth2 traffic to
      queue 1, we may still have packets appearing on queue 32 for this port.
      
      However, the only way we may end up with over-sized packets attempting
      to egress through eth2 is if the A8040 forwards frames between its
      ports. From what I can see, we don't support that feature, and the
      kernel restricts the egress packet size to the MTU. In any case, if we
      were to attempt to transmit an oversized packet, we have no support in
      the kernel to deal with that appearing in the port's receive queue.
      
      So, this patch attempts to solve the issue by clearing the
      MVPP2_CLS_SWFWD_PCTRL_MASK() bit, allowing MVPP22_CLS_C2_ATTR0_QHIGH()
      from the classifier to define the queue-high field of the queue number.
      
      My testing seems to confirm my findings above - clearing this bit
      means that if I enable rxhash on eth2, the interface can then pass
      traffic, as we are now directing traffic to RX queue 1 rather than
      queue 33. Traffic still seems to work with rxhash off as well.
      Reported-by: Matteo Croce <mcroce@redhat.com>
      Tested-by: Matteo Croce <mcroce@redhat.com>
      Fixes: 895586d5 ("net: mvpp2: cls: Use RSS contexts to handle RSS tables")
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · d3b968bc
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2020-05-22
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 3 non-merge commits during the last 3 day(s) which contain
      a total of 5 files changed, 69 insertions(+), 11 deletions(-).
      
      The main changes are:
      
      1) Fix to reject mmap()'ing read-only array maps as writable since BPF verifier
         relies on such map content to be frozen, from Andrii Nakryiko.
      
      2) Fix audit breakage in the secid_to_secctx() LSM hook by not using
         call_int_hook(), since this hook is not stackable, from KP Singh.
      
      3) Fix BPF flow dissector program ref leak on netns cleanup, from Jakub Sitnicki.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • felix: Fix initialization of ioremap resources · b4024c9e
      Claudiu Manoil committed
      The caller of devm_ioremap_resource(), either accidentally or by
      wrong assumption, writes derived resource data back to global static
      resource initialization tables that should have been constant: after
      computing the final physical start address, it needlessly saves that
      address in the static tables. This does not affect the first driver
      probe after reboot, but it breaks subsequent driver reloads (i.e.
      driver unbind and bind), because the initialization tables no longer
      hold the correct initial values. The next probe() therefore maps the
      device registers to the wrong physical addresses, causing ARM SError
      asynchronous exceptions.
      This patch fixes all of the above.
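      Below is a hypothetical model of the bug and the fix. In the fixed
      variant, the template table is const and probe works on a local copy,
      so repeated probes always start from the same values. struct res and
      the probe_*() functions are illustrative stand-ins, not the felix
      driver's code.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Hypothetical resource: offsets relative to the switch base. */
      struct res {
      	uint64_t start;
      	uint64_t end;
      };

      /* Fixed: the initialization template is const. */
      static const struct res regs_template = { 0x1000, 0x1fff };

      /* Buggy: a writable "template" that probe updates in place. */
      static struct res regs_template_rw = { 0x1000, 0x1fff };

      static struct res probe_buggy(uint64_t base)
      {
      	regs_template_rw.start += base;	/* corrupts the template */
      	regs_template_rw.end += base;
      	return regs_template_rw;
      }

      static struct res probe_fixed(uint64_t base)
      {
      	struct res r = regs_template;	/* derive a local copy */

      	r.start += base;
      	r.end += base;
      	return r;
      }
      ```

      In this model, a second call to probe_buggy() with the same base
      yields a doubled offset, which mirrors the wrong physical addresses
      seen after an unbind and bind.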
      
      Fixes: 56051948 ("net: dsa: ocelot: add driver for Felix switch family")
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: use untruncated hash in ADD_ADDR HMAC · bd697222
      Todd Malsbary committed
      There is some ambiguity in the RFC as to whether the ADD_ADDR HMAC is
      the rightmost 64 bits of the entire hash or of the leftmost 160 bits
      of the hash.  The intention, as clarified with the author of the RFC,
      is the entire hash.
      
      This change returns the entire hash from
      mptcp_crypto_hmac_sha (instead of only the first 160 bits), and moves
      any truncation/selection operation on the hash to the caller.
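      The two readings of the RFC can be compared with a sketch that takes
      a 64-bit value from a 32-byte HMAC-SHA256 output. It assumes the
      bytes are read in network (big-endian) order. The function names are
      hypothetical and are not the mptcp helpers.

      ```c
      #include <assert.h>
      #include <stdint.h>

      #define SHA256_DIGEST_SIZE 32
      #define TRUNC_160_BYTES 20	/* 160 bits */

      /* Read 8 bytes starting at digest[from] as a big-endian value. */
      static uint64_t be64_at(const uint8_t *digest, int from)
      {
      	uint64_t v = 0;

      	for (int i = from; i < from + 8; i++)
      		v = (v << 8) | digest[i];
      	return v;
      }

      /* Intended reading: rightmost 64 bits of the entire hash. */
      static uint64_t hmac_rightmost64(const uint8_t digest[SHA256_DIGEST_SIZE])
      {
      	return be64_at(digest, SHA256_DIGEST_SIZE - 8);
      }

      /* Other reading: rightmost 64 bits of the leftmost 160 bits. */
      static uint64_t hmac_rightmost64_of_160(const uint8_t digest[SHA256_DIGEST_SIZE])
      {
      	return be64_at(digest, TRUNC_160_BYTES - 8);
      }
      ```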
      
      Fixes: 12555a2d ("mptcp: use rightmost 64 bits in ADD_ADDR HMAC")
      Reviewed-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Todd Malsbary <todd.malsbary@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 May 2020, 12 commits
  4. 21 May 2020, 2 commits
    • net: nlmsg_cancel() if put fails for nhmsg · d69100b8
      Stephen Worley committed
      Fix leftover partial data in the dump output when we fail to
      reserve space for a nexthop group during a larger dump.
      
      If the reservation fails, we now goto nla_put_failure and
      cancel the message with nlmsg_cancel().
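The rollback idea behind nlmsg_cancel() (trim the buffer back to where the current message began) can be illustrated in userspace C. This sketch only mimics the pattern: `struct dumpbuf`, `buf_put()` and `fill_record()` are hypothetical stand-ins for an skb, nla_put() and a fill function, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size dump buffer standing in for an skb. */
struct dumpbuf {
	char data[16];
	size_t len;
};

/* Append n bytes; fail (like nla_put()) if they do not fit. */
static int buf_put(struct dumpbuf *b, const char *src, size_t n)
{
	if (n > sizeof(b->data) - b->len)
		return -1;
	memcpy(b->data + b->len, src, n);
	b->len += n;
	return 0;
}

/* Emit one record (header + payload), rolling back on failure. */
static int fill_record(struct dumpbuf *b, const char *hdr, const char *payload)
{
	size_t mark = b->len; /* plays the role of the nlh from nlmsg_put() */

	if (buf_put(b, hdr, strlen(hdr)))
		goto put_failure;
	if (buf_put(b, payload, strlen(payload)))
		goto put_failure;
	return 0;

put_failure:
	b->len = mark; /* like nlmsg_cancel(): drop the partial record */
	return -1;
}
```

Without the rollback, a record whose header fit but whose payload did not would leave its header in the buffer, which is the kind of remnant the commit describes.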
      
      Reproduce with the following iproute2 commands:
      =====================
      for n in $(seq 1 32)
      do
      	ip link add dummy$n type dummy
      	ip link set dummy$n up
      	ip ro add 1.1.1.$n/32 dev dummy$n
      	ip next add id $n via 1.1.1.$n dev dummy$n
      done
      
      i=100
      
      while [ $i -le 200 ]
      do
      	ip next add id $i group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
      	echo $i
      	i=$((i + 1))
      done
      
      ip next add id 999 group 1/2/3/4/5/6
      
      ip next ls
      
      ========================
      
      Fixes: ab84be7e ("net: Initial nexthop code")
      Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ax25: fix setsockopt(SO_BINDTODEVICE) · 687775ce
      Eric Dumazet committed
      syzbot was able to trigger this trace [1], probably by using
      a zero optlen.
      
      While we are at it, cap optlen to IFNAMSIZ - 1 instead of IFNAMSIZ.
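One way to get both properties (no uninitialized bytes reach the name lookup, and the name is always NUL-terminated) is to zero the buffer and copy at most IFNAMSIZ - 1 bytes. The sketch below is a userspace illustration, not necessarily the patch's exact code: `copy_devname()` and `DEVNAME_SIZE` are hypothetical, `DEVNAME_SIZE` is assumed to equal IFNAMSIZ (16 on Linux), and memcpy() stands in for copy_from_user().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEVNAME_SIZE 16 /* assumed to match IFNAMSIZ on Linux */

/* Hypothetical helper: copy a caller-supplied device name into devname. */
static void copy_devname(char devname[DEVNAME_SIZE], const char *optval,
			 size_t optlen)
{
	assert(optval != NULL || optlen == 0);

	if (optlen > DEVNAME_SIZE - 1)
		optlen = DEVNAME_SIZE - 1; /* leave room for the terminating NUL */

	memset(devname, 0, DEVNAME_SIZE); /* defined contents even if optlen == 0 */
	if (optlen)
		memcpy(devname, optval, optlen); /* stands in for copy_from_user() */
}
```

With a zero optlen the result is an empty, fully initialized string, and an overlong name is truncated to 15 characters plus the NUL.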
      
      [1]
      BUG: KMSAN: uninit-value in strnlen+0xf9/0x170 lib/string.c:569
      CPU: 0 PID: 8807 Comm: syz-executor483 Not tainted 5.7.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
       strnlen+0xf9/0x170 lib/string.c:569
       dev_name_hash net/core/dev.c:207 [inline]
       netdev_name_node_lookup net/core/dev.c:277 [inline]
       __dev_get_by_name+0x75/0x2b0 net/core/dev.c:778
       ax25_setsockopt+0xfa3/0x1170 net/ax25/af_ax25.c:654
       __compat_sys_setsockopt+0x4ed/0x910 net/compat.c:403
       __do_compat_sys_setsockopt net/compat.c:413 [inline]
       __se_compat_sys_setsockopt+0xdd/0x100 net/compat.c:410
       __ia32_compat_sys_setsockopt+0x62/0x80 net/compat.c:410
       do_syscall_32_irqs_on arch/x86/entry/common.c:339 [inline]
       do_fast_syscall_32+0x3bf/0x6d0 arch/x86/entry/common.c:398
       entry_SYSENTER_compat+0x68/0x77 arch/x86/entry/entry_64_compat.S:139
      RIP: 0023:0xf7f57dd9
      Code: 90 e8 0b 00 00 00 f3 90 0f ae e8 eb f9 8d 74 26 00 89 3c 24 c3 90 90 90 90 90 90 90 90 90 90 90 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 eb 0d 90 90 90 90 90 90 90 90 90 90 90 90
      RSP: 002b:00000000ffae8c1c EFLAGS: 00000217 ORIG_RAX: 000000000000016e
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000000101
      RDX: 0000000000000019 RSI: 0000000020000000 RDI: 0000000000000004
      RBP: 0000000000000012 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      
      Local variable ----devname@ax25_setsockopt created at:
       ax25_setsockopt+0xe6/0x1170 net/ax25/af_ax25.c:536
       ax25_setsockopt+0xe6/0x1170 net/ax25/af_ax25.c:536
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>