1. 19 Jan, 2020 2 commits
    • cxgb4: fix Tx multi channel port rate limit · c856e2b6
      Committed by Rahul Lakkireddy
      T6 can support 2 egress traffic management channels per port to
      double the total number of traffic classes that can be configured.
      In this configuration, if the class belongs to the other channel,
      then all the queues must be bound again explicitly to the new class,
      for the rate limit parameters on the other channel to take effect.
      
      So, always explicitly bind all queues to the port rate limit traffic
      class, regardless of the traffic management channel that it belongs
      to. Also, only bind queues to port rate limit traffic class, if all
      the queues don't already belong to an existing different traffic
      class.
      
      Fixes: 4ec4762d ("cxgb4: add TC-MATCHALL classifier egress offload")
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: act_ctinfo: fix memory leak · 09d4f10a
      Committed by Eric Dumazet
      Implement a cleanup method to properly free ci->params
      
      BUG: memory leak
      unreferenced object 0xffff88811746e2c0 (size 64):
        comm "syz-executor617", pid 7106, jiffies 4294943055 (age 14.250s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          c0 34 60 84 ff ff ff ff 00 00 00 00 00 00 00 00  .4`.............
        backtrace:
          [<0000000015aa236f>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
          [<0000000015aa236f>] slab_post_alloc_hook mm/slab.h:586 [inline]
          [<0000000015aa236f>] slab_alloc mm/slab.c:3320 [inline]
          [<0000000015aa236f>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549
          [<000000002c946bd1>] kmalloc include/linux/slab.h:556 [inline]
          [<000000002c946bd1>] kzalloc include/linux/slab.h:670 [inline]
          [<000000002c946bd1>] tcf_ctinfo_init+0x21a/0x530 net/sched/act_ctinfo.c:236
          [<0000000086952cca>] tcf_action_init_1+0x400/0x5b0 net/sched/act_api.c:944
          [<000000005ab29bf8>] tcf_action_init+0x135/0x1c0 net/sched/act_api.c:1000
          [<00000000392f56f9>] tcf_action_add+0x9a/0x200 net/sched/act_api.c:1410
          [<0000000088f3c5dd>] tc_ctl_action+0x14d/0x1bb net/sched/act_api.c:1465
          [<000000006b39d986>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424
          [<00000000fd6ecace>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
          [<0000000047493d02>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
          [<00000000bdcf8286>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
          [<00000000bdcf8286>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
          [<00000000fc5b92d9>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
          [<00000000da84d076>] sock_sendmsg_nosec net/socket.c:639 [inline]
          [<00000000da84d076>] sock_sendmsg+0x54/0x70 net/socket.c:659
          [<0000000042fb2eee>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
          [<000000008f23f67e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
          [<00000000d838e4f6>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
          [<00000000289a9cb1>] __do_sys_sendmsg net/socket.c:2426 [inline]
          [<00000000289a9cb1>] __se_sys_sendmsg net/socket.c:2424 [inline]
          [<00000000289a9cb1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424
      
      Fixes: 24ec483c ("net: sched: Introduce act_ctinfo action")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 18 Jan, 2020 4 commits
  3. 17 Jan, 2020 8 commits
    • net: systemport: Fixed queue mapping in internal ring map · 5a9ef194
      Committed by Florian Fainelli
      We would not be transmitting using the correct SYSTEMPORT transmit queue
      during ndo_select_queue() which looks up the internal TX ring map
      because while establishing the mapping we would be off by 4, so for
      instance, when we populate switch port mappings we would be doing:
      
      switch port 0, queue 0 -> ring index #0
      switch port 0, queue 1 -> ring index #1
      ...
      switch port 0, queue 3 -> ring index #3
      switch port 1, queue 0 -> ring index #8 (4 + 4 * 1)
      ...
      
      instead of using ring index #4. This would cause our ndo_select_queue()
      to use the fallback queue mechanism which would pick up an incorrect
      ring for that switch port. Fix this by using the correct switch queue
      number instead of SYSTEMPORT queue number.
      
      Fixes: 25c44070 ("net: systemport: Simplify queue mapping logic")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
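The off-by-4 in the systemport mapping above can be reproduced with a small user-space sketch. It assumes four queues per switch port, as in the example mapping; `QUEUES_PER_PORT`, `ring_index()` and `buggy_ring_index()` are hypothetical names for illustration and are not taken from the driver.

```c
#include <assert.h>

/* Assumption taken from the example mapping: 4 queues per switch port. */
#define QUEUES_PER_PORT 4u

/* Intended mapping: index by the switch queue number (0..3) within the port. */
static unsigned int ring_index(unsigned int port, unsigned int switch_queue)
{
    assert(switch_queue < QUEUES_PER_PORT);
    return switch_queue + port * QUEUES_PER_PORT;
}

/*
 * Faulty mapping: the SYSTEMPORT queue number is a global counter that
 * already includes the per-port offset, so adding the offset again
 * skips ahead by one port's worth of rings.
 */
static unsigned int buggy_ring_index(unsigned int port,
                                     unsigned int systemport_queue)
{
    return systemport_queue + port * QUEUES_PER_PORT;
}
```

With these definitions, switch port 1, queue 0 maps to ring index 4 in the intended form, while the faulty form (called with SYSTEMPORT queue 4) yields 8, matching the figures in the commit message.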
    • net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec · 8f1880cb
      Committed by Florian Fainelli
      With the implementation of the system reset controller we lost a setting
      that is currently applied by the bootloader and which configures the IMP
      port for 2Gb/sec; the default is 1Gb/sec. This is needed given the
      number of ports and applications we expect to run, so bring back that
      setting.
      
      Fixes: 01b0ac07589e ("net: dsa: bcm_sf2: Add support for optional reset controller line")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: Don't error out on disabled ports with no phy-mode · 27afe0d3
      Committed by Vladimir Oltean
      The sja1105_parse_ports_node function was tested only on device trees
      where all ports were enabled. Fix this check so that the driver
      continues to probe only with the ports where status is not "disabled",
      as expected.
      
      Fixes: 8aa9ebcc ("net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: dp83867: Set FORCE_LINK_GOOD to default after reset · 86ffe920
      Committed by Michael Grzeschik
      According to the datasheet, this bit should be 0 (normal operation) by
      default. With the FORCE_LINK_GOOD bit set, it is not possible to get a
      link. This patch sets FORCE_LINK_GOOD to the default value after
      resetting the phy.
      Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: fix soft lockup when there is not enough memory · 49edd6a2
      Committed by Yonglong Liu
      When there is not enough memory and napi_alloc_skb() returns NULL,
      the HNS driver prints an error message and then tries again. If
      memory stays short for a while, the flood of error messages and the
      retry operation cause a soft lockup.
      
      When napi_alloc_skb() returns NULL because of no memory, we already
      get a warn_alloc() call trace, so this patch deletes the error
      message. We already use polling mode to handle irq, but the
      retry operation renders the polling weight inactive, so this
      patch just returns budget when the rx is not completed, to avoid
      a dead loop.
      
      Fixes: 36eedfde ("net: hns: Optimize hns_nic_common_poll for better performance")
      Fixes: b5996f11 ("net: add Hisilicon Network Subsystem basic ethernet support")
      Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key() · 53d37497
      Committed by Cong Wang
      syzbot reported some bogus lockdep warnings, for example bad unlock
      balance in sch_direct_xmit(). They are due to a race condition between
      slow path and fast path, that is qdisc_xmit_lock_key gets re-registered
      in netdev_update_lockdep_key() on slow path, while we could still
      acquire the queue->_xmit_lock on fast path in this small window:
      
      CPU A						CPU B
      						__netif_tx_lock();
      lockdep_unregister_key(qdisc_xmit_lock_key);
      						__netif_tx_unlock();
      lockdep_register_key(qdisc_xmit_lock_key);
      
      In fact, unlike the addr_list_lock which has to be reordered when
      the master/slave device relationship changes, queue->_xmit_lock is
      only acquired on fast path and only when NETIF_F_LLTX is not set,
      so there is likely no nested locking for it.
      
      Therefore, we can just get rid of re-registration of
      qdisc_xmit_lock_key.
      
      Reported-by: syzbot+4ec99438ed7450da6272@syzkaller.appspotmail.com
      Fixes: ab92d68f ("net: core: add generic lockdep keys")
      Cc: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_ife: initalize ife->metalist earlier · 44c23d71
      Committed by Eric Dumazet
      It seems better to init ife->metalist earlier in tcf_ife_init()
      to avoid the following crash :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline]
      RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431
      Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b
      RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09
      RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000
      RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e
      R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8
      R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000
      FS:  0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119
       __tcf_action_put+0xfa/0x130 net/sched/act_api.c:135
       __tcf_idr_release net/sched/act_api.c:165 [inline]
       __tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145
       tcf_idr_release include/net/act_api.h:171 [inline]
       tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616
       tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944
       tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000
       tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410
       tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465
       rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424
       netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
       netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
       netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328
       netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:639 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:659
       ____sys_sendmsg+0x753/0x880 net/socket.c:2330
       ___sys_sendmsg+0x100/0x170 net/socket.c:2384
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2417
       __do_sys_sendmsg net/socket.c:2426 [inline]
       __se_sys_sendmsg net/socket.c:2424 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 11a94d7f ("net/sched: act_ife: validate the control action inside init()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
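The ordering issue behind the act_ife crash can be illustrated in user space: if an error path can run the cleanup routine, the list head has to be valid before that path is reachable. The sketch below is an assumption-laden illustration, not the kernel code; `struct ife_sketch`, `cleanup_count()` and `ife_init_sketch()` are hypothetical names, and the list type is a minimal stand-in for the kernel's `struct list_head`.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Hypothetical action state; only the list matters for this sketch. */
struct ife_sketch {
    struct list_head metalist;
};

/* Cleanup walks the list, so it dereferences metalist.next. */
static int cleanup_count(const struct ife_sketch *ife)
{
    const struct list_head *pos;
    int n = 0;

    /* An uninitialised (zeroed) head would fault in the loop below. */
    assert(ife->metalist.next != NULL);
    for (pos = ife->metalist.next; pos != &ife->metalist; pos = pos->next)
        n++;
    return n;
}

/*
 * Initialise the list before anything can fail, so that an error path
 * which runs cleanup sees a valid empty list.
 * Returns -1 on the simulated validation failure, 0 otherwise.
 */
static int ife_init_sketch(struct ife_sketch *ife, int fail_validation)
{
    init_list_head(&ife->metalist);      /* done first */

    if (fail_validation) {
        (void)cleanup_count(ife);        /* safe: empty list, not NULL */
        return -1;
    }
    return 0;
}
```

Moving the `init_list_head()` call below the failure branch would make the cleanup walk start from a NULL pointer, which is the shape of the crash in the trace.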
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · a72b6a1e
      Committed by David S. Miller
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) Fix use-after-free in ipset bitmap destroy path, from Cong Wang.
      
      2) Missing init netns in entry cleanup path of arp_tables,
         from Florian Westphal.
      
      3) Fix WARN_ON in set destroy path due to missing cleanup on
         transaction error.
      
      4) Incorrect netlink sanity check in tunnel, from Florian Westphal.
      
      5) Missing sanity check for erspan version netlink attribute, also
         from Florian.
      
      6) Remove WARN in nft_request_module() that can be triggered from
         userspace, from Florian Westphal.
      
      7) Memleak in NFTA_HOOK_DEVS netlink parser, from Dan Carpenter.
      
      8) List poison from commit path for flowtables that are added and
         deleted in the same batch, from Florian Westphal.
      
      9) Fix NAT ICMP packet corruption, from Eyal Birger.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 16 Jan, 2020 26 commits
    • netfilter: nat: fix ICMP header corruption on ICMP errors · 61177e91
      Committed by Eyal Birger
      Commit 8303b7e8 ("netfilter: nat: fix spurious connection timeouts")
      made nf_nat_icmp_reply_translation() use icmp_manip_pkt() as the l4
      manipulation function for the outer packet on ICMP errors.
      
      However, icmp_manip_pkt() assumes the packet has an 'id' field which
      is not correct for all types of ICMP messages.
      
      This is not correct for ICMP error packets, and leads to bogus bytes
      being written to the ICMP header, which can be wrongfully regarded as
      'length' bytes by RFC 4884 compliant receivers.
      
      Fix by assigning the 'id' field only for ICMP messages that have this
      semantic.
      Reported-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Fixes: 8303b7e8 ("netfilter: nat: fix spurious connection timeouts")
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
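The idea of assigning the 'id' field only for ICMP messages that carry one can be sketched as a type check before the write. This is a stand-alone illustration, not the netfilter code: the `SK_ICMP_*` constants are defined locally from the ICMP type numbers in RFC 792 and RFC 950, and `icmp_type_has_id()`, `struct icmp_sketch` and `manip_id()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* ICMP type numbers (RFC 792, RFC 950), defined locally for the sketch. */
enum {
    SK_ICMP_ECHOREPLY      = 0,
    SK_ICMP_DEST_UNREACH   = 3,
    SK_ICMP_ECHO           = 8,
    SK_ICMP_TIME_EXCEEDED  = 11,
    SK_ICMP_TIMESTAMP      = 13,
    SK_ICMP_TIMESTAMPREPLY = 14,
    SK_ICMP_INFO_REQUEST   = 15,
    SK_ICMP_INFO_REPLY     = 16,
    SK_ICMP_ADDRESS        = 17,
    SK_ICMP_ADDRESSREPLY   = 18,
};

/* True for query-style messages whose header carries an identifier. */
static bool icmp_type_has_id(uint8_t type)
{
    switch (type) {
    case SK_ICMP_ECHO:
    case SK_ICMP_ECHOREPLY:
    case SK_ICMP_TIMESTAMP:
    case SK_ICMP_TIMESTAMPREPLY:
    case SK_ICMP_INFO_REQUEST:
    case SK_ICMP_INFO_REPLY:
    case SK_ICMP_ADDRESS:
    case SK_ICMP_ADDRESSREPLY:
        return true;
    default:
        return false;
    }
}

/* Simplified ICMP header layout used only by this sketch. */
struct icmp_sketch {
    uint8_t  type;
    uint8_t  code;
    uint16_t checksum;
    uint16_t id;   /* for error messages these bytes have another meaning */
    uint16_t seq;
};

/* Rewrite the id only when the type defines one; report whether we did. */
static bool manip_id(struct icmp_sketch *hdr, uint16_t new_id)
{
    if (!icmp_type_has_id(hdr->type))
        return false;
    hdr->id = new_id;
    return true;
}
```

For an error message such as destination unreachable, `manip_id()` leaves the header untouched, so the bytes that an RFC 4884 receiver may read as a length are not overwritten.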
    • net: wan: lapbether.c: Use built-in RCU list checking · 93ad0f96
      Committed by Madhuparna Bhowmik
      The only callers of the function lapbeth_get_x25_dev()
      are lapbeth_rcv() and lapbeth_device_event().
      
      lapbeth_rcv() uses rcu_read_lock() whereas lapbeth_device_event()
      is called with RTNL held (as mentioned in the comments).
      
      Therefore, pass lockdep_rtnl_is_held() as cond argument in
      list_for_each_entry_rcu().
      Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik04@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nf_tables: fix flowtable list del corruption · 335178d5
      Committed by Florian Westphal
      syzbot reported the following crash:
      
        list_del corruption, ffff88808c9bb000->prev is LIST_POISON2 (dead000000000122)
        [..]
        Call Trace:
         __list_del_entry include/linux/list.h:131 [inline]
         list_del_rcu include/linux/rculist.h:148 [inline]
         nf_tables_commit+0x1068/0x3b30 net/netfilter/nf_tables_api.c:7183
         [..]
      
      The commit transaction list has:
      
      NFT_MSG_NEWTABLE
      NFT_MSG_NEWFLOWTABLE
      NFT_MSG_DELFLOWTABLE
      NFT_MSG_DELTABLE
      
      A missing generation check during DELTABLE processing causes it to queue
      the DELFLOWTABLE operation a second time, so we corrupt the list here:
      
        case NFT_MSG_DELFLOWTABLE:
           list_del_rcu(&nft_trans_flowtable(trans)->list);
           nf_tables_flowtable_notify(&trans->ctx,
      
      because we have two different DELFLOWTABLE transactions for the same
      flowtable.  We then call list_del_rcu() twice for the same flowtable->list.
      
      The object handling seems to suffer from the same bug so add a generation
      check too and only queue delete transactions for flowtables/objects that
      are still active in the next generation.
      
      Reported-by: syzbot+37a6804945a3a13b1572@syzkaller.appspotmail.com
      Fixes: 3b49e2e9 ("netfilter: nf_tables: add flow table netlink frontend")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
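The generation check described in the flowtable fix can be sketched as "queue a delete only while the object is still active in the next generation". The code below is a simplified user-space model under that assumption; `struct flowtable_sketch` and `queue_delete()` are hypothetical names, and the real nf_tables code tracks generations with a bitmask rather than a single flag.

```c
#include <stdbool.h>

/* Simplified model of an object taking part in a transaction batch. */
struct flowtable_sketch {
    bool active_next;   /* still active in the next generation? */
    int  del_queued;    /* number of delete transactions queued */
};

/*
 * Queue a delete transaction only if the object is still active in the
 * next generation. Queuing deactivates it, so a later table delete in
 * the same batch skips the object instead of queuing a second delete.
 */
static bool queue_delete(struct flowtable_sketch *ft)
{
    if (!ft->active_next)
        return false;
    ft->active_next = false;
    ft->del_queued++;
    return true;
}
```

In the batch from the report, the explicit flowtable delete would queue the first transaction and the table delete would then find the flowtable inactive, so only one list removal happens at commit time.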
    • netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks() · cd77e75b
      Committed by Dan Carpenter
      Syzbot detected a leak in nf_tables_parse_netdev_hooks().  If the hook
      already exists, then the error handling doesn't free the newest "hook".
      
      Reported-by: syzbot+f9d4095107fc8749c69c@syzkaller.appspotmail.com
      Fixes: b75a3e83 ("netfilter: nf_tables: allow netdevice to be used only once per flowtable")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: remove WARN and add NLA_STRING upper limits · 9332d27d
      Committed by Florian Westphal
      This WARN can trigger because some of the names fed to the module
      autoload function can be of arbitrary length.
      
      Remove the WARN and add limits for all NLA_STRING attributes.
      
      Reported-by: syzbot+0e63ae76d117ae1c3a01@syzkaller.appspotmail.com
      Fixes: 452238e8 ("netfilter: nf_tables: add and use helper for module autoload")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_tunnel: ERSPAN_VERSION must not be null · 9ec22d7c
      Committed by Florian Westphal
      Fixes: af308b94 ("netfilter: nf_tables: add tunnel support")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_tunnel: fix null-attribute check · 1c702bf9
      Committed by Florian Westphal
      else we get null deref when one of the attributes is missing, both
      must be non-null.
      
      Reported-by: syzbot+76d0b80493ac881ff77b@syzkaller.appspotmail.com
      Fixes: aaecfdb5 ("netfilter: nf_tables: match on tunnel metadata")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
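The "both must be non-null" rule from the nft_tunnel fix amounts to rejecting the request as soon as either attribute is missing. A minimal sketch, with `check_required_attrs()` as a hypothetical helper and plain pointers standing in for netlink attributes:

```c
#include <errno.h>
#include <stddef.h>

/*
 * Both attributes are required. Reject when either one is missing; a
 * check that only rejects when both are absent would let a single
 * missing attribute through to code that dereferences it.
 */
static int check_required_attrs(const void *attr_a, const void *attr_b)
{
    if (attr_a == NULL || attr_b == NULL)
        return -EINVAL;
    return 0;
}
```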
    • netfilter: nf_tables: store transaction list locally while requesting module · ec7470b8
      Committed by Pablo Neira Ayuso
      This patch fixes a WARN_ON in nft_set_destroy() due to missing
      set reference count drop from the preparation phase. This is triggered
      by the module autoload path. Do not exercise the abort path from
      nft_request_module() while preparation phase cleaning up is still
      pending.
      
       WARNING: CPU: 3 PID: 3456 at net/netfilter/nf_tables_api.c:3740 nft_set_destroy+0x45/0x50 [nf_tables]
       [...]
       CPU: 3 PID: 3456 Comm: nft Not tainted 5.4.6-arch3-1 #1
       RIP: 0010:nft_set_destroy+0x45/0x50 [nf_tables]
       Code: e8 30 eb 83 c6 48 8b 85 80 00 00 00 48 8b b8 90 00 00 00 e8 dd 6b d7 c5 48 8b 7d 30 e8 24 dd eb c5 48 89 ef 5d e9 6b c6 e5 c5 <0f> 0b c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 7f 10 e9 52
       RSP: 0018:ffffac4f43e53700 EFLAGS: 00010202
       RAX: 0000000000000001 RBX: ffff99d63a154d80 RCX: 0000000001f88e03
       RDX: 0000000001f88c03 RSI: ffff99d6560ef0c0 RDI: ffff99d63a101200
       RBP: ffff99d617721de0 R08: 0000000000000000 R09: 0000000000000318
       R10: 00000000f0000000 R11: 0000000000000001 R12: ffffffff880fabf0
       R13: dead000000000122 R14: dead000000000100 R15: ffff99d63a154d80
       FS:  00007ff3dbd5b740(0000) GS:ffff99d6560c0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00001cb5de6a9000 CR3: 000000016eb6a004 CR4: 00000000001606e0
       Call Trace:
        __nf_tables_abort+0x3e3/0x6d0 [nf_tables]
        nft_request_module+0x6f/0x110 [nf_tables]
        nft_expr_type_request_module+0x28/0x50 [nf_tables]
        nf_tables_expr_parse+0x198/0x1f0 [nf_tables]
        nft_expr_init+0x3b/0xf0 [nf_tables]
        nft_dynset_init+0x1e2/0x410 [nf_tables]
        nf_tables_newrule+0x30a/0x930 [nf_tables]
        nfnetlink_rcv_batch+0x2a0/0x640 [nfnetlink]
        nfnetlink_rcv+0x125/0x171 [nfnetlink]
        netlink_unicast+0x179/0x210
        netlink_sendmsg+0x208/0x3d0
        sock_sendmsg+0x5e/0x60
        ____sys_sendmsg+0x21b/0x290
      
      Update comment on the code to describe the new behaviour.
      Reported-by: Marco Oliverio <marco.oliverio@tanaza.com>
      Fixes: 452238e8 ("netfilter: nf_tables: add and use helper for module autoload")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: dsa: tag_qca: fix doubled Tx statistics · bd5874da
      Committed by Alexander Lobakin
      DSA subsystem takes care of netdev statistics since commit 4ed70ce9
      ("net: dsa: Refactor transmit path to eliminate duplication"), so
      any accounting inside tagger callbacks is redundant and can lead to
      messing up the stats.
      This bug is present in Qualcomm tagger since day 0.
      
      Fixes: cafdc45c ("net-next: dsa: add Qualcomm tag RX/TX handler")
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_gswip: fix typo in tagger name · ad322054
      Committed by Alexander Lobakin
      The correct name is GSWIP (Gigabit Switch IP). Typo was introduced in
      875138f8 ("dsa: Move tagger name into its ops structure") while
      moving tagger names to their structures.
      
      Fixes: 875138f8 ("dsa: Move tagger name into its ops structure")
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ethernet: ave: Avoid lockdep warning · 82d5d6a6
      Committed by Kunihiko Hayashi
      When building with PROVE_LOCKING=y, lockdep shows the following
      dump message.
      
          INFO: trying to register non-static key.
          the code is fine but needs lockdep annotation.
          turning off the locking correctness validator.
           ...
      
      Calling device_set_wakeup_enable() directly causes this issue,
      and it isn't necessary for initialization, so this patch creates
      an internal function __ave_ethtool_set_wol() and uses it instead
      in ave_init() and ave_resume().
      
      Fixes: 7200f2e3 ("net: ethernet: ave: Set initial wol state to disabled")
      Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: pad the short frame before sending to the hardware · 36c67349
      Committed by Yunsheng Lin
      The hardware can not handle short frames below or equal to 32
      bytes according to the hardware user manual, and it will trigger
      a RAS error when the frame's length is below 33 bytes.
      
      This patch pads the SKB when skb->len is below 33 bytes before
      sending it to hardware.
      
      Fixes: 76ad4f0e ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
      Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
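The padding rule from the hns3 fix (frames shorter than 33 bytes are padded before transmission) can be sketched on a plain byte buffer. This is a user-space illustration only; `MIN_TX_LEN` and `pad_short_frame()` are hypothetical names, and the driver itself operates on an SKB rather than a raw array.

```c
#include <stddef.h>
#include <string.h>

/* Minimum frame length the hardware accepts, per the commit message. */
#define MIN_TX_LEN 33u

/*
 * Zero-pad a frame up to MIN_TX_LEN bytes.
 * Returns the resulting length, or 0 if the buffer cannot hold the
 * padded frame.
 */
static size_t pad_short_frame(unsigned char *buf, size_t len, size_t cap)
{
    if (len >= MIN_TX_LEN)
        return len;                      /* long enough already */
    if (cap < MIN_TX_LEN)
        return 0;                        /* no room to pad */
    memset(buf + len, 0, MIN_TX_LEN - len);
    return MIN_TX_LEN;
}
```

Frames of 33 bytes or more pass through unchanged, so the extra cost only applies to the short frames that would otherwise trigger the RAS error.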
    • macvlan: use skb_reset_mac_header() in macvlan_queue_xmit() · 1712b2ff
      Committed by Eric Dumazet
      I missed the fact that macvlan_broadcast() can be used both
      in RX and TX.
      
      skb_eth_hdr() only makes sense in TX paths, so we cannot
      use it blindly in macvlan_broadcast().
      
      Fixes: 96cc4b69 ("macvlan: do not assume mac_header is set in macvlan_broadcast()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Jurgen Van Ham <juvanham@gmail.com>
      Tested-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 3981f955
      Committed by David S. Miller
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2020-01-15
      
      The following pull-request contains BPF updates for your *net* tree.
      
      We've added 12 non-merge commits during the last 9 day(s) which contain
      a total of 13 files changed, 95 insertions(+), 43 deletions(-).
      
      The main changes are:
      
      1) Fix refcount leak for TCP time wait and request sockets for socket lookup
         related BPF helpers, from Lorenz Bauer.
      
      2) Fix wrong verification of ARSH instruction under ALU32, from Daniel Borkmann.
      
      3) Batch of several sockmap and related TLS fixes found while operating
         more complex BPF programs with Cilium and OpenSSL, from John Fastabend.
      
      4) Fix sockmap to read psock's ingress_msg queue before regular sk_receive_queue()
         to avoid purging data upon teardown, from Lingpeng Chen.
      
      5) Fix printing incorrect pointer in bpftool's btf_dump_ptr() in order to properly
         dump a BPF map's value with BTF, from Martin KaFai Lau.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf-sockmap-tls-fixes' · 85ddd9c3
      Committed by Daniel Borkmann
      John Fastabend says:
      
      ====================
      To date our usage of sockmap/tls has been fairly simple, the BPF programs
      did only well-defined pop, push, pull and apply/cork operations.
      
      Now that we started to push more complex programs into sockmap we uncovered
      a series of issues addressed here. Further, an OpenSSL 3.0 version should be
      released soon with kTLS support, so it's important to get any remaining
      issues on BPF and kTLS support resolved.
      
      Additionally, I have a patch under development to allow sockmap to be
      enabled/disabled at runtime for Cilium endpoints. This allows us to stress
      the map insert/delete with kTLS more than previously, where Cilium only
      added the socket to the map when it entered ESTABLISHED state and never
      touched it from the control path side again, relying on the socket's own
      close() hook to remove it.
      
      To test I have a set of test cases in test_sockmap.c that expose these
      issues. Once we get fixes here merged and in bpf-next I'll submit the
      tests to bpf-next tree to ensure we don't regress again. Also I've run
      these patches in the Cilium CI with OpenSSL (master branch) this will
      run tools such as netperf, ab, wrk2, curl, etc. to get a broad set of
      testing.
      
      I'm aware of two more issues that we are working to resolve in another
      couple (probably two) patches. First, we see an auth tag corruption in
      kTLS when sending small 1-byte chunks under stress. I've not pinned this
      down yet, but since it shows up under 1-byte stress tests, my guess is
      that some error path is being triggered. Second, we need to ensure BPF RX
      programs are not skipped when kTLS ULP is loaded. This breaks some of the
      sockmap selftests when running with kTLS. I'll send a follow up for this.
      
      v2: I dropped a patch that added !0 size check in tls_push_record
          this originated from a panic I caught a while ago with a trace
          in the crypto stack. But I cannot reproduce it anymore, so I will
          dig into that and send another patch later if needed. Anyway,
          after a bit of thought it would be nicer if tls/crypto/bpf didn't
          require special case handling for the !0 size.
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Sockmap/tls, fix pop data with SK_DROP return code · 7361d448
      Committed by John Fastabend
      When user returns SK_DROP we need to reset the number of copied bytes
      to indicate to the user the bytes were dropped and not sent. If we
      don't reset the copied arg sendmsg will return as if those bytes were
      copied giving the user a positive return value.
      
      This works as expected today except in the case where the user also
      pops bytes. In the pop case the sg.size is reduced but we don't correctly
      account for this when copied bytes is reset. The popped bytes are not
      accounted for and we return a small positive value potentially confusing
      the user.
      
      This happens because of a typo: we do the wrong comparison when
      accounting for pop bytes. In this fix, notice that the if/else is not
      needed and that we have a similar problem if we push data, except it's
      not visible to the user: if delta is larger than the sg.size we return a
      negative value, so it appears as an error regardless.
      
      Fixes: 7246d8ed ("bpf: helper to pop data from messages")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-9-john.fastabend@gmail.com
    • bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chaining · 9aaaa568
      Committed by John Fastabend
      It's possible through a set of push, pop, apply helper calls to construct
      a skmsg, which is just a ring of scatterlist elements, with the start
      value larger than the end value. For example,
      
            end       start
        |_0_|_1_| ... |_n_|_n+1_|
      
      Where end points at 1 and start points at n, so that the valid elements
      are the set {n, n+1, 0, 1}.
      
      Currently, because we don't build the correct chain only {n, n+1} will
      be sent. This adds a check and sg_chain call to correctly submit the
      above to the crypto and tls send path.
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-8-john.fastabend@gmail.com
    • J
      bpf: Sockmap/tls, tls_sw can create a plaintext buf > encrypt buf · d468e477
      John Fastabend committed
      It is possible to build a plaintext buffer using the push helper that is
      larger than the allocated encrypt buffer. When this record is pushed to the
      crypto layers this can result in a NULL pointer dereference because the
      crypto API expects the encrypt buffer to be large enough to fit the
      plaintext buffer. Kernel splat below.
      
      To resolve this, catch the cases where this can happen and split the buffer
      into two records to send individually. Unfortunately, there is still one
      case to handle where the split creates a zero sized buffer. In this case we
      merge the buffers and unmark the split. This happens when apply is zero and
      the user pushed data beyond the encrypt buffer. This fixes the original case
      as well because the split allocated an encrypt buffer larger than the
      plaintext buffer and the merge simply moves the pointers around so we now
      have a reference to the new (larger) encrypt buffer.
      
      Perhaps it's not ideal but it seems the best solution for a fixes branch
      and avoids handling these two cases, (a) apply that needs split and (b)
      non apply case. These are edge cases anyway so optimizing them does not
      seem necessary unless someone wants to later in next branches.
      
      [  306.719107] BUG: kernel NULL pointer dereference, address: 0000000000000008
      [...]
      [  306.747260] RIP: 0010:scatterwalk_copychunks+0x12f/0x1b0
      [...]
      [  306.770350] Call Trace:
      [  306.770956]  scatterwalk_map_and_copy+0x6c/0x80
      [  306.772026]  gcm_enc_copy_hash+0x4b/0x50
      [  306.772925]  gcm_hash_crypt_remain_continue+0xef/0x110
      [  306.774138]  gcm_hash_crypt_continue+0xa1/0xb0
      [  306.775103]  ? gcm_hash_crypt_continue+0xa1/0xb0
      [  306.776103]  gcm_hash_assoc_remain_continue+0x94/0xa0
      [  306.777170]  gcm_hash_assoc_continue+0x9d/0xb0
      [  306.778239]  gcm_hash_init_continue+0x8f/0xa0
      [  306.779121]  gcm_hash+0x73/0x80
      [  306.779762]  gcm_encrypt_continue+0x6d/0x80
      [  306.780582]  crypto_gcm_encrypt+0xcb/0xe0
      [  306.781474]  crypto_aead_encrypt+0x1f/0x30
      [  306.782353]  tls_push_record+0x3b9/0xb20 [tls]
      [  306.783314]  ? sk_psock_msg_verdict+0x199/0x300
      [  306.784287]  bpf_exec_tx_verdict+0x3f2/0x680 [tls]
      [  306.785357]  tls_sw_sendmsg+0x4a3/0x6a0 [tls]
      
      test_sockmap test signature to trigger bug,
      
      [TEST]: (1, 1, 1, sendmsg, pass,redir,start 1,end 2,pop (1,2),ktls,):
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-7-john.fastabend@gmail.com
      d468e477
    • J
      bpf: Sockmap/tls, msg_push_data may leave end mark in place · cf21e9ba
      John Fastabend committed
      Leaving an incorrect end mark in place when passing to the crypto
      layer will cause the crypto layer to stop processing data before
      all data is encrypted. To fix, clear the end mark on push
      data instead of expecting users of the helper to clear the
      mark value after the fact.
      
      This happens when we push data into the middle of a skmsg and
      have room for it so we don't do a set of copies that already
      clear the end flag.
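
A rough sketch of the effect of a stale end mark, assuming a simplified model in which one entry is placed after the entry that currently carries the mark. `struct elem`, `bytes_seen`, and `push_elem` are hypothetical names and do not correspond to kernel functions. The consumer stops at the first mark it meets, so data behind a stale mark is never processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a scatterlist entry. */
struct elem {
    unsigned int len;
    bool end; /* models the scatterlist end mark */
};

/* A consumer that stops at the first end mark, as a scatterlist walker
 * would. Returns the number of bytes it actually processed. */
static size_t bytes_seen(const struct elem *e, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        total += e[i].len;
        if (e[i].end)
            break;
    }
    return total;
}

/* Place one entry after the current last one and return the new count.
 * With clear_stale set, the mark on the old last entry is removed first,
 * which models clearing the mark at push time. */
static size_t push_elem(struct elem *e, size_t n, unsigned int len,
                        bool clear_stale)
{
    assert(n > 0);
    if (clear_stale)
        e[n - 1].end = false;
    e[n].len = len;
    e[n].end = true;
    return n + 1;
}
```

Without clearing, the consumer sees only the bytes up to the old mark. With clearing, it also sees the pushed entry.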
      
      Fixes: 6fff607e ("bpf: sk_msg program helper bpf_msg_push_data")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-6-john.fastabend@gmail.com
      cf21e9ba
    • J
      bpf: Sockmap, skmsg helper overestimates push, pull, and pop bounds · 6562e29c
      John Fastabend committed
      In the push, pull, and pop helpers operating on skmsg objects to make
      data writable or insert/remove data we use this bounds check to ensure
      specified data is valid,
      
       /* Bounds checks: start and pop must be inside message */
       if (start >= offset + l || last >= msg->sg.size)
           return -EINVAL;
      
      The problem here is that offset has already included the length of the
      current element, the 'l' above. So start could be past the end of
      the scatterlist element in the case where start also points into an
      offset on the last skmsg element.
      
      To fix, do the accounting slightly differently by adding the length of
      the previous entry to offset at the start of the iteration. And
      ensure it's initialized to zero so that the first iteration does
      nothing.
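
The accounting change can be sketched on a plain array of element lengths. This is a simplified model under stated assumptions, not the helper code itself: `find_elem_overestimating` and `find_elem_fixed` are hypothetical names, and the array stands in for the skmsg elements. Both return the index of the element containing `start`, or `-EINVAL` when the final bounds check rejects it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Old accounting: when the loop runs off the end, offset already includes
 * the last element, so offset + len counts that element twice and a start
 * past the end of the data slips through the check. */
static int find_elem_overestimating(const unsigned int *lens, size_t n,
                                    unsigned int start)
{
    unsigned int offset = 0, len = 0;
    size_t i = 0;

    assert(n > 0);
    do {
        len = lens[i];
        if (start < offset + len)
            break;
        offset += len;
        i++;
    } while (i != n);

    if (start >= offset + len)
        return -EINVAL;
    return (int)i; /* may be n, which is out of range */
}

/* New accounting: add the previous entry's length at the top of each
 * iteration. len starts at zero, so the first pass adds nothing. */
static int find_elem_fixed(const unsigned int *lens, size_t n,
                           unsigned int start)
{
    unsigned int offset = 0, len = 0;
    size_t i = 0;

    assert(n > 0);
    do {
        offset += len;
        len = lens[i];
        if (start < offset + len)
            break;
        i++;
    } while (i != n);

    if (start >= offset + len)
        return -EINVAL;
    return (int)i;
}
```

For two elements of 4 bytes each, a `start` of 9 lies past the 8 bytes of data. The old accounting compares it against 12 and accepts it, while the new accounting compares it against 8 and rejects it.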
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Fixes: 6fff607e ("bpf: sk_msg program helper bpf_msg_push_data")
      Fixes: 7246d8ed ("bpf: helper to pop data from messages")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-5-john.fastabend@gmail.com
      6562e29c
    • J
      bpf: Sockmap/tls, push write_space updates through ulp updates · 33bfe20d
      John Fastabend committed
      When a sockmap sock with TLS enabled is removed we clean up bpf/psock state
      and call tcp_update_ulp() to push updates to the TLS ULP on top. However, we
      don't push the write_space callback up and instead simply overwrite the
      op with the previous op stored in the psock. This may or may not be correct,
      so to ensure we don't overwrite the TLS write space hook, pass this field to
      the ULP and have it fix up the ctx.
      
      This completes a previous fix that pushed the ops through to the ULP
      but at the time missed doing this for write_space, presumably because
      write_space TLS hook was added around the same time.
      
      Fixes: 95fa1454 ("bpf: sockmap/tls, close can race with map free")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-4-john.fastabend@gmail.com
      33bfe20d
    • J
      bpf: Sockmap, ensure sock lock held during tear down · 7e81a353
      John Fastabend committed
      The sock_map_free() and sock_hash_free() paths used to delete sockmap
      and sockhash maps walk the maps and destroy psock and bpf state associated
      with the socks in the map. When done the socks no longer have BPF programs
      attached and will function normally. This can happen while the socks in
      the map are still "live" meaning data may be sent/received during the walk.
      
      Currently, though, we don't take the sock_lock when the psock and bpf state
      is removed through this path. Specifically, this means we can be writing
      into the ops structure pointers such as sendmsg, sendpage, recvmsg, etc.
      while they are also being called from the networking side. This is not
      safe; we never used proper READ_ONCE/WRITE_ONCE semantics here if we
      believed it was safe. Further, it's not clear to me it's even a good idea
      to try and do this on "live" sockets while the networking side might also
      be using the socket. Instead of trying to reason about using the socks
      from both sides, let's realize that every use case I'm aware of rarely
      deletes maps; in fact the kubernetes/Cilium case builds the map at init and
      never tears it down except on errors. So let's do the simple fix and
      grab the sock lock.
      
      This patch wraps sock deletes from maps in sock lock and adds some
      annotations so we catch any other cases easier.
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-3-john.fastabend@gmail.com
      7e81a353
    • J
      bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop · 4da6a196
      John Fastabend committed
      When a sockmap is freed and a socket in the map is enabled with tls
      we tear down the bpf context on the socket, the psock struct and state,
      and then call tcp_update_ulp(). The tcp_update_ulp() call is to inform
      the tls stack it needs to update its saved sock ops so that when the tls
      socket is later destroyed it doesn't try to call the now destroyed psock
      hooks.
      
      This is about keeping stacked ULPs in good shape so they always have
      the right set of stacked ops.
      
      However, the unhash() hook was recently removed from the TLS side. But the
      sockmap/bpf side is not doing any extra work to update the unhash op
      when it is torn down, instead expecting the TLS side to manage it. So both
      TLS and sockmap believe the other side is managing the op and instead
      no one updates the hook so it continues to point at tcp_bpf_unhash().
      When the unhash hook is called we call tcp_bpf_unhash() which detects the
      psock has already been destroyed and calls sk->sk_prot_unhash() which
      calls tcp_bpf_unhash() yet again and so on, looping and hanging the core.
      
      To fix, have the sockmap tear down logic fix up the stale pointer.
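
The loop can be illustrated with a small function-pointer model. This is a sketch under stated assumptions and not the kernel code: `struct sock` here is a hypothetical stand-in with only the fields the example needs, `bpf_unhash_model` plays the role of tcp_bpf_unhash(), and `MAX_DEPTH` is an artificial guard so the stale-pointer case terminates instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 8 /* artificial guard so the sketch terminates */

struct sock;
typedef void (*unhash_fn)(struct sock *sk);

/* Hypothetical stand-in for a socket, reduced to what the sketch needs. */
struct sock {
    unhash_fn unhash;       /* the op the socket currently points at */
    unhash_fn saved_unhash; /* the op saved when the BPF hook was installed */
    bool has_psock;
    int depth;              /* calls into bpf_unhash_model */
    bool unhashed;          /* set once the real unhash has run */
};

static void base_unhash(struct sock *sk)
{
    sk->unhashed = true;
}

static void bpf_unhash_model(struct sock *sk)
{
    assert(sk->unhash != NULL);
    if (++sk->depth >= MAX_DEPTH)
        return; /* without this guard the call below would never return */
    if (sk->has_psock) {
        sk->saved_unhash(sk);
        return;
    }
    /* psock is gone: defer to whatever the socket currently points at.
     * If that is still this function, it calls itself again. */
    sk->unhash(sk);
}

/* Tear down the psock state. With fix_stale_pointer set, the unhash op is
 * restored so it no longer points at bpf_unhash_model. */
static void teardown(struct sock *sk, bool fix_stale_pointer)
{
    sk->has_psock = false;
    if (fix_stale_pointer)
        sk->unhash = sk->saved_unhash;
}
```

When teardown leaves the pointer untouched, a later unhash call re-enters `bpf_unhash_model` until the guard trips and the real unhash never runs. When teardown restores the pointer, the call goes straight to `base_unhash`.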
      
      Fixes: 5d92e631 ("net/tls: partially revert fix transition through disconnect with close")
      Reported-by: syzbot+83979935eb6304f8cd46@syzkaller.appspotmail.com
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/bpf/20200111061206.8028-2-john.fastabend@gmail.com
      4da6a196
    • D
      Merge branch 'stmmac-Fix-selftests-in-Synopsys-AXS101-board' · 567110f1
      David S. Miller committed
      Jose Abreu says:
      
      ====================
      net: stmmac: Fix selftests in Synopsys AXS101 board
      
      Set of fixes for selftests so that they work on the Synopsys AXS101 board.
      
      Final output:
      
      $ ethtool -t eth0
      The test result is PASS
      The test extra info:
       1. MAC Loopback                 0
       2. PHY Loopback                 -95
       3. MMC Counters                 0
       4. EEE                          -95
       5. Hash Filter MC               0
       6. Perfect Filter UC            0
       7. MC Filter                    0
       8. UC Filter                    0
       9. Flow Control                 -95
      10. RSS                          -95
      11. VLAN Filtering               -95
      12. VLAN Filtering (perf)        -95
      13. Double VLAN Filter           -95
      14. Double VLAN Filter (perf)    -95
      15. Flexible RX Parser           -95
      16. SA Insertion (desc)          -95
      17. SA Replacement (desc)        -95
      18. SA Insertion (reg)           -95
      19. SA Replacement (reg)         -95
      20. VLAN TX Insertion            -95
      21. SVLAN TX Insertion           -95
      22. L3 DA Filtering              -95
      23. L3 SA Filtering              -95
      24. L4 DA TCP Filtering          -95
      25. L4 SA TCP Filtering          -95
      26. L4 DA UDP Filtering          -95
      27. L4 SA UDP Filtering          -95
      28. ARP Offload                  -95
      29. Jumbo Frame                  0
      30. Multichannel Jumbo           -95
      31. Split Header                 -95
      
      Description:
      
      1) Fixes the unaligned accesses that caused CPU halt in Synopsys AXS101
      boards.
      
      2) Fixes the VLAN tests when filtering failed to work.
      
      3) Fixes the VLAN Perfect tests when filtering is not available in HW.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      567110f1
    • J
      net: stmmac: selftests: Guard VLAN Perfect test against non supported HW · 4eee13f1
      Jose Abreu committed
      When HW does not support perfect filtering the feature will not be
      enabled in the net_device. Add a check for this to prevent failures.
      
      Fixes: 1b2250a0 ("net: stmmac: selftests: Add tests for VLAN Perfect Filtering")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4eee13f1
    • J
      net: stmmac: selftests: Mark as fail when received VLAN ID != expected · d39b68e5
      Jose Abreu committed
      When the VLAN ID does not match the expected one it means the filter failed
      in HW. Mark the test as failed in that case.
      
      Fixes: 94e18382 ("net: stmmac: selftests: Add selftest for VLAN TX Offload")
      Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d39b68e5