1. 23 4月, 2020 14 次提交
    • V
      net: dsa: don't fail to probe if we couldn't set the MTU · 72579e14
      Vladimir Oltean 提交于
      There is no reason to fail the probing of the switch if the MTU couldn't
      be configured correctly (either the switch port itself, or the host
      port) for whatever reason. MTU-sized traffic probably won't work, sure,
      but we can still probably limp on and support some form of communication
      anyway, which the users would probably appreciate more.
      
      Fixes: bfcb8132 ("net: dsa: configure the MTU for switch ports")
      Reported-by: NOleksij Rempel <o.rempel@pengutronix.de>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      72579e14
    • E
      sched: etf: do not assume all sockets are full blown · a1211bf9
      Eric Dumazet 提交于
      skb->sk does not always point to a full blown socket,
      we need to use sk_fullsock() before accessing fields which
      only make sense on full socket.
      
      BUG: KASAN: use-after-free in report_sock_error+0x286/0x300 net/sched/sch_etf.c:141
      Read of size 1 at addr ffff88805eb9b245 by task syz-executor.5/9630
      
      CPU: 1 PID: 9630 Comm: syz-executor.5 Not tainted 5.7.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xd3/0x315 mm/kasan/report.c:382
       __kasan_report.cold+0x35/0x4d mm/kasan/report.c:511
       kasan_report+0x33/0x50 mm/kasan/common.c:625
       report_sock_error+0x286/0x300 net/sched/sch_etf.c:141
       etf_enqueue_timesortedlist+0x389/0x740 net/sched/sch_etf.c:170
       __dev_xmit_skb net/core/dev.c:3710 [inline]
       __dev_queue_xmit+0x154a/0x30a0 net/core/dev.c:4021
       neigh_hh_output include/net/neighbour.h:499 [inline]
       neigh_output include/net/neighbour.h:508 [inline]
       ip6_finish_output2+0xfb5/0x25b0 net/ipv6/ip6_output.c:117
       __ip6_finish_output+0x442/0xab0 net/ipv6/ip6_output.c:143
       ip6_finish_output+0x34/0x1f0 net/ipv6/ip6_output.c:153
       NF_HOOK_COND include/linux/netfilter.h:296 [inline]
       ip6_output+0x239/0x810 net/ipv6/ip6_output.c:176
       dst_output include/net/dst.h:435 [inline]
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip6_xmit+0xe1a/0x2090 net/ipv6/ip6_output.c:280
       tcp_v6_send_synack+0x4e7/0x960 net/ipv6/tcp_ipv6.c:521
       tcp_rtx_synack+0x10d/0x1a0 net/ipv4/tcp_output.c:3916
       inet_rtx_syn_ack net/ipv4/inet_connection_sock.c:669 [inline]
       reqsk_timer_handler+0x4c2/0xb40 net/ipv4/inet_connection_sock.c:763
       call_timer_fn+0x1ac/0x780 kernel/time/timer.c:1405
       expire_timers kernel/time/timer.c:1450 [inline]
       __run_timers kernel/time/timer.c:1774 [inline]
       __run_timers kernel/time/timer.c:1741 [inline]
       run_timer_softirq+0x623/0x1600 kernel/time/timer.c:1787
       __do_softirq+0x26c/0x9f7 kernel/softirq.c:292
       invoke_softirq kernel/softirq.c:373 [inline]
       irq_exit+0x192/0x1d0 kernel/softirq.c:413
       exiting_irq arch/x86/include/asm/apic.h:546 [inline]
       smp_apic_timer_interrupt+0x19e/0x600 arch/x86/kernel/apic/apic.c:1140
       apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
       </IRQ>
      RIP: 0010:des_encrypt+0x157/0x9c0 lib/crypto/des.c:792
      Code: 85 22 06 00 00 41 31 dc 41 8b 4d 04 44 89 e2 41 83 e4 3f 4a 8d 3c a5 60 72 72 88 81 e2 3f 3f 3f 3f 48 89 f8 48 c1 e8 03 31 d9 <0f> b6 34 28 48 89 f8 c1 c9 04 83 e0 07 83 c0 03 40 38 f0 7c 09 40
      RSP: 0018:ffffc90003b5f6c0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
      RAX: 1ffffffff10e4e55 RBX: 00000000d2f846d0 RCX: 00000000d2f846d0
      RDX: 0000000012380612 RSI: ffffffff839863ca RDI: ffffffff887272a8
      RBP: dffffc0000000000 R08: ffff888091d0a380 R09: 0000000000800081
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000012
      R13: ffff8880a8ae8078 R14: 00000000c545c93e R15: 0000000000000006
       cipher_crypt_one crypto/cipher.c:75 [inline]
       crypto_cipher_encrypt_one+0x124/0x210 crypto/cipher.c:82
       crypto_cbcmac_digest_update+0x1b5/0x250 crypto/ccm.c:830
       crypto_shash_update+0xc4/0x120 crypto/shash.c:119
       shash_ahash_update+0xa3/0x110 crypto/shash.c:246
       crypto_ahash_update include/crypto/hash.h:547 [inline]
       hash_sendmsg+0x518/0xad0 crypto/algif_hash.c:102
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x308/0x7e0 net/socket.c:2362
       ___sys_sendmsg+0x100/0x170 net/socket.c:2416
       __sys_sendmmsg+0x195/0x480 net/socket.c:2506
       __do_sys_sendmmsg net/socket.c:2535 [inline]
       __se_sys_sendmmsg net/socket.c:2532 [inline]
       __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2532
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      RIP: 0033:0x45c829
      Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f6d9528ec78 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000004fc080 RCX: 000000000045c829
      RDX: 0000000000000001 RSI: 0000000020002640 RDI: 0000000000000004
      RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000008d7 R14: 00000000004cb7aa R15: 00007f6d9528f6d4
      
      Fixes: 4b15c707 ("net/sched: Make etf report drops on error_queue")
      Fixes: 25db26a9 ("net/sched: Introduce the ETF Qdisc")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Reviewed-by: NVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1211bf9
    • D
      selftests: Fix suppress test in fib_tests.sh · 2c1dd4c1
      David Ahern 提交于
      fib_tests is spewing errors:
          ...
          Cannot open network namespace "ns1": No such file or directory
          Cannot open network namespace "ns1": No such file or directory
          Cannot open network namespace "ns1": No such file or directory
          Cannot open network namespace "ns1": No such file or directory
          ping: connect: Network is unreachable
          Cannot open network namespace "ns1": No such file or directory
          Cannot open network namespace "ns1": No such file or directory
          ...
      
      Each test entry in fib_tests is supposed to do its own setup and
      cleanup. Right now the $IP commands in fib_suppress_test are
      failing because there is no ns1. Add the setup/cleanup and logging
      expected for each test.
      
      Fixes: ca7a03c4 ("ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress rule")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Cc: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c1dd4c1
    • D
      Merge branch 'net-dsa-b53-Various-ARL-fixes' · d5812a86
      David S. Miller 提交于
      Florian Fainelli says:
      
      ====================
      net: dsa: b53: Various ARL fixes
      
      This patch series fixes a number of short comings in the existing b53
      driver ARL management logic in particular:
      
      - we were not looking up the {MAC,VID} tuples against their VID, despite
        having VLANs enabled
      
      - the MDB entries (multicast) would lose their validity as soon as a
        single port in the vector would leave the entry
      
      - the ARL was currently under utilized because we would always place new
        entries in bin index #1, instead of using all possible bins available,
        thus reducing the ARL effective size by 50% or 75% depending on the
        switch generation
      
      - it was possible to overwrite the ARL entries because no proper space
        verification was done
      
      This patch series addresses all of these issues.
      
      Changes in v2:
      - added a new patch to correctly flip invidual VLAN learning vs. shared
        VLAN learning depending on the global VLAN state
      
      - added Andrew's R-b tags for patches which did not change
      
      - corrected some verbosity and minor issues in patch #4 to match caller
        expectations, also avoid a variable length DECLARE_BITMAP() call
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5812a86
    • F
      net: dsa: b53: b53_arl_rw_op() needs to select IVL or SVL · 64fec949
      Florian Fainelli 提交于
      Flip the IVL_SVL_SELECT bit correctly based on the VLAN enable status,
      the default is to perform Shared VLAN learning instead of Individual
      learning.
      
      Fixes: 1da6df85 ("net: dsa: b53: Implement ARL add/del/dump operations")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64fec949
    • F
      net: dsa: b53: Rework ARL bin logic · 6344dbde
      Florian Fainelli 提交于
      When asking the ARL to read a MAC address, we will get a number of bins
      returned in a single read. Out of those bins, there can essentially be 3
      states:
      
      - all bins are full, we have no space left, and we can either replace an
        existing address or return that full condition
      
      - the MAC address was found, then we need to return its bin index and
        modify that one, and only that one
      
      - the MAC address was not found and we have a least one bin free, we use
        that bin index location then
      
      The code would unfortunately fail on all counts.
      
      Fixes: 1da6df85 ("net: dsa: b53: Implement ARL add/del/dump operations")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6344dbde
    • F
      net: dsa: b53: Fix ARL register definitions · c2e77a18
      Florian Fainelli 提交于
      The ARL {MAC,VID} tuple and the forward entry were off by 0x10 bytes,
      which means that when we read/wrote from/to ARL bin index 0, we were
      actually accessing the ARLA_RWCTRL register.
      
      Fixes: 1da6df85 ("net: dsa: b53: Implement ARL add/del/dump operations")
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2e77a18
    • F
      net: dsa: b53: Fix valid setting for MDB entries · eab167f4
      Florian Fainelli 提交于
      When support for the MDB entries was added, the valid bit was correctly
      changed to be assigned depending on the remaining port bitmask, that is,
      if there were no more ports added to the entry's port bitmask, the entry
      now becomes invalid. There was another assignment a few lines below that
      would override this which would invalidate entries even when there were
      still multiple ports left in the MDB entry.
      
      Fixes: 5d65b64a ("net: dsa: b53: Add support for MDB")
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eab167f4
    • F
      net: dsa: b53: Lookup VID in ARL searches when VLAN is enabled · 2e97b0cd
      Florian Fainelli 提交于
      When VLAN is enabled, and an ARL search is issued, we also need to
      compare the full {MAC,VID} tuple before returning a successful search
      result.
      
      Fixes: 1da6df85 ("net: dsa: b53: Implement ARL add/del/dump operations")
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e97b0cd
    • D
      Merge branch 'vrf-looping' · 87f78f27
      David S. Miller 提交于
      David Ahern says:
      
      ====================
      net: Fix looping with vrf, xfrms and qdisc on VRF
      
      Trev reported that use of VRFs with xfrms is looping when a qdisc
      is added to the VRF device. The combination of xfrm + qdisc is not
      handled by the VRF driver which lost track that it has already
      seen the packet.
      
      The XFRM_TRANSFORMED flag is used by the netfilter code for a similar
      purpose, so re-use for VRF. Patch 1 drops the #ifdef around setting
      the flag in the xfrm output functions. Patch 2 adds a check to
      the VRF driver for flag; if set the packet has already passed through
      the VRF driver once and does not need to recirculated a second time.
      
      This is a day 1 bug with VRFs; stable wise, I would only take this
      back to 4.14. I have a set of test cases which I will submit to
      net-next.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87f78f27
    • D
      vrf: Check skb for XFRM_TRANSFORMED flag · 16b9db1c
      David Ahern 提交于
      To avoid a loop with qdiscs and xfrms, check if the skb has already gone
      through the qdisc attached to the VRF device and then to the xfrm layer.
      If so, no need for a second redirect.
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Reported-by: NTrev Larock <trev@larock.ca>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16b9db1c
    • D
      xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish · 0c922a48
      David Ahern 提交于
      IPSKB_XFRM_TRANSFORMED and IP6SKB_XFRM_TRANSFORMED are skb flags set by
      xfrm code to tell other skb handlers that the packet has been passed
      through the xfrm output functions. Simplify the code and just always
      set them rather than conditionally based on netfilter enabled thus
      making the flag available for other users.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c922a48
    • M
      ipv6: ndisc: RFC-ietf-6man-ra-pref64-09 is now published as RFC8781 · 9175d3f3
      Maciej Żenczykowski 提交于
      See:
        https://www.rfc-editor.org/authors/rfc8781.txt
      
      Cc: Erik Kline <ek@google.com>
      Cc: Jen Linkova <furry@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Michael Haro <mharo@google.com>
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Fixes: c24a77ed ("ipv6: ndisc: add support for 'PREF64' dns64 prefix identifier")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9175d3f3
    • Y
      net: phy: microchip_t1: add lan87xx_phy_init to initialize the lan87xx phy. · 63edbcce
      Yuiko Oshino 提交于
      lan87xx_phy_init() initializes the lan87xx phy hardware
      including its TC10 Wake-up and Sleep features.
      
      Fixes: 3e50d2da ("Add driver for Microchip LAN87XX T1 PHYs")
      Signed-off-by: NYuiko Oshino <yuiko.oshino@microchip.com>
      v0->v1:
          - Add more details in the commit message and source comments.
          - Update to the latest initialization sequences.
          - Add access_ereg_modify_changed().
          - Fix access_ereg() to access SMI bank correctly.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63edbcce
  2. 22 4月, 2020 8 次提交
    • V
      net: stmmac: Enable SERDES power up/down sequence · b9663b7c
      Voon Weifeng 提交于
      This patch is to enable Intel SERDES power up/down sequence. The SERDES
      converts 8/10 bits data to SGMII signal. Below is an example of
      HW configuration for SGMII mode. The SERDES is located in the PHY IF
      in the diagram below.
      
      <-----------------GBE Controller---------->|<--External PHY chip-->
      +----------+         +----+            +---+           +----------+
      |   EQoS   | <-GMII->| DW | < ------ > |PHY| <-SGMII-> | External |
      |   MAC    |         |xPCS|            |IF |           | PHY      |
      +----------+         +----+            +---+           +----------+
             ^               ^                 ^                ^
             |               |                 |                |
             +---------------------MDIO-------------------------+
      
      PHY IF configuration and status registers are accessible through
      mdio address 0x15 which is defined as mdio_adhoc_addr. During D0,
      The driver will need to power up PHY IF by changing the power state
      to P0. Likewise, for D3, the driver sets PHY IF power state to P3.
      Signed-off-by: NVoon Weifeng <weifeng.voon@intel.com>
      Signed-off-by: NOng Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b9663b7c
    • D
      net: broadcom: convert to devm_platform_ioremap_resource_byname() · d7a5502b
      Dejin Zheng 提交于
      Use the function devm_platform_ioremap_resource_byname() to simplify
      source code which calls the functions platform_get_resource_byname()
      and devm_ioremap_resource(). Remove also a few error messages which
      became unnecessary with this software refactoring.
      Suggested-by: NMarkus Elfring <Markus.Elfring@web.de>
      Signed-off-by: NDejin Zheng <zhengdejin5@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7a5502b
    • T
      macvlan: fix null dereference in macvlan_device_event() · 4dee15b4
      Taehee Yoo 提交于
      In the macvlan_device_event(), the list_first_entry_or_null() is used.
      This function could return null pointer if there is no node.
      But, the macvlan module doesn't check the null pointer.
      So, null-ptr-deref would occur.
      
            bond0
              |
         +----+-----+
         |          |
      macvlan0   macvlan1
         |          |
       dummy0     dummy1
      
      The problem scenario.
      If dummy1 is removed,
      1. ->dellink() of dummy1 is called.
      2. NETDEV_UNREGISTER of dummy1 notification is sent to macvlan module.
      3. ->dellink() of macvlan1 is called.
      4. NETDEV_UNREGISTER of macvlan1 notification is sent to bond module.
      5. __bond_release_one() is called and it internally calls
         dev_set_mac_address().
      6. dev_set_mac_address() calls the ->ndo_set_mac_address() of macvlan1,
         which is macvlan_set_mac_address().
      7. macvlan_set_mac_address() calls the dev_set_mac_address() with dummy1.
      8. NETDEV_CHANGEADDR of dummy1 is sent to macvlan module.
      9. In the macvlan_device_event(), it calls list_first_entry_or_null().
      At this point, dummy1 and macvlan1 were removed.
      So, list_first_entry_or_null() will return NULL.
      
      Test commands:
          ip netns add nst
          ip netns exec nst ip link add bond0 type bond
          for i in {0..10}
          do
              ip netns exec nst ip link add dummy$i type dummy
      	ip netns exec nst ip link add macvlan$i link dummy$i \
      		type macvlan mode passthru
      	ip netns exec nst ip link set macvlan$i master bond0
          done
          ip netns del nst
      
      Splat looks like:
      [   40.585687][  T146] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP DEI
      [   40.587249][  T146] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
      [   40.588342][  T146] CPU: 1 PID: 146 Comm: kworker/u8:2 Not tainted 5.7.0-rc1+ #532
      [   40.589299][  T146] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   40.590469][  T146] Workqueue: netns cleanup_net
      [   40.591045][  T146] RIP: 0010:macvlan_device_event+0x4e2/0x900 [macvlan]
      [   40.591905][  T146] Code: 00 00 00 00 00 fc ff df 80 3c 06 00 0f 85 45 02 00 00 48 89 da 48 b8 00 00 00 00 00 fc ff d2
      [   40.594126][  T146] RSP: 0018:ffff88806116f4a0 EFLAGS: 00010246
      [   40.594783][  T146] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [   40.595653][  T146] RDX: 0000000000000000 RSI: ffff88806547ddd8 RDI: ffff8880540f1360
      [   40.596495][  T146] RBP: ffff88804011a808 R08: fffffbfff4fb8421 R09: fffffbfff4fb8421
      [   40.597377][  T146] R10: ffffffffa7dc2107 R11: 0000000000000000 R12: 0000000000000008
      [   40.598186][  T146] R13: ffff88804011a000 R14: ffff8880540f1000 R15: 1ffff1100c22de9a
      [   40.599012][  T146] FS:  0000000000000000(0000) GS:ffff888067800000(0000) knlGS:0000000000000000
      [   40.600004][  T146] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   40.600665][  T146] CR2: 00005572d3a807b8 CR3: 000000005fcf4003 CR4: 00000000000606e0
      [   40.601485][  T146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   40.602461][  T146] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   40.603443][  T146] Call Trace:
      [   40.603871][  T146]  ? nf_tables_dump_setelem+0xa0/0xa0 [nf_tables]
      [   40.604587][  T146]  ? macvlan_uninit+0x100/0x100 [macvlan]
      [   40.605212][  T146]  ? __module_text_address+0x13/0x140
      [   40.605842][  T146]  notifier_call_chain+0x90/0x160
      [   40.606477][  T146]  dev_set_mac_address+0x28e/0x3f0
      [   40.607117][  T146]  ? netdev_notify_peers+0xc0/0xc0
      [   40.607762][  T146]  ? __module_text_address+0x13/0x140
      [   40.608440][  T146]  ? notifier_call_chain+0x90/0x160
      [   40.609097][  T146]  ? dev_set_mac_address+0x1f0/0x3f0
      [   40.609758][  T146]  dev_set_mac_address+0x1f0/0x3f0
      [   40.610402][  T146]  ? __local_bh_enable_ip+0xe9/0x1b0
      [   40.611071][  T146]  ? bond_hw_addr_flush+0x77/0x100 [bonding]
      [   40.611823][  T146]  ? netdev_notify_peers+0xc0/0xc0
      [   40.612461][  T146]  ? bond_hw_addr_flush+0x77/0x100 [bonding]
      [   40.613213][  T146]  ? bond_hw_addr_flush+0x77/0x100 [bonding]
      [   40.613963][  T146]  ? __local_bh_enable_ip+0xe9/0x1b0
      [   40.614631][  T146]  ? bond_time_in_interval.isra.31+0x90/0x90 [bonding]
      [   40.615484][  T146]  ? __bond_release_one+0x9f0/0x12c0 [bonding]
      [   40.616230][  T146]  __bond_release_one+0x9f0/0x12c0 [bonding]
      [   40.616949][  T146]  ? bond_enslave+0x47c0/0x47c0 [bonding]
      [   40.617642][  T146]  ? lock_downgrade+0x730/0x730
      [   40.618218][  T146]  ? check_flags.part.42+0x450/0x450
      [   40.618850][  T146]  ? __mutex_unlock_slowpath+0xd0/0x670
      [   40.619519][  T146]  ? trace_hardirqs_on+0x30/0x180
      [   40.620117][  T146]  ? wait_for_completion+0x250/0x250
      [   40.620754][  T146]  bond_netdev_event+0x822/0x970 [bonding]
      [   40.621460][  T146]  ? __module_text_address+0x13/0x140
      [   40.622097][  T146]  notifier_call_chain+0x90/0x160
      [   40.622806][  T146]  rollback_registered_many+0x660/0xcf0
      [   40.623522][  T146]  ? netif_set_real_num_tx_queues+0x780/0x780
      [   40.624290][  T146]  ? notifier_call_chain+0x90/0x160
      [   40.624957][  T146]  ? netdev_upper_dev_unlink+0x114/0x180
      [   40.625686][  T146]  ? __netdev_adjacent_dev_unlink_neighbour+0x30/0x30
      [   40.626421][  T146]  ? mutex_is_locked+0x13/0x50
      [   40.627016][  T146]  ? unregister_netdevice_queue+0xf2/0x240
      [   40.627663][  T146]  unregister_netdevice_many.part.134+0x13/0x1b0
      [   40.628362][  T146]  default_device_exit_batch+0x2d9/0x390
      [   40.628987][  T146]  ? unregister_netdevice_many+0x40/0x40
      [   40.629615][  T146]  ? dev_change_net_namespace+0xcb0/0xcb0
      [   40.630279][  T146]  ? prepare_to_wait_exclusive+0x2e0/0x2e0
      [   40.630943][  T146]  ? ops_exit_list.isra.9+0x97/0x140
      [   40.631554][  T146]  cleanup_net+0x441/0x890
      [ ... ]
      
      Fixes: e289fd28 ("macvlan: fix the problem when mac address changes for passthru mode")
      Reported-by: syzbot+5035b1f9dc7ea4558d5a@syzkaller.appspotmail.com
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4dee15b4
    • J
      e1000: remove unneeded conversion to bool · c95576a3
      Jason Yan 提交于
      The '==' expression itself is bool, no need to convert it to bool again.
      This fixes the following coccicheck warning:
      
      drivers/net/ethernet/intel/e1000/e1000_main.c:1479:44-49: WARNING:
      conversion to bool not needed here
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c95576a3
    • J
      i40e: Remove unneeded conversion to bool · 7ff4f063
      Jason Yan 提交于
      The '==' expression itself is bool, no need to convert it to bool again.
      This fixes the following coccicheck warning:
      
      drivers/net/ethernet/intel/i40e/i40e_main.c:1614:52-57: WARNING:
      conversion to bool not needed here
      drivers/net/ethernet/intel/i40e/i40e_main.c:11439:52-57: WARNING:
      conversion to bool not needed here
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ff4f063
    • J
      ptp: Remove unneeded conversion to bool · e9a9e519
      Jason Yan 提交于
      The '==' expression itself is bool, no need to convert it to bool again.
      This fixes the following coccicheck warning:
      
      drivers/ptp/ptp_ines.c:403:55-60: WARNING: conversion to bool not
      needed here
      drivers/ptp/ptp_ines.c:404:55-60: WARNING: conversion to bool not
      needed here
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9a9e519
    • J
      cgroup, netclassid: remove double cond_resched · 526f3d96
      Jiri Slaby 提交于
      Commit 018d26fc ("cgroup, netclassid: periodically release file_lock
      on classid") added a second cond_resched to write_classid indirectly by
      update_classid_task. Remove the one in write_classid.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Dmitry Yakunin <zeil@yandex-team.ru>
      Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      526f3d96
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 76fc6a9a
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for net:
      
      1) flow_block_cb memleak in nf_flow_table_offload_del_cb(), from Roi Dayan.
      
      2) Fix error path handling in nf_nat_inet_register_fn(), from Hillf Danton.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76fc6a9a
  3. 21 4月, 2020 16 次提交
    • D
    • Z
      net/mlx5e: Get the latest values from counters in switchdev mode · dcdf4ce0
      Zhu Yanjun 提交于
      In the switchdev mode, when running "cat
      /sys/class/net/NIC/statistics/tx_packets", the ppcnt register is
      accessed to get the latest values. But currently this command can
      not get the correct values from ppcnt.
      
      From firmware manual, before getting the 802_3 counters, the 802_3
      data layout should be set to the ppcnt register.
      
      When the command "cat /sys/class/net/NIC/statistics/tx_packets" is
      run, before updating 802_3 data layout with ppcnt register, the
      monitor counters are tested. The test result will decide the
      802_3 data layout is updated or not.
      
      Actually the monitor counters do not support to monitor rx/tx
      stats of 802_3 in switchdev mode. So the rx/tx counters change
      will not trigger monitor counters. So the 802_3 data layout will
      not be updated in ppcnt register. Finally this command can not get
      the latest values from ppcnt register with 802_3 data layout.
      
      Fixes: 5c7e8bbb ("net/mlx5e: Use monitor counters for update stats")
      Signed-off-by: NZhu Yanjun <yanjunz@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      dcdf4ce0
    • S
      net/mlx5: Kconfig: convert imply usage to weak dependency · 96c34151
      Saeed Mahameed 提交于
      MLX5_CORE uses the 'imply' keyword to depend on VXLAN, PTP_1588_CLOCK,
      MLXFW and PCI_HYPERV_INTERFACE.
      
      This was useful to force vxlan, ptp, etc.. to be reachable to mlx5
      regardless of their config states.
      
      Due to the changes in the cited commit below, the semantics of 'imply'
      was changed to not force any restriction on the implied config.
      
      As a result of this change, the compilation of MLX5_CORE=y and VXLAN=m
      would result in undefined references, as VXLAN now would stay as 'm'.
      
      To fix this we change MLX5_CORE to have a weak dependency on
      these modules/configs and make sure they are reachable, by adding:
      depend on symbol || !symbol.
      
      For example: VXLAN=m MLX5_CORE=y, this will force MLX5_CORE to m
      
      Fixes: def2fbff ("kconfig: allow symbols implied by y to become m")
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Nicolas Pitre <nico@fluxnic.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Reported-by: NRandy Dunlap <rdunlap@infradead.org>
      96c34151
    • M
      net/mlx5e: Don't trigger IRQ multiple times on XSK wakeup to avoid WQ overruns · e7e0004a
      Maxim Mikityanskiy 提交于
      XSK wakeup function triggers NAPI by posting a NOP WQE to a special XSK
      ICOSQ. When the application floods the driver with wakeup requests by
      calling sendto() in a certain pattern that ends up in mlx5e_trigger_irq,
      the XSK ICOSQ may overflow.
      
      Multiple NOPs are not required and won't accelerate the process, so
      avoid posting a second NOP if there is one already on the way. This way
      we also avoid increasing the queue size (which might not help anyway).
      
      Fixes: db05815b ("net/mlx5e: Add XSK zero-copy support")
      Signed-off-by: NMaxim Mikityanskiy <maximmi@mellanox.com>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      e7e0004a
    • P
      net/mlx5: CT: Change idr to xarray to protect parallel tuple id allocation · 70840b66
      Paul Blakey 提交于
      After allowing parallel tuple insertion, we get the following trace:
      
      [ 5505.142249] ------------[ cut here ]------------
      [ 5505.148155] WARNING: CPU: 21 PID: 13313 at lib/radix-tree.c:581 delete_node+0x16c/0x180
      [ 5505.295553] CPU: 21 PID: 13313 Comm: kworker/u50:22 Tainted: G           OE     5.6.0+ #78
      [ 5505.304824] Hardware name: Supermicro Super Server/X10DRT-P, BIOS 2.0b 03/30/2017
      [ 5505.313740] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_flow_table]
      [ 5505.323257] RIP: 0010:delete_node+0x16c/0x180
      [ 5505.349862] RSP: 0018:ffffb19184eb7b30 EFLAGS: 00010282
      [ 5505.356785] RAX: 0000000000000000 RBX: ffff904ac95b86d8 RCX: ffff904b6f938838
      [ 5505.365190] RDX: 0000000000000000 RSI: ffff904ac954b908 RDI: ffff904ac954b920
      [ 5505.373628] RBP: ffff904b4ac13060 R08: 0000000000000001 R09: 0000000000000000
      [ 5505.382155] R10: 0000000000000000 R11: 0000000000000040 R12: 0000000000000000
      [ 5505.390527] R13: ffffb19184eb7bfc R14: ffff904b6bef5800 R15: ffff90482c1203c0
      [ 5505.399246] FS:  0000000000000000(0000) GS:ffff904c2fc80000(0000) knlGS:0000000000000000
      [ 5505.408621] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 5505.415739] CR2: 00007f5d27006010 CR3: 0000000058c10006 CR4: 00000000001626e0
      [ 5505.424547] Call Trace:
      [ 5505.428429]  idr_alloc_u32+0x7b/0xc0
      [ 5505.433803]  mlx5_tc_ct_entry_add_rule+0xbf/0x950 [mlx5_core]
      [ 5505.441354]  ? mlx5_fc_create+0x23c/0x370 [mlx5_core]
      [ 5505.448225]  mlx5_tc_ct_block_flow_offload+0x874/0x10b0 [mlx5_core]
      [ 5505.456278]  ? mlx5_tc_ct_block_flow_offload+0x63d/0x10b0 [mlx5_core]
      [ 5505.464532]  nf_flow_offload_tuple.isra.21+0xc5/0x140 [nf_flow_table]
      [ 5505.472286]  ? __kmalloc+0x217/0x2f0
      [ 5505.477093]  ? flow_rule_alloc+0x1c/0x30
      [ 5505.482117]  flow_offload_work_handler+0x1d0/0x290 [nf_flow_table]
      [ 5505.489674]  ? process_one_work+0x17c/0x580
      [ 5505.494922]  process_one_work+0x202/0x580
      [ 5505.500082]  ? process_one_work+0x17c/0x580
      [ 5505.505696]  worker_thread+0x4c/0x3f0
      [ 5505.510458]  kthread+0x103/0x140
      [ 5505.514989]  ? process_one_work+0x580/0x580
      [ 5505.520616]  ? kthread_bind+0x10/0x10
      [ 5505.525837]  ret_from_fork+0x3a/0x50
      [ 5505.570841] ---[ end trace 07995de9c56d6831 ]---
      
      This happens from parallel deletes/adds to idr, as idr isn't protected.
      Fix that by using xarray as the tuple_ids allocator instead of idr.
      
      Fixes: 7da182a9 ("netfilter: flowtable: Use work entry per offload command")
      Reviewed-by: NRoi Dayan <roid@mellanox.com>
      Signed-off-by: NPaul Blakey <paulb@mellanox.com>
      Reviewed-by: NOz Shlomo <ozsh@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      70840b66
    • N
      net/mlx5: Fix failing fw tracer allocation on s390 · a019b361
      Niklas Schnelle 提交于
      On s390 FORCE_MAX_ZONEORDER is 9 instead of 11, thus a larger kzalloc()
      allocation as done for the firmware tracer will always fail.
      
      Looking at mlx5_fw_tracer_save_trace(), it is actually the driver itself
      that copies the debug data into the trace array and there is no need for
      the allocation to be contiguous in physical memory. We can therefor use
      kvzalloc() instead of kzalloc() and get rid of the large contiguous
      allcoation.
      
      Fixes: f53aaa31 ("net/mlx5: FW tracer, implement tracer logic")
      Signed-off-by: NNiklas Schnelle <schnelle@linux.ibm.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      a019b361
    • T
      team: fix hang in team_mode_get() · 1c30fbc7
      Taehee Yoo 提交于
      When team mode is changed or set, the team_mode_get() is called to check
      whether the mode module is inserted or not. If the mode module is not
      inserted, it calls the request_module().
      In the request_module(), it creates a child process, which is
      the "modprobe" process and waits for the done of the child process.
      At this point, the following locks were used.
      down_read(&cb_lock()); by genl_rcv()
          genl_lock(); by genl_rcv_msc()
              rtnl_lock(); by team_nl_cmd_options_set()
                  mutex_lock(&team->lock); by team_nl_team_get()
      
      Concurrently, the team module could be removed by rmmod or "modprobe -r"
      The __exit function of team module is team_module_exit(), which calls
      team_nl_fini() and it tries to acquire following locks.
      down_write(&cb_lock);
          genl_lock();
      Because of the genl_lock() and cb_lock, this process can't be finished
      earlier than request_module() routine.
      
      The problem secenario.
      CPU0                                     CPU1
      team_mode_get
          request_module()
                                               modprobe -r team_mode_roundrobin
                                                           team <--(B)
              modprobe team <--(A)
                  team_mode_roundrobin
      
      By request_module(), the "modprobe team_mode_roundrobin" command
      will be executed. At this point, the modprobe process will decide
      that the team module should be inserted before team_mode_roundrobin.
      Because the team module is being removed.
      
      By the module infrastructure, the same module insert/remove operations
      can't be executed concurrently.
      So, (A) waits for (B) but (B) also waits for (A) because of locks.
      So that the hang occurs at this point.
      
      Test commands:
          while :
          do
              teamd -d &
      	killall teamd &
      	modprobe -rv team_mode_roundrobin &
          done
      
      The approach of this patch is to hold the reference count of the team
      module if the team module is compiled as a module. If the reference count
      of the team module is not zero while request_module() is being called,
      the team module will not be removed at that moment.
      So that the above scenario could not occur.
      
      Fixes: 3d249d4c ("net: introduce ethernet teaming device")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c30fbc7
    • D
      Merge branch 'mptcp-fix-races-on-accept' · 0b943d90
      David S. Miller 提交于
      Paolo Abeni says:
      
      ====================
      mptcp: fix races on accept()
      
      This series includes some fixes for accept() races which may cause inconsistent
      MPTCP socket status and oops. Please see the individual patches for the
      technical details.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b943d90
    • P
      mptcp: drop req socket remote_key* fields · fca5c82c
      Paolo Abeni 提交于
      We don't need them, as we can use the current ingress opt
      data instead. Setting them in syn_recv_sock() may causes
      inconsistent mptcp socket status, as per previous commit.
      
      Fixes: cc7972ea ("mptcp: parse and emit MP_CAPABLE option according to v1 spec")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fca5c82c
    • P
      mptcp: avoid flipping mp_capable field in syn_recv_sock() · 4c8941de
      Paolo Abeni 提交于
      If multiple CPUs races on the same req_sock in syn_recv_sock(),
      flipping such field can cause inconsistent child socket status.
      
      When racing, the CPU losing the req ownership may still change
      the mptcp request socket mp_capable flag while the CPU owning
      the request is cloning the socket, leaving the child socket with
      'is_mptcp' set but no 'mp_capable' flag.
      
      Such socket will stay with 'conn' field cleared, heading to oops
      in later mptcp callback.
      
      Address the issue tracking the fallback status in a local variable.
      
      Fixes: 58b09919 ("mptcp: create msk early")
      Co-developed-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c8941de
    • F
      mptcp: handle mptcp listener destruction via rcu · 5e20087d
      Florian Westphal 提交于
      Following splat can occur during self test:
      
       BUG: KASAN: use-after-free in subflow_data_ready+0x156/0x160
       Read of size 8 at addr ffff888100c35c28 by task mptcp_connect/4808
      
        subflow_data_ready+0x156/0x160
        tcp_child_process+0x6a3/0xb30
        tcp_v4_rcv+0x2231/0x3730
        ip_protocol_deliver_rcu+0x5c/0x860
        ip_local_deliver_finish+0x220/0x360
        ip_local_deliver+0x1c8/0x4e0
        ip_rcv_finish+0x1da/0x2f0
        ip_rcv+0xd0/0x3c0
        __netif_receive_skb_one_core+0xf5/0x160
        __netif_receive_skb+0x27/0x1c0
        process_backlog+0x21e/0x780
        net_rx_action+0x35f/0xe90
        do_softirq+0x4c/0x50
        [..]
      
      This occurs when accessing subflow_ctx->conn.
      
      Problem is that tcp_child_process() calls listen sockets'
      sk_data_ready() notification, but it doesn't hold the listener
      lock.  Another cpu calling close() on the listener will then cause
      transition of refcount to 0.
      
      Fixes: 58b09919 ("mptcp: create msk early")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e20087d
    • R
      cxgb4: fix large delays in PTP synchronization · bd019427
      Rahul Lakkireddy 提交于
      Fetching PTP sync information from mailbox is slow and can take
      up to 10 milliseconds. Reduce this unnecessary delay by directly
      reading the information from the corresponding registers.
      
      Fixes: 9c33e420 ("cxgb4: Add PTP Hardware Clock (PHC) support")
      Signed-off-by: NManoj Malviya <manojmalviya@chelsio.com>
      Signed-off-by: NRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd019427
    • M
      net: stmmac: dwmac-meson8b: Add missing boundary to RGMII TX clock array · f0212a5e
      Marc Zyngier 提交于
      Running with KASAN on a VIM3L systems leads to the following splat
      when probing the Ethernet device:
      
      ==================================================================
      BUG: KASAN: global-out-of-bounds in _get_maxdiv+0x74/0xd8
      Read of size 4 at addr ffffa000090615f4 by task systemd-udevd/139
      CPU: 1 PID: 139 Comm: systemd-udevd Tainted: G            E     5.7.0-rc1-00101-g8624b7577b9c #781
      Hardware name: amlogic w400/w400, BIOS 2020.01-rc5 03/12/2020
      Call trace:
       dump_backtrace+0x0/0x2a0
       show_stack+0x20/0x30
       dump_stack+0xec/0x148
       print_address_description.isra.12+0x70/0x35c
       __kasan_report+0xfc/0x1d4
       kasan_report+0x4c/0x68
       __asan_load4+0x9c/0xd8
       _get_maxdiv+0x74/0xd8
       clk_divider_bestdiv+0x74/0x5e0
       clk_divider_round_rate+0x80/0x1a8
       clk_core_determine_round_nolock.part.9+0x9c/0xd0
       clk_core_round_rate_nolock+0xf0/0x108
       clk_hw_round_rate+0xac/0xf0
       clk_factor_round_rate+0xb8/0xd0
       clk_core_determine_round_nolock.part.9+0x9c/0xd0
       clk_core_round_rate_nolock+0xf0/0x108
       clk_core_round_rate_nolock+0xbc/0x108
       clk_core_set_rate_nolock+0xc4/0x2e8
       clk_set_rate+0x58/0xe0
       meson8b_dwmac_probe+0x588/0x72c [dwmac_meson8b]
       platform_drv_probe+0x78/0xd8
       really_probe+0x158/0x610
       driver_probe_device+0x140/0x1b0
       device_driver_attach+0xa4/0xb0
       __driver_attach+0xcc/0x1c8
       bus_for_each_dev+0xf4/0x168
       driver_attach+0x3c/0x50
       bus_add_driver+0x238/0x2e8
       driver_register+0xc8/0x1e8
       __platform_driver_register+0x88/0x98
       meson8b_dwmac_driver_init+0x28/0x1000 [dwmac_meson8b]
       do_one_initcall+0xa8/0x328
       do_init_module+0xe8/0x368
       load_module+0x3300/0x36b0
       __do_sys_finit_module+0x120/0x1a8
       __arm64_sys_finit_module+0x4c/0x60
       el0_svc_common.constprop.2+0xe4/0x268
       do_el0_svc+0x98/0xa8
       el0_svc+0x24/0x68
       el0_sync_handler+0x12c/0x318
       el0_sync+0x158/0x180
      
      The buggy address belongs to the variable:
       div_table.63646+0x34/0xfffffffffffffa40 [dwmac_meson8b]
      
      Memory state around the buggy address:
       ffffa00009061480: fa fa fa fa 00 00 00 01 fa fa fa fa 00 00 00 00
       ffffa00009061500: 05 fa fa fa fa fa fa fa 00 04 fa fa fa fa fa fa
      >ffffa00009061580: 00 03 fa fa fa fa fa fa 00 00 00 00 00 00 fa fa
                                                                   ^
       ffffa00009061600: fa fa fa fa 00 01 fa fa fa fa fa fa 01 fa fa fa
       ffffa00009061680: fa fa fa fa 00 01 fa fa fa fa fa fa 04 fa fa fa
      ==================================================================
      
      Digging into this indeed shows that the clock divider array is
      lacking a final fence, and that the clock subsystems goes in the
      weeds. Oh well.
      
      Let's add the empty structure that indicates the end of the array.
      
      Fixes: bd6f4854 ("net: stmmac: dwmac-meson8b: Fix the RGMII TX delay on Meson8b/8m2 SoCs")
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
      Reviewed-by: NMartin Blumenstingl <martin.blumenstingl@googlemail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f0212a5e
    • J
      ipv6: fix restrict IPV6_ADDRFORM operation · 82c9ae44
      John Haxby 提交于
      Commit b6f61189 ("ipv6: restrict IPV6_ADDRFORM operation") fixed a
      problem found by syzbot an unfortunate logic error meant that it
      also broke IPV6_ADDRFORM.
      
      Rearrange the checks so that the earlier test is just one of the series
      of checks made before moving the socket from IPv6 to IPv4.
      
      Fixes: b6f61189 ("ipv6: restrict IPV6_ADDRFORM operation")
      Signed-off-by: NJohn Haxby <john.haxby@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82c9ae44
    • T
      net: systemport: Omit superfluous error message in bcm_sysport_probe() · bdbe05b3
      Tang Bin 提交于
      In the function bcm_sysport_probe(), when get irq failed, the function
      platform_get_irq() logs an error message, so remove redundant message
      here.
      Signed-off-by: NTang Bin <tangbin@cmss.chinamobile.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdbe05b3
    • T
      net: openvswitch: ovs_ct_exit to be done under ovs_lock · 27de77ce
      Tonghao Zhang 提交于
      syzbot wrote:
      | =============================
      | WARNING: suspicious RCU usage
      | 5.7.0-rc1+ #45 Not tainted
      | -----------------------------
      | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
      |
      | other info that might help us debug this:
      | rcu_scheduler_active = 2, debug_locks = 1
      | ...
      |
      | stack backtrace:
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
      | Workqueue: netns cleanup_net
      | Call Trace:
      | ...
      | ovs_ct_exit
      | ovs_exit_net
      | ops_exit_list.isra.7
      | cleanup_net
      | process_one_work
      | worker_thread
      
      To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
      lockdep_ovsl_is_held as optional lockdep expression.
      
      Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
      Fixes: 11efd5cb ("openvswitch: Support conntrack zone limit")
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Yi-Hung Wei <yihung.wei@gmail.com>
      Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
      Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27de77ce
  4. 19 4月, 2020 2 次提交
    • H
      netfilter: nat: fix error handling upon registering inet hook · b4faef17
      Hillf Danton 提交于
      A case of warning was reported by syzbot.
      
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 19934 at net/netfilter/nf_nat_core.c:1106
      nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 19934 Comm: syz-executor.5 Not tainted 5.6.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       panic+0x2e3/0x75c kernel/panic.c:221
       __warn.cold+0x2f/0x35 kernel/panic.c:582
       report_bug+0x27b/0x2f0 lib/bug.c:195
       fixup_bug arch/x86/kernel/traps.c:175 [inline]
       fixup_bug arch/x86/kernel/traps.c:170 [inline]
       do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267
       do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
       invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
      RIP: 0010:nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
      Code: ff df 48 c1 ea 03 80 3c 02 00 75 75 48 8b 44 24 10 4c 89 ef 48 c7 00 00 00 00 00 e8 e8 f8 53 fb e9 4d fe ff ff e8 ee 9c 16 fb <0f> 0b e9 41 fe ff ff e8 e2 45 54 fb e9 b5 fd ff ff 48 8b 7c 24 20
      RSP: 0018:ffffc90005487208 EFLAGS: 00010246
      RAX: 0000000000040000 RBX: 0000000000000004 RCX: ffffc9001444a000
      RDX: 0000000000040000 RSI: ffffffff865c94a2 RDI: 0000000000000005
      RBP: ffff88808b5cf000 R08: ffff8880a2620140 R09: fffffbfff14bcd79
      R10: ffffc90005487208 R11: fffffbfff14bcd78 R12: 0000000000000000
      R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
       nf_nat_ipv6_unregister_fn net/netfilter/nf_nat_proto.c:1017 [inline]
       nf_nat_inet_register_fn net/netfilter/nf_nat_proto.c:1038 [inline]
       nf_nat_inet_register_fn+0xfc/0x140 net/netfilter/nf_nat_proto.c:1023
       nf_tables_register_hook net/netfilter/nf_tables_api.c:224 [inline]
       nf_tables_addchain.constprop.0+0x82e/0x13c0 net/netfilter/nf_tables_api.c:1981
       nf_tables_newchain+0xf68/0x16a0 net/netfilter/nf_tables_api.c:2235
       nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
       nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
       ___sys_sendmsg+0x100/0x170 net/socket.c:2416
       __sys_sendmsg+0xec/0x1b0 net/socket.c:2449
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      and to quiesce it, unregister NFPROTO_IPV6 hook instead of NFPROTO_INET
      in case of failing to register NFPROTO_IPV4 hook.
      Reported-by: Nsyzbot <syzbot+33e06702fd6cffc24c40@syzkaller.appspotmail.com>
      Fixes: d164385e ("netfilter: nat: add inet family nat support")
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NHillf Danton <hdanton@sina.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b4faef17
    • E
      tcp: cache line align MAX_TCP_HEADER · 9bacd256
      Eric Dumazet 提交于
      TCP stack is dumb in how it cooks its output packets.
      
      Depending on MAX_HEADER value, we might chose a bad ending point
      for the headers.
      
      If we align the end of TCP headers to cache line boundary, we
      make sure to always use the smallest number of cache lines,
      which always help.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9bacd256