1. 06 Jun, 2019 8 commits
    • Z
      net: rds: fix memory leak when unload rds_rdma · b50e0587
      Zhu Yanjun committed
      When KASAN is enabled, after several rds connections are
      created, then "rmmod rds_rdma" is run. The following will
      appear.
      
      "
      BUG rds_ib_incoming (Not tainted): Objects remaining
      in rds_ib_incoming on __kmem_cache_shutdown()
      
      Call Trace:
       dump_stack+0x71/0xab
       slab_err+0xad/0xd0
       __kmem_cache_shutdown+0x17d/0x370
       shutdown_cache+0x17/0x130
       kmem_cache_destroy+0x1df/0x210
       rds_ib_recv_exit+0x11/0x20 [rds_rdma]
       rds_ib_exit+0x7a/0x90 [rds_rdma]
       __x64_sys_delete_module+0x224/0x2c0
       ? __ia32_sys_delete_module+0x2c0/0x2c0
       do_syscall_64+0x73/0x190
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      "
      This is an rds connection memory leak. The root cause is as follows:
      When "rmmod rds_rdma" is run, rds_ib_remove_one calls
      rds_ib_dev_shutdown to drop the rds connections.
      rds_ib_dev_shutdown calls rds_conn_drop to drop the rds
      connections as below.
      "
      rds_conn_path_drop(&conn->c_path[0], false);
      "
      In the above, destroy is set to false.
      void rds_conn_path_drop(struct rds_conn_path *cp, bool destroy)
      {
              atomic_set(&cp->cp_state, RDS_CONN_ERROR);
      
              rcu_read_lock();
              if (!destroy && rds_destroy_pending(cp->cp_conn)) {
                      rcu_read_unlock();
                      return;
              }
              queue_work(rds_wq, &cp->cp_down_w);
              rcu_read_unlock();
      }
      In the above function, destroy is set to false, so rds_destroy_pending
      is called. This does not move the rds connections to ib_nodev_conns.
      This patch sets destroy to true so that the rds connections are moved
      to ib_nodev_conns.
      In rds_ib_unregister_client, flush_workqueue is called so that rds_wq
      finishes shutting down the rds connections. The function
      rds_ib_destroy_nodev_conns is then called to finally shut down the
      rds connections.
      Then rds_ib_recv_exit is called to destroy the slab caches.
      
      void rds_ib_recv_exit(void)
      {
              kmem_cache_destroy(rds_ib_incoming_slab);
              kmem_cache_destroy(rds_ib_frag_slab);
      }
      With this change, the slab memory leak above no longer occurs.
      
      From tests:
      256 rds connections
      [root@ca-dev14 ~]# time rmmod rds_rdma
      
      real    0m16.522s
      user    0m0.000s
      sys     0m8.152s
      512 rds connections
      [root@ca-dev14 ~]# time rmmod rds_rdma
      
      real    0m32.054s
      user    0m0.000s
      sys     0m15.568s
      
      Running rmmod rds_rdma with 256 rds connections takes about 16 seconds,
      and with 512 rds connections it takes about 32 seconds.
      From ftrace, when one rds connection is destroyed:
      
      "
       19)               |  rds_conn_destroy [rds]() {
       19)   7.782 us    |    rds_conn_path_drop [rds]();
       15)               |  rds_shutdown_worker [rds]() {
       15)               |    rds_conn_shutdown [rds]() {
       15)   1.651 us    |      rds_send_path_reset [rds]();
       15)   7.195 us    |    }
       15) + 11.434 us   |  }
       19)   2.285 us    |    rds_cong_remove_conn [rds]();
       19) * 24062.76 us |  }
      "
      So when many rds connections are destroyed, the function
      rds_ib_destroy_nodev_conns accounts for most of the time.
      Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b50e0587
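The effect of the destroy flag described in the rds_rdma commit above can be modeled in user space. This is a minimal sketch, not the kernel code: `struct conn_path`, its fields, and `model_path_drop` are hypothetical stand-ins for the RDS connection path, `rds_destroy_pending()`, and `queue_work()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rds_conn_path; not the kernel type. */
struct conn_path {
    bool destroy_pending; /* models rds_destroy_pending() returning true */
    bool in_error;        /* models cp_state == RDS_CONN_ERROR */
    bool work_queued;     /* models queue_work(rds_wq, &cp->cp_down_w) */
};

/*
 * Mirrors the shape of rds_conn_path_drop(): with destroy == false the
 * shutdown work is skipped once a destroy is pending; with destroy == true
 * the shutdown work is always queued.
 */
void model_path_drop(struct conn_path *cp, bool destroy)
{
    assert(cp != NULL);
    cp->in_error = true;
    if (!destroy && cp->destroy_pending)
        return;
    cp->work_queued = true;
}
```

In this model, a path with a pending destroy only gets its shutdown work queued when the caller passes destroy == true, which is the case the commit message relies on during module unload.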
    • X
      ipv6: fix the check before getting the cookie in rt6_get_cookie · b7999b07
      Xin Long committed
      In Jianlin's testing, netperf was broken with 'Connection reset by peer',
      as the cookie check failed in rt6_check() and ip6_dst_check() always
      returned NULL.
      
      It's caused by commit 93531c67 ("net/ipv6: separate handling of FIB
      entries from dst based routes"), after which the cookie used for setting
      dst_cookie is fetched only when 'c1' (see below) holds, whereas
      rt6_check() is called to check dst_cookie when 'c1' does not hold, as
      can be seen in ip6_dst_check().
      
      Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
      (!c1) will check the 'from' cookie, this patch is to remove the c1 check
      in rt6_get_cookie(), so that the dst_cookie can always be set properly.
      
      c1:
        (rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b7999b07
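The mismatch described in the rt6_get_cookie commit above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct model_route` and the `model_*` functions are hypothetical, and the real kernel functions take different arguments and use RCU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of a dst route and its 'from' FIB entry. */
struct model_route {
    bool has_from;        /* route still has a 'from' entry */
    uint32_t from_cookie; /* cookie of that 'from' entry */
    bool c1;              /* the 'c1' condition from the commit message */
};

/* Behaviour before the fix: the cookie is only fetched when c1 holds. */
uint32_t model_get_cookie_old(const struct model_route *rt)
{
    return (rt->has_from && rt->c1) ? rt->from_cookie : 0;
}

/* Behaviour after the fix: the c1 check is gone. */
uint32_t model_get_cookie_new(const struct model_route *rt)
{
    return rt->has_from ? rt->from_cookie : 0;
}

/* Both check paths compare the stored cookie with the 'from' cookie. */
bool model_dst_check(const struct model_route *rt, uint32_t stored)
{
    return rt->has_from && rt->from_cookie == stored;
}
```

For a route where c1 does not hold, the old getter stores 0, so the later check against the 'from' cookie fails; the new getter stores the matching value.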
    • X
      ipv4: not do cache for local delivery if bc_forwarding is enabled · 0a90478b
      Xin Long committed
      With the topo:
      
          h1 ---| rp1            |
                |     route  rp3 |--- h3 (192.168.200.1)
          h2 ---| rp2            |
      
      If bc_forwarding is set on rp1 but not on rp2, then after running
      "ping 192.168.200.255" on h1 followed by "ping 192.168.200.255" on
      h2, the packets from h2 are still forwarded.
      
      This issue was caused by the input route cache. It should only do
      the cache for either bc forwarding or local delivery. Otherwise,
      local delivery can use the route cache for bc forwarding of other
      interfaces.
      
      This patch is to fix it by not doing cache for local delivery if
      all.bc_forwarding is enabled.
      
      Note that we don't fix it by checking route cache local flag after
      rt_cache_valid() in "local_input:" and "ip_mkroute_input", as the
      common route code shouldn't be touched for bc_forwarding.
      
      Fixes: 5cbf777c ("route: add support for directed broadcast forwarding")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a90478b
    • D
      Merge branch 's390-qeth-fixes' · e7a9fe7b
      David S. Miller committed
      Julian Wiedmann says:
      
      ====================
      s390/qeth: fixes 2019-06-05
      
      one more shot...  now with patch 2 fixed up so that it uses the
      dst entry returned from dst_check().
      
      From the v1 cover letter:
      
      Please apply the following set of qeth fixes to -net.
      
      - The first two patches fix issues in the L3 driver's cast type
        selection for transmitted skbs.
      - Alexandra adds a sanity check when retrieving VLAN information from
        neighbour address events.
      - The last patch adds some missing error handling for qeth's new
        multiqueue code.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e7a9fe7b
    • J
      s390/qeth: handle error when updating TX queue count · bd966839
      Julian Wiedmann committed
      netif_set_real_num_tx_queues() can return an error, deal with it.
      
      Fixes: 73dc2daf ("s390/qeth: add TX multiqueue support for OSA devices")
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd966839
    • A
      s390/qeth: fix VLAN attribute in bridge_hostnotify udev event · 33572619
      Alexandra Winter committed
      Enabling sysfs attribute bridge_hostnotify triggers a series of udev events
      for the MAC addresses of all currently connected peers. In case no VLAN is
      set for a peer, the device reports the corresponding MAC addresses with
      VLAN ID 4096. This currently results in attribute VLAN=4096 for all
      non-VLAN interfaces in the initial series of events after host-notify is
      enabled.
      
      Instead, no VLAN attribute should be reported in the udev event for
      non-VLAN interfaces.
      
      Only the initial events face this issue. For dynamic changes that are
      reported later, the device uses a validity flag.
      
      This also changes the code so that it now sets the VLAN attribute for
      MAC addresses with VID 0. On Linux, no qeth interface will ever be
      registered with VID 0: Linux kernel registers VID 0 on all network
      interfaces initially, but qeth will drop .ndo_vlan_rx_add_vid for VID 0.
      Peers with other OSs could register MACs with VID 0.
      
      Fixes: 9f48b9db ("qeth: bridgeport support - address notifications")
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      33572619
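The rule described in the bridge_hostnotify commit above (no VLAN attribute for VID 4096, but one for VID 0) can be sketched as follows. This is an illustration only: `model_vlan_attr` and `MODEL_VLAN_N_VID` are hypothetical names, and the "VLAN=" string format is taken from the commit message rather than from the driver source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_VLAN_N_VID 4096 /* valid VLAN IDs are 0..4095 */

/*
 * Writes "VLAN=<vid>" into buf and returns 1 when vid is a valid VLAN ID.
 * For out-of-range values such as 4096, writes an empty string and
 * returns 0 so that no VLAN attribute is reported.
 */
int model_vlan_attr(char *buf, size_t len, unsigned int vid)
{
    assert(buf != NULL && len > 0);
    if (vid >= MODEL_VLAN_N_VID) {
        buf[0] = '\0';
        return 0;
    }
    snprintf(buf, len, "VLAN=%u", vid);
    return 1;
}
```

A range check rather than a non-zero check is what lets VID 0 through while filtering 4096.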
    • J
      s390/qeth: check dst entry before use · 0cd6783d
      Julian Wiedmann committed
      While qeth_l3 uses netif_keep_dst() to hold onto the dst, a skb's dst
      may still have been obsoleted (via dst_dev_put()) by the time that we
      end up using it. The dst then points to the loopback interface, which
      means the neighbour lookup in qeth_l3_get_cast_type() determines a bogus
      cast type of RTN_BROADCAST.
      For IQD interfaces this causes us to place such skbs on the wrong
      HW queue, resulting in TX errors.
      
      Fix-up the various call sites to first validate the dst entry with
      dst_check(), and fall back accordingly.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0cd6783d
    • J
      s390/qeth: handle limited IPv4 broadcast in L3 TX path · 72c87976
      Julian Wiedmann committed
      When selecting the cast type of a neighbourless IPv4 skb (e.g. on a raw
      socket), qeth_l3 falls back to the packet's destination IP address.
      For this case we should classify traffic sent to 255.255.255.255 as
      broadcast.
      This fixes DHCP requests, which were misclassified as unicast
      (and for IQD interfaces thus ended up on the wrong HW queue).
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      72c87976
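The fallback classification described in the limited-broadcast commit above can be sketched in plain C. This is a hedged illustration: `model_ipv4_cast_type` and the enum are hypothetical names, and the address is assumed to be in host byte order, unlike the kernel's network-order fields.

```c
#include <assert.h>
#include <stdint.h>

enum model_cast_type { MODEL_UNICAST, MODEL_MULTICAST, MODEL_BROADCAST };

/* Classifies an IPv4 destination address given in host byte order. */
enum model_cast_type model_ipv4_cast_type(uint32_t daddr)
{
    if (daddr == 0xffffffffu)                 /* 255.255.255.255 */
        return MODEL_BROADCAST;
    if ((daddr & 0xf0000000u) == 0xe0000000u) /* 224.0.0.0/4 */
        return MODEL_MULTICAST;
    return MODEL_UNICAST;
}
```

A DHCP request sent to 255.255.255.255 would be classified as broadcast by this model, while 192.168.200.1 stays unicast.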
  2. 05 Jun, 2019 8 commits
  3. 04 Jun, 2019 5 commits
  4. 03 Jun, 2019 4 commits
  5. 01 Jun, 2019 1 commit
    • V
      net: dsa: sja1105: Don't store frame type in skb->cb · e8d67fa5
      Vladimir Oltean committed
      Due to a confusion I thought that eth_type_trans() was called by the
      network stack whereas it can actually be called by network drivers to
      figure out the skb protocol and next packet_type handlers.
      
      In light of the above, it is not safe to store the frame type from the
      DSA tagger's .filter callback (first entry point on RX path), since GRO
      is yet to be invoked on the received traffic.  Hence it is very likely
      that the skb->cb will actually get overwritten between eth_type_trans()
      and the actual DSA packet_type handler.
      
      Of course, what this patch fixes is the actual overwriting of the
      SJA1105_SKB_CB(skb)->type field from the GRO layer, which made all
      frames be seen as SJA1105_FRAME_TYPE_NORMAL (0).
      
      Fixes: 227d07a0 ("net: dsa: sja1105: Add support for traffic through standalone ports")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e8d67fa5
  6. 31 May, 2019 14 commits
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 036e3431
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Fix OOPS during nf_tables rule dump, from Florian Westphal.
      
       2) Use after free in ip_vs_in, from Yue Haibing.
      
       3) Fix various kTLS bugs (NULL deref during device removal resync,
          netdev notification ignoring, etc.) From Jakub Kicinski.
      
       4) Fix ipv6 redirects with VRF, from David Ahern.
      
       5) Memory leak fix in igmpv3_del_delrec(), from Eric Dumazet.
      
       6) Missing memory allocation failure check in ip6_ra_control(), from
          Gen Zhang. And likewise fix ip_ra_control().
      
       7) TX clean budget logic error in aquantia, from Igor Russkikh.
      
       8) SKB leak in llc_build_and_send_ui_pkt(), from Eric Dumazet.
      
       9) Double frees in mlx5, from Parav Pandit.
      
      10) Fix lost MAC address in r8169 during PCI D3, from Heiner Kallweit.
      
      11) Fix botched register access in mvpp2, from Antoine Tenart.
      
      12) Use after free in napi_gro_frags(), from Eric Dumazet.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (89 commits)
        net: correct zerocopy refcnt with udp MSG_MORE
        ethtool: Check for vlan etype or vlan tci when parsing flow_rule
        net: don't clear sock->sk early to avoid trouble in strparser
        net-gro: fix use-after-free read in napi_gro_frags()
        net: dsa: tag_8021q: Create a stable binary format
        net: dsa: tag_8021q: Change order of rx_vid setup
        net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
        ipv4: tcp_input: fix stack out of bounds when parsing TCP options.
        mlxsw: spectrum: Prevent force of 56G
        mlxsw: spectrum_acl: Avoid warning after identical rules insertion
        net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT
        r8169: fix MAC address being lost in PCI D3
        net: core: support XDP generic on stacked devices.
        netvsc: unshare skb in VF rx handler
        udp: Avoid post-GRO UDP checksum recalculation
        net: phy: dp83867: Set up RGMII TX delay
        net: phy: dp83867: do not call config_init twice
        net: phy: dp83867: increase SGMII autoneg timer duration
        net: phy: dp83867: fix speed 10 in sgmii mode
        net: phy: marvell10g: report if the PHY fails to boot firmware
        ...
      036e3431
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · adc3f554
      Linus Torvalds committed
      Pull arm64 fixes from Will Deacon:
       "The fixes are still trickling in for arm64, but the only really
        significant one here is actually fixing a regression in the botched
        module relocation range checking merged for -rc2.
      
        Hopefully we've nailed it this time.
      
         - Fix implementation of our set_personality() system call, which
           wasn't being wrapped properly
      
         - Fix system call function types to keep CFI happy
      
         - Fix siginfo layout when delivering SIGKILL after a kernel fault
      
         - Really fix module relocation range checking"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: use the correct function type for __arm64_sys_ni_syscall
        arm64: use the correct function type in SYSCALL_DEFINE0
        arm64: fix syscall_fn_t type
        signal/arm64: Use force_sig not force_sig_fault for SIGKILL
        arm64/module: revert to unsigned interpretation of ABS16/32 relocations
        arm64: Fix the arm64_personality() syscall wrapper redirection
      adc3f554
    • L
      Merge tag 'for-5.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 318adf8e
      Linus Torvalds committed
      Pull btrfs fixes from David Sterba:
       "A few more fixes for bugs reported by users, fuzzing tools and
        regressions:
      
         - fix crashes in relocation:
             + resuming interrupted balance operation does not properly clean
               up orphan trees
             + with enabled qgroups, resuming needs to be more careful about
               block groups due to limited context when updating qgroups
      
         - fsync and logging fixes found by fuzzing
      
         - incremental send fixes for no-holes and clone
      
         - fix spin lock type used in timer function for zstd"
      
      * tag 'for-5.2-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix race updating log root item during fsync
        Btrfs: fix wrong ctime and mtime of a directory after log replay
        Btrfs: fix fsync not persisting changed attributes of a directory
        btrfs: qgroup: Check bg while resuming relocation to avoid NULL pointer dereference
        btrfs: reloc: Also queue orphan reloc tree for cleanup to avoid BUG_ON()
        Btrfs: incremental send, fix emission of invalid clone operations
        Btrfs: incremental send, fix file corruption when no-holes feature is enabled
        btrfs: correct zstd workspace manager lock to use spin_lock_bh()
        btrfs: Ensure replaced device doesn't have pending chunk allocation
      318adf8e
    • L
      Merge tag 'configfs-for-5.2-2' of git://git.infradead.org/users/hch/configfs · 8cb7104d
      Linus Torvalds committed
      Pull configfs fix from Christoph Hellwig:
      
       - fix a use after free in configfs_d_iput (Sahitya Tummala)
      
      * tag 'configfs-for-5.2-2' of git://git.infradead.org/users/hch/configfs:
        configfs: Fix use-after-free when accessing sd->s_dentry
      8cb7104d
    • L
      Merge tag 'sound-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · c5ba1712
      Linus Torvalds committed
      Pull sound fixes from Takashi Iwai:
       "No big surprises here, just a few device-specific fixes.
      
        HD-audio received several fixes for Acer, Dell, Huawei and other
        laptops as well as the workaround for the new Intel chipset. One
        significant one-liner fix is the disablement of the node-power saving
        on Realtek codecs, which may potentially cover annoying bugs like the
        background noises or click noises on many devices.
      
        Other than that, a fix for FireWire bit definitions, and another fix
        for LINE6 USB audio bug that was discovered by syzkaller"
      
      * tag 'sound-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: fireface: Use ULL suffixes for 64-bit constants
        ALSA: hda/realtek - Improve the headset mic for Acer Aspire laptops
        ALSA: line6: Assure canceling delayed work at disconnection
        ALSA: hda - Force polling mode on CNL for fixing codec communication
        ALSA: hda/realtek - Enable micmute LED for Huawei laptops
        ALSA: hda/realtek - Set default power save node to 0
        ALSA: hda/realtek - Check headset type by unplug and resume
      c5ba1712
    • L
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 20f94496
      Linus Torvalds committed
      Pull clk driver fixes from Stephen Boyd:
      
       - Don't expose the SiFive clk driver on non-RISCV architectures
      
       - Fix some bits describing clks in the imx8mm driver
      
       - Always call clk domain code in the TI driver so non-legacy platforms
         work
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: ti: clkctrl: Fix clkdm_clk handling
        clk: imx: imx8mm: fix int pll clk gate
        clk: sifive: restrict Kconfig scope for the FU540 PRCI driver
      20f94496
    • W
      net: correct zerocopy refcnt with udp MSG_MORE · 100f6d8e
      Willem de Bruijn committed
      TCP zerocopy takes a uarg reference for every skb, plus one for the
      tcp_sendmsg_locked datapath temporarily, to avoid reaching refcnt zero
      as it builds, sends and frees skbs inside its inner loop.
      
      UDP and RAW zerocopy do not send inside the inner loop so do not need
      the extra sock_zerocopy_get + sock_zerocopy_put pair. Commit
      52900d22288ed ("udp: elide zerocopy operation in hot path") introduced
      extra_uref to pass the initial reference taken in sock_zerocopy_alloc
      to the first generated skb.
      
      But, sock_zerocopy_realloc takes this extra reference at the start of
      every call. With MSG_MORE, no new skb may be generated to attach the
      extra_uref to, so refcnt is incorrectly 2 with only one skb.
      
      Do not take the extra ref if uarg && !tcp, which implies MSG_MORE.
      Update extra_uref accordingly.
      
      This conditional assignment triggers a false-positive "may be used
      uninitialized" warning, so extra_uref has to be initialized where it
      is defined.
      
      Changes v1->v2: fix typo in Fixes SHA1
      
      Fixes: 52900d22 ("udp: elide zerocopy operation in hot path")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Diagnosed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      100f6d8e
    • M
      ethtool: Check for vlan etype or vlan tci when parsing flow_rule · b73484b2
      Maxime Chevallier committed
      When parsing an ethtool flow spec to build a flow_rule, the code checks
      if both the vlan etype and the vlan tci are specified by the user to add
      a FLOW_DISSECTOR_KEY_VLAN match.
      
      However, when the user only specified a vlan etype or a vlan tci, this
      check silently ignores these parameters.
      
      For example, the following rule :
      
      ethtool -N eth0 flow-type udp4 vlan 0x0010 action -1 loc 0
      
      will result in no error being issued, but the equivalent rule will be
      created and passed to the NIC driver :
      
      ethtool -N eth0 flow-type udp4 action -1 loc 0
      
      In the end, neither the NIC driver using the rule nor the end user has
      a way to know that these keys were dropped along the way, or that
      incorrect parameters were entered.
      
      This kind of check should be left to either the driver, or the ethtool
      flow spec layer.
      
      This commit makes it so that ethtool parameters are forwarded as-is to
      the NIC driver.
      
      Since none of the users of ethtool_rx_flow_rule_create are using the
      VLAN dissector, I don't think this qualifies as a regression.
      
      Fixes: eca4205f ("ethtool: add ethtool_rx_flow_spec to flow_rule structure translator")
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Acked-by: Pablo Neira Ayuso <pablo@gnumonks.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b73484b2
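The change in the ethtool flow_rule commit above comes down to when a VLAN match is added. The sketch below models only that decision. `struct model_vlan_mask` and both functions are hypothetical reductions, not the kernel's ethtool_rx_flow_spec handling, and the "either field" rule is an assumption read from the commit title.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of the VLAN part of an ethtool flow spec. */
struct model_vlan_mask {
    uint16_t vlan_etype;
    uint16_t vlan_tci;
};

/* Behaviour described as the problem: both fields had to be given. */
bool model_adds_vlan_key_old(const struct model_vlan_mask *m)
{
    return m->vlan_etype && m->vlan_tci;
}

/* Behaviour suggested by the commit title: either field is enough. */
bool model_adds_vlan_key_new(const struct model_vlan_mask *m)
{
    return m->vlan_etype || m->vlan_tci;
}
```

With only a TCI given, as in the `vlan 0x0010` example, the old predicate silently drops the VLAN match while the new one keeps it.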
    • J
      net: don't clear sock->sk early to avoid trouble in strparser · 2b81f816
      Jakub Kicinski committed
      af_inet sets sock->sk to NULL which trips strparser over:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000012
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP PTI
      CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.2.0-rc1-00139-g14629453a6d3 #21
      RIP: 0010:tcp_peek_len+0x10/0x60
      RSP: 0018:ffffc02e41c54b98 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff9cf924c4e030 RCX: 0000000000000051
      RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff9cf97128f480
      RBP: ffff9cf9365e0300 R08: ffff9cf94fe7d2c0 R09: 0000000000000000
      R10: 000000000000036b R11: ffff9cf939735e00 R12: ffff9cf91ad9ae40
      R13: ffff9cf924c4e000 R14: ffff9cf9a8fcbaae R15: 0000000000000020
      FS: 0000000000000000(0000) GS:ffff9cf9af7c0000(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000012 CR3: 000000013920a003 CR4: 00000000003606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
       <IRQ>
       strp_data_ready+0x48/0x90
       tls_data_ready+0x22/0xd0 [tls]
       tcp_rcv_established+0x569/0x620
       tcp_v4_do_rcv+0x127/0x1e0
       tcp_v4_rcv+0xad7/0xbf0
       ip_protocol_deliver_rcu+0x2c/0x1c0
       ip_local_deliver_finish+0x41/0x50
       ip_local_deliver+0x6b/0xe0
       ? ip_protocol_deliver_rcu+0x1c0/0x1c0
       ip_rcv+0x52/0xd0
       ? ip_rcv_finish_core.isra.20+0x380/0x380
       __netif_receive_skb_one_core+0x7e/0x90
       netif_receive_skb_internal+0x42/0xf0
       napi_gro_receive+0xed/0x150
       nfp_net_poll+0x7a2/0xd30 [nfp]
       ? kmem_cache_free_bulk+0x286/0x310
       net_rx_action+0x149/0x3b0
       __do_softirq+0xe3/0x30a
       ? handle_irq_event_percpu+0x6a/0x80
       irq_exit+0xe8/0xf0
       do_IRQ+0x85/0xd0
       common_interrupt+0xf/0xf
       </IRQ>
      RIP: 0010:cpuidle_enter_state+0xbc/0x450
      
      To avoid this issue set sock->sk after sk_prot->close.
      My grepping and testing did not discover any code which
      would depend on the current behaviour.
      
      Fixes: c46234eb ("tls: RX path for ktls")
      Reported-by: David Beckett <david.beckett@netronome.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2b81f816
    • E
      net-gro: fix use-after-free read in napi_gro_frags() · a4270d67
      Eric Dumazet committed
      If a network driver provides to napi_gro_frags() an
      skb with a page fragment of exactly 14 bytes, the call
      to gro_pull_from_frag0() will 'consume' the fragment
      by calling skb_frag_unref(skb, 0), and the page might
      be freed and reused.
      
      Reading eth->h_proto at the end of napi_frags_skb() might
      read mangled data, or crash under specific debugging features.
      
      BUG: KASAN: use-after-free in napi_frags_skb net/core/dev.c:5833 [inline]
      BUG: KASAN: use-after-free in napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
      Read of size 2 at addr ffff88809366840c by task syz-executor599/8957
      
      CPU: 1 PID: 8957 Comm: syz-executor599 Not tainted 5.2.0-rc1+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
       __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       kasan_report+0x12/0x20 mm/kasan/common.c:614
       __asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:142
       napi_frags_skb net/core/dev.c:5833 [inline]
       napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
       tun_get_user+0x2f3c/0x3ff0 drivers/net/tun.c:1991
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
       call_write_iter include/linux/fs.h:1872 [inline]
       do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
       do_iter_write fs/read_write.c:970 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:951
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
       do_writev+0x15b/0x330 fs/read_write.c:1058
      
      Fixes: a50e233c ("net-gro: restore frag0 optimization")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4270d67
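The hazard described in the napi_gro_frags() commit above is reading a header through a pointer into a fragment that has already been released. The user-space sketch below shows one safe pattern: read from the copy in the linear area instead. It is a model under assumptions, with hypothetical names (`struct model_pkt`, `model_pull_header`, `model_read_proto`), and the actual patch may differ in detail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_ETH_HLEN 14

/* Hypothetical packet: a linear header area plus one owned fragment. */
struct model_pkt {
    uint8_t linear[MODEL_ETH_HLEN];
    uint8_t *frag; /* NULL once the fragment has been released */
};

/*
 * Copies the Ethernet header into the linear area and releases the
 * fragment, loosely modelling a pull that consumes a 14-byte fragment.
 */
void model_pull_header(struct model_pkt *p)
{
    assert(p != NULL && p->frag != NULL);
    memcpy(p->linear, p->frag, MODEL_ETH_HLEN);
    free(p->frag);
    p->frag = NULL;
}

/*
 * Reads the EtherType (bytes 12..13, network byte order) from the copy
 * in the linear area, which stays valid after the fragment is gone.
 */
uint16_t model_read_proto(const struct model_pkt *p)
{
    return (uint16_t)((p->linear[12] << 8) | p->linear[13]);
}
```

The point of the model is the ordering: any read that happens after the pull must go through memory the packet still owns.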
    • D
      Merge branch 'Fixes-for-DSA-tagging-using-802-1Q' · c3bc6deb
      David S. Miller committed
      Vladimir Oltean says:
      
      ====================
      Fixes for DSA tagging using 802.1Q
      
      During the prototyping for the "Decoupling PHYLINK from struct
      net_device" patchset, the CPU port of the sja1105 driver was moved to a
      different spot.  This uncovered an issue in the tag_8021q DSA code,
      which used to work by mistake - the CPU port was the last hardware port
      numerically, and this was masking an ordering issue which is very likely
      to be seen in other drivers that make use of 802.1Q tags.
      
      A question was also raised whether the VID numbers bear any meaning, and
      the conclusion was that they don't, at least not in an absolute sense.
      The second patch defines bit fields inside the DSA 802.1Q VID so that
      tcpdump can decode it unambiguously (although the meaning is now clear
      even by visual inspection).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c3bc6deb
    • V
      net: dsa: tag_8021q: Create a stable binary format · 0471dd42
      Vladimir Oltean committed
      Tools like tcpdump need to be able to decode the significance of fake
      VLAN headers that DSA uses to separate switch ports.
      
      But currently these have no global significance - they are simply an
      ordered list of DSA_MAX_SWITCHES x DSA_MAX_PORTS numbers ending at 4095.
      
      The reason why this is submitted as a fix is that the existing mapping
      of VIDs should not enter into a stable kernel, so we can pretend that
      only the new format exists. This way tcpdump won't need to try to make
      something out of the VLAN tags on 5.2 kernels.
      
      Fixes: f9bbe447 ("net: dsa: Optional VLAN-based port separation for switches without tagging")
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0471dd42
    • I
      net: dsa: tag_8021q: Change order of rx_vid setup · d34d2baa
      Ioana Ciornei committed
      The 802.1Q tagging performs an unbalanced setup in terms of RX VIDs on
      the CPU port. For the ingress path of a 802.1Q switch to work, the RX
      VID of a port needs to be seen as tagged egress on the CPU port.
      
      While configuring the other front-panel ports to be part of this VID,
      for bridge scenarios, the untagged flag is applied even on the CPU port
      in dsa_switch_vlan_add.  This happens because DSA applies the same flags
      on the CPU port as on the (bridge-controlled) slave ports, and the
      effect in this case is that the CPU port tagged settings get deleted.
      
      Instead of fixing DSA by introducing a way to control VLAN flags on the
      CPU port (and hence stop inheriting from the slave ports) - a hard,
      perhaps intractable problem - avoid this situation by moving the setup
      part of the RX VID on the CPU port after all the other front-panel ports
      have been added to the VID.
      
      Fixes: f9bbe447 ("net: dsa: Optional VLAN-based port separation for switches without tagging")
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d34d2baa
    • A
      net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value · 21808437
      Antoine Tenart committed
      MVPP2_TXQ_SCHED_TOKEN_CNTR_REG() expects the logical queue id but
      the current code is passing the global tx queue offset, so it ends
      up writing to unknown registers (between 0x8280 and 0x82fc, which
      seemed to be unused by the hardware). This fixes the issue by using
      the logical queue id instead.
      
      Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit")
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21808437