1. 13 Apr, 2019 4 commits
  2. 12 Apr, 2019 14 commits
    • E
      dctcp: more accurate tracking of packets delivery · e3058450
      Eric Dumazet committed
      After commit e21db6f6 ("tcp: track total bytes delivered with ECN CE marks"),
      the core TCP stack does a very good job of tracking ECN signals.
      
      The "sender's best estimate of CE information" Yuchung mentioned in his
      patch is indeed the best we can do.
      
      DCTCP can use tp->delivered_ce and tp->delivered to not duplicate the logic,
      and use the existing best estimate.
      
      This solves some problems, since current DCTCP logic does not deal with losses
      and/or GRO or ack aggregation very well.
      
      This also removes a dubious use of inet_csk(sk)->icsk_ack.rcv_mss
      (this should have been tp->mss_cache), and a 64 bit divide.
      
      Finally, the DCTCP logic, which calls dctcp_update_alpha() for
      every ACK, could be done differently by calling it only once per RTT.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Cc: Abdul Kabbani <akabbani@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e3058450
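The DCTCP commit above describes deriving the congestion estimate from the stack's running `delivered` and `delivered_ce` counters, updated once per RTT. The following is a minimal userspace sketch of that idea, not the kernel code itself. The struct, the function name, the gain of 1/16 and the 2^10 fixed-point scale are assumptions chosen for illustration.

```c
#include <stdint.h>

#define SKETCH_MAX_ALPHA 1024U /* alpha is a fraction scaled by 2^10 (assumed) */
#define SKETCH_SHIFT_G   4U    /* gain g = 1/16 (assumed for this sketch) */

/* Hypothetical per-connection state; field names are illustrative. */
struct dctcp_sketch {
	uint32_t alpha;            /* current estimate, 0..SKETCH_MAX_ALPHA */
	uint32_t old_delivered;    /* snapshot of delivered at the last update */
	uint32_t old_delivered_ce; /* snapshot of delivered_ce at the last update */
};

/* Call once per RTT with the stack's running counters. */
static uint32_t dctcp_sketch_update(struct dctcp_sketch *ca,
				    uint32_t delivered, uint32_t delivered_ce)
{
	uint32_t ce = delivered_ce - ca->old_delivered_ce;
	uint32_t total = delivered - ca->old_delivered;
	uint32_t alpha = ca->alpha;
	uint32_t decay = alpha >> SKETCH_SHIFT_G;

	/* alpha = (1 - g) * alpha; if the shift rounds to zero, drop alpha to zero */
	alpha -= decay ? decay : alpha;

	if (ce) {
		/* add g * F, where F = ce / total, scaled by 2^10 */
		ce <<= (10 - SKETCH_SHIFT_G);
		ce /= total ? total : 1;
		alpha += ce;
		if (alpha > SKETCH_MAX_ALPHA)
			alpha = SKETCH_MAX_ALPHA;
	}

	/* remember the counters for the next RTT */
	ca->old_delivered = delivered;
	ca->old_delivered_ce = delivered_ce;
	ca->alpha = alpha;
	return alpha;
}
```

Because only two counter differences are needed per update, the sketch uses a single 32-bit division and no per-ACK bookkeeping.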
    • D
      selftests: fib_tests: Fix 'Command line is not complete' errors · a5f62298
      David Ahern committed
      A couple of tests are verifying a route has been removed. The helper
      expects the prefix as the first part of the expected output. When
      checking that a route has been deleted the prefix is empty leading
      to an invalid ip command:
      
        $ ip ro ls match
        Command line is not complete. Try option "help"
      
      Fix by moving the comparison of expected output and output to a new
      function that is used by both check_route and check_route6. Use the
      new helper for the 2 checks on route removal.
      
      Also, remove the reset of 'set -x' in route_setup which overrides the
      user managed setting.
      
      Fixes: d69faad7 ("selftests: fib_tests: Add prefix route tests with metric")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a5f62298
    • Y
      net: netrom: Fix error cleanup path of nr_proto_init · d3706566
      YueHaibing committed
      Syzkaller reported this:
      
      BUG: unable to handle kernel paging request at fffffbfff830524b
      PGD 237fe8067 P4D 237fe8067 PUD 237e64067 PMD 1c9716067 PTE 0
      Oops: 0000 [#1] SMP KASAN PTI
      CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 5.0.0+ #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:__list_add_valid+0x21/0xe0 lib/list_debug.c:23
      Code: 8b 0c 24 e9 17 fd ff ff 90 55 48 89 fd 48 8d 7a 08 53 48 89 d3 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 0f 85 8b 00 00 00 48 8b 53 08 48 39 f2 75 35 48 89 f2
      RSP: 0018:ffff8881ea2278d0 EFLAGS: 00010282
      RAX: dffffc0000000000 RBX: ffffffffc1829250 RCX: 1ffff1103d444ef4
      RDX: 1ffffffff830524b RSI: ffffffff85659300 RDI: ffffffffc1829258
      RBP: ffffffffc1879250 R08: fffffbfff0acb269 R09: fffffbfff0acb269
      R10: ffff8881ea2278f0 R11: fffffbfff0acb268 R12: ffffffffc1829250
      R13: dffffc0000000000 R14: 0000000000000008 R15: ffffffffc187c830
      FS:  00007fe0361df700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: fffffbfff830524b CR3: 00000001eb39a001 CR4: 00000000007606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       __list_add include/linux/list.h:60 [inline]
       list_add include/linux/list.h:79 [inline]
       proto_register+0x444/0x8f0 net/core/sock.c:3375
       nr_proto_init+0x73/0x4b3 [netrom]
       ? 0xffffffffc1628000
       ? 0xffffffffc1628000
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fe0361dec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
      RBP: 00007fe0361dec70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0361df6bc
      R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
      Modules linked in: netrom(+) ax25 fcrypt pcbc af_alg arizona_ldo1 v4l2_common videodev media v4l2_dv_timings hdlc ide_cd_mod snd_soc_sigmadsp_regmap snd_soc_sigmadsp intel_spi_platform intel_spi mtd spi_nor snd_usbmidi_lib usbcore lcd ti_ads7950 hi6421_regulator snd_soc_kbl_rt5663_max98927 snd_soc_hdac_hdmi snd_hda_ext_core snd_hda_core snd_soc_rt5663 snd_soc_core snd_pcm_dmaengine snd_compress snd_soc_rl6231 mac80211 rtc_rc5t583 spi_slave_time leds_pwm hid_gt683r hid industrialio_triggered_buffer kfifo_buf industrialio ir_kbd_i2c rc_core led_class_flash dwc_xlgmac snd_ymfpci gameport snd_mpu401_uart snd_rawmidi snd_ac97_codec snd_pcm ac97_bus snd_opl3_lib snd_timer snd_seq_device snd_hwdep snd soundcore iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan
       bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ide_pci_generic piix aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ide_core psmouse input_leds i2c_piix4 serio_raw intel_agp intel_gtt ata_generic agpgart pata_acpi parport_pc rtc_cmos parport floppy sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: rxrpc]
      Dumping ftrace buffer:
         (ftrace buffer empty)
      CR2: fffffbfff830524b
      ---[ end trace 039ab24b305c4b19 ]---
      
      If nr_proto_init() fails, it may forget to call proto_unregister(),
      triggering this issue. This patch rearranges the code of nr_proto_init()
      to avoid such issues.
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3706566
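The netrom fix above is about undoing every registration that succeeded when a later init step fails. A common way to structure this is a goto-based unwind ladder. The sketch below shows the pattern in userspace; the step functions, the `registered` array and `fail_step` are hypothetical stand-ins and not the netrom code.

```c
/* Hypothetical registration state, standing in for proto/socket/notifier registration. */
static int registered[3];
static int fail_step = -1; /* index of the step forced to fail, -1 for none */

static int step_register(int i)
{
	if (i == fail_step)
		return -1;
	registered[i] = 1;
	return 0;
}

static void step_unregister(int i)
{
	registered[i] = 0;
}

/* Init routine: each failure jumps to a label that undoes all earlier steps. */
static int sketch_init(void)
{
	int rc;

	rc = step_register(0);
	if (rc)
		return rc;

	rc = step_register(1);
	if (rc)
		goto undo_0;

	rc = step_register(2);
	if (rc)
		goto undo_1;

	return 0;

undo_1:
	step_unregister(1);
undo_0:
	step_unregister(0);
	return rc;
}
```

The labels are ordered in reverse of the registrations, so a failure at any step leaves nothing registered.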
    • A
      net: fec: manage ahb clock in runtime pm · d7c3a206
      Andy Duan committed
      Some SoCs, such as i.MX6SX, have clock constraints:
      - the ahb clock should be disabled before ipg.
      - the ahb and ipg clocks are required for the MAC MII bus.
      So, move the ahb clock to runtime management together with
      the ipg clock.
      Signed-off-by: Fugang Duan <fugang.duan@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d7c3a206
    • N
      net: bridge: multicast: use rcu to access port list from br_multicast_start_querier · c5b493ce
      Nikolay Aleksandrov committed
      br_multicast_start_querier() walks over the port list but it can be
      called from a timer with only multicast_lock held which doesn't protect
      the port list, so use RCU to walk over it.
      
      Fixes: c83b8fab ("bridge: Restart queries when last querier expires")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5b493ce
    • D
      Merge branch 'thunderx-xdp-mtu' · 9a4dda81
      David S. Miller committed
      Matteo Croce says:
      
      ====================
      Fix thunderx MTU with XDP
      
      The thunderx driver can't use XDP with all MTU values.
      This patch set sets the right MTU values and adds a check to avoid
      setting a value that will not work.
      
      v3: Fix a copy-paste from two functions, tested on proper hardware:
      
      2: enP2p1s0v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
          link/ether 1c:1b:0d:0d:52:a4 brd ff:ff:ff:ff:ff:ff
      [  787.019730] nicvf 0002:01:00.1 enP2p1s0v0: Jumbo frames not yet supported with XDP, current MTU 1800.
      RTNETLINK answers: Operation not supported
      [  800.574568] nicvf 0002:01:00.1 enP2p1s0v0: Link is Up 10000 Mbps Full duplex
      [  807.248321] nicvf 0002:01:00.1 enP2p1s0v0: Jumbo frames not yet supported with XDP, current MTU 1500.
      RTNETLINK answers: Invalid argument
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9a4dda81
    • M
      net: thunderx: don't allow jumbo frames with XDP · 1f227d16
      Matteo Croce committed
      The thunderx driver forbids loading an eBPF program if the MTU is too high,
      but this can be circumvented by loading the eBPF program, then raising the MTU.
      
      Fix this by limiting the MTU if an eBPF program is already loaded.
      
      Fixes: 05c773f5 ("net: thunderx: Add basic XDP support")
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f227d16
    • M
      net: thunderx: raise XDP MTU to 1508 · 5ee15c10
      Matteo Croce committed
      The thunderx driver splits frames bigger than 1530 bytes across multiple
      pages, making it impossible to run an eBPF program on them.
      This leads to a maximum MTU of 1508 if QinQ is in use.
      
      The thunderx driver forbids loading an eBPF program if the MTU is higher
      than 1500 bytes. Raise the limit to 1508 so it is possible to use L2
      protocols which need some more headroom.
      
      Fixes: 05c773f5 ("net: thunderx: Add basic XDP support")
      Signed-off-by: Matteo Croce <mcroce@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5ee15c10
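The two thunderx commits above amount to one limit enforced in two places: when attaching a program and when changing the MTU. A plausible reading of the 1508 figure is 1530 bytes minus a 14-byte Ethernet header and two 4-byte VLAN tags for QinQ. The sketch below encodes that arithmetic and both checks. All names are hypothetical, and the error codes are inferred from the "Operation not supported" and "Invalid argument" messages in the merge log, not taken from the driver source.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_ETH_HLEN       14   /* Ethernet header length */
#define SKETCH_VLAN_HLEN      4    /* one VLAN tag */
#define SKETCH_MAX_XDP_FRAME  1530 /* largest frame kept in a single page, per the commit text */
#define SKETCH_MAX_XDP_MTU \
	(SKETCH_MAX_XDP_FRAME - SKETCH_ETH_HLEN - 2 * SKETCH_VLAN_HLEN)

/* Attaching a program is refused while the current MTU is too large. */
static int sketch_attach_xdp(int cur_mtu)
{
	if (cur_mtu > SKETCH_MAX_XDP_MTU)
		return -EOPNOTSUPP;
	return 0;
}

/* Raising the MTU is refused while a program is attached. */
static int sketch_change_mtu(bool xdp_attached, int new_mtu)
{
	if (xdp_attached && new_mtu > SKETCH_MAX_XDP_MTU)
		return -EINVAL;
	return 0;
}
```

Checking the same constant on both paths closes the gap where a program is loaded first and the MTU is raised afterwards.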
    • D
      Merge branch 'smc-fixes' · 796fff0c
      David S. Miller committed
      Ursula Braun says:
      
      ====================
      net/smc: fixes 2019-04-11
      
      Here are some fixes in different areas of the smc code for the net
      tree.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      796fff0c
    • U
      net/smc: move unhash before release of clcsock · f61bca58
      Ursula Braun committed
      Commit <26d92e95>
      ("net/smc: move unhash as early as possible in smc_release()")
      fixes one occurrence in the smc code, but the same pattern exists
      in other places. This patch covers the remaining occurrences and
      makes sure the unhash operation is done before the smc->clcsock is
      released. This avoids a potential use-after-free in smc_diag_dump().
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f61bca58
    • K
      net/smc: fix return code from FLUSH command · 8ef659f1
      Karsten Graul committed
      The FLUSH command is used to empty the pnet table. No return code is
      expected from the command. Commit a9d8b0b1e3d6 added namespace support
      for the pnet table and changed the FLUSH command processing to call
      smc_pnet_remove_by_pnetid() to remove the pnet entries. This function
      returns -ENOENT when no entry was deleted, which is now the return code
      of the FLUSH command. As a result the FLUSH command will return an error
      when the pnet table is already empty.
      Restore the expected behavior and let FLUSH always return 0.
      
      Fixes: a9d8b0b1e3d6 ("net/smc: add pnet table namespace support")
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8ef659f1
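The FLUSH fix above can be illustrated with a tiny model: the removal helper reports -ENOENT when nothing was deleted, and the flush path deliberately ignores that result because an empty table is not an error for a flush. The function names and the integer "table" below are hypothetical simplifications, not the smc code.

```c
#include <errno.h>

/* Hypothetical removal helper: -ENOENT when the table held no entries. */
static int sketch_remove_all(int *entries)
{
	if (*entries == 0)
		return -ENOENT;
	*entries = 0;
	return 0;
}

/* Flush always succeeds, whether or not anything was removed. */
static int sketch_flush(int *entries)
{
	(void)sketch_remove_all(entries);
	return 0;
}
```

Flushing an already empty table therefore returns 0, matching the behavior the commit restores.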
    • U
      net/smc: propagate file from SMC to TCP socket · 07603b23
      Ursula Braun committed
      fcntl(fd, F_SETOWN, getpid()) selects the recipient of SIGURG signals
      that are delivered when out-of-band data arrives on socket fd.
      If an SMC socket program makes use of such an fcntl() call, it fails
      in case of fallback to TCP-mode. In case of fallback the traffic is
      processed with the internal TCP socket. Propagating field "file" from the
      SMC socket to the internal TCP socket fixes the issue.
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      07603b23
    • K
      net/smc: fix a NULL pointer dereference · e183d4e4
      Kangjie Lu committed
      In case alloc_ordered_workqueue fails, the fix returns NULL
      to avoid a NULL pointer dereference.
      Signed-off-by: Kangjie Lu <kjlu@umn.edu>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e183d4e4
    • K
      net/smc: wait for pending work before clcsock release_sock · fd57770d
      Karsten Graul committed
      When the clcsock is already released using sock_release() and a pending
      smc_listen_work accesses the clcsock, then that will fail. Solve this
      by canceling and waiting for the work to complete first. Because the
      work holds the sock_lock, it must be ensured that the lock is not held
      before the new helper smc_clcsock_release() is invoked. And before the
      smc_listen_work starts working, check if the parent listen socket is
      still valid; otherwise stop the work early.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd57770d
  3. 11 Apr, 2019 22 commits
    • L
      net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv · 988dc4a9
      Lorenzo Bianconi committed
      gue tunnels run iptunnel_pull_offloads on received skbs. This can
      cause a possible use-after-free when accessing the guehdr pointer, since
      the packet will be 'uncloned' by running pskb_expand_head if it is a
      cloned gso skb (e.g. if the packet has been sent through a veth device).
      
      Fixes: a09a4c8d ("tunnels: Remove encapsulation offloads on decap")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      988dc4a9
    • H
      tipc: missing entries in name table of publications · d1841533
      Hoang Le committed
      When binding multiple services with specific types 1Ki, 2Ki..,
      some entries in the name table of publications are missing
      when listed via 'tipc name show'.
      
      The problem lies in how the zero last_type condition provided
      via netlink is identified. The first case is the initial 'type' when
      starting the name table dump. The second is continuing with a zero
      type (the node state service type). The lookup function then fails
      to find the node state service type in the next iteration.
      
      To solve this, add a further condition that marks the type as dirty
      and looks up the correct service type for the next iteration, instead
      of selecting the first service as the initial 'type' zero.
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d1841533
    • J
      vhost: reject zero size iova range · 813dbeb6
      Jason Wang committed
      We used to accept a zero-size iova range, which leads to an infinite loop
      in translate_desc(). Fix this by failing the request in this case.
      
      Reported-by: syzbot+d21e6e297322a900c128@syzkaller.appspotmail.com
      Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      813dbeb6
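The vhost fix above rejects zero-size ranges because a translation loop that advances by the size of the matched range makes no progress when that size is zero. The sketch below models such a loop in userspace, with a guard that reports the stall instead of spinning. The function names and the -EINVAL return value are assumptions for illustration, not the vhost API.

```c
#include <errno.h>
#include <stdint.h>

/* Validation at insert time: refuse ranges that cover nothing. */
static int sketch_check_range(uint64_t size)
{
	return size ? 0 : -EINVAL;
}

/*
 * Walk a buffer of len bytes in chunks bounded by range_size.
 * Returns the number of chunks, or -1 if the walk cannot advance.
 */
static long sketch_translate(uint64_t len, uint64_t range_size)
{
	uint64_t done = 0;
	long steps = 0;

	while (done < len) {
		uint64_t left = len - done;
		uint64_t chunk = left < range_size ? left : range_size;

		if (chunk == 0)
			return -1; /* without this guard the loop would never end */
		done += chunk;
		steps++;
	}
	return steps;
}
```

Rejecting the range when it is added, as the commit does, means the walk never has to cope with a zero-length step in the first place.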
    • J
      net/tls: prevent bad memory access in tls_is_sk_tx_device_offloaded() · b4f47f38
      Jakub Kicinski committed
      Unlike the '&&' operator, '&' does not have short-circuit
      evaluation semantics.  IOW both sides of the operator always
      get evaluated.  Fix the wrong operator in
      tls_is_sk_tx_device_offloaded(), which would lead to
      out-of-bounds access for non-full sockets.
      
      Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b4f47f38
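The difference between '&' and '&&' described in the commit above can be observed directly by counting how often the right-hand operand is evaluated. In this sketch the struct and function names are hypothetical, and the probe only increments a counter where the real code would read socket-specific state.

```c
#include <stdbool.h>

/* Hypothetical socket model: only "full" sockets carry the offload state. */
struct sketch_sock {
	bool full;
	bool offloaded;
};

static int probe_calls; /* how many times the right-hand side was evaluated */

static bool sketch_fullsock(const struct sketch_sock *sk)
{
	return sk->full;
}

static bool sketch_probe(const struct sketch_sock *sk)
{
	probe_calls++; /* in real code this would touch full-socket-only fields */
	return sk->offloaded;
}

/* Bitwise AND: both operands are always evaluated. */
static bool check_bitwise(const struct sketch_sock *sk)
{
	return sketch_fullsock(sk) & sketch_probe(sk);
}

/* Logical AND: the probe is skipped when the first operand is false. */
static bool check_logical(const struct sketch_sock *sk)
{
	return sketch_fullsock(sk) && sketch_probe(sk);
}
```

Both functions return the same value, but only the logical form avoids running the probe on a socket that is not full.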
    • S
      failover: allow name change on IFF_UP slave interfaces · 8065a779
      Si-Wei Liu committed
      When a netdev appears through hot plug then gets enslaved by a failover
      master that is already up and running, the slave will be opened
      right away after getting enslaved. Today there's a race that userspace
      (udev) may fail to rename the slave if the kernel (net_failover)
      opens the slave earlier than when the userspace rename happens.
      Unlike bond or team, the primary slave of failover can't be renamed by
      userspace ahead of time, since the kernel initiated auto-enslavement is
      unable to, or rather, is never meant to be synchronized with the rename
      request from userspace.
      
      The failover slave interfaces are not designed to be operated
      directly by userspace apps: IP configuration, filter rules for
      network traffic and so on should all be done on the master
      interface. In general, userspace apps only care about the
      name of the master interface, while slave names are less important as long
      as admin users can see reliable names that may carry
      other information describing the netdev. For example, they can infer that
      "ens3nsby" is a standby slave of "ens3", while for a
      name like "eth0" they can't tell which master it belongs to.
      
      Historically the name of IFF_UP interface can't be changed because
      there might be admin script or management software that is already
      relying on such behavior and assumes that the slave name can't be
      changed once UP. But failover is special: with the in-kernel
      auto-enslavement mechanism, the userspace expectation for device
      enumeration and bring-up order is already broken. Previously initramfs
      and various userspace config tools were modified to bypass failover
      slaves because of auto-enslavement and duplicate MAC address. Similarly,
      in case that users care about seeing reliable slave name, the new type
      of failover slaves needs to be taken care of specifically in userspace
      anyway.
      
      It's less risky to lift up the rename restriction on failover slave
      which is already UP. Although it's possible this change may potentially
      break userspace component (most likely configuration scripts or
      management software) that assumes slave name can't be changed while
      UP, it's relatively a limited and controllable set among all userspace
      components, which can be fixed specifically to listen for the rename
      events on failover slaves. Userspace component interacting with slaves
      is expected to be changed to operate on failover master interface
      instead, as the failover slave is dynamic in nature which may come and
      go at any point.  The goal is to make the role of failover slaves less
      relevant, and userspace components should only deal with failover master
      in the long run.
      
      Fixes: 30c8bd5a ("net: Introduce generic failover module")
      Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8065a779
    • H
      team: set slave to promisc if team is already in promisc mode · 43c2adb9
      Hangbin Liu committed
      After adding a team interface to bridge, the team interface will enter
      promisc mode. Then if we add a new slave to team0, the slave will keep
      promisc off. Fix it by setting slave to promisc on if team master is
      already in promisc mode, also do the same for allmulti.
      
      v2: add promisc and allmulti checking when deleting ports
      
      Fixes: 3d249d4c ("net: introduce ethernet teaming device")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      43c2adb9
    • J
      net/tls: fix build without CONFIG_TLS_DEVICE · 903f1a18
      Jakub Kicinski committed
      buildbot noticed that TLS_HW is not defined if CONFIG_TLS_DEVICE=n.
      Wrap the cleanup branch into an ifdef, tls_device_free_resources_tx()
      wouldn't be compiled either in this case.
      
      Fixes: 35b71a34 ("net/tls: don't leak partially sent record in device mode")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      903f1a18
    • D
      Merge branch 'tls-leaks' · 44f5e048
      David S. Miller committed
      Jakub Kicinski says:
      
      ====================
      net: tls: fix memory leaks and freeing skbs
      
      This series fixes two memory issues and a stack overflow.
      First two patches are fairly simple leaks.  Third patch
      partially reverts an optimization made to the strparser
      which causes creation of skb->frag_list->skb->frag_list...
      chains of 100s of skbs, leading to recursive kfree_skb()
      filling up the kernel stack.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44f5e048
    • J
      net: strparser: partially revert "strparser: Call skb_unclone conditionally" · 4a9c2e37
      Jakub Kicinski committed
      This reverts the first part of commit 4e485d06 ("strparser: Call
      skb_unclone conditionally").  To build a message with multiple
      fragments we need our own root of frag_list.  We can't simply
      use the frag_list of orig_skb, because it will lead to linking
      all orig_skbs together creating very long frag chains, and causing
      stack overflow on kfree_skb() (which is called recursively on
      the frag_lists).
      
      BUG: stack guard page was hit at 00000000d40fad41 (stack is 0000000029dde9f4..000000008cce03d5)
      kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
      RIP: 0010:free_one_page+0x2b/0x490
      
      Call Trace:
        __free_pages_ok+0x143/0x2c0
        skb_release_data+0x8e/0x140
        ? skb_release_data+0xad/0x140
        kfree_skb+0x32/0xb0
      
        [...]
      
        skb_release_data+0xad/0x140
        ? skb_release_data+0xad/0x140
        kfree_skb+0x32/0xb0
        skb_release_data+0xad/0x140
        ? skb_release_data+0xad/0x140
        kfree_skb+0x32/0xb0
        skb_release_data+0xad/0x140
        ? skb_release_data+0xad/0x140
        kfree_skb+0x32/0xb0
        skb_release_data+0xad/0x140
        ? skb_release_data+0xad/0x140
        kfree_skb+0x32/0xb0
        skb_release_data+0xad/0x140
        __kfree_skb+0xe/0x20
        tcp_disconnect+0xd6/0x4d0
        tcp_close+0xf4/0x430
        ? tcp_check_oom+0xf0/0xf0
        tls_sk_proto_close+0xe4/0x1e0 [tls]
        inet_release+0x36/0x60
        __sock_release+0x37/0xa0
        sock_close+0x11/0x20
        __fput+0xa2/0x1d0
        task_work_run+0x89/0xb0
        exit_to_usermode_loop+0x9a/0xa0
        do_syscall_64+0xc0/0xf0
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Let's leave the second unclone conditional, as I'm not entirely
      sure what is its purpose :)
      
      Fixes: 4e485d06 ("strparser: Call skb_unclone conditionally")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a9c2e37
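The stack overflow in the strparser commit above comes from a free routine that recurses once per link of a frag_list chain, so the recursion depth equals the chain length. The sketch below reproduces that relationship with a hypothetical singly linked structure and a depth counter. It is a userspace model, not the skb code.

```c
#include <stdlib.h>

/* Hypothetical buffer with a single nested chain pointer. */
struct sketch_skb {
	struct sketch_skb *frag_list;
};

static int depth;     /* current recursion depth */
static int max_depth; /* deepest recursion observed */

/* Recursive free: one stack frame per chained buffer. */
static void sketch_kfree(struct sketch_skb *skb)
{
	if (!skb)
		return;
	depth++;
	if (depth > max_depth)
		max_depth = depth;
	sketch_kfree(skb->frag_list);
	depth--;
	free(skb);
}

/* Build a chain of up to n buffers, each hanging off the previous one. */
static struct sketch_skb *sketch_chain(int n)
{
	struct sketch_skb *head = NULL;

	for (int i = 0; i < n; i++) {
		struct sketch_skb *s = calloc(1, sizeof(*s));

		if (!s)
			break; /* allocation failed: return the shorter chain */
		s->frag_list = head;
		head = s;
	}
	return head;
}
```

Keeping each message on its own frag_list root, as the partial revert does, bounds the chain length and with it the recursion depth.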
    • J
      net/tls: don't leak partially sent record in device mode · 35b71a34
      Jakub Kicinski committed
      David reports that tls triggers warnings related to
      sk->sk_forward_alloc not being zero at destruction time:
      
      WARNING: CPU: 5 PID: 6831 at net/core/stream.c:206 sk_stream_kill_queues+0x103/0x110
      WARNING: CPU: 5 PID: 6831 at net/ipv4/af_inet.c:160 inet_sock_destruct+0x15b/0x170
      
      This happens when the sender fills up the write buffer and dies
      from SIGPIPE.  It is due to the device implementation
      not cleaning up the partially_sent_record.
      
      This is because commit a42055e8 ("net/tls: Add support for async encryption of records for performance")
      moved the partial record cleanup to the SW-only path.
      
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Reported-by: David Beckett <david.beckett@netronome.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      35b71a34
    • J
      net/tls: fix the IV leaks · 5a03bc73
      Jakub Kicinski committed
      Commit f66de3ee ("net/tls: Split conf to rx + tx") made
      freeing of IV and record sequence number conditional to SW
      path only, but commit e8f69799 ("net/tls: Add generic NIC
      offload infrastructure") also allocates that state for the
      device offload configuration.  Remember to free it.
      
      Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a03bc73
    • D
      Merge branch 'ibmvnic-features' · f4a58857
      David S. Miller committed
      Thomas Falcon says:
      
      ====================
      ibmvnic: Fix netdev features settings on reset
      
      In its current state, a driver reset clobbers any feature settings
      a user may have toggled and will disable GRO as it is not explicitly
      enabled in the driver. This patch set enables GRO and tries to retain
      user settings after a reset. If the underlying carrier changes, however,
      the driver will disable features unsupported by the new carrier.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4a58857
    • T
      ibmvnic: Fix netdev feature clobbering during a reset · dde746a3
      Thomas Falcon committed
      While determining offload capabilities of backing hardware during
      a device reset, the driver is clobbering current feature settings.
      Update hw_features on reset instead of features unless a feature
      is enabled that is no longer supported on the current backing device.
      Also enable features that were not supported prior to the reset but
      were previously enabled or requested by the user.
      
      This can occur if the reset is the result of a carrier change, such
      as a device failover or partition migration.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dde746a3
    • T
      ibmvnic: Enable GRO · b66b7bd2
      Thomas Falcon committed
      Enable Generic Receive Offload in the ibmvnic driver.
      Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b66b7bd2
    • D
      Merge branch 'mlxsw-Various-fixes' · f8d49bee
      David S. Miller committed
      Ido Schimmel says:
      
      ====================
      mlxsw: Various fixes
      
      This patchset contains various small fixes for mlxsw.
      
      Patch #1 fixes a warning generated by switchdev core when the driver
      fails to insert an MDB entry in the commit phase.
      
      Patches #2-#4 fix a warning in check_flush_dependency() that can be
      triggered when a work item in a WQ_MEM_RECLAIM workqueue tries to flush
      a non-WQ_MEM_RECLAIM workqueue.
      
      It seems that the semantics of the WQ_MEM_RECLAIM flag are not very
      clear [1] and that various patches have been sent to remove it from
      various workqueues throughout the kernel [2][3][4] in order to silence
      the warning.
      
      These patches do the same for the workqueues created by mlxsw that
      probably should not have been created with this flag in the first place.
      
      Patch #5 fixes a regression where an IP address cannot be assigned to a
      VRF upper due to erroneous MAC validation check. Patch #6 adds a test
      case.
      
      Patch #7 adjusts Spectrum-2 shared buffer configuration to be compatible
      with Spectrum-1. The problem and fix are described in detail in the
      commit message.
      
      Please consider patches #1-#5 for 5.0.y. I verified they apply cleanly.
      
      [1] https://patchwork.kernel.org/patch/10791315/
      [2] Commit ce162bfb ("mac80211_hwsim: don't use WQ_MEM_RECLAIM")
      [3] Commit 39baf103 ("IB/core: Fix use workqueue without WQ_MEM_RECLAIM")
      [4] Commit 75215e5b ("iwcm: Don't allocate iwcm workqueue with WQ_MEM_RECLAIM")
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8d49bee
    • I
      mlxsw: spectrum_buffers: Add a multicast pool for Spectrum-2 · d5949d92
      Ido Schimmel committed
      In Spectrum-1, when a multicast packet is admitted to the shared buffer
      it increases the quotas of all the ports and {port, TC} to which it is
      forwarded.
      
      The above means that multicast packets are accounted multiple times in
      the shared buffer and can therefore cause the associated shared buffer
      pool to fill up very quickly.
      
      To work around this issue, commit e83c045e ("mlxsw:
      spectrum_buffers: Configure MC pool") added a dedicated multicast pool
      in which multicast packets are accounted.
      
      The issue is not present in Spectrum-2, but in order to be backward
      compatible with Spectrum-1, its default behavior is to allow a multicast
      packet to increase multiple egress quotas instead of one.
      
      Until the new (non-backward compatible) mode is supported, configure a
      dedicated multicast pool as in Spectrum-1.
      
      Fixes: fe099bf6 ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reviewed-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d5949d92
    • I
      selftests: mlxsw: Test VRF MAC vetoing · 7052e243
      Committed by Ido Schimmel
      Test that it is possible to set an IP address on a VRF and that it is
      not vetoed.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7052e243
    • I
      mlxsw: spectrum_router: Do not check VRF MAC address · 972fae68
      Committed by Ido Schimmel
      Commit 74bc9939 ("mlxsw: spectrum_router: Veto unsupported RIF MAC
      addresses") enabled the driver to veto router interface (RIF) MAC
      addresses that it cannot support.
      
      This check should only be performed for interfaces for which the driver
      actually configures a RIF. A VRF upper is not one of them, so ignore it.
      
      Without this patch it is not possible to set an IP address on the VRF
      device and use it as a loopback.
      
      Fixes: 74bc9939 ("mlxsw: spectrum_router: Veto unsupported RIF MAC addresses")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
      Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      972fae68
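The rule in the commit above (skip the MAC check for devices that never get a RIF) can be sketched as a small userspace model in C. Everything here is hypothetical: `struct model_netdev`, `model_rif_mac_vetoed()` and the byte-prefix comparison are stand-ins chosen for illustration, and the real criterion the driver uses for an "unsupported" MAC is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for a netdev; only the fields the sketch needs. */
struct model_netdev {
	uint8_t mac[ETH_ALEN];
	bool is_vrf;	/* true for a VRF upper device */
};

/*
 * Returns true if the address change on 'dev' should be vetoed.
 * Assumption for the model: a MAC is "supported" when its first
 * 'prefix_len' bytes match 'base_mac'.
 */
static bool model_rif_mac_vetoed(const struct model_netdev *dev,
				 const uint8_t *base_mac, size_t prefix_len)
{
	assert(prefix_len <= ETH_ALEN);

	/* No RIF is configured for a VRF, so its MAC is never checked. */
	if (dev->is_vrf)
		return false;

	return memcmp(dev->mac, base_mac, prefix_len) != 0;
}
```

In this model a VRF device with a MAC outside the supported prefix is accepted, while a regular port with the same MAC is still vetoed.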
    • I
      mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw workqueue · b442fed1
      Committed by Ido Schimmel
      The workqueue is used to periodically update the networking stack about
      activity / statistics of various objects such as neighbours and TC
      actions.
      
      It should not be called as part of the memory reclaim path, so remove
      the WQ_MEM_RECLAIM flag.
      
      Fixes: 3d5479e9 ("mlxsw: core: Remove deprecated create_workqueue")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b442fed1
    • I
      mlxsw: core: Do not use WQ_MEM_RECLAIM for mlxsw ordered workqueue · 4af06997
      Committed by Ido Schimmel
      The ordered workqueue is used to offload various objects such as routes
      and neighbours in the order they are notified.
      
      It should not be called as part of the memory reclaim path, so remove
      the WQ_MEM_RECLAIM flag. Keeping the flag can also result in a warning
      [1] if a worker tries to flush a non-WQ_MEM_RECLAIM workqueue.
      
      [1]
      [97703.542861] workqueue: WQ_MEM_RECLAIM mlxsw_core_ordered:mlxsw_sp_router_fib6_event_work [mlxsw_spectrum] is flushing !WQ_MEM_RECLAIM events:rht_deferred_worker
      [97703.542884] WARNING: CPU: 1 PID: 32492 at kernel/workqueue.c:2605 check_flush_dependency+0xb5/0x130
      ...
      [97703.542988] Hardware name: Mellanox Technologies Ltd. MSN3700C/VMOD0008, BIOS 5.11 10/10/2018
      [97703.543049] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fib6_event_work [mlxsw_spectrum]
      [97703.543061] RIP: 0010:check_flush_dependency+0xb5/0x130
      ...
      [97703.543071] RSP: 0018:ffffb3f08137bc00 EFLAGS: 00010086
      [97703.543076] RAX: 0000000000000000 RBX: ffff96e07740ae00 RCX: 0000000000000000
      [97703.543080] RDX: 0000000000000094 RSI: ffffffff82dc1934 RDI: 0000000000000046
      [97703.543084] RBP: ffffb3f08137bc20 R08: ffffffff82dc18a0 R09: 00000000000225c0
      [97703.543087] R10: 0000000000000000 R11: 0000000000007eec R12: ffffffff816e4ee0
      [97703.543091] R13: ffff96e06f6a5c00 R14: ffff96e077ba7700 R15: ffffffff812ab0c0
      [97703.543097] FS: 0000000000000000(0000) GS:ffff96e077a80000(0000) knlGS:0000000000000000
      [97703.543101] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [97703.543104] CR2: 00007f8cd135b280 CR3: 00000001e860e003 CR4: 00000000003606e0
      [97703.543109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [97703.543112] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [97703.543115] Call Trace:
      [97703.543129] __flush_work+0xbd/0x1e0
      [97703.543137] ? __cancel_work_timer+0x136/0x1b0
      [97703.543145] ? pwq_dec_nr_in_flight+0x49/0xa0
      [97703.543154] __cancel_work_timer+0x136/0x1b0
      [97703.543175] ? mlxsw_reg_trans_bulk_wait+0x145/0x400 [mlxsw_core]
      [97703.543184] cancel_work_sync+0x10/0x20
      [97703.543191] rhashtable_free_and_destroy+0x23/0x140
      [97703.543198] rhashtable_destroy+0xd/0x10
      [97703.543254] mlxsw_sp_fib_destroy+0xb1/0xf0 [mlxsw_spectrum]
      [97703.543310] mlxsw_sp_vr_put+0xa8/0xc0 [mlxsw_spectrum]
      [97703.543364] mlxsw_sp_fib_node_put+0xbf/0x140 [mlxsw_spectrum]
      [97703.543418] ? mlxsw_sp_fib6_entry_destroy+0xe8/0x110 [mlxsw_spectrum]
      [97703.543475] mlxsw_sp_router_fib6_event_work+0x6cd/0x7f0 [mlxsw_spectrum]
      [97703.543484] process_one_work+0x1fd/0x400
      [97703.543493] worker_thread+0x34/0x410
      [97703.543500] kthread+0x121/0x140
      [97703.543507] ? process_one_work+0x400/0x400
      [97703.543512] ? kthread_park+0x90/0x90
      [97703.543523] ret_from_fork+0x35/0x40
      
      Fixes: a3832b31 ("mlxsw: core: Create an ordered workqueue for FIB offload")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Semion Lisyansky <semionl@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4af06997
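The dependency rule behind the warning in the commit above can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel's `check_flush_dependency()`: the flag value, `struct model_wq` and `model_flush_would_warn()` are hypothetical, and the kernel function performs additional checks that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag value chosen for the model only; not copied from kernel headers. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true when flushing work on 'target' from a worker that belongs
 * to 'current_wq' would be reported. 'current_wq' is NULL when the caller
 * is not a workqueue worker.
 */
static bool model_flush_would_warn(const struct model_wq *current_wq,
				   const struct model_wq *target)
{
	assert(target != NULL);

	/* Waiting on a reclaim-safe workqueue is always acceptable. */
	if (target->flags & MODEL_WQ_MEM_RECLAIM)
		return false;

	/* A reclaim worker must not wait on a non-reclaim workqueue. */
	return current_wq && (current_wq->flags & MODEL_WQ_MEM_RECLAIM);
}
```

In the trace above, the flusher is `mlxsw_core_ordered` (created with WQ_MEM_RECLAIM) and the target is `events` (without it), which is the combination the model reports. Once the flag is dropped from the flusher, the same flush is no longer reported.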
    • I
      mlxsw: core: Do not use WQ_MEM_RECLAIM for EMAD workqueue · a8c133b0
      Committed by Ido Schimmel
      The EMAD workqueue is used to handle retransmission of EMAD packets that
      contain configuration data for the device's firmware.
      
      Given that the workers need to allocate these packets and that the code
      is not called as part of the memory reclaim path, remove the
      WQ_MEM_RECLAIM flag.
      
      Fixes: d965465b ("mlxsw: core: Fix possible deadlock")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a8c133b0
    • I
      mlxsw: spectrum_switchdev: Add MDB entries in prepare phase · d4d0e409
      Committed by Ido Schimmel
      The driver cannot guarantee in the prepare phase that it will be able to
      write an MDB entry to the device. If the driver returns success during
      the prepare phase but then fails to add the entry in the commit phase,
      the switchdev core generates a WARNING [1].
      
      Fix this by doing the work in the prepare phase instead.
      
      [1]
      [  358.544486] swp12s0: Commit of object (id=2) failed.
      [  358.550061] WARNING: CPU: 0 PID: 30 at net/switchdev/switchdev.c:281 switchdev_port_obj_add_now+0x9b/0xe0
      [  358.560754] CPU: 0 PID: 30 Comm: kworker/0:1 Not tainted 5.0.0-custom-13382-gf2449babf221 #1350
      [  358.570472] Hardware name: Mellanox Technologies Ltd. MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016
      [  358.580582] Workqueue: events switchdev_deferred_process_work
      [  358.587001] RIP: 0010:switchdev_port_obj_add_now+0x9b/0xe0
      ...
      [  358.614109] RSP: 0018:ffffa6b900d6fe18 EFLAGS: 00010286
      [  358.619943] RAX: 0000000000000000 RBX: ffff8b00797ff000 RCX: 0000000000000000
      [  358.627912] RDX: ffff8b00b7a1d4c0 RSI: ffff8b00b7a152e8 RDI: ffff8b00b7a152e8
      [  358.635881] RBP: ffff8b005c3f5bc0 R08: 000000000000022b R09: 0000000000000000
      [  358.643850] R10: 0000000000000000 R11: ffffa6b900d6fcc8 R12: 0000000000000000
      [  358.651819] R13: dead000000000100 R14: ffff8b00b65a23c0 R15: 0ffff8b00b7a2200
      [  358.659790] FS:  0000000000000000(0000) GS:ffff8b00b7a00000(0000) knlGS:0000000000000000
      [  358.668820] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  358.675228] CR2: 00007f00aad90de0 CR3: 00000001ca80d000 CR4: 00000000001006f0
      [  358.683188] Call Trace:
      [  358.685918]  switchdev_port_obj_add_deferred+0x13/0x60
      [  358.691655]  switchdev_deferred_process+0x6b/0xf0
      [  358.696907]  switchdev_deferred_process_work+0xa/0x10
      [  358.702548]  process_one_work+0x1f5/0x3f0
      [  358.707022]  worker_thread+0x28/0x3c0
      [  358.711099]  ? process_one_work+0x3f0/0x3f0
      [  358.715768]  kthread+0x10d/0x130
      [  358.719369]  ? __kthread_create_on_node+0x180/0x180
      [  358.724815]  ret_from_fork+0x35/0x40
      
      Fixes: 3a49b4fd ("mlxsw: Adding layer 2 multicast support")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
      Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4d0e409
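The prepare/commit ordering issue in the commit above can be sketched as a small userspace model in C. All names (`struct model_dev`, `model_mdb_add_old()`, `model_mdb_add_new()`, `model_obj_add()`) are hypothetical, and `-ENOSPC` is an assumed failure reason picked for the example. The sketch only shows why a write that can fail belongs in the phase whose failure the caller can still handle cleanly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device with a fixed number of free MDB slots. */
struct model_dev {
	int free_slots;
	int warnings;	/* counts "commit failed after prepare" events */
};

/* Write one entry to the device; fails when no slot is left. */
static int model_dev_write_mdb(struct model_dev *dev)
{
	if (dev->free_slots == 0)
		return -ENOSPC;
	dev->free_slots--;
	return 0;
}

/* Old flow: prepare does nothing, the write happens at commit time. */
static int model_mdb_add_old(struct model_dev *dev, bool prepare)
{
	if (prepare)
		return 0;
	return model_dev_write_mdb(dev);
}

/* New flow: the write happens at prepare time, commit is a no-op. */
static int model_mdb_add_new(struct model_dev *dev, bool prepare)
{
	if (!prepare)
		return 0;
	return model_dev_write_mdb(dev);
}

typedef int (*model_add_fn)(struct model_dev *dev, bool prepare);

/* Model of the caller: a commit failure after a successful prepare warns. */
static int model_obj_add(struct model_dev *dev, model_add_fn add)
{
	int err;

	assert(dev != NULL);

	err = add(dev, true);
	if (err)
		return err;	/* clean failure, no warning */

	err = add(dev, false);
	if (err)
		dev->warnings++;	/* prepare succeeded, commit failed */
	return err;
}
```

With a full device, the old flow returns an error and records a warning, whereas the new flow returns the same error from the prepare phase without one.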