1. 12 3月, 2018 1 次提交
    • S
      macvlan: filter out unsupported feature flags · 13fbcc8d
      Shannon Nelson 提交于
      Adding a macvlan device on top of a lowerdev that supports
      the xfrm offloads fails with a new regression:
        # ip link add link ens1f0 mv0 type macvlan
        RTNETLINK answers: Operation not permitted
      
      Tracing down the failure shows that the macvlan device inherits
      the NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM feature flags
      from the lowerdev, but with no dev->xfrmdev_ops API filled
      in, it doesn't actually support xfrm.  When the request is
      made to add the new macvlan device, the XFRM listener for
      NETDEV_REGISTER calls xfrm_api_check() which fails the new
      registration because dev->xfrmdev_ops is NULL.
      
      The macvlan creation succeeds when we filter out the ESP
      feature flags in macvlan_fix_features(), so let's filter them
      out like we're already filtering out ~NETIF_F_NETNS_LOCAL.
      When XFRM support is added in the future, we can add the flags
      into MACVLAN_FEATURES.
      
      This same problem could crop up in the future with any other
      new feature flags, so let's filter out any flags that aren't
      defined as supported in macvlan.
      
      Fixes: d77e38e6 ("xfrm: Add an IPsec hardware offloading API")
      Reported-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NShannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13fbcc8d
  2. 10 3月, 2018 15 次提交
    • D
      Merge branch 'erspan-fixes' · 87de1201
      David S. Miller 提交于
      William Tu says:
      
      ====================
      a couple of erspan fixes
      
      The series fixes a couple of erspan issues.
      The first patch adds the erspan v2 proto type to the ip6 tunnel lookup.
      The second patch improves the error handling when users screws the
      version number in metadata.  The final patch makes sure the skb has
      enough headroom for pushing erspan header when xmit.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87de1201
    • W
      ip6erspan: make sure enough headroom at xmit. · e41c7c68
      William Tu 提交于
      The patch adds skb_cow_header() to ensure enough headroom
      at ip6erspan_tunnel_xmit before pushing the erspan header
      to the skb.
      Signed-off-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e41c7c68
    • W
      ip6erspan: improve error handling for erspan version number. · d6aa7119
      William Tu 提交于
      When users fill in incorrect erspan version number through
      the struct erspan_metadata uapi, current code skips pushing
      the erspan header but continue pushing the gre header, which
      is incorrect.  The patch fixes it by returning error.
      Signed-off-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6aa7119
    • W
      ip6gre: add erspan v2 to tunnel lookup · 3b04caab
      William Tu 提交于
      The patch adds the erspan v2 proto in ip6gre_tunnel_lookup
      so the erspan v2 tunnel can be found correctly.
      Signed-off-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b04caab
    • D
      Merge branch 'mlxsw-ACL-and-mirroring-fixes' · 4eb57ecc
      David S. Miller 提交于
      Ido Schimmel says:
      
      ====================
      mlxsw: ACL and mirroring fixes
      
      The first patch fixes offload of rules using the 'pass' action. Instead
      of continuing to evaluate lower priority rules, the binding is
      terminated and the packet proceeds to the bridge and router blocks on
      ingress, or goes out of the port on egress.
      
      Second patch prevents the user from mirroring more than once from a
      given {Port, Direction} as this is not supported by the device.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4eb57ecc
    • P
      mlxsw: spectrum: Prevent duplicate mirrors · 663f1b26
      Petr Machata 提交于
      The Spectrum ASIC doesn't support mirroring more than once from a single
      binding point (which is a port-direction pair). Therefore detect that a
      second binding of a given binding point is attempted.
      
      To that end, extend struct mlxsw_sp_span_inspected_port to track whether
      a given binding point is bound or not. Extend
      mlxsw_sp_span_entry_port_find() to look for ports based on the full
      unique key: port number, direction, and boundness.
      
      Besides fixing the overt bug where configured mirrors are not offloaded,
      this also fixes a more subtle bug: mlxsw_sp_span_inspected_port_del()
      just defers to mlxsw_sp_span_entry_bound_port_find(), and that used to
      find the first port with the right number (disregarding the type). Thus
      by adding and removing egress and ingress mirrors in the right order,
      one could trick the system into believing it has no egress mirrors when
      in fact it did have some. That then caused that
      mlxsw_sp_span_port_mtu_update() didn't update mirroring buffer when MTU
      was changed.
      
      Fixes: 763b4b70 ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
      Signed-off-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      663f1b26
    • J
      mlxsw: spectrum: Fix gact_ok offloading · 49bae2f3
      Jiri Pirko 提交于
      For ok GACT action, TERMINATE binding_cmd should be used in action set
      passed down to HW.
      
      Fixes: b2925957 ("mlxsw: spectrum_flower: Offload "ok" termination action")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Reported-by: NAlexander Petrovskiy <alexpe@mellanox.com>
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49bae2f3
    • D
      Merge branch 'vhost_net-ptr_ring-fixes' · bcf34adc
      David S. Miller 提交于
      Jason Wang says:
      
      ====================
      Several fixes for vhost_net ptr_ring usage
      
      This small series try to fix several bugs of ptr_ring usage in
      vhost_net. Please review.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bcf34adc
    • J
      vhost_net: examine pointer types during un-producing · 3a403076
      Jason Wang 提交于
      After commit fc72d1d5 ("tuntap: XDP transmission"), we can
      actually queueing XDP pointers in the pointer ring, so we should
      examine the pointer type before freeing the pointer.
      
      Fixes: fc72d1d5 ("tuntap: XDP transmission")
      Reported-by: NMichael S. Tsirkin <mst@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a403076
    • J
      vhost_net: keep private_data and rx_ring synced · 303fd71b
      Jason Wang 提交于
      We get pointer ring from the exported sock, this means we should keep
      rx_ring and vq->private synced during both vq stop and backend set,
      otherwise we may see stale rx_ring.
      
      Fixes: c67df11f ("vhost_net: try batch dequing from skb array")
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      303fd71b
    • A
      vhost_net: initialize rx_ring in vhost_net_open() · ab7e34b3
      Alexander Potapenko 提交于
      KMSAN reported a use of uninit memory in vhost_net_buf_unproduce()
      while trying to access n->vqs[VHOST_NET_VQ_TX].rx_ring:
      
      ==================================================================
      BUG: KMSAN: use of uninitialized memory in vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vho
      et.c:170
      CPU: 0 PID: 3021 Comm: syz-fuzzer Not tainted 4.16.0-rc4+ #3853
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x1f0 mm/kmsan/kmsan.c:1093
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vhost/net.c:170
       vhost_net_stop_vq drivers/vhost/net.c:974 [inline]
       vhost_net_stop+0x146/0x380 drivers/vhost/net.c:982
       vhost_net_release+0xb1/0x4f0 drivers/vhost/net.c:1015
       __fput+0x49f/0xa00 fs/file_table.c:209
       ____fput+0x37/0x40 fs/file_table.c:243
       task_work_run+0x243/0x2c0 kernel/task_work.c:113
       tracehook_notify_resume include/linux/tracehook.h:191 [inline]
       exit_to_usermode_loop arch/x86/entry/common.c:166 [inline]
       prepare_exit_to_usermode+0x349/0x3b0 arch/x86/entry/common.c:196
       syscall_return_slowpath+0xf3/0x6d0 arch/x86/entry/common.c:265
       do_syscall_64+0x34d/0x450 arch/x86/entry/common.c:292
      ...
      origin:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:303 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:213
       kmsan_kmalloc_large+0x6f/0xd0 mm/kmsan/kmsan.c:392
       kmalloc_large_node_hook mm/slub.c:1366 [inline]
       kmalloc_large_node mm/slub.c:3808 [inline]
       __kmalloc_node+0x100e/0x1290 mm/slub.c:3818
       kmalloc_node include/linux/slab.h:554 [inline]
       kvmalloc_node+0x1a5/0x2e0 mm/util.c:419
       kvmalloc include/linux/mm.h:541 [inline]
       vhost_net_open+0x64/0x5f0 drivers/vhost/net.c:921
       misc_open+0x7b5/0x8b0 drivers/char/misc.c:154
       chrdev_open+0xc28/0xd90 fs/char_dev.c:417
       do_dentry_open+0xccb/0x1430 fs/open.c:752
       vfs_open+0x272/0x2e0 fs/open.c:866
       do_last fs/namei.c:3378 [inline]
       path_openat+0x49ad/0x6580 fs/namei.c:3519
       do_filp_open+0x267/0x640 fs/namei.c:3553
       do_sys_open+0x6ad/0x9c0 fs/open.c:1059
       SYSC_openat+0xc7/0xe0 fs/open.c:1086
       SyS_openat+0x63/0x90 fs/open.c:1080
       do_syscall_64+0x2f1/0x450 arch/x86/entry/common.c:287
      ==================================================================
      
      Fixes: c67df11f ("vhost_net: try batch dequing from skb array")
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab7e34b3
    • K
      net: ethernet: ave: enable Rx drop interrupt · d06cbe9c
      Kunihiko Hayashi 提交于
      This enables AVE_GI_RXDROP interrupt factor. This factor indicates
      depletion of Rx descriptors and the handler counts the number
      of dropped packets.
      Signed-off-by: NKunihiko Hayashi <hayashi.kunihiko@socionext.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d06cbe9c
    • D
      net: use skb_is_gso_sctp() instead of open-coding · 1dd27cde
      Daniel Axtens 提交于
      As well as the basic conversion, I noticed that a lot of the
      SCTP code checks gso_type without first checking skb_is_gso()
      so I have added that where appropriate.
      
      Also, document the helper.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDaniel Axtens <dja@axtens.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1dd27cde
    • E
      ieee802154: 6lowpan: fix possible NULL deref in lowpan_device_event() · ca0edb13
      Eric Dumazet 提交于
      A tun device type can trivially be set to arbitrary value using
      TUNSETLINK ioctl().
      
      Therefore, lowpan_device_event() must really check that ieee802154_ptr
      is not NULL.
      
      Fixes: 2c88b528 ("ieee802154: 6lowpan: remove check on null")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alexander Aring <alex.aring@gmail.com>
      Cc: Stefan Schmidt <stefan@osg.samsung.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NStefan Schmidt <stefan@osg.samsung.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca0edb13
    • L
      ipv6: fix access to non-linear packet in ndisc_fill_redirect_hdr_option() · 9f62c15f
      Lorenzo Bianconi 提交于
      Fix the following slab-out-of-bounds kasan report in
      ndisc_fill_redirect_hdr_option when the incoming ipv6 packet is not
      linear and the accessed data are not in the linear data region of orig_skb.
      
      [ 1503.122508] ==================================================================
      [ 1503.122832] BUG: KASAN: slab-out-of-bounds in ndisc_send_redirect+0x94e/0x990
      [ 1503.123036] Read of size 1184 at addr ffff8800298ab6b0 by task netperf/1932
      
      [ 1503.123220] CPU: 0 PID: 1932 Comm: netperf Not tainted 4.16.0-rc2+ #124
      [ 1503.123347] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
      [ 1503.123527] Call Trace:
      [ 1503.123579]  <IRQ>
      [ 1503.123638]  print_address_description+0x6e/0x280
      [ 1503.123849]  kasan_report+0x233/0x350
      [ 1503.123946]  memcpy+0x1f/0x50
      [ 1503.124037]  ndisc_send_redirect+0x94e/0x990
      [ 1503.125150]  ip6_forward+0x1242/0x13b0
      [...]
      [ 1503.153890] Allocated by task 1932:
      [ 1503.153982]  kasan_kmalloc+0x9f/0xd0
      [ 1503.154074]  __kmalloc_track_caller+0xb5/0x160
      [ 1503.154198]  __kmalloc_reserve.isra.41+0x24/0x70
      [ 1503.154324]  __alloc_skb+0x130/0x3e0
      [ 1503.154415]  sctp_packet_transmit+0x21a/0x1810
      [ 1503.154533]  sctp_outq_flush+0xc14/0x1db0
      [ 1503.154624]  sctp_do_sm+0x34e/0x2740
      [ 1503.154715]  sctp_primitive_SEND+0x57/0x70
      [ 1503.154807]  sctp_sendmsg+0xaa6/0x1b10
      [ 1503.154897]  sock_sendmsg+0x68/0x80
      [ 1503.154987]  ___sys_sendmsg+0x431/0x4b0
      [ 1503.155078]  __sys_sendmsg+0xa4/0x130
      [ 1503.155168]  do_syscall_64+0x171/0x3f0
      [ 1503.155259]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      [ 1503.155436] Freed by task 1932:
      [ 1503.155527]  __kasan_slab_free+0x134/0x180
      [ 1503.155618]  kfree+0xbc/0x180
      [ 1503.155709]  skb_release_data+0x27f/0x2c0
      [ 1503.155800]  consume_skb+0x94/0xe0
      [ 1503.155889]  sctp_chunk_put+0x1aa/0x1f0
      [ 1503.155979]  sctp_inq_pop+0x2f8/0x6e0
      [ 1503.156070]  sctp_assoc_bh_rcv+0x6a/0x230
      [ 1503.156164]  sctp_inq_push+0x117/0x150
      [ 1503.156255]  sctp_backlog_rcv+0xdf/0x4a0
      [ 1503.156346]  __release_sock+0x142/0x250
      [ 1503.156436]  release_sock+0x80/0x180
      [ 1503.156526]  sctp_sendmsg+0xbb0/0x1b10
      [ 1503.156617]  sock_sendmsg+0x68/0x80
      [ 1503.156708]  ___sys_sendmsg+0x431/0x4b0
      [ 1503.156799]  __sys_sendmsg+0xa4/0x130
      [ 1503.156889]  do_syscall_64+0x171/0x3f0
      [ 1503.156980]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      [ 1503.157158] The buggy address belongs to the object at ffff8800298ab600
                      which belongs to the cache kmalloc-1024 of size 1024
      [ 1503.157444] The buggy address is located 176 bytes inside of
                      1024-byte region [ffff8800298ab600, ffff8800298aba00)
      [ 1503.157702] The buggy address belongs to the page:
      [ 1503.157820] page:ffffea0000a62a00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
      [ 1503.158053] flags: 0x4000000000008100(slab|head)
      [ 1503.158171] raw: 4000000000008100 0000000000000000 0000000000000000 00000001800e000e
      [ 1503.158350] raw: dead000000000100 dead000000000200 ffff880036002600 0000000000000000
      [ 1503.158523] page dumped because: kasan: bad access detected
      
      [ 1503.158698] Memory state around the buggy address:
      [ 1503.158816]  ffff8800298ab900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [ 1503.158988]  ffff8800298ab980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [ 1503.159165] >ffff8800298aba00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1503.159338]                    ^
      [ 1503.159436]  ffff8800298aba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 1503.159610]  ffff8800298abb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 1503.159785] ==================================================================
      [ 1503.159964] Disabling lock debugging due to kernel taint
      
      The test scenario to trigger the issue consists of 4 devices:
      - H0: data sender, connected to LAN0
      - H1: data receiver, connected to LAN1
      - GW0 and GW1: routers between LAN0 and LAN1. Both of them have an
        ethernet connection on LAN0 and LAN1
      On H{0,1} set GW0 as default gateway while on GW0 set GW1 as next hop for
      data from LAN0 to LAN1.
      Moreover create an ip6ip6 tunnel between H0 and H1 and send 3 concurrent
      data streams (TCP/UDP/SCTP) from H0 to H1 through ip6ip6 tunnel (send
      buffer size is set to 16K). While data streams are active flush the route
      cache on HA multiple times.
      I have not been able to identify a given commit that introduced the issue
      since, using the reproducer described above, the kasan report has been
      triggered from 4.14 and I have not gone back further.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f62c15f
  3. 09 3月, 2018 8 次提交
  4. 08 3月, 2018 16 次提交
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · cfda06d7
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-03-08
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix various BPF helpers which adjust the skb and its GSO information
         with regards to SCTP GSO. The latter is a special case where gso_size
         is of value GSO_BY_FRAGS, so mangling that will end up corrupting
         the skb, thus bail out when seeing SCTP GSO packets, from Daniel(s).
      
      2) Fix a compilation error in bpftool where BPF_FS_MAGIC is not defined
         due to too old kernel headers in the system, from Jiri.
      
      3) Increase the number of x64 JIT passes in order to allow larger images
         to converge instead of punting them to interpreter or having them
         rejected when the interpreter is not built into the kernel, from Daniel.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfda06d7
    • D
      bpf, x64: increase number of passes · 6007b080
      Daniel Borkmann 提交于
      In Cilium some of the main programs we run today are hitting 9 passes
      on x64's JIT compiler, and we've had cases already where we surpassed
      the limit where the JIT then punts the program to the interpreter
      instead, leading to insertion failures due to CONFIG_BPF_JIT_ALWAYS_ON
      or insertion failures due to the prog array owner being JITed but the
      program to insert not (both must have the same JITed/non-JITed property).
      
      One concrete case the program image shrunk from 12,767 bytes down to
      10,288 bytes where the image converged after 16 steps. I've measured
      that this took 340us in the JIT until it converges on my i7-6600U. Thus,
      increase the original limit we had from day one where the JIT covered
      cBPF only back then before we run into the case (as similar with the
      complexity limit) where we trip over this and hit program rejections.
      Also add a cond_resched() into the compilation loop, the JIT process
      runs without any locks and may sleep anyway.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      6007b080
    • G
      cxgb4: do not set needs_free_netdev for mgmt dev's · b06ef18a
      Ganesh Goudar 提交于
      Do not set 'needs_free_netdev' as we do call free_netdev
      for mgmt net devices, doing both hits BUG_ON.
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b06ef18a
    • G
      cxgb4: copy adap index to PF0-3 adapter instances · 016764de
      Ganesh Goudar 提交于
      instantiation of VF's on different adapters fails, copy
      adapter index and chip type to PF0-3 adapter instances
      to fix the issue.
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      016764de
    • P
      net: don't unnecessarily load kernel modules in dev_ioctl() · b51f26b1
      Paul Moore 提交于
      Starting with v4.16-rc1 we've been seeing a higher than usual number
      of requests for the kernel to load networking modules, even on events
      which shouldn't trigger a module load (e.g. ioctl(TCGETS)).  Stephen
      Smalley suggested the problem may lie in commit 44c02a2c
      ("dev_ioctl(): move copyin/copyout to callers") which moves changes
      the network dev_ioctl() function to always call dev_load(),
      regardless of the requested ioctl.
      
      This patch moves the dev_load() calls back into the individual ioctls
      while preserving the rest of the original patch.
      Reported-by: NDominick Grift <dac.override@gmail.com>
      Suggested-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b51f26b1
    • S
      tcp: purge write queue upon aborting the connection · e05836ac
      Soheil Hassas Yeganeh 提交于
      When the connection is aborted, there is no point in
      keeping the packets on the write queue until the connection
      is closed.
      
      Similar to a27fd7a8 ('tcp: purge write queue upon RST'),
      this is essential for a correct MSG_ZEROCOPY implementation,
      because userspace cannot call close(fd) before receiving
      zerocopy signals even when the connection is aborted.
      
      Fixes: f214f915 ("tcp: enable MSG_ZEROCOPY")
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e05836ac
    • A
      dccp: check sk for closed state in dccp_sendmsg() · 67f93df7
      Alexey Kodanev 提交于
      dccp_disconnect() sets 'dp->dccps_hc_tx_ccid' tx handler to NULL,
      therefore if DCCP socket is disconnected and dccp_sendmsg() is
      called after it, it will cause a NULL pointer dereference in
      dccp_write_xmit().
      
      This crash and the reproducer was reported by syzbot. Looks like
      it is reproduced if commit 69c64866 ("dccp: CVE-2017-8824:
      use-after-free in DCCP code") is applied.
      
      Reported-by: syzbot+f99ab3887ab65d70f816@syzkaller.appspotmail.com
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67f93df7
    • E
      l2tp: do not accept arbitrary sockets · 17cfe79a
      Eric Dumazet 提交于
      syzkaller found an issue caused by lack of sufficient checks
      in l2tp_tunnel_create()
      
      RAW sockets can not be considered as UDP ones for instance.
      
      In another patch, we shall replace all pr_err() by less intrusive
      pr_debug() so that syzkaller can find other bugs faster.
      Acked-by: NGuillaume Nault <g.nault@alphalink.fr>
      Acked-by: NJames Chapman <jchapman@katalix.com>
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69
      dst_release: dst:00000000d53d0d0f refcnt:-1
      Write of size 1 at addr ffff8801d013b798 by task syz-executor3/6242
      
      CPU: 1 PID: 6242 Comm: syz-executor3 Not tainted 4.16.0-rc2+ #253
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x24d lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report+0x23b/0x360 mm/kasan/report.c:412
       __asan_report_store1_noabort+0x17/0x20 mm/kasan/report.c:435
       setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69
       l2tp_tunnel_create+0x1354/0x17f0 net/l2tp/l2tp_core.c:1596
       pppol2tp_connect+0x14b1/0x1dd0 net/l2tp/l2tp_ppp.c:707
       SYSC_connect+0x213/0x4a0 net/socket.c:1640
       SyS_connect+0x24/0x30 net/socket.c:1621
       do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
      Fixes: fd558d18 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17cfe79a
    • K
      net: Fix hlist corruptions in inet_evict_bucket() · a5600024
      Kirill Tkhai 提交于
      inet_evict_bucket() iterates global list, and
      several tasks may call it in parallel. All of
      them hash the same fq->list_evictor to different
      lists, which leads to list corruption.
      
      This patch makes fq be hashed to expired list
      only if this has not been made yet by another
      task. Since inet_frag_alloc() allocates fq
      using kmem_cache_zalloc(), we may rely on
      list_evictor is initially unhashed.
      
      The problem seems to exist before async
      pernet_operations, as there was possible to have
      exit method to be executed in parallel with
      inet_frags::frags_work, so I add two Fixes tags.
      This also may go to stable.
      
      Fixes: d1fe1944 "inet: frag: don't re-use chainlist for evictor"
      Fixes: f84c6821 "net: Convert pernet_subsys, registered from inet_init()"
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5600024
    • J
      net: smsc911x: Fix unload crash when link is up · e06513d7
      Jeremy Linton 提交于
      The smsc911x driver will crash if it is rmmod'ed while the netdev
      is up like:
      
      Call trace:
        phy_detach+0x94/0x150
        phy_disconnect+0x40/0x50
        smsc911x_stop+0x104/0x128 [smsc911x]
        __dev_close_many+0xb4/0x138
        dev_close_many+0xbc/0x190
        rollback_registered_many+0x140/0x460
        rollback_registered+0x68/0xb0
        unregister_netdevice_queue+0x100/0x118
        unregister_netdev+0x28/0x38
        smsc911x_drv_remove+0x58/0x130 [smsc911x]
        platform_drv_remove+0x30/0x50
        device_release_driver_internal+0x15c/0x1f8
        driver_detach+0x54/0x98
        bus_remove_driver+0x64/0xe8
        driver_unregister+0x34/0x60
        platform_driver_unregister+0x20/0x30
        smsc911x_cleanup_module+0x14/0xbca8 [smsc911x]
        SyS_delete_module+0x1e8/0x238
        __sys_trace_return+0x0/0x4
      
      This is caused by the mdiobus being unregistered/free'd
      and the code in phy_detach() attempting to manipulate mdio
      related structures from unregister_netdev() calling close()
      
      To fix this, we delay the mdiobus teardown until after
      the netdev is deregistered.
      Reported-by: NMatt Sealey <matt.sealey@arm.com>
      Signed-off-by: NJeremy Linton <jeremy.linton@arm.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e06513d7
    • S
      ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes · e9fa1495
      Stefano Brivio 提交于
      Currently, administrative MTU changes on a given netdevice are
      not reflected on route exceptions for MTU-less routes, with a
      set PMTU value, for that device:
      
       # ip -6 route get 2001:db8::b
       2001:db8::b from :: dev vti_a proto kernel src 2001:db8::a metric 256 pref medium
       # ping6 -c 1 -q -s10000 2001:db8::b > /dev/null
       # ip netns exec a ip -6 route get 2001:db8::b
       2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
           cache expires 571sec mtu 4926 pref medium
       # ip link set dev vti_a mtu 3000
       # ip -6 route get 2001:db8::b
       2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
           cache expires 571sec mtu 4926 pref medium
       # ip link set dev vti_a mtu 9000
       # ip -6 route get 2001:db8::b
       2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
           cache expires 571sec mtu 4926 pref medium
      
      The first issue is that since commit fb56be83 ("net-ipv6: on
      device mtu change do not add mtu to mtu-less routes") we don't
      call rt6_exceptions_update_pmtu() from rt6_mtu_change_route(),
      which handles administrative MTU changes, if the regular route
      is MTU-less.
      
      However, PMTU exceptions should be always updated, as long as
      RTAX_MTU is not locked. Keep the check for MTU-less main route,
      as introduced by that commit, but, for exceptions,
      call rt6_exceptions_update_pmtu() regardless of that check.
      
      Once that is fixed, one problem remains: MTU changes are not
      reflected if the new MTU is higher than the previous one,
      because rt6_exceptions_update_pmtu() doesn't allow that. We
      should instead allow PMTU increase if the old PMTU matches the
      local MTU, as that implies that the old MTU was the lowest in the
      path, and PMTU discovery might lead to different results.
      
      The existing check in rt6_mtu_change_route() correctly took that
      case into account (for regular routes only), so factor it out
      and re-use it also in rt6_exceptions_update_pmtu().
      
      While at it, fix comments style and grammar, and try to be a bit
      more descriptive.
      Reported-by: NXiumei Mu <xmu@redhat.com>
      Fixes: fb56be83 ("net-ipv6: on device mtu change do not add mtu to mtu-less routes")
      Fixes: f5bbe7ee ("ipv6: prepare rt6_mtu_change() for exception table")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9fa1495
    • H
      net: qcom/emac: Use proper free methods during TX · cc5db315
      Hemanth Puranik 提交于
      This patch fixes the warning messages/call traces seen if DMA debug is
      enabled, In case of fragmented skb's memory was allocated using
      dma_map_page but freed using dma_unmap_single. This patch modifies buffer
      allocations in TX path to use dma_map_page in all the places and
      dma_unmap_page while freeing the buffers.
      Signed-off-by: NHemanth Puranik <hpuranik@codeaurora.org>
      Acked-by: NTimur Tabi <timur@codeaurora.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc5db315
    • M
      qed: Free RoCE ILT Memory on rmmod qedr · 9de506a5
      Michal Kalderon 提交于
      Rdma requires ILT Memory to be allocated for it's QPs.
      Each ILT entry points to a page used by several Rdma QPs.
      To avoid allocating all the memory in advance, the rdma
      implementation dynamically allocates memory as more QPs are
      added, however it does not dynamically free the memory.
      The memory should have been freed on rmmod qedr, but isn't.
      This patch adds the memory freeing on rmmod qedr (currently
      it will be freed with qed is removed).
      
      An outcome of this bug, is that if qedr is unloaded and loaded
      without unloaded qed, there will be no more RoCE traffic.
      
      The reason these are related, is that the logic of detecting the
      first QP ever opened is by asking whether ILT memory for RoCE has
      been allocated.
      
      In addition, this patch modifies freeing of the Task context to
      always use the PROTOCOLID_ROCE and not the protocol passed,
      this is because task context for iWARP and ROCE both use the
      ROCE protocol id, as opposed to the connection context.
      
      Fixes: dbb799c3 ("qed: Initialize hardware for new protocols")
      Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
      Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9de506a5
    • D
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 87772fe6
      David S. Miller 提交于
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2018-03-05
      
      This series contains fixes to e1000e only.
      
      Benjamin Poirier provides all but one fix in this series, starting with
      workaround for a VMWare e1000e emulation issue where ICR reads 0x0 on
      the emulated device.  Partially reverted a previous commit dealing with
      the "Other" interrupt throttling to avoid unforeseen fallout from these
      changes that are not strictly necessary.  Restored the ICS write for
      receive and transmit queue interrupts in the case that txq or rxq bits
      were set in ICR and the Other interrupt handler read and cleared ICR
      before the queue interrupt was raised.  Fixed an bug where interrupts
      may be missed if ICR is read while INT_ASSERTED is not set, so avoid the
      problem by setting all bits related to events that can trigger the Other
      interrupt in IMS.  Fixed the return value for check_for_link() when
      auto-negotiation is off.
      
      Pierre-Yves Kerbrat fixes e1000e to use dma_zalloc_coherent() to make
      sure the ring is memset to 0 to prevent the area from containing
      garbage.
      
      v2: added an additional e1000e fix to the series
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87772fe6
    • E
      net: usbnet: fix potential deadlock on 32bit hosts · 2695578b
      Eric Dumazet 提交于
      Marek reported a LOCKDEP issue occurring on 32bit host,
      that we tracked down to the fact that usbnet could either
      run from soft or hard irqs.
      
      This patch adds u64_stats_update_begin_irqsave() and
      u64_stats_update_end_irqrestore() helpers to solve this case.
      
      [   17.768040] ================================
      [   17.772239] WARNING: inconsistent lock state
      [   17.776511] 4.16.0-rc3-next-20180227-00007-g876c53a7493c #453 Not tainted
      [   17.783329] --------------------------------
      [   17.787580] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
      [   17.793607] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [   17.798751]  (&syncp->seq#5){?.-.}, at: [<9b22e5f0>]
      asix_rx_fixup_internal+0x188/0x288
      [   17.806790] {IN-HARDIRQ-W} state was registered at:
      [   17.811677]   tx_complete+0x100/0x208
      [   17.815319]   __usb_hcd_giveback_urb+0x60/0xf0
      [   17.819770]   xhci_giveback_urb_in_irq+0xa8/0x240
      [   17.824469]   xhci_td_cleanup+0xf4/0x16c
      [   17.828367]   xhci_irq+0xe74/0x2240
      [   17.831827]   usb_hcd_irq+0x24/0x38
      [   17.835343]   __handle_irq_event_percpu+0x98/0x510
      [   17.840111]   handle_irq_event_percpu+0x1c/0x58
      [   17.844623]   handle_irq_event+0x38/0x5c
      [   17.848519]   handle_fasteoi_irq+0xa4/0x138
      [   17.852681]   generic_handle_irq+0x18/0x28
      [   17.856760]   __handle_domain_irq+0x6c/0xe4
      [   17.860941]   gic_handle_irq+0x54/0xa0
      [   17.864666]   __irq_svc+0x70/0xb0
      [   17.867964]   arch_cpu_idle+0x20/0x3c
      [   17.871578]   arch_cpu_idle+0x20/0x3c
      [   17.875190]   do_idle+0x144/0x218
      [   17.878468]   cpu_startup_entry+0x18/0x1c
      [   17.882454]   start_kernel+0x394/0x400
      [   17.886177] irq event stamp: 161912
      [   17.889616] hardirqs last  enabled at (161912): [<7bedfacf>]
      __netdev_alloc_skb+0xcc/0x140
      [   17.897893] hardirqs last disabled at (161911): [<d58261d0>]
      __netdev_alloc_skb+0x94/0x140
      [   17.904903] exynos5-hsi2c 12ca0000.i2c: tx timeout
      [   17.906116] softirqs last  enabled at (161904): [<387102ff>]
      irq_enter+0x78/0x80
      [   17.906123] softirqs last disabled at (161905): [<cf4c628e>]
      irq_exit+0x134/0x158
      [   17.925722].
      [   17.925722] other info that might help us debug this:
      [   17.933435]  Possible unsafe locking scenario:
      [   17.933435].
      [   17.940331]        CPU0
      [   17.942488]        ----
      [   17.944894]   lock(&syncp->seq#5);
      [   17.948274]   <Interrupt>
      [   17.950847]     lock(&syncp->seq#5);
      [   17.954386].
      [   17.954386]  *** DEADLOCK ***
      [   17.954386].
      [   17.962422] no locks held by swapper/0/0.
      
      Fixes: c8b5d129 ("net: usbnet: support 64bit stats")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NMarek Szyprowski <m.szyprowski@samsung.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2695578b
    • A
      sch_netem: fix skb leak in netem_enqueue() · 35d889d1
      Alexey Kodanev 提交于
      When we exceed current packets limit and we have more than one
      segment in the list returned by skb_gso_segment(), netem drops
      only the first one, skipping the rest, hence kmemleak reports:
      
      unreferenced object 0xffff880b5d23b600 (size 1024):
        comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
        hex dump (first 32 bytes):
          00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00  ..#]............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000d8a19b9d>] __alloc_skb+0xc9/0x520
          [<000000001709b32f>] skb_segment+0x8c8/0x3710
          [<00000000c7b9bb88>] tcp_gso_segment+0x331/0x1830
          [<00000000c921cba1>] inet_gso_segment+0x476/0x1370
          [<000000008b762dd4>] skb_mac_gso_segment+0x1f9/0x510
          [<000000002182660a>] __skb_gso_segment+0x1dd/0x620
          [<00000000412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem]
          [<0000000005d3b2a9>] __dev_queue_xmit+0x1167/0x2120
          [<00000000fc5f7327>] ip_finish_output2+0x998/0xf00
          [<00000000d309e9d3>] ip_output+0x1aa/0x2c0
          [<000000007ecbd3a4>] tcp_transmit_skb+0x18db/0x3670
          [<0000000042d2a45f>] tcp_write_xmit+0x4d4/0x58c0
          [<0000000056a44199>] tcp_tasklet_func+0x3d9/0x540
          [<0000000013d06d02>] tasklet_action+0x1ca/0x250
          [<00000000fcde0b8b>] __do_softirq+0x1b4/0x5a3
          [<00000000e7ed027c>] irq_exit+0x1e2/0x210
      
      Fix it by adding the rest of the segments, if any, to skb 'to_free'
      list. Add new __qdisc_drop_all() and qdisc_drop_all() functions
      because they can be useful in the future if we need to drop segmented
      GSO packets in other places.
      
      Fixes: 6071bd1a ("netem: Segment GSO packets on enqueue")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35d889d1