1. 09 5月, 2017 1 次提交
    • M
      treewide: use kv[mz]alloc* rather than opencoded variants · 752ade68
      Michal Hocko 提交于
      There are many code paths opencoding kvmalloc.  Let's use the helper
      instead.  The main difference to kvmalloc is that those users are
      usually not considering all the aspects of the memory allocator.  E.g.
      allocation requests <= 32kB (with 4kB pages) are basically never failing
      and invoke OOM killer to satisfy the allocation.  This sounds too
      disruptive for something that has a reasonable fallback - the vmalloc.
      On the other hand those requests might fallback to vmalloc even when the
      memory allocator would succeed after several more reclaim/compaction
      attempts previously.  There is no guarantee something like that happens
      though.
      
      This patch converts many of those places to kv[mz]alloc* helpers because
      they are more conservative.
      
      Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
      Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
      Acked-by: David Sterba <dsterba@suse.com> # btrfs
      Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
      Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Santosh Raspatur <santosh@chelsio.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Yishai Hadas <yishaih@mellanox.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: "Yan, Zheng" <zyan@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      752ade68
  2. 04 5月, 2017 2 次提交
    • A
      ipv4, ipv6: ensure raw socket message is big enough to hold an IP header · 86f4c90a
      Alexander Potapenko 提交于
      raw_send_hdrinc() and rawv6_send_hdrinc() expect that the buffer copied
      from the userspace contains the IPv4/IPv6 header, so if too few bytes are
      copied, parts of the header may remain uninitialized.
      
      This bug has been detected with KMSAN.
      
      For the record, the KMSAN report:
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory in nf_ct_frag6_gather+0xf5a/0x44a0
      inter: 0
      CPU: 0 PID: 1036 Comm: probe Not tainted 4.11.0-rc5+ #2455
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x143/0x1b0 lib/dump_stack.c:52
       kmsan_report+0x16b/0x1e0 mm/kmsan/kmsan.c:1078
       __kmsan_warning_32+0x5c/0xa0 mm/kmsan/kmsan_instr.c:510
       nf_ct_frag6_gather+0xf5a/0x44a0 net/ipv6/netfilter/nf_conntrack_reasm.c:577
       ipv6_defrag+0x1d9/0x280 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn ./include/linux/netfilter.h:102
       nf_hook_slow+0x13f/0x3c0 net/netfilter/core.c:310
       nf_hook ./include/linux/netfilter.h:212
       NF_HOOK ./include/linux/netfilter.h:255
       rawv6_send_hdrinc net/ipv6/raw.c:673
       rawv6_sendmsg+0x2fcb/0x41a0 net/ipv6/raw.c:919
       inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696
       SyS_sendto+0xbc/0xe0 net/socket.c:1664
       do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
      RIP: 0033:0x436e03
      RSP: 002b:00007ffce48baf38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000436e03
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
      RBP: 00007ffce48baf90 R08: 00007ffce48baf50 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000401790 R14: 0000000000401820 R15: 0000000000000000
      origin: 00000000d9400053
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:362
       kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:257
       kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:270
       slab_alloc_node mm/slub.c:2735
       __kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4341
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:933
       alloc_skb_with_frags+0x209/0xbc0 net/core/skbuff.c:4678
       sock_alloc_send_pskb+0x9ff/0xe00 net/core/sock.c:1903
       sock_alloc_send_skb+0xe4/0x100 net/core/sock.c:1920
       rawv6_send_hdrinc net/ipv6/raw.c:638
       rawv6_sendmsg+0x2918/0x41a0 net/ipv6/raw.c:919
       inet_sendmsg+0x3f8/0x6d0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       SYSC_sendto+0x6a5/0x7c0 net/socket.c:1696
       SyS_sendto+0xbc/0xe0 net/socket.c:1664
       do_syscall_64+0x72/0xa0 arch/x86/entry/common.c:285
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
      ==================================================================
      
      , triggered by the following syscalls:
        socket(PF_INET6, SOCK_RAW, IPPROTO_RAW) = 3
        sendto(3, NULL, 0, 0, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "ff00::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EPERM
      
      A similar report is triggered in net/ipv4/raw.c if we use a PF_INET socket
      instead of a PF_INET6 one.
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86f4c90a
    • E
      tcp: do not inherit fastopen_req from parent · 8b485ce6
      Eric Dumazet 提交于
      Under fuzzer stress, it is possible that a child gets a non NULL
      fastopen_req pointer from its parent at accept() time, when/if parent
      morphs from listener to active session.
      
      We need to make sure this can not happen, by clearing the field after
      socket cloning.
      
      BUG: Double free or freeing an invalid pointer
      Unexpected shadow byte: 0xFB
      CPU: 3 PID: 20933 Comm: syz-executor3 Not tainted 4.11.0+ #306
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x292/0x395 lib/dump_stack.c:52
       kasan_object_err+0x1c/0x70 mm/kasan/report.c:164
       kasan_report_double_free+0x5c/0x70 mm/kasan/report.c:185
       kasan_slab_free+0x9d/0xc0 mm/kasan/kasan.c:580
       slab_free_hook mm/slub.c:1357 [inline]
       slab_free_freelist_hook mm/slub.c:1379 [inline]
       slab_free mm/slub.c:2961 [inline]
       kfree+0xe8/0x2b0 mm/slub.c:3882
       tcp_free_fastopen_req net/ipv4/tcp.c:1077 [inline]
       tcp_disconnect+0xc15/0x13e0 net/ipv4/tcp.c:2328
       inet_child_forget+0xb8/0x600 net/ipv4/inet_connection_sock.c:898
       inet_csk_reqsk_queue_add+0x1e7/0x250
      net/ipv4/inet_connection_sock.c:928
       tcp_get_cookie_sock+0x21a/0x510 net/ipv4/syncookies.c:217
       cookie_v4_check+0x1a19/0x28b0 net/ipv4/syncookies.c:384
       tcp_v4_cookie_check net/ipv4/tcp_ipv4.c:1384 [inline]
       tcp_v4_do_rcv+0x731/0x940 net/ipv4/tcp_ipv4.c:1421
       tcp_v4_rcv+0x2dc0/0x31c0 net/ipv4/tcp_ipv4.c:1715
       ip_local_deliver_finish+0x4cc/0xc20 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip_local_deliver+0x1ce/0x700 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:492 [inline]
       ip_rcv_finish+0xb1d/0x20b0 net/ipv4/ip_input.c:396
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip_rcv+0xd8c/0x19c0 net/ipv4/ip_input.c:487
       __netif_receive_skb_core+0x1ad1/0x3400 net/core/dev.c:4210
       __netif_receive_skb+0x2a/0x1a0 net/core/dev.c:4248
       process_backlog+0xe5/0x6c0 net/core/dev.c:4868
       napi_poll net/core/dev.c:5270 [inline]
       net_rx_action+0xe70/0x18e0 net/core/dev.c:5335
       __do_softirq+0x2fb/0xb99 kernel/softirq.c:284
       do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:899
       </IRQ>
       do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
       do_softirq kernel/softirq.c:176 [inline]
       __local_bh_enable_ip+0x1cf/0x1e0 kernel/softirq.c:181
       local_bh_enable include/linux/bottom_half.h:31 [inline]
       rcu_read_unlock_bh include/linux/rcupdate.h:931 [inline]
       ip_finish_output2+0x9ab/0x15e0 net/ipv4/ip_output.c:230
       ip_finish_output+0xa35/0xdf0 net/ipv4/ip_output.c:316
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip_output+0x1f6/0x7b0 net/ipv4/ip_output.c:404
       dst_output include/net/dst.h:486 [inline]
       ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
       ip_queue_xmit+0x9a8/0x1a10 net/ipv4/ip_output.c:503
       tcp_transmit_skb+0x1ade/0x3470 net/ipv4/tcp_output.c:1057
       tcp_write_xmit+0x79e/0x55b0 net/ipv4/tcp_output.c:2265
       __tcp_push_pending_frames+0xfa/0x3a0 net/ipv4/tcp_output.c:2450
       tcp_push+0x4ee/0x780 net/ipv4/tcp.c:683
       tcp_sendmsg+0x128d/0x39b0 net/ipv4/tcp.c:1342
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       SYSC_sendto+0x660/0x810 net/socket.c:1696
       SyS_sendto+0x40/0x50 net/socket.c:1664
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x446059
      RSP: 002b:00007faa6761fb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 0000000000446059
      RDX: 0000000000000001 RSI: 0000000020ba3fcd RDI: 0000000000000017
      RBP: 00000000006e40a0 R08: 0000000020ba4ff0 R09: 0000000000000010
      R10: 0000000020000000 R11: 0000000000000282 R12: 0000000000708150
      R13: 0000000000000000 R14: 00007faa676209c0 R15: 00007faa67620700
      Object at ffff88003b5bbcb8, in cache kmalloc-64 size: 64
      Allocated:
      PID = 20909
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:513
       set_track mm/kasan/kasan.c:525 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:616
       kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2745
       kmalloc include/linux/slab.h:490 [inline]
       kzalloc include/linux/slab.h:663 [inline]
       tcp_sendmsg_fastopen net/ipv4/tcp.c:1094 [inline]
       tcp_sendmsg+0x221a/0x39b0 net/ipv4/tcp.c:1139
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       SYSC_sendto+0x660/0x810 net/socket.c:1696
       SyS_sendto+0x40/0x50 net/socket.c:1664
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      Freed:
      PID = 20909
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:513
       set_track mm/kasan/kasan.c:525 [inline]
       kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:589
       slab_free_hook mm/slub.c:1357 [inline]
       slab_free_freelist_hook mm/slub.c:1379 [inline]
       slab_free mm/slub.c:2961 [inline]
       kfree+0xe8/0x2b0 mm/slub.c:3882
       tcp_free_fastopen_req net/ipv4/tcp.c:1077 [inline]
       tcp_disconnect+0xc15/0x13e0 net/ipv4/tcp.c:2328
       __inet_stream_connect+0x20c/0xf90 net/ipv4/af_inet.c:593
       tcp_sendmsg_fastopen net/ipv4/tcp.c:1111 [inline]
       tcp_sendmsg+0x23a8/0x39b0 net/ipv4/tcp.c:1139
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       SYSC_sendto+0x660/0x810 net/socket.c:1696
       SyS_sendto+0x40/0x50 net/socket.c:1664
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Fixes: 7db92362 ("tcp: fix potential double free issue for fastopen_req")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b485ce6
  3. 03 5月, 2017 1 次提交
  4. 02 5月, 2017 1 次提交
    • I
      net/esp4: Fix invalid esph pointer crash · 67d349ed
      Ilan Tayari 提交于
      Both esp_output and esp_xmit take a pointer to the ESP header
      and place it in esp_info struct prior to calling esp_output_head.
      
      Inside esp_output_head, the call to esp_output_udp_encap
      makes sure to update the pointer if it gets invalid.
      However, if esp_output_head itself calls skb_cow_data, the
      pointer is not updated and stays invalid, causing a crash
      after esp_output_head returns.
      
      Update the pointer if it becomes invalid in esp_output_head
      
      Fixes: fca11ebd ("esp4: Reorganize esp_output")
      Signed-off-by: NIlan Tayari <ilant@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67d349ed
  5. 01 5月, 2017 3 次提交
  6. 29 4月, 2017 2 次提交
  7. 28 4月, 2017 1 次提交
  8. 27 4月, 2017 12 次提交
  9. 26 4月, 2017 5 次提交
  10. 25 4月, 2017 6 次提交
    • W
      net/tcp_fastopen: Remove mss check in tcp_write_timeout() · 59450f8d
      Wei Wang 提交于
      Christoph Paasch from Apple found another firewall issue for TFO:
      After successful 3WHS using TFO, server and client starts to exchange
      data. Afterwards, a 10s idle time occurs on this connection. After that,
      firewall starts to drop every packet on this connection.
      
      The fix for this issue is to extend existing firewall blackhole detection
      logic in tcp_write_timeout() by removing the mss check.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59450f8d
    • W
      net/tcp_fastopen: Add snmp counter for blackhole detection · 46c2fa39
      Wei Wang 提交于
      This counter records the number of times the firewall blackhole issue is
      detected and active TFO is disabled.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46c2fa39
    • W
      net/tcp_fastopen: Disable active side TFO in certain scenarios · cf1ef3f0
      Wei Wang 提交于
      Middlebox firewall issues can potentially cause server's data being
      blackholed after a successful 3WHS using TFO. Following are the related
      reports from Apple:
      https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
      Slide 31 identifies an issue where the client ACK to the server's data
      sent during a TFO'd handshake is dropped.
      C ---> syn-data ---> S
      C <--- syn/ack ----- S
      C (accept & write)
      C <---- data ------- S
      C ----- ACK -> X     S
      		[retry and timeout]
      
      https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
      Slide 5 shows a similar situation that the server's data gets dropped
      after 3WHS.
      C ---- syn-data ---> S
      C <--- syn/ack ----- S
      C ---- ack --------> S
      S (accept & write)
      C?  X <- data ------ S
      		[retry and timeout]
      
      This is the worst failure b/c the client can not detect such behavior to
      mitigate the situation (such as disabling TFO). Failing to proceed, the
      application (e.g., SSL library) may simply timeout and retry with TFO
      again, and the process repeats indefinitely.
      
      The proposed solution is to disable active TFO globally under the
      following circumstances:
      1. client side TFO socket detects out of order FIN
      2. client side TFO socket receives out of order RST
      
      We disable active side TFO globally for 1hr at first. Then if it
      happens again, we disable it for 2h, then 4h, 8h, ...
      And we reset the timeout to 1hr if a client side TFO sockets not opened
      on loopback has successfully received data segs from server.
      And we examine this condition during close().
      
      The rational behind it is that when such firewall issue happens,
      application running on the client should eventually close the socket as
      it is not able to get the data it is expecting. Or application running
      on the server should close the socket as it is not able to receive any
      response from client.
      In both cases, out of order FIN or RST will get received on the client
      given that the firewall will not block them as no data are in those
      frames.
      And we want to disable active TFO globally as it helps if the middle box
      is very close to the client and most of the connections are likely to
      fail.
      
      Also, add a debug sysctl:
        tcp_fastopen_blackhole_detect_timeout_sec:
          the initial timeout to use when firewall blackhole issue happens.
          This can be set and read.
          When setting it to 0, it means to disable the active disable logic.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf1ef3f0
    • D
      net: add rcu locking when changing early demux · 58c4c6a3
      David Ahern 提交于
      systemd-sysctl is triggering a suspicious RCU usage message when
      net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via
      a sysctl config file:
      
      [   33.896184] ===============================
      [   33.899558] [ ERR: suspicious RCU usage.  ]
      [   33.900624] 4.11.0-rc7+ #104 Not tainted
      [   33.901698] -------------------------------
      [   33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious rcu_dereference_check() usage!
      [   33.905724]
      other info that might help us debug this:
      
      [   33.907656]
      rcu_scheduler_active = 2, debug_locks = 0
      [   33.909288] 1 lock held by systemd-sysctl/143:
      [   33.910373]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff8123a370>] file_start_write+0x45/0x48
      [   33.912407]
      stack backtrace:
      [   33.914018] CPU: 0 PID: 143 Comm: systemd-sysctl Not tainted 4.11.0-rc7+ #104
      [   33.915631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [   33.917870] Call Trace:
      [   33.918431]  dump_stack+0x81/0xb6
      [   33.919241]  lockdep_rcu_suspicious+0x10f/0x118
      [   33.920263]  proc_configure_early_demux+0x65/0x10a
      [   33.921391]  proc_udp_early_demux+0x3a/0x41
      
      add rcu locking to proc_configure_early_demux.
      
      Fixes: dddb64bc ("net: Add sysctl to toggle early demux for tcp and udp")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58c4c6a3
    • A
      udp: disable inner UDP checksum offloads in IPsec case · b40c5f4f
      Ansis Atteka 提交于
      Otherwise, UDP checksum offloads could corrupt ESP packets by attempting
      to calculate UDP checksum when this inner UDP packet is already protected
      by IPsec.
      
      One way to reproduce this bug is to have a VM with virtio_net driver (UFO
      set to ON in the guest VM); and then encapsulate all guest's Ethernet
      frames in Geneve; and then further encrypt Geneve with IPsec.  In this
      case following symptoms are observed:
      1. If using ixgbe NIC, then it will complain with following error message:
         ixgbe 0000:01:00.1: partial checksum but l4 proto=32!
      2. Receiving IPsec stack will drop all the corrupted ESP packets and
         increase XfrmInStateProtoError counter in /proc/net/xfrm_stat.
      3. iperf UDP test from the VM with packet sizes above MTU will not work at
         all.
      4. iperf TCP test from the VM will get ridiculously low performance because.
      Signed-off-by: NAnsis Atteka <aatteka@ovn.org>
      Co-authored-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b40c5f4f
    • R
      ipv4: Avoid caching l3mdev dst on mismatched local route · b7c8487c
      Robert Shearman 提交于
      David reported that doing the following:
      
          ip li add red type vrf table 10
          ip link set dev eth1 vrf red
          ip addr add 127.0.0.1/8 dev red
          ip link set dev eth1 up
          ip li set red up
          ping -c1 -w1 -I red 127.0.0.1
          ip li del red
      
      when either policy routing IP rules are present or the local table
      lookup ip rule is before the l3mdev lookup results in a hang with
      these messages:
      
          unregister_netdevice: waiting for red to become free. Usage count = 1
      
      The problem is caused by caching the dst used for sending the packet
      out of the specified interface on a local route with a different
      nexthop interface. Thus the dst could stay around until the route in
      the table the lookup was done is deleted which may be never.
      
      Address the problem by not forcing output device to be the l3mdev in
      the flow's output interface if the lookup didn't use the l3mdev. This
      then results in the dst using the right device according to the route.
      
      Changes in v2:
       - make the dev_out passed in by __ip_route_output_key_hash correct
         instead of checking the nh dev if FLOWI_FLAG_SKIP_NH_OIF is set as
         suggested by David.
      
      Fixes: 5f02ce24 ("net: l3mdev: Allow the l3mdev to be a loopback")
      Reported-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Suggested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b7c8487c
  11. 24 4月, 2017 1 次提交
  12. 22 4月, 2017 1 次提交
    • C
      ip_tunnel: Allow policy-based routing through tunnels · 9830ad4c
      Craig Gallek 提交于
      This feature allows the administrator to set an fwmark for
      packets traversing a tunnel.  This allows the use of independent
      routing tables for tunneled packets without the use of iptables.
      
      There is no concept of per-packet routing decisions through IPv4
      tunnels, so this implementation does not need to work with
      per-packet route lookups as the v6 implementation may
      (with IP6_TNL_F_USE_ORIG_FWMARK).
      
      Further, since the v4 tunnel ioctls share datastructures
      (which can not be trivially modified) with the kernel's internal
      tunnel configuration structures, the mark attribute must be stored
      in the tunnel structure itself and passed as a parameter when
      creating or changing tunnel attributes.
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9830ad4c
  13. 21 4月, 2017 3 次提交
  14. 19 4月, 2017 1 次提交
    • I
      esp4/6: Fix GSO path for non-GSO SW-crypto packets · 8f92e03e
      Ilan Tayari 提交于
      If esp*_offload module is loaded, outbound packets take the
      GSO code path, being encapsulated at layer 3, but encrypted
      in layer 2. validate_xmit_xfrm calls esp*_xmit for that.
      
      esp*_xmit was wrongfully detecting these packets as going
      through hardware crypto offload, while in fact they should
      be encrypted in software, causing plaintext leakage to
      the network, and also dropping at the receiver side.
      
      Perform the encryption in esp*_xmit, if the SA doesn't have
      a hardware offload_handle.
      
      Also, align esp6 code to esp4 logic.
      
      Fixes: fca11ebd ("esp4: Reorganize esp_output")
      Fixes: 383d0350 ("esp6: Reorganize esp_output")
      Signed-off-by: NIlan Tayari <ilant@mellanox.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      8f92e03e
新手
引导
客服 返回
顶部