1. 11 May, 2018 7 commits
  2. 08 May, 2018 2 commits
  3. 07 May, 2018 2 commits
  4. 05 May, 2018 1 commit
  5. 03 May, 2018 2 commits
  6. 02 May, 2018 3 commits
    • ipv6: Allow non-gateway ECMP for IPv6 · edd7ceb7
      Thomas Winter committed
      It is valid to have static routes where the nexthop
      is an interface rather than an address, as with tunnels.
      For IPv4 it was possible to use ECMP on these routes,
      but not for IPv6.
      Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: disable gso with no_check_tx · a8c744a8
      Willem de Bruijn committed
      Syzbot managed to send a udp gso packet without checksum offload into
      the gso stack by disabling tx checksum (UDP_NO_CHECK6_TX). This
      triggered the skb_warn_bad_offload.
      
        RIP: 0010:skb_warn_bad_offload+0x2bc/0x600 net/core/dev.c:2658
         skb_gso_segment include/linux/netdevice.h:4038 [inline]
         validate_xmit_skb+0x54d/0xd90 net/core/dev.c:3120
         __dev_queue_xmit+0xbf8/0x34c0 net/core/dev.c:3577
         dev_queue_xmit+0x17/0x20 net/core/dev.c:3618
      
      UDP_NO_CHECK6_TX sets skb->ip_summed to CHECKSUM_NONE just after the
      udp gso integrity checks in udp_(v6_)send_skb. Extend those checks to
      catch and fail in this case.
      
      After the integrity checks jump directly to the CHECKSUM_PARTIAL case
      to avoid reading the no_check_tx flags again (a TOCTTOU race).
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fix uninit-value in ip6_multipath_l3_keys() · cea67a2d
      Eric Dumazet committed
      syzbot/KMSAN reported an uninit-value in ip6_multipath_l3_keys(),
      root-caused to a bad assumption that the ICMP header had already
      been pulled into skb->head.
      
      ip_multipath_l3_keys() does the correct thing, so it is an IPv6 only bug.
      
      BUG: KMSAN: uninit-value in ip6_multipath_l3_keys net/ipv6/route.c:1830 [inline]
      BUG: KMSAN: uninit-value in rt6_multipath_hash+0x5c4/0x640 net/ipv6/route.c:1858
      CPU: 0 PID: 4507 Comm: syz-executor661 Not tainted 4.16.0+ #87
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
       ip6_multipath_l3_keys net/ipv6/route.c:1830 [inline]
       rt6_multipath_hash+0x5c4/0x640 net/ipv6/route.c:1858
       ip6_route_input+0x65a/0x920 net/ipv6/route.c:1884
       ip6_rcv_finish+0x413/0x6e0 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:288 [inline]
       ipv6_rcv+0x1e16/0x2340 net/ipv6/ip6_input.c:208
       __netif_receive_skb_core+0x47df/0x4a90 net/core/dev.c:4562
       __netif_receive_skb net/core/dev.c:4627 [inline]
       netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
       netif_receive_skb+0x230/0x240 net/core/dev.c:4725
       tun_rx_batched drivers/net/tun.c:1555 [inline]
       tun_get_user+0x740f/0x7c60 drivers/net/tun.c:1962
       tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
       call_write_iter include/linux/fs.h:1782 [inline]
       new_sync_write fs/read_write.c:469 [inline]
       __vfs_write+0x7fb/0x9f0 fs/read_write.c:482
       vfs_write+0x463/0x8d0 fs/read_write.c:544
       SYSC_write+0x172/0x360 fs/read_write.c:589
       SyS_write+0x55/0x80 fs/read_write.c:581
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 23aebdac ("ipv6: Compute multipath hash for ICMP errors from offending packet")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Jakub Sitnicki <jkbs@redhat.com>
      Acked-by: Jakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
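The commit message above does not show the fix itself. A common kernel idiom for reading a header that may not sit in the linear area is `skb_header_pointer()`, which either returns a direct pointer or copies the bytes into a caller-supplied buffer. The following is only a userspace analogy of that idea: `struct frag_buf` and `hdr_pointer()` are hypothetical names invented for this sketch, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-part buffer: "head" is directly addressable,
 * "frag" stands in for data that has not been pulled into head. */
struct frag_buf {
    const uint8_t *head;
    size_t head_len;
    const uint8_t *frag;
    size_t frag_len;
};

/* Return a pointer to len bytes at offset off.
 * If the range lies fully in head, point straight into it.
 * If it spills into frag, copy it into tmp and return tmp.
 * If the buffer is too short, return NULL instead of reading garbage. */
static const void *hdr_pointer(const struct frag_buf *b, size_t off,
                               size_t len, void *tmp)
{
    assert(b != NULL && tmp != NULL);
    size_t total = b->head_len + b->frag_len;

    if (off > total || len > total - off)
        return NULL;
    if (off + len <= b->head_len)
        return b->head + off;

    uint8_t *dst = tmp;
    size_t from_head = off < b->head_len ? b->head_len - off : 0;
    size_t frag_off = off > b->head_len ? off - b->head_len : 0;

    if (from_head)
        memcpy(dst, b->head + off, from_head);
    memcpy(dst + from_head, b->frag + frag_off, len - from_head);
    return tmp;
}
```

The point of the sketch is that the caller never assumes where the bytes live: it always goes through the accessor and checks for NULL before dereferencing.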
  7. 01 May, 2018 2 commits
    • change the comment of vti6_ioctl · 154a8c46
      Sun Lianwen committed
      The comment of vti6_ioctl() is wrong: it uses vti6_tnl_ioctl
      instead of vti6_ioctl.
      Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • ipv6: sr: extract the right key values for "seg6_make_flowlabel" · 6df93462
      Ahmed Abdelsalam committed
      The seg6_make_flowlabel() is used by seg6_do_srh_encap() to compute the
      flowlabel from a given skb. It relies on skb_get_hash() which eventually
      calls __skb_flow_dissect() to extract the flow_keys struct values from
      the skb.
      
      In case of IPv4 traffic, calling seg6_make_flowlabel() after skb_push(),
      skb_reset_network_header(), and skb_mac_header_rebuild() results in a
      flow_keys struct with all key values set to zero.
      
      This patch calls seg6_make_flowlabel() before resetting the headers of skb
      to get the right key values.
      
      Extracted key values depend on the type of the inner packet, as follows:
      1) IPv6 traffic: src_IP, dst_IP, L4 proto, and flowlabel of inner packet.
      2) IPv4 traffic: src_IP, dst_IP, L4 proto, src_port, and dst_port.
      3) L2 traffic: depends on the kind of traffic carried in the L2
      frame. IPv6 and IPv4 traffic works as described in 1) and 2).
      
      Here is a hex dump of struct flow_keys for IPv4 and IPv6 traffic:
      10.100.1.100: 47302 > 30.0.0.2: 5001
      00000000: 14 00 02 00 00 00 00 00 08 00 11 00 00 00 00 00
      00000010: 00 00 00 00 00 00 00 00 13 89 b8 c6 1e 00 00 02
      00000020: 0a 64 01 64
      
      fc00:a1:a > b2::2
      00000000: 28 00 03 00 00 00 00 00 86 dd 11 00 99 f9 02 00
      00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 b2 00 00
      00000020: 00 00 00 00 00 00 00 00 00 00 00 02 fc 00 00 a1
      00000030: 00 00 00 00 00 00 00 00 00 00 00 0a
      Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 30 April, 2018 2 commits
    • erspan: auto detect truncated packets. · 1baf5ebf
      William Tu committed
      Currently the truncated bit is set only when the mirrored packet
      is larger than the MTU. In some cases, the packet might already have
      been truncated before being sent to the erspan tunnel. For that case,
      this patch detects whether the IP header's total length is larger
      than the actual skb->len. If so, the mirrored packet is truncated
      and the erspan truncate bit is set.
      
      I tested the patch using bpf_skb_change_tail helper function to
      shrink the packet size and send to erspan tunnel.
      Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
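The check described in this commit message (IP total length compared against the bytes actually present) can be illustrated outside the kernel. The sketch below is a minimal userspace version for IPv4 only, assuming `pkt` points at the start of the IPv4 header and `actual_len` counts bytes from that point. The function name is hypothetical, and treating a buffer shorter than a minimal header as truncated is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when the IPv4 header claims more bytes
 * than the buffer holds, i.e. the packet was cut short earlier. */
static bool ipv4_looks_truncated(const uint8_t *pkt, size_t actual_len)
{
    assert(pkt != NULL);

    /* Sketch assumption: anything shorter than the 20-byte minimal
     * IPv4 header is reported as truncated. */
    if (actual_len < 20)
        return true;

    /* Total Length is a 16-bit big-endian field at byte offset 2. */
    uint16_t tot_len = (uint16_t)((pkt[2] << 8) | pkt[3]);

    return tot_len > actual_len;
}
```

A caller mirroring traffic would run this check on each packet and set its own "truncated" flag when it returns true, in addition to any MTU-based check.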
    • tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive · 05255b82
      Eric Dumazet committed
      When adding tcp mmap() implementation, I forgot that socket lock
      had to be taken before current->mm->mmap_sem. syzbot eventually caught
      the bug.
      
      Since we can not lock the socket in tcp mmap() handler we have to
      split the operation in two phases.
      
      1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
        This operation does not involve any TCP locking.
      
      2) getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
       the transfer of pages from skbs to one VMA.
        This operation only uses down_read(&current->mm->mmap_sem) after
        holding TCP lock, thus solving the lockdep issue.
      
      This new implementation was suggested by Andy Lutomirski in great detail.
      
      Benefits are :
      
      - Better scalability, in case multiple threads reuse VMAs
         (without mmap()/munmap() calls) since mmap_sem won't be write locked.
      
      - Better error recovery.
         The previous mmap() model had to provide the expected size of the
         mapping. If for some reason one part could not be mapped (partial MSS),
         the whole operation had to be aborted.
         With the tcp_zerocopy_receive struct, the kernel can report how
         many bytes were successfully mapped, and how many bytes should
         be read to skip the problematic sequence.
      
      - No more memory allocation to hold an array of page pointers.
        16 MB mappings needed 32 KB for this array, potentially using vmalloc() :/
      
      - skbs are freed while mmap_sem has been released
      
      The following patch changes the tcp_mmap tool to demonstrate
      one possible use of mmap() and getsockopt(... TCP_ZEROCOPY_RECEIVE ...)
      
      Note that memcg might require additional changes.
      
      Fixes: 93ab6cc6 ("tcp: implement mmap() for zero copy receive")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Cc: linux-mm@kvack.org
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
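The two phases described in this commit message can be sketched from the userspace side as follows. This is a rough illustration rather than the tcp_mmap tool itself. `zc_reserve()`, `zc_receive()` and `struct zc_args` are hypothetical names. The struct is a local mirror of the three fields (address, length, skip hint) assumed for the kernel's `tcp_zerocopy_receive` argument, and the fallback option value 35 is taken from the Linux uapi header.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/socket.h>

#ifndef TCP_ZEROCOPY_RECEIVE
#define TCP_ZEROCOPY_RECEIVE 35 /* assumed value from the Linux uapi header */
#endif

/* Local mirror (hypothetical name) of the getsockopt() argument. */
struct zc_args {
    uint64_t address;        /* in: start of the reserved VMA */
    uint32_t length;         /* in: VMA length, out: bytes mapped */
    uint32_t recv_skip_hint; /* out: bytes to read the normal way */
};

/* Phase 1: mmap() on the TCP socket only reserves address space. */
static void *zc_reserve(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Phase 2: ask the kernel to move received pages into that VMA.
 * Returns 0 on success, -1 with errno set on failure. */
static int zc_receive(int fd, void *addr, size_t len, struct zc_args *out)
{
    socklen_t optlen = sizeof(*out);

    assert(out != NULL);
    out->address = (uint64_t)(uintptr_t)addr;
    out->length = (uint32_t)len;
    out->recv_skip_hint = 0;

    return getsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, out, &optlen);
}
```

On success a caller would consume `out->length` bytes from the mapping and, if `out->recv_skip_hint` is non-zero, read that many bytes with a regular `recv()` before trying again, which matches the error-recovery benefit listed above.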
  9. 27 April, 2018 5 commits
    • udp: add gso segment cmsg · 2e8de857
      Willem de Bruijn committed
      Allow specifying segment size in the send call.
      
      The new control message performs the same function as socket option
      UDP_SEGMENT while avoiding the extra system call.
      
      [ Export udp_cmsg_send for ipv6. -DaveM ]
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
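To make the control-message variant concrete, here is a minimal sketch of attaching a segment size to a single `sendmsg()` call. `attach_segment_cmsg()` is a hypothetical helper name. The sketch assumes the payload of the control message is a 16-bit segment size and uses a fallback definition of `UDP_SEGMENT` (103, as in the Linux uapi header) in case the libc headers predate it.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103 /* assumed value from the Linux uapi header */
#endif

/* Hypothetical helper: fill msg's control area with one UDP_SEGMENT
 * message carrying gso_size. ctrl must be suitably aligned for a
 * struct cmsghdr and at least CMSG_SPACE(sizeof(uint16_t)) bytes. */
static void attach_segment_cmsg(struct msghdr *msg, void *ctrl,
                                size_t ctrl_len, uint16_t gso_size)
{
    assert(msg != NULL && ctrl != NULL);
    assert(ctrl_len >= CMSG_SPACE(sizeof(gso_size)));

    memset(ctrl, 0, ctrl_len);
    msg->msg_control = ctrl;
    msg->msg_controllen = CMSG_SPACE(sizeof(gso_size));

    struct cmsghdr *cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = IPPROTO_UDP; /* same numeric value as SOL_UDP */
    cm->cmsg_type = UDP_SEGMENT;
    cm->cmsg_len = CMSG_LEN(sizeof(gso_size));
    memcpy(CMSG_DATA(cm), &gso_size, sizeof(gso_size));
}
```

After filling in `msg_iov` with the concatenated payload, the caller passes the same `msghdr` to `sendmsg()`, so the segment size travels with that one call and no separate `setsockopt()` is needed.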
    • udp: paged allocation with gso · 15e36f5b
      Willem de Bruijn committed
      When sending large datagrams that are later segmented, store data in
      page frags to avoid copying from linear in skb_segment.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: generate gso with UDP_SEGMENT · bec1f6f6
      Willem de Bruijn committed
      Support generic segmentation offload for udp datagrams. Callers can
      concatenate and send at once the payload of multiple datagrams with
      the same destination.
      
      To set segment size, the caller sets socket option UDP_SEGMENT to the
      length of each discrete payload. This value must be smaller than or
      equal to the relevant MTU.
      
      A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a
      per send call basis.
      
      Total byte length may then exceed MTU. If not an exact multiple of
      segment size, the last segment will be shorter.
      
      The implementation adds a gso_size field to the udp socket, ip(v6)
      cmsg cookie and inet_cork structure to be able to set the value at
      setsockopt or cmsg time and to work with both lockless and corked
      paths.
      
      Initial benchmark numbers show UDP GSO about as expensive as TCP GSO.
      
          tcp tso
           3197 MB/s 54232 msg/s 54232 calls/s
               6,457,754,262      cycles
      
          tcp gso
           1765 MB/s 29939 msg/s 29939 calls/s
              11,203,021,806      cycles
      
          tcp without tso/gso *
            739 MB/s 12548 msg/s 12548 calls/s
              11,205,483,630      cycles
      
          udp
            876 MB/s 14873 msg/s 624666 calls/s
              11,205,777,429      cycles
      
          udp gso
           2139 MB/s 36282 msg/s 36282 calls/s
              11,204,374,561      cycles
      
         [*] after reverting commit 0a6b2a1d
             ("tcp: switch to GSO being always on")
      
      Measured total system cycles ('-a') for one core while pinning both
      the network receive path and benchmark process to that core:
      
        perf stat -a -C 12 -e cycles \
          ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4
      
      Note the reduction in calls/s with GSO. Bytes per syscall
      increases from 1470 to 61818.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
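As a rough illustration of the socket-option interface and the segment arithmetic described in this commit message, the sketch below sets `UDP_SEGMENT` on a socket and computes how a large send would be split. The helper names are hypothetical, and the fallback value 103 for `UDP_SEGMENT` is taken from the Linux uapi header in case the libc headers do not define it.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103 /* assumed value from the Linux uapi header */
#endif

/* Set the per-socket segment size (payload bytes per datagram).
 * Returns 0 on success, -1 with errno set on failure. */
static int udp_set_segment_size(int fd, int gso_size)
{
    return setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT,
                      &gso_size, sizeof(gso_size));
}

/* Number of datagrams produced from total payload bytes. */
static size_t gso_segment_count(size_t total, size_t gso_size)
{
    assert(gso_size > 0);
    return (total + gso_size - 1) / gso_size;
}

/* Payload length of the final datagram; shorter than gso_size when
 * total is not an exact multiple of it. */
static size_t gso_last_segment_len(size_t total, size_t gso_size)
{
    assert(gso_size > 0);
    if (total == 0)
        return 0;
    size_t rem = total % gso_size;
    return rem ? rem : gso_size;
}
```

For example, a 3000-byte send with a segment size of 1472 would leave the stack as two full datagrams plus one 56-byte datagram.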
    • udp: add udp gso · ee80d1eb
      Willem de Bruijn committed
      Implement generic segmentation offload support for udp datagrams. A
      follow-up patch adds support to the protocol stack to generate such
      packets.
      
      UDP GSO is not UFO. UFO fragments a single large datagram. GSO splits
      a large payload into a number of discrete UDP datagrams.
      
      The implementation adds a GSO type SKB_UDP_GSO_L4 to differentiate it
      from UFO (SKB_UDP_GSO).
      
      IPPROTO_UDPLITE is excluded, as that protocol has no gso handler
      registered.
      
      [ Export __udp_gso_segment for ipv6. -DaveM ]
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: expose inet cork to udp · 1cd7884d
      Willem de Bruijn committed
      UDP segmentation offload needs access to inet_cork in the udp layer.
      Pass the struct to ip(6)_make_skb instead of allocating it on the
      stack in that function itself.
      
      This patch is a noop otherwise.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 26 April, 2018 3 commits
  11. 25 April, 2018 1 commit
  12. 24 April, 2018 7 commits
  13. 23 April, 2018 2 commits
    • net: fib_rules: add extack support · b16fb418
      Roopa Prabhu committed
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pkts · a957fa19
      Ahmed Abdelsalam committed
      In case of seg6 in encap mode, seg6_do_srh_encap() calls set_tun_src()
      in order to set the src addr of outer IPv6 header.
      
      The net_device is required for set_tun_src(). However, calling ip6_dst_idev()
      on the dst_entry in case of IPv4 traffic results in the following bug.
      
      Using just dst->dev should fix this BUG.
      
      [  196.242461] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      [  196.242975] PGD 800000010f076067 P4D 800000010f076067 PUD 10f060067 PMD 0
      [  196.243329] Oops: 0000 [#1] SMP PTI
      [  196.243468] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd input_leds glue_helper led_class pcspkr serio_raw mac_hid video autofs4 hid_generic usbhid hid e1000 i2c_piix4 ahci pata_acpi libahci
      [  196.244362] CPU: 2 PID: 1089 Comm: ping Not tainted 4.16.0+ #1
      [  196.244606] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [  196.244968] RIP: 0010:seg6_do_srh_encap+0x1ac/0x300
      [  196.245236] RSP: 0018:ffffb2ce00b23a60 EFLAGS: 00010202
      [  196.245464] RAX: 0000000000000000 RBX: ffff8c7f53eea300 RCX: 0000000000000000
      [  196.245742] RDX: 0000f10000000000 RSI: ffff8c7f52085a6c RDI: ffff8c7f41166850
      [  196.246018] RBP: ffffb2ce00b23aa8 R08: 00000000000261e0 R09: ffff8c7f41166800
      [  196.246294] R10: ffffdce5040ac780 R11: ffff8c7f41166828 R12: ffff8c7f41166808
      [  196.246570] R13: ffff8c7f52085a44 R14: ffffffffb73211c0 R15: ffff8c7e69e44200
      [  196.246846] FS:  00007fc448789700(0000) GS:ffff8c7f59d00000(0000) knlGS:0000000000000000
      [  196.247286] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  196.247526] CR2: 0000000000000000 CR3: 000000010f05a000 CR4: 00000000000406e0
      [  196.247804] Call Trace:
      [  196.247972]  seg6_do_srh+0x15b/0x1c0
      [  196.248156]  seg6_output+0x3c/0x220
      [  196.248341]  ? prandom_u32+0x14/0x20
      [  196.248526]  ? ip_idents_reserve+0x6c/0x80
      [  196.248723]  ? __ip_select_ident+0x90/0x100
      [  196.248923]  ? ip_append_data.part.50+0x6c/0xd0
      [  196.249133]  lwtunnel_output+0x44/0x70
      [  196.249328]  ip_send_skb+0x15/0x40
      [  196.249515]  raw_sendmsg+0x8c3/0xac0
      [  196.249701]  ? _copy_from_user+0x2e/0x60
      [  196.249897]  ? rw_copy_check_uvector+0x53/0x110
      [  196.250106]  ? _copy_from_user+0x2e/0x60
      [  196.250299]  ? copy_msghdr_from_user+0xce/0x140
      [  196.250508]  sock_sendmsg+0x36/0x40
      [  196.250690]  ___sys_sendmsg+0x292/0x2a0
      [  196.250881]  ? _cond_resched+0x15/0x30
      [  196.251074]  ? copy_termios+0x1e/0x70
      [  196.251261]  ? _copy_to_user+0x22/0x30
      [  196.251575]  ? tty_mode_ioctl+0x1c3/0x4e0
      [  196.251782]  ? _cond_resched+0x15/0x30
      [  196.251972]  ? mutex_lock+0xe/0x30
      [  196.252152]  ? vvar_fault+0xd2/0x110
      [  196.252337]  ? __do_fault+0x1f/0xc0
      [  196.252521]  ? __handle_mm_fault+0xc1f/0x12d0
      [  196.252727]  ? __sys_sendmsg+0x63/0xa0
      [  196.252919]  __sys_sendmsg+0x63/0xa0
      [  196.253107]  do_syscall_64+0x72/0x200
      [  196.253305]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [  196.253530] RIP: 0033:0x7fc4480b0690
      [  196.253715] RSP: 002b:00007ffde9f252f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [  196.254053] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007fc4480b0690
      [  196.254331] RDX: 0000000000000000 RSI: 000000000060a360 RDI: 0000000000000003
      [  196.254608] RBP: 00007ffde9f253f0 R08: 00000000002d1e81 R09: 0000000000000002
      [  196.254884] R10: 00007ffde9f250c0 R11: 0000000000000246 R12: 0000000000b22070
      [  196.255205] R13: 20c49ba5e353f7cf R14: 431bde82d7b634db R15: 00007ffde9f278fe
      [  196.255484] Code: a5 0f b6 45 c0 41 88 41 28 41 0f b6 41 2c 48 c1 e0 04 49 8b 54 01 38 49 8b 44 01 30 49 89 51 20 49 89 41 18 48 8b 83 b0 00 00 00 <48> 8b 30 49 8b 86 08 0b 00 00 48 8b 40 20 48 8b 50 08 48 0b 10
      [  196.256190] RIP: seg6_do_srh_encap+0x1ac/0x300 RSP: ffffb2ce00b23a60
      [  196.256445] CR2: 0000000000000000
      [  196.256676] ---[ end trace 71af7d093603885c ]---
      
      Fixes: 8936ef76 ("ipv6: sr: fix NULL pointer dereference when setting encap source address")
      Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
      Acked-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 22 April, 2018 1 commit