1. 18 12月, 2018 1 次提交
  2. 16 12月, 2018 3 次提交
  3. 15 12月, 2018 2 次提交
    • P
      net: tcp6: prefer listeners bound to an address · 0ee58dad
      Peter Oskolkov 提交于
      A relatively common use case is to have several IPs configured
      on a host, and have different listeners for each of them. We would
      like to add a "catch all" listener on addr_any, to match incoming
      connections not served by any of the listeners bound to a specific
      address.
      
      However, port-only lookups can match addr_any sockets when sockets
      listening on specific addresses are present if so_reuseport flag
      is set. This patch eliminates lookups into port-only hashtable,
      as lookups by (addr,port) tuple are easily available.
      
      In addition, compute_score() is tweaked to _not_ match
      addr_any sockets to specific addresses, as hash collisions
      could result in the unwanted behavior described above.
      
      Tested: the patch compiles; full test in the last patch in this
      patchset. Existing reuseport_* selftests also pass.
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NPeter Oskolkov <posk@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ee58dad
    • P
      net: udp6: prefer listeners bound to an address · 23b0269e
      Peter Oskolkov 提交于
      A relatively common use case is to have several IPs configured
      on a host, and have different listeners for each of them. We would
      like to add a "catch all" listener on addr_any, to match incoming
      connections not served by any of the listeners bound to a specific
      address.
      
      However, port-only lookups can match addr_any sockets when sockets
      listening on specific addresses are present if so_reuseport flag
      is set. This patch eliminates lookups into port-only hashtable,
      as lookups by (addr,port) tuple are easily available.
      
      In addition, compute_score() is tweaked to _not_ match
      addr_any sockets to specific addresses, as hash collisions
      could result in the unwanted behavior described above.
      
      Tested: the patch compiles; full test in the last patch in this
      patchset. Existing reuseport_* selftests also pass.
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NPeter Oskolkov <posk@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23b0269e
  4. 11 12月, 2018 1 次提交
  5. 09 12月, 2018 1 次提交
  6. 08 12月, 2018 2 次提交
    • S
      ipv6: Check available headroom in ip6_xmit() even without options · 66033f47
      Stefano Brivio 提交于
      Even if we send an IPv6 packet without options, MAX_HEADER might not be
      enough to account for the additional headroom required by alignment of
      hardware headers.
      
      On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
      sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
      100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
      bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().
      
      Those would be enough to append our 14 bytes header, but we're going to
      align that to 16 bytes, and write 2 bytes out of the allocated slab in
      neigh_hh_output().
      
      KASan says:
      
      [  264.967848] ==================================================================
      [  264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70
      [  264.967866] Write of size 16 at addr 000000006af1c7fe by task netperf/6201
      [  264.967870]
      [  264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
      [  264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
      [  264.967887] Call Trace:
      [  264.967896] ([<00000000001347d6>] show_stack+0x56/0xa0)
      [  264.967903]  [<00000000017e379c>] dump_stack+0x23c/0x290
      [  264.967912]  [<00000000007bc594>] print_address_description+0xf4/0x290
      [  264.967919]  [<00000000007bc8fc>] kasan_report+0x13c/0x240
      [  264.967927]  [<000000000162f5e4>] ip6_finish_output2+0x1aec/0x1c70
      [  264.967935]  [<000000000163f890>] ip6_finish_output+0x430/0x7f0
      [  264.967943]  [<000000000163fe44>] ip6_output+0x1f4/0x580
      [  264.967953]  [<000000000163882a>] ip6_xmit+0xfea/0x1ce8
      [  264.967963]  [<00000000017396e2>] inet6_csk_xmit+0x282/0x3f8
      [  264.968033]  [<000003ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
      [  264.968037]  [<000003ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
      [  264.968041]  [<0000000001220020>] dev_hard_start_xmit+0x268/0x928
      [  264.968069]  [<0000000001330e8e>] sch_direct_xmit+0x7ae/0x1350
      [  264.968071]  [<000000000122359c>] __dev_queue_xmit+0x2b7c/0x3478
      [  264.968075]  [<00000000013d2862>] ip_finish_output2+0xce2/0x11a0
      [  264.968078]  [<00000000013d9b14>] ip_finish_output+0x56c/0x8c8
      [  264.968081]  [<00000000013ddd1e>] ip_output+0x226/0x4c0
      [  264.968083]  [<00000000013dbd6c>] __ip_queue_xmit+0x894/0x1938
      [  264.968100]  [<000003ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp]
      [  264.968116]  [<000003ff80b7bf68>] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
      [  264.968131]  [<000003ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp]
      [  264.968146]  [<000003ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp]
      [  264.968161]  [<000003ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp]
      [  264.968177]  [<000003ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
      [  264.968192]  [<000003ff80b93328>] __sctp_connect+0x830/0xc20 [sctp]
      [  264.968208]  [<000003ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp]
      [  264.968212]  [<0000000001197942>] __sys_connect+0x21a/0x450
      [  264.968215]  [<000000000119aff8>] sys_socketcall+0x3d0/0xb08
      [  264.968218]  [<000000000184ea7a>] system_call+0x2a2/0x2c0
      
      [...]
      
      Just like ip_finish_output2() does for IPv4, check that we have enough
      headroom in ip6_xmit(), and reallocate it if we don't.
      
      This issue is older than git history.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66033f47
    • S
      ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output · 1b4e5ad5
      Shmulik Ladkani 提交于
      In 'seg6_output', stack variable 'struct flowi6 fl6' was missing
      initialization.
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b4e5ad5
  7. 07 12月, 2018 1 次提交
  8. 06 12月, 2018 2 次提交
    • J
      ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes · ebaf39e6
      Jiri Wiesner 提交于
      The *_frag_reasm() functions are susceptible to miscalculating the byte
      count of packet fragments in case the truesize of a head buffer changes.
      The truesize member may be changed by the call to skb_unclone(), leaving
      the fragment memory limit counter unbalanced even if all fragments are
      processed. This miscalculation goes unnoticed as long as the network
      namespace which holds the counter is not destroyed.
      
      Should an attempt be made to destroy a network namespace that holds an
      unbalanced fragment memory limit counter the cleanup of the namespace
      never finishes. The thread handling the cleanup gets stuck in
      inet_frags_exit_net() waiting for the percpu counter to reach zero. The
      thread is usually in running state with a stacktrace similar to:
      
       PID: 1073   TASK: ffff880626711440  CPU: 1   COMMAND: "kworker/u48:4"
        #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480
        #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b
        #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c
        #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856
        #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0
       #10 [ffff880621563e38] process_one_work at ffffffff81096f14
      
      It is not possible to create new network namespaces, and processes
      that call unshare() end up being stuck in uninterruptible sleep state
      waiting to acquire the net_mutex.
      
      The bug was observed in the IPv6 netfilter code by Per Sundstrom.
      I thank him for his analysis of the problem. The parts of this patch
      that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.
      Signed-off-by: NJiri Wiesner <jwiesner@suse.com>
      Reported-by: NPer Sundstrom <per.sundstrom@redqube.se>
      Acked-by: NPeter Oskolkov <posk@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ebaf39e6
    • E
      net: use skb_list_del_init() to remove from RX sublists · 22f6bbb7
      Edward Cree 提交于
      list_del() leaves the skb->next pointer poisoned, which can then lead to
       a crash in e.g. OVS forwarding.  For example, setting up an OVS VXLAN
       forwarding bridge on sfc as per:
      
      ========
      $ ovs-vsctl show
      5dfd9c47-f04b-4aaa-aa96-4fbb0a522a30
          Bridge "br0"
              Port "br0"
                  Interface "br0"
                      type: internal
              Port "enp6s0f0"
                  Interface "enp6s0f0"
              Port "vxlan0"
                  Interface "vxlan0"
                      type: vxlan
                      options: {key="1", local_ip="10.0.0.5", remote_ip="10.0.0.4"}
          ovs_version: "2.5.0"
      ========
      (where 10.0.0.5 is an address on enp6s0f1)
      and sending traffic across it will lead to the following panic:
      ========
      general protection fault: 0000 [#1] SMP PTI
      CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc3-ehc+ #701
      Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 6.4.0 07/23/2013
      RIP: 0010:dev_hard_start_xmit+0x38/0x200
      Code: 53 48 89 fb 48 83 ec 20 48 85 ff 48 89 54 24 08 48 89 4c 24 18 0f 84 ab 01 00 00 48 8d 86 90 00 00 00 48 89 f5 48 89 44 24 10 <4c> 8b 33 48 c7 03 00 00 00 00 48 8b 05 c7 d1 b3 00 4d 85 f6 0f 95
      RSP: 0018:ffff888627b437e0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff88862279c000
      RDX: ffff888614a342c0 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff888618a88000 R08: 0000000000000001 R09: 00000000000003e8
      R10: 0000000000000000 R11: ffff888614a34140 R12: 0000000000000000
      R13: 0000000000000062 R14: dead000000000100 R15: ffff888616430000
      FS:  0000000000000000(0000) GS:ffff888627b40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f6d2bc6d000 CR3: 000000000200a000 CR4: 00000000000006e0
      Call Trace:
       <IRQ>
       __dev_queue_xmit+0x623/0x870
       ? masked_flow_lookup+0xf7/0x220 [openvswitch]
       ? ep_poll_callback+0x101/0x310
       do_execute_actions+0xaba/0xaf0 [openvswitch]
       ? __wake_up_common+0x8a/0x150
       ? __wake_up_common_lock+0x87/0xc0
       ? queue_userspace_packet+0x31c/0x5b0 [openvswitch]
       ovs_execute_actions+0x47/0x120 [openvswitch]
       ovs_dp_process_packet+0x7d/0x110 [openvswitch]
       ovs_vport_receive+0x6e/0xd0 [openvswitch]
       ? dst_alloc+0x64/0x90
       ? rt_dst_alloc+0x50/0xd0
       ? ip_route_input_slow+0x19a/0x9a0
       ? __udp_enqueue_schedule_skb+0x198/0x1b0
       ? __udp4_lib_rcv+0x856/0xa30
       ? __udp4_lib_rcv+0x856/0xa30
       ? cpumask_next_and+0x19/0x20
       ? find_busiest_group+0x12d/0xcd0
       netdev_frame_hook+0xce/0x150 [openvswitch]
       __netif_receive_skb_core+0x205/0xae0
       __netif_receive_skb_list_core+0x11e/0x220
       netif_receive_skb_list+0x203/0x460
       ? __efx_rx_packet+0x335/0x5e0 [sfc]
       efx_poll+0x182/0x320 [sfc]
       net_rx_action+0x294/0x3c0
       __do_softirq+0xca/0x297
       irq_exit+0xa6/0xb0
       do_IRQ+0x54/0xd0
       common_interrupt+0xf/0xf
       </IRQ>
      ========
      So, in all listified-receive handling, instead pull skbs off the lists with
       skb_list_del_init().
      
      Fixes: 9af86f93 ("net: core: fix use-after-free in __netif_receive_skb_list_core")
      Fixes: 7da517a3 ("net: core: Another step of skb receive list processing")
      Fixes: a4ca8b7d ("net: ipv4: fix drop handling in ip_list_rcv() and ip_list_rcv_finish()")
      Fixes: d8269e2c ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
      Signed-off-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22f6bbb7
  9. 05 12月, 2018 1 次提交
  10. 04 12月, 2018 3 次提交
    • W
      udp: elide zerocopy operation in hot path · 52900d22
      Willem de Bruijn 提交于
      With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info.
      Release of its last reference triggers a completion notification.
      
      The TCP stack in tcp_sendmsg_locked holds an extra ref independent of
      the skbs, because it can build, send and free skbs within its loop,
      possibly reaching refcount zero and freeing the ubuf_info too soon.
      
      The UDP stack currently also takes this extra ref, but does not need
      it as all skbs are sent after return from __ip(6)_append_data.
      
      Avoid the extra refcount_inc and refcount_dec_and_test, and generally
      the sock_zerocopy_put in the common path, by passing the initial
      reference to the first skb.
      
      This approach is taken instead of initializing the refcount to 0, as
      that would generate error "refcount_t: increment on 0" on the
      next skb_zcopy_set.
      
      Changes
        v3 -> v4
          - Move skb_zcopy_set below the only kfree_skb that might cause
            a premature uarg destroy before skb_zerocopy_put_abort
            - Move the entire skb_shinfo assignment block, to keep that
              cacheline access in one place
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52900d22
    • W
      udp: msg_zerocopy · b5947e5d
      Willem de Bruijn 提交于
      Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and
      interpret flag MSG_ZEROCOPY.
      
      This patch was previously part of the zerocopy RFC patchsets. Zerocopy
      is not effective at small MTU. With segmentation offload building
      larger datagrams, the benefit of page flipping outweights the cost of
      generating a completion notification.
      
      tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on
      test patch and making skb_orphan_frags_rx same as skb_orphan_frags:
      
          ipv4 udp -t 1
          tx=191312 (11938 MB) txc=0 zc=n
          rx=191312 (11938 MB)
          ipv4 udp -z -t 1
          tx=304507 (19002 MB) txc=304507 zc=y
          rx=304507 (19002 MB)
          ok
          ipv6 udp -t 1
          tx=174485 (10888 MB) txc=0 zc=n
          rx=174485 (10888 MB)
          ipv6 udp -z -t 1
          tx=294801 (18396 MB) txc=294801 zc=y
          rx=294801 (18396 MB)
          ok
      
      Changes
        v1 -> v2
          - Fixup reverse christmas tree violation
        v2 -> v3
          - Split refcount avoidance optimization into separate patch
            - Fix refcount leak on error in fragmented case
              (thanks to Paolo Abeni for pointing this one out!)
            - Fix refcount inc on zero
            - Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data.
              This is needed since commit 5cf4a853 ("tcp: really ignore
      	MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b5947e5d
    • A
      udp_tunnel: add config option to bind to a device · da5095d0
      Alexis Bauvin 提交于
      UDP tunnel sockets are always opened unbound to a specific device. This
      patch allow the socket to be bound on a custom device, which
      incidentally makes UDP tunnels VRF-aware if binding to an l3mdev.
      Signed-off-by: NAlexis Bauvin <abauvin@scaleway.com>
      Reviewed-by: NAmine Kherbouche <akherbouche@scaleway.com>
      Tested-by: NAmine Kherbouche <akherbouche@scaleway.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      da5095d0
  11. 27 11月, 2018 3 次提交
    • T
      netfilter: nat: fix double register in masquerade modules · 095faf45
      Taehee Yoo 提交于
      There is a reference counter to ensure that masquerade modules register
      notifiers only once. However, the existing reference counter approach is
      not safe, test commands are:
      
         while :
         do
         	   modprobe ip6t_MASQUERADE &
      	   modprobe nft_masq_ipv6 &
      	   modprobe -rv ip6t_MASQUERADE &
      	   modprobe -rv nft_masq_ipv6 &
         done
      
      numbers below represent the reference counter.
      --------------------------------------------------------
      CPU0        CPU1        CPU2        CPU3        CPU4
      [insmod]    [insmod]    [rmmod]     [rmmod]     [insmod]
      --------------------------------------------------------
      0->1
      register    1->2
                  returns     2->1
      			returns     1->0
                                                      0->1
                                                      register <--
                                          unregister
      --------------------------------------------------------
      
      The unregistation of CPU3 should be processed before the
      registration of CPU4.
      
      In order to fix this, use a mutex instead of reference counter.
      
      splat looks like:
      [  323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381]
      [  323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n]
      [  323.869574] irq event stamp: 194074
      [  323.898930] hardirqs last  enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c
      [  323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  323.898930] softirqs last  enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b
      [  323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0
      [  323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27
      [  323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240
      [  323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488
      [  323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
      [  323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000
      [  323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0
      [  323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001
      [  323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000
      [  323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0
      [  323.898930] FS:  00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
      [  323.898930] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0
      [  323.898930] Call Trace:
      [  323.898930]  ? atomic_notifier_chain_register+0x2d0/0x2d0
      [  323.898930]  ? down_read+0x150/0x150
      [  323.898930]  ? sched_clock_cpu+0x126/0x170
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  register_netdevice_notifier+0xbb/0x790
      [  323.898930]  ? __dev_close_many+0x2d0/0x2d0
      [  323.898930]  ? __mutex_unlock_slowpath+0x17f/0x740
      [  323.898930]  ? wait_for_completion+0x710/0x710
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  ? up_write+0x6c/0x210
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  324.127073]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  324.127073]  nft_chain_filter_init+0x1e/0xe8a [nf_tables]
      [  324.127073]  nf_tables_module_init+0x37/0x92 [nf_tables]
      [ ... ]
      
      Fixes: 8dd33cc9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables")
      Fixes: be6b635c ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      095faf45
    • T
      netfilter: add missing error handling code for register functions · 584eab29
      Taehee Yoo 提交于
      register_{netdevice/inetaddr/inet6addr}_notifier may return an error
      value, this patch adds the code to handle these error paths.
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      584eab29
    • A
      netfilter: ipv6: Preserve link scope traffic original oif · 508b0904
      Alin Nastac 提交于
      When ip6_route_me_harder is invoked, it resets outgoing interface of:
        - link-local scoped packets sent by neighbor discovery
        - multicast packets sent by MLD host
        - multicast packets send by MLD proxy daemon that sets outgoing
          interface through IPV6_PKTINFO ipi6_ifindex
      
      Link-local and multicast packets must keep their original oif after
      ip6_route_me_harder is called.
      Signed-off-by: NAlin Nastac <alin.nastac@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      508b0904
  12. 25 11月, 2018 1 次提交
    • W
      net: always initialize pagedlen · aba36930
      Willem de Bruijn 提交于
      In ip packet generation, pagedlen is initialized for each skb at the
      start of the loop in __ip(6)_append_data, before label alloc_new_skb.
      
      Depending on compiler options, code can be generated that jumps to
      this label, triggering use of an an uninitialized variable.
      
      In practice, at -O2, the generated code moves the initialization below
      the label. But the code should not rely on that for correctness.
      
      Fixes: 15e36f5b ("udp: paged allocation with gso")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aba36930
  13. 24 11月, 2018 1 次提交
    • H
      net/ipv6: re-do dad when interface has IFF_NOARP flag change · 896585d4
      Hangbin Liu 提交于
      When we add a new IPv6 address, we should also join corresponding solicited-node
      multicast address, unless the interface has IFF_NOARP flag, as function
      addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do
      not do dad and add the mcast address. So we will drop corresponding neighbour
      discovery message that came from other nodes.
      
      A typical example is after creating a ipvlan with mode l3, setting up an ipv6
      address and changing the mode to l2. Then we will not be able to ping this
      address as the interface doesn't join related solicited-node mcast address.
      
      Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add
      corresponding mcast group and check if there is a duplicate address on the
      network.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      896585d4
  14. 19 11月, 2018 1 次提交
  15. 17 11月, 2018 2 次提交
  16. 10 11月, 2018 1 次提交
  17. 09 11月, 2018 5 次提交
    • S
      fou, fou6: ICMP error handlers for FoU and GUE · b8a51b38
      Stefano Brivio 提交于
      As the destination port in FoU and GUE receiving sockets doesn't
      necessarily match the remote destination port, we can't associate errors
      to the encapsulating tunnels with a socket lookup -- we need to blindly
      try them instead. This means we don't even know if we are handling errors
      for FoU or GUE without digging into the packets.
      
      Hence, implement a single handler for both, one for IPv4 and one for IPv6,
      that will check whether the packet that generated the ICMP error used a
      direct IP encapsulation or if it had a GUE header, and send the error to
      the matching protocol handler, if any.
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b8a51b38
    • S
      udp: Support for error handlers of tunnels with arbitrary destination port · e7cc0824
      Stefano Brivio 提交于
      ICMP error handling is currently not possible for UDP tunnels not
      employing a receiving socket with local destination port matching the
      remote one, because we have no way to look them up.
      
      Add an err_handler tunnel encapsulation operation that can be exported by
      tunnels in order to pass the error to the protocol implementing the
      encapsulation. We can't easily use a lookup function as we did for VXLAN
      and GENEVE, as protocol error handlers, which would be in turn called by
      implementations of this new operation, handle the errors themselves,
      together with the tunnel lookup.
      
      Without a socket, we can't be sure which encapsulation error handler is
      the appropriate one: encapsulation handlers (the ones for FoU and GUE
      introduced in the next patch, e.g.) will need to check the new error codes
      returned by protocol handlers to figure out if errors match the given
      encapsulation, and, in turn, report this error back, so that we can try
      all of them in __udp{4,6}_lib_err_encap_no_sk() until we have a match.
      
      v2:
      - Name all arguments in err_handler prototypes (David Miller)
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7cc0824
    • S
      net: Convert protocol error handlers from void to int · 32bbd879
      Stefano Brivio 提交于
      We'll need this to handle ICMP errors for tunnels without a sending socket
      (i.e. FoU and GUE). There, we might have to look up different types of IP
      tunnels, registered as network protocols, before we get a match, so we
      want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both
      inet_protos and inet6_protos. These error codes will be used in the next
      patch.
      
      For consistency, return sensible error codes in protocol error handlers
      whenever handlers can't handle errors because, even if valid, they don't
      match a protocol or any of its states.
      
      This has no effect on existing error handling paths.
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32bbd879
    • S
      udp: Handle ICMP errors for tunnels with same destination port on both endpoints · a36e185e
      Stefano Brivio 提交于
      For both IPv4 and IPv6, if we can't match errors to a socket, try
      tunnels before ignoring them. Look up a socket with the original source
      and destination ports as found in the UDP packet inside the ICMP payload,
      this will work for tunnels that force the same destination port for both
      endpoints, i.e. VXLAN and GENEVE.
      
      Actually, lwtunnels could break this assumption if they are configured by
      an external control plane to have different destination ports on the
      endpoints: in this case, we won't be able to trace ICMP messages back to
      them.
      
      For IPv6 redirect messages, call ip6_redirect() directly with the output
      interface argument set to the interface we received the packet from (as
      it's the very interface we should build the exception on), otherwise the
      new nexthop will be rejected. There's no such need for IPv4.
      
      Tunnels can now export an encap_err_lookup() operation that indicates a
      match. Pass the packet to the lookup function, and if the tunnel driver
      reports a matching association, continue with regular ICMP error handling.
      
      v2:
      - Added newline between network and transport header sets in
        __udp{4,6}_lib_err_encap() (David Miller)
      - Removed redundant skb_reset_network_header(skb); in
        __udp4_lib_err_encap()
      - Removed redundant reassignment of iph in __udp4_lib_err_encap()
        (Sabrina Dubroca)
      - Edited comment to __udp{4,6}_lib_err_encap() to reflect the fact this
        won't work with lwtunnels configured to use asymmetric ports. By the way,
        it's VXLAN, not VxLAN (Jiri Benc)
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a36e185e
    • L
      net/ipv6: compute anycast address hash only if dev is null · 1c51dc9a
      Li RongQing 提交于
      avoid to compute the hash value if dev is not null, since
      hash value is not used
      Signed-off-by: NLi RongQing <lirongqing@baidu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1c51dc9a
  18. 08 11月, 2018 9 次提交