1. 22 11月, 2019 1 次提交
    • P
      ipv6: introduce and uses route look hints for list input. · 197dbf24
      Paolo Abeni 提交于
      When doing RX batch packet processing, we currently always repeat
      the route lookup for each ingress packet. When no custom rules are
      in place, and there aren't routes depending on source addresses,
      we know that packets with the same destination address will use
      the same dst.
      
      This change tries to avoid per packet route lookup caching
      the destination address of the latest successful lookup, and
      reusing it for the next packet when the above conditions are
      in place. Ingress traffic for most servers should fit.
      
      The measured performance delta under UDP flood vs a recvmmsg
      receiver is as follow:
      
      vanilla		patched		delta
      Kpps		Kpps		%
      1431		1674		+17
      
      In the worst-case scenario - each packet has a different
      destination address - the performance delta is within noise
      range.
      
      v3 -> v4:
       - support hints for SUBFLOW build, too (David A.)
       - several style fixes (Eric)
      
      v2 -> v3:
       - add fib6_has_custom_rules() helpers (David A.)
       - add ip6_extract_route_hint() helper (Edward C.)
       - use hint directly in ip6_list_rcv_finish() (Willem)
      
      v1 -> v2:
       - fix build issue with !CONFIG_IPV6_MULTIPLE_TABLES
       - fix potential race when fib6_has_custom_rules is set
         while processing a packet batch
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      197dbf24
  2. 30 10月, 2019 1 次提交
    • F
      inet: do not call sublist_rcv on empty list · 51210ad5
      Florian Westphal 提交于
      syzbot triggered struct net NULL deref in NF_HOOK_LIST:
      RIP: 0010:NF_HOOK_LIST include/linux/netfilter.h:331 [inline]
      RIP: 0010:ip6_sublist_rcv+0x5c9/0x930 net/ipv6/ip6_input.c:292
       ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:328
       __netif_receive_skb_list_ptype net/core/dev.c:5274 [inline]
      
      Reason:
      void ipv6_list_rcv(struct list_head *head, struct packet_type *pt,
                         struct net_device *orig_dev)
      [..]
              list_for_each_entry_safe(skb, next, head, list) {
      		/* iterates list */
                      skb = ip6_rcv_core(skb, dev, net);
      		/* ip6_rcv_core drops skb -> NULL is returned */
                      if (skb == NULL)
                              continue;
      	[..]
      	}
      	/* sublist is empty -> curr_net is NULL */
              ip6_sublist_rcv(&sublist, curr_dev, curr_net);
      
      Before the recent change NF_HOOK_LIST did a list iteration before
      struct net deref, i.e. it was a no-op in the empty list case.
      
      List iteration now happens after *net deref, causing crash.
      
      Follow the same pattern as the ip(v6)_list_rcv loop and add a list_empty
      test for the final sublist dispatch too.
      
      Cc: Edward Cree <ecree@solarflare.com>
      Reported-by: syzbot+c54f457cad330e57e967@syzkaller.appspotmail.com
      Fixes: ca58fbe0 ("netfilter: add and use nf_hook_slow_list()")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Tested-by: NLeon Romanovsky <leonro@mellanox.com>
      Tested-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51210ad5
  3. 03 10月, 2019 1 次提交
    • E
      ipv6: drop incoming packets having a v4mapped source address · 6af1799a
      Eric Dumazet 提交于
      This began with a syzbot report. syzkaller was injecting
      IPv6 TCP SYN packets having a v4mapped source address.
      
      After an unsuccessful 4-tuple lookup, TCP creates a request
      socket (SYN_RECV) and calls reqsk_queue_hash_req()
      
      reqsk_queue_hash_req() calls sk_ehashfn(sk)
      
      At this point we have AF_INET6 sockets, and the heuristic
      used by sk_ehashfn() to either hash the IPv4 or IPv6 addresses
      is to use ipv6_addr_v4mapped(&sk->sk_v6_daddr)
      
      For the particular spoofed packet, we end up hashing V4 addresses
      which were not initialized by the TCP IPv6 stack, so KMSAN fired
      a warning.
      
      I first fixed sk_ehashfn() to test both source and destination addresses,
      but then faced various problems, including user-space programs
      like packetdrill that had similar assumptions.
      
      Instead of trying to fix the whole ecosystem, it is better
      to admit that we have a dual stack behavior, and that we
      can not build linux kernels without V4 stack anyway.
      
      The dual stack API automatically forces the traffic to be IPv4
      if v4mapped addresses are used at bind() or connect(), so it makes
      no sense to allow IPv6 traffic to use the same v4mapped class.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6af1799a
  4. 02 10月, 2019 1 次提交
    • F
      netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Florian Westphal 提交于
      commit 174e2381
      ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
      recycle always drop skb extensions.  The additional skb_ext_del() that is
      performed via nf_reset on napi skb recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so queued skb won't block
      'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      895b5c9f
  5. 24 8月, 2019 1 次提交
    • X
      net: ipv6: fix listify ip6_rcv_finish in case of forwarding · c7a42eb4
      Xin Long 提交于
      We need a similar fix for ipv6 as Commit 0761680d ("net: ipv4: fix
      listify ip_rcv_finish in case of forwarding") does for ipv4.
      
      This issue can be reprocuded by syzbot since Commit 323ebb61 ("net:
      use listified RX for handling GRO_NORMAL skbs") on net-next. The call
      trace was:
      
        kernel BUG at include/linux/skbuff.h:2225!
        RIP: 0010:__skb_pull include/linux/skbuff.h:2225 [inline]
        RIP: 0010:skb_pull+0xea/0x110 net/core/skbuff.c:1902
        Call Trace:
          sctp_inq_pop+0x2f1/0xd80 net/sctp/inqueue.c:202
          sctp_endpoint_bh_rcv+0x184/0x8d0 net/sctp/endpointola.c:385
          sctp_inq_push+0x1e4/0x280 net/sctp/inqueue.c:80
          sctp_rcv+0x2807/0x3590 net/sctp/input.c:256
          sctp6_rcv+0x17/0x30 net/sctp/ipv6.c:1049
          ip6_protocol_deliver_rcu+0x2fe/0x1660 net/ipv6/ip6_input.c:397
          ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:438
          NF_HOOK include/linux/netfilter.h:305 [inline]
          NF_HOOK include/linux/netfilter.h:299 [inline]
          ip6_input+0xe4/0x3f0 net/ipv6/ip6_input.c:447
          dst_input include/net/dst.h:442 [inline]
          ip6_sublist_rcv_finish+0x98/0x1e0 net/ipv6/ip6_input.c:84
          ip6_list_rcv_finish net/ipv6/ip6_input.c:118 [inline]
          ip6_sublist_rcv+0x80c/0xcf0 net/ipv6/ip6_input.c:282
          ipv6_list_rcv+0x373/0x4b0 net/ipv6/ip6_input.c:316
          __netif_receive_skb_list_ptype net/core/dev.c:5049 [inline]
          __netif_receive_skb_list_core+0x5fc/0x9d0 net/core/dev.c:5097
          __netif_receive_skb_list net/core/dev.c:5149 [inline]
          netif_receive_skb_list_internal+0x7eb/0xe60 net/core/dev.c:5244
          gro_normal_list.part.0+0x1e/0xb0 net/core/dev.c:5757
          gro_normal_list net/core/dev.c:5755 [inline]
          gro_normal_one net/core/dev.c:5769 [inline]
          napi_frags_finish net/core/dev.c:5782 [inline]
          napi_gro_frags+0xa6a/0xea0 net/core/dev.c:5855
          tun_get_user+0x2e98/0x3fa0 drivers/net/tun.c:1974
          tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2020
      
      Fixes: d8269e2c ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
      Fixes: 323ebb61 ("net: use listified RX for handling GRO_NORMAL skbs")
      Reported-by: syzbot+eb349eeee854e389c36d@syzkaller.appspotmail.com
      Reported-by: syzbot+4a0643a653ac375612d1@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7a42eb4
  6. 31 5月, 2019 1 次提交
  7. 06 5月, 2019 2 次提交
  8. 06 12月, 2018 1 次提交
    • E
      net: use skb_list_del_init() to remove from RX sublists · 22f6bbb7
      Edward Cree 提交于
      list_del() leaves the skb->next pointer poisoned, which can then lead to
       a crash in e.g. OVS forwarding.  For example, setting up an OVS VXLAN
       forwarding bridge on sfc as per:
      
      ========
      $ ovs-vsctl show
      5dfd9c47-f04b-4aaa-aa96-4fbb0a522a30
          Bridge "br0"
              Port "br0"
                  Interface "br0"
                      type: internal
              Port "enp6s0f0"
                  Interface "enp6s0f0"
              Port "vxlan0"
                  Interface "vxlan0"
                      type: vxlan
                      options: {key="1", local_ip="10.0.0.5", remote_ip="10.0.0.4"}
          ovs_version: "2.5.0"
      ========
      (where 10.0.0.5 is an address on enp6s0f1)
      and sending traffic across it will lead to the following panic:
      ========
      general protection fault: 0000 [#1] SMP PTI
      CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc3-ehc+ #701
      Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 6.4.0 07/23/2013
      RIP: 0010:dev_hard_start_xmit+0x38/0x200
      Code: 53 48 89 fb 48 83 ec 20 48 85 ff 48 89 54 24 08 48 89 4c 24 18 0f 84 ab 01 00 00 48 8d 86 90 00 00 00 48 89 f5 48 89 44 24 10 <4c> 8b 33 48 c7 03 00 00 00 00 48 8b 05 c7 d1 b3 00 4d 85 f6 0f 95
      RSP: 0018:ffff888627b437e0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff88862279c000
      RDX: ffff888614a342c0 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff888618a88000 R08: 0000000000000001 R09: 00000000000003e8
      R10: 0000000000000000 R11: ffff888614a34140 R12: 0000000000000000
      R13: 0000000000000062 R14: dead000000000100 R15: ffff888616430000
      FS:  0000000000000000(0000) GS:ffff888627b40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f6d2bc6d000 CR3: 000000000200a000 CR4: 00000000000006e0
      Call Trace:
       <IRQ>
       __dev_queue_xmit+0x623/0x870
       ? masked_flow_lookup+0xf7/0x220 [openvswitch]
       ? ep_poll_callback+0x101/0x310
       do_execute_actions+0xaba/0xaf0 [openvswitch]
       ? __wake_up_common+0x8a/0x150
       ? __wake_up_common_lock+0x87/0xc0
       ? queue_userspace_packet+0x31c/0x5b0 [openvswitch]
       ovs_execute_actions+0x47/0x120 [openvswitch]
       ovs_dp_process_packet+0x7d/0x110 [openvswitch]
       ovs_vport_receive+0x6e/0xd0 [openvswitch]
       ? dst_alloc+0x64/0x90
       ? rt_dst_alloc+0x50/0xd0
       ? ip_route_input_slow+0x19a/0x9a0
       ? __udp_enqueue_schedule_skb+0x198/0x1b0
       ? __udp4_lib_rcv+0x856/0xa30
       ? __udp4_lib_rcv+0x856/0xa30
       ? cpumask_next_and+0x19/0x20
       ? find_busiest_group+0x12d/0xcd0
       netdev_frame_hook+0xce/0x150 [openvswitch]
       __netif_receive_skb_core+0x205/0xae0
       __netif_receive_skb_list_core+0x11e/0x220
       netif_receive_skb_list+0x203/0x460
       ? __efx_rx_packet+0x335/0x5e0 [sfc]
       efx_poll+0x182/0x320 [sfc]
       net_rx_action+0x294/0x3c0
       __do_softirq+0xca/0x297
       irq_exit+0xa6/0xb0
       do_IRQ+0x54/0xd0
       common_interrupt+0xf/0xf
       </IRQ>
      ========
      So, in all listified-receive handling, instead pull skbs off the lists with
       skb_list_del_init().
      
      Fixes: 9af86f93 ("net: core: fix use-after-free in __netif_receive_skb_list_core")
      Fixes: 7da517a3 ("net: core: Another step of skb receive list processing")
      Fixes: a4ca8b7d ("net: ipv4: fix drop handling in ip_list_rcv() and ip_list_rcv_finish()")
      Fixes: d8269e2c ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
      Signed-off-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22f6bbb7
  9. 08 11月, 2018 2 次提交
  10. 20 9月, 2018 1 次提交
  11. 06 7月, 2018 1 次提交
  12. 18 4月, 2018 1 次提交
  13. 18 4月, 2017 1 次提交
  14. 25 3月, 2017 1 次提交
    • S
      net: Add sysctl to toggle early demux for tcp and udp · dddb64bc
      subashab@codeaurora.org 提交于
      Certain system process significant unconnected UDP workload.
      It would be preferrable to disable UDP early demux for those systems
      and enable it for TCP only.
      
      By disabling UDP demux, we see these slight gains on an ARM64 system-
      782 -> 788Mbps unconnected single stream UDPv4
      633 -> 654Mbps unconnected UDPv4 different sources
      
      The performance impact can change based on CPU architecure and cache
      sizes. There will not much difference seen if entire UDP hash table
      is in cache.
      
      Both sysctls are enabled by default to preserve existing behavior.
      
      v1->v2: Change function pointer instead of adding conditional as
      suggested by Stephen.
      
      v2->v3: Read once in callers to avoid issues due to compiler
      optimizations. Also update commit message with the tests.
      
      v3->v4: Store and use read once result instead of querying pointer
      again incorrectly.
      
      v4->v5: Refactor to avoid errors due to compilation with IPV6={m,n}
      Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org>
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dddb64bc
  15. 08 6月, 2016 1 次提交
    • D
      net: vrf: ipv6 support for local traffic to local addresses · b4869aa2
      David Ahern 提交于
      Add support for locally originated traffic to VRF-local IPv6 addresses.
      Similar to IPv4 a local dst is set on the skb and the packet is
      reinserted with a call to netif_rx. With this patch, ping, tcp and udp
      packets to a local IPv6 address are successfully routed:
      
          $ ip addr show dev eth1
          4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master red state UP group default qlen 1000
              link/ether 02:e0:f9:1c:b9:74 brd ff:ff:ff:ff:ff:ff
              inet 10.100.1.1/24 brd 10.100.1.255 scope global eth1
                 valid_lft forever preferred_lft forever
              inet6 2100:1::1/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe1c:b974/64 scope link
                 valid_lft forever preferred_lft forever
      
          $ ping6 -c1 -I red 2100:1::1
          ping6: Warning: source address might be selected on device other than red.
          PING 2100:1::1(2100:1::1) from 2100:1::1 red: 56 data bytes
          64 bytes from 2100:1::1: icmp_seq=1 ttl=64 time=0.098 ms
      
      ip6_input is exported so the VRF driver can use it for the dst input
      function. The dst_alloc function for IPv4 defaults to setting the input and
      output functions; IPv6's does not. VRF does not need to duplicate the Rx path
      so just export the ipv6 input function.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4869aa2
  16. 21 5月, 2016 2 次提交
    • T
      ipv6: Change "final" protocol processing for encapsulation · 1da44f9c
      Tom Herbert 提交于
      When performing foo-over-UDP, UDP packets are processed by the
      encapsulation handler which returns another protocol to process.
      This may result in processing two (or more) protocols in the
      loop that are marked as INET6_PROTO_FINAL. The actions taken
      for hitting a final protocol, in particular the skb_postpull_rcsum
      can only be performed once.
      
      This patch set adds a check of a final protocol has been seen. The
      rules are:
        - If the final protocol has not been seen any protocol is processed
          (final and non-final). In the case of a final protocol, the final
          actions are taken (like the skb_postpull_rcsum)
        - If a final protocol has been seen (e.g. an encapsulating UDP
          header) then no further non-final protocols are allowed
          (e.g. extension headers). For more final protocols the
          final actions are not taken (e.g. skb_postpull_rcsum).
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1da44f9c
    • T
      ipv6: Fix nexthdr for reinjection · 4c64242a
      Tom Herbert 提交于
      In ip6_input_finish the nexthdr protocol is retrieved from the
      next header offset that is returned in the cb of the skb.
      This method does not work for UDP encapsulation that may not
      even have a concept of a nexthdr field (e.g. FOU).
      
      This patch checks for a final protocol (INET6_PROTO_FINAL) when a
      protocol handler returns > 0. If the protocol is not final then
      resubmission is performed on nhoff value. If the protocol is final
      then the nexthdr is taken to be the return value.
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c64242a
  17. 12 5月, 2016 1 次提交
    • D
      net: l3mdev: Add hook in ip and ipv6 · 74b20582
      David Ahern 提交于
      Currently the VRF driver uses the rx_handler to switch the skb device
      to the VRF device. Switching the dev prior to the ip / ipv6 layer
      means the VRF driver has to duplicate IP/IPv6 processing which adds
      overhead and makes features such as retaining the ingress device index
      more complicated than necessary.
      
      This patch moves the hook to the L3 layer just after the first NF_HOOK
      for PRE_ROUTING. This location makes exposing the original ingress device
      trivial (next patch) and allows adding other NF_HOOKs to the VRF driver
      in the future.
      
      dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
      with the switched device through the packet taps to maintain current
      behavior (tcpdump can be used on either the vrf device or the enslaved
      devices).
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74b20582
  18. 28 4月, 2016 2 次提交
  19. 17 2月, 2016 1 次提交
  20. 11 2月, 2016 1 次提交
  21. 18 9月, 2015 3 次提交
    • E
      netfilter: Pass net into okfn · 0c4b51f0
      Eric W. Biederman 提交于
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done it's job passing in the desired network namespace is in
      many cases a code simplification.
      
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c4b51f0
    • E
      netfilter: Pass struct net into the netfilter hooks · 29a26a56
      Eric W. Biederman 提交于
      Pass a network namespace parameter into the netfilter hooks.  At the
      call site of the netfilter hooks the path a packet is taking through
      the network stack is well known which allows the network namespace to
      be easily and reliabily.
      
      This allows the replacement of magic code like
      "dev_net(state->in?:state->out)" that appears at the start of most
      netfilter hooks with "state->net".
      
      In almost all cases the network namespace passed in is derived
      from the first network device passed in, guaranteeing those
      paths will not see any changes in practice.
      
      The exceptions are:
      xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
      ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
      ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
      ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
      ipv6/ip6_output.c:ip6_xmit()			sock_net(sk)
      ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
      ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
      br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
      
      In all cases these exceptions seem to be a better expression for the
      network namespace the packet is being processed in then the historic
      "dev_net(in?in:out)".  I am documenting them in case something odd
      pops up and someone starts trying to track down what happened.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29a26a56
    • E
      ipv6: Don't recompute net in ip6_rcv · 9865249f
      Eric W. Biederman 提交于
      Avoid silly redundant code
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9865249f
  22. 27 7月, 2015 1 次提交
    • W
      ipv6: fix crash over flow-based vxlan device · 48fb6b55
      Wei-Chun Chao 提交于
      Similar check was added in ip_rcv but not in ipv6_rcv.
      
      BUG: unable to handle kernel NULL pointer dereference at (null)
      IP: [<ffffffff81734e0a>] ipv6_rcv+0xfa/0x500
      Call Trace:
      [<ffffffff816c9786>] ? ip_rcv+0x296/0x400
      [<ffffffff817732d2>] ? packet_rcv+0x52/0x410
      [<ffffffff8168e99f>] __netif_receive_skb_core+0x63f/0x9a0
      [<ffffffffc02b34a0>] ? br_handle_frame_finish+0x580/0x580 [bridge]
      [<ffffffff8109912c>] ? update_rq_clock.part.81+0x1c/0x40
      [<ffffffff8168ed18>] __netif_receive_skb+0x18/0x60
      [<ffffffff8168fa1f>] process_backlog+0x9f/0x150
      
      Fixes: ee122c79 (vxlan: Flow based tunneling)
      Signed-off-by: NWei-Chun Chao <weichunc@plumgrid.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48fb6b55
  23. 04 7月, 2015 1 次提交
  24. 11 6月, 2015 1 次提交
  25. 09 6月, 2015 1 次提交
    • J
      ipv6: Fix protocol resubmission · 0243508e
      Josh Hunt 提交于
      UDP encapsulation is broken on IPv6. This is because the logic to resubmit
      the nexthdr is inverted, checking for a ret value > 0 instead of < 0. Also,
      the resubmit label is in the wrong position since we already get the
      nexthdr value when performing decapsulation. In addition the skb pull is no
      longer necessary either.
      
      This changes the return value check to look for < 0, using it for the
      nexthdr on the next iteration, and moves the resubmit label to the proper
      location.
      
      With these changes the v6 code now matches what we do in the v4 ip input
      code wrt resubmitting when decapsulating.
      Signed-off-by: NJosh Hunt <johunt@akamai.com>
      Acked-by: N"Tom Herbert" <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0243508e
  26. 08 4月, 2015 1 次提交
    • D
      netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      David Miller 提交于
      On the output paths in particular, we have to sometimes deal with two
      socket contexts.  First, and usually skb->sk, is the local socket that
      generated the frame.
      
      And second, is potentially the socket used to control a tunneling
      socket, such as one the encapsulates using UDP.
      
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7026b1dd
  27. 01 4月, 2015 1 次提交
  28. 24 11月, 2014 1 次提交
  29. 25 8月, 2014 1 次提交
    • I
      ipv6: White-space cleansing : Line Layouts · 67ba4152
      Ian Morris 提交于
      This patch makes no changes to the logic of the code but simply addresses
      coding style issues as detected by checkpatch.
      
      Both objdump and diff -w show no differences.
      
      A number of items are addressed in this patch:
      * Multiple spaces converted to tabs
      * Spaces before tabs removed.
      * Spaces in pointer typing cleansed (char *)foo etc.
      * Remove space after sizeof
      * Ensure spacing around comparators such as if statements.
      Signed-off-by: NIan Morris <ipm@chirality.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67ba4152
  30. 28 1月, 2014 1 次提交
    • H
      net: Fix memory leak if TPROXY used with TCP early demux · a452ce34
      Holger Eitzenberger 提交于
      I see a memory leak when using a transparent HTTP proxy using TPROXY
      together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):
      
      unreferenced object 0xffff88008cba4a40 (size 1696):
        comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
        hex dump (first 32 bytes):
          0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00  .. j@..7..2.....
          02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9
          [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5
          [<ffffffff812702cf>] sk_clone_lock+0x14/0x283
          [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b
          [<ffffffff8129a893>] netlink_broadcast+0x14/0x16
          [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3
          [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d
          [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0
          [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e
          [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55
          [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725
          [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154
          [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514
          [<ffffffff8127aa77>] process_backlog+0xee/0x1c5
          [<ffffffff8127c949>] net_rx_action+0xa7/0x200
          [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157
      
      But there are many more, resulting in the machine going OOM after some
      days.
      
      From looking at the TPROXY code, and with help from Florian, I see
      that the memory leak is introduced in tcp_v4_early_demux():
      
        void tcp_v4_early_demux(struct sk_buff *skb)
        {
          /* ... */
      
          iph = ip_hdr(skb);
          th = tcp_hdr(skb);
      
          if (th->doff < sizeof(struct tcphdr) / 4)
              return;
      
          sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
                             iph->saddr, th->source,
                             iph->daddr, ntohs(th->dest),
                             skb->skb_iif);
          if (sk) {
              skb->sk = sk;
      
      where the socket is assigned unconditionally to skb->sk, also bumping
      the refcnt on it.  This is problematic, because in our case the skb
      has already a socket assigned in the TPROXY target.  This then results
      in the leak I see.
      
      The very same issue seems to be with IPv6, but haven't tested.
      Reviewed-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NHolger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a452ce34
  31. 09 8月, 2013 1 次提交
    • E
      net: add SNMP counters tracking incoming ECN bits · 1f07d03e
      Eric Dumazet 提交于
      With GRO/LRO processing, there is a problem because Ip[6]InReceives SNMP
      counters do not count the number of frames, but number of aggregated
      segments.
      
      Its probably too late to change this now.
      
      This patch adds four new counters, tracking number of frames, regardless
      of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6.
      
      Ip[6]NoECTPkts : Number of packets received with NOECT
      Ip[6]ECT1Pkts  : Number of packets received with ECT(1)
      Ip[6]ECT0Pkts  : Number of packets received with ECT(0)
      Ip[6]CEPkts    : Number of packets received with Congestion Experienced
      
      lph37:~# nstat | egrep "Pkts|InReceive"
      IpInReceives                    1634137            0.0
      Ip6InReceives                   3714107            0.0
      Ip6InNoECTPkts                  19205              0.0
      Ip6InECT0Pkts                   52651828           0.0
      IpExtInNoECTPkts                33630              0.0
      IpExtInECT0Pkts                 15581379           0.0
      IpExtInCEPkts                   6                  0.0
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f07d03e
  32. 30 3月, 2013 1 次提交
  33. 09 3月, 2013 1 次提交
  34. 02 3月, 2013 1 次提交