1. 26 Sep, 2015: 2 commits
  2. 18 Sep, 2015: 7 commits
    • ipv6: ip6_fragment: fix headroom tests and skb leak · 1d325d21
      Committed by Florian Westphal
      David Woodhouse reports skb_under_panic when we try to push ethernet
      header to fragmented ipv6 skbs:
      
       skbuff: skb_under_panic: text:c1277f1e len:1294 put:14 head:dec98000
       data:dec97ffc tail:0xdec9850a end:0xdec98f40 dev:br-lan
      [..]
      ip6_finish_output2+0x196/0x4da
      
      David further debugged this:
        [..] offending fragments were arriving here with skb_headroom(skb)==10.
        Which is reasonable, being the Solos ADSL card's header of 8 bytes
        followed by 2 bytes of PPP frame type.
      
      The problem is that if netfilter ipv6 defragmentation is used, skb_cow()
      in ip6_forward will only see the reassembled skb.
      
      Therefore, headroom is overestimated by 8 bytes (we pulled the fragment
      header) and we don't check the skbs in the frag_list either.
      
      We can't do these checks in netfilter defrag since outdev isn't known yet.
      
      Furthermore, existing tests in ip6_fragment did not consider the fragment
      or ipv6 header size when checking headroom of the fraglist skbs.
      
      While at it, also fix an skb leak on memory allocation failure: ip6_fragment
      must consume the skb.
      
      I tested this with the e1000 driver hacked to not allocate additional headroom
      (we end up in the slow path, since LL_RESERVED_SPACE is 16).
      
      If 2 bytes of headroom are allocated, the fast path is taken (the 14-byte
      Ethernet header was pulled, so 16 bytes of headroom are available in all
      fragments).
      Reported-by: David Woodhouse <dwmw2@infradead.org>
      Diagnosed-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Tested-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
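The headroom rule this fix describes can be illustrated with a small user-space sketch. It assumes a reassembled packet whose head skb already holds the unfragmentable part (`hlen` bytes: IPv6 header plus any extension headers) and whose frag_list skbs hold payload only; `hroom` stands in for LL_RESERVED_SPACE. All names are hypothetical, not kernel API, and this is not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The IPv6 fragment header is 8 bytes long. */
#define FRAG_HDR_LEN 8u

/* Hypothetical check for the head skb: it already contains the IPv6
 * header, so it needs room only for the fragment header plus the
 * link-layer reserve. */
static bool head_headroom_ok(unsigned int headroom, unsigned int hroom)
{
	return headroom >= hroom + FRAG_HDR_LEN;
}

/* Hypothetical check for a frag_list skb: it carries payload only, so a
 * copy of the unfragmentable part (hlen) must fit in front of it too. */
static bool frag_headroom_ok(unsigned int headroom, unsigned int hlen,
			     unsigned int hroom)
{
	return headroom >= hlen + hroom + FRAG_HDR_LEN;
}

/* Fast path only if the head and every frag_list skb pass; otherwise the
 * caller falls back to the slow path, which copies data into newly
 * allocated skbs. */
static bool fast_path_ok(unsigned int head_headroom,
			 const unsigned int *frag_headroom, size_t nfrags,
			 unsigned int hlen, unsigned int hroom)
{
	if (!head_headroom_ok(head_headroom, hroom))
		return false;
	for (size_t i = 0; i < nfrags; i++) {
		if (!frag_headroom_ok(frag_headroom[i], hlen, hroom))
			return false;
	}
	return true;
}
```

With `hroom` = 16 (the LL_RESERVED_SPACE value from the e1000 test) and an assumed bare 40-byte IPv6 header, the head needs 24 bytes and each frag_list skb needs 64. A frag_list skb that is 8 bytes short (56), or one with only 10 bytes of headroom as in David's report, fails the check.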
    • netfilter: Add blank lines in callers of netfilter hooks · be10de0a
      Committed by Eric W. Biederman
      In code review it was noticed that I had failed to add some blank lines
      in places where they are customarily used.  Taking a second look at the
      code I have to agree blank lines would be nice so I have added them
      here.
      Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: Pass net into okfn · 0c4b51f0
      Committed by Eric W. Biederman
      This is immediately motivated by the bridge code that chains functions that
      call into netfilter.  Without passing net into the okfns the bridge code would
      need to guess about the best expression for the network namespace to process
      packets in.
      
      As net is frequently one of the first things computed in continuation functions
      after netfilter has done its job, passing in the desired network namespace is in
      many cases a code simplification.
      
      To support this change the function dst_output_okfn is introduced to
      simplify passing dst_output as an okfn.  For the moment dst_output_okfn
      just silently drops the struct net.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
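The okfn change and the dst_output_okfn adapter can be pictured with the user-space sketch below. The structs are minimal stand-ins, not the kernel definitions, and `demo_dst_output`, `demo_dst_output_okfn` and `demo_nf_hook` are hypothetical names; only the shape (net passed first, adapter silently dropping it) follows the commit text.

```c
#include <assert.h>

/* Minimal stand-ins for kernel types; illustrative only. */
struct net { int id; };
struct sock { int id; };
struct sk_buff { int len; };

/* Continuation ("okfn") signature after this change: net comes first. */
typedef int (*okfn_t)(struct net *net, struct sock *sk, struct sk_buff *skb);

/* Stand-in for an output routine that has no use for the namespace yet. */
static int demo_dst_output(struct sock *sk, struct sk_buff *skb)
{
	(void)sk;
	return skb->len;
}

/* Adapter in the spirit of dst_output_okfn: accepts net, drops it. */
static int demo_dst_output_okfn(struct net *net, struct sock *sk,
				struct sk_buff *skb)
{
	(void)net;
	return demo_dst_output(sk, skb);
}

/* Hypothetical hook runner: once the hook accepts the packet, the
 * continuation receives the namespace the caller already knows. */
static int demo_nf_hook(struct net *net, struct sock *sk,
			struct sk_buff *skb, okfn_t okfn)
{
	return okfn(net, sk, skb);
}
```

The adapter keeps every existing output routine usable as an okfn while the namespace argument is threaded through the callers.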
    • netfilter: Pass struct net into the netfilter hooks · 29a26a56
      Committed by Eric W. Biederman
      Pass a network namespace parameter into the netfilter hooks.  At the
      call site of the netfilter hooks the path a packet is taking through
      the network stack is well known, which allows the network namespace to
      be determined easily and reliably.
      
      This allows the replacement of magic code like
      "dev_net(state->in?:state->out)" that appears at the start of most
      netfilter hooks with "state->net".
      
      In almost all cases the network namespace passed in is derived
      from the first network device passed in, guaranteeing those
      paths will not see any changes in practice.
      
      The exceptions are:
      xfrm/xfrm_output.c:xfrm_output_resume()         xs_net(skb_dst(skb)->xfrm)
      ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont()      ip_vs_conn_net(cp)
      ipvs/ip_vs_xmit.c:ip_vs_send_or_cont()          ip_vs_conn_net(cp)
      ipv4/raw.c:raw_send_hdrinc()                    sock_net(sk)
      ipv6/ip6_output.c:ip6_xmit()                    sock_net(sk)
      ipv6/ndisc.c:ndisc_send_skb()                   dev_net(skb->dev) not dev_net(dst->dev)
      ipv6/raw.c:raw6_send_hdrinc()                   sock_net(sk)
      br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev
      
      In all cases these exceptions seem to be a better expression for the
      network namespace the packet is being processed in than the historic
      "dev_net(in?in:out)".  I am documenting them in case something odd
      pops up and someone starts trying to track down what happened.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
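The replacement of "dev_net(state->in?:state->out)" with "state->net" can be sketched as follows. The structs are hypothetical stand-ins (the kernel's real nf_hook_state and net_device differ), and the GNU `?:` shorthand is spelled out as a standard conditional.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins, not the kernel definitions. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct hook_state {
	struct net_device *in;
	struct net_device *out;
	struct net *net;   /* filled in by the caller of the hook */
};

/* The historic derivation: namespace of the input device, else output. */
static struct net *legacy_hook_net(const struct hook_state *state)
{
	return state->in != NULL ? state->in->nd_net : state->out->nd_net;
}

/* After the change, a hook simply reads the namespace it was given. */
static struct net *hook_net(const struct hook_state *state)
{
	return state->net;
}
```

For the common case where the caller derives `net` from the first device, both functions agree; the exceptions listed in the commit are the call sites where they may differ.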
    • ipv6: Cache net in ip6_output · 19a0644c
      Committed by Eric W. Biederman
      Keep net in a local variable so I can use it in NF_HOOK_COND
      when I pass struct net to all of the netfilter hooks.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Merge dst_output and dst_output_sk · 5a70649e
      Committed by Eric W. Biederman
      Add a sock parameter to dst_output, making dst_output_sk superfluous.
      Add a skb->sk parameter to all of the callers of dst_output.
      Have the callers of dst_output_sk call dst_output.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 01 Aug, 2015: 2 commits
  4. 21 Jul, 2015: 1 commit
    • net/ipv6: update flowi6_oif in ip6_dst_lookup_flow if not set · a0a9f33b
      Committed by Phil Sutter
      Newly created flows don't have flowi6_oif set (at least if the
      associated socket is not interface-bound). This leads to a mismatch in
      __xfrm6_selector_match() for policies which specify an interface in the
      selector (sel->ifindex != 0).
      
      Backtracing shows this happens in code-paths originating from e.g.
      ip6_datagram_connect(), rawv6_sendmsg() or tcp_v6_connect(). (UDP was
      not tested for.)
      
      In summary, this patch fixes policy matching on outgoing interface for
      locally generated packets.
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
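The mismatch and its fix can be modelled in a few lines. This is a hedged sketch: `demo_flow` and `demo_selector` are hypothetical stand-ins for the flowi6_oif and selector ifindex fields, and `fill_oif_if_unset` only illustrates the idea named in the commit title, not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the two fields that matter here; illustrative only. */
struct demo_flow { int oif; };           /* like flowi6_oif */
struct demo_selector { int ifindex; };   /* like the selector's ifindex */

/* Interface part of the selector match: a selector ifindex of 0 matches
 * any interface, otherwise it must equal the flow's output interface. */
static bool selector_oif_match(const struct demo_selector *sel,
			       const struct demo_flow *fl)
{
	return sel->ifindex == 0 || sel->ifindex == fl->oif;
}

/* The idea of the fix: after the route lookup, record the output device's
 * ifindex in the flow if the caller left it unset (0). */
static void fill_oif_if_unset(struct demo_flow *fl, int dst_dev_ifindex)
{
	if (fl->oif == 0)
		fl->oif = dst_dev_ifindex;
}
```

A flow from an unbound socket starts with `oif == 0` and fails an interface-specific selector; once the output device's index is filled in, it matches. An explicitly bound flow keeps its own index.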
  5. 26 May, 2015: 4 commits
    • ipv6: don't increase size when refragmenting forwarded ipv6 skbs · 485fca66
      Committed by Florian Westphal
      Since commit 6aafeef0 ("netfilter: push reasm skb through instead of
      original frag skbs") we sometimes end up re-fragmenting skbs
      that we've reassembled.
      
      IPv6 defrag preserves the original skbs using the skb frag list, i.e. as long
      as the skb frag list is preserved there is no problem since we keep the
      original geometry of the fragments intact.
      
      However, in the rare case where the frag list is munged or skb
      is linearized, we might send larger fragments than what we originally
      received.
      
      A router in the path might then send packet-too-big errors even if the
      sender never sent fragments exceeding the reported MTU:
      
      mtu 1500 - 1500:1400 - 1400:1280 - 1280
           A         R1         R2        B
      
      1 - A sends to B, fragment size 1400
      2 - R2 sends pkttoobig error for 1280
      3 - A sends to B, fragment size 1280
      4 - R2 sends pkttoobig error for 1280 again because it sees fragments of size 1400.
      
      Make sure ip6_fragment always caps the MTU at the largest fragment size seen
      when a defragmented skb is forwarded.
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
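The capping rule can be stated as a tiny function. This sketch assumes a hypothetical `largest_seen` value (0 when the packet was never reassembled) standing in for the recorded maximum fragment size; how the kernel handles a largest fragment that exceeds the path MTU is not modelled here, and that case simply keeps the path MTU.

```c
#include <assert.h>

/* Hypothetical helper: choose the fragment size limit for a forwarded,
 * previously reassembled packet. largest_seen is the largest fragment
 * observed on input, or 0 if the packet was never reassembled. */
static unsigned int refrag_mtu(unsigned int path_mtu, unsigned int largest_seen)
{
	if (largest_seen != 0 && largest_seen < path_mtu)
		return largest_seen;   /* never emit bigger fragments than received */
	return path_mtu;
}
```

In the diagram, R1 has an outgoing MTU of 1400 but receives 1280-byte fragments from A in step 3, so it should refragment at 1280. That way R2 never sees 1400-byte fragments.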
    • ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST · 2647a9b0
      Committed by Martin KaFai Lau
      When creating a RTF_CACHE route, RTF_ANYCAST is set based on rt6i_dst.
      Also, rt6i_gateway is always set to the nexthop while the nexthop
      could be a gateway or the rt6i_dst.addr.
      
      After removing the rt6i_dst and rt6i_src dependency in the last patch,
      we also need to stop the caller from depending on rt6i_gateway and
      RTF_ANYCAST.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Remove external dependency on rt6i_dst and rt6i_src · fd0273d7
      Committed by Martin KaFai Lau
      This patch removes the assumptions that the returned rt is always
      a RTF_CACHE entry with the rt6i_dst and rt6i_src containing the
      destination and source address.  The dst and src can be recovered from
      the calling site.
      
      We may consider renaming (rt6i_dst, rt6i_src) to
      (rt6i_key_dst, rt6i_key_src) later.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Clean up ipv6_select_ident() and ip6_fragment() · 286c2349
      Committed by Martin KaFai Lau
      This patch changes the ipv6_select_ident() signature to return a
      fragment id instead of taking a whole frag_hdr as a param to
      only set the frag_hdr->identification.
      
      It also cleans up ip6_fragment() to obtain the fragment id at the
      beginning instead of using multiple "if" checks later to test whether
      the fragment id has been generated.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
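The "obtain the id first, then fill every header" structure can be sketched as below. `demo_select_ident` is a hypothetical stand-in that uses a plain counter, not the kernel's hashed selection. The fragment offset and M flag are deliberately left at 0 to keep the focus on the shared identification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified IPv6 fragment header layout (RFC 8200), host byte order. */
struct demo_frag_hdr {
	uint8_t  nexthdr;
	uint8_t  reserved;
	uint16_t frag_off;
	uint32_t identification;
};

/* Hypothetical stand-in for ipv6_select_ident(): it returns the id instead
 * of writing into a header passed by the caller. */
static uint32_t demo_select_ident(uint32_t *counter)
{
	return ++*counter;
}

/* The id is obtained once, before the loop, and copied into every
 * fragment header, so no per-fragment "already chosen?" test is needed. */
static void build_frag_hdrs(struct demo_frag_hdr *hdrs, size_t n,
			    uint32_t *counter)
{
	uint32_t frag_id = demo_select_ident(counter);

	for (size_t i = 0; i < n; i++) {
		hdrs[i].nexthdr = 17;   /* assume a UDP payload */
		hdrs[i].reserved = 0;
		hdrs[i].frag_off = 0;   /* offset/M flag omitted in this sketch */
		hdrs[i].identification = frag_id;
	}
}
```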
  6. 15 May, 2015: 1 commit
  7. 10 May, 2015: 1 commit
    • ipv6: Fixed source specific default route handling. · e16e888b
      Committed by Markus Stenberg
      If there are only IPv6 source specific default routes present, the
      host gets -ENETUNREACH on e.g. connect() because ip6_dst_lookup_tail
      calls ip6_route_output first, and given source address any, it fails,
      and ip6_route_get_saddr is never called.
      
      The change is to call ip6_route_get_saddr even if the initial
      ip6_route_output fails, and then to do ip6_route_output _again_ once
      the appropriate source address is available.
      
      Note that this is a '99% fix' to the problem; a correct fix would be to
      do route lookups only within addrconf.c when picking a source address,
      and never call ip6_route_output before source address has been
      populated.
      Signed-off-by: Markus Stenberg <markus.stenberg@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
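The reordering can be seen in a toy model. Addresses are plain integers (0 meaning "any"), and the table holds a single source-specific default route. The `toy_*` functions are hypothetical stand-ins for ip6_route_output and ip6_route_get_saddr, not their real behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy table: one default route that only applies to source address src. */
struct toy_route { uint32_t src; };

/* Toy route lookup: a source-specific route only matches its own source. */
static bool toy_route_output(const struct toy_route *rt, uint32_t saddr)
{
	return saddr == rt->src;
}

/* Toy source address selection: pick the address the route wants. */
static uint32_t toy_get_saddr(const struct toy_route *rt)
{
	return rt->src;
}

/* Old order: the first lookup with saddr "any" fails, so source
 * selection is never reached. */
static int lookup_old(const struct toy_route *rt, uint32_t *saddr)
{
	if (!toy_route_output(rt, *saddr))
		return -ENETUNREACH;
	if (*saddr == 0)
		*saddr = toy_get_saddr(rt);
	return 0;
}

/* Fixed order: pick a source even if the first lookup failed, then retry. */
static int lookup_fixed(const struct toy_route *rt, uint32_t *saddr)
{
	bool found = toy_route_output(rt, *saddr);

	if (*saddr == 0) {
		*saddr = toy_get_saddr(rt);
		if (!found)
			found = toy_route_output(rt, *saddr);
	}
	return found ? 0 : -ENETUNREACH;
}
```

With only a source-specific route, the old order reports -ENETUNREACH for a connect() from "any". The fixed order selects the source first and then succeeds on the second lookup.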
  8. 08 Apr, 2015: 1 commit
    • netfilter: Pass socket pointer down through okfn(). · 7026b1dd
      Committed by David Miller
      On the output paths in particular, we have to sometimes deal with two
      socket contexts.  First, and usually skb->sk, is the local socket that
      generated the frame.
      
      And second, there is potentially the socket used to control a tunneling
      socket, such as one that encapsulates using UDP.
      
      We do not want to disassociate skb->sk when encapsulating in order
      to fix this, because that would break socket memory accounting.
      
      The most extreme case where this can cause huge problems is an
      AF_PACKET socket transmitting over a vxlan device.  We hit code
      paths doing checks that assume they are dealing with an ipv4
      socket, but are actually operating upon the AF_PACKET one.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 07 Apr, 2015: 1 commit
    • ipv6: protect skb->sk accesses from recursive dereference inside the stack · f60e5990
      Committed by hannes@stressinduktion.org
      We should not consult skb->sk for output decisions in xmit recursion
      levels > 0 in the stack. Otherwise local socket settings could influence
      the result of e.g. tunnel encapsulation process.
      
      ipv6 does not conform with this in three places:
      
      1) ip6_fragment: we do consult ipv6_npinfo for frag_size
      
      2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
         loop the packet back to the local socket
      
      3) ip6_skb_dst_mtu could query the settings from the user socket and
         force a wrong MTU
      
      Furthermore:
      In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
      PF_PACKET socket on top of an IPv6-backed vxlan device.
      
      Reuse xmit_recursion as we are currently only interested in protecting
      tunnel devices.
      
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
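The first of the three cases (ip6_fragment consulting a per-socket frag_size) can be sketched as follows. It assumes a hypothetical `xmit_recursion_level` argument in place of the kernel's recursion counter and a stand-in `demo_sock`. The point is only that at nesting level > 0 the socket is ignored, as the commit requires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the per-socket setting consulted by ip6_fragment(). */
struct demo_sock { unsigned int frag_size; };   /* 0 = not set */

/* Hypothetical: trust skb->sk only at the outermost transmit level; when
 * nested inside another transmit (e.g. a tunnel), act as if there were
 * no socket. */
static const struct demo_sock *sk_for_output(const struct demo_sock *sk,
					     int xmit_recursion_level)
{
	return xmit_recursion_level == 0 ? sk : NULL;
}

/* Fragment size: a smaller, non-zero per-socket frag_size overrides the MTU. */
static unsigned int demo_frag_mtu(unsigned int mtu, const struct demo_sock *sk,
				  int xmit_recursion_level)
{
	const struct demo_sock *np = sk_for_output(sk, xmit_recursion_level);

	if (np != NULL && np->frag_size != 0 && np->frag_size < mtu)
		return np->frag_size;
	return mtu;
}
```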
  10. 01 Apr, 2015: 2 commits
  11. 26 Mar, 2015: 1 commit
  12. 12 Mar, 2015: 1 commit
  13. 03 Mar, 2015: 1 commit
    • udp: only allow UFO for packets from SOCK_DGRAM sockets · acf8dd0a
      Committed by Michal Kubeček
      If an over-MTU UDP datagram is sent through a SOCK_RAW socket to a
      UFO-capable device, ip_ufo_append_data() sets skb->ip_summed to
      CHECKSUM_PARTIAL unconditionally as all GSO code assumes transport layer
      checksum is to be computed on segmentation. However, in this case,
      skb->csum_start and skb->csum_offset are never set as raw socket
      transmit path bypasses udp_send_skb() where they are usually set. As a
      result, driver may access invalid memory when trying to calculate the
      checksum and store the result (as observed in virtio_net driver).
      
      Moreover, the very idea of modifying the userspace provided UDP header
      is IMHO against raw socket semantics (I wasn't able to find a document
      clearly stating this or the opposite, though). And while allowing
      CHECKSUM_NONE in the UFO case would be more efficient, it would be a bit
      too intrusive a change just to handle a corner case like this. Therefore
      disallowing UFO for packets not coming from SOCK_DGRAM sockets seems to be the best option.
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
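The effect of the fix can be summarised as a predicate. This is a hedged sketch: `use_ufo` is a hypothetical function modelled on the conditions the commit names (oversized UDP data, a UFO-capable device, and now a SOCK_DGRAM socket), not the kernel's exact expression. A raw socket opened with protocol IPPROTO_UDP, the case that triggered the bug, now takes the non-UFO path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <netinet/in.h>   /* IPPROTO_UDP */
#include <sys/socket.h>   /* SOCK_DGRAM, SOCK_RAW */

/* Hypothetical predicate: take the UFO path only for over-MTU UDP data
 * from a datagram socket on a UFO-capable device. */
static bool use_ufo(int sk_type, int sk_protocol, bool dev_has_ufo,
		    size_t length, size_t mtu)
{
	return length > mtu &&
	       sk_protocol == IPPROTO_UDP &&
	       dev_has_ufo &&
	       sk_type == SOCK_DGRAM;
}
```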
  14. 12 Feb, 2015: 1 commit
  15. 04 Feb, 2015: 1 commit
    • ipv6: Select fragment id during UFO segmentation if not set. · 0508c07f
      Committed by Vlad Yasevich
      If the IPv6 fragment id has not been set and we perform
      fragmentation due to UFO, select a new fragment id.
      We now consider a fragment id of 0 as unset and if the id selection
      process returns 0 (after all the perturbations), we set it to
      0x80000000, thus giving us ample space not to create collisions
      with the next packet we may have to fragment.
      
      When doing UFO integrity checking, we also select the
      fragment id if it has not been set yet.   This is stored into
      the skb_shinfo() thus allowing UFO to function correctly.
      
      This patch also removes duplicate fragment id generation code
      and moves ipv6_select_ident() into the header as it may be
      used during GSO.
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
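The "0 means unset" convention and the 0x80000000 remapping can be captured in two small functions. `hash` is a hypothetical stand-in for the output of the real hashed and perturbed selection, and `stored_id` for the value kept in skb_shinfo(); neither function is kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* A fragment id of 0 means "not selected yet", so a freshly computed id
 * that happens to be 0 is remapped to 0x80000000. */
static uint32_t finalize_frag_id(uint32_t id)
{
	return id != 0 ? id : 0x80000000u;
}

/* Hypothetical UFO helper: reuse an id already stored with the packet,
 * otherwise derive one from `hash`, a stand-in for the real selection. */
static uint32_t ensure_frag_id(uint32_t stored_id, uint32_t hash)
{
	if (stored_id != 0)
		return stored_id;
	return finalize_frag_id(hash);
}
```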
  16. 03 Feb, 2015: 4 commits
  17. 24 Nov, 2014: 1 commit
  18. 07 Nov, 2014: 1 commit
  19. 16 Sep, 2014: 1 commit
  20. 10 Sep, 2014: 1 commit
  21. 25 Aug, 2014: 2 commits
  22. 06 Aug, 2014: 1 commit
    • net-timestamp: add key to disambiguate concurrent datagrams · 09c2d251
      Committed by Willem de Bruijn
      Datagrams timestamped on transmission can coexist in the kernel stack
      and be reordered in packet scheduling. When reading looped datagrams
      from the socket error queue it is not always possible to uniquely
      correlate looped data with the original send() call (for application
      level retransmits). Even if possible, it may be expensive and complex,
      requiring packet inspection.
      
      Introduce a data-independent ID mechanism to associate timestamps with
      send calls. Pass an ID alongside the timestamp in field ee_data of
      sock_extended_err.
      
      The ID is a simple 32 bit unsigned int that is associated with the
      socket and incremented on each send() call for which software tx
      timestamp generation is enabled.
      
      The feature is enabled only if SOF_TIMESTAMPING_OPT_ID is set, to
      avoid changing ee_data for existing applications that expect it to be 0.
      The counter is reset each time the flag is reenabled. Reenabling
      does not change the ID of already submitted data. It is possible
      to receive out of order IDs if the timestamp stream is not quiesced
      first.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
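The per-socket ID counter described in this commit can be modelled as below. The struct and function names are hypothetical. Two details are assumptions, not statements about the kernel: the first ID after enabling is 0, and sends without OPT_ID neither consume an ID nor report anything but 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the per-socket state; not the kernel's struct sock. */
struct demo_ts_sock {
	bool opt_id;      /* like SOF_TIMESTAMPING_OPT_ID */
	bool sw_tx;       /* software tx timestamping requested */
	uint32_t tskey;   /* next ID to hand out */
};

/* Setting the flag when it was previously clear resets the counter;
 * IDs already handed out are unaffected. */
static void demo_set_opt_id(struct demo_ts_sock *s, bool on)
{
	if (on && !s->opt_id)
		s->tskey = 0;
	s->opt_id = on;
}

/* Called per send(): returns the value later reported in ee_data. Without
 * OPT_ID (or without software tx timestamps) it stays 0, as existing
 * applications expect. */
static uint32_t demo_on_send(struct demo_ts_sock *s)
{
	if (!s->opt_id || !s->sw_tx)
		return 0;
	return s->tskey++;
}
```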
  23. 29 Jul, 2014: 1 commit
    • ip: make IP identifiers less predictable · 04ca6973
      Committed by Eric Dumazet
      In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
      Jedidiah describe ways of exploiting Linux IP identifier generation to
      infer whether two machines are exchanging packets.
      
      With commit 73f156a6 ("inetpeer: get rid of ip_id_count"), we
      changed IP id generation, but this does not really prevent this
      side-channel technique.
      
      This patch adds a random amount of perturbation so that IP identifiers
      for a given destination [1] are no longer monotonically increasing after
      an idle period.
      
      Note that prandom_u32_max(1) returns 0, so if the generator is used at most
      once per jiffy, this patch inserts no hole in the ID sequence and does not
      increase collision probability.
      
      This is jiffies based, so in the worst case (HZ=1000), the id can
      rollover after ~65 seconds of idle time, which should be fine.
      
      We also change the hash used in __ip_select_ident() to not only hash
      on daddr, but also saddr and protocol, so that ICMP probes can not be
      used to infer information for other protocols.
      
      For IPv6, this patch adds saddr to the hash as well, but not nexthdr.
      
      If I ping the patched target, we can see the IDs are now hard to predict.
      
      21:57:11.008086 IP (...)
          A > target: ICMP echo request, seq 1, length 64
      21:57:11.010752 IP (... id 2081 ...)
          target > A: ICMP echo reply, seq 1, length 64
      
      21:57:12.013133 IP (...)
          A > target: ICMP echo request, seq 2, length 64
      21:57:12.015737 IP (... id 3039 ...)
          target > A: ICMP echo reply, seq 2, length 64
      
      21:57:13.016580 IP (...)
          A > target: ICMP echo request, seq 3, length 64
      21:57:13.019251 IP (... id 3437 ...)
          target > A: ICMP echo reply, seq 3, length 64
      
      [1] TCP sessions use a per-flow ID generator not changed by this patch.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu>
      Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Hannes Frederic Sowa <hannes@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
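The jiffies-based perturbation can be sketched for a single bucket. The hashing of daddr, saddr and protocol into a bucket and all atomics are omitted. A small xorshift generator stands in for prandom_u32(), with a multiply-shift bound in the style of prandom_u32_max(), so a bound of 1 always yields 0. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One bucket of the ID generator. */
struct id_bucket {
	uint32_t tstamp;  /* jiffies value at the last use */
	uint32_t next;    /* next identifier to hand out */
};

/* Small xorshift PRNG standing in for prandom_u32(); state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Value in [0, bound) via multiply-shift; bound == 1 always yields 0. */
static uint32_t rand_below(uint32_t *state, uint32_t bound)
{
	return (uint32_t)(((uint64_t)xorshift32(state) * bound) >> 32);
}

/* Reserve `segs` consecutive IDs. If time has advanced since the bucket
 * was last used, first skip a random number of IDs below the number of
 * elapsed jiffies. */
static uint32_t reserve_ids(struct id_bucket *b, uint32_t now, uint32_t segs,
			    uint32_t *rng)
{
	uint32_t delta = 0;

	if (b->tstamp != now) {
		delta = rand_below(rng, now - b->tstamp);
		b->tstamp = now;
	}
	b->next += delta + segs;
	return b->next - segs;
}
```

Within one jiffy, or when used once per jiffy, the IDs stay consecutive. After an idle gap of 1000 jiffies, the next ID lands somewhere in a 1000-wide window instead of directly after the previous one.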
  24. 25 Jul, 2014: 1 commit