1. 03 Dec, 2016 36 commits
  2. 02 Dec, 2016 4 commits
    • S
      sock: reset sk_err for ICMP packets read from error queue · 83a1a1a7
      Soheil Hassas Yeganeh committed
      sk_err is set only when ICMP packets are enqueued onto the error
      queue. Before f5f99309 (sock: do not set sk_err in
      sock_dequeue_err_skb), a subsequent error queue read would set
      sk_err to the next error on the queue, or to 0 if the queue was
      empty. As no error types other than ICMP set this field, sk_err
      should not be modified upon dequeuing them.
      
      Only for ICMP errors, reset the (racy) sk_err. Some applications,
      like traceroute, rely on it and go into a futile busy POLLERR
      loop otherwise.
      
      In principle, sk_err has to be set while an ICMP error is queued.
      Testing is_icmp_err_skb(skb_next) approximates this without
      requiring a full queue walk. Applications that receive both ICMP
      and other errors cannot rely on this legacy behavior, as other
      errors do not set sk_err in the first place.
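
The dequeue rule described above can be modeled in userspace roughly as follows. This is a minimal sketch under stated assumptions, not the kernel code: `struct model_sock`, `struct err_entry`, `model_enqueue()` and `model_dequeue()` are hypothetical names introduced for illustration, the queue is a small fixed-size ring, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's types. */
struct err_entry {
    bool is_icmp;   /* true for errors that originate from ICMP */
    int  err;       /* errno-style value carried by the entry */
};

#define ERRQ_CAP 8

struct model_sock {
    struct err_entry q[ERRQ_CAP];   /* ring buffer standing in for the error queue */
    size_t head, len;
    int sk_err;
};

/* Enqueue an error; only ICMP errors set sk_err. */
static void model_enqueue(struct model_sock *sk, struct err_entry e)
{
    assert(sk->len < ERRQ_CAP);
    sk->q[(sk->head + sk->len) % ERRQ_CAP] = e;
    sk->len++;
    if (e.is_icmp)
        sk->sk_err = e.err;
}

/* Dequeue the oldest error. Returns false if the queue is empty. */
static bool model_dequeue(struct model_sock *sk, struct err_entry *out)
{
    if (sk->len == 0)
        return false;

    *out = sk->q[sk->head];
    sk->head = (sk->head + 1) % ERRQ_CAP;
    sk->len--;

    /* Peek only at the next entry instead of walking the whole queue. */
    bool icmp_next = sk->len > 0 && sk->q[sk->head].is_icmp;

    if (icmp_next)
        sk->sk_err = sk->q[sk->head].err;   /* next ICMP error becomes pending */
    else if (out->is_icmp)
        sk->sk_err = 0;                     /* reset only after an ICMP error */
    /* A non-ICMP entry followed by no ICMP entry leaves sk_err untouched. */
    return true;
}
```

In this model, an application that reads ICMP errors one at a time sees sk_err drop to 0 once the last one is consumed, which is the behavior the message says traceroute depends on.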
      
      Fixes: f5f99309 (sock: do not set sk_err in sock_dequeue_err_skb)
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'lwt-bpf' · f577e22c
      David S. Miller committed
      Thomas Graf says:
      
      ====================
      bpf: BPF for lightweight tunnel encapsulation
      
      This series implements BPF program invocation from dst entries via the
      lightweight tunnels infrastructure. The BPF program can be attached to
      lwtunnel_input(), lwtunnel_output() or lwtunnel_xmit() and see an L3
      skb as context. Programs attached to input and output are read-only.
      Programs attached to lwtunnel_xmit() can modify packets, push headers
      and redirect packets.
      
      The facility can be used to:
       - Collect statistics and generate sampling data for a subset of traffic
         based on the dst utilized by the packet, which allows extending the
         existing realms.
       - Apply additional per route/dst filters to prohibit certain outgoing
         or incoming packets based on BPF filters. In particular, this allows
         maintaining per dst custom state across multiple packets in BPF maps
         and applying filters based on statistics and behaviour observed over
         time.
       - Attach L2 headers at transmit where resolving the L2 address
         is not required.
       - Possibly many more.
      
      v3 -> v4:
       - Bumped LWT_BPF_MAX_HEADROOM from 128 to 256 (Alexei)
       - Renamed bpf_skb_push() helper to bpf_skb_change_head() to relate to
         existing bpf_skb_change_tail() helper (Alexei/Daniel)
       - Added check in __bpf_redirect_common() to verify that program added a
         link header before redirecting to an L2 device. Adding the check to
         lwt-bpf code was considered but dropped because of the large amount
         of code required to retrieve the net_device via the per-cpu redirect
         buffer. A test case was added to cover the scenario when a program
         redirects to an L2 device without adding an appropriate L2 header.
         (Alexei)
       - Prohibited access to tc_classid (Daniel)
       - Collapsed bpf_verifier_ops instance for lwt in/out as they are
         identical (Daniel)
       - Some cosmetic changes
      
      v2 -> v3:
       - Added real world sample lwt_len_hist_kern.c which demonstrates how to
         collect a histogram on packet sizes for all packets flowing through
         a number of routes.
       - Restricted output to be read-only. Since the header can no longer
         be modified, the rerouting functionality has been removed again.
       - Added test cases which cover destructive modification of packet data.
      
      v1 -> v2:
       - Added new BPF_LWT_REROUTE return code for program to indicate
         that new route lookup should be performed. Suggested by Tom.
       - New sample to illustrate rerouting
       - New patch 05: Recursion limit for lwtunnel_output for the case
         when user creates circular dst redirection. Also resolves the
         issue for ILA.
       - Fix to ensure headroom for potential future L2 header is still
         guaranteed
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      bpf: Add tests and samples for LWT-BPF · f74599f7
      Thomas Graf committed
      Adds a series of tests to verify the functionality of attaching
      BPF programs at LWT hooks.
      
      Also adds a sample which collects a histogram of the sizes of packets
      that pass through an LWT hook.
      
      $ ./lwt_len_hist.sh
      Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.253.2 () port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    39857.69
             1 -> 1        : 0        |                                      |
             2 -> 3        : 0        |                                      |
             4 -> 7        : 0        |                                      |
             8 -> 15       : 0        |                                      |
            16 -> 31       : 0        |                                      |
            32 -> 63       : 22       |                                      |
            64 -> 127      : 98       |                                      |
           128 -> 255      : 213      |                                      |
           256 -> 511      : 1444251  |********                              |
           512 -> 1023     : 660610   |***                                   |
          1024 -> 2047     : 535241   |**                                    |
          2048 -> 4095     : 19       |                                      |
          4096 -> 8191     : 180      |                                      |
          8192 -> 16383    : 5578023  |************************************* |
         16384 -> 32767    : 632099   |***                                   |
         32768 -> 65535    : 6575     |                                      |
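
The rows in the output above are power-of-two buckets: a packet of length `len` lands in the row whose lower bound is the largest power of two not exceeding `len`. A minimal sketch of that bucketing in plain userspace C follows. The names `hist_slot()`, `slot_low()`, `slot_high()` and `hist_add()` are hypothetical and are not taken from the sample's source, and clamping oversized lengths into the last row is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define HIST_SLOTS 16   /* the output above shows 16 rows, covering 1..65535 */

/* Bucket index: floor(log2(len)) for len >= 1, clamped to the last slot. */
static unsigned int hist_slot(uint32_t len)
{
    unsigned int slot = 0;

    assert(len >= 1);
    while (len >>= 1)
        slot++;
    return slot < HIST_SLOTS ? slot : HIST_SLOTS - 1;
}

/* Inclusive bounds printed for a row, e.g. slot 8 covers 256 -> 511. */
static uint32_t slot_low(unsigned int slot)
{
    return (uint32_t)1 << slot;
}

static uint32_t slot_high(unsigned int slot)
{
    return ((uint32_t)1 << (slot + 1)) - 1;
}

/* Account one packet of the given length. */
static void hist_add(uint64_t hist[HIST_SLOTS], uint32_t len)
{
    hist[hist_slot(len)]++;
}
```

With this scheme a 300-byte packet is counted in the `256 -> 511` row and a 16384-byte packet in the `16384 -> 32767` row.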
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      bpf: BPF for lightweight tunnel infrastructure · 3a0af8fd
      Thomas Graf committed
      Registers new BPF program types which correspond to the LWT hooks:
        - BPF_PROG_TYPE_LWT_IN   => dst_input()
        - BPF_PROG_TYPE_LWT_OUT  => dst_output()
        - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit()
      
      The separate program types are required to differentiate between the
      capabilities each LWT hook allows:
      
       * Programs attached to dst_input() or dst_output() are restricted and
         may only read the data of an skb. This prevents modification and
         possible invalidation of already validated packet headers on receive
         and the construction of illegal headers while the IP headers are
         still being assembled.
      
       * Programs attached to lwtunnel_xmit() are allowed to modify packet
         content as well as to prepend an L2 header via a newly introduced
         helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is
         invoked after the IP header has been assembled completely.
      
      All BPF programs receive an skb with L3 headers attached and may return
      one of the following codes:
      
       BPF_OK - Continue routing as per nexthop
       BPF_DROP - Drop skb and return EPERM
       BPF_REDIRECT - Redirect skb to device as per redirect() helper.
                      (Only valid in lwtunnel_xmit() context)
      
      The return codes are binary compatible with their TC_ACT_
      relatives to ease compatibility.
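
The return-code handling described above can be sketched as a small dispatch function. This is an illustrative userspace model, not the code added by the patch: `enum lwt_hook`, `enum lwt_action`, `hook_may_write()` and `lwt_handle_ret()` are hypothetical names, the numeric constants are redefined locally with the values used in the kernel UAPI headers, and returning -EINVAL for an out-of-context redirect or an unknown code is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Redefined locally so the sketch is self-contained. */
enum { BPF_OK = 0, BPF_DROP = 2, BPF_REDIRECT = 7 };
enum { TC_ACT_OK = 0, TC_ACT_SHOT = 2, TC_ACT_REDIRECT = 7 };

/* The "binary compatible" property, checked at compile time. */
_Static_assert(BPF_OK == TC_ACT_OK, "OK codes differ");
_Static_assert(BPF_DROP == TC_ACT_SHOT, "drop codes differ");
_Static_assert(BPF_REDIRECT == TC_ACT_REDIRECT, "redirect codes differ");

/* Hypothetical names for the three hooks and the resulting actions. */
enum lwt_hook   { LWT_HOOK_IN, LWT_HOOK_OUT, LWT_HOOK_XMIT };
enum lwt_action { LWT_CONTINUE, LWT_DROPPED, LWT_REDIRECTED, LWT_INVALID };

/* Only programs on the xmit hook may modify the packet. */
static bool hook_may_write(enum lwt_hook hook)
{
    return hook == LWT_HOOK_XMIT;
}

/* Map a program's return code to an action; *err receives 0 or -errno. */
static enum lwt_action lwt_handle_ret(enum lwt_hook hook, int ret, int *err)
{
    *err = 0;
    switch (ret) {
    case BPF_OK:
        return LWT_CONTINUE;        /* continue routing as per nexthop */
    case BPF_DROP:
        *err = -EPERM;              /* drop skb and report EPERM */
        return LWT_DROPPED;
    case BPF_REDIRECT:
        if (hook == LWT_HOOK_XMIT)
            return LWT_REDIRECTED;  /* only valid in xmit context */
        *err = -EINVAL;             /* assumption: reject elsewhere */
        return LWT_INVALID;
    default:
        *err = -EINVAL;             /* assumption: unknown code is an error */
        return LWT_INVALID;
    }
}
```

Keeping the values identical to the TC_ACT_ constants means a program written against one set of names produces the same numeric result under the other.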
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>