1. 04 12月, 2016 1 次提交
  2. 03 12月, 2016 2 次提交
  3. 02 12月, 2016 1 次提交
    • T
      bpf: BPF for lightweight tunnel infrastructure · 3a0af8fd
      Thomas Graf 提交于
      Registers new BPF program types which correspond to the LWT hooks:
        - BPF_PROG_TYPE_LWT_IN   => dst_input()
        - BPF_PROG_TYPE_LWT_OUT  => dst_output()
        - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit()
      
      The separate program types are required to differentiate between the
      capabilities each LWT hook allows:
      
       * Programs attached to dst_input() or dst_output() are restricted and
         may only read the data of an skb. This prevent modification and
         possible invalidation of already validated packet headers on receive
         and the construction of illegal headers while the IP headers are
         still being assembled.
      
       * Programs attached to lwtunnel_xmit() are allowed to modify packet
         content as well as prepending an L2 header via a newly introduced
         helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is
         invoked after the IP header has been assembled completely.
      
      All BPF programs receive an skb with L3 headers attached and may return
      one of the following error codes:
      
       BPF_OK - Continue routing as per nexthop
       BPF_DROP - Drop skb and return EPERM
       BPF_REDIRECT - Redirect skb to device as per redirect() helper.
                      (Only valid in lwtunnel_xmit() context)
      
      The return codes are binary compatible with their TC_ACT_
      relatives to ease compatibility.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a0af8fd
  4. 01 12月, 2016 1 次提交
  5. 30 11月, 2016 3 次提交
  6. 29 11月, 2016 1 次提交
  7. 26 11月, 2016 2 次提交
    • D
      bpf: add BPF_PROG_ATTACH and BPF_PROG_DETACH commands · f4324551
      Daniel Mack 提交于
      Extend the bpf(2) syscall by two new commands, BPF_PROG_ATTACH and
      BPF_PROG_DETACH which allow attaching and detaching eBPF programs
      to a target.
      
      On the API level, the target could be anything that has an fd in
      userspace, hence the name of the field in union bpf_attr is called
      'target_fd'.
      
      When called with BPF_ATTACH_TYPE_CGROUP_INET_{E,IN}GRESS, the target is
      expected to be a valid file descriptor of a cgroup v2 directory which
      has the bpf controller enabled. These are the only use-cases
      implemented by this patch at this point, but more can be added.
      
      If a program of the given type already exists in the given cgroup,
      the program is swapped automically, so userspace does not have to drop
      an existing program first before installing a new one, which would
      otherwise leave a gap in which no program is attached.
      
      For more information on the propagation logic to subcgroups, please
      refer to the bpf cgroup controller implementation.
      
      The API is guarded by CAP_NET_ADMIN.
      Signed-off-by: NDaniel Mack <daniel@zonque.org>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4324551
    • D
      bpf: add new prog type for cgroup socket filtering · 0e33661d
      Daniel Mack 提交于
      This program type is similar to BPF_PROG_TYPE_SOCKET_FILTER, except that
      it does not allow BPF_LD_[ABS|IND] instructions and hooks up the
      bpf_skb_load_bytes() helper.
      
      Programs of this type will be attached to cgroups for network filtering
      and accounting.
      Signed-off-by: NDaniel Mack <daniel@zonque.org>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e33661d
  8. 25 11月, 2016 1 次提交
  9. 22 11月, 2016 2 次提交
  10. 20 11月, 2016 1 次提交
  11. 19 11月, 2016 2 次提交
  12. 16 11月, 2016 2 次提交
  13. 14 11月, 2016 1 次提交
    • M
      Revert "include/uapi/linux/atm_zatm.h: include linux/time.h" · 7b5b74ef
      Mike Frysinger 提交于
      This reverts commit cf00713a ("include/uapi/linux/atm_zatm.h: include
      linux/time.h").
      
      This attempted to fix userspace breakage that no longer existed when
      the patch was merged.  Almost one year earlier, commit 70ba07b6
      ("atm: remove 'struct zatm_t_hist'") deleted the struct in question.
      
      After this patch was merged, we now have to deal with people being
      unable to include this header in conjunction with standard C library
      headers like stdlib.h (which linux-atm does).  Example breakage:
      x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../.. -I./../q2931 -I./../saal \
      	-I.  -DCPPFLAGS_TEST  -I../../src/include -O2 -march=native -pipe -g \
      	-frecord-gcc-switches -freport-bug -Wimplicit-function-declaration \
      	-Wnonnull -Wstrict-aliasing -Wparentheses -Warray-bounds \
      	-Wfree-nonheap-object -Wreturn-local-addr -fno-strict-aliasing -Wall \
      	-Wshadow -Wpointer-arith -Wwrite-strings -Wstrict-prototypes -c zntune.c
      In file included from /usr/include/linux/atm_zatm.h:17:0,
                       from zntune.c:17:
      /usr/include/linux/time.h:9:8: error: redefinition of ‘struct timespec’
       struct timespec {
              ^
      In file included from /usr/include/sys/select.h:43:0,
                       from /usr/include/sys/types.h:219,
                       from /usr/include/stdlib.h:314,
                       from zntune.c:9:
      /usr/include/time.h:120:8: note: originally defined here
       struct timespec
              ^
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      Acked-by: NMikko Rapeli <mikko.rapeli@iki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b5b74ef
  14. 13 11月, 2016 2 次提交
  15. 10 11月, 2016 7 次提交
  16. 05 11月, 2016 1 次提交
  17. 04 11月, 2016 3 次提交
    • S
      net/sched: cls_flower: Support matching on SCTP ports · 5976c5f4
      Simon Horman 提交于
      Support matching on SCTP ports in the same way that matching
      on TCP and UDP ports is already supported.
      
      Example usage:
      
      tc qdisc add dev eth0 ingress
      
      tc filter add dev eth0 protocol ip parent ffff: \
              flower indev eth0 ip_proto sctp dst_port 80 \
              action drop
      Signed-off-by: NSimon Horman <simon.horman@netronome.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5976c5f4
    • W
      ipv6: add IPV6_RECVFRAGSIZE cmsg · 0cc0aa61
      Willem de Bruijn 提交于
      When reading a datagram or raw packet that arrived fragmented, expose
      the maximum fragment size if recorded to allow applications to
      estimate receive path MTU.
      
      At this point, the field is only recorded when ipv6 connection
      tracking is enabled. A follow-up patch will record this field also
      in the ipv6 input path.
      
      Tested using the test for IP_RECVFRAGSIZE plus
      
        ip netns exec to ip addr add dev veth1 fc07::1/64
        ip netns exec from ip addr add dev veth0 fc07::2/64
      
        ip netns exec to ./recv_cmsg_recvfragsize -6 -u -p 6000 &
        ip netns exec from nc -q 1 -u fc07::1 6000 < payload
      
      Both with and without enabling connection tracking
      
        ip6tables -A INPUT -m state --state NEW -p udp -j LOG
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0cc0aa61
    • W
      ipv4: add IP_RECVFRAGSIZE cmsg · 70ecc248
      Willem de Bruijn 提交于
      The IP stack records the largest fragment of a reassembled packet
      in IPCB(skb)->frag_max_size. When reading a datagram or raw packet
      that arrived fragmented, expose the value to allow applications to
      estimate receive path MTU.
      
      Tested:
        Sent data over a veth pair of which the source has a small mtu.
        Sent data using netcat, received using a dedicated process.
      
        Verified that the cmsg IP_RECVFRAGSIZE is returned only when
        data arrives fragmented, and in that cases matches the veth mtu.
      
          ip link add veth0 type veth peer name veth1
      
          ip netns add from
          ip netns add to
      
          ip link set dev veth1 netns to
          ip netns exec to ip addr add dev veth1 192.168.10.1/24
          ip netns exec to ip link set dev veth1 up
      
          ip link set dev veth0 netns from
          ip netns exec from ip addr add dev veth0 192.168.10.2/24
          ip netns exec from ip link set dev veth0 up
          ip netns exec from ip link set dev veth0 mtu 1300
          ip netns exec from ethtool -K veth0 ufo off
      
          dd if=/dev/zero bs=1 count=1400 2>/dev/null > payload
      
          ip netns exec to ./recv_cmsg_recvfragsize -4 -u -p 6000 &
          ip netns exec from nc -q 1 -u 192.168.10.1 6000 < payload
      
        using github.com/wdebruij/kerneltools/blob/master/tests/recvfragsize.c
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      70ecc248
  18. 03 11月, 2016 1 次提交
    • P
      netfilter: deprecate NF_STOP · 06fd3a39
      Pablo Neira Ayuso 提交于
      NF_STOP is only used by br_netfilter these days, and it can be emulated
      with a combination of NF_STOLEN plus explicit call to the ->okfn()
      function as Florian suggests.
      
      To retain binary compatibility with userspace nf_queue application, we
      have to keep NF_STOP around, so libnetfilter_queue userspace userspace
      applications still work if they use NF_STOP for some exotic reason.
      
      Out of tree modules using NF_STOP would break, but we don't care about
      those.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      06fd3a39
  19. 02 11月, 2016 2 次提交
    • A
      netfilter: nf_tables: introduce routing expression · 2fa84193
      Anders K. Pedersen 提交于
      Introduces an nftables rt expression for routing related data with support
      for nexthop (i.e. the directly connected IP address that an outgoing packet
      is sent to), which can be used either for matching or accounting, eg.
      
       # nft add rule filter postrouting \
      	ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop
      
      This will drop any traffic to 192.168.1.0/24 that is not routed via
      192.168.0.1.
      
       # nft add rule filter postrouting \
      	flow table acct { rt nexthop timeout 600s counter }
       # nft add rule ip6 filter postrouting \
      	flow table acct { rt nexthop timeout 600s counter }
      
      These rules count outgoing traffic per nexthop. Note that the timeout
      releases an entry if no traffic is seen for this nexthop within 10 minutes.
      
       # nft add rule inet filter postrouting \
      	ether type ip \
      	flow table acct { rt nexthop timeout 600s counter }
       # nft add rule inet filter postrouting \
      	ether type ip6 \
      	flow table acct { rt nexthop timeout 600s counter }
      
      Same as above, but via the inet family, where the ether type must be
      specified explicitly.
      
      "rt classid" is also implemented identical to "meta rtclassid", since it
      is more logical to have this match in the routing expression going forward.
      Signed-off-by: NAnders K. Pedersen <akp@cohaesio.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      2fa84193
    • F
      netfilter: nf_tables: add fib expression · f6d0cbcf
      Florian Westphal 提交于
      Add FIB expression, supported for ipv4, ipv6 and inet family (the latter
      just dispatches to ipv4 or ipv6 one based on nfproto).
      
      Currently supports fetching output interface index/name and the
      rtm_type associated with an address.
      
      This can be used for adding path filtering. rtm_type is useful
      to e.g. enforce a strong-end host model where packets
      are only accepted if daddr is configured on the interface the
      packet arrived on.
      
      The fib expression is a native nftables alternative to the
      xtables addrtype and rp_filter matches.
      
      FIB result order for oif/oifname retrieval is as follows:
       - if packet is local (skb has rtable, RTF_LOCAL set, this
         will also catch looped-back multicast packets), set oif to
         the loopback interface.
       - if fib lookup returns an error, or result points to local,
         store zero result.  This means '--local' option of -m rpfilter
         is not supported. It is possible to use 'fib type local' or add
         explicit saddr/daddr matching rules to create exceptions if this
         is really needed.
       - store result in the destination register.
         In case of multiple routes, search set for desired oif in case
         strict matching is requested.
      
      ipv4 and ipv6 behave fib expressions are supposed to behave the same.
      
      [ I have collapsed Arnd Bergmann's ("netfilter: nf_tables: fib warnings")
      
      	http://patchwork.ozlabs.org/patch/688615/
      
        to address fallout from this patch after rebasing nf-next, that was
        posted to address compilation warnings. --pablo ]
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f6d0cbcf
  20. 31 10月, 2016 2 次提交
  21. 30 10月, 2016 1 次提交
  22. 28 10月, 2016 1 次提交
    • J
      genetlink: use idr to track families · 2ae0f17d
      Johannes Berg 提交于
      Since generic netlink family IDs are small integers, allocated
      densely, IDR is an ideal match for lookups. Replace the existing
      hand-written hash-table with IDR for allocation and lookup.
      
      This lets the families only be written to once, during register,
      since the list_head can be removed and removal of a family won't
      cause any writes.
      
      It also slightly reduces the code size (by about 1.3k on x86-64).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ae0f17d