1. 22 Nov, 2016 1 commit
  2. 20 Nov, 2016 1 commit
    • net: fix bogus cast in skb_pagelen() and use unsigned variables · c72d8cda
      Alexey Dobriyan committed
      1) cast to "int" is unnecessary:
         u8 will be promoted to int before decrementing,
         small positive numbers fit into "int", so their values won't be changed
         during promotion.
      
         Once everything is int including loop counters, signedness doesn't
         matter: 32-bit operations will stay 32-bit operations.
      
         But! Someone tried to make this loop smart by making everything of
         the same type apparently in an attempt to optimise it.
         Do the optimization, just differently.
         Do the cast where it matters. :^)
      
      2) frag size is an unsigned entity and the sum of fragment sizes is
         also unsigned.
      
      Make everything unsigned, leave no MOVSX instruction behind.
      
      	add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-4 (-4)
      	function                                     old     new   delta
      	skb_cow_data                                 835     834      -1
      	ip_do_fragment                              2549    2548      -1
      	ip6_fragment                                3130    3128      -2
      	Total: Before=154865032, After=154865028, chg -0.00%
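The idea of "do the cast where it matters" can be sketched in user space as follows. This is a minimal illustration, not the kernel function: `struct frag_sketch`, `struct frag_list_sketch` and `frags_len()` are hypothetical stand-ins for the real sk_buff fragment layout.

```c
/* Hypothetical stand-ins for the kernel's fragment array. */
struct frag_sketch {
	unsigned int size;
};

struct frag_list_sketch {
	unsigned char nr_frags;		/* u8: promoted to int in arithmetic */
	struct frag_sketch frags[17];
};

/*
 * Sum the fragment sizes walking backwards with an unsigned counter.
 * The only cast to int sits in the loop condition, where wrap-around
 * of "i" below zero has to be detected. The conversion of a large
 * unsigned value to int is implementation-defined; gcc and clang
 * wrap it to a negative number, which is what the loop relies on.
 */
static unsigned int frags_len(const struct frag_list_sketch *fl)
{
	unsigned int i, len = 0;

	for (i = fl->nr_frags - 1; (int)i >= 0; i--)
		len += fl->frags[i].size;
	return len;
}
```

With zero fragments the counter starts at the wrapped value and the loop body never runs, so the function returns 0.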
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 18 Nov, 2016 3 commits
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan committed
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      thus is an unsigned entity. Using a negative value is an
      out-of-bounds access by definition.
      
      2)
      On x86_64, unsigned 32-bit data mixed with pointers
      via array indexing, or via offsets added to or subtracted from
      pointers, is preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction which isn't necessary if the variable is
      unsigned because x86_64 zero-extends by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
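The two index types can be compared side by side with a small sketch. `struct generic_table_sketch` and both lookup helpers are hypothetical and only mimic the `ptr[id - 1]` access pattern quoted above; compiling them with optimisation and inspecting the assembly is one way to check whether a sign extension is emitted for the signed variant on a given compiler.

```c
/* Hypothetical miniature of a per-namespace pointer table. */
struct generic_table_sketch {
	unsigned int len;
	void *ptr[8];
};

/* Signed index: on x86_64 the index may need sign-extending to 64 bits. */
void *lookup_signed(const struct generic_table_sketch *t, int id)
{
	return t->ptr[id - 1];
}

/* Unsigned index: 32-bit results are zero-extended by the CPU for free. */
void *lookup_unsigned(const struct generic_table_sketch *t, unsigned int id)
{
	return t->ptr[id - 1];
}
```

For every valid (positive) id both helpers return the same slot; only the generated code differs.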
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a seemingly random artefact of code generation with the register
      allocator being used differently. gcc decides that some variable
      needs to live in the new r8+ registers and every access now requires a REX
      prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
      used, which is longer than [r8].
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: enable busy polling for all sockets · e68b6e50
      Eric Dumazet committed
      UDP busy polling is restricted to connected UDP sockets.
      
      This is because sk_busy_loop() only takes care of one NAPI context.
      
      There are cases where it could be extended.
      
      1) Some hosts receive traffic on a single NIC, with one RX queue.
      
      2) Some applications use SO_REUSEPORT and an associated BPF filter
         to split the incoming traffic on one UDP socket per RX
         queue/thread/cpu.
      
      3) Some UDP sockets are used to send/receive traffic for one flow, but
         they do not bother with connect().
      
      This patch records the napi_id of first received skb, giving more
      reach to busy polling.
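Recording the napi_id of the first received skb amounts to a "set once" assignment. The sketch below shows only that idea; `struct sock_sketch`, `struct skb_sketch` and `mark_napi_id_once()` are hypothetical names, and treating 0 as "not yet recorded" is an assumption of this example.

```c
/* Hypothetical, heavily reduced socket and packet descriptors. */
struct sock_sketch {
	unsigned int napi_id;	/* 0 means "nothing recorded yet" (assumption) */
};

struct skb_sketch {
	unsigned int napi_id;	/* NAPI context the packet arrived on */
};

/* Remember the NAPI context of the first packet only; later packets
 * arriving from other contexts do not overwrite it. */
static void mark_napi_id_once(struct sock_sketch *sk,
			      const struct skb_sketch *skb)
{
	if (!sk->napi_id)
		sk->napi_id = skb->napi_id;
}
```

A busy-polling loop that handles a single NAPI context can then use the recorded id even for sockets that never called connect().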
      
      Tested:
      
      lpaa23:~# echo 70 >/proc/sys/net/core/busy_read
      lpaa24:~# echo 70 >/proc/sys/net/core/busy_read
      
      lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done
      
      Before patch :
         27867   28870   37324   41060   41215
         36764   36838   44455   41282   43843
      After patch :
         73920   73213   70147   74845   71697
         68315   68028   75219   70082   73707
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6_tunnel: disable caching when the traffic class is inherited · b5c2d495
      Paolo Abeni committed
      If an ip6 tunnel is configured to inherit the traffic class from
      the inner header, the dst_cache must be disabled or it will foul
      the policy routing.
      
      The issue has apparently been present since at least Linux-2.6.12-rc2.
      Reported-by: Liam McBirnie <liam.mcbirnie@boeing.com>
      Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 17 Nov, 2016 1 commit
    • ipv6: sr: add option to control lwtunnel support · 46738b13
      David Lebrun committed
      This patch adds a new option CONFIG_IPV6_SEG6_LWTUNNEL to enable/disable
      support of encapsulation with the lightweight tunnels. When this option
      is enabled, CONFIG_LWTUNNEL is automatically selected.
      
      Fix commit 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      
      Without a proper option to control lwtunnel support for SR-IPv6, if
      CONFIG_LWTUNNEL=n then the IPv6 initialization fails as a consequence
      of seg6_iptunnel_init() failure with EOPNOTSUPP:
      
      NET: Registered protocol family 10
      IPv6: Attempt to unregister permanent protocol 6
      IPv6: Attempt to unregister permanent protocol 136
      IPv6: Attempt to unregister permanent protocol 17
      NET: Unregistered protocol family 10
      
      Tested (compiling, booting, and loading ipv6 module when relevant)
      with possible combinations of CONFIG_IPV6={y,m,n},
      CONFIG_IPV6_SEG6_LWTUNNEL={y,n} and CONFIG_LWTUNNEL={y,n}.
      Reported-by: Lorenzo Colitti <lorenzo@google.com>
      Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 16 Nov, 2016 2 commits
  6. 14 Nov, 2016 2 commits
  7. 10 Nov, 2016 12 commits
    • net: tcp response should set oif only if it is L3 master · 9b6c14d5
      David Ahern committed
      Lorenzo noted an Android unit test failed due to e0d56fdd:
      "The expectation in the test was that the RST replying to a SYN sent to a
      closed port should be generated with oif=0. In other words it should not
      prefer the interface where the SYN came in on, but instead should follow
      whatever the routing table says it should do."
      
      Revert the change to ip_send_unicast_reply and tcp_v6_send_response such
      that the oif in the flow is set to the skb_iif only if skb_iif is an L3
      master.
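The restored behaviour reduces to one decision, sketched below. `reply_oif()` and its boolean argument are hypothetical; in the kernel the "is this an L3 master" answer comes from a device lookup that is not reproduced here.

```c
/*
 * Pick the output interface for a kernel-generated TCP reply.
 * skb_iif:          index of the interface the packet came in on
 * iif_is_l3_master: non-zero if that interface is an L3 master device
 *
 * Returns skb_iif only for L3 masters; otherwise 0, which leaves the
 * choice to the routing table.
 */
static int reply_oif(int skb_iif, int iif_is_l3_master)
{
	return iif_is_l3_master ? skb_iif : 0;
}
```

This matches the expectation quoted above: an RST answering a SYN on an ordinary interface is generated with oif=0.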
      
      Fixes: e0d56fdd ("net: l3mdev: remove redundant calls")
      Reported-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: Lorenzo Colitti <lorenzo@google.com>
      Acked-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: add support for SRH injection through setsockopt · a149e7c7
      David Lebrun committed
      This patch adds support for per-socket SRH injection with the setsockopt
      system call through the IPPROTO_IPV6, IPV6_RTHDR options.
      The SRH is pushed through the ipv6_push_nfrag_opts function.
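From user space, per-socket injection could look roughly like the sketch below. Only the `setsockopt()` call with `IPPROTO_IPV6` and `IPV6_RTHDR` comes from the text above. The rest is an assumption of this example: `struct srh_fixed`, `srh_build()` and `srh_attach()` are hypothetical names, the field names are illustrative, and the header is assumed to follow the SRH draft (8-byte fixed part, length counted in 8-octet units, segment list stored in reverse path order). How the kernel validates or completes the list is not covered here.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define SRH_ROUTING_TYPE 4	/* rthdr type 4 (Segment Routing Header) */

/* Hypothetical mirror of the fixed 8-byte SRH prefix. */
struct srh_fixed {
	uint8_t nexthdr;
	uint8_t hdrlen;		/* 8-octet units, excluding the first 8 octets */
	uint8_t type;
	uint8_t segments_left;
	uint8_t first_segment;
	uint8_t flags;
	uint16_t reserved;
};

/* Serialise an SRH for "path" (in travel order) into buf.
 * Returns the number of bytes written, or 0 on bad input. */
size_t srh_build(uint8_t *buf, size_t buflen,
		 const struct in6_addr *path, uint8_t nsegs)
{
	struct srh_fixed hdr;
	size_t len = sizeof(hdr) + (size_t)nsegs * sizeof(struct in6_addr);
	uint8_t i;

	if (nsegs == 0 || nsegs > 127 || len > buflen)
		return 0;

	memset(&hdr, 0, sizeof(hdr));
	hdr.hdrlen = (uint8_t)(2 * nsegs);	/* 16 bytes per segment */
	hdr.type = SRH_ROUTING_TYPE;
	hdr.segments_left = (uint8_t)(nsegs - 1);
	hdr.first_segment = (uint8_t)(nsegs - 1);
	memcpy(buf, &hdr, sizeof(hdr));

	/* The segment list is stored in reverse: last hop at index 0. */
	for (i = 0; i < nsegs; i++)
		memcpy(buf + sizeof(hdr) + (size_t)i * sizeof(struct in6_addr),
		       &path[nsegs - 1 - i], sizeof(struct in6_addr));
	return len;
}

/* Attach the serialised header to an IPv6 socket. */
int srh_attach(int fd, const uint8_t *buf, size_t len)
{
	return setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, buf, (socklen_t)len);
}
```

A caller would build the buffer once, call `srh_attach()` on the socket, and then send as usual; the call is expected to fail on kernels without SR-IPv6 support.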
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: add source address argument for ipv6_push_nfrag_opts · 613fa3ca
      David Lebrun committed
      This patch prepares for insertion of SRH through setsockopt().
      The new source address argument is used when an HMAC field is
      present in the SRH, which must be filled. The HMAC signature
      process requires the source address as input text.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: add calls to verify and insert HMAC signatures · 9baee834
      David Lebrun committed
      This patch enables the verification of the HMAC signature for transiting
      SR-enabled packets, and its insertion on encapsulated/injected SRH.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: implement API to control SR HMAC structure · 4f4853dc
      David Lebrun committed
      This patch provides an implementation of the genetlink commands
      to associate a given HMAC key identifier with a hashing algorithm
      and a secret.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: add core files for SR HMAC support · bf355b8d
      David Lebrun committed
      This patch adds the necessary functions to compute and check the HMAC signature
      of an SR-enabled packet. Two HMAC algorithms are supported: hmac(sha1) and
      hmac(sha256).
      
      In order to avoid dynamic memory allocation for each HMAC computation,
      a per-cpu ring buffer is allocated for this purpose.
      
      A new per-interface sysctl called seg6_require_hmac is added, allowing a
      user-defined policy for processing HMAC-signed SR-enabled packets.
      A value of -1 means that the HMAC field will always be ignored.
      A value of 0 means that if an HMAC field is present, its validity will
      be enforced (the packet is dropped if the signature is incorrect).
      Finally, a value of 1 means that any SR-enabled packet that does not
      contain an HMAC signature or whose signature is incorrect will be dropped.
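The three seg6_require_hmac values described above map onto a small decision function. The sketch follows the prose only; `hmac_policy()` and the verdict enum are hypothetical names, and the signature check itself is represented by a flag.

```c
enum hmac_verdict {
	HMAC_ACCEPT,
	HMAC_DROP
};

/*
 * require_hmac: -1 ignore the HMAC field, 0 verify it if present,
 *                1 require a valid HMAC on every SR-enabled packet
 * has_hmac:      non-zero if the packet carries an HMAC field
 * hmac_valid:    non-zero if the signature verified (only meaningful
 *                when has_hmac is set)
 */
static enum hmac_verdict hmac_policy(int require_hmac, int has_hmac,
				     int hmac_valid)
{
	if (require_hmac < 0)
		return HMAC_ACCEPT;		/* field is always ignored */
	if (!has_hmac)
		return require_hmac > 0 ? HMAC_DROP : HMAC_ACCEPT;
	return hmac_valid ? HMAC_ACCEPT : HMAC_DROP;
}
```

Note that with a value of 0 an unsigned packet is accepted, while a packet carrying a bad signature is dropped.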
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: add support for SRH encapsulation and injection with lwtunnels · 6c8702c6
      David Lebrun committed
      This patch creates a new type of interfaceless lightweight tunnel (SEG6),
      enabling the encapsulation and injection of SRH within locally emitted
      packets and forwarded packets.
      
      From a configuration viewpoint, a seg6 tunnel would be configured as follows:
      
        ip -6 ro ad fc00::1/128 encap seg6 mode encap segs fc42::1,fc42::2,fc42::3 dev eth0
      
      Any packet whose destination address is fc00::1 would thus be encapsulated
      within an outer IPv6 header containing the SRH with three segments, and would
      actually be routed to the first segment of the list. If `mode inline' was
      specified instead of `mode encap', then the SRH would be directly inserted
      after the IPv6 header without outer encapsulation.
      
      The inline mode is only available if CONFIG_IPV6_SEG6_INLINE is enabled. This
      feature was made configurable because direct header insertion may break
      several mechanisms such as PMTUD or IPSec AH.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: sr: add code base for control plane support of SR-IPv6 · 915d7e5e
      David Lebrun committed
      This patch adds the necessary hooks and structures to provide support
      for SR-IPv6 control plane, essentially the Generic Netlink commands
      that will be used for userspace control over the Segment Routing
      kernel structures.
      
      The genetlink commands provide control over two different structures:
      tunnel source and HMAC data. The tunnel source is the source address
      that will be used by default when encapsulating packets into an
      outer IPv6 header + SRH. If the tunnel source is set to :: then an
      address of the outgoing interface will be selected as the source.
      
      The HMAC commands currently just return ENOTSUPP and will be implemented
      in a future patch.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header) · 1ababeba
      David Lebrun committed
      Implement minimal support for processing of SR-enabled packets
      as described in
      https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-02.
      
      This patch implements the following operations:
      - Intermediate segment endpoint: incrementation of active segment and rerouting.
      - Egress for SR-encapsulated packets: decapsulation of outer IPv6 header + SRH
        and routing of inner packet.
      - Cleanup flag support for SR-inlined packets: removal of SRH if we are the
        penultimate segment endpoint.
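The intermediate-endpoint operation in the first bullet can be pictured as follows. This is a sketch under assumptions: it takes the SRH draft's convention that the segment list is stored in reverse and that advancing the active segment means decrementing a "segments left" counter. `sr_advance()` is a hypothetical helper; decapsulation, the cleanup flag and the actual rerouting are left out.

```c
#include <stdint.h>
#include <netinet/in.h>

/*
 * Advance to the next segment of an SR-enabled packet.
 * daddr:         the packet's IPv6 destination address (updated in place)
 * segments_left: counter from the SRH (updated in place)
 * segments:      the SRH segment list, last hop at index 0
 *
 * Returns 1 if the packet was pointed at a new segment and must be
 * rerouted, 0 if this node is already the final segment.
 */
static int sr_advance(struct in6_addr *daddr, uint8_t *segments_left,
		      const struct in6_addr *segments)
{
	if (*segments_left == 0)
		return 0;

	(*segments_left)--;
	*daddr = segments[*segments_left];
	return 1;
}
```

After a successful advance the caller would perform a fresh route lookup on the new destination address.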
      
      A per-interface sysctl seg6_enabled is provided, to accept/deny SR-enabled
      packets. Default is deny.
      
      This patch does not provide support for HMAC-signed packets.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: provide udp{4,6}_lib_lookup for nf_socket_ipv{4,6} · 30f58158
      Arnd Bergmann committed
      Since commit ca065d0c ("udp: no longer use SLAB_DESTROY_BY_RCU")
      the udp6_lib_lookup and udp4_lib_lookup functions are only
      provided when it is actually possible to call them.
      
      However, moving the callers now caused a link error:
      
      net/built-in.o: In function `nf_sk_lookup_slow_v6':
      (.text+0x131a39): undefined reference to `udp6_lib_lookup'
      net/ipv4/netfilter/nf_socket_ipv4.o: In function `nf_sk_lookup_slow_v4':
      nf_socket_ipv4.c:(.text.nf_sk_lookup_slow_v4+0x114): undefined reference to `udp4_lib_lookup'
      
      This extends the #ifdef so we also provide the functions when
      CONFIG_NF_SOCKET_IPV4 or CONFIG_NF_SOCKET_IPV6, respectively
      are set.
      
      Fixes: 8db4c5be ("netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: simplify init/uninit of L4 protocol trackers · 0e54d217
      Davide Caratti committed
      modify registration and deregistration of layer-4 protocol trackers to
      facilitate inclusion of new elements into the current list of builtin
      protocols. Both builtin (TCP, UDP, ICMP) and non-builtin (DCCP, GRE, SCTP,
      UDPlite) layer-4 protocol trackers usually register/deregister themselves
      using consecutive calls to nf_ct_l4proto_{,pernet}_{,un}register(...).
      This sequence is interrupted and rolled back in case of error; in order to
      simplify addition of builtin protocols, the input of the above functions
      has been modified to allow registering/unregistering multiple protocols.
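Registering a list of trackers with rollback on error follows a common pattern, sketched here in plain C. Everything in the block is hypothetical (`struct proto_sketch`, `register_one()`, `register_many()`); it shows the pattern the message describes, not the signatures of the nf_ct_l4proto functions.

```c
#include <stddef.h>

/* Hypothetical tracker descriptor. */
struct proto_sketch {
	int registered;
	int fail;	/* test knob: make registration of this entry fail */
};

static int register_one(struct proto_sketch *p)
{
	if (p->fail)
		return -1;
	p->registered = 1;
	return 0;
}

static void unregister_one(struct proto_sketch *p)
{
	p->registered = 0;
}

/* Register every entry in order; on the first failure, unregister the
 * entries that already succeeded (in reverse) and return the error. */
static int register_many(struct proto_sketch *const protos[], size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int err = register_one(protos[i]);

		if (err) {
			while (i-- > 0)
				unregister_one(protos[i]);
			return err;
		}
	}
	return 0;
}
```

Taking an array means a new builtin protocol is added by extending the list rather than by adding another call and another rollback branch.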
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net-ipv6: on device mtu change do not add mtu to mtu-less routes · fb56be83
      Maciej Żenczykowski committed
      Routes can specify an mtu explicitly or inherit the mtu from
      the underlying device - this inheritance is implemented in
      dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu().
      
      Currently changing the mtu of a device adds mtu explicitly
      to routes using that device.
      
      ie.
        # ip link set dev lo mtu 65536
        # ip -6 route add local 2000::1 dev lo
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
      
        # ip link set dev lo mtu 65535
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65535 pref medium
      
        # ip link set dev lo mtu 65536
        # ip -6 route get 2000::1
        local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium
      
        # ip -6 route del local 2000::1
      
      After this patch the route entry no longer changes unless it already has an mtu.
      There is no need: this inheritance is already done in ip6_mtu()
      
        # ip link set dev lo mtu 65536
        # ip -6 route add local 2000::1 dev lo
        # ip -6 route add local 2000::2 dev lo mtu 2000
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium
      
        # ip link set dev lo mtu 65535
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium
      
        # ip link set dev lo mtu 1501
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 1501 pref medium
      
        # ip link set dev lo mtu 65536
        # ip -6 route get 2000::1; ip -6 route get 2000::2
        local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
        local 2000::2 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium
      
        # ip -6 route del local 2000::1
        # ip -6 route del local 2000::2
      
      This is desirable because changing device mtu and then resetting it
      to the previous value shouldn't change the user visible routing table.
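The "after" transcript is consistent with a simple update rule, sketched below. The rule is inferred from the output shown above rather than taken from the patch, and `struct route_sketch` with its helpers is hypothetical: a route without an explicit mtu is never touched, while a route with one follows the device when its mtu is at least the new device mtu or equal to the old one.

```c
/* Hypothetical route entry: mtu is only meaningful when has_mtu is set. */
struct route_sketch {
	int has_mtu;
	unsigned int mtu;
};

/* Apply a device mtu change to one route (inferred rule, see lead-in). */
static void route_on_dev_mtu_change(struct route_sketch *rt,
				    unsigned int old_dev_mtu,
				    unsigned int new_dev_mtu)
{
	if (!rt->has_mtu)
		return;		/* mtu-less routes keep inheriting */
	if (rt->mtu >= new_dev_mtu || rt->mtu == old_dev_mtu)
		rt->mtu = new_dev_mtu;
}

/* Inheritance at lookup time: explicit mtu if set, else the device's. */
static unsigned int route_effective_mtu(const struct route_sketch *rt,
					unsigned int dev_mtu)
{
	return rt->has_mtu ? rt->mtu : dev_mtu;
}
```

Replaying the sequence 65536, 65535, 1501, 65536 from the transcript against a route with mtu 2000 gives 2000, 1501 and 65536 in turn, while the mtu-less route stays mtu-less throughout.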
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      CC: Eric Dumazet <edumazet@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 08 Nov, 2016 3 commits
  9. 05 Nov, 2016 2 commits
    • net: inet: Support UID-based routing in IP protocols. · e2d118a1
      Lorenzo Colitti committed
      - Use the UID in routing lookups made by protocol connect() and
        sendmsg() functions.
      - Make sure that routing lookups triggered by incoming packets
        (e.g., Path MTU discovery) take the UID of the socket into
        account.
      - For packets not associated with a userspace socket (e.g., ping
        replies), use UID 0 inside the user namespace corresponding to
        the network namespace the socket belongs to. This allows
        all namespaces to apply routing and iptables rules to
        kernel-originated traffic in that namespace by matching UID 0.
        This is better than using the UID of the kernel socket that is
        sending the traffic, because the UID of kernel sockets created
        at namespace creation time (e.g., the per-processor ICMP and
        TCP sockets) is the UID of the user that created the socket,
        which might not be mapped in the namespace.
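The choice of UID for a routing lookup, as listed above, comes down to one branch. The sketch is deliberately simplified: `struct usock_sketch` and `flow_uid()` are hypothetical, UIDs are plain integers, and the mapping of UID 0 into the relevant user namespace is not modelled.

```c
typedef unsigned int uid_sketch_t;

/* Hypothetical userspace-socket descriptor carrying its owner's UID. */
struct usock_sketch {
	uid_sketch_t uid;
};

/*
 * UID to place in the flow for a routing lookup.
 * sk is NULL when the packet is not associated with a userspace socket
 * (for example a ping reply); such traffic is attributed to UID 0.
 */
static uid_sketch_t flow_uid(const struct usock_sketch *sk)
{
	return sk ? sk->uid : 0;
}
```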
      
      Tested: compiles allnoconfig, allyesconfig, allmodconfig
      Tested: https://android-review.googlesource.com/253302
      Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: add UID to flows, rules, and routes · 622ec2c9
      Lorenzo Colitti committed
      - Define a new FIB rule attribute, FRA_UID_RANGE, to describe a
        range of UIDs.
      - Define a RTA_UID attribute for per-UID route lookups and dumps.
      - Support passing these attributes to and from userspace via
        rtnetlink. The value INVALID_UID indicates no UID was
        specified.
      - Add a UID field to the flow structures.
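A rule carrying a UID range can be matched against the UID stored in a flow roughly as follows. `struct uid_range_sketch` and `uid_in_range()` are hypothetical, and treating both ends of the range as inclusive is an assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical range as it might be carried by a FRA_UID_RANGE attribute. */
struct uid_range_sketch {
	uint32_t start;
	uint32_t end;
};

/* Non-zero if the flow's UID falls inside the rule's range
 * (both ends inclusive, by assumption). */
static int uid_in_range(const struct uid_range_sketch *r, uint32_t uid)
{
	return uid >= r->start && uid <= r->end;
}
```

A rule without a range would simply skip this check, which corresponds to the "no UID was specified" case signalled by INVALID_UID in the list above.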
      Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 04 Nov, 2016 2 commits
  11. 03 Nov, 2016 4 commits
  12. 02 Nov, 2016 2 commits
    • netfilter: move socket lookup infrastructure to nf_socket_ipv{4,6}.c · 8db4c5be
      Pablo Neira Ayuso committed
      We need this split to reuse existing codebase for the upcoming nf_tables
      socket expression.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add fib expression · f6d0cbcf
      Florian Westphal committed
      Add FIB expression, supported for ipv4, ipv6 and inet family (the latter
      just dispatches to ipv4 or ipv6 one based on nfproto).
      
      Currently supports fetching output interface index/name and the
      rtm_type associated with an address.
      
      This can be used for adding path filtering. rtm_type is useful
      to e.g. enforce a strong-end host model where packets
      are only accepted if daddr is configured on the interface the
      packet arrived on.
      
      The fib expression is a native nftables alternative to the
      xtables addrtype and rp_filter matches.
      
      FIB result order for oif/oifname retrieval is as follows:
       - if packet is local (skb has rtable, RTF_LOCAL set, this
         will also catch looped-back multicast packets), set oif to
         the loopback interface.
       - if fib lookup returns an error, or result points to local,
         store zero result.  This means '--local' option of -m rpfilter
         is not supported. It is possible to use 'fib type local' or add
         explicit saddr/daddr matching rules to create exceptions if this
         is really needed.
       - store result in the destination register.
         In case of multiple routes, search set for desired oif in case
         strict matching is requested.
      
      ipv4 and ipv6 fib expressions are supposed to behave the same.
      
      [ I have collapsed Arnd Bergmann's ("netfilter: nf_tables: fib warnings")
      
      	http://patchwork.ozlabs.org/patch/688615/
      
        to address fallout from this patch after rebasing nf-next, that was
        posted to address compilation warnings. --pablo ]
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  13. 01 Nov, 2016 3 commits
  14. 31 Oct, 2016 1 commit
  15. 30 Oct, 2016 1 commit