1. 06 Jan, 2016 · 1 commit
    • sctp: add the rhashtable apis for sctp global transport hashtable · d6c0256a
      Committed by Xin Long
      The transport hashtable will replace the association hashtable for
      transport lookups; the association is then obtained from t->assoc.
      The rhashtable APIs are used because they are resizable, scalable
      and RCU-based.
      
      lport + rport + paddr form the base hash key used to locate the chain,
      with net included to isolate one netns from another; laddr is then
      compared to find the target transport.
      
      This patch provides the lookup functions:
      - sctp_epaddr_lookup_transport
      - sctp_addrs_lookup_transport
      
      hash/unhash functions:
      - sctp_hash_transport
      - sctp_unhash_transport
      
      init/destroy functions:
      - sctp_transport_hashtable_init
      - sctp_transport_hashtable_destroy
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
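The keying scheme described in this commit can be sketched in userspace C. This is a minimal illustration, assuming a fixed-size chained hash table in place of the kernel's rhashtable and IPv4-only addresses; demo_transport, demo_hash(), demo_hash_transport() and demo_lookup() are hypothetical names, not the sctp functions listed above, and the hash mixing constants are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUCKETS 64 /* fixed size; rhashtable can grow and shrink on its own */

/* Hypothetical, IPv4-only stand-in for an SCTP transport. */
struct demo_transport {
	uint32_t net_id;             /* stands in for the netns pointer */
	uint16_t lport, rport;
	uint32_t paddr, laddr;       /* peer and local addresses */
	struct demo_transport *next; /* bucket chain */
};

static struct demo_transport *demo_table[DEMO_BUCKETS];

/* Base key lport + rport + paddr, with net mixed in so identical
 * tuples in different namespaces tend to land in different chains. */
static size_t demo_hash(uint32_t net_id, uint16_t lport, uint16_t rport,
			uint32_t paddr)
{
	uint32_t h = paddr * 2654435761u;

	h ^= ((uint32_t)rport << 16) | lport;
	h ^= net_id * 0x9e3779b9u;
	return h % DEMO_BUCKETS;
}

static void demo_hash_transport(struct demo_transport *t)
{
	size_t b;

	assert(t != NULL);
	b = demo_hash(t->net_id, t->lport, t->rport, t->paddr);
	t->next = demo_table[b];
	demo_table[b] = t;
}

/* Locate the chain from the base key, then compare every field,
 * including net and laddr, to pick the target transport. */
static struct demo_transport *demo_lookup(uint32_t net_id, uint16_t lport,
					   uint16_t rport, uint32_t paddr,
					   uint32_t laddr)
{
	struct demo_transport *t;

	for (t = demo_table[demo_hash(net_id, lport, rport, paddr)]; t; t = t->next)
		if (t->net_id == net_id && t->lport == lport &&
		    t->rport == rport && t->paddr == paddr && t->laddr == laddr)
			return t;
	return NULL;
}
```

Because net is also part of the final comparison, two namespaces never see each other's transports even if their chains collide.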
  2. 05 Jan, 2016 · 4 commits
    • soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF · 538950a1
      Committed by Craig Gallek
      Expose socket options for setting a classic or extended BPF program
      for use when selecting sockets in an SO_REUSEPORT group.  These options
      can be used on the first socket to belong to a group before bind or
      on any socket in the group after bind.
      
      This change includes refactoring of the existing sk_filter code to
      allow reuse of the existing BPF filter validation checks.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soreuseport: fast reuseport UDP socket selection · e32ea7e7
      Committed by Craig Gallek
      Include a struct sock_reuseport instance when a UDP socket binds to
      a specific address for the first time with the reuseport flag set.
      When selecting a socket for an incoming UDP packet, use the information
      available in sock_reuseport if present.
      
      This required adding an additional field to the UDP source address
      equality function to differentiate between exact and wildcard matches.
      The original use case allowed wildcard matches when checking for
      existing port uses during bind.  The new use case of adding a socket
      to a reuseport group requires exact address matching.
      
      Performance test (using a machine with 2 CPU sockets and a total of
      48 cores):  Create reuseport groups of varying size.  Use one socket
      from this group per user thread (pinning each thread to a different
      core) calling recvmmsg in a tight loop.  Record number of messages
      received per second while saturating a 10G link.
        10 sockets: 18% increase (~2.8M -> 3.3M pkts/s)
        20 sockets: 14% increase (~2.9M -> 3.3M pkts/s)
        40 sockets: 13% increase (~3.0M -> 3.4M pkts/s)
      
      This work is based off a similar implementation written by
      Ying Cai <ycai@google.com> for implementing policy-based reuseport
      selection.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
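The exact-versus-wildcard distinction in the address equality check can be illustrated with a hypothetical IPv4-only helper. demo_rcv_saddr_equal() and its match_wildcard flag are illustrative and only approximate the kernel's address equality callback.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>

/* match_wildcard == true: bind-time conflict check, where INADDR_ANY on
 *   either side overlaps every address.
 * match_wildcard == false: joining a reuseport group, which requires the
 *   exact same bound address. Addresses are in network byte order. */
static bool demo_rcv_saddr_equal(uint32_t saddr1, uint32_t saddr2,
				 bool match_wildcard)
{
	if (saddr1 == saddr2)
		return true;
	return match_wildcard &&
	       (saddr1 == htonl(INADDR_ANY) || saddr2 == htonl(INADDR_ANY));
}
```

With the flag cleared, a socket bound to 0.0.0.0 and one bound to 10.0.0.1 on the same port would end up in different reuseport groups, even though they still conflict for the purposes of bind.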
    • soreuseport: define reuseport groups · ef456144
      Committed by Craig Gallek
      struct sock_reuseport is an optional shared structure referenced by each
      socket belonging to a reuseport group.  When a socket is bound to an
      address/port not yet in use and the reuseport flag has been set, the
      structure will be allocated and attached to the newly bound socket.
      When subsequent calls to bind are made for the same address/port, the
      shared structure will be updated to include the new socket and the
      newly bound socket will reference the group structure.
      
      Previously, when an incoming packet was destined for a reuseport group,
      all sockets in the same group needed to be considered before a
      dispatching decision was made. With this structure, an appropriate
      socket can be found after looking up just one socket in the group.
      
      This shared structure will also allow for more complicated decisions to
      be made when selecting a socket (e.g. a BPF filter).
      
      This work is based off a similar implementation written by
      Ying Cai <ycai@google.com> for implementing policy-based reuseport
      selection.
      Signed-off-by: Craig Gallek <kraig@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
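The group lifecycle described in this commit (allocate on the first bind, extend on later binds, pick one member per packet) can be sketched as follows. Sockets are plain ints, the doubling growth policy is an assumption, and all names are hypothetical; only the hash-to-index step mirrors a real kernel helper, reciprocal_scale(), which maps a 32-bit hash into [0, n) with a multiply and a shift.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical shared group referenced by every member socket. */
struct demo_reuseport {
	uint16_t num_socks;
	uint16_t max_socks;
	int socks[]; /* member sockets */
};

/* First bind to an unused address/port: allocate the group. */
static struct demo_reuseport *demo_reuseport_alloc(int sk)
{
	struct demo_reuseport *r = malloc(sizeof(*r) + 4 * sizeof(int));

	if (!r)
		return NULL;
	r->num_socks = 1;
	r->max_socks = 4;
	r->socks[0] = sk;
	return r;
}

/* Later bind to the same address/port: join the existing group.
 * Returns the (possibly moved) group, or NULL on failure, in which
 * case the original group is left valid and unchanged. */
static struct demo_reuseport *demo_reuseport_add(struct demo_reuseport *r, int sk)
{
	if (r->num_socks == r->max_socks) {
		struct demo_reuseport *n;
		uint16_t new_max;

		if (r->max_socks > UINT16_MAX / 2)
			return NULL;
		new_max = (uint16_t)(r->max_socks * 2);
		n = realloc(r, sizeof(*r) + new_max * sizeof(int));
		if (!n)
			return NULL;
		n->max_socks = new_max;
		r = n;
	}
	r->socks[r->num_socks++] = sk;
	return r;
}

/* One lookup finds the group; the packet hash then picks a member. */
static int demo_reuseport_select(const struct demo_reuseport *r, uint32_t hash)
{
	assert(r->num_socks > 0);
	return r->socks[(uint32_t)(((uint64_t)hash * r->num_socks) >> 32)];
}
```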
    • net: make ip6tunnel_xmit definition conditional · 0efeff29
      Committed by Arnd Bergmann
      Moving the caller of iptunnel_xmit_stats causes a build error in
      randconfig builds that disable CONFIG_INET:
      
      In file included from ../net/xfrm/xfrm_input.c:17:0:
      ../include/net/ip6_tunnel.h: In function 'ip6tunnel_xmit':
      ../include/net/ip6_tunnel.h:93:2: error: implicit declaration of function 'iptunnel_xmit_stats' [-Werror=implicit-function-declaration]
        iptunnel_xmit_stats(dev, pkt_len);
      
      The reason is that the iptunnel_xmit_stats definition is hidden
      inside #ifdef CONFIG_INET but the caller is not. We can change
      one or the other to fix it, and this patch adds a second #ifdef
      around ip6tunnel_xmit() to avoid seeing the invalid call.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Fixes: 039f5062 ("ip_tunnel: Move stats update to iptunnel_xmit()")
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
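The pattern of this fix, guarding both the callee and its caller with the same configuration symbol, can be reproduced in plain C. DEMO_CONFIG_INET is a hypothetical stand-in for CONFIG_INET, and demo_xmit_stats()/demo_tunnel_xmit() are simplified, hypothetical versions of iptunnel_xmit_stats() and ip6tunnel_xmit().

```c
#include <assert.h>

/* Hypothetical stand-in for the Kconfig symbol; delete this line to
 * mimic a CONFIG_INET=n randconfig build. */
#define DEMO_CONFIG_INET 1

struct demo_dev {
	unsigned long tx_packets, tx_bytes, tx_errors;
};

#ifdef DEMO_CONFIG_INET
/* Callee: only defined when the option is enabled. */
static inline void demo_xmit_stats(struct demo_dev *dev, int pkt_len)
{
	if (pkt_len > 0) {
		dev->tx_packets++;
		dev->tx_bytes += (unsigned long)pkt_len;
	} else {
		dev->tx_errors++;
	}
}
#endif

#ifdef DEMO_CONFIG_INET
/* Caller: the second guard, so a build without the option never sees
 * a call to the undeclared demo_xmit_stats(). */
static inline void demo_tunnel_xmit(struct demo_dev *dev, int pkt_len)
{
	demo_xmit_stats(dev, pkt_len);
}
#endif
```

Removing the #define makes both functions disappear together; guarding only the callee would reproduce the implicit-declaration diagnostic quoted above.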
  3. 30 Dec, 2015 · 1 commit
  4. 26 Dec, 2015 · 1 commit
  5. 23 Dec, 2015 · 1 commit
  6. 19 Dec, 2015 · 2 commits
  7. 17 Dec, 2015 · 1 commit
  8. 16 Dec, 2015 · 10 commits
  9. 15 Dec, 2015 · 2 commits
    • net: fix IP early demux races · 5037e9ef
      Committed by Eric Dumazet
      David Wilder reported crashes caused by dst reuse.
      
      <quote David>
        I am seeing a crash on a distro V4.2.3 kernel caused by a double
        release of a dst_entry.  In ipv4_dst_destroy() the call to
        list_empty() finds a poisoned next pointer, indicating the dst_entry
        has already been removed from the list and freed. The crash occurs
        18 to 24 hours into a run of a network stress exerciser.
      </quote>
      
      Thanks to his detailed report and analysis, we were able to understand
      the core issue.
      
      IP early demux can associate a dst with an skb after a lookup in
      TCP/UDP sockets.

      When the socket cache is not properly set, we want to store the dst
      into sk->sk_dst_cache for future IP early demux lookups, by acquiring
      a stable refcount on the dst.
      
      The problem is that this acquisition simply uses atomic_inc(),
      which works well unless the dst was already queued for destruction
      by dst_release() after it noticed the dst refcount dropped to zero,
      which happens if DST_NOCACHE was set on the dst.
      
      We need to make sure current refcount is not zero before incrementing
      it, or risk double free as David reported.
      
      This patch, being a stable candidate, adds two new helpers and uses
      them only from the problematic IP early demux paths.
      
      It might be possible to merge skb_dst_force() and skb_dst_force_safe()
      in net-next, but I prefer having the smallest patch for stable
      kernels: maybe some skb_dst_force() callers do not expect that skb->dst
      can suddenly be cleared.
      
      This can probably be backported to linux-3.6 kernels.
      Reported-by: David J. Wilder <dwilder@us.ibm.com>
      Tested-by: David J. Wilder <dwilder@us.ibm.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
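The safe acquisition this commit calls for is an increment-if-not-zero. A minimal C11 sketch, analogous to the kernel's atomic_inc_not_zero(); demo_ref_inc_not_zero() is a hypothetical name, and the refcount is a bare atomic_int rather than a dst_entry field.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only while the object is still live (count > 0).
 * A plain atomic increment would resurrect a count that already hit
 * zero, so a later release would free the object a second time. */
static bool demo_ref_inc_not_zero(atomic_int *refcnt)
{
	int old = atomic_load(refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(refcnt, &old, old + 1))
			return true;
	}
	return false;
}
```

A caller that gets false must treat the object as already being destroyed and must not cache it.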
    • net: add validation for the socket syscall protocol argument · 79462ad0
      Committed by Hannes Frederic Sowa
      郭永刚 reported that one could simply crash the kernel as root by
      using a simple program:
      
      	int socket_fd;
      	struct sockaddr_in addr;
      	addr.sin_port = 0;
      	addr.sin_addr.s_addr = INADDR_ANY;
      	addr.sin_family = 10;
      
      	socket_fd = socket(10,3,0x40000000);
      	connect(socket_fd , &addr,16);
      
      AF_INET and AF_INET6 sockets only support 8-bit protocol identifiers,
      and inet_sock's skc_protocol field is sized accordingly. Larger
      protocol identifiers therefore have their higher bits cut off, which
      here stores a zero in the protocol field.
      
      This could lead to, e.g., a NULL function pointer dereference: as a
      result of the truncation inet_num is zero, so we call down to
      inet_autobind, which relies on a callback that is NULL for raw sockets.
      
      kernel: Call Trace:
      kernel:  [<ffffffff816db90e>] ? inet_autobind+0x2e/0x70
      kernel:  [<ffffffff816db9a4>] inet_dgram_connect+0x54/0x80
      kernel:  [<ffffffff81645069>] SYSC_connect+0xd9/0x110
      kernel:  [<ffffffff810ac51b>] ? ptrace_notify+0x5b/0x80
      kernel:  [<ffffffff810236d8>] ? syscall_trace_enter_phase2+0x108/0x200
      kernel:  [<ffffffff81645e0e>] SyS_connect+0xe/0x10
      kernel:  [<ffffffff81779515>] tracesys_phase2+0x84/0x89
      
      I found no particular commit which introduced this problem.
      
      CVE: CVE-2015-8543
      Cc: Cong Wang <cwang@twopensource.com>
      Reported-by: 郭永刚 <guoyonggang@360.cn>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
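The truncation, and a range check in the spirit of the fix, can be illustrated as below. demo_store_protocol() and demo_validate_protocol() are hypothetical; the sketch assumes an 8-bit protocol field and is not the exact upstream check.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Model of an 8-bit protocol field: the upper bits of the syscall
 * argument are silently dropped, so 0x40000000 becomes 0. */
static uint8_t demo_store_protocol(int protocol)
{
	return (uint8_t)protocol;
}

/* Reject anything that does not fit before it is ever stored. */
static int demo_validate_protocol(int protocol)
{
	if (protocol < 0 || protocol > UINT8_MAX)
		return -EINVAL;
	return 0;
}
```

Validating at socket creation means the truncated zero never reaches later paths such as connect().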
  10. 14 Dec, 2015 · 2 commits
  11. 12 Dec, 2015 · 2 commits
  12. 10 Dec, 2015 · 10 commits
  13. 09 Dec, 2015 · 3 commits
    • netfilter: nf_tables: wrap tracing with a static key · e639f7ab
      Committed by Florian Westphal
      Only needed when meta nftrace rule(s) were added.
      The assumption is that no such rules are active, so the call to
      nft_trace_init is "never" needed.
      
      When nftrace rules are active, we always call the nft_trace_* functions,
      but will only send netlink messages when all of the following are true:
      
       - traceinfo structure was initialised
       - skb->nf_trace == 1
       - at least one subscriber to trace group.
      
      Adding an extra conditional
      (static_branch ... && skb->nf_trace)
      	nft_trace_init( ..)
      
      is possible but results in a larger nft_do_chain footprint.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
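A userspace sketch of the gating idea. A real static key (for example static_branch_unlikely() on a key that is enabled while nftrace rules exist) patches the branch in the instruction stream; here a counter read through __builtin_expect (GCC/Clang) stands in for it. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of active nftrace rules; stands in for the static key. */
static int demo_trace_rules;

static inline bool demo_trace_enabled(void)
{
	return __builtin_expect(demo_trace_rules > 0, 0);
}

static void demo_trace_rule_add(void) { demo_trace_rules++; }
static void demo_trace_rule_del(void) { demo_trace_rules--; }

/* Netlink output only when all three conditions from the commit hold. */
static bool demo_should_send_trace(bool info_initialised, bool skb_nf_trace,
				   int subscribers)
{
	return info_initialised && skb_nf_trace && subscribers > 0;
}

/* Per-packet path: trace setup is skipped while no trace rule exists. */
static bool demo_do_chain(bool skb_nf_trace, int subscribers)
{
	bool info_initialised = false;

	if (demo_trace_enabled())
		info_initialised = true; /* stands in for nft_trace_init() */

	return demo_should_send_trace(info_initialised, skb_nf_trace, subscribers);
}
```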
    • netfilter: nf_tables: extend tracing infrastructure · 33d5a7b1
      Committed by Florian Westphal
      nft monitor mode can then decode and display this trace data.
      
      Parts of LL/Network/Transport headers are provided as separate
      attributes.
      
      Otherwise, printing IP address data becomes virtually impossible
      for userspace since in the case of the netdev family we really don't
      want userspace to have to know all the possible link layer types
      and/or sizes just to display/print an ip address.
      
      We also don't want userspace to have to follow ipv6 header chains
      to get the s/dport info, the kernel already did this work for us.
      
      To avoid bloating nft_do_chain, all data required for tracing is
      encapsulated in nft_traceinfo.
      
      The structure is initialized unconditionally(!) for each nft_do_chain
      invocation.
      
      This unconditional call will be moved under a static key in a
      followup patch.
      
      With lots of help from Patrick McHardy and Pablo Neira.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
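A hypothetical sketch of exposing the layers separately: given header offsets that have already been computed on the kernel side, the packet is split into link-layer, network and transport slices, so a consumer can print an IP address without knowing the link-layer type. struct demo_traceinfo and demo_traceinfo_fill() are illustrative names, not the actual fields of nft_traceinfo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet trace record with one slice per layer. */
struct demo_traceinfo {
	const uint8_t *ll; size_t ll_len; /* link-layer header */
	const uint8_t *nh; size_t nh_len; /* network header */
	const uint8_t *th; size_t th_len; /* transport header and payload */
};

/* nh_off/th_off are assumed to come from parsing already done earlier
 * (e.g. after walking IPv6 extension headers), so the consumer never
 * repeats that work. Returns 0 on success, -1 on inconsistent offsets. */
static int demo_traceinfo_fill(struct demo_traceinfo *ti, const uint8_t *pkt,
			       size_t len, size_t nh_off, size_t th_off)
{
	if (nh_off > th_off || th_off > len)
		return -1;
	ti->ll = pkt;
	ti->ll_len = nh_off;
	ti->nh = pkt + nh_off;
	ti->nh_len = th_off - nh_off;
	ti->th = pkt + th_off;
	ti->th_len = len - th_off;
	return 0;
}
```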
    • net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct · 2a56a1fe
      Committed by Tejun Heo
      Introduce sock->sk_cgrp_data, which is a struct sock_cgroup_data.
      ->sk_cgrp_prioidx and ->sk_classid are moved into it. The struct
      and its accessors are defined in cgroup-defs.h. This is to prepare
      for overloading the fields with a cgroup pointer.
      
      This patch mostly performs equivalent conversions, but the following
      are noteworthy.
      
      * Equality test before updating classid is removed from
        sock_update_classid().  This shouldn't make any noticeable
        difference and a similar test will be implemented on the helper side
        later.
      
      * sock_update_netprioidx() now takes struct sock_cgroup_data and can
        be moved to netprio_cgroup.h without causing include dependency
        loop.  Moved.
      
      * The dummy version of sock_update_netprioidx() converted to a static
        inline function while at it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
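A hypothetical sketch of the wrapping: both per-socket cgroup fields move into one struct that is reached only through accessors, so its layout can change later without touching callers. The demo_* names are illustrative and only loosely follow the sock_cgroup_data helpers.

```c
#include <assert.h>
#include <stdint.h>

struct demo_sock_cgroup_data {
	uint16_t prioidx;
	uint32_t classid;
};

struct demo_sock {
	struct demo_sock_cgroup_data sk_cgrp_data;
};

/* Accessors hide the layout from callers. */
static inline uint16_t demo_cgroup_prioidx(const struct demo_sock_cgroup_data *d) { return d->prioidx; }
static inline uint32_t demo_cgroup_classid(const struct demo_sock_cgroup_data *d) { return d->classid; }
static inline void demo_cgroup_set_prioidx(struct demo_sock_cgroup_data *d, uint16_t v) { d->prioidx = v; }
static inline void demo_cgroup_set_classid(struct demo_sock_cgroup_data *d, uint32_t v) { d->classid = v; }

/* Takes the wrapped struct rather than the whole socket, matching the
 * change described for sock_update_netprioidx(). */
static void demo_update_netprioidx(struct demo_sock_cgroup_data *d, uint16_t idx)
{
	demo_cgroup_set_prioidx(d, idx);
}

/* Writes unconditionally; the equality test before the update is gone. */
static void demo_update_classid(struct demo_sock_cgroup_data *d, uint32_t classid)
{
	demo_cgroup_set_classid(d, classid);
}
```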