1. 05 Aug 2017 (7 commits)
  2. 04 Aug 2017 (11 commits)
  3. 03 Aug 2017 (1 commit)
  4. 02 Aug 2017 (4 commits)
  5. 01 Aug 2017 (4 commits)
  6. 31 Jul 2017 (1 commit)
    • net netlink: Add new type NLA_BITFIELD32 · 64c83d83
      Authored by Jamal Hadi Salim
      A generic bitflags attribute whose content is sent to the kernel by
      the user. With this netlink attribute type, the user can either set
      or unset a flag in the kernel.
      
      The value is a bitmap that defines the bit values being set.
      The selector is a bitmask that defines which value bits are to be
      considered.
      
      A check is made to ensure that a kernel subsystem only accepts bit
      flags it already knows about, i.e. if the user tries to set a bit
      flag that is not understood, it will be rejected.
      
      In the most basic form, the user specifies the attribute policy as:
      [ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },
      
      where myvalidflags is the bit mask of the flags the kernel understands.
      
      If the user _does not_ provide myvalidflags then the attribute will
      also be rejected.
      
      Examples:
      value = 0x0, and selector = 0x1
      implies we are selecting bit 1 and we want to set its value to 0.
      
      value = 0x2, and selector = 0x2
      implies we are selecting bit 2 and we want to set its value to 1.
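
      On the wire, the payload of this attribute type is a struct
      nla_bitfield32 holding the two words described above. A minimal
      user-space sketch filling it for the second example (assumes uapi
      headers new enough to ship the struct, i.e. Linux 4.14+):

      #include <stdio.h>
      #include <linux/netlink.h>      /* struct nla_bitfield32 */

      int main(void)
      {
              /* Select bit 0x2 and set it to 1, as in the example above. */
              struct nla_bitfield32 bf = {
                      .value    = 0x2,        /* bit values being set      */
                      .selector = 0x2,        /* which bits are meaningful */
              };

              printf("value=0x%x selector=0x%x\n", bf.value, bf.selector);
              return 0;
      }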
      Suggested-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 30 Jul 2017 (1 commit)
    • udp6: fix socket leak on early demux · c9f2c1ae
      Authored by Paolo Abeni
      When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
      sk reference is retrieved and used, but the relevant reference
      count is leaked and the socket destructor is never called.
      Beyond leaking the sk memory, if there are pending UDP packets
      in the receive queue, even the related accounted memory is leaked.
      
      In the long run, this will cause persistent forward-allocation
      errors, and no UDP skbs (both ipv4 and ipv6) will be able to reach
      user space.
      
      Fix this by explicitly accessing the early demux reference before
      the lookup, and properly decreasing the socket reference count
      after usage.
      
      Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
      the now-obsolete comment about the "socket cache".
      
      The newly added code is derived from the current ipv4 code for the
      similar path.
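
      Below is a toy user-space model of the reference discipline involved
      (all names are illustrative, not the kernel's): the early-demux step
      takes a reference, and the fix pairs it with the put that was missing
      once the packet has been delivered.

      #include <stdatomic.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct sock_toy {
              atomic_int refcnt;
      };

      static struct sock_toy *sk_get(struct sock_toy *sk)
      {
              atomic_fetch_add(&sk->refcnt, 1);   /* demux takes a ref */
              return sk;
      }

      static void sk_put(struct sock_toy *sk)
      {
              /* Last put runs the "destructor"; a leaked put leaks the sock. */
              if (atomic_fetch_sub(&sk->refcnt, 1) == 1)
                      free(sk);
      }

      int main(void)
      {
              struct sock_toy *sk = malloc(sizeof(*sk));

              if (!sk)
                      return 1;
              atomic_init(&sk->refcnt, 1);

              struct sock_toy *demuxed = sk_get(sk); /* early-demux reference */
              /* ... deliver the packet using 'demuxed' ... */
              sk_put(demuxed); /* the put this fix adds after usage */
              sk_put(sk);      /* drop the original ref, sock is freed */
              return 0;
      }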
      
      v1 -> v2:
        fixed the __udp6_lib_rcv() return code for resubmission,
        as suggested by Eric
      Reported-by: Sam Edwards <CFSworks@gmail.com>
      Reported-by: Marc Haber <mh+netdev@zugschlus.de>
      Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 27 Jul 2017 (1 commit)
    • sctp: fix the check for _sctp_walk_params and _sctp_walk_errors · 6b84202c
      Authored by Xin Long
      Commit b1f5bfc2 ("sctp: don't dereference ptr before leaving
      _sctp_walk_{params, errors}()") tried to fix the issue that it
      may overstep the chunk end for _sctp_walk_{params, errors} with
      'chunk_end > offset(length) + sizeof(length)'.
      
      But it introduced a side effect: when processing INIT, the chunk is
      verified with 'param.v == chunk_end' after iterating all params via
      sctp_walk_params(). With the check 'chunk_end > offset(length) +
      sizeof(length)', the walk returns before the last param has been
      accessed, because the last param is usually the fwdtsn-supported
      param, whose size is 4, so 'chunk_end == offset(length) + sizeof(length)'.
      
      This is a serious issue that even prevents SCTP from completing its
      4-way handshake: a client always gets an abort when connecting to a
      server, due to the failure of INIT chunk verification on the server.
      
      The patch is to use 'chunk_end >= offset(length) + sizeof(length)'
      instead of 'chunk_end > offset(length) + sizeof(length)' for both
      _sctp_walk_params and _sctp_walk_errors.
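
      A self-contained sketch of the off-by-one (hypothetical TLV layout,
      not the kernel macro): with '<' the walk stops before a final 4-byte
      param whose header ends exactly at the chunk end, while '<=' still
      visits it.

      #include <stdint.h>
      #include <stdio.h>

      struct param_hdr { uint16_t type; uint16_t length; };

      static void walk(const uint8_t *buf, size_t len)
      {
              const uint8_t *end = buf + len;
              const uint8_t *pos = buf;

              /* '<=' (not '<'): a param ending exactly at 'end' is visited. */
              while (pos + sizeof(struct param_hdr) <= end) {
                      const struct param_hdr *p = (const void *)pos;

                      if (p->length < sizeof(*p))
                              break;          /* malformed, stop walking */
                      printf("param type=%u len=%u\n", p->type, p->length);
                      pos += p->length;       /* params assumed 4-byte padded */
              }
      }

      int main(void)
      {
              /* One 4-byte param filling the whole buffer: with '<' it
               * would be skipped and chunk verification would fail. */
              struct param_hdr p = { .type = 1, .length = 4 };
              walk((const uint8_t *)&p, sizeof(p));
              return 0;
      }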
      
      Fixes: b1f5bfc2 ("sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 26 Jul 2017 (1 commit)
    • udp: preserve head state for IP_CMSG_PASSSEC · dce4551c
      Authored by Paolo Abeni
      Paul Moore reported a SELinux/IP_PASSSEC regression
      caused by missing skb->sp at recvmsg() time. We need to
      preserve the skb head state to process the IP_CMSG_PASSSEC
      cmsg.
      
      With this commit we avoid releasing the skb head state in the BH
      when a secpath is attached to the current skb, and store the skb
      status (with/without head state) in the scratch area, so that we
      can access it at skb deallocation time without incurring cache-miss
      penalties.
      
      This also avoids misusing the skb CB for ipv6 packets, as was
      introduced by commit 0ddf3fb2 ("udp: preserve skb->dst if required
      for IP options processing").
      
      Also clean up the scratch-area helpers a bit, to reduce the code
      differences between 32-bit and 64-bit builds.
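
      A toy model of the scratch-area idea (the kernel uses skb->cb; names
      and layout here are illustrative only): record at enqueue time whether
      the head state was kept, then read that back at deallocation time
      without touching the skb head again.

      #include <stdbool.h>
      #include <stdio.h>
      #include <string.h>

      struct pkt {
              _Alignas(8) char cb[48];        /* scratch area, like skb->cb */
      };

      struct scratch {
              unsigned int truesize;
              bool has_head_state;            /* e.g. a secpath is attached */
      };

      static struct scratch *scratch_of(struct pkt *p)
      {
              return (struct scratch *)p->cb;
      }

      int main(void)
      {
              struct pkt p;

              memset(&p, 0, sizeof(p));

              /* enqueue path (BH): store the decision once */
              scratch_of(&p)->truesize = 760;
              scratch_of(&p)->has_head_state = true;

              /* deallocation path: no need to dereference the skb head */
              if (scratch_of(&p)->has_head_state)
                      printf("release head state, truesize=%u\n",
                             scratch_of(&p)->truesize);
              return 0;
      }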
      Reported-by: Paul Moore <paul@paul-moore.com>
      Fixes: 0a463c78 ("udp: avoid a cache miss on dequeue")
      Fixes: 0ddf3fb2 ("udp: preserve skb->dst if required for IP options processing")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Tested-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 25 Jul 2017 (3 commits)
  11. 20 Jul 2017 (2 commits)
  12. 19 Jul 2017 (2 commits)
    • xfrm: add xdst pcpu cache · ec30d78c
      Authored by Florian Westphal
      Retain the last used xfrm_dst in a pcpu cache.
      On the next request, reuse this dst if the policies are the same.
      
      The cache will not help with strict RR workloads as there is no hit.
      
      The cache packet-path part is reasonably small. The notifier part is
      needed so we do not add long hangs when a device is dismantled while
      some pcpu xdst still holds a reference. There are also calls to the
      flush operation when userspace deletes SAs, so modules can be removed.
      
      We need to run the dst_release on the correct CPU to avoid races with
      the packet path. This is done by adding a work_struct for each CPU and
      then doing the actual test/release on each affected CPU via
      schedule_work_on().
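
      A hedged kernel-style sketch of that flush pattern (the helper
      xfrm_clear_this_cpu_cache() is hypothetical; see net/xfrm/xfrm_policy.c
      for the real code):

      #include <linux/percpu.h>
      #include <linux/workqueue.h>

      static DEFINE_PER_CPU(struct work_struct, xfrm_pcpu_work);

      static void xfrm_pcpu_work_fn(struct work_struct *work)
      {
              /* Runs on the CPU owning the cached xdst, so the release
               * cannot race the packet path on that CPU. */
              xfrm_clear_this_cpu_cache();    /* hypothetical helper */
      }

      /* Each per-cpu work_struct needs INIT_WORK(..., xfrm_pcpu_work_fn)
       * at init time. Flushing then fans out to every CPU: */
      static void xfrm_flush_all_cpus(void)
      {
              int cpu;

              for_each_possible_cpu(cpu)
                      schedule_work_on(cpu, &per_cpu(xfrm_pcpu_work, cpu));
      }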
      
      Test results using 4 network namespaces and null encryption:
      
      ns1           ns2          -> ns3           -> ns4
      netperf -> xfrm/null enc   -> xfrm/null dec -> netserver
      
      what              TCP_STREAM    UDP_STREAM    UDP_RR
      Flow cache:       14644.61      294.35        327231.64
      No flow cache:    14349.81      242.64        202301.72
      Pcpu cache:       14629.70      292.21        205595.22
      
      UDP tests used 64-byte packets and ran for one minute each; each
      value is the average over ten iterations.
      
      'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
      series but without this patch.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: remove flow cache · 09c75704
      Authored by Florian Westphal
      After the RCU conversions, the performance degradation in forwarding
      tests isn't that noticeable anymore.
      
      See next patch for some numbers.
      
      A follow-up patch could then also remove genid from the policies,
      as we do not cache bundles anymore.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 17 Jul 2017 (2 commits)
    • inetpeer: remove AVL implementation in favor of RB tree · b145425f
      Authored by Eric Dumazet
      As discussed in Faro during Netfilter Workshop 2017, RB trees can be
      used with RCU, using a seqlock.
      
      Note that net/rxrpc/conn_service.c is already using this.
      
      This patch converts inetpeer from an AVL tree to an RB tree, since it
      allows us to remove the private AVL implementation in favor of the
      shared RB code.
      
      $ size net/ipv4/inetpeer.before net/ipv4/inetpeer.after
         text    data     bss     dec     hex filename
         3195      40     128    3363     d23 net/ipv4/inetpeer.before
         1562      24       0    1586     632 net/ipv4/inetpeer.after
      
      The same technique can be used to speed up
      net/netfilter/nft_set_rbtree.c (removing rwlock contention in the fast path).
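
      The read side follows the standard RCU-plus-seqlock retry pattern; a
      hedged kernel-style sketch (rb_lookup_rcu() is a hypothetical helper,
      see net/ipv4/inetpeer.c for the real lookup):

      unsigned int seq;
      struct inet_peer *p;

      rcu_read_lock();
      do {
              seq = read_seqbegin(&base->lock);   /* snapshot writer sequence */
              p = rb_lookup_rcu(base, daddr);     /* RCU-safe rb-tree walk */
      } while (read_seqretry(&base->lock, seq));  /* retry if a writer ran */
      rcu_read_unlock();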
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/unix: drop obsolete fd-recursion limits · 27eac47b
      Authored by David Herrmann
      All unix sockets now account inflight FDs to the respective sender.
      This was introduced in:
      
          commit 712f4aad
          Author: willy tarreau <w@1wt.eu>
          Date:   Sun Jan 10 07:54:56 2016 +0100
      
              unix: properly account for FDs passed over unix sockets
      
      and further refined in:
      
          commit 415e3d3e
          Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
          Date:   Wed Feb 3 02:11:03 2016 +0100
      
              unix: correctly track in-flight fds in sending process user_struct
      
      Hence, regardless of the stacking depth of FDs, the total number of
      inflight FDs is limited, and accounted. There is no known way for a
      local user to exceed those limits or exploit the accounting.
      
      Furthermore, the GC logic is independent of the recursion/stacking depth
      as well. It solely depends on the total number of inflight FDs,
      regardless of their layout.
      
      Lastly, the current `recursion_level' suffers from a TOCTOU race,
      since it checks and inherits depths only at queue time. If we
      consider `A<-B' to mean `queue B on A', the following sequence
      easily circumvents the recursion level:
      
          A<-B
             B<-C
                C<-D
                   ...
                     Y<-Z
      
      resulting in:
      
          A<-B<-C<-...<-Z
      
      With all of this in mind, let's drop the recursion limit. It has no
      additional security value anymore. On the contrary, it randomly
      confuses message brokers that try to forward file descriptors, since
      any sendmsg(2) call can fail spuriously with ETOOMANYREFS if a client
      maliciously modifies the FD while it is in flight.
      
      Cc: Alban Crequy <alban.crequy@collabora.co.uk>
      Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Reviewed-by: Tom Gundersen <teg@jklm.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>