1. 03 Aug, 2017 1 commit
  2. 02 Aug, 2017 4 commits
  3. 01 Aug, 2017 4 commits
  4. 31 Jul, 2017 1 commit
    • net netlink: Add new type NLA_BITFIELD32 · 64c83d83
      Committed by Jamal Hadi Salim
      Generic bitflags attribute content sent to the kernel by user.
      With this netlink attr type the user can either set or unset a
      flag in the kernel.
      
      The value is a bitmap that defines the bit values being set
      The selector is a bitmask that defines which value bit is to be
      considered.
      
      A check is made to ensure that a kernel subsystem only accepts
      bitflags the kernel already knows about, i.e.
      if the user tries to set a bit flag that is not understood then
      _it will be rejected_.
      
      In the most basic form, the user specifies the attribute policy as:
      [ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },
      
      where myvalidflags is the bit mask of the flags the kernel understands.
      
      If the user _does not_ provide myvalidflags then the attribute will
      also be rejected.
      
      Examples:
      value = 0x0, and selector = 0x1
      implies we are selecting bit 1 and we want to set its value to 0.
      
      value = 0x2, and selector = 0x2
      implies we are selecting bit 2 and we want to set its value to 1.
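The value/selector semantics described above can be sketched in plain C. This is a minimal illustration, not the kernel's implementation: `struct demo_bitfield32`, `demo_bitfield32_valid()` and `demo_bitfield32_apply()` are hypothetical names, and the exact set of checks performed by the kernel is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the value/selector pair carried by the attribute. */
struct demo_bitfield32 {
	uint32_t value;    /* bit values to apply */
	uint32_t selector; /* which bits of value are to be considered */
};

/* Reject a request that touches bits outside valid_mask (assumed rule). */
bool demo_bitfield32_valid(struct demo_bitfield32 bf, uint32_t valid_mask)
{
	if (bf.selector & ~valid_mask)
		return false; /* selects a flag that is not understood */
	if (bf.value & ~bf.selector)
		return false; /* sets a bit that was not selected */
	return true;
}

/* Selected bits take the given value; unselected bits are left alone. */
uint32_t demo_bitfield32_apply(uint32_t flags, struct demo_bitfield32 bf)
{
	return (flags & ~bf.selector) | (bf.value & bf.selector);
}
```

With `value = 0x0, selector = 0x1` the lowest flag is cleared, and with `value = 0x2, selector = 0x2` the second flag is set, matching the two examples above.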
      Suggested-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 30 Jul, 2017 1 commit
    • udp6: fix socket leak on early demux · c9f2c1ae
      Committed by Paolo Abeni
      When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
      sk reference is retrieved and used, but the relevant reference
      count is leaked and the socket destructor is never called.
      Beyond leaking the sk memory, if there are pending UDP packets
      in the receive queue, even the related accounted memory is leaked.
      
      In the long run, this will cause persistent forward allocation errors
      and no UDP skbs (both ipv4 and ipv6) will be able to reach the
      user-space.
      
      Fix this by explicitly accessing the early demux reference before
      the lookup, and properly decreasing the socket reference count
      after usage.
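The leak pattern can be illustrated with a toy reference counter. This is only a sketch under stated assumptions: `struct demo_sk`, `demo_sk_hold()`, `demo_sk_put()` and the two receive functions are hypothetical and do not reproduce the kernel's UDP code path.

```c
#include <stdbool.h>

/* Hypothetical socket with a bare reference count. */
struct demo_sk {
	int refcnt;
	bool destroyed; /* set when the destructor would have run */
};

void demo_sk_hold(struct demo_sk *sk)
{
	sk->refcnt++;
}

void demo_sk_put(struct demo_sk *sk)
{
	if (--sk->refcnt == 0)
		sk->destroyed = true;
}

/* Leaky receive path: takes a reference for the packet and never drops it. */
void demo_rcv_leaky(struct demo_sk *sk)
{
	demo_sk_hold(sk);
	/* ... deliver the packet ... */
}

/* Balanced receive path: the reference is released after usage. */
void demo_rcv_fixed(struct demo_sk *sk)
{
	demo_sk_hold(sk);
	/* ... deliver the packet ... */
	demo_sk_put(sk);
}
```

After one packet through the leaky path, the owner's final `demo_sk_put()` no longer reaches zero, so the destructor never runs.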
      
      Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
      the now obsoleted comment about "socket cache".
      
      The newly added code is derived from the current ipv4 code for the
      similar path.
      
      v1 -> v2:
        fixed the __udp6_lib_rcv() return code for resubmission,
        as suggested by Eric
      Reported-by: Sam Edwards <CFSworks@gmail.com>
      Reported-by: Marc Haber <mh+netdev@zugschlus.de>
      Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 27 Jul, 2017 1 commit
    • sctp: fix the check for _sctp_walk_params and _sctp_walk_errors · 6b84202c
      Committed by Xin Long
      Commit b1f5bfc2 ("sctp: don't dereference ptr before leaving
      _sctp_walk_{params, errors}()") tried to fix the issue that it
      may overstep the chunk end for _sctp_walk_{params, errors} with
      'chunk_end > offset(length) + sizeof(length)'.
      
      But it introduced a side effect: When processing INIT, it verifies
      the chunks with 'param.v == chunk_end' after iterating all params
      by sctp_walk_params(). With the check 'chunk_end > offset(length)
      + sizeof(length)', the walk returns before the last param has been
      accessed, because the last param is usually the fwdtsn supported
      param, whose size is 4, so 'chunk_end == offset(length) + sizeof(length)'.
      
      This is a serious issue: SCTP cannot even complete the four-way
      handshake. The client always gets an ABORT when connecting to the
      server, because INIT chunk verification fails on the server.
      
      The patch uses 'chunk_end >= offset(length) + sizeof(length)'
      instead of 'chunk_end > offset(length) + sizeof(length)' for both
      _sctp_walk_params and _sctp_walk_errors.
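The boundary case can be reproduced with a small stand-alone walker. This is a sketch, not the kernel macro: `demo_walk_params()` and its `inclusive` switch are hypothetical, and a parameter is modelled as a 2-byte type followed by a 2-byte big-endian length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HDR_LEN 4u /* 2-byte type + 2-byte length */

/* Read the big-endian length field of the parameter starting at off. */
static size_t demo_param_len(const uint8_t *buf, size_t off)
{
	return ((size_t)buf[off + 2] << 8) | buf[off + 3];
}

/*
 * Walk the parameters in buf[0..end) and return the offset where the walk
 * stopped. A caller verifying the chunk expects the result to equal end.
 * inclusive selects whether the length field may end exactly at end.
 */
size_t demo_walk_params(const uint8_t *buf, size_t end, bool inclusive)
{
	size_t off = 0;

	for (;;) {
		/* offset(length) + sizeof(length) for this parameter */
		size_t len_end = off + DEMO_HDR_LEN;
		bool fits = inclusive ? len_end <= end : len_end < end;
		size_t len;

		if (!fits)
			break;
		len = demo_param_len(buf, off);
		if (len < DEMO_HDR_LEN || len > end - off)
			break;
		off += (len + 3) & ~(size_t)3; /* pad to 4 bytes */
	}
	return off;
}
```

For a chunk whose last parameter is exactly 4 bytes long, the strict comparison stops one parameter early and the walk never reaches the chunk end, while the inclusive comparison does.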
      
      Fixes: b1f5bfc2 ("sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 26 Jul, 2017 1 commit
    • udp: preserve head state for IP_CMSG_PASSSEC · dce4551c
      Committed by Paolo Abeni
      Paul Moore reported a SELinux/IP_PASSSEC regression
      caused by missing skb->sp at recvmsg() time. We need to
      preserve the skb head state to process the IP_CMSG_PASSSEC
      cmsg.
      
      With this commit we avoid releasing the skb head state in the
      BH even if a secpath is attached to the current skb, and store
      the skb status (with/without head states) in the scratch area,
      so that we can access it at skb deallocation time without
      incurring cache-miss penalties.
      
      This also avoids misusing the skb CB for ipv6 packets,
      as introduced by the commit 0ddf3fb2 ("udp: preserve
      skb->dst if required for IP options processing").
      
      Clean up the scratch area helpers implementation a bit, to
      reduce the code differences between 32-bit and 64-bit builds.
      Reported-by: Paul Moore <paul@paul-moore.com>
      Fixes: 0a463c78 ("udp: avoid a cache miss on dequeue")
      Fixes: 0ddf3fb2 ("udp: preserve skb->dst if required for IP options processing")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Tested-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 25 Jul, 2017 3 commits
  9. 20 Jul, 2017 2 commits
  10. 19 Jul, 2017 2 commits
    • xfrm: add xdst pcpu cache · ec30d78c
      Committed by Florian Westphal
      Retain the last used xfrm_dst in a per-CPU (pcpu) cache.
      On next request, reuse this dst if the policies are the same.
      
      The cache will not help with strict RR workloads as there is no hit.
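The idea of a single-slot per-CPU cache can be sketched as follows. All names here (`demo_xdst`, `demo_pcpu`, `demo_cache_lookup()`, `demo_cache_store()`) are hypothetical, the CPU is passed explicitly, and reference counting and flushing are left out, so this is an illustration of the lookup rule only.

```c
#include <stddef.h>

#define DEMO_NR_CPUS 4

/* Hypothetical stand-in for a cached bundle, keyed by a policy id. */
struct demo_xdst {
	int policy_id;
};

/* One slot per CPU holding the most recently used entry. */
static struct demo_xdst *demo_pcpu[DEMO_NR_CPUS];

/* Reuse the cached entry only when it was built for the same policy. */
struct demo_xdst *demo_cache_lookup(unsigned int cpu, int policy_id)
{
	struct demo_xdst *x = demo_pcpu[cpu];

	return (x && x->policy_id == policy_id) ? x : NULL;
}

/* Remember the entry that was just used on this CPU. */
void demo_cache_store(unsigned int cpu, struct demo_xdst *x)
{
	demo_pcpu[cpu] = x;
}
```

Because each CPU keeps only one entry, a workload that switches policy on every lookup replaces the slot each time and never gets a hit.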
      
      The cache packet-path part is reasonably small, the notifier part is
      needed so we do not add long hangs when a device is dismantled but some
      pcpu xdst still holds a reference, there are also calls to the flush
      operation when userspace deletes SAs so modules can be removed
      (there is no hit.
      
      We need to run the dst_release on the correct cpu to avoid races with
      packet path.  This is done by adding a work_struct for each cpu and then
      doing the actual test/release on each affected cpu via schedule_work_on().
      
      Test results using 4 network namespaces and null encryption:
      
      ns1           ns2          -> ns3           -> ns4
      netperf -> xfrm/null enc   -> xfrm/null dec -> netserver
      
      what            TCP_STREAM   UDP_STREAM   UDP_RR
      Flow cache:     14644.61     294.35       327231.64
      No flow cache:  14349.81     242.64       202301.72
      Pcpu cache:     14629.70     292.21       205595.22
      
      UDP tests used 64byte packets, tests ran for one minute each,
      value is average over ten iterations.
      
      'Flow cache' is 'net-next', 'No flow cache' is net-next plus this
      series but without this patch.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: remove flow cache · 09c75704
      Committed by Florian Westphal
      After the RCU conversions, the performance degradation in forward
      tests isn't that noticeable anymore.
      
      See next patch for some numbers.
      
      A followup patch could then also remove genid from the policies
      as we do not cache bundles anymore.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 17 Jul, 2017 6 commits
    • inetpeer: remove AVL implementation in favor of RB tree · b145425f
      Committed by Eric Dumazet
      As discussed in Faro during Netfilter Workshop 2017, RB trees can be
      used with RCU, using a seqlock.
      
      Note that net/rxrpc/conn_service.c is already using this.
      
      This patch converts inetpeer from an AVL tree to an RB tree, which allows
      removing the private AVL implementation in favor of shared RB code.
      
      $ size net/ipv4/inetpeer.before net/ipv4/inetpeer.after
         text    data     bss     dec     hex filename
         3195      40     128    3363     d23 net/ipv4/inetpeer.before
         1562      24       0    1586     632 net/ipv4/inetpeer.after
      
      The same technique can be used to speed up
      net/netfilter/nft_set_rbtree.c (removing rwlock contention in fast path)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/unix: drop obsolete fd-recursion limits · 27eac47b
      Committed by David Herrmann
      All unix sockets now account inflight FDs to the respective sender.
      This was introduced in:
      
          commit 712f4aad
          Author: willy tarreau <w@1wt.eu>
          Date:   Sun Jan 10 07:54:56 2016 +0100
      
              unix: properly account for FDs passed over unix sockets
      
      and further refined in:
      
          commit 415e3d3e
          Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
          Date:   Wed Feb 3 02:11:03 2016 +0100
      
              unix: correctly track in-flight fds in sending process user_struct
      
      Hence, regardless of the stacking depth of FDs, the total number of
      inflight FDs is limited, and accounted. There is no known way for a
      local user to exceed those limits or exploit the accounting.
      
      Furthermore, the GC logic is independent of the recursion/stacking depth
      as well. It solely depends on the total number of inflight FDs,
      regardless of their layout.
      
      Lastly, the current `recursion_level' suffers a TOCTOU race, since it
      checks and inherits depths only at queue time. If we consider `A<-B' to
      mean `queue-B-on-A', the following sequence circumvents the recursion
      level easily:
      
          A<-B
             B<-C
                C<-D
                   ...
                     Y<-Z
      
      resulting in:
      
          A<-B<-C<-...<-Z
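The race can be modelled in a few lines of C. This is a simplified sketch with hypothetical names (`demo_usock`, `demo_queue()`, `demo_real_depth()`, `DEMO_MAX_DEPTH`); it only captures the rule that the depth is checked and inherited at queue time, which is an assumption drawn from the description above rather than the kernel's code.

```c
#include <stddef.h>

#define DEMO_MAX_DEPTH 4

/* Hypothetical socket: remembers a depth and the single fd queued on it. */
struct demo_usock {
	int recursion_level;
	struct demo_usock *queued;
};

/* Queue fd on target; the depth is only checked and inherited right now. */
int demo_queue(struct demo_usock *target, struct demo_usock *fd)
{
	if (fd->recursion_level >= DEMO_MAX_DEPTH)
		return -1; /* would be ETOOMANYREFS */
	if (target->recursion_level < fd->recursion_level + 1)
		target->recursion_level = fd->recursion_level + 1;
	target->queued = fd;
	return 0;
}

/* Actual nesting depth, found by following the queued chain. */
int demo_real_depth(const struct demo_usock *s)
{
	int depth = 0;

	while (s->queued) {
		depth++;
		s = s->queued;
	}
	return depth;
}
```

Queueing in the order A<-B, B<-C, and so on means every socket is still at depth 0 when it is queued, so the check always passes even though the resulting chain is deeper than the limit. Building the same chain from the innermost socket outwards is rejected once the limit is reached.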
      
      With all of this in mind, let's drop the recursion limit. It no longer
      has any additional security value. On the contrary, it randomly
      confuses message brokers that try to forward file-descriptors, since
      any sendmsg(2) call can fail spuriously with ETOOMANYREFS if a client
      maliciously modifies the FD while inflight.
      
      Cc: Alban Crequy <alban.crequy@collabora.co.uk>
      Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Reviewed-by: Tom Gundersen <teg@jklm.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: remove the typedef sctp_hmac_algo_param_t · 1474774a
      Committed by Xin Long
      This patch removes the typedef sctp_hmac_algo_param_t and
      replaces it with struct sctp_hmac_algo_param wherever the
      typedef was used.
      
      It also uses sizeof(variable) instead of sizeof(type).
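The sizeof(variable) style can be shown with a small stand-alone example. The `demo_` structures below are hypothetical and are not the SCTP definitions; the point is only that `sizeof(*p)` keeps tracking the pointed-to type if that type is later renamed or changed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter layout, used only for this illustration. */
struct demo_paramhdr {
	uint16_t type;
	uint16_t length;
};

struct demo_hmac_algo_param {
	struct demo_paramhdr param_hdr;
	uint16_t hmac_ids[2];
};

/* Clear the parameter; the size comes from the variable, not the type name. */
size_t demo_clear_param(struct demo_hmac_algo_param *p)
{
	memset(p, 0, sizeof(*p)); /* rather than sizeof(struct demo_hmac_algo_param) */
	return sizeof(*p);
}
```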
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: remove the typedef sctp_chunks_param_t · a762a9d9
      Committed by Xin Long
      This patch removes the typedef sctp_chunks_param_t and
      replaces it with struct sctp_chunks_param wherever the
      typedef was used.
      
      It also uses sizeof(variable) instead of sizeof(type).
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: remove the typedef sctp_random_param_t · b02db702
      Committed by Xin Long
      This patch removes the typedef sctp_random_param_t and
      replaces it with struct sctp_random_param wherever the
      typedef was used.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6: fix PMTU discovery when using /127 subnets · ccdb2d17
      Committed by Vincent Bernat
      The definition of an "anycast destination address" has been tweaked as a
      side-effect of commit 2647a9b0 ("ipv6: Remove external dependency on
      rt6i_gateway and RTF_ANYCAST"). The first address of a point-to-point
      /127 subnet is now considered an anycast address. This prevents
      ICMPv6 errors from being returned to a sender in such a subnet and
      breaks PMTU discovery.
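Why the first address of a /127 is affected can be sketched with a host-bits test. Both predicates below are hypothetical, and the second one encodes an assumption about the shape of the fix (exempting prefixes of length 127 or longer) that is not stated in the commit text.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when every bit after the first prefix_len bits of addr is zero. */
bool demo_host_bits_zero(const uint8_t addr[16], unsigned int prefix_len)
{
	for (unsigned int bit = prefix_len; bit < 128; bit++) {
		if (addr[bit / 8] & (0x80u >> (bit % 8)))
			return false;
	}
	return true;
}

/* Naive rule: the first address of any subnet is treated as anycast. */
bool demo_anycast_naive(const uint8_t addr[16], unsigned int prefix_len)
{
	return demo_host_bits_zero(addr, prefix_len);
}

/* Assumed fix: point-to-point style prefixes (/127 and longer) are exempt. */
bool demo_anycast_p2p_aware(const uint8_t addr[16], unsigned int prefix_len)
{
	return prefix_len < 127 && demo_host_bits_zero(addr, prefix_len);
}
```

Under the naive rule 2001:db8:1::12/127, the sender address in the reproducer, has its single host bit clear and is classified as anycast, while 2001:db8:1::13/127 is not.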
      
      This can be reproduced with:
      
          ip link add name out6 type veth peer name in6
          ip link add name out7 type veth peer name in7
          ip link set mtu 1400 dev out7
          ip link set mtu 1400 dev in7
          ip netns add next-hop
          ip netns add next-next-hop
          ip link set netns next-hop dev in6
          ip link set netns next-hop dev out7
          ip link set netns next-next-hop dev in7
          ip link set up dev out6
          ip addr add 2001:db8:1::12/127 dev out6
          ip netns exec next-hop ip link set up dev in6
          ip netns exec next-hop ip link set up dev out7
          ip netns exec next-hop ip addr add 2001:db8:1::13/127 dev in6
          ip netns exec next-hop ip addr add 2001:db8:1::14/127 dev out7
          ip netns exec next-hop ip route add default via 2001:db8:1::15
          ip netns exec next-hop sysctl -qw net.ipv6.conf.all.forwarding=1
          ip netns exec next-next-hop ip link set up dev in7
          ip netns exec next-next-hop ip addr add 2001:db8:1::15/127 dev in7
          ip netns exec next-next-hop ip addr add 2001:db8:1::50/128 dev in7
          ip netns exec next-next-hop ip route add default via 2001:db8:1::14
          ip netns exec next-next-hop sysctl -qw net.ipv6.conf.all.forwarding=1
          ip route add 2001:db8:1::48/123 via 2001:db8:1::13
          sleep 4
          ping -M do -s 1452 -c 3 2001:db8:1::50 || true
          ip route get 2001:db8:1::50
      
      Before the patch, we get:
      
          2001:db8:1::50 from :: via 2001:db8:1::13 dev out6 src 2001:db8:1::12 metric 1024  pref medium
      
      After the patch, we get:
      
          2001:db8:1::50 via 2001:db8:1::13 dev out6 src 2001:db8:1::12 metric 0
              cache  expires 578sec mtu 1400 pref medium
      
      Fixes: 2647a9b0 ("ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST")
      Signed-off-by: Vincent Bernat <vincent@bernat.im>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 16 Jul, 2017 1 commit
    • sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}() · b1f5bfc2
      Committed by Alexander Potapenko
      If the length field of the iterator (|pos.p| or |err|) is past the end
      of the chunk, we shouldn't access it.
      
      This bug has been detected by KMSAN. For the following pair of system
      calls:
      
        socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3
        sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0),
               inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0,
               sin6_scope_id=0}, 28) = 1
      
      the tool has reported a use of uninitialized memory:
      
        ==================================================================
        BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0
        CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
        01/01/2011
        Call Trace:
         <IRQ>
         __dump_stack lib/dump_stack.c:16
         dump_stack+0x172/0x1c0 lib/dump_stack.c:52
         kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
         __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
         __sctp_rcv_init_lookup net/sctp/input.c:1074
         __sctp_rcv_lookup_harder net/sctp/input.c:1233
         __sctp_rcv_lookup net/sctp/input.c:1255
         sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170
         sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984
         ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
         NF_HOOK ./include/linux/netfilter.h:257
         ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
         dst_input ./include/net/dst.h:492
         ip6_rcv_finish net/ipv6/ip6_input.c:69
         NF_HOOK ./include/linux/netfilter.h:257
         ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
         __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
         __netif_receive_skb net/core/dev.c:4246
         process_backlog+0x667/0xba0 net/core/dev.c:4866
         napi_poll net/core/dev.c:5268
         net_rx_action+0xc95/0x1590 net/core/dev.c:5333
         __do_softirq+0x485/0x942 kernel/softirq.c:284
         do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
         </IRQ>
         do_softirq kernel/softirq.c:328
         __local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181
         local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31
         rcu_read_unlock_bh ./include/linux/rcupdate.h:931
         ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124
         ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149
         NF_HOOK_COND ./include/linux/netfilter.h:246
         ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163
         dst_output ./include/net/dst.h:486
         NF_HOOK ./include/linux/netfilter.h:257
         ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261
         sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225
         sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632
         sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
         sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
         sctp_side_effects net/sctp/sm_sideeffect.c:1773
         sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
         sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
         sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
         inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
         sock_sendmsg_nosec net/socket.c:633
         sock_sendmsg net/socket.c:643
         SYSC_sendto+0x608/0x710 net/socket.c:1696
         SyS_sendto+0x8a/0xb0 net/socket.c:1664
         do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
         entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
        RIP: 0033:0x401133
        RSP: 002b:00007fff6d99cd38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
        RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000401133
        RDX: 0000000000000001 RSI: 0000000000494088 RDI: 0000000000000003
        RBP: 00007fff6d99cd90 R08: 00007fff6d99cd50 R09: 000000000000001c
        R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
        R13: 00000000004063d0 R14: 0000000000406460 R15: 0000000000000000
        origin:
         save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
         kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
         kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
         kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211
         slab_alloc_node mm/slub.c:2743
         __kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351
         __kmalloc_reserve net/core/skbuff.c:138
         __alloc_skb+0x26b/0x840 net/core/skbuff.c:231
         alloc_skb ./include/linux/skbuff.h:933
         sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570
         sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
         sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
         sctp_side_effects net/sctp/sm_sideeffect.c:1773
         sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
         sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
         sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
         inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
         sock_sendmsg_nosec net/socket.c:633
         sock_sendmsg net/socket.c:643
         SYSC_sendto+0x608/0x710 net/socket.c:1696
         SyS_sendto+0x8a/0xb0 net/socket.c:1664
         do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
         return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
        ==================================================================
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 14 Jul, 2017 1 commit
  14. 13 Jul, 2017 1 commit
  15. 11 Jul, 2017 1 commit
    • 9p: Implement show_options · c4fac910
      Committed by David Howells
      Implement the show_options superblock op for 9p as part of a bid to get
      rid of s_options and generic_show_options() to make it easier to implement
      a context-based mount where the mount options can be passed individually
      over a file descriptor.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Eric Van Hensbergen <ericvh@gmail.com>
      cc: Ron Minnich <rminnich@sandia.gov>
      cc: Latchesar Ionkov <lucho@ionkov.net>
      cc: v9fs-developer@lists.sourceforge.net
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  16. 08 Jul, 2017 1 commit
  17. 06 Jul, 2017 1 commit
  18. 05 Jul, 2017 8 commits