1. 17 Jul, 2016 1 commit
    • P
      vlan: use a valid default mtu value for vlan over macsec · 18d3df3e
      Paolo Abeni committed
      macsec can't cope with mtu-sized frames that need vlan tag insertion, and
      vlan devices set their default mtu equal to the underlying dev's one.
      By default, vlan over macsec devices use an invalid mtu, dropping
      all the large packets.
      This patch adds a netif helper to check if an upper vlan device
      needs mtu reduction. The helper is used during vlan device
      initialization to set a valid default and during mtu updating to
      forbid invalid, too big, mtu values.
      The helper currently only checks if the lower dev is a macsec device;
      if we get more users, we need to update only the helper (possibly
      reserving an additional IFF bit).
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      18d3df3e
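The MTU logic described in this commit can be sketched in user space as follows. This is a minimal model, not the kernel code: `struct dev_model` and the function names are hypothetical stand-ins, and the sketch assumes the reduction equals one 4-byte 802.1Q tag and that the lower device's MTU is at least that large.

```c
#include <errno.h>
#include <stdbool.h>

#define VLAN_HLEN 4 /* size of one 802.1Q tag in bytes */

/* Hypothetical, heavily simplified stand-in for a network device. */
struct dev_model {
    unsigned int mtu;
    bool is_macsec;
};

/* Helper: does a vlan device stacked on 'lower' need a reduced mtu?
 * Only macsec is checked here, mirroring the commit message. */
static bool reduces_vlan_mtu(const struct dev_model *lower)
{
    return lower->is_macsec;
}

/* Largest mtu a vlan device on top of 'lower' may use.
 * Assumes lower->mtu >= VLAN_HLEN. */
static unsigned int vlan_max_mtu(const struct dev_model *lower)
{
    unsigned int max_mtu = lower->mtu;

    if (reduces_vlan_mtu(lower))
        max_mtu -= VLAN_HLEN;
    return max_mtu;
}

/* Default mtu chosen when the vlan device is initialized. */
static unsigned int vlan_default_mtu(const struct dev_model *lower)
{
    return vlan_max_mtu(lower);
}

/* mtu update: forbid values that are too big for the lower device. */
static int vlan_change_mtu(struct dev_model *vlan,
                           const struct dev_model *lower,
                           unsigned int new_mtu)
{
    if (new_mtu > vlan_max_mtu(lower))
        return -ERANGE;
    vlan->mtu = new_mtu;
    return 0;
}
```

Keeping the check in one helper means a future lower-device type that also needs headroom would only require a change to `reduces_vlan_mtu()`.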
  2. 16 Jul, 2016 1 commit
    • J
      tcp: enable per-socket rate limiting of all 'challenge acks' · 083ae308
      Jason Baron committed
      The per-socket rate limit for 'challenge acks' was introduced in the
      context of limiting ack loops:
      
      commit f2b2c582 ("tcp: mitigate ACK loops for connections as tcp_sock")
      
      And I think it can be extended to rate limit all 'challenge acks' on a
      per-socket basis.
      
      Since we have the global tcp_challenge_ack_limit, this patch allows for
      tcp_challenge_ack_limit to be set to a large value and effectively rely on
      the per-socket limit, or set tcp_challenge_ack_limit to a lower value and
      still prevent a single connection from consuming the entire challenge ack
      quota.
      
      It further moves in the direction of eliminating the global limit at some
      point, as Eric Dumazet has suggested. This is a follow-up to:
      Subject: tcp: make challenge acks less predictable
      
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Yue Cao <ycao009@ucr.edu>
      Signed-off-by: Jason Baron <jbaron@akamai.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      083ae308
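The interplay between the per-socket limit and the global quota can be illustrated with a small user-space model. Everything here is an assumption made for illustration: the struct and function names are hypothetical, the 500 ms per-socket gap is an arbitrary value, and the global quota is reduced to a simple counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimum gap between challenge ACKs on one socket. */
#define PER_SOCKET_GAP_MS 500

/* Hypothetical per-socket state. */
struct sock_model {
    uint64_t last_challenge_ms; /* time of the last permitted challenge ACK */
    bool sent_before;           /* false until the first one is permitted */
};

/* Hypothetical global (host-wide) quota for the current period. */
struct global_budget {
    unsigned int remaining;
};

/* Per-socket check: at most one challenge ACK per PER_SOCKET_GAP_MS. */
static bool per_socket_allows(struct sock_model *sk, uint64_t now_ms)
{
    if (sk->sent_before &&
        now_ms - sk->last_challenge_ms < PER_SOCKET_GAP_MS)
        return false;
    sk->last_challenge_ms = now_ms;
    sk->sent_before = true;
    return true;
}

/* The per-socket check runs first, so a rate-limited connection
 * never consumes any of the shared quota. */
static bool may_send_challenge_ack(struct sock_model *sk,
                                   struct global_budget *g,
                                   uint64_t now_ms)
{
    if (!per_socket_allows(sk, now_ms))
        return false;
    if (g->remaining == 0)
        return false;
    g->remaining--;
    return true;
}
```

With this ordering, a single noisy connection is throttled before it reaches the global counter, which is why the global limit can be raised without letting one socket drain it.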
  3. 14 Jul, 2016 2 commits
    • W
      dccp: limit sk_filter trim to payload · 4f0c40d9
      Willem de Bruijn committed
      DCCP verifies packet integrity, including length, at initial rcv in
      dccp_invalid_packet, and later pulls headers in dccp_enqueue_skb.
      
      A call to sk_filter in-between can cause __skb_pull to wrap skb->len.
      skb_copy_datagram_msg interprets this as a negative value, so
      (correctly) fails with EFAULT. The negative length is reported in
      ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.
      
      Introduce an sk_receive_skb variant that caps how small a filter
      program can trim packets, and call this in dccp with the header
      length. Excessively trimmed packets are now processed normally and
      queued for reception as 0B payloads.
      
      Fixes: 7c657876 ("[DCCP]: Initial implementation")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f0c40d9
    • W
      rose: limit sk_filter trim to payload · f4979fce
      Willem de Bruijn committed
      Sockets can have a filter program attached that drops or trims
      incoming packets based on the filter program return value.
      
      Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
      verifies this on arrival in rose_route_frame and unconditionally pulls
      the bytes in rose_recvmsg. The filter can trim packets to below this
      value in-between, causing pull to fail, leaving the partial header at
      the time of skb_copy_datagram_msg.
      
      Place a lower bound on the size to which sk_filter may trim packets
      by introducing sk_filter_trim_cap and call this for rose packets.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4979fce
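The idea of capping how far a filter may trim a packet, shared by the dccp and rose commits above, can be sketched as below. This is a user-space model under stated assumptions: `filter_fn`, `filter_trim_cap`, and the two example filters are hypothetical, a filter is modeled as a function returning the number of bytes to keep (0 meaning drop), and a packet is reduced to a buffer and a length.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical filter: returns how many bytes to keep, 0 means drop. */
typedef size_t (*filter_fn)(const unsigned char *data, size_t len);

/* Run 'filter' on a packet and trim it, but never below 'cap' bytes.
 * Returns 0 and updates *len, or -EPERM if the filter drops the packet. */
static int filter_trim_cap(filter_fn filter, const unsigned char *data,
                           size_t *len, size_t cap)
{
    size_t keep = filter(data, *len);

    if (keep == 0)
        return -EPERM;   /* filter rejected the packet */
    if (keep < cap)
        keep = cap;      /* protect the protocol header */
    if (keep < *len)
        *len = keep;     /* trimming only ever shortens the packet */
    return 0;
}

/* Example filters, for illustration only. */
static size_t keep_two_bytes(const unsigned char *data, size_t len)
{
    (void)data;
    (void)len;
    return 2;
}

static size_t drop_all(const unsigned char *data, size_t len)
{
    (void)data;
    (void)len;
    return 0;
}
```

A caller would pass its minimum header length as `cap`, so that a later unconditional header pull still finds the bytes it verified on arrival.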
  4. 12 Jul, 2016 8 commits
    • P
      netfilter: conntrack: skip clash resolution if nat is in place · 590b52e1
      Pablo Neira Ayuso committed
      The clash resolution is not easy to apply if the NAT table is
      registered. Even if no NAT rules are installed, the null-binding ensures
      that a unique tuple is used; thus, the packet that loses the race gets a
      different source port number, as described by:
      
      http://marc.info/?l=netfilter-devel&m=146818011604484&w=2
      
      Clash resolution with NAT is also problematic if address/port
      ranges are used, since the conntrack that wins the race may describe a
      different mangling than the one we may have applied earlier to the
      packet via nf_nat_setup_info().
      
      Fixes: 71d8c47f ("netfilter: conntrack: introduce clash resolution on insertion race")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
      590b52e1
    • J
      tipc: reset all unicast links when broadcast send link fails · 1fc07f3e
      Jon Paul Maloy committed
      In test situations with many nodes and a heavily stressed system we have
      observed that the transmission broadcast link may fail due to an
      excessive number of retransmissions of the same packet. In such
      situations we need to reset all unicast links to all peers, in order to
      reset and re-synchronize the broadcast link.
      
      In this commit, we add a new function tipc_bearer_reset_all() to be used
      in such situations. The function scans across all bearers and resets all
      their pertaining links.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1fc07f3e
    • J
      tipc: ensure correct broadcast send buffer release when peer is lost · a71eb720
      Jon Paul Maloy committed
      After a new receiver peer has been added to the broadcast transmission
      link, we allow immediate transmission of new broadcast packets, trusting
      that the new peer will not accept the packets until it has received the
      previously sent unicast broadcast initialization message. In the same
      way, the sender must not accept any acknowledges until it has itself
      received the broadcast initialization from the peer, as well as
      confirmation of the reception of its own initialization message.
      
      Furthermore, when a receiver peer goes down, the sender has to produce
      the missing acknowledges from the lost peer locally, in order to ensure
      correct release of the buffers that were expected to be acknowledged by
      the said peer.
      
      In a highly stressed system we have observed that contact with a peer
      may come up and be lost before the above mentioned broadcast
      initialization and confirmation have been received. This leads to the
      locally produced acknowledges being rejected, and the non-acknowledged
      buffers lingering in the broadcast link transmission queue until it
      fills up and the link goes into permanent congestion.
      
      In this commit, we remedy this by temporarily setting the corresponding
      broadcast receive link state to ESTABLISHED and the 'bc_peer_is_up'
      state to true before we issue the local acknowledges. This ensures that
      those acknowledges will always be accepted. The mentioned state values
      are restored immediately afterwards when the link is reset.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a71eb720
    • J
      tipc: extend broadcast link initialization criteria · 2d18ac4b
      Jon Paul Maloy committed
      At first contact between two nodes, an endpoint might sometimes have
      time to send out a LINK_PROTOCOL/STATE packet before it has received
      the broadcast initialization packet from the peer, i.e., before it has
      received a valid broadcast packet number to add to the 'bc_ack' field
      of the protocol message.
      
      This means that the peer endpoint will receive a protocol packet with an
      invalid broadcast acknowledge value of 0. Under unlucky circumstances
      this may lead to the original, already received acknowledge value being
      overwritten, so that the whole broadcast link goes stale after a while.
      
      We fix this by delaying the setting of the link field 'bc_peer_is_up'
      until we know that the peer really has received our own broadcast
      initialization message. The latter is always sent out as the first
      unicast message on a link, and always with sequence number 1. Because
      of this, we only need to look for a non-zero unicast acknowledge value
      in the arriving STATE messages, and once that is confirmed we know we
      are safe and can set the mentioned field. Before this moment, we must
      ignore all broadcast acknowledges from the peer.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d18ac4b
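The gating rule from this commit (ignore broadcast acknowledges until a non-zero unicast acknowledge shows that the peer has received our initialization message) can be modeled roughly as follows. This is only a sketch: `struct link_model` and `on_state_msg` are hypothetical, sequence-number wraparound is ignored, and the assumption that the broadcast acknowledge in the first confirming STATE message is already usable is a simplification of this model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one link endpoint. */
struct link_model {
    bool bc_peer_is_up;  /* peer confirmed our broadcast init message */
    uint16_t bc_acked;   /* last accepted broadcast acknowledge value */
};

/* Handle an arriving STATE message carrying a unicast acknowledge
 * ('ucast_ack') and a broadcast acknowledge ('bc_ack'). */
static void on_state_msg(struct link_model *l, uint16_t ucast_ack,
                         uint16_t bc_ack)
{
    if (!l->bc_peer_is_up) {
        /* The init message is always unicast sequence number 1, so any
         * non-zero unicast ack means the peer has received it. */
        if (ucast_ack == 0)
            return; /* too early: ignore the broadcast acknowledge */
        l->bc_peer_is_up = true;
    }
    l->bc_acked = bc_ack;
}
```

In this model an early STATE message with a zero `bc_ack` can no longer overwrite a valid, previously stored acknowledge value.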
    • S
      sock: ignore SCM_RIGHTS and SCM_CREDENTIALS in __sock_cmsg_send · 779f1ede
      Soheil Hassas Yeganeh committed
      Sergei Trofimovich reported that pulse audio sends SCM_CREDENTIALS
      as a control message to TCP. Since __sock_cmsg_send does not
      support SCM_RIGHTS and SCM_CREDENTIALS, it returns an error and
      hence breaks pulse audio over TCP.
      
      SCM_RIGHTS and SCM_CREDENTIALS are sent on the SOL_SOCKET layer
      but they semantically belong to SOL_UNIX. Since all
      cmsg-processing functions including sock_cmsg_send ignore control
      messages of other layers, it is best to ignore SCM_RIGHTS
      and SCM_CREDENTIALS for consistency (and also for fixing pulse
      audio over TCP).
      
      Fixes: c14ac945 ("sock: enable timestamping using control messages")
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reported-by: Sergei Trofimovich <slyfox@gentoo.org>
      Tested-by: Sergei Trofimovich <slyfox@gentoo.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      779f1ede
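The behavior change can be pictured as a dispatch function that accepts and silently ignores the two UNIX-layer message types instead of rejecting them. The sketch below is a user-space model: the enum, its values, and `cmsg_send_model` are hypothetical and do not correspond to the kernel's constants or signature.

```c
#include <errno.h>

/* Hypothetical control-message types; the values are arbitrary. */
enum cmsg_type_model {
    CM_RIGHTS,
    CM_CREDENTIALS,
    CM_TIMESTAMPING,
    CM_UNKNOWN
};

/* Process one socket-level control message.
 * Returns 0 on success or -EINVAL for an unsupported type. */
static int cmsg_send_model(enum cmsg_type_model type,
                           unsigned int *tsflags, unsigned int value)
{
    switch (type) {
    case CM_TIMESTAMPING:
        *tsflags = value; /* the one type this layer acts on here */
        return 0;
    case CM_RIGHTS:
    case CM_CREDENTIALS:
        /* Semantically these belong to UNIX sockets: ignore them
         * rather than failing the whole send. */
        return 0;
    default:
        return -EINVAL;
    }
}
```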
    • J
      ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space · 80610229
      Julian Anastasov committed
      Vegard Nossum reported a crash in fib_dump_info
      when nh_dev = NULL and fib_nhs == 1:
      
      Pid: 50, comm: netlink.exe Not tainted 4.7.0-rc5+
      RIP: 0033:[<00000000602b3d18>]
      RSP: 0000000062623890  EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 000000006261b800 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000024 RDI: 000000006245ba00
      RBP: 00000000626238f0 R08: 000000000000029c R09: 0000000000000000
      R10: 0000000062468038 R11: 000000006245ba00 R12: 000000006245ba00
      R13: 00000000625f96c0 R14: 00000000601e16f0 R15: 0000000000000000
      Kernel panic - not syncing: Kernel mode fault at addr 0x2e0, ip 0x602b3d18
      CPU: 0 PID: 50 Comm: netlink.exe Not tainted 4.7.0-rc5+ #581
      Stack:
       626238f0 960226a02 00000400 000000fe
       62623910 600afca7 62623970 62623a48
       62468038 00000018 00000000 00000000
      Call Trace:
       [<602b3e93>] rtmsg_fib+0xd3/0x190
       [<602b6680>] fib_table_insert+0x260/0x500
       [<602b0e5d>] inet_rtm_newroute+0x4d/0x60
       [<60250def>] rtnetlink_rcv_msg+0x8f/0x270
       [<60267079>] netlink_rcv_skb+0xc9/0xe0
       [<60250d4b>] rtnetlink_rcv+0x3b/0x50
       [<60265400>] netlink_unicast+0x1a0/0x2c0
       [<60265e47>] netlink_sendmsg+0x3f7/0x470
       [<6021dc9a>] sock_sendmsg+0x3a/0x90
       [<6021e0d0>] ___sys_sendmsg+0x300/0x360
       [<6021fa64>] __sys_sendmsg+0x54/0xa0
       [<6021fac0>] SyS_sendmsg+0x10/0x20
       [<6001ea68>] handle_syscall+0x88/0x90
       [<600295fd>] userspace+0x3fd/0x500
       [<6001ac55>] fork_handler+0x85/0x90
      
      $ addr2line -e vmlinux -i 0x602b3d18
      include/linux/inetdevice.h:222
      net/ipv4/fib_semantics.c:1264
      
      The problem happens when RTNH_F_LINKDOWN is provided from user space
      when creating routes that do not use the flag; it was caught with
      a netlink fuzzer.
      
      Currently, the kernel allows user space to set both flags
      in nh_flags and fib_flags, but this is not intentional; the
      assumption was that they are not set. Fix this by rejecting
      both flags with EINVAL.
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Fixes: 0eeb075f ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
      Cc: Dinesh Dutt <ddutt@cumulusnetworks.com>
      Cc: Scott Feldman <sfeldma@gmail.com>
      Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80610229
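The fix amounts to validating flags supplied by user space before a route is created. A minimal sketch follows; `check_user_nh_flags` is a hypothetical name, and the flag values are defined locally on the assumption that they match the Linux UAPI header `<linux/rtnetlink.h>`.

```c
#include <errno.h>

/* Local copies of the nexthop flags; the values are assumed to match
 * the Linux UAPI definitions. */
#define RTNH_F_DEAD     1  /* nexthop is dead (kernel-managed) */
#define RTNH_F_ONLINK   4  /* gateway is forced on link */
#define RTNH_F_LINKDOWN 16 /* carrier down on nexthop (kernel-managed) */

/* Reject flags that only the kernel is supposed to set.
 * Returns 0 if 'flags' is acceptable, -EINVAL otherwise. */
static int check_user_nh_flags(unsigned int flags)
{
    if (flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
        return -EINVAL;
    return 0;
}
```

Per the commit message, the same rejection applies to both the per-nexthop flags and the route-level flags.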
    • E
      tcp: make challenge acks less predictable · 75ff39cc
      Eric Dumazet committed
      Yue Cao claims that current host rate limiting of challenge ACKs
      (RFC 5961) could leak enough information to allow a patient attacker
      to hijack TCP sessions. He will soon provide details in an academic
      paper.
      
      This patch increases the default limit from 100 to 1000, and adds
      some randomization so that the attacker can no longer hijack
      sessions without spending a considerable number of probes.
      
      Based on initial analysis and patch from Linus.
      
      Note that we also have per socket rate limiting, so it is tempting
      to remove the host limit in the future.
      
      v2: randomize the count of challenge acks per second, not the period.
      
      Fixes: 282f23c6 ("tcp: implement RFC 5961 3.2")
      Reported-by: Yue Cao <ycao009@ucr.edu>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      75ff39cc
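Randomizing the per-second count, as the v2 note describes, can be sketched as below. This is a user-space model, not the patch itself: the names are hypothetical, the random value is passed in by the caller so the behavior is reproducible, and the specific formula (half the limit plus a random value below the limit) is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-wide state for challenge ACK rate limiting. */
struct challenge_state {
    uint64_t window_start_ms; /* start of the current one-second window */
    unsigned int count;       /* challenge ACKs still allowed in it */
    bool started;             /* false until the first window opens */
};

/* Decide whether the host may send one more challenge ACK.
 * 'limit' is the configured per-second limit; 'rnd' is a random
 * number supplied by the caller. */
static bool host_allows_challenge_ack(struct challenge_state *st,
                                      unsigned int limit,
                                      uint64_t now_ms, uint32_t rnd)
{
    if (!st->started || now_ms - st->window_start_ms >= 1000) {
        unsigned int half = (limit + 1) / 2;

        /* New window: the quota is randomized, the period is not. */
        st->window_start_ms = now_ms;
        st->started = true;
        st->count = half + (limit ? rnd % limit : 0);
    }
    if (st->count == 0)
        return false;
    st->count--;
    return true;
}
```

Because the quota differs from window to window, an observer counting challenge ACKs can no longer infer exactly how many were consumed by other connections.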
    • M
      udp: prevent bugcheck if filter truncates packet too much · a6127697
      Michal Kubeček committed
      If a socket filter truncates a UDP packet below the length of the UDP
      header in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a
      BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if
      the kernel is configured that way) can be easily triggered by an
      unprivileged user, which was reported as CVE-2016-6162. For a reproducer, see
      http://seclists.org/oss-sec/2016/q3/8
      
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Reported-by: Marco Grassi <marco.gra@gmail.com>
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a6127697
  5. 10 Jul, 2016 1 commit
  6. 08 Jul, 2016 2 commits
  7. 06 Jul, 2016 5 commits
  8. 05 Jul, 2016 7 commits
  9. 02 Jul, 2016 3 commits
  10. 01 Jul, 2016 1 commit
  11. 30 Jun, 2016 1 commit
  12. 29 Jun, 2016 8 commits