  1. 30 May, 2017 3 commits
  2. 28 May, 2017 1 commit
  3. 27 May, 2017 13 commits
  4. 26 May, 2017 9 commits
    • bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data · 41703a73
      Daniel Borkmann committed
      The bpf_clone_redirect() still needs to be listed in
      bpf_helper_changes_pkt_data() since we call into
      bpf_try_make_head_writable() from there, thus we need
      to invalidate prior pkt regs as well.
      
      Fixes: 36bbef52 ("bpf: direct packet write and access for helpers for clsact progs")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arp: fixed -Wuninitialized compiler warning · 5990baaa
      Ihar Hrachyshka committed
      Commit 7d472a59 ("arp: always override
      existing neigh entries with gratuitous ARP") introduced a compiler
      warning:
      
      net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized in
      this function [-Wmaybe-uninitialized]
      
      While the code logic seems to be correct and doesn't allow the variable
      to be used uninitialized, and the warning is not consistently
      reproducible, it's still worth fixing so that other people don't waste
      time looking at the warning in case it pops up in their build environment.
      Yes, the compiler is probably at fault, but we need to accommodate it.
      
      Fixes: 7d472a59 ("arp: always override existing neigh entries with gratuitous ARP")
      Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: avoid fastopen API to be used on AF_UNSPEC · ba615f67
      Wei Wang committed
      The fastopen API should be used to perform fastopen operations on the TCP
      socket. It does not make sense to use the fastopen API to perform a
      disconnect by calling it with AF_UNSPEC. The fastopen data path is also
      prone to race conditions and bugs when used with AF_UNSPEC.
      
      One issue reported and analyzed by Vegard Nossum is as follows:
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      Thread A:                            Thread B:
      ------------------------------------------------------------------------
      sendto()
       - tcp_sendmsg()
           - sk_stream_memory_free() = 0
               - goto wait_for_sndbuf
      	     - sk_stream_wait_memory()
      	        - sk_wait_event() // sleep
                |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
      	  |                           - tcp_sendmsg()
      	  |                              - tcp_sendmsg_fastopen()
      	  |                                 - __inet_stream_connect()
      	  |                                    - tcp_disconnect() //because of AF_UNSPEC
      	  |                                       - tcp_transmit_skb()// send RST
      	  |                                    - return 0; // no reconnect!
      	  |                           - sk_stream_wait_connect()
      	  |                                 - sock_error()
      	  |                                    - xchg(&sk->sk_err, 0)
      	  |                                    - return -ECONNRESET
      	- ... // wake up, see sk->sk_err == 0
          - skb_entail() on TCP_CLOSE socket
      
      If the connection is reopened then we will send a brand new SYN packet
      after thread A has already queued a buffer. At this point I think the
      socket internal state (sequence numbers etc.) becomes messed up.
      
      When the new connection is closed, the FIN-ACK is rejected because the
      sequence number is outside the window. The other side tries to
      retransmit, but __tcp_retransmit_skb() calls tcp_trim_head() on an empty
      skb which corrupts the skb data length and hits a BUG() in
      copy_and_csum_bits().
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      
      Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
      and returns EOPNOTSUPP to the user in that case.
      
      Fixes: cf60af03 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
      Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
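The AF_UNSPEC check described in the commit above can be illustrated with a small user-space sketch. This is not the kernel code: the helper name `fastopen_check_dest()` and the -EINVAL handling for a missing or too-short address are assumptions made for illustration. Only the "AF_UNSPEC yields EOPNOTSUPP" behavior comes from the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not a kernel function): decide whether a
 * destination address may be used on the fastopen send path.
 */
int fastopen_check_dest(const struct sockaddr *addr, socklen_t addr_len)
{
    /* Assumption: reject addresses too short to carry a family. */
    if (addr == NULL || addr_len < sizeof(addr->sa_family))
        return -EINVAL;

    /* AF_UNSPEC would mean "disconnect"; refuse it on this path. */
    if (addr->sa_family == AF_UNSPEC)
        return -EOPNOTSUPP;

    return 0;
}
```

Rejecting the request before any socket state is touched is what avoids the race in the trace: thread B never reaches the disconnect, so thread A cannot wake up on a closed socket.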
    • rxrpc: Support network namespacing · 2baec2c3
      David Howells committed
      Support network namespacing in AF_RXRPC with the following changes:
      
       (1) All the local endpoint, peer and call lists, locks, counters, etc. are
           moved into the per-namespace record.
      
       (2) All the connection tracking is moved into the per-namespace record
           with the exception of the client connection ID tree, which is kept
           global so that connection IDs are kept unique per-machine.
      
       (3) Each namespace gets its own epoch.  This allows each network namespace
           to pretend to be a separate client machine.
      
       (4) The /proc/net/rxrpc_xxx files are now called /proc/net/rxrpc/xxx and
           the contents reflect the namespace.
      
      fs/afs/ should be okay with this patch as it explicitly requires the current
      net namespace to be init_net to permit a mount to proceed at the moment.  It
      will, however, need updating so that cells, IP addresses and DNS records are
      per-namespace also.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/packet: remove unused parameter in prb_curr_blk_in_use(). · 878cd3ba
      Rosen, Rami committed
      This patch removes the unused parameter from the prb_curr_blk_in_use()
      function in net/packet/af_packet.c.
      Signed-off-by: Rami Rosen <rami.rosen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: move somaxconn init from sysctl code · 7c3f1875
      Roman Kapl committed
      The default value for somaxconn is set in sysctl_core_net_init(), but this
      function is not called when the kernel is configured without CONFIG_SYSCTL.
      
      This results in the kernel not being able to accept TCP connections,
      because the backlog has zero size. Usually, the user ends up with:
      "TCP: request_sock_TCP: Possible SYN flooding on port 7. Dropping request.  Check SNMP counters."
      If SYN cookies are not enabled the connection is rejected.
      
      Before ef547f2a (tcp: remove max_qlen_log), the effects were less
      severe, because the backlog was always at least eight slots long.
      Signed-off-by: Roman Kapl <roman.kapl@sysgo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
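To see why an uninitialized somaxconn breaks accept(), consider a minimal sketch of how a listen backlog is clamped. The names `net_core_cfg`, `net_core_defaults_init()` and `effective_backlog()` are hypothetical, and the default of 128 is an assumption about the kernel of that period. The point is only that the default must be set on a path that always runs, not inside optional sysctl setup.

```c
/* Assumed default for illustration; not taken from the commit message. */
#define DEFAULT_SOMAXCONN 128

/* Hypothetical stand-in for the per-namespace core settings. */
struct net_core_cfg {
    int somaxconn;
};

/* Runs unconditionally, independent of any sysctl support. */
void net_core_defaults_init(struct net_core_cfg *cfg)
{
    cfg->somaxconn = DEFAULT_SOMAXCONN;
}

/* Clamp the backlog requested by listen() to the configured maximum. */
int effective_backlog(const struct net_core_cfg *cfg, int requested)
{
    return requested > cfg->somaxconn ? cfg->somaxconn : requested;
}
```

With a zeroed configuration every requested backlog is clamped to 0, which matches the symptom in the commit message: the listen queue has no room and requests are dropped.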
    • tcp: better validation of received ack sequences · d0e1a1b5
      Eric Dumazet committed
      Paul Fiterau Brostean reported:
      
      <quote>
      Linux TCP stack we analyze exhibits behavior that seems odd to me.
      The scenario is as follows (all packets have empty payloads, no window
      scaling, rcv/snd window size should not be a factor):
      
             TEST HARNESS (CLIENT)                        LINUX SERVER

         1.  -                                          LISTEN
             (server listen, then accepts)

         2.  - --> <SEQ=100><CTL=SYN>               --> SYN-RECEIVED

         3.  - <-- <SEQ=300><ACK=101><CTL=SYN,ACK>  <-- SYN-RECEIVED

         4.  - --> <SEQ=101><ACK=301><CTL=ACK>      --> ESTABLISHED

         5.  - <-- <SEQ=301><ACK=101><CTL=FIN,ACK>  <-- FIN WAIT-1
             (server opts to close the data connection calling "close" on the
             connection socket)

         6.  - --> <SEQ=101><ACK=99999><CTL=FIN,ACK> --> CLOSING
             (client sends FIN,ACK with not yet sent acknowledgement number)

         7.  - <-- <SEQ=302><ACK=102><CTL=ACK>      <-- CLOSING
             (ACK is 102 instead of 101, why?)

      ... (silence from CLIENT)

         8.  - <-- <SEQ=301><ACK=102><CTL=FIN,ACK>  <-- CLOSING
             (retransmission, again ACK is 102)
      
      Now, note that packet 6 while having the expected sequence number,
      acknowledges something that wasn't sent by the server. So I would
      expect the packet to maybe prompt an ACK response from the server, and
      then be ignored. Yet it is not ignored and actually leads to an increase
      of the acknowledgement number in the server's retransmission of the
      FIN,ACK packet. The explanation I found is that the FIN in packet 6 was
      processed, despite the acknowledgement number being unacceptable.
      Further experiments indeed show that the server processes this FIN,
      transitioning to CLOSING, then on receiving an ACK for the FIN it had
      sent in packet 5, the server (or better said connection) transitions
      from CLOSING to TIME_WAIT (as signaled by netstat).
      
      </quote>
      
      Indeed, tcp_rcv_state_process() calls tcp_ack() but
      uses the @acceptable status only in the TCP_SYN_RECV
      state.

      What we want here is to send a challenge ACK, if not in TCP_SYN_RECV
      state. TCP_FIN_WAIT1 state is not the only state we should fix.

      Add a FLAG_NO_CHALLENGE_ACK so that tcp_rcv_state_process()
      can choose to send a challenge ACK and discard the packet instead
      of wrongly changing the socket state.
      
      With help from Neal Cardwell.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Paul Fiterau Brostean <p.fiterau-brostean@science.ru.nl>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
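The notion of an "unacceptable" acknowledgement number in the report above can be made concrete with a simplified sketch. It is not the kernel's tcp_ack() logic. It only shows the classic window test, snd_una <= ack <= snd_nxt, done with wraparound-safe 32-bit arithmetic. The function names and the two-valued verdict are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a is before b, modulo 2^32. */
bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* True if sequence number a is after b, modulo 2^32. */
bool seq_after(uint32_t a, uint32_t b)
{
    return seq_before(b, a);
}

/* An ACK is acceptable if it lies within [snd_una, snd_nxt]. */
bool ack_is_acceptable(uint32_t snd_una, uint32_t snd_nxt, uint32_t ack)
{
    return !seq_before(ack, snd_una) && !seq_after(ack, snd_nxt);
}

enum ack_verdict { ACK_PROCESS, ACK_CHALLENGE_AND_DROP };

/* Simplified policy: anything outside the window gets a challenge ACK. */
enum ack_verdict classify_ack(uint32_t snd_una, uint32_t snd_nxt, uint32_t ack)
{
    return ack_is_acceptable(snd_una, snd_nxt, ack)
        ? ACK_PROCESS : ACK_CHALLENGE_AND_DROP;
}
```

For packet 6 in the trace, the server has snd_una = 301 and snd_nxt = 302 (its FIN consumed one sequence number), so ACK=99999 falls outside the window and the segment, including its FIN, should be dropped after a challenge ACK.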
    • net_sched: only create filter chains for new filters/actions · 367a8ce8
      WANG Cong committed
      tcf_chain_get() always creates a new filter chain if it is not found
      among the existing ones. This is totally unnecessary when we get or
      delete filters; a new chain should only be created for new filters
      (or new actions).
      
      Fixes: 5bc17018 ("net: sched: introduce multichain support for filters")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
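The lookup-versus-create distinction in the commit above can be sketched as a "get" function with an explicit create flag. The types `struct chain` and `struct block` and the function `chain_get()` are hypothetical simplifications, not the kernel's data structures.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for a filter chain and its block. */
struct chain {
    uint32_t index;
    struct chain *next;
};

struct block {
    struct chain *chains;
};

/*
 * Look up a chain by index. Allocate a new one only when the caller
 * asks for it, i.e. when adding a filter rather than getting or
 * deleting one.
 */
struct chain *chain_get(struct block *b, uint32_t index, bool create)
{
    struct chain *c;

    for (c = b->chains; c != NULL; c = c->next)
        if (c->index == index)
            return c;

    if (!create)
        return NULL; /* plain lookup: leave the block untouched */

    c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    c->index = index;
    c->next = b->chains;
    b->chains = c;
    return c;
}
```

A get or delete request would call `chain_get(b, idx, false)` and treat NULL as "no such chain", so a failed lookup leaves no empty chain behind.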
    • net: sched: cls_api: make reclassify return all the way back to the original tp · ee538dce
      Jiri Pirko committed
      With the introduction of chain goto action, the reclassification would
      cause the re-iteration of the actual chain. It makes more sense to restart
      the whole thing and re-iterate starting from the original tp - start
      of chain 0.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 25 May, 2017 8 commits
    • tcp: fix TCP_SYNCNT flakes · ce682ef6
      Eric Dumazet committed
      After the mentioned commit, some of our packetdrill tests became flaky.
      
      TCP_SYNCNT socket option can limit the number of SYN retransmits.
      
      retransmits_timed_out() has to compare time computations based on
      local_clock() while timers are based on jiffies. With NTP adjustments
      and rounding we can observe a 999 ms delay for 1000 ms timers.
      We end up sending one extra SYN packet.
      
      Gimmick added in commit 6fa12c85 ("Revert Backoff [v3]: Calculate
      TCP's connection close threshold as a time value") makes no
      real sense for TCP_SYN_SENT sockets where no RTO backoff can happen at
      all.
      
      Let's use simpler logic for TCP_SYN_SENT sockets and remove the @syn_set
      parameter from retransmits_timed_out().
      
      Fixes: 9a568de4 ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
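The flakiness described above comes from deciding "have we retried enough?" by elapsed time when the two clocks can disagree by a millisecond. The following sketch contrasts a time-based test with a plain retransmit counter. Both function names are hypothetical, and the counter comparison is an assumption about what "simpler logic" could look like, not a copy of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Time-based test: sensitive to small clock skew between time sources. */
bool timed_out_by_clock(uint64_t start_ms, uint64_t now_ms, uint64_t timeout_ms)
{
    return now_ms - start_ms >= timeout_ms;
}

/* Count-based test: depends only on how many SYNs were retransmitted. */
bool syn_retries_exhausted(unsigned int retransmits, unsigned int syn_retries)
{
    return retransmits >= syn_retries;
}
```

With a measured delay of 999 ms against a 1000 ms threshold the time-based test reports "not yet timed out", which leads to the extra SYN. The counter gives the same answer regardless of clock rounding.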
    • net: dsa: support cross-chip ageing time · 64dba236
      Vivien Didelot committed
      Now that the switchdev bridge ageing time attribute is propagated to all
      switch chips of the fabric, each switch can check if the requested value
      is valid and program itself, so that the whole fabric shares a common
      ageing time setting.
      
      This is especially needed for switch chips in between others, containing
      no bridge port members but evidently used in the data path.
      
      To achieve that, remove the condition which skips the other switches. We
      also don't need to identify the target switch anymore, thus remove the
      sw_index member of the dsa_notifier_ageing_time_info notifier structure.
      
      On ZII Dev Rev B (with two 88E6352 and one 88E6185) and ZII Dev Rev C
      (with two 88E6390X), we have the following hardware configuration:
      
          # ip link add name br0 type bridge
          # ip link set master br0 dev lan6
          br0: port 1(lan6) entered blocking state
          br0: port 1(lan6) entered disabled state
          # echo 2000 > /sys/class/net/br0/bridge/ageing_time
      
      Before this patch:
      
          zii-rev-b# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
          300000
          300000
          15000
      
          zii-rev-c# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
          300000
          18750
      
      After this patch:
      
          zii-rev-b# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
          15000
          15000
          15000
      
          zii-rev-c# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
          18750
          18750
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: flower: add support for matching on tcp flags · fdfc7dd6
      Jiri Pirko committed
      Benefit from the support of tcp flags dissection and allow the user to
      insert rules matching on tcp flags.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: flow_dissector: add support for dissection of tcp flags · ac4bb5de
      Jiri Pirko committed
      Add support for dissection of tcp flags. This calls a dedicated tcp
      dissection function, similar to what is done for arp, mpls and others.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: rtnetlink: bail out from rtnl_fdb_dump() on parse error · 0ff50e83
      Alexander Potapenko committed
      rtnl_fdb_dump() failed to check the result of nlmsg_parse(), which led
      to contents of |ifm| being uninitialized because nlh->nlmsg_len was too
      small to accommodate |ifm|. The uninitialized data may affect some
      branches and result in unwanted effects, although kernel data doesn't
      seem to leak to userspace directly.
      
      The bug has been detected with KMSAN and syzkaller.
      
      For the record, here is the KMSAN report:
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory in rtnl_fdb_dump+0x5dc/0x1000
      CPU: 0 PID: 1039 Comm: probe Not tainted 4.11.0-rc5+ #2727
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x143/0x1b0 lib/dump_stack.c:52
       kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
       __kmsan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:491
       rtnl_fdb_dump+0x5dc/0x1000 net/core/rtnetlink.c:3230
       netlink_dump+0x84f/0x1190 net/netlink/af_netlink.c:2168
       __netlink_dump_start+0xc97/0xe50 net/netlink/af_netlink.c:2258
       netlink_dump_start ./include/linux/netlink.h:165
       rtnetlink_rcv_msg+0xae9/0xb40 net/core/rtnetlink.c:4094
       netlink_rcv_skb+0x339/0x5a0 net/netlink/af_netlink.c:2339
       rtnetlink_rcv+0x83/0xa0 net/core/rtnetlink.c:4110
       netlink_unicast_kernel net/netlink/af_netlink.c:1272
       netlink_unicast+0x13b7/0x1480 net/netlink/af_netlink.c:1298
       netlink_sendmsg+0x10b8/0x10f0 net/netlink/af_netlink.c:1844
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
       __sys_sendmsg net/socket.c:2031
       SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
       SyS_sendmsg+0x87/0xb0 net/socket.c:2038
       do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
      RIP: 0033:0x401300
      RSP: 002b:00007ffc3b0e6d58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000401300
      RDX: 0000000000000000 RSI: 00007ffc3b0e6d80 RDI: 0000000000000003
      RBP: 00007ffc3b0e6e00 R08: 000000000000000b R09: 0000000000000004
      R10: 000000000000000d R11: 0000000000000246 R12: 0000000000000000
      R13: 00000000004065a0 R14: 0000000000406630 R15: 0000000000000000
      origin: 000000008fe00056
       save_stack_trace+0x59/0x60 arch/x86/kernel/stacktrace.c:59
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:352
       kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:247
       kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:260
       slab_alloc_node mm/slub.c:2743
       __kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4349
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:933
       netlink_alloc_large_skb net/netlink/af_netlink.c:1144
       netlink_sendmsg+0x934/0x10f0 net/netlink/af_netlink.c:1819
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
       __sys_sendmsg net/socket.c:2031
       SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
       SyS_sendmsg+0x87/0xb0 net/socket.c:2038
       do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
      ==================================================================
      
      and the reproducer:
      
      ==================================================================
        #include <sys/socket.h>
        #include <net/if_arp.h>
        #include <linux/netlink.h>
        #include <stdint.h>
        #include <string.h> /* memset() */
      
        int main()
        {
          int sock = socket(PF_NETLINK, SOCK_DGRAM | SOCK_NONBLOCK, 0);
          struct msghdr msg;
          memset(&msg, 0, sizeof(msg));
          char nlmsg_buf[32];
          memset(nlmsg_buf, 0, sizeof(nlmsg_buf));
          struct nlmsghdr *nlmsg = (struct nlmsghdr *)nlmsg_buf;
          nlmsg->nlmsg_len = 0x11; // one byte more than the 16-byte header
          nlmsg->nlmsg_type = 0x1e; // RTM_GETNEIGH = RTM_BASE + 0x0e
          // type = 0x0e = 1110b
          // kind = 2
          nlmsg->nlmsg_flags = 0x101; // NLM_F_ROOT | NLM_F_REQUEST
          nlmsg->nlmsg_seq = 0;
          nlmsg->nlmsg_pid = 0;
          nlmsg_buf[16] = (char)7;
          struct iovec iov;
          iov.iov_base = nlmsg_buf;
          iov.iov_len = 17;
          msg.msg_iov = &iov;
          msg.msg_iovlen = 1;
          sendmsg(sock, &msg, 0);
          return 0;
        }
      ==================================================================
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
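The class of bug fixed above (using a structure that a failed parse never filled in) can be shown with a small user-space sketch. The names `req_hdr`, `parse_req()` and `dump_req()` are hypothetical and do not mirror the netlink API. The sketch only demonstrates checking the parse result and bailing out before the output is read.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size request header. */
struct req_hdr {
    int family;
    int ifindex;
};

/* Fill *out only if the buffer is large enough to contain a header. */
int parse_req(const void *buf, size_t len, struct req_hdr *out)
{
    if (len < sizeof(*out))
        return -EINVAL;
    memcpy(out, buf, sizeof(*out));
    return 0;
}

/* Caller that bails out on a parse error instead of reading hdr. */
int dump_req(const void *buf, size_t len, int *ifindex_out)
{
    struct req_hdr hdr;
    int err = parse_req(buf, len, &hdr);

    if (err < 0)
        return err; /* hdr is uninitialized here; do not touch it */

    *ifindex_out = hdr.ifindex;
    return 0;
}
```

The reproducer in the commit message sends a message with only one byte of payload, which corresponds to the short-buffer case here.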
    • sctp: set new_asoc temp when processing dupcookie · 7e062977
      Xin Long committed
      After sctp changed to use the transport hashtable, a transport is
      added to the global hashtable when its peer is added to an asoc, and
      the asoc can then be found by looking up the transport in the hashtable.
      
      The problem is that when processing a dupcookie in
      sctp_sf_do_5_2_4_dupcook, a new asoc is created. A peer with the same
      addr and port as the one in the old asoc might be added to the new asoc,
      but fail to be added to the hashtable, as both belong to the same sk.
      
      As a result, sctp's dupcookie processing cannot really work.
      
      Since the new asoc will be freed after copying its information to
      the old asoc, it's more like a temp asoc. So this patch fixes it
      by marking it as a temp asoc, which avoids adding any of its transports
      to the hashtable and also avoids allocating an assoc_id.
      
      An extra thing it has to do is to also allocate stream info for any
      temp asoc, as sctp dupcookie processing needs it to update the old asoc.
      But I don't think it would hurt anything, as a temp asoc is
      always freed after the cookie echo packet has been processed.
      Reported-by: Jianwen Ji <jiji@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: fix stream update when processing dupcookie · 3ab21379
      Xin Long committed
      Since commit 3dbcc105 ("sctp: alloc stream info when initializing
      asoc"), stream and stream.out info are always allocated when creating
      an asoc.
      
      So it's not correct to check !asoc->stream before updating stream
      info when processing dupcookie; it is better to check the asoc
      state instead.
      
      Fixes: 3dbcc105 ("sctp: alloc stream info when initializing asoc")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • libceph: cleanup old messages according to reconnect seq · 0a2ad541
      Yan, Zheng committed
      When reopening a connection, use 'reconnect seq' to clean up
      messages that have already been received by the peer.
      
      Link: http://tracker.ceph.com/issues/18690
      Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
      Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  6. 24 May, 2017 6 commits