1. 29 6月, 2017 1 次提交
    • C
      sunrpc: Disable splice for krb5i · 06eb8a56
      Chuck Lever 提交于
      Running a multi-threaded 8KB fio test (70/30 mix), three or four out
      of twelve of the jobs fail when using krb5i. The failure is an EIO
      on a read.
      
      Troubleshooting confirmed the EIO results when the client fails to
      verify the MIC of an NFS READ reply. Bruce suggested the problem
      could be due to the data payload changing between the time the
      reply's MIC was computed on the server and the time the reply was
      actually sent.
      
      krb5p gets around this problem by disabling RQ_SPLICE_OK. Use the
      same mechanism for krb5i RPCs.
      
      "iozone -i0 -i1 -s128m -y1k -az -I", export is tmpfs, mount is
      sec=krb5i,vers=3,proto=rdma. The important numbers are the
      read / reread column.
      
      Here's without the RQ_SPLICE_OK patch:
      
                    kB  reclen    write  rewrite    read    reread
                131072       1     7546     7929     8396     8267
                131072       2    14375    14600    15843    15639
                131072       4    19280    19248    21303    21410
                131072       8    32350    31772    35199    34883
                131072      16    36748    37477    49365    51706
                131072      32    55669    56059    57475    57389
                131072      64    74599    75190    74903    75550
                131072     128    99810   101446   102828   102724
                131072     256   122042   122612   124806   125026
                131072     512   137614   138004   141412   141267
                131072    1024   146601   148774   151356   151409
                131072    2048   180684   181727   293140   292840
                131072    4096   206907   207658   552964   549029
                131072    8192   223982   224360   454493   473469
                131072   16384   228927   228390   654734   632607
      
      And here's with it:
      
                    kB  reclen    write  rewrite    read    reread
                131072       1     7700     7365     7958     8011
                131072       2    13211    13303    14937    14414
                131072       4    19001    19265    20544    20657
                131072       8    30883    31097    34255    33566
                131072      16    36868    34908    51499    49944
                131072      32    56428    55535    58710    56952
                131072      64    73507    74676    75619    74378
                131072     128   100324   101442   103276   102736
                131072     256   122517   122995   124639   124150
                131072     512   137317   139007   140530   140830
                131072    1024   146807   148923   151246   151072
                131072    2048   179656   180732   292631   292034
                131072    4096   206216   208583   543355   541951
                131072    8192   223738   224273   494201   489372
                131072   16384   229313   229840   691719   668427
      
      I would say that there is not much difference in this test.
      
      For good measure, here's the same test with sec=krb5p:
      
                    kB  reclen    write  rewrite    read    reread
                131072       1     5982     5881     6137     6218
                131072       2    10216    10252    10850    10932
                131072       4    12236    12575    15375    15526
                131072       8    15461    15462    23821    22351
                131072      16    25677    25811    27529    27640
                131072      32    31903    32354    34063    33857
                131072      64    42989    43188    45635    45561
                131072     128    52848    53210    56144    56141
                131072     256    59123    59214    62691    62933
                131072     512    63140    63277    66887    67025
                131072    1024    65255    65299    69213    69140
                131072    2048    76454    76555   133767   133862
                131072    4096    84726    84883   251925   250702
                131072    8192    89491    89482   270821   276085
                131072   16384    91572    91597   361768   336868
      
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=307Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      06eb8a56
  2. 07 6月, 2017 2 次提交
  3. 06 6月, 2017 1 次提交
  4. 05 6月, 2017 7 次提交
  5. 03 6月, 2017 2 次提交
  6. 02 6月, 2017 3 次提交
  7. 01 6月, 2017 4 次提交
  8. 30 5月, 2017 1 次提交
    • J
      mac80211: fix TX aggregation start/stop callback race · 7a7c0a64
      Johannes Berg 提交于
      When starting or stopping an aggregation session, one of the steps
      is that the driver calls back to mac80211 that the start/stop can
      proceed. This is handled by queueing up a fake SKB and processing
      it from the normal iface/sdata work. Since this isn't flushed when
      disassociating, the following race is possible:
      
       * associate
       * start aggregation session
       * driver callback
       * disassociate
       * associate again to the same AP
       * callback processing runs, leading to a WARN_ON() that
         the TID hadn't requested aggregation
      
      If the second association isn't to the same AP, there would only
      be a message printed ("Could not find station: <addr>"), but the
      same race could happen.
      
      Fix this by not going the whole detour with a fake SKB etc. but
      simply looking up the aggregation session in the driver callback,
      marking it with a START_CB/STOP_CB bit and then scheduling the
      regular aggregation work that will now process these bits as well.
      This also simplifies the code and gets rid of the whole problem
      with allocation failures of said skb, which could have left the
      session in limbo.
      Reported-by: NJouni Malinen <j@w1.fi>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      7a7c0a64
  9. 27 5月, 2017 4 次提交
    • E
      ipv4: add reference counting to metrics · 3fb07daf
      Eric Dumazet 提交于
      Andrey Konovalov reported crashes in ipv4_mtu()
      
      I could reproduce the issue with KASAN kernels, between
      10.246.7.151 and 10.246.7.152 :
      
      1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &
      
      2) At the same time run following loop :
      while :
      do
       ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
       ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
      done
      
      Cong Wang attempted to add back rt->fi in commit
      82486aa6 ("ipv4: restore rt->fi for reference counting")
      but this proved to add some issues that were complex to solve.
      
      Instead, I suggested to add a refcount to the metrics themselves,
      being a standalone object (in particular, no reference to other objects)
      
      I tried to make this patch as small as possible to ease its backport,
      instead of being super clean. Note that we believe that only ipv4 dst
      need to take care of the metric refcount. But if this is wrong,
      this patch adds the basic infrastructure to extend this to other
      families.
      
      Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
      for his efforts on this problem.
      
      Fixes: 2860583f ("ipv4: Kill rt->fi")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Reviewed-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fb07daf
    • P
      ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets · 0e9a7095
      Peter Dawson 提交于
      This fix addresses two problems in the way the DSCP field is formulated
       on the encapsulating header of IPv6 tunnels.
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661
      
      1) The IPv6 tunneling code was manipulating the DSCP field of the
       encapsulating packet using the 32b flowlabel. Since the flowlabel is
       only the lower 20b it was incorrect to assume that the upper 12b
       containing the DSCP and ECN fields would remain intact when formulating
       the encapsulating header. This fix handles the 'inherit' and
       'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.
      
      2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
       incorrect and resulted in the DSCP value always being set to 0.
      
      Commit 90427ef5 ("ipv6: fix flow labels when the traffic class
       is non-0") caused the regression by masking out the flowlabel
       which exposed the incorrect handling of the DSCP portion of the
       flowlabel in ip6_tunnel and ip6_gre.
      
      Fixes: 90427ef5 ("ipv6: fix flow labels when the traffic class is non-0")
      Signed-off-by: NPeter Dawson <peter.a.dawson@boeing.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e9a7095
    • D
      sctp: fix ICMP processing if skb is non-linear · 804ec7eb
      Davide Caratti 提交于
      sometimes ICMP replies to INIT chunks are ignored by the client, even if
      the encapsulated SCTP headers match an open socket. This happens when the
      ICMP packet is carried by a paged skb: use skb_header_pointer() to read
      packet contents beyond the SCTP header, so that chunk header and initiate
      tag are validated correctly.
      
      v2:
      - don't use skb_header_pointer() to read the transport header, since
        icmp_socket_deliver() already puts these 8 bytes in the linear area.
      - change commit message to make specific reference to INIT chunks.
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Reviewed-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      804ec7eb
    • L
      net: llc: add lock_sock in llc_ui_bind to avoid a race condition · 0908cf4d
      linzhang 提交于
      There is a race condition in llc_ui_bind if two or more processes/threads
      try to bind a same socket.
      
      If more processes/threads bind a same socket success that will lead to
      two problems, one is this action is not what we expected, another is
      will lead to kernel in unstable status or oops(in my simple test case,
      cause llc2.ko can't unload).
      
      The current code is test SOCK_ZAPPED bit to avoid a process to
      bind a same socket twice but that is can't avoid more processes/threads
      try to bind a same socket at the same time.
      
      So, add lock_sock in llc_ui_bind like others, such as llc_ui_connect.
      Signed-off-by: NLin Zhang <xiaolou4617@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0908cf4d
  10. 26 5月, 2017 4 次提交
    • D
      bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data · 41703a73
      Daniel Borkmann 提交于
      The bpf_clone_redirect() still needs to be listed in
      bpf_helper_changes_pkt_data() since we call into
      bpf_try_make_head_writable() from there, thus we need
      to invalidate prior pkt regs as well.
      
      Fixes: 36bbef52 ("bpf: direct packet write and access for helpers for clsact progs")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41703a73
    • I
      arp: fixed -Wuninitialized compiler warning · 5990baaa
      Ihar Hrachyshka 提交于
      Commit 7d472a59 ("arp: always override
      existing neigh entries with gratuitous ARP") introduced a compiler
      warning:
      
      net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized in
      this function [-Wmaybe-uninitialized]
      
      While the code logic seems to be correct and doesn't allow the variable
      to be used uninitialized, and the warning is not consistently
      reproducible, it's still worth fixing it for other people not to waste
      time looking at the warning in case it pops up in the build environment.
      Yes, compiler is probably at fault, but we will need to accommodate.
      
      Fixes: 7d472a59 ("arp: always override existing neigh entries with gratuitous ARP")
      Signed-off-by: NIhar Hrachyshka <ihrachys@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5990baaa
    • W
      tcp: avoid fastopen API to be used on AF_UNSPEC · ba615f67
      Wei Wang 提交于
      Fastopen API should be used to perform fastopen operations on the TCP
      socket. It does not make sense to use fastopen API to perform disconnect
      by calling it with AF_UNSPEC. The fastopen data path is also prone to
      race conditions and bugs when using with AF_UNSPEC.
      
      One issue reported and analyzed by Vegard Nossum is as follows:
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      Thread A:                            Thread B:
      ------------------------------------------------------------------------
      sendto()
       - tcp_sendmsg()
           - sk_stream_memory_free() = 0
               - goto wait_for_sndbuf
      	     - sk_stream_wait_memory()
      	        - sk_wait_event() // sleep
                |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
      	  |                           - tcp_sendmsg()
      	  |                              - tcp_sendmsg_fastopen()
      	  |                                 - __inet_stream_connect()
      	  |                                    - tcp_disconnect() //because of AF_UNSPEC
      	  |                                       - tcp_transmit_skb()// send RST
      	  |                                    - return 0; // no reconnect!
      	  |                           - sk_stream_wait_connect()
      	  |                                 - sock_error()
      	  |                                    - xchg(&sk->sk_err, 0)
      	  |                                    - return -ECONNRESET
      	- ... // wake up, see sk->sk_err == 0
          - skb_entail() on TCP_CLOSE socket
      
      If the connection is reopened then we will send a brand new SYN packet
      after thread A has already queued a buffer. At this point I think the
      socket internal state (sequence numbers etc.) becomes messed up.
      
      When the new connection is closed, the FIN-ACK is rejected because the
      sequence number is outside the window. The other side tries to
      retransmit,
      but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
      corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      
      Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
      and return EOPNOTSUPP to user if such case happens.
      
      Fixes: cf60af03 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
      Reported-by: NVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba615f67
    • R
      net: move somaxconn init from sysctl code · 7c3f1875
      Roman Kapl 提交于
      The default value for somaxconn is set in sysctl_core_net_init(), but this
      function is not called when kernel is configured without CONFIG_SYSCTL.
      
      This results in the kernel not being able to accept TCP connections,
      because the backlog has zero size. Usually, the user ends up with:
      "TCP: request_sock_TCP: Possible SYN flooding on port 7. Dropping request.  Check SNMP counters."
      If SYN cookies are not enabled the connection is rejected.
      
      Before ef547f2a (tcp: remove max_qlen_log), the effects were less
      severe, because the backlog was always at least eight slots long.
      Signed-off-by: NRoman Kapl <roman.kapl@sysgo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c3f1875
  11. 25 5月, 2017 4 次提交
    • A
      net: rtnetlink: bail out from rtnl_fdb_dump() on parse error · 0ff50e83
      Alexander Potapenko 提交于
      rtnl_fdb_dump() failed to check the result of nlmsg_parse(), which led
      to contents of |ifm| being uninitialized because nlh->nlmsglen was too
      small to accommodate |ifm|. The uninitialized data may affect some
      branches and result in unwanted effects, although kernel data doesn't
      seem to leak to the userspace directly.
      
      The bug has been detected with KMSAN and syzkaller.
      
      For the record, here is the KMSAN report:
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory in rtnl_fdb_dump+0x5dc/0x1000
      CPU: 0 PID: 1039 Comm: probe Not tainted 4.11.0-rc5+ #2727
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x143/0x1b0 lib/dump_stack.c:52
       kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:1007
       __kmsan_warning_32+0x66/0xb0 mm/kmsan/kmsan_instr.c:491
       rtnl_fdb_dump+0x5dc/0x1000 net/core/rtnetlink.c:3230
       netlink_dump+0x84f/0x1190 net/netlink/af_netlink.c:2168
       __netlink_dump_start+0xc97/0xe50 net/netlink/af_netlink.c:2258
       netlink_dump_start ./include/linux/netlink.h:165
       rtnetlink_rcv_msg+0xae9/0xb40 net/core/rtnetlink.c:4094
       netlink_rcv_skb+0x339/0x5a0 net/netlink/af_netlink.c:2339
       rtnetlink_rcv+0x83/0xa0 net/core/rtnetlink.c:4110
       netlink_unicast_kernel net/netlink/af_netlink.c:1272
       netlink_unicast+0x13b7/0x1480 net/netlink/af_netlink.c:1298
       netlink_sendmsg+0x10b8/0x10f0 net/netlink/af_netlink.c:1844
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
       __sys_sendmsg net/socket.c:2031
       SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
       SyS_sendmsg+0x87/0xb0 net/socket.c:2038
       do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
       entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
      RIP: 0033:0x401300
      RSP: 002b:00007ffc3b0e6d58 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000401300
      RDX: 0000000000000000 RSI: 00007ffc3b0e6d80 RDI: 0000000000000003
      RBP: 00007ffc3b0e6e00 R08: 000000000000000b R09: 0000000000000004
      R10: 000000000000000d R11: 0000000000000246 R12: 0000000000000000
      R13: 00000000004065a0 R14: 0000000000406630 R15: 0000000000000000
      origin: 000000008fe00056
       save_stack_trace+0x59/0x60 arch/x86/kernel/stacktrace.c:59
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:352
       kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:247
       kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:260
       slab_alloc_node mm/slub.c:2743
       __kmalloc_node_track_caller+0x1f4/0x390 mm/slub.c:4349
       __kmalloc_reserve net/core/skbuff.c:138
       __alloc_skb+0x2cd/0x740 net/core/skbuff.c:231
       alloc_skb ./include/linux/skbuff.h:933
       netlink_alloc_large_skb net/netlink/af_netlink.c:1144
       netlink_sendmsg+0x934/0x10f0 net/netlink/af_netlink.c:1819
       sock_sendmsg_nosec net/socket.c:633
       sock_sendmsg net/socket.c:643
       ___sys_sendmsg+0xd4b/0x10f0 net/socket.c:1997
       __sys_sendmsg net/socket.c:2031
       SYSC_sendmsg+0x2c6/0x3f0 net/socket.c:2042
       SyS_sendmsg+0x87/0xb0 net/socket.c:2038
       do_syscall_64+0x102/0x150 arch/x86/entry/common.c:285
       return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
      ==================================================================
      
      and the reproducer:
      
      ==================================================================
        #include <sys/socket.h>
        #include <net/if_arp.h>
        #include <linux/netlink.h>
        #include <stdint.h>
      
        int main()
        {
          int sock = socket(PF_NETLINK, SOCK_DGRAM | SOCK_NONBLOCK, 0);
          struct msghdr msg;
          memset(&msg, 0, sizeof(msg));
          char nlmsg_buf[32];
          memset(nlmsg_buf, 0, sizeof(nlmsg_buf));
          struct nlmsghdr *nlmsg = nlmsg_buf;
          nlmsg->nlmsg_len = 0x11;
          nlmsg->nlmsg_type = 0x1e; // RTM_NEWROUTE = RTM_BASE + 0x0e
          // type = 0x0e = 1110b
          // kind = 2
          nlmsg->nlmsg_flags = 0x101; // NLM_F_ROOT | NLM_F_REQUEST
          nlmsg->nlmsg_seq = 0;
          nlmsg->nlmsg_pid = 0;
          nlmsg_buf[16] = (char)7;
          struct iovec iov;
          iov.iov_base = nlmsg_buf;
          iov.iov_len = 17;
          msg.msg_iov = &iov;
          msg.msg_iovlen = 1;
          sendmsg(sock, &msg, 0);
          return 0;
        }
      ==================================================================
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ff50e83
    • X
      sctp: set new_asoc temp when processing dupcookie · 7e062977
      Xin Long 提交于
      After sctp changed to use transport hashtable, a transport would be
      added into global hashtable when adding the peer to an asoc, then
      the asoc can be got by searching the transport in the hashtbale.
      
      The problem is when processing dupcookie in sctp_sf_do_5_2_4_dupcook,
      a new asoc would be created. A peer with the same addr and port as
      the one in the old asoc might be added into the new asoc, but fail
      to be added into the hashtable, as they also belong to the same sk.
      
      It causes that sctp's dupcookie processing can not really work.
      
      Since the new asoc will be freed after copying it's information to
      the old asoc, it's more like a temp asoc. So this patch is to fix
      it by setting it as a temp asoc to avoid adding it's any transport
      into the hashtable and also avoid allocing assoc_id.
      
      An extra thing it has to do is to also alloc stream info for any
      temp asoc, as sctp dupcookie process needs it to update old asoc.
      But I don't think it would hurt something, as a temp asoc would
      always be freed after finishing processing cookie echo packet.
      Reported-by: NJianwen Ji <jiji@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e062977
    • X
      sctp: fix stream update when processing dupcookie · 3ab21379
      Xin Long 提交于
      Since commit 3dbcc105 ("sctp: alloc stream info when initializing
      asoc"), stream and stream.out info are always alloced when creating
      an asoc.
      
      So it's not correct to check !asoc->stream before updating stream
      info when processing dupcookie, but would be better to check asoc
      state instead.
      
      Fixes: 3dbcc105 ("sctp: alloc stream info when initializing asoc")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ab21379
    • Y
      libceph: cleanup old messages according to reconnect seq · 0a2ad541
      Yan, Zheng 提交于
      when reopen a connection, use 'reconnect seq' to clean up
      messages that have already been received by peer.
      
      Link: http://tracker.ceph.com/issues/18690Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
      Reviewed-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      0a2ad541
  12. 24 5月, 2017 7 次提交