1. 28 10月, 2017 1 次提交
  2. 27 10月, 2017 3 次提交
    • X
      ip6_gre: update dst pmtu if dev mtu has been updated by toobig in __gre6_xmit · 8aec4959
      Xin Long 提交于
      When receiving a Toobig icmpv6 packet, ip6gre_err would just set
      tunnel dev's mtu, that's not enough. For skb_dst(skb)'s pmtu may
      still be using the old value, it has no chance to be updated with
      tunnel dev's mtu.
      
      Jianlin found this issue by reducing route's mtu while running
      netperf, the performance went to 0.
      
      ip6ip6 and ip4ip6 tunnel can work well with this, as they lookup
      the upper dst and update_pmtu it's pmtu or icmpv6_send a Toobig
      to upper socket after setting tunnel dev's mtu.
      
      We couldn't do that for ip6_gre, as gre's inner packet could be
      any protocol, it's difficult to handle them (like lookup upper
      dst) in a good way.
      
      So this patch is to fix it by updating skb_dst(skb)'s pmtu when
      dev->mtu < skb_dst(skb)'s pmtu in tx path. It's safe to do this
      update there, as usually dev->mtu <= skb_dst(skb)'s pmtu and no
      performance regression can be caused by this.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8aec4959
    • X
      ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err · f8d20b46
      Xin Long 提交于
      The similar fix in patch 'ipip: only increase err_count for some
      certain type icmp in ipip_err' is needed for ip6gre_err.
      
      In Jianlin's case, udp netperf broke even when receiving a TooBig
      icmpv6 packet.
      
      Fixes: c12b395a ("gre: Support GRE over IPv6")
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8d20b46
    • X
      ipip: only increase err_count for some certain type icmp in ipip_err · f3594f0a
      Xin Long 提交于
      t->err_count is used to count the link failure on tunnel and an err
      will be reported to user socket in tx path if t->err_count is not 0.
      udp socket could even return EHOSTUNREACH to users.
      
      Since commit fd58156e ("IPIP: Use ip-tunneling code.") removed
      the 'switch check' for icmp type in ipip_err(), err_count would be
      increased by the icmp packet with ICMP_EXC_FRAGTIME code. an link
      failure would be reported out due to this.
      
      In Jianlin's case, when receiving ICMP_EXC_FRAGTIME a icmp packet,
      udp netperf failed with the err:
        send_data: data send error: No route to host (errno 113)
      
      We expect this error reported from tunnel to socket when receiving
      some certain type icmp, but not ICMP_EXC_FRAGTIME, ICMP_SR_FAILED
      or ICMP_PARAMETERPROB ones.
      
      This patch is to bring 'switch check' for icmp type back to ipip_err
      so that it only reports link failure for the right type icmp, just as
      in ipgre_err() and ipip6_err().
      
      Fixes: fd58156e ("IPIP: Use ip-tunneling code.")
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3594f0a
  3. 26 10月, 2017 5 次提交
  4. 25 10月, 2017 3 次提交
  5. 24 10月, 2017 1 次提交
    • L
      sctp: full support for ipv6 ip_nonlocal_bind & IP_FREEBIND · b71d21c2
      Laszlo Toth 提交于
      Commit 9b974202 ("sctp: support ipv6 nonlocal bind")
      introduced support for the above options as v4 sctp did,
      so patched sctp_v6_available().
      
      In the v4 implementation it's enough, because
      sctp_inet_bind_verify() just returns with sctp_v4_available().
      However sctp_inet6_bind_verify() has an extra check before that
      for link-local scope_id, which won't respect the above options.
      
      Added the checks before calling ipv6_chk_addr(), but
      not before the validation of scope_id.
      
      before (w/ both options):
       ./v6test fe80::10 sctp
       bind failed, errno: 99 (Cannot assign requested address)
       ./v6test fe80::10 tcp
       bind success, errno: 0 (Success)
      
      after (w/ both options):
       ./v6test fe80::10 sctp
       bind success, errno: 0 (Success)
      Signed-off-by: NLaszlo Toth <laszlth@gmail.com>
      Reviewed-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b71d21c2
  6. 23 10月, 2017 3 次提交
    • H
      ipsec: Fix aborted xfrm policy dump crash · 1137b5e2
      Herbert Xu 提交于
      An independent security researcher, Mohamed Ghannam, has reported
      this vulnerability to Beyond Security's SecuriTeam Secure Disclosure
      program.
      
      The xfrm_dump_policy_done function expects xfrm_dump_policy to
      have been called at least once or it will crash.  This can be
      triggered if a dump fails because the target socket's receive
      buffer is full.
      
      This patch fixes it by using the cb->start mechanism to ensure that
      the initialisation is always done regardless of the buffer situation.
      
      Fixes: 12a169e7 ("ipsec: Put dumpers on the dump list")
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      1137b5e2
    • E
      tcp/dccp: fix lockdep splat in inet_csk_route_req() · a6ca7abe
      Eric Dumazet 提交于
      This patch fixes the following lockdep splat in inet_csk_route_req()
      
        lockdep_rcu_suspicious
        inet_csk_route_req
        tcp_v4_send_synack
        tcp_rtx_synack
        inet_rtx_syn_ack
        tcp_fastopen_synack_time
        tcp_retransmit_timer
        tcp_write_timer_handler
        tcp_write_timer
        call_timer_fn
      
      Thread running inet_csk_route_req() owns a reference on the request
      socket, so we have the guarantee ireq->ireq_opt wont be changed or
      freed.
      
      lockdep can enforce this invariant for us.
      
      Fixes: c92e8c02 ("tcp/dccp: fix ireq->opt races")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a6ca7abe
    • K
      tcp: do tcp_mstamp_refresh before retransmits on TSQ handler · 3a91d29f
      Koichiro Den 提交于
      When retransmission on TSQ handler was introduced in the commit
      f9616c35 ("tcp: implement TSQ for retransmits"), the retransmitted
      skbs' timestamps were updated on the actual transmission. In the later
      commit 385e2070 ("tcp: use tp->tcp_mstamp in output path"), it stops
      being done so. In the commit, the comment says "We try to refresh
      tp->tcp_mstamp only when necessary", and at present tcp_tsq_handler and
      tcp_v4_mtu_reduced applies to this. About the latter, it's okay since
      it's rare enough.
      
      About the former, even though possible retransmissions on the tasklet
      comes just after the destructor run in NET_RX softirq handling, the time
      between them could be nonnegligibly large to the extent that
      tcp_rack_advance or rto rearming be affected if other (remaining) RX,
      BLOCK and (preceding) TASKLET sofirq handlings are unexpectedly heavy.
      
      So in the same way as tcp_write_timer_handler does, doing tcp_mstamp_refresh
      ensures the accuracy of algorithms relying on it.
      
      Fixes: 385e2070 ("tcp: use tp->tcp_mstamp in output path")
      Signed-off-by: NKoichiro Den <den@klaipeden.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a91d29f
  7. 22 10月, 2017 6 次提交
    • E
      ipv6: flowlabel: do not leave opt->tot_len with garbage · 864e2a1f
      Eric Dumazet 提交于
      When syzkaller team brought us a C repro for the crash [1] that
      had been reported many times in the past, I finally could find
      the root cause.
      
      If FlowLabel info is merged by fl6_merge_options(), we leave
      part of the opt_space storage provided by udp/raw/l2tp with random value
      in opt_space.tot_len, unless a control message was provided at sendmsg()
      time.
      
      Then ip6_setup_cork() would use this random value to perform a kzalloc()
      call. Undefined behavior and crashes.
      
      Fix is to properly set tot_len in fl6_merge_options()
      
      At the same time, we can also avoid consuming memory and cpu cycles
      to clear it, if every option is copied via a kmemdup(). This is the
      change in ip6_setup_cork().
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801cb64a100 task.stack: ffff8801cc350000
      RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
      RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
      RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
      RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
      RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
      R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
      R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
      FS:  00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
      DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
       udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       SYSC_sendto+0x358/0x5a0 net/socket.c:1750
       SyS_sendto+0x40/0x50 net/socket.c:1718
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4520a9
      RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
      RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
      RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
      R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
      Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
      RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      864e2a1f
    • D
      rxrpc: Don't release call mutex on error pointer · 6cb3ece9
      David Howells 提交于
      Don't release call mutex at the end of rxrpc_kernel_begin_call() if the
      call pointer actually holds an error value.
      
      Fixes: 540b1c48 ("rxrpc: Fix deadlock between call creation and sendmsg/recvmsg")
      Reported-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6cb3ece9
    • N
      net: ethtool: remove error check for legacy setting transceiver type · 95491e3c
      Niklas Söderlund 提交于
      Commit 9cab88726929605 ("net: ethtool: Add back transceiver type")
      restores the transceiver type to struct ethtool_link_settings and
      convert_link_ksettings_to_legacy_settings() but forgets to remove the
      error check for the same in convert_legacy_settings_to_link_ksettings().
      This prevents older versions of ethtool to change link settings.
      
          # ethtool --version
          ethtool version 3.16
      
          # ethtool -s eth0 autoneg on speed 100 duplex full
          Cannot set new settings: Invalid argument
            not setting speed
            not setting duplex
            not setting autoneg
      
      While newer versions of ethtool works.
      
          # ethtool --version
          ethtool version 4.10
      
          # ethtool -s eth0 autoneg on speed 100 duplex full
          [   57.703268] sh-eth ee700000.ethernet eth0: Link is Down
          [   59.618227] sh-eth ee700000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
      
      Fixes: 19cab887 ("net: ethtool: Add back transceiver type")
      Signed-off-by: NNiklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Reported-by: NRenjith R V <renjith.rv@quest-global.com>
      Tested-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95491e3c
    • C
      soreuseport: fix initialization race · 1b5f962e
      Craig Gallek 提交于
      Syzkaller stumbled upon a way to trigger
      WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41
      reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39
      
      There are two initialization paths for the sock_reuseport structure in a
      socket: Through the udp/tcp bind paths of SO_REUSEPORT sockets or through
      SO_ATTACH_REUSEPORT_[CE]BPF before bind.  The existing implementation
      assumedthat the socket lock protected both of these paths when it actually
      only protects the SO_ATTACH_REUSEPORT path.  Syzkaller triggered this
      double allocation by running these paths concurrently.
      
      This patch moves the check for double allocation into the reuseport_alloc
      function which is protected by a global spin lock.
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Fixes: c125e80b ("soreuseport: fast reuseport TCP socket selection")
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b5f962e
    • N
      net: bridge: fix returning of vlan range op errors · 66c54517
      Nikolay Aleksandrov 提交于
      When vlan tunnels were introduced, vlan range errors got silently
      dropped and instead 0 was returned always. Restore the previous
      behaviour and return errors to user-space.
      
      Fixes: efa5356b ("bridge: per vlan dst_metadata netlink support")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66c54517
    • W
      sock: correct sk_wmem_queued accounting on efault in tcp zerocopy · 54d43117
      Willem de Bruijn 提交于
      Syzkaller hits WARN_ON(sk->sk_wmem_queued) in sk_stream_kill_queues
      after triggering an EFAULT in __zerocopy_sg_from_iter.
      
      On this error, skb_zerocopy_stream_iter resets the skb to its state
      before the operation with __pskb_trim. It cannot kfree_skb like
      datagram callers, as the skb may have data from a previous send call.
      
      __pskb_trim calls skb_condense for unowned skbs, which adjusts their
      truesize. These tcp skbuffs are owned and their truesize must add up
      to sk_wmem_queued. But they match because their skb->sk is NULL until
      tcp_transmit_skb.
      
      Temporarily set skb->sk when calling __pskb_trim to signal that the
      skbuffs are owned and avoid the skb_condense path.
      
      Fixes: 52267790 ("sock: add MSG_ZEROCOPY")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54d43117
  8. 21 10月, 2017 9 次提交
    • M
      udp: make some messages more descriptive · 197df02c
      Matteo Croce 提交于
      In the UDP code there are two leftover error messages with very few meaning.
      Replace them with a more descriptive error message as some users
      reported them as "strange network error".
      Signed-off-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      197df02c
    • D
      hv_sock: add locking in the open/close/release code paths · b4562ca7
      Dexuan Cui 提交于
      Without the patch, when hvs_open_connection() hasn't completely established
      a connection (e.g. it has changed sk->sk_state to SS_CONNECTED, but hasn't
      inserted the sock into the connected queue), vsock_stream_connect() may see
      the sk_state change and return the connection to the userspace, and next
      when the userspace closes the connection quickly, hvs_release() may not see
      the connection in the connected queue; finally hvs_open_connection()
      inserts the connection into the queue, but we won't be able to purge the
      connection for ever.
      Signed-off-by: NDexuan Cui <decui@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Cathy Avery <cavery@redhat.com>
      Cc: Rolf Neugebauer <rolf.neugebauer@docker.com>
      Cc: Marcelo Cerri <marcelo.cerri@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4562ca7
    • G
      net/ncsi: Fix length of GVI response packet · 0a90e251
      Gavin Shan 提交于
      The length of GVI (GetVersionInfo) response packet should be 40 instead
      of 36. This issue was found from /sys/kernel/debug/ncsi/eth0/stats.
      
       # ethtool --ncsi eth0 swstats
           :
       RESPONSE     OK       TIMEOUT  ERROR
       =======================================
       GVI          0        0        2
      
      With this applied, no error reported on GVI response packets:
      
       # ethtool --ncsi eth0 swstats
           :
       RESPONSE     OK       TIMEOUT  ERROR
       =======================================
       GVI          2        0        0
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a90e251
    • G
      net/ncsi: Enforce failover on link monitor timeout · 52b4c862
      Gavin Shan 提交于
      The NCSI channel has been configured to provide service if its link
      monitor timer is enabled, regardless of its state (inactive or active).
      So the timeout event on the link monitor indicates the out-of-service
      on that channel, for which a failover is needed.
      
      This sets NCSI_DEV_RESHUFFLE flag to enforce failover on link monitor
      timeout, regardless the channel's original state (inactive or active).
      Also, the link is put into "down" state to give the failing channel
      lowest priority when selecting for the active channel. The state of
      failing channel should be set to active in order for deinitialization
      and failover to be done.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52b4c862
    • G
      net/ncsi: Disable HWA mode when no channels are found · 100ef01f
      Gavin Shan 提交于
      When there are no NCSI channels probed, HWA (Hardware Arbitration)
      mode is enabled. It's not correct because HWA depends on the fact:
      NCSI channels exist and all of them support HWA mode. This disables
      HWA when no channels are probed.
      Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
      Signed-off-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      100ef01f
    • S
      net/ncsi: Stop monitor if channel times out or is inactive · 0795fb20
      Samuel Mendoza-Jonas 提交于
      ncsi_channel_monitor() misses stopping the channel monitor in several
      places that it should, causing a WARN_ON_ONCE() to trigger when the
      monitor is re-started later, eg:
      
      [  459.040000] WARNING: CPU: 0 PID: 1093 at net/ncsi/ncsi-manage.c:269 ncsi_start_channel_monitor+0x7c/0x90
      [  459.040000] CPU: 0 PID: 1093 Comm: kworker/0:3 Not tainted 4.10.17-gaca2fdd #140
      [  459.040000] Hardware name: ASpeed SoC
      [  459.040000] Workqueue: events ncsi_dev_work
      [  459.040000] [<80010094>] (unwind_backtrace) from [<8000d950>] (show_stack+0x20/0x24)
      [  459.040000] [<8000d950>] (show_stack) from [<801dbf70>] (dump_stack+0x20/0x28)
      [  459.040000] [<801dbf70>] (dump_stack) from [<80018d7c>] (__warn+0xe0/0x108)
      [  459.040000] [<80018d7c>] (__warn) from [<80018e70>] (warn_slowpath_null+0x30/0x38)
      [  459.040000] [<80018e70>] (warn_slowpath_null) from [<803f6a08>] (ncsi_start_channel_monitor+0x7c/0x90)
      [  459.040000] [<803f6a08>] (ncsi_start_channel_monitor) from [<803f7664>] (ncsi_configure_channel+0xdc/0x5fc)
      [  459.040000] [<803f7664>] (ncsi_configure_channel) from [<803f8160>] (ncsi_dev_work+0xac/0x474)
      [  459.040000] [<803f8160>] (ncsi_dev_work) from [<8002d244>] (process_one_work+0x1e0/0x450)
      [  459.040000] [<8002d244>] (process_one_work) from [<8002d510>] (worker_thread+0x5c/0x570)
      [  459.040000] [<8002d510>] (worker_thread) from [<80033614>] (kthread+0x124/0x164)
      [  459.040000] [<80033614>] (kthread) from [<8000a5e8>] (ret_from_fork+0x14/0x2c)
      
      This also updates the monitor instead of just returning if
      ncsi_xmit_cmd() fails to send the get-link-status command so that the
      monitor properly times out.
      
      Fixes: e6f44ed6 "net/ncsi: Package and channel management"
      Signed-off-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0795fb20
    • S
      net/ncsi: Fix AEN HNCDSC packet length · 6850d0f8
      Samuel Mendoza-Jonas 提交于
      Correct the value of the HNCDSC AEN packet.
      Fixes: 7a82ecf4 "net/ncsi: NCSI AEN packet handler"
      Signed-off-by: NSamuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6850d0f8
    • E
      packet: avoid panic in packet_getsockopt() · 509c7a1e
      Eric Dumazet 提交于
      syzkaller got crashes in packet_getsockopt() processing
      PACKET_ROLLOVER_STATS command while another thread was managing
      to change po->rollover
      
      Using RCU will fix this bug. We might later add proper RCU annotations
      for sparse sake.
      
      In v2: I replaced kfree(rollover) in fanout_add() to kfree_rcu()
      variant, as spotted by John.
      
      Fixes: a9b63918 ("packet: rollover statistics")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: John Sperbeck <jsperbeck@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      509c7a1e
    • E
      tcp/dccp: fix ireq->opt races · c92e8c02
      Eric Dumazet 提交于
      syzkaller found another bug in DCCP/TCP stacks [1]
      
      For the reasons explained in commit ce105008 ("tcp/dccp: fix
      ireq->pktopts race"), we need to make sure we do not access
      ireq->opt unless we own the request sock.
      
      Note the opt field is renamed to ireq_opt to ease grep games.
      
      [1]
      BUG: KASAN: use-after-free in ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
      Read of size 1 at addr ffff8801c951039c by task syz-executor5/3295
      
      CPU: 1 PID: 3295 Comm: syz-executor5 Not tainted 4.14.0-rc4+ #80
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x25b/0x340 mm/kasan/report.c:409
       __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:427
       ip_queue_xmit+0x1687/0x18e0 net/ipv4/ip_output.c:474
       tcp_transmit_skb+0x1ab7/0x3840 net/ipv4/tcp_output.c:1135
       tcp_send_ack.part.37+0x3bb/0x650 net/ipv4/tcp_output.c:3587
       tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3557
       __tcp_ack_snd_check+0x2c6/0x4b0 net/ipv4/tcp_input.c:5072
       tcp_ack_snd_check net/ipv4/tcp_input.c:5085 [inline]
       tcp_rcv_state_process+0x2eff/0x4850 net/ipv4/tcp_input.c:6071
       tcp_child_process+0x342/0x990 net/ipv4/tcp_minisocks.c:816
       tcp_v4_rcv+0x1827/0x2f80 net/ipv4/tcp_ipv4.c:1682
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x40c341
      RSP: 002b:00007f469523ec10 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 000000000040c341
      RDX: 0000000000000037 RSI: 0000000020004000 RDI: 0000000000000015
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 00000000000f4240 R11: 0000000000000293 R12: 00000000004b7fd1
      R13: 00000000ffffffff R14: 0000000020000000 R15: 0000000000025000
      
      Allocated by task 3295:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       __do_kmalloc mm/slab.c:3725 [inline]
       __kmalloc+0x162/0x760 mm/slab.c:3734
       kmalloc include/linux/slab.h:498 [inline]
       tcp_v4_save_options include/net/tcp.h:1962 [inline]
       tcp_v4_init_req+0x2d3/0x3e0 net/ipv4/tcp_ipv4.c:1271
       tcp_conn_request+0xf6d/0x3410 net/ipv4/tcp_input.c:6283
       tcp_v4_conn_request+0x157/0x210 net/ipv4/tcp_ipv4.c:1313
       tcp_rcv_state_process+0x8ea/0x4850 net/ipv4/tcp_input.c:5857
       tcp_v4_do_rcv+0x55c/0x7d0 net/ipv4/tcp_ipv4.c:1482
       tcp_v4_rcv+0x2d10/0x2f80 net/ipv4/tcp_ipv4.c:1711
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 3306:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kfree+0xca/0x250 mm/slab.c:3820
       inet_sock_destruct+0x59d/0x950 net/ipv4/af_inet.c:157
       __sk_destruct+0xfd/0x910 net/core/sock.c:1560
       sk_destruct+0x47/0x80 net/core/sock.c:1595
       __sk_free+0x57/0x230 net/core/sock.c:1603
       sk_free+0x2a/0x40 net/core/sock.c:1614
       sock_put include/net/sock.h:1652 [inline]
       inet_csk_complete_hashdance+0xd5/0xf0 net/ipv4/inet_connection_sock.c:959
       tcp_check_req+0xf4d/0x1620 net/ipv4/tcp_minisocks.c:765
       tcp_v4_rcv+0x17f6/0x2f80 net/ipv4/tcp_ipv4.c:1675
       ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_local_deliver+0x1ce/0x6e0 net/ipv4/ip_input.c:257
       dst_input include/net/dst.h:464 [inline]
       ip_rcv_finish+0x887/0x19a0 net/ipv4/ip_input.c:397
       NF_HOOK include/linux/netfilter.h:249 [inline]
       ip_rcv+0xc3f/0x1820 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x1a3e/0x34b0 net/core/dev.c:4476
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4514
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4587
       netif_receive_skb+0xae/0x390 net/core/dev.c:4611
       tun_rx_batched.isra.50+0x5ed/0x860 drivers/net/tun.c:1372
       tun_get_user+0x249c/0x36d0 drivers/net/tun.c:1766
       tun_chr_write_iter+0xbf/0x160 drivers/net/tun.c:1792
       call_write_iter include/linux/fs.h:1770 [inline]
       new_sync_write fs/read_write.c:468 [inline]
       __vfs_write+0x68a/0x970 fs/read_write.c:481
       vfs_write+0x18f/0x510 fs/read_write.c:543
       SYSC_write fs/read_write.c:588 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:580
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c92e8c02
  9. 20 10月, 2017 3 次提交
  10. 19 10月, 2017 4 次提交
  11. 18 10月, 2017 2 次提交
    • J
      netlink: fix netlink_ack() extack race · 48044eb4
      Johannes Berg 提交于
      It seems that it's possible to toggle NETLINK_F_EXT_ACK
      through setsockopt() while another thread/CPU is building
      a message inside netlink_ack(), which could then trigger
      the WARN_ON()s I added since if it goes from being turned
      off to being turned on between allocating and filling the
      message, the skb could end up being too small.
      
      Avoid this whole situation by storing the value of this
      flag in a separate variable and using that throughout the
      function instead.
      
      Fixes: 2d4bc933 ("netlink: extended ACK reporting")
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48044eb4
    • D
      KEYS: Fix race between updating and finding a negative key · 363b02da
      David Howells 提交于
      Consolidate KEY_FLAG_INSTANTIATED, KEY_FLAG_NEGATIVE and the rejection
      error into one field such that:
      
       (1) The instantiation state can be modified/read atomically.
      
       (2) The error can be accessed atomically with the state.
      
       (3) The error isn't stored unioned with the payload pointers.
      
      This deals with the problem that the state is spread over three different
      objects (two bits and a separate variable) and reading or updating them
      atomically isn't practical, given that not only can uninstantiated keys
      change into instantiated or rejected keys, but rejected keys can also turn
      into instantiated keys - and someone accessing the key might not be using
      any locking.
      
      The main side effect of this problem is that what was held in the payload
      may change, depending on the state.  For instance, you might observe the
      key to be in the rejected state.  You then read the cached error, but if
      the key semaphore wasn't locked, the key might've become instantiated
      between the two reads - and you might now have something in hand that isn't
      actually an error code.
      
      The state is now KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE or a negative error
      code if the key is negatively instantiated.  The key_is_instantiated()
      function is replaced with key_is_positive() to avoid confusion as negative
      keys are also 'instantiated'.
      
      Additionally, barriering is included:
      
       (1) Order payload-set before state-set during instantiation.
      
       (2) Order state-read before payload-read when using the key.
      
      Further separate barriering is necessary if RCU is being used to access the
      payload content after reading the payload pointers.
      
      Fixes: 146aa8b1 ("KEYS: Merge the type-specific data with the payload data")
      Cc: stable@vger.kernel.org # v4.4+
      Reported-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NEric Biggers <ebiggers@google.com>
      363b02da