1. 20 Dec, 2018 1 commit
    • net/rds: fix warn in rds_message_alloc_sgs · ea010070
      shamir rabinovitch committed
      A redundant copy_from_user in the rds_sendmsg system call exposes rds
      to an issue where rds_rdma_extra_size walks the rds iovec and
      calculates the number of pages (sgs) it needs to add to the tail of
      the rds message, and later rds_cmsg_rdma_args copies the rds iovec
      again, recalculates the same number and gets a different result,
      causing a WARN_ON in rds_message_alloc_sgs.
      
      Fix this by doing the copy_from_user only once per rds_sendmsg
      system call.
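
      The hazard described above is a classic double fetch. A minimal
      user-space sketch, assuming a hypothetical `struct user_args` and a
      `fetch_args()` helper standing in for copy_from_user() (neither is
      the actual rds code), shows why two fetches can disagree while a
      single snapshot cannot:

```c
#include <string.h>

/* Hypothetical stand-in for the user-supplied iovec description. */
struct user_args {
    unsigned int nr_pages;
};

/* Stand-in for copy_from_user(): snapshot user memory into a private copy. */
static void fetch_args(struct user_args *dst, const struct user_args *user)
{
    memcpy(dst, user, sizeof(*dst));
}

/*
 * Racy pattern: size the allocation from one fetch, consume a second fetch.
 * racer_value models another thread rewriting user memory in between.
 * Returns 1 when the two fetches disagree (the WARN_ON condition).
 */
static int double_fetch_mismatch(struct user_args *user, unsigned int racer_value)
{
    struct user_args first, second;

    fetch_args(&first, user);        /* sizing pass */
    user->nr_pages = racer_value;    /* concurrent change by user space */
    fetch_args(&second, user);       /* consuming pass */
    return first.nr_pages != second.nr_pages;
}

/* Fixed pattern: fetch once and reuse the same snapshot for both passes. */
static int single_fetch_mismatch(struct user_args *user, unsigned int racer_value)
{
    struct user_args snap;
    unsigned int sized, used;

    fetch_args(&snap, user);
    sized = snap.nr_pages;           /* sizing pass */
    user->nr_pages = racer_value;    /* user space may still change its copy */
    used = snap.nr_pages;            /* consuming pass reads the snapshot */
    return sized != used;
}
```

      With the snapshot, later changes to user memory cannot influence
      the second pass, so the two counts always match.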
      
      When the issue occurs, the dump below is seen:
      
      WARNING: CPU: 0 PID: 19789 at net/rds/message.c:316 rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 19789 Comm: syz-executor827 Not tainted 4.19.0-next-20181030+ #101
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x244/0x39d lib/dump_stack.c:113
       panic+0x2ad/0x55c kernel/panic.c:188
       __warn.cold.8+0x20/0x45 kernel/panic.c:540
       report_bug+0x254/0x2d0 lib/bug.c:186
       fixup_bug arch/x86/kernel/traps.c:178 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
       do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
       invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
      RIP: 0010:rds_message_alloc_sgs+0x10c/0x160 net/rds/message.c:316
      Code: c0 74 04 3c 03 7e 6c 44 01 ab 78 01 00 00 e8 2b 9e 35 fa 4c 89 e0 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d c3 e8 14 9e 35 fa <0f> 0b 31 ff 44 89 ee e8 18 9f 35 fa 45 85 ed 75 1b e8 fe 9d 35 fa
      RSP: 0018:ffff8801c51b7460 EFLAGS: 00010293
      RAX: ffff8801bc412080 RBX: ffff8801d7bf4040 RCX: ffffffff8749c9e6
      RDX: 0000000000000000 RSI: ffffffff8749ca5c RDI: 0000000000000004
      RBP: ffff8801c51b7490 R08: ffff8801bc412080 R09: ffffed003b5c5b67
      R10: ffffed003b5c5b67 R11: ffff8801dae2db3b R12: 0000000000000000
      R13: 000000000007165c R14: 000000000007165c R15: 0000000000000005
       rds_cmsg_rdma_args+0x82d/0x1510 net/rds/rdma.c:623
       rds_cmsg_send net/rds/send.c:971 [inline]
       rds_sendmsg+0x19a2/0x3180 net/rds/send.c:1273
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2117
       __sys_sendmsg+0x11d/0x280 net/socket.c:2155
       __do_sys_sendmsg net/socket.c:2164 [inline]
       __se_sys_sendmsg net/socket.c:2162 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2162
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x44a859
      Code: e8 dc e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 6b cb fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f1d4710ada8 EFLAGS: 00000297 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000006dcc28 RCX: 000000000044a859
      RDX: 0000000000000000 RSI: 0000000020001600 RDI: 0000000000000003
      RBP: 00000000006dcc20 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000297 R12: 00000000006dcc2c
      R13: 646e732f7665642f R14: 00007f1d4710b9c0 R15: 00000000006dcd2c
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Reported-by: syzbot+26de17458aeda9d305d8@syzkaller.appspotmail.com
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: shamir rabinovitch <shamir.rabinovitch@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 19 Dec, 2018 5 commits
  3. 18 Dec, 2018 3 commits
  4. 16 Dec, 2018 2 commits
    • net: clear skb->tstamp in forwarding paths · 8203e2d8
      Eric Dumazet committed
      Sergey reported that forwarding was no longer working
      if the fq packet scheduler was used.

      This is caused by the recent switch to the EDT model, since incoming
      packets might have been timestamped by __net_timestamp().
      
      __net_timestamp() uses ktime_get_real(), while fq expects packets
      using CLOCK_MONOTONIC base.
      
      The fix is to clear skb->tstamp in forwarding paths.
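
      The clock-base mismatch can be illustrated with a small model. This
      is a sketch under stated assumptions, not the fq code: `struct pkt`,
      `time_to_send()` and `forward_clear_tstamp()` are hypothetical names,
      and the scheduler is reduced to "a non-zero timestamp is an earliest
      departure time on the monotonic clock".

```c
#include <stdint.h>

/* Hypothetical, simplified packet: only the timestamp matters here. */
struct pkt {
    int64_t tstamp_ns;   /* 0 means "no departure time requested" */
};

/*
 * Simplified EDT scheduler model: a non-zero tstamp is read as a departure
 * time on the monotonic clock; zero means "send now".
 */
static int64_t time_to_send(const struct pkt *p, int64_t now_mono_ns)
{
    return p->tstamp_ns ? p->tstamp_ns : now_mono_ns;
}

/* Conceptual effect of the fix: drop the receive timestamp on forwarding. */
static void forward_clear_tstamp(struct pkt *p)
{
    p->tstamp_ns = 0;
}
```

      A wall-clock receive timestamp (nanoseconds since 1970) is far
      larger than a monotonic reading (nanoseconds since boot), so an
      uncleared packet looks as if it were scheduled decades in the future.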
      
      Fixes: 80b14dee ("net: Add a new socket option for a future transmit time.")
      Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Sergey Matyukevich <geomatsi@gmail.com>
      Tested-by: Sergey Matyukevich <geomatsi@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv4: do not handle duplicate fragments as overlapping · ade44640
      Michal Kubecek committed
      Since commit 7969e5c4 ("ip: discard IPv4 datagrams with overlapping
      segments.") IPv4 reassembly code drops the whole queue whenever an
      overlapping fragment is received. However, the test is written in a way
      which detects duplicate fragments as overlapping so that in environments
      with many duplicate packets, fragmented packets may be undeliverable.
      
      Add an extra test and, for a (potentially) duplicate fragment, only drop the
      new fragment rather than the whole queue. Only the starting offset and length
      are checked, not the contents of the fragments, as that would be too
      expensive. For a similar reason, the linear list ("run") of an rbtree node is
      not iterated; we only check whether the new fragment is a subset of the
      interval covered by existing consecutive fragments.
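
      The subset test described above can be sketched as an interval
      classification. The enum and function names are hypothetical and the
      rbtree walk is omitted; only the comparison against one "run" of
      consecutive fragments is shown.

```c
/* Possible outcomes for a newly received fragment versus one run. */
enum frag_verdict {
    FRAG_NEW,        /* disjoint from the run: keep looking / insert */
    FRAG_DUPLICATE,  /* subset of the run: drop only this fragment */
    FRAG_OVERLAP     /* partial overlap: drop the whole queue */
};

/* The run covers [run_start, run_end); the fragment covers [off, end). */
static enum frag_verdict classify_frag(unsigned int run_start,
                                       unsigned int run_end,
                                       unsigned int off,
                                       unsigned int end)
{
    if (end <= run_start || off >= run_end)
        return FRAG_NEW;
    if (off >= run_start && end <= run_end)
        return FRAG_DUPLICATE;
    return FRAG_OVERLAP;
}
```

      Because only offsets are compared, a fragment lying inside the run
      with different contents is also classified as a duplicate, which is
      why the message calls it "potentially" duplicate.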
      
      v2: instead of an exact check iterating through linear list of an rbtree
      node, only check if the new fragment is subset of the "run" (suggested
      by Eric Dumazet)
      
      Fixes: 7969e5c4 ("ip: discard IPv4 datagrams with overlapping segments.")
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 15 Dec, 2018 10 commits
  6. 13 Dec, 2018 3 commits
  7. 12 Dec, 2018 1 commit
    • bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K · fdadd049
      Daniel Borkmann committed
      Michael and Sandipan report:
      
        Commit ede95a63 introduced a bpf_jit_limit tuneable to limit BPF
        JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
        and is adjusted again at init time if MODULES_VADDR is defined.
      
        For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
        the compile-time default at boot-time, which is 0x9c400000 when
        using 64K page size. This overflows the signed 32-bit bpf_jit_limit
        value:
      
        root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
        -1673527296
      
        and can cause various unexpected failures throughout the network
        stack. In one case `strace dhclient eth0` reported:
      
        setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
                   16) = -1 ENOTSUPP (Unknown error 524)
      
        and similar failures can be seen with tools like tcpdump. This doesn't
        always reproduce however, and I'm not sure why. The more consistent
        failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
        host would time out on systemd/netplan configuring a virtio-net NIC
        with no noticeable errors in the logs.
      
      Given this and also given that in near future some architectures like
      arm64 will have a custom area for BPF JIT image allocations we should
      get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
      4.21, we have an overridable bpf_jit_alloc_exec(), bpf_jit_free_exec()
      so therefore add another overridable bpf_jit_alloc_exec_limit() helper
      function which returns the possible size of the memory area for deriving
      the default heuristic in bpf_jit_charge_init().
      
      Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
      bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
      JIT memory provider, and therefore in case archs implement their custom
      module_alloc() we use MODULES_{END,_VADDR} for limits and otherwise for
      vmalloc_exec() cases like on ppc64 we use VMALLOC_{END,_START}.
      
      Additionally, for archs supporting large page sizes, we should change
      the sysctl to be handled as long to not run into sysctl restrictions
      in future.
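
      The arithmetic behind the negative sysctl value in the report can
      be checked with a short sketch. The helper names are hypothetical;
      only the PAGE_SIZE * 40000 default and the signed 32-bit width come
      from the message above.

```c
#include <stdint.h>

/* Compile-time default from the report: PAGE_SIZE * 40000, in 64-bit math. */
static int64_t jit_limit_bytes(int64_t page_size)
{
    return page_size * 40000;
}

/* Does the value fit in a signed 32-bit sysctl variable? */
static int fits_in_s32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* Value seen when only the low 32 bits are kept and read as two's complement. */
static int32_t wrap_to_s32(int64_t v)
{
    uint32_t low = (uint32_t)v;   /* well-defined reduction modulo 2^32 */

    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)((int64_t)low - 4294967296LL);
}
```

      For a 64K page size the product is 0x9c400000, which does not fit
      and wraps to -1673527296, matching the value printed from
      /proc/sys/net/core/bpf_jit_limit. With 4K pages the product fits.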
      
      Fixes: ede95a63 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
      Reported-by: Sandipan Das <sandipan@linux.ibm.com>
      Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  8. 11 Dec, 2018 2 commits
    • ipv4: Fix potential Spectre v1 vulnerability · 5648451e
      Gustavo A. R. Silva committed
      vr.vifi is indirectly controlled by user-space, hence leading to
      a potential exploitation of the Spectre variant 1 vulnerability.
      
      This issue was detected with the help of Smatch:
      
      net/ipv4/ipmr.c:1616 ipmr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)
      net/ipv4/ipmr.c:1690 ipmr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap)
      
      Fix this by sanitizing vr.vifi before using it to index mrt->vif_table.
      
      Notice that given that speculation windows are large, the policy is
      to kill the speculation on the first load and not worry if it can be
      completed with a dependent load/store [1].
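
      Sanitizing an index in this sense means clamping it without a
      conditional branch that could be mispredicted. The sketch below is
      modeled on the generic branchless mask used by the kernel's
      array_index_nospec() helper; the function names here are
      hypothetical, and as plain user-space C it gives no real guarantee
      against speculation because the compiler is free to reintroduce a
      branch. It also relies on arithmetic right shift of negative values
      and on wrapping unsigned-to-signed conversion, both
      implementation-defined in C but provided by gcc and clang.

```c
#include <limits.h>

/*
 * Returns all ones when index < size and zero otherwise, computed without
 * a conditional branch.
 */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
    return ~(long)(index | (size - 1UL - index)) >> (sizeof(long) * CHAR_BIT - 1);
}

/* In-bounds indexes pass through unchanged; out-of-bounds ones become 0. */
static unsigned long clamp_index(unsigned long index, unsigned long size)
{
    return index & index_mask(index, size);
}
```

      An out-of-range index is forced to 0, so even a speculated load
      stays inside the table.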
      
      [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2

      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: initialize sin6_flowinfo for ipv6 addrs in sctp_inet6addr_event · 4a2eb0c3
      Xin Long committed
      syzbot reported a kernel-infoleak, which is caused by an uninitialized
      field (sin6_flowinfo) of addr->a.v6 in sctp_inet6addr_event().
      The call trace is as below:
      
        BUG: KMSAN: kernel-infoleak in _copy_to_user+0x19a/0x230 lib/usercopy.c:33
        CPU: 1 PID: 8164 Comm: syz-executor2 Not tainted 4.20.0-rc3+ #95
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
        Google 01/01/2011
        Call Trace:
          __dump_stack lib/dump_stack.c:77 [inline]
          dump_stack+0x32d/0x480 lib/dump_stack.c:113
          kmsan_report+0x12c/0x290 mm/kmsan/kmsan.c:683
          kmsan_internal_check_memory+0x32a/0xa50 mm/kmsan/kmsan.c:743
          kmsan_copy_to_user+0x78/0xd0 mm/kmsan/kmsan_hooks.c:634
          _copy_to_user+0x19a/0x230 lib/usercopy.c:33
          copy_to_user include/linux/uaccess.h:183 [inline]
          sctp_getsockopt_local_addrs net/sctp/socket.c:5998 [inline]
          sctp_getsockopt+0x15248/0x186f0 net/sctp/socket.c:7477
          sock_common_getsockopt+0x13f/0x180 net/core/sock.c:2937
          __sys_getsockopt+0x489/0x550 net/socket.c:1939
          __do_sys_getsockopt net/socket.c:1950 [inline]
          __se_sys_getsockopt+0xe1/0x100 net/socket.c:1947
          __x64_sys_getsockopt+0x62/0x80 net/socket.c:1947
          do_syscall_64+0xcf/0x110 arch/x86/entry/common.c:291
          entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      sin6_flowinfo is not really used by SCTP, so it will be fixed by simply
      setting it to 0.
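
      The pattern behind the fix can be sketched in user space with the
      standard struct sockaddr_in6. The helper name `fill_v6_addr()` is
      hypothetical; the point is only that every field later copied out
      to user space is assigned, including the unused sin6_flowinfo.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Fill an IPv6 socket address so that no field keeps whatever bytes the
 * memory held before. sin6_flowinfo is unused by the caller but is still
 * set, because the whole structure may later be copied out.
 */
static void fill_v6_addr(struct sockaddr_in6 *sa,
                         const struct in6_addr *addr,
                         unsigned int scope_id)
{
    sa->sin6_family = AF_INET6;
    sa->sin6_port = 0;
    sa->sin6_flowinfo = 0;      /* the field the fix initializes */
    sa->sin6_addr = *addr;
    sa->sin6_scope_id = scope_id;
}
```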
      
      The issue has existed since the very beginning.
      Thanks to Alexander for providing the reproducer.
      
      Reported-by: syzbot+ad5d327e6936a2e284be@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 10 Dec, 2018 1 commit
  10. 08 Dec, 2018 5 commits
    • ipv6: Check available headroom in ip6_xmit() even without options · 66033f47
      Stefano Brivio committed
      Even if we send an IPv6 packet without options, MAX_HEADER might not be
      enough to account for the additional headroom required by alignment of
      hardware headers.
      
      On a configuration without HYPERV_NET, WLAN, AX25, and with IPV6_TUNNEL,
      sending short SCTP packets over IPv4 over L2TP over IPv6, we start with
      100 bytes of allocated headroom in sctp_packet_transmit(), end up with 54
      bytes after l2tp_xmit_skb(), and 14 bytes in ip6_finish_output2().
      
      Those would be enough to append our 14 bytes header, but we're going to
      align that to 16 bytes, and write 2 bytes out of the allocated slab in
      neigh_hh_output().
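
      The two-byte overrun follows from rounding the hardware header
      length up to a 16-byte boundary. A minimal sketch of that
      arithmetic, with hypothetical helper names and the 16-byte
      alignment taken from the description above:

```c
#define HH_ALIGN 16u   /* alignment applied to the hardware header copy */

/* Round a header length up to the next multiple of HH_ALIGN. */
static unsigned int hh_align(unsigned int len)
{
    return (len + HH_ALIGN - 1) & ~(HH_ALIGN - 1);
}

/* Bytes written before the start of the buffer, or 0 if headroom suffices. */
static unsigned int headroom_shortfall(unsigned int headroom, unsigned int hdr_len)
{
    unsigned int need = hh_align(hdr_len);

    return need > headroom ? need - headroom : 0;
}
```

      With 14 bytes of headroom and a 14-byte header, the aligned copy
      needs 16 bytes, so 2 bytes land outside the allocation.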
      
      KASan says:
      
      [  264.967848] ==================================================================
      [  264.967861] BUG: KASAN: slab-out-of-bounds in ip6_finish_output2+0x1aec/0x1c70
      [  264.967866] Write of size 16 at addr 000000006af1c7fe by task netperf/6201
      [  264.967870]
      [  264.967876] CPU: 0 PID: 6201 Comm: netperf Not tainted 4.20.0-rc4+ #1
      [  264.967881] Hardware name: IBM 2827 H43 400 (z/VM 6.4.0)
      [  264.967887] Call Trace:
      [  264.967896] ([<00000000001347d6>] show_stack+0x56/0xa0)
      [  264.967903]  [<00000000017e379c>] dump_stack+0x23c/0x290
      [  264.967912]  [<00000000007bc594>] print_address_description+0xf4/0x290
      [  264.967919]  [<00000000007bc8fc>] kasan_report+0x13c/0x240
      [  264.967927]  [<000000000162f5e4>] ip6_finish_output2+0x1aec/0x1c70
      [  264.967935]  [<000000000163f890>] ip6_finish_output+0x430/0x7f0
      [  264.967943]  [<000000000163fe44>] ip6_output+0x1f4/0x580
      [  264.967953]  [<000000000163882a>] ip6_xmit+0xfea/0x1ce8
      [  264.967963]  [<00000000017396e2>] inet6_csk_xmit+0x282/0x3f8
      [  264.968033]  [<000003ff805fb0ba>] l2tp_xmit_skb+0xe02/0x13e0 [l2tp_core]
      [  264.968037]  [<000003ff80631192>] l2tp_eth_dev_xmit+0xda/0x150 [l2tp_eth]
      [  264.968041]  [<0000000001220020>] dev_hard_start_xmit+0x268/0x928
      [  264.968069]  [<0000000001330e8e>] sch_direct_xmit+0x7ae/0x1350
      [  264.968071]  [<000000000122359c>] __dev_queue_xmit+0x2b7c/0x3478
      [  264.968075]  [<00000000013d2862>] ip_finish_output2+0xce2/0x11a0
      [  264.968078]  [<00000000013d9b14>] ip_finish_output+0x56c/0x8c8
      [  264.968081]  [<00000000013ddd1e>] ip_output+0x226/0x4c0
      [  264.968083]  [<00000000013dbd6c>] __ip_queue_xmit+0x894/0x1938
      [  264.968100]  [<000003ff80bc3a5c>] sctp_packet_transmit+0x29d4/0x3648 [sctp]
      [  264.968116]  [<000003ff80b7bf68>] sctp_outq_flush_ctrl.constprop.5+0x8d0/0xe50 [sctp]
      [  264.968131]  [<000003ff80b7c716>] sctp_outq_flush+0x22e/0x7d8 [sctp]
      [  264.968146]  [<000003ff80b35c68>] sctp_cmd_interpreter.isra.16+0x530/0x6800 [sctp]
      [  264.968161]  [<000003ff80b3410a>] sctp_do_sm+0x222/0x648 [sctp]
      [  264.968177]  [<000003ff80bbddac>] sctp_primitive_ASSOCIATE+0xbc/0xf8 [sctp]
      [  264.968192]  [<000003ff80b93328>] __sctp_connect+0x830/0xc20 [sctp]
      [  264.968208]  [<000003ff80bb11ce>] sctp_inet_connect+0x2e6/0x378 [sctp]
      [  264.968212]  [<0000000001197942>] __sys_connect+0x21a/0x450
      [  264.968215]  [<000000000119aff8>] sys_socketcall+0x3d0/0xb08
      [  264.968218]  [<000000000184ea7a>] system_call+0x2a2/0x2c0
      
      [...]
      
      Just like ip_finish_output2() does for IPv4, check that we have enough
      headroom in ip6_xmit(), and reallocate it if we don't.
      
      This issue is older than git history.

      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: lack of available data can also cause TSO defer · f9bfe4e6
      Eric Dumazet committed
      tcp_tso_should_defer() can return true in three different cases:

       1) We are cwnd-limited.
       2) We are rwnd-limited.
       3) We are application-limited.
      
      Neal pointed out that my recent fix went too far, since
      it assumed that if we were not in case 1), we must be rwnd-limited.
      
      Fix this by properly populating the is_cwnd_limited and
      is_rwnd_limited booleans.
      
      After this change, we can finally restrict the silly check for the FIN
      flag to the application-limited case.
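
      The three-way distinction can be modeled as follows. This is a
      deliberately simplified sketch, not the body of
      tcp_tso_should_defer(): the struct, the function name and the
      window comparisons are assumptions chosen to show how the two
      booleans and the FIN check relate, and the earlier "send now"
      heuristics of the real function are left out.

```c
#include <stdbool.h>

/* Hypothetical result: whether to defer, and which limit caused it. */
struct defer_result {
    bool defer;
    bool cwnd_limited;
    bool rwnd_limited;
};

/*
 * cong_win and send_win are the bytes allowed by the congestion window and
 * by the receiver's window; skb_len is the size of the pending TSO packet.
 */
static struct defer_result tso_defer_reason(unsigned int cong_win,
                                            unsigned int send_win,
                                            unsigned int skb_len,
                                            bool has_fin)
{
    struct defer_result r = { false, false, false };

    if (cong_win < send_win) {
        if (cong_win <= skb_len) {      /* case 1: cwnd-limited */
            r.defer = true;
            r.cwnd_limited = true;
            return r;
        }
    } else if (send_win <= skb_len) {   /* case 2: rwnd-limited */
        r.defer = true;
        r.rwnd_limited = true;
        return r;
    }

    /* Case 3: application-limited. Only here does the FIN flag matter,
     * since a packet carrying FIN will not get more data. */
    r.defer = !has_fin;
    return r;
}
```

      In the application-limited case neither boolean is set, which is
      the behavior the earlier fix got wrong by assuming "not
      cwnd-limited" implies "rwnd-limited".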
      
      The same move for the EOR bit will be handled in net-next,
      since commit 1c09f7d0 ("tcp: do not try to defer skbs
      with eor mark (MSG_EOR)") is scheduled for linux-4.21.
      
      Tested by running 200 concurrent netperf -t TCP_RR -- -r 60000,100
      and checking none of them was rwnd_limited in the chrono_stat
      output from "ss -ti" command.
      
      Fixes: 41727549 ("tcp: Do not underestimate rwnd_limited")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Suggested-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/flow_dissector: correctly cap nhoff and thoff in case of BPF · ec3d837a
      Stanislav Fomichev committed
      We want to make sure that the following condition holds:
      0 <= nhoff <= thoff <= skb->len
      
      A BPF program can set out-of-bounds nhoff and thoff, which is dangerous; see
      recent commit d0c081b4 ("flow_dissector: properly cap thoff field").

      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
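
      The invariant 0 <= nhoff <= thoff <= skb->len stated in this commit
      can be enforced with two clamps. A minimal sketch, with
      hypothetical names (`struct offsets`, `cap_offsets()`); it shows
      the ordering of the clamps rather than the kernel's exact code:

```c
/* Offsets reported by the dissector, after capping. */
struct offsets {
    unsigned int nhoff;   /* network header offset */
    unsigned int thoff;   /* transport header offset */
};

static unsigned int clamp_uint(unsigned int v, unsigned int lo, unsigned int hi)
{
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/* Enforce 0 <= nhoff <= thoff <= skb_len on values set by a BPF program. */
static struct offsets cap_offsets(unsigned int nhoff, unsigned int thoff,
                                  unsigned int skb_len)
{
    struct offsets o;

    o.nhoff = clamp_uint(nhoff, 0, skb_len);
    o.thoff = clamp_uint(thoff, o.nhoff, skb_len);   /* uses the capped nhoff */
    return o;
}
```

      Clamping thoff against the already capped nhoff is what keeps the
      ordering intact when both inputs are out of bounds.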
    • selftests/bpf: use thoff instead of nhoff in BPF flow dissector · 13e56ec2
      Stanislav Fomichev committed
      We are returning thoff from the flow dissector, not the nhoff. Pass
      thoff along with nhoff to the bpf program (initially thoff == nhoff)
      and expect the flow dissector to amend/return thoff, not nhoff.

      This avoids the confusing situation where, by the time the bpf flow
      dissector exits, nhoff == thoff, which doesn't make much sense.

      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • ipv6: sr: properly initialize flowi6 prior passing to ip6_route_output · 1b4e5ad5
      Shmulik Ladkani committed
      In 'seg6_output', stack variable 'struct flowi6 fl6' was missing
      initialization.
      
      Fixes: 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels")
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 07 Dec, 2018 1 commit
  12. 06 Dec, 2018 5 commits
    • ipv4: ipv6: netfilter: Adjust the frag mem limit when truesize changes · ebaf39e6
      Jiri Wiesner committed
      The *_frag_reasm() functions are susceptible to miscalculating the byte
      count of packet fragments in case the truesize of a head buffer changes.
      The truesize member may be changed by the call to skb_unclone(), leaving
      the fragment memory limit counter unbalanced even if all fragments are
      processed. This miscalculation goes unnoticed as long as the network
      namespace which holds the counter is not destroyed.
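
      The accounting problem can be sketched with a plain counter. All
      names are hypothetical: `struct buf` stands in for the head buffer,
      `fake_unclone()` for a call that may grow truesize, and the counter
      for the per-namespace fragment memory counter.

```c
/* Hypothetical head buffer: only the accounted size matters. */
struct buf {
    unsigned int truesize;
};

/* Stand-in for an operation that may change truesize during reassembly. */
static void fake_unclone(struct buf *head, unsigned int grow)
{
    head->truesize += grow;
}

/*
 * Reassembly step with the adjustment: measure truesize before and after,
 * and charge the difference so the later uncharge balances out.
 * Returns the updated memory counter.
 */
static long reasm_account(long mem_counter, struct buf *head, unsigned int grow)
{
    long delta = -(long)head->truesize;

    fake_unclone(head, grow);
    delta += (long)head->truesize;
    return mem_counter + delta;
}
```

      Without the delta, a buffer charged at 768 bytes and uncharged at
      1024 would leave the counter at -256, and a counter that never
      returns to zero is what keeps namespace cleanup waiting.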
      
      Should an attempt be made to destroy a network namespace that holds an
      unbalanced fragment memory limit counter the cleanup of the namespace
      never finishes. The thread handling the cleanup gets stuck in
      inet_frags_exit_net() waiting for the percpu counter to reach zero. The
      thread is usually in running state with a stacktrace similar to:
      
       PID: 1073   TASK: ffff880626711440  CPU: 1   COMMAND: "kworker/u48:4"
        #5 [ffff880621563d48] _raw_spin_lock at ffffffff815f5480
        #6 [ffff880621563d48] inet_evict_bucket at ffffffff8158020b
        #7 [ffff880621563d80] inet_frags_exit_net at ffffffff8158051c
        #8 [ffff880621563db0] ops_exit_list at ffffffff814f5856
        #9 [ffff880621563dd8] cleanup_net at ffffffff814f67c0
       #10 [ffff880621563e38] process_one_work at ffffffff81096f14
      
      It is not possible to create new network namespaces, and processes
      that call unshare() end up being stuck in uninterruptible sleep state
      waiting to acquire the net_mutex.
      
      The bug was observed in the IPv6 netfilter code by Per Sundstrom.
      I thank him for his analysis of the problem. The parts of this patch
      that apply to IPv4 and IPv6 fragment reassembly are preemptive measures.

      Signed-off-by: Jiri Wiesner <jwiesner@suse.com>
      Reported-by: Per Sundstrom <per.sundstrom@redqube.se>
      Acked-by: Peter Oskolkov <posk@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: frag_point sanity check · afd0a800
      Jakub Audykowicz committed
      If for some reason an association's fragmentation point is zero,
      sctp_datamsg_from_user will endlessly try to divide a message
      into zero-sized chunks. This eventually causes a kernel panic due to
      running out of memory.
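
      A last-ditch check of this kind can be sketched as a guard in front
      of the chunk computation. The function name and the -1 error value
      are hypothetical and do not reproduce the SCTP code:

```c
/*
 * Number of chunks needed to carry msg_len bytes when each chunk holds at
 * most frag_point bytes. Returns -1 when frag_point is zero, since
 * splitting into zero-sized chunks would never make progress.
 */
static long count_chunks(unsigned long msg_len, unsigned long frag_point)
{
    if (frag_point == 0)
        return -1;

    return (long)((msg_len + frag_point - 1) / frag_point);
}
```

      Rejecting the message up front turns an unbounded allocation loop
      into an ordinary error return.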
      
      Although this situation is quite unlikely, it has occurred before as
      reported. I propose to add this simple last-ditch sanity check due to
      the severity of the potential consequences.

      Signed-off-by: Jakub Audykowicz <jakub.audykowicz@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix NULL ref in tail loss probe · b2b7af86
      Yuchung Cheng committed
      The TCP loss probe timer may fire when the retransmission queue is empty but
      has a non-zero tp->packets_out counter. tcp_send_loss_probe will call
      tcp_rearm_rto, which triggers a NULL pointer dereference by fetching the
      retransmission queue head in its sub-routines.
      
      Add a more detailed warning to help catch the root cause of the inflight
      accounting inconsistency.

      Reported-by: Rafael Tinoco <rafael.tinoco@linaro.org>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Do not underestimate rwnd_limited · 41727549
      Eric Dumazet committed
      If available rwnd is too small, tcp_tso_should_defer()
      can decide it is worth waiting before splitting a TSO packet.
      
      This really means we are rwnd limited.
      
      Fixes: 5615f886 ("tcp: instrument how long TCP is limited by receive window")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Reviewed-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: use skb_list_del_init() to remove from RX sublists · 22f6bbb7
      Edward Cree committed
      list_del() leaves the skb->next pointer poisoned, which can then lead to
       a crash in e.g. OVS forwarding.  For example, setting up an OVS VXLAN
       forwarding bridge on sfc as per:
      
      ========
      $ ovs-vsctl show
      5dfd9c47-f04b-4aaa-aa96-4fbb0a522a30
          Bridge "br0"
              Port "br0"
                  Interface "br0"
                      type: internal
              Port "enp6s0f0"
                  Interface "enp6s0f0"
              Port "vxlan0"
                  Interface "vxlan0"
                      type: vxlan
                      options: {key="1", local_ip="10.0.0.5", remote_ip="10.0.0.4"}
          ovs_version: "2.5.0"
      ========
      (where 10.0.0.5 is an address on enp6s0f1)
      and sending traffic across it will lead to the following panic:
      ========
      general protection fault: 0000 [#1] SMP PTI
      CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.20.0-rc3-ehc+ #701
      Hardware name: Dell Inc. PowerEdge R710/0M233H, BIOS 6.4.0 07/23/2013
      RIP: 0010:dev_hard_start_xmit+0x38/0x200
      Code: 53 48 89 fb 48 83 ec 20 48 85 ff 48 89 54 24 08 48 89 4c 24 18 0f 84 ab 01 00 00 48 8d 86 90 00 00 00 48 89 f5 48 89 44 24 10 <4c> 8b 33 48 c7 03 00 00 00 00 48 8b 05 c7 d1 b3 00 4d 85 f6 0f 95
      RSP: 0018:ffff888627b437e0 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff88862279c000
      RDX: ffff888614a342c0 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: ffff888618a88000 R08: 0000000000000001 R09: 00000000000003e8
      R10: 0000000000000000 R11: ffff888614a34140 R12: 0000000000000000
      R13: 0000000000000062 R14: dead000000000100 R15: ffff888616430000
      FS:  0000000000000000(0000) GS:ffff888627b40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f6d2bc6d000 CR3: 000000000200a000 CR4: 00000000000006e0
      Call Trace:
       <IRQ>
       __dev_queue_xmit+0x623/0x870
       ? masked_flow_lookup+0xf7/0x220 [openvswitch]
       ? ep_poll_callback+0x101/0x310
       do_execute_actions+0xaba/0xaf0 [openvswitch]
       ? __wake_up_common+0x8a/0x150
       ? __wake_up_common_lock+0x87/0xc0
       ? queue_userspace_packet+0x31c/0x5b0 [openvswitch]
       ovs_execute_actions+0x47/0x120 [openvswitch]
       ovs_dp_process_packet+0x7d/0x110 [openvswitch]
       ovs_vport_receive+0x6e/0xd0 [openvswitch]
       ? dst_alloc+0x64/0x90
       ? rt_dst_alloc+0x50/0xd0
       ? ip_route_input_slow+0x19a/0x9a0
       ? __udp_enqueue_schedule_skb+0x198/0x1b0
       ? __udp4_lib_rcv+0x856/0xa30
       ? __udp4_lib_rcv+0x856/0xa30
       ? cpumask_next_and+0x19/0x20
       ? find_busiest_group+0x12d/0xcd0
       netdev_frame_hook+0xce/0x150 [openvswitch]
       __netif_receive_skb_core+0x205/0xae0
       __netif_receive_skb_list_core+0x11e/0x220
       netif_receive_skb_list+0x203/0x460
       ? __efx_rx_packet+0x335/0x5e0 [sfc]
       efx_poll+0x182/0x320 [sfc]
       net_rx_action+0x294/0x3c0
       __do_softirq+0xca/0x297
       irq_exit+0xa6/0xb0
       do_IRQ+0x54/0xd0
       common_interrupt+0xf/0xf
       </IRQ>
      ========
      So, in all listified-receive handling, instead pull skbs off the lists with
       skb_list_del_init().
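
      The difference between the two removal styles can be shown on a
      minimal doubly linked list. This is a sketch with hypothetical
      names, not the kernel list API: `node_del()` leaves poison values
      behind as list_del() does, while `node_del_init()` leaves a NULL
      next pointer so that code treating `next` as a packet chain stops
      cleanly. The poison constants are placeholders; on x86-64 the
      kernel's next-pointer poison is the dead000000000100 value visible
      in RBX and R14 in the trace above.

```c
#include <stddef.h>
#include <stdint.h>

struct node {
    struct node *next, *prev;
};

/* Placeholder poison values; never dereferenced in this sketch. */
#define NODE_POISON_NEXT ((struct node *)(uintptr_t)0x100)
#define NODE_POISON_PREV ((struct node *)(uintptr_t)0x200)

static void node_unlink(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Poisoning removal: any later walk through n->next hits a bad pointer. */
static void node_del(struct node *n)
{
    node_unlink(n);
    n->next = NODE_POISON_NEXT;
    n->prev = NODE_POISON_PREV;
}

/* Removal that marks the node as a standalone, NULL-terminated chain. */
static void node_del_init(struct node *n)
{
    node_unlink(n);
    n->next = NULL;
}

/* Length of a NULL-terminated chain, as a transmit path might walk it. */
static int chain_len(const struct node *n)
{
    int len = 0;

    for (; n; n = n->next)
        len++;
    return len;
}
```

      Calling chain_len() on a node removed with node_del() would follow
      the poison pointer, which corresponds to the general protection
      fault in dev_hard_start_xmit() shown in the trace.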
      
      Fixes: 9af86f93 ("net: core: fix use-after-free in __netif_receive_skb_list_core")
      Fixes: 7da517a3 ("net: core: Another step of skb receive list processing")
      Fixes: a4ca8b7d ("net: ipv4: fix drop handling in ip_list_rcv() and ip_list_rcv_finish()")
      Fixes: d8269e2c ("net: ipv6: listify ipv6_rcv() and ip6_rcv_finish()")
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 05 Dec, 2018 1 commit