1. 01 8月, 2017 2 次提交
    • J
      ipv6: Avoid going through ->sk_net to access the netns · 1f139ed9
      Jakub Sitnicki 提交于
      There is no need to go through sk->sk_net to access the net namespace
      and its sysctl variables because we allocate the sock and initialize
      sk_net just a few lines earlier in the same routine.
      Signed-off-by: NJakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f139ed9
    • F
      tcp: remove prequeue support · e7942d06
      Florian Westphal 提交于
      prequeue is a tcp receive optimization that moves part of rx processing
      from bh to process context.
      
      This only works if the socket being processed belongs to a process that
      is blocked in recv on that socket.
      
      In practice, this doesn't happen anymore that often because nowadays
      servers tend to use an event driven (epoll) model.
      
      Even normal client applications (web browsers) commonly use many tcp
      connections in parallel.
      
      This has measureable impact only in netperf (which uses plain recv and
      thus allows prequeue use) from host to locally running vm (~4%), however,
      there were no changes when using netperf between two physical hosts with
      ixgbe interfaces.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7942d06
  2. 29 7月, 2017 1 次提交
  3. 25 7月, 2017 1 次提交
  4. 20 7月, 2017 1 次提交
  5. 19 7月, 2017 4 次提交
    • A
      ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check() · 18bcf290
      Alexander Potapenko 提交于
      KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(),
      which originated from the TCP request socket created in
      cookie_v6_check():
      
       ==================================================================
       BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0
       CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies.  Check SNMP counters.
       Call Trace:
        <IRQ>
        __dump_stack lib/dump_stack.c:16
        dump_stack+0x172/0x1c0 lib/dump_stack.c:52
        kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
        __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
        skb_set_hash_from_sk ./include/net/sock.h:2011
        tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983
        tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493
        tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284
        tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309
        call_timer_fn+0x240/0x520 kernel/time/timer.c:1268
        expire_timers kernel/time/timer.c:1307
        __run_timers+0xc13/0xf10 kernel/time/timer.c:1601
        run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614
        __do_softirq+0x485/0x942 kernel/softirq.c:284
        invoke_softirq kernel/softirq.c:364
        irq_exit+0x1fa/0x230 kernel/softirq.c:405
        exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657
        smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966
        apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489
       RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36
       RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77
       RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440
       RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
       RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005
       RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770
       RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004
       R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810
       R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4
        </IRQ>
        poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293
        SYSC_select+0x4b4/0x4e0 fs/select.c:653
        SyS_select+0x76/0xa0 fs/select.c:634
        entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
       RIP: 0033:0x4597e7
       RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017
       RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
       RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059
       R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20
       R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003
       chained origin:
        save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
        kmsan_save_stack mm/kmsan/kmsan.c:317
        kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547
        __msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259
        tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472
        tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103
        tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212
        cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245
        tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
        tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
        tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
        ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
        NF_HOOK ./include/linux/netfilter.h:257
        ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
        dst_input ./include/net/dst.h:492
        ip6_rcv_finish net/ipv6/ip6_input.c:69
        NF_HOOK ./include/linux/netfilter.h:257
        ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
        __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
        __netif_receive_skb net/core/dev.c:4246
        process_backlog+0x667/0xba0 net/core/dev.c:4866
        napi_poll net/core/dev.c:5268
        net_rx_action+0xc95/0x1590 net/core/dev.c:5333
        __do_softirq+0x485/0x942 kernel/softirq.c:284
       origin:
        save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
        kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
        kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337
        kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766
        reqsk_alloc ./include/net/request_sock.h:87
        inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200
        cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169
        tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
        tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
        tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
        ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
        NF_HOOK ./include/linux/netfilter.h:257
        ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
        dst_input ./include/net/dst.h:492
        ip6_rcv_finish net/ipv6/ip6_input.c:69
        NF_HOOK ./include/linux/netfilter.h:257
        ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
        __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
        __netif_receive_skb net/core/dev.c:4246
        process_backlog+0x667/0xba0 net/core/dev.c:4866
        napi_poll net/core/dev.c:5268
        net_rx_action+0xc95/0x1590 net/core/dev.c:5333
        __do_softirq+0x485/0x942 kernel/softirq.c:284
       ==================================================================
      
      Similar error is reported for cookie_v4_check().
      
      Fixes: 58d607d3 ("tcp: provide skb->hash to synack packets")
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18bcf290
    • F
      xfrm: remove flow cache · 09c75704
      Florian Westphal 提交于
      After rcu conversions performance degradation in forward tests isn't that
      noticeable anymore.
      
      See next patch for some numbers.
      
      A followup patcg could then also remove genid from the policies
      as we do not cache bundles anymore.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09c75704
    • F
      net: xfrm: revert to lower xfrm dst gc limit · 3c2a89dd
      Florian Westphal 提交于
      revert c386578f ("xfrm: Let the flowcache handle its size by default.").
      
      Once we remove flow cache, we don't have a flow cache limit anymore.
      We must not allow (virtually) unlimited allocations of xfrm dst entries.
      Revert back to the old xfrm dst gc limits.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c2a89dd
    • F
      vti: revert flush x-netns xfrm cache when vti interface is removed · 6b1c42e9
      Florian Westphal 提交于
      flow cache is removed in next commit.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b1c42e9
  6. 18 7月, 2017 2 次提交
  7. 06 7月, 2017 1 次提交
  8. 05 7月, 2017 1 次提交
  9. 04 7月, 2017 7 次提交
  10. 03 7月, 2017 1 次提交
    • S
      ipv6: dad: don't remove dynamic addresses if link is down · ec8add2a
      Sabrina Dubroca 提交于
      Currently, when the link for $DEV is down, this command succeeds but the
      address is removed immediately by DAD (1):
      
          ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800
      
      In the same situation, this will succeed and not remove the address (2):
      
          ip addr add 1111::12/64 dev $DEV
          ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800
      
      The comment in addrconf_dad_begin() when !IF_READY makes it look like
      this is the intended behavior, but doesn't explain why:
      
           * If the device is not ready:
           * - keep it tentative if it is a permanent address.
           * - otherwise, kill it.
      
      We clearly cannot prevent userspace from doing (2), but we can make (1)
      work consistently with (2).
      
      addrconf_dad_stop() is only called in two cases: if DAD failed, or to
      skip DAD when the link is down. In that second case, the fix is to avoid
      deleting the address, like we already do for permanent addresses.
      
      Fixes: 3c21edbd ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec8add2a
  11. 01 7月, 2017 4 次提交
  12. 30 6月, 2017 1 次提交
  13. 28 6月, 2017 1 次提交
  14. 27 6月, 2017 3 次提交
  15. 25 6月, 2017 1 次提交
  16. 24 6月, 2017 2 次提交
    • W
      sit: use __GFP_NOWARN for user controlled allocation · 0ccc22f4
      WANG Cong 提交于
      The memory allocation size is controlled by user-space,
      if it is too large just fail silently and return NULL,
      not to mention there is a fallback allocation later.
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ccc22f4
    • M
      net: account for current skb length when deciding about UFO · a5cb659b
      Michal Kubeček 提交于
      Our customer encountered stuck NFS writes for blocks starting at specific
      offsets w.r.t. page boundary caused by networking stack sending packets via
      UFO enabled device with wrong checksum. The problem can be reproduced by
      composing a long UDP datagram from multiple parts using MSG_MORE flag:
      
        sendto(sd, buff, 1000, MSG_MORE, ...);
        sendto(sd, buff, 1000, MSG_MORE, ...);
        sendto(sd, buff, 3000, 0, ...);
      
      Assume this packet is to be routed via a device with MTU 1500 and
      NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(),
      this condition is tested (among others) to decide whether to call
      ip_ufo_append_data():
      
        ((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb))
      
      At the moment, we already have skb with 1028 bytes of data which is not
      marked for GSO so that the test is false (fragheaderlen is usually 20).
      Thus we append second 1000 bytes to this skb without invoking UFO. Third
      sendto(), however, has sufficient length to trigger the UFO path so that we
      end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb()
      uses udp_csum() to calculate the checksum but that assumes all fragments
      have correct checksum in skb->csum which is not true for UFO fragments.
      
      When checking against MTU, we need to add skb->len to length of new segment
      if we already have a partially filled skb and fragheaderlen only if there
      isn't one.
      
      In the IPv6 case, skb can only be null if this is the first segment so that
      we have to use headersize (length of the first IPv6 header) rather than
      fragheaderlen (length of IPv6 header of further fragments) for skb == NULL.
      
      Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach")
      Fixes: e4c5e13a ("ipv6: Should use consistent conditional judgement for
      	ip6 fragment between __ip6_append_data and ip6_finish_output")
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5cb659b
  17. 23 6月, 2017 2 次提交
  18. 22 6月, 2017 4 次提交
  19. 21 6月, 2017 1 次提交