1. 20 7月, 2017 4 次提交
  2. 19 7月, 2017 6 次提交
    • D
      netfilter: fix netfilter_net_init() return · 073dd5ad
      Dan Carpenter 提交于
      We accidentally return an uninitialized variable.
      
      Fixes: cf56c2f8 ("netfilter: remove old pre-netns era hook api")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      073dd5ad
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 3e16afd3
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree,
      they are:
      
      1) Missing netlink message sanity check in nfnetlink, patch from
         Mateusz Jurczyk.
      
      2) We now have netfilter per-netns hooks, so let's kill global hook
         infrastructure, this infrastructure is known to be racy with netns.
         We don't care about out of tree modules. Patch from Florian Westphal.
      
      3) find_appropriate_src() is buggy when colissions happens after the
         conversion of the nat bysource to rhashtable. Also from Florian.
      
      4) Remove forward chain in nf_tables arp family, it's useless and it is
         causing quite a bit of confusion, from Florian Westphal.
      
      5) nf_ct_remove_expect() is called with the wrong parameter, causing
         kernel oops, patch from Florian Westphal.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e16afd3
    • P
      udp: preserve skb->dst if required for IP options processing · 0ddf3fb2
      Paolo Abeni 提交于
      Eric noticed that in udp_recvmsg() we still need to access
      skb->dst while processing the IP options.
      Since commit 0a463c78 ("udp: avoid a cache miss on dequeue")
      skb->dst is no more available at recvmsg() time and bad things
      will happen if we enter the relevant code path.
      
      This commit address the issue, avoid clearing skb->dst if
      any IP options are present into the relevant skb.
      Since the IP CB is contained in the first skb cacheline, we can
      test it to decide to leverage the consume_stateless_skb()
      optimization, without measurable additional cost in the faster
      path.
      
      v1 -> v2: updated commit message tags
      
      Fixes: 0a463c78 ("udp: avoid a cache miss on dequeue")
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Reported-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ddf3fb2
    • C
      atm: zatm: Fix an error handling path in 'zatm_init_one()' · 799f9172
      Christophe Jaillet 提交于
      If 'dma_set_mask_and_coherent()' fails, we must undo the previous
      'pci_request_regions()' call.
      Adjust corresponding 'goto' to jump at the right place of the error
      handling path.
      Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      799f9172
    • A
      ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check() · 18bcf290
      Alexander Potapenko 提交于
      KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(),
      which originated from the TCP request socket created in
      cookie_v6_check():
      
       ==================================================================
       BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0
       CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies.  Check SNMP counters.
       Call Trace:
        <IRQ>
        __dump_stack lib/dump_stack.c:16
        dump_stack+0x172/0x1c0 lib/dump_stack.c:52
        kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
        __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
        skb_set_hash_from_sk ./include/net/sock.h:2011
        tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983
        tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493
        tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284
        tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309
        call_timer_fn+0x240/0x520 kernel/time/timer.c:1268
        expire_timers kernel/time/timer.c:1307
        __run_timers+0xc13/0xf10 kernel/time/timer.c:1601
        run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614
        __do_softirq+0x485/0x942 kernel/softirq.c:284
        invoke_softirq kernel/softirq.c:364
        irq_exit+0x1fa/0x230 kernel/softirq.c:405
        exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657
        smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966
        apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489
       RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36
       RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77
       RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440
       RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
       RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005
       RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770
       RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004
       R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810
       R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4
        </IRQ>
        poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293
        SYSC_select+0x4b4/0x4e0 fs/select.c:653
        SyS_select+0x76/0xa0 fs/select.c:634
        entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
       RIP: 0033:0x4597e7
       RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017
       RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
       RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059
       R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20
       R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003
       chained origin:
        save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
        kmsan_save_stack mm/kmsan/kmsan.c:317
        kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547
        __msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259
        tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472
        tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103
        tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212
        cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245
        tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
        tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
        tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
        ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
        NF_HOOK ./include/linux/netfilter.h:257
        ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
        dst_input ./include/net/dst.h:492
        ip6_rcv_finish net/ipv6/ip6_input.c:69
        NF_HOOK ./include/linux/netfilter.h:257
        ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
        __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
        __netif_receive_skb net/core/dev.c:4246
        process_backlog+0x667/0xba0 net/core/dev.c:4866
        napi_poll net/core/dev.c:5268
        net_rx_action+0xc95/0x1590 net/core/dev.c:5333
        __do_softirq+0x485/0x942 kernel/softirq.c:284
       origin:
        save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
        kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
        kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337
        kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766
        reqsk_alloc ./include/net/request_sock.h:87
        inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200
        cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169
        tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
        tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
        tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
        ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
        NF_HOOK ./include/linux/netfilter.h:257
        ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
        dst_input ./include/net/dst.h:492
        ip6_rcv_finish net/ipv6/ip6_input.c:69
        NF_HOOK ./include/linux/netfilter.h:257
        ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
        __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
        __netif_receive_skb net/core/dev.c:4246
        process_backlog+0x667/0xba0 net/core/dev.c:4866
        napi_poll net/core/dev.c:5268
        net_rx_action+0xc95/0x1590 net/core/dev.c:5333
        __do_softirq+0x485/0x942 kernel/softirq.c:284
       ==================================================================
      
      Similar error is reported for cookie_v4_check().
      
      Fixes: 58d607d3 ("tcp: provide skb->hash to synack packets")
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18bcf290
    • G
      ppp: Fix false xmit recursion detect with two ppp devices · e5dadc65
      Gao Feng 提交于
      The global percpu variable ppp_xmit_recursion is used to detect the ppp
      xmit recursion to avoid the deadlock, which is caused by one CPU tries to
      lock the xmit lock twice. But it would report false recursion when one CPU
      wants to send the skb from two different PPP devices, like one L2TP on the
      PPPoE. It is a normal case actually.
      
      Now use one percpu member of struct ppp instead of the gloable variable to
      detect the xmit recursion of one ppp device.
      
      Fixes: 55454a56 ("ppp: avoid dealock on recursive xmit")
      Signed-off-by: NGao Feng <gfree.wind@vip.163.com>
      Signed-off-by: NLiu Jianying <jianying.liu@ikuai8.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5dadc65
  3. 18 7月, 2017 2 次提交
  4. 17 7月, 2017 5 次提交
  5. 16 7月, 2017 23 次提交
    • D
      Merge branch 'bcmgenet-Fragmented-SKB-corrections' · 533da29b
      David S. Miller 提交于
      Doug Berger says:
      
      ====================
      bcmgenet: Fragmented SKB corrections
      
      Two issues were observed in a review of the bcmgenet driver support for
      fragmented SKBs which are addressed by this patch set.
      
      The first addresses a problem that could occur if the driver is not able
      to DMA map a fragment of the SKB.  This would be a highly unusual event
      but it would leave the hardware descriptors in an invalid state which
      should be prevented.
      
      The second is a hazard that could occur if the driver is able to reclaim
      the first control block of a fragmented SKB before all of its fragments
      have completed processing by the hardware.  In this case the SKB could
      be freed leading to reuse of memory that is still in use by hardware.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      533da29b
    • D
      net: bcmgenet: Free skb after last Tx frag · f48bed16
      Doug Berger 提交于
      Since the skb is attached to the first control block of a fragmented
      skb it is possible that the skb could be freed when reclaiming that
      control block before all fragments of the skb have been consumed by
      the hardware and unmapped.
      
      This commit introduces first_cb and last_cb pointers to the skb
      control block used by the driver to keep track of which transmit
      control blocks within a transmit ring are the first and last ones
      associated with the skb.
      
      It then splits the bcmgenet_free_cb() function into transmit
      (bcmgenet_free_tx_cb) and receive (bcmgenet_free_rx_cb) versions
      that can handle the unmapping of dma mapped memory and cleaning up
      the corresponding control block structure so that the skb is only
      freed after the last associated transmit control block is reclaimed.
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Signed-off-by: NDoug Berger <opendmb@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f48bed16
    • D
      net: bcmgenet: Fix unmapping of fragments in bcmgenet_xmit() · 876dbadd
      Doug Berger 提交于
      In case we fail to map a single fragment, we would be leaving the
      transmit ring populated with stale entries.
      
      This commit introduces the helper function bcmgenet_put_txcb()
      which takes care of rewinding the per-ring write pointer back to
      where we left.
      
      It also consolidates the functionality of bcmgenet_xmit_single()
      and bcmgenet_xmit_frag() into the bcmgenet_xmit() function to
      make the unmapping of control blocks cleaner.
      
      Fixes: 1c1008c7 ("net: bcmgenet: add main driver file")
      Suggested-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDoug Berger <opendmb@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      876dbadd
    • F
      dt-bindings: net: Remove duplicate NSP Ethernet MAC binding document · ea6c3077
      Florian Fainelli 提交于
      Commit 07d4510f ("dt-bindings: net: bgmac: add bindings documentation for
      bgmac") added both brcm,amac-nsp.txt and brcm,bgmac-nsp.txt. The former is
      actually the one that got updated and is in use by the bgmac driver while the
      latter is duplicating the former and is not used nor updated.
      
      Fixes: 07d4510f ("dt-bindings: net: bgmac: add bindings documentation for bgmac")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea6c3077
    • D
      Merge branch 'isdn-const-pci_device_ids' · 2da73d20
      David S. Miller 提交于
      Arvind Yadav says:
      
      ====================
      Constify isdn pci_device_id's.
      
      pci_device_id are not supposed to change at runtime. All functions
      working with pci_device_id provided by <linux/pci.h> work with
      const pci_device_id. So mark the non-const structs as const.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2da73d20
    • A
      isdn: avm: c4: constify pci_device_id. · 65f96417
      Arvind Yadav 提交于
      pci_device_id are not supposed to change at runtime. All functions
      working with pci_device_id provided by <linux/pci.h> work with
      const pci_device_id. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
        11803	    544	      1	  12348	   303c	isdn/hardware/avm/c4.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        11931	    416	      1	  12348	   303c	isdn/hardware/avm/c4.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65f96417
    • A
      isdn: mISDN: hfcpci: constify pci_device_id. · ed038e7e
      Arvind Yadav 提交于
      pci_device_id are not supposed to change at runtime. All functions
      working with pci_device_id provided by <linux/pci.h> work with
      const pci_device_id. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
        21656	   1024	     96	  22776	   58f8	isdn/hardware/mISDN/hfcpci.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        22424	    256	     96	  22776	   58f8	isdn/hardware/mISDN/hfcpci.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed038e7e
    • A
      isdn: mISDN: avmfritz: constify pci_device_id. · 1d9c8fa0
      Arvind Yadav 提交于
      pci_device_id are not supposed to change at runtime. All functions
      working with pci_device_id provided by <linux/pci.h> work with
      const pci_device_id. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
         9963	   1936	     16	  11915	   2e8b	isdn/hardware/mISDN/avmfritz.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        10091	   1808	     16	  11915	   2e8b	isdn/hardware/mISDN/avmfritz.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d9c8fa0
    • A
      isdn: mISDN: w6692: constify pci_device_id. · e8336ed0
      Arvind Yadav 提交于
      pci_device_id are not supposed to change at runtime. All functions
      working with pci_device_id provided by <linux/pci.h> work with
      const pci_device_id. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
        13959	   4080	     24	  18063	   468f isdn/hardware/mISDN/w6692.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        14087	   3952	     24	  18063	   468f isdn/hardware/mISDN/w6692.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8336ed0
    • A
      isdn: mISDN: hfcmulti: constify pci_device_id. · e3b79fcf
      Arvind Yadav 提交于
      pci_device_id are not supposed to change at runtime. All functions
      working with pci_device_id provided by <linux/pci.h> work with
      const pci_device_id. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
        63450	   1536	   1492	  66478	  103ae	isdn/hardware/mISDN/hfcmulti.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        64698	    288	   1492	  66478	  103ae	isdn/hardware/mISDN/hfcmulti.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3b79fcf
    • A
      isdn: mISDN: netjet: constify pci_device_id. · 0d416689
      Arvind Yadav 提交于
      pci_device_id are not supposed to change at runtime. All functions
      working with pci_device_id provided by <linux/pci.h> work with
      const pci_device_id. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
        10941	   1776	     16	  12733	   31bd isdn/hardware/mISDN/netjet.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        11005	   1712	     16	  12733	   31bd isdn/hardware/mISDN/netjet.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d416689
    • A
      isdn: eicon: constify pci_device_id. · cf46d351
      Arvind Yadav 提交于
      pci_device_id are not supposed to change at runtime. All functions
      working with pci_device_id provided by <linux/pci.h> work with
      const pci_device_id. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
         6224	    655	      8	   6887	   1ae7	isdn/hardware/eicon/divasmain.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
         6608	    271	      8	   6887	   1ae7	isdn/hardware/eicon/divasmain.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf46d351
    • A
      isdn: hisax: hisax_fcpcipnp: constify pci_device_id. · 6cfc3d86
      Arvind Yadav 提交于
      pci_device_id are not supposed to change at runtime. All functions
      working with pci_device_id provided by <linux/pci.h> work with
      const pci_device_id. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
         5989	    576	      0	   6565	   19a5 isdn/hisax/hisax_fcpcipnp.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
         6085	    480	      0	   6565	   19a5 isdn/hisax/hisax_fcpcipnp.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6cfc3d86
    • A
      isdn: hisax: hfc4s8s_l1: constify pci_device_id. · 3651003d
      Arvind Yadav 提交于
      pci_device_id are not supposed to change at runtime. All functions
      working with pci_device_id provided by <linux/pci.h> work with
      const pci_device_id. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
        10512	    536	      4	  11052	   2b2c	drivers/isdn/hisax/hfc4s8s_l1.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        10672	    376	      4	  11052	   2b2c	drivers/isdn/hisax/hfc4s8s_l1.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3651003d
    • A
      isdn: hisax: constify pci_device_id. · cd7b03e9
      Arvind Yadav 提交于
      pci_device_id are not supposed to change at runtime. All functions
      working with pci_device_id provided by <linux/pci.h> work with
      const pci_device_id. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
        13686	   2064	   4416	  20166	   4ec6	drivers/isdn/hisax/config.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        15030	    720	   4416	  20166	   4ec6	drivers/isdn/hisax/config.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd7b03e9
    • N
      tcp_bbr: init pacing rate on first RTT sample · 32984565
      Neal Cardwell 提交于
      Fixes the following behavior: for connections that had no RTT sample
      at the time of initializing congestion control, BBR was initializing
      the pacing rate to a high nominal rate (based an a guess of RTT=1ms,
      in case this is LAN traffic). Then BBR never adjusted the pacing rate
      downward upon obtaining an actual RTT sample, if the connection never
      filled the pipe (e.g. all sends were small app-limited writes()).
      
      This fix adjusts the pacing rate upon obtaining the first RTT sample.
      
      Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control")
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32984565
    • N
      tcp_bbr: remove sk_pacing_rate=0 transient during init · 1d3648eb
      Neal Cardwell 提交于
      Fix a corner case noticed by Eric Dumazet, where BBR's setting
      sk->sk_pacing_rate to 0 during initialization could theoretically
      cause packets in the sending host to hang if there were packets "in
      flight" in the pacing infrastructure at the time the BBR congestion
      control state is initialized. This could occur if the pacing
      infrastructure happened to race with bbr_init() in a way such that the
      pacer read the 0 rather than the immediately following non-zero pacing
      rate.
      
      Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control")
      Reported-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d3648eb
    • N
      tcp_bbr: introduce bbr_init_pacing_rate_from_rtt() helper · 79135b89
      Neal Cardwell 提交于
      Introduce a helper to initialize the BBR pacing rate unconditionally,
      based on the current cwnd and RTT estimate. This is a pure refactor,
      but is needed for two following fixes.
      
      Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control")
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79135b89
    • N
      tcp_bbr: introduce bbr_bw_to_pacing_rate() helper · f19fd62d
      Neal Cardwell 提交于
      Introduce a helper to convert a BBR bandwidth and gain factor to a
      pacing rate in bytes per second. This is a pure refactor, but is
      needed for two following fixes.
      
      Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control")
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f19fd62d
    • N
      tcp_bbr: cut pacing rate only if filled pipe · 4aea287e
      Neal Cardwell 提交于
      In bbr_set_pacing_rate(), which decides whether to cut the pacing
      rate, there was some code that considered exiting STARTUP to be
      equivalent to the notion of filling the pipe (i.e.,
      bbr_full_bw_reached()). Specifically, as the code was structured,
      exiting STARTUP and going into PROBE_RTT could cause us to cut the
      pacing rate down to something silly and low, based on whatever
      bandwidth samples we've had so far, when it's possible that all of
      them have been small app-limited bandwidth samples that are not
      representative of the bandwidth available in the path. (The code was
      correct at the time it was written, but the state machine changed
      without this spot being adjusted correspondingly.)
      
      Fixes: 0f8782ea ("tcp_bbr: add BBR congestion control")
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4aea287e
    • G
      openvswitch: Fix for force/commit action failures · 8b97ac5b
      Greg Rose 提交于
      When there is an established connection in direction A->B, it is
      possible to receive a packet on port B which then executes
      ct(commit,force) without first performing ct() - ie, a lookup.
      In this case, we would expect that this packet can delete the existing
      entry so that we can commit a connection with direction B->A. However,
      currently we only perform a check in skb_nfct_cached() for whether
      OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, ie that a
      lookup previously occurred. In the above scenario, a lookup has not
      occurred but we should still be able to statelessly look up the
      existing entry and potentially delete the entry if it is in the
      opposite direction.
      
      This patch extends the check to also hint that if the action has the
      force flag set, then we will lookup the existing entry so that the
      force check at the end of skb_nfct_cached has the ability to delete
      the connection.
      
      Fixes: dd41d330b03 ("openvswitch: Add force commit.")
      CC: Pravin Shelar <pshelar@nicira.com>
      CC: dev@openvswitch.org
      Signed-off-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NGreg Rose <gvrose8192@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b97ac5b
    • A
      sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}() · b1f5bfc2
      Alexander Potapenko 提交于
      If the length field of the iterator (|pos.p| or |err|) is past the end
      of the chunk, we shouldn't access it.
      
      This bug has been detected by KMSAN. For the following pair of system
      calls:
      
        socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3
        sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0),
               inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0,
               sin6_scope_id=0}, 28) = 1
      
      the tool has reported a use of uninitialized memory:
      
        ==================================================================
        BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0
        CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
        01/01/2011
        Call Trace:
         <IRQ>
         __dump_stack lib/dump_stack.c:16
         dump_stack+0x172/0x1c0 lib/dump_stack.c:52
         kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
         __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
         __sctp_rcv_init_lookup net/sctp/input.c:1074
         __sctp_rcv_lookup_harder net/sctp/input.c:1233
         __sctp_rcv_lookup net/sctp/input.c:1255
         sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170
         sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984
         ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
         NF_HOOK ./include/linux/netfilter.h:257
         ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
         dst_input ./include/net/dst.h:492
         ip6_rcv_finish net/ipv6/ip6_input.c:69
         NF_HOOK ./include/linux/netfilter.h:257
         ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
         __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
         __netif_receive_skb net/core/dev.c:4246
         process_backlog+0x667/0xba0 net/core/dev.c:4866
         napi_poll net/core/dev.c:5268
         net_rx_action+0xc95/0x1590 net/core/dev.c:5333
         __do_softirq+0x485/0x942 kernel/softirq.c:284
         do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
         </IRQ>
         do_softirq kernel/softirq.c:328
         __local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181
         local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31
         rcu_read_unlock_bh ./include/linux/rcupdate.h:931
         ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124
         ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149
         NF_HOOK_COND ./include/linux/netfilter.h:246
         ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163
         dst_output ./include/net/dst.h:486
         NF_HOOK ./include/linux/netfilter.h:257
         ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261
         sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225
         sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632
         sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
         sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
         sctp_side_effects net/sctp/sm_sideeffect.c:1773
         sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
         sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
         sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
         inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
         sock_sendmsg_nosec net/socket.c:633
         sock_sendmsg net/socket.c:643
         SYSC_sendto+0x608/0x710 net/socket.c:1696
         SyS_sendto+0x8a/0xb0 net/socket.c:1664
         do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
         entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
        RIP: 0033:0x401133
        RSP: 002b:00007fff6d99cd38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
        RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000401133
        RDX: 0000000000000001 RSI: 0000000000494088 RDI: 0000000000000003
        RBP: 00007fff6d99cd90 R08: 00007fff6d99cd50 R09: 000000000000001c
        R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
        R13: 00000000004063d0 R14: 0000000000406460 R15: 0000000000000000
        origin:
         save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
         kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
         kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
         kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211
         slab_alloc_node mm/slub.c:2743
         __kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351
         __kmalloc_reserve net/core/skbuff.c:138
         __alloc_skb+0x26b/0x840 net/core/skbuff.c:231
         alloc_skb ./include/linux/skbuff.h:933
         sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570
         sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
         sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
         sctp_side_effects net/sctp/sm_sideeffect.c:1773
         sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
         sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
         sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
         inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
         sock_sendmsg_nosec net/socket.c:633
         sock_sendmsg net/socket.c:643
         SYSC_sendto+0x608/0x710 net/socket.c:1696
         SyS_sendto+0x8a/0xb0 net/socket.c:1664
         do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
         return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
        ==================================================================
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1f5bfc2
    • V
      ipv4: ip_do_fragment: fix headroom tests · 254d900b
      Vasily Averin 提交于
      Some time ago David Woodhouse reported skb_under_panic
      when we try to push ethernet header to fragmented ipv6 skbs.
      It was fixed for ipv6 by Florian Westphal in
      commit 1d325d21 ("ipv6: ip6_fragment: fix headroom tests and skb leak")
      
      However similar problem still exist in ipv4.
      
      It does not trigger skb_under_panic due paranoid check
      in ip_finish_output2, however according to Alexey Kuznetsov
      current state is abnormal and ip_fragment should be fixed too.
      Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      254d900b