1. 22 June, 2018 16 commits
    • tcp: ignore rcv_rtt sample with old ts ecr value · 3f6c65d6
      Wei Wang committed
      When receiving multiple packets with the same ts ecr value, only try
      to compute rcv_rtt sample with the earliest received packet.
      This is because the rcv_rtt calculated by later received packets
      could possibly include long idle time or other types of delay.
      For example:
      (1) server sends last packet of reply with TS val V1
      (2) client ACKs last packet of reply with TS ecr V1
      (3) long idle time passes
      (4) client sends next request data packet with TS ecr V1 (again!)
      At this time, the rcv_rtt computed on server with TS ecr V1 will be
      inflated with the idle time and should get ignored.
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
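The rule described in this commit message can be sketched in plain C as follows. This is a minimal illustration, not the kernel's code: `struct rcv_rtt_state`, `rcv_rtt_sample()` and the microsecond parameters are hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; not the kernel's struct tcp_sock. */
struct rcv_rtt_state {
    uint32_t last_ecr;  /* last TS ecr value that produced a sample */
    bool     have_ecr;  /* false until the first sample is taken */
    uint32_t rtt_us;    /* most recent accepted sample, in microseconds */
};

/*
 * Offer a sample for a packet that echoes `ecr`. `sent_us` is when the
 * echoed timestamp was sent and `now_us` is the arrival time.
 * Returns true if the sample was accepted.
 */
static bool rcv_rtt_sample(struct rcv_rtt_state *st, uint32_t ecr,
                           uint32_t sent_us, uint32_t now_us)
{
    /* A repeated ecr may include idle time, so only the earliest counts. */
    if (st->have_ecr && st->last_ecr == ecr)
        return false;

    st->last_ecr = ecr;
    st->have_ecr = true;
    st->rtt_us = now_us - sent_us;  /* unsigned wrap-around is intended */
    return true;
}
```

In the example from the message, the ACK in step (2) yields a sample, while the request in step (4) carries the same ecr and is rejected, so the idle time never reaches the estimate.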
    • Merge branch 'rhashtables-cleanups' · 66caeeb9
      David S. Miller committed
      NeilBrown says:
      
      ====================
      Assorted rhashtables cleanups.
      
      Following 7 patches are selections from a recent RFC series I posted
      that have all received suitable Acks.
      
      The most visible changes are that rhashtable-types.h is now preferred
      for inclusion in include/linux/*.h rather than rhashtable.h, and
      that the full hash is used - no bits are reserved for a NULLS pointer.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: clean up dereference of ->future_tbl. · c0690016
      NeilBrown committed
      Using rht_dereference_bucket() to dereference
      ->future_tbl looks like a type error, and could be confusing.
      Using rht_dereference_rcu() to test a pointer for NULL
      adds an unnecessary barrier - rcu_access_pointer() is preferred
      for NULL tests when no lock is held.
      
      This uses 3 different ways to access ->future_tbl.
      - if we know the mutex is held, use rht_dereference()
      - if we don't hold the mutex, and are only testing for NULL,
        use rcu_access_pointer()
      - otherwise (using RCU protection for true dereference),
        use rht_dereference_rcu().
      
      Note that this includes a simplification of the call to
      rhashtable_last_table() - we don't do an extra dereference
      before the call any more.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: use cmpxchg() to protect ->future_tbl. · 0ad66449
      NeilBrown committed
      Rather than borrowing one of the bucket locks to
      protect ->future_tbl updates, use cmpxchg().
      This gives more freedom to change how bucket locking
      is implemented.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
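The idea of publishing a successor pointer with a compare-and-exchange rather than under a lock can be sketched in userspace C11. This is only an analogy: it uses `atomic_compare_exchange_strong()` from `<stdatomic.h>` in place of the kernel's `cmpxchg()`, and `struct table` and `attach_future_tbl()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a hash table with a successor pointer. */
struct table {
    _Atomic(struct table *) future_tbl;  /* NULL until a successor exists */
};

/*
 * Attach `new_tbl` as the successor of `tbl` only if none is attached yet.
 * Returns true if this caller won the race, false if a successor was
 * already present. No lock is taken.
 */
static bool attach_future_tbl(struct table *tbl, struct table *new_tbl)
{
    struct table *expected = NULL;

    return atomic_compare_exchange_strong(&tbl->future_tbl, &expected,
                                          new_tbl);
}
```

Because the update no longer depends on holding a bucket lock, the bucket locking scheme can change without touching this path, which is the freedom the commit message refers to.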
    • rhashtable: simplify nested_table_alloc() and rht_bucket_nested_insert() · 5af68ef7
      NeilBrown committed
      Now that we don't use the hash value or shift in nested_table_alloc()
      there is room for simplification.
      We only need to pass a "is this a leaf" flag to nested_table_alloc(),
      and don't need to track as much information in
      rht_bucket_nested_insert().
      
      Note there is another minor cleanup in nested_table_alloc() here.
      The number of elements in a page of "union nested_tables" is most naturally
      
        PAGE_SIZE / sizeof(ntbl[0])
      
      The previous code had
      
        PAGE_SIZE / sizeof(ntbl[0].bucket)
      
      which happens to be the correct value only because the bucket uses all
      the space in the union.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
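The difference between the two sizeof expressions can be seen in a small standalone sketch. The unions below are hypothetical stand-ins, not the kernel's `union nested_table`, and the 4096-byte page size is an assumption.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size, for illustration only */

/* Both members are pointers, as in the case the commit describes. */
union ntbl_sketch {
    union ntbl_sketch *table;
    void *bucket;
};

/* Hypothetical variant in which another member is wider than `bucket`. */
union ntbl_widened {
    void *bucket;
    struct { void *first; void *second; } pair;
};

/* Natural form: divide by the size of a whole element. */
static size_t slots_by_element(void)
{
    return SKETCH_PAGE_SIZE / sizeof(union ntbl_sketch);
}

/* Previous form: divide by the size of a single member. */
static size_t slots_by_member(void)
{
    return SKETCH_PAGE_SIZE / sizeof(((union ntbl_sketch *)0)->bucket);
}

/* The same two computations for the widened union. */
static size_t widened_slots_by_element(void)
{
    return SKETCH_PAGE_SIZE / sizeof(union ntbl_widened);
}

static size_t widened_slots_by_member(void)
{
    return SKETCH_PAGE_SIZE / sizeof(((union ntbl_widened *)0)->bucket);
}
```

For the first union both forms agree, because the member fills the union. For the widened one, the per-member form overcounts by a factor of two, which is why dividing by the element size is the safer spelling.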
    • rhashtable: simplify INIT_RHT_NULLS_HEAD() · 9b4f64a2
      NeilBrown committed
      The 'ht' and 'hash' arguments to INIT_RHT_NULLS_HEAD() are
      no longer used - so drop them.  This allows us to also
      remove the nhash argument from nested_table_alloc().
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: remove nulls_base and related code. · 9f9a7077
      NeilBrown committed
      This "feature" is unused, undocumented, and untested and so doesn't
      really belong.  A patch is under development to properly implement
      support for detecting when a search gets diverted down a different
      chain, which is the common purpose of nulls markers.
      
      This patch actually fixes a bug too.  The table resizing allows a
      table to grow to 2^31 buckets, but the hash is truncated to 27 bits -
      any growth beyond 2^27 is wasteful and ineffective.
      
      This patch results in NULLS_MARKER(0) being used for all chains,
      and leaves the use of rht_is_a_null() to test for it.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
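The sizing bug mentioned in this message can be illustrated with a short sketch. It assumes the 27 usable bits come from shifting 5 bits out of a 32-bit hash. That detail, the `RESERVED_BITS` constant and the function name are assumptions made for the example, not taken from the commit.

```c
#include <stdint.h>

#define RESERVED_BITS 5u  /* assumption: 32 - 27 bits were not usable */

/*
 * Bucket index for a table of 2^table_bits buckets when only 27 bits of
 * the 32-bit hash survive truncation.
 */
static uint32_t bucket_index_truncated(uint32_t hash, unsigned table_bits)
{
    uint32_t usable = hash >> RESERVED_BITS;  /* 27 significant bits */
    uint32_t mask = (table_bits >= 32)
                        ? UINT32_MAX
                        : ((UINT32_C(1) << table_bits) - 1);

    return usable & mask;
}
```

Under these assumptions no hash value can select a bucket at index 2^27 or above, so a table grown to 2^31 buckets leaves most of its buckets unreachable.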
    • rhashtable: split rhashtable.h · 0eb71a9d
      NeilBrown committed
      Due to the use of rhashtables in net namespaces,
      rhashtable.h is included in lots of the kernel,
      so a small change can require a large recompilation.
      This makes development painful.
      
      This patch splits out rhashtable-types.h which just includes
      the major type declarations, and does not include (non-trivial)
      inline code.  rhashtable.h is no longer included by anything
      in the include/ directory.
      Common include files only include rhashtable-types.h so a large
      recompilation is only triggered when that changes.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: silence RCU warning in rhashtable_test. · cbab9012
      NeilBrown committed
      print_ht in rhashtable_test calls rht_dereference() with neither
      RCU protection nor the mutex.  This triggers an RCU warning.
      So take the mutex to silence the warning.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • VSOCK: fix loopback on big-endian systems · e5ab564c
      Claudio Imbrenda committed
      The dst_cid and src_cid are 64 bits, therefore 64 bit accessors should be
      used, and in fact in virtio_transport_common.c only 64 bit accessors are
      used. Using 32 bit accessors for 64 bit values breaks big endian systems.
      
      This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt.
      
      Fixes: b9116823 ("VSOCK: add loopback to virtio_transport")
      Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
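Why a 32-bit accessor breaks a 64-bit little-endian field only on big-endian hosts can be modelled in portable C. The sketch below works on raw bytes so it behaves the same on any machine. Both function names are hypothetical, and the second function is a model of the faulty result under the assumption that the 64-bit native value is truncated to 32 bits before the byte swap.

```c
#include <stdint.h>

/* Correct: decode all 8 little-endian bytes, independent of host order. */
static uint64_t decode_le64(const uint8_t b[8])
{
    uint64_t v = 0;

    for (int i = 7; i >= 0; i--)
        v = (v << 8) | b[i];
    return v;
}

/*
 * Model of a big-endian host applying a 32-bit LE accessor to the field:
 * the native 64-bit load places b[4..7] in the low half, truncation keeps
 * that half, and the 32-bit byte swap then yields the UPPER 32 bits of
 * the real value.
 */
static uint32_t be_host_le32_of_le64(const uint8_t b[8])
{
    return (uint32_t)b[4] | (uint32_t)b[5] << 8 |
           (uint32_t)b[6] << 16 | (uint32_t)b[7] << 24;
}
```

For a small CID such as 2, stored as the bytes 02 00 00 00 00 00 00 00, the full decode returns 2 while the modelled big-endian result is 0. On a little-endian host the same mistake merely truncates to the low half, which hides the bug for small values.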
    • net: ethernet: ti: davinci_cpdma: make function cpdma_desc_pool_create static · 40141bb4
      Colin Ian King committed
      The function cpdma_desc_pool_create is local to the source and does not
      need to be in global scope, so make it static.
      
      Cleans up sparse warning:
      warning: symbol 'cpdma_desc_pool_create' was not declared. Should it
      be static?
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'xen-netfront-fixes' · 962c661f
      David S. Miller committed
      Ross Lagerwall says:
      
      ====================
      xen-netfront: Fix issues with commit f599c64f
      
      Fix a couple of issues with commit f599c64f ("xen-netfront: Fix race
      between device setup and open").
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netfront: Update features after registering netdev · 45c8184c
      Ross Lagerwall committed
      Update the features after calling register_netdev() otherwise the
      device features are not set up correctly and it is not possible to change
      the MTU of the device. After this change, the features reported by
      ethtool match the device's features before the commit which introduced
      the issue and it is possible to change the device's MTU.
      
      Fixes: f599c64f ("xen-netfront: Fix race between device setup and open")
      Reported-by: Liam Shepherd <liam@dancer.es>
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netfront: Fix mismatched rtnl_unlock · cb257783
      Ross Lagerwall committed
      Fixes: f599c64f ("xen-netfront: Fix race between device setup and open")
      Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cls_flower: fix use after free in flower S/W path · 44a5cd43
      Paolo Abeni committed
      If flower filter is created without the skip_sw flag, fl_mask_put()
      can race with fl_classify() and we can destroy the mask rhashtable
      while a lookup operation is accessing it.
      
       BUG: unable to handle kernel paging request at 00000000000911d1
       PGD 0 P4D 0
       SMP PTI
       CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950
       Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
       RIP: 0010:rht_bucket_nested+0x20/0x60
       Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0
       RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206
       RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12
       RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001
       RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2
       R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000
       R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68
       FS:  0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0
       Call Trace:
        fl_lookup+0x134/0x140 [cls_flower]
        fl_classify+0xf3/0x180 [cls_flower]
        tcf_classify+0x78/0x150
        __netif_receive_skb_core+0x69e/0xa50
        netif_receive_skb_internal+0x42/0xf0
        tun_get_user+0xdd5/0xfd0 [tun]
        tun_sendmsg+0x52/0x70 [tun]
        handle_tx+0x2b3/0x5f0 [vhost_net]
        vhost_worker+0xab/0x100 [vhost]
        kthread+0xf8/0x130
        ret_from_fork+0x35/0x40
       Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress
       CR2: 00000000000911d1
      
      Fix the above by waiting for an RCU grace period before destroying the
      rhashtable: we need to use tcf_queue_work(), as rhashtable_destroy()
      must run in process context, as pointed out by Cong Wang.
      
      v1 -> v2: use tcf_queue_work to run rhashtable_destroy().
      
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/packet: fix use-after-free · 945d015e
      Eric Dumazet committed
      We should put copy_skb in receive_queue only after
      a successful call to virtio_net_hdr_from_skb().
      
      syzbot report:
      
      BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline]
      BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline]
      BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
      Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553
      
      CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       __skb_unlink include/linux/skbuff.h:1843 [inline]
       __skb_dequeue include/linux/skbuff.h:1863 [inline]
       skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
       skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852
       packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331
       packet_release+0x630/0xd90 net/packet/af_packet.c:2991
       __sock_release+0xd7/0x260 net/socket.c:603
       sock_close+0x19/0x20 net/socket.c:1186
       __fput+0x35b/0x8b0 fs/file_table.c:209
       ____fput+0x15/0x20 fs/file_table.c:243
       task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x1b08/0x2750 kernel/exit.c:865
       do_group_exit+0x177/0x440 kernel/exit.c:968
       __do_sys_exit_group kernel/exit.c:979 [inline]
       __se_sys_exit_group kernel/exit.c:977 [inline]
       __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4448e9
      Code: Bad RIP value.
      RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9
      RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001
      RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000
      R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0
      R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 4553:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       skb_clone+0x1f5/0x500 net/core/skbuff.c:1282
       tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221
       deliver_skb net/core/dev.c:1925 [inline]
       deliver_ptype_list_skb net/core/dev.c:1940 [inline]
       __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
       netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
       netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
       tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
       tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
       call_write_iter include/linux/fs.h:1795 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
       vfs_write+0x1f8/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 4553:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       kfree_skbmem+0x154/0x230 net/core/skbuff.c:582
       __kfree_skb net/core/skbuff.c:642 [inline]
       kfree_skb+0x1a5/0x580 net/core/skbuff.c:659
       tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385
       deliver_skb net/core/dev.c:1925 [inline]
       deliver_ptype_list_skb net/core/dev.c:1940 [inline]
       __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
       netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
       netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
       tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
       tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
       call_write_iter include/linux/fs.h:1795 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
       vfs_write+0x1f8/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801b044ecc0
       which belongs to the cache skbuff_head_cache of size 232
      The buggy address is located 0 bytes inside of
       232-byte region [ffff8801b044ecc0, ffff8801b044eda8)
      The buggy address belongs to the page:
      page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0
      raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
      >ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                 ^
       ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: 58d19b19 ("packet: vnet_hdr support for tpacket_rcv")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
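The ordering rule behind this fix, publishing an object to a shared queue only after every fallible step has succeeded, can be sketched generically in userspace C. None of the names below come from the kernel: `struct buf`, `struct queue`, `fill_header()` and `receive_copy()` are hypothetical, and `fill_header()` merely stands in for a step that can fail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer and LIFO queue, standing in for skbs and a queue. */
struct buf { struct buf *next; };
struct queue { struct buf *head; int len; };

static void enqueue(struct queue *q, struct buf *b)
{
    b->next = q->head;
    q->head = b;
    q->len++;
}

/* Stand-in for a fallible step such as building a header. */
static bool fill_header(bool should_succeed)
{
    return should_succeed;
}

/* Queue the copy only once the fallible step has succeeded. */
static bool receive_copy(struct queue *q, bool header_ok)
{
    struct buf *copy = malloc(sizeof(*copy));

    if (!copy)
        return false;
    if (!fill_header(header_ok)) {
        free(copy);  /* never queued, so the queue holds no stale pointer */
        return false;
    }
    enqueue(q, copy);
    return true;
}

/* Free everything still on the queue. */
static void purge(struct queue *q)
{
    while (q->head) {
        struct buf *b = q->head;

        q->head = b->next;
        free(b);
        q->len--;
    }
}
```

If the buffer were queued before the fallible step and then freed on the error path, a later purge would walk a freed object, which is the pattern the KASAN report above shows.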
  2. 21 June, 2018 13 commits
  3. 20 June, 2018 11 commits