1. 20 11月, 2021 1 次提交
  2. 16 11月, 2021 1 次提交
  3. 26 10月, 2021 1 次提交
  4. 30 9月, 2021 1 次提交
  5. 21 7月, 2021 1 次提交
  6. 29 6月, 2021 1 次提交
  7. 11 11月, 2020 1 次提交
  8. 25 9月, 2020 1 次提交
    • M
      net/ipv4: always honour route mtu during forwarding · 02a1b175
      Maciej Żenczykowski 提交于
      Documentation/networking/ip-sysctl.txt:46 says:
        ip_forward_use_pmtu - BOOLEAN
          By default we don't trust protocol path MTUs while forwarding
          because they could be easily forged and can lead to unwanted
          fragmentation by the router.
          You only need to enable this if you have user-space software
          which tries to discover path mtus by itself and depends on the
          kernel honoring this information. This is normally not the case.
          Default: 0 (disabled)
          Possible values:
          0 - disabled
          1 - enabled
      
      Which makes it pretty clear that setting it to 1 is a potential
      security/safety/DoS issue, and yet it is entirely reasonable to want
      forwarded traffic to honour explicitly administrator configured
      route mtus (instead of defaulting to device mtu).
      
      Indeed, I can't think of a single reason why you wouldn't want to.
      Since you configured a route mtu you probably know better...
      
      It is pretty common to have a higher device mtu to allow receiving
      large (jumbo) frames, while having some routes via that interface
      (potentially including the default route to the internet) specify
      a lower mtu.
      
      Note that ipv6 forwarding uses device mtu unless the route is locked
      (in which case it will use the route mtu).
      
      This approach is not usable for IPv4 where an 'mtu lock' on a route
      also has the side effect of disabling TCP path mtu discovery via
      disabling the IPv4 DF (don't frag) bit on all outgoing frames.
      
      I'm not aware of a way to lock a route from an IPv6 RA, so that also
      potentially seems wrong.
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com>
      Cc: Vinay Paradkar <vparadka@qti.qualcomm.com>
      Cc: Tyler Wear <twear@quicinc.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02a1b175
  9. 11 9月, 2020 1 次提交
  10. 25 7月, 2020 2 次提交
  11. 20 7月, 2020 1 次提交
  12. 21 6月, 2020 1 次提交
  13. 29 5月, 2020 5 次提交
  14. 08 12月, 2019 1 次提交
    • E
      inet: protect against too small mtu values. · 501a90c9
      Eric Dumazet 提交于
      syzbot was once again able to crash a host by setting a very small mtu
      on loopback device.
      
      Let's make inetdev_valid_mtu() available in include/net/ip.h,
      and use it in ip_setup_cork(), so that we protect both ip_append_page()
      and __ip_append_data()
      
      Also add a READ_ONCE() when the device mtu is read.
      
      Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(),
      even if other code paths might write over this field.
      
      Add a big comment in include/linux/netdevice.h about dev->mtu
      needing READ_ONCE()/WRITE_ONCE() annotations.
      
      Hopefully we will add the missing ones in followup patches.
      
      [1]
      
      refcount_t: saturated; leaking memory.
      WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x197/0x210 lib/dump_stack.c:118
       panic+0x2e3/0x75c kernel/panic.c:221
       __warn.cold+0x2f/0x3e kernel/panic.c:582
       report_bug+0x289/0x300 lib/bug.c:195
       fixup_bug arch/x86/kernel/traps.c:174 [inline]
       fixup_bug arch/x86/kernel/traps.c:169 [inline]
       do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267
       do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286
       invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
      RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22
      Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89
      RSP: 0018:ffff88809689f550 EFLAGS: 00010286
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c
      RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1
      R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001
      R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40
       refcount_add include/linux/refcount.h:193 [inline]
       skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999
       sock_wmalloc+0xf1/0x120 net/core/sock.c:2096
       ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383
       udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276
       inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821
       kernel_sendpage+0x92/0xf0 net/socket.c:3794
       sock_sendpage+0x8b/0xc0 net/socket.c:936
       pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458
       splice_from_pipe_feed fs/splice.c:512 [inline]
       __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636
       splice_from_pipe+0x108/0x170 fs/splice.c:671
       generic_splice_sendpage+0x3c/0x50 fs/splice.c:842
       do_splice_from fs/splice.c:861 [inline]
       direct_splice_actor+0x123/0x190 fs/splice.c:1035
       splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990
       do_splice_direct+0x1da/0x2a0 fs/splice.c:1078
       do_sendfile+0x597/0xd00 fs/read_write.c:1464
       __do_sys_sendfile64 fs/read_write.c:1525 [inline]
       __se_sys_sendfile64 fs/read_write.c:1511 [inline]
       __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511
       do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x441409
      Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409
      RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005
      RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010
      R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180
      R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000
      Kernel Offset: disabled
      Rebooting in 86400 seconds..
      
      Fixes: 1470ddf7 ("inet: Remove explicit write references to sk/inet in ip_append_data")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      501a90c9
  15. 27 11月, 2019 2 次提交
  16. 23 11月, 2019 1 次提交
  17. 22 10月, 2019 1 次提交
  18. 14 9月, 2019 1 次提交
    • W
      ip: support SO_MARK cmsg · c6af0c22
      Willem de Bruijn 提交于
      Enable setting skb->mark for UDP and RAW sockets using cmsg.
      
      This is analogous to existing support for TOS, TTL, txtime, etc.
      
      Packet sockets already support this as of commit c7d39e32
      ("packet: support per-packet fwmark for af_packet sendmsg").
      
      Similar to other fields, implement by
      1. initialize the sockcm_cookie.mark from socket option sk_mark
      2. optionally overwrite this in ip_cmsg_send/ip6_datagram_send_ctl
      3. initialize inet_cork.mark from sockcm_cookie.mark
      4. initialize each (usually just one) skb->mark from inet_cork.mark
      
      Step 1 is handled in one location for most protocols by ipcm_init_sk
      as of commit 35178206 ("ipv4: ipcm_cookie initializers").
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c6af0c22
  19. 15 6月, 2019 1 次提交
  20. 04 6月, 2019 1 次提交
    • E
      net: fix use-after-free in kfree_skb_list · b7034146
      Eric Dumazet 提交于
      syzbot reported nasty use-after-free [1]
      
      Lets remove frag_list field from structs ip_fraglist_iter
      and ip6_fraglist_iter. This seens not needed anyway.
      
      [1] :
      BUG: KASAN: use-after-free in kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706
      Read of size 8 at addr ffff888085a3cbc0 by task syz-executor303/8947
      
      CPU: 0 PID: 8947 Comm: syz-executor303 Not tainted 5.2.0-rc2+ #12
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
       __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       kasan_report+0x12/0x20 mm/kasan/common.c:614
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
       kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706
       ip6_fragment+0x1ef4/0x2680 net/ipv6/ip6_output.c:882
       __ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144
       ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179
       dst_output include/net/dst.h:433 [inline]
       ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179
       ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796
       ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816
       rawv6_push_pending_frames net/ipv6/raw.c:617 [inline]
       rawv6_sendmsg+0x2993/0x35e0 net/ipv6/raw.c:947
       inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:671
       ___sys_sendmsg+0x803/0x920 net/socket.c:2292
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2330
       __do_sys_sendmsg net/socket.c:2339 [inline]
       __se_sys_sendmsg net/socket.c:2337 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x44add9
      Code: e8 7c e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b 05 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f826f33bce8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000006e7a18 RCX: 000000000044add9
      RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000005
      RBP: 00000000006e7a10 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e7a1c
      R13: 00007ffcec4f7ebf R14: 00007f826f33c9c0 R15: 20c49ba5e353f7cf
      
      Allocated by task 8947:
       save_stack+0x23/0x90 mm/kasan/common.c:71
       set_track mm/kasan/common.c:79 [inline]
       __kasan_kmalloc mm/kasan/common.c:489 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
       kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
       slab_post_alloc_hook mm/slab.h:437 [inline]
       slab_alloc_node mm/slab.c:3269 [inline]
       kmem_cache_alloc_node+0x131/0x710 mm/slab.c:3579
       __alloc_skb+0xd5/0x5e0 net/core/skbuff.c:199
       alloc_skb include/linux/skbuff.h:1058 [inline]
       __ip6_append_data.isra.0+0x2a24/0x3640 net/ipv6/ip6_output.c:1519
       ip6_append_data+0x1e5/0x320 net/ipv6/ip6_output.c:1688
       rawv6_sendmsg+0x1467/0x35e0 net/ipv6/raw.c:940
       inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:671
       ___sys_sendmsg+0x803/0x920 net/socket.c:2292
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2330
       __do_sys_sendmsg net/socket.c:2339 [inline]
       __se_sys_sendmsg net/socket.c:2337 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 8947:
       save_stack+0x23/0x90 mm/kasan/common.c:71
       set_track mm/kasan/common.c:79 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
       __cache_free mm/slab.c:3432 [inline]
       kmem_cache_free+0x86/0x260 mm/slab.c:3698
       kfree_skbmem net/core/skbuff.c:625 [inline]
       kfree_skbmem+0xc5/0x150 net/core/skbuff.c:619
       __kfree_skb net/core/skbuff.c:682 [inline]
       kfree_skb net/core/skbuff.c:699 [inline]
       kfree_skb+0xf0/0x390 net/core/skbuff.c:693
       kfree_skb_list+0x44/0x60 net/core/skbuff.c:708
       __dev_xmit_skb net/core/dev.c:3551 [inline]
       __dev_queue_xmit+0x3034/0x36b0 net/core/dev.c:3850
       dev_queue_xmit+0x18/0x20 net/core/dev.c:3914
       neigh_direct_output+0x16/0x20 net/core/neighbour.c:1532
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x1034/0x2550 net/ipv6/ip6_output.c:120
       ip6_fragment+0x1ebb/0x2680 net/ipv6/ip6_output.c:863
       __ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144
       ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179
       dst_output include/net/dst.h:433 [inline]
       ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179
       ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796
       ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816
       rawv6_push_pending_frames net/ipv6/raw.c:617 [inline]
       rawv6_sendmsg+0x2993/0x35e0 net/ipv6/raw.c:947
       inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:671
       ___sys_sendmsg+0x803/0x920 net/socket.c:2292
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2330
       __do_sys_sendmsg net/socket.c:2339 [inline]
       __se_sys_sendmsg net/socket.c:2337 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff888085a3cbc0
       which belongs to the cache skbuff_head_cache of size 224
      The buggy address is located 0 bytes inside of
       224-byte region [ffff888085a3cbc0, ffff888085a3cca0)
      The buggy address belongs to the page:
      page:ffffea0002168f00 refcount:1 mapcount:0 mapping:ffff88821b6f63c0 index:0x0
      flags: 0x1fffc0000000200(slab)
      raw: 01fffc0000000200 ffffea00027bbf88 ffffea0002105b88 ffff88821b6f63c0
      raw: 0000000000000000 ffff888085a3c080 000000010000000c 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888085a3ca80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff888085a3cb00: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
      >ffff888085a3cb80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                 ^
       ffff888085a3cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff888085a3cc80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: 0feca619 ("net: ipv6: add skbuff fraglist splitter")
      Fixes: c8b17be0 ("net: ipv4: add skbuff fraglist splitter")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b7034146
  21. 31 5月, 2019 3 次提交
    • P
      net: ipv4: split skbuff into fragments transformer · 065ff79f
      Pablo Neira Ayuso 提交于
      This patch exposes a new API to refragment a skbuff. This allows you to
      split either a linear skbuff or to force the refragmentation of an
      existing fraglist using a different mtu. The API consists of:
      
      * ip_frag_init(), that initializes the internal state of the transformer.
      * ip_frag_next(), that allows you to fetch the next fragment. This function
        internally allocates the skbuff that represents the fragment, it pushes
        the IPv4 header, and it also copies the payload for each fragment.
      
      The ip_frag_state object stores the internal state of the splitter.
      
      This code has been extracted from ip_do_fragment(). Symbols are also
      exported to allow to reuse this iterator from the bridge codepath to
      build its own refragmentation routine by reusing the existing codebase.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      065ff79f
    • P
      net: ipv4: add skbuff fraglist splitter · c8b17be0
      Pablo Neira Ayuso 提交于
      This patch adds the skbuff fraglist splitter. This API provides an
      iterator to transform the fraglist into single skbuff objects, it
      consists of:
      
      * ip_fraglist_init(), that initializes the internal state of the
        fraglist splitter.
      * ip_fraglist_prepare(), that restores the IPv4 header on the
        fragments.
      * ip_fraglist_next(), that retrieves the fragment from the fraglist and
        it updates the internal state of the splitter to point to the next
        fragment skbuff in the fraglist.
      
      The ip_fraglist_iter object stores the internal state of the iterator.
      
      This code has been extracted from ip_do_fragment(). Symbols are also
      exported to allow to reuse this iterator from the bridge codepath to
      build its own refragmentation routine by reusing the existing codebase.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8b17be0
    • T
      treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 · 2874c5fd
      Thomas Gleixner 提交于
      Based on 1 normalized pattern(s):
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version
      
      extracted by the scancode license scanner the SPDX license identifier
      
        GPL-2.0-or-later
      
      has been chosen to replace the boilerplate/reference in 3029 file(s).
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAllison Randal <allison@lohutok.net>
      Cc: linux-spdx@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2874c5fd
  22. 02 4月, 2019 1 次提交
  23. 22 3月, 2019 1 次提交
    • D
      ipv4: Allow amount of dirty memory from fib resizing to be controllable · 9ab948a9
      David Ahern 提交于
      fib_trie implementation calls synchronize_rcu when a certain amount of
      pages are dirty from freed entries. The number of pages was determined
      experimentally in 2009 (commit c3059477).
      
      At the current setting, synchronize_rcu is called often -- 51 times in a
      second in one test with an average of an 8 msec delay adding a fib entry.
      The total impact is a lot of slow down modifying the fib. This is seen
      in the output of 'time' - the difference between real time and sys+user.
      For example, using 720,022 single path routes and 'ip -batch'[1]:
      
          $ time ./ip -batch ipv4/routes-1-hops
          real    0m14.214s
          user    0m2.513s
          sys     0m6.783s
      
      So roughly 35% of the actual time to install the routes is from the ip
      command getting scheduled out, most notably due to synchronize_rcu (this
      is observed using 'perf sched timehist').
      
      This patch makes the amount of dirty memory configurable between 64k where
      the synchronize_rcu is called often (small, low end systems that are memory
      sensitive) to 64M where synchronize_rcu is called rarely during a large
      FIB change (for high end systems with lots of memory). The default is 512kB
      which corresponds to the current setting of 128 pages with a 4kB page size.
      
      As an example, at 16MB the worst interval shows 4 calls to synchronize_rcu
      in a second blocking for up to 30 msec in a single instance, and a total
      of almost 100 msec across the 4 calls in the second. The trade off is
      allowing FIB entries to consume more memory in a given time window but
      but with much better fib insertion rates (~30% increase in prefixes/sec).
      With this patch and net.ipv4.fib_sync_mem set to 16MB, the same batch
      file runs in:
      
          $ time ./ip -batch ipv4/routes-1-hops
          real    0m9.692s
          user    0m2.491s
          sys     0m6.769s
      
      So the dead time is reduced to about 1/2 second or <5% of the real time.
      
      [1] 'ip' modified to not request ACK messages which improves route
          insertion times by about 20%
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ab948a9
  24. 02 3月, 2019 1 次提交
  25. 26 2月, 2019 1 次提交
  26. 08 11月, 2018 1 次提交
  27. 07 11月, 2018 1 次提交
  28. 05 10月, 2018 4 次提交
  29. 07 7月, 2018 1 次提交
    • W
      ip: remove tx_flags from ipcm_cookie and use same logic for v4 and v6 · 678ca42d
      Willem de Bruijn 提交于
      skb_shinfo(skb)->tx_flags is derived from sk->sk_tsflags, possibly
      after modification by __sock_cmsg_send, by calling sock_tx_timestamp.
      
      The IPv4 and IPv6 paths do this conversion differently. In IPv4, the
      individual protocols that support tx timestamps call this function
      and store the result in ipc.tx_flags. In IPv6, sock_tx_timestamp is
      called in __ip6_append_data.
      
      There is no need to store both tx_flags and ts_flags in the cookie
      as one is derived from the other. Convert when setting up the cork
      and remove the redundant field. This is similar to IPv6, only have
      the conversion happen only once per datagram, in ip(6)_setup_cork.
      
      Also change __ip6_append_data to match __ip_append_data. Only update
      tskey if timestamping is enabled with OPT_ID. The SOCK_.. test is
      redundant: only valid protocols can have non-zero cork->tx_flags.
      
      After this change the IPv4 and IPv6 logic is the same.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      678ca42d