1. 18 1月, 2019 3 次提交
  2. 12 1月, 2019 3 次提交
    • T
      net: bpfilter: disallow to remove bpfilter module while being used · 71a85084
      Taehee Yoo 提交于
      The bpfilter.ko module can be removed while functions of the bpfilter.ko
      are executing. so panic can occurred. in order to protect that, locks can
      be used. a bpfilter_lock protects routines in the
      __bpfilter_process_sockopt() but it's not enough because __exit routine
      can be executed concurrently.
      
      Now, the bpfilter_umh can not run in parallel.
      So, the module do not removed while it's being used and it do not
      double-create UMH process.
      The members of the umh_info and the bpfilter_umh_ops are protected by
      the bpfilter_umh_ops.lock.
      
      test commands:
         while :
         do
      	iptables -I FORWARD -m string --string ap --algo kmp &
      	modprobe -rv bpfilter &
         done
      
      splat looks like:
      [  298.623435] BUG: unable to handle kernel paging request at fffffbfff807440b
      [  298.628512] #PF error: [normal kernel read fault]
      [  298.633018] PGD 124327067 P4D 124327067 PUD 11c1a3067 PMD 119eb2067 PTE 0
      [  298.638859] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  298.638859] CPU: 0 PID: 2997 Comm: iptables Not tainted 4.20.0+ #154
      [  298.638859] RIP: 0010:__mutex_lock+0x6b9/0x16a0
      [  298.638859] Code: c0 00 00 e8 89 82 ff ff 80 bd 8f fc ff ff 00 0f 85 d9 05 00 00 48 8b 85 80 fc ff ff 48 bf 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 38 00 0f 85 1d 0e 00 00 48 8b 85 c8 fc ff ff 49 39 47 58 c6
      [  298.638859] RSP: 0018:ffff88810e7777a0 EFLAGS: 00010202
      [  298.638859] RAX: 1ffffffff807440b RBX: ffff888111bd4d80 RCX: 0000000000000000
      [  298.638859] RDX: 1ffff110235ff806 RSI: ffff888111bd5538 RDI: dffffc0000000000
      [  298.638859] RBP: ffff88810e777b30 R08: 0000000080000002 R09: 0000000000000000
      [  298.638859] R10: 0000000000000000 R11: 0000000000000000 R12: fffffbfff168a42c
      [  298.638859] R13: ffff888111bd4d80 R14: ffff8881040e9a05 R15: ffffffffc03a2000
      [  298.638859] FS:  00007f39e3758700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
      [  298.638859] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  298.638859] CR2: fffffbfff807440b CR3: 000000011243e000 CR4: 00000000001006f0
      [  298.638859] Call Trace:
      [  298.638859]  ? mutex_lock_io_nested+0x1560/0x1560
      [  298.638859]  ? kasan_kmalloc+0xa0/0xd0
      [  298.638859]  ? kmem_cache_alloc+0x1c2/0x260
      [  298.638859]  ? __alloc_file+0x92/0x3c0
      [  298.638859]  ? alloc_empty_file+0x43/0x120
      [  298.638859]  ? alloc_file_pseudo+0x220/0x330
      [  298.638859]  ? sock_alloc_file+0x39/0x160
      [  298.638859]  ? __sys_socket+0x113/0x1d0
      [  298.638859]  ? __x64_sys_socket+0x6f/0xb0
      [  298.638859]  ? do_syscall_64+0x138/0x560
      [  298.638859]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  298.638859]  ? __alloc_file+0x92/0x3c0
      [  298.638859]  ? init_object+0x6b/0x80
      [  298.638859]  ? cyc2ns_read_end+0x10/0x10
      [  298.638859]  ? cyc2ns_read_end+0x10/0x10
      [  298.638859]  ? hlock_class+0x140/0x140
      [  298.638859]  ? sched_clock_local+0xd4/0x140
      [  298.638859]  ? sched_clock_local+0xd4/0x140
      [  298.638859]  ? check_flags.part.37+0x440/0x440
      [  298.638859]  ? __lock_acquire+0x4f90/0x4f90
      [  298.638859]  ? set_rq_offline.part.89+0x140/0x140
      [ ... ]
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71a85084
    • T
      net: bpfilter: restart bpfilter_umh when error occurred · 61fbf593
      Taehee Yoo 提交于
      The bpfilter_umh will be stopped via __stop_umh() when the bpfilter
      error occurred.
      The bpfilter_umh() couldn't start again because there is no restart
      routine.
      
      The section of the bpfilter_umh_{start/end} is no longer .init.rodata
      because these area should be reused in the restart routine. hence
      the section name is changed to .bpfilter_umh.
      
      The bpfilter_ops->start() is restart callback. it will be called when
      bpfilter_umh is stopped.
      The stop bit means bpfilter_umh is stopped. this bit is set by both
      start and stop routine.
      
      Before this patch,
      Test commands:
         $ iptables -vnL
         $ kill -9 <pid of bpfilter_umh>
         $ iptables -vnL
         [  480.045136] bpfilter: write fail -32
         $ iptables -vnL
      
      All iptables commands will fail.
      
      After this patch,
      Test commands:
         $ iptables -vnL
         $ kill -9 <pid of bpfilter_umh>
         $ iptables -vnL
         $ iptables -vnL
      
      Now, all iptables commands will work.
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61fbf593
    • T
      net: bpfilter: use cleanup callback to release umh_info · 5b4cb650
      Taehee Yoo 提交于
      Now, UMH process is killed, do_exit() calls the umh_info->cleanup callback
      to release members of the umh_info.
      This patch makes bpfilter_umh's cleanup routine to use the
      umh_info->cleanup callback.
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b4cb650
  3. 11 1月, 2019 1 次提交
    • Y
      tcp: change txhash on SYN-data timeout · c5715b8f
      Yuchung Cheng 提交于
      Previously upon SYN timeouts the sender recomputes the txhash to
      try a different path. However this does not apply on the initial
      timeout of SYN-data (active Fast Open). Therefore an active IPv6
      Fast Open connection may incur one second RTO penalty to take on
      a new path after the second SYN retransmission uses a new flow label.
      
      This patch removes this undesirable behavior so Fast Open changes
      the flow label just like the regular connections. This also helps
      avoid falsely disabling Fast Open on the sender which triggers
      after two consecutive SYN timeouts on Fast Open.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Reviewed-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5715b8f
  4. 10 1月, 2019 1 次提交
  5. 05 1月, 2019 2 次提交
    • S
      fou: Prevent unbounded recursion in GUE error handler also with UDP-Lite · bc6e019b
      Stefano Brivio 提交于
      In commit 11789039 ("fou: Prevent unbounded recursion in GUE error
      handler"), I didn't take care of the case where UDP-Lite is encapsulated
      into UDP or UDP-Lite with GUE. From a syzbot report about a possibly
      similar issue with GUE on IPv6, I just realised the same thing might
      happen with a UDP-Lite inner payload.
      
      Also skip exception handling for inner UDP-Lite protocol.
      
      Fixes: 11789039 ("fou: Prevent unbounded recursion in GUE error handler")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc6e019b
    • A
      netlink: fixup regression in RTM_GETADDR · 7c1e8a38
      Arthur Gautier 提交于
      This commit fixes a regression in AF_INET/RTM_GETADDR and
      AF_INET6/RTM_GETADDR.
      
      Before this commit, the kernel would stop dumping addresses once the first
      skb was full and end the stream with NLMSG_DONE(-EMSGSIZE). The error
      shouldn't be sent back to netlink_dump so the callback is kept alive. The
      userspace is expected to call back with a new empty skb.
      
      Changes from V1:
       - The error is not handled in netlink_dump anymore but rather in
         inet_dump_ifaddr and inet6_dump_addr directly as suggested by
         David Ahern.
      
      Fixes: d7e38611 ("net/ipv4: Put target net when address dump fails due to bad attributes")
      Fixes: 242afaa6 ("net/ipv6: Put target net when address dump fails due to bad attributes")
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: "David S . Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NArthur Gautier <baloo@gandi.net>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c1e8a38
  6. 02 1月, 2019 1 次提交
    • W
      ip: validate header length on virtual device xmit · cb9f1b78
      Willem de Bruijn 提交于
      KMSAN detected read beyond end of buffer in vti and sit devices when
      passing truncated packets with PF_PACKET. The issue affects additional
      ip tunnel devices.
      
      Extend commit 76c0ddd8 ("ip6_tunnel: be careful when accessing the
      inner header") and commit ccfec9e5 ("ip_tunnel: be careful when
      accessing the inner header").
      
      Move the check to a separate helper and call at the start of each
      ndo_start_xmit function in net/ipv4 and net/ipv6.
      
      Minor changes:
      - convert dev_kfree_skb to kfree_skb on error path,
        as dev_kfree_skb calls consume_skb which is not for error paths.
      - use pskb_network_may_pull even though that is pedantic here,
        as the same as pskb_may_pull for devices without llheaders.
      - do not cache ipv6 hdrs if used only once
        (unsafe across pskb_may_pull, was more relevant to earlier patch)
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb9f1b78
  7. 31 12月, 2018 1 次提交
  8. 29 12月, 2018 1 次提交
  9. 25 12月, 2018 2 次提交
  10. 21 12月, 2018 4 次提交
    • E
      tcp: fix a race in inet_diag_dump_icsk() · f0c928d8
      Eric Dumazet 提交于
      Alexei reported use after frees in inet_diag_dump_icsk() [1]
      
      Because we use refcount_set() when various sockets are setup and
      inserted into ehash, we also need to make sure inet_diag_dump_icsk()
      wont race with the refcount_set() operations.
      
      Jonathan Lemon sent a patch changing net_twsk_hashdance() but
      other spots would need risky changes.
      
      Instead, fix inet_diag_dump_icsk() as this bug came with
      linux-4.10 only.
      
      [1] Quoting Alexei :
      
      First something iterating over sockets finds already freed tw socket:
      
      refcount_t: increment on 0; use-after-free.
      WARNING: CPU: 2 PID: 2738 at lib/refcount.c:153 refcount_inc+0x26/0x30
      RIP: 0010:refcount_inc+0x26/0x30
      RSP: 0018:ffffc90004c8fbc0 EFLAGS: 00010282
      RAX: 000000000000002b RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff88085ee9d680 RSI: ffff88085ee954c8 RDI: ffff88085ee954c8
      RBP: ffff88010ecbd2c0 R08: 0000000000000000 R09: 000000000000174c
      R10: ffffffff81e7c5a0 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff8806ba9bf210 R14: ffffffff82304600 R15: ffff88010ecbd328
      FS:  00007f81f5a7d700(0000) GS:ffff88085ee80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f81e2a95000 CR3: 000000069b2eb006 CR4: 00000000003606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       inet_diag_dump_icsk+0x2b3/0x4e0 [inet_diag]  // sock_hold(sk); in net/ipv4/inet_diag.c:1002
       ? kmalloc_large_node+0x37/0x70
       ? __kmalloc_node_track_caller+0x1cb/0x260
       ? __alloc_skb+0x72/0x1b0
       ? __kmalloc_reserve.isra.40+0x2e/0x80
       __inet_diag_dump+0x3b/0x80 [inet_diag]
       netlink_dump+0x116/0x2a0
       netlink_recvmsg+0x205/0x3c0
       sock_read_iter+0x89/0xd0
       __vfs_read+0xf7/0x140
       vfs_read+0x8a/0x140
       SyS_read+0x3f/0xa0
       do_syscall_64+0x5a/0x100
      
      then a minute later twsk timer fires and hits two bad refcnts
      for this freed socket:
      
      refcount_t: decrement hit 0; leaking memory.
      WARNING: CPU: 31 PID: 0 at lib/refcount.c:228 refcount_dec+0x2e/0x40
      Modules linked in:
      RIP: 0010:refcount_dec+0x2e/0x40
      RSP: 0018:ffff88085f5c3ea8 EFLAGS: 00010296
      RAX: 000000000000002c RBX: ffff88010ecbd2c0 RCX: 000000000000083f
      RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
      RBP: ffffc90003c77280 R08: 0000000000000000 R09: 00000000000017d3
      R10: ffffffff81e7c5a0 R11: 0000000000000000 R12: ffffffff82ad2d80
      R13: ffffffff8182de00 R14: ffff88085f5c3ef8 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88085f5c0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fbe42685250 CR3: 0000000002209001 CR4: 00000000003606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       inet_twsk_kill+0x9d/0xc0  // inet_twsk_bind_unhash(tw, hashinfo);
       call_timer_fn+0x29/0x110
       run_timer_softirq+0x36b/0x3a0
      
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 31 PID: 0 at lib/refcount.c:187 refcount_sub_and_test+0x46/0x50
      RIP: 0010:refcount_sub_and_test+0x46/0x50
      RSP: 0018:ffff88085f5c3eb8 EFLAGS: 00010296
      RAX: 0000000000000026 RBX: ffff88010ecbd2c0 RCX: 000000000000083f
      RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000003f
      RBP: ffff88010ecbd358 R08: 0000000000000000 R09: 000000000000185b
      R10: ffffffff81e7c5a0 R11: 0000000000000000 R12: ffff88010ecbd358
      R13: ffffffff8182de00 R14: ffff88085f5c3ef8 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88085f5c0000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fbe42685250 CR3: 0000000002209001 CR4: 00000000003606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <IRQ>
       inet_twsk_put+0x12/0x20  // inet_twsk_put(tw);
       call_timer_fn+0x29/0x110
       run_timer_softirq+0x36b/0x3a0
      
      Fixes: 67db3e4b ("tcp: no longer hold ehash lock while calling tcp_get_info()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
      Acked-by: NJonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f0c928d8
    • I
      net: ipv4: Set skb->dev for output route resolution · 21f94775
      Ido Schimmel 提交于
      When user requests to resolve an output route, the kernel synthesizes
      an skb where the relevant parameters (e.g., source address) are set. The
      skb is then passed to ip_route_output_key_hash_rcu() which might call
      into the flow dissector in case a multipath route was hit and a nexthop
      needs to be selected based on the multipath hash.
      
      Since both 'skb->dev' and 'skb->sk' are not set, a warning is triggered
      in the flow dissector [1]. The warning is there to prevent codepaths
      from silently falling back to the standard flow dissector instead of the
      BPF one.
      
      Therefore, instead of removing the warning, set 'skb->dev' to the
      loopback device, as its not used for anything but resolving the correct
      namespace.
      
      [1]
      WARNING: CPU: 1 PID: 24819 at net/core/flow_dissector.c:764 __skb_flow_dissect+0x314/0x16b0
      ...
      RSP: 0018:ffffa0df41fdf650 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8bcded232000 RCX: 0000000000000000
      RDX: ffffa0df41fdf7e0 RSI: ffffffff98e415a0 RDI: ffff8bcded232000
      RBP: ffffa0df41fdf760 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffa0df41fdf7e8 R11: ffff8bcdf27a3000 R12: ffffffff98e415a0
      R13: ffffa0df41fdf7e0 R14: ffffffff98dd2980 R15: ffffa0df41fdf7e0
      FS:  00007f46f6897680(0000) GS:ffff8bcdf7a80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055933e95f9a0 CR3: 000000021e636000 CR4: 00000000001006e0
      Call Trace:
       fib_multipath_hash+0x28c/0x2d0
       ? fib_multipath_hash+0x28c/0x2d0
       fib_select_path+0x241/0x32f
       ? __fib_lookup+0x6a/0xb0
       ip_route_output_key_hash_rcu+0x650/0xa30
       ? __alloc_skb+0x9b/0x1d0
       inet_rtm_getroute+0x3f7/0xb80
       ? __alloc_pages_nodemask+0x11c/0x2c0
       rtnetlink_rcv_msg+0x1d9/0x2f0
       ? rtnl_calcit.isra.24+0x120/0x120
       netlink_rcv_skb+0x54/0x130
       rtnetlink_rcv+0x15/0x20
       netlink_unicast+0x20a/0x2c0
       netlink_sendmsg+0x2d1/0x3d0
       sock_sendmsg+0x39/0x50
       ___sys_sendmsg+0x2a0/0x2f0
       ? filemap_map_pages+0x16b/0x360
       ? __handle_mm_fault+0x108e/0x13d0
       __sys_sendmsg+0x63/0xa0
       ? __sys_sendmsg+0x63/0xa0
       __x64_sys_sendmsg+0x1f/0x30
       do_syscall_64+0x5a/0x120
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Fixes: d0e13a14 ("flow_dissector: lookup netns by skb->sk if skb->dev is NULL")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21f94775
    • J
      bpf: sk_msg, sock{map|hash} redirect through ULP · 0608c69c
      John Fastabend 提交于
      A sockmap program that redirects through a kTLS ULP enabled socket
      will not work correctly because the ULP layer is skipped. This
      fixes the behavior to call through the ULP layer on redirect to
      ensure any operations required on the data stream at the ULP layer
      continue to be applied.
      
      To do this we add an internal flag MSG_SENDPAGE_NOPOLICY to avoid
      calling the BPF layer on a redirected message. This is
      required to avoid calling the BPF layer multiple times (possibly
      recursively) which is not the current/expected behavior without
      ULPs. In the future we may add a redirect flag if users _do_
      want the policy applied again but this would need to work for both
      ULP and non-ULP sockets and be opt-in to avoid breaking existing
      programs.
      
      Also to avoid polluting the flag space with an internal flag we
      reuse the flag space overlapping MSG_SENDPAGE_NOPOLICY with
      MSG_WAITFORONE. Here WAITFORONE is specific to recv path and
      SENDPAGE_NOPOLICY is only used for sendpage hooks. The last thing
      to verify is user space API is masked correctly to ensure the flag
      can not be set by user. (Note this needs to be true regardless
      because we have internal flags already in-use that user space
      should not be able to set). But for completeness we have two UAPI
      paths into sendpage, sendfile and splice.
      
      In the sendfile case the function do_sendfile() zero's flags,
      
      ./fs/read_write.c:
       static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
      		   	    size_t count, loff_t max)
       {
         ...
         fl = 0;
      #if 0
         /*
          * We need to debate whether we can enable this or not. The
          * man page documents EAGAIN return for the output at least,
          * and the application is arguably buggy if it doesn't expect
          * EAGAIN on a non-blocking file descriptor.
          */
          if (in.file->f_flags & O_NONBLOCK)
      	fl = SPLICE_F_NONBLOCK;
      #endif
          file_start_write(out.file);
          retval = do_splice_direct(in.file, &pos, out.file, &out_pos, count, fl);
       }
      
      In the splice case the pipe_to_sendpage "actor" is used which
      masks flags with SPLICE_F_MORE.
      
      ./fs/splice.c:
       static int pipe_to_sendpage(struct pipe_inode_info *pipe,
      			    struct pipe_buffer *buf, struct splice_desc *sd)
       {
         ...
         more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
         ...
       }
      
      Confirming what we expect that internal flags  are in fact internal
      to socket side.
      
      Fixes: d3b18ad3 ("tls: add bpf support to sk_msg handling")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      0608c69c
    • J
      bpf: sk_msg, fix socket data_ready events · 552de910
      John Fastabend 提交于
      When a skb verdict program is in-use and either another BPF program
      redirects to that socket or the new SK_PASS support is used the
      data_ready callback does not wake up application. Instead because
      the stream parser/verdict is using the sk data_ready callback we wake
      up the stream parser/verdict block.
      
      Fix this by adding a helper to check if the stream parser block is
      enabled on the sk and if so call the saved pointer which is the
      upper layers wake up function.
      
      This fixes application stalls observed when an application is waiting
      for data in a blocking read().
      
      Fixes: d829e9c4 ("tls: convert to generic sk_msg interface")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      552de910
  11. 20 12月, 2018 4 次提交
    • F
      net: use skb_sec_path helper in more places · 2294be0f
      Florian Westphal 提交于
      skb_sec_path gains 'const' qualifier to avoid
      xt_policy.c: 'skb_sec_path' discards 'const' qualifier from pointer target type
      
      same reasoning as previous conversions: Won't need to touch these
      spots anymore when skb->sp is removed.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2294be0f
    • F
      xfrm: change secpath_set to return secpath struct, not error value · 0ca64da1
      Florian Westphal 提交于
      It can only return 0 (success) or -ENOMEM.
      Change return value to a pointer to secpath struct.
      
      This avoids direct access to skb->sp:
      
      err = secpath_set(skb);
      if (!err) ..
      skb->sp-> ...
      
      Becomes:
      sp = secpath_set(skb)
      if (!sp) ..
      sp-> ..
      
      This reduces noise in followup patch which is going to remove skb->sp.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ca64da1
    • F
      sk_buff: add skb extension infrastructure · df5042f4
      Florian Westphal 提交于
      This adds an optional extension infrastructure, with ispec (xfrm) and
      bridge netfilter as first users.
      objdiff shows no changes if kernel is built without xfrm and br_netfilter
      support.
      
      The third (planned future) user is Multipath TCP which is still
      out-of-tree.
      MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
      numbers used by individual subflows.
      
      This DSS mapping is read/written from tcp option space on receive and
      written to tcp option space on transmitted tcp packets that are part of
      and MPTCP connection.
      
      Extending skb_shared_info or adding a private data field to skb fclones
      doesn't work for incoming skb, so a different DSS propagation method would
      be required for the receive side.
      
      mptcp has same requirements as secpath/bridge netfilter:
      
      1. extension memory is released when the sk_buff is free'd.
      2. data is shared after cloning an skb (clone inherits extension)
      3. adding extension to an skb will COW the extension buffer if needed.
      
      The "MPTCP upstreaming" effort adds SKB_EXT_MPTCP extension to store the
      mapping for tx and rx processing.
      
      Two new members are added to sk_buff:
      1. 'active_extensions' byte (filling a hole), telling which extensions
         are available for this skb.
         This has two purposes.
         a) avoids the need to initialize the pointer.
         b) allows to "delete" an extension by clearing its bit
         value in ->active_extensions.
      
         While it would be possible to store the active_extensions byte
         in the extension struct instead of sk_buff, there is one problem
         with this:
          When an extension has to be disabled, we can always clear the
          bit in skb->active_extensions.  But in case it would be stored in the
          extension buffer itself, we might have to COW it first, if
          we are dealing with a cloned skb.  On kmalloc failure we would
          be unable to turn an extension off.
      
      2. extension pointer, located at the end of the sk_buff.
         If the active_extensions byte is 0, the pointer is undefined,
         it is not initialized on skb allocation.
      
      This adds extra code to skb clone and free paths (to deal with
      refcount/free of extension area) but this replaces similar code that
      manages skb->nf_bridge and skb->sp structs in the followup patches of
      the series.
      
      It is possible to add support for extensions that are not preseved on
      clones/copies.
      
      To do this, it would be needed to define a bitmask of all extensions that
      need copy/cow semantics, and change __skb_ext_copy() to check
      ->active_extensions & SKB_EXT_PRESERVE_ON_CLONE, then just set
      ->active_extensions to 0 on the new clone.
      
      This isn't done here because all extensions that get added here
      need the copy/cow semantics.
      
      v2:
      Allocate entire extension space using kmem_cache.
      Upside is that this allows better tracking of used memory,
      downside is that we will allocate more space than strictly needed in
      most cases (its unlikely that all extensions are active/needed at same
      time for same skb).
      The allocated memory (except the small extension header) is not cleared,
      so no additonal overhead aside from memory usage.
      
      Avoid atomic_dec_and_test operation on skb_ext_put()
      by using similar trick as kfree_skbmem() does with fclone_ref:
      If recount is 1, there is no concurrent user and we can free right away.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df5042f4
    • F
      netfilter: avoid using skb->nf_bridge directly · c4b0e771
      Florian Westphal 提交于
      This pointer is going to be removed soon, so use the existing helpers in
      more places to avoid noise when the removal happens.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c4b0e771
  12. 18 12月, 2018 15 次提交
    • D
      ipmr: Drop mfc_cache argument to ipmr_queue_xmit · 6e0735d1
      David Ahern 提交于
      mfc_cache is not needed by ipmr_queue_xmit so drop it from the input
      argument list.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e0735d1
    • W
      net: add missing SOF_TIMESTAMPING_OPT_ID support · 8f932f76
      Willem de Bruijn 提交于
      SOF_TIMESTAMPING_OPT_ID is supported on TCP, UDP and RAW sockets.
      But it was missing on RAW with IPPROTO_IP, PF_PACKET and CAN.
      
      Add skb_setup_tx_timestamp that configures both tx_flags and tskey
      for these paths that do not need corking or use bytestream keys.
      
      Fixes: 09c2d251 ("net-timestamp: add key to disambiguate concurrent datagrams")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f932f76
    • P
      net: dccp: initialize (addr,port) listening hashtable · eedbbb0d
      Peter Oskolkov 提交于
      Commit d9fbc7f6 "net: tcp: prefer listeners bound to an address"
      removes port-only listener lookups. This caused segfaults in DCCP
      lookups because DCCP did not initialize the (addr,port) hashtable.
      
      This patch adds said initialization.
      
      The only non-trivial issue here is the size of the new hashtable.
      It seemed reasonable to make it match the size of the port-only
      hashtable (= INET_LHTABLE_SIZE) that was used previously. Other
      parameters to inet_hashinfo2_init() match those used in TCP.
      
      V2 changes: marked inet_hashinfo2_init as an exported symbol
      so that DCCP compiles when configured as a module.
      
      Tested: syzcaller issues fixed; the second patch in the patchset
              tests that DCCP lookups work correctly.
      
      Fixes: d9fbc7f6 "net: tcp: prefer listeners bound to an address"
      Reported-by: Nsyzcaller <syzkaller@googlegroups.com>
      Signed-off-by: NPeter Oskolkov <posk@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eedbbb0d
    • S
      fou: Prevent unbounded recursion in GUE error handler · 11789039
      Stefano Brivio 提交于
      Handling exceptions for direct UDP encapsulation in GUE (that is,
      UDP-in-UDP) leads to unbounded recursion in the GUE exception handler,
      syzbot reported.
      
      While draft-ietf-intarea-gue-06 doesn't explicitly forbid direct
      encapsulation of UDP in GUE, it probably doesn't make sense to set up GUE
      this way, and it's currently not even possible to configure this.
      
      Skip exception handling if the GUE proto/ctype field is set to the UDP
      protocol number. Should we need to handle exceptions for UDP-in-GUE one
      day, we might need to either explicitly set a bound for recursion, or
      implement a special iterative handling for these cases.
      
      Reported-and-tested-by: syzbot+43f6755d1c2e62743468@syzkaller.appspotmail.com
      Fixes: b8a51b38 ("fou, fou6: ICMP error handlers for FoU and GUE")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11789039
    • T
      netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set · 06aa151a
      Taehee Yoo 提交于
      If same destination IP address config is already existing, that config is
      just used. MAC address also should be same.
      However, there is no MAC address checking routine.
      So that MAC address checking routine is added.
      
      test commands:
         %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
      	   -j CLUSTERIP --new --hashmode sourceip \
      	   --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
         %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
      	   -j CLUSTERIP --new --hashmode sourceip \
      	   --clustermac 01:00:5e:00:00:21 --total-nodes 2 --local-node 1
      
      After this patch, above commands are disallowed.
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      06aa151a
    • T
      netfilter: ipt_CLUSTERIP: fix sleep-in-atomic bug in clusterip_config_entry_put() · 2a61d8b8
      Taehee Yoo 提交于
      A proc_remove() can sleep. so that it can't be inside of spin_lock.
      Hence proc_remove() is moved to outside of spin_lock. and it also
      adds mutex to sync create and remove of proc entry(config->pde).
      
      test commands:
      SHELL#1
         %while :; do iptables -A INPUT -p udp -i enp2s0 -d 192.168.1.100 \
      	   --dport 9000  -j CLUSTERIP --new --hashmode sourceip \
      	   --clustermac 01:00:5e:00:00:21 --total-nodes 3 --local-node 3; \
      	   iptables -F; done
      
      SHELL#2
         %while :; do echo +1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; \
      	   echo -1 > /proc/net/ipt_CLUSTERIP/192.168.1.100; done
      
      [ 2949.569864] BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
      [ 2949.579944] in_atomic(): 1, irqs_disabled(): 0, pid: 5472, name: iptables
      [ 2949.587920] 1 lock held by iptables/5472:
      [ 2949.592711]  #0: 000000008f0ebcf2 (&(&cn->lock)->rlock){+...}, at: refcount_dec_and_lock+0x24/0x50
      [ 2949.603307] CPU: 1 PID: 5472 Comm: iptables Tainted: G        W         4.19.0-rc5+ #16
      [ 2949.604212] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
      [ 2949.604212] Call Trace:
      [ 2949.604212]  dump_stack+0xc9/0x16b
      [ 2949.604212]  ? show_regs_print_info+0x5/0x5
      [ 2949.604212]  ___might_sleep+0x2eb/0x420
      [ 2949.604212]  ? set_rq_offline.part.87+0x140/0x140
      [ 2949.604212]  ? _rcu_barrier_trace+0x400/0x400
      [ 2949.604212]  wait_for_completion+0x94/0x710
      [ 2949.604212]  ? wait_for_completion_interruptible+0x780/0x780
      [ 2949.604212]  ? __kernel_text_address+0xe/0x30
      [ 2949.604212]  ? __lockdep_init_map+0x10e/0x5c0
      [ 2949.604212]  ? __lockdep_init_map+0x10e/0x5c0
      [ 2949.604212]  ? __init_waitqueue_head+0x86/0x130
      [ 2949.604212]  ? init_wait_entry+0x1a0/0x1a0
      [ 2949.604212]  proc_entry_rundown+0x208/0x270
      [ 2949.604212]  ? proc_reg_get_unmapped_area+0x370/0x370
      [ 2949.604212]  ? __lock_acquire+0x4500/0x4500
      [ 2949.604212]  ? complete+0x18/0x70
      [ 2949.604212]  remove_proc_subtree+0x143/0x2a0
      [ 2949.708655]  ? remove_proc_entry+0x390/0x390
      [ 2949.708655]  clusterip_tg_destroy+0x27a/0x630 [ipt_CLUSTERIP]
      [ ... ]
      
      Fixes: b3e456fc ("netfilter: ipt_CLUSTERIP: fix a race condition of proc file creation")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      2a61d8b8
    • T
      netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine · b12f7bad
      Taehee Yoo 提交于
      When network namespace is destroyed, both clusterip_tg_destroy() and
      clusterip_net_exit() are called. and clusterip_net_exit() is called
      before clusterip_tg_destroy().
      Hence cleanup check code in clusterip_net_exit() doesn't make sense.
      
      test commands:
         %ip netns add vm1
         %ip netns exec vm1 bash
         %ip link set lo up
         %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
      	-j CLUSTERIP --new --hashmode sourceip \
      	--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
         %exit
         %ip netns del vm1
      
      splat looks like:
      [  341.184508] WARNING: CPU: 1 PID: 87 at net/ipv4/netfilter/ipt_CLUSTERIP.c:840 clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
      [  341.184850] Modules linked in: ipt_CLUSTERIP nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter bpfilter ip_tables x_tables
      [  341.184850] CPU: 1 PID: 87 Comm: kworker/u4:2 Not tainted 4.19.0-rc5+ #16
      [  341.227509] Workqueue: netns cleanup_net
      [  341.227509] RIP: 0010:clusterip_net_exit+0x319/0x380 [ipt_CLUSTERIP]
      [  341.227509] Code: 0f 85 7f fe ff ff 48 c7 c2 80 64 2c c0 be a8 02 00 00 48 c7 c7 a0 63 2c c0 c6 05 18 6e 00 00 01 e8 bc 38 ff f5 e9 5b fe ff ff <0f> 0b e9 33 ff ff ff e8 4b 90 50 f6 e9 2d fe ff ff 48 89 df e8 de
      [  341.227509] RSP: 0018:ffff88011086f408 EFLAGS: 00010202
      [  341.227509] RAX: dffffc0000000000 RBX: 1ffff1002210de85 RCX: 0000000000000000
      [  341.227509] RDX: 1ffff1002210de85 RSI: ffff880110813be8 RDI: ffffed002210de58
      [  341.227509] RBP: ffff88011086f4d0 R08: 0000000000000000 R09: 0000000000000000
      [  341.227509] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff1002210de81
      [  341.227509] R13: ffff880110625a48 R14: ffff880114cec8c8 R15: 0000000000000014
      [  341.227509] FS:  0000000000000000(0000) GS:ffff880116600000(0000) knlGS:0000000000000000
      [  341.227509] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  341.227509] CR2: 00007f11fd38e000 CR3: 000000013ca16000 CR4: 00000000001006e0
      [  341.227509] Call Trace:
      [  341.227509]  ? __clusterip_config_find+0x460/0x460 [ipt_CLUSTERIP]
      [  341.227509]  ? default_device_exit+0x1ca/0x270
      [  341.227509]  ? remove_proc_entry+0x1cd/0x390
      [  341.227509]  ? dev_change_net_namespace+0xd00/0xd00
      [  341.227509]  ? __init_waitqueue_head+0x130/0x130
      [  341.227509]  ops_exit_list.isra.10+0x94/0x140
      [  341.227509]  cleanup_net+0x45b/0x900
      [ ... ]
      
      Fixes: 613d0776 ("netfilter: exit_net cleanup check added")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b12f7bad
    • T
      netfilter: ipt_CLUSTERIP: fix deadlock in netns exit routine · 5a86d68b
      Taehee Yoo 提交于
      When network namespace is destroyed, cleanup_net() is called.
      cleanup_net() holds pernet_ops_rwsem then calls each ->exit callback.
      So that clusterip_tg_destroy() is called by cleanup_net().
      And clusterip_tg_destroy() calls unregister_netdevice_notifier().
      
      But both cleanup_net() and clusterip_tg_destroy() hold same
      lock(pernet_ops_rwsem). hence deadlock occurrs.
      
      After this patch, only 1 notifier is registered when module is inserted.
      And all of configs are added to per-net list.
      
      test commands:
         %ip netns add vm1
         %ip netns exec vm1 bash
         %ip link set lo up
         %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \
      	-j CLUSTERIP --new --hashmode sourceip \
      	--clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1
         %exit
         %ip netns del vm1
      
      splat looks like:
      [  341.809674] ============================================
      [  341.809674] WARNING: possible recursive locking detected
      [  341.809674] 4.19.0-rc5+ #16 Tainted: G        W
      [  341.809674] --------------------------------------------
      [  341.809674] kworker/u4:2/87 is trying to acquire lock:
      [  341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: unregister_netdevice_notifier+0x8c/0x460
      [  341.809674]
      [  341.809674] but task is already holding lock:
      [  341.809674] 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
      [  341.809674]
      [  341.809674] other info that might help us debug this:
      [  341.809674]  Possible unsafe locking scenario:
      [  341.809674]
      [  341.809674]        CPU0
      [  341.809674]        ----
      [  341.809674]   lock(pernet_ops_rwsem);
      [  341.809674]   lock(pernet_ops_rwsem);
      [  341.809674]
      [  341.809674]  *** DEADLOCK ***
      [  341.809674]
      [  341.809674]  May be due to missing lock nesting notation
      [  341.809674]
      [  341.809674] 3 locks held by kworker/u4:2/87:
      [  341.809674]  #0: 00000000d9df6c92 ((wq_completion)"%s""netns"){+.+.}, at: process_one_work+0xafe/0x1de0
      [  341.809674]  #1: 00000000c2cbcee2 (net_cleanup_work){+.+.}, at: process_one_work+0xb60/0x1de0
      [  341.809674]  #2: 000000005da2d519 (pernet_ops_rwsem){++++}, at: cleanup_net+0x119/0x900
      [  341.809674]
      [  341.809674] stack backtrace:
      [  341.809674] CPU: 1 PID: 87 Comm: kworker/u4:2 Tainted: G        W         4.19.0-rc5+ #16
      [  341.809674] Workqueue: netns cleanup_net
      [  341.809674] Call Trace:
      [ ... ]
      [  342.070196]  down_write+0x93/0x160
      [  342.070196]  ? unregister_netdevice_notifier+0x8c/0x460
      [  342.070196]  ? down_read+0x1e0/0x1e0
      [  342.070196]  ? sched_clock_cpu+0x126/0x170
      [  342.070196]  ? find_held_lock+0x39/0x1c0
      [  342.070196]  unregister_netdevice_notifier+0x8c/0x460
      [  342.070196]  ? register_netdevice_notifier+0x790/0x790
      [  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
      [  342.070196]  ? __local_bh_enable_ip+0xe9/0x1b0
      [  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
      [  342.070196]  ? trace_hardirqs_on+0x93/0x210
      [  342.070196]  ? __bpf_trace_preemptirq_template+0x10/0x10
      [  342.070196]  ? clusterip_tg_destroy+0x372/0x650 [ipt_CLUSTERIP]
      [  342.123094]  clusterip_tg_destroy+0x3ad/0x650 [ipt_CLUSTERIP]
      [  342.123094]  ? clusterip_net_init+0x3d0/0x3d0 [ipt_CLUSTERIP]
      [  342.123094]  ? cleanup_match+0x17d/0x200 [ip_tables]
      [  342.123094]  ? xt_unregister_table+0x215/0x300 [x_tables]
      [  342.123094]  ? kfree+0xe2/0x2a0
      [  342.123094]  cleanup_entry+0x1d5/0x2f0 [ip_tables]
      [  342.123094]  ? cleanup_match+0x200/0x200 [ip_tables]
      [  342.123094]  __ipt_unregister_table+0x9b/0x1a0 [ip_tables]
      [  342.123094]  iptable_filter_net_exit+0x43/0x80 [iptable_filter]
      [  342.123094]  ops_exit_list.isra.10+0x94/0x140
      [  342.123094]  cleanup_net+0x45b/0x900
      [ ... ]
      
      Fixes: 202f59af ("netfilter: ipt_CLUSTERIP: do not hold dev")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      5a86d68b
    • F
      netfilter: nat: remove nf_nat_l4proto struct · 5cbabeec
      Florian Westphal 提交于
      This removes the (now empty) nf_nat_l4proto struct, all its instances
      and all the no longer needed runtime (un)register functionality.
      
      nf_nat_need_gre() can be axed as well: the module that calls it (to
      load the no-longer-existing nat_gre module) also calls other nat core
      functions. GRE nat is now always available if kernel is built with it.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      5cbabeec
    • F
      netfilter: nat: remove l4proto->manip_pkt · faec18db
      Florian Westphal 提交于
      This removes the last l4proto indirection, the two callers, the l3proto
      packet mangling helpers for ipv4 and ipv6, now call the
      nf_nat_l4proto_manip_pkt() helper.
      
      nf_nat_proto_{dccp,tcp,sctp,gre,icmp,icmpv6} are left behind, even though
      they contain no functionality anymore to not clutter this patch.
      
      Next patch will remove the empty files and the nf_nat_l4proto
      struct.
      
      nf_nat_proto_udp.c is renamed to nf_nat_proto.c, as it now contains the
      other nat manip functionality as well, not just udp and udplite.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      faec18db
    • F
      netfilter: nat: remove l4proto->nlattr_to_range · 76b90019
      Florian Westphal 提交于
      all protocols did set this to nf_nat_l4proto_nlattr_to_range, so
      just call it directly.
      
      The important difference is that we'll now also call it for
      protocols that we don't support (i.e., nf_nat_proto_unknown did
      not provide .nlattr_to_range).
      
      However, there should be no harm, even icmp provided this callback.
      If we don't implement a specific l4nat for this, nothing would make
      use of this information, so adding a big switch/case construct listing
      all supported l4protocols seems a bit pointless.
      
      This change leaves a single function pointer in the l4proto struct.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      76b90019
    • F
      netfilter: nat: remove l4proto->in_range · fe2d0020
      Florian Westphal 提交于
      With exception of icmp, all of the l4 nat protocols set this to
      nf_nat_l4proto_in_range.
      
      Get rid of this and just check the l4proto in the caller.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      fe2d0020
    • F
      netfilter: nat: fold in_range indirection into caller · 40e786bd
      Florian Westphal 提交于
      No need for indirections here, we only support ipv4 and ipv6
      and the called functions are very small.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      40e786bd
    • F
      netfilter: nat: remove l4proto->unique_tuple · 203f2e78
      Florian Westphal 提交于
      fold remaining users (icmp, icmpv6, gre) into nf_nat_l4proto_unique_tuple.
      The static-save of old incarnation of resolved key in gre and icmp is
      removed as well, just use the prandom based offset like the others.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      203f2e78
    • F
      netfilter: remove NF_NAT_RANGE_PROTO_RANDOM support · 912da924
      Florian Westphal 提交于
      Historically this was net_random() based, and was then converted to
      a hash based algorithm (private boot seed + hash of endpoint addresses)
      due to concerns of leaking net_random() bits.
      
      RANDOM_FULLY mode was added later to avoid problems with hash
      based mode (see commit 34ce3240,
      "netfilter: nf_nat: add full port randomization support" for details).
      
      Just make prandom_u32() the default search starting point and get rid of
      ->secure_port() altogether.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      912da924
  13. 16 12月, 2018 2 次提交