1. 02 Jun, 2020 3 commits
  2. 28 May, 2020 9 commits
  3. 27 May, 2020 3 commits
  4. 26 May, 2020 4 commits
  5. 12 May, 2020 2 commits
  6. 11 May, 2020 4 commits
    • netfilter: conntrack: fix infinite loop on rmmod · 54ab49fd
      Committed by Florian Westphal
      'rmmod nf_conntrack' can hang forever, because the netns exit
      gets stuck in nf_conntrack_cleanup_net_list():
      
      i_see_dead_people:
       busy = 0;
       list_for_each_entry(net, net_exit_list, exit_list) {
        nf_ct_iterate_cleanup(kill_all, net, 0, 0);
        if (atomic_read(&net->ct.count) != 0)
         busy = 1;
       }
       if (busy) {
        schedule();
        goto i_see_dead_people;
       }
      
      When nf_ct_iterate_cleanup iterates the conntrack table, all nf_conn
      structures can be found twice:
      once for the original tuple and once for the conntrack's reply tuple.
      
      get_next_corpse() only calls the iterator when the entry is
      in original direction -- the idea was to avoid unneeded invocations
      of the iterator callback.
      
      When support for clashing entries was added, the assumption that
      all nf_conn objects are added twice, once in original, once for reply
      tuple no longer holds -- NF_CLASH_BIT entries are only added in
      the non-clashing reply direction.
      
      Thus, if at least one NF_CLASH entry is in the list then
      nf_conntrack_cleanup_net_list() always skips it completely.
      
      During normal netns destruction, this causes a hang of several
      seconds, until the gc worker removes the entry (NF_CLASH entries
      always have a 1 second timeout).
      
      But in the rmmod case, the gc worker has already been stopped, so
      ct.count never becomes 0.
      
      We can fix this in two ways:
      
      1. Add a second test for CLASH_BIT and call iterator for those
         entries as well, or:
      2. Skip the original tuple direction and use the reply tuple.
      
      2) is simpler, so do that.
      
      Fixes: 6a757c07 ("netfilter: conntrack: allow insertion of clashing entries")
      Reported-by: Chen Yi <yiche@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
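The skipped-entry problem described in this commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct conn`, `struct hash_entry`, `count_visited()` and `demo_visited()` are hypothetical names invented for the illustration and are not kernel code. Normal connections sit in the table twice (original and reply direction), while a clashing connection sits there only in the reply direction.

```c
#include <assert.h>
#include <stddef.h>

enum dir { DIR_ORIGINAL = 0, DIR_REPLY = 1 };

/* Hypothetical stand-in for a tracked connection. */
struct conn {
    int id;
    int clash;   /* 1 if this models a clash entry */
};

/* Hypothetical stand-in for one hash table slot. */
struct hash_entry {
    struct conn *ct;
    enum dir dir;
};

/* Count the connections for which the cleanup callback would run when
 * only entries in direction `want` are handed to it. */
size_t count_visited(const struct hash_entry *tbl, size_t n, enum dir want)
{
    size_t visited = 0;

    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tbl[i].dir == want)
            visited++;
    return visited;
}

/* Two normal connections (listed twice) and one clash entry (reply only). */
size_t demo_visited(enum dir want)
{
    struct conn a = { 1, 0 }, b = { 2, 0 }, c = { 3, 1 };
    struct hash_entry tbl[] = {
        { &a, DIR_ORIGINAL }, { &a, DIR_REPLY },
        { &b, DIR_ORIGINAL }, { &b, DIR_REPLY },
        { &c, DIR_REPLY },
    };

    return count_visited(tbl, sizeof(tbl) / sizeof(tbl[0]), want);
}
```

In this model, filtering on the original direction reaches only two of the three connections, so the clash entry is never killed; filtering on the reply direction reaches all three, which is the idea behind option 2 in the message.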
    • netfilter: flowtable: Remove WQ_MEM_RECLAIM from workqueue · 1d10da0e
      Committed by Roi Dayan
      This workqueue handles offloaded flow tasks such as add/del/stats, and
      it should not use the WQ_MEM_RECLAIM flag.
      The flag can result in the following warning:
      
      [  485.557189] ------------[ cut here ]------------
      [  485.562976] workqueue: WQ_MEM_RECLAIM nf_flow_table_offload:flow_offload_worr
      [  485.562985] WARNING: CPU: 7 PID: 3731 at kernel/workqueue.c:2610 check_flush0
      [  485.590191] Kernel panic - not syncing: panic_on_warn set ...
      [  485.597100] CPU: 7 PID: 3731 Comm: kworker/u112:8 Not tainted 5.7.0-rc1.21802
      [  485.606629] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/177
      [  485.615487] Workqueue: nf_flow_table_offload flow_offload_work_handler [nf_f]
      [  485.624834] Call Trace:
      [  485.628077]  dump_stack+0x50/0x70
      [  485.632280]  panic+0xfb/0x2d7
      [  485.636083]  ? check_flush_dependency+0x110/0x130
      [  485.641830]  __warn.cold.12+0x20/0x2a
      [  485.646405]  ? check_flush_dependency+0x110/0x130
      [  485.652154]  ? check_flush_dependency+0x110/0x130
      [  485.657900]  report_bug+0xb8/0x100
      [  485.662187]  ? sched_clock_cpu+0xc/0xb0
      [  485.666974]  do_error_trap+0x9f/0xc0
      [  485.671464]  do_invalid_op+0x36/0x40
      [  485.675950]  ? check_flush_dependency+0x110/0x130
      [  485.681699]  invalid_op+0x28/0x30
      
      Fixes: 7da182a9 ("netfilter: flowtable: Use work entry per offload command")
      Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: flowtable: Add pending bit for offload work · 2c889795
      Committed by Paul Blakey
      The gc step can queue offloaded flow del work or stats work.
      Those work items can race with each other, and a flow could be freed
      before the stats work that queries it has executed.
      To avoid that, add a pending bit: if a work item already exists for a
      flow, don't queue another one for it.
      This also avoids queueing multiple stats work items when a stats work
      item has not completed but the gc step has started again.
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
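The pending-bit idea can be sketched in userspace with C11 atomics. This is a minimal illustration, not the kernel implementation: `struct flow`, `FLOW_PENDING`, `flow_try_queue()` and `flow_work_done()` are hypothetical names, and the kernel uses its own bit operations rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FLOW_PENDING (1u << 0)   /* hypothetical flag bit */

struct flow {
    atomic_uint flags;
    int queued;                  /* how many work items were queued */
};

/* Queue work only if no work item is already pending for this flow.
 * The atomic fetch-or sets the bit and reports its previous state. */
bool flow_try_queue(struct flow *f)
{
    assert(f != NULL);
    unsigned int old = atomic_fetch_or(&f->flags, FLOW_PENDING);

    if (old & FLOW_PENDING)
        return false;            /* a work item already exists */
    f->queued++;
    return true;
}

/* Called when the work item has finished: allow new work again. */
void flow_work_done(struct flow *f)
{
    assert(f != NULL);
    atomic_fetch_and(&f->flags, ~FLOW_PENDING);
}

int demo_pending(void)
{
    struct flow f;

    atomic_init(&f.flags, 0u);
    f.queued = 0;
    flow_try_queue(&f);          /* accepted */
    flow_try_queue(&f);          /* rejected, still pending */
    flow_work_done(&f);
    flow_try_queue(&f);          /* accepted again */
    return f.queued;
}
```

Because setting the bit and reading its old value happen in one atomic step, two callers racing to queue work for the same flow cannot both succeed.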
    • netfilter: conntrack: avoid gcc-10 zero-length-bounds warning · 2c407aca
      Committed by Arnd Bergmann
      gcc-10 warns about a suspicious access to an empty struct member:
      
      net/netfilter/nf_conntrack_core.c: In function '__nf_conntrack_alloc':
      net/netfilter/nf_conntrack_core.c:1522:9: warning: array subscript 0 is outside the bounds of an interior zero-length array 'u8[0]' {aka 'unsigned char[0]'} [-Wzero-length-bounds]
       1522 |  memset(&ct->__nfct_init_offset[0], 0,
            |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from net/netfilter/nf_conntrack_core.c:37:
      include/net/netfilter/nf_conntrack.h:90:5: note: while referencing '__nfct_init_offset'
         90 |  u8 __nfct_init_offset[0];
            |     ^~~~~~~~~~~~~~~~~~
      
      The code is correct but a bit unusual. Rework it slightly in a way that
      does not trigger the warning, using an empty struct instead of an empty
      array. There are probably more elegant ways to do this, but this is the
      smallest change.
      
      Fixes: c41884ce ("netfilter: conntrack: avoid zeroing timer")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
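The "empty struct as a marker" technique mentioned above can be sketched as follows. This is an illustration only: `struct conn`, `conn_reinit()` and the field names are hypothetical, and the empty struct relies on a GNU C extension (it has size zero with gcc and clang, but is not valid ISO C).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct conn {
    int refcount;             /* must survive re-initialisation */
    struct { } init_offset;   /* zero-size marker (GNU C extension) */
    int status;               /* everything from here on is zeroed */
    int mark;
};

/* Zero every field that follows the marker, leaving earlier fields alone.
 * The marker is only used through offsetof(), never subscripted, so there
 * is no array access for the compiler to complain about. */
void conn_reinit(struct conn *c)
{
    size_t off = offsetof(struct conn, init_offset);

    assert(c != NULL);
    memset((char *)c + off, 0, sizeof(*c) - off);
}

int demo_reinit(void)
{
    struct conn c = { .refcount = 7, .status = 3, .mark = 9 };

    conn_reinit(&c);
    return c.refcount * 100 + c.status * 10 + c.mark;
}
```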
  7. 01 May, 2020 1 commit
  8. 30 Apr, 2020 1 commit
  9. 29 Apr, 2020 2 commits
  10. 28 Apr, 2020 5 commits
  11. 27 Apr, 2020 3 commits
    • sysctl: pass kernel pointers to ->proc_handler · 32927393
      Committed by Christoph Hellwig
      Instead of having all the sysctl handlers deal with user pointers, which
      is rather hairy in terms of the BPF interaction, copy the input to and
      from userspace in common code.  This also means that the strings are
      always NUL-terminated by the common code, making the API a little bit
      safer.
      
      As most handlers just pass through the data to one of the common handlers
      a lot of the changes are mechanical.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • netfilter: nat: never update the UDP checksum when it's 0 · ea64d8d6
      Committed by Guillaume Nault
      If the UDP header of a local VXLAN endpoint is NAT-ed, and the VXLAN
      device has disabled UDP checksums and enabled Tx checksum offloading,
      then the skb passed to udp_manip_pkt() has hdr->check == 0 (outer
      checksum disabled) and skb->ip_summed == CHECKSUM_PARTIAL (inner packet
      checksum offloaded).
      
      Because of the ->ip_summed value, udp_manip_pkt() tries to update the
      outer checksum with the new address and port, leading to an invalid
      checksum sent on the wire, as the original null checksum obviously
      didn't take the old address and port into account.
      
      So, we can't take ->ip_summed into account in udp_manip_pkt(), as it
      might not refer to the checksum we're acting on. Instead, we can base
      the decision to update the UDP checksum entirely on the value of
      hdr->check, because it's null if and only if checksum is disabled:
      
        * A fully computed checksum can't be 0, since a 0 checksum is
          represented by the CSUM_MANGLED_0 value instead.
      
        * A partial checksum can't be 0, since the pseudo-header always adds
          at least one non-zero value (the UDP protocol type 0x11) and adding
          more values to the sum can't make it wrap to 0 as the carry is then
          added to the wrapped number.
      
        * A disabled checksum uses the special value 0.
      
      The problem seems to be there from day one, although it was probably
      not visible before UDP tunnels were implemented.
      
      Fixes: 5b1158e9 ("[NETFILTER]: Add NAT support for nf_conntrack")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
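The three checksum cases listed in this commit message can be illustrated with a short userspace sketch of 16-bit one's complement arithmetic. The helper names (`ones_sum()`, `udp_finish_csum()`, `udp_should_update_csum()`) are hypothetical and do not exist in the kernel; only the value of `CSUM_MANGLED_0` (0xffff) mirrors the kernel constant named above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CSUM_MANGLED_0 0xffffu

/* One's complement sum of 16-bit words with end-around carry. Once a
 * non-zero word has been added, the running sum can never fold to 0. */
uint16_t ones_sum(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    assert(words != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        sum += words[i];
        sum = (sum & 0xffffu) + (sum >> 16);   /* add the carry back in */
    }
    return (uint16_t)sum;
}

/* A fully computed checksum of 0 is sent as CSUM_MANGLED_0 instead. */
uint16_t udp_finish_csum(uint16_t sum)
{
    uint16_t csum = (uint16_t)~sum;

    return csum ? csum : CSUM_MANGLED_0;
}

/* The decision from the commit message: 0 means "checksum disabled". */
bool udp_should_update_csum(uint16_t check)
{
    return check != 0;
}

/* 0x0011 is the UDP protocol number; the other words are arbitrary. */
uint16_t demo_partial_sum(void)
{
    const uint16_t words[] = { 0x0011, 0xffff, 0xffee };

    return ones_sum(words, sizeof(words) / sizeof(words[0]));
}
```

In `demo_partial_sum()` the additions overflow 16 bits, yet the carry is folded back in and the result ends at 0xffff rather than 0, which matches the wrap-around argument made for partial checksums.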
    • netfilter: nf_conntrack: add IPS_HW_OFFLOAD status bit · 74f99482
      Committed by Bodong Wang
      This bit indicates that the conntrack entry is offloaded to the hardware
      flow table. The nf_conntrack entry will be tagged with [HW_OFFLOAD] if
      it is offloaded to hardware.
      
      cat /proc/net/nf_conntrack
      	ipv4 2 tcp 6 \
      	src=1.1.1.17 dst=1.1.1.16 sport=56394 dport=5001 \
      	src=1.1.1.16 dst=1.1.1.17 sport=5001 dport=56394 [HW_OFFLOAD] \
      	mark=0 zone=0 use=3
      
      Note that HW_OFFLOAD/OFFLOAD/ASSURED are mutually exclusive.
      
      Changelog:
      
      * V1->V2:
      - Remove check of lastused from stats. It was meant for cases such
        as removing driver module while traffic still running. Better to
        handle such cases from garbage collector.
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  12. 19 Apr, 2020 1 commit
    • netfilter: nat: fix error handling upon registering inet hook · b4faef17
      Committed by Hillf Danton
      syzbot reported the following warning:
      
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 19934 at net/netfilter/nf_nat_core.c:1106
      nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
      Kernel panic - not syncing: panic_on_warn set ...
      CPU: 0 PID: 19934 Comm: syz-executor.5 Not tainted 5.6.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x188/0x20d lib/dump_stack.c:118
       panic+0x2e3/0x75c kernel/panic.c:221
       __warn.cold+0x2f/0x35 kernel/panic.c:582
       report_bug+0x27b/0x2f0 lib/bug.c:195
       fixup_bug arch/x86/kernel/traps.c:175 [inline]
       fixup_bug arch/x86/kernel/traps.c:170 [inline]
       do_error_trap+0x12b/0x220 arch/x86/kernel/traps.c:267
       do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:286
       invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027
      RIP: 0010:nf_nat_unregister_fn+0x532/0x5c0 net/netfilter/nf_nat_core.c:1106
      Code: ff df 48 c1 ea 03 80 3c 02 00 75 75 48 8b 44 24 10 4c 89 ef 48 c7 00 00 00 00 00 e8 e8 f8 53 fb e9 4d fe ff ff e8 ee 9c 16 fb <0f> 0b e9 41 fe ff ff e8 e2 45 54 fb e9 b5 fd ff ff 48 8b 7c 24 20
      RSP: 0018:ffffc90005487208 EFLAGS: 00010246
      RAX: 0000000000040000 RBX: 0000000000000004 RCX: ffffc9001444a000
      RDX: 0000000000040000 RSI: ffffffff865c94a2 RDI: 0000000000000005
      RBP: ffff88808b5cf000 R08: ffff8880a2620140 R09: fffffbfff14bcd79
      R10: ffffc90005487208 R11: fffffbfff14bcd78 R12: 0000000000000000
      R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
       nf_nat_ipv6_unregister_fn net/netfilter/nf_nat_proto.c:1017 [inline]
       nf_nat_inet_register_fn net/netfilter/nf_nat_proto.c:1038 [inline]
       nf_nat_inet_register_fn+0xfc/0x140 net/netfilter/nf_nat_proto.c:1023
       nf_tables_register_hook net/netfilter/nf_tables_api.c:224 [inline]
       nf_tables_addchain.constprop.0+0x82e/0x13c0 net/netfilter/nf_tables_api.c:1981
       nf_tables_newchain+0xf68/0x16a0 net/netfilter/nf_tables_api.c:2235
       nfnetlink_rcv_batch+0x83a/0x1610 net/netfilter/nfnetlink.c:433
       nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:543 [inline]
       nfnetlink_rcv+0x3af/0x420 net/netfilter/nfnetlink.c:561
       netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline]
       netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329
       netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:672
       ____sys_sendmsg+0x6bf/0x7e0 net/socket.c:2362
       ___sys_sendmsg+0x100/0x170 net/socket.c:2416
       __sys_sendmsg+0xec/0x1b0 net/socket.c:2449
       do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      To silence it, unregister the NFPROTO_IPV6 hook instead of the
      NFPROTO_INET hook when registering the NFPROTO_IPV4 hook fails.
      Reported-by: syzbot <syzbot+33e06702fd6cffc24c40@syzkaller.appspotmail.com>
      Fixes: d164385e ("netfilter: nat: add inet family nat support")
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Hillf Danton <hdanton@sina.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
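The error-unwinding rule behind this fix (undo exactly what was registered, under the same family it was registered with) can be sketched in userspace. Everything here is hypothetical: `hook_register()`, `hook_unregister()`, `inet_register()` and the `FAM_*` values only model the idea and are not the netfilter API. The `assert()` in `hook_unregister()` is an assumption that loosely stands in for a warning on a mismatched unregister.

```c
#include <assert.h>

enum family { FAM_IPV4, FAM_IPV6, FAM_INET, FAM_MAX };

static int hooks[FAM_MAX];   /* registered hook count per family */
static int fail_ipv4;        /* test knob: make IPv4 registration fail */

int hook_register(enum family f)
{
    if (f == FAM_IPV4 && fail_ipv4)
        return -1;
    hooks[f]++;
    return 0;
}

void hook_unregister(enum family f)
{
    /* Assumption: unregistering a family that was never registered is a bug. */
    assert(hooks[f] > 0);
    hooks[f]--;
}

/* An "inet" registration is an IPv6 registration plus an IPv4 one. */
int inet_register(void)
{
    int ret = hook_register(FAM_IPV6);

    if (ret)
        return ret;
    ret = hook_register(FAM_IPV4);
    if (ret)
        hook_unregister(FAM_IPV6);   /* undo IPv6, not FAM_INET */
    return ret;
}

int demo_failed_register(void)
{
    hooks[FAM_IPV4] = hooks[FAM_IPV6] = hooks[FAM_INET] = 0;
    fail_ipv4 = 1;
    int ret = inet_register();

    return ret == -1 && hooks[FAM_IPV6] == 0 && hooks[FAM_IPV4] == 0;
}
```

Had the error path called `hook_unregister(FAM_INET)` in this model, the assertion would fire and the IPv6 hook would stay registered.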
  13. 16 Apr, 2020 1 commit
    • netfilter: Avoid assigning 'const' pointer to non-const pointer · 514cc55b
      Committed by Will Deacon
      nf_remove_net_hook() uses WRITE_ONCE() to assign a 'const' pointer to a
      'non-const' pointer. Cleanups to the implementation of WRITE_ONCE() mean
      that this will give rise to a compiler warning, just like a plain old
      assignment would do:
      
        | In file included from ./include/linux/export.h:43,
        |                  from ./include/linux/linkage.h:7,
        |                  from ./include/linux/kernel.h:8,
        |                  from net/netfilter/core.c:9:
        | net/netfilter/core.c: In function ‘nf_remove_net_hook’:
        | ./include/linux/compiler.h:216:30: warning: assignment discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
        |   *(volatile typeof(x) *)&(x) = (val);  \
        |                               ^
        | net/netfilter/core.c:379:3: note: in expansion of macro ‘WRITE_ONCE’
        |    WRITE_ONCE(orig_ops[i], &dummy_ops);
        |    ^~~~~~~~~~
      
      Follow the pattern used elsewhere in this file and add a cast to 'void *'
      to squash the warning.
      
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Will Deacon <will@kernel.org>
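The cast described in this commit message can be illustrated with a small userspace sketch. `STORE_ONCE()` is a hypothetical stand-in for the kernel's `WRITE_ONCE()` (it uses the GNU `__typeof__` extension), and `struct hook_ops`, `remove_hook()` and `demo_remove_hook()` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

struct hook_ops {
    int priority;
};

/* Shared placeholder that removed slots point at; never written to. */
static const struct hook_ops dummy_ops = { 0 };

/* Hypothetical stand-in for WRITE_ONCE(): a volatile store through the
 * slot's own type, so qualifier mismatches are diagnosed like a plain
 * assignment. */
#define STORE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Replace slot i with the placeholder. The slot type is a non-const
 * pointer, so the address of the const object is cast to 'void *' to
 * avoid a discarded-qualifier warning. */
void remove_hook(struct hook_ops **slots, size_t i)
{
    assert(slots != NULL);
    STORE_ONCE(slots[i], (void *)&dummy_ops);
}

int demo_remove_hook(void)
{
    struct hook_ops real = { 5 };
    struct hook_ops *slots[2] = { &real, &real };

    remove_hook(slots, 1);
    return slots[0] == &real && slots[1] == &dummy_ops;
}
```

The cast only silences the diagnostic; code that later reads the slot must still treat the placeholder as read-only.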
  14. 15 Apr, 2020 1 commit