1. 01 12月, 2018 7 次提交
  2. 30 11月, 2018 6 次提交
    • C
      net: Prevent invalid access to skb->prev in __qdisc_drop_all · 9410d386
      Christoph Paasch 提交于
      __qdisc_drop_all() accesses skb->prev to get to the tail of the
      segment-list.
      
      With commit 68d2f84a ("net: gro: properly remove skb from list")
      the skb-list handling has been changed to set skb->next to NULL and set
      the list-poison on skb->prev.
      
      With that change, __qdisc_drop_all() will panic when it tries to
      dereference skb->prev.
      
      Since commit 992cba7e ("net: Add and use skb_list_del_init().")
      __list_del_entry is used, leaving skb->prev unchanged (thus,
      pointing to the list-head if it's the first skb of the list).
      This will make __qdisc_drop_all modify the next-pointer of the list-head
      and result in a panic later on:
      
      [   34.501053] general protection fault: 0000 [#1] SMP KASAN PTI
      [   34.501968] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.20.0-rc2.mptcp #108
      [   34.502887] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
      [   34.504074] RIP: 0010:dev_gro_receive+0x343/0x1f90
      [   34.504751] Code: e0 48 c1 e8 03 42 80 3c 30 00 0f 85 4a 1c 00 00 4d 8b 24 24 4c 39 65 d0 0f 84 0a 04 00 00 49 8d 7c 24 38 48 89 f8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 04
      [   34.507060] RSP: 0018:ffff8883af507930 EFLAGS: 00010202
      [   34.507761] RAX: 0000000000000007 RBX: ffff8883970b2c80 RCX: 1ffff11072e165a6
      [   34.508640] RDX: 1ffff11075867008 RSI: ffff8883ac338040 RDI: 0000000000000038
      [   34.509493] RBP: ffff8883af5079d0 R08: ffff8883970b2d40 R09: 0000000000000062
      [   34.510346] R10: 0000000000000034 R11: 0000000000000000 R12: 0000000000000000
      [   34.511215] R13: 0000000000000000 R14: dffffc0000000000 R15: ffff8883ac338008
      [   34.512082] FS:  0000000000000000(0000) GS:ffff8883af500000(0000) knlGS:0000000000000000
      [   34.513036] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   34.513741] CR2: 000055ccc3e9d020 CR3: 00000003abf32000 CR4: 00000000000006e0
      [   34.514593] Call Trace:
      [   34.514893]  <IRQ>
      [   34.515157]  napi_gro_receive+0x93/0x150
      [   34.515632]  receive_buf+0x893/0x3700
      [   34.516094]  ? __netif_receive_skb+0x1f/0x1a0
      [   34.516629]  ? virtnet_probe+0x1b40/0x1b40
      [   34.517153]  ? __stable_node_chain+0x4d0/0x850
      [   34.517684]  ? kfree+0x9a/0x180
      [   34.518067]  ? __kasan_slab_free+0x171/0x190
      [   34.518582]  ? detach_buf+0x1df/0x650
      [   34.519061]  ? lapic_next_event+0x5a/0x90
      [   34.519539]  ? virtqueue_get_buf_ctx+0x280/0x7f0
      [   34.520093]  virtnet_poll+0x2df/0xd60
      [   34.520533]  ? receive_buf+0x3700/0x3700
      [   34.521027]  ? qdisc_watchdog_schedule_ns+0xd5/0x140
      [   34.521631]  ? htb_dequeue+0x1817/0x25f0
      [   34.522107]  ? sch_direct_xmit+0x142/0xf30
      [   34.522595]  ? virtqueue_napi_schedule+0x26/0x30
      [   34.523155]  net_rx_action+0x2f6/0xc50
      [   34.523601]  ? napi_complete_done+0x2f0/0x2f0
      [   34.524126]  ? kasan_check_read+0x11/0x20
      [   34.524608]  ? _raw_spin_lock+0x7d/0xd0
      [   34.525070]  ? _raw_spin_lock_bh+0xd0/0xd0
      [   34.525563]  ? kvm_guest_apic_eoi_write+0x6b/0x80
      [   34.526130]  ? apic_ack_irq+0x9e/0xe0
      [   34.526567]  __do_softirq+0x188/0x4b5
      [   34.527015]  irq_exit+0x151/0x180
      [   34.527417]  do_IRQ+0xdb/0x150
      [   34.527783]  common_interrupt+0xf/0xf
      [   34.528223]  </IRQ>
      
      This patch makes sure that skb->prev is set to NULL when entering
      netem_enqueue.
      
      Cc: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Fixes: 68d2f84a ("net: gro: properly remove skb from list")
      Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9410d386
    • M
      net/x25: handle call collisions · b020fcf6
      Martin Schiller 提交于
      If a session in X25_STATE_1 (Awaiting Call Accept) receives a call
      request, the session will be closed (x25_disconnect), cause=0x01
      (Number Busy) and diag=0x48 (Call Collision) will be set and a clear
      request will be send.
      Signed-off-by: NMartin Schiller <ms@dev.tdt.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b020fcf6
    • M
      net/x25: fix null_x25_address handling · 06137619
      Martin Schiller 提交于
      o x25_find_listener(): the compare for the null_x25_address was wrong.
         We have to check the x25_addr of the listener socket instead of the
         x25_addr of the incomming call.
      
       o x25_bind(): it was not possible to bind a socket to null_x25_address
      Signed-off-by: NMartin Schiller <ms@dev.tdt.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06137619
    • M
      net/x25: fix called/calling length calculation in x25_parse_address_block · d449ba3d
      Martin Schiller 提交于
      The length of the called and calling address was not calculated
      correctly (BCD encoding).
      Signed-off-by: NMartin Schiller <ms@dev.tdt.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d449ba3d
    • S
      net: fix XPS static_key accounting · 867d0ad4
      Sabrina Dubroca 提交于
      Commit 04157469 ("net: Use static_key for XPS maps") introduced a
      static key for XPS, but the increments/decrements don't match.
      
      First, the static key's counter is incremented once for each queue, but
      only decremented once for a whole batch of queues, leading to large
      unbalances.
      
      Second, the xps_rxqs_needed key is decremented whenever we reset a batch
      of queues, whether they had any rxqs mapping or not, so that if we setup
      cpu-XPS on em1 and RXQS-XPS on em2, resetting the queues on em1 would
      decrement the xps_rxqs_needed key.
      
      This reworks the accounting scheme so that the xps_needed key is
      incremented only once for each type of XPS for all the queues on a
      device, and the xps_rxqs_needed key is incremented only once for all
      queues. This is sufficient to let us retrieve queues via
      get_xps_queue().
      
      This patch introduces a new reset_xps_maps(), which reinitializes and
      frees the appropriate map (xps_rxqs_map or xps_cpus_map), and drops a
      reference to the needed keys:
       - both xps_needed and xps_rxqs_needed, in case of rxqs maps,
       - only xps_needed, in case of CPU maps.
      
      Now, we also need to call reset_xps_maps() at the end of
      __netif_set_xps_queue() when there's no active map left, for example
      when writing '00000000,00000000' to all queues' xps_rxqs setting.
      
      Fixes: 04157469 ("net: Use static_key for XPS maps")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      867d0ad4
    • S
      net: restore call to netdev_queue_numa_node_write when resetting XPS · f28c020f
      Sabrina Dubroca 提交于
      Before commit 80d19669 ("net: Refactor XPS for CPUs and Rx queues"),
      netif_reset_xps_queues() did netdev_queue_numa_node_write() for all the
      queues being reset. Now, this is only done when the "active" variable in
      clean_xps_maps() is false, ie when on all the CPUs, there's no active
      XPS mapping left.
      
      Fixes: 80d19669 ("net: Refactor XPS for CPUs and Rx queues")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f28c020f
  3. 28 11月, 2018 3 次提交
    • T
      netfilter: nf_tables: deactivate expressions in rule replecement routine · ca089878
      Taehee Yoo 提交于
      There is no expression deactivation call from the rule replacement path,
      hence, chain counter is not decremented. A few steps to reproduce the
      problem:
      
         %nft add table ip filter
         %nft add chain ip filter c1
         %nft add chain ip filter c1
         %nft add rule ip filter c1 jump c2
         %nft replace rule ip filter c1 handle 3 accept
         %nft flush ruleset
      
      <jump c2> expression means immediate NFT_JUMP to chain c2.
      Reference count of chain c2 is increased when the rule is added.
      
      When rule is deleted or replaced, the reference counter of c2 should be
      decreased via nft_rule_expr_deactivate() which calls
      nft_immediate_deactivate().
      
      Splat looks like:
      [  214.396453] WARNING: CPU: 1 PID: 21 at net/netfilter/nf_tables_api.c:1432 nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
      [  214.398983] Modules linked in: nf_tables nfnetlink
      [  214.398983] CPU: 1 PID: 21 Comm: kworker/1:1 Not tainted 4.20.0-rc2+ #44
      [  214.398983] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
      [  214.398983] RIP: 0010:nf_tables_chain_destroy.isra.38+0x2f9/0x3a0 [nf_tables]
      [  214.398983] Code: 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 8e 00 00 00 48 8b 7b 58 e8 e1 2c 4e c6 48 89 df e8 d9 2c 4e c6 eb 9a <0f> 0b eb 96 0f 0b e9 7e fe ff ff e8 a7 7e 4e c6 e9 a4 fe ff ff e8
      [  214.398983] RSP: 0018:ffff8881152874e8 EFLAGS: 00010202
      [  214.398983] RAX: 0000000000000001 RBX: ffff88810ef9fc28 RCX: ffff8881152876f0
      [  214.398983] RDX: dffffc0000000000 RSI: 1ffff11022a50ede RDI: ffff88810ef9fc78
      [  214.398983] RBP: 1ffff11022a50e9d R08: 0000000080000000 R09: 0000000000000000
      [  214.398983] R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff11022a50eba
      [  214.398983] R13: ffff888114446e08 R14: ffff8881152876f0 R15: ffffed1022a50ed6
      [  214.398983] FS:  0000000000000000(0000) GS:ffff888116400000(0000) knlGS:0000000000000000
      [  214.398983] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  214.398983] CR2: 00007fab9bb5f868 CR3: 000000012aa16000 CR4: 00000000001006e0
      [  214.398983] Call Trace:
      [  214.398983]  ? nf_tables_table_destroy.isra.37+0x100/0x100 [nf_tables]
      [  214.398983]  ? __kasan_slab_free+0x145/0x180
      [  214.398983]  ? nf_tables_trans_destroy_work+0x439/0x830 [nf_tables]
      [  214.398983]  ? kfree+0xdb/0x280
      [  214.398983]  nf_tables_trans_destroy_work+0x5f5/0x830 [nf_tables]
      [ ... ]
      
      Fixes: bb7b40ae ("netfilter: nf_tables: bogus EBUSY in chain deletions")
      Reported by: Christoph Anton Mitterer <calestyo@scientia.net>
      Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=914505
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=201791Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ca089878
    • J
      tipc: fix lockdep warning during node delete · ec835f89
      Jon Maloy 提交于
      We see the following lockdep warning:
      
      [ 2284.078521] ======================================================
      [ 2284.078604] WARNING: possible circular locking dependency detected
      [ 2284.078604] 4.19.0+ #42 Tainted: G            E
      [ 2284.078604] ------------------------------------------------------
      [ 2284.078604] rmmod/254 is trying to acquire lock:
      [ 2284.078604] 00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
      [ 2284.078604]
      [ 2284.078604] but task is already holding lock:
      [ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
      [ 2284.078604]
      [ 2284.078604] which lock already depends on the new lock.
      [ 2284.078604]
      [ 2284.078604]
      [ 2284.078604] the existing dependency chain (in reverse order) is:
      [ 2284.078604]
      [ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
      [ 2284.078604]        tipc_node_timeout+0x20a/0x330 [tipc]
      [ 2284.078604]        call_timer_fn+0xa1/0x280
      [ 2284.078604]        run_timer_softirq+0x1f2/0x4d0
      [ 2284.078604]        __do_softirq+0xfc/0x413
      [ 2284.078604]        irq_exit+0xb5/0xc0
      [ 2284.078604]        smp_apic_timer_interrupt+0xac/0x210
      [ 2284.078604]        apic_timer_interrupt+0xf/0x20
      [ 2284.078604]        default_idle+0x1c/0x140
      [ 2284.078604]        do_idle+0x1bc/0x280
      [ 2284.078604]        cpu_startup_entry+0x19/0x20
      [ 2284.078604]        start_secondary+0x187/0x1c0
      [ 2284.078604]        secondary_startup_64+0xa4/0xb0
      [ 2284.078604]
      [ 2284.078604] -> #0 ((&n->timer)#2){+.-.}:
      [ 2284.078604]        del_timer_sync+0x34/0xa0
      [ 2284.078604]        tipc_node_delete+0x1a/0x40 [tipc]
      [ 2284.078604]        tipc_node_stop+0xcb/0x190 [tipc]
      [ 2284.078604]        tipc_net_stop+0x154/0x170 [tipc]
      [ 2284.078604]        tipc_exit_net+0x16/0x30 [tipc]
      [ 2284.078604]        ops_exit_list.isra.8+0x36/0x70
      [ 2284.078604]        unregister_pernet_operations+0x87/0xd0
      [ 2284.078604]        unregister_pernet_subsys+0x1d/0x30
      [ 2284.078604]        tipc_exit+0x11/0x6f2 [tipc]
      [ 2284.078604]        __x64_sys_delete_module+0x1df/0x240
      [ 2284.078604]        do_syscall_64+0x66/0x460
      [ 2284.078604]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 2284.078604]
      [ 2284.078604] other info that might help us debug this:
      [ 2284.078604]
      [ 2284.078604]  Possible unsafe locking scenario:
      [ 2284.078604]
      [ 2284.078604]        CPU0                    CPU1
      [ 2284.078604]        ----                    ----
      [ 2284.078604]   lock(&(&tn->node_list_lock)->rlock);
      [ 2284.078604]                                lock((&n->timer)#2);
      [ 2284.078604]                                lock(&(&tn->node_list_lock)->rlock);
      [ 2284.078604]   lock((&n->timer)#2);
      [ 2284.078604]
      [ 2284.078604]  *** DEADLOCK ***
      [ 2284.078604]
      [ 2284.078604] 3 locks held by rmmod/254:
      [ 2284.078604]  #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
      [ 2284.078604]  #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
      [ 2284.078604]  #2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
      [...}
      
      The reason is that the node timer handler sometimes needs to delete a
      node which has been disconnected for too long. To do this, it grabs
      the lock 'node_list_lock', which may at the same time be held by the
      generic node cleanup function, tipc_node_stop(), during module removal.
      Since the latter is calling del_timer_sync() inside the same lock, we
      have a potential deadlock.
      
      We fix this letting the timer cleanup function use spin_trylock()
      instead of just spin_lock(), and when it fails to grab the lock it
      just returns so that the timer handler can terminate its execution.
      This is safe to do, since tipc_node_stop() anyway is about to
      delete both the timer and the node instance.
      
      Fixes: 6a939f36 ("tipc: Auto removal of peer down node instance")
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec835f89
    • X
      sctp: increase sk_wmem_alloc when head->truesize is increased · 0d32f177
      Xin Long 提交于
      I changed to count sk_wmem_alloc by skb truesize instead of 1 to
      fix the sk_wmem_alloc leak caused by later truesize's change in
      xfrm in Commit 02968ccf ("sctp: count sk_wmem_alloc by skb
      truesize in sctp_packet_transmit").
      
      But I should have also increased sk_wmem_alloc when head->truesize
      is increased in sctp_packet_gso_append() as xfrm does. Otherwise,
      sctp gso packet will cause sk_wmem_alloc underflow.
      
      Fixes: 02968ccf ("sctp: count sk_wmem_alloc by skb truesize in sctp_packet_transmit")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d32f177
  4. 27 11月, 2018 4 次提交
    • T
      netfilter: nf_conncount: remove wrong condition check routine · 53ca0f2f
      Taehee Yoo 提交于
      All lists that reach the tree_nodes_free() function have both zero
      counter and true dead flag. The reason for this is that lists to be
      release are selected by nf_conncount_gc_list() which already decrements
      the list counter and sets on the dead flag. Therefore, this if statement
      in tree_nodes_free() is unnecessary and wrong.
      
      Fixes: 31568ec0 ("netfilter: nf_conncount: fix list_del corruption in conn_free")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      53ca0f2f
    • T
      netfilter: nat: fix double register in masquerade modules · 095faf45
      Taehee Yoo 提交于
      There is a reference counter to ensure that masquerade modules register
      notifiers only once. However, the existing reference counter approach is
      not safe, test commands are:
      
         while :
         do
         	   modprobe ip6t_MASQUERADE &
      	   modprobe nft_masq_ipv6 &
      	   modprobe -rv ip6t_MASQUERADE &
      	   modprobe -rv nft_masq_ipv6 &
         done
      
      numbers below represent the reference counter.
      --------------------------------------------------------
      CPU0        CPU1        CPU2        CPU3        CPU4
      [insmod]    [insmod]    [rmmod]     [rmmod]     [insmod]
      --------------------------------------------------------
      0->1
      register    1->2
                  returns     2->1
      			returns     1->0
                                                      0->1
                                                      register <--
                                          unregister
      --------------------------------------------------------
      
      The unregistation of CPU3 should be processed before the
      registration of CPU4.
      
      In order to fix this, use a mutex instead of reference counter.
      
      splat looks like:
      [  323.869557] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [modprobe:1381]
      [  323.869574] Modules linked in: nf_tables(+) nf_nat_ipv6(-) nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 n]
      [  323.869574] irq event stamp: 194074
      [  323.898930] hardirqs last  enabled at (194073): [<ffffffff90004a0d>] trace_hardirqs_on_thunk+0x1a/0x1c
      [  323.898930] hardirqs last disabled at (194074): [<ffffffff90004a29>] trace_hardirqs_off_thunk+0x1a/0x1c
      [  323.898930] softirqs last  enabled at (182132): [<ffffffff922006ec>] __do_softirq+0x6ec/0xa3b
      [  323.898930] softirqs last disabled at (182109): [<ffffffff90193426>] irq_exit+0x1a6/0x1e0
      [  323.898930] CPU: 0 PID: 1381 Comm: modprobe Not tainted 4.20.0-rc2+ #27
      [  323.898930] RIP: 0010:raw_notifier_chain_register+0xea/0x240
      [  323.898930] Code: 3c 03 0f 8e f2 00 00 00 44 3b 6b 10 7f 4d 49 bc 00 00 00 00 00 fc ff df eb 22 48 8d 7b 10 488
      [  323.898930] RSP: 0018:ffff888101597218 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13
      [  323.898930] RAX: 0000000000000000 RBX: ffffffffc04361c0 RCX: 0000000000000000
      [  323.898930] RDX: 1ffffffff26132ae RSI: ffffffffc04aa3c0 RDI: ffffffffc04361d0
      [  323.898930] RBP: ffffffffc04361c8 R08: 0000000000000000 R09: 0000000000000001
      [  323.898930] R10: ffff8881015972b0 R11: fffffbfff26132c4 R12: dffffc0000000000
      [  323.898930] R13: 0000000000000000 R14: 1ffff110202b2e44 R15: ffffffffc04aa3c0
      [  323.898930] FS:  00007f813ed41540(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
      [  323.898930] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  323.898930] CR2: 0000559bf2c9f120 CR3: 000000010bc80000 CR4: 00000000001006f0
      [  323.898930] Call Trace:
      [  323.898930]  ? atomic_notifier_chain_register+0x2d0/0x2d0
      [  323.898930]  ? down_read+0x150/0x150
      [  323.898930]  ? sched_clock_cpu+0x126/0x170
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  register_netdevice_notifier+0xbb/0x790
      [  323.898930]  ? __dev_close_many+0x2d0/0x2d0
      [  323.898930]  ? __mutex_unlock_slowpath+0x17f/0x740
      [  323.898930]  ? wait_for_completion+0x710/0x710
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  323.898930]  ? up_write+0x6c/0x210
      [  323.898930]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  324.127073]  ? nf_tables_core_module_init+0xe4/0xe4 [nf_tables]
      [  324.127073]  nft_chain_filter_init+0x1e/0xe8a [nf_tables]
      [  324.127073]  nf_tables_module_init+0x37/0x92 [nf_tables]
      [ ... ]
      
      Fixes: 8dd33cc9 ("netfilter: nf_nat: generalize IPv4 masquerading support for nf_tables")
      Fixes: be6b635c ("netfilter: nf_nat: generalize IPv6 masquerading support for nf_tables")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      095faf45
    • T
      netfilter: add missing error handling code for register functions · 584eab29
      Taehee Yoo 提交于
      register_{netdevice/inetaddr/inet6addr}_notifier may return an error
      value, this patch adds the code to handle these error paths.
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      584eab29
    • A
      netfilter: ipv6: Preserve link scope traffic original oif · 508b0904
      Alin Nastac 提交于
      When ip6_route_me_harder is invoked, it resets outgoing interface of:
        - link-local scoped packets sent by neighbor discovery
        - multicast packets sent by MLD host
        - multicast packets send by MLD proxy daemon that sets outgoing
          interface through IPV6_PKTINFO ipi6_ifindex
      
      Link-local and multicast packets must keep their original oif after
      ip6_route_me_harder is called.
      Signed-off-by: NAlin Nastac <alin.nastac@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      508b0904
  5. 26 11月, 2018 2 次提交
    • F
      netfilter: nfnetlink_cttimeout: fetch timeouts for udplite and gre, too · 89259088
      Florian Westphal 提交于
      syzbot was able to trigger the WARN in cttimeout_default_get() by
      passing UDPLITE as l4protocol.  Alias UDPLITE to UDP, both use
      same timeout values.
      
      Furthermore, also fetch GRE timeouts.  GRE is a bit more complicated,
      as it still can be a module and its netns_proto_gre struct layout isn't
      visible outside of the gre module. Can't move timeouts around, it
      appears conntrack sysctl unregister assumes net_generic() returns
      nf_proto_net, so we get crash. Expose layout of netns_proto_gre instead.
      
      A followup nf-next patch could make gre tracker be built-in as well
      if needed, its not that large.
      
      Last, make the WARN() mention the missing protocol value in case
      anything else is missing.
      
      Reported-by: syzbot+2fae8fa157dd92618cae@syzkaller.appspotmail.com
      Fixes: 8866df92 ("netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      89259088
    • X
      ipvs: call ip_vs_dst_notifier earlier than ipv6_dev_notf · 2a31e4bd
      Xin Long 提交于
      ip_vs_dst_event is supposed to clean up all dst used in ipvs'
      destinations when a net dev is going down. But it works only
      when the dst's dev is the same as the dev from the event.
      
      Now with the same priority but late registration,
      ip_vs_dst_notifier is always called later than ipv6_dev_notf
      where the dst's dev is set to lo for NETDEV_DOWN event.
      
      As the dst's dev lo is not the same as the dev from the event
      in ip_vs_dst_event, ip_vs_dst_notifier doesn't actually work.
      Also as these dst have to wait for dest_trash_timer to clean
      them up. It would cause some non-permanent kernel warnings:
      
        unregister_netdevice: waiting for br0 to become free. Usage count = 3
      
      To fix it, call ip_vs_dst_notifier earlier than ipv6_dev_notf
      by increasing its priority to ADDRCONF_NOTIFY_PRIORITY + 5.
      
      Note that for ipv4 route fib_netdev_notifier doesn't set dst's
      dev to lo in NETDEV_DOWN event, so this fix is only needed when
      IP_VS_IPV6 is defined.
      
      Fixes: 7a4f0761 ("IPVS: init and cleanup restructuring")
      Reported-by: NLi Shuang <shuali@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      2a31e4bd
  6. 25 11月, 2018 2 次提交
    • W
      net: always initialize pagedlen · aba36930
      Willem de Bruijn 提交于
      In ip packet generation, pagedlen is initialized for each skb at the
      start of the loop in __ip(6)_append_data, before label alloc_new_skb.
      
      Depending on compiler options, code can be generated that jumps to
      this label, triggering use of an an uninitialized variable.
      
      In practice, at -O2, the generated code moves the initialization below
      the label. But the code should not rely on that for correctness.
      
      Fixes: 15e36f5b ("udp: paged allocation with gso")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aba36930
    • E
      tcp: address problems caused by EDT misshaps · 9efdda4e
      Eric Dumazet 提交于
      When a qdisc setup including pacing FQ is dismantled and recreated,
      some TCP packets are sent earlier than instructed by TCP stack.
      
      TCP can be fooled when ACK comes back, because the following
      operation can return a negative value.
      
          tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr;
      
      Some paths in TCP stack were not dealing properly with this,
      this patch addresses four of them.
      
      Fixes: ab408b6d ("tcp: switch tcp and sch_fq to new earliest departure time model")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9efdda4e
  7. 24 11月, 2018 4 次提交
    • D
      net/sched: act_police: add missing spinlock initialization · 484afd1b
      Davide Caratti 提交于
      commit f2cbd485 ("net/sched: act_police: fix race condition on state
      variables") introduces a new spinlock, but forgets its initialization.
      Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police'
      action is newly created, to avoid the following lockdep splat:
      
       INFO: trying to register non-static key.
       the code is fine but needs lockdep annotation.
       turning off the locking correctness validator.
       <...>
       Call Trace:
        dump_stack+0x85/0xcb
        register_lock_class+0x581/0x590
        __lock_acquire+0xd4/0x1330
        ? tcf_police_init+0x2fa/0x650 [act_police]
        ? lock_acquire+0x9e/0x1a0
        lock_acquire+0x9e/0x1a0
        ? tcf_police_init+0x2fa/0x650 [act_police]
        ? tcf_police_init+0x55a/0x650 [act_police]
        _raw_spin_lock_bh+0x34/0x40
        ? tcf_police_init+0x2fa/0x650 [act_police]
        tcf_police_init+0x2fa/0x650 [act_police]
        tcf_action_init_1+0x384/0x4c0
        tcf_action_init+0xf6/0x160
        tcf_action_add+0x73/0x170
        tc_ctl_action+0x122/0x160
        rtnetlink_rcv_msg+0x2a4/0x490
        ? netlink_deliver_tap+0x99/0x400
        ? validate_linkmsg+0x370/0x370
        netlink_rcv_skb+0x4d/0x130
        netlink_unicast+0x196/0x230
        netlink_sendmsg+0x2e5/0x3e0
        sock_sendmsg+0x36/0x40
        ___sys_sendmsg+0x280/0x2f0
        ? _raw_spin_unlock+0x24/0x30
        ? handle_pte_fault+0xafe/0xf30
        ? find_held_lock+0x2d/0x90
        ? syscall_trace_enter+0x1df/0x360
        ? __sys_sendmsg+0x5e/0xa0
        __sys_sendmsg+0x5e/0xa0
        do_syscall_64+0x60/0x210
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7f1841c7cf10
       Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
       RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10
       RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003
       RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc
       R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0
       R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000
      
      Fixes: f2cbd485 ("net/sched: act_police: fix race condition on state variables")
      Reported-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      484afd1b
    • P
      net: don't keep lonely packets forever in the gro hash · 605108ac
      Paolo Abeni 提交于
      Eric noted that with UDP GRO and NAPI timeout, we could keep a single
      UDP packet inside the GRO hash forever, if the related NAPI instance
      calls napi_gro_complete() at an higher frequency than the NAPI timeout.
      Willem noted that even TCP packets could be trapped there, till the
      next retransmission.
      This patch tries to address the issue, flushing the old packets -
      those with a NAPI_GRO_CB age before the current jiffy - before scheduling
      the NAPI timeout. The rationale is that such a timeout should be
      well below a jiffy and we are not flushing packets eligible for sane GRO.
      
      v1  -> v2:
       - clarified the commit message and comment
      
      RFC -> v1:
       - added 'Fixes tags', cleaned-up the wording.
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Fixes: 3b47d303 ("net: gro: add a per device gro flush timer")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      605108ac
    • H
      net/ipv6: re-do dad when interface has IFF_NOARP flag change · 896585d4
      Hangbin Liu 提交于
      When we add a new IPv6 address, we should also join corresponding solicited-node
      multicast address, unless the interface has IFF_NOARP flag, as function
      addrconf_join_solict() did. But if we remove IFF_NOARP flag later, we do
      not do dad and add the mcast address. So we will drop corresponding neighbour
      discovery message that came from other nodes.
      
      A typical example is after creating a ipvlan with mode l3, setting up an ipv6
      address and changing the mode to l2. Then we will not be able to ping this
      address as the interface doesn't join related solicited-node mcast address.
      
      Fix it by re-doing dad when interface changed IFF_NOARP flag. Then we will add
      corresponding mcast group and check if there is a duplicate address on the
      network.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      896585d4
    • W
      packet: copy user buffers before orphan or clone · 5cd8d46e
      Willem de Bruijn 提交于
      tpacket_snd sends packets with user pages linked into skb frags. It
      notifies that pages can be reused when the skb is released by setting
      skb->destructor to tpacket_destruct_skb.
      
      This can cause data corruption if the skb is orphaned (e.g., on
      transmit through veth) or cloned (e.g., on mirror to another psock).
      
      Create a kernel-private copy of data in these cases, same as tun/tap
      zerocopy transmission. Reuse that infrastructure: mark the skb as
      SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
      
      Unlike other zerocopy packets, do not set shinfo destructor_arg to
      struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
      when the original skb is released and a timestamp is recorded. Do not
      change this timestamp behavior. The ubuf_info->callback is not needed
      anyway, as no zerocopy notification is expected.
      
      Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
      resulting value is not a valid ubuf_info pointer, nor a valid
      tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
      
      The fix relies on features introduced in commit 52267790 ("sock:
      add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
      
      Tested with from `./in_netns.sh ./txring_overwrite` from
      http://github.com/wdebruij/kerneltools/tests
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Reported-by: NAnand H. Krishnan <anandhkrishnan@gmail.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5cd8d46e
  8. 22 11月, 2018 7 次提交
  9. 21 11月, 2018 2 次提交
  10. 20 11月, 2018 3 次提交
反馈
建议
客服 返回
顶部