1. 29 3月, 2017 6 次提交
  2. 28 3月, 2017 3 次提交
  3. 25 3月, 2017 15 次提交
  4. 24 3月, 2017 3 次提交
  5. 23 3月, 2017 13 次提交
    • E
      inet: frag: release spinlock before calling icmp_send() · ec4fbd64
      Eric Dumazet 提交于
      Dmitry reported a lockdep splat [1] (false positive) that we can fix
      by releasing the spinlock before calling icmp_send() from ip_expire()
      
      This is a false positive because sending an ICMP message can not
      possibly re-enter the IP frag engine.
      
      [1]
      [ INFO: possible circular locking dependency detected ]
      4.10.0+ #29 Not tainted
      -------------------------------------------------------
      modprobe/12392 is trying to acquire lock:
       (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] spin_lock
      include/linux/spinlock.h:299 [inline]
       (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] __netif_tx_lock
      include/linux/netdevice.h:3486 [inline]
       (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>]
      sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
      
      but task is already holding lock:
       (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock
      include/linux/spinlock.h:299 [inline]
       (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>]
      ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&(&q->lock)->rlock){+.-...}:
             validate_chain kernel/locking/lockdep.c:2267 [inline]
             __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
             lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
             __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
             _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
             spin_lock include/linux/spinlock.h:299 [inline]
             ip_defrag+0x3a2/0x4130 net/ipv4/ip_fragment.c:669
             ip_check_defrag+0x4e3/0x8b0 net/ipv4/ip_fragment.c:713
             packet_rcv_fanout+0x282/0x800 net/packet/af_packet.c:1459
             deliver_skb net/core/dev.c:1834 [inline]
             dev_queue_xmit_nit+0x294/0xa90 net/core/dev.c:1890
             xmit_one net/core/dev.c:2903 [inline]
             dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2923
             sch_direct_xmit+0x31f/0x6d0 net/sched/sch_generic.c:182
             __dev_xmit_skb net/core/dev.c:3092 [inline]
             __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
             dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
             neigh_resolve_output+0x6b9/0xb10 net/core/neighbour.c:1308
             neigh_output include/net/neighbour.h:478 [inline]
             ip_finish_output2+0x8b8/0x15a0 net/ipv4/ip_output.c:228
             ip_do_fragment+0x1d93/0x2720 net/ipv4/ip_output.c:672
             ip_fragment.constprop.54+0x145/0x200 net/ipv4/ip_output.c:545
             ip_finish_output+0x82d/0xe10 net/ipv4/ip_output.c:314
             NF_HOOK_COND include/linux/netfilter.h:246 [inline]
             ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
             dst_output include/net/dst.h:486 [inline]
             ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
             ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
             ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
             raw_sendmsg+0x26de/0x3a00 net/ipv4/raw.c:655
             inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
             sock_sendmsg_nosec net/socket.c:633 [inline]
             sock_sendmsg+0xca/0x110 net/socket.c:643
             ___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985
             __sys_sendmmsg+0x25c/0x750 net/socket.c:2075
             SYSC_sendmmsg net/socket.c:2106 [inline]
             SyS_sendmmsg+0x35/0x60 net/socket.c:2101
             do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
             return_from_SYSCALL_64+0x0/0x7a
      
      -> #0 (_xmit_ETHER#2){+.-...}:
             check_prev_add kernel/locking/lockdep.c:1830 [inline]
             check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
             validate_chain kernel/locking/lockdep.c:2267 [inline]
             __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
             lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
             __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
             _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
             spin_lock include/linux/spinlock.h:299 [inline]
             __netif_tx_lock include/linux/netdevice.h:3486 [inline]
             sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
             __dev_xmit_skb net/core/dev.c:3092 [inline]
             __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
             dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
             neigh_hh_output include/net/neighbour.h:468 [inline]
             neigh_output include/net/neighbour.h:476 [inline]
             ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
             ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
             NF_HOOK_COND include/linux/netfilter.h:246 [inline]
             ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
             dst_output include/net/dst.h:486 [inline]
             ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
             ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
             ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
             icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
             icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
             ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
             call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
             expire_timers kernel/time/timer.c:1307 [inline]
             __run_timers+0x960/0xcf0 kernel/time/timer.c:1601
             run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
             __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
             invoke_softirq kernel/softirq.c:364 [inline]
             irq_exit+0x1cc/0x200 kernel/softirq.c:405
             exiting_irq arch/x86/include/asm/apic.h:657 [inline]
             smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
             apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
             __read_once_size include/linux/compiler.h:254 [inline]
             atomic_read arch/x86/include/asm/atomic.h:26 [inline]
             rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
             __rcu_is_watching kernel/rcu/tree.c:1133 [inline]
             rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
             rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
             radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
             filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
             do_fault_around mm/memory.c:3231 [inline]
             do_read_fault mm/memory.c:3265 [inline]
             do_fault+0xbd5/0x2080 mm/memory.c:3370
             handle_pte_fault mm/memory.c:3600 [inline]
             __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
             handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
             __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
             do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
             page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&(&q->lock)->rlock);
                                     lock(_xmit_ETHER#2);
                                     lock(&(&q->lock)->rlock);
        lock(_xmit_ETHER#2);
      
       *** DEADLOCK ***
      
      10 locks held by modprobe/12392:
       #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81329758>]
      __do_page_fault+0x2b8/0xb60 arch/x86/mm/fault.c:1336
       #1:  (rcu_read_lock){......}, at: [<ffffffff8188cab6>]
      filemap_map_pages+0x1e6/0x1570 mm/filemap.c:2324
       #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
      spin_lock include/linux/spinlock.h:299 [inline]
       #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
      pte_alloc_one_map mm/memory.c:2944 [inline]
       #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
      alloc_set_pte+0x13b8/0x1b90 mm/memory.c:3072
       #3:  (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
      lockdep_copy_map include/linux/lockdep.h:175 [inline]
       #3:  (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
      call_timer_fn+0x1c2/0x820 kernel/time/timer.c:1258
       #4:  (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock
      include/linux/spinlock.h:299 [inline]
       #4:  (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>]
      ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201
       #5:  (rcu_read_lock){......}, at: [<ffffffff8389a633>]
      ip_expire+0x1b3/0x6c0 net/ipv4/ip_fragment.c:216
       #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] spin_trylock
      include/linux/spinlock.h:309 [inline]
       #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_xmit_lock
      net/ipv4/icmp.c:219 [inline]
       #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>]
      icmp_send+0x803/0x1c80 net/ipv4/icmp.c:681
       #7:  (rcu_read_lock_bh){......}, at: [<ffffffff838ab9a1>]
      ip_finish_output2+0x2c1/0x15a0 net/ipv4/ip_output.c:198
       #8:  (rcu_read_lock_bh){......}, at: [<ffffffff836d1dee>]
      __dev_queue_xmit+0x23e/0x1e60 net/core/dev.c:3324
       #9:  (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at:
      [<ffffffff836d3a27>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
      
      stack backtrace:
      CPU: 0 PID: 12392 Comm: modprobe Not tainted 4.10.0+ #29
      Hardware name: Google Google Compute Engine/Google Compute Engine,
      BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
       print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204
       check_prev_add kernel/locking/lockdep.c:1830 [inline]
       check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
       validate_chain kernel/locking/lockdep.c:2267 [inline]
       __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
       lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:299 [inline]
       __netif_tx_lock include/linux/netdevice.h:3486 [inline]
       sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
       __dev_xmit_skb net/core/dev.c:3092 [inline]
       __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
       neigh_hh_output include/net/neighbour.h:468 [inline]
       neigh_output include/net/neighbour.h:476 [inline]
       ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
       ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
       dst_output include/net/dst.h:486 [inline]
       ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
       ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
       ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
       icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
       icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
       ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
       call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
       expire_timers kernel/time/timer.c:1307 [inline]
       __run_timers+0x960/0xcf0 kernel/time/timer.c:1601
       run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
       __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364 [inline]
       irq_exit+0x1cc/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:657 [inline]
       smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
       apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
      RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline]
      RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
      RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
      RIP: 0010:__rcu_is_watching kernel/rcu/tree.c:1133 [inline]
      RIP: 0010:rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
      RSP: 0000:ffff8801c391f120 EFLAGS: 00000a03 ORIG_RAX: ffffffffffffff10
      RAX: dffffc0000000000 RBX: ffff8801c391f148 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 000055edd4374000 RDI: ffff8801dbe1ae0c
      RBP: ffff8801c391f1a0 R08: 0000000000000002 R09: 0000000000000000
      R10: dffffc0000000000 R11: 0000000000000002 R12: 1ffff10038723e25
      R13: ffff8801dbe1ae00 R14: ffff8801c391f680 R15: dffffc0000000000
       </IRQ>
       rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
       radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
       filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
       do_fault_around mm/memory.c:3231 [inline]
       do_read_fault mm/memory.c:3265 [inline]
       do_fault+0xbd5/0x2080 mm/memory.c:3370
       handle_pte_fault mm/memory.c:3600 [inline]
       __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
       handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
       __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
       do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
       page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011
      RIP: 0033:0x7f83172f2786
      RSP: 002b:00007fffe859ae80 EFLAGS: 00010293
      RAX: 000055edd4373040 RBX: 00007f83175111c8 RCX: 000055edd4373238
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f8317510970
      RBP: 00007fffe859afd0 R08: 0000000000000009 R09: 0000000000000000
      R10: 0000000000000064 R11: 0000000000000000 R12: 000055edd4373040
      R13: 0000000000000000 R14: 00007fffe859afe8 R15: 0000000000000000
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec4fbd64
    • E
      tcp: initialize icsk_ack.lrcvtime at session start time · 15bb7745
      Eric Dumazet 提交于
      icsk_ack.lrcvtime has a 0 value at socket creation time.
      
      tcpi_last_data_recv can have bogus value if no payload is ever received.
      
      This patch initializes icsk_ack.lrcvtime for active sessions
      in tcp_finish_connect(), and for passive sessions in
      tcp_create_openreq_child()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15bb7745
    • S
      genetlink: fix counting regression on ctrl_dumpfamily() · 1d2a6a5e
      Stanislaw Gruszka 提交于
      Commit 2ae0f17d ("genetlink: use idr to track families") replaced
      
      	if (++n < fams_to_skip)
      		continue;
      into:
      
      	if (n++ < fams_to_skip)
      		continue;
      
      This subtle change cause that on retry ctrl_dumpfamily() call we omit
      one family that failed to do ctrl_fill_info() on previous call, because
      cb->args[0] = n number counts also family that failed to do
      ctrl_fill_info().
      
      Patch fixes the problem and avoid confusion in the future just decrease
      n counter when ctrl_fill_info() fail.
      
      User visible problem caused by this bug is failure to get access to
      some genetlink family i.e. nl80211. However problem is reproducible
      only if number of registered genetlink families is big enough to
      cause second call of ctrl_dumpfamily().
      
      Cc: Xose Vazquez Perez <xose.vazquez@gmail.com>
      Cc: Larry Finger <Larry.Finger@lwfinger.net>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Fixes: 2ae0f17d ("genetlink: use idr to track families")
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      Acked-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d2a6a5e
    • D
      socket, bpf: fix sk_filter use after free in sk_clone_lock · a97e50cc
      Daniel Borkmann 提交于
      In sk_clone_lock(), we create a new socket and inherit most of the
      parent's members via sock_copy() which memcpy()'s various sections.
      Now, in case the parent socket had a BPF socket filter attached,
      then newsk->sk_filter points to the same instance as the original
      sk->sk_filter.
      
      sk_filter_charge() is then called on the newsk->sk_filter to take a
      reference and should that fail due to hitting max optmem, we bail
      out and release the newsk instance.
      
      The issue is that commit 278571ba ("net: filter: simplify socket
      charging") wrongly combined the dismantle path with the failure path
      of xfrm_sk_clone_policy(). This means, even when charging failed, we
      call sk_free_unlock_clone() on the newsk, which then still points to
      the same sk_filter as the original sk.
      
      Thus, sk_free_unlock_clone() calls into __sk_destruct() eventually
      where it tests for present sk_filter and calls sk_filter_uncharge()
      on it, which potentially lets sk_omem_alloc wrap around and releases
      the eBPF prog and sk_filter structure from the (still intact) parent.
      
      Fix it by making sure that when sk_filter_charge() failed, we reset
      newsk->sk_filter back to NULL before passing to sk_free_unlock_clone(),
      so that we don't mess with the parents sk_filter.
      
      Only if xfrm_sk_clone_policy() fails, we did reach the point where
      either the parent's filter was NULL and as a result newsk's as well
      or where we previously had a successful sk_filter_charge(), thus for
      that case, we do need sk_filter_uncharge() to release the prior taken
      reference on sk_filter.
      
      Fixes: 278571ba ("net: filter: simplify socket charging")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a97e50cc
    • J
      net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs. · bbea124b
      Joel Scherpelz 提交于
      This commit adds a new sysctl accept_ra_rt_info_min_plen that
      defines the minimum acceptable prefix length of Route Information
      Options. The new sysctl is intended to be used together with
      accept_ra_rt_info_max_plen to configure a range of acceptable
      prefix lengths. It is useful to prevent misconfigurations from
      unintentionally blackholing too much of the IPv6 address space
      (e.g., home routers announcing RIOs for fc00::/7, which is
      incorrect).
      Signed-off-by: NJoel Scherpelz <jscherpelz@google.com>
      Acked-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbea124b
    • E
      ipv4: provide stronger user input validation in nl_fib_input() · c64c0b3c
      Eric Dumazet 提交于
      Alexander reported a KMSAN splat caused by reads of uninitialized
      field (tb_id_in) from user provided struct fib_result_nl
      
      It turns out nl_fib_input() sanity tests on user input is a bit
      wrong :
      
      User can pretend nlh->nlmsg_len is big enough, but provide
      at sendmsg() time a too small buffer.
      Reported-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c64c0b3c
    • D
      rtnetlink: Add dump all for netconf · a7678c70
      David Ahern 提交于
      Use the rtnl_dump_all to dump all netconf handlers that have been
      registered. Allows userspace to send a dump request for PF_UNSPEC
      and get all families.
      
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7678c70
    • A
      ipv6: make sure to initialize sockc.tsflags before first use · d515684d
      Alexander Potapenko 提交于
      In the case udp_sk(sk)->pending is AF_INET6, udpv6_sendmsg() would
      jump to do_append_data, skipping the initialization of sockc.tsflags.
      Fix the problem by moving sockc.tsflags initialization earlier.
      
      The bug was detected with KMSAN.
      
      Fixes: c14ac945 ("sock: enable timestamping using control messages")
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d515684d
    • R
      net: convert sk_filter.refcnt from atomic_t to refcount_t · 4c355cdf
      Reshetova, Elena 提交于
      refcount_t type and corresponding API should be
      used instead of atomic_t when the variable is used as
      a reference counter. This allows to avoid accidental
      refcounter overflows that might lead to use-after-free
      situations.
      Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
      Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c355cdf
    • Y
      tipc: fix nametbl deadlock at tipc_nametbl_unsubscribe · 557d054c
      Ying Xue 提交于
      Until now, tipc_nametbl_unsubscribe() is called at subscriptions
      reference count cleanup. Usually the subscriptions cleanup is
      called at subscription timeout or at subscription cancel or at
      subscriber delete.
      
      We have ignored the possibility of this being called from other
      locations, which causes deadlock as we try to grab the
      tn->nametbl_lock while holding it already.
      
         CPU1:                             CPU2:
      ----------                     ----------------
      tipc_nametbl_publish
      spin_lock_bh(&tn->nametbl_lock)
      tipc_nametbl_insert_publ
      tipc_nameseq_insert_publ
      tipc_subscrp_report_overlap
      tipc_subscrp_get
      tipc_subscrp_send_event
                                   tipc_close_conn
                                   tipc_subscrb_release_cb
                                   tipc_subscrb_delete
                                   tipc_subscrp_put
      tipc_subscrp_put
      tipc_subscrp_kref_release
      tipc_nametbl_unsubscribe
      spin_lock_bh(&tn->nametbl_lock)
      <<grab nametbl_lock again>>
      
         CPU1:                              CPU2:
      ----------                     ----------------
      tipc_nametbl_stop
      spin_lock_bh(&tn->nametbl_lock)
      tipc_purge_publications
      tipc_nameseq_remove_publ
      tipc_subscrp_report_overlap
      tipc_subscrp_get
      tipc_subscrp_send_event
                                   tipc_close_conn
                                   tipc_subscrb_release_cb
                                   tipc_subscrb_delete
                                   tipc_subscrp_put
      tipc_subscrp_put
      tipc_subscrp_kref_release
      tipc_nametbl_unsubscribe
      spin_lock_bh(&tn->nametbl_lock)
      <<grab nametbl_lock again>>
      
      In this commit, we advance the calling of tipc_nametbl_unsubscribe()
      from the refcount cleanup to the intended callers.
      
      Fixes: d094c4d5 ("tipc: add subscription refcount to avoid invalid delete")
      Reported-by: NJohn Thompson <thompa.atl@gmail.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      557d054c
    • G
      net: tcp: Permit user set TCP_MAXSEG to default value · cfc62d87
      Gao Feng 提交于
      When user_mss is zero, it means use the default value. But the current
      codes don't permit user set TCP_MAXSEG to the default value.
      It would return the -EINVAL when val is zero.
      Signed-off-by: NGao Feng <fgao@ikuai8.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfc62d87
    • A
      Openvswitch: Refactor sample and recirc actions implementation · bef7f756
      andy zhou 提交于
      Added clone_execute() that both the sample and the recirc
      action implementation can use.
      Signed-off-by: NAndy Zhou <azhou@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bef7f756
    • A
      openvswitch: Optimize sample action for the clone use cases · 798c1661
      andy zhou 提交于
      With the introduction of open flow 'clone' action, the OVS user space
      can now translate the 'clone' action into kernel datapath 'sample'
      action, with 100% probability, to ensure that the clone semantics,
      which is that the packet seen by the clone action is the same as the
      packet seen by the action after clone, is faithfully carried out
      in the datapath.
      
      While the sample action in the datpath has the matching semantics,
      its implementation is only optimized for its original use.
      Specifically, there are two limitation: First, there is a 3 level of
      nesting restriction, enforced at the flow downloading time. This
      limit turns out to be too restrictive for the 'clone' use case.
      Second, the implementation avoid recursive call only if the sample
      action list has a single userspace action.
      
      The main optimization implemented in this series removes the static
      nesting limit check, instead, implement the run time recursion limit
      check, and recursion avoidance similar to that of the 'recirc' action.
      This optimization solve both #1 and #2 issues above.
      
      One related optimization attempts to avoid copying flow key as
      long as the actions enclosed does not change the flow key. The
      detection is performed only once at the flow downloading time.
      
      Another related optimization is to rewrite the action list
      at flow downloading time in order to save the fast path from parsing
      the sample action list in its original form repeatedly.
      Signed-off-by: NAndy Zhou <azhou@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      798c1661