1. 11 2月, 2017 4 次提交
  2. 09 2月, 2017 1 次提交
  3. 08 2月, 2017 3 次提交
  4. 07 2月, 2017 1 次提交
  5. 06 2月, 2017 1 次提交
  6. 04 2月, 2017 4 次提交
    • E
      net: skb_needs_check() accepts CHECKSUM_NONE for tx · 6e7bc478
      Eric Dumazet 提交于
      My recent change missed fact that UFO would perform a complete
      UDP checksum before segmenting in frags.
      
      In this case skb->ip_summed is set to CHECKSUM_NONE.
      
      We need to add this valid case to skb_needs_check()
      
      Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e7bc478
    • E
      net: remove support for per driver ndo_busy_poll() · 79e7fff4
      Eric Dumazet 提交于
      We added generic support for busy polling in NAPI layer in linux-4.5
      
      No network driver uses ndo_busy_poll() anymore, we can get rid
      of the pointer in struct net_device_ops, and its use in sk_busy_loop()
      
      Saves NETIF_F_BUSY_POLL features bit.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79e7fff4
    • E
      net: use a work queue to defer net_disable_timestamp() work · 5fa8bbda
      Eric Dumazet 提交于
      Dmitry reported a warning [1] showing that we were calling
      net_disable_timestamp() -> static_key_slow_dec() from a non
      process context.
      
      Grabbing a mutex while holding a spinlock or rcu_read_lock()
      is not allowed.
      
      As Cong suggested, we now use a work queue.
      
      It is possible netstamp_clear() exits while netstamp_needed_deferred
      is not zero, but it is probably not worth trying to do better than that.
      
      netstamp_needed_deferred atomic tracks the exact number of deferred
      decrements.
      
      [1]
      [ INFO: suspicious RCU usage. ]
      4.10.0-rc5+ #192 Not tainted
      -------------------------------
      ./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
      critical section!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 0
      2 locks held by syz-executor14/23111:
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock
      include/net/sock.h:1454 [inline]
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>]
      rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
       #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook
      include/linux/netfilter.h:201 [inline]
       #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>]
      __ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160
      
      stack backtrace:
      CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
       rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
       ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
       __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
       mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
       atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
       __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
       static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
       net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
       sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
       __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sock_wfree+0xae/0x120 net/core/sock.c:1645
       skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put include/net/inet_frag.h:133 [inline]
       nf_ct_frag6_gather+0x1106/0x3840
      net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook include/linux/netfilter.h:212 [inline]
       __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
       rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x600 net/socket.c:848
       do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
       do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
       vfs_writev+0x87/0xc0 fs/read_write.c:911
       do_writev+0x110/0x2c0 fs/read_write.c:944
       SYSC_writev fs/read_write.c:1017 [inline]
       SyS_writev+0x27/0x30 fs/read_write.c:1014
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x445559
      RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
      RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
      RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
      R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
      BUG: sleeping function called from invalid context at
      kernel/locking/mutex.c:752
      in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
      INFO: lockdep is turned off.
      CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
       __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
       mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
       atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
       __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
       static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
       net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
       sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
       __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sock_wfree+0xae/0x120 net/core/sock.c:1645
       skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put include/net/inet_frag.h:133 [inline]
       nf_ct_frag6_gather+0x1106/0x3840
      net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook include/linux/netfilter.h:212 [inline]
       __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
       rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x600 net/socket.c:848
       do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
       do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
       vfs_writev+0x87/0xc0 fs/read_write.c:911
       do_writev+0x110/0x2c0 fs/read_write.c:944
       SYSC_writev fs/read_write.c:1017 [inline]
       SyS_writev+0x27/0x30 fs/read_write.c:1014
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x445559
      
      Fixes: b90e5794 ("net: dont call jump_label_dec from irq context")
      Suggested-by: NCong Wang <xiyou.wangcong@gmail.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5fa8bbda
    • S
      ethtool: do not vzalloc(0) on registers dump · 3808d348
      Stanislaw Gruszka 提交于
      If ->get_regs_len() callback return 0, we allocate 0 bytes of memory,
      what print ugly warning in dmesg, which can be found further below.
      
      This happen on mac80211 devices where ieee80211_get_regs_len() just
      return 0 and driver only fills ethtool_regs structure and actually
      do not provide any dump. However I assume this can happen on other
      drivers i.e. when for some devices driver provide regs dump and for
      others do not. Hence preventing to to print warning in ethtool code
      seems to be reasonable.
      
      ethtool: vmalloc: allocation failure: 0 bytes, mode:0x24080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO)
      <snip>
      Call Trace:
      [<ffffffff813bde47>] dump_stack+0x63/0x8c
      [<ffffffff811b0a1f>] warn_alloc+0x13f/0x170
      [<ffffffff811f0476>] __vmalloc_node_range+0x1e6/0x2c0
      [<ffffffff811f0874>] vzalloc+0x54/0x60
      [<ffffffff8169986c>] dev_ethtool+0xb4c/0x1b30
      [<ffffffff816adbb1>] dev_ioctl+0x181/0x520
      [<ffffffff816714d2>] sock_do_ioctl+0x42/0x50
      <snip>
      Mem-Info:
      active_anon:435809 inactive_anon:173951 isolated_anon:0
       active_file:835822 inactive_file:196932 isolated_file:0
       unevictable:0 dirty:8 writeback:0 unstable:0
       slab_reclaimable:157732 slab_unreclaimable:10022
       mapped:83042 shmem:306356 pagetables:9507 bounce:0
       free:130041 free_pcp:1080 free_cma:0
      Node 0 active_anon:1743236kB inactive_anon:695804kB active_file:3343288kB inactive_file:787728kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:332168kB dirty:32kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1225424kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
      Node 0 DMA free:15900kB min:136kB low:168kB high:200kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
      lowmem_reserve[]: 0 3187 7643 7643
      Node 0 DMA32 free:419732kB min:28124kB low:35152kB high:42180kB active_anon:541180kB inactive_anon:248988kB active_file:1466388kB inactive_file:389632kB unevictable:0kB writepending:0kB present:3370280kB managed:3290932kB mlocked:0kB slab_reclaimable:217184kB slab_unreclaimable:4180kB kernel_stack:160kB pagetables:984kB bounce:0kB free_pcp:2236kB local_pcp:660kB free_cma:0kB
      lowmem_reserve[]: 0 0 4456 4456
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3808d348
  7. 03 2月, 2017 2 次提交
  8. 02 2月, 2017 3 次提交
    • F
      skbuff: add and use skb_nfct helper · cb9c6836
      Florian Westphal 提交于
      Followup patch renames skb->nfct and changes its type so add a helper to
      avoid intrusive rename change later.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      cb9c6836
    • E
      net: reduce skb_warn_bad_offload() noise · b2504a5d
      Eric Dumazet 提交于
      Dmitry reported warnings occurring in __skb_gso_segment() [1]
      
      All SKB_GSO_DODGY producers can allow user space to feed
      packets that trigger the current check.
      
      We could prevent them from doing so, rejecting packets, but
      this might add regressions to existing programs.
      
      It turns out our SKB_GSO_DODGY handlers properly set up checksum
      information that is needed anyway when packets needs to be segmented.
      
      By checking again skb_needs_check() after skb_mac_gso_segment(),
      we should remove these pesky warnings, at a very minor cost.
      
      With help from Willem de Bruijn
      
      [1]
      WARNING: CPU: 1 PID: 6768 at net/core/dev.c:2439 skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
      lo: caps=(0x000000a2803b7c69, 0x0000000000000000) len=138 data_len=0 gso_size=15883 gso_type=4 ip_summed=0
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 6768 Comm: syz-executor1 Not tainted 4.9.0 #5
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       ffff8801c063ecd8 ffffffff82346bdf ffffffff00000001 1ffff100380c7d2e
       ffffed00380c7d26 0000000041b58ab3 ffffffff84b37e38 ffffffff823468f1
       ffffffff84820740 ffffffff84f289c0 dffffc0000000000 ffff8801c063ee20
      Call Trace:
       [<ffffffff82346bdf>] __dump_stack lib/dump_stack.c:15 [inline]
       [<ffffffff82346bdf>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       [<ffffffff81827e34>] panic+0x1fb/0x412 kernel/panic.c:179
       [<ffffffff8141f704>] __warn+0x1c4/0x1e0 kernel/panic.c:542
       [<ffffffff8141f7e5>] warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:565
       [<ffffffff8356cbaf>] skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
       [<ffffffff83585cd2>] __skb_gso_segment+0x482/0x780 net/core/dev.c:2706
       [<ffffffff83586f19>] skb_gso_segment include/linux/netdevice.h:3985 [inline]
       [<ffffffff83586f19>] validate_xmit_skb+0x5c9/0xc20 net/core/dev.c:2969
       [<ffffffff835892bb>] __dev_queue_xmit+0xe6b/0x1e70 net/core/dev.c:3383
       [<ffffffff8358a2d7>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
       [<ffffffff83ad161d>] packet_snd net/packet/af_packet.c:2930 [inline]
       [<ffffffff83ad161d>] packet_sendmsg+0x32ed/0x4d30 net/packet/af_packet.c:2955
       [<ffffffff834f0aaa>] sock_sendmsg_nosec net/socket.c:621 [inline]
       [<ffffffff834f0aaa>] sock_sendmsg+0xca/0x110 net/socket.c:631
       [<ffffffff834f329a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1954
       [<ffffffff834f5e58>] __sys_sendmsg+0x138/0x300 net/socket.c:1988
       [<ffffffff834f604d>] SYSC_sendmsg net/socket.c:1999 [inline]
       [<ffffffff834f604d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995
       [<ffffffff84371941>] entry_SYSCALL_64_fastpath+0x1f/0xc2
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov  <dvyukov@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b2504a5d
    • T
      rtnetlink: Handle IFLA_MASTER parameter when processing rtnl_newlink · 160ca014
      Theuns Verwoerd 提交于
      Allow a master interface to be specified as one of the parameters when
      creating a new interface via rtnl_newlink.  Previously this would
      require invoking interface creation, waiting for it to complete, and
      then separately binding that new interface to a master.
      
      In particular, this is used when creating a macvlan child interface for
      VRRP in a VRF configuration, allowing the interface creator to specify
      directly what master interface should be inherited by the child,
      without having to deal with asynchronous complications and potential
      race conditions.
      Signed-off-by: NTheuns Verwoerd <theuns.verwoerd@alliedtelesis.co.nz>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      160ca014
  9. 01 2月, 2017 1 次提交
    • A
      net: ethtool: convert large order kmalloc allocations to vzalloc · 4d1ceea8
      Alexei Starovoitov 提交于
      under memory pressure 'ethtool -S' command may warn:
      [ 2374.385195] ethtool: page allocation failure: order:4, mode:0x242c0c0
      [ 2374.405573] CPU: 12 PID: 40211 Comm: ethtool Not tainted
      [ 2374.423071] Call Trace:
      [ 2374.423076]  [<ffffffff8148cb29>] dump_stack+0x4d/0x64
      [ 2374.423080]  [<ffffffff811667cb>] warn_alloc_failed+0xeb/0x150
      [ 2374.423082]  [<ffffffff81169cd3>] ? __alloc_pages_direct_compact+0x43/0xf0
      [ 2374.423084]  [<ffffffff8116a25c>] __alloc_pages_nodemask+0x4dc/0xbf0
      [ 2374.423091]  [<ffffffffa0023dc2>] ? cmd_exec+0x722/0xcd0 [mlx5_core]
      [ 2374.423095]  [<ffffffff811b3dcc>] alloc_pages_current+0x8c/0x110
      [ 2374.423097]  [<ffffffff81168859>] alloc_kmem_pages+0x19/0x90
      [ 2374.423099]  [<ffffffff81186e5e>] kmalloc_order_trace+0x2e/0xe0
      [ 2374.423101]  [<ffffffff811c0084>] __kmalloc+0x204/0x220
      [ 2374.423105]  [<ffffffff816c269e>] dev_ethtool+0xe4e/0x1f80
      [ 2374.423106]  [<ffffffff816b967e>] ? dev_get_by_name_rcu+0x5e/0x80
      [ 2374.423108]  [<ffffffff816d6926>] dev_ioctl+0x156/0x560
      [ 2374.423111]  [<ffffffff811d4c68>] ? mem_cgroup_commit_charge+0x78/0x3c0
      [ 2374.423117]  [<ffffffff8169d542>] sock_do_ioctl+0x42/0x50
      [ 2374.423119]  [<ffffffff8169d9c3>] sock_ioctl+0x1b3/0x250
      [ 2374.423121]  [<ffffffff811f0f42>] do_vfs_ioctl+0x92/0x580
      [ 2374.423123]  [<ffffffff8100222b>] ? do_audit_syscall_entry+0x4b/0x70
      [ 2374.423124]  [<ffffffff8100287c>] ? syscall_trace_enter_phase1+0xfc/0x120
      [ 2374.423126]  [<ffffffff811f14a9>] SyS_ioctl+0x79/0x90
      [ 2374.423127]  [<ffffffff81002bb0>] do_syscall_64+0x50/0xa0
      [ 2374.423129]  [<ffffffff817e19bc>] entry_SYSCALL64_slow_path+0x25/0x25
      
      ~1160 mlx5 counters ~= order 4 allocation which is unlikely to succeed
      under memory pressure. Convert them to vzalloc() as ethtool_get_regs() does.
      Also take care of drivers without counters similar to
      commit 67ae7cf1 ("ethtool: Allow zero-length register dumps again")
      and reduce warn_on to warn_on_once.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d1ceea8
  10. 31 1月, 2017 1 次提交
  11. 30 1月, 2017 1 次提交
  12. 28 1月, 2017 1 次提交
    • E
      net: adjust skb->truesize in pskb_expand_head() · 158f323b
      Eric Dumazet 提交于
      Slava Shwartsman reported a warning in skb_try_coalesce(), when we
      detect skb->truesize is completely wrong.
      
      In his case, issue came from IPv6 reassembly coping with malicious
      datagrams, that forced various pskb_may_pull() to reallocate a bigger
      skb->head than the one allocated by NIC driver before entering GRO
      layer.
      
      Current code does not change skb->truesize, leaving this burden to
      callers if they care enough.
      
      Blindly changing skb->truesize in pskb_expand_head() is not
      easy, as some producers might track skb->truesize, for example
      in xmit path for back pressure feedback (sk->sk_wmem_alloc)
      
      We can detect the cases where it should be safe to change
      skb->truesize :
      
      1) skb is not attached to a socket.
      2) If it is attached to a socket, destructor is sock_edemux()
      
      My audit gave only two callers doing their own skb->truesize
      manipulation.
      
      I had to remove skb parameter in sock_edemux macro when
      CONFIG_INET is not set to avoid a compile error.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NSlava Shwartsman <slavash@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      158f323b
  13. 25 1月, 2017 5 次提交
    • R
      lwtunnel: Fix oops on state free after encap module unload · 85c81401
      Robert Shearman 提交于
      When attempting to free lwtunnel state after the module for the encap
      has been unloaded an oops occurs:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      IP: lwtstate_free+0x18/0x40
      [..]
      task: ffff88003e372380 task.stack: ffffc900001fc000
      RIP: 0010:lwtstate_free+0x18/0x40
      RSP: 0018:ffff88003fd83e88 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff88002bbb3380 RCX: ffff88000c91a300
      [..]
      Call Trace:
       <IRQ>
       free_fib_info_rcu+0x195/0x1a0
       ? rt_fibinfo_free+0x50/0x50
       rcu_process_callbacks+0x2d3/0x850
       ? rcu_process_callbacks+0x296/0x850
       __do_softirq+0xe4/0x4cb
       irq_exit+0xb0/0xc0
       smp_apic_timer_interrupt+0x3d/0x50
       apic_timer_interrupt+0x93/0xa0
      [..]
      Code: e8 6e c6 fc ff 89 d8 5b 5d c3 bb de ff ff ff eb f4 66 90 66 66 66 66 90 55 48 89 e5 53 0f b7 07 48 89 fb 48 8b 04 c5 00 81 d5 81 <48> 8b 40 08 48 85 c0 74 13 ff d0 48 8d 7b 20 be 20 00 00 00 e8
      
      The problem is after the module for the encap can be unloaded the
      corresponding ops is removed and is thus NULL here.
      
      Modules implementing lwtunnel ops should not be allowed to unload
      while there is state alive using those ops, so grab the module
      reference for the ops on creating lwtunnel state and of course release
      the reference when freeing the state.
      
      Fixes: 1104d9ba ("lwtunnel: Add destroy state operation")
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85c81401
    • R
      net: Specify the owning module for lwtunnel ops · 88ff7334
      Robert Shearman 提交于
      Modules implementing lwtunnel ops should not be allowed to unload
      while there is state alive using those ops, so specify the owning
      module for all lwtunnel ops.
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88ff7334
    • D
      bpf: allow option for setting bpf_l4_csum_replace from scratch · d1b662ad
      Daniel Borkmann 提交于
      When programs need to calculate the csum from scratch for small UDP
      packets and use bpf_l4_csum_replace() to feed the result from helpers
      like bpf_csum_diff(), then we need a flag besides BPF_F_MARK_MANGLED_0
      that would ignore the case of current csum being 0, and which would
      still allow for the helper to set the csum and transform when needed
      to CSUM_MANGLED_0.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1b662ad
    • D
      bpf: enable load bytes helper for filter/reuseport progs · 2492d3b8
      Daniel Borkmann 提交于
      BPF_PROG_TYPE_SOCKET_FILTER are used in various facilities such as
      for SO_REUSEPORT and packet fanout demuxing, packet filtering, kcm,
      etc, and yet the only facility they can use is BPF_LD with {BPF_ABS,
      BPF_IND} for single byte/half/word access.
      
      Direct packet access is only restricted to tc programs right now,
      but we can still facilitate usage by allowing skb_load_bytes() helper
      added back then in 05c74e5e ("bpf: add bpf_skb_load_bytes helper")
      that calls skb_header_pointer() similarly to bpf_load_pointer(), but
      for stack buffers with larger access size.
      
      Name the previous sk_filter_func_proto() as bpf_base_func_proto()
      since this is used everywhere else as well, similarly for the ctx
      converter, that is, bpf_convert_ctx_access().
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2492d3b8
    • D
      bpf: simplify __is_valid_access test on cb · 4faf940d
      Daniel Borkmann 提交于
      The __is_valid_access() test for cb[] from 62c7989b ("bpf: allow
      b/h/w/dw access for bpf's cb in ctx") was done unnecessarily complex,
      we can just simplify it the same way as recent fix from 2d071c64
      ("bpf, trace: make ctx access checks more robust") did. Overflow can
      never happen as size is 1/2/4/8 depending on access.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4faf940d
  14. 21 1月, 2017 2 次提交
  15. 19 1月, 2017 3 次提交
    • D
      lwtunnel: fix autoload of lwt modules · 9ed59592
      David Ahern 提交于
      Trying to add an mpls encap route when the MPLS modules are not loaded
      hangs. For example:
      
          CONFIG_MPLS=y
          CONFIG_NET_MPLS_GSO=m
          CONFIG_MPLS_ROUTING=m
          CONFIG_MPLS_IPTUNNEL=m
      
          $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
      The ip command hangs:
      root       880   826  0 21:25 pts/0    00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
          $ cat /proc/880/stack
          [<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134
          [<ffffffff81065efc>] __request_module+0x27b/0x30a
          [<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178
          [<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4
          [<ffffffff814ae451>] fib_table_insert+0x90/0x41f
          [<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52
          ...
      
      modprobe is trying to load rtnl-lwt-MPLS:
      
      root       881     5  0 21:25 ?        00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS
      
      and it hangs after loading mpls_router:
      
          $ cat /proc/881/stack
          [<ffffffff81441537>] rtnl_lock+0x12/0x14
          [<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179
          [<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router]
          [<ffffffff81000471>] do_one_initcall+0x8e/0x13f
          [<ffffffff81119961>] do_init_module+0x5a/0x1e5
          [<ffffffff810bd070>] load_module+0x13bd/0x17d6
          ...
      
      The problem is that lwtunnel_build_state is called with rtnl lock
      held preventing mpls_init from registering.
      
      Given the potential references held by the time lwtunnel_build_state it
      can not drop the rtnl lock to the load module. So, extract the module
      loading code from lwtunnel_build_state into a new function to validate
      the encap type. The new function is called while converting the user
      request into a fib_config which is well before any table, device or
      fib entries are examined.
      
      Fixes: 745041e2 ("lwtunnel: autoload of lwt modules")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ed59592
    • E
      net: fix harmonize_features() vs NETIF_F_HIGHDMA · 7be2c82c
      Eric Dumazet 提交于
      Ashizuka reported a highmem oddity and sent a patch for freescale
      fec driver.
      
      But the problem root cause is that core networking stack
      must ensure no skb with highmem fragment is ever sent through
      a device that does not assert NETIF_F_HIGHDMA in its features.
      
      We need to call illegal_highdma() from harmonize_features()
      regardless of CSUM checks.
      
      Fixes: ec5f0615 ("net: Kill link between CSUM and SG features.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Reported-by: N"Ashizuka, Yuusuke" <ashiduka@jp.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7be2c82c
    • E
      net: ethtool: Initialize buffer when querying device channel settings · 31a86d13
      Eran Ben Elisha 提交于
      Ethtool channels respond struct was uninitialized when querying device
      channel boundaries settings. As a result, unreported fields by the driver
      hold garbage.  This may cause sending unsupported params to driver.
      
      Fixes: 8bf36862 ('ethtool: ensure channel counts are within bounds ...')
      Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      CC: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31a86d13
  16. 18 1月, 2017 1 次提交
    • R
      net: AF-specific RTM_GETSTATS attributes · aefb4d4a
      Robert Shearman 提交于
      Add the functionality for including address-family-specific per-link
      stats in RTM_GETSTATS messages. This is done through adding a new
      IFLA_STATS_AF_SPEC attribute under which address family attributes are
      nested and then the AF-specific attributes can be further nested. This
      follows the model of IFLA_AF_SPEC on RTM_*LINK messages and it has the
      advantage of presenting an easily extended hierarchy. The rtnl_af_ops
      structure is extended to provide AFs with the opportunity to fill and
      provide the size of their stats attributes.
      
      One alternative would have been to provide AFs with the ability to add
      attributes directly into the RTM_GETSTATS message without a nested
      hierarchy. I discounted this approach as it increases the rate at
      which the 32 attribute number space is used up and it makes
      implementation a little more tricky for stats dump resuming (at the
      moment the order in which attributes are added to the message has to
      match the numeric order of the attributes).
      
      Another alternative would have been to register per-AF RTM_GETSTATS
      handlers. I discounted this approach as I perceived a common use-case
      to be getting all the stats for an interface and this approach would
      necessitate multiple requests/dumps to retrieve them all.
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aefb4d4a
  17. 17 1月, 2017 1 次提交
  18. 13 1月, 2017 1 次提交
    • E
      secure_seq: fix sparse errors · c1ce1560
      Eric Dumazet 提交于
      Fixes following warnings :
      
      net/core/secure_seq.c:125:28: warning: incorrect type in argument 1
      (different base types)
      net/core/secure_seq.c:125:28:    expected unsigned int const [unsigned]
      [usertype] a
      net/core/secure_seq.c:125:28:    got restricted __be32 [usertype] saddr
      net/core/secure_seq.c:125:35: warning: incorrect type in argument 2
      (different base types)
      net/core/secure_seq.c:125:35:    expected unsigned int const [unsigned]
      [usertype] b
      net/core/secure_seq.c:125:35:    got restricted __be32 [usertype] daddr
      net/core/secure_seq.c:125:43: warning: cast from restricted __be16
      net/core/secure_seq.c:125:61: warning: restricted __be16 degrades to
      integer
      
      Fixes: 7cd23e53 ("secure_seq: use SipHash in place of MD5")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c1ce1560
  19. 12 1月, 2017 4 次提交