1. 18 2月, 2017 3 次提交
    • D
      rtnl: don't account unused struct ifla_port_vsi in rtnl_port_size · 025331df
      Daniel Borkmann 提交于
      When allocating rtnl dump messages, struct ifla_port_vsi is never dumped,
      so we can save header plus payload in rtnl_port_size(). Infact, attribute
      IFLA_PORT_VSI_TYPE and struct ifla_port_vsi are not used anywhere in
      the kernel. We only need to keep the nla policy should applications in
      user space be filling this out. Same NLA_BINARY issue exists as was fixed
      in 364d5716 ("rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY")
      and others, but then again IFLA_PORT_VSI_TYPE is not used anywhere, so
      just add a comment that it's unused.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      025331df
    • D
      bpf: make jited programs visible in traces · 74451e66
      Daniel Borkmann 提交于
      Long standing issue with JITed programs is that stack traces from
      function tracing check whether a given address is kernel code
      through {__,}kernel_text_address(), which checks for code in core
      kernel, modules and dynamically allocated ftrace trampolines. But
      what is still missing is BPF JITed programs (interpreted programs
      are not an issue as __bpf_prog_run() will be attributed to them),
      thus when a stack trace is triggered, the code walking the stack
      won't see any of the JITed ones. The same for address correlation
      done from user space via reading /proc/kallsyms. This is read by
      tools like perf, but the latter is also useful for permanent live
      tracing with eBPF itself in combination with stack maps when other
      eBPF types are part of the callchain. See offwaketime example on
      dumping stack from a map.
      
      This work tries to tackle that issue by making the addresses and
      symbols known to the kernel. The lookup from *kernel_text_address()
      is implemented through a latched RB tree that can be read under
      RCU in fast-path that is also shared for symbol/size/offset lookup
      for a specific given address in kallsyms. The slow-path iteration
      through all symbols in the seq file done via RCU list, which holds
      a tiny fraction of all exported ksyms, usually below 0.1 percent.
      Function symbols are exported as bpf_prog_<tag>, in order to aide
      debugging and attribution. This facility is currently enabled for
      root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening
      is active in any mode. The rationale behind this is that still a lot
      of systems ship with world read permissions on kallsyms thus addresses
      should not get suddenly exposed for them. If that situation gets
      much better in future, we always have the option to change the
      default on this. Likewise, unprivileged programs are not allowed
      to add entries there either, but that is less of a concern as most
      such programs types relevant in this context are for root-only anyway.
      If enabled, call graphs and stack traces will then show a correct
      attribution; one example is illustrated below, where the trace is
      now visible in tooling such as perf script --kallsyms=/proc/kallsyms
      and friends.
      
      Before:
      
        7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
      
      After:
      
        7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
        [...]
        7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
        7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
               f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74451e66
    • D
      bpf: mark all registered map/prog types as __ro_after_init · c78f8bdf
      Daniel Borkmann 提交于
      All map types and prog types are registered to the BPF core through
      bpf_register_map_type() and bpf_register_prog_type() during init and
      remain unchanged thereafter. As by design we don't (and never will)
      have any pluggable code that can register to that at any later point
      in time, lets mark all the existing bpf_{map,prog}_type_list objects
      in the tree as __ro_after_init, so they can be moved to read-only
      section from then onwards.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c78f8bdf
  2. 16 2月, 2017 1 次提交
    • M
      net: neigh: Fix netevent NETEVENT_DELAY_PROBE_TIME_UPDATE notification · 7627ae60
      Marcus Huewe 提交于
      When setting a neigh related sysctl parameter, we always send a
      NETEVENT_DELAY_PROBE_TIME_UPDATE netevent. For instance, when
      executing
      
      	sysctl net.ipv6.neigh.wlp3s0.retrans_time_ms=2000
      
      a NETEVENT_DELAY_PROBE_TIME_UPDATE netevent is generated.
      
      This is caused by commit 2a4501ae ("neigh: Send a
      notification when DELAY_PROBE_TIME changes"). According to the
      commit's description, it was intended to generate such an event
      when setting the "delay_first_probe_time" sysctl parameter.
      
      In order to fix this, only generate this event when actually
      setting the "delay_first_probe_time" sysctl parameter. This fix
      should not have any unintended side-effects, because all but one
      registered netevent callbacks check for other netevent event
      types (the registered callbacks were obtained by grepping for
      "register_netevent_notifier"). The only callback that uses the
      NETEVENT_DELAY_PROBE_TIME_UPDATE event is
      mlxsw_sp_router_netevent_event() (in
      drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c): in case
      of this event, it only accesses the DELAY_PROBE_TIME of the
      passed neigh_parms.
      
      Fixes: 2a4501ae ("neigh: Send a notification when DELAY_PROBE_TIME changes")
      Signed-off-by: NMarcus Huewe <suse-tux@gmx.de>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7627ae60
  3. 15 2月, 2017 1 次提交
  4. 14 2月, 2017 1 次提交
  5. 11 2月, 2017 9 次提交
  6. 09 2月, 2017 1 次提交
  7. 08 2月, 2017 3 次提交
  8. 07 2月, 2017 1 次提交
  9. 06 2月, 2017 1 次提交
  10. 04 2月, 2017 4 次提交
    • E
      net: skb_needs_check() accepts CHECKSUM_NONE for tx · 6e7bc478
      Eric Dumazet 提交于
      My recent change missed fact that UFO would perform a complete
      UDP checksum before segmenting in frags.
      
      In this case skb->ip_summed is set to CHECKSUM_NONE.
      
      We need to add this valid case to skb_needs_check()
      
      Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e7bc478
    • E
      net: remove support for per driver ndo_busy_poll() · 79e7fff4
      Eric Dumazet 提交于
      We added generic support for busy polling in NAPI layer in linux-4.5
      
      No network driver uses ndo_busy_poll() anymore, we can get rid
      of the pointer in struct net_device_ops, and its use in sk_busy_loop()
      
      Saves NETIF_F_BUSY_POLL features bit.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79e7fff4
    • E
      net: use a work queue to defer net_disable_timestamp() work · 5fa8bbda
      Eric Dumazet 提交于
      Dmitry reported a warning [1] showing that we were calling
      net_disable_timestamp() -> static_key_slow_dec() from a non
      process context.
      
      Grabbing a mutex while holding a spinlock or rcu_read_lock()
      is not allowed.
      
      As Cong suggested, we now use a work queue.
      
      It is possible netstamp_clear() exits while netstamp_needed_deferred
      is not zero, but it is probably not worth trying to do better than that.
      
      netstamp_needed_deferred atomic tracks the exact number of deferred
      decrements.
      
      [1]
      [ INFO: suspicious RCU usage. ]
      4.10.0-rc5+ #192 Not tainted
      -------------------------------
      ./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
      critical section!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 0
      2 locks held by syz-executor14/23111:
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock
      include/net/sock.h:1454 [inline]
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>]
      rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
       #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook
      include/linux/netfilter.h:201 [inline]
       #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>]
      __ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160
      
      stack backtrace:
      CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
       rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
       ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
       __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
       mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
       atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
       __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
       static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
       net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
       sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
       __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sock_wfree+0xae/0x120 net/core/sock.c:1645
       skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put include/net/inet_frag.h:133 [inline]
       nf_ct_frag6_gather+0x1106/0x3840
      net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook include/linux/netfilter.h:212 [inline]
       __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
       rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x600 net/socket.c:848
       do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
       do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
       vfs_writev+0x87/0xc0 fs/read_write.c:911
       do_writev+0x110/0x2c0 fs/read_write.c:944
       SYSC_writev fs/read_write.c:1017 [inline]
       SyS_writev+0x27/0x30 fs/read_write.c:1014
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x445559
      RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
      RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
      RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
      R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
      BUG: sleeping function called from invalid context at
      kernel/locking/mutex.c:752
      in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
      INFO: lockdep is turned off.
      CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
       __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
       mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
       atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
       __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
       static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
       net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
       sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
       __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sock_wfree+0xae/0x120 net/core/sock.c:1645
       skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put include/net/inet_frag.h:133 [inline]
       nf_ct_frag6_gather+0x1106/0x3840
      net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook include/linux/netfilter.h:212 [inline]
       __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
       rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x600 net/socket.c:848
       do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
       do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
       vfs_writev+0x87/0xc0 fs/read_write.c:911
       do_writev+0x110/0x2c0 fs/read_write.c:944
       SYSC_writev fs/read_write.c:1017 [inline]
       SyS_writev+0x27/0x30 fs/read_write.c:1014
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x445559
      
      Fixes: b90e5794 ("net: dont call jump_label_dec from irq context")
      Suggested-by: NCong Wang <xiyou.wangcong@gmail.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5fa8bbda
    • S
      ethtool: do not vzalloc(0) on registers dump · 3808d348
      Stanislaw Gruszka 提交于
      If ->get_regs_len() callback return 0, we allocate 0 bytes of memory,
      what print ugly warning in dmesg, which can be found further below.
      
      This happen on mac80211 devices where ieee80211_get_regs_len() just
      return 0 and driver only fills ethtool_regs structure and actually
      do not provide any dump. However I assume this can happen on other
      drivers i.e. when for some devices driver provide regs dump and for
      others do not. Hence preventing to to print warning in ethtool code
      seems to be reasonable.
      
      ethtool: vmalloc: allocation failure: 0 bytes, mode:0x24080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO)
      <snip>
      Call Trace:
      [<ffffffff813bde47>] dump_stack+0x63/0x8c
      [<ffffffff811b0a1f>] warn_alloc+0x13f/0x170
      [<ffffffff811f0476>] __vmalloc_node_range+0x1e6/0x2c0
      [<ffffffff811f0874>] vzalloc+0x54/0x60
      [<ffffffff8169986c>] dev_ethtool+0xb4c/0x1b30
      [<ffffffff816adbb1>] dev_ioctl+0x181/0x520
      [<ffffffff816714d2>] sock_do_ioctl+0x42/0x50
      <snip>
      Mem-Info:
      active_anon:435809 inactive_anon:173951 isolated_anon:0
       active_file:835822 inactive_file:196932 isolated_file:0
       unevictable:0 dirty:8 writeback:0 unstable:0
       slab_reclaimable:157732 slab_unreclaimable:10022
       mapped:83042 shmem:306356 pagetables:9507 bounce:0
       free:130041 free_pcp:1080 free_cma:0
      Node 0 active_anon:1743236kB inactive_anon:695804kB active_file:3343288kB inactive_file:787728kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:332168kB dirty:32kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1225424kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
      Node 0 DMA free:15900kB min:136kB low:168kB high:200kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
      lowmem_reserve[]: 0 3187 7643 7643
      Node 0 DMA32 free:419732kB min:28124kB low:35152kB high:42180kB active_anon:541180kB inactive_anon:248988kB active_file:1466388kB inactive_file:389632kB unevictable:0kB writepending:0kB present:3370280kB managed:3290932kB mlocked:0kB slab_reclaimable:217184kB slab_unreclaimable:4180kB kernel_stack:160kB pagetables:984kB bounce:0kB free_pcp:2236kB local_pcp:660kB free_cma:0kB
      lowmem_reserve[]: 0 0 4456 4456
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3808d348
  11. 03 2月, 2017 2 次提交
  12. 02 2月, 2017 3 次提交
    • F
      skbuff: add and use skb_nfct helper · cb9c6836
      Florian Westphal 提交于
      Followup patch renames skb->nfct and changes its type so add a helper to
      avoid intrusive rename change later.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      cb9c6836
    • E
      net: reduce skb_warn_bad_offload() noise · b2504a5d
      Eric Dumazet 提交于
      Dmitry reported warnings occurring in __skb_gso_segment() [1]
      
      All SKB_GSO_DODGY producers can allow user space to feed
      packets that trigger the current check.
      
      We could prevent them from doing so, rejecting packets, but
      this might add regressions to existing programs.
      
      It turns out our SKB_GSO_DODGY handlers properly set up checksum
      information that is needed anyway when packets needs to be segmented.
      
      By checking again skb_needs_check() after skb_mac_gso_segment(),
      we should remove these pesky warnings, at a very minor cost.
      
      With help from Willem de Bruijn
      
      [1]
      WARNING: CPU: 1 PID: 6768 at net/core/dev.c:2439 skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
      lo: caps=(0x000000a2803b7c69, 0x0000000000000000) len=138 data_len=0 gso_size=15883 gso_type=4 ip_summed=0
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 6768 Comm: syz-executor1 Not tainted 4.9.0 #5
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       ffff8801c063ecd8 ffffffff82346bdf ffffffff00000001 1ffff100380c7d2e
       ffffed00380c7d26 0000000041b58ab3 ffffffff84b37e38 ffffffff823468f1
       ffffffff84820740 ffffffff84f289c0 dffffc0000000000 ffff8801c063ee20
      Call Trace:
       [<ffffffff82346bdf>] __dump_stack lib/dump_stack.c:15 [inline]
       [<ffffffff82346bdf>] dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       [<ffffffff81827e34>] panic+0x1fb/0x412 kernel/panic.c:179
       [<ffffffff8141f704>] __warn+0x1c4/0x1e0 kernel/panic.c:542
       [<ffffffff8141f7e5>] warn_slowpath_fmt+0xc5/0x100 kernel/panic.c:565
       [<ffffffff8356cbaf>] skb_warn_bad_offload+0x2af/0x390 net/core/dev.c:2434
       [<ffffffff83585cd2>] __skb_gso_segment+0x482/0x780 net/core/dev.c:2706
       [<ffffffff83586f19>] skb_gso_segment include/linux/netdevice.h:3985 [inline]
       [<ffffffff83586f19>] validate_xmit_skb+0x5c9/0xc20 net/core/dev.c:2969
       [<ffffffff835892bb>] __dev_queue_xmit+0xe6b/0x1e70 net/core/dev.c:3383
       [<ffffffff8358a2d7>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3424
       [<ffffffff83ad161d>] packet_snd net/packet/af_packet.c:2930 [inline]
       [<ffffffff83ad161d>] packet_sendmsg+0x32ed/0x4d30 net/packet/af_packet.c:2955
       [<ffffffff834f0aaa>] sock_sendmsg_nosec net/socket.c:621 [inline]
       [<ffffffff834f0aaa>] sock_sendmsg+0xca/0x110 net/socket.c:631
       [<ffffffff834f329a>] ___sys_sendmsg+0x8fa/0x9f0 net/socket.c:1954
       [<ffffffff834f5e58>] __sys_sendmsg+0x138/0x300 net/socket.c:1988
       [<ffffffff834f604d>] SYSC_sendmsg net/socket.c:1999 [inline]
       [<ffffffff834f604d>] SyS_sendmsg+0x2d/0x50 net/socket.c:1995
       [<ffffffff84371941>] entry_SYSCALL_64_fastpath+0x1f/0xc2
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov  <dvyukov@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b2504a5d
    • T
      rtnetlink: Handle IFLA_MASTER parameter when processing rtnl_newlink · 160ca014
      Theuns Verwoerd 提交于
      Allow a master interface to be specified as one of the parameters when
      creating a new interface via rtnl_newlink.  Previously this would
      require invoking interface creation, waiting for it to complete, and
      then separately binding that new interface to a master.
      
      In particular, this is used when creating a macvlan child interface for
      VRRP in a VRF configuration, allowing the interface creator to specify
      directly what master interface should be inherited by the child,
      without having to deal with asynchronous complications and potential
      race conditions.
      Signed-off-by: NTheuns Verwoerd <theuns.verwoerd@alliedtelesis.co.nz>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      160ca014
  13. 01 2月, 2017 1 次提交
    • A
      net: ethtool: convert large order kmalloc allocations to vzalloc · 4d1ceea8
      Alexei Starovoitov 提交于
      under memory pressure 'ethtool -S' command may warn:
      [ 2374.385195] ethtool: page allocation failure: order:4, mode:0x242c0c0
      [ 2374.405573] CPU: 12 PID: 40211 Comm: ethtool Not tainted
      [ 2374.423071] Call Trace:
      [ 2374.423076]  [<ffffffff8148cb29>] dump_stack+0x4d/0x64
      [ 2374.423080]  [<ffffffff811667cb>] warn_alloc_failed+0xeb/0x150
      [ 2374.423082]  [<ffffffff81169cd3>] ? __alloc_pages_direct_compact+0x43/0xf0
      [ 2374.423084]  [<ffffffff8116a25c>] __alloc_pages_nodemask+0x4dc/0xbf0
      [ 2374.423091]  [<ffffffffa0023dc2>] ? cmd_exec+0x722/0xcd0 [mlx5_core]
      [ 2374.423095]  [<ffffffff811b3dcc>] alloc_pages_current+0x8c/0x110
      [ 2374.423097]  [<ffffffff81168859>] alloc_kmem_pages+0x19/0x90
      [ 2374.423099]  [<ffffffff81186e5e>] kmalloc_order_trace+0x2e/0xe0
      [ 2374.423101]  [<ffffffff811c0084>] __kmalloc+0x204/0x220
      [ 2374.423105]  [<ffffffff816c269e>] dev_ethtool+0xe4e/0x1f80
      [ 2374.423106]  [<ffffffff816b967e>] ? dev_get_by_name_rcu+0x5e/0x80
      [ 2374.423108]  [<ffffffff816d6926>] dev_ioctl+0x156/0x560
      [ 2374.423111]  [<ffffffff811d4c68>] ? mem_cgroup_commit_charge+0x78/0x3c0
      [ 2374.423117]  [<ffffffff8169d542>] sock_do_ioctl+0x42/0x50
      [ 2374.423119]  [<ffffffff8169d9c3>] sock_ioctl+0x1b3/0x250
      [ 2374.423121]  [<ffffffff811f0f42>] do_vfs_ioctl+0x92/0x580
      [ 2374.423123]  [<ffffffff8100222b>] ? do_audit_syscall_entry+0x4b/0x70
      [ 2374.423124]  [<ffffffff8100287c>] ? syscall_trace_enter_phase1+0xfc/0x120
      [ 2374.423126]  [<ffffffff811f14a9>] SyS_ioctl+0x79/0x90
      [ 2374.423127]  [<ffffffff81002bb0>] do_syscall_64+0x50/0xa0
      [ 2374.423129]  [<ffffffff817e19bc>] entry_SYSCALL64_slow_path+0x25/0x25
      
      ~1160 mlx5 counters ~= order 4 allocation which is unlikely to succeed
      under memory pressure. Convert them to vzalloc() as ethtool_get_regs() does.
      Also take care of drivers without counters similar to
      commit 67ae7cf1 ("ethtool: Allow zero-length register dumps again")
      and reduce warn_on to warn_on_once.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d1ceea8
  14. 31 1月, 2017 1 次提交
  15. 30 1月, 2017 1 次提交
  16. 28 1月, 2017 1 次提交
    • E
      net: adjust skb->truesize in pskb_expand_head() · 158f323b
      Eric Dumazet 提交于
      Slava Shwartsman reported a warning in skb_try_coalesce(), when we
      detect skb->truesize is completely wrong.
      
      In his case, issue came from IPv6 reassembly coping with malicious
      datagrams, that forced various pskb_may_pull() to reallocate a bigger
      skb->head than the one allocated by NIC driver before entering GRO
      layer.
      
      Current code does not change skb->truesize, leaving this burden to
      callers if they care enough.
      
      Blindly changing skb->truesize in pskb_expand_head() is not
      easy, as some producers might track skb->truesize, for example
      in xmit path for back pressure feedback (sk->sk_wmem_alloc)
      
      We can detect the cases where it should be safe to change
      skb->truesize :
      
      1) skb is not attached to a socket.
      2) If it is attached to a socket, destructor is sock_edemux()
      
      My audit gave only two callers doing their own skb->truesize
      manipulation.
      
      I had to remove skb parameter in sock_edemux macro when
      CONFIG_INET is not set to avoid a compile error.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NSlava Shwartsman <slavash@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      158f323b
  17. 25 1月, 2017 5 次提交
    • R
      lwtunnel: Fix oops on state free after encap module unload · 85c81401
      Robert Shearman 提交于
      When attempting to free lwtunnel state after the module for the encap
      has been unloaded an oops occurs:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      IP: lwtstate_free+0x18/0x40
      [..]
      task: ffff88003e372380 task.stack: ffffc900001fc000
      RIP: 0010:lwtstate_free+0x18/0x40
      RSP: 0018:ffff88003fd83e88 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff88002bbb3380 RCX: ffff88000c91a300
      [..]
      Call Trace:
       <IRQ>
       free_fib_info_rcu+0x195/0x1a0
       ? rt_fibinfo_free+0x50/0x50
       rcu_process_callbacks+0x2d3/0x850
       ? rcu_process_callbacks+0x296/0x850
       __do_softirq+0xe4/0x4cb
       irq_exit+0xb0/0xc0
       smp_apic_timer_interrupt+0x3d/0x50
       apic_timer_interrupt+0x93/0xa0
      [..]
      Code: e8 6e c6 fc ff 89 d8 5b 5d c3 bb de ff ff ff eb f4 66 90 66 66 66 66 90 55 48 89 e5 53 0f b7 07 48 89 fb 48 8b 04 c5 00 81 d5 81 <48> 8b 40 08 48 85 c0 74 13 ff d0 48 8d 7b 20 be 20 00 00 00 e8
      
      The problem is after the module for the encap can be unloaded the
      corresponding ops is removed and is thus NULL here.
      
      Modules implementing lwtunnel ops should not be allowed to unload
      while there is state alive using those ops, so grab the module
      reference for the ops on creating lwtunnel state and of course release
      the reference when freeing the state.
      
      Fixes: 1104d9ba ("lwtunnel: Add destroy state operation")
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85c81401
    • R
      net: Specify the owning module for lwtunnel ops · 88ff7334
      Robert Shearman 提交于
      Modules implementing lwtunnel ops should not be allowed to unload
      while there is state alive using those ops, so specify the owning
      module for all lwtunnel ops.
      Signed-off-by: NRobert Shearman <rshearma@brocade.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88ff7334
    • D
      bpf: allow option for setting bpf_l4_csum_replace from scratch · d1b662ad
      Daniel Borkmann 提交于
      When programs need to calculate the csum from scratch for small UDP
      packets and use bpf_l4_csum_replace() to feed the result from helpers
      like bpf_csum_diff(), then we need a flag besides BPF_F_MARK_MANGLED_0
      that would ignore the case of current csum being 0, and which would
      still allow for the helper to set the csum and transform when needed
      to CSUM_MANGLED_0.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1b662ad
    • D
      bpf: enable load bytes helper for filter/reuseport progs · 2492d3b8
      Daniel Borkmann 提交于
      BPF_PROG_TYPE_SOCKET_FILTER are used in various facilities such as
      for SO_REUSEPORT and packet fanout demuxing, packet filtering, kcm,
      etc, and yet the only facility they can use is BPF_LD with {BPF_ABS,
      BPF_IND} for single byte/half/word access.
      
      Direct packet access is only restricted to tc programs right now,
      but we can still facilitate usage by allowing skb_load_bytes() helper
      added back then in 05c74e5e ("bpf: add bpf_skb_load_bytes helper")
      that calls skb_header_pointer() similarly to bpf_load_pointer(), but
      for stack buffers with larger access size.
      
      Name the previous sk_filter_func_proto() as bpf_base_func_proto()
      since this is used everywhere else as well, similarly for the ctx
      converter, that is, bpf_convert_ctx_access().
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2492d3b8
    • D
      bpf: simplify __is_valid_access test on cb · 4faf940d
      Daniel Borkmann 提交于
      The __is_valid_access() test for cb[] from 62c7989b ("bpf: allow
      b/h/w/dw access for bpf's cb in ctx") was done unnecessarily complex,
      we can just simplify it the same way as recent fix from 2d071c64
      ("bpf, trace: make ctx access checks more robust") did. Overflow can
      never happen as size is 1/2/4/8 depending on access.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4faf940d
  18. 21 1月, 2017 1 次提交