1. 12 7月, 2018 2 次提交
  2. 10 7月, 2018 2 次提交
  3. 09 7月, 2018 1 次提交
    • R
      bpf: include errno.h from bpf-cgroup.h · f292b87d
      Roman Gushchin 提交于
      Commit fdb5c453 ("bpf: fix attach type BPF_LIRC_MODE2 dependency
      wrt CONFIG_CGROUP_BPF") caused some build issues, detected by 0-DAY
      kernel test infrastructure.
      
      The problem is that cgroup_bpf_prog_attach/detach/query() functions
      can return -EINVAL error code, which is not defined. Fix this adding
      errno.h to includes.
      
      Fixes: fdb5c453 ("bpf: fix attach type BPF_LIRC_MODE2 dependency wrt CONFIG_CGROUP_BPF")
      Signed-off-by: NRoman Gushchin <guro@fb.com>
      Cc: Sean Young <sean@mess.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f292b87d
  4. 08 7月, 2018 14 次提交
    • E
      tcp: cleanup copied_seq and urg_data in tcp_disconnect · 6508b678
      Eric Dumazet 提交于
      tcp_zerocopy_receive() relies on tcp_inq() to limit number of bytes
      requested by user.
      
      syzbot found that after tcp_disconnect(), tcp_inq() was returning
      a stale value (number of bytes in queue before the disconnect).
      
      Note that after this patch, ioctl(fd, SIOCINQ, &val) is also fixed
      and returns 0, so this might be a candidate for all known linux kernels.
      
      While we are at this, we probably also should clear urg_data to
      avoid other syzkaller reports after it discovers how to deal with
      urgent data.
      
      syzkaller repro :
      
      socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
      bind(3, {sa_family=AF_INET, sin_port=htons(20000), sin_addr=inet_addr("224.0.0.1")}, 16) = 0
      connect(3, {sa_family=AF_INET, sin_port=htons(20000), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
      send(3, ..., 4096, 0) = 4096
      connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 128) = 0
      getsockopt(3, SOL_TCP, TCP_ZEROCOPY_RECEIVE, ..., [16]) = 0 // CRASH
      
      Fixes: 05255b82 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6508b678
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 7f93d129
      David S. Miller 提交于
      Alexei Starovoitov says:
      
      ====================
      pull-request: bpf 2018-07-07
      
      The following pull-request contains BPF updates for your *net* tree.
      
      Plenty of fixes for different components:
      
      1) A set of critical fixes for sockmap and sockhash, from John Fastabend.
      
      2) fixes for several race conditions in af_xdp, from Magnus Karlsson.
      
      3) hash map refcnt fix, from Mauricio Vasquez.
      
      4) samples/bpf fixes, from Taeung Song.
      
      5) ifup+mtu check for xdp_redirect, from Toshiaki Makita.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f93d129
    • P
      ipfrag: really prevent allocation on netns exit · f6f2a4a2
      Paolo Abeni 提交于
      Setting the low threshold to 0 has no effect on frags allocation,
      we need to clear high_thresh instead.
      
      The code was pre-existent to commit 648700f7 ("inet: frags:
      use rhashtables for reassembly units"), but before the above,
      such assignment had a different role: prevent concurrent eviction
      from the worker and the netns cleanup helper.
      
      Fixes: 648700f7 ("inet: frags: use rhashtables for reassembly units")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6f2a4a2
    • L
      net: diag: Don't double-free TCP_NEW_SYN_RECV sockets in tcp_abort · acc2cf4e
      Lorenzo Colitti 提交于
      When tcp_diag_destroy closes a TCP_NEW_SYN_RECV socket, it first
      frees it by calling inet_csk_reqsk_queue_drop_and_and_put in
      tcp_abort, and then frees it again by calling sock_gen_put.
      
      Since tcp_abort only has one caller, and all the other codepaths
      in tcp_abort don't free the socket, just remove the free in that
      function.
      
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Tested: passes Android sock_diag_test.py, which exercises this codepath
      Fixes: d7226c7a ("net: diag: Fix refcnt leak in error path destroying socket")
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acc2cf4e
    • D
      net/ipv4: Set oif in fib_compute_spec_dst · e7372197
      David Ahern 提交于
      Xin reported that icmp replies may not use the address on the device the
      echo request is received if the destination address is broadcast. Instead
      a route lookup is done without considering VRF context. Fix by setting
      oif in flow struct to the master device if it is enslaved. That directs
      the lookup to the VRF table. If the device is not enslaved, oif is still
      0 so no affect.
      
      Fixes: cd2fbe1b ("net: Use VRF device index for lookups on RX")
      Reported-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7372197
    • T
      xdp: XDP_REDIRECT should check IFF_UP and MTU · d8d7218a
      Toshiaki Makita 提交于
      Otherwise we end up with attempting to send packets from down devices
      or to send oversized packets, which may cause unexpected driver/device
      behaviour. Generic XDP has already done this check, so reuse the logic
      in native XDP.
      
      Fixes: 814abfab ("xdp: add bpf_redirect helper function")
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      d8d7218a
    • A
      Merge branch 'sockhash-fixes' · 4fb126cb
      Alexei Starovoitov 提交于
      John Fastabend says:
      
      ====================
      First three patches resolve issues found while testing sockhash and
      reviewing code. Syzbot also found them about the same time as I was
      working on fixes. The main issue is in the sockhash path we reduced
      the scope of sk_callback lock but this meant we could get update and
      close running in parallel so fix that here.
      
      Then testing sk_msg and sk_skb programs together found that skb->dev
      is not always assigned and some of the helpers were depending on this
      to lookup max mtu. Fix this by using SKB_MAX_ALLOC when no MTU is
      available.
      
      Finally, Martin spotted that the sockmap code was still using the
      qdisc skb cb structure. But I was sure we had fixed this long ago.
      Looks like we missed it in a merge conflict resolution and then by
      chance data_end offset was the same in both structures so everything
      sort of continued to work even though it could break at any moment
      if the structs ever change. So redo the conversion and this time
      also convert the helpers.
      
      v2: fix '0 files changed' issue in patches
      ====================
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      4fb126cb
    • J
      bpf: sockmap, convert bpf_compute_data_pointers to bpf_*_sk_skb · 0ea488ff
      John Fastabend 提交于
      In commit
      
        'bpf: bpf_compute_data uses incorrect cb structure' (8108a775)
      
      we added the routine bpf_compute_data_end_sk_skb() to compute the
      correct data_end values, but this has since been lost. In kernel
      v4.14 this was correct and the above patch was applied in it
      entirety. Then when v4.14 was merged into v4.15-rc1 net-next tree
      we lost the piece that renamed bpf_compute_data_pointers to the
      new function bpf_compute_data_end_sk_skb. This was done here,
      
      e1ea2f98 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
      
      When it conflicted with the following rename patch,
      
      6aaae2b6 ("bpf: rename bpf_compute_data_end into bpf_compute_data_pointers")
      
      Finally, after a refactor I thought even the function
      bpf_compute_data_end_sk_skb() was no longer needed and it was
      erroneously removed.
      
      However, we never reverted the sk_skb_convert_ctx_access() usage of
      tcp_skb_cb which had been committed and survived the merge conflict.
      Here we fix this by adding back the helper and *_data_end_sk_skb()
      usage. Using the bpf_skc_data_end mapping is not correct because it
      expects a qdisc_skb_cb object but at the sock layer this is not the
      case. Even though it happens to work here because we don't overwrite
      any data in-use at the socket layer and the cb structure is cleared
      later this has potential to create some subtle issues. But, even
      more concretely the filter.c access check uses tcp_skb_cb.
      
      And by some act of chance though,
      
      struct bpf_skb_data_end {
              struct qdisc_skb_cb        qdisc_cb;             /*     0    28 */
      
              /* XXX 4 bytes hole, try to pack */
      
              void *                     data_meta;            /*    32     8 */
              void *                     data_end;             /*    40     8 */
      
              /* size: 48, cachelines: 1, members: 3 */
              /* sum members: 44, holes: 1, sum holes: 4 */
              /* last cacheline: 48 bytes */
      };
      
      and then tcp_skb_cb,
      
      struct tcp_skb_cb {
      	[...]
                      struct {
                              __u32      flags;                /*    24     4 */
                              struct sock * sk_redir;          /*    32     8 */
                              void *     data_end;             /*    40     8 */
                      } bpf;                                   /*          24 */
              };
      
      So when we use offset_of() to track down the byte offset we get 40 in
      either case and everything continues to work. Fix this mess and use
      correct structures its unclear how long this might actually work for
      until someone moves the structs around.
      Reported-by: NMartin KaFai Lau <kafai@fb.com>
      Fixes: e1ea2f98 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
      Fixes: 6aaae2b6 ("bpf: rename bpf_compute_data_end into bpf_compute_data_pointers")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      0ea488ff
    • J
      bpf: sockmap, consume_skb in close path · 7ebc14d5
      John Fastabend 提交于
      Currently, when a sock is closed and the bpf_tcp_close() callback is
      used we remove memory but do not free the skb. Call consume_skb() if
      the skb is attached to the buffer.
      
      Reported-by: syzbot+d464d2c20c717ef5a6a8@syzkaller.appspotmail.com
      Fixes: 1aa12bdf ("bpf: sockmap, add sock close() hook to remove socks")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      7ebc14d5
    • J
      bpf: sockhash, disallow bpf_tcp_close and update in parallel · 99ba2b5a
      John Fastabend 提交于
      After latest lock updates there is no longer anything preventing a
      close and recvmsg call running in parallel. Additionally, we can
      race update with close if we close a socket and simultaneously update
      if via the BPF userspace API (note the cgroup ops are already run
      with sock_lock held).
      
      To resolve this take sock_lock in close and update paths.
      
      Reported-by: syzbot+b680e42077a0d7c9a0c4@syzkaller.appspotmail.com
      Fixes: e9db4ef6 ("bpf: sockhash fix omitted bucket lock in sock_close")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      99ba2b5a
    • J
      bpf: fix sk_skb programs without skb->dev assigned · 0c6bc6e5
      John Fastabend 提交于
      Multiple BPF helpers in use by sk_skb programs calculate the max
      skb length using the __bpf_skb_max_len function. However, this
      calculates the max length using the skb->dev pointer which can be
      NULL when an sk_skb program is paired with an sk_msg program.
      
      To force this a sk_msg program needs to redirect into the ingress
      path of a sock with an attach sk_skb program. Then the the sk_skb
      program would need to call one of the helpers that adjust the skb
      size.
      
      To fix the null ptr dereference use SKB_MAX_ALLOC size if no dev
      is available.
      
      Fixes: 8934ce2f ("bpf: sockmap redirect ingress support")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      0c6bc6e5
    • A
      Merge branch 'sockmap-fixes' · 631da853
      Alexei Starovoitov 提交于
      John Fastabend says:
      
      ====================
      I missed fixing the error path in the sockhash code to align with
      supporting socks in multiple maps. Simply checking if the psock is
      present does not mean we can decrement the reference count because
      it could be part of another map. Fix this by cleaning up the error
      path so this situation does not happen.
      ====================
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      631da853
    • J
      bpf: sockmap, hash table is RCU so readers do not need locks · 1d1ef005
      John Fastabend 提交于
      This removes locking from readers of RCU hash table. Its not
      necessary.
      
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      1d1ef005
    • J
      bpf: sockmap, error path can not release psock in multi-map case · 547b3aa4
      John Fastabend 提交于
      The current code, in the error path of sock_hash_ctx_update_elem,
      checks if the sock has a psock in the user data and if so decrements
      the reference count of the psock. However, if the error happens early
      in the error path we may have never incremented the psock reference
      count and if the psock exists because the sock is in another map then
      we may inadvertently decrement the reference count.
      
      Fix this by making the error path only call smap_release_sock if the
      error happens after the increment.
      
      Reported-by: syzbot+d464d2c20c717ef5a6a8@syzkaller.appspotmail.com
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      547b3aa4
  5. 07 7月, 2018 21 次提交
    • D
      Merge branch 'net-sched-fix-NULL-dereference-in-goto-chain-control-action' · de508f8b
      David S. Miller 提交于
      Davide Caratti says:
      
      ====================
      net/sched: fix NULL dereference in 'goto chain' control action
      
      in a couple of TC actions (i.e. csum and tunnel_key), the control action
      is stored together with the action-specific configuration data.
      This avoids a race condition (see [1]), but it causes a crash when 'goto
      chain' is used with the above actions. Since this race condition is
      tolerated on the other TC actions (it's present even on actions where the
      spinlock is still used), storing the control action in the common area
      should be acceptable for tunnel_key and csum as well.
      
      [1] https://www.spinics.net/lists/netdev/msg472047.html
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de508f8b
    • D
      net/sched: act_tunnel_key: fix NULL dereference when 'goto chain' is used · 38230a3e
      Davide Caratti 提交于
      the control action in the common member of struct tcf_tunnel_key must be a
      valid value, as it can contain the chain index when 'goto chain' is used.
      Ensure that the control action can be read as x->tcfa_action, when x is a
      pointer to struct tc_action and x->ops->type is TCA_ACT_TUNNEL_KEY, to
      prevent the following command:
      
       # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
       > $tcflags dst_mac $h2mac action tunnel_key unset goto chain 1
      
      from causing a NULL dereference when a matching packet is received:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
       PGD 80000001097ac067 P4D 80000001097ac067 PUD 103b0a067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 0 PID: 3491 Comm: mausezahn Tainted: G            E     4.18.0-rc2.auguri+ #421
       Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.58 02/07/2013
       RIP: 0010:tcf_action_exec+0xb8/0x100
       Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
       RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246
       RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001
       RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
       RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c
       R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800
       R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40
       FS:  00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0
       Call Trace:
        <IRQ>
        fl_classify+0x1ad/0x1c0 [cls_flower]
        ? __update_load_avg_se.isra.47+0x1ca/0x1d0
        ? __update_load_avg_se.isra.47+0x1ca/0x1d0
        ? update_load_avg+0x665/0x690
        ? update_load_avg+0x665/0x690
        ? kmem_cache_alloc+0x38/0x1c0
        tcf_classify+0x89/0x140
        __netif_receive_skb_core+0x5ea/0xb70
        ? enqueue_entity+0xd0/0x270
        ? process_backlog+0x97/0x150
        process_backlog+0x97/0x150
        net_rx_action+0x14b/0x3e0
        __do_softirq+0xde/0x2b4
        do_softirq_own_stack+0x2a/0x40
        </IRQ>
        do_softirq.part.18+0x49/0x50
        __local_bh_enable_ip+0x49/0x50
        __dev_queue_xmit+0x4ab/0x8a0
        ? wait_woken+0x80/0x80
        ? packet_sendmsg+0x38f/0x810
        ? __dev_queue_xmit+0x8a0/0x8a0
        packet_sendmsg+0x38f/0x810
        sock_sendmsg+0x36/0x40
        __sys_sendto+0x10e/0x140
        ? do_vfs_ioctl+0xa4/0x630
        ? syscall_trace_enter+0x1df/0x2e0
        ? __audit_syscall_exit+0x22a/0x290
        __x64_sys_sendto+0x24/0x30
        do_syscall_64+0x5b/0x180
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fd67e18dc93
       Code: 48 8b 0d 18 83 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c7 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b f7 ff ff 48 89 04 24
       RSP: 002b:00007ffe0189b748 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 00000000020ca010 RCX: 00007fd67e18dc93
       RDX: 0000000000000062 RSI: 00000000020ca322 RDI: 0000000000000003
       RBP: 00007ffe0189b780 R08: 00007ffe0189b760 R09: 0000000000000014
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000062
       R13: 00000000020ca322 R14: 00007ffe0189b760 R15: 0000000000000003
       Modules linked in: act_tunnel_key act_gact cls_flower sch_ingress vrf veth act_csum(E) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_realtek coretemp snd_hda_codec_generic kvm_intel kvm irqbypass snd_hda_intel crct10dif_pclmul crc32_pclmul hp_wmi ghash_clmulni_intel pcbc snd_hda_codec aesni_intel sparse_keymap rfkill snd_hda_core snd_hwdep snd_seq crypto_simd iTCO_wdt gpio_ich iTCO_vendor_support wmi_bmof cryptd mei_wdt glue_helper snd_seq_device snd_pcm pcspkr snd_timer snd i2c_i801 lpc_ich sg soundcore wmi mei_me
        mei ie31200_edac nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod sr_mod cdrom i915 video i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci crc32c_intel libahci serio_raw sfc libata mtd drm ixgbe mdio i2c_core e1000e dca
       CR2: 0000000000000000
       ---[ end trace 1ab8b5b5d4639dfc ]---
       RIP: 0010:tcf_action_exec+0xb8/0x100
       Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
       RSP: 0018:ffff95145ea03c40 EFLAGS: 00010246
       RAX: 0000000020000001 RBX: ffff9514499e5800 RCX: 0000000000000001
       RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
       RBP: ffff95145ea03e60 R08: 0000000000000000 R09: ffff95145ea03c9c
       R10: ffff95145ea03c78 R11: 0000000000000008 R12: ffff951456a69800
       R13: ffff951456a69808 R14: 0000000000000001 R15: ffff95144965ee40
       FS:  00007fd67ee11740(0000) GS:ffff95145ea00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 00000001038a2006 CR4: 00000000001606f0
       Kernel panic - not syncing: Fatal exception in interrupt
       Kernel Offset: 0x11400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
       ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Fixes: d0f6dd8a ("net/sched: Introduce act_tunnel_key")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38230a3e
    • D
      net/sched: act_csum: fix NULL dereference when 'goto chain' is used · 11a245e2
      Davide Caratti 提交于
      the control action in the common member of struct tcf_csum must be a valid
      value, as it can contain the chain index when 'goto chain' is used. Ensure
      that the control action can be read as x->tcfa_action, when x is a pointer
      to struct tc_action and x->ops->type is TCA_ACT_CSUM, to prevent the
      following command:
      
        # tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
        > $tcflags dst_mac $h2mac action csum ip or tcp or udp or sctp goto chain 1
      
      from triggering a NULL pointer dereference when a matching packet is
      received.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
       PGD 800000010416b067 P4D 800000010416b067 PUD 1041be067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 0 PID: 3072 Comm: mausezahn Tainted: G            E     4.18.0-rc2.auguri+ #421
       Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.58 02/07/2013
       RIP: 0010:tcf_action_exec+0xb8/0x100
       Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
       RSP: 0018:ffffa020dea03c40 EFLAGS: 00010246
       RAX: 0000000020000001 RBX: ffffa020d7ccef00 RCX: 0000000000000054
       RDX: 0000000000000000 RSI: ffffa020ca5ae000 RDI: ffffa020d7ccef00
       RBP: ffffa020dea03e60 R08: 0000000000000000 R09: ffffa020dea03c9c
       R10: ffffa020dea03c78 R11: 0000000000000008 R12: ffffa020d3fe4f00
       R13: ffffa020d3fe4f08 R14: 0000000000000001 R15: ffffa020d53ca300
       FS:  00007f5a46942740(0000) GS:ffffa020dea00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 0000000104218002 CR4: 00000000001606f0
       Call Trace:
        <IRQ>
        fl_classify+0x1ad/0x1c0 [cls_flower]
        ? arp_rcv+0x121/0x1b0
        ? __x2apic_send_IPI_dest+0x40/0x40
        ? smp_reschedule_interrupt+0x1c/0xd0
        ? reschedule_interrupt+0xf/0x20
        ? reschedule_interrupt+0xa/0x20
        ? device_is_rmrr_locked+0xe/0x50
        ? iommu_should_identity_map+0x49/0xd0
        ? __intel_map_single+0x30/0x140
        ? e1000e_update_rdt_wa.isra.52+0x22/0xb0 [e1000e]
        ? e1000_alloc_rx_buffers+0x233/0x250 [e1000e]
        ? kmem_cache_alloc+0x38/0x1c0
        tcf_classify+0x89/0x140
        __netif_receive_skb_core+0x5ea/0xb70
        ? enqueue_task_fair+0xb6/0x7d0
        ? process_backlog+0x97/0x150
        process_backlog+0x97/0x150
        net_rx_action+0x14b/0x3e0
        __do_softirq+0xde/0x2b4
        do_softirq_own_stack+0x2a/0x40
        </IRQ>
        do_softirq.part.18+0x49/0x50
        __local_bh_enable_ip+0x49/0x50
        __dev_queue_xmit+0x4ab/0x8a0
        ? wait_woken+0x80/0x80
        ? packet_sendmsg+0x38f/0x810
        ? __dev_queue_xmit+0x8a0/0x8a0
        packet_sendmsg+0x38f/0x810
        sock_sendmsg+0x36/0x40
        __sys_sendto+0x10e/0x140
        ? do_vfs_ioctl+0xa4/0x630
        ? syscall_trace_enter+0x1df/0x2e0
        ? __audit_syscall_exit+0x22a/0x290
        __x64_sys_sendto+0x24/0x30
        do_syscall_64+0x5b/0x180
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f5a45cbec93
       Code: 48 8b 0d 18 83 20 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 59 c7 20 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 2b f7 ff ff 48 89 04 24
       RSP: 002b:00007ffd0ee6d748 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000001161010 RCX: 00007f5a45cbec93
       RDX: 0000000000000062 RSI: 0000000001161322 RDI: 0000000000000003
       RBP: 00007ffd0ee6d780 R08: 00007ffd0ee6d760 R09: 0000000000000014
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000062
       R13: 0000000001161322 R14: 00007ffd0ee6d760 R15: 0000000000000003
       Modules linked in: act_csum act_gact cls_flower sch_ingress vrf veth act_tunnel_key(E) xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi snd_hda_codec_realtek kvm snd_hda_codec_generic hp_wmi iTCO_wdt sparse_keymap rfkill mei_wdt iTCO_vendor_support wmi_bmof gpio_ich irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_intel crypto_simd cryptd snd_hda_codec glue_helper snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm pcspkr i2c_i801 snd_timer snd sg lpc_ich soundcore wmi mei_me
        mei ie31200_edac nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ahci libahci crc32c_intel i915 ixgbe serio_raw libata video dca i2c_algo_bit sfc drm_kms_helper syscopyarea mtd sysfillrect mdio sysimgblt fb_sys_fops drm e1000e i2c_core
       CR2: 0000000000000000
       ---[ end trace 3c9e9d1a77df4026 ]---
       RIP: 0010:tcf_action_exec+0xb8/0x100
       Code: 00 00 00 20 74 1d 83 f8 03 75 09 49 83 c4 08 4d 39 ec 75 bc 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 49 8b 97 a8 00 00 00 <48> 8b 12 48 89 55 00 48 83 c4 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3
       RSP: 0018:ffffa020dea03c40 EFLAGS: 00010246
       RAX: 0000000020000001 RBX: ffffa020d7ccef00 RCX: 0000000000000054
       RDX: 0000000000000000 RSI: ffffa020ca5ae000 RDI: ffffa020d7ccef00
       RBP: ffffa020dea03e60 R08: 0000000000000000 R09: ffffa020dea03c9c
       R10: ffffa020dea03c78 R11: 0000000000000008 R12: ffffa020d3fe4f00
       R13: ffffa020d3fe4f08 R14: 0000000000000001 R15: ffffa020d53ca300
       FS:  00007f5a46942740(0000) GS:ffffa020dea00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 0000000104218002 CR4: 00000000001606f0
       Kernel panic - not syncing: Fatal exception in interrupt
       Kernel Offset: 0x26400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
       ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
      
      Fixes: 9c5f69bb ("net/sched: act_csum: don't use spinlock in the fast path")
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11a245e2
    • H
      net: macb: Allocate valid memory for TX and RX BD prefetch · 404cd086
      Harini Katakam 提交于
      GEM version in ZynqMP and most versions greater than r1p07 supports
      TX and RX BD prefetch. The number of BDs that can be prefetched is a
      HW configurable parameter. For ZynqMP, this parameter is 4.
      
      When GEM DMA is accessing the last BD in the ring, even before the
      BD is processed and the WRAP bit is noticed, it will have prefetched
      BDs outside the BD ring. These will not be processed but it is
      necessary to have accessible memory after the last BD. Especially
      in cases where SMMU is used, memory locations immediately after the
      last BD may not have translation tables triggering HRESP errors. Hence
      always allocate extra BDs to accommodate for prefetch.
      The value of tx/rx bd prefetch for any given SoC version is:
      2 ^ (corresponding field in design config 10 register).
      (value of this field >= 1)
      
      Added a capability flag so that older IP versions that do not have
      DCFG10 or this prefetch capability are not affected.
      Signed-off-by: NHarini Katakam <harini.katakam@xilinx.com>
      Reviewed-by: NClaudiu Beznea <claudiu.beznea@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      404cd086
    • H
      net: macb: Free RX ring for all queues · e50b770e
      Harini Katakam 提交于
      rx ring is allocated for all queues in macb_alloc_consistent.
      Free the same for all queues instead of just Q0.
      Signed-off-by: NHarini Katakam <harini.katakam@xilinx.com>
      Reviewed-by: NClaudiu Beznea <claudiu.beznea@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e50b770e
    • U
      net/smc: reduce sock_put() for fallback sockets · e1bbdd57
      Ursula Braun 提交于
      smc_release() calls a sock_put() for smc fallback sockets to cover
      the passive closing sock_hold() in __smc_connect() and
      smc_tcp_listen_work(). This does not make sense for sockets in state
      SMC_LISTEN and SMC_INIT.
      An SMC socket stays in state SMC_INIT if connect fails. The sock_put
      in smc_connect_abort() does not cover all failures. Move it into
      smc_connect_decline_fallback().
      
      Fixes: ee9dfbef ("net/smc: handle sockopts forcing fallback")
      Reported-by: syzbot+3a0748c8f2f210c0ef9b@syzkaller.appspotmail.com
      Reported-by: syzbot+9e60d2428a42049a592a@syzkaller.appspotmail.com
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1bbdd57
    • A
      net: bridge: fix br_vlan_get_{pvid,info} return values · 000244d3
      Arnd Bergmann 提交于
      These two functions return the regular -EINVAL failure in the normal
      code path, but return a nonstandard '-1' error otherwise, which gets
      interpreted as -EPERM.
      
      Let's change it to -EINVAL for the dummy functions as well.
      
      Fixes: 4d4fd361 ("net: bridge: Publish bridge accessor functions")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      000244d3
    • C
      cxgb4: assume flash part size to be 4MB, if it can't be determined · 843789f6
      Casey Leedom 提交于
      t4_get_flash_params() fails in a fatal fashion if the FLASH part isn't
      one of the recognized parts. But this leads to desperate efforts to update
      drivers when various FLASH parts which we are using suddenly become
      unavailable and we need to substitute new FLASH parts.  This has lead to
      more than one Customer Field Emergency when a Customer has an old driver
      and suddenly can't use newly shipped adapters.
      
      This commit fixes this by simply assuming that the FLASH part is 4MB in
      size if it can't be identified. Note that all Chelsio adapters will have
      flash parts which are at least 4MB in size.
      Signed-off-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      843789f6
    • D
      Merge branch 'tipc-dad-fixes' · 7f978e85
      David S. Miller 提交于
      Jon Maloy says:
      
      ====================
      tipc: fixes in duplicate address discovery function
      
      commit 25b0b9c4 ("tipc: handle collisions of 32-bit node address
      hash values") introduced new functionality that has turned out to
      contain several bugs and weaknesses.
      
      We address those in this series.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f978e85
    • J
      tipc: make function tipc_net_finalize() thread safe · 9faa89d4
      Jon Maloy 提交于
      The setting of the node address is not thread safe, meaning that
      two discoverers may decide to set it simultanously, with a duplicate
      entry in the name table as result. We fix that with this commit.
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9faa89d4
    • J
      tipc: fix correct setting of message type in second discoverer · 92018c7c
      Jon Maloy 提交于
      The duplicate address discovery protocol is not safe against two
      discoverers running in parallel. The one executing first after the
      trial period is over will set the node address and change its own
      message type to DSC_REQ_MSG. The one executing last may find that the
      node address is already set, and never change message type, with the
      result that its links may never be established.
      
      In this commmit we ensure that the message type always is set correctly
      after the trial period is over.
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      92018c7c
    • J
      tipc: correct discovery message handling during address trial period · e415577f
      Jon Maloy 提交于
      With the duplicate address discovery protocol for tipc nodes addresses
      we introduced a one second trial period before a node is allocated a
      hash number to use as address.
      
      Unfortunately, we miss to handle the case when a regular LINK REQUEST/
      RESPONSE arrives from a cluster node during the trial period. Such
      messages are not ignored as they should be, leading to links setup
      attempts while the node still has no address.
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e415577f
    • J
      tipc: fix wrong return value from function tipc_node_try_addr() · 2a57f182
      Jon Maloy 提交于
      The function for checking if there is an node address conflict is
      supposed to return a suggestion for a new address if it finds a
      conflict, and zero otherwise. But in case the peer being checked
      is previously unknown it does instead return a "suggestion" for
      the checked address itself. This results in a DSC_TRIAL_FAIL_MSG
      being sent unecessarily to the peer, and sometimes makes the trial
      period starting over again.
      
      Fixes: 25b0b9c4 ("tipc: handle collisions of 32-bit node address hash values")
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a57f182
    • D
      Merge branch 'ravb-sh_eth-fix-sleep-in-atomic-by-reusing-shared-ethtool-handlers' · 0f62aeec
      David S. Miller 提交于
      Vladimir Zapolskiy says:
      
      ====================
      ravb/sh_eth: fix sleep in atomic by reusing shared ethtool handlers
      
      For ages trivial changes to RAVB and SuperH ethernet links by means of
      standard 'ethtool' trigger a 'sleeping function called from invalid
      context' bug, to visualize it on r8a7795 ULCB:
      
        % ethtool -r eth0
        BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
        in_atomic(): 1, irqs_disabled(): 128, pid: 554, name: ethtool
        INFO: lockdep is turned off.
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>]           (null)
        hardirqs last disabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
        softirqs last  enabled at (0): [<ffff0000080e1d3c>] copy_process.isra.7.part.8+0x2cc/0x1918
        softirqs last disabled at (0): [<0000000000000000>]           (null)
        CPU: 5 PID: 554 Comm: ethtool Not tainted 4.17.0-rc4-arm64-renesas+ #33
        Hardware name: Renesas H3ULCB board based on r8a7795 ES2.0+ (DT)
        Call trace:
         dump_backtrace+0x0/0x198
         show_stack+0x24/0x30
         dump_stack+0xb8/0xf4
         ___might_sleep+0x1c8/0x1f8
         __might_sleep+0x58/0x90
         __mutex_lock+0x50/0x890
         mutex_lock_nested+0x3c/0x50
         phy_start_aneg_priv+0x38/0x180
         phy_start_aneg+0x24/0x30
         ravb_nway_reset+0x3c/0x68
         dev_ethtool+0x3dc/0x2338
         dev_ioctl+0x19c/0x490
         sock_do_ioctl+0xe0/0x238
         sock_ioctl+0x254/0x460
         do_vfs_ioctl+0xb0/0x918
         ksys_ioctl+0x50/0x80
         sys_ioctl+0x34/0x48
         __sys_trace_return+0x0/0x4
      
      The root cause is that an attempt to modify ECMR and GECMR registers
      only when RX/TX function is disabled was too overcomplicated in its
      original implementation, also processing of an optional Link Change
      interrupt added even more complexity, as a result the implementation
      was error prone.
      
      The new locking scheme is confirmed to be correct by dumping driver
      specific and generic PHY framework function calls with aid of ftrace
      while running more or less advanced tests.
      
      Please note that sh_eth patches from the series were built-tested only.
      
      On purpose I do not add Fixes tags, the reused PHY handlers were added
      way later than the fixed problems were firstly found in the drivers.
      
      Changes from v1 to v2:
      * the original patches are split to bugfixes and enhancements only,
        both v1 and v2 series are absolutely equal in total, thus I omit
        description of changes in individual patches,
      * the latter implies that there should be no strict need for retesting,
        but because formally two series are different, I have to drop the tags
        given by Geert and Andrew, please send your tags again.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0f62aeec
    • V
      ravb: remove custom .set_link_ksettings from ethtool ops · 44f3d558
      Vladimir Zapolskiy 提交于
      The generic phy_ethtool_set_link_ksettings() function from phylib can
      be used instead of in-house ravb_set_link_ksettings().
      Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Reviewed-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44f3d558
    • V
      ravb: remove custom .get_link_ksettings from ethtool ops · 468e40b5
      Vladimir Zapolskiy 提交于
      The generic phy_ethtool_get_link_ksettings() function from phylib can be
      used instead of in-house ravb_get_link_ksettings().
      Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Reviewed-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      468e40b5
    • V
      ravb: remove useless serialization in ravb_get_link_ksettings() · efdf7511
      Vladimir Zapolskiy 提交于
      phy_ethtool_ksettings_get() call does not modify device state or device
      driver state, hence there is no need to utilize a driver specific
      spinlock.
      Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Reviewed-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efdf7511
    • V
      ravb: remove custom .nway_reset from ethtool ops · eeb07284
      Vladimir Zapolskiy 提交于
      The generic phy_ethtool_nway_reset() function from phylib can be used
      instead of in-house ravb_nway_reset().
      Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Reviewed-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeb07284
    • V
      ravb: simplify link auto-negotiation by ethtool · 2a150c50
      Vladimir Zapolskiy 提交于
      There is no need to call a heavyweight phy_start_aneg() for phy
      auto-negotiation by ethtool, the phy is already initialized and
      link auto-negotiation is started by calling phy_start() from
      ravb_phy_start() when a network device is opened.
      Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Reviewed-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a150c50
    • V
      ravb: fix invalid context bug while changing link options by ethtool · 05925e52
      Vladimir Zapolskiy 提交于
      The change fixes sleep in atomic context bug, which is encountered
      every time when link settings are changed by ethtool.
      
      Since commit 35b5f6b1 ("PHYLIB: Locking fixes for PHY I/O
      potentially sleeping") phy_start_aneg() function utilizes a mutex
      to serialize changes to phy state, however that helper function is
      called in atomic context under a grabbed spinlock, because
      phy_start_aneg() is called by phy_ethtool_ksettings_set() and by
      replaced phy_ethtool_sset() helpers from phylib.
      
      Now duplex mode setting is enforced in ravb_adjust_link() only, also
      now RX/TX is disabled when link is put down or modifications to E-MAC
      registers ECMR and GECMR are expected for both cases of checked and
      ignored link status pin state from E-MAC interrupt handler.
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Reviewed-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05925e52
    • V
      ravb: fix invalid context bug while calling auto-negotiation by ethtool · 0973a4dd
      Vladimir Zapolskiy 提交于
      Since commit 35b5f6b1 ("PHYLIB: Locking fixes for PHY I/O
      potentially sleeping") phy_start_aneg() function utilizes a mutex
      to serialize changes to phy state, however the helper function is
      called in atomic context.
      
      The bug can be reproduced by running "ethtool -r" command, the bug
      is reported if CONFIG_DEBUG_ATOMIC_SLEEP build option is enabled.
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
      Reviewed-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0973a4dd