1. 25 Aug, 2019 11 commits
    • Z
      net: rds: add service level support in rds-info · e0e6d062
      Committed by Zhu Yanjun
      According to section 7.6.5 (SERVICE LEVEL) of the IB specification, the
      Service Level (SL) is used to identify different flows within an IBA
      subnet. It is carried in the local route header of the packet.
      
      Before this commit, running "rds-info -I" produces the following
      output:
      "
      RDS IB Connections:
       LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
      192.2.95.3  192.2.95.1  2   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  1   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      "
      After this commit, the output is as below:
      "
      RDS IB Connections:
       LocalAddr  RemoteAddr Tos SL  LocalDev               RemoteDev
      192.2.95.3  192.2.95.1  2   2  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  1   1  fe80::21:28:1a:39  fe80::21:28:10:b9
      192.2.95.3  192.2.95.1  0   0  fe80::21:28:1a:39  fe80::21:28:10:b9
      "
      
      The commit fe3475af ("net: rds: add per rds connection cache
      statistics") adds cache_allocs in struct rds_info_rdma_connection
      as below:
      struct rds_info_rdma_connection {
      ...
              __u32           rdma_mr_max;
              __u32           rdma_mr_size;
              __u8            tos;
              __u32           cache_allocs;
       };
      The corresponding struct rds_info_rdma_connection in rds-tools is as
      below:
      struct rds_info_rdma_connection {
      ...
              uint32_t        rdma_mr_max;
              uint32_t        rdma_mr_size;
              uint8_t         tos;
              uint8_t         sl;
              uint32_t        cache_allocs;
      };
      The only difference between the userspace and kernel definitions is the
      member variable sl, which is missing from the kernel struct. This mismatch
      is risky, so this commit is needed to remove it.
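
To see why the missing member matters, the layout difference can be sketched in plain C. This is an illustration only: the struct names kernel_tail_before and user_tail are made up, only the fields quoted above are included (the elided leading fields are left out, so offsets are relative to rdma_mr_max), and natural alignment without packing is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Tail of the kernel-side struct before the fix (hypothetical name;
 * the leading fields elided in the commit message are omitted). */
struct kernel_tail_before {
    uint32_t rdma_mr_max;
    uint32_t rdma_mr_size;
    uint8_t  tos;
    uint32_t cache_allocs;
};

/* Tail of the rds-tools struct, which carries the extra sl byte
 * (hypothetical name). */
struct user_tail {
    uint32_t rdma_mr_max;
    uint32_t rdma_mr_size;
    uint8_t  tos;
    uint8_t  sl;
    uint32_t cache_allocs;
};

/* Offset at which userspace expects to find sl. */
static size_t user_sl_offset(void)
{
    return offsetof(struct user_tail, sl);
}

/* Returns 1 when that offset falls between tos and cache_allocs in the
 * kernel struct, i.e. in bytes the kernel struct does not name. */
static int sl_lands_in_kernel_padding(void)
{
    size_t off = user_sl_offset();

    return off > offsetof(struct kernel_tail_before, tos) &&
           off < offsetof(struct kernel_tail_before, cache_allocs);
}
```

Under these assumptions, on common ABIs userspace reads sl from a byte that the kernel struct treats as padding, which would be consistent with the SL column showing 0 before this commit.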
      
      Fixes: fe3475af ("net: rds: add per rds connection cache statistics")
      CC: Joe Jin <joe.jin@oracle.com>
      CC: JUNXIAO_BI <junxiao.bi@oracle.com>
      Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net: route dump netlink NLM_F_MULTI flag missing · e93fb3e9
      Committed by John Fastabend
      An excerpt from netlink(7) man page,
      
        In multipart messages (multiple nlmsghdr headers with associated payload
        in one byte stream) the first and all following headers have the
        NLM_F_MULTI flag set, except for the last  header  which  has the type
        NLMSG_DONE.
      
      but after ee28906f the NLM_F_MULTI flag is missing in the middle of a
      FIB dump. As a result, user space applications that follow the above man
      page excerpt may get confused and stop parsing the message, believing
      something went wrong.
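
The effect on a strict parser can be sketched as follows. This is a simplified illustration, not the golang library's logic: struct msg_hdr is a made-up stand-in for struct nlmsghdr, FLAG_MULTI and TYPE_DONE mirror the values of NLM_F_MULTI and NLMSG_DONE, the payload type 24 is arbitrary, and alignment handling is omitted because every message here is exactly one header long.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct nlmsghdr (hypothetical; the real header
 * also carries sequence and port-id fields). */
struct msg_hdr {
    uint32_t len;    /* total length of this message, header included */
    uint16_t type;
    uint16_t flags;
};

#define FLAG_MULTI 0x2  /* same value as NLM_F_MULTI */
#define TYPE_DONE  0x3  /* same value as NLMSG_DONE */

/* Walks a byte stream the way a strict reader of netlink(7) might and
 * returns how many payload messages it accepts before it stops. */
static int count_parsed(const unsigned char *buf, size_t size)
{
    int parsed = 0;
    size_t off = 0;

    while (size - off >= sizeof(struct msg_hdr)) {
        struct msg_hdr h;

        memcpy(&h, buf + off, sizeof(h));
        if (h.len < sizeof(h) || h.len > size - off)
            break;                  /* malformed length */
        if (h.type == TYPE_DONE)
            break;                  /* proper end of a multipart dump */
        parsed++;
        if (!(h.flags & FLAG_MULTI))
            break;                  /* looks like a single-part message */
        off += h.len;
    }
    return parsed;
}

/* Builds three payload messages followed by a DONE message; when
 * drop_flag_on_second is nonzero, the second header lacks FLAG_MULTI. */
static size_t build_dump(unsigned char *buf, int drop_flag_on_second)
{
    size_t off = 0;

    for (int i = 0; i < 4; i++) {
        struct msg_hdr h = { sizeof(struct msg_hdr), 24, FLAG_MULTI };

        if (i == 3)
            h.type = TYPE_DONE;
        if (i == 1 && drop_flag_on_second)
            h.flags = 0;
        memcpy(buf + off, &h, sizeof(h));
        off += sizeof(h);
    }
    return off;
}
```

With all flags set, the sketch accepts all three payload messages; with the flag dropped on the second header, it stops after two and never sees the third.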
      
      In the golang netlink lib [0] the library logic stops parsing believing the
      message is not a multipart message. Found this running Cilium[1] against
      net-next while adding a feature to auto-detect routes. I noticed with
      multiple route tables we no longer could detect the default routes on net
      tree kernels because the library logic was not returning them.
      
      Fix this by handling the fib_dump_info_fnhe() case the same way the
      fib_dump_info() handles it by passing the flags argument through the
      call chain and adding a flags argument to rt_fill_info().
      
      Tested with Cilium stack and auto-detection of routes works again. Also
      annotated libs to dump netlink msgs and inspected NLM_F_MULTI and
      NLMSG_DONE flags look correct after this.
      
      Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same
      as is done for fib_dump_info() so this looks correct to me.
      
      [0] https://github.com/vishvananda/netlink/
      [1] https://github.com/cilium/
      
      Fixes: ee28906f ("ipv4: Dump route exceptions if requested")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      s390/qeth: reject oversized SNMP requests · 292a50e3
      Committed by Julian Wiedmann
      Commit d4c08afa ("s390/qeth: streamline SNMP cmd code") removed
      the bounds checking for req_len, under the assumption that the check in
      qeth_alloc_cmd() would suffice.
      
      But that code path isn't sufficiently robust to handle a user-provided
      data_length, which could overflow (when adding the cmd header overhead)
      before being checked against QETH_BUFSIZE. We end up allocating just a
      tiny iob, and the subsequent copy_from_user() writes past the end of
      that iob.
      
      Special-case this path and add a coarse bounds check, to protect against
      malicious requests. This lets the subsequent code flow do its normal
      job and precise checking, without risk of overflow.
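
The overflow-before-check pattern described above can be sketched in isolation. The constants and function names are hypothetical stand-ins (BUF_SIZE for QETH_BUFSIZE, HDR_SIZE for the command header overhead), and the 32-bit unsigned length is an assumption; this is not the driver's code.

```c
#include <stdint.h>

#define BUF_SIZE 4096u   /* hypothetical stand-in for QETH_BUFSIZE */
#define HDR_SIZE 64u     /* hypothetical command header overhead */

/* Checks only the sum: with 32-bit unsigned arithmetic the addition can
 * wrap around before the comparison happens. Returns 1 if accepted. */
static int sum_only_check(uint32_t data_len)
{
    uint32_t total = data_len + HDR_SIZE;   /* may wrap */

    return total <= BUF_SIZE;
}

/* Coarse check on the raw length first, so the later addition cannot
 * wrap; the precise check then runs as usual. Returns 1 if accepted. */
static int coarse_then_precise_check(uint32_t data_len)
{
    if (data_len > BUF_SIZE)
        return 0;
    return data_len + HDR_SIZE <= BUF_SIZE;
}
```

A length close to UINT32_MAX passes the sum-only check because the sum wraps to a small value, while the coarse check rejects it up front.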
      
      Fixes: d4c08afa ("s390/qeth: streamline SNMP cmd code")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Z
      sock: fix potential memory leak in proto_register() · b45ce321
      Committed by zhanglin
      If the number of registered protocols exceeds PROTO_INUSE_NR, prot is
      added to proto_list, but there is no available bit left for prot in
      proto_inuse_idx.
      
      Changes since v2:
      * Propagate the error code properly
      Signed-off-by: zhanglin <zhang.lin16@zte.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge tag 'mlx5-fixes-2019-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · d37fb975
      Committed by David S. Miller
      Saeed Mahameed says:
      
      ====================
      Mellanox, mlx5 fixes 2019-08-22
      
      This series introduces some fixes to mlx5 driver.
      
      1) From Moshe, two fixes for firmware health reporter
      2) From Eran, two ktls fixes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      MAINTAINERS: Add phylink keyword to SFF/SFP/SFP+ MODULE SUPPORT · 0c69b19f
      Committed by Andrew Lunn
      Russell King maintains phylink as part of the SFP module support.
      However, much of the review work is about drivers swapping from phylib
      to phylink. Such patches don't change the phylink core, so the F: rules
      in MAINTAINERS don't match them. Add a K: (keyword) rule, which
      get_maintainers will hopefully match against patches to MAC drivers
      swapping to phylink.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'collect_md-mode-dev-null' · 9b45ff91
      Committed by David S. Miller
      Hangbin Liu says:
      
      ====================
      fix dev null pointer dereference when sending packets larger than mtu in collect_md mode
      
      When we send a packet larger than PMTU, we need to reply with
      icmp_send(ICMP_FRAG_NEEDED) or icmpv6_send(ICMPV6_PKT_TOOBIG).
      
      But in collect_md mode, the kernel will crash while accessing the dst dev,
      as __metadata_dst_init() initializes dst->dev to NULL by default. Here is
      what the code path looks like, for GRE:
      
      - ip6gre_tunnel_xmit
        - ip6gre_xmit_ipv4
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmp_send
            - net = dev_net(rt->dst.dev); <-- here
        - ip6gre_xmit_ipv6
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmpv6_send
            ...
            - decode_session4
              - oif = skb_dst(skb)->dev->ifindex; <-- here
            - decode_session6
              - oif = skb_dst(skb)->dev->ifindex; <-- here
      
      We cannot fix it in __metadata_dst_init() as there is no dev supplied.
      Looking into the __icmp_send()/decode_session{4,6} code, we find the dst
      dev is actually not needed. In __icmp_send(), we can get the net from
      skb->dev. As for decode_session{4,6}, it is called by
      xfrm_decode_session_reverse() in this scenario, so the oif is not used by
      fl4->flowi4_oif = reverse ? skb->skb_iif : oif;
      
      The reproducer is easy:
      
      ovs-vsctl add-br br0
      ip link set br0 up
      ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre options:remote_ip=$dst_addr
      ip link set gre0 up
      ip addr add ${local_gre6}/64 dev br0
      ping6 $remote_gre6 -s 1500
      
      The kernel will crash like
      [40595.821651] BUG: kernel NULL pointer dereference, address: 0000000000000108
      [40595.822411] #PF: supervisor read access in kernel mode
      [40595.822949] #PF: error_code(0x0000) - not-present page
      [40595.823492] PGD 0 P4D 0
      [40595.823767] Oops: 0000 [#1] SMP PTI
      [40595.824139] CPU: 0 PID: 2831 Comm: handler12 Not tainted 5.2.0 #57
      [40595.824788] Hardware name: Red Hat KVM, BIOS 1.11.1-3.module+el8.1.0+2983+b2ae9c0a 04/01/2014
      [40595.825680] RIP: 0010:__xfrm_decode_session+0x6b/0x930
      [40595.826219] Code: b7 c0 00 00 00 b8 06 00 00 00 66 85 d2 0f b7 ca 48 0f 45 c1 44 0f b6 2c 06 48 8b 47 58 48 83 e0 fe 0f 84 f4 04 00 00 48 8b 00 <44> 8b 80 08 01 00 00 41 f6 c4 01 4c 89 e7 ba 58 00 00 00 0f 85 47
      [40595.828155] RSP: 0018:ffffc90000a73438 EFLAGS: 00010286
      [40595.828705] RAX: 0000000000000000 RBX: ffff8881329d7100 RCX: 0000000000000000
      [40595.829450] RDX: 0000000000000000 RSI: ffff8881339e70ce RDI: ffff8881329d7100
      [40595.830191] RBP: ffffc90000a73470 R08: 0000000000000000 R09: 000000000000000a
      [40595.830936] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90000a73490
      [40595.831682] R13: 000000000000002c R14: ffff888132ff1301 R15: ffff8881329d7100
      [40595.832427] FS:  00007f5bfcfd6700(0000) GS:ffff88813ba00000(0000) knlGS:0000000000000000
      [40595.833266] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [40595.833883] CR2: 0000000000000108 CR3: 000000013a368000 CR4: 00000000000006f0
      [40595.834633] Call Trace:
      [40595.835392]  ? rt6_multipath_hash+0x4c/0x390
      [40595.835853]  icmpv6_route_lookup+0xcb/0x1d0
      [40595.836296]  ? icmpv6_xrlim_allow+0x3e/0x140
      [40595.836751]  icmp6_send+0x537/0x840
      [40595.837125]  icmpv6_send+0x20/0x30
      [40595.837494]  tnl_update_pmtu.isra.27+0x19d/0x2a0 [ip_tunnel]
      [40595.838088]  ip_md_tunnel_xmit+0x1b6/0x510 [ip_tunnel]
      [40595.838633]  gre_tap_xmit+0x10c/0x160 [ip_gre]
      [40595.839103]  dev_hard_start_xmit+0x93/0x200
      [40595.839551]  sch_direct_xmit+0x101/0x2d0
      [40595.839967]  __dev_queue_xmit+0x69f/0x9c0
      [40595.840399]  do_execute_actions+0x1717/0x1910 [openvswitch]
      [40595.840987]  ? validate_set.isra.12+0x2f5/0x3d0 [openvswitch]
      [40595.841596]  ? reserve_sfa_size+0x31/0x130 [openvswitch]
      [40595.842154]  ? __ovs_nla_copy_actions+0x1b4/0xad0 [openvswitch]
      [40595.842778]  ? __kmalloc_reserve.isra.50+0x2e/0x80
      [40595.843285]  ? should_failslab+0xa/0x20
      [40595.843696]  ? __kmalloc+0x188/0x220
      [40595.844078]  ? __alloc_skb+0x97/0x270
      [40595.844472]  ovs_execute_actions+0x47/0x120 [openvswitch]
      [40595.845041]  ovs_packet_cmd_execute+0x27d/0x2b0 [openvswitch]
      [40595.845648]  genl_family_rcv_msg+0x3a8/0x430
      [40595.846101]  genl_rcv_msg+0x47/0x90
      [40595.846476]  ? __alloc_skb+0x83/0x270
      [40595.846866]  ? genl_family_rcv_msg+0x430/0x430
      [40595.847335]  netlink_rcv_skb+0xcb/0x100
      [40595.847777]  genl_rcv+0x24/0x40
      [40595.848113]  netlink_unicast+0x17f/0x230
      [40595.848535]  netlink_sendmsg+0x2ed/0x3e0
      [40595.848951]  sock_sendmsg+0x4f/0x60
      [40595.849323]  ___sys_sendmsg+0x2bd/0x2e0
      [40595.849733]  ? sock_poll+0x6f/0xb0
      [40595.850098]  ? ep_scan_ready_list.isra.14+0x20b/0x240
      [40595.850634]  ? _cond_resched+0x15/0x30
      [40595.851032]  ? ep_poll+0x11b/0x440
      [40595.851401]  ? _copy_to_user+0x22/0x30
      [40595.851799]  __sys_sendmsg+0x58/0xa0
      [40595.852180]  do_syscall_64+0x5b/0x190
      [40595.852574]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [40595.853105] RIP: 0033:0x7f5c00038c7d
      [40595.853489] Code: c7 20 00 00 75 10 b8 2e 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 8e f7 ff ff 48 89 04 24 b8 2e 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 d7 f7 ff ff 48 89 d0 48 83 c4 08 48 3d 01
      [40595.855443] RSP: 002b:00007f5bfcf73c00 EFLAGS: 00003293 ORIG_RAX: 000000000000002e
      [40595.856244] RAX: ffffffffffffffda RBX: 00007f5bfcf74a60 RCX: 00007f5c00038c7d
      [40595.856990] RDX: 0000000000000000 RSI: 00007f5bfcf73c60 RDI: 0000000000000015
      [40595.857736] RBP: 0000000000000004 R08: 0000000000000b7c R09: 0000000000000110
      [40595.858613] R10: 0001000800050004 R11: 0000000000003293 R12: 000055c2d8329da0
      [40595.859401] R13: 00007f5bfcf74120 R14: 0000000000000347 R15: 00007f5bfcf73c60
      [40595.860185] Modules linked in: ip_gre ip_tunnel gre openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 sunrpc bochs_drm ttm drm_kms_helper drm pcspkr joydev i2c_piix4 qemu_fw_cfg xfs libcrc32c virtio_net net_failover serio_raw failover ata_generic virtio_blk pata_acpi floppy
      [40595.863155] CR2: 0000000000000108
      [40595.863551] ---[ end trace 22209bbcacb4addd ]---
      
      v4: Julian Anastasov reminded us that skb->dev could also be NULL in
      icmp_send. We'd better still use dst.dev and do a check to avoid the crash.
      
      v3: only replaced pkg with packets in the cover letter, so I didn't update
      the version info in the follow-up patches.
      
      v2: fix it in __icmp_send() and decode_session{4,6} separately instead of
      updating shared dst dev in {ip_md, ip6}_tunnel_xmit.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • H
      xfrm/xfrm_policy: fix dst dev null pointer dereference in collect_md mode · c3b4c3a4
      Committed by Hangbin Liu
      In decode_session{4,6} there is a possibility that the skb dst dev is NULL,
      e.g. with tunnel collect_md mode, which will cause a kernel crash.
      Here is what the code path looks like, for GRE:
      
      - ip6gre_tunnel_xmit
        - ip6gre_xmit_ipv6
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmpv6_send
            - icmpv6_route_lookup
              - xfrm_decode_session_reverse
                - decode_session4
                  - oif = skb_dst(skb)->dev->ifindex; <-- here
                - decode_session6
                  - oif = skb_dst(skb)->dev->ifindex; <-- here
      
      The reason is that __metadata_dst_init() initializes dst->dev to NULL by
      default. We cannot fix it in __metadata_dst_init() as there is no dev
      supplied. On the other hand, skb_dst(skb)->dev is actually not needed, as
      we call decode_session{4,6} via xfrm_decode_session_reverse(), so oif is
      not used by: fl4->flowi4_oif = reverse ? skb->skb_iif : oif;
      
      So adding a dst dev check here should be clean and safe.
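
The kind of guard described here can be sketched with minimal stand-in types. The struct definitions and the helper name safe_oif are hypothetical, and falling back to 0 when no device is attached is an assumption made for the illustration; the actual patch may differ.

```c
#include <stddef.h>

/* Minimal hypothetical stand-ins for the kernel types involved. */
struct net_device { int ifindex; };
struct dst_entry  { struct net_device *dev; };

/* Returns the outgoing interface index, or 0 when no dst or no device is
 * attached, as is the case for a metadata dst in collect_md mode. */
static int safe_oif(const struct dst_entry *dst)
{
    if (dst && dst->dev)
        return dst->dev->ifindex;
    return 0;
}
```

Because the reverse path takes the interface index from skb->skb_iif, the fallback value is never consumed in the crashing scenario; the check only has to prevent the NULL dereference.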
      
      v4: No changes.
      
      v3: No changes.
      
      v2: fix the issue in decode_session{4,6} instead of updating shared dst dev
      in {ip_md, ip6}_tunnel_xmit.
      
      Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • H
      ipv4/icmp: fix rt dst dev null pointer dereference · e2c69393
      Committed by Hangbin Liu
      In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
      e.g. with tunnel collect_md mode, which will cause a kernel crash.
      Here is what the code path looks like, for GRE:
      
      - ip6gre_tunnel_xmit
        - ip6gre_xmit_ipv4
          - __gre6_xmit
            - ip6_tnl_xmit
              - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
          - icmp_send
            - net = dev_net(rt->dst.dev); <-- here
      
      The reason is that __metadata_dst_init() initializes dst->dev to NULL by
      default. We cannot fix it in __metadata_dst_init() as there is no dev
      supplied. On the other hand, the reason we need rt->dst.dev is to get the
      net, so we can try to get it from skb->dev when rt->dst.dev is NULL.
      
      v4: Julian Anastasov reminded us that skb->dev could also be NULL. We'd
      better still use dst.dev and do a check to avoid the crash.
      
      v3: No changes.
      
      v2: fix the issue in __icmp_send() instead of updating shared dst dev
      in {ip_md, ip6}_tunnel_xmit.
      
      Fixes: c8b34e68 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Y
      openvswitch: Fix log message in ovs conntrack · 12c6bc38
      Committed by Yi-Hung Wei
      Fixes: 06bd2bdf ("openvswitch: Add timeout support to ct action")
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'ieee802154-for-davem-2019-08-24' of... · 12e2e15d
      Committed by David S. Miller
      Merge branch 'ieee802154-for-davem-2019-08-24' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
      
      Stefan Schmidt says:
      
      ====================
      pull-request: ieee802154 for net 2019-08-24
      
      An update from ieee802154 for your *net* tree.
      
      Yue Haibing fixed two bugs discovered by KASAN in the hwsim driver for
      ieee802154 and Colin Ian King cleaned up a redundant variable assignment.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 Aug, 2019 7 commits
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 211c4624
      Committed by David S. Miller
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-08-24
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei.
      
      2) Fix a use-after-free in prog symbol exposure, from Daniel.
      
      3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya.
      
      4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan.
      
      5) Fix a potential use-after-free on flow dissector detach, from Jakub.
      
      6) Fix bpftool to close prog fd after showing metadata, from Quentin.
      
      7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0 · 2c238177
      Committed by Ilya Leoshkevich
      test_select_reuseport fails on s390 due to verifier rejecting
      test_select_reuseport_kern.o with the following message:
      
      	; data_check.eth_protocol = reuse_md->eth_protocol;
      	18: (69) r1 = *(u16 *)(r6 +22)
      	invalid bpf_context access off=22 size=2
      
      This is because on big-endian machines casts from __u32 to __u16 are
      generated by referencing the respective variable as __u16 with an offset
      of 2 (as opposed to 0 on little-endian machines).
      
      The verifier already has all the infrastructure in place to allow such
      accesses, it's just that they are not explicitly enabled for
      eth_protocol field. Enable them for eth_protocol field by using
      bpf_ctx_range instead of offsetof.
      
      Ditto for ip_protocol, bind_inany and len, since they already allow
      narrowing, and the same problem can arise when working with them.
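
The offset difference between big-endian and little-endian machines can be demonstrated in ordinary C. This sketch is independent of the verifier; the helper names are made up for the illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the byte offset, within a 32-bit field, of the 16-bit half that
 * holds the low-order bits on the machine running this code:
 * 0 on little-endian, 2 on big-endian. */
static size_t low_half_offset(void)
{
    uint32_t word = 0x0000BEEFu;
    uint16_t half;

    memcpy(&half, &word, sizeof(half));     /* read bytes 0..1 */
    return half == 0xBEEF ? 0 : 2;
}

/* Narrow 16-bit load of a 32-bit value at the endianness-correct offset,
 * mirroring what a compiler emits for a cast from __u32 to __u16. */
static uint16_t narrow_load_u16(const uint32_t *field)
{
    uint16_t half;

    memcpy(&half, (const unsigned char *)field + low_half_offset(),
           sizeof(half));
    return half;
}
```

A field check that only permits offset 0 therefore accepts the little-endian access and rejects the equivalent big-endian one, which matches the off=22 rejection quoted in the log above.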
      
      Fixes: 2dbb9b9e ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • D
      bpf: fix use after free in prog symbol exposure · c751798a
      Committed by Daniel Borkmann
      syzkaller managed to trigger the warning in bpf_jit_free() which checks via
      bpf_prog_kallsyms_verify_off() for potentially unlinked JITed BPF progs
      in kallsyms, and subsequently trips over GPF when walking kallsyms entries:
      
        [...]
        8021q: adding VLAN 0 to HW filter on device batadv0
        8021q: adding VLAN 0 to HW filter on device batadv0
        WARNING: CPU: 0 PID: 9869 at kernel/bpf/core.c:810 bpf_jit_free+0x1e8/0x2a0
        Kernel panic - not syncing: panic_on_warn set ...
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x113/0x167 lib/dump_stack.c:113
         panic+0x212/0x40b kernel/panic.c:214
         __warn.cold.8+0x1b/0x38 kernel/panic.c:571
         report_bug+0x1a4/0x200 lib/bug.c:186
         fixup_bug arch/x86/kernel/traps.c:178 [inline]
         do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
         do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
         invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973
        RIP: 0010:bpf_jit_free+0x1e8/0x2a0
        Code: 02 4c 89 e2 83 e2 07 38 d0 7f 08 84 c0 0f 85 86 00 00 00 48 ba 00 02 00 00 00 00 ad de 0f b6 43 02 49 39 d6 0f 84 5f fe ff ff <0f> 0b e9 58 fe ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1
        RSP: 0018:ffff888092f67cd8 EFLAGS: 00010202
        RAX: 0000000000000007 RBX: ffffc90001947000 RCX: ffffffff816e9d88
        RDX: dead000000000200 RSI: 0000000000000008 RDI: ffff88808769f7f0
        RBP: ffff888092f67d00 R08: fffffbfff1394059 R09: fffffbfff1394058
        R10: fffffbfff1394058 R11: ffffffff89ca02c7 R12: ffffc90001947002
        R13: ffffc90001947020 R14: ffffffff881eca80 R15: ffff88808769f7e8
        BUG: unable to handle kernel paging request at fffffbfff400d000
        #PF error: [normal kernel read fault]
        PGD 21ffee067 P4D 21ffee067 PUD 21ffed067 PMD 9f942067 PTE 0
        Oops: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 9869 Comm: kworker/0:7 Not tainted 5.0.0-rc8+ #1
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Workqueue: events bpf_prog_free_deferred
        RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:495 [inline]
        RIP: 0010:bpf_tree_comp kernel/bpf/core.c:558 [inline]
        RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
        RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
        RIP: 0010:bpf_prog_kallsyms_find+0x107/0x2e0 kernel/bpf/core.c:632
        Code: 00 f0 ff ff 44 38 c8 7f 08 84 c0 0f 85 fa 00 00 00 41 f6 45 02 01 75 02 0f 0b 48 39 da 0f 82 92 00 00 00 48 89 d8 48 c1 e8 03 <42> 0f b6 04 30 84 c0 74 08 3c 03 0f 8e 45 01 00 00 8b 03 48 c1 e0
        [...]
      
      Upon further debugging, it turns out that whenever we trigger this
      issue, the kallsyms removal in bpf_prog_ksym_node_del() was /skipped/
      but yet bpf_jit_free() reported that the entry is /in use/.
      
      Problem is that symbol exposure via bpf_prog_kallsyms_add() but also
      perf_event_bpf_event() were done /after/ bpf_prog_new_fd(). Once the
      fd is exposed to the public, a parallel close request came in right
      before we attempted to do the bpf_prog_kallsyms_add().
      
      Given at this time the prog reference count is one, we start to rip
      everything underneath us via bpf_prog_release() -> bpf_prog_put().
      The memory is eventually released via deferred free, so we're seeing
      that bpf_jit_free() has a kallsym entry because we added it from
      bpf_prog_load() but /after/ bpf_prog_put() from the remote CPU.
      
      Therefore, move both notifications /before/ we install the fd. The
      issue was never seen between bpf_prog_alloc_id() and bpf_prog_new_fd()
      because upon bpf_prog_get_fd_by_id() we'll take another reference to
      the BPF prog, so we're still holding the original reference from the
      bpf_prog_load().
      
      Fixes: 6ee52e2a ("perf, bpf: Introduce PERF_RECORD_BPF_EVENT")
      Fixes: 74451e66 ("bpf: make jited programs visible in traces")
      Reported-by: syzbot+bd3bba6ff3fcea7a6ec6@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Song Liu <songliubraving@fb.com>
    • A
      bpf: fix precision tracking in presence of bpf2bpf calls · 6754172c
      Committed by Alexei Starovoitov
      While adding extra tests for precision tracking and extra infra
      to adjust verifier heuristics the existing test
      "calls: cross frame pruning - liveness propagation" started to fail.
      The root cause is the same as described in the verifier.c comment:
      
       * Also if parent's curframe > frame where backtracking started,
       * the verifier need to mark registers in both frames, otherwise callees
       * may incorrectly prune callers. This is similar to
       * commit 7640ead9 ("bpf: verifier: make sure callees don't prune with caller differences")
       * For now backtracking falls back into conservative marking.
      
      Turned out though that returning -ENOTSUPP from backtrack_insn() and
      doing mark_all_scalars_precise() in the current parentage chain is not enough.
      Depending on how is_state_visited() heuristic is creating parentage chain
      it's possible that callee will incorrectly prune caller.
      Fix the issue by setting precise=true earlier and more aggressively.
      Before this fix the precision tracking _within_ functions that don't do
      bpf2bpf calls would still work. Whereas now precision tracking is completely
      disabled when bpf2bpf calls are present anywhere in the program.
      
      No difference in cilium tests (they don't have bpf2bpf calls).
      No difference in test_progs though some of them have bpf2bpf calls,
      but precision tracking wasn't effective there.
      
      Fixes: b5dc0163 ("bpf: precise scalar_value tracking")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • J
      flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH · db38de39
      Committed by Jakub Sitnicki
      Call to bpf_prog_put(), with help of call_rcu(), queues an RCU-callback to
      free the program once a grace period has elapsed. The callback can run
      together with new RCU readers that started after the last grace period.
      New RCU readers can potentially see the "old" to-be-freed or already-freed
      pointer to the program object before the RCU update-side NULLs it.
      
      Reorder the operations so that the RCU update-side resets the protected
      pointer before the end of the grace period after which the program will be
      freed.
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Reported-by: Lorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Acked-by: Petar Penkov <ppenkov@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • H
      Revert "r8169: remove not needed call to dma_sync_single_for_device" · 345b9326
      Committed by Heiner Kallweit
      This reverts commit f072218c.
      
      As reported by Aaro this patch causes network problems on
      MIPS Loongson platform. Therefore revert it.
      
      Fixes: f072218c ("r8169: remove not needed call to dma_sync_single_for_device")
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      ipv6: propagate ipv6_add_dev's error returns out of ipv6_find_idev · db0b99f5
      Committed by Sabrina Dubroca
      Currently, ipv6_find_idev returns NULL when ipv6_add_dev fails,
      ignoring the specific error value. This results in addrconf_add_dev
      returning ENOBUFS in all cases, which is unfortunate in cases such as:
      
          # ip link add dummyX type dummy
          # ip link set dummyX mtu 1200 up
          # ip addr add 2000::/64 dev dummyX
          RTNETLINK answers: No buffer space available
      
      Commit a317a2f1 ("ipv6: fail early when creating netdev named all
      or default") introduced error returns in ipv6_add_dev. Before that,
      that function would simply return NULL for all failures.
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 23 Aug, 2019 8 commits
  4. 22 Aug, 2019 8 commits
  5. 21 Aug, 2019 6 commits
    • A
      selftests/bpf: install files test_xdp_vlan.sh · 3035bb72
      Committed by Anders Roxell
      When ./test_xdp_vlan_mode_generic.sh runs it complains that it can't
      find file test_xdp_vlan.sh.
      
       # selftests: bpf: test_xdp_vlan_mode_generic.sh
       # ./test_xdp_vlan_mode_generic.sh: line 9: ./test_xdp_vlan.sh: No such
       file or directory
      
      Rework so that test_xdp_vlan.sh gets installed, added to the variable
      TEST_PROGS_EXTENDED.
      
      Fixes: d35661fc ("selftests/bpf: add wrapper scripts for test_xdp_vlan.sh")
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Acked-by: Jesper Dangaard Brouer <jbrouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • A
      selftests/bpf: add config fragment BPF_JIT · 0604409d
      Committed by Anders Roxell
      When running test_kmod.sh the following shows up
      
       # sysctl cannot stat /proc/sys/net/core/bpf_jit_enable No such file or directory
       cannot: stat_/proc/sys/net/core/bpf_jit_enable #
       # sysctl cannot stat /proc/sys/net/core/bpf_jit_harden No such file or directory
       cannot: stat_/proc/sys/net/core/bpf_jit_harden #
      
      Rework to enable CONFIG_BPF_JIT to solve "No such file or directory"
      Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • I
      selftests/bpf: fix test_btf_dump with O= · e91dcb53
      Committed by Ilya Leoshkevich
      test_btf_dump fails when run with O=, because it needs to access source
      files and assumes they live in ./progs/, which is not the case in this
      scenario.
      
      Fix by instructing kselftest to copy btf_dump_test_case_*.c files to the
      test directory. Since kselftest does not preserve directory structure,
      adjust the test to look in ./progs/ and then in ./.
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • I
      selftests/bpf: fix test_cgroup_storage on s390 · 806ce6e2
      Committed by Ilya Leoshkevich
      test_cgroup_storage fails on s390 with an assertion failure: packets are
      dropped when they shouldn't. The problem is that BPF_DW packet count is
      accessed as BPF_W with an offset of 0, which is not correct on
      big-endian machines.
      
      Since the point of this test is not to verify narrow loads/stores,
      simply use BPF_DW when working with packet counts.
      
      Fixes: 68cfa3ac ("selftests/bpf: add a cgroup storage test")
      Fixes: 919646d2 ("selftests/bpf: extend the storage test to test per-cpu cgroup storage")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • H
      Revert "cfg80211: fix processing world regdomain when non modular" · 0d31d4db
      Committed by Hodaszi, Robert
      This reverts commit 96cce12f ("cfg80211: fix processing world
      regdomain when non modular").
      
      Re-triggering a reg_process_hint with the last request on all events
      can make the regulatory domain fail in case of multiple WiFi modules. On
      slower boards (especially with mdev), enumeration of the WiFi modules
      can end up in an intersected regulatory domain, and the user cannot set it
      with 'iw reg set' anymore.
      
      This is happening, because:
      - 1st module enumerates, queues up a regulatory request
      - request gets processed by __reg_process_hint_driver():
        - checks if previous was set by CORE -> yes
          - checks if regulator domain changed -> yes, from '00' to e.g. 'US'
            -> sends request to the 'crda'
      - 2nd module enumerates, queues up a regulator request (which triggers
        the reg_todo() work)
      - reg_todo() -> reg_process_pending_hints() sees that the last request
        is not processed yet, so it tries to process it again.
        __reg_process_hint_driver() will run again, and:
        - checks if the last request's initiator was the core -> no, it was
          the driver (1st WiFi module)
        - checks, if the previous initiator was the driver -> yes
          - checks if the regulator domain changed -> yes, it was '00' (set by
            core, and crda call did not return yet), and should be changed to 'US'
      
      ------> __reg_process_hint_driver calls an intersect
      
      Besides, the reg_process_hint call with the last request is meaningless
      since the crda call has a timeout work. If that timeout expires, the
      first module's request will be lost.
      
      Cc: stable@vger.kernel.org
      Fixes: 96cce12f ("cfg80211: fix processing world regdomain when non modular")
      Signed-off-by: Robert Hodaszi <robert.hodaszi@digi.com>
      Link: https://lore.kernel.org/r/20190614131600.GA13897@a1-hr
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • A
      cfg80211: Fix Extended Key ID key install checks · b67fd72e
      Committed by Alexander Wetzel
      Fix two shortcomings in the Extended Key ID API:
      
       1) Allow the userspace to install pairwise keys using keyid 1 without
          NL80211_KEY_NO_TX set. This allows the userspace to install and
          activate pairwise keys with keyid 1 in the same way as for keyid 0,
          simplifying the API usage for e.g. FILS and FT key installs.
      
       2) IEEE 802.11-2016 restricts Extended Key ID usage to CCMP/GCMP
          ciphers in section "9.4.2.25.4 RSN capabilities".
          Enforce that when installing a key.
      
      Cc: stable@vger.kernel.org # 5.2
      Fixes: 6cdd3979 ("nl80211/cfg80211: Extended Key ID support")
      Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
      Link: https://lore.kernel.org/r/20190805123400.51567-1-alexander@wetzel-home.de
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>