1. 25 1月, 2018 4 次提交
  2. 24 1月, 2018 3 次提交
  3. 23 1月, 2018 2 次提交
  4. 22 1月, 2018 1 次提交
  5. 20 1月, 2018 5 次提交
  6. 19 1月, 2018 2 次提交
    • A
      bpf: allow socket_filter programs to use bpf_prog_test_run · 61f3c964
      Alexei Starovoitov 提交于
      in order to improve test coverage allow socket_filter program type
      to be run via bpf_prog_test_run command.
      Since such programs can be loaded by non-root tighten
      permissions for bpf_prog_test_run to be root only
      to avoid surprises.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      61f3c964
    • E
      flow_dissector: properly cap thoff field · d0c081b4
      Eric Dumazet 提交于
      syzbot reported yet another crash [1] that is caused by
      insufficient validation of DODGY packets.
      
      Two bugs are happening here to trigger the crash.
      
      1) Flow dissection leaves with incorrect thoff field.
      
      2) skb_probe_transport_header() sets transport header to this invalid
      thoff, even if pointing after skb valid data.
      
      3) qdisc_pkt_len_init() reads out-of-bound data because it
      trusts tcp_hdrlen(skb)
      
      Possible fixes :
      
      - Full flow dissector validation before injecting bad DODGY packets in
      the stack.
       This approach was attempted here : https://patchwork.ozlabs.org/patch/
      861874/
      
      - Have more robust functions in the core.
        This might be needed anyway for stable versions.
      
      This patch fixes the flow dissection issue.
      
      [1]
      CPU: 1 PID: 3144 Comm: syzkaller271204 Not tainted 4.15.0-rc4-mm1+ #49
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:355 [inline]
       kasan_report+0x23b/0x360 mm/kasan/report.c:413
       __asan_report_load2_noabort+0x14/0x20 mm/kasan/report.c:432
       __tcp_hdrlen include/linux/tcp.h:35 [inline]
       tcp_hdrlen include/linux/tcp.h:40 [inline]
       qdisc_pkt_len_init net/core/dev.c:3160 [inline]
       __dev_queue_xmit+0x20d3/0x2200 net/core/dev.c:3465
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3554
       packet_snd net/packet/af_packet.c:2943 [inline]
       packet_sendmsg+0x3ad5/0x60a0 net/packet/af_packet.c:2968
       sock_sendmsg_nosec net/socket.c:628 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:638
       sock_write_iter+0x31a/0x5d0 net/socket.c:907
       call_write_iter include/linux/fs.h:1776 [inline]
       new_sync_write fs/read_write.c:469 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:482
       vfs_write+0x189/0x510 fs/read_write.c:544
       SYSC_write fs/read_write.c:589 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:581
       entry_SYSCALL_64_fastpath+0x1f/0x96
      
      Fixes: 34fad54c ("net: __skb_flow_dissect() must cap its return value")
      Fixes: a6e544b0 ("flow_dissector: Jump to exit code in __skb_flow_dissect")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0c081b4
  7. 18 1月, 2018 2 次提交
    • K
      net: Remove spinlock from get_net_ns_by_id() · 42157277
      Kirill Tkhai 提交于
      idr_find() is safe under rcu_read_lock() and
      maybe_get_net() guarantees that net is alive.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42157277
    • K
      net: Fix possible race in peernet2id_alloc() · 0c06bea9
      Kirill Tkhai 提交于
      peernet2id_alloc() is racy without rtnl_lock() as refcount_read(&peer->count)
      under net->nsid_lock does not guarantee, peer is alive:
      
      rcu_read_lock()
      peernet2id_alloc()                            ..
        spin_lock_bh(&net->nsid_lock)               ..
        refcount_read(&peer->count) (!= 0)          ..
        ..                                          put_net()
        ..                                            cleanup_net()
        ..                                              for_each_net(tmp)
        ..                                                spin_lock_bh(&tmp->nsid_lock)
        ..                                                __peernet2id(tmp, net) == -1
        ..                                                    ..
        ..                                                    ..
          __peernet2id_alloc(alloc == true)                   ..
        ..                                                    ..
      rcu_read_unlock()                                       ..
      ..                                                synchronize_rcu()
      ..                                                kmem_cache_free(net)
      
      After the above situation, net::netns_id contains id pointing to freed memory,
      and any other dereferencing by the id will operate with this freed memory.
      
      Currently, peernet2id_alloc() is used under rtnl_lock() everywhere except
      ovs_vport_cmd_fill_info(), and this race can't occur. But peernet2id_alloc()
      is generic interface, and better we fix it before someone really starts
      use it in wrong context.
      
      v2: Don't place refcount_read(&net->count) under net->nsid_lock
          as suggested by Eric W. Biederman <ebiederm@xmission.com>
      v3: Rebase on top of net-next
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c06bea9
  8. 17 1月, 2018 5 次提交
  9. 16 1月, 2018 2 次提交
  10. 15 1月, 2018 1 次提交
  11. 13 1月, 2018 2 次提交
  12. 10 1月, 2018 5 次提交
  13. 09 1月, 2018 1 次提交
  14. 06 1月, 2018 4 次提交
    • J
      bpf: finally expose xdp_rxq_info to XDP bpf-programs · 02dd3291
      Jesper Dangaard Brouer 提交于
      Now all XDP driver have been updated to setup xdp_rxq_info and assign
      this to xdp_buff->rxq.  Thus, it is now safe to enable access to some
      of the xdp_rxq_info struct members.
      
      This patch extend xdp_md and expose UAPI to userspace for
      ingress_ifindex and rx_queue_index.  Access happens via bpf
      instruction rewrite, that load data directly from struct xdp_rxq_info.
      
      * ingress_ifindex map to xdp_rxq_info->dev->ifindex
      * rx_queue_index  map to xdp_rxq_info->queue_index
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      02dd3291
    • J
      xdp: generic XDP handling of xdp_rxq_info · e817f856
      Jesper Dangaard Brouer 提交于
      Hook points for xdp_rxq_info:
       * reg  : netif_alloc_rx_queues
       * unreg: netif_free_rx_queues
      
      The net_device have some members (num_rx_queues + real_num_rx_queues)
      and data-area (dev->_rx with struct netdev_rx_queue's) that were
      primarily used for exporting information about RPS (CONFIG_RPS) queues
      to sysfs (CONFIG_SYSFS).
      
      For generic XDP extend struct netdev_rx_queue with the xdp_rxq_info,
      and remove some of the CONFIG_SYSFS ifdefs.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      e817f856
    • J
      xdp/qede: setup xdp_rxq_info and intro xdp_rxq_info_is_reg · c0124f32
      Jesper Dangaard Brouer 提交于
      The driver code qede_free_fp_array() depend on kfree() can be called
      with a NULL pointer. This stems from the qede_alloc_fp_array()
      function which either (kz)alloc memory for fp->txq or fp->rxq.
      This also simplifies error handling code in case of memory allocation
      failures, but xdp_rxq_info_unreg need to know the difference.
      
      Introduce xdp_rxq_info_is_reg() to handle if a memory allocation fails
      and detect this is the failure path by seeing that xdp_rxq_info was
      not registred yet, which first happens after successful alloaction in
      qede_init_fp().
      
      Driver hook points for xdp_rxq_info:
       * reg  : qede_init_fp
       * unreg: qede_free_fp_array
      
      Tested on actual hardware with samples/bpf program.
      
      V2: Driver have no proper error path for failed XDP RX-queue info reg, as
      qede_init_fp() is a void function.
      
      Cc: everest-linux-l2@cavium.com
      Cc: Ariel Elior <Ariel.Elior@cavium.com>
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      c0124f32
    • J
      xdp: base API for new XDP rx-queue info concept · aecd67b6
      Jesper Dangaard Brouer 提交于
      This patch only introduce the core data structures and API functions.
      All XDP enabled drivers must use the API before this info can used.
      
      There is a need for XDP to know more about the RX-queue a given XDP
      frames have arrived on.  For both the XDP bpf-prog and kernel side.
      
      Instead of extending xdp_buff each time new info is needed, the patch
      creates a separate read-mostly struct xdp_rxq_info, that contains this
      info.  We stress this data/cache-line is for read-only info.  This is
      NOT for dynamic per packet info, use the data_meta for such use-cases.
      
      The performance advantage is this info can be setup at RX-ring init
      time, instead of updating N-members in xdp_buff.  A possible (driver
      level) micro optimization is that xdp_buff->rxq assignment could be
      done once per XDP/NAPI loop.  The extra pointer deref only happens for
      program needing access to this info (thus, no slowdown to existing
      use-cases).
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      aecd67b6
  15. 05 1月, 2018 1 次提交
    • A
      rtnetlink: give a user socket to get_target_net() · f428fe4a
      Andrei Vagin 提交于
      This function is used from two places: rtnl_dump_ifinfo and
      rtnl_getlink. In rtnl_getlink(), we give a request skb into
      get_target_net(), but in rtnl_dump_ifinfo, we give a response skb
      into get_target_net().
      The problem here is that NETLINK_CB() isn't initialized for the response
      skb. In both cases we can get a user socket and give it instead of skb
      into get_target_net().
      
      This bug was found by syzkaller with this call-trace:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 3149 Comm: syzkaller140561 Not tainted 4.15.0-rc4-mm1+ #47
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:__netlink_ns_capable+0x8b/0x120 net/netlink/af_netlink.c:868
      RSP: 0018:ffff8801c880f348 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff8443f900
      RDX: 000000000000007b RSI: ffffffff86510f40 RDI: 00000000000003d8
      RBP: ffff8801c880f360 R08: 0000000000000000 R09: 1ffff10039101e4f
      R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff86510f40
      R13: 000000000000000c R14: 0000000000000004 R15: 0000000000000011
      FS:  0000000001a1a880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020151000 CR3: 00000001c9511005 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        netlink_ns_capable+0x26/0x30 net/netlink/af_netlink.c:886
        get_target_net+0x9d/0x120 net/core/rtnetlink.c:1765
        rtnl_dump_ifinfo+0x2e5/0xee0 net/core/rtnetlink.c:1806
        netlink_dump+0x48c/0xce0 net/netlink/af_netlink.c:2222
        __netlink_dump_start+0x4f0/0x6d0 net/netlink/af_netlink.c:2319
        netlink_dump_start include/linux/netlink.h:214 [inline]
        rtnetlink_rcv_msg+0x7f0/0xb10 net/core/rtnetlink.c:4485
        netlink_rcv_skb+0x21e/0x460 net/netlink/af_netlink.c:2441
        rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4540
        netlink_unicast_kernel net/netlink/af_netlink.c:1308 [inline]
        netlink_unicast+0x4be/0x6a0 net/netlink/af_netlink.c:1334
        netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1897
      
      Cc: Jiri Benc <jbenc@redhat.com>
      Fixes: 79e1ad14 ("rtnetlink: use netnsid to query interface")
      Signed-off-by: NAndrei Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f428fe4a