1. 28 Aug 2018: 3 commits
    • bpf: sockmap, decrement copied count correctly in redirect error case · 501ca817
      John Fastabend authored
      Currently, when a redirect occurs in sockmap and an error occurs in
      the redirect call we unwind the scatterlist once in the error path
      of bpf_tcp_sendmsg_do_redirect() and then again in sendmsg(). Then
      in the error path of sendmsg we decrement the copied count by the
      send size.
      
      However, it's possible we partially sent data before the error was
      generated. This can happen if do_tcp_sendpages() partially sends the
      scatterlist before encountering a memory pressure error. If this
      happens we need to decrement the copied value (the value tracking
      how many bytes were actually sent to the TCP stack) by the number of
      remaining bytes, _not_ the entire send size. Otherwise we risk
      confusing userspace.
      
      Also, we don't need two calls to free the scatterlist; one is
      good enough. So remove the one in bpf_tcp_sendmsg_do_redirect() and
      then properly reduce copied by the number of remaining bytes, which
      may in fact be the entire send size if no bytes were sent (a minimal
      model of the accounting follows this entry).
      
      To do this, use a bool to indicate whether free_start_sg() should do
      memory accounting or not.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
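      A minimal user-space model of the accounting change (illustrative only,
      not the kernel code; all names here are hypothetical):

          #include <stddef.h>

          /* On a partial send followed by an error, "copied" must only be
           * reduced by what was NOT handed to the TCP stack, not by the
           * whole send size.
           */
          static size_t adjust_copied_on_error(size_t copied, size_t send_size,
                                               size_t sent_bytes)
          {
                  size_t remaining = send_size - sent_bytes; /* == send_size if nothing was sent */

                  /* old (buggy) behaviour: copied -= send_size; */
                  return copied - remaining;
          }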
    • bpf, sockmap: fix psock refcount leak in bpf_tcp_recvmsg · 15c480ef
      Daniel Borkmann authored
      In bpf_tcp_recvmsg() we first take a reference on the psock; however,
      once we find that there are skbs in the normal socket's receive queue
      we return and process them through tcp_recvmsg(). The problem is that
      we leak the taken reference on the psock in that path. Given we don't
      really do anything with the psock at this point, move the skb_queue_empty()
      test before we fetch the psock to fix this case (a minimal model of the
      reordering follows this entry).
      
      Fixes: 8934ce2f ("bpf: sockmap redirect ingress support")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
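      A minimal user-space model of the reordering (not the kernel code; the
      psock/queue names are stand-ins):

          #include <stdbool.h>

          struct refcounted { int refs; };

          static void get_ref(struct refcounted *r) { r->refs++; }
          static void put_ref(struct refcounted *r) { r->refs--; }

          /* Buggy ordering: the early-return path never drops the reference. */
          static int recvmsg_buggy(struct refcounted *psock, bool queue_has_skbs)
          {
                  get_ref(psock);
                  if (queue_has_skbs)
                          return 0;        /* leak: missing put_ref(psock) */
                  /* ... sockmap ingress processing ... */
                  put_ref(psock);
                  return 0;
          }

          /* Fixed ordering: test the receive queue before taking the reference. */
          static int recvmsg_fixed(struct refcounted *psock, bool queue_has_skbs)
          {
                  if (queue_has_skbs)
                          return 0;        /* psock untouched, nothing to leak */
                  get_ref(psock);
                  /* ... sockmap ingress processing ... */
                  put_ref(psock);
                  return 0;
          }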
    • bpf, sockmap: fix potential use after free in bpf_tcp_close · e06fa9c1
      Daniel Borkmann authored
      In bpf_tcp_close() we pop the psock linkage to a map via psock_map_pop().
      A parallel update on the sock hash map can happen between psock_map_pop()
      and lookup_elem_raw() where we override the element under link->hash /
      link->key. In bpf_tcp_close()'s lookup_elem_raw() we subsequently only
      test whether an element is present, but we do not test whether the
      element is in fact the element we were looking for.
      
      We lock the sock in bpf_tcp_close() during that time, and so does
      sock_hash_update_elem(). However, the latter locks the sock which is
      newly updated, not the one we're purging from the hash table. This
      means that while one CPU is doing the lookup from bpf_tcp_close(),
      another CPU can do the map update in parallel, drop our sock from
      the hlist and release the psock.
      
      Subsequently the first CPU will find the new sock and attempt to drop
      and release the old sock yet another time. The fix is to check the
      element for a match after the lookup, similar to what we do in the
      sock map (a simplified model of the check follows this entry). Note
      that the hash tab elems are freed via RCU, so access to their
      link->hash / link->key is fine since we're under the RCU read side there.
      
      Fixes: e9db4ef6 ("bpf: sockhash fix omitted bucket lock in sock_close")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
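      A simplified model of the added match check (user-space sketch; struct
      elem and its fields are stand-ins for the real sock hash element):

          #include <stdbool.h>
          #include <string.h>

          struct elem { unsigned int hash; char key[16]; void *sk; };

          /* Only drop the element if it really is the one this socket was
           * linked to; a concurrent update may already have replaced it,
           * in which case the new entry must be left alone.
           */
          static bool should_drop(const struct elem *found, unsigned int link_hash,
                                  const void *link_key, size_t key_len)
          {
                  return found &&
                         found->hash == link_hash &&
                         memcmp(found->key, link_key, key_len) == 0;
          }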
  2. 24 Aug 2018: 1 commit
  3. 23 Aug 2018: 3 commits
  4. 18 Aug 2018: 1 commit
    • bpf: fix redirect to map under tail calls · f6069b9a
      Daniel Borkmann authored
      Commits 109980b8 ("bpf: don't select potentially stale ri->map
      from buggy xdp progs") and 7c300131 ("bpf: fix ri->map_owner
      pointer on bpf_prog_realloc") tried to mitigate the case where buggy
      programs using the bpf_redirect_map() helper leave stale maps behind.
      The idea was to add a map_owner cookie into the per-CPU struct
      redirect_info, set to prog->aux by the prog making the helper call,
      as proof that the map is not stale since the prog is implicitly
      holding a reference to it. This owner cookie could later be compared
      against the program calling into BPF to check whether they match, and
      if so the redirect could proceed with processing the map safely.
      
      In (obvious) hindsight, this approach breaks down when tail calls are
      involved, since the original caller's prog->aux pointer does not have
      to match the one from any of the progs in the tail call chain, and
      therefore the xdp buffer will be dropped instead of redirected. A way
      around that is to fix the issue differently (which also allows removing
      related work from the fast path at the same time): once the lifetime
      of a redirect map has come to its end, we use its map free callback,
      where we wait on synchronize_rcu() for currently outstanding xdp
      buffers and then remove such a map pointer from the redirect info if
      found to be present. At that time no program is using this map anymore,
      so we simply invalidate the map pointers to NULL iff they previously
      pointed to that instance, while making sure that the redirect path only
      reads out the map once (a simplified model of this teardown follows
      this entry).
      
      Fixes: 97f91a7c ("bpf: add bpf_redirect_map helper routine")
      Fixes: 109980b8 ("bpf: don't select potentially stale ri->map from buggy xdp progs")
      Reported-by: Sebastiano Miano <sebastiano.miano@polito.it>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
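      A simplified user-space model of the new teardown path (NR_CPUS,
      redirect_info and wait_for_outstanding_xdp_buffers() are stand-ins for
      the kernel's per-CPU data and synchronize_rcu()):

          #define NR_CPUS 8

          struct redirect_info { void *map; };
          static struct redirect_info redirect_info[NR_CPUS];

          static void wait_for_outstanding_xdp_buffers(void)
          {
                  /* synchronize_rcu() in the kernel: after this, no XDP
                   * program can still be running with a reference to map. */
          }

          static void redirect_map_free(void *map)
          {
                  int cpu;

                  wait_for_outstanding_xdp_buffers();
                  for (cpu = 0; cpu < NR_CPUS; cpu++) {
                          /* invalidate only if this CPU still points at *this* map */
                          if (redirect_info[cpu].map == map)
                                  redirect_info[cpu].map = NULL;
                  }
          }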
  5. 17 Aug 2018: 4 commits
    • bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist · 585f5a62
      Daniel Borkmann authored
      The current code in sock_map_ctx_update_elem() allows for the BPF_EXIST
      and BPF_NOEXIST map update flags. While on array-like maps this approach
      is rather uncommon, e.g. bpf_fd_array_map_update_elem() and others
      enforce map update flags to be BPF_ANY such that xchg() can be used
      directly, the current implementation in sock map does not guarantee
      that such an operation with BPF_EXIST / BPF_NOEXIST is atomic.
      
      The initial test does a READ_ONCE(stab->sock_map[i]) to fetch the
      socket from the slot, which is then tested for NULL / non-NULL. However,
      later, after __sock_map_ctx_update_elem(), the actual update is done
      through osock = xchg(&stab->sock_map[i], sock). The problem is that in
      the meantime a different CPU could have updated / deleted a socket
      on that specific slot and thus the flag constraints won't hold anymore.
      
      I've been considering whether it would be best to just break UAPI and
      enforce BPF_ANY to see if anyone actually complains; the trouble,
      however, is that the BPF kselftests already use BPF_NOEXIST for the map
      update, and therefore it might have been copied into applications
      already. The fix that keeps the current behavior intact is to add a map
      lock similar to the sock hash bucket lock, only covering the whole map
      (a simplified sketch follows this entry).
      
      Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
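      A simplified sketch of making the flag test and the slot exchange one
      atomic step under a single map-wide lock (user-space model, not the
      sockmap code):

          #include <pthread.h>

          #define BPF_NOEXIST 1   /* matches the uapi flag values */
          #define BPF_EXIST   2

          struct sock_slot { void *sock; };

          static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

          static int slot_update(struct sock_slot *slot, void *new_sock, int flags)
          {
                  int err = 0;

                  pthread_mutex_lock(&map_lock);
                  if (flags == BPF_NOEXIST && slot->sock)
                          err = -17;              /* -EEXIST */
                  else if (flags == BPF_EXIST && !slot->sock)
                          err = -2;               /* -ENOENT */
                  else
                          slot->sock = new_sock;  /* xchg() in the real code */
                  pthread_mutex_unlock(&map_lock);
                  return err;
          }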
    • bpf, sockmap: fix map elem deletion race with smap_stop_sock · 166ab6f0
      Daniel Borkmann authored
      smap_start_sock() and smap_stop_sock() are each protected under
      the sock->sk_callback_lock at their call-sites, except in the case
      of sock_map_delete_elem() where we drop the old socket from the map
      slot. This is racy because the same sock could be part of multiple
      sock maps, so we could run smap_stop_sock() in parallel, and given
      that at that point psock->strp_enabled might be true on both CPUs, we
      might for example wrongly restore sk->sk_data_ready / sk->sk_write_space.
      Therefore, hold the sock->sk_callback_lock on delete as well. It looks
      like 2f857d04 ("bpf: sockmap, remove STRPARSER map_flags and add
      multi-map support") had this right, but later e9db4ef6 ("bpf:
      sockhash fix omitted bucket lock in sock_close") removed it again
      from delete, leaving this smap_stop_sock() instance unprotected.
      
      Fixes: e9db4ef6 ("bpf: sockhash fix omitted bucket lock in sock_close")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, sockmap: fix leakage of smap_psock_map_entry · d40b0116
      Daniel Borkmann authored
      While working on sockmap I noticed that we do not always kfree the
      struct smap_psock_map_entry list elements which track psocks attached
      to maps. In the case of sock_hash_ctx_update_elem(), these map entries
      are allocated outside of __sock_map_ctx_update_elem() with their
      linkage to the socket hash table filled. In the case of sock array,
      the map entries are allocated inside of __sock_map_ctx_update_elem()
      and added with their linkage to the psock->maps. Both additions are
      under psock->maps_lock each.
      
      Now, we drop these elements from their psock->maps list on a few
      occasions: i) in sock array via smap_list_map_remove() when an entry
      is either deleted from the map from user space, or updated via
      user space or BPF program where we drop the old socket at that map
      slot, or the sock array is freed via sock_map_free() and drops all
      its elements; ii) for sock hash via smap_list_hash_remove() in exactly
      the same occasions as just described for sock array; iii) in the
      bpf_tcp_close() where we remove the elements from the list via
      psock_map_pop() and iterate over them dropping themselves from either
      sock array or sock hash; and last but not least iv) once again in
      smap_gc_work(), which is a callback deferring the work once the
      psock refcount hits zero and thus the socket is being destroyed.
      
      The problem is that the only case where we kfree() the list entry is
      case iv), which at that point should have an empty list in normal
      cases. So in cases i) to iii) we unlink the elements without freeing
      them, and they go out of reach from us. Hence the fix is to properly
      kfree() them as well to stop the leakage (a minimal model follows this
      entry). Given these are all handled under psock->maps_lock, there is
      no need for deferred RCU freeing.
      
      I later also ran with the kmemleak detector, and it confirmed the
      finding: before the fix the object goes unreferenced, while after the
      patch no kmemleak report related to BPF showed up.
      
        [...]
        unreferenced object 0xffff880378eadae0 (size 64):
          comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
          hex dump (first 32 bytes):
            00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
            50 4d 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  PMu]............
          backtrace:
            [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
            [<0000000045dd6d3c>] bpf_sock_map_update+0x29/0x60
            [<00000000877723aa>] ___bpf_prog_run+0x1e1f/0x4960
            [<000000002ef89e83>] 0xffffffffffffffff
        unreferenced object 0xffff880378ead240 (size 64):
          comm "test_sockmap", pid 2225, jiffies 4294720701 (age 43.504s)
          hex dump (first 32 bytes):
            00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
            00 44 75 5d 03 88 ff ff 00 00 00 00 00 00 00 00  .Du]............
          backtrace:
            [<000000005225ac3c>] sock_map_ctx_update_elem.isra.21+0xd8/0x210
            [<0000000030e37a3a>] sock_map_update_elem+0x125/0x240
            [<000000002e5ce36e>] map_update_elem+0x4eb/0x7b0
            [<00000000db453cc9>] __x64_sys_bpf+0x1f9/0x360
            [<0000000000763660>] do_syscall_64+0x9a/0x300
            [<00000000422a2bb2>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
            [<000000002ef89e83>] 0xffffffffffffffff
        [...]
      
      Fixes: e9db4ef6 ("bpf: sockhash fix omitted bucket lock in sock_close")
      Fixes: 54fedb42 ("bpf: sockmap, fix smap_list_map_remove when psock is in many maps")
      Fixes: 2f857d04 ("bpf: sockmap, remove STRPARSER map_flags and add multi-map support")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
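      A minimal model of the fix for cases i) to iii): unlinking an entry
      from the psock's list must also free it (struct map_entry stands in for
      smap_psock_map_entry):

          #include <stdlib.h>

          struct map_entry {
                  struct map_entry *next;
                  void *slot;             /* the map slot this psock is linked to */
          };

          static void list_remove_and_free(struct map_entry **head, const void *slot)
          {
                  struct map_entry **pp = head, *e;

                  while ((e = *pp) != NULL) {
                          if (e->slot == slot) {
                                  *pp = e->next;  /* unlink ...             */
                                  free(e);        /* ... and free (the fix) */
                                  continue;
                          }
                          pp = &e->next;
                  }
          }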
    • bpf: fix a rcu usage warning in bpf_prog_array_copy_core() · 965931e3
      Yonghong Song authored
      Commit 394e40a2 ("bpf: extend bpf_prog_array to store pointers
      to the cgroup storage") refactored bpf_prog_array_copy_core()
      to accommodate the new structure bpf_prog_array_item, which contains
      the bpf_prog itself.
      
      In the old code, we had
         perf_event_query_prog_array():
           mutex_lock(...)
           bpf_prog_array_copy_call():
             prog = rcu_dereference_check(array, 1)->progs
             bpf_prog_array_copy_core(prog, ...)
           mutex_unlock(...)
      
      With the above commit, we had
         perf_event_query_prog_array():
           mutex_lock(...)
           bpf_prog_array_copy_call():
             bpf_prog_array_copy_core(array, ...):
               item = rcu_dereference(array)->items;
               ...
           mutex_unlock(...)
      
      The new code triggers a lockdep RCU checking warning.
      The fix is to change rcu_dereference() to rcu_dereference_check()
      to prevent such a warning (a schematic of the idiom follows this entry).
      
      Reported-by: syzbot+6e72317008eef84a216b@syzkaller.appspotmail.com
      Fixes: 394e40a2 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage")
      Cc: Roman Gushchin <guro@fb.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
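      Schematically, the idiom looks like this (kernel-side pseudo-snippet;
      caller_mutex is a placeholder and the exact condition used by the fix
      may differ):

          /* rcu_dereference() alone warns when called outside an RCU read-side
           * critical section; rcu_dereference_check() lets the caller state an
           * additional condition (e.g. a mutex held around the copy) under
           * which the dereference is also considered safe by lockdep.
           */
          item = rcu_dereference_check(array, lockdep_is_held(&caller_mutex))->items;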
  6. 13 Aug 2018: 1 commit
    • bpf: decouple btf from seq bpf fs dump and enable more maps · e8d2bec0
      Daniel Borkmann authored
      Commits a26ca7c9 ("bpf: btf: Add pretty print support to
      the basic arraymap") and 699c86d6 ("bpf: btf: add pretty
      print for hash/lru_hash maps") enabled support for BTF and
      dumping via BPF fs for array and hash/lru maps. However, the two
      can be decoupled from each other such that regular BPF maps
      can support attaching BTF key/value information, while not all
      maps necessarily need to dump via the map_seq_show_elem()
      callback.
      
      The basic sanity check, which is a prerequisite for all maps,
      is that the key/value size has to match in any case, and some maps
      can add extra checks via the map_check_btf() callback, e.g.
      probing certain types or indicating no support in general. With
      that we can also enable retrieving BTF info for per-cpu map
      types and the LPM trie map (a user-space sketch of attaching BTF at
      map creation follows this entry).
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
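      A hedged user-space sketch of attaching BTF key/value type ids at map
      creation time via the raw bpf(2) syscall (assumes a BTF object was
      already loaded and btf_fd / key_tid / value_tid refer to it):

          #include <string.h>
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <linux/bpf.h>

          static int create_map_with_btf(int btf_fd, __u32 key_tid, __u32 value_tid)
          {
                  union bpf_attr attr;

                  memset(&attr, 0, sizeof(attr));
                  attr.map_type          = BPF_MAP_TYPE_HASH;
                  attr.key_size          = 4;     /* must match the BTF key type size   */
                  attr.value_size        = 8;     /* must match the BTF value type size */
                  attr.max_entries       = 16;
                  attr.btf_fd            = btf_fd;
                  attr.btf_key_type_id   = key_tid;
                  attr.btf_value_type_id = value_tid;

                  return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
          }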
  7. 11 Aug 2018: 4 commits
    • bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT · 2dbb9b9e
      Martin KaFai Lau authored
      This patch adds a BPF_PROG_TYPE_SK_REUSEPORT which can select
      a SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY.  Like other
      non-SK_FILTER/CGROUP_SKB programs, it requires CAP_SYS_ADMIN.
      
      BPF_PROG_TYPE_SK_REUSEPORT introduces "struct sk_reuseport_kern"
      to store the bpf context instead of using the skb->cb[48].
      
      At SO_REUSEPORT sk lookup time, we are in the middle of transiting
      from a lower layer (ipv4/ipv6) to an upper layer (udp/tcp).  At this
      point, it is not always clear where the bpf context can be appended
      in the skb->cb[48] to avoid saving-and-restoring cb[], even putting
      aside the difference between ipv4-vs-ipv6 and udp-vs-tcp.  It is also
      not clear whether the lower layer will only ever be ipv4 and ipv6 in
      the future, and whether it will never touch the cb[] again before
      transiting to the upper layer.
      
      For example, in udp_gro_receive(), it uses the 48 byte NAPI_GRO_CB
      instead of IP[6]CB and it may still modify the cb[] after calling
      the udp[46]_lib_lookup_skb().  Because of the above reasons, if
      skb->cb is used for the bpf ctx, saving-and-restoring is needed
      and likely the whole 48 bytes of cb[] have to be saved and restored.
      
      Instead of saving, setting and restoring the cb[], this patch opts
      to create a new "struct sk_reuseport_kern" and set the needed
      values in there.
      
      The new BPF_PROG_TYPE_SK_REUSEPORT and "struct sk_reuseport_(kern|md)"
      will serve all ipv4/ipv6 + udp/tcp combinations.  There is no protocol
      specific usage at this point and it is also in line with the current
      sock_reuseport.c implementation (i.e. no protocol specific requirement).
      
      In "struct sk_reuseport_md", this patch exposes data/data_end/len
      with semantic similar to other existing usages.  Together
      with "bpf_skb_load_bytes()" and "bpf_skb_load_bytes_relative()",
      the bpf prog can peek anywhere in the skb.  The "bind_inany" tells
      the bpf prog that the reuseport group is bind-ed to a local
      INANY address which cannot be learned from skb.
      
      The new "bind_inany" is added to "struct sock_reuseport" which will be
      used when running the new "BPF_PROG_TYPE_SK_REUSEPORT" bpf prog in order
      to avoid repeating the "bind INANY" test on
      "sk_v6_rcv_saddr/sk->sk_rcv_saddr" every time a bpf prog is run.  It can
      only be properly initialized when a "sk->sk_reuseport" enabled sk is
      adding to a hashtable (i.e. during "reuseport_alloc()" and
      "reuseport_add_sock()").
      
      The new "sk_select_reuseport()" is the main helper that the
      bpf prog will use to select a SO_REUSEPORT sk.  It is the only function
      that can use the new BPF_MAP_TYPE_REUSEPORT_ARRAY.  As mentioned in
      the earlier patch, the validity of a selected sk is checked at
      run time in "sk_select_reuseport()".  Doing the check at
      verification time is difficult and inflexible (consider the map-in-map
      use case).  The runtime check is to compare the selected sk's reuseport_id
      with the reuseport_id that we want.  This helper will return -EXXX if the
      selected sk cannot serve the incoming request (e.g. reuseport_id
      mismatch).  The bpf prog can decide whether it wants to do SK_DROP at
      its discretion.
      
      When the bpf prog returns SK_PASS, the kernel will check if a
      valid sk has been selected (i.e. "reuse_kern->selected_sk != NULL").
      If it has, it will use the selected sk.  If not, the kernel
      will select one from "reuse->socks[]" (as before this patch).
      
      The SK_DROP and SK_PASS handling logic will be in the next patch.
      A schematic sk_reuseport program follows this entry.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
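      A schematic BPF_PROG_TYPE_SK_REUSEPORT program (sketch only; the helper
      include path and the map definition style depend on the libbpf/selftests
      setup of that era, and the slot-selection logic is purely illustrative):

          #include <linux/bpf.h>
          #include "bpf_helpers.h"        /* selftests-style helper declarations */

          struct bpf_map_def SEC("maps") reuseport_map = {
                  .type        = BPF_MAP_TYPE_REUSEPORT_SOCKARRAY,
                  .key_size    = sizeof(__u32),
                  .value_size  = sizeof(__u32),   /* fd on update, cookie on lookup */
                  .max_entries = 16,
          };

          SEC("sk_reuseport")
          int select_sk(struct sk_reuseport_md *md)
          {
                  __u32 index = md->hash % 16;    /* e.g. pick a slot by flow hash */

                  /* On success the kernel uses reuse_kern->selected_sk; on
                   * failure (e.g. empty slot, reuseport_id mismatch) we can
                   * still fall back or reject with SK_DROP.
                   */
                  if (bpf_sk_select_reuseport(md, &reuseport_map, &index, 0) == 0)
                          return SK_PASS;
                  return SK_DROP;
          }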
    • bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY · 5dc4c4b7
      Martin KaFai Lau authored
      This patch introduces a new map type BPF_MAP_TYPE_REUSEPORT_SOCKARRAY.
      
      To unleash the full potential of a bpf prog, it is essential for
      userspace to be capable of directly setting up a bpf map which can then
      be consumed by the bpf prog to make a decision.  In this case, deciding
      which SO_REUSEPORT sk should serve the incoming request.
      
      By adding BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, the userspace has total control
      and visibility on where a SO_REUSEPORT sk should be located in a bpf map.
      The later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT such that
      the bpf prog can directly select a sk from the bpf map.  That will
      raise the programmability of the bpf prog attached to a reuseport
      group (a group of sk serving the same IP:PORT).
      
      For example, in UDP, the bpf prog can peek into the payload (e.g.
      through the "data" pointer introduced in the later patch) to learn
      the application level's connection information and then decide which sk
      to pick from a bpf map.  The userspace can tightly couple the sk's location
      in a bpf map with the application logic in generating the UDP payload's
      connection information.  This connection info contract/API stays within
      userspace.
      
      Also, when used with map-in-map, the userspace can switch the
      old-server-process's inner map to a new-server-process's inner map
      in one call "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)".
      The bpf prog will then direct incoming requests to the new process instead
      of the old process.  The old process can finish draining the pending
      requests (e.g. by "accept()") before closing the old fds.  [Note that
      deleting a fd from a bpf map does not necessarily mean the fd is closed]
      
      During map_update_elem(), only a SO_REUSEPORT sk (i.e. one which has
      already been added to a reuse->socks[]) can be used.  That means a
      SO_REUSEPORT sk that is "bind()"-ed for UDP or "bind()+listen()"-ed for
      TCP.  These conditions are ensured in "reuseport_array_update_check()".
      
      A SO_REUSEPORT sk can only be added once to a map (i.e. the
      same sk cannot be added twice even to the same map).  SO_REUSEPORT
      already allows another sk to be created for the same IP:PORT.
      There is no need to re-create a similar usage on the BPF side.
      
      When a SO_REUSEPORT sk is deleted from the "reuse->socks[]" (e.g. on
      "close()"), it will notify the bpf map to remove it from the map also.
      This is done through "bpf_sk_reuseport_detach()" and it will only be
      called if >=1 of the "reuse->socks[]" has ever been added to a bpf map.
      
      The map_update()/map_delete() has to be in sync with the
      "reuse->socks[]".  Hence, the same "reuseport_lock" used
      by "reuse->socks[]" has to be used here also.  Care has
      been taken to ensure the lock is only acquired when the
      sk being added passes some strict tests, and
      freeing the map does not require the reuseport_lock.
      
      The reuseport_array will also support lookup from the syscall
      side.  It will return a sock_gen_cookie().  The sock_gen_cookie()
      is on-demand (i.e. a sk's cookie is not generated until the very
      first map_lookup_elem()).
      
      The lookup cookie is 64 bits, but that goes against the logical
      userspace expectation of a 32-bit sizeof(fd) (which other fd based bpf
      maps also follow).  It may catch users by surprise if we enforce
      value_size=8 while userspace still passes a 32-bit fd during update.
      Supporting different value_size between lookup and update seems
      unintuitive as well.
      
      We also need to consider what happens if other existing fd based maps
      want to return 64-bit values from the syscall's lookup in the future.
      Hence, reuseport_array supports both value_size 4 and 8, assuming users
      will usually use value_size=4.  The syscall's lookup will return ENOSPC
      on value_size=4.  It will only return the 64-bit value from
      sock_gen_cookie() when the user consciously chooses value_size=8 (as a
      signal that lookup is desired), which then requires a 64-bit value in
      both lookup and update.  A short user-space sketch of adding a socket
      to this map follows this entry.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
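      A hedged user-space sketch (libbpf) of putting an already bound
      SO_REUSEPORT UDP socket into slot 0 of such a map; error handling is
      omitted and map_fd is assumed to be a BPF_MAP_TYPE_REUSEPORT_SOCKARRAY
      created with value_size 4:

          #include <arpa/inet.h>
          #include <netinet/in.h>
          #include <sys/socket.h>
          #include <bpf/bpf.h>

          static int add_reuseport_sock(int map_fd, unsigned short port)
          {
                  int one = 1, fd = socket(AF_INET, SOCK_DGRAM, 0);
                  struct sockaddr_in addr = {
                          .sin_family = AF_INET,
                          .sin_port   = htons(port),
                  };
                  __u32 index = 0, value;

                  setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));
                  bind(fd, (struct sockaddr *)&addr, sizeof(addr));

                  value = fd;     /* update takes the sk's fd; only a bound
                                     SO_REUSEPORT sk passes the checks */
                  return bpf_map_update_elem(map_fd, &index, &value, BPF_ANY);
          }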
    • bpf: btf: add pretty print for hash/lru_hash maps · 699c86d6
      Yonghong Song authored
      Commit a26ca7c9 ("bpf: btf: Add pretty print support to
      the basic arraymap") added pretty print support to array map.
      This patch adds pretty print for hash and lru_hash maps.
      The following example shows the pretty-print result of
      a pinned hashmap:
      
          struct map_value {
                  int count_a;
                  int count_b;
          };
      
          cat /sys/fs/bpf/pinned_hash_map:
      
          87907: {87907,87908}
          57354: {37354,57355}
          76625: {76625,76626}
          ...
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: fix bpffs non-array map seq_show issue · dc1508a5
      Yonghong Song authored
      In the function map_seq_next() of kernel/bpf/inode.c,
      the first key will be "0" regardless of the map type.
      This works for arrays. But for the hash type, if key "0"
      happens to be in the map, the bpffs map show will miss
      some items if key "0" is not the first element of
      the first bucket.
      
      This patch fixes the issue by guaranteeing to get
      the first element, if the seq_show has just started,
      by passing a NULL pointer key to the map_get_next_key() callback.
      This way, no elements will be missed in the bpffs hash table
      show even if key "0" is in the map (the same idiom is sketched
      from user space after this entry).
      
      Fixes: a26ca7c9 ("bpf: btf: Add pretty print support to the basic arraymap")
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
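      The same "NULL previous key" idiom is what user space uses to start a
      map walk at the true first element (libbpf sketch; the key type is
      assumed to be a __u32):

          #include <bpf/bpf.h>

          static int walk_map(int map_fd)
          {
                  __u32 key, next_key;
                  int err = bpf_map_get_next_key(map_fd, NULL, &next_key);

                  while (!err) {
                          key = next_key;
                          /* ... bpf_map_lookup_elem(map_fd, &key, ...) and print ... */
                          err = bpf_map_get_next_key(map_fd, &key, &next_key);
                  }
                  return 0;
          }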
  8. 10 Aug 2018: 2 commits
  9. 09 Aug 2018: 2 commits
  10. 07 Aug 2018: 1 commit
    • bpf: introduce update_effective_progs() · 85fc4b16
      Roman Gushchin authored
      __cgroup_bpf_attach() and __cgroup_bpf_detach() have
      a good amount of duplicated code, which can be eliminated
      by introducing the update_effective_progs() helper function.
      
      update_effective_progs() calls compute_effective_progs()
      and then, on success, calls activate_effective_progs()
      for each descendant cgroup. In case of failure (OOM), it releases
      the allocated prog arrays and returns the error code.
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  11. 03 Aug 2018: 8 commits
  12. 01 Aug 2018: 1 commit
    • bpf: verifier: MOV64 don't mark dst reg unbounded · fbeb1603
      Arthur Fabre authored
      When check_alu_op() handles a BPF_MOV64 between two registers,
      it calls check_reg_arg(DST_OP) on the dst register, marking it
      as unbounded. If the src and dst register are the same, this
      marks the src as unbounded, which can lead to unexpected errors
      for further checks that rely on bounds info. For example:
      
      	BPF_MOV64_IMM(BPF_REG_2, 0),
      	BPF_MOV64_REG(BPF_REG_2, BPF_REG_2),
      	BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2),
      	BPF_MOV64_IMM(BPF_REG_0, 0),
      	BPF_EXIT_INSN(),
      
      Results in:
      
      	"math between ctx pointer and register with unbounded
      	min value is not allowed"
      
      check_alu_op() now uses check_reg_arg(DST_OP_NO_MARK), and MOVs
      that need to mark the dst register (MOVIMM, MOV32) do so.
      
      Added a test case for MOV64 dst == src, and dst != src.
      Signed-off-by: Arthur Fabre <afabre@cloudflare.com>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  13. 27 Jul 2018: 1 commit
    • bpf: btf: Use exact btf value_size match in map_check_btf() · 5f300e80
      Martin KaFai Lau authored
      The current map_check_btf() in BPF_MAP_TYPE_ARRAY rejects
      '> map->value_size' to ensure map_seq_show_elem() will not
      access things beyond an array element.
      
      Yonghong suggested that using '!=' is a more correct
      check.  The 8-byte round_up of value_size is stored
      in array->elem_size.  Hence, using '!=' against map->value_size
      is the proper check (see the sketch after this entry).
      
      This patch also adds new tests to check the btf array
      key type and value type.  Two of these new tests verify
      the btf's value_size (the change in this patch).
      
      It also fixes two existing tests that wrongly encoded
      a btf's type size (pprint_test) and the value_type_id (in one
      of the raw_tests[]).  However, those wrong encodings do not affect
      these two BTF verification tests before or after this change.
      These two tests mainly failed at array creation time after
      this patch.
      
      Fixes: a26ca7c9 ("bpf: btf: Add pretty print support to the basic arraymap")
      Suggested-by: Yonghong Song <yhs@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
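      A small sketch of the stricter check (user-space model; -22 stands in
      for -EINVAL and the function name is hypothetical):

          #include <stdint.h>

          /* old check: reject only if the BTF value size exceeds value_size */
          /* new check: require an exact match with map->value_size          */
          static int check_btf_value_size(uint32_t map_value_size, uint32_t btf_value_size)
          {
                  return btf_value_size != map_value_size ? -22 /* -EINVAL */ : 0;
          }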
  14. 24 Jul 2018: 1 commit
  15. 20 Jul 2018: 1 commit
  16. 18 Jul 2018: 6 commits