1. 12 Apr 2021, 2 commits
  2. 02 Apr 2021, 6 commits
  3. 27 Feb 2021, 5 commits
  4. 09 Jan 2021, 1 commit
  5. 03 Dec 2020, 2 commits
  6. 16 Oct 2020, 2 commits
  7. 12 Oct 2020, 2 commits
  8. 01 Oct 2020, 1 commit
    • bpf, net: Rework cookie generator as per-cpu one · 92acdc58
      Committed by Daniel Borkmann
      With its use in BPF, the cookie generator can be called very frequently,
      in particular when used out of cgroup v2 hooks (e.g. connect / sendmsg)
      attached to the root cgroup, for example in mixed cgroup v1/v2
      environments. In particular, when there is a high churn of sockets in the
      system, there can be many parallel requests to the bpf_get_socket_cookie()
      and bpf_get_netns_cookie() helpers, which then cause contention on the
      atomic counter.
      
      As was done similarly in f991bd2e ("fs: introduce a per-cpu last_ino
      allocator"), add a small helper library that both can use for their 64-bit
      counters. Given this can be called from different contexts, we also need
      to deal with potential nested calls, even though in practice they are
      considered extremely rare. One idea, suggested by Eric Dumazet, is to use
      a reverse counter for that situation: since we do not expect 64-bit
      overflows anyway, this avoids bigger gaps in the 64-bit counter space
      compared to a plain batch-wise increase. Even on machines with a small
      number of cores (e.g. 4), cookie generation shrinks from min/max/med/avg
      (ns) of 22/50/40/38.9 down to 10/35/14/17.3 when run in parallel from
      multiple CPUs. (A small user-space sketch of the scheme follows this entry.)
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Link: https://lore.kernel.org/bpf/8a80b8d27d3c49f9a14e1d5213c19d8be87d1dc8.1601477936.git.daniel@iogearbox.net
      92acdc58
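
      The commit above describes per-CPU batch allocation with a reverse counter for
      nested calls. Below is a minimal, hedged sketch of that idea in plain user-space
      C, not the kernel code: a thread-local stands in for the per-CPU state, and every
      name here (cookie_next, COOKIE_BATCH, forward_last, reverse_last, the batch size
      of 4096) is illustrative only.

      #include <stdatomic.h>
      #include <stdint.h>
      #include <stdio.h>

      #define COOKIE_BATCH 4096ULL            /* illustrative batch size */

      static _Atomic uint64_t forward_last;               /* shared; grows upward in batches */
      static _Atomic uint64_t reverse_last = UINT64_MAX;  /* shared; grows downward (nested) */

      static _Thread_local uint64_t local_last;  /* last cookie handed out on this "CPU" */
      static _Thread_local int nesting;          /* detects re-entrant calls */

      static uint64_t cookie_next(void)
      {
          uint64_t val;

          if (++nesting == 1) {
              val = local_last;
              if ((val % COOKIE_BATCH) == 0)
                  /* Batch exhausted (or first call): reserve a fresh block of ids
                   * from the shared counter; the only cross-CPU atomic on this path. */
                  val = atomic_fetch_add(&forward_last, COOKIE_BATCH);
              local_last = ++val;
          } else {
              /* Rare nested call: take an id from the top of the 64-bit space so
               * the forward batches stay dense; overflow is not a practical concern. */
              val = atomic_fetch_sub(&reverse_last, 1);
          }
          nesting--;
          return val;
      }

      int main(void)
      {
          for (int i = 0; i < 3; i++)
              printf("cookie %llu\n", (unsigned long long)cookie_next());
          return 0;
      }

      Compiled with a C11 compiler this prints 1, 2, 3; with concurrent callers each
      thread hands out ids from its own batch and only touches the shared counter once
      per 4096 cookies.
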
  9. 29 Sep 2020, 1 commit
  10. 11 Sep 2020, 2 commits
  11. 28 Aug 2020, 1 commit
    • bpf: Add map_meta_equal map ops · f4d05259
      Committed by Martin KaFai Lau
      Some properties of the inner map are used at verification time. When an
      inner map is inserted into an outer map at runtime, bpf_map_meta_equal()
      is currently used to ensure that those properties of the inserted inner
      map stay the same as at verification time.
      
      In particular, the current bpf_map_meta_equal() checks max_entries, which
      turns out to be too restrictive for most maps, which do not use
      max_entries at verification time. It rules out the use case of replacing
      a smaller inner map with a larger inner map. Some maps do use max_entries
      during verification, though. For example, map_gen_lookup in array_map_ops
      uses max_entries to generate the inline lookup code.
      
      To accommodate these differences between maps, a map_meta_equal callback
      is added to bpf_map_ops. Each map type can decide what to check when one
      of its maps is used as an inner map at runtime.
      
      Also, some map types cannot be used as an inner map; they are currently
      blacklisted in bpf_map_meta_alloc() in map_in_map.c. New map types may
      well not be aware that such a blacklist exists. This patch therefore
      enforces an explicit opt-in and only allows a map to be used as an inner
      map if it has implemented the map_meta_equal ops. It is based on the
      discussion in [1].
      
      All maps that support being used as an inner map have their
      map_meta_equal pointing to bpf_map_meta_equal in this patch. A later
      patch will relax the max_entries check for most maps. bpf_types.h counts
      28 map types. This patch adds 23 ".map_meta_equal" entries by using
      coccinelle; the 5 that do not get one are:
      	BPF_MAP_TYPE_PROG_ARRAY
      	BPF_MAP_TYPE_(PERCPU)_CGROUP_STORAGE
      	BPF_MAP_TYPE_STRUCT_OPS
      	BPF_MAP_TYPE_ARRAY_OF_MAPS
      	BPF_MAP_TYPE_HASH_OF_MAPS
      
      The "if (inner_map->inner_map_meta)" check in bpf_map_meta_alloc()
      is moved such that the same error is returned.
      
      [1]: https://lore.kernel.org/bpf/20200522022342.899756-1-kafai@fb.com/
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200828011806.1970400-1-kafai@fb.com
      f4d05259
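
      As a rough illustration of the explicit opt-in described in the entry above,
      here is a small plain-C sketch, not the kernel's bpf_map_ops: a map type is
      usable as an inner map only if its ops provide a meta_equal callback, and that
      callback decides which properties must still match what the verifier saw. All
      structs and names (map, map_ops, insert_inner, default_meta_equal,
      array_meta_equal) are made up for the example; the split over max_entries
      reflects the end state the message describes, where most types ignore it and
      array-like types keep checking it.

      #include <stdbool.h>
      #include <stdio.h>

      struct map_ops;

      struct map {
          int type;
          unsigned int key_size;
          unsigned int value_size;
          unsigned int max_entries;
          const struct map_ops *ops;
      };

      struct map_ops {
          /* NULL means this map type never opted in to map-in-map use. */
          bool (*meta_equal)(const struct map *meta, const struct map *map);
      };

      /* Default check most types can share; max_entries deliberately not compared. */
      static bool default_meta_equal(const struct map *meta, const struct map *map)
      {
          return meta->type == map->type &&
                 meta->key_size == map->key_size &&
                 meta->value_size == map->value_size;
      }

      /* Array-like types also need max_entries, e.g. for inlined lookup code. */
      static bool array_meta_equal(const struct map *meta, const struct map *map)
      {
          return default_meta_equal(meta, map) &&
                 meta->max_entries == map->max_entries;
      }

      static const struct map_ops hash_ops  = { .meta_equal = default_meta_equal };
      static const struct map_ops array_ops = { .meta_equal = array_meta_equal };

      /* Runtime insertion of an inner map into an outer map-of-maps. */
      static int insert_inner(const struct map *verified_meta, const struct map *inner)
      {
          if (!inner->ops->meta_equal)
              return -1;                         /* type never opted in        */
          if (!inner->ops->meta_equal(verified_meta, inner))
              return -1;                         /* properties no longer match */
          return 0;
      }

      int main(void)
      {
          struct map meta  = { .type = 1, .key_size = 4, .value_size = 8,
                               .max_entries = 16, .ops = &hash_ops };
          struct map inner = meta;

          inner.max_entries = 64;                /* larger hash inner map: fine  */
          printf("hash insert:  %d\n", insert_inner(&meta, &inner));

          meta.ops = inner.ops = &array_ops;     /* array-like: sizes must match */
          printf("array insert: %d\n", insert_inner(&meta, &inner));
          return 0;
      }
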
  12. 22 Aug 2020, 4 commits
  13. 01 Jul 2020, 2 commits
  14. 23 Jun 2020, 2 commits
  15. 13 Jun 2020, 2 commits
  16. 10 Jun 2020, 2 commits
    • bpf, sockhash: Synchronize delete from bucket list on map free · 75e68e5b
      Committed by Jakub Sitnicki
      We can end up modifying the sockhash bucket list from two CPUs when a
      sockhash is being destroyed (sock_hash_free) on one CPU, while a socket
      that is in the sockhash is unlinking itself from it on another CPU
      (sock_hash_delete_from_link).
      
      This results in accessing a list element that is in an undefined state as
      reported by KASAN:
      
      | ==================================================================
      | BUG: KASAN: wild-memory-access in sock_hash_free+0x13c/0x280
      | Write of size 8 at addr dead000000000122 by task kworker/2:1/95
      |
      | CPU: 2 PID: 95 Comm: kworker/2:1 Not tainted 5.7.0-rc7-02961-ge22c35ab0038-dirty #691
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      | Workqueue: events bpf_map_free_deferred
      | Call Trace:
      |  dump_stack+0x97/0xe0
      |  ? sock_hash_free+0x13c/0x280
      |  __kasan_report.cold+0x5/0x40
      |  ? mark_lock+0xbc1/0xc00
      |  ? sock_hash_free+0x13c/0x280
      |  kasan_report+0x38/0x50
      |  ? sock_hash_free+0x152/0x280
      |  sock_hash_free+0x13c/0x280
      |  bpf_map_free_deferred+0xb2/0xd0
      |  ? bpf_map_charge_finish+0x50/0x50
      |  ? rcu_read_lock_sched_held+0x81/0xb0
      |  ? rcu_read_lock_bh_held+0x90/0x90
      |  process_one_work+0x59a/0xac0
      |  ? lock_release+0x3b0/0x3b0
      |  ? pwq_dec_nr_in_flight+0x110/0x110
      |  ? rwlock_bug.part.0+0x60/0x60
      |  worker_thread+0x7a/0x680
      |  ? _raw_spin_unlock_irqrestore+0x4c/0x60
      |  kthread+0x1cc/0x220
      |  ? process_one_work+0xac0/0xac0
      |  ? kthread_create_on_node+0xa0/0xa0
      |  ret_from_fork+0x24/0x30
      | ==================================================================
      
      Fix it by reintroducing a spin-lock-protected critical section around the
      code that removes the elements from the bucket on sockhash free.
      
      To do that, we also need to defer processing of the removed elements until
      we are out of atomic context, so that we can unlink the socket from the
      map while holding the sock lock. (A small user-space sketch of this
      splice-then-process pattern follows this entry.)
      
      Fixes: 90db6d77 ("bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200607205229.2389672-3-jakub@cloudflare.com
      75e68e5b
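
      As a hedged illustration only, here is a user-space C sketch of the
      splice-then-process pattern this fix reinstates, assuming pthreads. The names
      (bucket_free, unlink_list, sock_lock) are invented; the spinlock stands in for
      the bucket lock and the mutex for the blocking lock_sock().

      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct elem {
          int sk;                       /* stand-in for the linked socket */
          struct elem *next;
      };

      struct bucket {
          pthread_spinlock_t lock;      /* models the bucket spinlock */
          struct elem *head;
      };

      /* Models lock_sock(): a lock we may block on, so it must never be
       * taken while the bucket spinlock is held.
       */
      static pthread_mutex_t sock_lock = PTHREAD_MUTEX_INITIALIZER;

      static void bucket_free(struct bucket *b)
      {
          struct elem *unlink_list, *e, *next;

          /* Critical section: races with concurrent per-element deletes, but
           * only splices the list off; nothing in here may sleep.
           */
          pthread_spin_lock(&b->lock);
          unlink_list = b->head;
          b->head = NULL;
          pthread_spin_unlock(&b->lock);

          /* Deferred processing, out of atomic context: blocking is fine now. */
          for (e = unlink_list; e; e = next) {
              next = e->next;
              pthread_mutex_lock(&sock_lock);
              printf("unlinking sk %d\n", e->sk);   /* drop the psock link */
              pthread_mutex_unlock(&sock_lock);
              free(e);   /* freeing the node is the subject of the next entry */
          }
      }

      int main(void)
      {
          struct bucket b;
          struct elem *e = malloc(sizeof(*e));

          e->sk = 42;
          e->next = NULL;
          pthread_spin_init(&b.lock, PTHREAD_PROCESS_PRIVATE);
          b.head = e;

          bucket_free(&b);
          pthread_spin_destroy(&b.lock);
          return 0;
      }

      The point mirrored from the commit: nothing that can block happens while the
      spinlock is held; elements are only spliced off, then handled one by one
      afterwards.
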
    • bpf, sockhash: Fix memory leak when unlinking sockets in sock_hash_free · 33a7c831
      Committed by Jakub Sitnicki
      When a sockhash gets destroyed while sockets are still linked to it, we
      walk the bucket lists and delete the links. However, the list elements are
      never freed after they have been processed, so their memory leaks.
      
      The leak can be triggered by close()'ing a sockhash map while it still
      contains sockets, and observed with kmemleak:
      
        unreferenced object 0xffff888116e86f00 (size 64):
          comm "race_sock_unlin", pid 223, jiffies 4294731063 (age 217.404s)
          hex dump (first 32 bytes):
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            81 de e8 41 00 00 00 00 c0 69 2f 15 81 88 ff ff  ...A.....i/.....
          backtrace:
            [<00000000dd089ebb>] sock_hash_update_common+0x4ca/0x760
            [<00000000b8219bd5>] sock_hash_update_elem+0x1d2/0x200
            [<000000005e2c23de>] __do_sys_bpf+0x2046/0x2990
            [<00000000d0084618>] do_syscall_64+0xad/0x9a0
            [<000000000d96f263>] entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      Fix it by freeing the list element when we are done with it. (A tiny
      sketch of this follows the entry.)
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200607205229.2389672-2-jakub@cloudflare.com
      33a7c831
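
      For completeness, a tiny plain-C sketch of the leak described above and of the
      fix, under the same caveat that the names and list layout are illustrative
      rather than the sockhash code: after an entry's link has been processed, the
      node itself must be freed, otherwise it stays allocated with nothing pointing
      at it, which is what kmemleak reported.

      #include <stdio.h>
      #include <stdlib.h>

      struct elem {
          int sk;
          struct elem *next;
      };

      /* Walk a bucket list, unlink every entry, and free the node itself. */
      static void free_bucket_list(struct elem *head)
      {
          struct elem *e, *next;

          for (e = head; e; e = next) {
              next = e->next;               /* save before the node goes away */
              printf("unlink sk %d\n", e->sk);
              free(e);                      /* the previously missing step */
          }
      }

      int main(void)
      {
          struct elem *a = malloc(sizeof(*a));
          struct elem *b = malloc(sizeof(*b));

          a->sk = 1; a->next = b;
          b->sk = 2; b->next = NULL;
          free_bucket_list(a);
          return 0;
      }
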
  17. 30 Apr 2020, 1 commit
  18. 11 Mar 2020, 1 commit
    • bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free · 90db6d77
      Committed by John Fastabend
      The bucket->lock is not needed in the sock_hash_free and sock_map_free
      calls; in fact it causes a splat due to being held inside an RCU block.
      
      | BUG: sleeping function called from invalid context at net/core/sock.c:2935
      | in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 62, name: kworker/0:1
      | 3 locks held by kworker/0:1/62:
      |  #0: ffff88813b019748 ((wq_completion)events){+.+.}, at: process_one_work+0x1d7/0x5e0
      |  #1: ffffc900000abe50 ((work_completion)(&map->work)){+.+.}, at: process_one_work+0x1d7/0x5e0
      |  #2: ffff8881381f6df8 (&stab->lock){+...}, at: sock_map_free+0x26/0x180
      | CPU: 0 PID: 62 Comm: kworker/0:1 Not tainted 5.5.0-04008-g7b083332376e #454
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
      | Workqueue: events bpf_map_free_deferred
      | Call Trace:
      |  dump_stack+0x71/0xa0
      |  ___might_sleep.cold+0xa6/0xb6
      |  lock_sock_nested+0x28/0x90
      |  sock_map_free+0x5f/0x180
      |  bpf_map_free_deferred+0x58/0x80
      |  process_one_work+0x260/0x5e0
      |  worker_thread+0x4d/0x3e0
      |  kthread+0x108/0x140
      |  ? process_one_work+0x5e0/0x5e0
      |  ? kthread_park+0x90/0x90
      |  ret_from_fork+0x3a/0x50
      
      The reason we have stab->lock and the bucket->locks in the sockmap code is
      to handle checking EEXIST in the update/delete cases. We need to be careful
      during an update operation to check for EEXIST, and we need to ensure that
      the psock object is not in some partial state of removal/insertion while we
      do this. So both map_update_common and sock_map_delete need to be guarded
      from running concurrently and potentially deleting an entry we are checking,
      etc. But by the time we get to the tear-down code in sock_{map|hash}_free we
      have already disconnected the map, and we just did synchronize_rcu() in the
      line above, so no updates/deletes should be in flight. Because of this we
      can drop the bucket locks from the map freeing code, noting that no
      updates/deletes can be in flight. (A user-space sketch of this tear-down
      ordering follows the entry.)
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Reported-by: Jakub Sitnicki <jakub@cloudflare.com>
      Suggested-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/158385850787.30597.8346421465837046618.stgit@john-Precision-5820-Tower
      90db6d77
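
      The paragraph above argues that once the map is disconnected and
      synchronize_rcu() has run, no update or delete can still be in flight, so the
      free path may walk the entries without the bucket locks and may block. A hedged
      user-space model of that ordering is sketched below, with a thread join standing
      in for synchronize_rcu() and all names (published, updater, map_free) invented
      for the example. Note that the sockhash entries earlier in this log later
      reintroduced a critical section for sock_hash_free on top of this.

      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct entry {
          int sk;
          struct entry *next;
      };

      struct map {
          pthread_spinlock_t bucket_lock;   /* protects update/delete paths */
          struct entry *head;
      };

      static _Atomic(struct map *) published;   /* what running programs can reach */
      static pthread_mutex_t sock_lock = PTHREAD_MUTEX_INITIALIZER;

      /* Models an in-flight update: it only touches the map if it can still
       * see it, and always does so under the bucket lock.
       */
      static void *updater(void *arg)
      {
          struct map *m = atomic_load(&published);

          (void)arg;
          if (m) {
              struct entry *e = malloc(sizeof(*e));

              e->sk = 7;
              pthread_spin_lock(&m->bucket_lock);
              e->next = m->head;
              m->head = e;
              pthread_spin_unlock(&m->bucket_lock);
          }
          return NULL;
      }

      static void map_free(struct map *m, pthread_t inflight)
      {
          struct entry *e, *next;

          /* 1. Disconnect the map: nothing new can find it from now on. */
          atomic_store(&published, NULL);

          /* 2. Wait for operations that already started (the kernel uses
           *    synchronize_rcu() here; joining the worker plays that role). */
          pthread_join(inflight, NULL);

          /* 3. No update/delete can be in flight, so walk the entries without
           *    bucket_lock; taking the blocking "sock" lock is safe here. */
          for (e = m->head; e; e = next) {
              next = e->next;
              pthread_mutex_lock(&sock_lock);
              printf("unlink sk %d\n", e->sk);
              pthread_mutex_unlock(&sock_lock);
              free(e);
          }
      }

      int main(void)
      {
          struct map *m = calloc(1, sizeof(*m));
          pthread_t t;

          pthread_spin_init(&m->bucket_lock, PTHREAD_PROCESS_PRIVATE);
          atomic_store(&published, m);

          pthread_create(&t, NULL, updater, NULL);
          map_free(m, t);

          pthread_spin_destroy(&m->bucket_lock);
          free(m);
          return 0;
      }
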
  19. 10 Mar 2020, 1 commit