1. 15 Jan 2021 (3 commits)
  2. 13 Jan 2021 (1 commit)
  3. 20 Nov 2020 (1 commit)
    • crypto: sha - split sha.h into sha1.h and sha2.h · a24d22b2
      Committed by Eric Biggers
      Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
      and <crypto/sha3.h> contains declarations for SHA-3.
      
      This organization is inconsistent, but more importantly SHA-1 is no
      longer considered to be cryptographically secure.  So to the extent
      possible, SHA-1 shouldn't be grouped together with any of the other SHA
      versions, and usage of it should be phased out.
      
      Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
      <crypto/sha2.h>, and make everyone explicitly specify whether they want
      the declarations for SHA-1, SHA-2, or both.
      
      This avoids making the SHA-1 declarations visible to files that don't
      want anything to do with SHA-1.  It also prepares for potentially moving
      sha1.h into a new insecure/ or dangerous/ directory.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      a24d22b2
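
      [ Illustration: a minimal sketch of the post-split usage, assuming the
        sha256() one-shot helper exported by the SHA-2 library interface; not
        part of the patch itself. ]

          /* A file that needs only SHA-2 now includes <crypto/sha2.h>;
           * SHA-1 declarations are no longer pulled into scope. */
          #include <crypto/sha2.h>

          static void digest_blob(const u8 *data, unsigned int len)
          {
                  u8 digest[SHA256_DIGEST_SIZE];

                  sha256(data, len, digest);      /* one-shot hash */
                  /* ... use digest ... */
          }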
  4. 27 Oct 2020 (1 commit)
  5. 22 Oct 2020 (1 commit)
  6. 11 Sep 2020 (1 commit)
    • bpf: Plug hole in struct bpf_sk_lookup_kern · d66423fb
      Committed by Lorenz Bauer
      As Alexei points out, struct bpf_sk_lookup_kern has two 4-byte holes.
      This leads to suboptimal instructions being generated (IPv4, x86):
      
          1372                    struct bpf_sk_lookup_kern ctx = {
             0xffffffff81b87f30 <+624>:   xor    %eax,%eax
             0xffffffff81b87f32 <+626>:   mov    $0x6,%ecx
             0xffffffff81b87f37 <+631>:   lea    0x90(%rsp),%rdi
             0xffffffff81b87f3f <+639>:   movl   $0x110002,0x88(%rsp)
             0xffffffff81b87f4a <+650>:   rep stos %rax,%es:(%rdi)
             0xffffffff81b87f4d <+653>:   mov    0x8(%rsp),%eax
             0xffffffff81b87f51 <+657>:   mov    %r13d,0x90(%rsp)
             0xffffffff81b87f59 <+665>:   incl   %gs:0x7e4970a0(%rip)
             0xffffffff81b87f60 <+672>:   mov    %eax,0x8c(%rsp)
             0xffffffff81b87f67 <+679>:   movzwl 0x10(%rsp),%eax
             0xffffffff81b87f6c <+684>:   mov    %ax,0xa8(%rsp)
             0xffffffff81b87f74 <+692>:   movzwl 0x38(%rsp),%eax
             0xffffffff81b87f79 <+697>:   mov    %ax,0xaa(%rsp)
      
      Fix this by moving around sport and dport. pahole confirms there
      are no more holes:
      
          struct bpf_sk_lookup_kern {
              u16                        family;       /*     0     2 */
              u16                        protocol;     /*     2     2 */
              __be16                     sport;        /*     4     2 */
              u16                        dport;        /*     6     2 */
              struct {
                      __be32             saddr;        /*     8     4 */
                      __be32             daddr;        /*    12     4 */
              } v4;                                    /*     8     8 */
              struct {
                      const struct in6_addr  * saddr;  /*    16     8 */
                      const struct in6_addr  * daddr;  /*    24     8 */
              } v6;                                    /*    16    16 */
              struct sock *              selected_sk;  /*    32     8 */
              bool                       no_reuseport; /*    40     1 */
      
              /* size: 48, cachelines: 1, members: 8 */
              /* padding: 7 */
              /* last cacheline: 48 bytes */
          };
      
      The assembly also doesn't contain the pesky rep stos anymore:
      
          1372                    struct bpf_sk_lookup_kern ctx = {
             0xffffffff81b87f60 <+624>:   movzwl 0x10(%rsp),%eax
             0xffffffff81b87f65 <+629>:   movq   $0x0,0xa8(%rsp)
             0xffffffff81b87f71 <+641>:   movq   $0x0,0xb0(%rsp)
             0xffffffff81b87f7d <+653>:   mov    %ax,0x9c(%rsp)
             0xffffffff81b87f85 <+661>:   movzwl 0x38(%rsp),%eax
             0xffffffff81b87f8a <+666>:   movq   $0x0,0xb8(%rsp)
             0xffffffff81b87f96 <+678>:   mov    %ax,0x9e(%rsp)
             0xffffffff81b87f9e <+686>:   mov    0x8(%rsp),%eax
             0xffffffff81b87fa2 <+690>:   movq   $0x0,0xc0(%rsp)
             0xffffffff81b87fae <+702>:   movl   $0x110002,0x98(%rsp)
             0xffffffff81b87fb9 <+713>:   mov    %eax,0xa0(%rsp)
             0xffffffff81b87fc0 <+720>:   mov    %r13d,0xa4(%rsp)
      
      1: https://lore.kernel.org/bpf/CAADnVQKE6y9h2fwX6OS837v-Uf+aBXnT_JXiN_bbo2gitZQ3tA@mail.gmail.com/
      
      Fixes: e9ddbb77 ("bpf: Introduce SK_LOOKUP program type with a dedicated attach point")
      Suggested-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Link: https://lore.kernel.org/bpf/20200910110248.198326-1-lmb@cloudflare.com
      d66423fb
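
      [ Illustration: the underlying rule is plain C struct layout, nothing
        BPF-specific. A hedged userspace sketch with hypothetical fields: ]

          /* Each member is aligned to its natural boundary, so
           * interleaving u16 and u32 fields leaves holes that
           * initializers must still zero. */
          #include <stdio.h>

          struct holey {
                  unsigned short sport;   /* 0..2, then a 2-byte hole */
                  unsigned int   saddr;   /* 4..8                     */
                  unsigned short dport;   /* 8..10, then another hole */
                  unsigned int   daddr;   /* 12..16                   */
          };

          struct packed_fields {
                  unsigned short sport;   /* the u16s are grouped ... */
                  unsigned short dport;   /* ... so padding vanishes  */
                  unsigned int   saddr;
                  unsigned int   daddr;
          };

          int main(void)
          {
                  /* typically prints 16 vs 12 on x86-64 */
                  printf("%zu %zu\n", sizeof(struct holey),
                         sizeof(struct packed_fields));
                  return 0;
          }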
  7. 25 Aug 2020 (2 commits)
    • bpf: tcp: Allow bpf prog to write and parse TCP header option · 0813a841
      Committed by Martin KaFai Lau
      [ Note: The TCP changes here are mainly to implement the bpf
        pieces into the bpf_skops_*() functions introduced
        in the earlier patches. ]
      
      The earlier effort in BPF-TCP-CC allows the TCP Congestion Control
      algorithm to be written in BPF.  It opens up opportunities for a
      faster turnaround time in testing/releasing new congestion control
      ideas to a production environment.
      
      The same flexibility can be extended to writing TCP header options.
      It is not uncommon that people want to test a new TCP header option
      to improve TCP performance.  Another use case is for data-centers
      that have a more controlled environment and more flexibility in
      putting header options for internal-only use.

      For example, we want to test the idea of putting the maximum delay
      ACK in a TCP header option, which is similar to a draft RFC proposal [1].
      
      This patch introduces the necessary BPF API and uses it in the
      TCP stack to allow a BPF_PROG_TYPE_SOCK_OPS program to parse
      and write TCP header options.  It currently supports most
      TCP packets except RST.
      
      Supported TCP header option:
      ───────────────────────────
      This patch allows the bpf-prog to write any option kind.
      Different bpf-progs can write their own options by calling the new
      helper bpf_store_hdr_opt().  The helper will ensure there is no
      duplicated option in the header.

      By allowing the bpf-prog to write any option kind, this gives a lot
      of flexibility to the bpf-prog.  It could also allow the bpf-prog
      to support a recently standardized option on an older kernel.
      
      Sockops Callback Flags:
      ──────────────────────
      The bpf program will only be called to parse/write tcp header options
      if the following newly added callback flags are enabled
      in tp->bpf_sock_ops_cb_flags:
      BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG
      BPF_SOCK_OPS_PARSE_ALL_HDR_OPT_CB_FLAG
      BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG
      
      A few words on the PARSE CB flags.  When the above PARSE CB flags are
      turned on, the bpf-prog will be called on packets received
      at a sk that has at least reached the ESTABLISHED state.
      The parsing of the SYN-SYNACK-ACK will be discussed in the
      "3 Way HandShake" section.
      
      The default is off for all of the above new CB flags, i.e. the bpf prog
      will not be called to parse or write bpf hdr options.  There are
      detailed comments on these new cb flags in the UAPI bpf.h.
      
      sock_ops->skb_data and bpf_load_hdr_opt()
      ─────────────────────────────────────────
      sock_ops->skb_data and sock_ops->skb_data_end cover the whole
      TCP header and its options.  They are read only.
      
      The new bpf_load_hdr_opt() helps to read a particular option "kind"
      from the skb_data.
      
      Please refer to the comment in UAPI bpf.h.  It has details
      on what skb_data contains under different sock_ops->op.
      
      3 Way HandShake
      ───────────────
      The bpf-prog can learn if it is sending SYN or SYNACK by reading the
      sock_ops->skb_tcp_flags.
      
      * Passive side
      
      When writing SYNACK (i.e. sock_ops->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB),
      the received SYN skb will be available to the bpf prog.  The bpf prog can
      use the SYN skb (which may carry the header option sent from the remote bpf
      prog) to decide what bpf header option should be written to the outgoing
      SYNACK skb.  The SYN packet can be obtained by getsockopt(TCP_BPF_SYN*).
      More on this later.  Also, the bpf prog can learn if it is in syncookie
      mode (by checking sock_ops->args[0] == BPF_WRITE_HDR_TCP_SYNACK_COOKIE).
      
      The bpf prog can store the received SYN pkt by using the existing
      bpf_setsockopt(TCP_SAVE_SYN).  The example in a later patch does it.
      [ Note that the fullsock here is a listen sk, bpf_sk_storage
        is not very useful here since the listen sk will be shared
        by many concurrent connection requests.

        Extending bpf_sk_storage support to request_sock would add weight
        to the minisock and is not necessarily better than storing the
        whole ~100-byte SYN pkt. ]
      
      When the connection is established, the bpf prog will be called
      in the existing PASSIVE_ESTABLISHED_CB callback.  At that time,
      the bpf prog can get the header option from the saved syn and
      then apply the needed operation to the newly established socket.
      The later patch will use the max delay ack specified in the SYN
      header and set the RTO of this newly established connection
      as an example.
      
      The received ACK (that concludes the 3WHS) will also be available to
      the bpf prog during PASSIVE_ESTABLISHED_CB through the sock_ops->skb_data.
      It could be useful in the syncookie scenario.  More on this later.
      
      There is an existing getsockopt "TCP_SAVED_SYN" to return the whole
      saved syn pkt which includes the IP[46] header and the TCP header.
      A few "TCP_BPF_SYN*" getsockopts have been added to allow specifying
      where to start reading from, e.g. starting from the TCP header, or
      from the IP[46] header.

      The new getsockopt(TCP_BPF_SYN*) will also know where it can get
      the SYN packet from:
        - (a) the just received syn (available when the bpf prog is writing SYNACK)
              and it is the only way to get SYN during syncookie mode.
        or
        - (b) the saved syn (available in PASSIVE_ESTABLISHED_CB and also other
              existing CB).
      
      The bpf prog does not need to know where the SYN pkt is coming from.
      The getsockopt(TCP_BPF_SYN*) will hide these details.

      Similarly, a flag "BPF_LOAD_HDR_OPT_TCP_SYN" is also added to
      bpf_load_hdr_opt() to read a particular header option from the SYN packet.
      
      * Fastopen
      
      Fastopen should work the same as the regular non-fastopen case.
      This is tested in a later patch.
      
      * Syncookie
      
      For syncookie, the later example patch asks the active
      side's bpf prog to resend the header options in the ACK.  The server
      can use bpf_load_hdr_opt() to look at the options in this
      received ACK during PASSIVE_ESTABLISHED_CB.
      
      * Active side
      
      The bpf prog will get a chance to write the bpf header option
      in the SYN packet during WRITE_HDR_OPT_CB.  The received SYNACK
      pkt will also be available to the bpf prog during the existing
      ACTIVE_ESTABLISHED_CB callback through the sock_ops->skb_data
      and bpf_load_hdr_opt().
      
      * Turn off header CB flags after 3WHS
      
      If the bpf prog does not need to write/parse header options
      beyond the 3WHS, the bpf prog can clear the bpf_sock_ops_cb_flags
      to avoid being called for header options.
      Or the bpf-prog can choose to leave the UNKNOWN_HDR_OPT_CB_FLAG on
      so that the kernel will only call it when there is an option that
      the kernel cannot handle.
      
      [1]: draft-wang-tcpm-low-latency-opt-00
           https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200820190104.2885895-1-kafai@fb.com
      0813a841
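
      [ Illustration: a hedged sockops sketch of the reserve/write/parse
        flow above. Option kind 0xb7 and its 2-byte payload are invented,
        and the CB flags must first be enabled with
        bpf_sock_ops_cb_flags_set(). ]

          // SPDX-License-Identifier: GPL-2.0
          #include <linux/bpf.h>
          #include <bpf/bpf_helpers.h>

          struct my_opt {
                  __u8  kind;             /* illustrative option kind */
                  __u8  len;              /* total option length      */
                  __u16 max_delay_ack;
          } __attribute__((packed));

          SEC("sockops")
          int hdr_opt(struct bpf_sock_ops *skops)
          {
                  struct my_opt opt = { .kind = 0xb7, .len = sizeof(opt),
                                        .max_delay_ack = 5 };

                  switch (skops->op) {
                  case BPF_SOCK_OPS_HDR_OPT_LEN_CB:
                          /* reserve room before the header is laid out */
                          bpf_reserve_hdr_opt(skops, sizeof(opt), 0);
                          break;
                  case BPF_SOCK_OPS_WRITE_HDR_OPT_CB:
                          /* helper refuses duplicated option kinds */
                          bpf_store_hdr_opt(skops, &opt, sizeof(opt), 0);
                          break;
                  case BPF_SOCK_OPS_PARSE_HDR_OPT_CB:
                          /* on input, opt.kind selects the option to load */
                          bpf_load_hdr_opt(skops, &opt, sizeof(opt), 0);
                          break;
                  }
                  return 1;
          }

          char _license[] SEC("license") = "GPL";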
    • bpf: sock_ops: Change some members of sock_ops_kern from u32 to u8 · c9985d09
      Committed by Martin KaFai Lau
      A later patch needs to add a few pointers and a few u8 fields to
      sock_ops_kern.  Hence, this patch saves some space by moving
      some of the existing members from u32 to u8 so that the later
      patch can still fit everything in a cacheline.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200820190058.2885640-1-kafai@fb.com
      c9985d09
  8. 24 Aug 2020 (1 commit)
  9. 26 Jul 2020 (1 commit)
  10. 25 Jul 2020 (1 commit)
  11. 20 Jul 2020 (1 commit)
  12. 18 Jul 2020 (3 commits)
    • inet6: Run SK_LOOKUP BPF program on socket lookup · 1122702f
      Committed by Jakub Sitnicki
      Following the ipv4 stack changes, run a BPF program attached to the
      netns before looking up a listening socket. The program can return a
      listening socket to use as the result of the socket lookup, fail the
      lookup, or take no action.
      Suggested-by: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      1122702f
    • inet: Run SK_LOOKUP BPF program on socket lookup · 1559b4aa
      Committed by Jakub Sitnicki
      Run a BPF program before looking up a listening socket on the receive path.
      The program selects a listening socket to yield as the result of the socket
      lookup by calling the bpf_sk_assign() helper and returning the SK_PASS code.
      The program can revert its decision by assigning a NULL socket with
      bpf_sk_assign().

      Alternatively, the BPF program can also fail the lookup by returning
      SK_DROP, or let the lookup continue as usual with SK_PASS on return, when
      no socket has been selected with bpf_sk_assign().
      
      This lets the user match packets with listening sockets freely at the last
      possible point on the receive path, where we know that packets are destined
      for local delivery after undergoing policing, filtering, and routing.
      
      With BPF code selecting the socket, directing packets destined to an IP
      range or to a port range to a single socket becomes possible.
      
      In case multiple programs are attached, they are run in series in the order
      in which they were attached. The end result is determined from the return
      codes of all the programs according to the following rules (sketched in
      code after this entry):
      
       1. If any program returned SK_PASS and selected a valid socket, the socket
          is used as result of socket lookup.
       2. If more than one program returned SK_PASS and selected a socket,
          last selection takes effect.
       3. If any program returned SK_DROP, and no program returned SK_PASS and
          selected a socket, socket lookup fails with -ECONNREFUSED.
       4. If all programs returned SK_PASS and none of them selected a socket,
          socket lookup continues to htable-based lookup.
      Suggested-by: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200717103536.397595-5-jakub@cloudflare.com
      1559b4aa
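
      [ Illustration: a hedged, self-contained model of the four rules
        above; this is not the kernel's actual code. Each attached program
        is modeled as a callback that may set *pick and returns SK_PASS or
        SK_DROP. ]

          #include <stddef.h>

          enum { SK_DROP = 0, SK_PASS = 1 };
          typedef int (*sk_lookup_fn)(void **pick);

          /* returns the selected socket, (void *)-1 for -ECONNREFUSED,
           * or NULL to continue with the htable-based lookup */
          static void *resolve(sk_lookup_fn *progs, size_t n)
          {
                  void *selected = NULL;
                  int seen_drop = 0;

                  for (size_t i = 0; i < n; i++) {
                          void *pick = NULL;

                          if (progs[i](&pick) == SK_PASS) {
                                  if (pick)
                                          selected = pick; /* rule 2: last wins */
                          } else {
                                  seen_drop = 1;
                          }
                  }
                  if (selected)
                          return selected;        /* rule 1 */
                  if (seen_drop)
                          return (void *)-1;      /* rule 3 */
                  return NULL;                    /* rule 4 */
          }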
    • bpf: Introduce SK_LOOKUP program type with a dedicated attach point · e9ddbb77
      Committed by Jakub Sitnicki
      Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
      BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
      when looking up a listening socket for a new connection request for
      connection-oriented protocols, or when looking up an unconnected socket for
      a packet for connection-less protocols.
      
      When called, the SK_LOOKUP BPF program can select a socket that will receive
      the packet. This serves as a mechanism to overcome the limits of what the
      bind() API allows to express. Two use-cases driving this work are:
      
       (1) steer packets destined to an IP range, on fixed port to a socket
      
           192.0.2.0/24, port 80 -> NGINX socket
      
       (2) steer packets destined to an IP address, on any port to a socket
      
           198.51.100.1, any port -> L7 proxy socket
      
      In its run-time context the program receives information about the packet
      that triggered the socket lookup. Namely the IP version, L4 protocol
      identifier, and address 4-tuple. The context can be further extended to
      include the ingress interface identifier.
      
      To select a socket the BPF program fetches it from a map holding socket
      references, like SOCKMAP or SOCKHASH, and calls the bpf_sk_assign(ctx, sk, ...)
      helper to record the selection. The transport layer then uses the selected
      socket as the result of the socket lookup.
      
      In its basic form, SK_LOOKUP acts as a filter and hence must return either
      SK_PASS or SK_DROP. If the program returns SK_PASS, the transport should
      look for a socket to receive the packet, or use the one selected by the
      program if available, while SK_DROP informs the transport layer that the
      lookup should fail.
      
      This patch only enables the user to attach an SK_LOOKUP program to a
      network namespace. Subsequent patches hook it up to run on local delivery
      path in ipv4 and ipv6 stacks.
      Suggested-by: Marek Majkowski <marek@cloudflare.com>
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200717103536.397595-3-jakub@cloudflare.com
      e9ddbb77
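
      [ Illustration: a hedged sketch of use-case (1). The map name and the
        range test are invented; the helpers are the ones named above. ]

          // SPDX-License-Identifier: GPL-2.0
          /* Steer 192.0.2.0/24, port 80 to one listening socket
           * held in a SOCKMAP. */
          #include <linux/bpf.h>
          #include <bpf/bpf_helpers.h>
          #include <bpf/bpf_endian.h>

          struct {
                  __uint(type, BPF_MAP_TYPE_SOCKMAP);
                  __uint(max_entries, 1);
                  __type(key, __u32);
                  __type(value, __u64);
          } nginx_sock SEC(".maps");

          SEC("sk_lookup")
          int steer_http(struct bpf_sk_lookup *ctx)
          {
                  const __u32 key = 0, net = 0xc0000200;  /* 192.0.2.0 */
                  struct bpf_sock *sk;
                  long err;

                  if (ctx->family != 2 /* AF_INET */ || ctx->local_port != 80 ||
                      (bpf_ntohl(ctx->local_ip4) & 0xffffff00) != net)
                          return SK_PASS;           /* not ours: fall through */

                  sk = bpf_map_lookup_elem(&nginx_sock, &key);
                  if (!sk)
                          return SK_PASS;

                  err = bpf_sk_assign(ctx, sk, 0);  /* record the selection */
                  bpf_sk_release(sk);
                  return err ? SK_DROP : SK_PASS;
          }

          char _license[] SEC("license") = "GPL";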
  13. 09 Jul 2020 (2 commits)
    • bpf: Check correct cred for CAP_SYSLOG in bpf_dump_raw_ok() · 63960260
      Committed by Kees Cook
      When evaluating access control over kallsyms visibility, credentials at
      open() time need to be used, not the "current" creds (though in BPF's
      case, this has likely always been the same). Plumb access to the associated
      file->f_cred down through bpf_dump_raw_ok() and its callers now that
      kallsyms_show_value() has been refactored to take struct cred.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: bpf@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 7105e828 ("bpf: allow for correlation of maps and helpers in dump")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      63960260
    • kallsyms: Refactor kallsyms_show_value() to take cred · 16025184
      Committed by Kees Cook
      In order to perform future tests against the cred saved during open(),
      switch kallsyms_show_value() to operate on a cred, and have all current
      callers pass current_cred(). This makes it very obvious where callers
      are checking the wrong credential in their "read" contexts. These will
      be fixed in the coming patches.
      
      Additionally switch return value to bool, since it is always used as a
      direct permission check, not a 0-on-success, negative-on-error style
      function return.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      16025184
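
      [ Illustration: a hedged sketch of the resulting pattern, with
        invented my_* names: decide visibility once at open() using
        file->f_cred, and let the read path reuse that decision instead of
        consulting current_cred(). ]

          #include <linux/fs.h>
          #include <linux/kallsyms.h>
          #include <linux/slab.h>

          struct my_iter {
                  bool show_value;        /* decision captured at open() */
                  /* ... iteration state ... */
          };

          static int my_open(struct inode *inode, struct file *file)
          {
                  struct my_iter *iter = kzalloc(sizeof(*iter), GFP_KERNEL);

                  if (!iter)
                          return -ENOMEM;
                  /* checked against file->f_cred, not current_cred() */
                  iter->show_value = kallsyms_show_value(file->f_cred);
                  file->private_data = iter;
                  return 0;
          }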
  14. 08 May 2020 (2 commits)
    • crypto: lib/sha1 - fold linux/cryptohash.h into crypto/sha.h · 228c4f26
      Committed by Eric Biggers
      <linux/cryptohash.h> sounds very generic and important, like it's the
      header to include if you're doing cryptographic hashing in the kernel.
      But actually it only includes the library implementation of the SHA-1
      compression function (not even the full SHA-1).  This should basically
      never be used anymore; SHA-1 is no longer considered secure, and there
      are much better ways to do cryptographic hashing in the kernel.
      
      Remove this header and fold it into <crypto/sha.h> which already
      contains constants and functions for SHA-1 (along with SHA-2).
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      228c4f26
    • crypto: lib/sha1 - rename "sha" to "sha1" · 6b0b0fa2
      Committed by Eric Biggers
      The library implementation of the SHA-1 compression function is
      confusingly called just "sha_transform()".  Alongside it are some "SHA_"
      constants and "sha_init()".  Presumably these are left over from a time
      when SHA just meant SHA-1.  But now there are also SHA-2 and SHA-3, and
      moreover SHA-1 is now considered insecure and thus shouldn't be used.
      
      Therefore, rename these functions and constants to make it very clear
      that they are for SHA-1.  Also add a comment to make it clear that these
      shouldn't be used.
      
      For the extra-misleadingly named "SHA_MESSAGE_BYTES", rename it to
      SHA1_BLOCK_SIZE and define it to just '64' rather than '(512/8)' so that
      it matches the same definition in <crypto/sha.h>.  This prepares for
      merging <linux/cryptohash.h> into <crypto/sha.h>.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      6b0b0fa2
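
      [ Illustration: a hedged sketch of a caller after the rename;
        previously this used sha_init(), sha_transform() and
        SHA_MESSAGE_BYTES. ]

          /* Hash a single 64-byte block with the renamed library API. */
          #include <crypto/sha.h>        /* later split into <crypto/sha1.h> */
          #include <linux/string.h>

          static void hash_one_block(u32 digest[SHA1_DIGEST_WORDS],
                                     const char block[SHA1_BLOCK_SIZE])
          {
                  u32 ws[SHA1_WORKSPACE_WORDS];

                  sha1_init(digest);                  /* was sha_init()      */
                  sha1_transform(digest, block, ws);  /* was sha_transform() */
                  memzero_explicit(ws, sizeof(ws));   /* scrub the workspace */
          }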
  15. 05 May 2020 (1 commit)
    • bpf: Avoid gcc-10 stringop-overflow warning in struct bpf_prog · d26c0cc5
      Committed by Arnd Bergmann
      gcc-10 warns about accesses to zero-length arrays:
      
      kernel/bpf/core.c: In function 'bpf_patch_insn_single':
      cc1: warning: writing 8 bytes into a region of size 0 [-Wstringop-overflow=]
      In file included from kernel/bpf/core.c:21:
      include/linux/filter.h:550:20: note: at offset 0 to object 'insnsi' with size 0 declared here
        550 |   struct bpf_insn  insnsi[0];
            |                    ^~~~~~
      
      In this case, we really want to have two flexible-array members,
      but that is not possible. Removing the union to make insnsi a
      flexible-array member while leaving insns as a zero-length array
      fixes the warning, as nothing writes to the other one in that way.
      
      This trick only works on linux-3.18 or higher, as older versions
      had additional members in the union.
      
      Fixes: 60a3b225 ("net: bpf: make eBPF interpreter images read-only")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200430213101.135134-6-arnd@arndb.de
      d26c0cc5
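
      [ Illustration: a hedged sketch of the shape of the fix, with the
        surrounding members reduced to a single stand-in field. ]

          #include <linux/bpf.h>      /* struct bpf_insn    */
          #include <linux/filter.h>   /* struct sock_filter */

          struct prog_before {
                  unsigned int len;
                  union {
                          struct sock_filter insns[0];
                          struct bpf_insn    insnsi[0];  /* gcc-10 warns */
                  };
          };

          struct prog_after {
                  unsigned int len;
                  struct sock_filter insns[0];  /* zero-length, no space,  */
                  struct bpf_insn    insnsi[];  /* same offset as the real */
          };                                    /* flexible-array member   */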
  16. 26 Apr 2020 (1 commit)
  17. 14 Mar 2020 (2 commits)
  18. 25 Feb 2020 (4 commits)
  19. 17 Jan 2020 (1 commit)
    • xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths · 1d233886
      Committed by Toke Høiland-Jørgensen
      Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
      we can re-use the bulking for the non-map version of the bpf_redirect()
      helper. This is a simple matter of having xdp_do_redirect_slow() queue the
      frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
      
      Unfortunately we can't make the bpf_redirect() helper return an error if
      the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
      have a reference to the network namespace of the ingress device at the time
      the helper is called. So we have to leave it as-is and keep the device
      lookup in xdp_do_redirect_slow().
      
      Since this leaves less reason to have the non-map redirect code in a
      separate function, we get rid of the xdp_do_redirect_slow() function
      entirely. This does lose us the tracepoint disambiguation, but fortunately
      the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
      entry structures. This means both can contain a map index, so we can just
      amend the tracepoint definitions so we always emit the xdp_redirect(_err)
      tracepoints, but with the map ID only populated if a map is present. This
      means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
      the definitions around in case someone is still listening for them.
      
      With this change, the performance of the xdp_redirect sample program goes
      from 5Mpps to 8.4Mpps (a 68% increase).
      
      Since the flush functions are no longer map-specific, rename the flush()
      functions to drop _map from their names. One of the renamed functions is
      the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
      keep from having to update all drivers, use a #define to keep the old name
      working, and only update the virtual drivers in this patch.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
      1d233886
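
      [ Illustration: a hedged sketch of the driver-side contract; the
        mydrv_* names are invented. The old xdp_do_flush_map() name keeps
        working through the #define alias mentioned above. ]

          #include <linux/filter.h>      /* xdp_do_flush() */
          #include <linux/netdevice.h>

          static int mydrv_napi_poll(struct napi_struct *napi, int budget)
          {
                  /* RX clean-up may queue frames via xdp_do_redirect() */
                  int done = mydrv_clean_rx(napi, budget);

                  xdp_do_flush();        /* was xdp_do_flush_map() */
                  if (done < budget)
                          napi_complete_done(napi, done);
                  return done;
          }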
  20. 10 Jan 2020 (1 commit)
    • bpf: tcp: Support tcp_congestion_ops in bpf · 0baf26b0
      Committed by Martin KaFai Lau
      This patch makes "struct tcp_congestion_ops" the first user
      of BPF STRUCT_OPS.  It allows implementing a tcp_congestion_ops
      in bpf.
      
      The BPF implemented tcp_congestion_ops can be used like
      regular kernel tcp-cc through sysctl and setsockopt.  e.g.
      [root@arch-fb-vm1 bpf]# sysctl -a | egrep congestion
      net.ipv4.tcp_allowed_congestion_control = reno cubic bpf_cubic
      net.ipv4.tcp_available_congestion_control = reno bic cubic bpf_cubic
      net.ipv4.tcp_congestion_control = bpf_cubic
      
      There have been attempts to move the TCP CC to user space
      (e.g. CCP in TCP).  The common arguments are a faster turnaround,
      getting away from long-tail kernel versions in production...etc,
      which are legit points.
      
      BPF has been a continuous effort to join the upsides of kernel and
      userspace together (e.g. XDP to gain the performance
      advantage without bypassing the kernel).  The recent BPF
      advancements (in particular the BTF-aware verifier, BPF trampoline,
      BPF CO-RE...) made implementing kernel struct ops (e.g. tcp cc)
      possible in BPF.  It allows a faster turnaround for testing algorithms
      in production while leveraging the existing (and continuously growing)
      BPF features/framework instead of building one specifically for
      userspace TCP CC.
      
      This patch allows write access to a few fields in tcp-sock
      (in bpf_tcp_ca_btf_struct_access()).
      
      The optional "get_info" is unsupported now.  It can be added
      later.  One possible way is to output the info with a btf-id
      to describe the content.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200109003508.3856115-1-kafai@fb.com
      0baf26b0
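
      [ Illustration: a hedged sketch of a minimal bpf tcp-cc, following
        the shape of the bpf_cubic/bpf_dctcp selftests; "bpf_nocc" and its
        do-nothing behavior are invented. ]

          // SPDX-License-Identifier: GPL-2.0
          #include "vmlinux.h"
          #include <bpf/bpf_helpers.h>
          #include <bpf/bpf_tracing.h>

          char _license[] SEC("license") = "GPL";

          SEC("struct_ops/nocc_ssthresh")
          __u32 BPF_PROG(nocc_ssthresh, struct sock *sk)
          {
                  const struct tcp_sock *tp = (const struct tcp_sock *)sk;
                  __u32 half = tp->snd_cwnd >> 1;

                  return half > 2 ? half : 2;     /* classic halving */
          }

          SEC("struct_ops/nocc_cong_avoid")
          void BPF_PROG(nocc_cong_avoid, struct sock *sk, __u32 ack, __u32 acked)
          {
                  /* placeholder: leave cwnd untouched */
          }

          SEC(".struct_ops")
          struct tcp_congestion_ops nocc = {
                  .ssthresh   = (void *)nocc_ssthresh,
                  .cong_avoid = (void *)nocc_cong_avoid,
                  .undo_cwnd  = (void *)nocc_ssthresh,  /* same signature */
                  .name       = "bpf_nocc",
          };

      [ Once loaded and registered (e.g. via libbpf's
        bpf_map__attach_struct_ops()), it would appear in
        net.ipv4.tcp_available_congestion_control like bpf_cubic above. ]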
  21. 20 Dec 2019 (1 commit)
  22. 14 Dec 2019 (1 commit)
  23. 10 Dec 2019 (1 commit)
  24. 02 Dec 2019 (1 commit)
  25. 25 Nov 2019 (2 commits)
  26. 16 Nov 2019 (1 commit)
  27. 23 Oct 2019 (1 commit)
    • bpf: Fix use after free in subprog's jited symbol removal · cd7455f1
      Committed by Daniel Borkmann
      syzkaller managed to trigger the following crash:
      
        [...]
        BUG: unable to handle page fault for address: ffffc90001923030
        #PF: supervisor read access in kernel mode
        #PF: error_code(0x0000) - not-present page
        PGD aa551067 P4D aa551067 PUD aa552067 PMD a572b067 PTE 80000000a1173163
        Oops: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 7982 Comm: syz-executor912 Not tainted 5.4.0-rc3+ #0
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:bpf_jit_binary_hdr include/linux/filter.h:787 [inline]
        RIP: 0010:bpf_get_prog_addr_region kernel/bpf/core.c:531 [inline]
        RIP: 0010:bpf_tree_comp kernel/bpf/core.c:600 [inline]
        RIP: 0010:__lt_find include/linux/rbtree_latch.h:115 [inline]
        RIP: 0010:latch_tree_find include/linux/rbtree_latch.h:208 [inline]
        RIP: 0010:bpf_prog_kallsyms_find kernel/bpf/core.c:674 [inline]
        RIP: 0010:is_bpf_text_address+0x184/0x3b0 kernel/bpf/core.c:709
        [...]
        Call Trace:
         kernel_text_address kernel/extable.c:147 [inline]
         __kernel_text_address+0x9a/0x110 kernel/extable.c:102
         unwind_get_return_address+0x4c/0x90 arch/x86/kernel/unwind_frame.c:19
         arch_stack_walk+0x98/0xe0 arch/x86/kernel/stacktrace.c:26
         stack_trace_save+0xb6/0x150 kernel/stacktrace.c:123
         save_stack mm/kasan/common.c:69 [inline]
         set_track mm/kasan/common.c:77 [inline]
         __kasan_kmalloc+0x11c/0x1b0 mm/kasan/common.c:510
         kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:518
         slab_post_alloc_hook mm/slab.h:584 [inline]
         slab_alloc mm/slab.c:3319 [inline]
         kmem_cache_alloc+0x1f5/0x2e0 mm/slab.c:3483
         getname_flags+0xba/0x640 fs/namei.c:138
         getname+0x19/0x20 fs/namei.c:209
         do_sys_open+0x261/0x560 fs/open.c:1091
         __do_sys_open fs/open.c:1115 [inline]
         __se_sys_open fs/open.c:1110 [inline]
         __x64_sys_open+0x87/0x90 fs/open.c:1110
         do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [...]
      
      After further debugging it turns out that we walk kallsyms while in parallel
      we tear down a BPF program which contains subprograms that have been JITed,
      though the program itself has not been fully exposed and eventually bails
      out with an error.
      
      The bpf_prog_kallsyms_del_subprogs() in bpf_prog_load()'s error path removes
      the symbols, however, bpf_prog_free() tears down the JIT memory too early via
      scheduled work. Instead, it needs to properly respect RCU grace period as the
      kallsyms walk for BPF is under RCU.
      
      Fix it by refactoring __bpf_prog_put()'s tear down and reuse it in our error
      path where we defer final destruction when we have subprogs in the program.
      
      Fixes: 7d1982b4 ("bpf: fix panic in prog load calls cleanup")
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Reported-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Tested-by: syzbot+710043c5d1d5b5013bc7@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/bpf/55f6367324c2d7e9583fa9ccf5385dcbba0d7a6e.1571752452.git.daniel@iogearbox.net
      cd7455f1
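
      [ Illustration: a hedged sketch of the general pattern behind the
        fix, with invented names: memory that RCU readers such as the
        kallsyms walk may still see must be freed only after a grace
        period, never synchronously from an error path. ]

          #include <linux/rcupdate.h>
          #include <linux/slab.h>

          struct jit_image {
                  struct rcu_head rcu;
                  /* ... symbol range, code pages ... */
          };

          static void jit_image_free_rcu(struct rcu_head *head)
          {
                  kfree(container_of(head, struct jit_image, rcu));
          }

          static void jit_image_put(struct jit_image *img)
          {
                  /* unlink from the lookup tree first, then defer the free
                   * so a concurrent is_bpf_text_address() stays safe */
                  call_rcu(&img->rcu, jit_image_free_rcu);
          }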
  28. 17 Oct 2019 (1 commit)