1. 13 1月, 2018 1 次提交
    • M
      error-injection: Separate error-injection from kprobe · 540adea3
      Masami Hiramatsu 提交于
      Since error-injection framework is not limited to be used
      by kprobes, nor bpf. Other kernel subsystems can use it
      freely for checking safeness of error-injection, e.g.
      livepatch, ftrace etc.
      So this separate error-injection framework from kprobes.
      
      Some differences has been made:
      
      - "kprobe" word is removed from any APIs/structures.
      - BPF_ALLOW_ERROR_INJECTION() is renamed to
        ALLOW_ERROR_INJECTION() since it is not limited for BPF too.
      - CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
        feature. It is automatically enabled if the arch supports
        error injection feature for kprobe or ftrace etc.
      Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      540adea3
  2. 11 1月, 2018 4 次提交
  3. 10 1月, 2018 2 次提交
    • Q
      bpf: export function to write into verifier log buffer · 430e68d1
      Quentin Monnet 提交于
      Rename the BPF verifier `verbose()` to `bpf_verifier_log_write()` and
      export it, so that other components (in particular, drivers for BPF
      offload) can reuse the user buffer log to dump error messages at
      verification time.
      
      Renaming `verbose()` was necessary in order to avoid a name so generic
      to be exported to the global namespace. However to prevent too much pain
      for backports, the calls to `verbose()` in the kernel BPF verifier were
      not changed. Instead, use function aliasing to make `verbose` point to
      `bpf_verifier_log_write`. Another solution could consist in making a
      wrapper around `verbose()`, but since it is a variadic function, I don't
      see a clean way without creating two identical wrappers, one for the
      verifier and one to export.
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      430e68d1
    • D
      bpf: avoid false sharing of map refcount with max_entries · be95a845
      Daniel Borkmann 提交于
      In addition to commit b2157399 ("bpf: prevent out-of-bounds
      speculation") also change the layout of struct bpf_map such that
      false sharing of fast-path members like max_entries is avoided
      when the maps reference counter is altered. Therefore enforce
      them to be placed into separate cachelines.
      
      pahole dump after change:
      
        struct bpf_map {
              const struct bpf_map_ops  * ops;                 /*     0     8 */
              struct bpf_map *           inner_map_meta;       /*     8     8 */
              void *                     security;             /*    16     8 */
              enum bpf_map_type          map_type;             /*    24     4 */
              u32                        key_size;             /*    28     4 */
              u32                        value_size;           /*    32     4 */
              u32                        max_entries;          /*    36     4 */
              u32                        map_flags;            /*    40     4 */
              u32                        pages;                /*    44     4 */
              u32                        id;                   /*    48     4 */
              int                        numa_node;            /*    52     4 */
              bool                       unpriv_array;         /*    56     1 */
      
              /* XXX 7 bytes hole, try to pack */
      
              /* --- cacheline 1 boundary (64 bytes) --- */
              struct user_struct *       user;                 /*    64     8 */
              atomic_t                   refcnt;               /*    72     4 */
              atomic_t                   usercnt;              /*    76     4 */
              struct work_struct         work;                 /*    80    32 */
              char                       name[16];             /*   112    16 */
              /* --- cacheline 2 boundary (128 bytes) --- */
      
              /* size: 128, cachelines: 2, members: 17 */
              /* sum members: 121, holes: 1, sum holes: 7 */
        };
      
      Now all entries in the first cacheline are read only throughout
      the life time of the map, set up once during map creation. Overall
      struct size and number of cachelines doesn't change from the
      reordering. struct bpf_map is usually first member and embedded
      in map structs in specific map implementations, so also avoid those
      members to sit at the end where it could potentially share the
      cacheline with first map values e.g. in the array since remote
      CPUs could trigger map updates just as well for those (easily
      dirtying members like max_entries intentionally as well) while
      having subsequent values in cache.
      
      Quoting from Google's Project Zero blog [1]:
      
        Additionally, at least on the Intel machine on which this was
        tested, bouncing modified cache lines between cores is slow,
        apparently because the MESI protocol is used for cache coherence
        [8]. Changing the reference counter of an eBPF array on one
        physical CPU core causes the cache line containing the reference
        counter to be bounced over to that CPU core, making reads of the
        reference counter on all other CPU cores slow until the changed
        reference counter has been written back to memory. Because the
        length and the reference counter of an eBPF array are stored in
        the same cache line, this also means that changing the reference
        counter on one physical CPU core causes reads of the eBPF array's
        length to be slow on other physical CPU cores (intentional false
        sharing).
      
      While this doesn't 'control' the out-of-bounds speculation through
      masking the index as in commit b2157399, triggering a manipulation
      of the map's reference counter is really trivial, so lets not allow
      to easily affect max_entries from it.
      
      Splitting to separate cachelines also generally makes sense from
      a performance perspective anyway in that fast-path won't have a
      cache miss if the map gets pinned, reused in other progs, etc out
      of control path, thus also avoids unintentional false sharing.
      
        [1] https://googleprojectzero.blogspot.ch/2018/01/reading-privileged-memory-with-side.htmlSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      be95a845
  4. 09 1月, 2018 25 次提交
  5. 06 1月, 2018 4 次提交
    • J
      xdp: generic XDP handling of xdp_rxq_info · e817f856
      Jesper Dangaard Brouer 提交于
      Hook points for xdp_rxq_info:
       * reg  : netif_alloc_rx_queues
       * unreg: netif_free_rx_queues
      
      The net_device have some members (num_rx_queues + real_num_rx_queues)
      and data-area (dev->_rx with struct netdev_rx_queue's) that were
      primarily used for exporting information about RPS (CONFIG_RPS) queues
      to sysfs (CONFIG_SYSFS).
      
      For generic XDP extend struct netdev_rx_queue with the xdp_rxq_info,
      and remove some of the CONFIG_SYSFS ifdefs.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      e817f856
    • J
      xdp: base API for new XDP rx-queue info concept · aecd67b6
      Jesper Dangaard Brouer 提交于
      This patch only introduce the core data structures and API functions.
      All XDP enabled drivers must use the API before this info can used.
      
      There is a need for XDP to know more about the RX-queue a given XDP
      frames have arrived on.  For both the XDP bpf-prog and kernel side.
      
      Instead of extending xdp_buff each time new info is needed, the patch
      creates a separate read-mostly struct xdp_rxq_info, that contains this
      info.  We stress this data/cache-line is for read-only info.  This is
      NOT for dynamic per packet info, use the data_meta for such use-cases.
      
      The performance advantage is this info can be setup at RX-ring init
      time, instead of updating N-members in xdp_buff.  A possible (driver
      level) micro optimization is that xdp_buff->rxq assignment could be
      done once per XDP/NAPI loop.  The extra pointer deref only happens for
      program needing access to this info (thus, no slowdown to existing
      use-cases).
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      aecd67b6
    • S
      sh_eth: fix SH7757 GEther initialization · 51335502
      Sergei Shtylyov 提交于
      Renesas  SH7757 has 2 Fast and 2 Gigabit Ether controllers, while the
      'sh_eth' driver can only reset and initialize TSU of the first controller
      pair. Shimoda-san tried to solve that adding the 'needs_init' member to the
      'struct sh_eth_plat_data', however the platform code still never sets this
      flag. I think  that we can infer this information from the 'devno' variable
      (set  to 'platform_device::id') and reset/init the Ether controller pair
      only for an even 'devno'; therefore 'sh_eth_plat_data::needs_init' can be
      removed...
      
      Fixes: 150647fb ("net: sh_eth: change the condition of initialization")
      Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51335502
    • A
      fix "netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'" · 040ee692
      Al Viro 提交于
      Descriptor table is a shared object; it's not a place where you can
      stick temporary references to files, especially when we don't need
      an opened file at all.
      
      Cc: stable@vger.kernel.org # v4.14
      Fixes: 98589a09 ("netfilter: xt_bpf: Fix XT_BPF_MODE_FD_PINNED mode of 'xt_bpf_info_v1'")
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      040ee692
  6. 05 1月, 2018 2 次提交
  7. 04 1月, 2018 2 次提交