  1. 20 Jan 2022, 1 commit
  2. 19 Jan 2022, 4 commits
    • bpf: Add reference tracking support to kfunc · 5c073f26
      By Kumar Kartikeya Dwivedi
      This patch adds verifier support for PTR_TO_BTF_ID return type of kfunc
      to be a reference, by reusing acquire_reference_state/release_reference
      support for existing in-kernel bpf helpers.
      
      We make use of the three kfunc types:
      
      - BTF_KFUNC_TYPE_ACQUIRE
        Returns true if kfunc_btf_id is an acquire kfunc. This will
        acquire_reference_state for the returned PTR_TO_BTF_ID (the only
        allowed return value; see the usage sketch after this list). Note
        that an acquire kfunc must always return a PTR_TO_BTF_ID{_OR_NULL},
        otherwise the program is rejected.
      
      - BTF_KFUNC_TYPE_RELEASE
        Returns true if kfunc_btf_id is a release kfunc. This will release
        the reference to the passed-in PTR_TO_BTF_ID which has a reference
        state (from an earlier acquire kfunc).
        btf_check_func_arg_match returns the regno (of the argument
        register, hence > 0) if the kfunc is a release kfunc and a properly
        referenced PTR_TO_BTF_ID is being passed to it.
        This is similar to how the helper call check uses bpf_call_arg_meta
        to store the ref_obj_id that is later used to release the
        reference. As with in-kernel helpers, we only allow passing one
        referenced PTR_TO_BTF_ID as an argument. It can also be passed to a
        normal kfunc, but for a release kfunc there must always be exactly
        one referenced PTR_TO_BTF_ID argument.
      
      - BTF_KFUNC_TYPE_RET_NULL
        For kfunc returning PTR_TO_BTF_ID, tells if it can be NULL, hence
        force caller to mark the pointer not null (using check) before
        accessing it. Note that taking into account the case fixed by commit
        93c230e3 ("bpf: Enforce id generation for all may-be-null register type")
        we assign a non-zero id for mark_ptr_or_null_reg logic. Later, if more
        return types are supported by kfunc, which have a _OR_NULL variant, it
        might be better to move this id generation under a common
        reg_type_may_be_null check, similar to the case in the commit.
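
      As a usage sketch, a BPF program would exercise these three types
      roughly as follows (the kfunc names mirror the selftests added in
      this series; the surrounding program is illustrative only):

      extern struct prog_test_ref_kfunc *
      bpf_kfunc_call_test_acquire(unsigned long *scalar_ptr) __ksym;
      extern void
      bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p) __ksym;

      SEC("tc")
      int use_ref_kfuncs(struct __sk_buff *ctx)
      {
              unsigned long sp = 0;
              struct prog_test_ref_kfunc *p;

              /* BTF_KFUNC_TYPE_ACQUIRE + BTF_KFUNC_TYPE_RET_NULL */
              p = bpf_kfunc_call_test_acquire(&sp);
              if (!p)         /* NULL check enforced by the verifier */
                      return 0;
              /* ... p is usable as a referenced PTR_TO_BTF_ID here ... */
              bpf_kfunc_call_test_release(p); /* BTF_KFUNC_TYPE_RELEASE */
              return 0;
      }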
      
      Referenced PTR_TO_BTF_ID is currently limited to kfuncs, but can be
      extended in the future to other BPF helpers as well. For now, we can
      rely on the btf_struct_ids_match check to ensure we get the pointer
      to the expected struct type. In the future, care needs to be taken to
      avoid ambiguity for a referenced PTR_TO_BTF_ID passed to a release
      function, in case multiple candidates can release the same BTF ID.
      
      e.g. there might be two release kfuncs (or kfunc and helper):
      
      foo(struct abc *p);
      bar(struct abc *p);
      
      ... such that both release a PTR_TO_BTF_ID with btf_id of struct abc. In
      this case we would need to track the acquire function corresponding to
      the release function to avoid type confusion, and store this information
      in the register state so that an incorrect program can be rejected.
      This is not a problem right now, so it is left to a future patch
      introducing such a case in the kernel.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-6-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Introduce mem, size argument pair support for kfunc · d583691c
      By Kumar Kartikeya Dwivedi
      BPF helpers can associate two adjacent arguments together to pass
      memory of a certain size, using ARG_PTR_TO_MEM and ARG_CONST_SIZE
      arguments. Since we don't use bpf_func_proto for kfuncs, we need to
      leverage BTF to implement similar support.
      
      The ARG_CONST_SIZE processing for helpers is refactored into a common
      check_mem_size_reg helper that is shared with kfuncs as well. kfunc
      ptr_to_mem support follows logic similar to global functions, where
      verification is done as if the pointer is not NULL, even when it may
      be NULL.
      
      This leads to a simple-to-follow rule for writing kfuncs: always
      check the argument pointer for NULL, except when it is PTR_TO_CTX.
      Even the PTR_TO_CTX case is only safe when the kfunc expecting a
      pointer to program ctx is not exposed to other program types where
      the same struct is not the ctx type. Otherwise, the type check falls
      through to the other cases and would permit passing other pointer
      types, possibly NULL at runtime.
      
      Currently, we require the size argument to be suffixed with "__sz" in
      the parameter name. This information is then recorded in kernel BTF and
      verified during function argument checking. In the future we can use BTF
      tagging instead, and modify the kernel function definitions. This will
      be a purely kernel-side change.
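
      For illustration, a kernel-side kfunc taking such a pair could look
      like this (a sketch; the function name is hypothetical, only the
      "__sz" suffix convention is from the description above):

      /* The verifier pairs 'mem' with 'mem__sz' via the "__sz" suffix and
       * checks that the program passes a valid region of that size. */
      noinline int bpf_my_kfunc_process(void *mem, int mem__sz)
      {
              /* per the rule above: check the pointer for NULL */
              if (!mem || mem__sz < 1)
                      return -EINVAL;
              return ((char *)mem)[0];
      }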
      
      This gives us some form of backwards compatibility for structures
      that are passed into kernel functions along with their size, and
      allows variable-length structures to be passed in when accompanied
      by a size parameter.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-5-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Remove check_kfunc_call callback and old kfunc BTF ID API · b202d844
      By Kumar Kartikeya Dwivedi
      Completely remove the old code that made check_kfunc_call work with
      modules, and also the callback itself.
      
      The previous commit adds infrastructure to register all sets and put
      them in vmlinux or module BTF, concatenating all related sets
      organized by hook and type. Once populated, these sets remain
      immutable for the lifetime of the struct btf.
      
      Also, since we don't need the 'owner' module anywhere when doing
      check_kfunc_call, drop the 'btf_modp' module parameter from
      find_kfunc_desc_btf.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-4-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Populate kfunc BTF ID sets in struct btf · dee872e1
      By Kumar Kartikeya Dwivedi
      This patch prepares the kernel to support putting all kinds of kfunc
      BTF ID sets in the struct btf itself. The various kernel subsystems
      will call register_btf_kfunc_id_set in their initcalls (for both
      built-in code and modules).
      
      The 'hook' is one of the program types (e.g. XDP, TC/SCHED_CLS,
      STRUCT_OPS), and the 'types' are check (whether the call is allowed),
      acquire, release, and ret_null (the PTR_TO_BTF_ID_OR_NULL return
      type); a registration is sketched below.
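
      A subsystem registration could look roughly like this (a sketch
      based on the description; the kfunc and set names are placeholders):

      #include <linux/btf.h>
      #include <linux/btf_ids.h>

      BTF_SET_START(my_check_kfunc_ids)
      BTF_ID(func, bpf_my_kfunc)               /* hypothetical kfunc */
      BTF_SET_END(my_check_kfunc_ids)

      static const struct btf_kfunc_id_set my_kfunc_set = {
              .owner     = THIS_MODULE,
              .check_set = &my_check_kfunc_ids,
              /* .acquire_set / .release_set / .ret_null_set as needed */
      };

      static int __init my_mod_init(void)
      {
              return register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS,
                                               &my_kfunc_set);
      }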
      
      A maximum of BTF_KFUNC_SET_MAX_CNT (32) kfunc BTF IDs are permitted
      in a set for a given hook and type. vmlinux sets are allocated on
      demand (and are otherwise NULL), while module sets can only be
      registered once per hook and type, hence they are directly assigned.
      
      A new btf_kfunc_id_set_contains function is exposed for use in the
      verifier. It is faster than the existing list-search method and is
      automatic; it also lets other code not care whether the set is
      unallocated or not.
      
      Note that module code can only make a single register_btf_kfunc_id_set
      call per hook. This is why sorting is only done for in-kernel vmlinux
      sets: there may be multiple sets for the same hook and type that must
      be concatenated, hence sorting them is required to ensure that bsearch
      in btf_id_set_contains continues to work correctly.
      
      The next commit will update the kernel users to make use of this
      infrastructure.
      
      Finally, add a __maybe_unused annotation to the BTF ID macros for the
      !CONFIG_DEBUG_INFO_BTF case, so that they don't produce warnings at
      build time.
      
      The previous patch is also needed to provide synchronization against
      initialization for module BTF's kfunc_set_tab introduced here, as
      described below:
      
        The kfunc_set_tab pointer in struct btf is write-once (if we consider
        the registration phase (comprised of multiple register_btf_kfunc_id_set
        calls) as a single operation). In this sense, once it has been fully
        prepared, it isn't modified, only used for lookup (from the verifier
        context).
      
        For btf_vmlinux, it is initialized fully during the do_initcalls phase,
        which happens fairly early in the boot process, before any processes are
        present. This also eliminates the possibility of bpf_check being called
        at that point, thus relieving us of ensuring any synchronization between
        the registration and lookup function (btf_kfunc_id_set_contains).
      
        However, the case for module BTF is a bit tricky. The BTF is parsed,
        prepared, and published from the MODULE_STATE_COMING notifier callback.
        After this, the module initcalls are invoked, where our registration
        function will be called to populate the kfunc_set_tab for module BTF.
      
        At this point, BTF may be available to userspace while its
        corresponding module is still initializing. A BTF fd can then be
        passed to the verifier using the bpf syscall (e.g. for a kfunc call
        insn).
      
        Hence, there is a race window where the verifier may concurrently
        try to look up the kfunc_set_tab. To prevent this race, we must
        ensure the operations are serialized, or wait for the __init
        functions to complete.
      
        In the earlier registration API, this race was alleviated as verifier
        bpf_check_mod_kfunc_call didn't find the kfunc BTF ID until it was added
        by the registration function (called usually at the end of module __init
        function after all module resources have been initialized). If the
        verifier made the check_kfunc_call before the kfunc BTF ID was added
        to the list, it would fail verification (saying the call isn't
        allowed). The access to the list was protected using a mutex.
      
        Now, it would still fail verification, but for a different reason
        (returning ENXIO due to the failed btf_try_get_module call in
        add_kfunc_call), because if the __init call is in progress the module
        will be in the middle of MODULE_STATE_COMING -> MODULE_STATE_LIVE
        transition, and the BTF_MODULE_LIVE flag for btf_module instance will
        not be set, so the btf_try_get_module call will fail.
      Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
      Link: https://lore.kernel.org/r/20220114163953.1455836-3-memxor@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. 10 Jan 2022, 9 commits
  4. 07 Jan 2022, 2 commits
  5. 06 Jan 2022, 13 commits
    • gro: add ability to control gro max packet size · eac1b93c
      By Coco Li
      Eric Dumazet suggested allowing users to modify the maximum GRO
      packet size.

      We have seen GRO being disabled by users of appliances (such as
      wifi access points) because of claimed bufferbloat issues, or by
      workarounds in sch_cake that split GRO/GSO packets.

      Instead of disabling GRO completely, one can choose to limit the
      maximum packet size of GRO packets, depending on their latency
      constraints.
      
      This patch adds a per-device gro_max_size attribute that can be
      changed with the ip link command:
      
      ip link set dev eth0 gro_max_size 16000
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Coco Li <lixiaoyan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix SOF_TIMESTAMPING_BIND_PHC to work with multiple sockets · 007747a9
      By Miroslav Lichvar
      When multiple sockets using the SOF_TIMESTAMPING_BIND_PHC flag received
      a packet with a hardware timestamp (e.g. multiple PTP instances in
      different PTP domains using the UDPv4/v6 multicast or L2 transport),
      the timestamps received on some sockets were corrupted due to repeated
      conversion of the same timestamp (by the same or different vclocks).
      
      Fix ptp_convert_timestamp() to not modify the shared skb timestamp
      and return the converted timestamp as a ktime_t instead. If the
      conversion fails, return 0 to not confuse the application with
      timestamps corresponding to an unexpected PHC.
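
      The new contract, roughly (a sketch of the shape described above;
      see the patch for the exact prototype):

      /* Convert a hardware timestamp for the given vclock without
       * touching the shared skb; returns 0 if conversion fails. */
      ktime_t ptp_convert_timestamp(const ktime_t *hwtstamp,
                                    int vclock_index);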
      
      Fixes: d7c08826 ("net: socket: support hardware timestamp conversion to PHC bound")
      Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Yangbo Lu <yangbo.lu@nxp.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bootmem: Use page->index instead of page->freelist · c5e97ed1
      By Matthew Wilcox (Oracle)
      page->freelist is for the use of slab. page->index occupies the same
      bits as page->freelist, and by using an integer instead of a pointer
      we can avoid casts.
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: <x86@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • mm/kasan: Convert to struct folio and struct slab · 6e48a966
      By Matthew Wilcox (Oracle)
      KASAN accesses some slab-related struct page fields, so we need to
      convert it to struct slab. Some places are a bit simplified thanks to
      kasan_addr_to_slab() encapsulating the PageSlab flag check through
      virt_to_slab(). When resolving an object address to either a real
      slab or a large kmalloc, use struct folio as the intermediate type
      for testing the slab flag, to avoid an unnecessary implicit
      compound_head().
      
      [ vbabka@suse.cz: use struct folio, adjust to differences in previous
        patches ]
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <kasan-dev@googlegroups.com>
    • mm/memcg: Convert slab objcgs from struct page to struct slab · 4b5f8d9a
      By Vlastimil Babka
      page->memcg_data is used with the MEMCG_DATA_OBJCGS flag only for slab
      pages, so convert all the related infrastructure to struct slab. Also
      use struct folio instead of struct page when resolving object pointers.
      
      This is not just a mechanistic change of types and names. Now in
      mem_cgroup_from_obj() we use folio_test_slab() to decide whether to
      interpret the folio as a real slab rather than a large kmalloc,
      instead of relying on the MEMCG_DATA_OBJCGS bit that used to be
      checked in page_objcgs_check(). The same applies in
      memcg_slab_free_hook(), where we can encounter kmalloc_large() pages
      (there the folio slab flag check is implied by virt_to_slab()). As a
      result, page_objcgs_check() can be dropped instead of converted.
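
      The new dispatch in mem_cgroup_from_obj() has roughly this shape (a
      paraphrase of the description, not the verbatim kernel code; the
      function name below is hypothetical and the objcg indexing is
      elided):

      static struct mem_cgroup *memcg_from_obj_sketch(void *p)
      {
              struct folio *folio = virt_to_folio(p);

              if (folio_test_slab(folio)) {
                      /* a real slab: objcgs live in slab_objcgs(slab) */
                      struct slab *slab = folio_slab(folio);

                      return obj_cgroup_memcg(slab_objcgs(slab)[0]);
              }

              /* large kmalloc: memcg_data sits directly on the folio */
              return NULL;
      }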
      
      To avoid include cycles, move the inline definition of slab_objcgs()
      from memcontrol.h to mm/slab.h.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <cgroups@vger.kernel.org>
    • mm: Convert struct page to struct slab in functions used by other subsystems · 40f3bf0c
      By Vlastimil Babka
      KASAN, KFENCE and memcg interact with SLAB or SLUB internals through
      the functions nearest_obj(), obj_to_index() and objs_per_slab(), which
      take struct page as a parameter. This patch converts them to struct
      slab, including all callers, through a coccinelle semantic patch.
      
      // Options: --include-headers --no-includes --smpl-spacing include/linux/slab_def.h include/linux/slub_def.h mm/slab.h mm/kasan/*.c mm/kfence/kfence_test.c mm/memcontrol.c mm/slab.c mm/slub.c
      // Note: needs coccinelle 1.1.1 to avoid breaking whitespace
      
      @@
      @@
      
      -objs_per_slab_page(
      +objs_per_slab(
       ...
       )
       { ... }
      
      @@
      @@
      
      -objs_per_slab_page(
      +objs_per_slab(
       ...
       )
      
      @@
      identifier fn =~ "obj_to_index|objs_per_slab";
      @@
      
       fn(...,
      -   const struct page *page
      +   const struct slab *slab
          ,...)
       {
      <...
      (
      - page_address(page)
      + slab_address(slab)
      |
      - page
      + slab
      )
      ...>
       }
      
      @@
      identifier fn =~ "nearest_obj";
      @@
      
       fn(...,
      -   struct page *page
      +   const struct slab *slab
          ,...)
       {
      <...
      (
      - page_address(page)
      + slab_address(slab)
      |
      - page
      + slab
      )
      ...>
       }
      
      @@
      identifier fn =~ "nearest_obj|obj_to_index|objs_per_slab";
      expression E;
      @@
      
       fn(...,
      (
      - slab_page(E)
      + E
      |
      - virt_to_page(E)
      + virt_to_slab(E)
      |
      - virt_to_head_page(E)
      + virt_to_slab(E)
      |
      - page
      + page_slab(page)
      )
        ,...)
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Julia Lawall <julia.lawall@inria.fr>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <kasan-dev@googlegroups.com>
      Cc: <cgroups@vger.kernel.org>
    • mm/slub: Finish struct page to struct slab conversion · c2092c12
      By Vlastimil Babka
      Update comments that mention pages to mention slabs where appropriate,
      along with some goto labels.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Roman Gushchin <guro@fb.com>
    • mm/slub: Convert most struct page to struct slab by spatch · bb192ed9
      By Vlastimil Babka
      The majority of the conversion from struct page to struct slab in SLUB
      internals can be delegated to a coccinelle semantic patch. This
      includes renaming variables with 'page' in their name to 'slab', and
      similar.
      
      Big thanks to Julia Lawall and Luis Chamberlain for help with
      coccinelle.
      
      // Options: --include-headers --no-includes --smpl-spacing include/linux/slub_def.h mm/slub.c
      // Note: needs coccinelle 1.1.1 to avoid breaking whitespace, and ocaml for the
      // embedded script
      
      // build list of functions to exclude from applying the next rule
      @initialize:ocaml@
      @@
      
      let ok_function p =
        not (List.mem (List.hd p).current_element ["nearest_obj";"obj_to_index";"objs_per_slab_page";"__slab_lock";"__slab_unlock";"free_nonslab_page";"kmalloc_large_node"])
      
      // convert the type from struct page to struct slab in all functions
      // except the list from the previous rule
      // this also affects struct kmem_cache_cpu, but that's ok
      @@
      position p : script:ocaml() { ok_function p };
      @@
      
      - struct page@p
      + struct slab
      
      // in struct kmem_cache_cpu, change the name from page to slab
      // the type was already converted by the previous rule
      @@
      @@
      
      struct kmem_cache_cpu {
      ...
      -struct slab *page;
      +struct slab *slab;
      ...
      }
      
      // there are many places that use c->page which is now c->slab after the
      // previous rule
      @@
      struct kmem_cache_cpu *c;
      @@
      
      -c->page
      +c->slab
      
      @@
      @@
      
      struct kmem_cache {
      ...
      - unsigned int cpu_partial_pages;
      + unsigned int cpu_partial_slabs;
      ...
      }
      
      @@
      struct kmem_cache *s;
      @@
      
      - s->cpu_partial_pages
      + s->cpu_partial_slabs
      
      @@
      @@
      
      static void
      - setup_page_debug(
      + setup_slab_debug(
       ...)
       {...}
      
      @@
      @@
      
      - setup_page_debug(
      + setup_slab_debug(
       ...);
      
      // for all functions (with exceptions), change any "struct slab *page"
      // parameter to "struct slab *slab" in the signature, and generally all
      // occurrences of "page" to "slab" in the body - with some special cases.
      
      @@
      identifier fn !~ "free_nonslab_page|obj_to_index|objs_per_slab_page|nearest_obj";
      @@
       fn(...,
      -   struct slab *page
      +   struct slab *slab
          ,...)
       {
      <...
      - page
      + slab
      ...>
       }
      
      // similar to previous but the param is called partial_page
      @@
      identifier fn;
      @@
      
       fn(...,
      -   struct slab *partial_page
      +   struct slab *partial_slab
          ,...)
       {
      <...
      - partial_page
      + partial_slab
      ...>
       }
      
      // similar to previous but for functions that take pointer to struct page ptr
      @@
      identifier fn;
      @@
      
       fn(...,
      -   struct slab **ret_page
      +   struct slab **ret_slab
          ,...)
       {
      <...
      - ret_page
      + ret_slab
      ...>
       }
      
      // functions converted by previous rules that were temporarily called using
      // slab_page(E) so we want to remove the wrapper now that they accept struct
      // slab ptr directly
      @@
      identifier fn =~ "slab_free|do_slab_free";
      expression E;
      @@
      
       fn(...,
      - slab_page(E)
      + E
        ,...)
      
      // similar to previous but for another pattern
      @@
      identifier fn =~ "slab_pad_check|check_object";
      @@
      
       fn(...,
      - folio_page(folio, 0)
      + slab
        ,...)
      
      // functions that were returning struct page ptr and now will return struct
      // slab ptr, including slab_page() wrapper removal
      @@
      identifier fn =~ "allocate_slab|new_slab";
      expression E;
      @@
      
       static
-struct page *
      +struct slab *
       fn(...)
       {
      <...
      - slab_page(E)
      + E
      ...>
       }
      
      // rename any former struct page * declarations
      @@
      @@
      
      struct slab *
      (
      - page
      + slab
      |
      - partial_page
      + partial_slab
      |
      - oldpage
      + oldslab
      )
      ;
      
      // this has to be separate from the previous rule as page and page2 appear
      // on the same line
      @@
      @@
      
      struct slab *
      -page2
      +slab2
      ;
      
      // similar but with initial assignment
      @@
      expression E;
      @@
      
      struct slab *
      (
      - page
      + slab
      |
      - flush_page
      + flush_slab
      |
      - discard_page
      + slab_to_discard
      |
      - page_to_unfreeze
      + slab_to_unfreeze
      )
      = E;
      
      // convert most of struct page to struct slab usage inside functions (with
      // exceptions), including specific variable renames
      @@
      identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node";
      expression E;
      @@
      
       fn(...)
       {
      <...
      (
      - int pages;
      + int slabs;
      |
      - int pages = E;
      + int slabs = E;
      |
      - page
      + slab
      |
      - flush_page
      + flush_slab
      |
      - partial_page
      + partial_slab
      |
      - oldpage->pages
      + oldslab->slabs
      |
      - oldpage
      + oldslab
      |
      - unsigned int nr_pages;
      + unsigned int nr_slabs;
      |
      - nr_pages
      + nr_slabs
      |
      - unsigned int partial_pages = E;
      + unsigned int partial_slabs = E;
      |
      - partial_pages
      + partial_slabs
      )
      ...>
       }
      
      // this has to be split out from the previous rule so that lines containing
      // multiple matching changes will be fully converted
      @@
      identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node";
      @@
      
       fn(...)
       {
      <...
      (
      - slab->pages
      + slab->slabs
      |
      - pages
      + slabs
      |
      - page2
      + slab2
      |
      - discard_page
      + slab_to_discard
      |
      - page_to_unfreeze
      + slab_to_unfreeze
      )
      ...>
       }
      
      // after we simply changed all occurrences of page to slab, some usages
      // need adjustment for slab-specific functions, or use the slab_page()
      // wrapper
      @@
      identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node";
      @@
      
       fn(...)
       {
      <...
      (
      - page_slab(slab)
      + slab
      |
      - kasan_poison_slab(slab)
      + kasan_poison_slab(slab_page(slab))
      |
      - page_address(slab)
      + slab_address(slab)
      |
      - page_size(slab)
      + slab_size(slab)
      |
      - PageSlab(slab)
      + folio_test_slab(slab_folio(slab))
      |
      - page_to_nid(slab)
      + slab_nid(slab)
      |
      - compound_order(slab)
      + slab_order(slab)
      )
      ...>
       }
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Julia Lawall <julia.lawall@inria.fr>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
    • mm: Convert check_heap_object() to use struct slab · 0b3eb091
      By Matthew Wilcox (Oracle)
      Ensure that we're not seeing a tail page inside __check_heap_object() by
      converting to a slab instead of a page.  Take the opportunity to mark
      the slab as const since we're not modifying it.  Also move the
      declaration of __check_heap_object() to mm/slab.h so it's not available
      to the wider kernel.
      
      [ vbabka@suse.cz: in check_heap_object() only convert to struct slab for
        actual PageSlab pages; use folio as intermediate step instead of page ]
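
      Per the description, the declaration now private to mm/slab.h looks
      roughly like this (a sketch; see the patch for the exact prototype):

      /* slab is const since the check does not modify it */
      void __check_heap_object(const void *ptr, unsigned long n,
                               const struct slab *slab, bool to_user);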
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Roman Gushchin <guro@fb.com>
    • mm: Split slab into its own type · d122019b
      By Matthew Wilcox (Oracle)
      Make struct slab independent of struct page. It still uses the
      underlying memory in struct page for storing slab-specific data, but
      slab and slub can now be weaned off using struct page directly.  Some of
      the wrapper functions (slab_address() and slab_order()) still need to
      cast to struct folio, but this is a significant disentanglement.
      
      [ vbabka@suse.cz: Rebase on folios, use folio instead of page where
        possible.
      
        Do not duplicate flags field in struct slab, instead make the related
        accessors go through slab_folio(). For testing pfmemalloc use the
        folio_*_active flag accessors directly so the PageSlabPfmemalloc
        wrappers can be removed later.
      
        Make folio_slab() expect only folio_test_slab() == true folios and
        virt_to_slab() return NULL when folio_test_slab() == false (see the
        sketch below).

        Move struct slab to mm/slab.h.

        Don't represent with struct slab pages that are not true slab pages,
        but just compound pages obtained directly from the page allocator
        (with large kmalloc() for SLUB and SLOB). ]
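
      A minimal sketch of the lookup semantics from the note above (the
      helper name here is hypothetical):

      /* virt_to_slab() is safe on arbitrary addresses; folio_slab() must
       * only be used once folio_test_slab() is known to hold. */
      static bool is_slab_addr(const void *addr)
      {
              return virt_to_slab(addr) != NULL;
      }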
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Roman Gushchin <guro@fb.com>
    • mm/slub: Make object_err() static · ae16d059
      By Vlastimil Babka
      There are no callers outside of mm/slub.c anymore.

      Also move freelist_corrupted(), which calls object_err(), to avoid
      the need for a forward declaration.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Roman Gushchin <guro@fb.com>
    • xdp: Add xdp_do_redirect_frame() for pre-computed xdp_frames · 1372d34c
      By Toke Høiland-Jørgensen
      Add an xdp_do_redirect_frame() variant which supports pre-computed
      xdp_frame structures. This will be used in bpf_prog_run() to avoid having
      to write to the xdp_frame structure when the XDP program doesn't modify the
      frame boundaries.
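
      Per the description, the variant has roughly this shape (a sketch;
      see the patch for the exact prototype):

      /* Like xdp_do_redirect(), but takes an already-built xdp_frame so
       * the frame need not be re-derived from the xdp_buff. */
      int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp,
                                struct xdp_frame *xdpf, struct bpf_prog *prog);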
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220103150812.87914-6-toke@redhat.com
    • xdp: Move conversion to xdp_frame out of map functions · d53ad5d8
      By Toke Høiland-Jørgensen
      All map redirect functions except XSK maps convert xdp_buff to
      xdp_frame before enqueueing it. So move this conversion out of the
      map functions and into xdp_do_redirect(). This removes a bit of
      duplicated code, but more importantly it makes it possible to support
      caller-allocated xdp_frame structures, which will be added in a
      subsequent commit.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220103150812.87914-5-toke@redhat.com
  6. 05 Jan 2022, 6 commits
    • net: mdio: add helpers to extract clause 45 regad and devad fields · c6af53f0
      By Russell King (Oracle)
      Add a couple of helpers and definitions to extract the clause 45 regad
      and devad fields from the regnum passed into MDIO drivers.
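
      The helpers have roughly this shape (a sketch based on the
      description; the mask names are assumptions, and FIELD_GET/GENMASK
      come from <linux/bitfield.h> and <linux/bits.h>):

      /* clause 45 regnum layout: devad in bits 20:16, regad in bits 15:0 */
      #define MII_DEVADDR_C45_MASK    GENMASK(20, 16)
      #define MII_REGADDR_C45_MASK    GENMASK(15, 0)

      static inline u16 mdiobus_c45_regad(u32 regnum)
      {
              return FIELD_GET(MII_REGADDR_C45_MASK, regnum);
      }

      static inline u16 mdiobus_c45_devad(u32 regnum)
      {
              return FIELD_GET(MII_DEVADDR_C45_MASK, regnum);
      }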
      Tested-by: Daniel Golle <daniel@makrotopia.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Daniel Golle <daniel@makrotopia.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • can: dev: reorder struct can_priv members for better packing · 5fe1be81
      By Vincent Mailhol
      Save eight bytes of holes on x86-64 architectures by reordering the
      members of struct can_priv.
      
      Before:
      
      | $ pahole -C can_priv drivers/net/can/dev/dev.o
      | struct can_priv {
      | 	struct net_device *        dev;                  /*     0     8 */
      | 	struct can_device_stats    can_stats;            /*     8    24 */
      | 	const struct can_bittiming_const  * bittiming_const; /*    32     8 */
      | 	const struct can_bittiming_const  * data_bittiming_const; /*    40     8 */
      | 	struct can_bittiming       bittiming;            /*    48    32 */
      | 	/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
      | 	struct can_bittiming       data_bittiming;       /*    80    32 */
      | 	const struct can_tdc_const  * tdc_const;         /*   112     8 */
      | 	struct can_tdc             tdc;                  /*   120    12 */
      | 	/* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
      | 	unsigned int               bitrate_const_cnt;    /*   132     4 */
      | 	const u32  *               bitrate_const;        /*   136     8 */
      | 	const u32  *               data_bitrate_const;   /*   144     8 */
      | 	unsigned int               data_bitrate_const_cnt; /*   152     4 */
      | 	u32                        bitrate_max;          /*   156     4 */
      | 	struct can_clock           clock;                /*   160     4 */
      | 	unsigned int               termination_const_cnt; /*   164     4 */
      | 	const u16  *               termination_const;    /*   168     8 */
      | 	u16                        termination;          /*   176     2 */
      |
      | 	/* XXX 6 bytes hole, try to pack */
      |
      | 	struct gpio_desc *         termination_gpio;     /*   184     8 */
      | 	/* --- cacheline 3 boundary (192 bytes) --- */
      | 	u16                        termination_gpio_ohms[2]; /*   192     4 */
      | 	enum can_state             state;                /*   196     4 */
      | 	u32                        ctrlmode;             /*   200     4 */
      | 	u32                        ctrlmode_supported;   /*   204     4 */
      | 	int                        restart_ms;           /*   208     4 */
      |
      | 	/* XXX 4 bytes hole, try to pack */
      |
      | 	struct delayed_work        restart_work;         /*   216    88 */
      |
      | 	/* XXX last struct has 4 bytes of padding */
      |
      | 	/* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
      | 	int                        (*do_set_bittiming)(struct net_device *); /*   304     8 */
      | 	int                        (*do_set_data_bittiming)(struct net_device *); /*   312     8 */
      | 	/* --- cacheline 5 boundary (320 bytes) --- */
      | 	int                        (*do_set_mode)(struct net_device *, enum can_mode); /*   320     8 */
      | 	int                        (*do_set_termination)(struct net_device *, u16); /*   328     8 */
      | 	int                        (*do_get_state)(const struct net_device  *, enum can_state *); /*   336     8 */
      | 	int                        (*do_get_berr_counter)(const struct net_device  *, struct can_berr_counter *); /*   344     8 */
      | 	unsigned int               echo_skb_max;         /*   352     4 */
      |
      | 	/* XXX 4 bytes hole, try to pack */
      |
      | 	struct sk_buff * *         echo_skb;             /*   360     8 */
      |
      | 	/* size: 368, cachelines: 6, members: 32 */
      | 	/* sum members: 354, holes: 3, sum holes: 14 */
      | 	/* paddings: 1, sum paddings: 4 */
      | 	/* last cacheline: 48 bytes */
      | };
      
      After:
      
      | $ pahole -C can_priv drivers/net/can/dev/dev.o
      | struct can_priv {
      | 	struct net_device *        dev;                  /*     0     8 */
      | 	struct can_device_stats    can_stats;            /*     8    24 */
      | 	const struct can_bittiming_const  * bittiming_const; /*    32     8 */
      | 	const struct can_bittiming_const  * data_bittiming_const; /*    40     8 */
      | 	struct can_bittiming       bittiming;            /*    48    32 */
      | 	/* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
      | 	struct can_bittiming       data_bittiming;       /*    80    32 */
      | 	const struct can_tdc_const  * tdc_const;         /*   112     8 */
      | 	struct can_tdc             tdc;                  /*   120    12 */
      | 	/* --- cacheline 2 boundary (128 bytes) was 4 bytes ago --- */
      | 	unsigned int               bitrate_const_cnt;    /*   132     4 */
      | 	const u32  *               bitrate_const;        /*   136     8 */
      | 	const u32  *               data_bitrate_const;   /*   144     8 */
      | 	unsigned int               data_bitrate_const_cnt; /*   152     4 */
      | 	u32                        bitrate_max;          /*   156     4 */
      | 	struct can_clock           clock;                /*   160     4 */
      | 	unsigned int               termination_const_cnt; /*   164     4 */
      | 	const u16  *               termination_const;    /*   168     8 */
      | 	u16                        termination;          /*   176     2 */
      |
      | 	/* XXX 6 bytes hole, try to pack */
      |
      | 	struct gpio_desc *         termination_gpio;     /*   184     8 */
      | 	/* --- cacheline 3 boundary (192 bytes) --- */
      | 	u16                        termination_gpio_ohms[2]; /*   192     4 */
      | 	unsigned int               echo_skb_max;         /*   196     4 */
      | 	struct sk_buff * *         echo_skb;             /*   200     8 */
      | 	enum can_state             state;                /*   208     4 */
      | 	u32                        ctrlmode;             /*   212     4 */
      | 	u32                        ctrlmode_supported;   /*   216     4 */
      | 	int                        restart_ms;           /*   220     4 */
      | 	struct delayed_work        restart_work;         /*   224    88 */
      |
      | 	/* XXX last struct has 4 bytes of padding */
      |
      | 	/* --- cacheline 4 boundary (256 bytes) was 56 bytes ago --- */
      | 	int                        (*do_set_bittiming)(struct net_device *); /*   312     8 */
      | 	/* --- cacheline 5 boundary (320 bytes) --- */
      | 	int                        (*do_set_data_bittiming)(struct net_device *); /*   320     8 */
      | 	int                        (*do_set_mode)(struct net_device *, enum can_mode); /*   328     8 */
      | 	int                        (*do_set_termination)(struct net_device *, u16); /*   336     8 */
      | 	int                        (*do_get_state)(const struct net_device  *, enum can_state *); /*   344     8 */
      | 	int                        (*do_get_berr_counter)(const struct net_device  *, struct can_berr_counter *); /*   352     8 */
      |
      | 	/* size: 360, cachelines: 6, members: 32 */
      | 	/* sum members: 354, holes: 1, sum holes: 6 */
      | 	/* paddings: 1, sum paddings: 4 */
      | 	/* last cacheline: 40 bytes */
      | };
      
      Link: https://lore.kernel.org/all/20211213160226.56219-4-mailhol.vincent@wanadoo.fr
      Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: dev: add sanity check in can_set_static_ctrlmode() · 7d4a101c
      By Vincent Mailhol
      The previous patch removed can_priv::ctrlmode_static and replaced it
      with can_get_static_ctrlmode().

      A condition sine qua non for this to work is that a controller's
      static modes should never be set in can_priv::ctrlmode_supported
      (c.f. the comment on can_priv::ctrlmode_supported, which states that
      it is for "options that can be *modified* by netlink"). This
      condition is already correctly fulfilled by all existing drivers
      which rely on the ctrlmode_static feature.

      Nonetheless, we add an extra safeguard in can_set_static_ctrlmode()
      to return an error value and warn any developer adventurous enough
      to set a feature static that is already set as supported.
      
      The drivers which rely on the static controller mode are then updated
      to check the return value of can_set_static_ctrlmode().
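
      The safeguard has roughly this shape (a sketch of the described
      behavior, not the verbatim helper):

      static inline int can_set_static_ctrlmode(struct net_device *dev,
                                                u32 static_mode)
      {
              struct can_priv *priv = netdev_priv(dev);

              /* a static mode must not also be advertised as switchable */
              if (priv->ctrlmode_supported & static_mode) {
                      netdev_warn(dev,
                                  "static mode is also marked as supported\n");
                      return -EINVAL;
              }

              priv->ctrlmode = static_mode;
              return 0;
      }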
      
      Link: https://lore.kernel.org/all/20211213160226.56219-3-mailhol.vincent@wanadoo.fr
      Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: dev: replace can_priv::ctrlmode_static by can_get_static_ctrlmode() · c9e1d8ed
      By Vincent Mailhol
      The statically enabled features of a CAN controller can be retrieved
      using the formula below:

      | u32 ctrlmode_static = priv->ctrlmode & ~priv->ctrlmode_supported;

      As such, there is no need to store this information. This patch
      removes the ctrlmode_static field of struct can_priv and provides,
      as a replacement, the inline function can_get_static_ctrlmode(),
      which returns the same value.
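
      Given the formula above, the replacement helper is essentially:

      static inline u32 can_get_static_ctrlmode(struct can_priv *priv)
      {
              return priv->ctrlmode & ~priv->ctrlmode_supported;
      }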
      
      Link: https://lore.kernel.org/all/20211213160226.56219-2-mailhol.vincent@wanadoo.fr
      Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: do not increase tx_bytes statistics for RTR frames · cc4b08c3
      By Vincent Mailhol
      The actual payload length of the CAN Remote Transmission Request (RTR)
      frames is always 0, i.e. no payload is transmitted on the wire.
      However, those RTR frames still use the DLC to indicate the length of
      the requested frame.
      
      As such, net_device_stats::tx_bytes should not be increased when
      sending RTR frames.
      
      The function can_get_echo_skb() already returns the correct length,
      even for RTR frames (c.f. [1]). However, for historical reasons, the
      drivers do not use can_get_echo_skb()'s return value; instead, most
      of them store a temporary length (or dlc) in some local structure or
      array. Using the return value of can_get_echo_skb() solves the
      issue. Such length/dlc fields then become unused, so this patch does
      the appropriate cleanup where needed.
      
      This patch fixes all the CAN drivers.
      
      Finally, can_get_echo_skb() is decorated with the __must_check
      attribute in order to force future drivers to correctly use its return
      value (else the compiler would emit a warning).
      
      [1] commit ed3320ce ("can: dev: __can_get_echo_skb():
      fix real payload length return value for RTR frames")
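
      The resulting driver pattern in the TX-complete path looks roughly
      like this (an illustrative sketch; stats handling varies per driver):

      static void my_can_tx_done(struct net_device *dev, unsigned int idx)
      {
              struct net_device_stats *stats = &dev->stats;

              /* returns the real payload length, which is 0 for RTR frames */
              stats->tx_bytes += can_get_echo_skb(dev, idx, NULL);
              stats->tx_packets++;
      }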
      
      Link: https://lore.kernel.org/all/20211207121531.42941-6-mailhol.vincent@wanadoo.fr
      Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
      Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Cc: Ludovic Desroches <ludovic.desroches@microchip.com>
      Cc: Maxime Ripard <mripard@kernel.org>
      Cc: Chen-Yu Tsai <wens@csie.org>
      Cc: Jernej Skrabec <jernej.skrabec@gmail.com>
      Cc: Yasushi SHOJI <yashi@spacecubics.com>
      Cc: Oliver Hartkopp <socketcan@hartkopp.net>
      Cc: Stephane Grosjean <s.grosjean@peak-system.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Tested-by: Jimmy Assarsson <extja@kvaser.com> # kvaser
      Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Acked-by: Stefan Mätje <stefan.maetje@esd.eu> # esd_usb2
      Tested-by: Stefan Mätje <stefan.maetje@esd.eu> # esd_usb2
      [mkl: add conversion for grcan]
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • ACPI: PCC: Implement OperationRegion handler for the PCC Type 3 subtype · 77e2a047
      By Sudeep Holla
      A PCC OpRegion provides a mechanism to communicate with the platform
      directly from AML. The PCCT provides the list of PCC channels
      available in the platform; a subset or all of them can be used in PCC
      OpRegions.

      This patch registers the PCC OpRegion handler before the ACPI tables
      are loaded. It relies on the special context data passed in to
      identify and set up the PCC channel before the OpRegion handler is
      executed for the first time.
      
      A typical PCC OpRegion declaration looks like this:
      
      OperationRegion (PFRM, PCC, 2, 0x74)
      Field (PFRM, ByteAcc, NoLock, Preserve)
      {
          SIGN,   32,
          FLGS,   32,
          LEN,    32,
          CMD,    32,
          DATA,   800
      }
      
      It contains four named double words followed by a 100-byte buffer
      named DATA.
      
      ASL can fill out the buffer something like this:
      
          /* Create global or local buffer */
          Name (BUFF, Buffer (0x0C){})
          /* Create double word fields over the buffer */
          CreateDWordField (BUFF, 0x0, WD0)
          CreateDWordField (BUFF, 0x04, WD1)
          CreateDWordField (BUFF, 0x08, WD2)
      
          /* Fill the named fields */
          WD0 = 0x50434300
          SIGN = BUFF
          WD0 = 1
          FLGS = BUFF
          WD0 = 0x10
          LEN = BUFF
      
          /* Fill the payload in the DATA buffer */
          WD0 = 0
          WD1 = 0x08
          WD2 = 0
          DATA = BUFF
      
          /* Write to CMD field to trigger handler */
          WD0 = 0x4404
          CMD = BUFF
      
      This buffer is received by acpi_pcc_opregion_space_handler. The
      handler fetches the complete buffer via internal_pcc_buffer.

      The setup handler receives the special PCC context data, which
      contains the PCC channel index used to set up the channel. The
      buffer pointer and length are saved in the region context, which is
      then used in the handler.
      
      (kernel test robot: Build failure with CONFIG_ACPI_DEBUGGER)
      Link: https://lore.kernel.org/r/202201041539.feAV0l27-lkp@intel.com
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  7. 04 Jan 2022, 1 commit
    • icmp: ICMPV6: Examine invoking packet for Segment Route Headers. · e4129440
      By Andrew Lunn
      RFC8754 says:
      
      ICMP error packets generated within the SR domain are sent to source
      nodes within the SR domain.  The invoking packet in the ICMP error
      message may contain an SRH.  Since the destination address of a packet
      with an SRH changes as each segment is processed, it may not be the
      destination used by the socket or application that generated the
      invoking packet.
      
      For the source of an invoking packet to process the ICMP error
      message, the ultimate destination address of the IPv6 header may be
      required.  The following logic is used to determine the destination
      address for use by protocol-error handlers.
      
      *  Walk all extension headers of the invoking IPv6 packet to the
         routing extension header preceding the upper-layer header.
      
         -  If routing header is type 4 Segment Routing Header (SRH)
      
            o  The SID at Segment List[0] may be used as the destination
               address of the invoking packet.
      
      Mangle the skb so the network header points to the invoking packet
      inside the ICMP packet. The seg6 helpers can then be used on the skb
      to find any segment routing headers. If found, mark this fact in the
      IPv6 control block of the skb, and store the offset into the packet of
      the SRH. Then restore the skb back to its old state.
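
      Sketched in C, the destination selection described above (a
      hypothetical helper; the kernel spreads this logic across the seg6
      helpers and the ICMP error handling):

      static const struct in6_addr *
      icmp6_orig_dst(const struct sk_buff *skb, const struct ipv6_sr_hdr *srh)
      {
              if (srh)        /* the invoking packet carried an SRH */
                      return &srh->segments[0]; /* Segment List[0] */
              return &ipv6_hdr(skb)->daddr;
      }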
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 03 Jan 2022, 1 commit
  9. 01 Jan 2022, 1 commit
  10. 31 Dec 2021, 2 commits