1. 19 Dec, 2021 7 commits
  2. 17 Dec, 2021 4 commits
    • C
      bpf: Right align verifier states in verifier logs. · 2e576648
      Christy Lee committed
      Make the verifier logs more readable, print the verifier states
      on the corresponding instruction line. If the previous line was
      not a bpf instruction, then print the verifier states on its own
      line.
      
      Before:
      
      Validating test_pkt_access_subprog3() func#3...
      86: R1=invP(id=0) R2=ctx(id=0,off=0,imm=0) R10=fp0
      ; int test_pkt_access_subprog3(int val, struct __sk_buff *skb)
      86: (bf) r6 = r2
      87: R2=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0)
      87: (bc) w7 = w1
      88: R1=invP(id=0) R7_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
      ; return get_skb_len(skb) * get_skb_ifindex(val, skb, get_constant(123));
      88: (bf) r1 = r6
      89: R1_w=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0)
      89: (85) call pc+9
      Func#4 is global and valid. Skipping.
      90: R0_w=invP(id=0)
      90: (bc) w8 = w0
      91: R0_w=invP(id=0) R8_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
      ; return get_skb_len(skb) * get_skb_ifindex(val, skb, get_constant(123));
      91: (b7) r1 = 123
      92: R1_w=invP123
      92: (85) call pc+65
      Func#5 is global and valid. Skipping.
      93: R0=invP(id=0)
      
      After:
      
      86: R1=invP(id=0) R2=ctx(id=0,off=0,imm=0) R10=fp0
      ; int test_pkt_access_subprog3(int val, struct __sk_buff *skb)
      86: (bf) r6 = r2                      ; R2=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0)
      87: (bc) w7 = w1                      ; R1=invP(id=0) R7_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
      ; return get_skb_len(skb) * get_skb_ifindex(val, skb, get_constant(123));
      88: (bf) r1 = r6                      ; R1_w=ctx(id=0,off=0,imm=0) R6_w=ctx(id=0,off=0,imm=0)
      89: (85) call pc+9
      Func#4 is global and valid. Skipping.
      90: R0_w=invP(id=0)
      90: (bc) w8 = w0                      ; R0_w=invP(id=0) R8_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
      ; return get_skb_len(skb) * get_skb_ifindex(val, skb, get_constant(123));
      91: (b7) r1 = 123                     ; R1_w=invP123
      92: (85) call pc+65
      Func#5 is global and valid. Skipping.
      93: R0=invP(id=0)
      Signed-off-by: Christy Lee <christylee@fb.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      2e576648
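The alignment rule described in this commit can be sketched as a small Python model. This is only an illustration, not the kernel code: the `(kind, index, text)` event tuples, the `render` name, and the default `align` column are assumptions made for the sketch.

```python
def render(events, align=40):
    """Merge verifier-state events into the preceding instruction line.

    `events` is a list of (kind, index, text) tuples, where kind is
    "insn", "state", or "other". The tuple layout and the default
    `align` column are assumptions made for this sketch.
    """
    lines = []
    prev_was_insn = False
    for kind, index, text in events:
        if kind == "state" and prev_was_insn:
            # Attach the state to the instruction line, padded to `align`.
            pad = max(align - len(lines[-1]), 1)
            lines[-1] += " " * pad + "; " + text
        elif kind == "other":
            # Source comments and verifier messages are passed through.
            lines.append(text)
        else:
            # Instructions and standalone states carry an index prefix.
            lines.append(f"{index}: {text}")
        prev_was_insn = kind == "insn"
    return lines
```

A state that follows a source comment or a "Func#N is global" message stays on its own line, which mirrors the `90: R0_w=invP(id=0)` line in the "After" log.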
    • C
      bpf: Only print scratched registers and stack slots to verifier logs. · 0f55f9ed
      Christy Lee committed
      When printing verifier state for any log level, print full verifier
      state only on function calls or on errors. Otherwise, only print the
      registers and stack slots that were accessed.
      
      Log size differences:
      
      verif_scale_loop6 before: 234566564
      verif_scale_loop6 after: 72143943
      69% size reduction
      
      kfree_skb before: 166406
      kfree_skb after: 55386
      67% size reduction
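As a quick arithmetic check, the reductions can be recomputed from the byte counts above. The helper name `size_reduction_percent` is hypothetical and exists only for this sketch.

```python
def size_reduction_percent(before: int, after: int) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before


# Byte counts taken from the log size figures quoted above.
samples = [
    ("verif_scale_loop6", 234566564, 72143943),
    ("kfree_skb", 166406, 55386),
]

for name, before, after in samples:
    print(f"{name}: {size_reduction_percent(before, after):.1f}% smaller")
```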
      
      Before:
      
      156: (61) r0 = *(u32 *)(r1 +0)
      157: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1=ctx(id=0,off=0,imm=0) R2_w=invP0 R10=fp0 fp-8_w=00000000 fp-16_w=00\
      000000 fp-24_w=00000000 fp-32_w=00000000 fp-40_w=00000000 fp-48_w=00000000 fp-56_w=00000000 fp-64_w=00000000 fp-72_w=00000000 fp-80_w=00000\
      000 fp-88_w=00000000 fp-96_w=00000000 fp-104_w=00000000 fp-112_w=00000000 fp-120_w=00000000 fp-128_w=00000000 fp-136_w=00000000 fp-144_w=00\
      000000 fp-152_w=00000000 fp-160_w=00000000 fp-168_w=00000000 fp-176_w=00000000 fp-184_w=00000000 fp-192_w=00000000 fp-200_w=00000000 fp-208\
      _w=00000000 fp-216_w=00000000 fp-224_w=00000000 fp-232_w=00000000 fp-240_w=00000000 fp-248_w=00000000 fp-256_w=00000000 fp-264_w=00000000 f\
      p-272_w=00000000 fp-280_w=00000000 fp-288_w=00000000 fp-296_w=00000000 fp-304_w=00000000 fp-312_w=00000000 fp-320_w=00000000 fp-328_w=00000\
      000 fp-336_w=00000000 fp-344_w=00000000 fp-352_w=00000000 fp-360_w=00000000 fp-368_w=00000000 fp-376_w=00000000 fp-384_w=00000000 fp-392_w=\
      00000000 fp-400_w=00000000 fp-408_w=00000000 fp-416_w=00000000 fp-424_w=00000000 fp-432_w=00000000 fp-440_w=00000000 fp-448_w=00000000
      ; return skb->len;
      157: (95) exit
      Func#4 is safe for any args that match its prototype
      Validating get_constant() func#5...
      158: R1=invP(id=0) R10=fp0
      ; int get_constant(long val)
      158: (bf) r0 = r1
      159: R0_w=invP(id=1) R1=invP(id=1) R10=fp0
      ; return val - 122;
      159: (04) w0 += -122
      160: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1=invP(id=1) R10=fp0
      ; return val - 122;
      160: (95) exit
      Func#5 is safe for any args that match its prototype
      Validating get_skb_ifindex() func#6...
      161: R1=invP(id=0) R2=ctx(id=0,off=0,imm=0) R3=invP(id=0) R10=fp0
      ; int get_skb_ifindex(int val, struct __sk_buff *skb, int var)
      161: (bc) w0 = w3
      162: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1=invP(id=0) R2=ctx(id=0,off=0,imm=0) R3=invP(id=0) R10=fp0
      
      After:
      
      156: (61) r0 = *(u32 *)(r1 +0)
      157: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1=ctx(id=0,off=0,imm=0)
      ; return skb->len;
      157: (95) exit
      Func#4 is safe for any args that match its prototype
      Validating get_constant() func#5...
      158: R1=invP(id=0) R10=fp0
      ; int get_constant(long val)
      158: (bf) r0 = r1
      159: R0_w=invP(id=1) R1=invP(id=1)
      ; return val - 122;
      159: (04) w0 += -122
      160: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
      ; return val - 122;
      160: (95) exit
      Func#5 is safe for any args that match its prototype
      Validating get_skb_ifindex() func#6...
      161: R1=invP(id=0) R2=ctx(id=0,off=0,imm=0) R3=invP(id=0) R10=fp0
      ; int get_skb_ifindex(int val, struct __sk_buff *skb, int var)
      161: (bc) w0 = w3
      162: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R3=invP(id=0)
      Signed-off-by: Christy Lee <christylee@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211216213358.3374427-2-christylee@fb.com
      0f55f9ed
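One way to picture the "scratched" idea from this commit is a bitmask that records which registers were touched since the last print. The sketch below is a simplified Python model under that assumption. The class and method names are hypothetical, it covers registers only (no stack slots), and it is not the kernel implementation.

```python
MAX_REGS = 11  # R0..R10 in BPF


class VerifierStateModel:
    """Toy model: print only registers written since the last print."""

    def __init__(self):
        self.regs = {}       # register number -> textual description
        self.scratched = 0   # bit i set means Ri was touched

    def write_reg(self, regno, desc):
        self.regs[regno] = desc
        self.scratched |= 1 << regno

    def mark_all_scratched(self):
        # Used before a full dump, e.g. on a function call or an error.
        self.scratched = (1 << MAX_REGS) - 1

    def print_state(self):
        parts = [
            f"R{r}={self.regs[r]}"
            for r in sorted(self.regs)
            if self.scratched & (1 << r)
        ]
        self.scratched = 0  # everything printed is clean again
        return " ".join(parts)
```

With this model, a step that only writes R0 prints only R0, while a call to `mark_all_scratched()` first yields the full register dump.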
    • J
      bpf: Remove the cgroup -> bpf header dependency · fd1740b6
      Jakub Kicinski committed
      Remove the dependency from cgroup-defs.h to bpf-cgroup.h and bpf.h.
      This reduces the incremental build size of x86 allmodconfig after
      bpf.h was touched from ~17k objects rebuilt to ~5k objects.
      bpf.h is 2.2kLoC and is modified relatively often.
      
      We need a new header with just the definition of struct cgroup_bpf
      and enum cgroup_bpf_attach_type; this is akin to cgroup-defs.h.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Link: https://lore.kernel.org/bpf/20211216025538.1649516-4-kuba@kernel.org
      fd1740b6
    • J
      add includes masked by cgroup -> bpf dependency · f7ea534a
      Jakub Kicinski committed
      cgroup pulls in BPF which pulls in a lot of includes.
      We're about to break that chain so fix those who were
      depending on it.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211216025538.1649516-2-kuba@kernel.org
      f7ea534a
  3. 14 Dec, 2021 2 commits
  4. 12 Dec, 2021 1 commit
  5. 10 Dec, 2021 5 commits
    • E
      04a931e5
    • E
      net: add networking namespace refcount tracker · 9ba74e6c
      Eric Dumazet committed
      We have 100+ syzbot reports about netns being dismantled too soon,
      still unresolved as of today.
      
      We think a missing get_net() or an extra put_net() is the root cause.
      
      In order to find the bug(s), and be able to spot future ones,
      this patch adds CONFIG_NET_NS_REFCNT_TRACKER and new helpers
      to precisely pair all put_net() with corresponding get_net().
      
      To use these helpers, each data structure owning a refcount
      should also use a "netns_tracker" to pair the get and put.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      9ba74e6c
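The pairing idea behind a refcount tracker can be illustrated with a toy Python model: every get hands back a tracker token, and every put must return one. The names below (`TrackedNamespace`, `get`, `put`, `outstanding`) are hypothetical and do not correspond to the kernel helpers.

```python
class TrackedNamespace:
    """Toy refcounted object that pairs every put with a specific get."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0
        self._trackers = {}  # tracker id -> owner description
        self._next_id = 0

    def get(self, owner):
        """Take a reference and return a tracker token for it."""
        self._next_id += 1
        self._trackers[self._next_id] = owner
        self.refcnt += 1
        return self._next_id

    def put(self, tracker):
        """Drop the reference identified by `tracker`."""
        if tracker not in self._trackers:
            # An extra put, or a put without a matching get.
            raise RuntimeError(f"put on {self.name} without a matching get")
        del self._trackers[tracker]
        self.refcnt -= 1

    def outstanding(self):
        """List the owners that still hold a reference."""
        return sorted(self._trackers.values())
```

An unbalanced put fails immediately, and `outstanding()` names the owners that never released their reference, which is the kind of information a missing get or an extra put would otherwise hide.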
    • K
      skbuff: Extract list pointers to silence compiler warnings · 1a2fb220
      Kees Cook committed
      Under both -Warray-bounds and the object_size sanitizer, the compiler is
      upset about accessing prev/next of sk_buff when the object it thinks it
      is coming from is sk_buff_head. The warning is a false positive due to
      the compiler taking a conservative approach, opting to warn at casting
      time rather than access time.
      
      However, in support of enabling -Warray-bounds globally (which has
      found many real bugs), arrange things for sk_buff so that the compiler
      can unambiguously see that there is no intention to access anything
      except prev/next.  Introduce and cast to a separate struct sk_buff_list,
      which contains _only_ the first two fields, silencing the warnings:
      
      In file included from ./include/net/net_namespace.h:39,
                       from ./include/linux/netdevice.h:37,
                       from net/core/netpoll.c:17:
      net/core/netpoll.c: In function 'refill_skbs':
      ./include/linux/skbuff.h:2086:9: warning: array subscript 'struct sk_buff[0]' is partly outside array bounds of 'struct sk_buff_head[1]' [-Warray-bounds]
       2086 |         __skb_insert(newsk, next->prev, next, list);
            |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      net/core/netpoll.c:49:28: note: while referencing 'skb_pool'
         49 | static struct sk_buff_head skb_pool;
            |                            ^~~~~~~~
      
      This change results in no executable instruction differences.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20211207062758.2324338-1-keescook@chromium.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      1a2fb220
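The layout argument in this commit can be illustrated with `ctypes`: a view type holding only the two list pointers fits inside the list head, whereas the full buffer type does not. The field sets below are heavily simplified assumptions (the real structures have many more members), and the `as_list` helper is hypothetical.

```python
import ctypes


class SkBuffList(ctypes.Structure):
    # Only the two list pointers, as the commit describes.
    _fields_ = [("next", ctypes.c_void_p), ("prev", ctypes.c_void_p)]


class SkBuffHead(ctypes.Structure):
    # Simplified list head: the two pointers plus a queue length.
    _fields_ = [
        ("next", ctypes.c_void_p),
        ("prev", ctypes.c_void_p),
        ("qlen", ctypes.c_uint32),
    ]


class SkBuff(ctypes.Structure):
    # Simplified buffer: the two pointers plus a stand-in for the rest.
    _fields_ = [
        ("next", ctypes.c_void_p),
        ("prev", ctypes.c_void_p),
        ("rest", ctypes.c_uint8 * 200),
    ]


def as_list(head):
    """View a list head through the two-pointer type only."""
    ptr = ctypes.cast(ctypes.pointer(head), ctypes.POINTER(SkBuffList))
    return ptr.contents
```

Because `SkBuffList` is no larger than `SkBuffHead`, the view can never reach past the head object, while a view through `SkBuff` would.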
    • R
      net: phylink: use legacy_pre_march2020 · 001f4261
      Russell King (Oracle) committed
      Use the legacy flag to indicate whether we should operate in legacy
      mode. This allows us to stop using the presence of a PCS as an
      indicator of the age of the phylink user, and make PCS presence
      optional.
      
      Legacy mode involves:
      1) calling mac_config() whenever the link comes up
      2) calling mac_config() whenever the inband advertisement changes,
         possibly followed by a call to mac_an_restart()
      3) making use of mac_an_restart()
      4) making use of mac_pcs_get_state()
      
      All the above functionality was moved to a separate "PCS" block of
      operations in March 2020.
      
      Update the documentation to indicate the differences that this flag
      makes.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      001f4261
    • R
      net: phylink: add legacy_pre_march2020 indicator · 3e5b1fec
      Russell King (Oracle) committed
      Add a boolean to phylink_config to indicate whether a driver has not
      been updated for the changes in commit 7cceb599 ("net: phylink:
      avoid mac_config calls"), and is thus reliant on the old behaviour.
      
      We are currently keying the phylink behaviour on the presence of a
      PCS, but this is sub-optimal for modern drivers that may not have a
      PCS.
      
      This commit merely introduces the new flag, but does not add any use,
      since we need all legacy drivers to set this flag before it can be
      used. Once these legacy drivers have been updated, we can remove this
      flag.
      Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3e5b1fec
  6. 09 Dec, 2021 3 commits
    • S
      net: wwan: make debugfs optional · 283e6f5a
      Sergey Ryazanov committed
      The debugfs interface is optional for regular modem use. Some distros
      and users will want to disable this feature for security or kernel
      size reasons. So add a configuration option that allows the debugfs
      interface of the WWAN devices to be disabled completely.
      
      The primary use case considered for this option was embedded firmware.
      For example, in OpenWrt, you cannot completely disable debugfs, as a
      lot of wireless stuff can only be configured and monitored with the
      debugfs knobs. At the same time, reducing the size of the kernel and
      modules is an essential task in the world of embedded software.
      Disabling the WWAN and IOSM debugfs interfaces allows us to save 50K
      (x86-64 build) of space for module storage. Not much, but already
      considerable when you only have 16MB of storage.
      
      So it is hard to just disable debugfs as a whole. Users need a
      fine-grained set of options to control which debugfs interfaces are
      important and should be available and which are not.
      
      The new configuration symbol is enabled by default and is hidden under
      the EXPERT option, so a regular user will not be bothered by yet
      another configuration question, while an embedded distro maintainer
      will be able to reduce the final image size a little more.
      Signed-off-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
      Acked-by: M Chetan Kumar <m.chetan.kumar@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      283e6f5a
    • V
      net: dsa: keep the bridge_dev and bridge_num as part of the same structure · d3eed0e5
      Vladimir Oltean committed
      The main desire behind this is to provide coherent bridge information to
      the fast path without locking.
      
      For example, right now we set dp->bridge_dev and dp->bridge_num from
      separate code paths, so it is theoretically possible for a packet
      transmission to read these two port properties consecutively and find a
      bridge number which does not correspond with the bridge device.
      
      Another desire is to start passing more complex bridge information to
      dsa_switch_ops functions. For example, with FDB isolation, it is
      expected that drivers will need to be passed the bridge which requested
      an FDB/MDB entry to be offloaded, and along with that bridge_dev, the
      associated bridge_num should be passed too, in case the driver might
      want to implement an isolation scheme based on that number.
      
      We already pass the {bridge_dev, bridge_num} pair to the TX forwarding
      offload switch API, however we'd like to remove that and squash it into
      the basic bridge join/leave API. So that means we need to pass this
      pair to the bridge join/leave API.
      
      During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we
      call the driver's .port_bridge_leave with what used to be our
      dp->bridge_dev, but provided as an argument.
      
      When bridge_dev and bridge_num get folded into a single structure, we
      need to preserve this behavior in dsa_port_bridge_leave: we need a copy
      of what used to be in dp->bridge.
      
      Switch drivers check bridge membership by comparing dp->bridge_dev with
      the provided bridge_dev, but now, if we provide the struct dsa_bridge as
      a pointer, they cannot keep comparing dp->bridge to the provided
      pointer, since this only points to an on-stack copy. To make this
      obvious and prevent driver writers from forgetting and doing stupid
      things, in this new API, the struct dsa_bridge is provided as a full
      structure (not very large, contains an int and a pointer) instead of a
      pointer. An explicit comparison function needs to be used to determine
      bridge membership: dsa_port_offloads_bridge().
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d3eed0e5
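The reasoning about by-value passing can be modelled in Python: once the driver receives a copy of the bridge record, identity comparison against the port's own record no longer works, so membership has to be decided by comparing fields. The dataclasses and the `port_bridge_leave` helper below are hypothetical, and the field-wise check in this model of `dsa_port_offloads_bridge()` is an assumption about what such a comparison would look at.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class DsaBridge:
    dev: str   # stand-in for the bridge net_device pointer
    num: int   # bridge number


@dataclass
class DsaPort:
    bridge: Optional[DsaBridge] = None


def dsa_port_offloads_bridge(port: DsaPort, bridge: DsaBridge) -> bool:
    """Decide membership by comparing fields, not object identity."""
    return (
        port.bridge is not None
        and port.bridge.dev == bridge.dev
        and port.bridge.num == bridge.num
    )


def port_bridge_leave(port: DsaPort) -> DsaBridge:
    """Clear the port's bridge and return a copy of what it used to be."""
    snapshot = replace(port.bridge)  # independent copy of the old record
    port.bridge = None
    return snapshot
```

After `port_bridge_leave()`, the snapshot still matches any other port in the same bridge through the comparison function, even though it is a different object from every port's own record.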
    • V
      net: dsa: make dp->bridge_num one-based · 3f9bb030
      Vladimir Oltean committed
      I have seen too many bugs already due to the fact that we must encode an
      invalid dp->bridge_num as a negative value, because the natural tendency
      is to check that invalid value using (!dp->bridge_num). The latest example
      can be seen in commit 1bec0f05 ("net: dsa: fix bridge_num not
      getting cleared after ports leaving the bridge").
      
      Convert the existing users to assume that dp->bridge_num == 0 is the
      encoding for invalid, and valid bridge numbers start from 1.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      3f9bb030
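The one-based encoding can be sketched with a toy allocator in which 0 means "no bridge number" and valid numbers start at 1, so the natural truthiness check does the right thing. The function names and the set-based bookkeeping are assumptions for this sketch, not the kernel helpers.

```python
def bridge_num_get(used: set, max_bridges: int) -> int:
    """Return the lowest free bridge number in 1..max_bridges, or 0 if none."""
    for num in range(1, max_bridges + 1):
        if num not in used:
            used.add(num)
            return num
    return 0  # 0 encodes "invalid / not allocated"


def bridge_num_put(used: set, num: int) -> None:
    """Release a previously allocated bridge number."""
    used.discard(num)
```

With this encoding, `if not bridge_num:` correctly detects the invalid case, which is the check the commit message says people tend to write anyway.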
  7. 08 Dec, 2021 6 commits
  8. 07 Dec, 2021 10 commits
  9. 05 Dec, 2021 1 commit
  10. 04 Dec, 2021 1 commit