1. 07 Apr 2020: 1 commit
  2. 31 Mar 2020: 9 commits
    • bpf: Implement bpf_prog replacement for an active bpf_cgroup_link · 0c991ebc
      Authored by Andrii Nakryiko
      Add a new operation (LINK_UPDATE), which allows replacing the active bpf_prog
      under a given bpf_link. Currently this is only supported for bpf_cgroup_link,
      but it will be extended to other kinds of bpf_links in follow-up patches.
      
      For bpf_cgroup_link, the implemented functionality matches the existing
      semantics for direct bpf_prog attachment (including the BPF_F_REPLACE flag).
      The user can either unconditionally set a new bpf_prog regardless of which
      bpf_prog is currently active under the given bpf_link, or optionally specify
      the expected active bpf_prog. If the active bpf_prog doesn't match the expected
      one, no changes are performed, the old bpf_link stays intact and attached, and
      the operation returns a failure.
      
      The cgroup_bpf_replace() operation resolves the race between auto-detachment
      and bpf_prog update in the same fashion as bpf_link detachment, except that
      here the update has no way of succeeding because the target cgroup is marked
      as dying, so an error is returned.
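
      A minimal user-space sketch of driving such an update, assuming libbpf's
      bpf_link_update() helper and the BPF_F_REPLACE semantics described above (the
      FDs are placeholders obtained elsewhere):

       #include <bpf/bpf.h>        /* bpf_link_update(), bpf_link_update_opts */
       #include <linux/bpf.h>      /* BPF_F_REPLACE */

       /* link_fd comes from BPF_LINK_CREATE; old_prog_fd/new_prog_fd come from
        * loading the BPF programs. */
       static int replace_cgroup_prog(int link_fd, int old_prog_fd, int new_prog_fd)
       {
               DECLARE_LIBBPF_OPTS(bpf_link_update_opts, opts,
                       .flags = BPF_F_REPLACE,     /* only replace if old prog matches */
                       .old_prog_fd = old_prog_fd, /* expected currently active bpf_prog */
               );

               /* For an unconditional update, pass NULL opts (or flags = 0). */
               return bpf_link_update(link_fd, new_prog_fd, &opts);
       }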
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200330030001.2312810-3-andriin@fb.com
    • bpf: Implement bpf_link-based cgroup BPF program attachment · af6eea57
      Authored by Andrii Nakryiko
      Implement a new sub-command to attach cgroup BPF programs and return an
      FD-based bpf_link back on success. A bpf_link, once attached to a cgroup,
      cannot be replaced, except by an owner holding its FD. Cgroup bpf_link supports
      only BPF_F_ALLOW_MULTI semantics. Both link-based and prog-based
      BPF_F_ALLOW_MULTI attachments can be freely intermixed.
      
      To prevent a bpf_cgroup_link from keeping the cgroup alive past the point when
      no BPF program can be executed, implement auto-detachment of the link. When
      cgroup_bpf_release() is called, all attached bpf_links are forced to release
      their cgroup refcounts, but each bpf_link is left otherwise active and
      allocated, still owning the underlying bpf_prog. This is because user-space
      might still have FDs open and active, so the bpf_link, as a user-referenced
      object, can't be freed yet. Once the last active FD is closed, the bpf_link is
      freed and the underlying bpf_prog refcount is dropped, but the cgroup refcount
      isn't touched, because the cgroup was released already.
      
      The inherent race between bpf_cgroup_link release (from closing the last FD)
      and cgroup_bpf_release() is resolved by both operations taking cgroup_mutex. So
      the only additional check required is when the bpf_cgroup_link attempts to
      detach itself from the cgroup: at that point we need to check whether there is
      still a cgroup associated with the link, and if not, exit with success, because
      the bpf_cgroup_link was already detached.
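
      A minimal sketch of creating such a link from user space, assuming libbpf's
      bpf_link_create() helper (the attach type and FDs are placeholder choices for
      illustration):

       #include <bpf/bpf.h>    /* bpf_link_create() */

       /* prog_fd: a loaded cgroup program (e.g. BPF_PROG_TYPE_CGROUP_SKB);
        * cgroup_fd: open("/sys/fs/cgroup/...", O_RDONLY). */
       static int attach_cgroup_link(int prog_fd, int cgroup_fd)
       {
               /* BPF_LINK_CREATE attaches with BPF_F_ALLOW_MULTI semantics and
                * returns an FD that owns the attachment. */
               int link_fd = bpf_link_create(prog_fd, cgroup_fd,
                                             BPF_CGROUP_INET_INGRESS, NULL);

               /* The attachment stays in place while this FD (or a dup of it) is
                * open; closing the last FD releases the link, which detaches
                * itself unless the cgroup has already been released. */
               return link_fd;
       }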
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Roman Gushchin <guro@fb.com>
      Link: https://lore.kernel.org/bpf/20200330030001.2312810-2-andriin@fb.com
    • NFS: Ensure security label is set for root inode · 779df6a5
      Authored by Scott Mayhew
      When using NFSv4.2, the security label for the root inode should be set
      via a call to nfs_setsecurity() during the mount process, otherwise the
      inode will appear as unlabeled for up to acdirmin seconds.  Currently
      the label for the root inode is allocated, retrieved, and freed entirely
      within nfs4_proc_get_root().
      
      Add a field for the label to the nfs_fattr struct, and allocate & free
      the label in nfs_get_root(), where we also add a call to
      nfs_setsecurity().  Note that for the call to nfs_setsecurity() to
      succeed, it's necessary to also move the logic calling
      security_sb_{set,clone}_security() from nfs_get_tree_common() down into
      nfs_get_root()... otherwise the SBLABEL_MNT flag will not be set in the
      super_block's security flags and nfs_setsecurity() will silently fail.
      Reported-by: Richard Haines <richard_c_haines@btinternet.com>
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      Tested-by: Stephen Smalley <sds@tycho.nsa.gov>
      [PM: fixed 80-char line width problems]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
    • bpf: Verifier, do explicit ALU32 bounds tracking · 3f50f132
      Authored by John Fastabend
      It is not possible for the current verifier to track ALU32 and JMP ops
      correctly. This can result in the verifier aborting with errors even though
      the program should be verifiable. BPF code that hits this can work around
      it by changing int variables to 64-bit types, marking variables volatile,
      etc. But this is all very ugly, so it would be better to avoid these tricks.
      
      But the main reason to address this now is that do_refine_retval_range() was
      assuming return values could not be negative. Once that is fixed, code that
      was previously working will no longer work; see the do_refine_retval_range()
      patch for details. We don't want to suddenly cause programs that used to work
      to fail.
      
      The simplest example code snippet that illustrates the problem is likely
      this,
      
       53: w8 = w0                    // r8 <- [0, S32_MAX],
                                      // w8 <- [-S32_MIN, X]
       54: w8 <s 0                    // r8 <- [0, U32_MAX]
                                      // w8 <- [0, X]
      
      The expected 64-bit and 32-bit bounds after each line are shown on the
      right. The current issue is that without the w* bounds we are forced to use
      the worst-case bound of [0, U32_MAX]. To resolve this type of case, where
      jmp32 creates 32-bit bounds that diverge from the 64-bit bounds, we add
      explicit 32-bit register bounds s32_{min|max}_value and u32_{min|max}_value.
      Then, when the branch_taken logic creates new bounds, we can track the
      32-bit bounds explicitly.
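
      To make this concrete, here is a small self-contained model in plain C (not
      the verifier's own code) of a register state carrying both 64-bit bounds and
      the new explicit 32-bit bounds, and how the fall-through branch of "w8 <s 0"
      refines only the 32-bit signed lower bound:

       #include <stdint.h>

       /* Simplified model of a verifier register: 64-bit bounds plus the new
        * explicit 32-bit subregister bounds. */
       struct reg_bounds {
               int64_t  smin, smax;        /* 64-bit signed bounds        */
               uint64_t umin, umax;        /* 64-bit unsigned bounds      */
               int32_t  s32_min, s32_max;  /* new: 32-bit signed bounds   */
               uint32_t u32_min, u32_max;  /* new: 32-bit unsigned bounds */
       };

       /* Fall-through branch of "if w8 <s 0 goto ...": here w8 is known to be
        * signed-non-negative, so the 32-bit signed lower bound can be raised
        * without touching the 64-bit bounds at all. */
       static void jmp32_slt_zero_fallthrough(struct reg_bounds *r)
       {
               if (r->s32_min < 0)
                       r->s32_min = 0;
       }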
      
      The next case we observed is ALU ops after the jmp32,
      
       53: w8 = w0                    // r8 <- [0, S32_MAX],
                                      // w8 <- [-S32_MIN, X]
       54: w8 <s 0                    // r8 <- [0, U32_MAX]
                                      // w8 <- [0, X]
       55: w8 += 1                    // r8 <- [0, U32_MAX+1]
                                      // w8 <- [0, X+1]
      
      In order to keep the bounds accurate at this point we also need to track
      ALU32 ops. To do this we add explicit ALU32 logic for each of the ALU
      ops, mov, add, sub, etc.
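
      A similarly simplified model of the ALU32 case above, loosely mirroring the
      per-op 32-bit helpers this patch adds (illustrative C, not kernel code): the
      bounds of "w8 += 1" are shifted by the immediate unless the 32-bit addition
      could wrap, in which case we fall back to the worst-case range.

       #include <stdint.h>

       struct u32_bounds { uint32_t min, max; };   /* 32-bit unsigned bounds only */

       static void alu32_add_imm(struct u32_bounds *b, uint32_t imm)
       {
               if (b->max > UINT32_MAX - imm) {
                       /* possible 32-bit wraparound: give up on precision */
                       b->min = 0;
                       b->max = UINT32_MAX;
               } else {
                       b->min += imm;
                       b->max += imm;
               }
       }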
      
      Finally, there is a question of how and when to merge bounds. The cases are
      enumerated here:
      
      1. MOV ALU32   - zext 32-bit -> 64-bit
      2. MOV ALU64   - copy 64-bit -> 32-bit
      3. op  ALU32   - zext 32-bit -> 64-bit
      4. op  ALU64   - n/a
      5. jmp ALU32   - 64-bit: var32_off | upper_32_bits(var64_off)
      6. jmp ALU64   - 32-bit: (>> (<< var64_off))
      
      Details for each case:
      
      For "MOV ALU32" BPF arch zero extends so we simply copy the bounds
      from 32-bit into 64-bit ensuring we truncate var_off and 64-bit
      bounds correctly. See zext_32_to_64.
      
      For "MOV ALU64" copy all bounds including 32-bit into new register. If
      the src register had 32-bit bounds the dst register will as well.
      
      For "op ALU32" zero extend 32-bit into 64-bit the same as move,
      see zext_32_to_64.
      
      For "op ALU64" calculate both 32-bit and 64-bit bounds no merging
      is done here. Except we have a special case. When RSH or ARSH is
      done we can't simply ignore shifting bits from 64-bit reg into the
      32-bit subreg. So currently just push bounds from 64-bit into 32-bit.
      This will be correct in the sense that they will represent a valid
      state of the register. However we could lose some accuracy if an
      ARSH is following a jmp32 operation. We can handle this special
      case in a follow up series.
      
      For "jmp ALU32" mark 64-bit reg unknown and recalculate 64-bit bounds
      from tnum by setting var_off to ((<<(>>var_off)) | var32_off). We
      special case if 64-bit bounds has zero'd upper 32bits at which point
      we can simply copy 32-bit bounds into 64-bit register. This catches
      a common compiler trick where upper 32-bits are zeroed and then
      32-bit ops are used followed by a 64-bit compare or 64-bit op on
      a pointer. See __reg_combine_64_into_32().
      
      For "jmp ALU64" cast the bounds of the 64bit to their 32-bit
      counterpart. For example s32_min_value = (s32)reg->smin_value. For
      tnum use only the lower 32bits via, (>>(<<var_off)). See
      __reg_combine_64_into_32().
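
      The two merge directions can also be sketched in plain C. This is a simplified
      model of the idea behind zext_32_to_64() and __reg_combine_64_into_32(), not
      the kernel implementation:

       #include <stdint.h>

       struct reg_bounds {
               int64_t  smin, smax;
               uint64_t umin, umax;
               int32_t  s32_min, s32_max;
               uint32_t u32_min, u32_max;
       };

       /* Cases 1/3: 32-bit ops are zero extended by the architecture, so the
        * 64-bit value equals the u32 value and is non-negative; the 64-bit
        * bounds follow directly from the 32-bit ones. */
       static void zext_32_to_64_sketch(struct reg_bounds *r)
       {
               r->umin = r->u32_min;
               r->umax = r->u32_max;
               r->smin = r->u32_min;
               r->smax = r->u32_max;
       }

       /* Case 6: derive 32-bit bounds from the 64-bit ones. Only safe when the
        * 64-bit range fits entirely in 32 bits; otherwise leave the subregister
        * at the 32-bit worst case. */
       static void combine_64_into_32_sketch(struct reg_bounds *r)
       {
               r->s32_min = INT32_MIN;  r->s32_max = INT32_MAX;
               r->u32_min = 0;          r->u32_max = UINT32_MAX;

               if (r->smin >= INT32_MIN && r->smax <= INT32_MAX) {
                       r->s32_min = (int32_t)r->smin;
                       r->s32_max = (int32_t)r->smax;
               }
               if (r->umax <= UINT32_MAX) {
                       r->u32_min = (uint32_t)r->umin;
                       r->u32_max = (uint32_t)r->umax;
               }
       }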
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/158560419880.10843.11448220440809118343.stgit@john-Precision-5820-Tower
    • net: phylink: add separate pcs operations structure · 4c0d6d3a
      Authored by Russell King
      Add a separate set of PCS operations, which MAC drivers can use to
      couple phylink with their associated MAC PCS layer.  The PCS
      operations include:
      
      - pcs_get_state() - reads the link up/down, resolved speed, duplex
         and pause from the PCS.
      - pcs_config() - configures the PCS for the specified mode, PHY
         interface type, and setting the advertisement.
      - pcs_an_restart() - restarts 802.3 in-band negotiation with the
         link partner
      - pcs_link_up() - informs the PCS that link has come up, and the
         parameters of the link. Link parameters are used to program the
         PCS for fixed speed and non-inband modes.
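
      Roughly, a MAC driver would then supply an operations table of this shape. The
      callback prototypes below are an assumption reconstructed from the descriptions
      above, for illustration only (see include/linux/phylink.h for the real ones):

       #include <linux/phylink.h>

       /* Prototypes assumed for illustration; only the four operation names come
        * from the description above. */
       static void foo_pcs_get_state(struct phylink_config *config,
                                     struct phylink_link_state *state)
       {
               /* fill in state->link, state->speed, state->duplex, state->pause */
       }

       static int foo_pcs_config(struct phylink_config *config, unsigned int mode,
                                 phy_interface_t interface,
                                 const unsigned long *advertising)
       {
               /* program the PCS for the mode, interface and advertisement */
               return 0;
       }

       static void foo_pcs_an_restart(struct phylink_config *config)
       {
               /* kick off 802.3 in-band renegotiation */
       }

       static void foo_pcs_link_up(struct phylink_config *config, unsigned int mode,
                                   phy_interface_t interface, int speed, int duplex)
       {
               /* program link parameters for fixed-speed / non-inband modes */
       }

       static const struct phylink_pcs_ops foo_pcs_ops = {
               .pcs_get_state  = foo_pcs_get_state,
               .pcs_config     = foo_pcs_config,
               .pcs_an_restart = foo_pcs_an_restart,
               .pcs_link_up    = foo_pcs_link_up,
       };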
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: rename 'ops' to 'mac_ops' · e7765d63
      Authored by Russell King
      Rename the bland 'ops' member of struct phylink to be a more
      descriptive 'mac_ops' - this is necessary as we're about to introduce
      another set of operations.
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phylink: change phylink_mii_c22_pcs_set_advertisement() prototype · 0bd27406
      Authored by Russell King
      Change phylink_mii_c22_pcs_set_advertisement() to take only the PHY
      interface and advertisement mask, rather than the full phylink state.
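
      For reference, the new shape is roughly as follows; this prototype is an
      assumption based on the description above (the phylink_mii_c22_pcs_* helpers
      operate on the PCS's MDIO device), not a copy of the header:

       /* Assumed prototype, for illustration: the advertisement is now derived
        * from the interface mode and advertising mask alone, instead of from
        * the full phylink_link_state the old version received. */
       int phylink_mii_c22_pcs_set_advertisement(struct mdio_device *pcs,
                                                 phy_interface_t interface,
                                                 const unsigned long *advertising);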
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qed: Fix use after free in qed_chain_free · 8063f761
      Authored by Yuval Basson
      The qed_chain data structure was modified in
      commit 1a4a6975 ("qed: Chain support for external PBL") to support
      receiving an external pbl (due to iWARP FW requirements).
      The pages pointed to by the pbl are allocated in qed_chain_alloc
      and their virtual addresses are stored in a virtual-address array to
      enable accessing and freeing the data. The physical addresses, however,
      weren't stored and were accessed directly from the external pbl
      during free.
      
      The destroy-qp flow leads to freeing the external pbl before the chain is
      freed; when the chain is freed, it tries accessing the already-freed
      external pbl, leading to a use-after-free. Therefore we need to store
      the physical addresses, in addition to the virtual addresses, in a
      new data structure.
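
      A minimal sketch of the shape of such a fix: record each page's DMA address
      next to its virtual address at allocation time, so freeing no longer needs
      the external pbl. The struct and function names here are illustrative, not
      necessarily the ones used in the driver:

       #include <linux/dma-mapping.h>

       struct chain_page {
               void       *virt_addr;  /* for accessing/freeing the page  */
               dma_addr_t  dma_map;    /* stored copy of the DMA address */
       };

       static void chain_free_pages(struct device *dev, struct chain_page *pages,
                                    u32 page_cnt, size_t page_size)
       {
               u32 i;

               for (i = 0; i < page_cnt; i++) {
                       if (!pages[i].virt_addr)
                               continue;
                       /* Uses our own copy of the DMA address, not the external
                        * pbl, which may already have been freed by now. */
                       dma_free_coherent(dev, page_size, pages[i].virt_addr,
                                         pages[i].dma_map);
               }
       }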
      
      Fixes: 1a4a6975 ("qed: Chain support for external PBL")
      Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
      Signed-off-by: Yuval Bason <ybason@marvell.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp: Avoid deadlocks in the programmable pin code. · 62582a7e
      Authored by Richard Cochran
      The PTP Hardware Clock (PHC) subsystem offers an API for configuring
      programmable pins.  User space sets or gets the settings using ioctls,
      and drivers verify dialed settings via a callback.  Drivers may also
      query pin settings by calling the ptp_find_pin() method.
      
      Although the core subsystem protects concurrent access to the pin
      settings, the implementation places illogical restrictions on how
      drivers may call ptp_find_pin().  When enabling an auxiliary function
      via the .enable(on=1) callback, drivers may invoke the pin finding
      method, but when disabling with .enable(on=0) drivers are not
      permitted to do so.  With the exception of the mv88e6xxx, all of the
      PHC drivers do respect this restriction, but still the locking pattern
      is both confusing and unnecessary.
      
      This patch changes the locking implementation to allow PHC drivers to
      freely call ptp_find_pin() from their .enable() and .verify()
      callbacks.
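
      A sketch of what this permits on the driver side; the driver structure and its
      ptp_clock handle are hypothetical, the point being that ptp_find_pin() may now
      be called in both the on=1 and on=0 paths of .enable():

       #include <linux/ptp_clock_kernel.h>

       /* Hypothetical driver private data wrapping the registered clock. */
       struct foo_ptp {
               struct ptp_clock_info info;
               struct ptp_clock *clock;
       };

       static int foo_ptp_enable(struct ptp_clock_info *info,
                                 struct ptp_clock_request *rq, int on)
       {
               struct foo_ptp *foo = container_of(info, struct foo_ptp, info);
               int pin;

               if (rq->type != PTP_CLK_REQ_EXTTS)
                       return -EOPNOTSUPP;

               /* Safe here now whether enabling (on=1) or disabling (on=0). */
               pin = ptp_find_pin(foo->clock, PTP_PF_EXTTS, rq->extts.index);
               if (pin < 0)
                       return -EINVAL;

               /* ... program or tear down external timestamping on 'pin' ... */
               return 0;
       }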
      
      V2 ChangeLog:
      - fixed spelling in the kernel doc
      - added Vladimir's Tested-by tag
      Signed-off-by: Richard Cochran <richardcochran@gmail.com>
      Reported-by: Yangbo Lu <yangbo.lu@nxp.com>
      Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 30 Mar 2020: 9 commits
  4. 29 Mar 2020: 1 commit
  5. 28 Mar 2020: 10 commits
  6. 27 Mar 2020: 10 commits