1. 30 10月, 2020 6 次提交
    • H
      bridge: cfm: Netlink GET configuration Interface. · 5e312fc0
      Henrik Bjoernlund 提交于
      This is the implementation of CFM netlink configuration
      get information interface.
      
      Add new nested netlink attributes. These attributes are used by the
      user space to get configuration information.
      
      GETLINK:
          Request filter RTEXT_FILTER_CFM_CONFIG:
          Indicating that CFM configuration information must be delivered.
      
          IFLA_BRIDGE_CFM:
              Points to the CFM information.
      
          IFLA_BRIDGE_CFM_MEP_CREATE_INFO:
              This indicate that MEP instance create parameters are following.
          IFLA_BRIDGE_CFM_MEP_CONFIG_INFO:
              This indicate that MEP instance config parameters are following.
          IFLA_BRIDGE_CFM_CC_CONFIG_INFO:
              This indicate that MEP instance CC functionality
              parameters are following.
          IFLA_BRIDGE_CFM_CC_RDI_INFO:
              This indicate that CC transmitted CCM PDU RDI
              parameters are following.
          IFLA_BRIDGE_CFM_CC_CCM_TX_INFO:
              This indicate that CC transmitted CCM PDU parameters are
              following.
          IFLA_BRIDGE_CFM_CC_PEER_MEP_INFO:
              This indicate that the added peer MEP IDs are following.
      
      CFM nested attribute has the following attributes in next level.
      
      GETLINK RTEXT_FILTER_CFM_CONFIG:
          IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE:
              The created MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN:
              The created MEP domain.
              The type is u32 (br_cfm_domain).
              It must be BR_CFM_PORT.
              This means that CFM frames are transmitted and received
              directly on the port - untagged. Not in a VLAN.
          IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION:
              The created MEP direction.
              The type is u32 (br_cfm_mep_direction).
              It must be BR_CFM_MEP_DIRECTION_DOWN.
              This means that CFM frames are transmitted and received on
              the port. Not in the bridge.
          IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX:
              The created MEP residence port ifindex.
              The type is u32 (ifindex).
      
          IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE:
              The deleted MEP instance number.
              The type is u32.
      
          IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE:
              The configured MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC:
              The configured MEP unicast MAC address.
              The type is 6*u8 (array).
              This is used as SMAC in all transmitted CFM frames.
          IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL:
              The configured MEP unicast MD level.
              The type is u32.
              It must be in the range 1-7.
              No CFM frames are passing through this MEP on lower levels.
          IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID:
              The configured MEP ID.
              The type is u32.
              It must be in the range 0-0x1FFF.
              This MEP ID is inserted in any transmitted CCM frame.
      
          IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE:
              The configured MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE:
              The Continuity Check (CC) functionality is enabled or disabled.
              The type is u32 (bool).
          IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL:
              The CC expected receive interval of CCM frames.
              The type is u32 (br_cfm_ccm_interval).
              This is also the transmission interval of CCM frames when enabled.
          IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID:
              The CC expected receive MAID in CCM frames.
              The type is CFM_MAID_LENGTH*u8.
              This is MAID is also inserted in transmitted CCM frames.
      
          IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE:
              The configured MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_CC_PEER_MEPID:
              The CC Peer MEP ID added.
              The type is u32.
              When a Peer MEP ID is added and CC is enabled it is expected to
              receive CCM frames from that Peer MEP.
      
          IFLA_BRIDGE_CFM_CC_RDI_INSTANCE:
              The configured MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_CC_RDI_RDI:
              The RDI that is inserted in transmitted CCM PDU.
              The type is u32 (bool).
      
          IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE:
              The configured MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC:
              The transmitted CCM frame destination MAC address.
              The type is 6*u8 (array).
              This is used as DMAC in all transmitted CFM frames.
          IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE:
              The transmitted CCM frame update (increment) of sequence
              number is enabled or disabled.
              The type is u32 (bool).
          IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD:
              The period of time where CCM frame are transmitted.
              The type is u32.
              The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX
              must be done before timeout to keep transmission alive.
              When period is zero any ongoing CCM frame transmission
              will be stopped.
          IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV:
              The transmitted CCM frame update with Interface Status TLV
              is enabled or disabled.
              The type is u32 (bool).
          IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE:
              The transmitted Interface Status TLV value field.
              The type is u8.
          IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV:
              The transmitted CCM frame update with Port Status TLV is enabled
              or disabled.
              The type is u32 (bool).
          IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE:
              The transmitted Port Status TLV value field.
              The type is u8.
      Signed-off-by: NHenrik Bjoernlund  <henrik.bjoernlund@microchip.com>
      Reviewed-by: NHoratiu Vultur  <horatiu.vultur@microchip.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      5e312fc0
    • H
      bridge: cfm: Netlink SET configuration Interface. · 2be665c3
      Henrik Bjoernlund 提交于
      This is the implementation of CFM netlink configuration
      set information interface.
      
      Add new nested netlink attributes. These attributes are used by the
      user space to create/delete/configure CFM instances.
      
      SETLINK:
          IFLA_BRIDGE_CFM:
              Indicate that the following attributes are CFM.
      
          IFLA_BRIDGE_CFM_MEP_CREATE:
              This indicate that a MEP instance must be created.
          IFLA_BRIDGE_CFM_MEP_DELETE:
              This indicate that a MEP instance must be deleted.
          IFLA_BRIDGE_CFM_MEP_CONFIG:
              This indicate that a MEP instance must be configured.
          IFLA_BRIDGE_CFM_CC_CONFIG:
              This indicate that a MEP instance Continuity Check (CC)
              functionality must be configured.
          IFLA_BRIDGE_CFM_CC_PEER_MEP_ADD:
              This indicate that a CC Peer MEP must be added.
          IFLA_BRIDGE_CFM_CC_PEER_MEP_REMOVE:
              This indicate that a CC Peer MEP must be removed.
          IFLA_BRIDGE_CFM_CC_CCM_TX:
              This indicate that the CC transmitted CCM PDU must be configured.
          IFLA_BRIDGE_CFM_CC_RDI:
              This indicate that the CC transmitted CCM PDU RDI must be
              configured.
      
      CFM nested attribute has the following attributes in next level.
      
      SETLINK RTEXT_FILTER_CFM_CONFIG:
          IFLA_BRIDGE_CFM_MEP_CREATE_INSTANCE:
              The created MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_MEP_CREATE_DOMAIN:
              The created MEP domain.
              The type is u32 (br_cfm_domain).
              It must be BR_CFM_PORT.
              This means that CFM frames are transmitted and received
              directly on the port - untagged. Not in a VLAN.
          IFLA_BRIDGE_CFM_MEP_CREATE_DIRECTION:
              The created MEP direction.
              The type is u32 (br_cfm_mep_direction).
              It must be BR_CFM_MEP_DIRECTION_DOWN.
              This means that CFM frames are transmitted and received on
              the port. Not in the bridge.
          IFLA_BRIDGE_CFM_MEP_CREATE_IFINDEX:
              The created MEP residence port ifindex.
              The type is u32 (ifindex).
      
          IFLA_BRIDGE_CFM_MEP_DELETE_INSTANCE:
              The deleted MEP instance number.
              The type is u32.
      
          IFLA_BRIDGE_CFM_MEP_CONFIG_INSTANCE:
              The configured MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_MEP_CONFIG_UNICAST_MAC:
              The configured MEP unicast MAC address.
              The type is 6*u8 (array).
              This is used as SMAC in all transmitted CFM frames.
          IFLA_BRIDGE_CFM_MEP_CONFIG_MDLEVEL:
              The configured MEP unicast MD level.
              The type is u32.
              It must be in the range 1-7.
              No CFM frames are passing through this MEP on lower levels.
          IFLA_BRIDGE_CFM_MEP_CONFIG_MEPID:
              The configured MEP ID.
              The type is u32.
              It must be in the range 0-0x1FFF.
              This MEP ID is inserted in any transmitted CCM frame.
      
          IFLA_BRIDGE_CFM_CC_CONFIG_INSTANCE:
              The configured MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_CC_CONFIG_ENABLE:
              The Continuity Check (CC) functionality is enabled or disabled.
              The type is u32 (bool).
          IFLA_BRIDGE_CFM_CC_CONFIG_EXP_INTERVAL:
              The CC expected receive interval of CCM frames.
              The type is u32 (br_cfm_ccm_interval).
              This is also the transmission interval of CCM frames when enabled.
          IFLA_BRIDGE_CFM_CC_CONFIG_EXP_MAID:
              The CC expected receive MAID in CCM frames.
              The type is CFM_MAID_LENGTH*u8.
              This is MAID is also inserted in transmitted CCM frames.
      
          IFLA_BRIDGE_CFM_CC_PEER_MEP_INSTANCE:
              The configured MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_CC_PEER_MEPID:
              The CC Peer MEP ID added.
              The type is u32.
              When a Peer MEP ID is added and CC is enabled it is expected to
              receive CCM frames from that Peer MEP.
      
          IFLA_BRIDGE_CFM_CC_RDI_INSTANCE:
              The configured MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_CC_RDI_RDI:
              The RDI that is inserted in transmitted CCM PDU.
              The type is u32 (bool).
      
          IFLA_BRIDGE_CFM_CC_CCM_TX_INSTANCE:
              The configured MEP instance number.
              The type is u32.
          IFLA_BRIDGE_CFM_CC_CCM_TX_DMAC:
              The transmitted CCM frame destination MAC address.
              The type is 6*u8 (array).
              This is used as DMAC in all transmitted CFM frames.
          IFLA_BRIDGE_CFM_CC_CCM_TX_SEQ_NO_UPDATE:
              The transmitted CCM frame update (increment) of sequence
              number is enabled or disabled.
              The type is u32 (bool).
          IFLA_BRIDGE_CFM_CC_CCM_TX_PERIOD:
              The period of time where CCM frame are transmitted.
              The type is u32.
              The time is given in seconds. SETLINK IFLA_BRIDGE_CFM_CC_CCM_TX
              must be done before timeout to keep transmission alive.
              When period is zero any ongoing CCM frame transmission
              will be stopped.
          IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV:
              The transmitted CCM frame update with Interface Status TLV
              is enabled or disabled.
              The type is u32 (bool).
          IFLA_BRIDGE_CFM_CC_CCM_TX_IF_TLV_VALUE:
              The transmitted Interface Status TLV value field.
              The type is u8.
          IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV:
              The transmitted CCM frame update with Port Status TLV is enabled
              or disabled.
              The type is u32 (bool).
          IFLA_BRIDGE_CFM_CC_CCM_TX_PORT_TLV_VALUE:
              The transmitted Port Status TLV value field.
              The type is u8.
      Signed-off-by: NHenrik Bjoernlund  <henrik.bjoernlund@microchip.com>
      Reviewed-by: NHoratiu Vultur  <horatiu.vultur@microchip.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      2be665c3
    • H
      bridge: cfm: Kernel space implementation of CFM. CCM frame RX added. · dc32cbb3
      Henrik Bjoernlund 提交于
      This is the third commit of the implementation of the CFM protocol
      according to 802.1Q section 12.14.
      
      Functionality is extended with CCM frame reception.
      The MEP instance now contains CCM based status information.
      Most important is the CCM defect status indicating if correct
      CCM frames are received with the expected interval.
      Signed-off-by: NHenrik Bjoernlund  <henrik.bjoernlund@microchip.com>
      Reviewed-by: NHoratiu Vultur  <horatiu.vultur@microchip.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      dc32cbb3
    • H
      bridge: cfm: Kernel space implementation of CFM. CCM frame TX added. · a806ad8e
      Henrik Bjoernlund 提交于
      This is the second commit of the implementation of the CFM protocol
      according to 802.1Q section 12.14.
      
      Functionality is extended with CCM frame transmission.
      
      Interface is extended with these functions:
      br_cfm_cc_rdi_set()
      br_cfm_cc_ccm_tx()
      br_cfm_cc_config_set()
      
      A MEP Continuity Check feature can be configured by
      br_cfm_cc_config_set()
          The Continuity Check parameters can be configured to be used when
          transmitting CCM.
      
      A MEP can be configured to start or stop transmission of CCM frames by
      br_cfm_cc_ccm_tx()
          The CCM will be transmitted for a selected period in seconds.
          Must call this function before timeout to keep transmission alive.
      
      A MEP transmitting CCM can be configured with inserted RDI in PDU by
      br_cfm_cc_rdi_set()
      Signed-off-by: NHenrik Bjoernlund  <henrik.bjoernlund@microchip.com>
      Reviewed-by: NHoratiu Vultur  <horatiu.vultur@microchip.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      a806ad8e
    • H
      bridge: cfm: Kernel space implementation of CFM. MEP create/delete. · 86a14b79
      Henrik Bjoernlund 提交于
      This is the first commit of the implementation of the CFM protocol
      according to 802.1Q section 12.14.
      
      It contains MEP instance create, delete and configuration.
      
      Connectivity Fault Management (CFM) comprises capabilities for
      detecting, verifying, and isolating connectivity failures in
      Virtual Bridged Networks. These capabilities can be used in
      networks operated by multiple independent organizations, each
      with restricted management access to each others equipment.
      
      CFM functions are partitioned as follows:
          - Path discovery
          - Fault detection
          - Fault verification and isolation
          - Fault notification
          - Fault recovery
      
      Interface consists of these functions:
      br_cfm_mep_create()
      br_cfm_mep_delete()
      br_cfm_mep_config_set()
      br_cfm_cc_config_set()
      br_cfm_cc_peer_mep_add()
      br_cfm_cc_peer_mep_remove()
      
      A MEP instance is created by br_cfm_mep_create()
          -It is the Maintenance association End Point
           described in 802.1Q section 19.2.
          -It is created on a specific level (1-7) and is assuring
           that no CFM frames are passing through this MEP on lower levels.
          -It initiates and validates CFM frames on its level.
          -It can only exist on a port that is related to a bridge.
          -Attributes given cannot be changed until the instance is
           deleted.
      
      A MEP instance can be deleted by br_cfm_mep_delete().
      
      A created MEP instance has attributes that can be
      configured by br_cfm_mep_config_set().
      
      A MEP Continuity Check feature can be configured by
      br_cfm_cc_config_set()
          The Continuity Check Receiver state machine can be
          enabled and disabled.
          According to 802.1Q section 19.2.8
      
      A MEP can have Peer MEPs added and removed by
      br_cfm_cc_peer_mep_add() and br_cfm_cc_peer_mep_remove()
          The Continuity Check feature can maintain connectivity
          status on each added Peer MEP.
      Signed-off-by: NHenrik Bjoernlund  <henrik.bjoernlund@microchip.com>
      Reviewed-by: NHoratiu Vultur  <horatiu.vultur@microchip.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      86a14b79
    • H
      bridge: uapi: cfm: Added EtherType used by the CFM protocol. · fbaedb41
      Henrik Bjoernlund 提交于
      This EtherType is used by all CFM protocal frames transmitted
      according to 802.1Q section 12.14.
      Signed-off-by: NHenrik Bjoernlund  <henrik.bjoernlund@microchip.com>
      Reviewed-by: NHoratiu Vultur  <horatiu.vultur@microchip.com>
      Acked-by: NNikolay Aleksandrov <nikolay@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      fbaedb41
  2. 22 10月, 2020 2 次提交
  3. 20 10月, 2020 1 次提交
  4. 17 10月, 2020 2 次提交
  5. 16 10月, 2020 2 次提交
  6. 15 10月, 2020 1 次提交
  7. 12 10月, 2020 5 次提交
    • O
      can: isotp: implement cleanups / improvements from review · ac911bfe
      Oliver Hartkopp 提交于
      As pointed out by Jakub Kicinski here:
      http://lore.kernel.org/r/20201009175751.5c54097f@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com
      this patch addresses the remarked issues:
      
      - remove empty line in comment
      - remove default=y for CAN_ISOTP in Kconfig
      - make use of pr_notice_once()
      - use GFP_ATOMIC instead of gfp_any() in soft hrtimer context
      
      The version strings in the CAN subsystem are removed by a separate patch.
      Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/r/20201012074354.25839-1-socketcan@hartkopp.netSigned-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      ac911bfe
    • P
      netfilter: add inet ingress support · 60a3815d
      Pablo Neira Ayuso 提交于
      This patch adds the NF_INET_INGRESS pseudohook for the NFPROTO_INET
      family. This is a mapping this new hook to the existing NFPROTO_NETDEV
      and NF_NETDEV_INGRESS hook. The hook does not guarantee that packets are
      inet only, users must filter out non-ip traffic explicitly.
      
      This infrastructure makes it easier to support this new hook in nf_tables.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      60a3815d
    • D
      bpf: Allow for map-in-map with dynamic inner array map entries · 4a8f87e6
      Daniel Borkmann 提交于
      Recent work in f4d05259 ("bpf: Add map_meta_equal map ops") and 134fede4
      ("bpf: Relax max_entries check for most of the inner map types") added support
      for dynamic inner max elements for most map-in-map types. Exceptions were maps
      like array or prog array where the map_gen_lookup() callback uses the maps'
      max_entries field as a constant when emitting instructions.
      
      We recently implemented Maglev consistent hashing into Cilium's load balancer
      which uses map-in-map with an outer map being hash and inner being array holding
      the Maglev backend table for each service. This has been designed this way in
      order to reduce overall memory consumption given the outer hash map allows to
      avoid preallocating a large, flat memory area for all services. Also, the
      number of service mappings is not always known a-priori.
      
      The use case for dynamic inner array map entries is to further reduce memory
      overhead, for example, some services might just have a small number of back
      ends while others could have a large number. Right now the Maglev backend table
      for small and large number of backends would need to have the same inner array
      map entries which adds a lot of unneeded overhead.
      
      Dynamic inner array map entries can be realized by avoiding the inlined code
      generation for their lookup. The lookup will still be efficient since it will
      be calling into array_map_lookup_elem() directly and thus avoiding retpoline.
      The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips
      inline code generation and relaxes array_map_meta_equal() check to ignore both
      maps' max_entries. This also still allows to have faster lookups for map-in-map
      when BPF_F_INNER_MAP is not specified and hence dynamic max_entries not needed.
      
      Example code generation where inner map is dynamic sized array:
      
        # bpftool p d x i 125
        int handle__sys_enter(void * ctx):
        ; int handle__sys_enter(void *ctx)
           0: (b4) w1 = 0
        ; int key = 0;
           1: (63) *(u32 *)(r10 -4) = r1
           2: (bf) r2 = r10
        ;
           3: (07) r2 += -4
        ; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key);
           4: (18) r1 = map[id:468]
           6: (07) r1 += 272
           7: (61) r0 = *(u32 *)(r2 +0)
           8: (35) if r0 >= 0x3 goto pc+5
           9: (67) r0 <<= 3
          10: (0f) r0 += r1
          11: (79) r0 = *(u64 *)(r0 +0)
          12: (15) if r0 == 0x0 goto pc+1
          13: (05) goto pc+1
          14: (b7) r0 = 0
          15: (b4) w6 = -1
        ; if (!inner_map)
          16: (15) if r0 == 0x0 goto pc+6
          17: (bf) r2 = r10
        ;
          18: (07) r2 += -4
        ; val = bpf_map_lookup_elem(inner_map, &key);
          19: (bf) r1 = r0                               | No inlining but instead
          20: (85) call array_map_lookup_elem#149280     | call to array_map_lookup_elem()
        ; return val ? *val : -1;                        | for inner array lookup.
          21: (15) if r0 == 0x0 goto pc+1
        ; return val ? *val : -1;
          22: (61) r6 = *(u32 *)(r0 +0)
        ; }
          23: (bc) w0 = w6
          24: (95) exit
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
      4a8f87e6
    • D
      bpf: Add redirect_peer helper · 9aa1206e
      Daniel Borkmann 提交于
      Add an efficient ingress to ingress netns switch that can be used out of tc BPF
      programs in order to redirect traffic from host ns ingress into a container
      veth device ingress without having to go via CPU backlog queue [0]. For local
      containers this can also be utilized and path via CPU backlog queue only needs
      to be taken once, not twice. On a high level this borrows from ipvlan which does
      similar switch in __netif_receive_skb_core() and then iterates via another_round.
      This helps to reduce latency for mentioned use cases.
      
      Pod to remote pod with redirect(), TCP_RR [1]:
      
        # percpu_netperf 10.217.1.33
                RT_LATENCY:         122.450         (per CPU:         122.666         122.401         122.333         122.401 )
              MEAN_LATENCY:         121.210         (per CPU:         121.100         121.260         121.320         121.160 )
            STDDEV_LATENCY:         120.040         (per CPU:         119.420         119.910         125.460         115.370 )
               MIN_LATENCY:          46.500         (per CPU:          47.000          47.000          47.000          45.000 )
               P50_LATENCY:         118.500         (per CPU:         118.000         119.000         118.000         119.000 )
               P90_LATENCY:         127.500         (per CPU:         127.000         128.000         127.000         128.000 )
               P99_LATENCY:         130.750         (per CPU:         131.000         131.000         129.000         132.000 )
      
          TRANSACTION_RATE:       32666.400         (per CPU:        8152.200        8169.842        8174.439        8169.897 )
      
      Pod to remote pod with redirect_peer(), TCP_RR:
      
        # percpu_netperf 10.217.1.33
                RT_LATENCY:          44.449         (per CPU:          43.767          43.127          45.279          45.622 )
              MEAN_LATENCY:          45.065         (per CPU:          44.030          45.530          45.190          45.510 )
            STDDEV_LATENCY:          84.823         (per CPU:          66.770          97.290          84.380          90.850 )
               MIN_LATENCY:          33.500         (per CPU:          33.000          33.000          34.000          34.000 )
               P50_LATENCY:          43.250         (per CPU:          43.000          43.000          43.000          44.000 )
               P90_LATENCY:          46.750         (per CPU:          46.000          47.000          47.000          47.000 )
               P99_LATENCY:          52.750         (per CPU:          51.000          54.000          53.000          53.000 )
      
          TRANSACTION_RATE:       90039.500         (per CPU:       22848.186       23187.089       22085.077       21919.130 )
      
        [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf
        [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperfSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
      9aa1206e
    • D
      bpf: Improve bpf_redirect_neigh helper description · dd2ce6a5
      Daniel Borkmann 提交于
      Follow-up to address David's feedback that we should better describe internals
      of the bpf_redirect_neigh() helper.
      Suggested-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Link: https://lore.kernel.org/bpf/20201010234006.7075-2-daniel@iogearbox.net
      dd2ce6a5
  8. 10 10月, 2020 6 次提交
    • J
      netlink: export policy in extended ACK · 44f3625b
      Johannes Berg 提交于
      Add a new attribute NLMSGERR_ATTR_POLICY to the extended ACK
      to advertise the policy, e.g. if an attribute was out of range,
      you'll know the range that's permissible.
      
      Add new NL_SET_ERR_MSG_ATTR_POL() and NL_SET_ERR_MSG_ATTR_POL()
      macros to set this, since realistically it's only useful to do
      this when the bad attribute (offset) is also returned.
      
      Use it in lib/nlattr.c which practically does all the policy
      validation.
      
      v2:
       - add and use netlink_policy_dump_attr_size_estimate()
      v3:
       - remove redundant break
      v4:
       - really remove redundant break ... sorry
      Reviewed-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      44f3625b
    • M
      devlink: Add remote reload stats · 77069ba2
      Moshe Shemesh 提交于
      Add remote reload stats to hold the history of actions performed due
      devlink reload commands initiated by remote host. For example, in case
      firmware activation with reset finished successfully but was initiated
      by remote host.
      
      The function devlink_remote_reload_actions_performed() is exported to
      enable drivers update on remote reload actions performed as it was not
      initiated by their own devlink instance.
      
      Expose devlink remote reload stats to the user through devlink dev get
      command.
      
      Examples:
      $ devlink dev show
      pci/0000:82:00.0:
        stats:
            reload:
              driver_reinit 2 fw_activate 1 fw_activate_no_reset 0
            remote_reload:
              driver_reinit 0 fw_activate 0 fw_activate_no_reset 0
      pci/0000:82:00.1:
        stats:
            reload:
              driver_reinit 1 fw_activate 0 fw_activate_no_reset 0
            remote_reload:
              driver_reinit 1 fw_activate 1 fw_activate_no_reset 0
      
      $ devlink dev show -jp
      {
          "dev": {
              "pci/0000:82:00.0": {
                  "stats": {
                      "reload": {
                          "driver_reinit": 2,
                          "fw_activate": 1,
                          "fw_activate_no_reset": 0
                      },
                      "remote_reload": {
                          "driver_reinit": 0,
                          "fw_activate": 0,
                          "fw_activate_no_reset": 0
                      }
                  }
              },
              "pci/0000:82:00.1": {
                  "stats": {
                      "reload": {
                          "driver_reinit": 1,
                          "fw_activate": 0,
                          "fw_activate_no_reset": 0
                      },
                      "remote_reload": {
                          "driver_reinit": 1,
                          "fw_activate": 1,
                          "fw_activate_no_reset": 0
                      }
                  }
              }
          }
      }
      Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
      Reviewed-by: NJakub Kicinski <kuba@kernel.org>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      77069ba2
    • M
      devlink: Add reload stats · a254c264
      Moshe Shemesh 提交于
      Add reload stats to hold the history per reload action type and limit.
      
      For example, the number of times fw_activate has been performed on this
      device since the driver module was added or if the firmware activation
      was performed with or without reset.
      
      Add devlink notification on stats update.
      
      Expose devlink reload stats to the user through devlink dev get command.
      
      Examples:
      $ devlink dev show
      pci/0000:82:00.0:
        stats:
            reload:
              driver_reinit 2 fw_activate 1 fw_activate_no_reset 0
      pci/0000:82:00.1:
        stats:
            reload:
              driver_reinit 1 fw_activate 0 fw_activate_no_reset 0
      
      $ devlink dev show -jp
      {
          "dev": {
              "pci/0000:82:00.0": {
                  "stats": {
                      "reload": {
                          "driver_reinit": 2,
                          "fw_activate": 1,
                          "fw_activate_no_reset": 0
                      }
                  }
              },
              "pci/0000:82:00.1": {
                  "stats": {
                      "reload": {
                          "driver_reinit": 1,
                          "fw_activate": 0,
                          "fw_activate_no_reset": 0
                      }
                  }
              }
          }
      }
      Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      a254c264
    • M
      devlink: Add devlink reload limit option · dc64cc7c
      Moshe Shemesh 提交于
      Add reload limit to demand restrictions on reload actions.
      Reload limits supported:
      no_reset: No reset allowed, no down time allowed, no link flap and no
                configuration is lost.
      
      By default reload limit is unspecified and so no constraints on reload
      actions are required.
      
      Some combinations of action and limit are invalid. For example, driver
      can not reinitialize its entities without any downtime.
      
      The no_reset reload limit will have usecase in this patchset to
      implement restricted fw_activate on mlx5.
      
      Have the uapi parameter of reload limit ready for future support of
      multiselection.
      Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      dc64cc7c
    • M
      devlink: Add reload action option to devlink reload command · ccdf0721
      Moshe Shemesh 提交于
      Add devlink reload action to allow the user to request a specific reload
      action. The action parameter is optional, if not specified then devlink
      driver re-init action is used (backward compatible).
      Note that when required to do firmware activation some drivers may need
      to reload the driver. On the other hand some drivers may need to reset
      the firmware to reinitialize the driver entities. Therefore, the devlink
      reload command returns the actions which were actually performed.
      Reload actions supported are:
      driver_reinit: driver entities re-initialization, applying devlink-param
                     and devlink-resource values.
      fw_activate: firmware activate.
      
      command examples:
      $devlink dev reload pci/0000:82:00.0 action driver_reinit
      reload_actions_performed:
        driver_reinit
      
      $devlink dev reload pci/0000:82:00.0 action fw_activate
      reload_actions_performed:
        driver_reinit fw_activate
      Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
      Reviewed-by: NJakub Kicinski <kuba@kernel.org>
      Reviewed-by: NJacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: NJiri Pirko <jiri@nvidia.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ccdf0721
    • D
      block: fix uapi blkzoned.h comments · 8858e8d9
      Damien Le Moal 提交于
      Update the kdoc comments for struct blk_zone (capacity field description
      missing) and for struct blk_zone_report (flags field description
      missing).
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8858e8d9
  9. 09 10月, 2020 1 次提交
  10. 08 10月, 2020 4 次提交
    • O
      can: add ISO 15765-2:2016 transport protocol · e057dd3f
      Oliver Hartkopp 提交于
      CAN Transport Protocols offer support for segmented Point-to-Point
      communication between CAN nodes via two defined CAN Identifiers.
      As CAN frames can only transport a small amount of data bytes
      (max. 8 bytes for 'classic' CAN and max. 64 bytes for CAN FD) this
      segmentation is needed to transport longer PDUs as needed e.g. for
      vehicle diagnosis (UDS, ISO 14229) or IP-over-CAN traffic.
      This protocol driver implements data transfers according to
      ISO 15765-2:2016 for 'classic' CAN and CAN FD frame types.
      Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/r/20200928200404.82229-1-socketcan@hartkopp.net
      [mkl: Removed "WITH Linux-syscall-note" from isotp.c.
            Fixed indention, a checkpatch warning and typos.
            Replaced __u{8,32} by u{8,32}.
            Removed always false (optlen < 0) check in isotp_setsockopt().]
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      e057dd3f
    • M
      vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFO · 0c633f0b
      Matthew Rosato 提交于
      Allow the VFIO_DEVICE_GET_INFO ioctl to include a capability chain.
      Add a flag indicating capability chain support, and introduce the
      definitions for the first set of capabilities which are specified to
      s390 zPCI devices.
      Signed-off-by: NMatthew Rosato <mjrosato@linux.ibm.com>
      Reviewed-by: NCornelia Huck <cohuck@redhat.com>
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      0c633f0b
    • B
      vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices · fb1ff4c1
      Bharat Bhushan 提交于
      DPAA2 (Data Path Acceleration Architecture) consists in
      mechanisms for processing Ethernet packets, queue management,
      accelerators, etc.
      
      The Management Complex (mc) is a hardware entity that manages the DPAA2
      hardware resources. It provides an object-based abstraction for software
      drivers to use the DPAA2 hardware. The MC mediates operations such as
      create, discover, destroy of DPAA2 objects.
      The MC provides memory-mapped I/O command interfaces (MC portals) which
      DPAA2 software drivers use to operate on DPAA2 objects.
      
      A DPRC is a container object that holds other types of DPAA2 objects.
      Each object in the DPRC is a Linux device and bound to a driver.
      The MC-bus driver is a platform driver (different from PCI or platform
      bus). The DPRC driver does runtime management of a bus instance. It
      performs the initial scan of the DPRC and handles changes in the DPRC
      configuration (adding/removing objects).
      
      All objects inside a container share the same hardware isolation
      context, meaning that only an entire DPRC can be assigned to
      a virtual machine.
      When a container is assigned to a virtual machine, all the objects
      within that container are assigned to that virtual machine.
      The DPRC container assigned to the virtual machine is not allowed
      to change contents (add/remove objects) by the guest. The restriction
      is set by the host and enforced by the mc hardware.
      
      The DPAA2 objects can be directly assigned to the guest. However
      the MC portals (the memory mapped command interface to the MC) need
      to be emulated because there are commands that configure the
      interrupts and the isolation IDs which are virtual in the guest.
      
      Example:
      echo vfio-fsl-mc > /sys/bus/fsl-mc/devices/dprc.2/driver_override
      echo dprc.2 > /sys/bus/fsl-mc/drivers/vfio-fsl-mc/bind
      
      The dprc.2 is bound to the VFIO driver and all the objects within
      dprc.2 are going to be bound to the VFIO driver.
      
      This patch adds the infrastructure for VFIO support for fsl-mc
      devices. Subsequent patches will add support for binding and secure
      assigning these devices using VFIO.
      
      More details about the DPAA2 objects can be found here:
      Documentation/networking/device_drivers/freescale/dpaa2/overview.rst
      Signed-off-by: NBharat Bhushan <Bharat.Bhushan@nxp.com>
      Signed-off-by: NDiana Craciun <diana.craciun@oss.nxp.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      fb1ff4c1
    • J
  11. 07 10月, 2020 2 次提交
  12. 06 10月, 2020 1 次提交
    • J
      netlink: add mask validation · bdbb4e29
      Jakub Kicinski 提交于
      We don't have good validation policy for existing unsigned int attrs
      which serve as flags (for new ones we could use NLA_BITFIELD32).
      With increased use of policy dumping having the validation be
      expressed as part of the policy is important. Add validation
      policy in form of a mask of supported/valid bits.
      
      Support u64 in the uAPI to be future-proof, but really for now
      the embedded mask member can only hold 32 bits, so anything with
      bit 32+ set will always fail validation.
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdbb4e29
  13. 05 10月, 2020 2 次提交
    • D
      rxrpc: Fix accept on a connection that need securing · 2d914c1b
      David Howells 提交于
      When a new incoming call arrives at an userspace rxrpc socket on a new
      connection that has a security class set, the code currently pushes it onto
      the accept queue to hold a ref on it for the socket.  This doesn't work,
      however, as recvmsg() pops it off, notices that it's in the SERVER_SECURING
      state and discards the ref.  This means that the call runs out of refs too
      early and the kernel oopses.
      
      By contrast, a kernel rxrpc socket manually pre-charges the incoming call
      pool with calls that already have user call IDs assigned, so they are ref'd
      by the call tree on the socket.
      
      Change the mode of operation for userspace rxrpc server sockets to work
      like this too.  Although this is a UAPI change, server sockets aren't
      currently functional.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      2d914c1b
    • A
      net: devlink: Add unused port flavour · cf116634
      Andrew Lunn 提交于
      Not all ports of a switch need to be used, particularly in embedded
      systems. Add a port flavour for ports which physically exist in the
      switch, but are not connected to the front panel etc, and so are
      unused. By having unused ports present in devlink, it gives a more
      accurate representation of the hardware. It also allows regions to be
      associated to such ports, so allowing, for example, to determine
      unused ports are correctly powered off, or to compare probable reset
      defaults of unused ports to used ports experiences issues.
      
      Actually registering unused ports and setting the flavour to unused is
      optional. The DSA core will register all such switch ports, but such
      ports are expected to be limited in number. Bigger ASICs may decide
      not to list unused ports.
      
      v2:
      Expand the description about why it is useful
      Reviewed-by: NVladimir Oltean <olteanv@gmail.com>
      Tested-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf116634
  14. 04 10月, 2020 4 次提交
  15. 03 10月, 2020 1 次提交