1. 31 Jul, 2018 (2 commits)
    • bpf: Support bpf_get_socket_cookie in more prog types · d692f113
      Andrey Ignatov committed
      The bpf_get_socket_cookie() helper can be used to identify skbs that
      correspond to the same socket.
      
      A socket cookie can also be useful in many other use cases where a socket
      is available in the program context. Specifically, BPF_PROG_TYPE_CGROUP_SOCK_ADDR
      and BPF_PROG_TYPE_SOCK_OPS programs can benefit from it, so that one of
      them can augment a value in a map prepared earlier by another program for
      the same socket.
      
      The patch adds support to call bpf_get_socket_cookie() from
      BPF_PROG_TYPE_CGROUP_SOCK_ADDR and BPF_PROG_TYPE_SOCK_OPS.
      
      It doesn't introduce new helpers. Instead, it reuses the same helper name,
      bpf_get_socket_cookie(), and extends the helper to accept
      `struct bpf_sock_addr` and `struct bpf_sock_ops`.
      
      Documentation in bpf.h is changed in a way that should not break
      automatic generation of markdown.
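The use case described in this commit message (one program prepares a map value for a socket, another program later augments it) can be sketched as a plain userspace model. This is not BPF code: the fixed-size table stands in for a BPF hash map, the `cookie` argument stands in for the value a real program would obtain from bpf_get_socket_cookie(), and every name below (`cookie_map`, `on_connect`, `on_established`, the flag constants) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SLOTS        16   /* hypothetical capacity of the stand-in map */
#define SEEN_CONNECT     0x1  /* hypothetical flag set by the first program */
#define SEEN_ESTABLISHED 0x2  /* hypothetical flag added by the second program */

struct entry {
    uint64_t cookie;  /* key: the socket cookie */
    uint32_t flags;   /* value shared by both "programs" */
    int used;
};

static struct entry cookie_map[MAP_SLOTS];

/* Linear lookup by cookie; returns NULL when the key is absent. */
static struct entry *map_lookup(uint64_t cookie)
{
    for (size_t i = 0; i < MAP_SLOTS; i++)
        if (cookie_map[i].used && cookie_map[i].cookie == cookie)
            return &cookie_map[i];
    return NULL;
}

/* Models the sock_addr-style program: prepare a value for this socket. */
static int on_connect(uint64_t cookie)
{
    struct entry *e = map_lookup(cookie);

    for (size_t i = 0; !e && i < MAP_SLOTS; i++)
        if (!cookie_map[i].used)
            e = &cookie_map[i];
    if (!e)
        return -1;  /* stand-in map is full */
    e->used = 1;
    e->cookie = cookie;
    e->flags = SEEN_CONNECT;
    return 0;
}

/* Models the sock_ops-style program: augment the value prepared earlier. */
static int on_established(uint64_t cookie)
{
    struct entry *e = map_lookup(cookie);

    if (!e)
        return 0;  /* nothing was prepared for this socket */
    e->flags |= SEEN_ESTABLISHED;
    return 1;
}
```

Because both functions key on the same cookie, the second one finds exactly the entry the first one created, which is the property the commit relies on.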
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: add End.DT6 action to bpf_lwt_seg6_action helper · 486cdf21
      Mathieu Xhonneux committed
      The seg6local LWT provides the End.DT6 action, which decapsulates
      an outer IPv6 header containing a Segment Routing Header (SRH).
      The full specification is available here:
      
      https://tools.ietf.org/html/draft-filsfils-spring-srv6-network-programming-05
      
      This patch adds this action to the seg6local BPF interface. Since
      the inner IPv6 header is not required to contain an SRH as well,
      seg6_bpf_srh_state has been extended with a pointer to a possible
      SRH of the outermost IPv6 header. This helps assess whether
      validation must be triggered, and avoids some calls to
      ipv6_find_hdr.
      
      v3: s/1/true, s/0/false for boolean values
      v2: - changed true/false -> 1/0
          - preempt_enable no longer called in first conditional block
      Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  2. 25 Jul, 2018 (3 commits)
    • net/sched: add skbprio scheduler · aea5f654
      Nishanth Devarajan committed
      Skbprio (SKB Priority Queue) is a queueing discipline that prioritizes packets
      according to their skb->priority field. Under congestion, already-enqueued lower
      priority packets will be dropped to make space available for higher priority
      packets. Skbprio was conceived as a solution for denial-of-service defenses that
      need to route packets with different priorities as a means to overcome DoS
      attacks.
      
      v5
      *Do not reference qdisc_dev(sch)->tx_queue_len for setting limit. Instead set
      default sch->limit to 64.
      
      v4
      *Drop Documentation/networking/sch_skbprio.txt doc file to move it to tc man
      page for Skbprio, in iproute2.
      
      v3
      *Drop max_limit parameter in struct skbprio_sched_data and instead use
      sch->limit.
      
      *Reference qdisc_dev(sch)->tx_queue_len only once, during initialisation for
      qdisc (previously being referenced every time qdisc changes).
      
      *Move qdisc's detailed description from in-code to Documentation/networking.
      
      *When qdisc is saturated, enqueue incoming packet first before dequeueing
      lowest priority packet in queue - improves usage of call stack registers.
      
      *Introduce and use overlimit stat to keep track of number of dropped packets.
      
      v2
      *Use skb->priority field rather than DS field. Rename queueing discipline as
      SKB Priority Queue (previously Gatekeeper Priority Queue).
      
      *Queueing discipline is made classful to expose Skbprio's internal priority
      queues.
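The drop policy described in this commit message can be sketched as a small userspace model. It tracks only per-priority packet counts rather than real skbs, and all names (`struct prioq`, `prioq_enqueue`, `NUM_PRIOS`, the verdict values) are hypothetical. Dropping an incoming packet whose priority equals the lowest queued priority is an assumption of this sketch; the message only states that lower-priority packets already in the queue make way for higher-priority ones.

```c
#include <assert.h>
#include <string.h>

#define NUM_PRIOS 8  /* hypothetical number of priority levels */

enum verdict {
    ENQUEUED,                /* there was room */
    ENQUEUED_DROPPED_LOWER,  /* full: made room by dropping a lower-priority packet */
    DROPPED_INCOMING         /* full: nothing of lower priority to evict */
};

struct prioq {
    unsigned int qlen[NUM_PRIOS];  /* packets queued per priority */
    unsigned int total;            /* packets queued overall */
    unsigned int limit;            /* capacity (the commit's default is 64) */
    unsigned int dropped;          /* models the overlimit counter */
};

static void prioq_init(struct prioq *q, unsigned int limit)
{
    memset(q, 0, sizeof(*q));
    q->limit = limit;
}

/* Lowest priority that currently has packets queued, or -1 if empty. */
static int lowest_backlogged(const struct prioq *q)
{
    for (int p = 0; p < NUM_PRIOS; p++)
        if (q->qlen[p] > 0)
            return p;
    return -1;
}

static enum verdict prioq_enqueue(struct prioq *q, unsigned int prio)
{
    int low;

    if (prio >= NUM_PRIOS)
        prio = NUM_PRIOS - 1;  /* clamp out-of-range priorities */

    if (q->total < q->limit) {
        q->qlen[prio]++;
        q->total++;
        return ENQUEUED;
    }

    low = lowest_backlogged(q);
    if (low < 0 || prio <= (unsigned int)low) {
        q->dropped++;          /* incoming packet loses */
        return DROPPED_INCOMING;
    }

    /* Enqueue first, then evict from the lowest-priority queue. */
    q->qlen[prio]++;
    q->qlen[low]--;
    q->dropped++;
    return ENQUEUED_DROPPED_LOWER;
}
```

The total stays at the limit after an eviction, so the queue never grows past its configured capacity while still favoring higher priorities.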
      Signed-off-by: Nishanth Devarajan <ndev2021@gmail.com>
      Reviewed-by: Sachin Paryani <sachin.paryani@gmail.com>
      Reviewed-by: Cody Doucette <doucette@bu.edu>
      Reviewed-by: Michel Machado <michel@digirati.com.br>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: add GBit master / slave error detection · b8f8c8eb
      Heiner Kallweit committed
      Certain PHYs have issues when operating in GBit slave mode and can
      be forced to master mode. One example is the RTL8211C; the Micrel PHY
      driver also has a DT setting to force master mode.
      If two such chips are link partners, autonegotiation will fail.
      The standard defines a latched-high bit that self-clears on read to
      indicate this error. Check this bit to inform the user.
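The check described above can be sketched against a simulated register. The bit position follows the IEEE 802.3 1000BASE-T status register (register 10, bit 15, master/slave configuration fault); the `fake_phy` type and the function names are hypothetical stand-ins, not the kernel's PHY API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STAT1000_REG      0x0a    /* 1000BASE-T status register number */
#define STAT1000_MS_FAULT 0x8000  /* bit 15: master/slave configuration fault */

/* Hypothetical stand-in for a PHY; holds only the one register we model. */
struct fake_phy {
    uint16_t stat1000;
};

/*
 * Models a read of the status register: the fault bit is latched high
 * by hardware and clears itself once software has read it.
 */
static uint16_t fake_phy_read_stat1000(struct fake_phy *phy)
{
    uint16_t val = phy->stat1000;

    phy->stat1000 = (uint16_t)(phy->stat1000 & ~STAT1000_MS_FAULT);
    return val;
}

/* True if master/slave resolution failed since the previous read. */
static bool ms_config_failed(struct fake_phy *phy)
{
    return (fake_phy_read_stat1000(phy) & STAT1000_MS_FAULT) != 0;
}
```

Since the read itself clears the bit, a caller should read once and keep the result; a second read reports no fault even though the failure happened.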
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: do not store start function in netlink_cb · 3730cf4d
      Florian Westphal committed
      ->start() is called only once, when the dump is initialized, so there
      is no need to store it in netlink_cb.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 24 Jul, 2018 (9 commits)
    • rds: Extend RDS API for IPv6 support · b7ff8b10
      Ka-Cheong Poon committed
      Many data structures (RDS socket options) used by RDS apps store an
      IP address in a 32-bit integer. To support IPv6, struct in6_addr
      needs to be used instead. To ensure backward compatibility, a new
      data structure is introduced for each of the data structures that
      use a 32-bit integer to represent an IP address, and new socket
      options are introduced to use those new structures. This means that
      existing apps should work without a problem with the new RDS module.
      Apps that want to use IPv6 can use the new data structures and socket
      options. IPv4-mapped addresses are used to represent IPv4 addresses
      in the new data structures.
      
      v4: Revert changes to SO_RDS_TRANSPORT
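The IPv4-mapped representation mentioned in this commit message can be illustrated with standard socket headers. This is a generic userspace sketch of the `::ffff:a.b.c.d` layout; the helper name `v4_to_mapped` is hypothetical and nothing here uses the RDS structures themselves.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Build the IPv4-mapped IPv6 form of an IPv4 address:
 * 80 zero bits, 16 one bits, then the 32-bit IPv4 address.
 */
static struct in6_addr v4_to_mapped(struct in_addr v4)
{
    struct in6_addr v6;

    memset(&v6, 0, sizeof(v6));
    v6.s6_addr[10] = 0xff;
    v6.s6_addr[11] = 0xff;
    /* s_addr is already in network byte order, so copy it as-is. */
    memcpy(&v6.s6_addr[12], &v4.s_addr, sizeof(v4.s_addr));
    return v6;
}
```

With this layout a single struct in6_addr field can carry either address family, and IN6_IS_ADDR_V4MAPPED() tells the two apart.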
      Signed-off-by: Ka-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls_flower: propagate chain teplate creation and destruction to drivers · 34738452
      Jiri Pirko committed
      Introduce a couple of flower offload commands in order to propagate
      template creation/destruction events down to device drivers.
      Drivers may use this information to prepare HW in an optimal way
      for future filter insertions.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: introduce chain templates · 9f407f17
      Jiri Pirko committed
      Allow the user to set a template for newly created chains. A template
      locks down the chain to a particular classifier type/options combination.
      The classifier needs to support templates; otherwise the kernel
      replies with an error.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: introduce chain object to uapi · 32a4f5ec
      Jiri Pirko committed
      Allow the user to create, destroy, get and dump chain objects. Do that by
      extending rtnl commands with chain-specific ones. The user will now be
      able to explicitly create or destroy chains (so far this was done only
      automatically, according to filter/act needs and refcounting). Also, the
      user will receive a notification about any chain creation or destruction.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: Avoid implicit chain 0 creation · f71e0ca4
      Jiri Pirko committed
      Currently, chain 0 is implicitly created during block creation. However,
      that does not align with the chain object exposure, creation and destruction
      API introduced later on. So make chain 0 behave the same way as any
      other chain and only create it when it is needed. Since chain 0 is
      somewhat special, as the qdiscs need to hold a pointer to the first chain
      tp, this requires moving the chain head change callback infra to the
      block structure.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: FW tracer, events handling · c71ad41c
      Feras Daoud committed
      The tracer has one event, event 0x26, with two subtypes:
      - Subtype 0: Ownership change
      - Subtype 1: Traces available
      
      An ownership change occurs in the following cases:
      1- The owner releases its ownership. In this case, an event is
      sent to inform others to reattempt to acquire ownership.
      2- Ownership was taken by a higher-priority tool. In this case,
      the owner should recognize that it lost ownership and go through
      the teardown flow.
      
      The second subtype indicates that there are traces in the trace buffer.
      In this case, the driver polls the tracer buffer for new traces, parses
      them, and prepares the messages for printing.
      
      The HW starts tracing from the first address in the tracer buffer.
      Driver receives an event notifying that new trace block exists.
      HW posts a timestamp event at the last 8B of every 256B block.
      Comparing the timestamp to the last handled timestamp would indicate
      that this is a new trace block. Once the new timestamp is detected,
      the entire block is considered valid.
      
      Block validation and parsing should be done after copying the current
      block to a different location, in order to avoid the block being
      overwritten during processing.
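The block handling described in this commit message (copy first, then read the timestamp from the last 8 bytes of the 256-byte block and compare it with the last handled one) can be sketched in userspace as follows. The names are hypothetical, and two details are assumptions of this sketch rather than facts from the commit: the timestamp is read in host byte order, and a block counts as new only when its timestamp is greater than the last handled one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TRACE_BLOCK_SIZE 256  /* block size stated in the commit message */
#define TRACE_TS_SIZE    8    /* timestamp event occupies the last 8 bytes */

/* Read the timestamp stored at the tail of a block (host byte order assumed). */
static uint64_t block_timestamp(const uint8_t *block)
{
    uint64_t ts;

    memcpy(&ts, block + TRACE_BLOCK_SIZE - TRACE_TS_SIZE, sizeof(ts));
    return ts;
}

/*
 * Snapshot the hardware block into 'out' before looking at it, so later
 * parsing works on a stable copy even if the hardware overwrites the
 * original. Returns true and updates *last_ts when the copy is new.
 */
static bool snapshot_if_new(const uint8_t *hw_block, uint64_t *last_ts,
                            uint8_t *out)
{
    uint64_t ts;

    memcpy(out, hw_block, TRACE_BLOCK_SIZE);
    ts = block_timestamp(out);
    if (ts <= *last_ts)
        return false;  /* assumption: not newer means already handled */
    *last_ts = ts;
    return true;
}
```

Validating the copy rather than the live buffer means the timestamp check and the parsed contents always refer to the same snapshot.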
      Signed-off-by: Feras Daoud <ferasda@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: FW tracer, implement tracer logic · f53aaa31
      Feras Daoud committed
      Implement FW tracer logic and registers access, initialization and
      cleanup flows.
      
      Initializing the tracer will be part of load one flow, as multiple
      PFs will try to acquire ownership but only one will succeed and will
      be the tracer owner.
      Signed-off-by: Feras Daoud <ferasda@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/smc: provide smc mode in smc_diag.c · c601171d
      Karsten Graul committed
      Rename the field diag_fallback to diag_mode and set the smc mode of a
      connection explicitly.
      Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: add support for backup port · 2756f68c
      Nikolay Aleksandrov committed
      This patch adds a new port attribute, IFLA_BRPORT_BACKUP_PORT, which
      allows setting a backup port to be used for known unicast traffic if the
      port has gone carrier down. The backup pointer is RCU protected and set
      only under RTNL, a counter is maintained so when deleting a port we know
      how many other ports reference it as a backup and we remove it from all.
      Also the pointer is in the first cache line which is hot at the time of
      the check and thus in the common case we only add one more test.
      The backup port will be used only for the non-flooding case since
      it's a part of the bridge and the flooded packets will be forwarded to it
      anyway. To remove the forwarding just send a 0/non-existing backup port.
      This is used to avoid numerous scalability problems when using MLAG. Most
      notably, if we have thousands of fdbs, one would need to change all of them
      when a port's carrier goes down, which takes too long and causes a storm of fdb
      notifications (and again when the port comes back up). In a Multi-chassis
      Link Aggregation setup usually hosts are connected to two different
      switches which act as a single logical switch. Those switches usually have
      a control and backup link between them called peerlink which might be used
      for communication in case a host loses connectivity to one of them.
      We need a fast way to fail over in case a host port goes down, and currently
      none of the solutions (like bond) can fulfill the requirements because
      the participating ports are actually the "master" devices and must have the
      same peerlink as their backup interface and at the same time all of them
      must participate in the bridge device. As Roopa noted it's normal practice
      in routing called fast re-route where a precalculated backup path is used
      when the main one is down.
      Another use case of this is with EVPN, having a single vxlan device which
      is backup of every port. Due to the nature of master devices it's not
      currently possible to use one device as a backup for many and still have
      all of them participate in the bridge (which is master itself).
      More detailed information about MLAG is available at the link below.
      https://docs.cumulusnetworks.com/display/DOCS/Multi-Chassis+Link+Aggregation+-+MLAG
      
      Further explanation and a diagram by Roopa:
      Two switches acting in a MLAG pair are connected by the peerlink
      interface which is a bridge port.
      
      The config on one of the switches looks like the below. The other
      switch has a similar config.
      eth0 is connected to one port on the server. And the server is
      connected to both switches.
      
      br0 -- team0---eth0
            |
            -- switch-peerlink
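The forwarding decision described in this commit message (use the backup port for known unicast traffic when the destination port has lost carrier) can be sketched as a small userspace model. The `struct port` type and `pick_egress` function are hypothetical and leave out everything the real bridge does around RCU, RTNL and reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a bridge port. */
struct port {
    const char *name;
    bool carrier_ok;            /* false once the link has gone down */
    const struct port *backup;  /* NULL when no backup port is configured */
};

/*
 * Choose the egress port for a known unicast frame whose fdb entry
 * points at 'dst'. In the common case (carrier up or no backup set)
 * this costs one extra test and returns 'dst' itself.
 */
static const struct port *pick_egress(const struct port *dst)
{
    if (!dst->carrier_ok && dst->backup != NULL)
        return dst->backup;
    return dst;
}
```

In the MLAG layout from the diagram, `team0` would carry `switch-peerlink` as its backup, so fdb entries pointing at `team0` need no rewriting when its carrier drops.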
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 23 Jul, 2018 (1 commit)
  5. 21 Jul, 2018 (4 commits)
  6. 20 Jul, 2018 (5 commits)
  7. 19 Jul, 2018 (8 commits)
  8. 18 Jul, 2018 (8 commits)