1. 26 5月, 2018 16 次提交
    • Y
      openvswitch: Support conntrack zone limit · 11efd5cb
      Yi-Hung Wei 提交于
      Currently, nf_conntrack_max is used to limit the maximum number of
      conntrack entries in the conntrack table for every network namespace.
      For the VMs and containers that reside in the same namespace,
      they share the same conntrack table, and the total # of conntrack entries
      for all the VMs and containers are limited by nf_conntrack_max.  In this
      case, if one of the VM/container abuses the usage the conntrack entries,
      it blocks the others from committing valid conntrack entries into the
      conntrack table.  Even if we can possibly put the VM in different network
      namespace, the current nf_conntrack_max configuration is kind of rigid
      that we cannot limit different VM/container to have different # conntrack
      entries.
      
      To address the aforementioned issue, this patch proposes to have a
      fine-grained mechanism that could further limit the # of conntrack entries
      per-zone.  For example, we can designate different zone to different VM,
      and set conntrack limit to each zone.  By providing this isolation, a
      mis-behaved VM only consumes the conntrack entries in its own zone, and
      it will not influence other well-behaved VMs.  Moreover, the users can
      set various conntrack limit to different zone based on their preference.
      
      The proposed implementation utilizes Netfilter's nf_conncount backend
      to count the number of connections in a particular zone.  If the number of
      connection is above a configured limitation, ovs will return ENOMEM to the
      userspace.  If userspace does not configure the zone limit, the limit
      defaults to zero that is no limitation, which is backward compatible to
      the behavior without this patch.
      
      The following high leve APIs are provided to the userspace:
        - OVS_CT_LIMIT_CMD_SET:
          * set default connection limit for all zones
          * set the connection limit for a particular zone
        - OVS_CT_LIMIT_CMD_DEL:
          * remove the connection limit for a particular zone
        - OVS_CT_LIMIT_CMD_GET:
          * get the default connection limit for all zones
          * get the connection limit for a particular zone
      Signed-off-by: NYi-Hung Wei <yihung.wei@gmail.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11efd5cb
    • Y
      openvswitch: Add conntrack limit netlink definition · 5972be6b
      Yi-Hung Wei 提交于
      Define netlink messages and attributes to support user kernel
      communication that uses the conntrack limit feature.
      Signed-off-by: NYi-Hung Wei <yihung.wei@gmail.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5972be6b
    • D
      Merge tag 'mlx5e-updates-2018-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · d7c52fc8
      David S. Miller 提交于
      Saeed Mahameed says:
      
      ====================
      mlx5e-updates-2018-05-19
      
      This series contains updates for mlx5e netdevice driver with one subject,
      DSCP to priority mapping, in the first patch Huy adds the needed API in
      dcbnl, the second patch adds the needed mlx5 core capability bits for the
      feature, and all other patches are mlx5e (netdev) only changes to add
      support for the feature.
      
      From: Huy Nguyen
      
      Dscp to priority mapping for Ethernet packet:
      
      These patches enable differentiated services code point (dscp) to
      priority mapping for Ethernet packet. Once this feature is
      enabled, the packet is routed to the corresponding priority based on its
      dscp. User can combine this feature with priority flow control (pfc)
      feature to have priority flow control based on the dscp.
      
      Firmware interface:
      Mellanox firmware provides two control knobs for this feature:
        QPTS register allow changing the trust state between dscp and
        pcp mode. The default is pcp mode. Once in dscp mode, firmware will
        route the packet based on its dscp value if the dscp field exists.
      
        QPDPM register allow mapping a specific dscp (0 to 63) to a
        specific priority (0 to 7). By default, all the dscps are mapped to
        priority zero.
      
      Software interface:
      This feature is controlled via application priority TLV. IEEE
      specification P802.1Qcd/D2.1 defines priority selector id 5 for
      application priority TLV. This APP TLV selector defines DSCP to priority
      map. This APP TLV can be sent by the switch or can be set locally using
      software such as lldptool. In mlx5 drivers, we add the support for net
      dcb's getapp and setapp call back. Mlx5 driver only handles the selector
      id 5 application entry (dscp application priority application entry).
      If user sends multiple dscp to priority APP TLV entries on the same
      dscp, the last sent one will take effect. All the previous sent will be
      deleted.
      
      This attribute combined with pfc attribute allows advanced user to
      fine tune the qos setting for specific priority queue. For example,
      user can give dedicated buffer for one or more priorities or user
      can give large buffer to certain priorities.
      
      The dcb buffer configuration will be controlled by lldptool.
      >> lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6
            maps priorities 0,1,2,3,4,5,6,7 to receive buffer 0,2,5,7,1,2,3,6
      >> lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0
            sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively
      
      After discussion on mailing list with Jakub, Jiri, Ido and John, we agreed to
      choose dcbnl over devlink interface since this feature is intended to set
      port attributes which are governed by the netdev instance of that port, where
      devlink API is more suitable for global ASIC configurations.
      
      The firmware trust state (in QPTS register) is changed based on the
      number of dscp to priority application entries. When the first dscp to
      priority application entry is added by the user, the trust state is
      changed to dscp. When the last dscp to priority application entry is
      deleted by the user, the trust state is changed to pcp.
      
      When the port is in DSCP trust state, the transmit queue is selected
      based on the dscp of the skb.
      
      When the port is in DSCP trust state and vport inline mode is not NONE,
      firmware requires mlx5 driver to copy the IP header to the
      wqe ethernet segment inline header if the skb has it.
      This is done by changing the transmit queue sq's min inline mode to L3.
      Note that the min inline mode of sqs that belong to other features
      such as xdpsq, icosq are not modified.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7c52fc8
    • B
      8139too: Remove unnecessary netif_napi_del() · a4567579
      Bo Chen 提交于
      The call to free_netdev() in __rtl8139_cleanup_dev() clears the network device
      napi list, and explicit calls to netif_napi_del() are unnecessary.
      Signed-off-by: NBo Chen <chenbo@pdx.edu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4567579
    • D
      Merge branch 'qed-ethtool-rx-flow-classification-enhancements' · a527af9c
      David S. Miller 提交于
      Manish Chopra says:
      
      ====================
      qed*: ethtool rx flow classification enhancements.
      
      This series re-structures the driver's ethtool rx flow
      classification flow, following that it adds other flow
      profiles and rx flow classification enhancements
      via "ethtool -N/-U"
      
      Please consider applying this to "net-next"
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a527af9c
    • M
      qed*: Support drop action classification · 608e00d0
      Manish Chopra 提交于
      With this patch, User can configure for the supported
      flows to be dropped. Added a stat "gft_filter_drop"
      as well to be populated in ethtool for the dropped flows.
      
      For example -
      
      ethtool -N p5p1 flow-type udp4 dst-port 8000 action -1
      ethtool -N p5p1 flow-type tcp4 scr-ip 192.168.8.1 action -1
      Signed-off-by: NManish Chopra <manish.chopra@cavium.com>
      Signed-off-by: NShahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: NAriel Elior <ariel.elior@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      608e00d0
    • M
      qede: Support flow classification to the VFs. · 39385ab0
      Manish Chopra 提交于
      With the supported classification modes [4 tuples based,
      udp port based, src-ip based], flows can be classified
      to the VFs as well. With this patch, flows can be re-directed
      to the requested VF provided in "action" field of command.
      
      Please note that driver doesn't really care about the queue bits
      in "action" field for the VFs. Since queue will be still chosen
      by FW using RSS hash. [I.e., the classification would be done
      according to vport-only]
      
      For examples -
      
      ethtool -N p5p1 flow-type udp4 dst-port 8000 action 0x100000000
      ethtool -N p5p1 flow-type tcp4 src-ip 192.16.6.10 action 0x200000000
      ethtool -U p5p1 flow-type tcp4 src-ip 192.168.40.100 dst-ip \
      	192.168.40.200 src-port 6660 dst-port 5550 \
      	action 0x100000000
      Signed-off-by: NManish Chopra <manish.chopra@cavium.com>
      Signed-off-by: NShahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: NAriel Elior <ariel.elior@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39385ab0
    • M
      qed*: Support other classification modes. · 3893fc62
      Manish Chopra 提交于
      Currently, driver supports flow classification to PF
      receive queues based on TCP/UDP 4 tuples [src_ip, dst_ip,
      src_port, dst_port] only.
      
      This patch enables to configure different flow profiles
      [For example - only UDP dest port or src_ip based] on the
      adapter so that classification can be done according to
      just those fields as well. Although, at a time just one
      type of flow configuration is supported due to limited
      number of flow profiles available on the device.
      
      For example -
      
      ethtool -N enp7s0f0 flow-type udp4 dst-port 45762 action 2
      ethtool -N enp7s0f0 flow-type tcp4 src-ip 192.16.4.10 action 1
      ethtool -N enp7s0f0 flow-type udp6 dst-port 45762 action 3
      Signed-off-by: NManish Chopra <manish.chopra@cavium.com>
      Signed-off-by: NShahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: NAriel Elior <ariel.elior@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3893fc62
    • M
      qede: Validate unsupported configurations · 89ffd14e
      Manish Chopra 提交于
      Validate and prevent some of the configurations for
      unsupported [by firmware] inputs [for example - mac ext,
      vlans, masks/prefix, tos/tclass] via ethtool -N/-U.
      Signed-off-by: NManish Chopra <manish.chopra@cavium.com>
      Signed-off-by: NShahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: NAriel Elior <ariel.elior@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89ffd14e
    • M
      qede: Refactor ethtool rx classification flow. · 87885310
      Manish Chopra 提交于
      This patch simplifies the ethtool rx flow configuration
      [via ethtool -U/-N] flow code base by dividing it logically
      into various APIs based on given protocols. It also separates
      various validations and calculations done along the flow
      in their own APIs.
      Signed-off-by: NManish Chopra <manish.chopra@cavium.com>
      Signed-off-by: NShahed Shaikh <shahed.shaikh@cavium.com>
      Signed-off-by: NAriel Elior <ariel.elior@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87885310
    • A
      cxgb4/cxgb4vf: Notify link changes to OS-dependent code · e2f4f4e9
      Arjun Vynipadath 提交于
      We have a confusion of two different abstractions in the Common
      Code:  Physical Link (Port) and Logical Network Interface (Virtual
      Interface), and we haven't been properly managing the state of the
      intersection of those two abstractions.
      On the one hand we have the Physical state of the Link -- up or down --
      and on the other we have the logical state of the VI, enabled or not.
      {ethN} refers to both the Physical and Logical State. In this case,
      ifconfig only affects/interrogates the Logical State of a VI,
      and ethtool only deals with the Physical State. And these are different.
      
      So, just because we disable the VI, we don't really want to change the
      Physical Link Up/Down state.  Thus, the previous hack to set
      "lc->link_ok = 0" when we disable a VI is completely incorrect.
      
      Where we get into trouble is where the Physical Link State and the
      Logical VI State cross swords.  And that happens in
      t4_handle_get_port_info() where we need to manage/safe the Physical
      Link State, but we also need to know when the Logical VI State has
      changed and pass that back up to the OS-dependent Driver routine
      t4_os_link_changed() which is concerned about the Logical Interface.
      
      So we enable a VI and that causes Firmware to send us a new Port
      Information message, but if none of the Physical Link State
      particulars have changed, we don't call t4_os_link_changed().
      
      This fix uses the existing OS Contract APIs for the Common Code to
      inform the OS-dependent portion of the Host Driver when the "Link" (really
      Logical Network Interface) is "up" or "down". A new API
      t4_enable_pi_params() is added which calls t4_enable_vi_params() and,
      if that is successful, then calls back to the OS Contract API
      t4_os_link_changed() notifying the OS-dependent layer of the
      potential Link State change.
      
      Original Work by : Casey Leedom <leedom@chelsio.com>
      Signed-off-by: NSantosh Rastapur <santosh@chelsio.com>
      Signed-off-by: NArjun Vynipadath <arjun@chelsio.com>
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2f4f4e9
    • G
      cxgb4: clean up init_one · e8d45292
      Ganesh Goudar 提交于
      clean up init_one and use chip_ver consistently throughout
      init_one() for chip version.
      Signed-off-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8d45292
    • G
      cxgb4/cxgb4vf: link management changes for new SFP · 57ccaedb
      Ganesh Goudar 提交于
      newer SFPs like SFP28 and QSFP28 Transceiver Modules present
      several new possibilities which we haven't faced before. Fix the
      assumptions in the code reflecting the more limited capabilities
      of previous Transceiver Module systems
      
      Original work by Casey Leedom <leedom@chelsio.com>
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57ccaedb
    • Y
      net: fec: remove stale comment · b526e56b
      YueHaibing 提交于
      This comment is outdated as fec_ptp_ioctl has been replaced by fec_ptp_set/fec_ptp_get
      since commit 1d5244d0 ("fec: Implement the SIOCGHWTSTAMP ioctl")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Acked-by: NFugang Duan <fugang.duan@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b526e56b
    • M
      sfc: stop the TX queue before pushing new buffers · 0c235113
      Martin Habets 提交于
      efx_enqueue_skb() can push new buffers for the xmit_more functionality.
      We must stops the TX queue before this or else the TX queue does not get
      restarted and we get a netdev watchdog.
      
      In the error handling we may now need to unwind more than 1 packet, and
      we may need to push the new buffers onto the partner queue.
      
      v2: In the error leg also push this queue if xmit_more is set
      
      Fixes: e9117e50 ("sfc: Firmware-Assisted TSO version 2")
      Reported-by: NJarod Wilson <jarod@redhat.com>
      Tested-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NMartin Habets <mhabets@solarflare.com>
      Acked-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c235113
    • N
      net: bridge: add support for port isolation · 7d850abd
      Nikolay Aleksandrov 提交于
      This patch adds support for a new port flag - BR_ISOLATED. If it is set
      then isolated ports cannot communicate between each other, but they can
      still communicate with non-isolated ports. The same can be achieved via
      ACLs but they can't scale with large number of ports and also the
      complexity of the rules grows. This feature can be used to achieve
      isolated vlan functionality (similar to pvlan) as well, though currently
      it will be port-wide (for all vlans on the port). The new test in
      should_deliver uses data that is already cache hot and the new boolean
      is used to avoid an additional source port test in should_deliver.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d850abd
  2. 25 5月, 2018 24 次提交