1. 06 Feb, 2017 24 commits
  2. 05 Feb, 2017 8 commits
    • Merge branch 'ipv6-Improve-user-experience-with-multipath-routes' · 3976001c
      David S. Miller committed
      David Ahern says:
      
      ====================
      net: ipv6: Improve user experience with multipath routes
      
      This series closes a couple of gaps between IPv4 and IPv6 with respect
      to multipath routes:
      
      1. IPv4 allows all nexthops of multipath routes to be deleted using just
         the prefix and length; IPv6 only deletes the first nexthop for the
         route if only the prefix and length are given.
      
      2. IPv4 returns multipath routes encoded in the RTA_MULTIPATH attribute.
         IPv6 returns a series of routes with the same prefix and length - one
         for each nexthop. This happens for both dumps and notifications.
      
      IPv6 does accept RTA_MULTIPATH encoded routes, but installs them as a
      series of routes.
      
      Patch 1 addresses the first item by allowing IPv6 multipath routes to be
      deleted using just the prefix and length. Patch 2 addresses the second by
      allowing IPv6 multipath routes to be returned encoded in the RTA_MULTIPATH
      attribute.
      
      Patches 3 and 4 update the RTM_{NEW,DEL}ROUTE notifications to generate
      1 notification with RTA_MULTIPATH where applicable.
      
      Patch 5 prints IPv6 addresses in compressed format when showing route
      replace errors. This was noticed testing REPLACE failures.
      
      The end result for multipath routes:
      1. Dump
         - RTA_MULTIPATH used for multipath routes
      
          $ ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:200::/120 metric 1024
      	    nexthop via 2001:db8:1::2  dev eth1 weight 1
      	    nexthop via 2001:db8:2::2  dev eth2 weight 1
          ...
      
      2. Route Add
         - one notification with RTA_MULTIPATH attribute
      
          $ ip -6 ro add vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::2 nexthop via 2001:db8:2::2
      
          $ ip mon route
          2001:db8:200::/120 table red metric 1024
      	nexthop via 2001:db8:1::2  dev eth1 weight 1
      	nexthop via 2001:db8:2::2  dev eth2 weight 1
      
      2. Route Replace
         - one notification with RTA_MULTIPATH attribute
      
          $ ip -6 ro replace vrf red 2001:db8:200::/120 nexthop via 2001:db8:1::16 nexthop via 2001:db8:2::16
      
          $ ip mon route
          Replaced 2001:db8:200::/120 table red metric 1024
      	    nexthop via 2001:db8:1::16  dev eth1 weight 1
      	    nexthop via 2001:db8:2::16  dev eth2 weight 1
      
         - on a failure after the insertion of the first nexthop (which means
           the original route has been replaced in the FIB), a notification is
           sent with the successful nexthops and then the nexthops are deleted
           with one notification per hop. This is consistent with how it works
           today except the successful additions are coalesced into 1
           notification.
      
      3. Route Delete
         - when an entire multipath route is deleted using prefix/length, only
           1 notification is generated:
          $ ip -6 ro del vrf red 2001:db8:200::/120
      
          $ ip mon route
          Deleted 2001:db8:200::/120 table red metric 1024
      	    nexthop via 2001:db8:1::16  dev eth1 weight 1
      	    nexthop via 2001:db8:2::16  dev eth2 weight 1
      
         - if a delete request contains nexthops, one notification is
           generated per nexthop deleted. This is unavoidable since IPv6
           allows a single nexthop to be deleted within a multipath route.
      
      4. Route Appends
         - IPv6 allows nexthops to be appended to an existing route. In this
           case one notification is sent for the new route with the append
           flag set.
      
          $ ip -6 ro append vrf red 2001:db8:200::/120 nexthop via 2001:db8:2::20 nexthop via 2001:db8:1::20
      
          $ ip mon route
          Append 2001:db8:200::/120 table red metric 1024
      	    nexthop via 2001:db8:1::2  dev eth1 weight 1
      	    nexthop via 2001:db8:2::2  dev eth2 weight 1
      	    nexthop via 2001:db8:2::20  dev eth2 weight 1
      	    nexthop via 2001:db8:1::20  dev eth1 weight 1
      
        - on failure of an append, a notification is sent with the route
          containing all of the nexthops successfully added, and it is
          followed by delete notifications as the hops are removed
          returning the route to its prior state. This is consistent with
          how it works today except the successful additions are coalesced
          into 1 notification.
      
      Addresses some of the inconsistencies also noted by Roopa at netdev0.1:
      https://www.netdev01.org/docs/prabhu-linux_ipv4_ipv6_inconsistencies_talk_slides.pdf
      
      v4
      - changed series to do encoding in 1 patch and updating notifications
        in separate patches to make it easier to review and understand
      
      - 1 notification for delete when using prefix/length; 1 notification for
        append
      
      - handle delete of a single nexthop without RTA_MULTIPATH in delete request
      
      - updated commit messages and cover letter
      
      v3
      - removed the need for a user API to opt-in to change. Requiring an
        API just shifts the difference from same API with different
        behavior to different API to achieve equivalent behavior
      - route notifications changed to use RTA_MULTIPATH for add and replace
      - updated commit messages and cover letter
      
      v2
      - fixed locking in patch 1 as noted by DaveM
      - changed user API for patch 2 to require an rtmsg with RTM_F_ALL_NEXTHOPS
        set in rtm_flags
      - revamped explanation of patch 2 and cover letter
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
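The cover letter above relies on the RTA_MULTIPATH attribute throughout. As a rough illustration of what that attribute carries, the following sketch packs two IPv6 nexthops the way the rtnetlink headers lay them out: an outer `rtattr` of type RTA_MULTIPATH containing one `struct rtnexthop` per hop, each followed by a nested RTA_GATEWAY attribute. The interface indexes (2 and 3) and the helper names are assumptions made for this example; this is not the kernel's encoder.

```python
import ipaddress
import struct

RTA_GATEWAY = 5     # attribute type from linux/rtnetlink.h
RTA_MULTIPATH = 9   # attribute type from linux/rtnetlink.h
RTA_ALIGNTO = 4     # netlink attributes are padded to 4-byte boundaries


def rta_align(length):
    """Round length up to the next multiple of RTA_ALIGNTO."""
    return (length + RTA_ALIGNTO - 1) & ~(RTA_ALIGNTO - 1)


def pack_rtattr(rta_type, payload):
    """Pack struct rtattr (u16 rta_len, u16 rta_type) plus padded payload."""
    rta_len = 4 + len(payload)
    padding = b"\0" * (rta_align(rta_len) - rta_len)
    return struct.pack("=HH", rta_len, rta_type) + payload + padding


def pack_nexthop(gateway, ifindex, weight=1):
    """Pack struct rtnexthop followed by a nested RTA_GATEWAY attribute."""
    gw_attr = pack_rtattr(RTA_GATEWAY, ipaddress.IPv6Address(gateway).packed)
    rtnh_len = 8 + len(gw_attr)
    # rtnh_len (u16), rtnh_flags (u8), rtnh_hops (u8, weight - 1), rtnh_ifindex (int)
    return struct.pack("=HBBi", rtnh_len, 0, weight - 1, ifindex) + gw_attr


def pack_multipath(nexthops):
    """Pack a list of (gateway, ifindex) pairs into one RTA_MULTIPATH attribute."""
    body = b"".join(pack_nexthop(gw, ifindex) for gw, ifindex in nexthops)
    return pack_rtattr(RTA_MULTIPATH, body)


# Hypothetical ifindex values; real ones come from the running system.
attr = pack_multipath([("2001:db8:1::2", 2), ("2001:db8:2::2", 3)])
print(len(attr))
```

Each nexthop occupies 28 bytes here (8 for `struct rtnexthop`, 20 for the gateway attribute), so a two-hop route fits in a single 60-byte attribute instead of two separate route messages.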
    • net: ipv6: Use compressed IPv6 addresses showing route replace error · 7d4d5065
      David Ahern committed
      ip6_print_replace_route_err logs an error if a route replace fails with
      IPv6 addresses in the full format, e.g.:
      
      IPv6: IPV6: multipath route replace failed (check consistency of installed routes): 2001:0db8:0200:0000:0000:0000:0000:0000 nexthop 2001:0db8:0001:0000:0000:0000:0000:0016 ifi 0
      
      Change the message to dump the addresses in the compressed format.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
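To make the difference between the two formats concrete, the short sketch below compresses the two addresses from the error message above using Python's standard `ipaddress` module. It only illustrates the text transformation; in the kernel the choice is made through a printk format specifier, not through code like this.

```python
import ipaddress


def compress(addr):
    """Return the compressed textual form of an IPv6 address."""
    return str(ipaddress.IPv6Address(addr))


full_prefix = "2001:0db8:0200:0000:0000:0000:0000:0000"
full_nexthop = "2001:0db8:0001:0000:0000:0000:0000:0016"

print(compress(full_prefix))   # 2001:db8:200::
print(compress(full_nexthop))  # 2001:db8:1::16
```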
    • net: ipv6: Change notifications for multipath delete to RTA_MULTIPATH · 16a16cd3
      David Ahern committed
      If an entire multipath route is deleted using prefix and len (without any
      nexthops), send a single RTM_DELROUTE notification with the full route
      using RTA_MULTIPATH. This is done by generating the skb before the route
      delete when all of the sibling routes are still present but sending it
      after the route has been removed from the FIB. The skip_notify flag
      is used to tell the lower fib code not to send notifications for the
      individual nexthop routes.
      
      If a route is deleted using RTA_MULTIPATH for any nexthops or a single
      nexthop entry is deleted, then the nexthops are deleted one at a time with
      notifications sent as each hop is deleted. This is necessary given that
      IPv6 allows individual hops within a route to be deleted.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: Change notifications for multipath add to RTA_MULTIPATH · 3b1137fe
      David Ahern committed
      Change ip6_route_multipath_add to send one notification with the full
      route encoded with RTA_MULTIPATH instead of a series of individual routes.
      This is done by adding a skip_notify flag to the nl_info struct. The
      flag is used to skip sending of the notification in the fib code that
      actually inserts the route. Once the full route has been added, a
      notification is generated with all nexthops.
      
      ip6_route_multipath_add handles 3 use cases: new routes, route replace,
      and route append. The multipath notification generated needs to be
      consistent with the order of the nexthops and it should be consistent
      with the order in a FIB dump which means the route with the first nexthop
      needs to be used as the route reference. For the first 2 cases (new and
      replace), a reference to the route used to send the notification is
      obtained by saving the first route added. For the append case, the last
      route added is used to loop back to its first sibling route which is
      the first nexthop in the multipath route.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
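The skip_notify idea described in the commit above can be modelled in a few lines. This is a toy model, not kernel code: `ToyFib`, `insert` and `multipath_add` are names invented for the illustration, and only the `skip_notify` flag itself comes from the commit message. The per-nexthop insert stays silent when the flag is set, and the caller emits a single notification listing every nexthop in FIB order once all inserts are done.

```python
class ToyFib:
    """Toy model of a FIB that records the notifications it would send."""

    def __init__(self):
        self.nexthops = {}        # prefix -> list of gateways, in insertion order
        self.notifications = []   # list of (event, prefix, [gateways])

    def insert(self, prefix, gateway, skip_notify=False):
        """Insert one nexthop; notify per hop unless the caller opts out."""
        self.nexthops.setdefault(prefix, []).append(gateway)
        if not skip_notify:
            self.notifications.append(("new", prefix, [gateway]))

    def multipath_add(self, prefix, gateways):
        """Insert all nexthops silently, then send one coalesced notification."""
        for gateway in gateways:
            self.insert(prefix, gateway, skip_notify=True)
        # The notification starts from the first nexthop of the route so that
        # its order matches what a FIB dump would show.
        self.notifications.append(("new", prefix, list(self.nexthops[prefix])))


fib = ToyFib()
fib.multipath_add("2001:db8:200::/120", ["2001:db8:1::2", "2001:db8:2::2"])
# Appending more nexthops yields one notification with the full route.
fib.multipath_add("2001:db8:200::/120", ["2001:db8:2::20", "2001:db8:1::20"])
for event in fib.notifications:
    print(event)
```

In the append case the second notification lists all four nexthops, which mirrors the `ip mon route` output for appends shown in the cover letter.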
    • net: ipv6: Add support to dump multipath routes via RTA_MULTIPATH attribute · beb1afac
      David Ahern committed
      IPv6 returns multipath routes as a series of individual routes making
      their display and handling by userspace different and more complicated
      than IPv4, putting the burden on the user to see that a route is part of
      a multipath route and internally creating a multipath route if desired
      (e.g., libnl does this as of commit 29b71371e764). This patch addresses
      this difference, allowing multipath routes to be returned using the
      RTA_MULTIPATH attribute.
      
      The end result is that IPv6 multipath routes can be treated and displayed
      in a format similar to IPv4:
      
          $ ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:200::/120 metric 1024
      	    nexthop via 2001:db8:1::2  dev eth1 weight 1
      	    nexthop via 2001:db8:2::2  dev eth2 weight 1
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
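The burden on userspace mentioned in the commit above looks roughly like the sketch below: given a dump that lists one route per nexthop, the consumer has to notice that several entries share a destination and fold them into one multipath route itself. The dictionary layout and the grouping key (destination plus metric) are simplifying assumptions for this example; real tools compare more attributes before treating routes as siblings.

```python
def coalesce(routes):
    """Fold per-nexthop route entries into multipath routes.

    Routes are grouped by (dst, metric); dicts preserve insertion order,
    so the result keeps the order of the original dump.
    """
    grouped = {}
    for route in routes:
        key = (route["dst"], route["metric"])
        hop = {"via": route.get("via"), "dev": route["dev"]}
        grouped.setdefault(key, []).append(hop)
    return [
        {"dst": dst, "metric": metric, "nexthops": hops}
        for (dst, metric), hops in grouped.items()
    ]


# Per-nexthop dump as IPv6 returned it before this change.
dump = [
    {"dst": "2001:db8:1::/120", "dev": "eth1", "metric": 256},
    {"dst": "2001:db8:2::/120", "dev": "eth2", "metric": 256},
    {"dst": "2001:db8:200::/120", "via": "2001:db8:1::2", "dev": "eth1", "metric": 1024},
    {"dst": "2001:db8:200::/120", "via": "2001:db8:2::2", "dev": "eth2", "metric": 1024},
]

for route in coalesce(dump):
    print(route["dst"], [hop["via"] for hop in route["nexthops"]])
```

With RTA_MULTIPATH in the dump, this folding step is no longer needed on the receiving side.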
    • net: ipv6: Allow shorthand delete of all nexthops in multipath route · 0ae81335
      David Ahern committed
      IPv4 allows multipath routes to be deleted using just the prefix and
      length. For example:
          $ ip ro ls vrf red
          unreachable default metric 8192
          1.1.1.0/24
              nexthop via 10.100.1.254  dev eth1 weight 1
              nexthop via 10.11.200.2  dev eth11.200 weight 1
          10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
          10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
      
          $ ip ro del 1.1.1.0/24 vrf red
      
          $ ip ro ls vrf red
          unreachable default metric 8192
          10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
          10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3
      
      The same notation does not work with IPv6 because of how multipath routes
      are implemented for IPv6. For IPv6 only the first nexthop of a multipath
      route is deleted if the request contains only a prefix and length. This
      leads to unnecessary complexity in userspace dealing with IPv6 multipath
      routes.
      
      This patch allows all nexthops to be deleted without specifying each one
      in the delete request. Internally, this is done by walking the sibling
      list of the route matching the specifications given (prefix, length,
      metric, protocol, etc).
      
          $ ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:200::/120 via 2001:db8:1::2 dev eth1 metric 1024  pref medium
          2001:db8:200::/120 via 2001:db8:2::2 dev eth2 metric 1024  pref medium
          ...
      
          $ ip -6 ro del vrf red 2001:db8:200::/120
      
          $ ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          ...
      
      Because IPv6 allows individual nexthops to be deleted without deleting
      the entire route, the ip6_route_multipath_del and non-multipath code
      path (ip6_route_del) have to be discriminated so that all nexthops are
      only deleted for the latter case. This is done by making the existing
      fc_type in fib6_config a u16 and then adding a new u16 field with
      fc_delete_all_nh as the first bit.
      Suggested-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
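The difference between the old and new delete behaviour can be shown with a small model. The function below is a sketch under simplifying assumptions: the route table is a flat Python list, matching uses only prefix and metric, and `delete_all_nh` stands in for the `fc_delete_all_nh` bit named in the commit message. It is not how the kernel walks the sibling list.

```python
def route_del(table, prefix, metric, delete_all_nh):
    """Delete the first matching nexthop route, or all siblings if requested.

    Returns the list of entries that were removed from the table.
    """
    matches = [r for r in table if r["dst"] == prefix and r["metric"] == metric]
    if not matches:
        return []
    doomed = matches if delete_all_nh else matches[:1]
    for route in doomed:
        table.remove(route)
    return doomed


def make_table():
    """Two sibling nexthop routes for one prefix, as in the example above."""
    return [
        {"dst": "2001:db8:200::/120", "via": "2001:db8:1::2", "dev": "eth1", "metric": 1024},
        {"dst": "2001:db8:200::/120", "via": "2001:db8:2::2", "dev": "eth2", "metric": 1024},
    ]


old_table = make_table()
route_del(old_table, "2001:db8:200::/120", 1024, delete_all_nh=False)
print(len(old_table))  # one nexthop is left behind

new_table = make_table()
route_del(new_table, "2001:db8:200::/120", 1024, delete_all_nh=True)
print(len(new_table))  # the whole multipath route is gone
```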
    • virtio_net: exploit napi_complete_done() return value · 4d6308aa
      Eric Dumazet committed
      Since commit 364b6055 ("net: busy-poll: return busypolling status to
      drivers"), napi_complete_done() returns a boolean that can be used
      by drivers to conditionally rearm interrupts.
      
      This patch changes virtio_net to use this boolean to avoid a bit of
      overhead for busy-poll users.
      
      Jason reports about 1.1% improvement for 1 byte TCP_RR (burst 100).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
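The control flow this commit describes can be sketched as a toy model. Everything below is hypothetical Python written for illustration (`ToyNapi`, `poll`, `enable_irq`); the real code is C inside the driver. The only behaviour taken from the commit message is that the completion call returns a boolean and that the driver re-arms interrupts only when it is true.

```python
class ToyNapi:
    """Toy stand-in for a NAPI context; not the kernel structure."""

    def __init__(self, busy_polling):
        self.busy_polling = busy_polling

    def complete_done(self, work_done):
        # Assumed model: report False while a busy-poller still owns the
        # context, so the driver knows interrupts are not needed yet.
        return not self.busy_polling


def poll(napi, received, budget, enable_irq):
    """Driver poll routine: re-arm interrupts only if completion succeeded."""
    if received < budget:
        if napi.complete_done(received):
            enable_irq()
    return received


rearmed = []
poll(ToyNapi(busy_polling=False), 10, 64, lambda: rearmed.append("normal"))
poll(ToyNapi(busy_polling=True), 10, 64, lambda: rearmed.append("busy"))
print(rearmed)
```

In this model only the non-busy-polling call re-arms the interrupt, which is the overhead the commit avoids for busy-poll users.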
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · a076d1bd
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2017-02-03
      
      This series contains updates to i40e/i40evf only.
      
      Jake fixes up the driver to not call i40e_vsi_kill_vlan() or
      i40e_vsi_add_vlan() when the PVID is set or when the VID is less than 1.
      Cleaned up a check which really is not needed since there is no real
      reason why we cannot just call i40e_del_mac_all_vlan() directly.  Renamed
      functions to better reflect their actual purpose and how they function
      in a more clear manner.
      
      Bimmy cleans up unused/deprecated macros.
      
      Mitch cleans up unused device ids which were intended for use when
      running Linux VF drivers under Hyper-V, but found to be not needed.
      Then cleaned up a function that is no longer needed since the client
      open and close functions were refactored.  Adds a sleep without timeout
      until the reply from the PF driver has been received since the iWARP
      client cannot continue until the operation has been completed.
      
      Tushar Dave fixes an issue seen on SPARC where the use of the 'packed'
      directive was causing kernel unaligned errors.
      
      Alex does a refactor to pull some data off of the stack and store it
      in the transmit buffer info section of the transmit ring.
      
      Alan fixes a bug which was caused by passing a bad register value to the
      firmware, by refactoring the macro INTRL_USEC_TO_REG into a static
      inline function.  Also added feedback to the user as to the actual
      interrupt rate limit being used when it differs from the requested limit.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 04 Feb, 2017 8 commits
    • net: skb_needs_check() accepts CHECKSUM_NONE for tx · 6e7bc478
      Eric Dumazet committed
      My recent change missed the fact that UFO would perform a complete
      UDP checksum before segmenting in frags.
      
      In this case skb->ip_summed is set to CHECKSUM_NONE.
      
      We need to add this valid case to skb_needs_check()
      
      Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove support for per driver ndo_busy_poll() · 79e7fff4
      Eric Dumazet committed
      We added generic support for busy polling in NAPI layer in linux-4.5
      
      No network driver uses ndo_busy_poll() anymore, we can get rid
      of the pointer in struct net_device_ops, and its use in sk_busy_loop()
      
      Saves NETIF_F_BUSY_POLL features bit.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • enic: Remove local ndo_busy_poll() implementation. · 7a655c63
      David S. Miller committed
      We do polling generically these days.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbevf: get rid of custom busy polling code · 508aac6d
      Eric Dumazet committed
      In linux-4.5, busy polling was implemented in core
      NAPI stack, meaning that all custom implementation can
      be removed from drivers.
      
      Not only do we remove lots of code, we also remove one lock
      operation in the fast path, and allow GRO to do its job.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ixgbe: get rid of custom busy polling code · 3ffc1af5
      Eric Dumazet committed
      In linux-4.5, busy polling was implemented in core
      NAPI stack, meaning that all custom implementation can
      be removed from drivers.
      
      Not only do we remove lots of code, we also remove one lock
      operation in the fast path, and allow GRO to do its job.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 52e01b84
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree, they are:
      
      1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from
         sk_buff so we only access one single cacheline in the conntrack
         hotpath. Patchset from Florian Westphal.
      
      2) Don't leak pointer to internal structures when exporting x_tables
         ruleset back to userspace, from Willem DeBruijn. This includes new
         helper functions to copy data to userspace such as xt_data_to_user()
         as well as conversions of our ip_tables, ip6_tables and arp_tables
         clients to use it. Not surprisingly, ebtables requires an ad-hoc
         update. There is also a new field in x_tables extensions to indicate
         the amount of bytes that we copy to userspace.
      
      3) Add nf_log_all_netns sysctl: This new knob allows you to enable
         logging via nf_log infrastructure for all existing netnamespaces.
         Given the effort to provide pernet syslog has been discontinued,
         let's provide a way to restore logging using netfilter kernel logging
         facilities in trusted environments. Patch from Michal Kubecek.
      
      4) Validate SCTP checksum from conntrack helper, from Davide Caratti.
      
      5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly
         a copy&paste from the original helper, from Florian Westphal.
      
      6) Reset netfilter state when duplicating packets, also from Florian.
      
      7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
         nft_meta, from Liping Zhang.
      
      8) Add missing code to deal with loopback packets from nft_meta when
         used by the netdev family, also from Liping.
      
      9) Several cleanups on nf_tables, one to remove unnecessary check from
         the netlink control plane path to add table, set and stateful objects
         and code consolidation when unregister chain hooks, from Gao Feng.
      
      10) Fix harmless reference counter underflow in IPVS that, however,
          results in problems with the introduction of the new refcount_t
          type, from David Windsor.
      
      11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp,
          from Davide Caratti.
      
      12) Missing documentation on nf_tables uapi header, from Liping Zhang.
      
      13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
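Item 1 of the Netfilter summary above describes stashing a 3-bit ctinfo value inside a pointer. The general technique is pointer tagging: if an object is aligned to at least 8 bytes, the low 3 bits of its address are always zero and can carry a small value. The sketch below models that with plain integers; the function names and the example address are made up for illustration and are not the kernel's helpers.

```python
CTINFO_BITS = 3
CTINFO_MASK = (1 << CTINFO_BITS) - 1   # 0b111


def pack(ptr, ctinfo):
    """Combine an 8-byte-aligned address and a 3-bit value into one word."""
    if ptr & CTINFO_MASK:
        raise ValueError("pointer must be aligned to 8 bytes")
    if not 0 <= ctinfo <= CTINFO_MASK:
        raise ValueError("ctinfo must fit in 3 bits")
    return ptr | ctinfo


def unpack(word):
    """Split a tagged word back into (address, ctinfo)."""
    return word & ~CTINFO_MASK, word & CTINFO_MASK


conntrack_addr = 0xFFFF888012345678   # hypothetical aligned address
word = pack(conntrack_addr, 2)
print(hex(word))
print(unpack(word) == (conntrack_addr, 2))
```

Because both values live in one word, reading them touches one memory location instead of two fields, which is the single-cacheline benefit the summary mentions.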
    • Merge branch 'mlxsw-Introduce-TC-Flower-offload-using-TCAM' · e60df624
      David S. Miller committed
      Jiri Pirko says:
      
      ====================
      mlxsw: Introduce TC Flower offload using TCAM
      
      This patchset introduces support for offloading TC cls_flower and actions
      to the Spectrum TCAM-based policy engine.
      
      The patchset contains patches to allow work with flexible keys and actions
      which are used in Spectrum TCAM.
      
      It also contains in-driver infrastructure for offloading TC rules to TCAM HW.
      The TCAM management code is simple and limited for now. It is going to be
      extended as a follow-up work.
      
      The last patch uses the previously introduced infra to implement
      cls_flower offloading. Initially, only a limited set of match-keys and
      only the drop and forward actions are supported.
      
      As a dependency, this patchset introduces parman - priority array
      area manager - as a library.
      
      v1->v2:
      - patch11:
        - use __set_bit and __test_and_clear_bit as suggested by DaveM
      - patch16:
        - Added documentation to the API functions as suggested by Tom Herbert
      - patch17:
        - use __set_bit and __clear_bit as suggested by DaveM
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mlxsw: spectrum: Implement TC flower offload · 7aa0f5aa
      Jiri Pirko committed
      Extend the existing setup_tc ndo call to allow offloading of cls_flower
      rules. Only a limited set of dissector keys and actions is supported now.
      Use previously introduced ACL infrastructure to offload cls_flower rules
      to be processed in the HW.
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>