1. 29 6月, 2018 1 次提交
  2. 05 5月, 2018 1 次提交
    • S
      openvswitch: Don't swap table in nlattr_set() after OVS_ATTR_NESTED is found · 72f17baf
      Stefano Brivio 提交于
      If an OVS_ATTR_NESTED attribute type is found while walking
      through netlink attributes, we call nlattr_set() recursively
      passing the length table for the following nested attributes, if
      different from the current one.
      
      However, once we're done with those sub-nested attributes, we
      should continue walking through attributes using the current
      table, instead of using the one related to the sub-nested
      attributes.
      
      For example, given this sequence:
      
      1  OVS_KEY_ATTR_PRIORITY
      2  OVS_KEY_ATTR_TUNNEL
      3	OVS_TUNNEL_KEY_ATTR_ID
      4	OVS_TUNNEL_KEY_ATTR_IPV4_SRC
      5	OVS_TUNNEL_KEY_ATTR_IPV4_DST
      6	OVS_TUNNEL_KEY_ATTR_TTL
      7	OVS_TUNNEL_KEY_ATTR_TP_SRC
      8	OVS_TUNNEL_KEY_ATTR_TP_DST
      9  OVS_KEY_ATTR_IN_PORT
      10 OVS_KEY_ATTR_SKB_MARK
      11 OVS_KEY_ATTR_MPLS
      
      we switch to the 'ovs_tunnel_key_lens' table on attribute #3,
      and we don't switch back to 'ovs_key_lens' while setting
      attributes #9 to #11 in the sequence. As OVS_KEY_ATTR_MPLS
      evaluates to 21, and the array size of 'ovs_tunnel_key_lens' is
      15, we also get this kind of KASan splat while accessing the
      wrong table:
      
      [ 7654.586496] ==================================================================
      [ 7654.594573] BUG: KASAN: global-out-of-bounds in nlattr_set+0x164/0xde9 [openvswitch]
      [ 7654.603214] Read of size 4 at addr ffffffffc169ecf0 by task handler29/87430
      [ 7654.610983]
      [ 7654.612644] CPU: 21 PID: 87430 Comm: handler29 Kdump: loaded Not tainted 3.10.0-866.el7.test.x86_64 #1
      [ 7654.623030] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
      [ 7654.631379] Call Trace:
      [ 7654.634108]  [<ffffffffb65a7c50>] dump_stack+0x19/0x1b
      [ 7654.639843]  [<ffffffffb53ff373>] print_address_description+0x33/0x290
      [ 7654.647129]  [<ffffffffc169b37b>] ? nlattr_set+0x164/0xde9 [openvswitch]
      [ 7654.654607]  [<ffffffffb53ff812>] kasan_report.part.3+0x242/0x330
      [ 7654.661406]  [<ffffffffb53ff9b4>] __asan_report_load4_noabort+0x34/0x40
      [ 7654.668789]  [<ffffffffc169b37b>] nlattr_set+0x164/0xde9 [openvswitch]
      [ 7654.676076]  [<ffffffffc167ef68>] ovs_nla_get_match+0x10c8/0x1900 [openvswitch]
      [ 7654.684234]  [<ffffffffb61e9cc8>] ? genl_rcv+0x28/0x40
      [ 7654.689968]  [<ffffffffb61e7733>] ? netlink_unicast+0x3f3/0x590
      [ 7654.696574]  [<ffffffffc167dea0>] ? ovs_nla_put_tunnel_info+0xb0/0xb0 [openvswitch]
      [ 7654.705122]  [<ffffffffb4f41b50>] ? unwind_get_return_address+0xb0/0xb0
      [ 7654.712503]  [<ffffffffb65d9355>] ? system_call_fastpath+0x1c/0x21
      [ 7654.719401]  [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370
      [ 7654.726298]  [<ffffffffb4f41d79>] ? update_stack_state+0x229/0x370
      [ 7654.733195]  [<ffffffffb53fe4b5>] ? kasan_unpoison_shadow+0x35/0x50
      [ 7654.740187]  [<ffffffffb53fe62a>] ? kasan_kmalloc+0xaa/0xe0
      [ 7654.746406]  [<ffffffffb53fec32>] ? kasan_slab_alloc+0x12/0x20
      [ 7654.752914]  [<ffffffffb53fe711>] ? memset+0x31/0x40
      [ 7654.758456]  [<ffffffffc165bf92>] ovs_flow_cmd_new+0x2b2/0xf00 [openvswitch]
      
      [snip]
      
      [ 7655.132484] The buggy address belongs to the variable:
      [ 7655.138226]  ovs_tunnel_key_lens+0xf0/0xffffffffffffd400 [openvswitch]
      [ 7655.145507]
      [ 7655.147166] Memory state around the buggy address:
      [ 7655.152514]  ffffffffc169eb80: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa
      [ 7655.160585]  ffffffffc169ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [ 7655.168644] >ffffffffc169ec80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa fa
      [ 7655.176701]                                                              ^
      [ 7655.184372]  ffffffffc169ed00: fa fa fa fa 00 00 00 00 fa fa fa fa 00 00 00 05
      [ 7655.192431]  ffffffffc169ed80: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
      [ 7655.200490] ==================================================================
      Reported-by: NHangbin Liu <liuhangbin@gmail.com>
      Fixes: 982b5270 ("openvswitch: Fix mask generation for nested attributes.")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      72f17baf
  3. 26 1月, 2018 1 次提交
  4. 19 1月, 2018 1 次提交
  5. 16 1月, 2018 1 次提交
  6. 16 12月, 2017 1 次提交
  7. 27 11月, 2017 1 次提交
    • Z
      openvswitch: fix the incorrect flow action alloc size · 67c8d22a
      zhangliping 提交于
      If we want to add a datapath flow, which has more than 500 vxlan outputs'
      action, we will get the following error reports:
        openvswitch: netlink: Flow action size 32832 bytes exceeds max
        openvswitch: netlink: Flow action size 32832 bytes exceeds max
        openvswitch: netlink: Actions may not be safe on all matching packets
        ... ...
      
      It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but
      this is not the root cause. For example, for a vxlan output action, we need
      about 60 bytes for the nlattr, but after it is converted to the flow
      action, it only occupies 24 bytes. This means that we can still support
      more than 1000 vxlan output actions for a single datapath flow under the
      the current 32k max limitation.
      
      So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
      shouldn't report EINVAL and keep it move on, as the judgement can be
      done by the reserve_sfa_size.
      Signed-off-by: Nzhangliping <zhangliping02@baidu.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67c8d22a
  8. 14 11月, 2017 1 次提交
  9. 13 11月, 2017 1 次提交
  10. 08 11月, 2017 1 次提交
    • Y
      openvswitch: enable NSH support · b2d0f5d5
      Yi Yang 提交于
      v16->17
       - Fixed disputed check code: keep them in nsh_push and nsh_pop
         but also add them in __ovs_nla_copy_actions
      
      v15->v16
       - Add csum recalculation for nsh_push, nsh_pop and set_nsh
         pointed out by Pravin
       - Move nsh key into the union with ipv4 and ipv6 and add
         check for nsh key in match_validate pointed out by Pravin
       - Add nsh check in validate_set and __ovs_nla_copy_actions
      
      v14->v15
       - Check size in nsh_hdr_from_nlattr
       - Fixed four small issues pointed out By Jiri and Eric
      
      v13->v14
       - Rename skb_push_nsh to nsh_push per Dave's comment
       - Rename skb_pop_nsh to nsh_pop per Dave's comment
      
      v12->v13
       - Fix NSH header length check in set_nsh
      
      v11->v12
       - Fix missing changes old comments pointed out
       - Fix new comments for v11
      
      v10->v11
       - Fix the left three disputable comments for v9
         but not fixed in v10.
      
      v9->v10
       - Change struct ovs_key_nsh to
             struct ovs_nsh_key_base base;
             __be32 context[NSH_MD1_CONTEXT_SIZE];
       - Fix new comments for v9
      
      v8->v9
       - Fix build error reported by daily intel build
         because nsh module isn't selected by openvswitch
      
      v7->v8
       - Rework nested value and mask for OVS_KEY_ATTR_NSH
       - Change pop_nsh to adapt to nsh kernel module
       - Fix many issues per comments from Jiri Benc
      
      v6->v7
       - Remove NSH GSO patches in v6 because Jiri Benc
         reworked it as another patch series and they have
         been merged.
       - Change it to adapt to nsh kernel module added by NSH
         GSO patch series
      
      v5->v6
       - Fix the rest comments for v4.
       - Add NSH GSO support for VxLAN-gpe + NSH and
         Eth + NSH.
      
      v4->v5
       - Fix many comments by Jiri Benc and Eric Garver
         for v4.
      
      v3->v4
       - Add new NSH match field ttl
       - Update NSH header to the latest format
         which will be final format and won't change
         per its author's confirmation.
       - Fix comments for v3.
      
      v2->v3
       - Change OVS_KEY_ATTR_NSH to nested key to handle
         length-fixed attributes and length-variable
         attriubte more flexibly.
       - Remove struct ovs_action_push_nsh completely
       - Add code to handle nested attribute for SET_MASKED
       - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
         to transfer NSH header data.
       - Fix comments and coding style issues by Jiri and Eric
      
      v1->v2
       - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
       - Dynamically allocate struct ovs_action_push_nsh for
         length-variable metadata.
      
      OVS master and 2.8 branch has merged NSH userspace
      patch series, this patch is to enable NSH support
      in kernel data path in order that OVS can support
      NSH in compat mode by porting this.
      Signed-off-by: NYi Yang <yi.y.yang@intel.com>
      Acked-by: NJiri Benc <jbenc@redhat.com>
      Acked-by: NEric Garver <e@erig.me>
      Acked-by: NPravin Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b2d0f5d5
  11. 11 10月, 2017 1 次提交
  12. 10 10月, 2017 1 次提交
  13. 12 8月, 2017 1 次提交
  14. 25 6月, 2017 1 次提交
    • J
      net: store port/representator id in metadata_dst · 3fcece12
      Jakub Kicinski 提交于
      Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
      representators and control messages over single set of hardware queues.
      Control messages and muxed traffic may need ordered delivery.
      
      Those requirements make it hard to comfortably use TC infrastructure today
      unless we have a way of attaching metadata to skbs at the upper device.
      Because single set of queues is used for many netdevs stopping TC/sched
      queues of all of them reliably is impossible and lower device has to
      retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
      the fastpath.
      
      This patch attempts to enable port/representative devs to attach metadata
      to skbs which carry port id.  This way representatives can be queueless and
      all queuing can be performed at the lower netdev in the usual way.
      
      Traffic arriving on the port/representative interfaces will be have
      metadata attached and will subsequently be queued to the lower device for
      transmission.  The lower device should recognize the metadata and translate
      it to HW specific format which is most likely either a special header
      inserted before the network headers or descriptor/metadata fields.
      
      Metadata is associated with the lower device by storing the netdev pointer
      along with port id so that if TC decides to redirect or mirror the new
      netdev will not try to interpret it.
      
      This is mostly for SR-IOV devices since switches don't have lower netdevs
      today.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3fcece12
  15. 14 4月, 2017 1 次提交
  16. 23 3月, 2017 1 次提交
    • A
      openvswitch: Optimize sample action for the clone use cases · 798c1661
      andy zhou 提交于
      With the introduction of open flow 'clone' action, the OVS user space
      can now translate the 'clone' action into kernel datapath 'sample'
      action, with 100% probability, to ensure that the clone semantics,
      which is that the packet seen by the clone action is the same as the
      packet seen by the action after clone, is faithfully carried out
      in the datapath.
      
      While the sample action in the datpath has the matching semantics,
      its implementation is only optimized for its original use.
      Specifically, there are two limitation: First, there is a 3 level of
      nesting restriction, enforced at the flow downloading time. This
      limit turns out to be too restrictive for the 'clone' use case.
      Second, the implementation avoid recursive call only if the sample
      action list has a single userspace action.
      
      The main optimization implemented in this series removes the static
      nesting limit check, instead, implement the run time recursion limit
      check, and recursion avoidance similar to that of the 'recirc' action.
      This optimization solve both #1 and #2 issues above.
      
      One related optimization attempts to avoid copying flow key as
      long as the actions enclosed does not change the flow key. The
      detection is performed only once at the flow downloading time.
      
      Another related optimization is to rewrite the action list
      at flow downloading time in order to save the fast path from parsing
      the sample action list in its original form repeatedly.
      Signed-off-by: NAndy Zhou <azhou@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      798c1661
  17. 17 3月, 2017 1 次提交
  18. 16 3月, 2017 1 次提交
  19. 10 2月, 2017 2 次提交
    • J
      openvswitch: Pack struct sw_flow_key. · 316d4d78
      Jarno Rajahalme 提交于
      struct sw_flow_key has two 16-bit holes. Move the most matched
      conntrack match fields there.  In some typical cases this reduces the
      size of the key that needs to be hashed into half and into one cache
      line.
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      316d4d78
    • J
      openvswitch: Add original direction conntrack tuple to sw_flow_key. · 9dd7f890
      Jarno Rajahalme 提交于
      Add the fields of the conntrack original direction 5-tuple to struct
      sw_flow_key.  The new fields are initially marked as non-existent, and
      are populated whenever a conntrack action is executed and either finds
      or generates a conntrack entry.  This means that these fields exist
      for all packets that were not rejected by conntrack as untrackable.
      
      The original tuple fields in the sw_flow_key are filled from the
      original direction tuple of the conntrack entry relating to the
      current packet, or from the original direction tuple of the master
      conntrack entry, if the current conntrack entry has a master.
      Generally, expected connections of connections having an assigned
      helper (e.g., FTP), have a master conntrack entry.
      
      The main purpose of the new conntrack original tuple fields is to
      allow matching on them for policy decision purposes, with the premise
      that the admissibility of tracked connections reply packets (as well
      as original direction packets), and both direction packets of any
      related connections may be based on ACL rules applying to the master
      connection's original direction 5-tuple.  This also makes it easier to
      make policy decisions when the actual packet headers might have been
      transformed by NAT, as the original direction 5-tuple represents the
      packet headers before any such transformation.
      
      When using the original direction 5-tuple the admissibility of return
      and/or related packets need not be based on the mere existence of a
      conntrack entry, allowing separation of admission policy from the
      established conntrack state.  While existence of a conntrack entry is
      required for admission of the return or related packets, policy
      changes can render connections that were initially admitted to be
      rejected or dropped afterwards.  If the admission of the return and
      related packets was based on mere conntrack state (e.g., connection
      being in an established state), a policy change that would make the
      connection rejected or dropped would need to find and delete all
      conntrack entries affected by such a change.  When using the original
      direction 5-tuple matching the affected conntrack entries can be
      allowed to time out instead, as the established state of the
      connection would not need to be the basis for packet admission any
      more.
      
      It should be noted that the directionality of related connections may
      be the same or different than that of the master connection, and
      neither the original direction 5-tuple nor the conntrack state bits
      carry this information.  If needed, the directionality of the master
      connection can be stored in master's conntrack mark or labels, which
      are automatically inherited by the expected related connections.
      
      The fact that neither ARP nor ND packets are trackable by conntrack
      allows mutual exclusion between ARP/ND and the new conntrack original
      tuple fields.  Hence, the IP addresses are overlaid in union with ARP
      and ND fields.  This allows the sw_flow_key to not grow much due to
      this patch, but it also means that we must be careful to never use the
      new key fields with ARP or ND packets.  ARP is easy to distinguish and
      keep mutually exclusive based on the ethernet type, but ND being an
      ICMPv6 protocol requires a bit more attention.
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9dd7f890
  20. 21 12月, 2016 1 次提交
  21. 13 11月, 2016 3 次提交
  22. 21 9月, 2016 1 次提交
  23. 09 9月, 2016 1 次提交
  24. 11 6月, 2016 1 次提交
  25. 24 4月, 2016 1 次提交
  26. 19 3月, 2016 1 次提交
    • S
      openvswitch: allow output of MPLS packets on tunnel vports · fe3a5f6c
      Simon Horman 提交于
      Currently output of MPLS packets on tunnel vports is not allowed by Open
      vSwitch. This is because historically encapsulation was done in such a way
      that the inner_protocol field of the skb needed to hold the inner protocol
      for both MPLS and tunnel encapsulation in order for GSO segmentation to be
      performed correctly.
      
      Since b2acd1dc ("openvswitch: Use regular GRE net_device instead of
      vport") Open vSwitch makes use of lwt to output to tunnel netdevs which
      perform encapsulation. As no drivers expose support for MPLS offloads this
      means that GSO packets are segmented in software by validate_xmit_skb(),
      which is called from __dev_queue_xmit(), before tunnel encapsulation occurs.
      This means that the inner protocol of MPLS is no longer needed by the time
      encapsulation occurs and the contention on the inner_protocol field of the
      skb no longer occurs.
      
      Thus it is now safe to output MPLS to tunnel vports.
      Signed-off-by: NSimon Horman <simon.horman@netronome.com>
      Reviewed-by: NJesse Gross <jesse@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe3a5f6c
  27. 17 2月, 2016 1 次提交
  28. 19 12月, 2015 1 次提交
  29. 23 10月, 2015 1 次提交
    • P
      openvswitch: Fix egress tunnel info. · fc4099f1
      Pravin B Shelar 提交于
      While transitioning to netdev based vport we broke OVS
      feature which allows user to retrieve tunnel packet egress
      information for lwtunnel devices.  Following patch fixes it
      by introducing ndo operation to get the tunnel egress info.
      Same ndo operation can be used for lwtunnel devices and compat
      ovs-tnl-vport devices. So after adding such device operation
      we can remove similar operation from ovs-vport.
      
      Fixes: 614732ea ("openvswitch: Use regular VXLAN net_device device").
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc4099f1
  30. 22 10月, 2015 2 次提交
  31. 07 10月, 2015 4 次提交
  32. 05 10月, 2015 1 次提交
  33. 16 9月, 2015 1 次提交
    • J
      openvswitch: Fix mask generation for nested attributes. · 982b5270
      Jesse Gross 提交于
      Masks were added to OVS flows in a way that was backwards compatible
      with userspace programs that did not generate masks. As a result, it is
      possible that we may receive flows that do not have a mask and we need
      to synthesize one.
      
      Generating a mask requires iterating over attributes and descending into
      nested attributes. For each level we need to know the size to generate the
      correct mask. We do this with a linked table of attribute types.
      
      Although the logic to handle these nested attributes was there in concept,
      there are a number of bugs in practice. Examples include incomplete links
      between tables, variable length attributes being treated as nested and
      missing sanity checks.
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      982b5270