1. 17 8月, 2017 1 次提交
    • L
      openvswitch: fix skb_panic due to the incorrect actions attrlen · 494bea39
      Liping Zhang 提交于
      For sw_flow_actions, the actions_len only represents the kernel part's
      size, and when we dump the actions to the userspace, we will do the
      convertions, so it's true size may become bigger than the actions_len.
      
      But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
      to alloc the skbuff, so the user_skb's size may become insufficient and
      oops will happen like this:
        skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
        ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:<NULL>
        ------------[ cut here ]------------
        kernel BUG at net/core/skbuff.c:129!
        [...]
        Call Trace:
         <IRQ>
         [<ffffffff8148be82>] skb_put+0x43/0x44
         [<ffffffff8148fabf>] skb_zerocopy+0x6c/0x1f4
         [<ffffffffa0290d36>] queue_userspace_packet+0x3a3/0x448 [openvswitch]
         [<ffffffffa0292023>] ovs_dp_upcall+0x30/0x5c [openvswitch]
         [<ffffffffa028d435>] output_userspace+0x132/0x158 [openvswitch]
         [<ffffffffa01e6890>] ? ip6_rcv_finish+0x74/0x77 [ipv6]
         [<ffffffffa028e277>] do_execute_actions+0xcc1/0xdc8 [openvswitch]
         [<ffffffffa028e3f2>] ovs_execute_actions+0x74/0x106 [openvswitch]
         [<ffffffffa0292130>] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
         [<ffffffffa0292b77>] ? key_extract+0x63c/0x8d5 [openvswitch]
         [<ffffffffa029848b>] ovs_vport_receive+0xa1/0xc3 [openvswitch]
        [...]
      
      Also we can find that the actions_len is much little than the orig_len:
        crash> struct sw_flow_actions 0xffff8812f539d000
        struct sw_flow_actions {
          rcu = {
            next = 0xffff8812f5398800,
            func = 0xffffe3b00035db32
          },
          orig_len = 1384,
          actions_len = 592,
          actions = 0xffff8812f539d01c
        }
      
      So as a quick fix, use the orig_len instead of the actions_len to alloc
      the user_skb.
      
      Last, this oops happened on our system running a relative old kernel, but
      the same risk still exists on the mainline, since we use the wrong
      actions_len from the beginning.
      
      Fixes: ccea7445 ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
      Cc: Neil McKee <neil.mckee@inmon.com>
      Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      494bea39
  2. 03 7月, 2017 1 次提交
  3. 23 3月, 2017 1 次提交
    • A
      openvswitch: Optimize sample action for the clone use cases · 798c1661
      andy zhou 提交于
      With the introduction of open flow 'clone' action, the OVS user space
      can now translate the 'clone' action into kernel datapath 'sample'
      action, with 100% probability, to ensure that the clone semantics,
      which is that the packet seen by the clone action is the same as the
      packet seen by the action after clone, is faithfully carried out
      in the datapath.
      
      While the sample action in the datpath has the matching semantics,
      its implementation is only optimized for its original use.
      Specifically, there are two limitation: First, there is a 3 level of
      nesting restriction, enforced at the flow downloading time. This
      limit turns out to be too restrictive for the 'clone' use case.
      Second, the implementation avoid recursive call only if the sample
      action list has a single userspace action.
      
      The main optimization implemented in this series removes the static
      nesting limit check, instead, implement the run time recursion limit
      check, and recursion avoidance similar to that of the 'recirc' action.
      This optimization solve both #1 and #2 issues above.
      
      One related optimization attempts to avoid copying flow key as
      long as the actions enclosed does not change the flow key. The
      detection is performed only once at the flow downloading time.
      
      Another related optimization is to rewrite the action list
      at flow downloading time in order to save the fast path from parsing
      the sample action list in its original form repeatedly.
      Signed-off-by: NAndy Zhou <azhou@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      798c1661
  4. 18 11月, 2016 1 次提交
    • A
      netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan 提交于
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into an zero based array and
      thus is unsigned entity. Using negative value is out-of-bound
      access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preffered to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is 3 byte instruction which isn't necessary if the variable is
      unsigned because x86_64 is zero extending by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a semmingly random artefact of code generation with register
      allocator being used differently. gcc decides that some variable
      needs to live in new r8+ registers and every access now requires REX
      prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
      used which is longer than [r8]
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7d03a00
  5. 11 6月, 2016 1 次提交
  6. 02 3月, 2016 1 次提交
  7. 23 10月, 2015 1 次提交
    • P
      openvswitch: Fix egress tunnel info. · fc4099f1
      Pravin B Shelar 提交于
      While transitioning to netdev based vport we broke OVS
      feature which allows user to retrieve tunnel packet egress
      information for lwtunnel devices.  Following patch fixes it
      by introducing ndo operation to get the tunnel egress info.
      Same ndo operation can be used for lwtunnel devices and compat
      ovs-tnl-vport devices. So after adding such device operation
      we can remove similar operation from ovs-vport.
      
      Fixes: 614732ea ("openvswitch: Use regular VXLAN net_device device").
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc4099f1
  8. 01 9月, 2015 1 次提交
  9. 30 8月, 2015 2 次提交
  10. 28 8月, 2015 3 次提交
    • J
      openvswitch: Allow matching on conntrack label · c2ac6673
      Joe Stringer 提交于
      Allow matching and setting the ct_label field. As with ct_mark, this is
      populated by executing the CT action. The label field may be modified by
      specifying a label and mask nested under the CT action. It is stored as
      metadata attached to the connection. Label modification occurs after
      lookup, and will only persist when the conntrack entry is committed by
      providing the COMMIT flag to the CT action. Labels are currently fixed
      to 128 bits in size.
      Signed-off-by: NJoe Stringer <joestringer@nicira.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c2ac6673
    • J
      openvswitch: Add conntrack action · 7f8a436e
      Joe Stringer 提交于
      Expose the kernel connection tracker via OVS. Userspace components can
      make use of the CT action to populate the connection state (ct_state)
      field for a flow. This state can be subsequently matched.
      
      Exposed connection states are OVS_CS_F_*:
      - NEW (0x01) - Beginning of a new connection.
      - ESTABLISHED (0x02) - Part of an existing connection.
      - RELATED (0x04) - Related to an established connection.
      - INVALID (0x20) - Could not track the connection for this packet.
      - REPLY_DIR (0x40) - This packet is in the reply direction for the flow.
      - TRACKED (0x80) - This packet has been sent through conntrack.
      
      When the CT action is executed by itself, it will send the packet
      through the connection tracker and populate the ct_state field with one
      or more of the connection state flags above. The CT action will always
      set the TRACKED bit.
      
      When the COMMIT flag is passed to the conntrack action, this specifies
      that information about the connection should be stored. This allows
      subsequent packets for the same (or related) connections to be
      correlated with this connection. Sending subsequent packets for the
      connection through conntrack allows the connection tracker to consider
      the packets as ESTABLISHED, RELATED, and/or REPLY_DIR.
      
      The CT action may optionally take a zone to track the flow within. This
      allows connections with the same 5-tuple to be kept logically separate
      from connections in other zones. If the zone is specified, then the
      "ct_zone" match field will be subsequently populated with the zone id.
      
      IP fragments are handled by transparently assembling them as part of the
      CT action. The maximum received unit (MRU) size is tracked so that
      refragmentation can occur during output.
      
      IP frag handling contributed by Andy Zhou.
      
      Based on original design by Justin Pettit.
      Signed-off-by: NJoe Stringer <joestringer@nicira.com>
      Signed-off-by: NJustin Pettit <jpettit@nicira.com>
      Signed-off-by: NAndy Zhou <azhou@nicira.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f8a436e
    • J
      openvswitch: Move MASKED* macros to datapath.h · be26b9a8
      Joe Stringer 提交于
      This will allow the ovs-conntrack code to reuse these macros.
      Signed-off-by: NJoe Stringer <joestringer@nicira.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be26b9a8
  11. 22 7月, 2015 1 次提交
  12. 02 6月, 2015 1 次提交
    • N
      openvswitch: include datapath actions with sampled-packet upcall to userspace · ccea7445
      Neil McKee 提交于
      If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
      OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
      in the upcall.
      
      This Directly associates the sampled packet with the path it takes
      through the virtual switch. Path information currently includes mangling,
      encapsulation and decapsulation actions for tunneling protocols GRE,
      VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
      changes to accommodate datapath actions that may be added in the
      future.
      
      Adding path information enhances visibility into complex virtual
      networks.
      Signed-off-by: NNeil McKee <neil.mckee@inmon.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccea7445
  13. 13 3月, 2015 1 次提交
    • E
      net: Introduce possible_net_t · 0c5c9fb5
      Eric W. Biederman 提交于
      Having to say
      > #ifdef CONFIG_NET_NS
      > 	struct net *net;
      > #endif
      
      in structures is a little bit wordy and a little bit error prone.
      
      Instead it is possible to say:
      > typedef struct {
      > #ifdef CONFIG_NET_NS
      >       struct net *net;
      > #endif
      > } possible_net_t;
      
      And then in a header say:
      
      > 	possible_net_t net;
      
      Which is cleaner and easier to use and easier to test, as the
      possible_net_t is always there no matter what the compile options.
      
      Further this allows read_pnet and write_pnet to be functions in all
      cases which is better at catching typos.
      
      This change adds possible_net_t, updates the definitions of read_pnet
      and write_pnet, updates optional struct net * variables that
      write_pnet uses on to have the type possible_net_t, and finally fixes
      up the b0rked users of read_pnet and write_pnet.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c5c9fb5
  14. 10 11月, 2014 4 次提交
  15. 06 11月, 2014 1 次提交
  16. 06 10月, 2014 1 次提交
  17. 16 9月, 2014 4 次提交
  18. 31 7月, 2014 1 次提交
  19. 17 5月, 2014 1 次提交
  20. 15 2月, 2014 1 次提交
  21. 07 1月, 2014 3 次提交
  22. 20 11月, 2013 1 次提交
  23. 23 10月, 2013 1 次提交
  24. 04 10月, 2013 2 次提交
  25. 18 9月, 2013 1 次提交
  26. 24 8月, 2013 1 次提交
    • A
      openvswitch: Mega flow implementation · 03f0d916
      Andy Zhou 提交于
      Add wildcarded flow support in kernel datapath.
      
      Wildcarded flow can improve OVS flow set up performance by avoid sending
      matching new flows to the user space program. The exact performance boost
      will largely dependent on wildcarded flow hit rate.
      
      In case all new flows hits wildcard flows, the flow set up rate is
      within 5% of that of linux bridge module.
      
      Pravin has made significant contributions to this patch. Including API
      clean ups and bug fixes.
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NAndy Zhou <azhou@nicira.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      03f0d916
  27. 20 6月, 2013 2 次提交