提交 · f0b07bb151b098d291fd1fd71ef7a2df56fb124a · openanolis / cloud-kernel

30 3月, 2018 1 次提交

net: Introduce net_rwsem to protect net_namespace_list · f0b07bb1

由 Kirill Tkhai 提交于 3月 29, 2018

rtnl_lock() is used everywhere, and contention is very high.
When someone wants to iterate over alive net namespaces,
he/she has no a possibility to do that without exclusive lock.
But the exclusive rtnl_lock() in such places is overkill,
and it just increases the contention. Yes, there is already
for_each_net_rcu() in kernel, but it requires rcu_read_lock(),
and this can't be sleepable. Also, sometimes it may be need
really prevent net_namespace_list growth, so for_each_net_rcu()
is not fit there.

This patch introduces new rw_semaphore, which will be used
instead of rtnl_mutex to protect net_namespace_list. It is
sleepable and allows not-exclusive iterations over net
namespaces list. It allows to stop using rtnl_lock()
in several places (what is made in next patches) and makes
less the time, we keep rtnl_mutex. Here we just add new lock,
while the explanation of we can remove rtnl_lock() there are
in next patches.

Fine grained locks generally are better, then one big lock,
so let's do that with net_namespace_list, while the situation
allows that.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f0b07bb1

28 3月, 2018 1 次提交

net: Drop pernet_operations::async · 2f635cee

由 Kirill Tkhai 提交于 3月 27, 2018

Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f635cee

18 3月, 2018 1 次提交

net: Convert ovs_net_ops · ec716650

由 Kirill Tkhai 提交于 3月 15, 2018

These pernet_operations initialize and destroy net_generic()
data pointed by ovs_net_id. Exit method destroys vports from
alive net to exiting net. Since they are only pernet_operations
interested in this data, and exit method is executed under
exclusive global lock (ovs_mutex), they are safe to be executed
in parallel.
Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec716650

12 3月, 2018 1 次提交

openvswitch: meter: fix the incorrect calculation of max delta_t · ddc502df

由 zhangliping 提交于 3月 09, 2018

Max delat_t should be the full_bucket/rate instead of the full_bucket.
Also report EINVAL if the rate is zero.

Fixes: 96fbc13d ("openvswitch: Add meter infrastructure")
Cc: Andy Zhou <azhou@ovn.org>
Signed-off-by: Nzhangliping <zhangliping02@baidu.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddc502df

09 3月, 2018 1 次提交

openvswitch: fix vport packet length check. · 46e371f0

由 William Tu 提交于 3月 07, 2018

When sending a packet to a tunnel device, the dev's hard_header_len
could be larger than the skb->len in function packet_length().
In the case of ip6gretap/erspan, hard_header_len = LL_MAX_HEADER + t_hlen,
which is around 180, and an ARP packet sent to this tunnel has
skb->len = 42. This causes the 'unsign int length' to become super
large because it is negative value, causing the later ovs_vport_send
to drop it due to over-mtu size. The patch fixes it by setting it to 0.
Signed-off-by: NWilliam Tu <u9012063@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46e371f0

01 2月, 2018 1 次提交

openvswitch: Remove padding from packet before L3+ conntrack processing · 9382fe71

由 Ed Swierk 提交于 1月 31, 2018

IPv4 and IPv6 packets may arrive with lower-layer padding that is not
included in the L3 length. For example, a short IPv4 packet may have
up to 6 bytes of padding following the IP payload when received on an
Ethernet device with a minimum packet length of 64 bytes.

Higher-layer processing functions in netfilter (e.g. nf_ip_checksum(),
and help() in nf_conntrack_ftp) assume skb->len reflects the length of
the L3 header and payload, rather than referring back to
ip_hdr->tot_len or ipv6_hdr->payload_len, and get confused by
lower-layer padding.

In the normal IPv4 receive path, ip_rcv() trims the packet to
ip_hdr->tot_len before invoking netfilter hooks. In the IPv6 receive
path, ip6_rcv() does the same using ipv6_hdr->payload_len. Similarly
in the br_netfilter receive path, br_validate_ipv4() and
br_validate_ipv6() trim the packet to the L3 length before invoking
netfilter hooks.

Currently in the OVS conntrack receive path, ovs_ct_execute() pulls
the skb to the L3 header but does not trim it to the L3 length before
calling nf_conntrack_in(NF_INET_PRE_ROUTING). When
nf_conntrack_proto_tcp encounters a packet with lower-layer padding,
nf_ip_checksum() fails causing a "nf_ct_tcp: bad TCP checksum" log
message. While extra zero bytes don't affect the checksum, the length
in the IP pseudoheader does. That length is based on skb->len, and
without trimming, it doesn't match the length the sender used when
computing the checksum.

In ovs_ct_execute(), trim the skb to the L3 length before higher-layer
processing.
Signed-off-by: NEd Swierk <eswierk@skyportsystems.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9382fe71

31 1月, 2018 1 次提交

openvswitch: meter: Use 64-bit arithmetic instead of 32-bit · 5b7789e8

由 Gustavo A. R. Silva 提交于 1月 30, 2018

Add suffix LL to constant 1000 in order to give the compiler
complete information about the proper arithmetic to use. Notice
that this constant is used in a context that expects an expression
of type long long int (64 bits, signed).

The expression (band->burst_size + band->rate) * 1000 is currently
being evaluated using 32-bit arithmetic.

Addresses-Coverity-ID: 1461563 ("Unintentional integer overflow")
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5b7789e8

26 1月, 2018 1 次提交

openvswitch: add erspan version I and II support · fc1372f8

由 William Tu 提交于 1月 25, 2018

The patch adds support for openvswitch to configure erspan
v1 and v2.  The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr is added
to uapi as a binary blob to support all ERSPAN v1 and v2's
fields.  Note that Previous commit "openvswitch: Add erspan tunnel
support." was reverted since it does not design properly.
Signed-off-by: NWilliam Tu <u9012063@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc1372f8

19 1月, 2018 1 次提交

flow_netlink: Remove unneeded semicolons · 89290b83

由 Christopher Díaz Riveros 提交于 1月 17, 2018

Trivial fix removes unneeded semicolons after if blocks.

This issue was detected by using the Coccinelle software.
Signed-off-by: NChristopher Díaz Riveros <chrisadr@gentoo.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

89290b83

16 1月, 2018 1 次提交

Revert "openvswitch: Add erspan tunnel support." · 95a33208

由 William Tu 提交于 1月 12, 2018

This reverts commit ceaa001a.

The OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS attr should be designed
as a nested attribute to support all ERSPAN v1 and v2's fields.
The current attr is a be32 supporting only one field.  Thus, this
patch reverts it and later patch will redo it using nested attr.
Signed-off-by: NWilliam Tu <u9012063@gmail.com>
Cc: Jiri Benc <jbenc@redhat.com>
Cc: Pravin Shelar <pshelar@ovn.org>
Acked-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95a33208

09 1月, 2018 1 次提交

net: ovs: remove unused hardirq.h · 419091f1

由 Yang Shi 提交于 1月 09, 2018

Preempt counter APIs have been split out, currently, hardirq.h just
includes irq_enter/exit APIs which are not used by openvswitch at all.

So, remove the unused hardirq.h.
Signed-off-by: NYang Shi <yang.s@alibaba-inc.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: dev@openvswitch.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

419091f1

03 1月, 2018 1 次提交

openvswitch: drop unneeded newline · e0b10844

由 Julia Lawall 提交于 12月 27, 2017

OVS_NLERR prints a newline at the end of the message string, so the
message string does not need to include a newline explicitly.  Done
using Coccinelle.
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0b10844

22 12月, 2017 1 次提交

openvswitch: Fix pop_vlan action for double tagged frames · c48e7473

由 Eric Garver 提交于 12月 20, 2017

skb_vlan_pop() expects skb->protocol to be a valid TPID for double
tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
shift the true ethertype into position for us.

Fixes: 5108bbad ("openvswitch: add processing of L3 packets")
Signed-off-by: NEric Garver <e@erig.me>
Reviewed-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c48e7473

16 12月, 2017 1 次提交

net: erspan: refactor existing erspan code · 1d7e2ed2

由 William Tu 提交于 12月 13, 2017

The patch refactors the existing erspan implementation in order
to support erspan version 2, which has additional metadata.  So, in
stead of having one 'struct erspanhdr' holding erspan version 1,
breaks it into 'struct erspan_base_hdr' and 'struct erspan_metadata'.
Signed-off-by: NWilliam Tu <u9012063@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d7e2ed2

03 12月, 2017 1 次提交

openvswitch: do not propagate headroom updates to internal port · 183dea58

由 Paolo Abeni 提交于 11月 30, 2017

After commit 3a927bc7 ("ovs: propagate per dp max headroom to
all vports") the need_headroom for the internal vport is updated
accordingly to the max needed headroom in its datapath.

That avoids the pskb_expand_head() costs when sending/forwarding
packets towards tunnel devices, at least for some scenarios.

We still require such copy when using the ovs-preferred configuration
for vxlan tunnels:

    br_int
  /       \
tap      vxlan
           (remote_ip:X)

br_phy
     \
    NIC

where the route towards the IP 'X' is via 'br_phy'.

When forwarding traffic from the tap towards the vxlan device, we
will call pskb_expand_head() in vxlan_build_skb() because
br-phy->needed_headroom is equal to tun->needed_headroom.

With this change we avoid updating the internal vport needed_headroom,
so that in the above scenario no head copy is needed, giving 5%
performance improvement in UDP throughput test.

As a trade-off, packets sent from the internal port towards a tunnel
device will now experience the head copy overhead. The rationale is
that the latter use-case is less relevant performance-wise.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

183dea58

30 11月, 2017 1 次提交

openvswitch: use ktime_get_ts64() instead of ktime_get_ts() · 311af51d

由 Arnd Bergmann 提交于 11月 27, 2017

timespec is deprecated because of the y2038 overflow, so let's convert
this one to ktime_get_ts64(). The code is already safe even on 32-bit
architectures, since it uses monotonic times. On 64-bit architectures,
nothing changes, while on 32-bit architectures this avoids one
type conversion.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

311af51d

27 11月, 2017 2 次提交

openvswitch: fix the incorrect flow action alloc size · 67c8d22a

由 zhangliping 提交于 11月 25, 2017

If we want to add a datapath flow, which has more than 500 vxlan outputs'
action, we will get the following error reports:
  openvswitch: netlink: Flow action size 32832 bytes exceeds max
  openvswitch: netlink: Flow action size 32832 bytes exceeds max
  openvswitch: netlink: Actions may not be safe on all matching packets
  ... ...

It seems that we can simply enlarge the MAX_ACTIONS_BUFSIZE to fix it, but
this is not the root cause. For example, for a vxlan output action, we need
about 60 bytes for the nlattr, but after it is converted to the flow
action, it only occupies 24 bytes. This means that we can still support
more than 1000 vxlan output actions for a single datapath flow under the
the current 32k max limitation.

So even if the nla_len(attr) is larger than MAX_ACTIONS_BUFSIZE, we
shouldn't report EINVAL and keep it move on, as the judgement can be
done by the reserve_sfa_size.
Signed-off-by: Nzhangliping <zhangliping02@baidu.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67c8d22a

net: openvswitch: datapath: fix data type in queue_gso_packets · 2734166e

由 Gustavo A. R. Silva 提交于 11月 25, 2017

gso_type is being used in binary AND operations together with SKB_GSO_UDP.
The issue is that variable gso_type is of type unsigned short and
SKB_GSO_UDP expands to more than 16 bits:

SKB_GSO_UDP = 1 << 16

this makes any binary AND operation between gso_type and SKB_GSO_UDP to
be always zero, hence making some code unreachable and likely causing
undesired behavior.

Fix this by changing the data type of variable gso_type to unsigned int.

Addresses-Coverity-ID: 1462223
Fixes: 0c19f846 ("net: accept UFO datagrams from tuntap and packet")
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2734166e

24 11月, 2017 1 次提交

net: accept UFO datagrams from tuntap and packet · 0c19f846

由 Willem de Bruijn 提交于 11月 21, 2017

Tuntap and similar devices can inject GSO packets. Accept type
VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.

Processes are expected to use feature negotiation such as TUNSETOFFLOAD
to detect supported offload types and refrain from injecting other
packets. This process breaks down with live migration: guest kernels
do not renegotiate flags, so destination hosts need to expose all
features that the source host does.

Partially revert the UFO removal from 182e0b6b~1..d9d30adf.
This patch introduces nearly(*) no new code to simplify verification.
It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
insertion and software UFO segmentation.

It does not reinstate protocol stack support, hardware offload
(NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.

To support SKB_GSO_UDP reappearing in the stack, also reinstate
logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
by squashing in commit 93991221 ("net: skb_needs_check() removes
CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6
("net: avoid skb_warn_bad_offload false positives on UFO").

(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
ipv6_proxy_select_ident is changed to return a __be32 and this is
assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
at the end of the enum to minimize code churn.

Tested
  Booted a v4.13 guest kernel with QEMU. On a host kernel before this
  patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
  enabled, same as on a v4.13 host kernel.

  A UFO packet sent from the guest appears on the tap device:
    host:
      nc -l -p -u 8000 &
      tcpdump -n -i tap0

    guest:
      dd if=/dev/zero of=payload.txt bs=1 count=2000
      nc -u 192.16.1.1 8000 < payload.txt

  Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
  packets arriving fragmented:

    ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
    (from https://github.com/wdebruij/kerneltools/tree/master/tests)

Changes
  v1 -> v2
    - simplified set_offload change (review comment)
    - documented test procedure

Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
Reported-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c19f846

15 11月, 2017 1 次提交

openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start · b74912a2

由 Gustavo A. R. Silva 提交于 11月 14, 2017

It seems that the intention of the code is to null check the value
returned by function genlmsg_put. But the current code is null
checking the address of the pointer that holds the value returned
by genlmsg_put.

Fix this by properly null checking the value returned by function
genlmsg_put in order to avoid a pontential null pointer dereference.

Addresses-Coverity-ID: 1461561 ("Dereference before null check")
Addresses-Coverity-ID: 1461562 ("Dereference null return value")
Fixes: 96fbc13d ("openvswitch: Add meter infrastructure")
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b74912a2

14 11月, 2017 3 次提交

openvswitch: Using kfree_rcu() to simplify the code · 6dc14dc4

由 Wei Yongjun 提交于 11月 14, 2017

The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6dc14dc4

openvswitch: Make local function ovs_nsh_key_attr_size() static · 06c2351f

由 Wei Yongjun 提交于 11月 14, 2017

Fixes the following sparse warnings:

net/openvswitch/flow_netlink.c:340:8: warning:
 symbol 'ovs_nsh_key_attr_size' was not declared. Should it be static?
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

06c2351f

openvswitch: Fix return value check in ovs_meter_cmd_features() · 8a860c2b

由 Wei Yongjun 提交于 11月 14, 2017

In case of error, the function ovs_meter_cmd_reply_start() returns
ERR_PTR() not NULL. The NULL test in the return value check should
be replaced with IS_ERR().

Fixes: 96fbc13d ("openvswitch: Add meter infrastructure")
Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a860c2b

13 11月, 2017 3 次提交

openvswitch: Add meter action support · cd8a6c33

由 Andy Zhou 提交于 11月 10, 2017

Implements OVS kernel meter action support.
Signed-off-by: NAndy Zhou <azhou@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd8a6c33

openvswitch: Add meter infrastructure · 96fbc13d

由 Andy Zhou 提交于 11月 10, 2017

OVS kernel datapath so far does not support Openflow meter action.
This is the first stab at adding kernel datapath meter support.
This implementation supports only drop band type.
Signed-off-by: NAndy Zhou <azhou@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

96fbc13d

openvswitch: export get_dp() API. · 9602c01e

由 Andy Zhou 提交于 11月 10, 2017

Later patches will invoke get_dp() outside of datapath.c. Export it.
Signed-off-by: NAndy Zhou <azhou@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9602c01e

08 11月, 2017 1 次提交

openvswitch: enable NSH support · b2d0f5d5

由 Yi Yang 提交于 11月 07, 2017

v16->17
 - Fixed disputed check code: keep them in nsh_push and nsh_pop
   but also add them in __ovs_nla_copy_actions

v15->v16
 - Add csum recalculation for nsh_push, nsh_pop and set_nsh
   pointed out by Pravin
 - Move nsh key into the union with ipv4 and ipv6 and add
   check for nsh key in match_validate pointed out by Pravin
 - Add nsh check in validate_set and __ovs_nla_copy_actions

v14->v15
 - Check size in nsh_hdr_from_nlattr
 - Fixed four small issues pointed out By Jiri and Eric

v13->v14
 - Rename skb_push_nsh to nsh_push per Dave's comment
 - Rename skb_pop_nsh to nsh_pop per Dave's comment

v12->v13
 - Fix NSH header length check in set_nsh

v11->v12
 - Fix missing changes old comments pointed out
 - Fix new comments for v11

v10->v11
 - Fix the left three disputable comments for v9
   but not fixed in v10.

v9->v10
 - Change struct ovs_key_nsh to
       struct ovs_nsh_key_base base;
       __be32 context[NSH_MD1_CONTEXT_SIZE];
 - Fix new comments for v9

v8->v9
 - Fix build error reported by daily intel build
   because nsh module isn't selected by openvswitch

v7->v8
 - Rework nested value and mask for OVS_KEY_ATTR_NSH
 - Change pop_nsh to adapt to nsh kernel module
 - Fix many issues per comments from Jiri Benc

v6->v7
 - Remove NSH GSO patches in v6 because Jiri Benc
   reworked it as another patch series and they have
   been merged.
 - Change it to adapt to nsh kernel module added by NSH
   GSO patch series

v5->v6
 - Fix the rest comments for v4.
 - Add NSH GSO support for VxLAN-gpe + NSH and
   Eth + NSH.

v4->v5
 - Fix many comments by Jiri Benc and Eric Garver
   for v4.

v3->v4
 - Add new NSH match field ttl
 - Update NSH header to the latest format
   which will be final format and won't change
   per its author's confirmation.
 - Fix comments for v3.

v2->v3
 - Change OVS_KEY_ATTR_NSH to nested key to handle
   length-fixed attributes and length-variable
   attriubte more flexibly.
 - Remove struct ovs_action_push_nsh completely
 - Add code to handle nested attribute for SET_MASKED
 - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
   to transfer NSH header data.
 - Fix comments and coding style issues by Jiri and Eric

v1->v2
 - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
 - Dynamically allocate struct ovs_action_push_nsh for
   length-variable metadata.

OVS master and 2.8 branch has merged NSH userspace
patch series, this patch is to enable NSH support
in kernel data path in order that OVS can support
NSH in compat mode by porting this.
Signed-off-by: NYi Yang <yi.y.yang@intel.com>
Acked-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NEric Garver <e@erig.me>
Acked-by: NPravin Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b2d0f5d5

05 11月, 2017 1 次提交

openvswitch: reliable interface indentification in port dumps · 9354d452

由 Jiri Benc 提交于 11月 02, 2017

This patch allows reliable identification of netdevice interfaces connected
to openvswitch bridges. In particular, user space queries the netdev
interfaces belonging to the ports for statistics, up/down state, etc.
Datapath dump needs to provide enough information for the user space to be
able to do that.

Currently, only interface names are returned. This is not sufficient, as
openvswitch allows its ports to be in different name spaces and the
interface name is valid only in its name space. What is needed and generally
used in other netlink APIs, is the pair ifindex+netnsid.

The solution is addition of the ifindex+netnsid pair (or only ifindex if in
the same name space) to vport get/dump operation.

On request side, ideally the ifindex+netnsid pair could be used to
get/set/del the corresponding vport. This is not implemented by this patch
and can be added later if needed.
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9354d452

02 11月, 2017 1 次提交

License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318

由 Greg Kroah-Hartman 提交于 11月 01, 2017

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b2441318

22 10月, 2017 1 次提交

openvswitch: conntrack: mark expected switch fall-through · 279badc2

由 Gustavo A. R. Silva 提交于 10月 19, 2017

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case I placed a "fall through" comment on
its own line, which is what GCC is expecting to find.
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

279badc2

11 10月, 2017 1 次提交

openvswitch: add ct_clear action · b8226962

由 Eric Garver 提交于 10月 10, 2017

This adds a ct_clear action for clearing conntrack state. ct_clear is
currently implemented in OVS userspace, but is not backed by an action
in the kernel datapath. This is useful for flows that may modify a
packet tuple after a ct lookup has already occurred.
Signed-off-by: NEric Garver <e@erig.me>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8226962

10 10月, 2017 1 次提交

openvswitch: Add erspan tunnel support. · ceaa001a

由 William Tu 提交于 10月 04, 2017

Add erspan netlink interface for OVS.
Signed-off-by: NWilliam Tu <u9012063@gmail.com>
Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ceaa001a

05 10月, 2017 1 次提交

net: Add extack to upper device linking · 42ab19ee

由 David Ahern 提交于 10月 04, 2017

Add extack arg to netdev_upper_dev_link and netdev_master_upper_dev_link
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42ab19ee

13 9月, 2017 1 次提交

openvswitch: Fix an error handling path in 'ovs_nla_init_match_and_action()' · 5829e62a

由 Christophe JAILLET 提交于 9月 11, 2017

All other error handling paths in this function go through the 'error'
label. This one should do the same.

Fixes: 9cc9a5cb ("datapath: Avoid using stack larger than 1024.")
Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5829e62a

25 8月, 2017 1 次提交

netfilter: conntrack: make protocol tracker pointers const · b3480fe0

由 Florian Westphal 提交于 8月 12, 2017

Doesn't change generated code, but will make it easier to eventually
make the actual trackers themselvers const.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

b3480fe0

17 8月, 2017 1 次提交

openvswitch: fix skb_panic due to the incorrect actions attrlen · 494bea39

由 Liping Zhang 提交于 8月 16, 2017

For sw_flow_actions, the actions_len only represents the kernel part's
size, and when we dump the actions to the userspace, we will do the
convertions, so it's true size may become bigger than the actions_len.

But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
to alloc the skbuff, so the user_skb's size may become insufficient and
oops will happen like this:
  skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
  ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:<NULL>
  ------------[ cut here ]------------
  kernel BUG at net/core/skbuff.c:129!
  [...]
  Call Trace:
   <IRQ>
   [<ffffffff8148be82>] skb_put+0x43/0x44
   [<ffffffff8148fabf>] skb_zerocopy+0x6c/0x1f4
   [<ffffffffa0290d36>] queue_userspace_packet+0x3a3/0x448 [openvswitch]
   [<ffffffffa0292023>] ovs_dp_upcall+0x30/0x5c [openvswitch]
   [<ffffffffa028d435>] output_userspace+0x132/0x158 [openvswitch]
   [<ffffffffa01e6890>] ? ip6_rcv_finish+0x74/0x77 [ipv6]
   [<ffffffffa028e277>] do_execute_actions+0xcc1/0xdc8 [openvswitch]
   [<ffffffffa028e3f2>] ovs_execute_actions+0x74/0x106 [openvswitch]
   [<ffffffffa0292130>] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
   [<ffffffffa0292b77>] ? key_extract+0x63c/0x8d5 [openvswitch]
   [<ffffffffa029848b>] ovs_vport_receive+0xa1/0xc3 [openvswitch]
  [...]

Also we can find that the actions_len is much little than the orig_len:
  crash> struct sw_flow_actions 0xffff8812f539d000
  struct sw_flow_actions {
    rcu = {
      next = 0xffff8812f5398800,
      func = 0xffffe3b00035db32
    },
    orig_len = 1384,
    actions_len = 592,
    actions = 0xffff8812f539d01c
  }

So as a quick fix, use the orig_len instead of the actions_len to alloc
the user_skb.

Last, this oops happened on our system running a relative old kernel, but
the same risk still exists on the mainline, since we use the wrong
actions_len from the beginning.

Fixes: ccea7445 ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
Cc: Neil McKee <neil.mckee@inmon.com>
Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

494bea39

12 8月, 2017 1 次提交

openvswitch: Remove unnecessary newlines from OVS_NLERR uses · 0ed80da5

由 Joe Perches 提交于 8月 11, 2017

OVS_NLERR already adds a newline so these just add blank
lines to the logging.
Signed-off-by: NJoe Perches <joe@perches.com>
Acked-by: NJoe Stringer <joe@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ed80da5

25 7月, 2017 1 次提交

openvswitch: fix potential out of bound access in parse_ct · 69ec932e

由 Liping Zhang 提交于 7月 23, 2017

Before the 'type' is validated, we shouldn't use it to fetch the
ovs_ct_attr_lens's minlen and maxlen, else, out of bound access
may happen.

Fixes: 7f8a436e ("openvswitch: Add conntrack action")
Signed-off-by: NLiping Zhang <zlpnobody@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

69ec932e

20 7月, 2017 2 次提交

openvswitch: Optimize operations for OvS flow_stats. · c4b2bf6b

由 Tonghao Zhang 提交于 7月 17, 2017

When calling the flow_free() to free the flow, we call many times
(cpu_possible_mask, eg. 128 as default) cpumask_next(). That will
take up our CPU usage if we call the flow_free() frequently.
When we put all packets to userspace via upcall, and OvS will send
them back via netlink to ovs_packet_cmd_execute(will call flow_free).

The test topo is shown as below. VM01 sends TCP packets to VM02,
and OvS forward packtets. When testing, we use perf to report the
system performance.

VM01 --- OvS-VM --- VM02

Without this patch, perf-top show as below: The flow_free() is
3.02% CPU usage.

	4.23%  [kernel]            [k] _raw_spin_unlock_irqrestore
	3.62%  [kernel]            [k] __do_softirq
	3.16%  [kernel]            [k] __memcpy
	3.02%  [kernel]            [k] flow_free
	2.42%  libc-2.17.so        [.] __memcpy_ssse3_back
	2.18%  [kernel]            [k] copy_user_generic_unrolled
	2.17%  [kernel]            [k] find_next_bit

When applied this patch, perf-top show as below: Not shown on
the list anymore.

	4.11%  [kernel]            [k] _raw_spin_unlock_irqrestore
	3.79%  [kernel]            [k] __do_softirq
	3.46%  [kernel]            [k] __memcpy
	2.73%  libc-2.17.so        [.] __memcpy_ssse3_back
	2.25%  [kernel]            [k] copy_user_generic_unrolled
	1.89%  libc-2.17.so        [.] _int_malloc
	1.53%  ovs-vswitchd        [.] xlate_actions

With this patch, the TCP throughput(we dont use Megaflow Cache
+ Microflow Cache) between VMs is 1.18Gbs/sec up to 1.30Gbs/sec
(maybe ~10% performance imporve).

This patch adds cpumask struct, the cpu_used_mask stores the cpu_id
that the flow used. And we only check the flow_stats on the cpu we
used, and it is unncessary to check all possible cpu when getting,
cleaning, and updating the flow_stats. Adding the cpu_used_mask to
sw_flow struct does’t increase the cacheline number.
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4b2bf6b

openvswitch: Optimize updating for OvS flow_stats. · c57c054e

由 Tonghao Zhang 提交于 7月 17, 2017

In the ovs_flow_stats_update(), we only use the node
var to alloc flow_stats struct. But this is not a
common case, it is unnecessary to call the numa_node_id()
everytime. This patch is not a bugfix, but there maybe
a small increase.
Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c57c054e

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功