Commits · ec2437f42b44edc84054feb943d49e8030154c38 · openanolis / cloud-kernel

27 Sep, 2017 · 24 commits

mlxsw: spectrum_router: Use helper to check for last neighbor · ec2437f4

Authored by Arkadi Sharshevsky on Sep 25, 2017

Use list_is_last helper to check for last neighbor.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
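As a rough illustration of what a "last entry" check on a circular doubly linked list looks like, here is a minimal userspace sketch. It does not use the kernel's `list.h`; the `list_node` type and the `list_node_*` function names are hypothetical stand-ins, and the only idea borrowed from the commit is that an entry is last when its successor is the list head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a kernel-style circular doubly linked list node. */
struct list_node {
	struct list_node *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_node_init(struct list_node *head)
{
	head->next = head;
	head->prev = head;
}

/* Link entry just before head, i.e. at the tail of the list. */
static void list_node_add_tail(struct list_node *entry, struct list_node *head)
{
	assert(entry != NULL && head != NULL && entry != head);
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* An entry is the last one when its successor is the list head. */
static bool list_node_is_last(const struct list_node *entry,
			      const struct list_node *head)
{
	return entry->next == head;
}
```

A helper like this replaces an open-coded comparison of the entry's `next` pointer against the head, which is presumably the kind of check the commit tidies up.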

mlxsw: spectrum_router: Keep nexthops in a linked list · dbe4598c

Authored by Arkadi Sharshevsky on Sep 25, 2017

Keep nexthops in a linked list for easy access.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Add fields for mlxsw's meta header for adjacency table · c0859d69

Authored by Arkadi Sharshevsky on Sep 25, 2017

This patch adds fields for mlxsw's meta header which will be used to
describe the match/action behavior of the adjacency table.

The fields are:
1. Adj_index - The global index of the nexthop group in the adjacency
   table.

2. Adj_hash_index - Local index offset which is based on the packet's
   hash mod the nexthop group size.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
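The relationship between the two fields can be sketched as below. This is only an illustration under the assumption that the selected adjacency entry is the group's global index plus the hash-derived local offset; the function name and parameters are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick an adjacency table entry for a packet.
 * adj_index    - global index of the nexthop group (Adj_index)
 * packet_hash  - hash computed over the packet
 * group_size   - number of entries in the nexthop group
 */
static uint32_t model_adj_entry_index(uint32_t adj_index, uint32_t packet_hash,
				      uint32_t group_size)
{
	uint32_t adj_hash_index;

	assert(group_size > 0);
	/* Local offset: the packet hash modulo the group size (Adj_hash_index). */
	adj_hash_index = packet_hash % group_size;
	/* Assumed combination: group base plus local offset. */
	return adj_index + adj_hash_index;
}
```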

mlxsw: spectrum_dpipe: Fix indentation in header description · be2336eb

Authored by Arkadi Sharshevsky on Sep 25, 2017

Fix indentation in mlxsw_meta header's description.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-metadata-direct-access' · 390e96ec

Authored by David S. Miller on Sep 26, 2017

Daniel Borkmann says:

====================
BPF metadata for direct access

This work enables generic transfer of metadata from XDP into skb,
meaning the packet has a flexible and programmable room for meta
data, which can later be used by BPF to set various skb members
when passing up the stack. For details, please see second patch.
Support has been implemented and tested with two drivers, and
should be straightforward to add to other drivers as well which
properly support head adjustment already.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, ixgbe: add meta data support · 366a88fe

Authored by Daniel Borkmann on Sep 25, 2017

Implement support for transferring XDP meta data into skb for
ixgbe driver; before calling into the program, xdp.data_meta points
to xdp.data, where on program return with pass verdict, we call
into skb_metadata_set().

We implement this for the default ixgbe_build_skb() variant. For the
ixgbe_construct_skb() that is used when legacy-rx buffer management
mode is turned on via ethtool, I found that XDP gets 0 headroom, so
neither xdp_adjust_head() nor xdp_adjust_meta() can be used with this.
Just add a comment with explanation for this operating mode.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, nfp: add meta data support · 65d88fd0

Authored by Daniel Borkmann on Sep 25, 2017

Implement support for transferring XDP meta data into skb for
nfp driver; before calling into the program, xdp.data_meta points
to xdp.data, where on program return with pass verdict, we call
into skb_metadata_set().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: improve selftests and add tests for meta pointer · 22c88526

Authored by Daniel Borkmann on Sep 25, 2017

Add various test_verifier selftests, and a simple xdp/tc functional
test that is attached to veths. Also let new versions of clang
use the recently added -mcpu=probe support [1] for the BPF target,
so that it can probe the underlying kernel for BPF insn set extensions.
We could also just set this option always, where older versions just
ignore it and give a note to the user that the -mcpu value is not
supported, but given that emitting the note cannot be turned off from
the clang side, let's not confuse users running selftests with it, and
thus fall back to the default generic one when we see that clang doesn't
support it. Also allow the CPU option to be overridden in the Makefile
from the command line.

  [1] https://github.com/llvm-mirror/llvm/commit/d7276a40d87b89aed89978dec6457a5b8b3a0db5
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: update bpf.h uapi header for tools · ac29991b

Authored by Daniel Borkmann on Sep 25, 2017

Looks like a couple of updates did not get carried into tools/include/uapi/,
so copy the bpf.h header as usual to pull in latest updates.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add meta pointer for direct access · de8f3a83

Authored by Daniel Borkmann on Sep 25, 2017

This work enables generic transfer of metadata from XDP into skb. The
basic idea is that we can make use of the fact that the resulting skb
must be linear and already comes with a larger headroom for supporting
bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
for adjusting a new pointer called xdp->data_meta. Thus, the packet has
a flexible and programmable room for meta data, followed by the actual
packet data. struct xdp_buff is therefore laid out such that we first
point to data_hard_start, then data_meta directly prepended to data,
followed by data_end marking the end of the packet. bpf_xdp_adjust_head() takes into
account whether we have meta data already prepended and if so, memmove()s
this along with the given offset provided there's enough room.

xdp->data_meta is optional and programs are not required to use it. The
rationale is that when we process the packet in XDP (e.g. as DoS filter),
we can push further meta data along with it for the XDP_PASS case, and
give the guarantee that a clsact ingress BPF program on the same device
can pick this up for further post-processing. Since we work with skb
there, we can also set skb->mark, skb->priority or other skb meta data
out of BPF, thus having this scratch space generic and programmable
allows for more flexibility than defining a direct 1:1 transfer of
potentially new XDP members into skb (it's also more efficient as we
don't need to initialize/handle each of such new members). The facility
also works together with GRO aggregation. The scratch space at the head
of the packet can be a multiple of 4 bytes, up to 32 bytes in size. Drivers
not yet supporting xdp->data_meta can simply set xdp->data_meta to
xdp->data + 1; bpf_xdp_adjust_meta() will detect this and bail out,
such that the subsequent match against xdp->data for later access is
guaranteed to fail.

The verifier treats xdp->data_meta/xdp->data the same way as we treat
xdp->data/xdp->data_end pointer comparisons. The requirement for doing
the compare against xdp->data is that it hasn't been modified from its
original address we got from ctx access. It may have a range marking
already from prior successful xdp->data/xdp->data_end pointer comparisons
though.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
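The constraints described in this commit message can be modelled in plain userspace C. The sketch below is not the kernel helper: it uses byte offsets instead of pointers, collapses all error codes to -1, and every `model_` name is hypothetical. It only encodes the rules stated above: the meta area sits between data_hard_start and data, its length must be a multiple of 4 bytes and at most 32 bytes, and a driver without support is recognised by data_meta being set past data.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_META_MAX 32	/* upper bound quoted in the commit message */

/* Offsets into one packet buffer: hard_start <= data_meta <= data <= data_end. */
struct model_xdp_buff {
	long hard_start;
	long data_meta;
	long data;
	long data_end;
};

/* Move data_meta by offset (negative grows the meta area). 0 on success, -1 if refused. */
static int model_adjust_meta(struct model_xdp_buff *xdp, long offset)
{
	long meta, metalen;

	assert(xdp != NULL);
	/* Unsupported driver: data_meta was set to data + 1. */
	if (xdp->data_meta > xdp->data)
		return -1;

	meta = xdp->data_meta + offset;
	/* The meta area must stay inside the headroom in front of data. */
	if (meta < xdp->hard_start || meta > xdp->data)
		return -1;

	metalen = xdp->data - meta;
	/* Length must be a multiple of 4 bytes and at most 32 bytes. */
	if (metalen % 4 != 0 || metalen > MODEL_META_MAX)
		return -1;

	xdp->data_meta = meta;
	return 0;
}
```

With 64 bytes of headroom, a request for 8 bytes of meta data succeeds, while a request that would leave a 10-byte or 40-byte meta area is refused.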

bpf: rename bpf_compute_data_end into bpf_compute_data_pointers · 6aaae2b6

Authored by Daniel Borkmann on Sep 25, 2017

Just do the rename into bpf_compute_data_pointers() as we'll add
one more pointer here to recompute.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcm63xx_enet: Use setup_timer and mod_timer · 3bd3b9ed

Authored by Himanshu Jha on Sep 24, 2017

Use setup_timer and mod_timer API instead of structure assignments.

This is done using Coccinelle; the semantic patch used
for this is as follows:

@@
expression x,y,z,a,b;
@@

-init_timer (&x);
+setup_timer (&x, y, z);
+mod_timer (&a, b);
-x.function = y;
-x.data = z;
-x.expires = b;
-add_timer(&a);
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
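The effect of the semantic patch can be illustrated with a small userspace mock. Nothing here is kernel code: `struct model_timer` and all `model_` functions are hypothetical stand-ins whose only purpose is to show that the two helper calls leave the timer in the same state as the open-coded field assignments they replace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a timer object; not struct timer_list. */
struct model_timer {
	void (*function)(unsigned long);
	unsigned long data;
	unsigned long expires;
	bool pending;
};

static void model_timeout(unsigned long data)
{
	(void)data;	/* placeholder callback */
}

/* Old pattern: initialize, assign each field by hand, then queue. */
static void model_arm_open_coded(struct model_timer *t, unsigned long data,
				 unsigned long expires)
{
	assert(t != NULL);
	t->pending = false;		/* stands in for init_timer() */
	t->function = model_timeout;
	t->data = data;
	t->expires = expires;
	t->pending = true;		/* stands in for add_timer() */
}

/* Stand-in for setup_timer(): callback and argument set in one call. */
static void model_setup_timer(struct model_timer *t,
			      void (*fn)(unsigned long), unsigned long data)
{
	t->pending = false;
	t->function = fn;
	t->data = data;
}

/* Stand-in for mod_timer(): set the expiry and queue the timer. */
static void model_mod_timer(struct model_timer *t, unsigned long expires)
{
	t->expires = expires;
	t->pending = true;
}

/* New pattern: two helper calls replace the open-coded assignments. */
static void model_arm_with_helpers(struct model_timer *t, unsigned long data,
				   unsigned long expires)
{
	assert(t != NULL);
	model_setup_timer(t, model_timeout, data);
	model_mod_timer(t, expires);
}
```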

Merge branch 'qed-iWARP-fixes-and-enhancements' · 5b2ef20d

Authored by David S. Miller on Sep 26, 2017

Michal Kalderon says:

====================
qed: iWARP fixes and enhancements

This patch series includes several fixes and enhancements
related to iWARP.

Patch #1 is actually the last of the initial iWARP submission.
It has been delayed until now as I wanted to make sure that qedr
supports iWARP prior to enabling iWARP device detection.

iWARP changes in RDMA tree have been accepted and targeted at
kernel 4.15, therefore, all iWARP fixes for this cycle are
submitted to net-next.

Changes from v1->v2
  - Added "Fixes:" tag to commit message of patch #3
====================

Signed-off-by: Michal.Kalderon@cavium.com
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: iWARP - Add check for errors on a SYN packet · 1e99c497

Authored by Michal Kalderon on Sep 24, 2017

A SYN packet which arrives with errors from FW should be dropped.
This required adding an additional field to the ll2
rx completion data.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix maximum number of CQs for iWARP · 471115ab

Authored by Michal Kalderon on Sep 24, 2017

The maximum number of CQs supported is bound to the number
of connections supported, which differs between RoCE and iWARP.

This fixes a crash that occurred in iWARP when running 1000 sessions
using perftest.

Fixes: 67b40dcc ("qed: Implement iWARP initialization, teardown and qp operations")
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add iWARP out of order support · d1abfd0b

Authored by Michal Kalderon on Sep 24, 2017

iWARP requires OOO support which is already provided by the ll2
interface (until now used only for iSCSI offload).
The changes mostly include opening a dedicated ll2 connection for
OOO and notifying the FW about the handle id.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add iWARP enablement support · e0a8f9de

Authored by Michal Kalderon on Sep 24, 2017

This patch is the last of the initial iWARP patch series. It
adds the possibility to actually detect iWARP from the device and enable
it in the critical locations which basically make iWARP available.

It wasn't submitted until now as iWARP hadn't been accepted into
the rdma tree.
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ldmvsw: Remove redundant unlikely() · 2091c227

Authored by Tobias Klauser on Sep 26, 2017

IS_ERR() already implies unlikely(), so it can be omitted.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
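Why the extra unlikely() is redundant can be seen from a userspace approximation of the error-pointer check. The sketch below assumes a GCC- or Clang-compatible compiler for `__builtin_expect`; the `model_` names and `MODEL_MAX_ERRNO` are hypothetical stand-ins, not the kernel's definitions. Because the branch hint already lives inside the check, wrapping a call in another unlikely() adds nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095
/* Branch hint: tell the compiler the condition is expected to be false. */
#define model_unlikely(x) __builtin_expect(!!(x), 0)

/* Encode a negative errno value in a pointer. */
static inline void *model_err_ptr(long error)
{
	assert(error < 0 && error >= -MODEL_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* True if ptr lies in the top MODEL_MAX_ERRNO addresses; the hint is built in. */
static inline bool model_is_err(const void *ptr)
{
	return model_unlikely((uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO);
}
```

A caller therefore writes `if (model_is_err(p))` rather than `if (model_unlikely(model_is_err(p)))`.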

net/mlx5: Remove redundant unlikely() · 92978ee8

Authored by Tobias Klauser on Sep 26, 2017

IS_ERR() already implies unlikely(), so it can be omitted.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Remove redundant unlikely() · 1fac4b2f

Authored by Tobias Klauser on Sep 26, 2017

IS_ERR() already implies unlikely(), so it can be omitted.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

kcm: Remove redundant unlikely() · d9db5e36

Authored by Tobias Klauser on Sep 26, 2017

IS_ERR() already implies unlikely(), so it can be omitted.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Remove redundant unlikely() · 63a4e80b

Authored by Tobias Klauser on Sep 26, 2017

IS_ERR() already implies unlikely(), so it can be omitted.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

datagram: Remove redundant unlikely() · 98e4fcff

Authored by Tobias Klauser on Sep 26, 2017

IS_ERR() already implies unlikely(), so it can be omitted.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: Remove redundant unlikely() · 1f4cf93b

Authored by Tobias Klauser on Sep 26, 2017

IS_ERR() already implies unlikely(), so it can be omitted.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Sep, 2017 · 16 commits

neigh: make struct neigh_table::entry_size unsigned int · 01ccdf12

Authored by Alexey Dobriyan on Sep 23, 2017

Key length can't be negative.

Leave comparisons against nla_len() signed just in case a truncated
attribute can sneak in there.

Space savings:

	add/remove: 0/0 grow/shrink: 0/7 up/down: 0/-7 (-7)
	function                                     old     new   delta
	pneigh_delete                                273     272      -1
	mlx5e_rep_netevent_event                    1415    1414      -1
	mlx5e_create_encap_header_ipv6              1194    1193      -1
	mlx5e_create_encap_header_ipv4              1071    1070      -1
	cxgb4_l2t_get                               1104    1103      -1
	__pneigh_lookup                               69      68      -1
	__neigh_create                              2452    2451      -1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: make struct neigh_table::entry_size unsigned int · e451ae8e

Authored by Alexey Dobriyan on Sep 23, 2017

Neigh entry size can't be negative.

Space savings:

	add/remove: 0/0 grow/shrink: 0/5 up/down: 0/-7 (-7)
	function                                     old     new   delta
	lowpan_neigh_construct                        25      24      -1
	clip_seq_sub_iter                            152     151      -1
	clip_ioctl                                  1475    1474      -1
	clip_constructor                              93      92      -1
	__neigh_create                              2455    2452      -3
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: speed up skb_rbtree_purge() · 7c90584c

Authored by Eric Dumazet on Sep 23, 2017

As measured in my prior patch ("sch_netem: faster rb tree removal"),
rbtree_postorder_for_each_entry_safe() is nice looking but much slower
than using rb_next() directly, except when the tree is small enough
to fit in CPU caches (then the cost is the same).

Also note that there is not even an increase of text size :
$ size net/core/skbuff.o.before net/core/skbuff.o
   text	   data	    bss	    dec	    hex	filename
  40711	   1298	      0	  42009	   a419	net/core/skbuff.o.before
  40711	   1298	      0	  42009	   a419	net/core/skbuff.o

From: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_netem: faster rb tree removal · 3aa605f2

Authored by Eric Dumazet on Sep 23, 2017

While running TCP tests involving netem storing millions of packets,
I had the idea to speed up tfifo_reset() and did experiments.

I tried the rbtree_postorder_for_each_entry_safe() method that is
used in skb_rbtree_purge() but discovered it was slower than the
current tfifo_reset() method.

I measured time taken to release skbs with three occupation levels :
10^4, 10^5 and 10^6 skbs with three methods :

1) (current 'naive' method)

	while ((p = rb_first(&q->t_root))) {
		struct sk_buff *skb = netem_rb_to_skb(p);

		rb_erase(p, &q->t_root);
		rtnl_kfree_skbs(skb, skb);
	}

2) Use rb_next() instead of rb_first() in the loop :

	p = rb_first(&q->t_root);
	while (p) {
		struct sk_buff *skb = netem_rb_to_skb(p);

		p = rb_next(p);
		rb_erase(&skb->rbnode, &q->t_root);
		rtnl_kfree_skbs(skb, skb);
	}

3) "optimized" method using rbtree_postorder_for_each_entry_safe()

	struct sk_buff *skb, *next;

	rbtree_postorder_for_each_entry_safe(skb, next,
					     &q->t_root, rbnode) {
		rtnl_kfree_skbs(skb, skb);
	}
	q->t_root = RB_ROOT;

Results :

method_1:while (rb_first()) rb_erase() 10000 skbs in 690378 ns (69 ns per skb)
method_2:rb_first; while (p) { p = rb_next(p); ...}  10000 skbs in 541846 ns (54 ns per skb)
method_3:rbtree_postorder_for_each_entry_safe() 10000 skbs in 868307 ns (86 ns per skb)

method_1:while (rb_first()) rb_erase() 99996 skbs in 7804021 ns (78 ns per skb)
method_2:rb_first; while (p) { p = rb_next(p); ...}  100000 skbs in 5942456 ns (59 ns per skb)
method_3:rbtree_postorder_for_each_entry_safe() 100000 skbs in 11584940 ns (115 ns per skb)

method_1:while (rb_first()) rb_erase() 1000000 skbs in 108577838 ns (108 ns per skb)
method_2:rb_first; while (p) { p = rb_next(p); ...}  1000000 skbs in 82619635 ns (82 ns per skb)
method_3:rbtree_postorder_for_each_entry_safe() 1000000 skbs in 127328743 ns (127 ns per skb)

Method 2) is simply faster, probably because it maintains a smaller
working set size.

Note that this is the method we use in tcp_ofo_queue() already.

I will also change skb_rbtree_purge() in a second patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
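The shape of method 2) can be reproduced in userspace with a much simpler structure. The sketch below uses a plain unbalanced binary search tree with parent pointers instead of the kernel's red-black tree, so it says nothing about the timings quoted above; all `model_` names are hypothetical. It only shows the traversal pattern: fetch the in-order successor first, then unlink and free the current node, which is always the current minimum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Plain BST node with a parent pointer (stand-in for an rb tree node). */
struct model_node {
	struct model_node *parent, *left, *right;
	int key;
};

static struct model_node *model_insert(struct model_node **root, int key)
{
	struct model_node **link = root, *parent = NULL;
	struct model_node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->key = key;
	while (*link) {
		parent = *link;
		link = key < parent->key ? &parent->left : &parent->right;
	}
	n->parent = parent;
	*link = n;
	return n;
}

/* Leftmost node, i.e. the smallest key. */
static struct model_node *model_first(struct model_node *n)
{
	while (n && n->left)
		n = n->left;
	return n;
}

/* In-order successor of n, or NULL if n is the largest node. */
static struct model_node *model_next(struct model_node *n)
{
	if (n->right)
		return model_first(n->right);
	while (n->parent && n == n->parent->right)
		n = n->parent;
	return n->parent;
}

/* Unlink the current minimum: it has no left child by construction. */
static void model_erase_min(struct model_node **root, struct model_node *n)
{
	assert(n->left == NULL);
	if (n->right)
		n->right->parent = n->parent;
	if (n->parent)
		n->parent->left = n->right;
	else
		*root = n->right;
}

/* Method 2 pattern: take the successor before erasing the current node. */
static size_t model_purge(struct model_node **root)
{
	struct model_node *p = model_first(*root);
	size_t freed = 0;

	while (p) {
		struct model_node *victim = p;

		p = model_next(p);
		model_erase_min(root, victim);
		free(victim);
		freed++;
	}
	return freed;
}
```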

tun: delete original tun_get() and rename __tun_get() to tun_get() · 9484dc74

Authored by yuan linyu on Sep 23, 2017

It seems there is no need to keep tun_get() and __tun_get() at the same time.
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: do DCB state reset in couple of places · ba581f77

Authored by Ganesh Goudar on Sep 23, 2017

Reset the driver's DCB state in a couple of places
where it was missing.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'liquidio-fw-loading' · d958af3d

Authored by David S. Miller on Sep 25, 2017

Rick Farrington says:

====================
liquidio: firmware loading

1. Allow host driver parameter to override auto-loaded firmware (in flash).
2. Verify version of firmware that is auto-loaded from flash.
3. Change value of fw_type module parameter to reflect default firmware
   image name that is loaded by host driver (in /sys/module/liquidio/...)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: update module parameter fw_type to reflect firmware type loaded · 429cbf6b

Authored by Rick Farrington on Sep 22, 2017

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: verify firmware version when auto-loaded from flash. · b36e4820

Authored by Rick Farrington on Sep 22, 2017

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: allow override of firmware present in flash · 088b8749

Authored by Rick Farrington on Sep 22, 2017

Signed-off-by: Rick Farrington <ricardo.farrington@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-port-enabling' · 74c6042a

Authored by David S. Miller on Sep 25, 2017

Vivien Didelot says:

====================
net: dsa: port enabling

This patchset makes slave open and close symmetrical and provides
helpers for enabling or disabling a given DSA port.

Changes in v3:
  - save the phy_device change for a future patchset

Changes in v2:
  - do not remove the phy argument from port enable/disable
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: add port enable and disable helpers · fb8a6a2b

Authored by Vivien Didelot on Sep 22, 2017

Provide dsa_port_enable and dsa_port_disable helpers to respectively
enable and disable a switch port. This makes the dsa_port_set_state_now
helper static.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: make slave close symmetrical to open · 6457edfe

Authored by Vivien Didelot on Sep 22, 2017

The DSA slave open function configures the unicast MAC addresses on the
master device, enables the switch port, changes its STP state, then starts
the PHY device.

Make the close function symmetric, by first stopping the PHY device,
then changing the STP state, disabling the switch port and restoring the
master device.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
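The ordering described in this commit can be written down as a small userspace model. It does not call any DSA code: the step names, the trace structure and the `model_` functions are all hypothetical. Open records its four steps in order, close records the matching teardown steps in reverse, and a helper checks that the combined trace is a mirror image.

```c
#include <assert.h>
#include <stddef.h>

/* The four resources touched by open and close, in setup order. */
enum model_step {
	MODEL_MASTER_ADDR = 1,	/* unicast MAC addresses on the master device */
	MODEL_PORT,		/* switch port enable/disable */
	MODEL_STP,		/* STP state change */
	MODEL_PHY,		/* PHY start/stop */
};

struct model_trace {
	int steps[8];		/* +step for setup, -step for teardown */
	size_t count;
};

static void model_record(struct model_trace *t, int step)
{
	assert(t->count < sizeof(t->steps) / sizeof(t->steps[0]));
	t->steps[t->count++] = step;
}

static void model_slave_open(struct model_trace *t)
{
	model_record(t, MODEL_MASTER_ADDR);
	model_record(t, MODEL_PORT);
	model_record(t, MODEL_STP);
	model_record(t, MODEL_PHY);
}

/* Teardown mirrors setup: the last thing started is the first stopped. */
static void model_slave_close(struct model_trace *t)
{
	model_record(t, -MODEL_PHY);
	model_record(t, -MODEL_STP);
	model_record(t, -MODEL_PORT);
	model_record(t, -MODEL_MASTER_ADDR);
}

/* Returns 1 if every setup step is undone in exact reverse order. */
static int model_is_symmetric(const struct model_trace *t)
{
	size_t i;

	if (t->count % 2)
		return 0;
	for (i = 0; i < t->count / 2; i++)
		if (t->steps[i] != -t->steps[t->count - 1 - i])
			return 0;
	return 1;
}
```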

hv_netvsc: Fix the real number of queues of non-vRSS cases · 6450f8f2

Authored by Haiyang Zhang on Sep 22, 2017

For older hosts without multi-channel (vRSS) support, and some error
cases, we still need to set the real number of queues to one.
This patch adds this missing setting.

Fixes: 8195b139 ("hv_netvsc: fix deadlock on hotplug")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tun-NAPI-and-gro' · 070eb6e0

Authored by David S. Miller on Sep 25, 2017

Petar Penkov says:

====================
net: Improve code coverage of syzkaller

This patch series is intended to improve code coverage of syzkaller on
the early receive path, specifically including flow dissector, GRO,
and GRO with frags parts of the networking stack. Syzkaller exercises
the stack through the TUN driver and this is therefore where changes
reside. Current coverage through netif_receive_skb() is limited as it
does not touch on any of the aforementioned code paths. Furthermore,
for full coverage, it is necessary to have more flexibility over the
linear and non-linear data of the skbs.

The following patches address this by providing the user (syzkaller)
with the ability to send via napi_gro_receive() and napi_gro_frags().
Additionally, syzkaller can specify how many fragments there are and
how much data per fragment there is. This is done by exploiting the
convenient structure of iovecs. Finally, this patch series adds
support for exercising the flow dissector during fuzzing.

The code path including napi_gro_receive() can be enabled via the
IFF_NAPI flag.  The remainder of the changes in this patch series give
the user significantly more control over packets entering the kernel.
To avoid potential security vulnerabilities, hide the ability to send
custom skbs and the flow dissector code paths behind a
capable(CAP_NET_ADMIN) check to require special user privileges.

Changes since v2 based on feedback from Willem de Bruijn and Mahesh
Bandewar:

Patch 1/ No changes.
Patch 2/ Check if the preconditions for IFF_NAPI_FRAGS (IFF_NAPI and
	 IFF_TAP) are met before opening/attaching rather than after.
	 If they are not, change the behavior from discarding the
	 flag to rejecting the command with EINVAL.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: enable napi_gro_frags() for TUN/TAP driver · 90e33d45

Authored by Petar Penkov on Sep 22, 2017

Add a TUN/TAP receive mode that exercises the napi_gro_frags()
interface. This mode is available only in TAP mode, as the interface
expects packets with Ethernet headers.

Furthermore, packets follow the layout of the iovec_iter that was
received. The first iovec is the linear data, and every one after the
first is a fragment. If there are more fragments than the max number,
drop the packet. Additionally, invoke eth_get_headlen() to exercise flow
dissector code and to verify that the header resides in the linear data.

The napi_gro_frags() mode requires setting the IFF_NAPI_FRAGS option.
This is imposed because this mode is intended for testing via tools like
syzkaller and packetdrill, and the increased flexibility it provides can
introduce security vulnerabilities. This flag is accepted only if the
device is in TAP mode and has the IFF_NAPI flag set as well. This is
done because both of these are explicit requirements for correct
operation in this mode.
Signed-off-by: Petar Penkov <peterpenkov96@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: davem@davemloft.net
Cc: ppenkov@stanford.edu
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
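The flag precondition described here (and in the v2 change notes of the merge above) can be sketched as a single validation function. This is a userspace model, not the driver's code: the `MODEL_` constants are illustrative values chosen for this example and the function name is hypothetical. It encodes only the stated rule that the frags flag is accepted when both the TAP and NAPI flags are set, and is otherwise rejected with EINVAL.

```c
#include <errno.h>

/* Illustrative flag bits for this model only. */
#define MODEL_IFF_TAP		0x0002u
#define MODEL_IFF_NAPI		0x0010u
#define MODEL_IFF_NAPI_FRAGS	0x0020u

/*
 * Validate the requested flags before opening/attaching.
 * Returns 0 if acceptable, -EINVAL if the frags flag is requested
 * without both of its preconditions.
 */
static int model_check_napi_frags(unsigned int flags)
{
	const unsigned int required = MODEL_IFF_TAP | MODEL_IFF_NAPI;

	if ((flags & MODEL_IFF_NAPI_FRAGS) && (flags & required) != required)
		return -EINVAL;
	return 0;
}
```

Rejecting the request up front, rather than silently discarding the flag, tells the caller immediately that the requested mode is not in effect.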

openanolis / cloud-kernel · synced successfully more than 1 year ago