1. 05 Jun 2015, 4 commits
  2. 23 May 2015, 1 commit
  3. 18 May 2015, 1 commit
  4. 14 May 2015, 10 commits
  5. 04 May 2015, 2 commits
    • net: Add flow_keys digest · 2f59e1eb
      Committed by Tom Herbert
      Some users of flow keys (well, just sch_choke right now) need to pass
      flow_keys in the skbuff cb and use them for exact comparisons of flows,
      for which skb->hash is not sufficient. In order to allow the flow_keys
      structure to grow, we introduce another structure for the purpose of
      passing flow keys in the skbuff cb. We limit this structure to sixteen
      bytes and technically treat it as a digest of the flow_keys struct,
      hence its name flow_keys_digest. In the first incarnation we just copy
      the flow_keys structure up to 16 bytes -- this is the same information
      previously passed in the cb. In the future, we'll adapt this for larger
      flow_keys and could use something like SHA-1 over the whole flow_keys
      to improve the quality of the digest.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
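      A minimal sketch of the idea in C, assuming a flow_keys layout that fits
      within 16 bytes; the field set and types below are illustrative, not
      copied from a specific kernel version:

          #include <stdint.h>
          #include <string.h>

          /* Illustrative stand-in for the flow_keys of that era. */
          struct flow_keys {
                  uint32_t src;       /* IPv4 source address (network order) */
                  uint32_t dst;       /* IPv4 destination address */
                  uint32_t ports;     /* source/destination L4 ports */
                  uint16_t thoff;     /* transport header offset */
                  uint8_t  ip_proto;  /* L4 protocol */
          };

          /* 16-byte digest, small enough to fit in skb->cb. */
          struct flow_keys_digest {
                  uint8_t data[16];
          };

          static void make_flow_keys_digest(struct flow_keys_digest *digest,
                                            const struct flow_keys *keys)
          {
                  /* First incarnation: copy the leading bytes of flow_keys;
                   * a later version could hash the whole struct instead. */
                  size_t n = sizeof(*keys) < sizeof(digest->data) ?
                             sizeof(*keys) : sizeof(digest->data);

                  memset(digest, 0, sizeof(*digest));
                  memcpy(digest->data, keys, n);
          }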
    • net: Add skb_get_hash_perturb · 50fb7992
      Committed by Tom Herbert
      This calls flow_dissect and __skb_get_hash to procure a hash for a
      packet. The input includes a key to initialize jhash. This function
      does not set skb->hash.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
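      A rough, kernel-flavored sketch of the helper described above, assuming
      the kernel's jhash() and a flow_keys struct like the one sketched under
      the flow_keys digest commit; the dissector call is simplified:

          u32 skb_get_hash_perturb(const struct sk_buff *skb, u32 perturb)
          {
                  struct flow_keys keys;

                  memset(&keys, 0, sizeof(keys));
                  if (!skb_flow_dissect(skb, &keys))
                          return 0;

                  /* jhash over the dissected keys, seeded with the caller's
                   * perturbation value; skb->hash is deliberately not set. */
                  return jhash(&keys, sizeof(keys), perturb);
          }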
  6. 05 Feb 2015, 1 commit
    • xps: fix xps for stacked devices · 2bd82484
      Committed by Eric Dumazet
      A typical qdisc setup is the following:
      
      bond0 : bonding device, using HTB hierarchy
      eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc
      
      XPS allows packets to be spread across specific tx queues based on the
      cpu doing the send.
      
      The problem is that dequeues from the bond0 qdisc can happen on random
      cpus, because qdisc_run() can dequeue a batch of packets.
      
      CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
      CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0
      
      CPUB -> dequeue packet P1 from bond0
              enqueue packet on eth1/eth2
      CPUC -> dequeue packet P2 from bond0
              enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)
      
      get_xps_queue() might then select the wrong queue for P1, since the
      current cpu might be different from CPUA.
      
      P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping)
      if CPUC runs a bit faster (or CPUB spins a bit on the qdisc lock).
      
      The effect of this bug is TCP reordering and, more generally,
      sub-optimal TX queue placement. (A victim bulk flow can be migrated to
      the wrong TX queue for a while.)
      
      To fix this, we have to record the sender cpu number the first time
      dev_queue_xmit() is called for a given tx skb.
      
      We can union napi_id (used on the receive path) with sender_cpu,
      provided we clear sender_cpu in skb_scrub_packet() (credit to Willem
      for the union idea).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
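      A self-contained sketch of the union and the recording step described in
      the commit above; struct and function names here are illustrative, and
      this_cpu stands in for the kernel's raw_smp_processor_id():

          /* Illustrative slice of sk_buff: napi_id (RX-only) and sender_cpu
           * (TX-only) can share storage, since a packet is never on both
           * paths at once. sender_cpu is cleared again when the skb crosses
           * namespaces (skb_scrub_packet()). */
          struct skb_sketch {
                  union {
                          unsigned int napi_id;     /* busy-poll id, RX path */
                          unsigned int sender_cpu;  /* cpu + 1; 0 = unset, TX path */
                  };
          };

          /* Record the queuing cpu on the first transmit attempt, so a later
           * dequeue on another cpu can still pick the XPS queue of the
           * original sender. */
          static void skb_record_sender_cpu(struct skb_sketch *skb,
                                            unsigned int this_cpu)
          {
                  if (skb->sender_cpu == 0)
                          skb->sender_cpu = this_cpu + 1;
          }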
  7. 27 Jan 2015, 1 commit
  8. 11 Oct 2014, 1 commit
    • flow-dissector: Fix alignment issue in __skb_flow_get_ports · 5af7fb6e
      Committed by Alexander Duyck
      This patch addresses a kernel unaligned-access bug seen on a sparc64
      system with an igb adapter. Specifically, __skb_flow_get_ports was
      returning a be32 pointer, and the value it pointed to was then returned
      directly.
      
      In order to prevent this, it is actually easier to simply not populate
      the ports or address values when an skb is not present. In that case
      the assumption is that the data isn't needed. Rather than slow down the
      faster aligned accesses by forcing them onto the unaligned path on
      architectures that don't support efficient unaligned access, it makes
      more sense to simply switch off the bits that were copying the source
      and destination address/port in the case where we only care about the
      protocol types and lengths, which are normally 16-bit fields anyway.
      Reported-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
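      The underlying portability issue, as a self-contained sketch rather than
      the exact fix: reading a 32-bit field straight through a pointer into
      packet data can trap on strict-alignment architectures such as sparc64,
      while copying it out through memcpy() (what helpers like get_unaligned()
      boil down to) is safe everywhere:

          #include <stdint.h>
          #include <string.h>

          /* Unsafe on strict-alignment machines if p is not 4-byte aligned:
           *     return *(const uint32_t *)p;
           * Safe everywhere: let the compiler emit an alignment-tolerant
           * access by copying into a local first. */
          static uint32_t read_u32_unaligned(const unsigned char *p)
          {
                  uint32_t v;

                  memcpy(&v, p, sizeof(v));
                  return v;   /* still in network byte order; convert as needed */
          }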
  9. 06 Sep 2014, 1 commit
  10. 26 Aug 2014, 2 commits
  11. 24 Aug 2014, 2 commits
    • net: use reciprocal_scale() helper · 8fc54f68
      Committed by Daniel Borkmann
      Replace open-coded instances of (((u64) <x> * <y>) >> 32) with
      reciprocal_scale().
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
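      For reference, the helper is just the multiply-shift written once; a
      sketch consistent with the open-coded form quoted above:

          #include <stdint.h>

          /* Map a full-range 32-bit value (e.g. a flow hash) onto [0, ep_ro)
           * without a division: (val * ep_ro) >> 32. */
          static inline uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
          {
                  return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
          }

          /* Typical use: pick a queue or bucket from a hash, e.g.
           *     queue = reciprocal_scale(hash, num_tx_queues); */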
    • net: Allow raw buffers to be passed into the flow dissector. · 690e36e7
      Committed by David S. Miller
      Drivers, and perhaps other entities we have not yet considered,
      sometimes want to know how deep the protocol headers go before
      deciding how large of an SKB to allocate and how much of the packet to
      place into the linear SKB area.
      
      For example, consider a driver which has a device which DMAs into
      pools of pages and then tells the driver where the data went in the
      DMA descriptor(s).  The driver can then build an SKB and reference
      most of the data via SKB fragments (which are page/offset/length
      triplets).
      
      However at least some of the front of the packet should be placed into
      the linear SKB area, which comes before the fragments, so that packet
      processing can get at the headers efficiently.  The first thing each
      protocol layer is going to do is a "pskb_may_pull()" so we might as
      well aggregate as much of this as possible while we're building the
      SKB in the driver.
      
      Part of supporting this is that we don't have an SKB yet, so we want
      to be able to let the flow dissector operate on a raw buffer in order
      to compute the offset of the end of the headers.
      
      So now we have a __skb_flow_dissect() which takes an explicit data
      pointer and length.
      Signed-off-by: David S. Miller <davem@davemloft.net>
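      A hedged sketch of the resulting entry points; it shows only the shape
      described in the commit above (a buffer-based core plus a thin skb-based
      wrapper), and the parameter list of the real kernel function may differ:

          /* Core dissector: works on an explicit data pointer and length, so
           * callers without an skb (e.g. drivers sizing their linear area)
           * can run it over a raw buffer. Passing data == NULL means "use the
           * skb's own data and headlen". */
          bool __skb_flow_dissect(const struct sk_buff *skb,
                                  struct flow_keys *flow,
                                  void *data, int hlen);

          /* skb-based helper stays a thin wrapper around the core. */
          static inline bool skb_flow_dissect(const struct sk_buff *skb,
                                              struct flow_keys *flow)
          {
                  return __skb_flow_dissect(skb, flow, NULL, 0);
          }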
  12. 08 Jul 2014, 4 commits
  13. 24 Jun 2014, 1 commit
  14. 27 Mar 2014, 1 commit
  15. 13 Mar 2014, 1 commit
  16. 17 Feb 2014, 2 commits
  17. 11 Jan 2014, 1 commit
    • net: core: explicitly select a txq before doing l2 forwarding · f663dd9a
      Committed by Jason Wang
      Currently, the tx queue is selected implicitly in ndo_dfwd_start_xmit().
      This will cause several issues:
      
      - NETIF_F_LLTX was removed for macvlan, so the txq lock was taken for
        macvlan instead of the lower device, which misses the necessary txq
        synchronization for the lower device, such as txq stopping or freezing
        required by the dev watchdog or the control path.
      - dev_hard_start_xmit() was called with a NULL txq, which bypasses the
        net device watchdog.
      - dev_hard_start_xmit() does not check txq everywhere, which will lead
        to a crash when tso is disabled for the lower device.
      
      Fix this by explicitly introducing a new parameter to
      .ndo_select_queue() just for selecting queues in the case of l2
      forwarding offload. netdev_pick_tx() was also extended to accept this
      parameter, and dev_queue_xmit_accel() is used to do the l2 forwarding
      transmission.
      
      With these fixes, NETIF_F_LLTX can be preserved for macvlan and there's
      no need to check txq against NULL in dev_hard_start_xmit(). There's
      also no need to keep a dedicated ndo_dfwd_start_xmit(); we can just
      reuse the code of dev_queue_xmit() to do the transmission.
      
      In the future, this will also be required for macvtap l2 forwarding
      support, since it provides a necessary synchronization method.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: e1000-devel@lists.sourceforge.net
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
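      A hedged sketch of the interface change described in the commit above;
      the names follow the commit message, but the exact kernel declarations
      may differ:

          /* .ndo_select_queue() gains an accel_priv argument carrying the
           * l2-forwarding context (NULL for ordinary transmits), so the
           * driver can pick a txq explicitly for the offloaded path. */
          u16 (*ndo_select_queue)(struct net_device *dev, struct sk_buff *skb,
                                  void *accel_priv);

          /* l2-forwarding transmission reuses the regular transmit path
           * instead of a dedicated ndo_dfwd_start_xmit(): */
          int dev_queue_xmit_accel(struct sk_buff *skb, void *accel_priv);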
  18. 18 Dec 2013, 1 commit
  19. 09 Nov 2013, 1 commit
  20. 02 Nov 2013, 1 commit
  21. 26 Oct 2013, 1 commit