提交 · 9f12fbe603f7ae346b2b46008e325f0c9a68e55d · openeuler / Kernel

09 7月, 2014 1 次提交

net: filter: move load_pointer() into filter.h · 9f12fbe6

由 Zi Shen Lim 提交于 7月 03, 2014

load_pointer() is already a static inline function.
Let's move it into filter.h so BPF JIT implementations can reuse this
function.

Since we're exporting this function, let's also rename it to
bpf_load_pointer() for clarity.
Signed-off-by: NZi Shen Lim <zlim.lnx@gmail.com>
Reviewed-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f12fbe6

08 7月, 2014 6 次提交

net: Only do flow_dissector hash computation once per packet · a3b18ddb

由 Tom Herbert 提交于 7月 01, 2014

Add sw_hash flag to skbuff to indicate that skb->hash was computed
from flow_dissector. This flag is checked in skb_get_hash to avoid
repeatedly trying to compute the hash (ie. in the case that no L4 hash
can be computed).
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a3b18ddb

flow_dissector: Use IPv6 flow label in flow_dissector · 19469a87

由 Tom Herbert 提交于 7月 01, 2014

This patch implements the receive side to support RFC 6438 which is to
use the flow label as an ECMP hash. If an IPv6 flow label is set
in a packet we can use this as input for computing an L4-hash. There
should be no need to parse any transport headers in this case.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

19469a87

net: Call skb_get_hash in get_xps_queue and __skb_tx_hash · 0e001614

由 Tom Herbert 提交于 7月 01, 2014

Call standard function to get a packet hash instead of taking this from
skb->sk->sk_hash or only using skb->protocol.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e001614

flow_dissector: Abstract out hash computation · 5ed20a68

由 Tom Herbert 提交于 7月 01, 2014

Move the hash computation located in __skb_get_hash to be a separate
function which takes flow_keys as input. This will allow flow hash
computation in other contexts where we only have addresses and ports.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5ed20a68

ptp: Classify ptp over ip over vlan packets · ae5c6c6d

由 Stefan Sørensen 提交于 6月 27, 2014

This extends the ptp bpf to also match ptp over ip over vlan packets. The ptp
classes are changed to orthogonal bitfields representing version, transport
and vlan values to simplify matching.
Signed-off-by: NStefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae5c6c6d

net: Simplify ptp class checks · b9c701ed

由 Stefan Sørensen 提交于 6月 27, 2014

Replace two switch statements enumerating all valid ptp classes with an if
statement matching for not PTP_CLASS_NONE.
Signed-off-by: NStefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9c701ed

02 7月, 2014 4 次提交

pktgen: RCU-ify "if_list" to remove lock in next_to_run() · 8788370a

由 Jesper Dangaard Brouer 提交于 6月 26, 2014

The if_lock()/if_unlock() in next_to_run() adds a significant
overhead, because its called for every packet in busy loop of
pktgen_thread_worker().  (Thomas Graf originally pointed me
at this lock problem).

Removing these two "LOCK" operations should in theory save us approx
16ns (8ns x 2), as illustrated below we do save 16ns when removing
the locks and introducing RCU protection.

Performance data with CLONE_SKB==100000, TX-size=512, rx-usecs=30:
 (single CPU performance, ixgbe 10Gbit/s, E5-2630)
 * Prev   : 5684009 pps --> 175.93ns (1/5684009*10^9)
 * RCU-fix: 6272204 pps --> 159.43ns (1/6272204*10^9)
 * Diff   : +588195 pps --> -16.50ns

To understand this RCU patch, I describe the pktgen thread model
below.

In pktgen there is several kernel threads, but there is only one CPU
running each kernel thread.  Communication with the kernel threads are
done through some thread control flags.  This allow the thread to
change data structures at a know synchronization point, see main
thread func pktgen_thread_worker().

Userspace changes are communicated through proc-file writes.  There
are three types of changes, general control changes "pgctrl"
(func:pgctrl_write), thread changes "kpktgend_X"
(func:pktgen_thread_write), and interface config changes "etcX@N"
(func:pktgen_if_write).

Userspace "pgctrl" and "thread" changes are synchronized via the mutex
pktgen_thread_lock, thus only a single userspace instance can run.
The mutex is taken while the packet generator is running, by pgctrl
"start".  Thus e.g. "add_device" cannot be invoked when pktgen is
running/started.

All "pgctrl" and all "thread" changes, except thread "add_device",
communicate via the thread control flags.  The main problem is the
exception "add_device", that modifies threads "if_list" directly.

Fortunately "add_device" cannot be invoked while pktgen is running.
But there exists a race between "rem_device_all" and "add_device"
(which normally don't occur, because "rem_device_all" waits 125ms
before returning). Background'ing "rem_device_all" and running
"add_device" immediately allow the race to occur.

The race affects the threads (list of devices) "if_list".  The if_lock
is used for protecting this "if_list".  Other readers are given
lock-free access to the list under RCU read sections.

Note, interface config changes (via proc) can occur while pktgen is
running, which worries me a bit.  I'm assuming proc_remove() takes
appropriate locks, to assure no writers exists after proc_remove()
finish.

I've been running a script exercising the race condition (leading me
to fix the proc_remove order), without any issues.  The script also
exercises concurrent proc writes, while the interface config is
getting removed.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8788370a

pktgen: avoid expensive set_current_state() call in loop · baac167b

由 Jesper Dangaard Brouer 提交于 6月 26, 2014

Avoid calling set_current_state() inside the busy-loop in
pktgen_thread_worker().  In case of pkt_dev->delay, then it is still
used/enabled in pktgen_xmit() via the spin() call.

The set_current_state(TASK_INTERRUPTIBLE) uses a xchg, which implicit
is LOCK prefixed.  I've measured the asm LOCK operation to take approx
8ns on this E5-2630 CPU.  Performance increase corrolate with this
measurement.

Performance data with CLONE_SKB==100000, rx-usecs=30:
 (single CPU performance, ixgbe 10Gbit/s, E5-2630)
 * Prev:  5454050 pps --> 183.35ns (1/5454050*10^9)
 * Now:   5684009 pps --> 175.93ns (1/5684009*10^9)
 * Diff:  +229959 pps -->  -7.42ns
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

baac167b

rtnetlink: allow to register ops without ops->setup set · b0ab2fab

由 Jiri Pirko 提交于 6月 26, 2014

So far, it is assumed that ops->setup is filled up. But there might be
case that ops might make sense even without ->setup. In that case,
forbid to newlink and dellink.

This allows to register simple rtnl link ops containing only ->kind.
That allows consistent way of passing device kind (either device-kind or
slave-kind) to userspace.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b0ab2fab

net: fix some typos in comment · 9bf2b8c2

由 Ying Xue 提交于 6月 26, 2014

In commit 37112105("net:
QDISC_STATE_RUNNING dont need atomic bit ops") the
__QDISC_STATE_RUNNING is renamed to __QDISC___STATE_RUNNING,
but the old names existing in comment are not replaced with
the new name completely.
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bf2b8c2

26 6月, 2014 6 次提交

net: fix setting csum_start in skb_segment() · de843723

由 Tom Herbert 提交于 6月 25, 2014

Dave Jones reported that a crash is occurring in

csum_partial
tcp_gso_segment
inet_gso_segment
? update_dl_migration
skb_mac_gso_segment
__skb_gso_segment
dev_hard_start_xmit
sch_direct_xmit
__dev_queue_xmit
? dev_hard_start_xmit
dev_queue_xmit
ip_finish_output
? ip_output
ip_output
ip_forward_finish
ip_forward
ip_rcv_finish
ip_rcv
__netif_receive_skb_core
? __netif_receive_skb_core
? trace_hardirqs_on
__netif_receive_skb
netif_receive_skb_internal
napi_gro_complete
? napi_gro_complete
dev_gro_receive
? dev_gro_receive
napi_gro_receive

It looks like a likely culprit is that SKB_GSO_CB()->csum_start is
not set correctly when doing non-scatter gather. We are using
offset as opposed to doffset.
Reported-by: NDave Jones <davej@redhat.com>
Tested-by: NDave Jones <davej@redhat.com>
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Fixes: 7e2b10c1 ("net: Support for multiple checksums with gso")
Acked-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

de843723

ipv4: fix dst race in sk_dst_get() · f8864972

由 Eric Dumazet 提交于 6月 24, 2014

When IP route cache had been removed in linux-3.6, we broke assumption
that dst entries were all freed after rcu grace period. DST_NOCACHE
dst were supposed to be freed from dst_release(). But it appears
we want to keep such dst around, either in UDP sockets or tunnels.

In sk_dst_get() we need to make sure dst refcount is not 0
before incrementing it, or else we might end up freeing a dst
twice.

DST_NOCACHE set on a dst does not mean this dst can not be attached
to a socket or a tunnel.

Then, before actual freeing, we need to observe a rcu grace period
to make sure all other cpus can catch the fact the dst is no longer
usable.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NDormando <dormando@rydia.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8864972

net: filter: Use kcalloc/kmalloc_array to allocate arrays · 99e72a0f

由 Tobias Klauser 提交于 6月 24, 2014

Use kcalloc/kmalloc_array to make it clear we're allocating arrays. No
integer overflow can actually happen here, since len/flen is guaranteed
to be less than BPF_MAXINSNS (4096). However, this changed makes sure
we're not going to get one if BPF_MAXINSNS were ever increased.
Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
Acked-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99e72a0f

trivial: net: filter: Change kerneldoc parameter order · 677a9fd3

由 Tobias Klauser 提交于 6月 24, 2014

Change the order of the parameters to sk_unattached_filter_create() in
the kerneldoc to reflect the order they appear in the actual function.

This fix is only cosmetic, in the generated doc they still appear in the
correct order without the fix.
Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
Acked-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

677a9fd3

trivial: net: filter: Fix typo in comment · 285276e7

由 Tobias Klauser 提交于 6月 24, 2014

Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
Acked-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

285276e7

inet: reduce TLB pressure for listeners · f6d8cb2e

由 Eric Dumazet 提交于 6月 24, 2014

It seems overkill to use vmalloc() for typical listeners with less than
2048 hash buckets. Try kmalloc() and fallback to vmalloc() to reduce TLB
pressure.

Use kvfree() helper as it is now available.
Use ilog2() instead of a loop.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6d8cb2e

24 6月, 2014 1 次提交

flow_keys: Record IP layer protocol in skb_flow_dissect() · e0f31d84

由 Govindarajulu Varadarajan 提交于 6月 23, 2014

skb_flow_dissect() dissects only transport header type in ip_proto. It dose not
give any information about IPv4 or IPv6.

This patch adds new member, n_proto, to struct flow_keys. Which records the
IP layer type. i.e IPv4 or IPv6.

This can be used in netdev->ndo_rx_flow_steer driver function to dissect flow.

Adding new member to flow_keys increases the struct size by around 4 bytes.
This causes BUILD_BUG_ON(sizeof(qcb->data) < sz); to fail in
qdisc_cb_private_validate()

So increase data size by 4
Signed-off-by: NGovindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0f31d84

20 6月, 2014 1 次提交
- D
  Revert "net: return actual error on register_queue_kobjects" · 8e4946cc
  由 David S. Miller 提交于 6月 19, 2014
```
This reverts commit d36a4f4b.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  8e4946cc
19 6月, 2014 2 次提交

net: filter: fix upper BPF instruction limit · 6f9a093b

由 Kees Cook 提交于 6月 18, 2014

The original checks (via sk_chk_filter) for instruction count uses ">",
not ">=", so changing this in sk_convert_filter has the potential to break
existing seccomp filters that used exactly BPF_MAXINSNS many instructions.

Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org # v3.15+
Acked-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f9a093b

net: return actual error on register_queue_kobjects · d36a4f4b

由 Jie Liu 提交于 6月 17, 2014

Return the actual error code if call kset_create_and_add() failed

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d36a4f4b

18 6月, 2014 1 次提交

net: delete duplicate dev_set_rx_mode() call · d215d10f

由 Peter Pan(潘卫平) 提交于 6月 16, 2014

In __dev_open(), it already calls dev_set_rx_mode().
and dev_set_rx_mode() has no effect for a net device which does not have
IFF_UP flag set.

So the call of dev_set_rx_mode() is duplicate in __dev_change_flags().
Signed-off-by: NWeiping Pan <panweiping3@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d215d10f

15 6月, 2014 1 次提交

net: Fix save software checksum complete · 46fb51eb

由 Tom Herbert 提交于 6月 14, 2014

Geert reported issues regarding checksum complete and UDP.
The logic introduced in commit 7e3cead5
("net: Save software checksum complete") is not correct.

This patch:
1) Restores code in __skb_checksum_complete_header except for setting
   CHECKSUM_UNNECESSARY. This function may be calculating checksum on
   something less than skb->len.
2) Adds saving checksum to __skb_checksum_complete. The full packet
   checksum 0..skb->len is calculated without adding in pseudo header.
   This value is saved in skb->csum and then the pseudo header is added
   to that to derive the checksum for validation.
3) In both __skb_checksum_complete_header and __skb_checksum_complete,
   set skb->csum_valid to whether checksum of zero was computed. This
   allows skb_csum_unnecessary to return true without changing to
   CHECKSUM_UNNECESSARY which was done previously.
4) Copy new csum related bits in __copy_skb_header.
Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46fb51eb

13 6月, 2014 1 次提交

rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 · e5eca6d4

由 Michal Schmidt 提交于 5月 28, 2014

When running RHEL6 userspace on a current upstream kernel, "ip link"
fails to show VF information.

The reason is a kernel<->userspace API change introduced by commit
88c5b5ce ("rtnetlink: Call nlmsg_parse() with correct header length"),
after which the kernel does not see iproute2's IFLA_EXT_MASK attribute
in the netlink request.

iproute2 adjusted for the API change in its commit 63338dca4513
("libnetlink: Use ifinfomsg instead of rtgenmsg in rtnl_wilddump_req_filter").

The problem has been noticed before:
http://marc.info/?l=linux-netdev&m=136692296022182&w=2
(Subject: Re: getting VF link info seems to be broken in 3.9-rc8)

We can do better than tell those with old userspace to upgrade. We can
recognize the old iproute2 in the kernel by checking the netlink message
length. Even when including the IFLA_EXT_MASK attribute, its netlink
message is shorter than struct ifinfomsg.

With this patch "ip link" shows VF information in both old and new
iproute2 versions.
Signed-off-by: NMichal Schmidt <mschmidt@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e5eca6d4

12 6月, 2014 4 次提交

net/core: Add VF link state control policy · c5b46160

由 Doug Ledford 提交于 6月 11, 2014

Commit 1d8faf48 (net/core: Add VF link state control) added VF link state
control to the netlink VF nested structure, but failed to add a proper entry
for the new structure into the VF policy table.  Add the missing entry so
the table and the actual data copied into the netlink nested struct are in
sync.
Signed-off-by: NDoug Ledford <dledford@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5b46160

net: Save software checksum complete · 7e3cead5

由 Tom Herbert 提交于 6月 10, 2014

In skb_checksum complete, if we need to compute the checksum for the
packet (via skb_checksum) save the result as CHECKSUM_COMPLETE.
Subsequent checksum verification can use this.

Also, added csum_complete_sw flag to distinguish between software and
hardware generated checksum complete, we should always be able to trust
the software computation.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e3cead5

net: add __pskb_copy_fclone and pskb_copy_for_clone · bad93e9d

由 Octavian Purdila 提交于 6月 12, 2014

There are several instances where a pskb_copy or __pskb_copy is
immediately followed by an skb_clone.

Add a couple of new functions to allow the copy skb to be allocated
from the fclone cache and thus speed up subsequent skb_clone calls.

Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Cc: Arvid Brodin <arvid.brodin@alten.se>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Allan Stephens <allan.stephens@windriver.com>
Cc: Andrew Hendry <andrew.hendry@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bad93e9d

net: filter: fix warning on 32-bit arch · 61f83d0d

由 Alexei Starovoitov 提交于 6月 11, 2014

fix compiler warning on 32-bit architectures:

net/core/filter.c: In function '__sk_run_filter':
net/core/filter.c:540:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
net/core/filter.c:550:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
net/core/filter.c:560:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61f83d0d

11 6月, 2014 2 次提交

net: fix UDP tunnel GSO of frag_list GRO packets · 5882a07c

由 Wei-Chun Chao 提交于 6月 08, 2014

This patch fixes a kernel BUG_ON in skb_segment. It is hit when
testing two VMs on openvswitch with one VM acting as VXLAN gateway.

During VXLAN packet GSO, skb_segment is called with skb->data
pointing to inner TCP payload. skb_segment calls skb_network_protocol
to retrieve the inner protocol. skb_network_protocol actually expects
skb->data to point to MAC and it calls pskb_may_pull with ETH_HLEN.
This ends up pulling in ETH_HLEN data from header tail. As a result,
pskb_trim logic is skipped and BUG_ON is hit later.

Move skb_push in front of skb_network_protocol so that skb->data
lines up properly.

kernel BUG at net/core/skbuff.c:2999!
Call Trace:
[<ffffffff816ac412>] tcp_gso_segment+0x122/0x410
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff816b3658>] skb_udp_tunnel_segment+0xd8/0x390
[<ffffffff816b3c00>] udp4_ufo_fragment+0x120/0x140
[<ffffffff816bc74c>] inet_gso_segment+0x13c/0x390
[<ffffffff8109d742>] ? default_wake_function+0x12/0x20
[<ffffffff8164b39b>] skb_mac_gso_segment+0x9b/0x170
[<ffffffff8164b4d0>] __skb_gso_segment+0x60/0xc0
[<ffffffff8164b6b3>] dev_hard_start_xmit+0x183/0x550
[<ffffffff8166c91e>] sch_direct_xmit+0xfe/0x1d0
[<ffffffff8164bc94>] __dev_queue_xmit+0x214/0x4f0
[<ffffffff8164bf90>] dev_queue_xmit+0x10/0x20
[<ffffffff81687edb>] ip_finish_output+0x66b/0x890
[<ffffffff81688a58>] ip_output+0x58/0x90
[<ffffffff816c628f>] ? fib_table_lookup+0x29f/0x350
[<ffffffff816881c9>] ip_local_out_sk+0x39/0x50
[<ffffffff816cbfad>] iptunnel_xmit+0x10d/0x130
[<ffffffffa0212200>] vxlan_xmit_skb+0x1d0/0x330 [vxlan]
[<ffffffffa02a3919>] vxlan_tnl_send+0x129/0x1a0 [openvswitch]
[<ffffffffa02a2cd6>] ovs_vport_send+0x26/0xa0 [openvswitch]
[<ffffffffa029931e>] do_output+0x2e/0x50 [openvswitch]
Signed-off-by: NWei-Chun Chao <weichunc@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5882a07c

net: filter: cleanup A/X name usage · e430f34e

由 Alexei Starovoitov 提交于 6月 06, 2014

The macro 'A' used in internal BPF interpreter:
 #define A regs[insn->a_reg]
was easily confused with the name of classic BPF register 'A', since
'A' would mean two different things depending on context.

This patch is trying to clean up the naming and clarify its usage in the
following way:

- A and X are names of two classic BPF registers

- BPF_REG_A denotes internal BPF register R0 used to map classic register A
  in internal BPF programs generated from classic

- BPF_REG_X denotes internal BPF register R7 used to map classic register X
  in internal BPF programs generated from classic

- internal BPF instruction format:
struct sock_filter_int {
        __u8    code;           /* opcode */
        __u8    dst_reg:4;      /* dest register */
        __u8    src_reg:4;      /* source register */
        __s16   off;            /* signed offset */
        __s32   imm;            /* signed immediate constant */
};

- BPF_X/BPF_K is 1 bit used to encode source operand of instruction
In classic:
  BPF_X - means use register X as source operand
  BPF_K - means use 32-bit immediate as source operand
In internal:
  BPF_X - means use 'src_reg' register as source operand
  BPF_K - means use 32-bit immediate as source operand
Suggested-by: NChema Gonzalez <chema@google.com>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NChema Gonzalez <chema@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e430f34e

09 6月, 2014 1 次提交

net: force a list_del() in unregister_netdevice_many() · 87757a91

由 Eric Dumazet 提交于 6月 06, 2014

unregister_netdevice_many() API is error prone and we had too
many bugs because of dangling LIST_HEAD on stacks.

See commit f87e6f47 ("net: dont leave active on stack LIST_HEAD")

In fact, instead of making sure no caller leaves an active list_head,
just force a list_del() in the callee. No one seems to need to access
the list after unregister_netdevice_many()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87757a91

06 6月, 2014 2 次提交

net: filter: fix SKF_AD_PKTTYPE extension on big-endian · 0dcceabb

由 Alexei Starovoitov 提交于 6月 05, 2014

BPF classic->internal converter broke SKF_AD_PKTTYPE extension, since
pkt_type_offset() was failing to find skb->pkt_type field which is defined as:
__u8 pkt_type:3,
     fclone:2,
     ipvs_property:1,
     peeked:1,
     nf_trace:1;

Fix it by searching for 3 most significant bits and shift them by 5 at run-time

Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NDaniel Borkmann <dborkman@redhat.com>
Tested-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0dcceabb

MPLS: Use mpls_features to activate software MPLS GSO segmentation · 3b392ddb

由 Simon Horman 提交于 6月 04, 2014

If an MPLS packet requires segmentation then use mpls_features
to determine if the software implementation should be used.

As no driver advertises MPLS GSO segmentation this will always be
the case.

I had not noticed that this was necessary before as software MPLS GSO
segmentation was already being used in my test environment. I believe that
the reason for that is the skbs in question always had fragments and the
driver I used does not advertise NETIF_F_FRAGLIST (which seems to be the
case for most drivers). Thus software segmentation was activated by
skb_gso_ok().

This introduces the overhead of an extra call to skb_network_protocol()
in the case where where CONFIG_NET_MPLS_GSO is set and
skb->ip_summed == CHECKSUM_NONE.

Thanks to Jesse Gross for prompting me to investigate this.
Signed-off-by: NSimon Horman <horms@verge.net.au>
Acked-by: NYAMAMOTO Takashi <yamamoto@valinux.co.jp>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b392ddb

05 6月, 2014 2 次提交

net: use the new API kvfree() · 4cb28970

由 WANG Cong 提交于 6月 02, 2014

It is available since v3.15-rc5.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cb28970

net: Support for multiple checksums with gso · 7e2b10c1

由 Tom Herbert 提交于 6月 04, 2014

When creating a GSO packet segment we may need to set more than
one checksum in the packet (for instance a TCP checksum and
UDP checksum for VXLAN encapsulation). To be efficient, we want
to do checksum calculation for any part of the packet at most once.

This patch adds csum_start offset to skb_gso_cb. This tracks the
starting offset for skb->csum which is initially set in skb_segment.
When a protocol needs to compute a transport checksum it calls
gso_make_checksum which computes the checksum value from the start
of transport header to csum_start and then adds in skb->csum to get
the full checksum. skb->csum and csum_start are then updated to reflect
the checksum of the resultant packet starting from the transport header.

This patch also adds a flag to skbuff, encap_hdr_csum, which is set
in *gso_segment fucntions to indicate that a tunnel protocol needs
checksum calculation
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e2b10c1

04 6月, 2014 2 次提交

net: remove some unless free on failure in alloc_netdev_mqs() · 92ff71b8

由 WANG Cong 提交于 6月 03, 2014

When we jump to free_pcpu on failure in alloc_netdev_mqs()
rx and tx queues are not yet allocated, so no need to free them.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92ff71b8

rtnetlink: fix a memory leak when ->newlink fails · e51fb152

由 Cong Wang 提交于 6月 03, 2014

It is possible that ->newlink() fails before registering
the device, in this case we should just free it, it's
safe to call free_netdev().

Fixes: commit 0e0eee24 (net: correct error path in rtnl_newlink())
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NCong Wang <cwang@twopensource.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e51fb152

03 6月, 2014 3 次提交

ethtool: Check that reserved fields of struct ethtool_rxfh are 0 · f062a384

由 Ben Hutchings 提交于 5月 15, 2014

We should fail rather than silently ignoring use of these extensions.
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>

f062a384

ethtool: Replace ethtool_ops::{get,set}_rxfh_indir() with {get,set}_rxfh() · fe62d001

由 Ben Hutchings 提交于 5月 15, 2014

ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers
regardless of whether they expose the hash key, unless you try to
set a hash key for a driver that doesn't expose it.
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Acked-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

fe62d001

net: filter: fix possible memory leak in __sk_prepare_filter() · 418c96ac

由 Leon Yu 提交于 6月 01, 2014

__sk_prepare_filter() was reworked in commit bd4cf0ed (net: filter:
rework/optimize internal BPF interpreter's instruction set) so that it should
have uncharged memory once things went wrong. However that work isn't complete.
Error is handled only in __sk_migrate_filter() while memory can still leak in
the error path right after sk_chk_filter().

Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Signed-off-by: NLeon Yu <chianglungyu@gmail.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Tested-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

418c96ac

openeuler / Kernel 11 个月 前同步成功

openeuler / Kernel
11 个月前同步成功