提交 · f0db9b073415848709dd59a6394969882f517da9 · openanolis / cloud-kernel

06 9月, 2014 4 次提交

ethtool: Add generic options for tunables · f0db9b07

由 Govindarajulu Varadarajan 提交于 9月 03, 2014

This patch adds new ethtool cmd, ETHTOOL_GTUNABLE & ETHTOOL_STUNABLE for getting
tunable values from driver.

Add get_tunable and set_tunable to ethtool_ops. Driver implements these
functions for getting/setting tunable value.
Signed-off-by: NGovindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f0db9b07

enic: implement rx_copybreak · a03bb56e

由 Govindarajulu Varadarajan 提交于 9月 03, 2014

Calling dma_map_single()/dma_unmap_single() is quite expensive compared
to copying a small packet. So let's copy short frames and keep the buffers
mapped.
Signed-off-by: NGovindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a03bb56e

dev_ioctl: remove dev_load() CAP_SYS_MODULE message · e020836d

由 Daniel Borkmann 提交于 9月 02, 2014

Marcel reported to see the following message when autoloading
is being triggered when adding nlmon device:

  Loading kernel module for a network device with
  CAP_SYS_MODULE (deprecated). Use CAP_NET_ADMIN and alias
  netdev-nlmon instead.

This false-positive happens despite with having correct
capabilities set, e.g. through issuing `ip link del dev nlmon`
more than once on a valid device with name nlmon, but Marcel
has also seen it on creation time when no nlmon module is
previously compiled-in or loaded as module and the device
name equals a link type name (e.g. nlmon, vxlan, team).

Stephen says:

  The netdev module alias is a hold over from the past. For
  normal devices, people used to create a alias eth0 to and
  point it to the type of network device used, that was back
  in the bad old ISA days before real discovery.

  Also, the tunnels create module alias for the control device
  and ip used to use this to autoload the tunnel device.

  The message is bogus and should just be removed, I also see
  it in a couple of other cases where tap devices are renamed
  for other usese.

As mentioned in 8909c9ad ("net: don't allow CAP_NET_ADMIN
to load non-netdev kernel modules"), we nevertheless still
might want to leave the old autoloading behaviour in place
as it could break old scripts, so for now, lets just remove
the log message as Stephen suggests.

Reference: http://thread.gmane.org/gmane.linux.kernel/1105168Reported-by: NMarcel Holtmann <marcel@holtmann.org>
Suggested-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e020836d

net: bpf: make eBPF interpreter images read-only · 60a3b225

由 Daniel Borkmann 提交于 9月 02, 2014

With eBPF getting more extended and exposure to user space is on it's way,
hardening the memory range the interpreter uses to steer its command flow
seems appropriate.  This patch moves the to be interpreted bytecode to
read-only pages.

In case we execute a corrupted BPF interpreter image for some reason e.g.
caused by an attacker which got past a verifier stage, it would not only
provide arbitrary read/write memory access but arbitrary function calls
as well. After setting up the BPF interpreter image, its contents do not
change until destruction time, thus we can setup the image on immutable
made pages in order to mitigate modifications to that code. The idea
is derived from commit 314beb9b ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks").

This is possible because bpf_prog is not part of sk_filter anymore.
After setup bpf_prog cannot be altered during its life-time. This prevents
any modifications to the entire bpf_prog structure (incl. function/JIT
image pointer).

Every eBPF program (including classic BPF that are migrated) have to call
bpf_prog_select_runtime() to select either interpreter or a JIT image
as a last setup step, and they all are being freed via bpf_prog_free(),
including non-JIT. Therefore, we can easily integrate this into the
eBPF life-time, plus since we directly allocate a bpf_prog, we have no
performance penalty.

Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
inspection of kernel_page_tables.  Brad Spengler proposed the same idea
via Twitter during development of this patch.

Joint work with Hannes Frederic Sowa.
Suggested-by: NBrad Spengler <spender@grsecurity.net>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60a3b225

05 9月, 2014 3 次提交

net: systemport: update UMAC_CMD only when link is detected · 4a804c01

由 Florian Fainelli 提交于 9月 02, 2014

When we bring the interface down, phy_stop() will schedule the PHY
state machine to call our link adjustment callback. By the time we do so,
we may have clock gated off the SYSTEMPORT hardware block, and this will
cause bus errors to happen in bcm_sysport_adj_link():

Make sure that we only touch the UMAC_CMD register when there is an
actual link. This is safe to do for two reasons:

- updating the Ethernet MAC registers only make sense when a physical
  link is present
- the PHY library state machine first set phydev->link = 0 before
  invoking phydev->adjust_link in the PHY_HALTED case

This is a similar fix to the GENET one:
c677ba8b ("net: bcmgenet: update
UMAC_CMD only when link is detected").

Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4a804c01

ipv4: implement igmp_qrv sysctl to tune igmp robustness variable · a9fe8e29

由 Hannes Frederic Sowa 提交于 9月 02, 2014

As in IPv6 people might increase the igmp query robustness variable to
make sure unsolicited state change reports aren't lost on the network. Add
and document this new knob to igmp code.

RFCs allow tuning this parameter back to first IGMP RFC, so we also use
this setting for all counters, including source specific multicast.

Also take over sysctl value when upping the interface and don't reuse
the last one seen on the interface.

Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NFlavio Leitner <fbl@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a9fe8e29

ipv6: add sysctl_mld_qrv to configure query robustness variable · 2f711939

由 Hannes Frederic Sowa 提交于 9月 02, 2014

This patch adds a new sysctl_mld_qrv knob to configure the mldv1/v2 query
robustness variable. It specifies how many retransmit of unsolicited mld
retransmit should happen. Admins might want to tune this on lossy links.

Also reset mld state on interface down/up, so we pick up new sysctl
settings during interface up event.

IPv6 certification requests this knob to be available.

I didn't make this knob netns specific, as it is mostly a setting in a
physical environment and should be per host.

Cc: Flavio Leitner <fbl@redhat.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NFlavio Leitner <fbl@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2f711939

04 9月, 2014 4 次提交

mlx4_en: Convert the normal skb free path to dev_consume_skb_any() · b89df95d

由 Rick Jones 提交于 9月 03, 2014

It would appear the mlx4_en driver was still making a call to
dev_kfree_skb_any() where dev_consume_skb_any() would be more
appropriate.  This should make dropped packet profiling/tracking
easier/better over a NIC driven by mlx4_en.
Signed-off-by: NRick Jones <rick.jones2@hp.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b89df95d

lib/rhashtable: allow user to set the minimum shifts of shrinking · 94000176

由 Ying Xue 提交于 9月 03, 2014

Although rhashtable library allows user to specify a quiet big size
for user's created hash table, the table may be shrunk to a
very small size - HASH_MIN_SIZE(4) after object is removed from
the table at the first time. Subsequently, even if the total amount
of objects saved in the table is quite lower than user's initial
setting in a long time, the hash table size is still dynamically
adjusted by rhashtable_shrink() or rhashtable_expand() each time
object is inserted or removed from the table. However, as
synchronize_rcu() has to be called when table is shrunk or
expanded by the two functions, we should permit user to set the
minimum table size through configuring the minimum number of shifts
according to user specific requirement, avoiding these expensive
actions of shrinking or expanding because of calling synchronize_rcu().
Signed-off-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94000176

qdisc: validate frames going through the direct_xmit path · 1f59533f

由 Jesper Dangaard Brouer 提交于 9月 03, 2014

In commit 50cbe9ab ("net: Validate xmit SKBs right when we
pull them out of the qdisc") the validation code was moved out of
dev_hard_start_xmit and into dequeue_skb.

However this overlooked the fact that we do not always enqueue
the skb onto a qdisc. First situation is if qdisc have flag
TCQ_F_CAN_BYPASS and qdisc is empty.  Second situation is if
there is no qdisc on the device, which is a common case for
software devices.

Originally spotted and inital patch by Alexander Duyck.
As a result Alex was seeing issues trying to connect to a
vhost_net interface after commit 50cbe9ab was applied.

Added a call to validate_xmit_skb() in __dev_xmit_skb(), in the
code path for qdiscs with TCQ_F_CAN_BYPASS flag, and in
__dev_queue_xmit() when no qdisc.

Also handle the error situation where dev_hard_start_xmit() could
return a skb list, and does not return dev_xmit_complete(rc) and
falls through to the kfree_skb(), in that situation it should
call kfree_skb_list().

Fixes:  50cbe9ab ("net: Validate xmit SKBs right when we pull them out of the qdisc")
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f59533f

qdisc: exit case fixes for skb list handling in qdisc layer · 3f3c7eec

由 Jesper Dangaard Brouer 提交于 9月 03, 2014

More minor fixes to merge commit 53fda7f7 (Merge branch 'xmit_list')
that allows us to work with a list of SKBs.

Fixing exit cases in qdisc_reset() and qdisc_destroy(), where a
leftover requeued SKB (qdisc->gso_skb) can have the potential of
being a skb list, thus use kfree_skb_list().

This is a followup to commit 10770bc2 ("qdisc: adjustments for
API allowing skb list xmits").
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f3c7eec

03 9月, 2014 21 次提交

qdisc: adjustments for API allowing skb list xmits · 10770bc2

由 Jesper Dangaard Brouer 提交于 9月 02, 2014

Minor adjustments for merge commit 53fda7f7 (Merge branch 'xmit_list')
that allows us to work with a list of SKBs.

Update code doc to function sch_direct_xmit().

In handle_dev_cpu_collision() use kfree_skb_list() in error handling.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10770bc2

ethernet: arc: remove unused dev · 4a314988

由 Jingoo Han 提交于 9月 02, 2014

Remove unused 'dev' variable from arc_emac_remove(), since it's
not being used any more.
Signed-off-by: NJingoo Han <jg1.han@samsung.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4a314988

ethernet: nvidia: Remove extra parens · d46781bc

由 David Wood 提交于 9月 01, 2014

Remove unnecessary double parenthesis around if statement.
Signed-off-by: NDavid Wood <devel@dtwood.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d46781bc

Merge branch 'netdev_modified' · 29fea209

由 David S. Miller 提交于 9月 02, 2014

Nicolas Dichtel says:

====================
rtnl: send notification in do_setlink()

This series ensures to call the notifier chain and to send a netlink
message when a change is done by do_setlink().

The three first patches mainly prepare the last one, which do this change.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29fea209

rtnl/do_setlink(): notify when a netdev is modified · ba998906

由 Nicolas Dichtel 提交于 9月 01, 2014

Depending on which parameters were updated, the changes were not propagated via
the notifier chain and netlink.

The new flag has been set only when the change did not cause a call to the
notifier chain and/or to the netlink notification functions.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba998906

rtnl/do_setlink(): last arg is now a set of flags · 90c325e3

由 Nicolas Dichtel 提交于 9月 01, 2014

There is no functional changes with this commit, it only prepares the next one.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

90c325e3

rtnl/do_setlink(): set modified when IFLA_LINKMODE is updated · 1889b0e7

由 Nicolas Dichtel 提交于 9月 01, 2014

The only effect of this patch is to print a warning if IFLA_LINKMODE is updated
and a following change fails.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1889b0e7

rtnl/do_setlink(): set modified when IFLA_TXQLEN is updated · 5d1180fc

由 Nicolas Dichtel 提交于 9月 01, 2014

The only effect of this patch is to print a warning if IFLA_TXQLEN is updated
and a following change fails.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d1180fc

Merge branch 'be2net-next' · 219c5361

由 David S. Miller 提交于 9月 02, 2014

Sathya Perla says:

====================
be2net: patch set

v2 changes: add a new line after variable declaration in patch 12.

***
Patch 1 adds a few new log messages to help debugging in failure cases.

Patch 2 uses new macros for parsing RX/TX completions and TX wrbs to
help shorten the lines.

Patch 3 adds a description for the RX counter rx_input_fifo_overflow_drop.

Patch 4 adds TX completion error statistics reporting via ethtool.

Patch 5 adds a dma_mapping_error counter and its reporting via ethtool.

Patch 6 fixes up log messages in the Lancer FW download path.

Patch 7 replaces gotos with direct return statements.

Patch 8 cleans up be_change_mtu() code by using a new macro BE_MAX_MTU

Patch 9 makes be_cmd_get_regs() routine to return an integer status
similar to other FW cmd routines in be_cmds.c

Patch 10 gets rid of TX budget as enforcing a budget on TX completion
processing in NAPI is neither suggested nor it provides a performance benefit.

Patch 11 defines and uses a new macro for_all_tx_queues_on_eq() similar
to the RX processing code.

Patch 12 queries max_tx_qs from the FW for BE3 super-nic profiles.
For those profiles, the driver cannot assume a constant BE3_MAX_TX_QS value,
as the value may change for each function.

Please consider applying this patch set to the net-next tree. Thanks!
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

219c5361

be2net: query max_tx_qs for BE3 super-nic profile from FW · a28277dc

由 Suresh Reddy 提交于 9月 02, 2014

In the BE3 super-nic profile, the max_tx_qs value can vary for each function.
So the driver needs to query this value from FW instead of using the
pre-defined constant BE3_MAX_TX_QS.
Signed-off-by: NSuresh Reddy <Suresh.Reddy@emulex.com>
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a28277dc

be2net: define macro for_all_tx_queues_on_eq() · a4906ea0

由 Sathya Perla 提交于 9月 02, 2014

Replace the for() loop that traverses all the TX queues on an EQ
with the macro for_all_tx_queues_on_eq(). With this expalnatory
name, the one line comment is not required anymore.
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a4906ea0

be2net: get rid of TX budget · c8f64615

由 Sathya Perla 提交于 9月 02, 2014

Enforcing a budget on the TX completion processing in NAPI doesn't
benefit performance in anyway. Just get rid of it.
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8f64615

be2net: make be_cmd_get_regs() return a status · c5f156de

由 Vasundhara Volam 提交于 9月 02, 2014

There are a few failure cases in be_cmd_get_regs() that ideally must return
an error value. This style is used across all the routines in be_cmds.c with
this routine being an exception. This patch fixes this.
Signed-off-by: NVasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5f156de

be2net: define BE_MAX_MTU · 0d3f5cce

由 Kalesh AP 提交于 9月 02, 2014

This patch defines a new macro BE_MAX_MTU to make the code in be_change_mtu()
more readable.
Signed-off-by: NKalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d3f5cce

be2net: remove unncessary gotos · 3fb8cb80

由 Kalesh AP 提交于 9月 02, 2014

In cases where there is no extra code to handle an error, this patch replaces
gotos with a direct return statement.
Signed-off-by: NKalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3fb8cb80

be2net: fix log messages in lancer FW download path · bb864e07

由 Kalesh AP 提交于 9月 02, 2014

Log messages in the Lancer FW download path have issues such as:
- a single message spanning multiple lines
- the success message is logged even in failure cases
- status codes are already logged in the FW cmd routines
This patch fixes these issues.
Signed-off-by: NKalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb864e07

be2net: Add a dma_mapping_error counter in ethtool · d3de1540

由 Vasundhara Volam 提交于 9月 02, 2014

Add a dma_mapping_error counter to count the number of packets dropped
due to DMA mapping errors.
Signed-off-by: NVasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3de1540

be2net: Add TX completion error statistics in ethtool · 512bb8a2

由 Kalesh AP 提交于 9月 02, 2014

HW reports TX completion errors in TX completion. This patch adds these
counters to ethtool statistics.
Signed-off-by: NKalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

512bb8a2

S
be2net: add a description for counter rx_input_fifo_overflow_drop · acbd6ff8
由 Sathya Perla 提交于 9月 02, 2014
```
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
acbd6ff8

be2net: shorten AMAP_GET/SET_BITS() macro calls · c3c18bc1

由 Sathya Perla 提交于 9月 02, 2014

The AMAP_GET/SET_BITS() macro calls take structure name as a parameter
and hence are long and span more than one line. Replace these calls
with a wrapper macros for RX/Tx compls and TX wrb. This results in fewer
lines and more readable code in be_main.c
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3c18bc1

be2net: add a few log messages · acbafeb1

由 Sathya Perla 提交于 9月 02, 2014

This patch adds the following log messages to help debugging
failure cases:
1) log FW version number: this is useful when driver initialization
fails and the FW version number cannot be queried via ethtool
2) per function resource limits for BEx chips: these values are
currently being printed only for Skyhawk and Lancer
3) PCI BAR mapping failure
4) function_mode/caps queried from FW: this helps catch any FW bugs
that could advertise wrong capabilities to the driver
Signed-off-by: NSathya Perla <sathya.perla@emulex.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

acbafeb1

02 9月, 2014 8 次提交

sock: deduplicate errqueue dequeue · 364a9e93

由 Willem de Bruijn 提交于 8月 31, 2014

sk->sk_error_queue is dequeued in four locations. All share the
exact same logic. Deduplicate.

Also collapse the two critical sections for dequeue (at the top of
the recv handler) and signal (at the bottom).

This moves signal generation for the next packet forward, which should
be harmless.

It also changes the behavior if the recv handler exits early with an
error. Previously, a signal for follow-up packets on the errqueue
would then not be scheduled. The new behavior, to always signal, is
arguably a bug fix.

For rxrpc, the change causes the same function to be called repeatedly
for each queued packet (because the recv handler == sk_error_report).
It is likely that all packets will fail for the same reason (e.g.,
memory exhaustion).

This code runs without sk_lock held, so it is not safe to trust that
sk->sk_err is immutable inbetween releasing q->lock and the subsequent
test. Introduce int err just to avoid this potential race.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

364a9e93

net-timestamp: expand documentation · 8fe2f761

由 Willem de Bruijn 提交于 8月 31, 2014

Expand Documentation/networking/timestamping.txt with new
interfaces and bytestream timestamping. Also minor
cleanup of the other text.

Import txtimestamp.c test of the new features.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fe2f761

Merge branch 'csums-next' · c5a65680

由 David S. Miller 提交于 9月 01, 2014

Tom Herbert says:

====================
net: Checksum offload changes - Part VI

I am working on overhauling RX checksum offload. Goals of this effort
are:

- Specify what exactly it means when driver returns CHECKSUM_UNNECESSARY
- Preserve CHECKSUM_COMPLETE through encapsulation layers
- Don't do skb_checksum more than once per packet
- Unify GRO and non-GRO csum verification as much as possible
- Unify the checksum functions (checksum_init)
- Simplify code

What is in this seventh patch set:

- Add skb->csum. This allows a device or GRO to indicate that an
  invalid checksum was detected.
- Checksum unncessary to checksum complete conversions.

With these changes, I believe that the third goal of the overhaul is
now mostly achieved. In the case of no encapsulation or one layer of
encapsulation, there should only be at most one skb_checksum over
each packet (between GRO and normal path). In the case of two layers
of encapsulation, it is still possible with the right combination of
non-zero and zero UDP checksums to have >1 skb_checksum. For instance:
IP>GRE(with csum)>IP>UDP(zero csum)>VXLAN>IP>UDP(non-zero csum),
would likely necessiate an skb_checksum in GRO and normal path.
This doesn't seem like a common scenario at all so I'm inclined to
not address this now, if multiple layers of encapsulation becomes
popular we can reassess.

Note that checksum conversion shows a nice improvement for RX VXLAN when
outer UDP checksum is enabled (12.65% CPU compared to 20.94%). This
is not only from the fact that we don't need checksum calculation on
the host, but also allows GRO for VXLAN in this case. Checksum
conversion does not help send side (which still needs to perform
a checksum on host). For that we will implement remote checksum offload
in a later patch
(http://tools.ietf.org/html/draft-herbert-remotecsumoffload-00).

Please review carefully and test if possible, mucking with basic
checksum functions is always a little precarious :-)
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5a65680

T
l2tp: Enable checksum unnecessary conversions for l2tp/UDP sockets · 72297c59
由 Tom Herbert 提交于 8月 31, 2014
```
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
72297c59
T
vxlan: Enable checksum unnecessary conversions for vxlan/UDP sockets · c60c308c
由 Tom Herbert 提交于 8月 31, 2014
```
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
c60c308c

gre: Add support for checksum unnecessary conversions · 884d338c

由 Tom Herbert 提交于 8月 31, 2014

Call skb_checksum_try_convert and skb_gro_checksum_try_convert
after checksum is found present and validated in the GRE header
for normal and GRO paths respectively.

In GRO path, call skb_gro_checksum_try_convert
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

884d338c

udp: Add support for doing checksum unnecessary conversion · 2abb7cdc

由 Tom Herbert 提交于 8月 31, 2014

Add support for doing CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE
conversion in UDP tunneling path.

In the normal UDP path, we call skb_checksum_try_convert after locating
the UDP socket. The check is that checksum conversion is enabled for
the socket (new flag in UDP socket) and that checksum field is
non-zero.

In the UDP GRO path, we call skb_gro_checksum_try_convert after
checksum is validated and checksum field is non-zero. Since this is
already in GRO we assume that checksum conversion is always wanted.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2abb7cdc

net: Infrastructure for checksum unnecessary conversions · d96535a1

由 Tom Herbert 提交于 8月 31, 2014

For normal path, added skb_checksum_try_convert which is called
to attempt to convert CHECKSUM_UNNECESSARY to CHECKSUM_COMPLETE. The
primary condition to allow this is that ip_summed is CHECKSUM_NONE
and csum_valid is true, which will be the state after consuming
a CHECKSUM_UNNECESSARY.

For GRO path, added skb_gro_checksum_try_convert which is the GRO
analogue of skb_checksum_try_convert. The primary condition to allow
this is that NAPI_GRO_CB(skb)->csum_cnt == 0 and
NAPI_GRO_CB(skb)->csum_valid is set. This implies that we have consumed
all available CHECKSUM_UNNECESSARY checksums in the GRO path.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d96535a1

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功