Commits · b8226962b1c49c784aeddb9d2fafbf53dfdc2190 · openanolis / cloud-kernel

11 Oct, 2017 32 commits

openvswitch: add ct_clear action · b8226962

Committed by Eric Garver on Oct 10, 2017

This adds a ct_clear action for clearing conntrack state. ct_clear is
currently implemented in OVS userspace, but is not backed by an action
in the kernel datapath. This is useful for flows that may modify a
packet tuple after a ct lookup has already occurred.
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dst: move cpu inside ifdef to avoid compilation warning · 833e0e2f

Committed by Jakub Kicinski on Oct 10, 2017

If CONFIG_DST_CACHE is not selected, the cpu variable
will be unused and we will see a compilation warning.
Move it under the ifdef.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: d66f2b91 ("bpf: don't rely on the verifier lock for metadata_dst allocation")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · f44dea34

Committed by David S. Miller on Oct 10, 2017

Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2017-10-10

This series contains updates to e1000e and igb.

Benjamin Poirier provides several fixes for e1000e, starting with a
correction to the return status which was always returning success even
if it was not successful.  Fixed code comments to reflect the actual
code behavior.  Fixed the conditional test for the correct return
value.  Fixed a potential race condition reported by Lennart Sorensen,
where the single flag get_link_status is used to signal two different
states.

Sasha fixes a buffer overrun for i219 devices, where the chipset had
reduced the round-trip latency for the LAN controller DMA accesses
which in some high performance cases caused a buffer overrun while
processing the DMA transactions.

Willem de Bruijn changes the default behavior of e1000e to use the
burst mode settings by default unless the user specifies the
receive interrupt delay (RxIntDelay).

Florian Fainelli updates the driver to differentiate between
e1000e_put_txbuf() being called from normal reclamation and from a
DMA mapping failure, to make the driver more "drop monitor friendly".

Christophe JAILLET fixes a potential NULL pointer dereference by
properly returning -ENOMEM on memory allocation failures.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: bridge: use ext_ack instead of printk · b88d12e4

Committed by Florian Westphal on Oct 10, 2017

We can now piggyback error strings to userspace via extended acks
rather than using printk.

Before:
bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
RTNETLINK answers: Invalid argument

After:
bridge fdb add 01:02:03:04:05:06 dev br0 vlan 4095
Error: invalid vlan id.

v3: drop 'RTM_' prefixes, suggested by David Ahern, they
are not useful, the add/del in bridge command line is enough.

Also reword error in response to malformed/bad vlan id attribute
size.

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: rtnetlink: test RTM_GETNETCONF · 8f88f74a

Committed by Florian Westphal on Oct 10, 2017

Exercise the RTM_GETNETCONF call path for the unspec, inet and inet6
families; they are DOIT_UNLOCKED candidates.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4_en-num-of-rings' · 31ce6cee

Committed by David S. Miller on Oct 10, 2017

Tariq Toukan says:

====================
mlx4_en num of rings

This patchset from Inbar contains changes to ring control
in the mlx4 Eth driver.

Patches 1 and 2 limit the number of rings to the number of CPUs.
Patch 3 removes a limitation in logic of default number of RX rings.

Series generated against net-next commit:
812b5ca7 Add a driver for Renesas uPD60620 and uPD60620A PHYs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Increase number of default RX rings · 80a8dc75

Committed by Inbar Karmy on Oct 10, 2017

Remove the netif_get_num_default_rss_queues() limitation
from the logic that picks the default number of RX rings.
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Limit the number of RX rings · b8d39436

Committed by Inbar Karmy on Oct 10, 2017

Limit the number of RX rings by the number of cores
in the system.
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Limit the number of TX rings · 7e1dc5e9

Committed by Inbar Karmy on Oct 10, 2017

Limit the number of TX rings per UP by the number of cores
in the system.
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hnx3-rxnfc' · 59d43876

Committed by David S. Miller on Oct 10, 2017

Lipeng says:

====================
Support set_ringparam and {set|get}_rxnfc ethtool commands

1. Patches [1/5, 2/5] add support for ethtool ops set_ringparam
   (ethtool -G) and fix a related bug.
2. Patches [3/5, 4/5, 5/5] add support for ethtool ops
   set_rxnfc/get_rxnfc (-n/-N) and fix a related bug.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fix the ring count for ETHTOOL_GRXRINGS · abf11d04

Committed by Lipeng on Oct 10, 2017

This patch fixes the ring count for ETHTOOL_GRXRINGS. The ring count,
not the TC size, should be returned for the command "ethtool -n ethx".
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add support for ETHTOOL_GRXFH · 07d29954

Committed by Lipeng on Oct 10, 2017

This patch adds support for ethtool's ETHTOOL_GRXFH in hns3_get_rxnfc().
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add support for set_rxnfc · f7db940a

Committed by Lipeng on Oct 10, 2017

This patch adds support for ethtool's set_rxnfc().
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: add support for set_ringparam · 5668abda

Committed by Lipeng on Oct 10, 2017

This patch adds support for ethtool's set_ringparam().
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: fixes the ring index in hns3_fini_ring · ee83f776

Committed by Lipeng on Oct 10, 2017

This patch fixes the ring index in hns3_fini_ring.
Signed-off-by: Lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: add new T5 pci device id's · 652faa98

Committed by Ganesh Goudar on Oct 10, 2017

Add 0x50aa and 0x50ab T5 device id's.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add support for new flash parts · 96ac18f1

Committed by Ganesh Goudar on Oct 10, 2017

Add support for identifying new flash parts, and
also clean up the flash part identification and decoding
code.

Based on the original work of Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/core: Fix BUG to BUG_ON conditionals. · 9f77fad3

Committed by Tim Hansen on Oct 09, 2017

Fix BUG() calls to use BUG_ON(conditional) macros.

This was found using make coccicheck M=net/core on linux next
tag next-2017092
Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

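A minimal userspace sketch of the BUG() to BUG_ON() conversion described in this commit. The two macros below are stand-ins written for illustration (the real kernel macros halt the offending context), and check_len_old()/check_len_new() are hypothetical names, not functions from net/core:

```c
#include <stdbool.h>

/* Userspace stand-ins (assumptions): they only record that they fired. */
static bool bug_fired;
#define BUG()        do { bug_fired = true; } while (0)
#define BUG_ON(cond) do { if (cond) BUG(); } while (0)

/* Before: an open-coded conditional wrapped around BUG(). */
static void check_len_old(int len)
{
	if (len < 0)
		BUG();
}

/* After: the same check expressed with a single BUG_ON(). */
static void check_len_new(int len)
{
	BUG_ON(len < 0);
}
```

Both functions behave identically; the second form is simply the shape that coccicheck suggests.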

Merge branch 'bpf-get-rid-of-global-verifier-state-and-reuse-instruction-printer' · 67174bb2

Committed by David S. Miller on Oct 10, 2017

Jakub Kicinski says:

====================
bpf: get rid of global verifier state and reuse instruction printer

This set started off as simple extraction of eBPF verifier's instruction
printer into a separate file but evolved into removal of global state.
The purpose of moving instruction printing code is to be able to reuse it
from the bpftool.

As far as the global verifier lock goes, this set removes the global
variables relating to the log buffer, makes the one-time init done
by bpf_get_skb_set_tunnel_proto() not depend on any external locking,
and performs verifier log writeback as data is produced removing the need
for allocating a potentially large temporary buffer.

The final step of actually removing the verifier lock is left to someone
more competent and self-confident :)

Note that struct bpf_verifier_env is just 40B under two pages now,
we should probably switch to vzalloc() when it's expanded again...

v2:
 - add a selftest;
 - use env buffer and flush on every print (Alexei);
 - handle kernel log allocation failures (Daniel);
 - put the env log members into a struct (Daniel).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: write back the verifier log buffer as it gets filled · a2a7d570

Committed by Jakub Kicinski on Oct 09, 2017

The verifier log buffer can be quite large (up to 16MB currently).
As Eric Dumazet points out, if we allow multiple verification
requests to proceed simultaneously, a malicious user may use the
verifier as a way of allocating large amounts of unswappable
memory to OOM the host.

Switch to a strategy of allocating a smaller buffer (1024B)
and writing it out into the user buffer after every print.

While at it remove the old BUG_ON().

This is in preparation of the global verifier lock removal.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

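A rough userspace sketch of the "small scratch buffer, flush after every print" strategy from this commit. It is not the verifier code: struct log_state, log_print() and the field names are hypothetical, and memcpy() stands in for the copy to the user buffer.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define TMP_LOG_SIZE 1024 /* scratch size taken from the commit text */

struct log_state {
	char   kbuf[TMP_LOG_SIZE]; /* scratch space for one formatted message */
	char  *ubuf;               /* destination ("user") buffer */
	size_t len_used;           /* bytes already written to ubuf */
	size_t len_total;          /* capacity of ubuf, including the NUL */
};

/* Format one message into the scratch buffer and flush it immediately. */
static void log_print(struct log_state *log, const char *fmt, ...)
{
	va_list args;
	size_t room;
	int n;

	va_start(args, fmt);
	n = vsnprintf(log->kbuf, sizeof(log->kbuf), fmt, args);
	va_end(args);
	if (n < 0)
		return;
	if ((size_t)n >= sizeof(log->kbuf))
		n = (int)(sizeof(log->kbuf) - 1); /* message cut to scratch size */

	if (log->len_total == 0 || log->len_used >= log->len_total - 1)
		return; /* destination is full: drop rather than overflow */

	room = log->len_total - 1 - log->len_used;
	if ((size_t)n > room)
		n = (int)room; /* truncate to what still fits */

	memcpy(log->ubuf + log->len_used, log->kbuf, (size_t)n);
	log->len_used += (size_t)n;
	log->ubuf[log->len_used] = '\0';
}
```

The point of the design is that memory use per request is bounded by the scratch buffer, no matter how large the destination log is.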

bpf: don't rely on the verifier lock for metadata_dst allocation · d66f2b91

Committed by Jakub Kicinski on Oct 09, 2017

bpf_skb_set_tunnel_*() functions require allocation of per-cpu
metadata_dst.  The allocation happens upon verification of the
first program using those helpers.  In preparation for removing
the verifier lock, use cmpxchg() to make sure we only allocate
the metadata_dsts once.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

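The allocate-once idea can be sketched in userspace with a C11 compare-and-exchange in place of the kernel's cmpxchg(). This is only an illustration under that substitution: struct md_dst and md_dst_get_once() are hypothetical names, and a single pointer stands in for the per-cpu metadata_dst.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct md_dst { int placeholder; }; /* hypothetical stand-in for metadata_dst */

static _Atomic(struct md_dst *) md_dst_global;

/* Allocate once; if another caller installed a pointer first, free ours. */
static struct md_dst *md_dst_get_once(void)
{
	struct md_dst *cur = atomic_load(&md_dst_global);
	struct md_dst *expected = NULL;
	struct md_dst *tmp;

	if (cur)
		return cur; /* already allocated by an earlier caller */

	tmp = calloc(1, sizeof(*tmp));
	if (!tmp)
		return NULL;

	if (!atomic_compare_exchange_strong(&md_dst_global, &expected, tmp)) {
		free(tmp);       /* lost the race: discard our copy */
		return expected; /* the exchange stored the winner here */
	}
	return tmp;
}
```

No external lock is needed: concurrent callers may each allocate, but only one pointer is ever published and the losers free their copy.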

tools: bpftool: use the kernel's instruction printer · c9c35995

Committed by Jakub Kicinski on Oct 09, 2017

Compile the instruction printer from kernel/bpf and use it
for disassembling "translated" eBPF code.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: move instruction printing into a separate file · f4ac7e0b

Committed by Jakub Kicinski on Oct 09, 2017

Separate the instruction printing into a standalone source file.
This way sneaky code from tools/ can compile it in directly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: move global verifier log into verifier environment · 61bd5218

Committed by Jakub Kicinski on Oct 09, 2017

The biggest piece of global state protected by the verifier lock
is the verifier_log.  Move that log to struct bpf_verifier_env.
struct bpf_verifier_env has to be passed now to all invocations
of verbose().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: encapsulate verifier log state into a structure · e7bf8249

Committed by Jakub Kicinski on Oct 09, 2017

Put the loose log_* variables into a structure.  This will make
it simpler to remove the global verifier state in following patches.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/bpf: add a test for verifier logs · a99ca6db

Committed by Jakub Kicinski on Oct 09, 2017

Add a test for verifier log handling.  Check bad attr combinations
but focus on cases when log is truncated.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix incorrect bitwise operator used on rt6i_flags · 442d713b

Committed by Colin Ian King on Oct 10, 2017

The use of the | operator always leads to true which looks rather
suspect to me. Fix this by using & instead to just check the
RTF_CACHE entry bit.

Detected by CoverityScan, CID#1457734, #1457747 ("Wrong operator used")

Fixes: 35732d01 ("ipv6: introduce a hash table to store dst cache")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

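The class of bug fixed here can be shown in a few lines of standalone C. The helper names are hypothetical and the RTF_CACHE value is assumed for illustration; only the choice of operator matters:

```c
#include <stdbool.h>
#include <stdint.h>

#define RTF_CACHE 0x01000000u /* flag bit; value assumed for illustration */

/* Buggy: OR-ing in a non-zero constant never yields zero, so always true. */
static bool is_cache_entry_buggy(uint32_t flags)
{
	return (flags | RTF_CACHE) != 0;
}

/* Fixed: AND masks out everything except the bit being tested. */
static bool is_cache_entry_fixed(uint32_t flags)
{
	return (flags & RTF_CACHE) != 0;
}
```

With flags equal to 0 the buggy variant still reports a cache entry, while the fixed variant does not.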

ipv6: fix dereference of rt6_ex before null check error · b2427e67

Committed by Colin Ian King on Oct 10, 2017

Currently rt6_ex is being dereferenced before it is null checked
hence there is a possible null dereference bug. Fix this by only
dereferencing rt6_ex after it has been null checked.

Detected by CoverityScan, CID#1457749 ("Dereference before null check")

Fixes: 81eb8447 ("ipv6: take care of rt6_stats")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

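A small sketch of the "check before dereference" ordering this commit restores. The structures and exception_stats() are hypothetical simplifications, not the ipv6 route code:

```c
#include <stddef.h>

struct route { int stats; };                /* hypothetical */
struct rt_exception { struct route *rt; };  /* hypothetical */

/*
 * Buggy order (kept as a comment so it is never executed):
 *     struct route *rt = ex->rt;   -- dereference first
 *     if (!ex) return -1;          -- check comes too late
 */

/* Fixed order: test the pointer, then dereference it. */
static int exception_stats(const struct rt_exception *ex)
{
	if (!ex || !ex->rt)
		return -1;
	return ex->rt->stats;
}
```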

igb: check memory allocation failure · 18eb8636

Committed by Christophe JAILLET on Aug 27, 2017

Check memory allocation failures and return -ENOMEM in such cases, as
already done for other memory allocations in this function.

This avoids a NULL pointer dereference.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Acked-by: PJ Waskiewicz <peter.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

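As a hedged illustration of the pattern (not the igb code itself), the sketch below checks every allocation and returns -ENOMEM on failure. struct rings, rings_alloc() and the injectable allocator argument are assumptions introduced so the failure path can be exercised:

```c
#include <errno.h>
#include <stdlib.h>

struct rings { int *tx; int *rx; }; /* hypothetical */

/* Allocator that always fails, used to exercise the error path. */
static void *failing_alloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}

/* Returns 0 on success or -ENOMEM, releasing any partial allocation. */
static int rings_alloc(struct rings *r, size_t n,
		       void *(*alloc)(size_t, size_t))
{
	r->tx = alloc(n, sizeof(*r->tx));
	if (!r->tx)
		return -ENOMEM;

	r->rx = alloc(n, sizeof(*r->rx));
	if (!r->rx) {
		free(r->tx); /* undo the first allocation */
		r->tx = NULL;
		return -ENOMEM;
	}
	return 0;
}
```

Passing calloc as the allocator gives the normal path; passing failing_alloc shows the caller receiving -ENOMEM instead of a NULL pointer to trip over later.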

e1000e: Be drop monitor friendly · 377b6273

Committed by Florian Fainelli on Aug 25, 2017

e1000e_put_txbuf() can be called from the normal reclamation path as well as
after a DMA mapping failure, so we need to differentiate these two cases
when freeing SKBs to be drop monitor friendly. e1000e_tx_hwtstamp_work()
and e1000_remove() are processing TX timestamped SKBs and those should
not be accounted as drops either.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: apply burst mode settings only on default · 48072ae1

Committed by Willem de Bruijn on Aug 25, 2017

Devices that support FLAG2_DMA_BURST have different default values
for RDTR and RADV. Apply burst mode default settings only when no
explicit value was passed at module load.

The RDTR default is zero. If the module is loaded for low latency
operation with RxIntDelay=0, do not override this value with a burst
default of 32.

Move the decision to apply burst values earlier, where explicitly
initialized module variables can be distinguished from defaults.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

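One way to picture the decision described here is a sentinel that marks "parameter not given", so that an explicit RxIntDelay=0 is not confused with the default. The sentinel and pick_rx_int_delay() are hypothetical; only the values 0 and 32 come from the commit text:

```c
#include <stdbool.h>

#define OPTION_UNSET (-1) /* hypothetical sentinel: parameter not given */
#define DEFAULT_RDTR 0    /* regular default, per the commit text */
#define BURST_RDTR   32   /* burst-mode default, per the commit text */

static int pick_rx_int_delay(int user_value, bool supports_dma_burst)
{
	if (user_value != OPTION_UNSET)
		return user_value; /* an explicit value always wins, even 0 */

	return supports_dma_burst ? BURST_RDTR : DEFAULT_RDTR;
}
```

The decision has to be made while the "was it set explicitly?" information is still available, which is why the commit moves it earlier.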

e1000e: fix buffer overrun while the I219 is processing DMA transactions · b10effb9

Committed by Sasha Neftin on Aug 06, 2017

Intel® 100/200 Series Chipset platforms reduced the round-trip
latency for the LAN Controller DMA accesses, causing in some high
performance cases a buffer overrun while the I219 LAN Connected
Device is processing the DMA transactions. I219LM and I219V devices
can fall into unrecovered Tx hang under very stressful UDP traffic
and multiple reconnections of the Ethernet cable. This Tx hang of the LAN
Controller is only recovered if the system is rebooted. Slightly slow
down DMA access by reducing the number of outstanding requests.
This workaround could have an impact on TCP traffic performance
on the platform. Disabling TSO eliminates performance loss for TCP
traffic without a noticeable impact on CPU performance.

Please refer to the I218/I219 specification update:
https://www.intel.com/content/www/us/en/embedded/products/networking/ethernet-connection-i218-family-documentation.html
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Reviewed-by: Dima Ruinskiy <dima.ruinskiy@intel.com>
Reviewed-by: Raanan Avargil <raanan.avargil@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

10 Oct, 2017 8 commits

e1000e: Avoid receiver overrun interrupt bursts · 4aea7a5c

Committed by Benjamin Poirier on Jul 21, 2017

When e1000e_poll() is not fast enough to keep up with incoming traffic, the
adapter (when operating in msix mode) raises the Other interrupt to signal
Receiver Overrun.

This is a double problem because 1) at the moment e1000_msix_other()
assumes that it is only called in case of Link Status Change and 2) if the
condition persists, the interrupt is repeatedly raised again in quick
succession.

Ideally we would configure the Other interrupt to not be raised in case of
receiver overrun but this doesn't seem possible on this adapter. Instead,
we handle the first part of the problem by reverting to the practice of
reading ICR in the other interrupt handler, like before commit 16ecba59
("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
0a8047ac ("e1000e: Fix msi-x interrupt automask") which cleared IAME
from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
anymore. We handle the second part of the problem by not re-enabling the
Other interrupt right away when there is overrun. Instead, we wait until
traffic subsides, napi polling mode is exited and interrupts are
re-enabled.
Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Fixes: 16ecba59 ("e1000e: Do not read ICR in Other interrupt")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Separate signaling for link check/link up · 19110cfb

Committed by Benjamin Poirier on Jul 21, 2017

Lennart reported the following race condition:

\ e1000_watchdog_task
    \ e1000e_has_link
        \ hw->mac.ops.check_for_link() === e1000e_check_for_copper_link
            /* link is up */
            mac->get_link_status = false;

                            /* interrupt */
                            \ e1000_msix_other
                                hw->mac.get_link_status = true;

        link_active = !hw->mac.get_link_status
        /* link_active is false, wrongly */

This problem arises because the single flag get_link_status is used to
signal two different states: link status needs checking and link status is
down.

Avoid the problem by using the return value of .check_for_link to signal
the link status to e1000e_has_link().
Reported-by: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

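The race in the trace above can be replayed in a single-threaded sketch, with a callback standing in for the interrupt. Everything here is a simplified, hypothetical model (struct mac, the irq hook, and the assumed return convention of 1 for link up, 0 for link down), not the e1000e code:

```c
#include <stdbool.h>
#include <stddef.h>

struct mac { bool get_link_status; }; /* true: link needs (re)checking */

/* Hypothetical check: returns 1 when link is up, 0 when it is down. */
static int check_for_link(struct mac *mac, bool phy_reports_link)
{
	if (!phy_reports_link)
		return 0;
	mac->get_link_status = false; /* link seen, no re-check needed */
	return 1;
}

/* Stand-in for e1000_msix_other(): requests another link check. */
static void fake_irq(struct mac *mac)
{
	mac->get_link_status = true;
}

/* Old scheme: infer link state from the shared flag. */
static bool has_link_old(struct mac *mac, bool phy, void (*irq)(struct mac *))
{
	(void)check_for_link(mac, phy);
	if (irq)
		irq(mac); /* interrupt lands between check and read */
	return !mac->get_link_status;
}

/* New scheme: use the return value, so the flag keeps a single meaning. */
static bool has_link_new(struct mac *mac, bool phy, void (*irq)(struct mac *))
{
	int ret = check_for_link(mac, phy);

	if (irq)
		irq(mac); /* same interrupt, but the answer is already fixed */
	return ret > 0;
}
```

With the link up and the interrupt firing in between, has_link_old() wrongly reports no link while has_link_new() reports the link correctly.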

e1000e: Fix return value test · d3509f8b

Committed by Benjamin Poirier on Jul 21, 2017

All the helpers return -E1000_ERR_PHY.
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Fix wrong comment related to link detection · 65a29da1

Committed by Benjamin Poirier on Jul 21, 2017

Reading e1000e_check_for_copper_link() shows that get_link_status is set to
false after link has been detected. Therefore, it stays TRUE until then.
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Fix error path in link detection · c4c40e51

Committed by Benjamin Poirier on Jul 21, 2017

In case of error from e1e_rphy(), the loop will exit early and "success"
will be set to true erroneously.
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

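A sketch of a polling loop with the corrected error path, assuming a hypothetical read callback in place of e1e_rphy() and a made-up LINK_UP_BIT. The success flag is set only when the link bit was actually observed, never merely because the loop ended early:

```c
#include <stdbool.h>

#define LINK_UP_BIT 0x4u /* hypothetical status bit */

/* Hypothetical register read: returns 0 on success, negative on error. */
typedef int (*read_status_fn)(int attempt, unsigned int *status);

static int read_fails(int attempt, unsigned int *status)
{
	(void)attempt;
	(void)status;
	return -5; /* simulated PHY read error */
}

static int read_up_on_third(int attempt, unsigned int *status)
{
	*status = attempt >= 2 ? LINK_UP_BIT : 0;
	return 0;
}

static int poll_link(read_status_fn read_status, int iterations, bool *success)
{
	int ret = 0;

	*success = false;
	for (int i = 0; i < iterations; i++) {
		unsigned int status = 0;

		ret = read_status(i, &status);
		if (ret)
			break; /* error: *success stays false */
		if (status & LINK_UP_BIT) {
			*success = true; /* set only when link was really seen */
			break;
		}
	}
	return ret;
}
```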

Add a driver for Renesas uPD60620 and uPD60620A PHYs · 812b5ca7

Committed by Bernd Edlinger on Oct 08, 2017

Signed-off-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

vhost_net: do not stall on zerocopy depletion · 1e6f7453

Committed by Willem de Bruijn on Oct 06, 2017

Vhost-net has a hard limit on the number of zerocopy skbs in flight.
When reached, transmission stalls. Stalls cause latency, as well as
head-of-line blocking of other flows that do not use zerocopy.

Instead of stalling, revert to copy-based transmission.

Tested by sending two udp flows from guest to host, one with payload
of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
large flow is redirected to a netem instance with 1MBps rate limit
and deep 1000 entry queue.

modprobe ifb
ip link set dev ifb0 up
tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit

tc qdisc add dev tap0 ingress
tc filter add dev tap0 parent ffff: protocol ip \
u32 match ip dport 8000 0xffff \
action mirred egress redirect dev ifb0

Before the delay, both flows process around 80K pps. With the delay,
before this patch, both process around 400. After this patch, the
large flow is still rate limited, while the small reverts to its
original rate. See also discussion in the first link, below.

Without rate limiting, {1, 10, 100}x TCP_STREAM tests continued to
send at 100% zerocopy.

The limit in vhost_exceeds_maxpend must be carefully chosen. With
vq->num >> 1, the flows remain correlated. This value happens to
correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller
fractions and ensure correctness also for much smaller values of
vq->num, by testing the min() of both explicitly. See also the
discussion in the second link below.

Changes
v1 -> v2
- replaced min with typed min_t
- avoid unnecessary whitespace change

Link: http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
Link: http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

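The "min() of both" limit can be sketched as below. This is an illustration only: MAX_PEND is inferred from the statement that vq->num >> 1 matches the fixed limit at vq->num == 256, and the shift parameter is a hypothetical tunable, since the message does not state which fraction the patch settled on.

```c
#include <stdbool.h>

#define MAX_PEND 128u /* 256 >> 1, inferred from the commit text */

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * pending:   zerocopy buffers currently in flight
 * ring_size: stands in for vq->num
 * shift:     fraction of the ring allowed in flight (hypothetical)
 */
static bool exceeds_maxpend(unsigned int pending, unsigned int ring_size,
			    unsigned int shift)
{
	return pending > min_uint(MAX_PEND, ring_size >> shift);
}
```

Taking the minimum keeps the fixed cap for large rings while still scaling the limit down for very small ones.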

openvswitch: Add erspan tunnel support. · ceaa001a

Committed by William Tu on Oct 04, 2017

Add erspan netlink interface for OVS.
Signed-off-by: William Tu <u9012063@gmail.com>
Cc: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
