Commits · 813961de3ee6474dd5703e883471fd941d6c8f69 · openeuler / Kernel

23 Nov, 2018 1 commit

bpf: fix integer overflow in queue_stack_map · 813961de

Authored by Alexei Starovoitov on Nov 22, 2018

Fix the following issues:

- allow queue_stack_map for root only
- fix u32 max_entries overflow
- disallow value_size == 0
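The overflow and zero-size checks listed above can be illustrated with a small userspace sketch. It is not the kernel code: `QS_HEADER_BYTES`, `buggy_slots()` and `queue_stack_alloc_size()` are hypothetical names, the sketch assumes the map needs `max_entries + 1` slots, and the root-only restriction is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fixed-size header that precedes the elements. */
#define QS_HEADER_BYTES 64u

/* Buggy variant: the addition is done in 32 bits and can wrap to 0. */
static uint32_t buggy_slots(uint32_t max_entries)
{
    return max_entries + 1;
}

/*
 * Checked variant: rejects value_size == 0 and computes the size in 64 bits.
 * Assumption: the map needs max_entries + 1 slots.
 * Returns true and stores the byte count on success.
 */
static bool queue_stack_alloc_size(uint32_t max_entries, uint32_t value_size,
                                   uint64_t *out_bytes)
{
    if (value_size == 0)
        return false;

    /* Widen before adding 1 so that UINT32_MAX + 1 does not wrap. */
    uint64_t slots = (uint64_t)max_entries + 1;

    *out_bytes = QS_HEADER_BYTES + slots * value_size;
    return true;
}
```

With `max_entries == UINT32_MAX`, the 32-bit variant yields zero slots, while the widened computation yields the full size, which a caller can then compare against its allocation limit.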

Fixes: f1a2e44a ("bpf: add queue and stack maps")
Reported-by: Wei Wu <ww9210@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Mauricio Vasquez B <mauricio.vasquez@polito.it>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

22 Nov, 2018 1 commit

tools: bpftool: fix potential NULL pointer dereference in do_load · dde7011a

Authored by Jakub Kicinski on Nov 21, 2018

This patch fixes a possible null pointer dereference in
do_load, detected by the semantic patch deref_null.cocci,
with the following warning:

./tools/bpf/bpftool/prog.c:1021:23-25: ERROR: map_replace is NULL but dereferenced.

The following code can dereference a NULL pointer: if reallocarray() fails,
map_replace becomes NULL and the error path still indexes it:
881             map_replace = reallocarray(map_replace, old_map_fds + 1,
882                                        sizeof(*map_replace));
883             if (!map_replace) {
884                     p_err("mem alloc failed");
885                     goto err_free_reuse_maps;
886             }

...
1019 err_free_reuse_maps:
1020         for (i = 0; i < old_map_fds; i++)
1021                 close(map_replace[i].fd);
1022         free(map_replace);
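One common way to avoid this pattern is to assign the result of the reallocation to a temporary pointer, so the original array stays valid for the cleanup loop. The sketch below shows that idea in isolation; it is not necessarily how the actual patch is written. `struct replace_entry` and `grow_entries()` are hypothetical names, and plain `realloc()` with a manual overflow check stands in for `reallocarray()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical element type standing in for bpftool's map_replace entries. */
struct replace_entry {
    int idx;
    int fd;
};

/*
 * Grow *arr from old_n to old_n + 1 entries.
 * On failure *arr is left untouched, so the caller's error path can
 * still walk the old_n existing entries and free the array.
 * Returns 0 on success, -1 on failure.
 */
static int grow_entries(struct replace_entry **arr, size_t old_n)
{
    struct replace_entry *tmp;

    /* Manual overflow check, since realloc() does not do one. */
    if (old_n >= SIZE_MAX / sizeof(**arr))
        return -1;

    tmp = realloc(*arr, (old_n + 1) * sizeof(**arr));
    if (!tmp)
        return -1;      /* *arr is still valid here */

    *arr = tmp;
    return 0;
}
```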

Fixes: 3ff5a4dc ("tools: bpftool: allow reuse of maps with bpftool prog load")
Co-developed-by: Wen Yang <wen.yang99@zte.com.cn>
Signed-off-by: Wen Yang <wen.yang99@zte.com.cn>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

17 Nov, 2018 2 commits

bpf: allocate local storage buffers using GFP_ATOMIC · 569a933b

Authored by Roman Gushchin on Nov 14, 2018

Naresh reported an issue with the non-atomic memory allocation of
cgroup local storage buffers:

[   73.047526] BUG: sleeping function called from invalid context at
/srv/oe/build/tmp-rpb-glibc/work-shared/intel-corei7-64/kernel-source/mm/slab.h:421
[   73.060915] in_atomic(): 1, irqs_disabled(): 0, pid: 3157, name: test_cgroup_sto
[   73.068342] INFO: lockdep is turned off.
[   73.072293] CPU: 2 PID: 3157 Comm: test_cgroup_sto Not tainted
4.20.0-rc2-next-20181113 #1
[   73.080548] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS
2.0b 07/27/2017
[   73.088018] Call Trace:
[   73.090463]  dump_stack+0x70/0xa5
[   73.093783]  ___might_sleep+0x152/0x240
[   73.097619]  __might_sleep+0x4a/0x80
[   73.101191]  __kmalloc_node+0x1cf/0x2f0
[   73.105031]  ? cgroup_storage_update_elem+0x46/0x90
[   73.109909]  cgroup_storage_update_elem+0x46/0x90

cgroup_storage_update_elem() (like the other map update callbacks)
is called with preemption disabled, so a GFP_ATOMIC allocation
should be used, as alloc_htab_elem() in hashtab.c does.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: fix off-by-one error in adjust_subprog_starts · afd59424

Authored by Edward Cree on Nov 16, 2018

When patching in a new sequence for the first insn of a subprog, the start
of that subprog does not change (it's the first insn of the sequence), so
adjust_subprog_starts should check start <= off (rather than < off).
Also added a test to test_verifier.c (it's essentially the syz reproducer).
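The boundary condition can be sketched in userspace as follows. This is a simplified model, not the verifier code: `adjust_starts()` and the plain array of start offsets are hypothetical stand-ins for the verifier's subprog bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * The insn at index off was replaced by a sequence of len insns.
 * Every subprog starting after off moves by len - 1. A subprog that
 * starts exactly at off keeps its start, because off is still the
 * first insn of the patched sequence.
 */
static void adjust_starts(uint32_t *starts, size_t n, uint32_t off, uint32_t len)
{
    if (len == 1)
        return;                 /* one-for-one replacement: nothing moves */

    for (size_t i = 0; i < n; i++) {
        if (starts[i] <= off)   /* "<=" rather than "<" */
            continue;
        starts[i] += len - 1;
    }
}
```

For starts {0, 5, 9}, patching insn 5 with a 3-insn sequence leaves the first two starts unchanged and moves the third to 11. With `<` instead of `<=`, the subprog at 5 would wrongly move to 7.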

Fixes: cc8b0b92 ("bpf: introduce function calls (function boundaries)")
Reported-by: syzbot+4fc427c7af994b0948be@syzkaller.appspotmail.com
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

09 Nov, 2018 16 commits

kselftests/bpf: use ping6 as the default ipv6 ping binary when it exists · da85d8bf

Authored by Li Zhijian on Nov 05, 2018

Commit deee2cae ("kselftests/bpf: use ping6 as the default ipv6 ping
binary if it exists") fixed a similar issue for the shell script, but it
missed the same issue in the C code.

Fixes: 371e4fcc ("selftests/bpf: cgroup local storage-based network counters")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
CC: Philip Li <philip.li@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpftool: update references to other man pages in documentation · f98e46a2

Authored by Quentin Monnet on Nov 08, 2018

Update references to other bpftool man pages at the bottom of each
manual page. Also reference the "bpf(2)" and "bpf-helpers(7)" man pages.

References are sorted by man section number, then with the prog and map
pages first ("prog-and-map-go-first") and the other pages in alphabetical
order.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpftool: pass an argument to silence open_obj_pinned() · f120919f

Authored by Quentin Monnet on Nov 08, 2018

Function open_obj_pinned() prints error messages when it fails to open a
link in the BPF virtual file system. However, on some occasions it is
not desirable to print an error, for example when we parse all links
under the bpffs root, and the error is due to some paths actually being
symbolic links.

Example output:

    # ls -l /sys/fs/bpf/
    lrwxrwxrwx 1 root root 0 Oct 18 19:00 ip -> /sys/fs/bpf/tc/
    drwx------ 3 root root 0 Oct 18 19:00 tc
    lrwxrwxrwx 1 root root 0 Oct 18 19:00 xdp -> /sys/fs/bpf/tc/

    # bpftool --bpffs prog show
    Error: bpf obj get (/sys/fs/bpf): Permission denied
    Error: bpf obj get (/sys/fs/bpf): Permission denied

    # strace -e bpf bpftool --bpffs prog show
    bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/ip", bpf_fd=0}, 72) = -1 EACCES (Permission denied)
    Error: bpf obj get (/sys/fs/bpf): Permission denied
    bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/xdp", bpf_fd=0}, 72) = -1 EACCES (Permission denied)
    Error: bpf obj get (/sys/fs/bpf): Permission denied
    ...

To fix it, pass a bool as a second argument to the function, and prevent
it from printing an error when the argument is set to true.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpftool: fix plain output and doc for --bpffs option · a8bfd2bc

Authored by Quentin Monnet on Nov 08, 2018

Edit the documentation of the -f|--bpffs option to make it explicit that
it dumps paths of pinned programs when bpftool is used to list the
programs only, so that users do not believe they will see the name of
the newly pinned program with "bpftool prog pin" or "bpftool prog load".

Also fix the plain output: do not add a blank line after each program
block, in order to remain consistent with what bpftool does when the
option is not passed.

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools: bpftool: prevent infinite loop in get_fdinfo() · 53909030

Authored by Quentin Monnet on Nov 08, 2018

Function getline() returns -1 on failure to read a line, thus creating
an infinite loop in get_fdinfo() if the key is not found. Fix it by
calling the function only as long as we get a strictly positive return
value.

Found by copying the code for a key which is not always present...
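A minimal sketch of the loop condition, assuming a POSIX system: `stream_has_key()` is a hypothetical helper that scans an already opened stream, not bpftool's get_fdinfo().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Scan an already opened stream for a line that starts with key.
 * Returns 1 if such a line exists, 0 if the stream ends without a match.
 */
static int stream_has_key(FILE *fp, const char *key)
{
    char *line = NULL;
    size_t cap = 0;
    ssize_t n;
    int found = 0;

    /*
     * getline() returns -1 at end of file or on error. Since -1 is
     * non-zero, "while ((n = getline(...)))" would spin forever once
     * the input is exhausted; "> 0" ends the loop instead.
     */
    while ((n = getline(&line, &cap, fp)) > 0) {
        if (strncmp(line, key, strlen(key)) == 0) {
            found = 1;
            break;
        }
    }

    free(line);
    return found;
}
```

Calling it with a key that is absent from the stream returns 0 instead of looping.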

Fixes: 71bb428f ("tools: bpf: add bpftool")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tools/bpftool: copy a few net uapi headers to tools directory · 49a249c3

Authored by Yonghong Song on Nov 07, 2018

Commit f6f3bac0 ("tools/bpf: bpftool: add net support")
added certain networking support to bpftool.
The implementation relies on a relatively recent uapi header file
linux/tc_act/tc_bpf.h on the host which contains the macro
definition of TCA_ACT_BPF_ID.

Unfortunately, this is not the case for all distributions.
See the email message below where rhel-7.2 does not have
an up-to-date linux/tc_act/tc_bpf.h.
  https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1799211.html
Further investigation found that linux/pkt_cls.h is also needed for macro
TCA_BPF_TAG.

This patch fixes the issue by copying linux/tc_act/tc_bpf.h
and linux/pkt_cls.h from the kernel include/uapi directory to the
tools/include/uapi directory, so that building bpftool does not depend
on the host system for these files.

Fixes: f6f3bac0 ("tools/bpf: bpftool: add net support")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Cc: Li Zhijian <zhijianx.li@intel.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: Fix IPv6 dport byte order in bpf_sk_lookup_udp · b13b8787

Authored by Andrey Ignatov on Nov 07, 2018

Lookup functions in sk_lookup have different expectations about byte
order of provided arguments.

Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup
expect dport to be in network byte order and do ntohs(dport) internally.

At the same time __inet6_lookup expects dport to be in host byte order
and correspondingly names the argument hnum.

sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and
__inet6_lookup with regard to dport. But in the __udp6_lib_lookup case it
uses host instead of the expected network byte order. This makes the
result returned by bpf_sk_lookup_udp for IPv6 incorrect.

The patch fixes byte order of dport passed to __udp6_lib_lookup.
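The two calling conventions can be illustrated with a userspace sketch. The functions below are hypothetical stand-ins, not the kernel lookup helpers: one expects the port in network byte order and converts it internally, the other expects host byte order.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical socket table with a single listener; port kept in host order. */
static const uint16_t bound_port = 8080;

/* Convention A: the caller passes network byte order, the callee converts. */
static int lookup_net_order(uint16_t dport_be)
{
    return ntohs(dport_be) == bound_port;
}

/* Convention B: the caller passes host byte order (an "hnum" argument). */
static int lookup_host_order(uint16_t hnum)
{
    return hnum == bound_port;
}
```

Passing a host-order value to the first function only matches on machines where host and network byte order coincide. On little-endian hosts the lookup silently misses, which is the kind of wrong result the commit describes.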

Originally sk_lookup properly handled UDPv6, but not TCPv6. 5ef0ae84
fixes TCPv6 but breaks UDPv6.

Fixes: 5ef0ae84 ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

net: smsc95xx: Fix MTU range · 85b18b02

Authored by Stefan Wahren on Nov 08, 2018

Commit f77f0aee ("net: use core MTU range checking in USB NIC
drivers") introduced common MTU handling for usbnet, but it is missing
the necessary changes for smsc95xx. So set the MTU range accordingly.

This patch has been tested on a Raspberry Pi 3.

Fixes: f77f0aee ("net: use core MTU range checking in USB NIC drivers")
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Fix RX packet size > 8191 · 8137b6ef

Authored by Thor Thayer on Nov 08, 2018

Ping problems with packets > 8191 as shown:

PING 192.168.1.99 (192.168.1.99) 8150(8178) bytes of data.
8158 bytes from 192.168.1.99: icmp_seq=1 ttl=64 time=0.669 ms
wrong data byte 8144 should be 0xd0 but was 0x0
16    10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
      20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
%< ---------------snip--------------------------------------
8112  b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
      c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf
8144  0 0 0 0 d0 d1
      ^^^^^^^
Notice the 4 bytes of 0 before the expected byte of d0.

Databook notes that the RX buffer must be a multiple of 4/8/16
bytes [1].

Update the DMA Buffer size define to 8188 instead of 8192. Remove
the -1 from the RX buffer size allocations and use the new
DMA Buffer size directly.

[1] Synopsys DesignWare Cores Ethernet MAC Universal v3.70a
    [section 8.4.2 - Table 8-24]

Tested on SoCFPGA Stratix10 with ping sweep from 100 to 8300 byte packets.

Fixes: 286a8372 ("stmmac: add CHAINED descriptor mode support (V4)")
Suggested-by: Jose Abreu <jose.abreu@synopsys.com>
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qed-Slowpath-Queue-bug-fixes' · 81fe16e0

Authored by David S. Miller on Nov 08, 2018

Denis Bolotin says:

====================
qed: Slowpath Queue bug fixes

This patch series fixes several bugs in the SPQ mechanism.
It deals with SPQ entries management, preventing resource leaks, memory
corruptions and handles error cases throughout the driver.
Please consider applying to net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix potential memory corruption · fa5c448d

Authored by Sagiv Ozeri on Nov 08, 2018

A stuck ramrod should be deleted from the completion_pending list,
otherwise it will be added again in the future and corrupt the list.

Return error value to inform that ramrod is stuck and should be deleted.

Signed-off-by: Sagiv Ozeri <sagiv.ozeri@cavium.com>
Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix SPQ entries not returned to pool in error flows · fb5e7438

Authored by Denis Bolotin on Nov 08, 2018

qed_sp_destroy_request() API was added for SPQ users that need to
free/return the entry they acquired in their error flows.

Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix blocking/unlimited SPQ entries leak · 2632f22e

Authored by Denis Bolotin on Nov 08, 2018

When there are no SPQ entries left in the free_pool, new entries are
allocated and are added to the unlimited list. When an entry in the pool
is available, the content is copied from the original entry, and the new
entry is sent to the device. qed_spq_post() is not aware of that, so the
additional entry is stored in the original entry as p_post_ent, which can
later be returned to the pool.

Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix memory/entry leak in qed_init_sp_request() · 39477551

Authored by Denis Bolotin on Nov 08, 2018

Free the allocated SPQ entry or return the acquired SPQ entry to the free
list in error flows.

Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: frags: better deal with smp races · 0d5b9311

Authored by Eric Dumazet on Nov 08, 2018

Multiple cpus might attempt to insert a new fragment in rhashtable,
if for example RPS is buggy, as reported by 배석진 in
https://patchwork.ozlabs.org/patch/994601/

We use rhashtable_lookup_get_insert_key() instead of
rhashtable_insert_fast() to let cpus losing the race
free their own inet_frag_queue and use the one that
was inserted by another cpu.

Fixes: 648700f7 ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: 배석진 <soukjin.bae@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: bugfix for not checking return value · e12c2252

Authored by Huazhong Tan on Nov 08, 2018

hns3_reset_notify_init_enet() should return an error early if the return
value of hns3_restore_vlan() is not 0.

This patch adds checking for the return value of hns3_restore_vlan.

Fixes: 7fa6be4f ("net: hns3: fix incorrect return value/type of some functions")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Nov, 2018 11 commits

qlcnic: remove assumption that vlan_tci != 0 · b25ddb00

Authored by Michał Mirosław on Nov 07, 2018

VLAN.TCI == 0 is perfectly valid (802.1p), so allow it to be accelerated.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: fix accelerated VLAN handling · e84b4794

Authored by Michał Mirosław on Nov 07, 2018

Don't request tag insertion when it isn't present in outgoing skb.

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'FDDI-defza-Fix-a-bunch-of-small-issues' · b1870a6d

Authored by David S. Miller on Nov 07, 2018

Maciej W. Rozycki says:

====================
FDDI: defza: Fix a bunch of small issues

 Here is a bunch of small fixes addressing issues that I missed in my
final round of testing.  None of these affect run-time behaviour.  One was
actually found by the kbuild bot, which turned out to be more pedantic
than my compiler.  See individual change descriptions for details.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

FDDI: defza: Make the driver version string constant · 8f5365eb

Authored by Maciej W. Rozycki on Nov 07, 2018

The driver version string is obviously not meant to be changed at run
time, so mark it `const'.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

FDDI: defza: Move SMT Tx data buffer declaration next to its skb · 04453b6b

Authored by Maciej W. Rozycki on Nov 07, 2018

Move the temporary data buffer used when tapping into the SMT Tx queue
from the outer function level into the conditional block where it is
actually used and where its containing skb is also declared, improving
the structure of the code.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

FDDI: defza: Add missing comment closing · 5f5fae37

Authored by Maciej W. Rozycki on Nov 07, 2018

Fix:

drivers/net/fddi/defza.h:238:1: warning: "/*" within comment [-Wcomment]

by adding a missing comment closing.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

FDDI: defza: Fix SPDX annotation · 96ed82cc

Authored by Maciej W. Rozycki on Nov 07, 2018

The SPDX annotation for this driver does not match the license text,
which specifies GNU GPL 2 or later.  Make the two match by correcting
the SPDX tag.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue · 69e36298

Authored by David S. Miller on Nov 07, 2018

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-11-07

This series contains fixes to igb, i40e and ice drivers.

Anirudh fixes an issue during rebuild of the ice driver, where we need
to set the carrier state, as well as start or stop the queues all based
on the link status.  Removed functions that were duplicating current
functionality in the VSI rebuild/replay framework.

Dave fixes a potential resource collision during the remove path, so add
a check to see if we are in the middle of a reset.  Fixed the remove
path to ensure we call netif_napi_del() to free vectors before we set
vsi->netdev to NULL.

Akeem fixes an issue where setting the receive or transmit pause
parameter results in link loss on the interface.  Fixed the spelling of
"Enabling" in error message.

Victor fixes potential memory leak by also freeing the related VSI
contexts in the unload path.

Md Fahad fixes a flag during port VLAN insertion, which was not being
set properly.

Brett fixes a transmit timeout during stress caused by the hardware tail
and software tail being incorrectly out of sync.

Miroslav Lichvar fixes the igb PHC timecounter update interval to be
sure the timecounter is updated in time.

Chinh fixes the req_speeds variable to be u16 instead of u8 so that it
can handle all the link speeds.

Jake fixes i40e to add back the missing feature flags, which was causing
IP-in-IP offloads to be reported as not supported.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: enable NETIF_F_NTUPLE and NETIF_F_HW_TC at driver load · d5596fd4

Authored by Jacob Keller on Oct 29, 2018

The assignment of the feature flag NETIF_F_NTUPLE and NETIF_F_HW_TC
occurs prior to the initial setup of the local hw_features variable.

This means the features are set as user-changeable, but are not set in
the currently active feature list. This results in the features being
disabled at the driver's initial load.

Move the assignment after the initial assignment of hw_features, and
assign to the local variable. This ensures that NETIF_F_NTUPLE and
NETIF_F_HW_TC are marked as user-changeable, and also enables them by
default when the driver loads.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: restore NETIF_F_GSO_IPXIP[46] to netdev features · ba766b8b

Authored by Jacob Keller on Oct 29, 2018

Since commit bacd75cf ("i40e/i40evf: Add capability exchange for
outer checksum", 2017-04-06) the i40e driver has not reported support
for IP-in-IP offloads. This likely occurred due to a bad rebase, as the
commit extracts hw_enc_features into its own variable. As part of this
change, it dropped the NETIF_F_GSO_IPXIP flags from the
netdev->hw_enc_features. This was unfortunately not caught during code
review.

Fix this by adding back the missing feature flags.

For reference, NETIF_F_GSO_IPXIP4 was added in commit 7e13318d
("net: define gso types for IPx over IPv4 and IPv6", 2016-05-20),
replacing NETIF_F_GSO_IPIP and NETIF_F_GSO_SIT.

NETIF_F_GSO_IPXIP6 was added in commit bf2d1df3 ("intel: Add support
for IPv6 IP-in-IP offload", 2016-05-20).

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Change req_speeds to be u16 · ffe49823

Authored by Chinh T Cao on Nov 05, 2018

Since the req_speeds field in struct ice_link_status is a u8,
req_speeds & ICE_AQ_LINK_SPEED_40GB always returns 0. This was caught
by a coverity scan.
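The truncation described above can be reproduced in a few lines of userspace C. The sketch assumes the 40G flag sits at bit 8, which is an assumption for illustration only. `LINK_SPEED_40GB` and both helper functions are hypothetical names, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: the 40G speed flag occupies bit 8. */
#define LINK_SPEED_40GB (1u << 8)

/* Store the requested speeds in an 8-bit field, then test the flag. */
static int sees_40g_in_u8(unsigned int speeds)
{
    uint8_t field = (uint8_t)speeds;    /* bit 8 and above are dropped */

    return (field & LINK_SPEED_40GB) != 0;
}

/* Same test with a 16-bit field, which has room for bit 8. */
static int sees_40g_in_u16(unsigned int speeds)
{
    uint16_t field = (uint16_t)speeds;

    return (field & LINK_SPEED_40GB) != 0;
}
```

Whatever value is stored, the 8-bit variant can never report the flag, while the 16-bit variant does.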

Fix this by changing req_speeds to be u16.

Reported-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

07 Nov, 2018 9 commits

igb: shorten maximum PHC timecounter update interval · 4c9b658e

Authored by Miroslav Lichvar on Oct 26, 2018

The timecounter needs to be updated at least once per ~550 seconds in
order to prevent a 40-bit SYSTIM timestamp from being misinterpreted as
an old timestamp.

Since commit 500462a9 ("timers: Switch to a non-cascading wheel"),
scheduling of delayed work seems to be less accurate and a requested
delay of 540 seconds may actually be longer than 550 seconds. Also, the
PHC may be adjusted to run up to 6% faster than real time and the system
clock up to 10% slower. Shorten the delay to 360 seconds to be sure the
timecounter is updated in time.

This fixes an issue with HW timestamps on 82580/I350/I354 being off by
~1100 seconds for a few seconds every ~9 minutes.
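The numbers in this message can be checked with a short calculation. The sketch assumes SYSTIM counts nanoseconds, so a 40-bit value wraps after 2^40 ns (about 1100 s) and the update must land within half of that (about 550 s). It also reads "system clock up to 10% slower" as the delay stretching by 1/0.9 in real time, and it ignores the timer inaccuracy mentioned above. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC   1000000000ULL
/* Assumption: SYSTIM counts nanoseconds, so 40 bits wrap after 2^40 ns. */
#define SYSTIM_WRAP_NS (1ULL << 40)

/*
 * Worst-case PHC time, in ns, elapsed during a requested delay of
 * delay_s seconds when the system clock runs 10% slow and the PHC
 * runs 6% fast: delay / 0.90 * 1.06.
 */
static uint64_t worst_case_phc_ns(uint64_t delay_s)
{
    return delay_s * NSEC_PER_SEC * 106 / 90;
}

/* The update must land within half a wrap period (about 550 s). */
static int delay_is_safe(uint64_t delay_s)
{
    return worst_case_phc_ns(delay_s) < SYSTIM_WRAP_NS / 2;
}
```

Under these assumptions a 540 s delay can stretch to 636 s of PHC time, beyond the half-wrap limit, while 360 s stretches to 424 s and leaves margin for scheduling inaccuracy.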

Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Fix the bytecount sent to netdev_tx_sent_queue · d944b469

Authored by Brett Creeley on Oct 26, 2018

Currently if the driver does a TSO offload the bytecount sent to
netdev_tx_sent_queue will be incorrect. This is because in ice_tso we
overwrite the initial value that we set in ice_tx_map. This creates a
mismatch between the Tx and Tx clean flow. In the Tx clean flow we
calculate the bytecount (called total_bytes) as we clean the
descriptors so the value used in the Tx clean path is correct. Fix this
by using += in ice_tso instead of =. This fixes the mismatch in
bytecount mentioned above.
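The difference between `=` and `+=` can be shown with a simplified model. The struct, the function names, and the "(segments - 1) * header length" formula are assumptions for illustration, not the driver's definitions.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the driver's first Tx buffer. */
struct tx_buf {
    unsigned int bytecount;
    unsigned int gso_segs;
};

/* Earlier Tx step: record the length of the skb handed over by the stack. */
static void tx_init_bytecount(struct tx_buf *first, unsigned int skb_len)
{
    first->bytecount = skb_len;
    first->gso_segs = 1;
}

/*
 * TSO step: every extra segment repeats the headers on the wire.
 * Assumption: the extra bytes are (segs - 1) * hdr_len.
 */
static void tso_account(struct tx_buf *first, unsigned int segs,
                        unsigned int hdr_len)
{
    first->gso_segs = segs;
    /* "+=" keeps the initial skb length; "=" would discard it. */
    first->bytecount += (segs - 1) * hdr_len;
}
```

For a 3000-byte skb split into 3 segments with 54-byte headers, the accumulated count is 3108 bytes. Plain assignment would have reported only 108.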
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Fix tx_timeout in PF driver · c585ea42

Authored by Brett Creeley on Oct 26, 2018

Prior to this commit the driver was running into tx_timeouts when a
queue was stressed enough. This was happening because the HW tail
and SW tail (NTU) were incorrectly out of sync. Consequently this was
causing the HW head to collide with the HW tail, which to the hardware
means that all descriptors posted for Tx have been processed.

Due to the Tx logic used in the driver SW tail and HW tail are allowed
to be out of sync. This is done as an optimization because it allows the
driver to write HW tail as infrequently as possible, while still
updating the SW tail index to keep track. However, there are situations
where this results in the tail never getting updated, resulting in Tx
timeouts.

Tx HW tail write condition:
	if (netif_xmit_stopped(txring_txq(tx_ring)) || !skb->xmit_more)
		writel(sw_tail, tx_ring->tail);

An issue was found in the Tx logic that was causing the aforementioned
condition for updating HW tail to never happen, causing tx_timeouts.

In ice_xmit_frame_ring we calculate how many descriptors we need for the
Tx transaction based on the skb the kernel hands us. This is then passed
into ice_maybe_stop_tx along with some extra padding to determine if we
have enough descriptors available for this transaction. If we don't then
we return -EBUSY to the stack, otherwise we move on and eventually
prepare the Tx descriptors accordingly in ice_tx_map and set
next_to_watch. In ice_tx_map we make another call to ice_maybe_stop_tx
with a value of MAX_SKB_FRAGS + 4. The key here is that this value is
possibly less than the value we sent in the first call to
ice_maybe_stop_tx in ice_xmit_frame_ring. Now, if the number of unused
descriptors is between MAX_SKB_FRAGS + 4 and the value used in the first
call to ice_maybe_stop_tx in ice_xmit_frame_ring then we do not update
the HW tail because of the "Tx HW tail write condition" above. This is
because in ice_maybe_stop_tx we return success from ice_maybe_stop_tx
instead of calling __ice_maybe_stop_tx and subsequently calling
netif_stop_subqueue, which sets the __QUEUE_STATE_DEV_XOFF bit. This
bit is then checked in the "Tx HW tail write condition" by calling
netif_xmit_stopped and subsequently updating HW tail if the
aforementioned bit is set.

In ice_clean_tx_irq, if next_to_watch is not NULL, we end up cleaning
the descriptors that HW sets the DD bit on and we have the budget. The
HW head will eventually run into the HW tail in response to the
description in the paragraph above.

The next time through ice_xmit_frame_ring we make the initial call to
ice_maybe_stop_tx with another skb from the stack. This time we do not
have enough descriptors available and we return NETDEV_TX_BUSY to the
stack and end up setting next_to_watch to NULL.

This is where we are stuck. In ice_clean_tx_irq we never clean anything
because next_to_watch is always NULL and in ice_xmit_frame_ring we never
update HW tail because we already return NETDEV_TX_BUSY to the stack and
eventually we hit a tx_timeout.

This issue was fixed by making sure that the second call to
ice_maybe_stop_tx in ice_tx_map is passed a value that is >= the value
that was used on the initial call to ice_maybe_stop_tx in
ice_xmit_frame_ring. This was done by adding the following defines to
make the logic more clear and to reduce the chance of mucking this up
again:

ICE_CACHE_LINE_BYTES		64
ICE_DESCS_PER_CACHE_LINE	(ICE_CACHE_LINE_BYTES / \
				 sizeof(struct ice_tx_desc))
ICE_DESCS_FOR_CTX_DESC		1
ICE_DESCS_FOR_SKB_DATA_PTR	1

The ICE_CACHE_LINE_BYTES being 64 is an assumption being made so we
don't have to figure this out on every pass through the Tx path. Instead
I added a sanity check in ice_probe to verify cache line size and print
a message if it's not 64 Bytes. This will make it easier to file issues
if they are seen when the cache line size is not 64 Bytes when reading
from the GLPCI_CNF2 register.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Fix napi delete calls for remove · 25525b69

Authored by Dave Ertman on Oct 26, 2018

In the remove path, the vsi->netdev is being set to NULL before the call
to free vectors. This is causing the netif_napi_del call to never be made.

Add a call to ice_napi_del to the same location as the calls to
unregister_netdev and just prior to them. This will use the reverse flow
as the register and netif_napi_add calls.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Fix typo in error message · 31082519

Authored by Anirudh Venkataramanan on Oct 26, 2018

Print should say "Enabling" instead of "Enaabling"

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Fix flags for port VLAN · 58297dd1

Authored by Md Fahad Iqbal Polash on Oct 26, 2018

According to the spec, whenever insert PVID field is set, the VLAN
driver insertion mode should be set to 01b which isn't done currently.
Fix it.

Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Remove duplicate addition of VLANs in replay path · 9ecd25c2

Authored by Anirudh Venkataramanan on Oct 26, 2018

ice_restore_vlan and active_vlans were originally put in place to
reprogram VLAN filters in the replay path. This is now done as part
of the much broader VSI rebuild/replay framework. So remove both
ice_restore_vlan and active_vlans.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Free VSI contexts during for unload · 33e055fc

Authored by Victor Raj on Oct 26, 2018

In the unload path, all VSIs are freed. Also free the related VSI
contexts to prevent memory leaks.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ice: Fix dead device link issue with flow control · 0f5d4c21

Authored by Akeem G Abodunrin on Oct 26, 2018

Setting Rx or Tx pause parameter currently results in link loss on the
interface, requiring the platform/host to be cold power cycled. Fix it.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
