Commits · cc8e9ebf952699cb6870f1366a4920d05b036e31 · openeuler / Kernel

Aug 29, 2016 · 5 commits

net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ · cc8e9ebf

Committed by Eran Ben Elisha on Aug 29, 2016

The driver RQ has two possible configurations: striding RQ and
non-striding RQ.  Until this patch, the driver always reported the
number of hardware WQEs (ring descriptors). For the non-striding RQ
configuration, this was fine since there is one WQE per pending packet.
For striding RQ, multiple packets can fit into one WQE. For a better
user experience, we normalize the rx_pending parameter in the striding
RQ case so that it reports the average ring size in packets, scaling
the WQE count by (size of WQE / MTU).
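
The normalization described in this message can be illustrated with a small model. This is a sketch, not the driver's code: the function name, the 256 KB WQE size and the 1500-byte MTU are assumptions chosen only to make the arithmetic concrete.

```python
# Illustrative model only; names and numbers are assumptions, not mlx5e code.

def wqes_to_rx_pending(num_wqes: int, wqe_bytes: int, mtu: int) -> int:
    """Convert a count of RQ WQEs into an approximate count of packets."""
    # One WQE can hold roughly wqe_bytes / mtu full-sized packets.
    packets_per_wqe = max(1, wqe_bytes // mtu)
    return num_wqes * packets_per_wqe


# Non-striding RQ: one WQE per packet, so the WQE count is reported unchanged.
non_striding = wqes_to_rx_pending(1024, 1500, 1500)

# Striding RQ: 16 WQEs of 256 KB each hold many MTU-sized packets.
striding = wqes_to_rx_pending(16, 256 * 1024, 1500)

print(non_striding, striding)
```

With these assumed numbers, reporting the raw WQE count (16) would badly understate how many packets the ring can hold, which is the user-experience problem the message refers to.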

Fixes: 461017cb ('net/mlx5e: Support RX multi-packet WQE ...')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Don't wait for SQ completions on close · 6e8dd6d6

Committed by Saeed Mahameed on Aug 29, 2016

Instead of asking the firmware to flush the SQ (Send Queue) via
asynchronous completions when moved to error, we handle the SQ flush
manually (mlx5e_free_tx_descs), the same as we did when the SQ flush
timed out or on tx_timeout.

This will reduce SQ flush time and speed up the interface down procedure.

Moved mlx5e_free_tx_descs to the end of en_tx.c for tx
critical code locality.

Fixes: 29429f33 ('net/mlx5e: Timeout if SQ doesn't flush during close')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Don't post fragmented MPWQE when RQ is disabled · 8484f9ed

Committed by Saeed Mahameed on Aug 29, 2016

ICO (Internal control operations) SQ (Send Queue) is closed/disabled
after RQ (Receive Queue).  After RQ is closed an ICO SQ completion
might post a fragmented MPWQE (Multi Packet Work Queue Element) into
that RQ.

As on a regular RQ post, check whether we are allowed to post to that
RQ (i.e. the RQ is enabled). Clean up any in-progress UMR MPWQE in
mlx5e_free_rx_descs if needed.

Fixes: bc77b240 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Don't wait for RQ completions on close · f2fde18c

Committed by Saeed Mahameed on Aug 29, 2016

This will significantly reduce receive queue flush time on interface
down.

Instead of asking the firmware to flush the RQ (Receive Queue) via
asynchronous completions when moved to error, we handle RQ flush
manually (mlx5e_free_rx_descs) same as we did when RQ flush got timed
out.

This will reduce RQ flush time and speed up the interface down procedure
(ifconfig down) from 6 sec to 0.3 sec on a 48-core system.

Moved mlx5e_free_rx_descs to en_main.c where it is needed, to keep en_rx.c
free from non-critical data path code for better code locality.

Fixes: 6cd392a0 ('net/mlx5e: Handle RQ flush in error cases')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Limit UMR length to the device's limitation · fe4c988b

Committed by Saeed Mahameed on Aug 29, 2016

The ConnectX-4 UMR (User Memory Region) MTT translation table offset in a
WQE is limited to U16_MAX. Before this patch we ignored that limitation and
requested the maximum possible UMR translation length that the netdev
might need (MAX channels * MAX pages per channel).
On a system with more than 32 cores, when linear WQE allocation fails,
falling back to UMR WQEs causes the RQ (Receive Queue) to get stuck.

Here we limit the UMR length to min(U16_MAX, max required pages) (while
considering the required alignments) on driver load. By default U16_MAX is
sufficient, since the default RX rings value guarantees that we are in
range. Dynamically (on set_ringparam/set_channels) we check whether the
new required UMR length (num mtts) is still in range and, if not, fail the
request.
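
The clamping and the range check described in this message can be sketched as follows. This is a simplified model under stated assumptions: the function names and the per-channel page count are hypothetical, and the alignment handling mentioned in the message is deliberately left out.

```python
# Simplified model; function names and example numbers are hypothetical,
# and the alignment requirements mentioned in the commit are ignored here.

U16_MAX = 0xFFFF  # largest MTT offset a WQE can express, per the commit text


def umr_length_limit(max_required_mtts: int) -> int:
    """UMR length requested at driver load: never more than U16_MAX."""
    return min(U16_MAX, max_required_mtts)


def ring_request_in_range(num_channels: int, mtts_per_channel: int) -> bool:
    """Check used on set_ringparam/set_channels: fail if out of range."""
    return num_channels * mtts_per_channel <= U16_MAX


# Assumed example: 2048 MTTs per channel.
print(umr_length_limit(48 * 2048))        # clamped to U16_MAX
print(ring_request_in_range(16, 2048))    # fits
print(ring_request_in_range(32, 2048))    # 65536 entries, one too many
```

The second check is what turns a silently stuck RQ into an explicit failure of the configuration request.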

Fixes: bc77b240 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Aug 27, 2016 · 5 commits

rhashtable: fix a memory leak in alloc_bucket_locks() · 9dbeea7f

Committed by Eric Dumazet on Aug 26, 2016

If vmalloc() was successful, do not attempt a kmalloc_array().

Fixes: 4cf0b354 ("rhashtable: avoid large lock-array allocations")
Reported-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Florian Westphal <fw@strlen.de>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: fix potential stack corruption from running past stat bitmask · e70c70c3

Committed by Andrew Rybchenko on Aug 26, 2016

On 32-bit systems, mask is only an array of 3 longs, not 4, so don't try
to write to mask[3].
Also include build-time checks in case the size of the bitmask changes.
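
The size difference between 32-bit and 64-bit builds comes from how many longs a bitmap needs. The sketch below only illustrates that ceiling division; the stat count of 90 is a hypothetical value picked so that a 32-bit build needs 3 longs, as the message states, and is not taken from the sfc driver.

```python
# Illustration of bitmap sizing; STAT_COUNT is a hypothetical value.

def bits_to_longs(nbits: int, bits_per_long: int) -> int:
    """Number of longs needed to hold a bitmap of nbits (ceiling division)."""
    return (nbits + bits_per_long - 1) // bits_per_long


STAT_COUNT = 90  # assumption: any value from 65 to 96 gives 3 longs on 32-bit

longs_32 = bits_to_longs(STAT_COUNT, 32)
longs_64 = bits_to_longs(STAT_COUNT, 64)
highest_valid_index_32 = longs_32 - 1

print(longs_32, longs_64, highest_valid_index_32)
```

With 3 longs the last valid index is 2, so a write to mask[3] lands past the end of the array on the stack.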

Fixes: 3c36a2ad ("sfc: display vadaptor statistics for all interfaces")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 5c1f5b45

Committed by David S. Miller on Aug 26, 2016

Johan Hedberg says:

====================
pull request: bluetooth 2016-08-25

Here are a couple of important Bluetooth fixes for the 4.8 kernel:

 - Memory leak fix for HCI requests
 - Fix sk_filter handling with L2CAP
 - Fix sock_recvmsg behavior when MSG_TRUNC is not set

Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

team: loadbalance: push lacpdus to exact delivery · c15e07b0

Committed by Jiri Pirko on Aug 25, 2016

When team is in a bridge and LACP is utilized, LACPDU packets are pushed
to userspace using a raw socket and are processed there. However,
since 8626c56c, LACPDU skbs are dropped by the bridge rx_handler, so
they never reach the packet handlers in the rx path. Fix this by explicitly
marking LACPDUs for exact delivery in the team rx_handler.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 8626c56c ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns: dereference ppe_cb->ppe_common_cb if it is non-null · c234af58

Committed by Colin Ian King on Aug 25, 2016

ppe_cb->ppe_common_cb is being dereferenced before a null check is
being made on it.  If ppe_cb->ppe_common_cb is null then we end up
with a null pointer dereference when assigning dsaf_dev.  Fix this
by moving the initialisation of dsaf_dev to the point where we know
ppe_cb->ppe_common_cb is OK to dereference.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Aug 26, 2016 · 8 commits

8139cp: Fix one possible deadloop in cp_rx_poll · b628d611

Committed by Gao Feng on Aug 25, 2016

When cp_rx_poll does not receive enough packets, it checks the rx
interrupt status again. If an interrupt is pending, it jumps to
rx_status_loop again. But the goto jump also resets the rx variable to zero.

As a result, an endless loop is possible. Assume that rx_status_loop
only gets a packet count that is less than budget, and that the
(cpr16(IntrStatus) & cp_rx_intr_mask) condition is always true.
The loop then never terminates and the system is blocked.
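
The loop described in this message can be modelled in a few lines. This is a toy simulation, not the driver: the function and parameter names are hypothetical, the interrupt status is assumed to be always pending, and a pass limit stands in for "never terminates". Keeping the count across passes is shown as one way to avoid the problem; the message itself does not spell out the exact change.

```python
# Toy model of the poll loop; all names are hypothetical, not 8139cp code.

def cp_rx_poll_model(budget, packets_per_pass, reset_rx_each_pass,
                     max_passes=1000):
    """Return the number of passes until budget is reached, or None."""
    rx = 0
    for passes in range(1, max_passes + 1):
        if reset_rx_each_pass:
            rx = 0  # models the goto landing before the "rx = 0" statement
        # Each pass receives a few packets, never exceeding the budget.
        rx += min(packets_per_pass, budget - rx)
        if rx >= budget:
            return passes  # budget exhausted, the poll function returns
        # rx < budget and the interrupt status is still set: loop again.
    return None  # did not terminate within max_passes


print(cp_rx_poll_model(16, 4, reset_rx_each_pass=False))  # terminates
print(cp_rx_poll_model(16, 4, reset_rx_each_pass=True))   # never reaches budget
```

When the counter is reset on every pass, it can never accumulate up to the budget, so the only exit condition left is the interrupt status clearing, which the scenario assumes never happens.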

Signed-off-by: Gao Feng <fgao@ikuai8.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: Change some init flow for the client · f38ff2ee

Committed by Anjali Singhai Jain on Aug 24, 2016

This change makes a common flow for Client instance open during init
and reset path. The Client subtask can handle both the cases instead of
making a separate notify_client_of_open call.
Also it may fix a bug during reset where the service task was leaking
some memory and causing issues.

Change-Id: I7232a32fd52b82e863abb54266fa83122f80a0cd
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "phy: IRQ cannot be shared" · c3e70edd

Committed by Xander Huff on Aug 24, 2016

This reverts:
  commit 33c133cc ("phy: IRQ cannot be shared")

On hardware with multiple PHY devices hooked up to the same IRQ line, allow
them to share it.

Sergei Shtylyov says:
  "I'm not sure now what was the reason I concluded that the IRQ sharing
  was impossible... most probably I thought that the kernel IRQ handling
  code exited the loop over the IRQ actions once IRQ_HANDLED was returned
  -- which is obviously not so in reality..."

Signed-off-by: Xander Huff <xander.huff@ni.com>
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: bcm_sf2: Fix race condition while unmasking interrupts · 4f101c47

Committed by Florian Fainelli on Aug 24, 2016

We kept shadow copies of which interrupt sources we have enabled and
disabled, but due to an order bug in how intrl2_mask_clear was defined,
we could run into the following scenario:

CPU0					CPU1
intrl2_1_mask_clear(..)
sets INTRL2_CPU_MASK_CLEAR
					bcm_sf2_switch_1_isr
					read INTRL2_CPU_STATUS and masks with stale
					irq1_mask value
updates irq1_mask value

Which would make us loop again and again trying to process an interrupt
we are not clearing, since our copy of whether it was enabled before
still indicates it was not. Fix this by updating the shadow copy first,
and then unmasking at the HW level.

Fixes: 246d7f77 ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qdisc: fix a module refcount leak in qdisc_create_dflt() · 166ee5b8

Committed by Eric Dumazet on Aug 24, 2016

Should qdisc_alloc() fail, we must release the module refcount
we got right before.

Fixes: 6da7c8fc ("qdisc: allow setting default queuing discipline")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix the error handling in tipc_udp_enable() · a5de125d

Committed by Wei Yongjun on Aug 24, 2016

Fix to return a negative error code in enable_mcast() error handling
case, and release udp socket when necessary.

Fixes: d0f91938 ("tipc: add ip/udp media type")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Bluetooth: Fix hci_sock_recvmsg when MSG_TRUNC is not set · 4f34228b

Committed by Luiz Augusto von Dentz on Aug 15, 2016

Similar to bt_sock_recvmsg, MSG_TRUNC shall be checked using the original
flags, not msg_flags.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Fix bt_sock_recvmsg when MSG_TRUNC is not set · 90a56f72

Committed by Luiz Augusto von Dentz on Aug 12, 2016

Commit b5f34f94 attempted to introduce proper handling for MSG_TRUNC,
but recv and its variants should still work like read if no flag is
passed. Because the code may set MSG_TRUNC in msg->msg_flags, that field
shall not be used for the check, as it may cause the call to behave as
if MSG_TRUNC were always set. Instead, this change makes the code
use the flags parameter, which contains the original flags.
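
The distinction between the caller's flags and msg_flags can be shown with a small model of the return-value logic. This is a sketch under assumptions: the function name is hypothetical, and MSG_TRUNC is defined locally with its Linux value (0x20) so the example is self-contained.

```python
# Model of the return-length decision; not the Bluetooth socket code itself.

MSG_TRUNC = 0x20  # Linux value, defined locally to keep the sketch standalone


def recv_return_len(skb_len: int, buf_len: int, flags: int):
    """Return (bytes reported to the caller, resulting msg_flags)."""
    copied = min(skb_len, buf_len)
    msg_flags = 0
    if copied < skb_len:
        msg_flags |= MSG_TRUNC  # kernel reports that data was truncated
    # Decide on the caller's original flags, not on msg_flags.
    if flags & MSG_TRUNC:
        copied = skb_len        # caller asked for the real packet length
    return copied, msg_flags


print(recv_return_len(100, 10, 0))          # behaves like read()
print(recv_return_len(100, 10, MSG_TRUNC))  # reports the full length
```

If the check used msg_flags instead, the first call would also report 100, because the truncation itself sets MSG_TRUNC there.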

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Aug 25, 2016 · 2 commits

mlxsw: router: Enable neighbors to be created on stacked devices · 51af96b5

Committed by Yotam Gigi on Aug 24, 2016

Make the function mlxsw_router_neigh_construct search for the rif according
to the neighbour dev rather than the dev that was passed to the ndo, thus
allowing neighbours to be created on stacked devices.

Fixes: 6cf3c971 ("mlxsw: spectrum_router: Add private neigh table")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Add missing flood to router port · f888f587

Committed by Ido Schimmel on Aug 24, 2016

In case we have a layer 3 interface on top of a bridge (VLAN / FID RIF),
then we should flood the following packet types to the router:

* Broadcast: If DIP is the broadcast address of the interface, then we
need to be able to get it to CPU by trapping it following route lookup.

* Reserved IP multicast (224.0.0.X): Some control packets (e.g. OSPF)
use this range and are trapped in the router block.

Fixes: 99f44bb3 ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Aug 24, 2016 · 14 commits

Bluetooth: split sk_filter in l2cap_sock_recv_cb · dbb50887

Committed by Daniel Borkmann on Jul 27, 2016

During an audit for sk_filter(), we found that rx_busy_skb handling
in l2cap_sock_recv_cb() and l2cap_sock_recvmsg() looks not quite as
intended.

The assumption from commit e328140f ("Bluetooth: Use event-driven
approach for handling ERTM receive buffer") is that errors returned
from sock_queue_rcv_skb() are due to receive buffer shortage. However,
nothing should prevent doing a setsockopt() with SO_ATTACH_FILTER on
the socket, that could drop some of the incoming skbs when handled in
sock_queue_rcv_skb().

In that case sock_queue_rcv_skb() will return with -EPERM, propagated
from sk_filter() and if in L2CAP_MODE_ERTM mode, wrong assumption was
that we failed due to receive buffer being full. From that point onwards,
due to the to-be-dropped skb being held in rx_busy_skb, we cannot make
any forward progress as rx_busy_skb is never cleared from l2cap_sock_recvmsg(),
due to the filter drop verdict over and over coming from sk_filter().
Meanwhile, in l2cap_sock_recv_cb() all new incoming skbs are being
dropped due to rx_busy_skb being occupied.

Instead, just use __sock_queue_rcv_skb() where an error really tells that
there's a receive buffer issue. Split the sk_filter() and enable it for
non-segmented modes at queuing time since at this point in time the skb has
already been through the ERTM state machine and it has been acked, so dropping
is not allowed. Instead, for ERTM and streaming mode, call sk_filter() in
l2cap_data_rcv() so the packet can be dropped before the state machine sees it.

Fixes: e328140f ("Bluetooth: Use event-driven approach for handling ERTM receive buffer")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Fix memory leak at end of hci requests · 9afee949

Committed by Frederic Dalleau on Aug 23, 2016

In hci_req_sync_complete the event skb is referenced in hdev->req_skb.
It is used (via hci_req_run_skb) from either __hci_cmd_sync_ev which will
pass the skb to the caller, or __hci_req_sync which leaks.

unreferenced object 0xffff880005339a00 (size 256):
  comm "kworker/u3:1", pid 1011, jiffies 4294671976 (age 107.389s)
  backtrace:
    [<ffffffff818d89d9>] kmemleak_alloc+0x49/0xa0
    [<ffffffff8116bba8>] kmem_cache_alloc+0x128/0x180
    [<ffffffff8167c1df>] skb_clone+0x4f/0xa0
    [<ffffffff817aa351>] hci_event_packet+0xc1/0x3290
    [<ffffffff8179a57b>] hci_rx_work+0x18b/0x360
    [<ffffffff810692ea>] process_one_work+0x14a/0x440
    [<ffffffff81069623>] worker_thread+0x43/0x4d0
    [<ffffffff8106ead4>] kthread+0xc4/0xe0
    [<ffffffff818dd38f>] ret_from_fork+0x1f/0x40
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Frédéric Dalleau <frederic.dalleau@collabora.co.uk>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

net: diag: Fix refcnt leak in error path destroying socket · d7226c7a

Committed by David Ahern on Aug 23, 2016

inet_diag_find_one_icsk takes a reference to a socket that is not
released if sock_diag_destroy returns an error. Fix by changing
tcp_diag_destroy to manage the refcnt for all cases and remove
the sock_put calls from tcp_abort.

Fixes: c1e64e29 ("net: diag: Support destroying TCP sockets")
Reported-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: fix transmit timestamp support · 7b996243

Committed by Soheil Hassas Yeganeh on Aug 23, 2016

Instead of using sock_tx_timestamp, use skb_tx_timestamp to record
software transmit timestamp of a packet.

sock_tx_timestamp resets and overrides the tx_flags of the skb.
The function is intended to be called from within the protocol
layer when creating the skb, not from a device driver. This is
inconsistent with other drivers and will cause issues for TCP.

In TCP, we intend to sample the timestamps for the last byte
for each sendmsg/sendpage. For that reason, tcp_sendmsg calls
tcp_tx_timestamp only with the last skb that it generates.
For example, if a 128KB message is split into two 64KB packets
we want to sample the SND timestamp of the last packet. The current
code in the tun driver, however, will result in sampling the SND
timestamp for both packets.

Also, when the last packet is split into smaller packets for
retransmission (see tcp_fragment), the tun driver will record
timestamps for all of the retransmitted packets and not only the
last packet.

Fixes: eda29772 (tun: Support software transmit time stamping.)
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Francis Yan <francisyyan@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: get rid of SLAB_DESTROY_BY_RCU allocations · 75d855a5

Committed by Eric Dumazet on Aug 23, 2016

After commit ca065d0c ("udp: no longer use SLAB_DESTROY_BY_RCU")
we do not need this special allocation mode anymore, even if it is
harmless.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: fix overrun in sctp_diag_dump_one() · 232cb53a

Committed by Lance Richardson on Aug 23, 2016

The function sctp_diag_dump_one() currently performs a memcpy()
of 64 bytes from a 16 byte field into another 16 byte field. Fix
this by using sizeof to obtain the correct size instead of a
hard-coded constant.

Fixes: 8f840e47 ("sctp: add the sctp_diag.c file")
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dwc_eth_qos: fix interrupt enable race · a8184003

Committed by Rabin Vincent on Aug 23, 2016

We currently enable interrupts before we enable NAPI. If an RX interrupt
hits before we have enabled NAPI, the NAPI callback is never called and
we leave the hardware with RX interrupts disabled, which of course means
received packets are never handled. Fix this by moving the interrupt
enable to after we've enabled NAPI and the reclaim tasklet.

Fixes: cd5e4123 ("dwc_eth_qos: do phy_start before resetting hardware")
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Lars Persson <larper@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: lpc_eth: Check clk_prepare_enable() error · 53080fe9

Committed by Fabio Estevam on Aug 23, 2016

clk_prepare_enable() may fail, so we should check its return
value and propagate it in the case of failure.

While at it, replace __lpc_eth_clock_enable() with a plain
clk_prepare_enable/clk_disable_unprepare() call in order to
simplify the code.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Acked-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mv88e6xxx: Fix ingress rate removal for mv6131 chips · 1bc261fa

Committed by Jamie Lentin on Aug 22, 2016

The PORT_RATE_CONTROL register works differently on 88e6095/6095f/6131
in comparison to 6123/61/65, and writing 0x0 disables ingress rate
limiting. The distinction was lost between Linux 4.1 and 4.2.

Signed-off-by: Jamie Lentin <jm@lentin.co.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: micrel: Reenable interrupts during resume for ksz9031 · f64f1482

Committed by Xander Huff on Aug 22, 2016

Like the ksz8081, the ksz9031 has the behavior where it will clear the
interrupt enable bits when leaving power down. This takes advantage of the
solution provided by f5aba91d.

Signed-off-by: Xander Huff <xander.huff@ni.com>
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: properly scale window in tcp_v[46]_reqsk_send_ack() · 20a2b49f

Committed by Eric Dumazet on Aug 22, 2016

When sending an ack in SYN_RECV state, we must scale the offered
window if wscale option was negotiated and accepted.

Tested:
The following packetdrill test demonstrates the issue:

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

// Establish a connection.
+0 < S 0:0(0) win 20000 <mss 1000,sackOK,wscale 7, nop, TS val 100 ecr 0>
+0 > S. 0:0(0) ack 1 win 28960 <mss 1460,sackOK, TS val 100 ecr 100, nop, wscale 7>

+0 < . 1:11(10) ack 1 win 156 <nop,nop,TS val 99 ecr 100>
// check that window is properly scaled !
+0 > . 1:1(0) ack 1 win 226 <nop,nop,TS val 200 ecr 100>
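
The expected "win 226" in the last line of the script is consistent with a simple right shift of the byte window by the negotiated scale. The sketch below assumes the offered window is still the 28960 bytes advertised in the SYN-ACK; the function name is hypothetical and this is not the kernel's implementation.

```python
# Arithmetic check for the packetdrill expectation; names are hypothetical.

def scaled_window(window_bytes: int, rcv_wscale: int, wscale_ok: bool) -> int:
    """Value placed in the 16-bit TCP window field of an outgoing ACK."""
    shift = rcv_wscale if wscale_ok else 0
    return min(window_bytes >> shift, 0xFFFF)


with_scaling = scaled_window(28960, 7, wscale_ok=True)
without_scaling = scaled_window(28960, 7, wscale_ok=False)

print(with_scaling, without_scaling)
```

Once wscale 7 has been negotiated, the peer multiplies the field by 128, so sending the unscaled value would advertise a window far larger than intended.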

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: fix size of scatter-gathered frames · 6c389fc9

Committed by Zefir Kurtisi on Aug 22, 2016

The current scatter-gather logic in gianfar is flawed, since
it does not consider that the eTSEC's RxBD 'Data Length' field is
context dependent: for the last fragment it contains the full
frame size, while the other fragments contain the fragment size, which
equals the value written to register MRBLR.

This causes data corruption as soon as the hardware starts
to fragment received frames. As a result, the size of
fragmented frames is increased by
(nr_frags - 1) * MRBLR

We first noticed this issue working with DSA, where an ICMP
request sized 1472 bytes causes the scatter-gather logic to
kick in. The full Ethernet frame (1518) gets increased by
DSA (4), GMAC_FCB_LEN (8), and FSL_GIANFAR_DEV_HAS_TIMER
(priv->padding=8) to a total of 1538 octets, which is
fragmented by the hardware and reconstructed by the driver
to a 3074 octet frame.

This patch fixes the problem by adjusting the size of
the last fragment.
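
The numbers in this message (1538 octets reconstructed as 3074 with MRBLR at 1536) can be reproduced with a small model of the 'Data Length' fields. The helper names are hypothetical and this is only a sketch of the arithmetic, not the driver's receive path.

```python
# Model of RxBD 'Data Length' handling; helper names are hypothetical.

MRBLR = 1536  # RX buffer size used by gianfar, per the commit text


def rx_bd_lengths(frame_len: int, mrblr: int):
    """Non-last BDs report mrblr; the last BD reports the full frame size."""
    nr_frags = -(-frame_len // mrblr)  # ceiling division
    return [mrblr] * (nr_frags - 1) + [frame_len]


def reassembled_len_buggy(bd_lengths):
    """Treat every 'Data Length' as a fragment size and add them up."""
    return sum(bd_lengths)


def reassembled_len_fixed(bd_lengths, mrblr: int):
    """Adjust the last fragment to its own size before adding it."""
    *head, last = bd_lengths
    last_frag = last - len(head) * mrblr
    return sum(head) + last_frag


lengths = rx_bd_lengths(1538, MRBLR)
print(lengths, reassembled_len_buggy(lengths),
      reassembled_len_fixed(lengths, MRBLR))
```

The difference between the two results is exactly (nr_frags - 1) * MRBLR, matching the inflation described in the message.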

It was tested by setting MRBLR to different multiples of
64, proving correct scatter-gather operation on frames
with up to 9000 octets in size.

Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: prevent fragmentation in DSA environments · b323431b

Committed by Zefir Kurtisi on Aug 22, 2016

The eTSEC register MRBLR defines the maximum space in
the RX buffers and is set to 1536 by gianfar. This
reasonably covers the common use case where the MTU
is kept at the default 1500. In that case, the largest
Ethernet frame size of 1518 plus an optional
GMAC_FCB_LEN of 8, and an additional padding of 8
to handle FSL_GIANFAR_DEV_HAS_TIMER, totals 1534
and fits nicely within the chosen MRBLR.

Alas, if the eTSEC is attached to a DSA enabled switch,
the (E)DSA header extension (4 or 8 bytes) causes every
maximum sized frame to be fragmented by the hardware.

This patch increases the maximum RX buffer size by 8
and rounds up to the next multiple of 64, which the
hardware defines as the RX buffer granularity.
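
The buffer-size arithmetic in this message can be checked with a short calculation. The variable names are hypothetical; the constants (1518, 8, 8, 1536, 64, and the 4 or 8 byte DSA/EDSA header) are the ones quoted in the message.

```python
# Arithmetic from the commit message; variable names are hypothetical.

def round_up(value: int, multiple: int) -> int:
    """Round value up to the next multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


OLD_MRBLR = 1536
frame_plain = 1518 + 8 + 8        # max frame + GMAC_FCB_LEN + timer padding
frame_dsa = frame_plain + 4       # with a 4-byte DSA header
frame_edsa = frame_plain + 8      # with an 8-byte EDSA header

new_mrblr = round_up(OLD_MRBLR + 8, 64)

print(frame_plain, frame_dsa, frame_edsa, new_mrblr)
```

Both tagged frame sizes exceed the old 1536-byte buffer but fit in the enlarged one, so maximum-sized frames no longer need to be fragmented by the hardware.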

Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: fix poll() issue with zero sized packets · e83c6744

Committed by Eric Dumazet on Aug 23, 2016

Laura tracked down a poll() [and friends] regression caused by commit
e6afc8ac ("udp: remove headers from UDP packets before queueing").

udp_poll() needs to know if there is a valid packet in the receive queue,
even if its payload length is 0.

Change first_packet_length() to return a signed int, and use -1
as the indication of an empty queue.
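
Why a sentinel is needed can be seen in a small model: once a zero-length payload is a valid packet, a return value of 0 can no longer double as "queue empty". The function names mirror the ones in the message, but the bodies are an illustrative sketch over a plain deque, not the kernel code.

```python
# Sketch over a plain deque of payloads; not the kernel implementation.
from collections import deque


def first_packet_length(queue) -> int:
    """Length of the first queued payload, or -1 if the queue is empty."""
    return len(queue[0]) if queue else -1


def udp_readable(queue) -> bool:
    """What poll() needs to know: is there any packet to receive?"""
    return first_packet_length(queue) != -1


empty = deque()
zero_len = deque([b""])  # one valid datagram with an empty payload

print(udp_readable(empty), udp_readable(zero_len))
```

With an unsigned return value and 0 as the "empty" marker, the second queue would be reported as not readable even though a datagram is waiting.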

Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
Reported-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Aug 23, 2016 · 6 commits

net sched: fix encoding to use real length · 28a10c42

Committed by Jamal Hadi Salim on Aug 22, 2016

Encoding of the metadata was using the padded length as opposed to
the real length of the data, which is a bug per the specification.
This has not been an issue to date because every metadatum specified
so far has been 32 bits wide, where the aligned and data lengths are the same.
This also includes a bug fix for validating the length of a u16 field.
But since there is no metadata of size u16 yet, we are fine to include it
here.
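
The reason the bug stayed hidden can be shown with the padding arithmetic alone. The sketch assumes 4-byte alignment (as used for netlink-style attributes); that alignment and the helper names are assumptions for illustration, not details taken from the message.

```python
# Padding arithmetic only; 4-byte alignment is an assumption of this sketch.

def align4(n: int) -> int:
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def encoded_length(data_len: int, use_padded: bool) -> int:
    """Length written into the encoding: padded (old) or real (fixed)."""
    return align4(data_len) if use_padded else data_len


# 32-bit metadata: padded and real lengths agree, so the bug is invisible.
print(encoded_length(4, use_padded=True), encoded_length(4, use_padded=False))

# A 16-bit metadatum would expose it: 4 would be encoded instead of 2.
print(encoded_length(2, use_padded=True), encoded_length(2, use_padded=False))
```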

While at it get rid of magic numbers.

Fixes: ef6980b6 ("net sched: introduce IFE action")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: FLR of active VFs might lead to FW assert · 4870e704

Committed by Yuval Mintz on Aug 22, 2016

Driver never bothered marking the VF's vport with the VF's sw_fid.
As a result, FLR flows are not going to clean those vports.

If the vport was active when FLRed, re-activating it would lead
to a FW assertion.

Fixes: dacd88d6 ("qed: IOV l2 functionality")
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ip_finish_output_gso: Allow fragmenting segments of tunneled skbs if their DF is unset · c0451fe1

Committed by Shmulik Ladkani on Aug 21, 2016

In b8247f09,

   "net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs"

gso skbs arriving from an ingress interface that go through UDP
tunneling are allowed to be fragmented if the resulting encapsulated
segments exceed the dst mtu of the egress interface.

This aligned the behavior of gso skbs to non-gso skbs going through udp
encapsulation path.

However the non-gso vs gso anomaly is present also in the following
cases of a GRE tunnel:
 - ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set
   (e.g. OvS vport-gre with df_default=false)
 - ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set

In both of the above cases, the non-gso skbs get fragmented, whereas the
gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get dropped,
as they don't go through the segment+fragment code path.

Fix: Setting IPSKB_FRAG_SEGS if the tunnel specified IP_DF bit is NOT set.

Tunnels that do set IP_DF, will not go to fragmentation of segments.
This preserves behavior of ip_gre in (the default) pmtudisc mode.

Fixes: b8247f09 ("net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs")
Reported-by: wenxu <wenxu@ucloud.cn>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Tested-by: wenxu <wenxu@ucloud.cn>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: Remove addresses for failures with strict DAD · 85b51b12

Committed by Mike Manning on Aug 18, 2016

If DAD fails with accept_dad set to 2, global addresses and host routes
are incorrectly left in place. Even though disable_ipv6 is set,
contrary to documentation, the addresses are not dynamically deleted
from the interface. It is only on a subsequent link down/up that these
are removed. The fix is not only to set the disable_ipv6 flag, but
also to call addrconf_ifdown(), which is the action to carry out when
disabling IPv6. This results in the addresses and routes being deleted
immediately. The DAD failure for the LL addr is determined as before
via netlink, or by the absence of the LL addr (which also previously
would have had to be checked for in case of an intervening link down
and up). As the call to addrconf_ifdown() requires an rtnl lock, the
logic to disable IPv6 when DAD fails is moved to addrconf_dad_work().

Previous behavior:

root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
net.ipv6.conf.eth3.accept_dad = 2
root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2000::10/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe43:dd5a/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
root@vm1:/# ip -6 route show dev eth3
2000::/64  proto kernel  metric 256
fe80::/64  proto kernel  metric 256
root@vm1:/# ip link set down eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
root@vm1:/# ip -6 route show dev eth3
root@vm1:/#

New behavior:

root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
net.ipv6.conf.eth3.accept_dad = 2
root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
root@vm1:/# ip link set up eth3
root@vm1:/# ip -6 addr show dev eth3
root@vm1:/# ip -6 route show dev eth3
root@vm1:/#

Signed-off-by: Mike Manning <mmanning@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

include/uapi/linux/ipx.h: fix conflicting defitions with glibc netipx/ipx.h · 53dc65d4

Committed by Mikko Rapeli on Aug 22, 2016

Fixes these compiler warnings via libc-compat.h when glibc netipx/ipx.h is
included before linux/ipx.h:

./linux/ipx.h:9:8: error: redefinition of ‘struct sockaddr_ipx’
./linux/ipx.h:26:8: error: redefinition of ‘struct ipx_route_definition’
./linux/ipx.h:32:8: error: redefinition of ‘struct ipx_interface_definition’
./linux/ipx.h:49:8: error: redefinition of ‘struct ipx_config_data’
./linux/ipx.h:58:8: error: redefinition of ‘struct ipx_route_def’

Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

include/uapi/linux/openvswitch.h: use __u32 from linux/types.h · a1d1f65f

Committed by Mikko Rapeli on Aug 22, 2016

Kernel uapi headers are supposed to use them. Fixes userspace compile error:

linux/openvswitch.h:583:2: error: unknown type name ‘uint32_t’

Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel · synced successfully more than 1 year ago
