Commits · de6d5cdf881353f83006d5f3e28ac4fffd42145e · openeuler / raspberrypi-kernel

06 Sep, 2009 2 commits

net_sched: make cls_ops->change and cls_ops->delete optional · de6d5cdf

Committed by Patrick McHardy on Sep 04, 2009

Some schedulers don't support creating, changing or deleting classes.
Make the respective callbacks optional and consistently return
-EOPNOTSUPP for unsupported operations, instead of the current mix of
-EOPNOTSUPP, -ENOSYS or no error.

In case of sch_prio and sch_multiq, the removed operations additionally
checked for an invalid class. This is not necessary since the class
argument can only originate from ->get() or, in case of ->change(), is 0
for creation of new classes, in which case ->change() incorrectly
returned -ENOENT.

As a side-effect, this patch fixes a possible (root-only) NULL pointer
function call in sch_ingress, which didn't implement a so far mandatory
->delete() operation.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
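
The dispatch rule described in this commit can be modelled in a few lines. The sketch below is Python rather than kernel C, and ClassOps, call_change and call_delete are hypothetical names chosen for illustration; only the -EOPNOTSUPP convention comes from the commit message.

```python
import errno


class ClassOps:
    """Hypothetical stand-in for a qdisc's table of class callbacks."""

    def __init__(self, change=None, delete=None):
        # None models a scheduler that does not implement the callback.
        self.change = change
        self.delete = delete


def call_change(ops, classid):
    # One central check: a missing callback always yields -EOPNOTSUPP.
    if ops.change is None:
        return -errno.EOPNOTSUPP
    return ops.change(classid)


def call_delete(ops, classid):
    # Same rule for delete, so nothing is ever called through a missing pointer.
    if ops.delete is None:
        return -errno.EOPNOTSUPP
    return ops.delete(classid)
```

Centralising the check means each scheduler simply leaves out what it cannot do, and callers see one consistent error code.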

net_sched: make cls_ops->tcf_chain() optional · 71ebe5e9

Committed by Patrick McHardy on Sep 04, 2009

Some qdiscs don't support attaching filters. Handle this centrally in
cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Sep, 2009 29 commits

net_sched: fix class grafting errno codes · c9f1d038

Committed by Patrick McHardy on Sep 04, 2009

If the parent qdisc doesn't support classes, use EOPNOTSUPP.
If the parent class doesn't exist, use ENOENT. Currently EINVAL
is returned in both cases.

Additionally check whether grafting is supported and remove a now
unnecessary graft function from sch_ingress.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: silence compiler warning · b1f57195

Committed by Brian Haley on Sep 04, 2009

  CC      net/netlink/genetlink.o
net/netlink/genetlink.c: In function ‘genl_register_mc_group’:
net/netlink/genetlink.c:139: warning: ‘err’ may be used uninitialized in this function

From following the code 'err' is initialized, but set it to zero to
silence the warning.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Catch bogus stream sequence numbers · f1751c57

Committed by Vlad Yasevich on Sep 04, 2009

Since our TSN map is capable of holding at most a 4K chunk gap,
there is no way that during this gap, a stream sequence number
(unsigned short) can wrap such that the new number is smaller
than the next expected one.  If such a case is encountered,
this is a protocol violation.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
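
A rough model of the check this commit describes, written in Python for brevity. It assumes that "smaller" is evaluated with wrap-aware (serial number) arithmetic on 16-bit values; ssn_lt and is_bogus_ssn are hypothetical names, not the kernel's.

```python
SSN_BITS = 16  # a stream sequence number is an unsigned short


def ssn_lt(a, b):
    """Return True if a precedes b in modulo-2**16 serial order."""
    return ((a - b) % (1 << SSN_BITS)) >= (1 << (SSN_BITS - 1))


def is_bogus_ssn(received, next_expected):
    # A received SSN that sorts before the next expected one is treated
    # as a protocol violation in this model.
    return ssn_lt(received, next_expected)
```

With this ordering, 65535 sorts before 0 (a legitimate wrap going forward), so a received SSN of 0 when 65535 is expected is not flagged, while a received 65535 when 0 is expected is.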

sctp: remove dup code in net/sctp/output.c · be297143

Committed by Wei Yongjun on Sep 04, 2009

Use sctp_packet_reset() instead of dup code.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Sysctl configuration for IPv4 Address Scoping · 72388433

Committed by Bhaskar Dutta on Sep 03, 2009

This patch introduces a new sysctl option to make IPv4 Address Scoping
configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.

In networking environments where DNAT rules in iptables prerouting
chains convert destination IPs to link-local/private IP addresses,
SCTP connections fail to establish as the INIT chunk is dropped by the
kernel due to address scope match failure.
For example, to support overlapping IP addresses (same IP address with
different vlan id) a Layer-5 application listens on link-local IPs,
and there is a DNAT rule that maps the destination IP to a link-local
IP. Such applications never get the SCTP INIT if the address-scoping
draft is strictly followed.

This sysctl configuration allows SCTP to function in such
unconventional networking environments.

Sysctl options:
0 - Disable IPv4 address scoping draft altogether
1 - Enable IPv4 address scoping (default, current behavior)
2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
3 - Enable address scoping but allow IPv4 link local address in init/init-ack
Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
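
The four option values listed in this commit can be written down as a small enumeration. This is only an illustrative Python sketch: ScopePolicy, parse_policy and the two helper predicates are hypothetical names, and the helpers encode just the relaxations stated in the option list, not the full scoping rules of the draft.

```python
from enum import IntEnum


class ScopePolicy(IntEnum):
    DISABLE = 0        # no IPv4 address scoping at all
    ENABLE = 1         # follow the draft (default, current behavior)
    ALLOW_PRIVATE = 2  # scoping on, but accept private addresses
    ALLOW_LINK = 3     # scoping on, but accept link-local addresses


def parse_policy(text):
    """Parse a sysctl-style string; raises ValueError outside 0..3."""
    return ScopePolicy(int(text.strip()))


def admits_private(policy):
    # True when private addresses in INIT/INIT-ACK are accepted outright.
    return policy in (ScopePolicy.DISABLE, ScopePolicy.ALLOW_PRIVATE)


def admits_link_local(policy):
    # True when link-local addresses in INIT/INIT-ACK are accepted outright.
    return policy in (ScopePolicy.DISABLE, ScopePolicy.ALLOW_LINK)
```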

sctp: Get rid of an extra routing lookup when adding a transport. · 8da645e1

Committed by Vlad Yasevich on Sep 04, 2009

We used to perform 2 routing lookups for a new transport: one
just for path mtu detection, and one to actually route to destination
and path mtu update when sending a packet.  There is no point in doing
both of them, especially since the first one just for path mtu doesn't
take into account source address and sometimes gives the wrong route,
causing path mtu updates anyway.

We now do just the one call to do both route to destination and get
path mtu updates.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Correctly track if AUTH has been bundled. · 4007cc88

Committed by Vlad Yasevich on Sep 04, 2009

We currently track if AUTH has been bundled using the 'auth'
pointer to the chunk.  However, AUTH is disallowed after DATA
is already in the packet, so we need to instead use the
'has_auth' field.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: fix to reset packet information after packet transmit · d521c08f

Committed by Wei Yongjun on Sep 02, 2009

The packet information is not reset after a packet is transmitted. This
may cause problems, such as a following DATA chunk being sent without
an AUTH chunk even if authentication of DATA chunks has been
requested by the peer.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Failover transmitted list on transport delete · 31b02e15

Committed by Vlad Yasevich on Sep 04, 2009

The Add-IP feature allows users to delete an active transport.  If that
transport has chunks in flight, those chunks need to be moved to another
transport or the association may get into an unrecoverable state.
Reported-by: Rafael Laufer <rlaufer@cisco.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Fix SCTP_MAXSEG socket option to comply to spec. · f68b2e05

Committed by Vlad Yasevich on Sep 04, 2009

We had a bug where we never stored the user-defined value for
MAXSEG when setting the value on an association.  Thus future
PMTU events ended up rewriting the frag point and increasing
it past the user limit.  Additionally, when setting the option on
the socket/endpoint, we affected all current associations, which
is against spec.

Now, we store the user 'maxseg' value along with the computed
'frag_point'.  We inherit 'maxseg' from the socket at association
creation and use it as an upper limit for 'frag_point' when it's
set.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
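
The "upper limit" rule from this commit, reduced to a one-function Python sketch. The function name is hypothetical, and treating a maxseg of 0 as "not set" is an assumption made for this illustration.

```python
def effective_frag_point(pmtu_frag_point, user_maxseg=0):
    """Clamp the PMTU-derived fragmentation point to the user's limit.

    user_maxseg == 0 models "option not set" (an assumption of this sketch).
    """
    if user_maxseg:
        return min(pmtu_frag_point, user_maxseg)
    return pmtu_frag_point
```

Because the user value is stored separately, a later PMTU event can recompute the first argument without losing the cap.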

sctp: Don't do NAGLE delay on large writes that were fragmented small · cb95ea32

Committed by Vlad Yasevich on Sep 04, 2009

SCTP will delay the last part of a large write due to NAGLE, if that
part is smaller than the MTU. Since we are doing large writes, we might
as well send the last portion now instead of waiting until the next
large write happens. The small portion will be sent as is regardless,
so it's better to not delay it.

This is a result of much discussion with Wei Yongjun <yjwei@cn.fujitsu.com>
and Doug Graham <dgraham@nortel.com>. Many thanks go out to them.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Nagle delay should be based on path mtu · b29e7907

Committed by Vlad Yasevich on Sep 04, 2009

The decision to delay due to Nagle should be based on the path mtu
and future packet size.  We currently incorrectly base it on
'frag_point' which is the SCTP DATA segment size, and also we do
not count DATA chunk header overhead in the computation.  This
actually allows situations where a user can set a low 'frag_point',
and then send small messages without delay.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN. · d4d6fb57

Committed by Vlad Yasevich on Sep 04, 2009

We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN.
This results in a hung association if the remote only uses
SHUTDOWNs (which it's allowed to do) to acknowledge DATA when
closing.  The reason for that is that we simply honor the a_rwnd
from the sack, but since we faked it to be 0, we enter 0-window
probing.  The fix is to use the peer's old rwnd and add our flight
size to it.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
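
Why a faked a_rwnd of 0 stalls the association, and why "old rwnd plus flight size" avoids it, can be seen in a small model. It assumes the sender updates its view of the peer window as a_rwnd minus the bytes still outstanding (the rule in RFC 4960 section 6.2.1). The function names are hypothetical and this is Python, not the kernel code.

```python
def rwnd_after_sack(a_rwnd, outstanding):
    """Sender's view of the peer window after honouring a SACK."""
    return max(a_rwnd - outstanding, 0)


def faked_a_rwnd(peer_rwnd, flight_size):
    """a_rwnd to place in a SACK faked from a SHUTDOWN, per the fix."""
    return peer_rwnd + flight_size
```

With a faked a_rwnd of 0 the computed window is always 0, which triggers zero-window probing. With the fix, subtracting the same outstanding bytes gives back the previous window.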

sctp: drop a_rwnd to 0 when receive buffer overflows. · 4d3c46e6

Committed by Vlad Yasevich on Sep 04, 2009

SCTP has a problem that when small chunks are used, it is possible
to exhaust the receive buffer without fully closing the receive window.
This happens due to all the overhead that we have to account for with small
messages. To fix this, when the receive buffer is exceeded, we'll drop
the window to 0 and save the 'drop' portion. When the application starts
reading data and freeing up receive buffer space, we'll wait until
we've reached the 'drop' window and then add back this 'drop' one
mtu at a time. This worked well in testing and under stress produced
rather even recovery.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Clear fast_recovery on the transport when T3 timer expires. · 33ce8281

Committed by Vlad Yasevich on Sep 04, 2009

If T3 timer expires, we are retransmitting data due to timeout and
any fast recovery is null and void.  We can clear the fast recovery
flag.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Fix error count increments that were results of HEARTBEATS · b9f84786

Committed by Vlad Yasevich on Aug 26, 2009

SCTP RFC 4960 states that unacknowledged HEARTBEATS count as
errors against a given transport or endpoint.  As such, we
should increment the error counts only for unacknowledged
HB, otherwise we detect failure too soon.  This goes for both
the overall error count and the path error count.

Now, there is a difference in how the detection is done
between the two.  The path error detection is done after
the increment, so to detect it properly, we actually need
to exceed the path threshold.  The overall error detection
is done _BEFORE_ the increment.  Thus to detect the failure,
it's enough for the error count to match the threshold.
This is why all the state functions use '>=' to detect failure,
while path detection uses '>'.

Thanks go to Chunbo Luo <chunbo.luo@windriver.com> who first
proposed patches to fix this issue and made me re-read the spec
and the code to figure out how this cruft really works.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
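
The '>' versus '>=' distinction explained in this commit is easy to get wrong, so here is a small Python model of it. All names are hypothetical. The two counting loops only illustrate the ordering of "check" and "increment" described in the message.

```python
def path_failed(path_error_count, path_max_retrans):
    # Checked after the increment, so the count must exceed the threshold.
    return path_error_count > path_max_retrans


def assoc_failed(overall_error_count, assoc_max_retrans):
    # Checked before the increment, so reaching the threshold is enough.
    return overall_error_count >= assoc_max_retrans


def events_until_path_failure(path_max_retrans):
    count = 0
    while True:
        count += 1  # increment first ...
        if path_failed(count, path_max_retrans):  # ... then check
            return count


def events_until_assoc_failure(assoc_max_retrans):
    count = 0
    events = 0
    while True:
        events += 1
        if assoc_failed(count, assoc_max_retrans):  # check first ...
            return events
        count += 1  # ... then increment
```

In this model both loops report failure on the same event number for equal thresholds, which is the point of using different comparison operators.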

sctp: use proc_create() · d71a09ed

Committed by Alexey Dobriyan on Aug 23, 2009

create_proc_entry() is deprecated (not formally, though).
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: fix check the chunk length of received HEARTBEAT-ACK chunk · dadb50cc

Committed by Wei Yongjun on Aug 22, 2009

The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK
that contains the Heartbeat Information field copied from the
received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk
must have a length of:
  sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t)

With a badly formatted HB-ACK chunk, it is possible that we may access
invalid memory.  We should really make sure that the chunk format
is what we expect, before attempting to touch the data.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
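
A sketch of the "validate the length before touching the data" idea, in Python. The 4-byte chunk header layout (type, flags, 16-bit length) is standard SCTP; hb_info_len stands in for sizeof(sctp_sender_hb_info_t), whose real value is not given here, and treating the stated size as a minimum rather than an exact match is an assumption of this sketch.

```python
import struct

CHUNK_HDR_LEN = 4  # type (1 byte), flags (1 byte), length (2 bytes)


def hb_ack_length_ok(chunk, hb_info_len):
    """Check a raw HEARTBEAT-ACK chunk before its payload is read.

    hb_info_len is a placeholder for sizeof(sctp_sender_hb_info_t).
    """
    if len(chunk) < CHUNK_HDR_LEN:
        return False
    _type, _flags, length = struct.unpack("!BBH", chunk[:CHUNK_HDR_LEN])
    required = CHUNK_HDR_LEN + hb_info_len
    # Reject chunks that claim, or actually carry, too few bytes.
    return length >= required and len(chunk) >= length
```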

sctp: drop SHUTDOWN chunk if the TSN is less than the CTSN · a2f36eec

Committed by Wei Yongjun on Aug 22, 2009

If Cumulative TSN Ack field of SHUTDOWN chunk is less than the
Cumulative TSN Ack Point then drop the SHUTDOWN chunk.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Send user messages to the lower layer as one · 9c5c62be

Committed by Vlad Yasevich on Aug 10, 2009

Currently, sctp breaks up user messages into fragments and
sends each fragment to the lower layer by itself.  This means
that for each fragment we go all the way down the stack
and back up.  This also discourages bundling of multiple
fragments when they can fit into a single packet (ex: due
to user setting a low fragmentation threshold).

We introduce a new command SCTP_CMD_SND_MSG and hand the
whole message down the state machine.  The state machine and
the side-effect parser will cork the queue, add all chunks
from the message to the queue, and then un-cork the queue
thus causing the chunks to get transmitted.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Try to encourage SACK bundling with DATA. · 5d7ff261

Committed by Vlad Yasevich on Aug 07, 2009

If the association has a SACK timer pending and now DATA queued
to be sent, we'll try to bundle the SACK with the next application send.
As such, try to encourage bundling by accounting for SACK in the size
of the first chunk fragment.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Generate SACKs when actually sending outbound DATA · e83963b7

Committed by Vlad Yasevich on Aug 07, 2009

We are now trying to bundle SACKs when we have outbound
DATA to send.  However, there are situations where this
outbound DATA will not be sent (due to congestion or
available window).  In such cases it's ok to wait for the
timer to expire.  This patch refactors the sending code
so that before attempting to bundle the SACK we check
to see if the DATA will actually be transmitted.

Based on earlier work by Doug Graham <dgraham@nortel.com> and
Wei Yongjun <yjwei@cn.fujitsu.com>.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Fix data segmentation with small frag_size · 3e62abf9

Committed by Vlad Yasevich on Sep 04, 2009

Since an application may specify the maximum SCTP fragment size
that all data should be fragmented to, we need to fix how
we do segmentation.  Right now, if a user specifies a small
fragment size, the segment size can go negative in the presence
of AUTH or COOKIE_ECHO bundling.

What we need to do is track the largest possible DATA chunk that
can fit into the mtu.  Then if the fragment size specified is
bigger than this maximum length, we'll shrink it down.  Otherwise,
we just use the smaller segment size without changing it further.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
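
The segmentation rule in this commit amounts to "compute the largest DATA chunk that fits, then never exceed it". A minimal Python sketch follows; the function and parameter names are hypothetical, and the overhead figures used in any call are made-up inputs, not values taken from the kernel.

```python
def data_segment_size(frag_size, pmtu, header_overhead, bundled_len=0):
    """Pick the DATA segment size for one message.

    bundled_len models room already promised to AUTH or COOKIE_ECHO.
    """
    # Largest DATA chunk that can still fit into one packet.
    max_data = pmtu - header_overhead - bundled_len
    if max_data <= 0:
        raise ValueError("no room for DATA in this packet")
    # Shrink an oversized fragment size; keep a smaller one unchanged.
    return min(frag_size, max_data)
```

Taking the minimum against a separately tracked maximum, instead of subtracting bundling overhead from the user's fragment size, is what keeps the result from going negative.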

sctp: Disallow new connection on a closing socket · bec9640b

Committed by Vlad Yasevich on Jul 30, 2009

If a socket has a lot of associations that are in the process
of being closed/aborted, it is possible for a remote to establish
new associations during the time period that the old ones are shutting
down. If this was a result of a close() call, there will be no socket
and this will cause a memory leak. We'll prevent this by setting the
socket state to CLOSING and disallowing new associations when in this state.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Fix piggybacked ACKs · af87b823

Committed by Doug Graham on Jul 29, 2009

This patch corrects the conditions under which a SACK will be piggybacked
on a DATA packet. The previous condition was incorrect due to a
misinterpretation of RFC 4960 and/or RFC 2960. Specifically, the
following paragraph from section 6.2 had not been implemented correctly:

Before an endpoint transmits a DATA chunk, if any received DATA
chunks have not been acknowledged (e.g., due to delayed ack), the
sender should create a SACK and bundle it with the outbound DATA
chunk, as long as the size of the final SCTP packet does not exceed
the current MTU. See Section 6.2.

When about to send a DATA chunk, the code now checks to see if the SACK
timer is running. If it is, we know we have a SACK to send to the
peer, so we append the SACK (assuming available space in the packet)
and turn off the timer. For a simple request-response scenario, this
will result in the SACK being bundled with the response, meaning that
the SACK is received quickly by the client, and also meaning that no
separate SACK packet needs to be sent by the server to acknowledge the
request. Prior to this patch, a separate SACK packet would have been
sent by the server SCTP only after its delayed-ACK timer had expired
(usually 200ms). This is wasteful of bandwidth, and can also have a
major negative impact on performance due to the interaction of delayed ACKs
with the Nagle algorithm.
Signed-off-by: Doug Graham <dgraham@nortel.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
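
The piggybacking condition described in this commit, as a toy Python model. The function name and the list-of-strings packet representation are hypothetical; only the rule "SACK timer running and room in the packet means bundle the SACK and stop the timer" comes from the message.

```python
def build_data_packet(data_len, sack_len, mtu, sack_timer_running):
    """Return (chunks, sack_timer_running) for one outbound DATA packet."""
    chunks = []
    # A running SACK timer means there is received DATA still to acknowledge.
    if sack_timer_running and sack_len + data_len <= mtu:
        chunks.append("SACK")
        sack_timer_running = False  # the bundled SACK replaces the delayed one
    chunks.append("DATA")
    return chunks, sack_timer_running
```

If the SACK does not fit, the timer is left running and the delayed SACK is still sent on expiry.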

sctp: release cached route when the transport goes down. · 40187886

Committed by Vlad Yasevich on Jun 23, 2009

When the sctp transport is marked down, we can release the
cached route and force a new lookup when attempting to use
this transport for anything.  This way, if a better route
or source address is available, we'll try to use it.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: update the route for non-active transports after addresses are added · 3cd9749c

Committed by Wei Yongjun on Jun 16, 2009

Update the route and saddr entries for the non-active transports as some
of the added addresses can be used as better source addresses, or there
may be a better route.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: check the unrecognized ASCONF parameter before access it · 44e65c1e

Committed by Wei Yongjun on Jun 16, 2009

This patch checks the unrecognized ASCONF parameter before
accessing it.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: avoid overwrite the return value of sctp_process_asconf_ack() · 425e0f68

Committed by Wei Yongjun on Jun 16, 2009

The return value of sctp_process_asconf_ack() may be
overwritten while processing parameters that have no error.
This patch fixes the problem.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

04 Sep, 2009 2 commits

ipv6: Fix tcp_v6_send_response(): it didn't set skb transport header · a8fdf2b3

Committed by Cosmin Ratiu on Sep 03, 2009

Here is a patch which fixes an issue observed when using TCP over IPv6
and AH from IPsec.

When a connection gets closed using the 4-way method and the last ACK from
the server gets dropped, the subsequent FINs from the client do not
get ACKed because tcp_v6_send_response does not set the transport
header pointer. This causes ah6_output to try to allocate a lot of
memory, which typically fails, so the ACKs never make it out of the
stack.

I have reproduced the problem on kernel 2.6.7, but after looking at
the latest kernel it seems the problem is still there.
Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: adds drops accounting · 1a123a31

Committed by Eric Dumazet on Sep 03, 2009

It's hard to tell if vlans are dropping frames, since
every frame given to vlan_???_start_xmit() functions
is accounted as fully transmitted by the lower device.

We can test dev_queue_xmit() return values to
properly account for dropped frames.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
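
The accounting change described in this commit can be pictured as follows. This is a Python model, not the driver code: account_xmit and the dictionary keys are hypothetical, and NET_XMIT_SUCCESS is assumed to mirror the kernel constant of the same name.

```python
NET_XMIT_SUCCESS = 0  # assumed to mirror the kernel constant of the same name


def account_xmit(stats, length, ret):
    """Update per-queue counters from the lower device's return value."""
    if ret == NET_XMIT_SUCCESS:
        stats["tx_packets"] += 1
        stats["tx_bytes"] += length
    else:
        # Anything else is counted as a drop instead of a transmission.
        stats["tx_dropped"] += 1
    return stats
```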

03 Sep, 2009 7 commits

net: Remove debugging code · 55f9d678

Committed by Eric Dumazet on Sep 03, 2009

Remove a debugging aid I accidentally left in the previous 'cleanup' patch.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: enable multiqueue xmits · 2f8bc32b

Committed by Eric Dumazet on Sep 03, 2009

vlan_dev_hard_start_xmit() & vlan_dev_hwaccel_hard_start_xmit()
select txqueue number 0, instead of using the index provided by
skb_get_queue_mapping().

This is not correct after commit 2e59af3d
[vlan: multiqueue vlan device] because
txq->tx_packets & txq->tx_bytes changes are performed on
a single location, and without the right locking.

The fix is to take the appropriate struct netdev_queue pointer.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: net/core/dev.c cleanups · d1b19dff

Committed by Eric Dumazet on Sep 03, 2009

Pure style cleanup patch before surgery :)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm/br2684: netif_stop_queue() when atm device busy and netif_wake_queue()... · 137742cf

Committed by Karl Hiramoto on Sep 02, 2009

atm/br2684: netif_stop_queue() when atm device busy and netif_wake_queue() when we can send packets again.

This patch removes the call to dev_kfree_skb() when the atm device is busy.
Calling dev_kfree_skb() causes heavy packet loss when the device is under
heavy load. The more correct behavior is to stop the upper layers, and
then wake them when the lower device can queue packets again.
Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076

Committed by Wu Fengguang on Sep 02, 2009

This fixed a lockdep warning which appeared when doing stress
memory tests over NFS:

	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock

	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
			tcp_send_fin => alloc_skb_fclone => page reclaim

David raised a concern that if the allocation fails in tcp_send_fin(), and it's
GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
for the allocation to succeed.

But the fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
weird, but it is no worse than the implicit sleep inside GFP_KERNEL. Both could
loop endlessly under memory pressure.

CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
CC: David S. Miller <davem@davemloft.net>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ethtool: Add support for the ethtool feature to flash firmware image from a specified file. · 05c6a8d7

Committed by Ajit Khaparde on Sep 02, 2009

This patch adds support to flash a firmware image to a device using ethtool.
The driver gets the filename of the firmware image and flashes the image
using the request firmware path.

The region "on the chip" to be flashed can be specified by an option.
It is up to the device driver to map the region number passed by ethtool
to the region to be flashed.

The default behavior is to flash all the regions on the chip.
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: Report qdisc packet drops · 6ce9e7b5

Committed by Eric Dumazet on Sep 02, 2009

Christoph Lameter pointed out that packet drops at qdisc level were not
accounted in SNMP counters. Only if the application sets IP_RECVERR are drops
reported to the user (-ENOBUFS errors) and SNMP counters updated.

IP_RECVERR is used to enable extended reliable error message passing,
but these are not needed to update system wide SNMP stats.

This patch changes things a bit to allow SNMP counters to be updated,
regardless of IP_RECVERR being set or not on the socket.

Example after a UDP tx flood:
# netstat -s
...
IP:
    1487048 outgoing packets dropped
...
Udp:
...
    SndbufErrors: 1487048

send() syscalls do, however, still return an OK status, so as not to
break applications.

Note : send() manual page explicitly says for -ENOBUFS error :

 "The output queue for a network interface was full.
  This generally indicates that the interface has stopped sending,
  but may be caused by transient congestion.
  (Normally, this does not occur in Linux. Packets are just silently
  dropped when a device queue overflows.) "

This is not true for IP_RECVERR enabled sockets : a send() syscall
that hit a qdisc drop returns an ENOBUFS error.

Many thanks to Christoph, David, and last but not least, Alexey !
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
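
The behavior after this patch, as described in the message, can be summarised in a short Python model: counters are always updated on a qdisc drop, while the error is surfaced to send() only for sockets with IP_RECVERR set. The function name and the "OutDiscards" key are hypothetical; "SndbufErrors" is the counter name shown in the netstat output in the message.

```python
import errno


def on_qdisc_drop(counters, recverr_enabled):
    """Return what send() reports after the qdisc dropped the packet."""
    # Counters are bumped unconditionally after this patch.
    counters["OutDiscards"] += 1
    counters["SndbufErrors"] += 1
    # Only sockets that opted in via IP_RECVERR see the error.
    return -errno.ENOBUFS if recverr_enabled else 0
```

Keeping the return value at 0 for ordinary sockets is what avoids breaking applications that do not expect ENOBUFS.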