提交 · 252a8fbe819d041b29789e2035cd1760f373345f · openeuler / Kernel

18 5月, 2015 1 次提交

由 Eric Dumazet 提交于 5月 15, 2015

make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/ipip.o
  CHECK   net/ipv4/ipip.c
net/ipv4/ipip.c:254:27: warning: incorrect type in assignment (different base types)
net/ipv4/ipip.c:254:27:    expected restricted __be32 [addressable] [usertype] o_key
net/ipv4/ipip.c:254:27:    got restricted __be16 [addressable] [usertype] i_flags

Fixes: 3b7b514f ("ipip: fix a regression in ioctl")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

252a8fbe

15 5月, 2015 1 次提交

ip_tunnel: Report Rx dropped in ip_tunnel_get_stats64 · c24a5964

由 Alexander Duyck 提交于 5月 14, 2015

The rx_dropped stat wasn't being reported when ip_tunnel_get_stats64 was
called.  This was leading to some confusing results in my debug as I was
seeing rx_errors increment but no other value which pointed me toward the
type of error being seen.

This change corrects that by using netdev_stats_to_stats64 to copy all
available dev stats instead of just the few that were hand picked.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c24a5964

14 5月, 2015 6 次提交

J
geneve_core: identify as driver library in modules description · d37d29c3
由 John W. Linville 提交于 5月 13, 2015
```
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
d37d29c3

geneve: Rename support library as geneve_core · 11e1fa46

由 John W. Linville 提交于 5月 13, 2015

net/ipv4/geneve.c -> net/ipv4/geneve_core.c

This name better reflects the purpose of the module.
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

11e1fa46

geneve: move definition of geneve_hdr() to geneve.h · 35d32e8f

由 John W. Linville 提交于 5月 13, 2015

This is a static inline with identical definitions in multiple places...
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35d32e8f

geneve: remove MODULE_ALIAS_RTNL_LINK from net/ipv4/geneve.c · 125907ae

由 John W. Linville 提交于 5月 13, 2015

This file is essentially a library for implementing the geneve
encapsulation protocol.  The file does not register any rtnl_link_ops,
so the MODULE_ALIAS_RTNL_LINK macro is inappropriate here.
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

125907ae

ipv4: __ip_local_out_sk() is static · 7d771aaa

由 Eric Dumazet 提交于 5月 12, 2015

__ip_local_out_sk() is only used from net/ipv4/ip_output.c

net/ipv4/ip_output.c:94:5: warning: symbol '__ip_local_out_sk' was not
declared. Should it be static?

Fixes: 7026b1dd ("netfilter: Pass socket pointer down through okfn().")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d771aaa

tcp/dccp: tw_timer_handler() is static · 216f8bb9

由 Eric Dumazet 提交于 5月 12, 2015

tw_timer_handler() is only used from net/ipv4/inet_timewait_sock.c

Fixes: 789f558c ("tcp/dccp: get rid of central timewait timer")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

216f8bb9

13 5月, 2015 1 次提交

switchdev: s/netdev_switch_/switchdev_/ and s/NETDEV_SWITCH_/SWITCHDEV_/ · ebb9a03a

由 Jiri Pirko 提交于 5月 10, 2015

Turned out that "switchdev" sticks. So just unify all related terms to use
this prefix.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NScott Feldman <sfeldma@gmail.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ebb9a03a

11 5月, 2015 3 次提交

net: Modify sk_alloc to not reference count the netns of kernel sockets. · 26abe143

由 Eric W. Biederman 提交于 5月 08, 2015

Now that sk_alloc knows when a kernel socket is being allocated modify
it to not reference count the network namespace of kernel sockets.

Keep track of if a socket needs reference counting by adding a flag to
struct sock called sk_net_refcnt.

Update all of the callers of sock_create_kern to stop using
sk_change_net and sk_release_kernel as those hacks are no longer
needed, to avoid reference counting a kernel socket.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26abe143

net: Pass kern from net_proto_family.create to sk_alloc · 11aa9c28

由 Eric W. Biederman 提交于 5月 08, 2015

In preparation for changing how struct net is refcounted
on kernel sockets pass the knowledge that we are creating
a kernel socket from sock_create_kern through to sk_alloc.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

11aa9c28

net: Add a struct net parameter to sock_create_kern · eeb1bd5c

由 Eric W. Biederman 提交于 5月 08, 2015

This is long overdue, and is part of cleaning up how we allocate kernel
sockets that don't reference count struct net.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eeb1bd5c

10 5月, 2015 2 次提交

tcp: add TCPWinProbe and TCPKeepAlive SNMP counters · e520af48

由 Eric Dumazet 提交于 5月 06, 2015

Diagnosing problems related to Window Probes has been hard because
we lack a counter.

TCPWinProbe counts the number of ACK packets a sender has to send
at regular intervals to make sure a reverse ACK packet opening back
a window had not been lost.

TCPKeepAlive counts the number of ACK packets sent to keep TCP
flows alive (SO_KEEPALIVE)
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NNandita Dukkipati <nanditad@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e520af48

tcp: adjust window probe timers to safer values · 21c8fe99

由 Eric Dumazet 提交于 5月 06, 2015

With the advent of small rto timers in datacenter TCP,
(ip route ... rto_min x), the following can happen :

1) Qdisc is full, transmit fails.

   TCP sets a timer based on icsk_rto to retry the transmit, without
   exponential backoff.
   With low icsk_rto, and lot of sockets, all cpus are servicing timer
   interrupts like crazy.
   Intent of the code was to retry with a timer between 200 (TCP_RTO_MIN)
   and 500ms (TCP_RESOURCE_PROBE_INTERVAL)

2) Receivers can send zero windows if they don't drain their receive queue.

   TCP sends zero window probes, based on icsk_rto current value, with
   exponential backoff.
   With /proc/sys/net/ipv4/tcp_retries2 being 15 (or even smaller in
   some cases), sender can abort in less than one or two minutes !
   If receiver stops the sender, it obviously doesn't care of very tight
   rto. Probability of dropping the ACK reopening the window is not
   worth the risk.

Lets change the base timer to be at least 200ms (TCP_RTO_MIN) for these
events (but not normal RTO based retransmits)

A followup patch adds a new SNMP counter, as it would have helped a lot
diagnosing this issue.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

21c8fe99

06 5月, 2015 3 次提交

tcp_westwood: fix tcp_westwood_info() · 31ccd0e6

由 Eric Dumazet 提交于 4月 29, 2015

I forgot to update tcp_westwood when changing get_info() behavior,
this patch should fix this.

Fixes: 64f40ff5 ("tcp: prepare CC get_info() access from getsockopt()")
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31ccd0e6

ipv4/ip_tunnel_core: Use eth_proto_is_802_3 · d181ddca

由 Alexander Duyck 提交于 5月 04, 2015

Replace "ntohs(proto) >= ETH_P_802_3_MIN" w/ eth_proto_is_802_3(proto).
Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d181ddca

tcp: provide SYN headers for passive connections · cd8ae852

由 Eric Dumazet 提交于 5月 03, 2015

This patch allows a server application to get the TCP SYN headers for
its passive connections.  This is useful if the server is doing
fingerprinting of clients based on SYN packet contents.

Two socket options are added: TCP_SAVE_SYN and TCP_SAVED_SYN.

The first is used on a socket to enable saving the SYN headers
for child connections. This can be set before or after the listen()
call.

The latter is used to retrieve the SYN headers for passive connections,
if the parent listener has enabled TCP_SAVE_SYN.

TCP_SAVED_SYN is read once, it frees the saved SYN headers.

The data returned in TCP_SAVED_SYN are network (IPv4/IPv6) and TCP
headers.

Original patch was written by Tom Herbert, I changed it to not hold
a full skb (and associated dst and conntracking reference).

We have used such patch for about 3 years at Google.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Tested-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd8ae852

05 5月, 2015 1 次提交

net: Export IGMP/MLD message validation code · 9afd85c9

由 Linus Lüssing 提交于 5月 02, 2015

With this patch, the IGMP and MLD message validation functions are moved
from the bridge code to IPv4/IPv6 multicast files. Some small
refactoring was done to enhance readibility and to iron out some
differences in behaviour between the IGMP and MLD parsing code (e.g. the
skb-cloning of MLD messages is now only done if necessary, just like the
IGMP part always did).

Finally, these IGMP and MLD message validation functions are exported so
that not only the bridge can use it but batman-adv later, too.
Signed-off-by: NLinus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9afd85c9

04 5月, 2015 4 次提交

net: ipv4: route: Fix sending IGMP messages with link address · 6a211654

由 Andrew Lunn 提交于 5月 01, 2015

In setups with a global scope address on an interface, and a lesser
scope address on an interface sending IGMP reports, the reports can be
sent using the other interfaces global scope address rather than the
local interface address. RFC 2236 suggests:

     Ignore the Report if you cannot identify the source address of
     the packet as belonging to a subnet assigned to the interface on
     which the packet was received.

since such reports could be forged.

Look at the protocol when deciding if a RT_SCOPE_LINK address should
be used for the packet.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a211654

tcp: invoke pkts_acked hook on every ACK · 138998fd

由 Kenneth Klette Jonassen 提交于 5月 01, 2015

Invoking pkts_acked is currently conditioned on FLAG_ACKED:
receiving a cumulative ACK of new data, or ACK with SYN flag set.

Remove this condition so that CC may get RTT measurements from all SACKs.

Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: NKenneth Klette Jonassen <kennetkl@ifi.uio.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

138998fd

tcp: improve RTT from SACK for CC · 31231a8a

由 Kenneth Klette Jonassen 提交于 5月 01, 2015

tcp_sacktag_one() always picks the earliest sequence SACKed for RTT.
This might not make sense for congestion control in cases where:

  1. ACKs are lost, i.e. a SACK following a lost SACK covers both
     new and old segments at the receiver.
  2. The receiver disregards the RFC 5681 recommendation to immediately
     ACK out-of-order segments.

Give congestion control a RTT for the latest segment SACKed, which is the
most accurate RTT estimate, but preserve the conservative RTT for RTO.

Removes the call to skb_mstamp_get() in tcp_sacktag_one().

Cc: Yuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NKenneth Klette Jonassen <kennetkl@ifi.uio.no>
Acked-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31231a8a

tcp: move struct tcp_sacktag_state to tcp_ack() · 196da974

由 Kenneth Klette Jonassen 提交于 5月 01, 2015

Later patch passes two values set in tcp_sacktag_one() to
tcp_clean_rtx_queue(). Prepare passing them via struct tcp_sacktag_state.
Acked-by: NYuchung Cheng <ycheng@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NKenneth Klette Jonassen <kennetkl@ifi.uio.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

196da974

03 5月, 2015 1 次提交

ipv4: remove the unnecessary codes in fib_info_hash_move · 7eee8cd4

由 Li RongQing 提交于 4月 30, 2015

The whole hlist will be moved, so not need to call hlist_del before
add the hlist_node to other hlist_head.
Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7eee8cd4

02 5月, 2015 2 次提交

ipv4: Missing sk_nulls_node_init() in ping_unhash(). · a134f083

由 David S. Miller 提交于 5月 01, 2015

If we don't do that, then the poison value is left in the ->pprev
backlink.

This can cause crashes if we do a disconnect, followed by a connect().
Tested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reported-by: NWen Xu <hotdog3645@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a134f083

ipv4: speedup ip_idents_reserve() · 355b590c

由 Eric Dumazet 提交于 5月 01, 2015

Under stress, ip_idents_reserve() is accessing a contended
cache line twice, with non optimal MESI transactions.

If we place timestamps in separate location, we reduce this
pressure by ~50% and allow atomic_add_return() to issue
a Request for Ownership.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

355b590c

30 4月, 2015 7 次提交

tcp_westwood: fix tcp_westwood_info() · f828ad0c

由 Eric Dumazet 提交于 4月 29, 2015

I forgot to update tcp_westwood when changing get_info() behavior,
this patch should fix this.

Fixes: 64f40ff5 ("tcp: prepare CC get_info() access from getsockopt()")
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f828ad0c

tcp: update reordering first before detecting loss · 9dac8835

由 Yuchung Cheng 提交于 4月 29, 2015

tcp_mark_lost_retrans is not used when FACK is disabled. Since
tcp_update_reordering may disable FACK, it should be called first
before tcp_mark_lost_retrans.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NNandita Dukkipati <nanditad@google.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9dac8835

tcp: add TCP_CC_INFO socket option · 6e9250f5

由 Eric Dumazet 提交于 4月 28, 2015

Some Congestion Control modules can provide per flow information,
but current way to get this information is to use netlink.

Like TCP_INFO, let's add TCP_CC_INFO so that applications can
issue a getsockopt() if they have a socket file descriptor,
instead of playing complex netlink games.

Sample usage would be :

  union tcp_cc_info info;
  socklen_t len = sizeof(info);

  if (getsockopt(fd, SOL_TCP, TCP_CC_INFO, &info, &len) == -1)
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e9250f5

tcp: prepare CC get_info() access from getsockopt() · 64f40ff5

由 Eric Dumazet 提交于 4月 28, 2015

We would like that optional info provided by Congestion Control
modules using netlink can also be read using getsockopt()

This patch changes get_info() to put this information in a buffer,
instead of skb, like tcp_get_info(), so that following patch
can reuse this common infrastructure.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

64f40ff5

tcp: add tcpi_bytes_received to tcp_info · bdd1f9ed

由 Eric Dumazet 提交于 4月 28, 2015

This patch tracks total number of payload bytes received on a TCP socket.
This is the sum of all changes done to tp->rcv_nxt

RFC4898 named this : tcpEStatsAppHCThruOctetsReceived

This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->bytes_received was placed near tp->rcv_nxt for
best data locality and minimal performance impact.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Acked-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bdd1f9ed

tcp: add tcpi_bytes_acked to tcp_info · 0df48c26

由 Eric Dumazet 提交于 4月 28, 2015

This patch tracks total number of bytes acked for a TCP socket.
This is the sum of all changes done to tp->snd_una, and allows
for precise tracking of delivered data.

RFC4898 named this : tcpEStatsAppHCThruOctetsAcked

This is a 64bit field, and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)

Note that tp->bytes_acked was placed near tp->snd_una for
best data locality and minimal performance impact.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NYuchung Cheng <ycheng@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Eric Salo <salo@google.com>
Cc: Martin Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0df48c26

route: Use ipv4_mtu instead of raw rt_pmtu · cb6ccf09

由 Herbert Xu 提交于 4月 28, 2015

The commit 3cdaa5be ("ipv4: Don't
increase PMTU with Datagram Too Big message") broke PMTU in cases
where the rt_pmtu value has expired but is smaller than the new
PMTU value.

This obsolete rt_pmtu then prevents the new PMTU value from being
installed.

Fixes: 3cdaa5be ("ipv4: Don't increase PMTU with Datagram Too Big message")
Reported-by: NGerd v. Egidy <gerd.von.egidy@intra2net.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cb6ccf09

24 4月, 2015 2 次提交

inet: fix possible panic in reqsk_queue_unlink() · b357a364

由 Eric Dumazet 提交于 4月 23, 2015

[ 3897.923145] BUG: unable to handle kernel NULL pointer dereference at
 0000000000000080
[ 3897.931025] IP: [<ffffffffa9f27686>] reqsk_timer_handler+0x1a6/0x243

There is a race when reqsk_timer_handler() and tcp_check_req() call
inet_csk_reqsk_queue_unlink() on the same req at the same time.

Before commit fa76ce73 ("inet: get rid of central tcp/dccp listener
timer"), listener spinlock was held and race could not happen.

To solve this bug, we change reqsk_queue_unlink() to not assume req
must be found, and we return a status, to conditionally release a
refcount on the request sock.

This also means tcp_check_req() in non fastopen case might or not
consume req refcount, so tcp_v6_hnd_req() & tcp_v4_hnd_req() have
to properly handle this.

(Same remark for dccp_check_req() and its callers)

inet_csk_reqsk_queue_drop() is now too big to be inlined, as it is
called 4 times in tcp and 3 times in dccp.

Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NYuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b357a364

tcp: avoid looping in tcp_send_fin() · 845704a5

由 Eric Dumazet 提交于 4月 23, 2015

Presence of an unbound loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.

Lets try a different strategy :

In case of memory pressure, try to add the FIN flag to last packet
in write queue, even if packet was already sent. TCP stack will
be able to deliver this FIN after a timeout event. Note that this
FIN being delivered by a retransmit, it also carries a Push flag
given our current implementation.

By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.

In the case we could not allocate a packet, even with __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

845704a5

23 4月, 2015 1 次提交

tcp: fix possible deadlock in tcp_send_fin() · d83769a5

由 Eric Dumazet 提交于 4月 21, 2015

Using sk_stream_alloc_skb() in tcp_send_fin() is dangerous in
case a huge process is killed by OOM, and tcp_mem[2] is hit.

To be able to free memory we need to make progress, so this
patch allows FIN packets to not care about tcp_mem[2], if
skb allocation succeeded.

In a follow-up patch, we might abort tcp_send_fin() infinite loop
in case TIF_MEMDIE is set on this thread, as memory allocator
did its best getting extra memory already.

This patch reverts d22e1537 ("tcp: fix tcp fin memory accounting")

Fixes: d22e1537 ("tcp: fix tcp fin memory accounting")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d83769a5

22 4月, 2015 1 次提交

tcp: add memory barriers to write space paths · 3c715127

由 jbaron@akamai.com 提交于 4月 20, 2015

Ensure that we either see that the buffer has write space
in tcp_poll() or that we perform a wakeup from the input
side. Did not run into any actual problem here, but thought
that we should make things explicit.
Signed-off-by: NJason Baron <jbaron@akamai.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c715127

21 4月, 2015 1 次提交

ip_forward: Drop frames with attached skb->sk · 2ab95749

由 Sebastian Pöhn 提交于 4月 20, 2015

Initial discussion was:
[FYI] xfrm: Don't lookup sk_policy for timewait sockets

Forwarded frames should not have a socket attached. Especially
tw sockets will lead to panics later-on in the stack.

This was observed with TPROXY assigning a tw socket and broken
policy routing (misconfigured). As a result frame enters
forwarding path instead of input. We cannot solve this in
TPROXY as it cannot know that policy routing is broken.

v2:
Remove useless comment
Signed-off-by: NSebastian Poehn <sebastian.poehn@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ab95749

18 4月, 2015 2 次提交

inet_diag: fix access to tcp cc information · 521f1cf1

由 Eric Dumazet 提交于 4月 16, 2015

Two different problems are fixed here :

1) inet_sk_diag_fill() might be called without socket lock held.
   icsk->icsk_ca_ops can change under us and module be unloaded.
   -> Access to freed memory.
   Fix this using rcu_read_lock() to prevent module unload.

2) Some TCP Congestion Control modules provide information
   but again this is not safe against icsk->icsk_ca_ops
   change and nla_put() errors were ignored. Some sockets
   could not get the additional info if skb was almost full.

Fix this by returning a status from get_info() handlers and
using rcu protection as well.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

521f1cf1

tcp: tcp_get_info() should fetch socket fields once · fad9dfef

由 Eric Dumazet 提交于 4月 16, 2015

tcp_get_info() can be called without holding socket lock,
so any socket fields can change under us.

Use READ_ONCE() to fetch sk_pacing_rate and sk_max_pacing_rate

Fixes: 977cb0ec ("tcp: add pacing_rate information into tcp_info")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fad9dfef

17 4月, 2015 1 次提交

fou: avoid missing unlock in failure path · 540207ae

由 WANG Cong 提交于 4月 15, 2015

Fixes: 7a6c8c34 ("fou: implement FOU_CMD_GET")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

540207ae

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功