提交 · 7c0f6ba682b9c7632072ffbedf8d328c8f3c42ba · openanolis / cloud-kernel

25 12月, 2016 1 次提交

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

由 Linus Torvalds 提交于 12月 24, 2016

This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.
Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7c0f6ba6

24 12月, 2016 1 次提交

inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets · 39b2dd76

由 Willem de Bruijn 提交于 12月 22, 2016

Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.

Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
Reported-by: NNisar Jagabar <njagabar@cloudmark.com>
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

39b2dd76

01 12月, 2016 1 次提交

l2tp: lock socket before checking flags in connect() · 0382a25a

由 Guillaume Nault 提交于 11月 29, 2016

Socket flags aren't updated atomically, so the socket must be locked
while reading the SOCK_ZAPPED flag.

This issue exists for both l2tp_ip and l2tp_ip6. For IPv6, this patch
also brings error handling for __ip6_datagram_connect() failures.
Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0382a25a

05 11月, 2016 1 次提交

net: inet: Support UID-based routing in IP protocols. · e2d118a1

由 Lorenzo Colitti 提交于 11月 04, 2016

- Use the UID in routing lookups made by protocol connect() and
  sendmsg() functions.
- Make sure that routing lookups triggered by incoming packets
  (e.g., Path MTU discovery) take the UID of the socket into
  account.
- For packets not associated with a userspace socket, (e.g., ping
  replies) use UID 0 inside the user namespace corresponding to
  the network namespace the socket belongs to. This allows
  all namespaces to apply routing and iptables rules to
  kernel-originated traffic in that namespaces by matching UID 0.
  This is better than using the UID of the kernel socket that is
  sending the traffic, because the UID of kernel sockets created
  at namespace creation time (e.g., the per-processor ICMP and
  TCP sockets) is the UID of the user that created the socket,
  which might not be mapped in the namespace.

Tested: compiles allnoconfig, allyesconfig, allmodconfig
Tested: https://android-review.googlesource.com/253302Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2d118a1

04 11月, 2016 1 次提交

ipv6: add IPV6_RECVFRAGSIZE cmsg · 0cc0aa61

由 Willem de Bruijn 提交于 11月 02, 2016

When reading a datagram or raw packet that arrived fragmented, expose
the maximum fragment size if recorded to allow applications to
estimate receive path MTU.

At this point, the field is only recorded when ipv6 connection
tracking is enabled. A follow-up patch will record this field also
in the ipv6 input path.

Tested using the test for IP_RECVFRAGSIZE plus

  ip netns exec to ip addr add dev veth1 fc07::1/64
  ip netns exec from ip addr add dev veth0 fc07::2/64

  ip netns exec to ./recv_cmsg_recvfragsize -6 -u -p 6000 &
  ip netns exec from nc -q 1 -u fc07::1 6000 < payload

Both with and without enabling connection tracking

  ip6tables -A INPUT -m state --state NEW -p udp -j LOG
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0cc0aa61

17 5月, 2016 1 次提交

sock: propagate __sock_cmsg_send() error · 2632616b

由 Eric Dumazet 提交于 5月 13, 2016

__sock_cmsg_send() might return different error codes, not only -EINVAL.

Fixes: 24025c46 ("ipv4: process socket-level control messages in IPv4")
Fixes: ad1e46a8 ("ipv6: process socket-level control messages in IPv6")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2632616b

04 5月, 2016 1 次提交

ipv6: add new struct ipcm6_cookie · 26879da5

由 Wei Wang 提交于 5月 02, 2016

In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local
variables like hlimits, tclass, opt and dontfrag and pass them to corresponding
functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames.
This is not a good practice and makes it hard to add new parameters.
This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in
ipv4 and include the above mentioned variables. And we only pass the
pointer to this structure to corresponding functions. This makes it easier
to add new parameters in the future and makes the function cleaner.
Signed-off-by: NWei Wang <weiwan@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

26879da5

26 4月, 2016 1 次提交

net: better drop monitoring in ip{6}_recv_error() · 960a2628

由 Eric Dumazet 提交于 4月 21, 2016

We should call consume_skb(skb) when skb is properly consumed,
or kfree_skb(skb) when skb must be dropped in error case.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

960a2628

15 4月, 2016 4 次提交

ipv6: udp: Do a route lookup and update during release_cb · e646b657

由 Martin KaFai Lau 提交于 4月 11, 2016

This patch adds a release_cb for UDPv6.  It does a route lookup
and updates sk->sk_dst_cache if it is needed.  It picks up the
left-over job from ip6_sk_update_pmtu() if the sk was owned
by user during the pmtu update.

It takes a rcu_read_lock to protect the __sk_dst_get() operations
because another thread may do ip6_dst_store() without taking the
sk lock (e.g. sendmsg).

Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Reported-by: NWei Wang <weiwan@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e646b657

ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update · 33c162a9

由 Martin KaFai Lau 提交于 4月 11, 2016

There is a case in connected UDP socket such that
getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible
sequence could be the following:
1. Create a connected UDP socket
2. Send some datagrams out
3. Receive a ICMPV6_PKT_TOOBIG
4. No new outgoing datagrams to trigger the sk_dst_check()
   logic to update the sk->sk_dst_cache.
5. getsockopt(IPV6_MTU) returns the mtu from the invalid
   sk->sk_dst_cache instead of the newly created RTF_CACHE clone.

This patch updates the sk->sk_dst_cache for a connected datagram sk
during pmtu-update code path.

Note that the sk->sk_v6_daddr is used to do the route lookup
instead of skb->data (i.e. iph).  It is because a UDP socket can become
connected after sending out some datagrams in un-connected state.  or
It can be connected multiple times to different destinations.  Hence,
iph may not be related to where sk is currently connected to.

It is done under '!sock_owned_by_user(sk)' condition because
the user may make another ip6_datagram_connect()  (i.e changing
the sk->sk_v6_daddr) while dst lookup is happening in the pmtu-update
code path.

For the sock_owned_by_user(sk) == true case, the next patch will
introduce a release_cb() which will update the sk->sk_dst_cache.

Test:

Server (Connected UDP Socket):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Route Details:
[root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
2fac::/64 dev eth0  proto kernel  metric 256  pref medium
2fac:face::/64 via 2fac::face dev eth0  metric 1024  pref medium

A simple python code to create a connected UDP socket:

import socket
import errno

HOST = '2fac::1'
PORT = 8080

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
s.bind((HOST, PORT))
s.connect(('2fac:face::face', 53))
print("connected")
while True:
    try:
	data = s.recv(1024)
    except socket.error as se:
	if se.errno == errno.EMSGSIZE:
		pmtu = s.getsockopt(41, 24)
		print("PMTU:%d" % pmtu)
		break
s.close()

Python program output after getting a ICMPV6_PKT_TOOBIG:
[root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
connected
PMTU:1300

Cache routes after recieving TOOBIG:
[root@arch-fb-vm1 ~]# ip -6 r show table cache
2fac:face::face via 2fac::face dev eth0  metric 0
    cache  expires 463sec mtu 1300 pref medium

Client (Send the ICMPV6_PKT_TOOBIG):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scapy is used to generate the TOOBIG message.  Here is the scapy script I have
used:

>>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::
1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
>>> sendp(p, iface='qemubr0')

Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Reported-by: NWei Wang <weiwan@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33c162a9

ipv6: datagram: Refactor dst lookup and update codes to a new function · 7e2040db

由 Martin KaFai Lau 提交于 4月 11, 2016

This patch moves the route lookup and update codes for connected
datagram sk to a newly created function ip6_datagram_dst_update()

It will be reused during the pmtu update in the later patch.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e2040db

ipv6: datagram: Refactor flowi6 init codes to a new function · 80fbdb20

由 Martin KaFai Lau 提交于 4月 11, 2016

Move flowi6 init codes for connected datagram sk to a newly created
function ip6_datagram_flow_key_init().

Notes:
1. fl6_flowlabel is used instead of fl6.flowlabel in __ip6_datagram_connect
2. ipv6_addr_is_multicast(&fl6->daddr) is used instead of
   (addr_type & IPV6_ADDR_MULTICAST) in ip6_datagram_flow_key_init()

This new function will be reused during pmtu update in the later patch.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

80fbdb20

05 4月, 2016 1 次提交

ipv6: process socket-level control messages in IPv6 · ad1e46a8

由 Soheil Hassas Yeganeh 提交于 4月 02, 2016

Process socket-level control messages by invoking
__sock_cmsg_send in ip6_datagram_send_ctl for control messages on
the SOL_SOCKET layer.

This makes sure whenever ip6_datagram_send_ctl is called for
udp and raw, we also process socket-level control messages.

This is a bit uglier than IPv4, since IPv6 does not have
something like ipcm_cookie. Perhaps we can later create
a control message cookie for IPv6?

Note that this commit interprets new control messages that
were ignored before. As such, this commit does not change
the behavior of IPv6 control messages.
Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad1e46a8

30 1月, 2016 1 次提交

ipv6/udp: use sticky pktinfo egress ifindex on connect() · 1cdda918

由 Paolo Abeni 提交于 1月 29, 2016

Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by initializing properly flowi6_oif in connect() before
performing the route lookup.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1cdda918

03 12月, 2015 1 次提交

ipv6: add complete rcu protection around np->opt · 45f6fad8

由 Eric Dumazet 提交于 11月 29, 2015

This patch addresses multiple problems :

UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while socket is not locked : Other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program desmonstrating
use-after-free.

Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())

This patch adds full RCU protection to np->opt
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45f6fad8

26 9月, 2015 1 次提交

ipv6: constify ip6_xmit() sock argument · 1c1e9d2b

由 Eric Dumazet 提交于 9月 25, 2015

This is to document that socket lock might not be held at this point.

skb_set_owner_w() and ipv6_local_error() are using proper atomic ops
or spinlocks, so we promote the socket to non const when calling them.

netfilter hooks should never assume socket lock is held,
we also promote the socket to non const.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1c1e9d2b

30 7月, 2015 1 次提交

net: Set sk_txhash from a random number · 877d1f62

由 Tom Herbert 提交于 7月 28, 2015

This patch creates sk_set_txhash and eliminates protocol specific
inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a
random number instead of performing flow dissection. sk_set_txash
is also allowed to be called multiple times for the same socket,
we'll need this when redoing the hash for negative routing advice.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

877d1f62

16 7月, 2015 1 次提交

ipv6: lock socket in ip6_datagram_connect() · 03645a11

由 Eric Dumazet 提交于 7月 14, 2015

ip6_datagram_connect() is doing a lot of socket changes without
socket being locked.

This looks wrong, at least for udp_lib_rehash() which could corrupt
lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03645a11

10 7月, 2015 1 次提交

ipv6: use flag instead of u16 for hop in inet6_skb_parm · 8b58a398

由 Florian Westphal 提交于 7月 08, 2015

Hop was always either 0 or sizeof(struct ipv6hdr).
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b58a398

24 6月, 2015 1 次提交

ip: report the original address of ICMP messages · 34b99df4

由 Julian Anastasov 提交于 6月 23, 2015

ICMP messages can trigger ICMP and local errors. In this case
serr->port is 0 and starting from Linux 4.0 we do not return
the original target address to the error queue readers.
Add function to define which errors provide addr_offset.
With this fix my ping command is not silent anymore.

Fixes: c247f053 ("ip: fix error queue empty skb handling")
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Acked-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

34b99df4

01 4月, 2015 1 次提交

ipv6: coding style: comparison for equality with NULL · 63159f29

由 Ian Morris 提交于 3月 29, 2015

The ipv6 code uses a mixture of coding styles. In some instances check for NULL
pointer is done as x == NULL and sometimes as !x. !x is preferred according to
checkpatch and this patch makes the code consistent by adopting the latter
form.

No changes detected by objdiff.
Signed-off-by: NIan Morris <ipm@chirality.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63159f29

09 3月, 2015 1 次提交

ip: fix error queue empty skb handling · c247f053

由 Willem de Bruijn 提交于 3月 07, 2015

When reading from the error queue, msg_name and msg_control are only
populated for some errors. A new exception for empty timestamp skbs
added a false positive on icmp errors without payload.

`traceroute -M udpconn` only displayed gateways that return payload
with the icmp error: the embedded network headers are pulled before
sock_queue_err_skb, leaving an skb with skb->len == 0 otherwise.

Fix this regression by refining when msg_name and msg_control
branches are taken. The solutions for the two fields are independent.

msg_name only makes sense for errors that configure serr->port and
serr->addr_offset. Test the first instead of skb->len. This also fixes
another issue. saddr could hold the wrong data, as serr->addr_offset
is not initialized  in some code paths, pointing to the start of the
network header. It is only valid when serr->port is set (non-zero).

msg_control support differs between IPv4 and IPv6. IPv4 only honors
requests for ICMP and timestamps with SOF_TIMESTAMPING_OPT_CMSG. The
skb->len test can simply be removed, because skb->dev is also tested
and never true for empty skbs. IPv6 honors requests for all errors
aside from local errors and timestamps on empty skbs.

In both cases, make the policy more explicit by moving this logic to
a new function that decides whether to process msg_control and that
optionally prepares the necessary fields in skb->cb[]. After this
change, the IPv4 and IPv6 paths are more similar.

The last case is rxrpc. Here, simply refine to only match timestamps.

Fixes: 49ca0d8b ("net-timestamp: no-payload option")
Reported-by: NJan Niehusmann <jan@gondor.com>
Signed-off-by: NWillem de Bruijn <willemb@google.com>

----

Changes
  v1->v2
  - fix local origin test inversion in ip6_datagram_support_cmsg
  - make v4 and v6 code paths more similar by introducing analogous
    ipv4_datagram_support_cmsg
  - fix compile bug in rxrpc
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c247f053

03 2月, 2015 1 次提交

net-timestamp: no-payload option · 49ca0d8b

由 Willem de Bruijn 提交于 1月 30, 2015

Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
timestamps, this loops timestamps on top of empty packets.

Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
cmsg reception (aside from timestamps) are no longer possible. This
works together with a follow on patch that allows administrators to
only allow tx timestamping if it does not loop payload or metadata.
Signed-off-by: NWillem de Bruijn <willemb@google.com>

----

Changes (rfc -> v1)
  - add documentation
  - remove unnecessary skb->len test (thanks to Richard Cochran)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

49ca0d8b

16 1月, 2015 1 次提交

ip: zero sockaddr returned on error queue · f812116b

由 Willem de Bruijn 提交于 1月 15, 2015

The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as

    struct {
            struct sock_extended_err ee;
            struct sockaddr_in(6)    offender;
    } errhdr;

The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.

An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.
Signed-off-by: NWillem de Bruijn <willemb@google.com>

----

Also verified that there is no padding between errhdr.ee and
errhdr.offender that could leak additional kernel data.
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f812116b

11 12月, 2014 1 次提交

net: introduce helper macro for_each_cmsghdr · f95b414e

由 Gu Zheng 提交于 12月 11, 2014

Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating
cmsghdr from msghdr, just cleanup.
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f95b414e

09 12月, 2014 1 次提交

net-timestamp: allow reading recv cmsg on errqueue with origin tstamp · 829ae9d6

由 Willem de Bruijn 提交于 11月 30, 2014

Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.

on AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.

on AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. only change is to set ifindex, which is
not initialized for all error origins.

In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.

The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:

ipv4:        if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6:        if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.
Signed-off-by: NWillem de Bruijn <willemb@google.com>

----

Changes
  v1 -> v2
    large rewrite
    - integrate with existing pktinfo cmsg generation code
    - on ipv4: only send with new flag, to maintain legacy behavior
    - on ipv6: send at most a single pktinfo cmsg
    - on ipv6: initialize fields if not yet initialized

The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches

1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
   (http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
   cmsg from users without CAP_NET_RAW.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

829ae9d6

12 11月, 2014 1 次提交

net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited · ba7a46f1

由 Joe Perches 提交于 11月 11, 2014

Use the more common dynamic_debug capable net_dbg_ratelimited
and remove the LIMIT_NETDEBUG macro.

All messages are still ratelimited.

Some KERN_<LEVEL> uses are changed to KERN_DEBUG.

This may have some negative impact on messages that were
emitted at KERN_INFO that are not not enabled at all unless
DEBUG is defined or dynamic_debug is enabled.  Even so,
these messages are now _not_ emitted by default.

This also eliminates the use of the net_msg_warn sysctl
"/proc/sys/net/core/warnings".  For backward compatibility,
the sysctl is not removed, but it has no function.  The extern
declaration of net_msg_warn is removed from sock.h and made
static in net/core/sysctl_net_core.c

Miscellanea:

o Update the sysctl documentation
o Remove the embedded uses of pr_fmt
o Coalesce format fragments
o Realign arguments
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba7a46f1

06 11月, 2014 1 次提交

net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b

由 David S. Miller 提交于 11月 05, 2014

This encapsulates all of the skb_copy_datagram_iovec() callers
with call argument signature "skb, offset, msghdr->msg_iov, length".

When we move to iov_iters in the networking, the iov_iter object will
sit in the msghdr.

Having a helper like this means there will be less places to touch
during that transformation.

Based upon descriptions and patch from Al Viro.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51f3d02b

02 9月, 2014 1 次提交

sock: deduplicate errqueue dequeue · 364a9e93

由 Willem de Bruijn 提交于 8月 31, 2014

sk->sk_error_queue is dequeued in four locations. All share the
exact same logic. Deduplicate.

Also collapse the two critical sections for dequeue (at the top of
the recv handler) and signal (at the bottom).

This moves signal generation for the next packet forward, which should
be harmless.

It also changes the behavior if the recv handler exits early with an
error. Previously, a signal for follow-up packets on the errqueue
would then not be scheduled. The new behavior, to always signal, is
arguably a bug fix.

For rxrpc, the change causes the same function to be called repeatedly
for each queued packet (because the recv handler == sk_error_report).
It is likely that all packets will fail for the same reason (e.g.,
memory exhaustion).

This code runs without sk_lock held, so it is not safe to trust that
sk->sk_err is immutable inbetween releasing q->lock and the subsequent
test. Introduce int err just to avoid this potential race.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

364a9e93

25 8月, 2014 1 次提交

ipv6: White-space cleansing : Line Layouts · 67ba4152

由 Ian Morris 提交于 8月 24, 2014

This patch makes no changes to the logic of the code but simply addresses
coding style issues as detected by checkpatch.

Both objdump and diff -w show no differences.

A number of items are addressed in this patch:
* Multiple spaces converted to tabs
* Spaces before tabs removed.
* Spaces in pointer typing cleansed (char *)foo etc.
* Remove space after sizeof
* Ensure spacing around comparators such as if statements.
Signed-off-by: NIan Morris <ipm@chirality.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67ba4152

08 7月, 2014 1 次提交

net: Save TX flow hash in sock and set in skbuf on xmit · b73c3d0e

由 Tom Herbert 提交于 7月 01, 2014

For a connected socket we can precompute the flow hash for setting
in skb->hash on output. This is a performance advantage over
calculating the skb->hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where thers is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.

skb_set_hash_from_sk is a function which sets skb->hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).

Before fix:

  95.02% CPU utilization
  154/256/505 90/95/99% latencies
  1.13042e+06 tps

  Time in functions:
    0.28% skb_flow_dissect
    0.21% __skb_get_hash

After fix:

  94.95% CPU utilization
  156/254/485 90/95/99% latencies
  1.15447e+06

  Neither __skb_get_hash nor skb_flow_dissect appear in perf
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b73c3d0e

23 1月, 2014 1 次提交

ipv6: enable anycast addresses as source addresses for datagrams · 7c90cc2d

由 FX Le Bail 提交于 1月 22, 2014

This change allows to consider an anycast address valid as source address
when given via an IPV6_PKTINFO or IPV6_2292PKTINFO ancillary data item.
So, when sending a datagram with ancillary data, the unicast and anycast
addresses are handled in the same way.

- Adds ipv6_chk_acast_addr_src() to check if an anycast address is link-local
  on given interface or is global.
- Uses it in ip6_datagram_send_ctl().
Signed-off-by: NFrancois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c90cc2d

22 1月, 2014 1 次提交

ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts · 82b276cd

由 Hannes Frederic Sowa 提交于 1月 20, 2014

Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow
connecting and binding to them. sendmsg logic does already check msg->name
for this but must trust already connected sockets which could be set up
for connection to ipv4 address family.

Per-socket flag ipv6only is of no use here, as it is under users control
by setsockopt.
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

82b276cd

20 1月, 2014 1 次提交

ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams · 4b261c75

由 Hannes Frederic Sowa 提交于 1月 20, 2014

We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.

This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.

I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.
Reported-by: NGert Doering <gert@space.net>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b261c75

19 1月, 2014 1 次提交

net: add build-time checks for msg->msg_name size · 342dfc30

由 Steffen Hurrle 提交于 1月 17, 2014

This is a follow-up patch to f3d33426 ("net: rework recvmsg
handler msg_name and msg_namelen logic").

DECLARE_SOCKADDR validates that the structure we use for writing the
name information to is not larger than the buffer which is reserved
for msg->msg_name (which is 128 bytes). Also use DECLARE_SOCKADDR
consistently in sendmsg code paths.
Signed-off-by: NSteffen Hurrle <steffen@hurrle.net>
Suggested-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

342dfc30

11 12月, 2013 1 次提交

ipv6: do not erase dst address with flow label destination · ce7a3bdf

由 Florent Fourcot 提交于 12月 10, 2013

This patch is following b579035f
	"ipv6: remove old conditions on flow label sharing"

Since there is no reason to restrict a label to a
destination, we should not erase the destination value of a
socket with the value contained in the flow label storage.

This patch allows to really have the same flow label to more
than one destination.
Signed-off-by: NFlorent Fourcot <florent.fourcot@enst-bretagne.fr>
Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce7a3bdf

06 12月, 2013 1 次提交

net: Remove FLOWI_FLAG_CAN_SLEEP · 0e0d44ab

由 Steffen Klassert 提交于 8月 28, 2013

FLOWI_FLAG_CAN_SLEEP was used to notify xfrm about the posibility
to sleep until the needed states are resolved. This code is gone,
so FLOWI_FLAG_CAN_SLEEP is not needed anymore.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

0e0d44ab

24 11月, 2013 2 次提交

ipv6: fix leaking uninitialized port number of offender sockaddr · 1fa4c710

由 Hannes Frederic Sowa 提交于 11月 23, 2013

Offenders don't have port numbers, so set it to 0.
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1fa4c710

inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions · 85fbaa75

由 Hannes Frederic Sowa 提交于 11月 23, 2013

Commit bceaa902 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.

As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.

This broke traceroute and such.

Fixes: bceaa902 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: NBrad Spengler <spender@grsecurity.net>
Reported-by: Tom Labanowski
Cc: mpb <mpb.mail@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

85fbaa75

09 10月, 2013 1 次提交

ipv6: make lookups simpler and faster · efe4208f

由 Eric Dumazet 提交于 10月 03, 2013

TCP listener refactoring, part 4 :

To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common

Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.

Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).

inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.

inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.

We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

efe4208f

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功