Commits · 03485f2adcde0c2d4e9228b659be78e872486bbb · openeuler / Kernel

03 Feb, 2015 6 commits

udpv6: Add lockless sendmsg() support · 03485f2a

Committed by Vlad Yasevich on Jan 31, 2015

This commit adds the same functionality to IPv6 that
commit 903ab86d
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue Mar 1 02:36:48 2011 +0000

    udp: Add lockless transmit path

added to IPv4.

UDP transmit path can now run without a socket lock,
thus allowing multiple threads to send to a single socket
more efficiently.
This is only used when corking/MSG_MORE is not used.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03485f2a

ipv6: Introduce udpv6_send_skb() · d39d938c

Committed by Vlad Yasevich on Jan 31, 2015

Now that we can individually construct IPv6 skbs to send, add a
udpv6_send_skb() function to populate the udp header and send the
skb.  This allows udp_v6_push_pending_frames() to re-use this
function and also enables us to add lockless sendmsg() support.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d39d938c

ipv6: introduce ipv6_make_skb · 6422398c

Committed by Vlad Yasevich on Jan 31, 2015

This commit is very similar to
commit 1c32c5ad
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Tue Mar 1 02:36:47 2011 +0000

    inet: Add ip_make_skb and ip_finish_skb

It adds IPv6 versions of the helpers: ip6_make_skb and ip6_finish_skb.

The job of ip6_make_skb is to collect messages into an ipv6 packet
and populate the ipv6 header.  The job of ip6_finish_skb is to transmit
the generated skb.  Together they replicate the job of
ip6_push_pending_frames() while also providing the capability to be
called independently.  This will be needed to add lockless UDP sendmsg
support.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6422398c

ipv6: Append sending data to arbitrary queue · 0bbe84a6

Committed by Vlad Yasevich on Jan 31, 2015

Add the ability to append data to an arbitrary queue.  This
will be needed later to implement lockless UDP sends.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0bbe84a6

ipv6: pull cork initialization into its own function. · 366e41d9

Committed by Vlad Yasevich on Jan 31, 2015

Pull IPv6 cork initialization into its own function that
can be re-used.  IPv6 specific cork data did not have an
explicit data structure.  This patch creates one so that
just the ipv6 cork data can be passed as arguments.  Also, since
IPv6 tries to save the flow label into the inet_cork_full
structure, pass the full cork.

Adjust ip6_cork_release() to take cork data structures.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

366e41d9

net-timestamp: no-payload option · 49ca0d8b

Committed by Willem de Bruijn on Jan 30, 2015

Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
timestamps, this loops timestamps on top of empty packets.

Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
cmsg reception (aside from timestamps) are no longer possible. This
works together with a follow-on patch that allows administrators to
only allow tx timestamping if it does not loop payload or metadata.
Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes (rfc -> v1)
  - add documentation
  - remove unnecessary skb->len test (thanks to Richard Cochran)
Signed-off-by: David S. Miller <davem@davemloft.net>

49ca0d8b
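
A minimal user-space sketch of how an application might ask for transmit timestamps that are looped back without payload. It assumes a Linux system whose headers already define SOF_TIMESTAMPING_OPT_TSONLY; the helper names tsonly_flags() and enable_tsonly() are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Hypothetical helper: flag set for software tx timestamps without payload. */
static int tsonly_flags(void)
{
    return SOF_TIMESTAMPING_TX_SOFTWARE   /* request a software tx timestamp */
         | SOF_TIMESTAMPING_SOFTWARE      /* report software timestamps */
         | SOF_TIMESTAMPING_OPT_TSONLY;   /* loop the timestamp on an empty packet */
}

/* Hypothetical helper: apply the flags to an open socket.
 * Returns the result of setsockopt(): 0 on success, -1 on error. */
static int enable_tsonly(int fd)
{
    int flags = tsonly_flags();

    assert(fd >= 0);
    return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));
}
```

With this option set, the application would read the timestamp from the socket error queue (recvmsg() with MSG_ERRQUEUE) and should not expect the original payload to accompany it.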

31 Jan, 2015 1 commit

net: mark some potential candidates __read_mostly · 207895fd

Committed by Daniel Borkmann on Jan 29, 2015

They are all either written once or extremely rarely (e.g. from init
code), so we can move them to the .data..read_mostly section.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

207895fd
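
As a rough user-space illustration of the idea, the sketch below places a rarely written variable in a dedicated section. It assumes GCC or Clang targeting ELF; the macro read_mostly_sketch and the variable and function names are hypothetical, and only the section name is taken from the commit message.

```c
#include <assert.h>

/* Hypothetical user-space imitation of the annotation: put the variable in
 * the section named in the commit message (GCC/Clang on ELF assumed). */
#define read_mostly_sketch __attribute__((__section__(".data..read_mostly")))

/* Written once during initialisation, afterwards only read. */
static int max_backlog read_mostly_sketch = 128;

/* Rare writer, e.g. called once from init code. */
static void init_once(int value)
{
    assert(value > 0);
    max_backlog = value;
}

/* Frequent reader on the hot path. */
static int hot_path_limit(void)
{
    return max_backlog;
}
```

Grouping such variables is meant to keep them away from frequently written data, so that writes elsewhere do not keep invalidating the cache lines that the hot path reads.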

27 Jan, 2015 1 commit

ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too · 6e9e16e6

Committed by Hannes Frederic Sowa on Jan 26, 2015

Lubomir Rintel reported that during replacing a route the interface
reference counter isn't correctly decremented.

To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>:
| [root@rhel7-5 lkundrak]# sh -x lal
| + ip link add dev0 type dummy
| + ip link set dev0 up
| + ip link add dev1 type dummy
| + ip link set dev1 up
| + ip addr add 2001:db8:8086::2/64 dev dev0
| + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
| + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
| + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
| + ip link del dev0 type dummy
| Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
|  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
|
| Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
|  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2

During replacement of a rt6_info we must walk all parent nodes and check
if the to be replaced rt6_info got propagated. If so, replace it with
an alive one.

Fixes: 4a287eba ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
Reported-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Tested-by: Lubomir Rintel <lkundrak@v3.sk>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e9e16e6

26 Jan, 2015 3 commits

ipv6: tcp: fix race in IPV6_2292PKTOPTIONS · 1dc7b90f

Committed by Eric Dumazet on Jan 21, 2015

IPv6 TCP sockets store in np->pktoptions skbs, and use skb_set_owner_r()
to charge the skb to socket.

It means that destructor must be called while socket is locked.

Therefore, we cannot use skb_get() or atomic_inc(&skb->users)
to protect ourselves : kfree_skb() might race with other users
manipulating sk->sk_forward_alloc

Fix this race by holding socket lock for the duration of
ip6_datagram_recv_ctl()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1dc7b90f

ipv6: Fix __ip6_route_redirect · b0a1ba59

Committed by Martin KaFai Lau on Jan 20, 2015

In my last commit (a3c00e46: ipv6: Remove BACKTRACK macro), the changes in
__ip6_route_redirect are incorrect.  The following case is missed:
1. The for loop tries to find a valid gateway rt. If it fails to find
   one, rt will be NULL.
2. When rt is NULL, it is set to the ip6_null_entry.
3. The newly added 'else if', from a3c00e46, will stop the backtrack from
   happening.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b0a1ba59

net: ipv6: Add sysctl entry to disable MTU updates from RA · c2943f14

Committed by Harout Hedeshian on Jan 20, 2015

The kernel forcefully applies MTU values received in router
advertisements provided the new MTU is less than the current one. This
behavior is undesirable when user space is managing the MTU. Instead,
a sysctl flag 'accept_ra_mtu' is introduced so that user space
can control whether or not RA provided MTU updates should be applied. The
default behavior is unchanged; user space must explicitly set this flag
to 0 for RA MTUs to be ignored.
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2943f14

25 Jan, 2015 1 commit

udp: Do not require sock in udp_tunnel_xmit_skb · d998f8ef

Committed by Tom Herbert on Jan 20, 2015

The UDP tunnel transmit functions udp_tunnel_xmit_skb and
udp_tunnel6_xmit_skb include a socket argument. The socket being
passed to the functions (from VXLAN) is a UDP socket created for the
receive side. The only thing that the socket is used for in the transmit
functions is to get the setting for checksum (enabled or zero).
This patch removes the argument and adds a nocheck argument
for checksum setting. This eliminates the unnecessary dependency
on a UDP socket for UDP tunnel transmit.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d998f8ef

24 Jan, 2015 1 commit

ip6gretap: advertise link netns via netlink · 3390e397

Committed by Nicolas Dichtel on Jan 20, 2015

Assign rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3390e397

20 Jan, 2015 2 commits

ipv6: stop sending PTB packets for MTU < 1280 · 9d289715

Committed by Hagen Paul Pfeifer on Jan 15, 2015

Reduce the attack vector and stop generating IPv6 Fragment Header for
paths with an MTU smaller than the minimum required IPv6 MTU
size (1280 bytes) - called atomic fragments.

See IETF I-D "Deprecating the Generation of IPv6 Atomic Fragments" [1]
for more information and how this "feature" can be misused.

[1] https://tools.ietf.org/html/draft-ietf-6man-deprecate-atomfrag-generation-00
Signed-off-by: Fernando Gont <fgont@si6networks.com>
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9d289715

tunnels: advertise link netns via netlink · 1728d4fa

Committed by Nicolas Dichtel on Jan 15, 2015

Implement rtnl_link_ops->get_link_net() callback so that IFLA_LINK_NETNSID is
added to rtnetlink messages.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1728d4fa

19 Jan, 2015 1 commit

netlink: Fix bugs in nlmsg_end() conversions. · 7b46a644

Committed by David S. Miller on Jan 18, 2015

Commit 053c095a ("netlink: make nlmsg_end() and genlmsg_end()
void") didn't catch all of the cases where callers were breaking out
on the return value being equal to zero, which they no longer should
when zero means success.

Fix all such cases.
Reported-by: Marcel Holtmann <marcel@holtmann.org>
Reported-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7b46a644

18 Jan, 2015 1 commit

netlink: make nlmsg_end() and genlmsg_end() void · 053c095a

Committed by Johannes Berg on Jan 16, 2015

Contrary to common expectations for an "int" return, these functions
return only a positive value -- if used correctly they cannot even
return 0 because the message header will necessarily be in the skb.

This makes the very common pattern of

  if (genlmsg_end(...) < 0) { ... }

be a whole bunch of dead code. Many places also simply do

  return nlmsg_end(...);

and the caller is expected to deal with it.

This also commonly (at least for me) causes errors, because it is very
common to write

  if (my_function(...))
    /* error condition */

and if my_function() does "return nlmsg_end()" this is of course wrong.

Additionally, there's not a single place in the kernel that actually
needs the message length returned, and if anyone needs it later then
it'll be very easy to just use skb->len there.

Remove this, and make the functions void. This removes a bunch of dead
code as described above. The patch adds lines because I did

-	return nlmsg_end(...);
+	nlmsg_end(...);
+	return 0;

I could have preserved all the function's return values by returning
skb->len, but instead I've audited all the places calling the affected
functions and found that none cared. A few places actually compared
the return value with <= 0 in dump functionality, but that could just
be changed to < 0 with no change in behaviour, so I opted for the more
efficient version.

One instance of the error I've made numerous times now is also present
in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
check for <0 or <=0 and thus broke out of the loop every single time.
I've preserved this since it will (I think) have caused the messages to
userspace to be formatted differently with just a single message for
every SKB returned to userspace. It's possible that this isn't needed
for the tools that actually use this, but I don't even know what they
are so couldn't test that changing this behaviour would be acceptable.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

053c095a
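
The pitfall described in this commit message can be reproduced in a small user-space model. This is only a sketch: struct fake_msg and every function below are hypothetical stand-ins, not the netlink API, and the header length of 16 is an arbitrary example value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb; only the length matters here. */
struct fake_msg {
    size_t len;
};

/* Old style: finish the message and return its length, which is always > 0. */
static int old_msg_end(struct fake_msg *m, size_t hdr_len)
{
    assert(hdr_len > 0);   /* the header is always part of the message */
    m->len += hdr_len;
    return (int)m->len;
}

/* New style: nothing is returned, so nothing can be misread as a status. */
static void new_msg_end(struct fake_msg *m, size_t hdr_len)
{
    assert(hdr_len > 0);
    m->len += hdr_len;
}

/* Buggy fill function: forwards the positive length as if it were a status. */
static int fill_old(struct fake_msg *m)
{
    return old_msg_end(m, 16);
}

/* Converted fill function: 0 means success. */
static int fill_new(struct fake_msg *m)
{
    new_msg_end(m, 16);
    return 0;
}

/* Typical caller that treats any non-zero return value as an error. */
static int caller_sees_error(int (*fill)(struct fake_msg *))
{
    struct fake_msg m = { 0 };

    if (fill(&m))
        return 1;   /* error path */
    return 0;       /* success path */
}
```

In this model, caller_sees_error(fill_old) takes the error path even though nothing failed, while caller_sees_error(fill_new) takes the success path.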

16 Jan, 2015 1 commit

ip: zero sockaddr returned on error queue · f812116b

Committed by Willem de Bruijn on Jan 15, 2015

The sockaddr is returned in IP(V6)_RECVERR as part of errhdr. That
structure is defined and allocated on the stack as

    struct {
            struct sock_extended_err ee;
            struct sockaddr_in(6)    offender;
    } errhdr;

The second part is only initialized for certain SO_EE_ORIGIN values.
Always initialize it completely.

An MTU exceeded error on a SOCK_RAW/IPPROTO_RAW is one example that
would return uninitialized bytes.
Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Also verified that there is no padding between errhdr.ee and
errhdr.offender that could leak additional kernel data.
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f812116b
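
A user-space sketch of the pattern described in this commit: clear the offender address unconditionally, then fill it only for the origins that provide one. It assumes Linux headers for struct sock_extended_err; struct errhdr4 mirrors only the IPv4 variant of the quoted structure, and errhdr4_fill() and all_zero() are hypothetical helpers rather than the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>            /* struct timespec, needed by older errqueue.h */
#include <netinet/in.h>
#include <linux/errqueue.h>

/* User-space mirror of the IPv4 variant of the on-stack structure. */
struct errhdr4 {
    struct sock_extended_err ee;
    struct sockaddr_in       offender;
};

/* Returns 1 if every byte in buf is zero, 0 otherwise. */
static int all_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t i;

    for (i = 0; i < len; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Hypothetical helper: copy the extended error, always zero the offender,
 * and set an address only when the error came from ICMP. */
static void errhdr4_fill(struct errhdr4 *h, const struct sock_extended_err *ee,
                         const struct in_addr *icmp_src)
{
    assert(h != NULL && ee != NULL);
    h->ee = *ee;
    memset(&h->offender, 0, sizeof(h->offender));
    if (ee->ee_origin == SO_EE_ORIGIN_ICMP && icmp_src != NULL) {
        h->offender.sin_family = AF_INET;
        h->offender.sin_addr = *icmp_src;
    }
}
```

For a locally generated error (SO_EE_ORIGIN_LOCAL) the offender stays all zero, whatever the stack memory held before the call.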

15 Jan, 2015 1 commit

ipv6:icmp:remove unnecessary brackets · 9a6b4b39

Committed by zhuyj on Jan 14, 2015

There are too many brackets. A single pair of brackets is probably enough.
Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a6b4b39

06 Jan, 2015 5 commits

net: tcp: add per route congestion control · 81164413

Committed by Daniel Borkmann on Jan 05, 2015

This work adds the possibility to define a per route/destination
congestion control algorithm. Generally, this opens up the possibility
for a machine with different links to enforce specific congestion
control algorithms with optimal strategies for each of them based
on their network characteristics, even transparently for a single
application listening on all links.

For our specific use case, this additionally facilitates deployment
of DCTCP: for example, applications can easily serve internal
traffic/dsts in DCTCP and external ones with CUBIC. Other scenarios
would also allow for utilizing e.g. long living, low priority
background flows for certain destinations/routes while still allowing
normal traffic to utilize the default congestion control
algorithm. We also thought about a per netns setting (where different
defaults are possible), but given it's actually a link specific
property, we argue that a per route/destination setting is the most
natural and flexible.

The administrator can utilize this through ip-route(8) by appending
"congctl [lock] <name>", where <name> denotes the name of a
congestion control algorithm and the optional lock parameter allows
to enforce the given algorithm so that applications in user space
would not be allowed to overwrite that algorithm for that destination.

The dst metric lookups are being done when a dst entry is already
available in order to avoid a costly lookup and still before the
algorithms are being initialized, thus overhead is very low when the
feature is not being used. While the client side would need to drop
the current reference on the module, on server side this can actually
even be avoided as we just got a flat-copied socket clone.

Joint work with Florian Westphal.
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

81164413

net: tcp: add RTAX_CC_ALGO fib handling · ea697639

Committed by Daniel Borkmann on Jan 05, 2015

This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
control metric to be set up and dumped back to user space.

While the internal representation of RTAX_CC_ALGO is handled as a u32
key, we avoided exposing this implementation detail to user space.
Instead, we chose the netlink attribute that is exchanged with
user space to be the actual congestion control algorithm name, similar
to the setsockopt(2) API, in order to allow for maximum flexibility,
even for 3rd party modules.

It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot, as
it should have been stored in RTAX_FEATURES instead. We first thought
about reusing it for the congestion control key, but that brings more
complications and/or confusion than it is worth.

Joint work with Florian Westphal.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea697639
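
For comparison, the setsockopt(2) interface mentioned in this commit message selects the algorithm by name for a single socket. The sketch below assumes Linux and the TCP_CONGESTION socket option from <netinet/tcp.h>; set_congestion_control() is a hypothetical helper, and "cubic" is just an example name that may or may not be available on a given system.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/* Hypothetical helper: select a congestion control algorithm by name for one
 * TCP socket. Returns 0 on success, -1 on failure with errno set. */
static int set_congestion_control(int fd, const char *name)
{
    assert(name != NULL);
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                      name, (socklen_t)strlen(name));
}
```

A caller would pass an open TCP socket, for example set_congestion_control(fd, "cubic"), and check the return value, since the named algorithm may not be loaded or permitted.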

net: fib6: convert cfg metric to u32 outside of table write lock · e715b6d3

Committed by Florian Westphal on Jan 05, 2015

Do the nla validation earlier, outside the write lock.

This is needed by a followup patch which needs to be able to call
request_module (which can sleep) if needed.

Joint work with Daniel Borkmann.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

e715b6d3

net: fib6: fib6_commit_metrics: fix potential NULL pointer dereference · 0409c9a5

Committed by Daniel Borkmann on Jan 05, 2015

When IPv6 host routes with metrics attached are being added, we fetch
the metrics store from the dst via COW through dst_metrics_write_ptr(),
added through commit e5fd387a.

One remaining problem here is that we actually call into inet_getpeer()
and may end up allocating/creating a new peer from the kmemcache, which
may fail.

Example trace from perf probe (inet_getpeer:41) where create is 1:

ip 6877 [002] 4221.391591: probe:inet_getpeer: (ffffffff8165e293)
  85e294 inet_getpeer.part.7 (<- kmem_cache_alloc())
  85e578 inet_getpeer
  8eb333 ipv6_cow_metrics
  8f10ff fib6_commit_metrics

Therefore, a check for NULL on the return of dst_metrics_write_ptr()
is necessary here.

Joint work with Florian Westphal.

Fixes: e5fd387a ("ipv6: do not overwrite inetpeer metrics prematurely")
Cc: Michal Kubeček <mkubecek@suse.cz>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0409c9a5

ip: Move checksum convert defines to inet · 224d019c

Committed by Tom Herbert on Jan 05, 2015

Move convert_csum from udp_sock to inet_sock. This allows the
possibility that we can use convert checksum for different types
of sockets and also allows convert checksum to be enabled from the
inet layer (what we'll want to do when enabling IP_CHECKSUM cmsg).
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

224d019c

23 Dec, 2014 2 commits

netfilter: nf_tables: fix port natting in little endian archs · 7b5bca46

Committed by leroy christophe on Dec 22, 2014

Make sure this fetches 16-bit port data from the register.
Remove casting to make sparse happy, not needed anymore.
Signed-off-by: leroy christophe <christophe.leroy@c-s.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

7b5bca46

tcp6: don't move IP6CB before xfrm6_policy_check() · 2dc49d16

Committed by Nicolas Dichtel on Dec 22, 2014

When xfrm6_policy_check() is used, _decode_session6() is called after some
intermediate functions. This function uses IP6CB(), thus TCP_SKB_CB() must be
prepared after the call of xfrm6_policy_check().

Before this patch, scenarios with IPv6 + TCP + IPsec Transport are broken.

Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Reported-by: Huaibin Wang <huaibin.wang@6wind.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2dc49d16

11 Dec, 2014 1 commit

net: introduce helper macro for_each_cmsghdr · f95b414e

Committed by Gu Zheng on Dec 11, 2014

Introduce helper macro for_each_cmsghdr as a wrapper for enumerating
the cmsghdrs of a msghdr; just a cleanup.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f95b414e
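
A user-space analogue of such an iteration helper can be built on the standard CMSG_FIRSTHDR() and CMSG_NXTHDR() macros. This is only a sketch: for_each_cmsghdr_user, count_cmsgs(), fill_two_cmsgs() and union two_cmsg_buf are hypothetical names, and the macro is not claimed to match the kernel definition.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical user-space analogue of the helper: visit every cmsghdr. */
#define for_each_cmsghdr_user(cmsg, msg)            \
    for ((cmsg) = CMSG_FIRSTHDR(msg);               \
         (cmsg) != NULL;                            \
         (cmsg) = CMSG_NXTHDR((msg), (cmsg)))

/* Count the control messages attached to msg. */
static int count_cmsgs(struct msghdr *msg)
{
    struct cmsghdr *cmsg;
    int n = 0;

    assert(msg != NULL);
    for_each_cmsghdr_user(cmsg, msg)
        n++;
    return n;
}

/* Room for two int-sized control messages, aligned for struct cmsghdr. */
union two_cmsg_buf {
    char buf[2 * CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Build two int-sized control messages in u and attach them to msg.
 * The level and type are arbitrary here because nothing is ever sent. */
static void fill_two_cmsgs(struct msghdr *msg, union two_cmsg_buf *u)
{
    struct cmsghdr *c;
    int value = 0;
    int i;

    memset(u, 0, sizeof(*u));
    memset(msg, 0, sizeof(*msg));
    msg->msg_control = u->buf;
    msg->msg_controllen = sizeof(u->buf);

    c = CMSG_FIRSTHDR(msg);
    for (i = 0; i < 2 && c != NULL; i++) {
        c->cmsg_level = SOL_SOCKET;
        c->cmsg_type = SCM_RIGHTS;
        c->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(c), &value, sizeof(value));
        c = CMSG_NXTHDR(msg, c);
    }
}
```

The benefit is the same as in the kernel cleanup: the loop header is written once, and each call site only supplies the loop body.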

10 Dec, 2014 5 commits

tcp: fix more NULL deref after prequeue changes · 0f85feae

Committed by Eric Dumazet on Dec 09, 2014

When I cooked commit c3658e8d ("tcp: fix possible NULL dereference in
tcp_vX_send_reset()") I missed other spots we could deref a NULL
skb_dst(skb)

Again, if a socket is provided, we do not need skb_dst() to get a
pointer to network namespace : sock_net(sk) is good enough.
Reported-by: Dann Frazier <dann.frazier@canonical.com>
Bisected-by: Dann Frazier <dann.frazier@canonical.com>
Tested-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: ca777eff ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>

0f85feae

put iov_iter into msghdr · c0371da6

Committed by Al Viro on Nov 24, 2014

Note that the code _using_ ->msg_iter at that point will be very
unhappy with anything other than unshifted iovec-backed iov_iter.
We still need to convert users to proper primitives.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c0371da6

ip_generic_getfrag, udplite_getfrag: switch to passing msghdr · f69e6d13

Committed by Al Viro on Nov 24, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f69e6d13

ipv6 equivalent of "ipv4: Avoid reading user iov twice after raw_probe_proto_opt" · 19e3c66b

Committed by Al Viro on Nov 24, 2014

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

19e3c66b

ipv6: remove useless spin_lock/spin_unlock · 86fe8f89

Committed by Duan Jiong on Dec 03, 2014

xchg is atomic, so there is no need to use spin_lock/spin_unlock
to protect it. Also remove the redundant
opt = xchg(&inet6_sk(sk)->opt, opt); statement.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

86fe8f89
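
The same reasoning can be shown in user space with C11 atomics, used here as a stand-in for the kernel's xchg(). In this sketch struct opts, struct holder and swap_opts() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical options blob, standing in for the socket's IPv6 options. */
struct opts {
    int id;
};

/* Holder with an atomically swappable pointer. */
struct holder {
    _Atomic(struct opts *) opt;
};

/* Install new_opt and return the previous pointer in one atomic step.
 * No lock is needed around the exchange itself. */
static struct opts *swap_opts(struct holder *h, struct opts *new_opt)
{
    assert(h != NULL);
    return atomic_exchange(&h->opt, new_opt);
}
```

Because the exchange hands the old pointer to exactly one caller, that caller can release the old options afterwards without a lock having protected the swap.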

09 Dec, 2014 2 commits

udp: Neaten and reduce size of compute_score functions · 60c04aec

Committed by Joe Perches on Dec 01, 2014

The compute_score functions are a bit difficult to read.

Neaten them a bit to reduce object sizes and make them a
bit more intelligible.

Return early to avoid indentation and avoid unnecessary
initializations.

(allyesconfig, but w/ -O2 and no profiling)

$ size net/ipv[46]/udp.o.*
   text    data     bss     dec     hex filename
  28680    1184      25   29889    74c1 net/ipv4/udp.o.new
  28756    1184      25   29965    750d net/ipv4/udp.o.old
  17600    1010       2   18612    48b4 net/ipv6/udp.o.new
  17632    1010       2   18644    48d4 net/ipv6/udp.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60c04aec

net-timestamp: allow reading recv cmsg on errqueue with origin tstamp · 829ae9d6

Committed by Willem de Bruijn on Nov 30, 2014

Allow reading of timestamps and cmsg at the same time on all relevant
socket families. One use is to correlate timestamps with egress
device, by asking for cmsg IP_PKTINFO.

On AF_INET sockets, call the relevant function (ip_cmsg_recv). To
avoid changing legacy expectations, only do so if the caller sets a
new timestamping flag SOF_TIMESTAMPING_OPT_CMSG.

On AF_INET6 sockets, IPV6_PKTINFO and all other recv cmsg are already
returned for all origins. The only change is to set ifindex, which is
not initialized for all error origins.

In both cases, only generate the pktinfo message if an ifindex is
known. This is not the case for ACK timestamps.

The difference between the protocol families is probably a historical
accident as a result of the different conditions for generating cmsg
in the relevant ip(v6)_recv_error function:

ipv4:        if (serr->ee.ee_origin == SO_EE_ORIGIN_ICMP) {
ipv6:        if (serr->ee.ee_origin != SO_EE_ORIGIN_LOCAL) {

At one time, this was the same test bar for the ICMP/ICMP6
distinction. This is no longer true.
Signed-off-by: Willem de Bruijn <willemb@google.com>

----

Changes
  v1 -> v2
    large rewrite
    - integrate with existing pktinfo cmsg generation code
    - on ipv4: only send with new flag, to maintain legacy behavior
    - on ipv6: send at most a single pktinfo cmsg
    - on ipv6: initialize fields if not yet initialized

The recv cmsg interfaces are also relevant to the discussion of
whether looping packet headers is problematic. For v6, cmsgs that
identify many headers are already returned. This patch expands
that to v4. If it sounds reasonable, I will follow with patches

1. request timestamps without payload with SOF_TIMESTAMPING_OPT_TSONLY
   (http://patchwork.ozlabs.org/patch/366967/)
2. sysctl to conditionally drop all timestamps that have payload or
   cmsg from users without CAP_NET_RAW.
Signed-off-by: David S. Miller <davem@davemloft.net>

829ae9d6

08 Dec, 2014 2 commits

xfrm6: Fix the nexthdr offset in _decode_session6. · f8556919

Committed by Steffen Klassert on Dec 08, 2014

xfrm_decode_session() was originally designed for
use in the receive path where the correct nexthdr offset
is stored in IP6CB(skb)->nhoff. Over time this function
spread to code that is used in the output path (netfilter,
vti) where IP6CB(skb)->nhoff is not set. As a result, we
get a wrong nexthdr and the upper layer flow information
is wrong. This can lead to incorrect policy lookups.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

f8556919

xfrm6: Fix transport header offset in _decode_session6. · de3b7a06

Committed by Steffen Klassert on Dec 04, 2014

skb->transport_header might not be valid when we do a reverse
decode because the ipv6 tunnel error handlers don't update it
to the inner transport header. This leads to a wrong offset
calculation and to wrong layer 4 information. We fix this
by using the size of the ipv6 header as the first offset.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

de3b7a06

29 Nov, 2014 1 commit

netfilter: nf_log_ipv6: correct typo in module description · 4338c572

Committed by Steven Noonan on Nov 27, 2014

It incorrectly identifies itself as "IPv4" packet logging.
Signed-off-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

4338c572

27 Nov, 2014 2 commits

netfilter: combine IPv4 and IPv6 nf_nat_redirect code in one module · b59eaf9e

Committed by Pablo Neira Ayuso on Nov 26, 2014

This resolves linking problems with CONFIG_IPV6=n:

net/built-in.o: In function `redirect_tg6':
xt_REDIRECT.c:(.text+0x6d021): undefined reference to `nf_nat_redirect_ipv6'
Reported-by: Andreas Ruprecht <rupran@einserver.de>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b59eaf9e

ipv6: Remove unnecessary test · 73cf0e92

Committed by zhuyj on Nov 26, 2014

The "init_net" test in function addrconf_exit_net was introduced
in commit 44a6bd29 [Create ipv6 devconf-s for namespaces] to avoid freeing
init_net. Since commit c900a800 [ipv6: fix bad free of addrconf_init_net],
function addrconf_init_net allocates memory for every net regardless of
init_net. In this case, the "init_net" test is unnecessary.

CC: Hong Zhiguo <honkiko@gmail.com>
CC: Octavian Purdila <opurdila@ixiacom.com>
CC: Pavel Emelyanov <xemul@openvz.org>
CC: Cong Wang <cwang@twopensource.com>
Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

73cf0e92

26 Nov, 2014 1 commit

tcp: fix possible NULL dereference in tcp_vX_send_reset() · c3658e8d

Committed by Eric Dumazet on Nov 25, 2014

After commit ca777eff ("tcp: remove dst refcount false sharing for
prequeue mode") we have to relax check against skb dst in
tcp_v[46]_send_reset() if prequeue dropped the dst.

If a socket is provided, a full lookup was done to find this socket,
so the dst test can be skipped.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88191
Reported-by: Jaša Bartelj <jasa.bartelj@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Fixes: ca777eff ("tcp: remove dst refcount false sharing for prequeue mode")
Signed-off-by: David S. Miller <davem@davemloft.net>

c3658e8d

openeuler / Kernel synced successfully about 1 year ago
