Commits · c319b4d76b9e583a5d88d6bf190e079c4e43213d · openeuler / raspberrypi-kernel

14 May, 2011 · 1 commit

net: ipv4: add IPPROTO_ICMP socket kind · c319b4d7

Committed by Vasiliy Kulikov on May 13, 2011

This patch adds IPPROTO_ICMP socket kind.  It makes it possible to send
ICMP_ECHO messages and receive the corresponding ICMP_ECHOREPLY messages
without any special privileges.  In other words, the patch makes it
possible to implement setuid-less and CAP_NET_RAW-less /bin/ping.  In
order not to increase the kernel's attack surface, the new functionality
is disabled by default, but is enabled at bootup by supporting Linux
distributions, optionally with restriction to a group or a group range
(see below).

Similar functionality is implemented in Mac OS X:
http://www.manpagez.com/man/4/icmp/

A new ping socket is created with

    socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP)

Message identifiers (octets 4-5 of the ICMP header) are interpreted as local
ports. Addresses are stored in struct sockaddr_in. No port numbers are
reserved for privileged processes; port 0 is reserved for the API ("let the
kernel pick a free number"). There is no notion of remote ports; remote
port numbers provided by the user (e.g. in connect()) are ignored.
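
As a rough userspace illustration of the socket creation and port semantics described above, the following sketch opens a ping socket and binds it to port 0 so that the kernel picks the identifier. The helper name `open_ping_socket` is hypothetical, and the sketch assumes that `getsockname()` reports the chosen identifier in `sin_port`; the call is expected to fail (for example with EACCES) on systems where ping sockets are not enabled for the caller.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open an unprivileged ICMP "ping" socket and bind
 * it to port 0 so the kernel picks a free identifier.
 * Returns the fd, or -1 with errno set. On success *id_out holds the
 * identifier (local port) in host byte order.
 */
static int open_ping_socket(unsigned short *id_out)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd;

    assert(id_out != NULL);

    fd = socket(PF_INET, SOCK_DGRAM, IPPROTO_ICMP);
    if (fd < 0)
        return -1;              /* e.g. EACCES or EPROTONOSUPPORT */

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;          /* 0 = "let the kernel pick a free number" */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        int saved = errno;      /* keep the original error across close() */

        close(fd);
        errno = saved;
        return -1;
    }

    *id_out = ntohs(addr.sin_port);
    return fd;
}
```

A caller would check the return value and treat a failure as "ping sockets unavailable here", falling back to a raw socket if it has the privileges for one.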

Data sent and received include ICMP headers. This is deliberate, in order to:
1) Avoid the need to transport header values like sequence numbers by
other means.
2) Make it easier to port existing programs using raw sockets.

ICMP headers given to send() are checked and sanitized. The type must be
ICMP_ECHO and the code must be zero (future extensions might relax this,
see below). The id is set to the number (local port) of the socket, and the
checksum is always recomputed.
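
To make the header rules above concrete, here is a minimal sketch of how a userspace program might lay out an echo request before passing it to send(). The function names are hypothetical. The checksum routine is the standard RFC 1071 Internet checksum; according to the description above the kernel recomputes it anyway, so filling it in is only needed for code that must also work over raw sockets. The id field is left at zero on the assumption that the kernel overwrites it with the socket's local port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECHO_REQUEST_TYPE 8     /* numeric value of ICMP_ECHO */
#define ECHO_HEADER_LEN   8

/* RFC 1071 checksum over big-endian 16-bit words (small buffers only). */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;    /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);     /* fold carries */
    return (uint16_t)~sum;
}

/*
 * Write an echo request (header + payload) into buf.
 * Returns the packet length, or 0 if buf is too small.
 */
static size_t build_echo_request(uint8_t *buf, size_t cap, uint16_t seq,
                                 const void *payload, size_t payload_len)
{
    size_t total = ECHO_HEADER_LEN + payload_len;
    uint16_t csum;

    assert(buf != NULL);
    if (cap < total)
        return 0;

    buf[0] = ECHO_REQUEST_TYPE;         /* type: must be ICMP_ECHO */
    buf[1] = 0;                         /* code: must be zero */
    buf[2] = buf[3] = 0;                /* checksum placeholder */
    buf[4] = buf[5] = 0;                /* id: assumed to be set by the kernel */
    buf[6] = (uint8_t)(seq >> 8);       /* sequence number, network order */
    buf[7] = (uint8_t)(seq & 0xff);
    if (payload_len > 0)
        memcpy(buf + ECHO_HEADER_LEN, payload, payload_len);

    csum = inet_checksum(buf, total);
    buf[2] = (uint8_t)(csum >> 8);
    buf[3] = (uint8_t)(csum & 0xff);
    return total;
}
```

A useful property for checking such code is that running the checksum over a packet whose checksum field is already correct yields zero.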

ICMP reply packets received from the network are demultiplexed according
to their ids, and are returned by recv() without any modifications.
IP header information and ICMP errors of those packets may be obtained
via ancillary data (IP_RECVTTL, IP_RETOPTS, and IP_RECVERR). ICMP source
quenches and redirects are reported as fake errors via the error queue
(IP_RECVERR); the next hop address for redirects is saved to ee_info (in
network order).

socket(2) is restricted to the group range specified in
"/proc/sys/net/ipv4/ping_group_range".  It is "1 0" by default, meaning
that nobody (not even root) may create ping sockets.  Setting it to "100
100" would grant permission to a single group (either to make
/sbin/ping g+s and owned by this group or to grant permission to the
"netadmins" group); "0 4294967295" would enable it for the world; "100
4294967295" would enable it for users, but not daemons.
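
The range semantics above can be sketched as a simple inclusive comparison. The names `gid_range`, `parse_gid_range` and `gid_allowed` are hypothetical, and this models only the check of a single group ID against the configured range; how the kernel walks a process's groups is not shown here. It illustrates why "1 0" admits nobody: no value can be at least 1 and at most 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the two numbers in ping_group_range. */
struct gid_range {
    unsigned long low;
    unsigned long high;
};

/* Parse "LOW HIGH"; returns false if two numbers cannot be read. */
static bool parse_gid_range(const char *text, struct gid_range *range)
{
    assert(text != NULL && range != NULL);
    return sscanf(text, "%lu %lu", &range->low, &range->high) == 2;
}

/* Inclusive range test; an inverted range such as "1 0" matches nothing. */
static bool gid_allowed(const struct gid_range *range, unsigned long gid)
{
    return range->low <= gid && gid <= range->high;
}
```

For example, after parsing "100 100" only group 100 passes the test, while "0 4294967295" admits every group, including group 0.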

The existing code might be (in the unlikely case anyone needs it)
extended rather easily to handle other similar pairs of ICMP messages
(Timestamp/Reply, Information Request/Reply, Address Mask Request/Reply
etc.).

Userspace ping util & patch for it:
http://openwall.info/wiki/people/segoon/ping

For Openwall GNU/*/Linux it was the last step on the road to the
setuid-less distro.  A revision of this patch (for RHEL5/OpenVZ kernels)
is in use in Owl-current, such as in the 2011/03/12 LiveCD ISOs:
http://mirrors.kernel.org/openwall/Owl/current/iso/

Initially this functionality was written by Pavel Kankovsky for
Linux 2.4.32, but unfortunately it was never made public.

All ping options (-b, -p, -Q, -R, -s, -t, -T, -M, -I) are tested with
the patch.

PATCH v3:
    - switched to flowi4.
    - minor changes to be consistent with raw sockets code.

PATCH v2:
    - changed ping_debug() to pr_debug().
    - removed CONFIG_IP_PING.
    - removed ping_seq_fops.owner field (unused for procfs).
    - switched to proc_net_fops_create().
    - switched to %pK in seq_printf().

PATCH v1:
    - fixed checksumming bug.
    - CAP_NET_RAW may not create icmp sockets anymore.

RFC v2:
    - minor cleanups.
    - introduced sysctl'able group range to restrict socket(2).

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 May, 2011 · 4 commits

ipv4: Fix 'iph' use before set. · 72a8f97b

Committed by David S. Miller on May 12, 2011

I swear none of my compilers warned about this, yet it is so
obvious.

> net/ipv4/ip_forward.c: In function 'ip_forward':
> net/ipv4/ip_forward.c:87: warning: 'iph' may be used uninitialized in this function

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Elide use of rt->rt_dst in ip_forward() · def57687

Committed by David S. Miller on May 12, 2011

No matter what kind of header mangling occurs due to IP options
processing, rt->rt_dst will always equal iph->daddr in the packet.

So we can safely use iph->daddr instead of rt->rt_dst here.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Simplify iph->daddr overwrite in ip_options_rcv_srr(). · c30883bd

Committed by David S. Miller on May 12, 2011

We already copy the 4-byte nexthop from the options block into
local variable "nexthop" for the route lookup.

Re-use that variable instead of memcpy()'ing again when assigning
to iph->daddr after the route lookup succeeds.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Kill spurious opt->srr check in ip_options_rcv_srr(). · 10949550

Committed by David S. Miller on May 12, 2011

All call sites conditionalize the call to ip_options_rcv_srr()
with a check of opt->srr, so no need to check it again there.

Signed-off-by: David S. Miller <davem@davemloft.net>

11 May, 2011 · 6 commits

xfrm: Assign the inner mode output function to the dst entry · 43a4dea4

Committed by Steffen Klassert on May 09, 2011

As it is, we assign the outer mode's output function to the dst entry
when we create the xfrm bundle. This leads to two problems in interfamily
scenarios. We might insert ipv4 packets into ip6_fragment when called
from xfrm6_output. The system crashes if we try to fragment an ipv4
packet with ip6_fragment. This issue was introduced with git commit
ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
as needed). The second issue is that we might insert ipv4 packets into
netfilter6 and vice versa in interfamily scenarios.

With this patch we assign the inner mode output function to the dst entry
when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner
mode is used and the right fragmentation and netfilter functions are called.
We then switch to outer mode with the output_finish functions.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix two lockdep splats · 1fc19aff

Committed by Eric Dumazet on May 09, 2011

Commit e67f88dd (net: dont hold rtnl mutex during netlink dump
callbacks) switched rtnl protection to RCU, but we forgot to adjust two
rcu_dereference() lockdep annotations:

inet_get_link_af_size() or inet_fill_link_af() might be called with
rcu_read_lock or rtnl held, so use rcu_dereference_rtnl()
instead of rtnl_dereference().

Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: xfrm: Eliminate ->rt_src reference in policy code. · 8f01cb08

Committed by David S. Miller on May 09, 2011

Rearrange xfrm4_dst_lookup() so that it works by calling a helper
function __xfrm_dst_lookup() that takes an explicit flow key storage
area as an argument.

Use this new helper in xfrm4_get_saddr() so we can fetch the selected
source address from the flow instead of from rt->rt_src.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: udp: Eliminate remaining uses of rt->rt_src · 79ab0531

Committed by David S. Miller on May 09, 2011

We already track and pass around the correct flow key,
so simply use it in udp_send_skb().

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: icmp: Eliminate remaining uses of rt->rt_src · 9f6abb5f

Committed by David S. Miller on May 09, 2011

On input packets, rt->rt_src always equals ip_hdr(skb)->saddr.

Anything that mangles or otherwise changes the IP header must
relookup the route found at skb_rtable().  Therefore this
invariant must always hold true.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Pass explicit daddr arg to ip_send_reply(). · 0a5ebb80

Committed by David S. Miller on May 09, 2011

This eliminates an access to rt->rt_src.

Signed-off-by: David S. Miller <davem@davemloft.net>

09 May, 2011 · 13 commits

ipv4: Pass flow key down into ip_append_*(). · f5fca608

Committed by David S. Miller on May 08, 2011

This way rt->rt_dst accesses are unnecessary.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Pass flow keys down into datagram packet building engine. · 77968b78

Committed by David S. Miller on May 08, 2011

This way ip_output.c no longer needs rt->rt_{src,dst}.

We already have these keys sitting, ready and waiting, on the stack or
in a socket structure.

Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Use flow key information instead of rt->rt_{src,dst} · e474995f

Committed by David S. Miller on May 08, 2011

We have two cases.

Either the socket is in TCP_ESTABLISHED state and connect() filled
in the inet socket cork flow, or we looked up the route here and
used an on-stack flow.

Track which one it was, and use it to obtain src/dst addrs.

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp_cubic: limit delayed_ack ratio to prevent divide error · b9f47a3a

Committed by stephen hemminger on May 04, 2011

TCP Cubic keeps a metric that estimates the amount of delayed
acknowledgements to use in adjusting the window. If an abnormally
large number of packets are acknowledged at once, then the update
could wrap and reach zero. This kind of ACK could only
happen when there was a large window and a huge number of
ACKs were lost.

This patch limits the value of the delayed ack ratio. The choice of 32
is just a conservative value since normally it should be in the range of
1 to 4 packets.
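
The clamping idea can be sketched in userspace as follows. This is an illustrative model, not the kernel code: it assumes the ratio is kept as a fixed-point value scaled by 16 (a shift of 4) and updated as a moving average, and the names `ACK_RATIO_SHIFT`, `ACK_RATIO_LIMIT` and `update_delayed_ack` are chosen for the example. The update is done in a wider type so that a huge ACK count cannot wrap before the cap is applied.

```c
#include <assert.h>
#include <stdint.h>

#define ACK_RATIO_SHIFT 4                           /* assumed fixed-point shift */
#define ACK_RATIO_LIMIT (32u << ACK_RATIO_SHIFT)    /* cap: 32 packets per ACK */

/*
 * Moving-average update of the packets-per-ACK estimate, capped at
 * ACK_RATIO_LIMIT. A non-zero estimate stays non-zero, because the
 * decay term (estimate >> 4) is always smaller than the estimate.
 */
static uint32_t update_delayed_ack(uint32_t delayed_ack, uint32_t acked)
{
    uint64_t ratio = delayed_ack;

    assert(delayed_ack <= ACK_RATIO_LIMIT);

    ratio -= delayed_ack >> ACK_RATIO_SHIFT;    /* decay the old estimate by 1/16 */
    ratio += acked;                             /* add packets covered by this ACK */

    return ratio < ACK_RATIO_LIMIT ? (uint32_t)ratio : ACK_RATIO_LIMIT;
}
```

With these assumptions, an ACK covering an absurd number of packets simply pins the estimate at the limit instead of wrapping it around.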

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Use cork flow info instead of rt->rt_dst in tcp_v4_get_peer() · c5216cc7

Committed by David S. Miller on May 06, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit(). · ea4fc0d6

Committed by David S. Miller on May 06, 2011

Now we can pick it out of the provided flow key.

Signed-off-by: David S. Miller <davem@davemloft.net>

inet: Pass flowi to ->queue_xmit(). · d9d8da80

Committed by David S. Miller on May 06, 2011

This allows us to acquire the exact route keying information from the
protocol, however that might be managed.

It handles all of the possibilities, from the simplest case of storing
the key in inet->cork.fl to the more complex setup SCTP has where
individual transports determine the flow.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Use inet_csk_route_child_sock() in DCCP and TCP. · 0e734419

Committed by David S. Miller on May 08, 2011

Operation order is now transposed: we first create the child
socket, then we try to hook up the route.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Create inet_csk_route_child_sock(). · 77357a95

Committed by David S. Miller on May 08, 2011

This is just like inet_csk_route_req() except that it operates after
we've created the new child socket.

In this way we can use the new socket's cork flow for proper route
key storage.

This will be used by DCCP and TCP child socket creation handling.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Use cork flow in ip_queue_xmit() · b57ae01a

Committed by David S. Miller on May 06, 2011

All invokers of ip_queue_xmit() must make certain that the
socket is locked.  All of SCTP, TCP, DCCP, and L2TP now make
sure this is the case.

Therefore we can use the cork flow during output route lookup in
ip_queue_xmit() when the socket route check fails.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Use cork flow in inet_sk_{reselect_saddr,rebuild_header}() · 6e869138

Committed by David S. Miller on May 06, 2011

These two functions must be invoked only when the socket is locked
(because socket identity modifications are made non-atomically).

Therefore we can use the cork flow for output route lookups.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Lock socket and use cork flow in ip4_datagram_connect(). · 3038eeac

Committed by David S. Miller on May 06, 2011

This is to make sure that an l2tp socket's inet cork flow is
fully filled in, when it's encapsulated in UDP.

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Use cork flow in tcp_v4_connect() · da905bd1

Committed by David S. Miller on May 06, 2011

Since this is invoked from inet_stream_connect() the socket is locked
and therefore this usage is safe.

Signed-off-by: David S. Miller <davem@davemloft.net>

07 May, 2011 · 3 commits

ipv4: Initialize cork->opt using NULL not 0. · 70652728

Committed by David S. Miller on May 06, 2011

Noticed by Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Initialize on-stack cork more efficiently. · b80d7226

Committed by David S. Miller on May 06, 2011

ip_setup_cork() explicitly initializes every member of
inet_cork except flags, addr, and opt.  So we can simply
set those three members to zero instead of using a
memset() via an empty struct assignment.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

inet: Decrease overhead of on-stack inet_cork. · bdc712b4

Committed by David S. Miller on May 06, 2011

When we fast path datagram sends to avoid locking by putting
the inet_cork on the stack we use up lots of space that isn't
necessary.

This is because inet_cork contains a "struct flowi" which isn't
used in these code paths.

Split inet_cork into two parts, "inet_cork" and "inet_cork_full".
Only the latter has the "struct flowi" and is what is
stored in inet_sock.
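
The shape of such a split can be sketched with stand-in types. Everything below is hypothetical: the struct names, field lists and sizes are invented for illustration and do not reproduce the kernel's definitions. The point is only that the small struct omits the flow key, while the full struct embeds the small one as its first member and adds the key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "struct flowi": a bulky flow key. */
struct flow_key_sketch {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto, tos;
    uint8_t  other_keys[48];    /* placeholder for the remaining key fields */
};

/* Small part: what a lockless datagram send keeps on the stack. */
struct cork_sketch {
    unsigned int flags;
    uint32_t     addr;
    void        *opt;
    unsigned int fragsize;
    int          length;
};

/* Full part: the small cork plus the flow key, as stored in the socket. */
struct cork_full_sketch {
    struct cork_sketch     base;
    struct flow_key_sketch fl;
};

/* Code that only needs the small part can be handed the embedded base. */
static struct cork_sketch *cork_base(struct cork_full_sketch *full)
{
    assert(full != NULL);
    return &full->base;
}
```

In this model, a function that takes a `struct cork_sketch *` works equally on an on-stack small cork and on the base of a socket's full cork, and the on-stack variant saves at least the size of the flow key.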

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

06 May, 2011 · 1 commit

net: call dev_alloc_name from register_netdevice · 1c5cae81

Committed by Jiri Pirko on April 30, 2011

Force dev_alloc_name() to be called from register_netdevice() by
dev_get_valid_name(). That allows removing multiple explicit
dev_alloc_name() calls.

The possibility to call dev_alloc_name in advance remains.

This also fixes the veth creation regression caused by
84c49d8c.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 May, 2011 · 5 commits

net: ip_expire() must revalidate route · 64f3b9e2

Committed by Eric Dumazet on May 04, 2011

Commit 4a94445c (net: Use ip_route_input_noref() in input path)
added a bug in IP defragmentation handling, in case timeout is fired.

When a frame is defragmented, we use last skb dst field when building
final skb. Its dst is valid, since we are in rcu read section.

But if a timeout occurs, we take first queued fragment to build one ICMP
TIME EXCEEDED message. Problem is all queued skb have weak dst pointers,
since we escaped RCU critical section after their queueing. icmp_send()
might dereference a now freed (and possibly reused) part of memory.

Calling skb_dst_drop() and ip_route_input_noref() to revalidate route is
the only possible choice.

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Kill rt->rt_{src, dst} usage in IP GRE tunnels. · cbb1e85f

Committed by David S. Miller on May 04, 2011

First, make callers pass on-stack flowi4 to ip_route_output_gre()
so they can get at the fully resolved flow key.

Next, use that in ipgre_tunnel_xmit() to avoid the need to use
rt->rt_{dst,src}.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Pass explicit saddr/daddr args to ipmr_get_route(). · 9a1b9496

Committed by David S. Miller on May 04, 2011

This eliminates the need to use rt->rt_{src,dst}.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: In ip_build_and_send_pkt() use 'saddr' and 'daddr' args passed in. · dd927a26

Committed by David S. Miller on May 04, 2011

Instead of rt->rt_{dst,src}

The only tricky part is source route option handling.

If the source route option is enabled we can't just use plain 'daddr',
we have to use opt->opt.faddr.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Use flowi4->{daddr,saddr} in ipip_tunnel_xmit(). · 69458cb1

Committed by David S. Miller on May 04, 2011

Instead of rt->rt_{dst,src}

Signed-off-by: David S. Miller <davem@davemloft.net>

04 May, 2011 · 4 commits

ipv4: Use flowi4's {saddr,daddr} in igmpv3_newpack() and igmp_send_report() · 492f64ce

Committed by David S. Miller on May 03, 2011

Instead of rt->rt_{src,dst}

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Make caller provide on-stack flow key to ip_route_output_ports(). · 31e4543d

Committed by David S. Miller on May 03, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Rename struct rtable's rt_tos to rt_key_tos. · 475949d8

Committed by David S. Miller on May 03, 2011

To more accurately reflect that it is purely a routing
cache lookup key and is used in no other context.

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Rework ipmr_rt_fib_lookup() flow key initialization. · 417da66f

Committed by David S. Miller on May 03, 2011

Use information from the skb as much as possible, currently
this means daddr, saddr, and TOS.

Signed-off-by: David S. Miller <davem@davemloft.net>

03 May, 2011 · 2 commits

sysctl: net: call unregister_net_sysctl_table where needed · ff538818

Committed by Lucian Adrian Grijincu on May 01, 2011

ctl_table_headers registered with register_net_sysctl_table should
have been unregistered with the equivalent unregister_net_sysctl_table.

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Make sure flowi4->{saddr,daddr} are always set. · 56157872

Committed by David S. Miller on May 02, 2011

Slow path output route resolution always makes sure that
->{saddr,daddr} are set, and also if we trigger into IPSEC resolution
we initialize them as well, because xfrm_lookup() expects them to be
fully resolved.

But if we hit the fast path and flowi4->flowi4_proto is zero, we won't
do this initialization.

Therefore, move the IPSEC path initialization to the route cache
lookup fast path to make sure these are always set.

Signed-off-by: David S. Miller <davem@davemloft.net>

02 May, 2011 · 1 commit

ipv4: don't spam dmesg with "Using LC-trie" messages · 7cfd2609

Committed by Alexey Dobriyan on May 01, 2011

fib_trie_table() is called during netns creation and
Chromium uses clone(CLONE_NEWNET) to sandbox its renderer process.

Don't print anything.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>