Commits · e110861f86094cd78cc85593b873970092deb43a · openanolis / cloud-kernel

14 May, 2014 1 commit

net: add a sysctl to reflect the fwmark on replies · e110861f

Authored by Lorenzo Colitti on May 13, 2014

Kernel-originated IP packets that have no user socket associated
with them (e.g., ICMP errors and echo replies, TCP RSTs, etc.)
are emitted with a mark of zero. Add a sysctl to make them have
the same mark as the packet they are replying to.

This allows an administrator that wishes to do so to use
mark-based routing, firewalling, etc. for these replies by
marking the original packets inbound.
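The behaviour described above can be sketched in plain C. This is a minimal illustration under stated assumptions, not the kernel code: the names `reply_ctx`, `fwmark_reflect` and `reply_mark` are hypothetical stand-ins for the sysctl value and the helper that picks the mark.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-system (or per-namespace) setting. */
struct reply_ctx {
	int fwmark_reflect;	/* 0: replies carry mark 0; non-zero: reflect */
};

/* Return the mark to place on a kernel-originated reply packet. */
static uint32_t reply_mark(const struct reply_ctx *ctx, uint32_t incoming_mark)
{
	return ctx->fwmark_reflect ? incoming_mark : 0;
}
```

With the setting off, every reply keeps the historical mark of zero, so existing rule sets would see no change unless the administrator opts in.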

Tested using user-mode linux:
 - ICMP/ICMPv6 echo replies and errors.
 - TCP RST packets (IPv4 and IPv6).
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e110861f

13 May, 2014 1 commit

net: rename local_df to ignore_df · 60ff7467

Authored by WANG Cong on May 04, 2014

As suggested by several people, rename local_df to ignore_df,
since it means "ignore df bit if it is set".

Cc: Maciej Żenczykowski <maze@google.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60ff7467

08 May, 2014 1 commit

net: clean up snmp stats code · 698365fa

Authored by WANG Cong on May 05, 2014

commit 8f0ea0fe (snmp: reduce percpu needs by 50%)
reduced snmp array size to 1, so technically it doesn't have to be
an array any more. What's more, after the following commit:

	commit 933393f5
	Date:   Thu Dec 22 11:58:51 2011 -0600

	    percpu: Remove irqsafe_cpu_xxx variants

	    We simply say that regular this_cpu use must be safe regardless of
	    preemption and interrupt state.  That has no material change for x86
	    and s390 implementations of this_cpu operations.  However, arches that
	    do not provide their own implementation for this_cpu operations will
	    now get code generated that disables interrupts instead of preemption.

probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
almost 3 years, no one complains.

So, just convert the array to a single pointer and remove snmp_mib_init()
and snmp_mib_free() as well.

Cc: Christoph Lameter <cl@linux.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

698365fa

06 May, 2014 1 commit

net: Call skb_checksum_init in IPv4 · ed70fcfc

Authored by Tom Herbert on May 02, 2014

Call skb_checksum_init instead of private functions.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ed70fcfc

16 Apr, 2014 2 commits

ipv4: add a sock pointer to dst->output() path. · aad88724

Authored by Eric Dumazet on Apr 15, 2014

In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.

The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.

Fixes: 8f646c92 ("vxlan: keep original skb ownership")
Reported-by: lucien xin <lucien.xin@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aad88724

ipv4: add a sock pointer to ip_queue_xmit() · b0270e91

Authored by Eric Dumazet on Apr 15, 2014

ip_queue_xmit() assumes the skb it has to transmit is attached to an
inet socket. Commit 31c70d59 ("l2tp: keep original skb ownership")
changed l2tp to not change skb ownership and thus broke this assumption.

One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(),
so that we do not assume skb->sk points to the socket used by l2tp
tunnel.

Fixes: 31c70d59 ("l2tp: keep original skb ownership")
Reported-by: Zhan Jianyu <nasa4836@gmail.com>
Tested-by: Zhan Jianyu <nasa4836@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b0270e91

07 Mar, 2014 1 commit

tcp: Use NET_ADD_STATS instead of NET_ADD_STATS_BH in tcp_event_new_data_sent() · f7324acd

Authored by David S. Miller on Mar 06, 2014

Can be invoked from non-BH context.

Based upon a patch by Eric Dumazet.

Fixes: f19c29e3 ("tcp: snmp stats for Fast Open, SYN rtx, and data pkts")
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7324acd

27 Feb, 2014 1 commit

ipv4: yet another new IP_MTU_DISCOVER option IP_PMTUDISC_OMIT · 1b346576

Authored by Hannes Frederic Sowa on Feb 26, 2014

IP_PMTUDISC_INTERFACE has a design error: because it does not allow the
generation of fragments if the interface mtu is exceeded, it is very
hard to make use of this option in already deployed name server software
for which I introduced this option.

This patch adds yet another new IP_MTU_DISCOVER option that does not honor any
path MTU information and does not accept new ICMP notifications destined for
the socket this option is enabled on. But we allow outgoing fragmentation
in case the packet size exceeds the outgoing interface MTU.

As such this new option can be used as a drop-in replacement for
IP_PMTUDISC_DONT, which is currently in use by most name server software
making the adoption of this option very smooth and easy.

The original advantage of IP_PMTUDISC_INTERFACE is still maintained:
ignoring incoming path MTU updates and not honoring discovered path MTUs
in the output path.
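From user space, selecting this mode on a socket would look roughly like the sketch below. The helper name `set_pmtudisc_omit` is hypothetical, and the fallback definition is only an assumption for C library headers that predate the option (5 is the value in the Linux UAPI headers).

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Fallback for older libc headers that do not know the option yet. */
#ifndef IP_PMTUDISC_OMIT
#define IP_PMTUDISC_OMIT 5
#endif

/* Returns 0 on success, -1 on error (errno is set by setsockopt()). */
static int set_pmtudisc_omit(int fd)
{
	int val = IP_PMTUDISC_OMIT;

	return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val));
}
```

A name server that currently passes IP_PMTUDISC_DONT in the same call would only need to change the value, which is what makes the option a drop-in replacement.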

Fixes: 482fc609 ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE")
Cc: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b346576

20 Feb, 2014 1 commit

ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg · c8e6ad08

Authored by Hannes Frederic Sowa on Feb 18, 2014

In case we decide in udp6_sendmsg to send the packet down the IPv4
udp_sendmsg path because the destination is either of family AF_INET or
an IPv4-mapped IPv6 address, we don't honor an IPv4-mapped IPv6 address
that may be specified in IPV6_PKTINFO.

We can simply check for this option in ip_cmsg_send because no calls to
ipv6 module functions are needed to do so.
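The reason no IPv6 module code is needed is that recognising an IPv4-mapped address is a simple byte comparison. A user-space sketch of that check follows; the function name `v4mapped_to_v4` is hypothetical and this is not the code added to ip_cmsg_send.

```c
#include <netinet/in.h>
#include <string.h>

/*
 * If addr is an IPv4-mapped IPv6 address (::ffff:a.b.c.d), copy the embedded
 * IPv4 address into *out and return 1; otherwise return 0.
 */
static int v4mapped_to_v4(const struct in6_addr *addr, struct in_addr *out)
{
	if (!IN6_IS_ADDR_V4MAPPED(addr))
		return 0;

	/* The IPv4 address occupies the last four bytes, in network order. */
	memcpy(&out->s_addr, &addr->s6_addr[12], sizeof(out->s_addr));
	return 1;
}
```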
Reported-by: Gert Doering <gert@space.net>
Cc: Tore Anderson <tore@fud.no>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8e6ad08

20 Jan, 2014 1 commit

ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams · 4b261c75

Authored by Hannes Frederic Sowa on Jan 20, 2014

We currently don't report IPV6_RECVPKTINFO in cmsg access ancillary data
for IPv4 datagrams on IPv6 sockets.

This patch splits the ip6_datagram_recv_ctl into two functions, one
which handles both protocol families, AF_INET and AF_INET6, while the
ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.

ip6_datagram_recv_*_ctl never reported back any errors, so we can make
them return void. Also provide a helper for protocols which don't offer dual
personality to further use ip6_datagram_recv_ctl, which is exported to
modules.

I needed to shuffle the code for ping around a bit to make it easier to
implement dual personality for ping ipv6 sockets in future.
Reported-by: Gert Doering <gert@space.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b261c75

15 Jan, 2014 1 commit

ipv4: register igmp_notifier even when !CONFIG_PROC_FS · 72c1d3bd

Authored by WANG Cong on Jan 10, 2014

We still need this notifier even when PROC_FS is not
configured.

It should be rare to have a kernel without PROC_FS,
so this is just for completeness.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72c1d3bd

14 Jan, 2014 1 commit

ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing · f87c10a8

Authored by Hannes Frederic Sowa on Jan 09, 2014

While forwarding we should not use the protocol path mtu to calculate
the mtu for a forwarded packet but instead use the interface mtu.

We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was
introduced for multicast forwarding. But as it does not conflict with
our usage in unicast code path it is perfect for reuse.

I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu
along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular
dependencies because of IPSKB_FORWARDED.

Because someone might have written software that probes
destinations manually and expects the kernel to honour those path MTUs,
I introduced a new per-namespace "ip_forward_use_pmtu" knob so that this
new behaviour can be disabled. We also still use MTUs that are locked on a
route for forwarding.
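The decision described so far can be summarised in a small sketch. It is a simplified illustration, not the kernel helper: the struct `mtu_inputs`, its field names and the function `mtu_maybe_forward` are hypothetical, and the real code reads these values from the route, the device and the namespace.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the inputs to the decision. */
struct mtu_inputs {
	unsigned int route_mtu;		/* MTU recorded on the route (path MTU or locked value) */
	unsigned int dev_mtu;		/* MTU of the outgoing interface */
	bool mtu_locked;		/* the MTU is locked on the route */
	bool forward_use_pmtu;		/* the "ip_forward_use_pmtu" knob */
};

static unsigned int mtu_maybe_forward(const struct mtu_inputs *in, bool forwarding)
{
	/* Local traffic, the opt-out knob and locked route MTUs keep the route MTU. */
	if (!forwarding || in->forward_use_pmtu || in->mtu_locked)
		return in->route_mtu;

	/* Forwarded packets ignore learned path MTUs and use the interface MTU. */
	return in->dev_mtu;
}
```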

The reason for this change is that path MTU information can be injected
into the kernel via e.g. the icmp_err protocol handler without verification
of local sockets. As such, this could cause the IPv4 forwarding path to
wrongfully emit fragmentation needed notifications or start to fragment
packets along a path.

Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED
won't be set and further fragmentation logic will use the path mtu to
determine the fragmentation size. They also recheck packet size with
help of path mtu discovery and report appropriate errors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f87c10a8

19 Dec, 2013 1 commit

inet: make no_pmtu_disc per namespace and kill ipv4_config · 974eda11

Authored by Hannes Frederic Sowa on Dec 14, 2013

The other field in ipv4_config, log_martians, was converted to a
per-interface setting, so we can just remove the whole structure.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

974eda11

24 Nov, 2013 1 commit

inet: fix addr_len/msg->msg_namelen assignment in recv_error and rxpmtu functions · 85fbaa75

Authored by Hannes Frederic Sowa on Nov 23, 2013

Commit bceaa902 ("inet: prevent leakage
of uninitialized memory to user in recv syscalls") conditionally updated
addr_len if the msg_name is written to. The recv_error and rxpmtu
functions relied on the recvmsg functions to set up addr_len before.

As this does not happen any more we have to pass addr_len to those
functions as well and set it to the size of the corresponding sockaddr
length.

This broke traceroute and such.

Fixes: bceaa902 ("inet: prevent leakage of uninitialized memory to user in recv syscalls")
Reported-by: Brad Spengler <spender@grsecurity.net>
Reported-by: Tom Labanowski
Cc: mpb <mpb.mail@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

85fbaa75

09 Oct, 2013 2 commits

ipv6: make lookups simpler and faster · efe4208f

Authored by Eric Dumazet on Oct 03, 2013

TCP listener refactoring, part 4 :

To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common.

Now it is time to do the same for IPv6, because it permits us to have fast
lookups for all kinds of sockets, including the upcoming SYN_RECV.

Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).

inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6.

This patch is way bigger than its IPv4 counterpart, because for IPv4
we could add aliases (inet_daddr, inet_rcv_saddr), while for IPv6
it's not easily doable.

inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

Timewait sockets also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.

We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

efe4208f

net: ipv4 only populate IP_PKTINFO when needed · fbf8866d

Authored by Shawn Bohrer on Oct 07, 2013

Since the removal of the routing cache, computing
fib_compute_spec_dst() does a fib_table lookup for each UDP multicast
packet received.  This has introduced a performance regression for some
UDP workloads.

This change skips populating the packet info for sockets that do not have
IP_PKTINFO set.

Benchmark results from a netperf UDP_RR test:
Before 89789.68 transactions/s
After  90587.62 transactions/s

Benchmark results from a fio 1 byte UDP multicast pingpong test
(Multicast one way unicast response):
Before 12.63us RTT
After  12.48us RTT
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fbf8866d

01 Oct, 2013 1 commit

net ipv4: Convert ipv4.ip_local_port_range to be per netns v3 · 0bbf87d8

Authored by Eric W. Biederman on Sep 28, 2013

- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
  of the callers.
- Move the initialization of sysctl_local_ports into
   sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c

v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to today's net-next

v3:
- Compile fixes of strange callers of inet_get_local_port_range.
  This patch now successfully passes an allmodconfig build.
  Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range
Originally-by: Samya <samya@twitter.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0bbf87d8

29 Sep, 2013 2 commits

ipv4: processing ancillary IP_TOS or IP_TTL · aa661581

Authored by Francesco Fusco on Sep 24, 2013

If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
packets with the specified TTL or TOS overriding the socket values specified
with the traditional setsockopt().

The struct inet_cork stores the values of TOS, TTL and priority that are
passed through the struct ipcm_cookie. If there are user-specified TOS
(tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
used to override the per-socket values. In case of TOS also the priority
is changed accordingly.

Two helper functions get_rttos and get_rtconn_flags are defined to take
into account the presence of a user specified TOS value when computing
RT_TOS and RT_CONN_FLAGS.
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

aa661581

ipv4: IP_TOS and IP_TTL can be specified as ancillary data · f02db315

Authored by Francesco Fusco on Sep 24, 2013

This patch enables the IP_TTL and IP_TOS values passed from userspace to
be stored in the ipcm_cookie struct. Three fields are added to the struct:

- the TTL, expressed as __u8.
  The allowed values are in the range [1,255].
  A value of 0 means that the TTL is not specified.

- the TOS, expressed as __s16.
  The allowed values are in the range [0,255].
  A value of -1 means that the TOS is not specified.

- the priority, expressed as a char and computed when
  handling the ancillary data.
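The sentinel values listed above (0 for an unset TTL, -1 for an unset TOS) can be illustrated with a short sketch. The struct `cookie_sketch` and the two helpers are hypothetical names used only for illustration; they mirror the described fields and the override rule from the companion commit (tos != -1, ttl != 0), not the actual ipcm_cookie layout.

```c
#include <stdint.h>

/* Hypothetical struct mirroring the three fields described above. */
struct cookie_sketch {
	uint8_t ttl;	/* 1..255, 0 means "not specified" */
	int16_t tos;	/* 0..255, -1 means "not specified" */
	char priority;	/* computed while handling the ancillary data */
};

/* Use the per-message TTL if one was given, else the socket's TTL. */
static uint8_t effective_ttl(const struct cookie_sketch *c, uint8_t sock_ttl)
{
	return c->ttl != 0 ? c->ttl : sock_ttl;
}

/* Use the per-message TOS if one was given, else the socket's TOS. */
static uint8_t effective_tos(const struct cookie_sketch *c, uint8_t sock_tos)
{
	return c->tos != -1 ? (uint8_t)c->tos : sock_tos;
}
```

A signed 16-bit TOS is what allows -1 to act as "unset" while still covering the full 0..255 range, whereas TTL can reuse 0 because 0 is not an allowed value.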
Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f02db315

22 Sep, 2013 1 commit

ip*.h: Remove extern from function prototypes · 5c3a0fd7

Authored by Joe Perches on Sep 21, 2013

There is a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c3a0fd7

20 Sep, 2013 1 commit

ip: generate unique IP identificator if local fragmentation is allowed · 703133de

Authored by Ansis Atteka on Sep 18, 2013

If local fragmentation is allowed, then ip_select_ident() and
ip_select_ident_more() need to generate unique IDs to ensure
correct defragmentation on the peer.

For example, if IPsec (tunnel mode) has to encrypt large skbs
that have the local_df bit set, then all IP fragments that belonged
to different ESP datagrams would have used the same identifier.
If one of these IP fragments got lost or reordered, then the
peer could possibly stitch together wrong IP fragments that did
not belong to the same datagram. This would lead to packet loss
or data corruption.
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

703133de

03 Sep, 2013 1 commit

net: make snmp_mib_free static inline · 5a17a390

Authored by Cong Wang on Sep 02, 2013

Fengguang reported:

   net/built-in.o: In function `in6_dev_finish_destroy':
   (.text+0x4ca7d): undefined reference to `snmp_mib_free'

This is because snmp_mib_free() is defined only when CONFIG_INET is enabled,
but in6_dev_finish_destroy() has now moved to the core kernel.

I think snmp_mib_free() is small enough to be inlined, so just make it
static inline.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a17a390

22 Jan, 2013 1 commit

ipv4: Add a socket release callback for datagram sockets · 8141ed9f

Authored by Steffen Klassert on Jan 21, 2013

This implements a socket release callback function to check
whether the socket's cached route became invalid while
we owned the socket. The function is used by UDP, raw
and ping sockets.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8141ed9f

27 Aug, 2012 1 commit

ipv4: fix path MTU discovery with connection tracking · 5f2d04f1

Authored by Patrick McHardy on Aug 26, 2012

IPv4 conntrack defragments incoming packets at the PRE_ROUTING hook and
(in the case of forwarded packets) refragments them at POST_ROUTING
independent of the IP_DF flag. Refragmentation uses the dst_mtu() of
the local route without caring about the original fragment sizes,
thereby breaking PMTUD.

This patch fixes this by keeping track of the largest received fragment
with IP_DF set and generates an ICMP fragmentation required error during
refragmentation if that size exceeds the MTU.
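The bookkeeping this fix relies on can be sketched as follows. The struct `defrag_state` and both function names are hypothetical; the sketch only shows the "remember the largest DF fragment, compare against the MTU" idea, not the kernel data structures.

```c
#include <stdbool.h>

/* Hypothetical per-datagram state kept while fragments are reassembled. */
struct defrag_state {
	unsigned int max_df_frag_size;	/* largest fragment seen with DF set */
};

/* Record one received fragment of the datagram. */
static void note_fragment(struct defrag_state *st, unsigned int len, bool df_set)
{
	if (df_set && len > st->max_df_frag_size)
		st->max_df_frag_size = len;
}

/* True if refragmentation should report "fragmentation required" instead. */
static bool needs_frag_required_error(const struct defrag_state *st,
				      unsigned int mtu)
{
	return st->max_df_frag_size > mtu;
}
```

Fragments without DF never raise the recorded size, so datagrams whose sender permitted fragmentation would still be refragmented as before.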
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>

5f2d04f1

11 Aug, 2012 1 commit

ipv4: fix ip_send_skb() · b5ec8eea

Authored by Eric Dumazet on Aug 10, 2012

ip_send_skb() can send orphaned skb, so we must pass the net pointer to
avoid possible NULL dereference in error path.

Bug added by commit 3a7c384f (ipv4: tcp: unicast_sock should not
land outside of TCP stack)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5ec8eea

20 Jul, 2012 1 commit

ipv4: tcp: remove per net tcp_sock · be9f4a44

Authored by Eric Dumazet on Jul 19, 2012

tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
per network namespace.

This leads to bad behavior on multiqueue NICs, because many CPUs
contend for the socket lock and, once the socket lock is acquired, extra
false sharing on various socket fields slows down the operations.

To better resist attacks, we use a percpu socket. Each CPU can
run without contention, using appropriate memory (local node).

Additional features :

1) We also mirror the queue_mapping of the incoming skb, so that
answers use the same queue if possible.

2) Setting the SOCK_USE_WRITE_QUEUE socket flag speeds up sock_wfree()

3) We now limit the number of in-flight RST/ACK [1] packets
per cpu, instead of per namespace, and we honor the sysctl_wmem_default
limit dynamically. (Prior to this patch, sysctl_wmem_default value was
copied at boot time, so any further change would not affect tcp_sock
limit)

[1] These packets are only generated when no socket was matched for
the incoming packet.
Reported-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be9f4a44

28 Jun, 2012 1 commit

ipv4: Show that ip_send_reply() is purely unicast routine. · 70e73416

Authored by David S. Miller on Jun 28, 2012

Rename it to ip_send_unicast_reply() and add explicit 'saddr'
argument.

This removed one of the few users of rt->rt_spec_dst.
Signed-off-by: David S. Miller <davem@davemloft.net>

70e73416

23 Jun, 2012 1 commit

ipv4: Add sysctl knob to control early socket demux · 6648bd7e

Authored by Alexander Duyck on Jun 21, 2012

This change is meant to add a control for disabling early socket demux.
The main motivation behind this patch is to provide an option to disable
the feature as it adds an additional cost to routing that reduces overall
throughput by up to 5%.  For example one of my systems went from 12.1Mpps
to 11.6 after the early socket demux was added.  It looks like the reason
for the regression is that we are now having to perform two lookups, first
the one for an established socket, and then the one for the routing table.

By adding this patch and toggling the value for ip_early_demux to 0 I am
able to get back to the 12.1Mpps I was previously seeing.

[ Move local variables in ip_rcv_finish() down into the basic
  block in which they are actually used.  -DaveM ]
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6648bd7e

16 May, 2012 1 commit

net: delete all instances of special processing for token ring · 211ed865

Authored by Paul Gortmaker on May 10, 2012

We are going to delete the Token ring support.  This removes any
special processing in the core networking for token ring, (aside
from net/tr.c itself), leaving the drivers and remaining tokenring
support present but inert.

The mass removal of the drivers and net/tr.c will be in a separate
commit, so that the history of these files that we still care
about won't have the giant deletion tied into their history.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

211ed865

21 Apr, 2012 1 commit

net: Delete all remaining instances of ctl_path · a5347fe3

Authored by Eric W. Biederman on Apr 19, 2012

We don't use struct ctl_path anymore so delete the exported constants.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a5347fe3

10 Mar, 2012 1 commit

ipv4: Make ip_call_ra_chain() return bool. · ba57b4db

Authored by David S. Miller on Mar 07, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

ba57b4db

12 Dec, 2011 1 commit

net: use IS_ENABLED(CONFIG_IPV6) · dfd56b8b

Authored by Eric Dumazet on Dec 10, 2011

Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dfd56b8b

10 Nov, 2011 1 commit

ipv4: PKTINFO doesnt need dst reference · d826eb14

Authored by Eric Dumazet on Nov 09, 2011

On Monday, 7 November 2011 at 15:33 +0100, Eric Dumazet wrote:

> At least, in recent kernels we don't change dst->refcnt in the forwarding
> path (using NOREF skb->dst)
>
> One particular point is the atomic_inc(dst->refcnt) we have to perform
> when queuing a UDP packet if the socket asked for PKTINFO stuff (for example a
> typical DNS server has to set up this option)
>
> I have one patch somewhere that stores the information in skb->cb[] and
> avoids the atomic_{inc|dec}(dst->refcnt).
>

OK I found it, I did some extra tests and believe it's ready.

[PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference

When a socket uses IP_PKTINFO notifications, we currently force a dst
reference for each received skb. Reader has to access dst to get needed
information (rt_iif & rt_spec_dst) and must release dst reference.

We also forced a dst reference if skb was put in socket backlog, even
without IP_PKTINFO handling. This happens under stress/load.

We can instead store the needed information in skb->cb[], so that only
the softirq handler really accesses dst, improving cache hit ratios.

This removes two atomic operations per packet, and false sharing as
well.

On a benchmark using a single-threaded receiver (doing only recvmsg()
calls), I can reach 720,000 pps instead of 570,000 pps.

IP_PKTINFO is typically used by DNS servers, and any multihomed aware
UDP application.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d826eb14

24 Oct, 2011 1 commit

ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT · 66b13d99

Authored by Eric Dumazet on Oct 24, 2011

There is a long-standing bug in the Linux TCP stack concerning ACK messages
sent on behalf of TIME_WAIT sockets.

In the IP header of the ACK message, we choose to reflect the TOS field of
the incoming message, and this might break some setups.

Examples of things that were broken:
  - Routing using TOS as a selector
  - Firewalls
  - Traffic classification / shaping

We now remember in timewait structure the inet tos field and use it in
ACK generation, and route lookup.

Notes :
 - We still reflect incoming TOS in RST messages.
 - We could extend MuraliRaja Muniraju patch to report TOS value in
netlink messages for TIME_WAIT sockets.
 - A patch is needed for IPv6
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

66b13d99

19 Oct, 2011 1 commit

macvlan: handle fragmented multicast frames · bc416d97

Authored by Eric Dumazet on Oct 06, 2011

Fragmented multicast frames are delivered to a single macvlan port,
because the IP defrag logic considers the other samples redundant.

Implement a defrag step before trying to send the multicast frame.
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bc416d97

06 Jul, 2011 1 commit

ipv4: Add ip_defrag() agent IP_DEFRAG_AF_PACKET. · 595fc71b

Authored by David S. Miller on Jul 05, 2011

Elide the ICMP on frag queue timeouts unconditionally for
this user.
Signed-off-by: David S. Miller <davem@davemloft.net>

595fc71b

24 Jun, 2011 1 commit

net: Fix build failures due to ip_is_fragment() · d18cd551

Authored by David S. Miller on Jun 23, 2011

It needs to be available even when CONFIG_INET is not set.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d18cd551

22 Jun, 2011 1 commit

ip: introduce ip_is_fragment helper inline function · 56f8a75c

Authored by Paul Gortmaker on Jun 21, 2011

There are enough instances of this:

    iph->frag_off & htons(IP_MF | IP_OFFSET)

that a helper function is probably warranted.
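A user-space sketch of such a helper is shown below. The struct `iphdr_sketch` and the function name `is_fragment_sketch` are hypothetical stand-ins so that the example compiles outside the kernel; the constants are the standard IPv4 "more fragments" flag and fragment-offset mask.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_MF     0x2000	/* "more fragments" flag */
#define IP_OFFSET 0x1FFF	/* fragment offset mask */

/* Minimal stand-in for the IPv4 header: only the field the test needs. */
struct iphdr_sketch {
	uint16_t frag_off;	/* flags + fragment offset, network byte order */
};

/* A packet is a fragment if MF is set or its fragment offset is non-zero. */
static bool is_fragment_sketch(const struct iphdr_sketch *iph)
{
	return (iph->frag_off & htons(IP_MF | IP_OFFSET)) != 0;
}
```

Note that the DF bit is deliberately outside the mask, so an unfragmented packet with DF set is not reported as a fragment.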
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

56f8a75c

09 Jun, 2011 1 commit

inetpeer: remove unused list · 4b9d9be8

Authored by Eric Dumazet on Jun 08, 2011

Andi Kleen and Tim Chen reported huge contention on inetpeer
unused_peers.lock, on memcached workload on a 40 core machine, with
disabled route cache.

It appears we constantly flip peers refcnt between 0 and 1 values, and
we must insert/remove peers from unused_peers.list, holding a contended
spinlock.

Remove this list completely and perform a garbage collection on-the-fly,
at lookup time, using the expired nodes we met during the tree
traversal.

This removes a lot of code, makes locking more standard, and obsoletes
two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
removes two pointers in inet_peer structure.

There is still a false sharing effect because refcnt is in the first cache
line of the object [where the links and keys used by lookups are located]; we
might move it to the end of the inet_peer structure so that this first cache
line is mostly read by CPUs.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b9d9be8

11 May, 2011 1 commit

ipv4: Pass explicit daddr arg to ip_send_reply(). · 0a5ebb80

Authored by David S. Miller on May 09, 2011

This eliminates an access to rt->rt_src.
Signed-off-by: David S. Miller <davem@davemloft.net>

0a5ebb80

openanolis / cloud-kernel: synced successfully about 1 year ago
