Commits · 43480aecb1f538d4f6dd8b2c5d2b71fb98659072 · openeuler / raspberrypi-kernel

Dec 10, 2011 · 1 commit

udp: Export code sk lookup routines · fce82338

Committed by Pavel Emelyanov on Dec 09, 2011

The UDP diag get_exact handler will require them to find a
socket by the provided net, source/destination addresses,
source/destination ports and device.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Dec 02, 2011 · 1 commit

Revert "udp: remove redundant variable" · 59c2cdae

Committed by David S. Miller on Dec 01, 2011

This reverts commit 81d54ec8.

If we take the "try_again" goto due to a checksum error,
'len' has already been truncated, so we won't compute
the same values as the original code did.
Reported-by: paul bilke <fsmail@conspiracy.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
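
A simplified model of the retry path makes the problem concrete. The names (`struct fake_dgram`, `copied_for()`, the `buggy` switch) are hypothetical, and the real udp_recvmsg() does much more. The point is only that clamping the caller's `len` in place carries the first packet's size into the retry, while clamping a separate `copied` variable does not.

```c
#include <stddef.h>

/* Hypothetical model of a datagram waiting in the receive queue. */
struct fake_dgram {
	size_t ulen;	/* UDP payload length */
	int csum_ok;	/* 0 = checksum error, packet is dropped */
};

/*
 * Returns how many bytes would be copied to the user for the first
 * packet with a valid checksum. With buggy != 0 the caller's len is
 * truncated in place (the reverted change); otherwise a separate
 * "copied" variable is clamped (the original code).
 */
size_t copied_for(const struct fake_dgram *q, size_t n, size_t len, int buggy)
{
	size_t i = 0;
	size_t copied;

try_again:
	if (i >= n)
		return 0;			/* queue empty */
	if (buggy) {
		if (len > q[i].ulen)
			len = q[i].ulen;	/* clobbers the caller's buffer size */
		copied = len;
	} else {
		copied = len;
		if (copied > q[i].ulen)
			copied = q[i].ulen;	/* len stays intact for a retry */
	}
	if (!q[i].csum_ok) {
		i++;				/* drop the bad packet, try the next */
		goto try_again;
	}
	return copied;
}
```

With a 10-byte packet that fails its checksum followed by a valid 100-byte packet and a 1000-byte user buffer, the buggy variant reports 10 bytes for the second packet; the original reports 100.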

Nov 17, 2011 · 1 commit

net: introduce and use netdev_features_t for device features sets · c8f44aff

Committed by Michał Mirosław on Nov 15, 2011

v2:	add a couple of missing conversions in drivers
	split unexporting netdev_fix_features()
	implemented %pNF
	convert sock::sk_route_(no?)caps
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
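
A dedicated typedef lets the width of the feature set change in one place instead of at every use. The sketch below assumes a 64-bit underlying type (an assumption here; check include/linux/netdev_features.h for the real definition). The feature bit numbers and `fix_features()` are hypothetical.

```c
#include <stdint.h>

/* Assumed 64-bit feature set; every user goes through the typedef. */
typedef uint64_t netdev_features_t;

/* Build a mask from a bit number without int-width overflow. */
#define FEATURE_MASK(bit) ((netdev_features_t)1 << (bit))

/* Hypothetical feature bit numbers, for illustration only. */
enum {
	FEAT_SG_BIT = 0,
	FEAT_HW_CSUM_BIT = 3,
	FEAT_FUTURE_BIT = 40,	/* would not fit in a 32-bit mask */
};

/* Keep only the requested features that the hardware supports. */
netdev_features_t fix_features(netdev_features_t wanted, netdev_features_t hw)
{
	return wanted & hw;
}
```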

Nov 10, 2011 · 1 commit

ipv4: PKTINFO doesnt need dst reference · d826eb14

Committed by Eric Dumazet on Nov 09, 2011

On Monday, 07 November 2011 at 15:33 +0100, Eric Dumazet wrote:

> At least, in recent kernels we don't change dst->refcnt in the forwarding
> path (using a NOREF skb->dst)
>
> One particular point is the atomic_inc(dst->refcnt) we have to perform
> when queuing a UDP packet if the socket asked for PKTINFO (for example, a
> typical DNS server has to set up this option)
>
> I have a patch somewhere that stores the information in skb->cb[] and
> avoids the atomic_{inc|dec}(dst->refcnt).
>

OK, I found it; I did some extra tests and believe it's ready.

[PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference

When a socket uses IP_PKTINFO notifications, we currently force a dst
reference for each received skb. The reader has to access the dst to get
the needed information (rt_iif & rt_spec_dst) and must then release the
dst reference.

We also forced a dst reference if the skb was put in the socket backlog,
even without IP_PKTINFO handling. This happens under stress/load.

We can instead store the needed information in skb->cb[], so that only
the softirq handler actually accesses the dst, improving cache hit ratios.

This removes two atomic operations per packet, and false sharing as
well.

On a benchmark using a single-threaded receiver (doing only recvmsg()
calls), I can reach 720,000 pps instead of 570,000 pps.

IP_PKTINFO is typically used by DNS servers and any multihoming-aware
UDP application.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
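
The cb[] technique can be sketched in userspace as follows. `struct fake_skb`, `struct pktinfo_cb`, `stash_pktinfo()` and `read_pktinfo()` are hypothetical stand-ins; the field names only mirror `struct in_pktinfo`, and the 48-byte size of cb[] is an assumption to check against the kernel version at hand. The softirq side copies the two values while the route is still hot in cache. The recvmsg() side then reads them back without touching the dst or its refcount. memcpy() is used instead of a pointer cast to keep the userspace version free of aliasing issues.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sk_buff stand-in: only the per-packet control block. */
struct fake_skb {
	char cb[48];
};

/* Hypothetical per-packet info; field names mirror struct in_pktinfo. */
struct pktinfo_cb {
	int ipi_ifindex;	/* incoming interface (from rt_iif) */
	uint32_t ipi_spec_dst;	/* local address (from rt_spec_dst) */
};

_Static_assert(sizeof(struct pktinfo_cb) <= sizeof(((struct fake_skb *)0)->cb),
	       "pktinfo must fit in cb[]");

/* Softirq side: copy the values out of the route once. */
void stash_pktinfo(struct fake_skb *skb, int iif, uint32_t spec_dst)
{
	struct pktinfo_cb info = { iif, spec_dst };

	memcpy(skb->cb, &info, sizeof(info));
}

/* recvmsg() side: no dst access and no refcount traffic needed. */
struct pktinfo_cb read_pktinfo(const struct fake_skb *skb)
{
	struct pktinfo_cb info;

	memcpy(&info, skb->cb, sizeof(info));
	return info;
}
```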

Nov 02, 2011 · 2 commits

udp: fix a race in encap_rcv handling · 0ad92ad0

Committed by Eric Dumazet on Nov 01, 2011

udp_queue_rcv_skb() has a possible race in encap_rcv handling, since
this pointer can be changed at any time.

We should use ACCESS_ONCE() to close the race.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
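
The race sits between testing the pointer and calling it. If the compiler reloads the field between the two, another CPU may have cleared it in the meantime. Below is a userspace sketch. `struct fake_sock`, `queue_rcv()` and `consume_all()` are hypothetical. The `ACCESS_ONCE()` definition is the volatile-cast form the kernel used at the time; later kernels replaced it with `READ_ONCE()`. It relies on the GNU C `__typeof__` extension.

```c
#include <stddef.h>

/* Force exactly one load of x (GNU C __typeof__ extension). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct fake_sock;
typedef int (*encap_rcv_t)(struct fake_sock *sk, int pkt);

/* Hypothetical socket: encap_rcv may be changed by another thread. */
struct fake_sock {
	encap_rcv_t encap_rcv;
};

/* Example handler that consumes every packet (returns 0). */
int consume_all(struct fake_sock *sk, int pkt)
{
	(void)sk;
	(void)pkt;
	return 0;
}

/* Returns the handler's result if it consumed the packet, else pkt. */
int queue_rcv(struct fake_sock *sk, int pkt)
{
	/* Load the pointer once: the value tested is the value called. */
	encap_rcv_t encap_rcv = ACCESS_ONCE(sk->encap_rcv);

	if (encap_rcv != NULL) {
		int ret = encap_rcv(sk, pkt);

		if (ret <= 0)
			return ret;	/* consumed by the encapsulation layer */
	}
	return pkt;			/* normal UDP delivery */
}
```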

net: make the tcp and udp file_operations for the /proc stuff const · 73cb88ec

Committed by Arjan van de Ven on Oct 30, 2011

The tcp and udp code creates a set of struct file_operations at runtime,
while it can also be done at compile time, with the added benefit of then
having these file operations be const.

The trickiest part was to get the "THIS_MODULE" reference right; the naive
method of declaring a struct at the place of registration would not work
for this reason.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Aug 18, 2011 · 1 commit

rps: Add flag to skb to indicate rxhash is based on L4 tuple · bdeab991

Committed by Tom Herbert on Aug 14, 2011

The l4_rxhash flag was added to the skb structure to indicate
that the rxhash value was computed over the 4-tuple for the
packet, which includes the port information in the encapsulated
transport packet.  This is used by the stack to preserve the
rxhash value in __skb_rx_tunnel.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Aug 12, 2011 · 1 commit

net: cleanup some rcu_dereference_raw · 33d480ce

Committed by Eric Dumazet on Aug 11, 2011

The RCU API has been completed, and rcu_access_pointer() or
rcu_dereference_protected() are better than the generic
rcu_dereference_raw().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jul 07, 2011 · 1 commit

net: refine {udp|tcp|sctp}_mem limits · f03d78db

Committed by Eric Dumazet on Jul 07, 2011

Current tcp/udp/sctp global memory limits do not take hugepage
allocations into account, and allow 50% of RAM to be used by buffers
of a single protocol [not counting space used by sockets / inodes ...].

Let's use nr_free_buffer_pages() and allow a default of 1/8 of kernel RAM
per protocol, with a minimum of 128 pages.
Sysadmins of heavy-duty machines probably need to tweak the limits anyway.

References: https://bugzilla.stlinux.com/show_bug.cgi?id=38032
Reported-by: starlight <starlight@binnacle.cx>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
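
The default described above is simple arithmetic. In this sketch the page count is passed in as a parameter standing in for `nr_free_buffer_pages()`, and `default_proto_mem_limit()` is a hypothetical name. The sketch does not show how the kernel then derives the individual min/pressure/max sysctl values from this limit.

```c
/* Default per-protocol limit in pages: 1/8 of buffer pages, floor of 128. */
unsigned long default_proto_mem_limit(unsigned long free_buffer_pages)
{
	unsigned long limit = free_buffer_pages / 8;

	if (limit < 128)
		limit = 128;
	return limit;
}
```

For example, 1,048,576 free buffer pages (4 GiB with 4 KiB pages) give a limit of 131,072 pages, i.e. 512 MiB per protocol. A tiny machine with 1,000 pages would get 125 and is clamped up to the 128-page floor.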

Jun 22, 2011 · 2 commits

udp/recvmsg: Clear MSG_TRUNC flag when starting over for a new packet · 9cfaa8de

Committed by Xufeng Zhang on Jun 21, 2011

Consider this scenario: when the first received UDP packet is bigger
than the receive buffer, the MSG_TRUNC bit is set in msg->msg_flags.
If a checksum error then occurs on a blocking socket, the code jumps
back to try_again to receive the next packet. If that next packet is
smaller than the receive buffer, MSG_TRUNC should not be set; but
because the bit is not cleared in msg->msg_flags before receiving the
next packet, MSG_TRUNC is still set, which is wrong.

Fix this problem by clearing the MSG_TRUNC flag when starting over for a
new packet.
Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
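
The scenario can be reproduced with a small model of the receive loop. `struct fake_dgram` and `recv_flags()` are hypothetical; only `MSG_TRUNC` comes from `<sys/socket.h>`. The `clear_on_retry` switch selects the fixed behavior.

```c
#include <stddef.h>
#include <sys/socket.h>		/* MSG_TRUNC */

/* Hypothetical queued datagram. */
struct fake_dgram {
	size_t len;	/* payload length */
	int csum_ok;	/* 0 = checksum error, dropped on receive */
};

/*
 * Returns the msg_flags reported for the first packet with a valid
 * checksum, or -1 if there is none.
 */
int recv_flags(const struct fake_dgram *q, size_t n, size_t bufsize,
	       int clear_on_retry)
{
	int msg_flags = 0;

	for (size_t i = 0; i < n; i++) {
		if (clear_on_retry)
			msg_flags &= ~MSG_TRUNC;	/* start over for a new packet */
		if (q[i].len > bufsize)
			msg_flags |= MSG_TRUNC;
		if (q[i].csum_ok)
			return msg_flags;
		/* checksum error: drop it and "goto try_again" */
	}
	return -1;
}
```

With a 200-byte bad packet followed by a 50-byte good one and a 100-byte buffer, the unfixed loop still reports MSG_TRUNC for the 50-byte packet.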

udp: add tracepoints for queueing skb to rcvbuf · 296f7ea7

Committed by Satoru Moriya on Jun 17, 2011

This patch adds a tracepoint to __udp_queue_rcv_skb to get the
return value of ip_queue_rcv_skb. It indicates why the kernel drops
a packet at this point.

ip_queue_rcv_skb returns the following values in the packet drop case:

rcvbuf is full                 : -ENOMEM
sk_filter returns error        : -EINVAL, -EACCES, -ENOMEM, etc.
__sk_mem_schedule returns error: -ENOBUFS
Signed-off-by: Satoru Moriya <satoru.moriya@hds.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

May 24, 2011 · 1 commit

net: convert %p usage to %pK · 71338aa7

Committed by Dan Rosenberg on May 23, 2011

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces.  Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers.  The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1 (the default) and the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges.  Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".

The supporting code for kptr_restrict and %pK is currently in the -mm
tree.  This patch converts users of %p in net/ to %pK.  Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging, and reading the syslog is
already optionally protected by the dmesg_restrict sysctl.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
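
The three kptr_restrict levels reduce to a small decision function. This is a hypothetical model of the policy as described above (what a reader of a /proc file would see), not the vsprintf implementation. `pk_visible_value()` and its `has_cap_syslog` parameter are illustrative names, and "printed as 0's" is modeled as returning 0.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Value a %pK conversion would expose under the described policy.
 * has_cap_syslog stands in for the capability check on the reader.
 */
uintptr_t pk_visible_value(uintptr_t ptr, int kptr_restrict, bool has_cap_syslog)
{
	switch (kptr_restrict) {
	case 0:
		return ptr;				/* same as plain %p */
	case 1:
		return has_cap_syslog ? ptr : 0;	/* the default level */
	default:
		return 0;				/* 2: always hidden */
	}
}
```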

May 11, 2011 · 1 commit

ipv4: udp: Eliminate remaining uses of rt->rt_src · 79ab0531

Committed by David S. Miller on May 09, 2011

We already track and pass around the correct flow key,
so simply use it in udp_send_skb().
Signed-off-by: David S. Miller <davem@davemloft.net>

May 09, 2011 · 3 commits

ipv4: Pass flow key down into ip_append_*(). · f5fca608

Committed by David S. Miller on May 08, 2011

This way rt->rt_dst accesses are unnecessary.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Pass flow keys down into datagram packet building engine. · 77968b78

Committed by David S. Miller on May 08, 2011

This way ip_output.c no longer needs rt->rt_{src,dst}.

We already have these keys sitting, ready and waiting, on the stack or
in a socket structure.
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Use flow key information instead of rt->rt_{src,dst} · e474995f

Committed by David S. Miller on May 08, 2011

We have two cases.

Either the socket is in TCP_ESTABLISHED state and connect() filled
in the inet socket cork flow, or we looked up the route here and
used an on-stack flow.

Track which one it was, and use it to obtain the src/dst addresses.
Signed-off-by: David S. Miller <davem@davemloft.net>
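
The two cases amount to choosing which flow key the addresses are read from. Everything below (`struct flow4_min`, `struct fake_inet_sock`, `pick_flow()`, `fill_addrs()`) is hypothetical and heavily simplified. It only illustrates taking src/dst from the flow key rather than from the cached route.

```c
#include <stdint.h>

/* Hypothetical minimal IPv4 flow key. */
struct flow4_min {
	uint32_t saddr;
	uint32_t daddr;
};

/* Hypothetical socket: cork_fl is filled in by connect(). */
struct fake_inet_sock {
	int connected;			/* stands in for TCP_ESTABLISHED */
	struct flow4_min cork_fl;
};

/* Connected sockets use the cork flow; otherwise use the on-stack one. */
const struct flow4_min *pick_flow(const struct fake_inet_sock *sk,
				  const struct flow4_min *stack_fl)
{
	return sk->connected ? &sk->cork_fl : stack_fl;
}

/* Take header addresses from the chosen flow key, not from the route. */
void fill_addrs(uint32_t *saddr, uint32_t *daddr, const struct flow4_min *fl4)
{
	*saddr = fl4->saddr;
	*daddr = fl4->daddr;
}
```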

Apr 29, 2011 · 1 commit

inet: add RCU protection to inet->opt · f6d8bd05

Committed by Eric Dumazet on Apr 21, 2011

We lack proper synchronization to manipulate inet->opt ip_options.

The problem is that ip_make_skb() calls ip_setup_cork(), and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options)
without any protection against another thread manipulating inet->opt.

Another thread can change the inet->opt pointer and free the old one under us.

Use RCU to protect inet->opt (changed to inet->inet_opt).

Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.

We can't insert an rcu_head in struct ip_options since it's included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
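
The last paragraph describes a wrapper-struct pattern. The RCU bookkeeping lives next to the options rather than inside them, so the plain options struct keeps a size that still fits in skb->cb[]. The types below are simplified, hypothetical stand-ins, not the kernel's definitions. No RCU grace period is modeled: the sketch only shows the layout and the copy-on-use step, which in the kernel would have to run inside an RCU read-side section.

```c
#include <string.h>

/* Hypothetical stand-in for struct rcu_head. */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *head);
};

/* Hypothetical, simplified IP options: small enough to embed in cb[]. */
struct fake_ip_options {
	unsigned char optlen;
	unsigned char data[40];
};

/* RCU-managed wrapper: the rcu_head stays out of the embeddable struct. */
struct fake_ip_options_rcu {
	struct fake_rcu_head rcu;
	struct fake_ip_options opt;
};

/* Copy the options out (e.g. into a cork) instead of taking a reference. */
void copy_opts(struct fake_ip_options *dst, const struct fake_ip_options_rcu *src)
{
	memcpy(dst, &src->opt, sizeof(*dst));
}
```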

Apr 23, 2011 · 1 commit

inet: constify ip headers and in6_addr · b71d1d42

Committed by Eric Dumazet on Apr 22, 2011

Add const qualifiers to struct iphdr, ipv6hdr and in6_addr pointers
where possible, to make the code's intention more obvious.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Mar 31, 2011 · 2 commits

Fix common misspellings · 25985edc

Committed by Lucas De Marchi on Mar 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>

ipv4: Use flowi4_init_output() in udp_sendmsg() · c0951cbc

Committed by David S. Miller on Mar 31, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

Mar 13, 2011 · 5 commits

net: Put fl4_* macros to struct flowi4 and use them again. · 9cce96df

Committed by David S. Miller on Mar 12, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Use flowi4 in UDP · b6f21b26

Committed by David S. Miller on Mar 12, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Use flowi4 in public route lookup interfaces. · 9d6ec938

Committed by David S. Miller on Mar 12, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Make flowi ports AF dependent. · 6281dcc9

Committed by David S. Miller on Mar 12, 2011

Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*.

This will let us create AF-optimal flow instances.

It will work because in every context in which we access the ports,
we already have to be fully aware of which AF the flowi is.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Put flowi_* prefix on AF independent members of struct flowi · 1d28f42c

Committed by David S. Miller on Mar 12, 2011

I intend to turn struct flowi into a union of AF-specific flowi
structs.  There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step in that direction.
Signed-off-by: David S. Miller <davem@davemloft.net>
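
The intended layout can be sketched as follows. All fields are simplified and hypothetical. Every AF-specific variant begins with the same common struct, so AF-independent code can reach the shared members through the union regardless of which variant was filled in. AF-specific code uses its own port fields, named here in the fl4_*/fl6_* style from the commit above.

```c
#include <stdint.h>

/* Members every address family shares (simplified). */
struct flowi_common {
	int flowic_oif;		/* output interface */
	uint8_t flowic_proto;	/* L4 protocol */
};

/* AF-specific variants; each starts with the common part. */
struct flowi4 {
	struct flowi_common common;
	uint32_t saddr, daddr;
	uint16_t fl4_sport, fl4_dport;
};

struct flowi6 {
	struct flowi_common common;
	uint8_t saddr[16], daddr[16];
	uint16_t fl6_sport, fl6_dport;
};

/* Union of the variants, with the common part also directly reachable. */
struct flowi {
	union {
		struct flowi_common common;
		struct flowi4 ip4;
		struct flowi6 ip6;
	} u;
};

/* AF-independent accessor: valid whichever variant was written. */
#define flowi_oif u.common.flowic_oif
```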

Mar 04, 2011 · 1 commit

ipv4: Fix crash in dst_release when udp_sendmsg route lookup fails. · 06dc94b1

Committed by David S. Miller on Mar 03, 2011

As reported by Eric:

[11483.697233] IP: [<c12b0638>] dst_release+0x18/0x60
 ...
[11483.697741] Call Trace:
[11483.697764]  [<c12fc9d2>] udp_sendmsg+0x282/0x6e0
[11483.697790]  [<c12a1c01>] ? memcpy_toiovec+0x51/0x70
[11483.697818]  [<c12dbd90>] ? ip_generic_getfrag+0x0/0xb0

The pointer passed to dst_release() is -EINVAL; that's because
we accidentally leave an error pointer in the local variable "rt".

NULL it out to fix the bug.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
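
The bug pattern and its fix can be reproduced in userspace. `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` below are simplified re-creations of the kernel helpers of the same names, which encode a small negative errno in the top 4095 values of the address space. `struct fake_rtable`, `route_lookup()`, `fake_dst_release()` and `sendmsg_sketch()` are hypothetical. The release function tolerates NULL but not an error pointer, which is why the error path must reset `rt`.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in a pointer, as the kernel helpers do. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct fake_rtable {
	int refcnt;
};

/* Hypothetical lookup: returns a route or an encoded error. */
static struct fake_rtable *route_lookup(int fail)
{
	static struct fake_rtable rt = { 1 };

	return fail ? ERR_PTR(-EINVAL) : &rt;
}

/* Like dst_release(): NULL is fine, an error pointer would crash. */
static void fake_dst_release(struct fake_rtable *rt)
{
	if (rt)
		rt->refcnt--;
}

int sendmsg_sketch(int fail)
{
	struct fake_rtable *rt = route_lookup(fail);
	int err = 0;

	if (IS_ERR(rt)) {
		err = (int)PTR_ERR(rt);
		rt = NULL;	/* the fix: never hand the error pointer to cleanup */
		goto out;
	}
	/* ... build and send the datagram ... */
out:
	fake_dst_release(rt);
	return err;
}
```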

Mar 03, 2011 · 1 commit

ipv4: Make output route lookup return rtable directly. · b23dd4fe

Committed by David S. Miller on Mar 02, 2011

Instead of on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>

Mar 02, 2011 · 5 commits

ipv4: Kill can_sleep arg to ip_route_output_flow() · 273447b3

Committed by David S. Miller on Mar 01, 2011

This boolean state is now available in the flow flags.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add FLOWI_FLAG_CAN_SLEEP. · 5df65e55

Committed by David S. Miller on Mar 01, 2011

And set it in contexts where the route resolution can sleep.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep" · 420d44da

Committed by David S. Miller on Mar 01, 2011

Since that is what the current vague "flags" argument means.
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Add lockless transmit path · 903ab86d

Committed by Herbert Xu on Mar 01, 2011

The UDP transmit path has been running under the socket lock
for a long time because of the corking feature.  This means that
transmitting to the same socket in multiple threads does not
scale at all.

However, as most users don't actually use corking, the locking
can be removed in the common case.

This patch creates a lockless fast path where corking is not used.

Please note that this does create a slight inaccuracy in the
enforcement of socket send buffer limits.  In particular, we
may exceed the socket limit by up to (number of CPUs) * (packet
size) because of the way the limit is computed.

As the primary purpose of socket buffers is to indicate congestion,
this should not be a great problem for now.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
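
The bound on the overshoot follows from a check-then-charge pattern. Without the socket lock, several CPUs can each see the send buffer below its limit before any of them has charged its packet. The function below is a hypothetical, single-threaded model of that worst case, in which all CPUs read the same stale value. It is not kernel code, and the check is only assumed to have roughly the form "allocated < sndbuf".

```c
/*
 * Worst-case model: every CPU checks "in flight < sndbuf" against the
 * same stale value, then each one adds its own packet.
 */
long charged_after_race(long sndbuf, long in_flight, int ncpus, long pkt_size)
{
	long snapshot = in_flight;	/* value seen by every CPU */
	long total = in_flight;

	for (int cpu = 0; cpu < ncpus; cpu++)
		if (snapshot < sndbuf)
			total += pkt_size;	/* each CPU charges independently */
	return total;
}
```

For example, with 4 CPUs, 1,500-byte packets and 9,999 of 10,000 bytes already in flight, the model charges 15,999 bytes. That overshoots by 5,999 bytes, within the 4 × 1,500 = 6,000-byte bound from the commit message.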

udp: Switch to ip_finish_skb · f6b9664f

Committed by Herbert Xu on Mar 01, 2011

This patch converts UDP to use the new ip_finish_skb API.  This
would then allow us to more easily use ip_make_skb, which allows
UDP to run without a socket lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jan 25, 2011 · 1 commit

net: change netdev->features to u32 · 04ed3e74

Committed by Michał Mirosław on Jan 24, 2011

Quoting Ben Hutchings: we presumably won't be defining features that
can only be enabled on 64-bit architectures.

Occurrences found by `grep -r` on net/, drivers/net, include/

[ Move features and vlan_features next to each other in
  struct netdev, as per Eric Dumazet's suggestion -DaveM ]
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>

Dec 17, 2010 · 2 commits

net: Use skb_checksum_start_offset() · 55508d60

Committed by Michał Mirosław on Dec 14, 2010

Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset().

Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb).
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
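
The helper gives a name to an existing computation. `csum_start` is an offset from `skb->head`, drivers usually want it relative to `skb->data`, and the headroom is the gap between the two. The userspace sketch below uses a hypothetical three-field `struct fake_skb` in place of the real sk_buff.

```c
/* Hypothetical, minimal stand-in for struct sk_buff. */
struct fake_skb {
	unsigned char *head;		/* start of the buffer */
	unsigned char *data;		/* start of the packet data */
	unsigned short csum_start;	/* checksum start, relative to head */
};

/* Bytes between head and data (the identity noted above). */
unsigned int skb_headroom(const struct fake_skb *skb)
{
	return (unsigned int)(skb->data - skb->head);
}

/* Checksum start relative to data instead of head. */
int skb_checksum_start_offset(const struct fake_skb *skb)
{
	return (int)skb->csum_start - (int)skb_headroom(skb);
}
```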

net: fix nulls list corruptions in sk_prot_alloc · fcbdf09d

Committed by Octavian Purdila on Dec 16, 2010

Special care is taken inside sk_prot_alloc to avoid overwriting
skc_node/skc_nulls_node. We should also avoid overwriting
skc_bind_node/skc_portaddr_node.

The patch fixes the following crash:

 BUG: unable to handle kernel paging request at fffffffffffffff0
 IP: [<ffffffff812ec6dd>] udp4_lib_lookup2+0xad/0x370
 [<ffffffff812ecc22>] __udp4_lib_lookup+0x282/0x360
 [<ffffffff812ed63e>] __udp4_lib_rcv+0x31e/0x700
 [<ffffffff812bba45>] ? ip_local_deliver_finish+0x65/0x190
 [<ffffffff812bbbf8>] ? ip_local_deliver+0x88/0xa0
 [<ffffffff812eda35>] udp_rcv+0x15/0x20
 [<ffffffff812bba45>] ip_local_deliver_finish+0x65/0x190
 [<ffffffff812bbbf8>] ip_local_deliver+0x88/0xa0
 [<ffffffff812bb2cd>] ip_rcv_finish+0x32d/0x6f0
 [<ffffffff8128c14c>] ? netif_receive_skb+0x99c/0x11c0
 [<ffffffff812bb94b>] ip_rcv+0x2bb/0x350
 [<ffffffff8128c14c>] netif_receive_skb+0x99c/0x11c0
Signed-off-by: Leonard Crestez <lcrestez@ixiacom.com>
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
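
The "special care" amounts to zeroing a reused object piecewise so that its hash-chain link fields survive. This is presumably because lockless lookups may still be walking those links while the memory is reused. The struct layout and `clear_except_nodes()` are hypothetical; the sketch only shows the piecewise-memset technique with two preserved fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical socket layout with two hash-chain links. */
struct fake_sock {
	int a;
	void *nulls_node;	/* like skc_nulls_node */
	int b;
	void *portaddr_node;	/* like skc_portaddr_node */
	int c;
};

/* Zero the object on reuse, but leave both link fields untouched. */
void clear_except_nodes(struct fake_sock *sk)
{
	char *p = (char *)sk;
	size_t n1 = offsetof(struct fake_sock, nulls_node);
	size_t e1 = n1 + sizeof(sk->nulls_node);
	size_t n2 = offsetof(struct fake_sock, portaddr_node);
	size_t e2 = n2 + sizeof(sk->portaddr_node);

	memset(p, 0, n1);			/* before the first link */
	memset(p + e1, 0, n2 - e1);		/* between the links */
	memset(p + e2, 0, sizeof(*sk) - e2);	/* after the second link */
}
```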

Nov 18, 2010 · 1 commit

net: use the macros defined for the members of flowi · 5811662b

Committed by Changli Gao on Nov 12, 2010

Use the macros defined for the members of flowi to clean up the code.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nov 17, 2010 · 1 commit

udp: use atomic_inc_not_zero_hint · c31504dc

Committed by Eric Dumazet on Nov 15, 2010

The refcount of a UDP socket is usually 2, unless an incoming frame is
going to be queued in the receive or backlog queue.

Using atomic_inc_not_zero_hint() reduces latency, because the
processor issues fewer memory transactions.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
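
The idea is to skip the initial load. A compare-and-swap that assumes the counter already holds the expected value (here, 2) succeeds in one step in the common case. Below is a C11 sketch of the technique. `inc_not_zero_hint()` is a hypothetical name and this is not the kernel's implementation, which is built on its own atomic_cmpxchg().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is 0, guessing its current value as hint. */
bool inc_not_zero_hint(atomic_int *v, int hint)
{
	/* Start from the guess instead of loading *v first. */
	int c = hint ? hint : atomic_load(v);

	while (c != 0) {
		if (atomic_compare_exchange_weak(v, &c, c + 1))
			return true;
		/* On failure c now holds the current value; retry unless 0. */
	}
	return false;
}
```

A wrong hint only costs an extra iteration: the failed compare-and-swap hands back the real value, and the next attempt uses it.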

Nov 11, 2010 · 1 commit

net: avoid limits overflow · 8d987e5c

Committed by Eric Dumazet on Nov 09, 2010

Robin Holt tried to boot a 16TB machine and found that some limits were
reached: sysctl_tcp_mem[2], sysctl_udp_mem[2].

We can switch the infrastructure to use "long" instead of "int", now
that atomic_long_t primitives are available for free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Robin Holt <holt@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Oct 26, 2010 · 1 commit

net: add __rcu annotation to sk_filter · 0d7da9dd

Committed by Eric Dumazet on Oct 25, 2010

Add __rcu annotation to:
        (struct sock)->sk_filter

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Sep 09, 2010 · 1 commit

udp: add rehash on connect() · 719f8358

Committed by Eric Dumazet on Sep 08, 2010

Commit 30fff923, introduced in linux-2.6.33 (udp: bind() optimisation),
added a secondary hash on UDP, hashed on (local addr, local port).

The problem is that the following sequence:

fd = socket(...)
connect(fd, &remote, ...)

not only selects the remote end point (address and port), but also sets
the local address, while the UDP stack stored the socket in the secondary
hash table when its local address was INADDR_ANY (or the IPv6 equivalent).

The sequence is:
 - autobind() : choose a random local port, insert socket in hash tables
              [while local address is INADDR_ANY]
 - connect() : set remote address and port, change local address to the IP
              given by a route lookup.

When an incoming UDP frame arrives, if more than 10 sockets are found in
the primary hash table, we switch to the secondary table, and fail to find
the socket because its local address changed.

One solution to this problem is to rehash the datagram socket if needed.

We add a new rehash(struct socket *) method in "struct proto", and
implement this method for UDP v4 & v6, using a common helper.

This rehashing only takes care of the secondary hash table, since the
primary hash (based on local port only) is not changed.
Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
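
The rehash step can be sketched as recomputing the secondary-hash slot from the socket's current (local addr, local port) and moving the socket only if the slot changed. The hash function, table size, `struct fake_udp_sock` and `udp_rehash_sketch()` are hypothetical. The real code also has to unlink and relink the socket in the chains under the appropriate locks; the sketch only marks that step with a comment.

```c
#include <stdint.h>

#define HASH2_SLOTS 128u	/* hypothetical secondary table size */

/* Hypothetical secondary hash over (local addr, local port). */
unsigned int hash2(uint32_t laddr, uint16_t lport)
{
	return ((laddr * 2654435761u) ^ lport) % HASH2_SLOTS;
}

struct fake_udp_sock {
	uint32_t laddr;			/* 0 == INADDR_ANY */
	uint16_t lport;
	unsigned int hash2_slot;	/* slot the socket is linked in */
};

/* Call after connect(); returns 1 if the socket had to move. */
int udp_rehash_sketch(struct fake_udp_sock *sk)
{
	unsigned int new_slot = hash2(sk->laddr, sk->lport);

	if (new_slot == sk->hash2_slot)
		return 0;
	/* Real code: unlink from the old chain, link into the new one. */
	sk->hash2_slot = new_slot;
	return 1;
}
```

Usage follows the sequence in the commit message: autobind hashes the socket while `laddr` is still 0, connect() fills in the local address, and the rehash moves the socket to the slot a lookup would now probe.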