1. 23 Dec 2011, 1 commit
  2. 01 Nov 2011, 2 commits
    • treewide: use __printf not __attribute__((format(printf,...))) · b9075fa9
      Committed by Joe Perches
      Standardize the style for compiler-based printf format verification,
      and standardize the location of __printf too.
      
      Done via script and a little typing.
      
      $ grep -rPl --include=*.[ch] -w "__attribute__" * | \
        grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
        xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'
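
      For reference, a minimal sketch of what the converted annotation looks
      like in C (the macro definition matches compiler-gcc.h; the function
      hyp_log is illustrative, not from the patch):

      #define __printf(a, b) __attribute__((format(printf, a, b)))

      /* argument 1 is the format string; varargs are checked from
       * argument 2 onwards, so gcc warns on mismatched callers */
      __printf(1, 2)
      void hyp_log(const char *fmt, ...);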
      
      [akpm@linux-foundation.org: revert arch bits]
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include: replace linux/module.h with "struct module" wherever possible · de477254
      Committed by Paul Gortmaker
      The <linux/module.h> header pretty much brings in the kitchen sink
      along with it, so it should be avoided wherever reasonably possible
      in other commonly used <linux/something.h> files, as including it
      there results in a measurable increase in compile times.
      
      The worst culprit was probably device.h, since it is used everywhere.
      That file also had an implicit dependency on mutex.h which was masked
      by module.h; that is fixed here at the same time.
      
      There are over a dozen other headers that simply declare the struct
      instead of pulling in the whole file, so follow their lead and convert
      a few more.
      
      Most of the implicit dependencies on module.h that were satisfied by
      these headers pulling it in have now been weeded out, so we can
      finally make this change with hopefully minimal breakage.
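
      A minimal sketch of the pattern in C (header and struct names are
      illustrative, not from the patch):

      /* hyp_device.h: only pointers to struct module are used, so a
       * forward declaration replaces #include <linux/module.h> */
      struct module;

      struct hyp_device {
              struct module *owner;   /* a pointer needs no struct size */
      };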
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  3. 18 Aug 2011, 1 commit
  4. 09 Jul 2011, 1 commit
  5. 07 Jul 2011, 1 commit
  6. 28 Jun 2011, 2 commits
  7. 07 Jun 2011, 1 commit
  8. 07 Apr 2011, 1 commit
  9. 05 Apr 2011, 1 commit
    • net: Allow no-cache copy from user on transmit · c6e1a0d1
      Committed by Tom Herbert
      This patch uses __copy_from_user_nocache on transmit to bypass the
      data cache for a performance improvement.  skb_add_data_nocache and
      skb_copy_to_page_nocache can be called by sendmsg functions to use
      this feature; initial support is in tcp_sendmsg.  This functionality
      is configurable per device using ethtool.
      
      Presumably, this feature is only useful when the driver does not
      touch the data.  The feature is turned on by default if a device
      indicates that it does some form of checksum offload; it is off by
      default for devices that do no checksum offload or indicate that no
      checksum is necessary.  In the former case a copy-with-checksum is
      probably done anyway; in the latter case the device is likely
      loopback, where a no-cache copy is probably not beneficial.
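
      A hedged sketch in C of that default-enable rule (types and flag
      values are illustrative stand-ins for the netdev feature bits):

      #define HYP_F_CSUM_OFFLOAD 0x1u

      struct hyp_dev { unsigned int features; };

      /* default the no-cache copy to on only when the device checksums
       * for us, i.e. the CPU never reads the payload back through cache */
      static int hyp_nocache_copy_default(const struct hyp_dev *dev)
      {
              return (dev->features & HYP_F_CSUM_OFFLOAD) != 0;
      }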
      
      This patch was tested using 200 instances of netperf TCP_RR with a
      1400-byte request and a one-byte reply.  The platform is a 16-core
      AMD x86.
      
      No-cache copy disabled:
         672703 tps, 97.13% utilization
         50/90/99% latency: 244.31 484.205 1028.41

      No-cache copy enabled:
         702113 tps, 96.16% utilization
         50/90/99% latency: 238.56 467.56 956.955
      
      Using 14000-byte request and response sizes demonstrates the
      effects more dramatically:
      
      No-cache copy disabled:
         79571 tps, 34.34% utilization
         50/90/95% latency: 1584.46 2319.59 5001.76

      No-cache copy enabled:
         83856 tps, 34.81% utilization
         50/90/95% latency: 2508.42 2622.62 2735.88
      
      Note especially the effect on the latency tail (95th percentile).
      
      This seems to provide a nice performance improvement and is
      consistent across the tests I ran.  Presumably, it would provide
      the greatest benefit in the presence of an application workload
      stressing the cache alongside a lot of transmit activity.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 31 Mar 2011, 1 commit
  11. 23 Feb 2011, 1 commit
  12. 30 Jan 2011, 1 commit
    • net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT · 709b46e8
      Committed by Eric W. Biederman
      SIOCGETSGCNT is not a unique ioctl value: it maps to SIOCPROTOPRIVATE + 1,
      which unfortunately means the existing infrastructure for compat networking
      ioctls is insufficient.  A trivial compat ioctl implementation would conflict
      with:
      with:
      
      SIOCAX25ADDUID
      SIOCAIPXPRISLT
      SIOCGETSGCNT_IN6
      SIOCGETSGCNT
      SIOCRSSCAUSE
      SIOCX25SSUBSCRIP
      SIOCX25SDTEFACILITIES
      
      To make this work I have updated the compat_ioctl decode path to mirror
      the normal ioctl decode path.  I have added an ipv4 inet_compat_ioctl
      function so that I can have ipv4-specific compat ioctls.  I have added a
      compat_ioctl function into struct proto so I can break out ioctls by
      which kind of ip socket I am using.  I have added a compat_raw_ioctl
      function because SIOCGETSGCNT only works on raw sockets.  I have added
      an ipmr_compat_ioctl that mirrors the normal ipmr_ioctl.
      
      This was necessary because, unfortunately, the struct layout used by
      SIOCGETSGCNT has unsigned longs in it and so changes between 32-bit
      and 64-bit kernels.

      This change was sufficient to run a 32-bit ip multicast routing daemon
      on a 64-bit kernel.
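
      A hedged sketch in C of the dispatch shape described above (all
      hyp_* names are illustrative stand-ins for the kernel types):

      #define HYP_ENOIOCTLCMD 515

      struct hyp_sock;

      struct hyp_proto {
              int (*compat_ioctl)(struct hyp_sock *sk, unsigned int cmd,
                                  unsigned long arg);
      };

      /* family-level compat entry point: defer to the per-protocol hook,
       * so raw sockets can decode SIOCGETSGCNT with the 32-bit layout */
      static int hyp_inet_compat_ioctl(struct hyp_proto *prot, struct hyp_sock *sk,
                                       unsigned int cmd, unsigned long arg)
      {
              if (prot->compat_ioctl)
                      return prot->compat_ioctl(sk, cmd, arg);
              return -HYP_ENOIOCTLCMD;
      }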
      Reported-by: Bill Fenner <fenner@aristanetworks.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 19 Jan 2011, 1 commit
  14. 10 Jan 2011, 1 commit
  15. 17 Dec 2010, 1 commit
    • net: fix nulls list corruptions in sk_prot_alloc · fcbdf09d
      Committed by Octavian Purdila
      Special care is taken inside sk_prot_alloc to avoid overwriting
      skc_node/skc_nulls_node.  We should also avoid overwriting
      skc_bind_node/skc_portaddr_node.
      
      The patch fixes the following crash:
      
       BUG: unable to handle kernel paging request at fffffffffffffff0
       IP: [<ffffffff812ec6dd>] udp4_lib_lookup2+0xad/0x370
       [<ffffffff812ecc22>] __udp4_lib_lookup+0x282/0x360
       [<ffffffff812ed63e>] __udp4_lib_rcv+0x31e/0x700
       [<ffffffff812bba45>] ? ip_local_deliver_finish+0x65/0x190
       [<ffffffff812bbbf8>] ? ip_local_deliver+0x88/0xa0
       [<ffffffff812eda35>] udp_rcv+0x15/0x20
       [<ffffffff812bba45>] ip_local_deliver_finish+0x65/0x190
       [<ffffffff812bbbf8>] ip_local_deliver+0x88/0xa0
       [<ffffffff812bb2cd>] ip_rcv_finish+0x32d/0x6f0
       [<ffffffff8128c14c>] ? netif_receive_skb+0x99c/0x11c0
       [<ffffffff812bb94b>] ip_rcv+0x2bb/0x350
       [<ffffffff8128c14c>] netif_receive_skb+0x99c/0x11c0
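
      A hedged sketch in C of the "clear everything except the linked
      nodes" pattern the fix implies (toy struct; the real code operates
      on struct sock's skc_* fields):

      #include <stddef.h>
      #include <string.h>

      struct hyp_node { struct hyp_node *next, **pprev; };

      struct hyp_sock {
              int             proto_state;
              struct hyp_node portaddr_node;   /* must survive the clear */
              int             scratch;
      };

      /* zero a recycled object without touching the still-linked node */
      static void hyp_clear_preserving_node(struct hyp_sock *sk)
      {
              size_t off = offsetof(struct hyp_sock, portaddr_node);
              size_t end = off + sizeof(sk->portaddr_node);

              memset(sk, 0, off);
              memset((char *)sk + end, 0, sizeof(*sk) - end);
      }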
      Signed-off-by: Leonard Crestez <lcrestez@ixiacom.com>
      Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 10 Dec 2010, 1 commit
    • net: optimize INET input path further · 68835aba
      Committed by Eric Dumazet
      Follow-up to commit b178bb3d (net: reorder struct sock fields).
      
      Optimize the INET input path a bit further by:

      1) moving sk_refcnt close to sk_lock.

      This reduces the number of dirtied cache lines by one on 64-bit arches
      (with a 64-byte cache line size).

      2) moving inet_daddr & inet_rcv_saddr to the beginning of sk

      (the same cache line as hash / family / bound_dev_if / nulls_node).

      This reduces the number of cache lines accessed in lookups by one, and
      does not increase the size of inet and timewait socks.
      inet and tw sockets now share the same placeholder for these fields.
      
      Before patch:
      
      offsetof(struct sock, sk_refcnt) = 0x10
      offsetof(struct sock, sk_lock) = 0x40
      offsetof(struct sock, sk_receive_queue) = 0x60
      offsetof(struct inet_sock, inet_daddr) = 0x270
      offsetof(struct inet_sock, inet_rcv_saddr) = 0x274
      
      After patch:
      
      offsetof(struct sock, sk_refcnt) = 0x44
      offsetof(struct sock, sk_lock) = 0x48
      offsetof(struct sock, sk_receive_queue) = 0x68
      offsetof(struct inet_sock, inet_daddr) = 0x0
      offsetof(struct inet_sock, inet_rcv_saddr) = 0x4
      
      compute_score() (udp or tcp) now uses a single cache line per ignored
      item, instead of two.
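
      A standalone C sketch of how such offsets can be inspected (toy
      struct; the numbers above come from the real struct sock):

      #include <stdio.h>
      #include <stddef.h>

      struct toy_sock {
              unsigned int hash;
              int          refcnt;
              long         lock;
      };

      int main(void)
      {
              /* fields touched together should land in the same 64-byte
               * cache line, i.e. their offset / 64 should be equal */
              printf("offsetof(refcnt) = %#zx\n",
                     offsetof(struct toy_sock, refcnt));
              printf("offsetof(lock)   = %#zx\n",
                     offsetof(struct toy_sock, lock));
              return 0;
      }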
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 07 Dec 2010, 1 commit
    • filter: fix sk_filter rcu handling · 46bcf14f
      Committed by Eric Dumazet
      Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and
      sk_clone() in commit 47e958ea.
      
      The problem is that we can have several clones sharing a common
      sk_filter, and these clones might want to sk_filter_attach() their own
      filters at the same time, overwriting old_filter->rcu and corrupting
      RCU queues.
      
      We cannot use filter->rcu without being sure no other thread could do
      the same thing.

      Switch the code to a more conventional ref-counting technique: do the
      atomic decrement immediately and queue one RCU callback when the last
      reference is released.
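
      A hedged sketch of that pattern against the kernel's RCU API
      (hyp_* names are illustrative):

      struct hyp_filter {
              atomic_t        refcnt;
              struct rcu_head rcu;
              /* filter body follows */
      };

      static void hyp_filter_rcu_free(struct rcu_head *rcu)
      {
              kfree(container_of(rcu, struct hyp_filter, rcu));
      }

      /* drop one reference; only the thread releasing the last
       * reference queues the (single) RCU callback */
      static void hyp_filter_release(struct hyp_filter *fp)
      {
              if (atomic_dec_and_test(&fp->refcnt))
                      call_rcu(&fp->rcu, hyp_filter_rcu_free);
      }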
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 03 Dec 2010, 1 commit
  19. 17 Nov 2010, 2 commits
    • net: reorder struct sock fields · b178bb3d
      Committed by Eric Dumazet
      Right now, fields in struct sock are not optimally ordered, because each
      path (RX softirq, TX completion, RX user, TX user) has to touch fields
      that are spread over many different cache lines.

      The really critical thing is to shrink the number of cache lines used
      at RX softirq time: a CPU handling softirqs for a device can receive
      many frames per second for many sockets.  If the load is too high, we
      drop frames at the NIC level.  RPS or multiqueue cards can help, but it
      is better to reduce latency where possible.

      This patch starts with the UDP protocol; additional patches will try to
      reduce the latencies of other protocols as well.
      
      At RX softirq time, the fields of interest for the UDP protocol are
      (not counting the ones in the inet struct used for the lookup):
      
      Read/Written:
      sk_refcnt   (atomic increment/decrement)
      sk_rmem_alloc & sk_backlog.len (to check if there is room in queues)
      sk_receive_queue
      sk_backlog (if socket locked by user program)
      sk_rxhash
      sk_forward_alloc
      sk_drops
      
      Read only:
      sk_rcvbuf (sk_rcvqueues_full())
      sk_filter
      sk_wq
      sk_policy[0]
      sk_flags
      
      Additional notes:

      - sk_backlog has one hole on 64-bit arches; we can fill it to save 8
      bytes.
      - sk_backlog is used only if the RX softirq handler finds the socket
      locked by the user.
      - sk_rxhash is written only once per flow.
      - sk_drops is written only if the queues are full.
      
      Final layout:

      [1] One section grouping all read/write fields, placing rxhash and
      sk_backlog at the end of this section.

      [2] One section grouping all fields read in the RX handler
         (sk_filter, sk_rcvbuf, sk_wq)

      [3] One section used by the other paths

      I'll post a patch on its own to put sk_refcnt at the end of struct
      sock_common so that it shares the same cache line as section [1].
      
      New offsets on a 64-bit arch:
      
      sizeof(struct sock)=0x268
      offsetof(struct sock, sk_refcnt)  =0x10
      offsetof(struct sock, sk_lock)    =0x48
      offsetof(struct sock, sk_receive_queue)=0x68
      offsetof(struct sock, sk_backlog)=0x80
      offsetof(struct sock, sk_rmem_alloc)=0x80
      offsetof(struct sock, sk_forward_alloc)=0x98
      offsetof(struct sock, sk_rxhash)=0x9c
      offsetof(struct sock, sk_rcvbuf)=0xa4
      offsetof(struct sock, sk_drops) =0xa0
      offsetof(struct sock, sk_filter)=0xa8
      offsetof(struct sock, sk_wq)=0xb0
      offsetof(struct sock, sk_policy)=0xd0
      offsetof(struct sock, sk_flags) =0xe0
      
      Instead of:
      
      sizeof(struct sock)=0x270
      offsetof(struct sock, sk_refcnt)  =0x10
      offsetof(struct sock, sk_lock)    =0x50
      offsetof(struct sock, sk_receive_queue)=0xc0
      offsetof(struct sock, sk_backlog)=0x70
      offsetof(struct sock, sk_rmem_alloc)=0xac
      offsetof(struct sock, sk_forward_alloc)=0x10c
      offsetof(struct sock, sk_rxhash)=0x128
      offsetof(struct sock, sk_rcvbuf)=0x4c
      offsetof(struct sock, sk_drops) =0x16c
      offsetof(struct sock, sk_filter)=0x198
      offsetof(struct sock, sk_wq)=0x88
      offsetof(struct sock, sk_policy)=0x98
      offsetof(struct sock, sk_flags) =0x130
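
      A standalone C sketch of the grouping idea (toy struct; the real
      layout is the offset table above):

      /* group fields by access pattern and start each group on its own
       * 64-byte cache line, so each path dirties as few lines as possible */
      struct toy_sock {
              /* [1] read/write fields on the RX softirq path */
              int   refcnt;
              int   rmem_alloc;
              void *receive_queue;

              /* [2] read-mostly fields in the RX handler */
              const void *filter __attribute__((aligned(64)));
              void       *wq;

              /* [3] everything the other paths touch */
              unsigned long flags __attribute__((aligned(64)));
      };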
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: use atomic_inc_not_zero_hint · c31504dc
      Committed by Eric Dumazet
      A UDP socket's refcount is usually 2, unless an incoming frame is
      about to be queued in the receive or backlog queue.

      Using atomic_inc_not_zero_hint() reduces latency, because the
      processor issues fewer memory transactions.
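
      A hedged usage sketch (the helper exists with this shape; the
      wrapper function is illustrative):

      /* returns non-zero if a reference was taken (count was not zero) */
      static int hyp_try_hold_sock(struct sock *sk)
      {
              /* seed the cmpxchg loop with 2, the usual value, instead of
               * reading the counter first: one less memory transaction */
              return atomic_inc_not_zero_hint(&sk->sk_refcnt, 2);
      }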
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 11 Nov 2010, 1 commit
  21. 26 Oct 2010, 1 commit
  22. 27 Sep 2010, 1 commit
    • net: update SOCK_MIN_RCVBUF · 7a91b434
      Committed by Eric Dumazet
      SOCK_MIN_RCVBUF's current value is 256 bytes.

      That does not permit receiving even the smallest possible frame,
      considering the socket sk_rmem_alloc/sk_rcvbuf accounting uses skb
      truesizes.  On 64-bit arches, sizeof(struct sk_buff) is 240 bytes.
      Add the typical 64 bytes of headroom and we go over the limit
      (240 + 64 = 304 > 256).

      With old kernels and 32-bit arches, we stayed under the limit, as long
      as the netdriver was doing copybreak.
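
      A hedged sketch of the shape of the fix (the exact new value is in
      the patch; this definition is illustrative):

      /* derive the floor from the real per-skb overhead instead of a
       * bare constant, so one truesize-accounted frame always fits */
      #define SOCK_MIN_RCVBUF (2048 + sizeof(struct sk_buff))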
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 09 Sep 2010, 1 commit
    • udp: add rehash on connect() · 719f8358
      Committed by Eric Dumazet
      Commit 30fff923, introduced in linux-2.6.33 (udp: bind() optimisation),
      added a secondary hash on UDP, keyed on (local addr, local port).
      
      The problem is that the following sequence:

      fd = socket(...)
      connect(fd, &remote, ...)

      not only selects the remote end point (address and port) but also sets
      the local address, while the UDP stack had stored the socket in the
      secondary hash table while its local address was INADDR_ANY (or the
      ipv6 equivalent).
      
      The sequence is:
       - autobind() : choose a random local port, insert the socket in the
                    hash tables [while the local address is INADDR_ANY]
       - connect() : set the remote address and port, change the local
                    address to the IP given by a route lookup.

      When an incoming UDP frame arrives, if more than 10 sockets are found
      in the primary hash table, we switch to the secondary table, and fail
      to find the socket because its local address has changed.
      
      One solution to this problem is to rehash the datagram socket when
      needed.

      We add a new rehash(struct sock *) method to "struct proto", and
      implement this method for UDP v4 & v6, using a common helper.

      This rehashing only takes care of the secondary hash table, since the
      primary hash (based on local port only) does not change.
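
      A hedged sketch of the hook shape (the rehash method is from the
      patch; the surrounding function is illustrative):

      static void hyp_udp_connect_finish(struct sock *sk)
      {
              /* after the route lookup rewrote the local address, move
               * sk to its new (local addr, local port) secondary chain */
              if (sk->sk_prot->rehash)
                      sk->sk_prot->rehash(sk);
      }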
      Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 19 Aug 2010, 1 commit
  25. 10 Aug 2010, 1 commit
  26. 15 Jul 2010, 1 commit
    • net: fix problem in reading sock TX queue · b0f77d0e
      Committed by Tom Herbert
      Fix a problem in reading the tx_queue recorded in a socket.  In
      dev_pick_tx, the TX queue is read by doing a check with
      sk_tx_queue_recorded on the socket, followed by a sk_tx_queue_get.
      The problem is that there is no mutual exclusion across these
      calls in the socket, so it is possible that the queue in the
      sock can be invalidated after sk_tx_queue_recorded is called, so
      that sk_tx_queue_get returns -1, which sets 65535 in queue_index
      and thus dev_pick_tx returns 65536, which is a bogus queue and
      can cause a crash in dev_queue_xmit.
      
      We fix this by calling only sk_tx_queue_get, which does the proper
      checks.  The interface is that sk_tx_queue_get returns the TX queue
      if the sock argument is non-NULL and a TX queue is recorded, else it
      returns -1.  sk_tx_queue_recorded is no longer used, so it can be
      completely removed.
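
      A hedged sketch of the combined accessor (treat the body as
      illustrative of the interface described above):

      /* one call covers both "is a queue recorded" and the read, so
       * there is no window for the value to be invalidated in between */
      static inline int sk_tx_queue_get(const struct sock *sk)
      {
              return sk ? sk->sk_tx_queue_mapping : -1;
      }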
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 13 Jul 2010, 1 commit
  28. 17 Jun 2010, 1 commit
  29. 02 Jun 2010, 1 commit
  30. 01 Jun 2010, 1 commit
  31. 27 May 2010, 1 commit
    • net: fix lock_sock_bh/unlock_sock_bh · 8a74ad60
      Committed by Eric Dumazet
      This new sock lock primitive was introduced to speed up some
      user-context socket manipulation.  But it is unsafe when protecting
      two threads, one using regular lock_sock/release_sock and one using
      lock_sock_bh/unlock_sock_bh.

      This patch changes lock_sock_bh to be careful about the 'owned'
      state.  If owned is found to be set, we must take the slow path.
      lock_sock_bh() now returns a boolean saying whether the slow path
      was taken, and this boolean is used at unlock_sock_bh time to call
      the appropriate unlock function.

      After this change, BHs are either disabled or enabled during the
      whole lock_sock_bh/unlock_sock_bh protected section.  Since the _bh
      suffix might now be misleading, we rename these functions to
      lock_sock_fast()/unlock_sock_fast().
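
      A usage sketch of the renamed pair (the helpers have this shape;
      the surrounding function is illustrative):

      static void hyp_drain_queue(struct sock *sk)
      {
              /* true means the slow path (full lock_sock) was taken */
              bool slow = lock_sock_fast(sk);

              /* touch the receive queue safely here */

              unlock_sock_fast(sk, slow);
      }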
      Reported-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  32. 25 May 2010, 1 commit
  33. 24 May 2010, 1 commit
    • cls_cgroup: Store classid in struct sock · f8451725
      Committed by Herbert Xu
      Up until now cls_cgroup has relied on fetching the classid out of
      the currently executing thread.  This runs into trouble when packet
      processing is delayed, in which case it may execute in another
      thread's context.

      Furthermore, even when a packet is not delayed we may fail to
      classify it if soft IRQs have been disabled, because this scenario
      is indistinguishable from one where a packet unrelated to the
      current thread is processed by a real soft IRQ.

      In fact, the current semantics are inherently broken, as a single
      skb may be constructed out of the writes of two different tasks.
      A different manifestation of this problem is when the TCP stack
      transmits in response to an incoming ACK.  This is currently
      unclassified.
      
      As we already have a concept of packet ownership for accounting
      purposes in the skb->sk pointer, this is a natural place to store
      the classid in a persistent manner.

      This patch adds the cls_cgroup classid to struct sock, filling up
      an existing hole on 64-bit :)
      
      The value is set at socket creation time, so all sockets created
      via socket(2) automatically gain the ID of the thread creating
      them.  Whenever another process touches the socket by either
      reading or writing to it, we change the socket classid to that of
      the process, if it has a valid (non-zero) classid.
      
      For sockets created on inbound connections through accept(2), we
      inherit the classid of the original listening socket through
      sk_clone, possibly preceding the actual accept(2) call.
      
      In order to minimise risks, I have not made this the authoritative
      classid.  For now it is only used as a backup when we execute
      with soft IRQs disabled.  Once we're completely happy with its
      semantics we can use it as the sole classid.
      
      Footnote: I have rearranged the error path on cls_cgroup module
      creation.  Without this, there would be a window where someone
      could create a tc rule using cls_cgroup before the cgroup
      subsystem has been registered.
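
      A hedged sketch of the update rule described above (function name
      illustrative; the field added by the patch is sk->sk_classid):

      static void hyp_sock_update_classid(struct sock *sk, u32 task_classid)
      {
              /* only a valid (non-zero) classid may replace the one set
               * at creation time or inherited through sk_clone */
              if (task_classid && task_classid != sk->sk_classid)
                      sk->sk_classid = task_classid;
      }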
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  34. 18 May 2010, 1 commit
    • net: add a noref bit on skb dst · 7fee226a
      Committed by Eric Dumazet
      Use the low-order bit of skb->_skb_dst to flag a dst that is not
      refcounted.

      Change _skb_dst to _skb_refdst to make sure all uses are caught.

      skb_dst() returns the dst regardless of whether the noref bit is set,
      but with a lockdep check to make sure a noref dst is not handed out
      if the current user is not RCU protected.

      New skb_dst_set_noref() helper to set a non-refcounted dst on a skb
      (with a lockdep check).

      skb_dst_drop() drops a reference only if the skb dst was refcounted.

      The skb_dst_force() helper is used to force a refcount on the dst
      when the skb is queued and no longer RCU protected.

      Use skb_dst_force() in __sk_add_backlog(), in __dev_xmit_skb() if
      !IFF_XMIT_DST_RELEASE or the skb is enqueued on a qdisc queue, in
      sock_queue_rcv_skb(), and in __nf_queue().

      Use skb_dst_force() in dev_requeue_skb().

      Note: dst_use_noref() still dirties the dst; we might transform it
      later to do one dirtying per jiffy.
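
      A standalone C sketch of the low-order-bit tagging trick (hyp_*
      names illustrative; the kernel keeps equivalent masks next to
      _skb_refdst):

      #include <stdint.h>

      #define HYP_DST_NOREF   1UL                 /* no reference held */
      #define HYP_DST_PTRMASK (~HYP_DST_NOREF)

      struct hyp_dst { int refcnt; };

      /* dst pointers are at least word-aligned, so bit 0 is free to
       * carry the "not refcounted" flag */
      static struct hyp_dst *hyp_skb_dst(uintptr_t refdst)
      {
              return (struct hyp_dst *)(refdst & HYP_DST_PTRMASK);
      }

      static int hyp_dst_is_noref(uintptr_t refdst)
      {
              return (refdst & HYP_DST_NOREF) != 0;
      }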
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  35. 16 May 2010, 1 commit
    • net: Introduce sk_route_nocaps · a465419b
      Committed by Eric Dumazet
      TCP-MD5 sessions fail intermittently when the route cache is
      invalidated: ip_queue_xmit() has to find a new route and calls
      sk_setup_caps(sk, &rt->u.dst), destroying the

      sk->sk_route_caps &= ~NETIF_F_GSO_MASK

      that MD5 desperately tries to enforce all along its way (from
      tcp_transmit_skb() for example).

      So we send a few bad packets, and everything is fine again once
      tcp_transmit_skb() is called for this socket.

      Since ip_queue_xmit() sits at a lower level than TCP-MD5, I chose to
      use a socket field, sk_route_nocaps, containing bits to mask off
      sk_route_caps.
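
      A hedged sketch of the masking step (the field is from the patch;
      the function body is illustrative):

      static void hyp_sk_setup_caps(struct sock *sk, unsigned long route_caps)
      {
              sk->sk_route_caps = route_caps;
              /* capabilities the owner (e.g. TCP-MD5) permanently forbids
               * now survive every route change */
              sk->sk_route_caps &= ~sk->sk_route_nocaps;
      }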
      Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  36. 03 May 2010, 1 commit
  37. 02 May 2010, 1 commit
    • net: sock_def_readable() and friends RCU conversion · 43815482
      Committed by Eric Dumazet
      The sk_callback_lock rwlock actually protects the sk->sk_sleep
      pointer, so we need two atomic operations (and the associated
      dirtying) per incoming packet.

      An RCU conversion is pretty much needed:
      
      1) Add a new structure, called "struct socket_wq", to hold all fields
      that need rcu_read_lock() protection (currently: a wait_queue_head_t
      and a struct fasync_struct pointer).

      [A future patch will add a list anchor for wakeup coalescing.]

      2) Attach one such structure to each "struct socket" created in
      sock_alloc_inode().

      3) Respect the RCU grace period when freeing a "struct socket_wq".
      
      4) Replace the sk_sleep pointer in "struct sock" with sk_wq, a pointer
      to "struct socket_wq".

      5) Change the sk_sleep() function to use the new sk->sk_wq instead of
      sk->sk_sleep.
      
      6) Change sk_has_sleeper() to wq_has_sleeper(), which must be used
      inside an rcu_read_lock() section.

      7) Change all sk_has_sleeper() callers to:
        - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
        - Use wq_has_sleeper() to wake up tasks when needed
        - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)
      
      8) sock_wake_async() is modified to use RCU protection as well.

      9) Exceptions:
        macvtap, drivers/net/tun.c and af_unix use an integrated "struct
      socket_wq" instead of a dynamically allocated one; they don't need
      RCU freeing.
      
      Some cleanups or follow-ups are probably needed (a possible
      sk_callback_lock conversion to a spinlock, for example).
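
      A hedged sketch in C of the resulting wakeup path (modeled on the
      steps above; the function name is illustrative):

      static void hyp_def_readable(struct sock *sk)
      {
              struct socket_wq *wq;

              rcu_read_lock();
              wq = rcu_dereference(sk->sk_wq);
              /* the barrier inside wq_has_sleeper() pairs with the one
               * on the sleeper side, so no lock is needed */
              if (wq_has_sleeper(wq))
                      wake_up_interruptible(&wq->wait);
              rcu_read_unlock();
      }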
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>