Commits · abbd00b82a2771b0460ba2cffdb1343aa827ccde · openeuler / raspberrypi-kernel

15 Nov, 2011: 1 commit

Authored by Eric Dumazet on Nov 14, 2011

One of the things we discussed during the netdev 2011 conference was the
idea of changing some network drivers to allocate/populate their skb at RX
completion time, right before feeding the skb to the network stack.

In the old days, we allocated skbs when populating the RX ring.

This means bringing the sk_buff and skb_shared_info cache lines into the
cpu cache (since we clear/initialize them), then 'queuing' skb->data to
the NIC.

By the time the NIC fills a frame in the skb->data buffer and the host can
process it, the cpu has probably evicted those cache lines, because a lot
of things happened between the allocation and the final use.

So the deal would be to allocate only the data buffer for the NIC to
populate its RX ring buffer, and to use build_skb() at RX completion to
attach the data buffer (now filled with an ethernet frame) to a new skb,
initialize the skb_shared_info portion, and give the hot skb to the
network stack.

build_skb() allocates an skb, with the caller providing the data buffer
that should be attached to it. Drivers are expected to call skb_reserve()
right after build_skb() to adjust skb->data to the Ethernet frame
(usually skipping NET_SKB_PAD and NET_IP_ALIGN, but some drivers might
add a hardware-provided alignment).

Data provided to build_skb() MUST have been allocated by a prior
kmalloc() call, with enough room to add SKB_DATA_ALIGN(sizeof(struct
skb_shared_info)) bytes at the end of the data without corrupting the
incoming frame.

data = kmalloc(NET_SKB_PAD + NET_IP_ALIGN + 1536 +
	       SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
	       GFP_ATOMIC);
...
skb = build_skb(data);
if (!skb) {
	/* driver-specific: give the buffer back to the RX ring */
	recycle_data(data);
} else {
	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
	...
}
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Eilon Greenstein <eilong@broadcom.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: Tom Herbert <therbert@google.com>
CC: Jamal Hadi Salim <hadi@mojatatu.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Thomas Graf <tgraf@infradead.org>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b2b5ce9d

14 Nov, 2011: 8 commits

neigh: new unresolved queue limits · 8b5c171b

Authored by Eric Dumazet on Nov 09, 2011

On Wednesday 09 November 2011 at 16:21 -0500, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
>
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 09 Nov 2011 12:14:09 +0100
> >
> >> unres_qlen is the number of frames we are able to queue per unresolved
> >> neighbour. Its default value (3) was never changed and is responsible
> >> for strange drops, especially if IP fragments are used, or multiple
> >> sessions start in parallel. Even a single tcp flow can hit this limit.
> >  ...
> >
> > Ok, I've applied this, let's see what happens :-)
>
> Early answer, build fails.
>
> Please test build this patch with DECNET enabled and resubmit.  The
> decnet neigh layer still refers to the removed ->queue_len member.
>
> Thanks.

Ouch, this was fixed on one machine yesterday, but not on the other one I
used this morning, sorry.

[PATCH V5 net-next] neigh: new unresolved queue limits

unres_qlen is the number of frames we are able to queue per unresolved
neighbour. Its default value (3) was never changed and is responsible
for strange drops, especially if IP fragments are used, or multiple
sessions start in parallel. Even a single tcp flow can hit this limit.

$ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: add NTF_USE support · 292d1398

Authored by stephen hemminger on Nov 09, 2011

More changes to the recent code that supports control of the forwarding
database via netlink:
   * Support NTF_USE like the neighbour table
   * Validate state bits from the application
   * Only send notifications (and change bits) if the new entry is
     different.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6LoWPAN: UDP header decompression · f8b1b5d2

Authored by alex.bluesman.smirnov@gmail.com on Nov 10, 2011

This patch adds the ability to decompress UDP headers.
Derived from Contiki OS.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6LoWPAN: UDP header compression · 3bd5b958

Authored by alex.bluesman.smirnov@gmail.com on Nov 10, 2011

This patch adds support for UDP header compression.
Derived from Contiki OS.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6LoWPAN: set proper netdev flags · 4d039f68

Authored by alex.bluesman.smirnov@gmail.com on Nov 10, 2011

This patch fixes the device initialization settings, which makes it
possible to use NDISC and TCP.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Acked-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6LoWPAN: disable debugging by default · e86586ba

Authored by alex.bluesman.smirnov@gmail.com on Nov 10, 2011

This patch disables the debug output that was enabled by default.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Acked-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6LoWPAN: add fragmentation support · 719269af

Authored by alex.bluesman.smirnov@gmail.com on Nov 10, 2011

This patch adds support for frame fragmentation.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: reduce percpu needs for icmpv6msg mibs · 2a24444f

Authored by Eric Dumazet on Nov 13, 2011

Reading /proc/net/snmp6 on a machine with a lot of cpus is very
expensive (can be ~88000 us).

This is because the ICMPV6MSG MIB uses 4096 bytes per cpu, and folding
values for all possible cpus can read 16 Mbytes of memory (32 Mbytes on
non-x86 arches).

ICMP messages are not considered fast path on a typical server, and in
practice only a few cpus handle them anyway. We can afford an atomic
operation instead of using percpu data.

This saves 4096 bytes per cpu and per network namespace.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Nov, 2011: 2 commits

ipv4: PKTINFO doesnt need dst reference · d826eb14

Authored by Eric Dumazet on Nov 09, 2011

On Monday 07 November 2011 at 15:33 +0100, Eric Dumazet wrote:

> At least, in recent kernels we don't change dst->refcnt in the forwarding
> path (using NOREF skb->dst).
>
> One particular point is the atomic_inc(dst->refcnt) we have to perform
> when queuing a UDP packet if the socket asked for PKTINFO (for example, a
> typical DNS server has to set up this option).
>
> I have a patch somewhere that stores the information in skb->cb[] and
> avoids the atomic_{inc|dec}(dst->refcnt).
>

OK, I found it. I did some extra tests and believe it's ready.

[PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference

When a socket uses IP_PKTINFO notifications, we currently force a dst
reference for each received skb. The reader has to access dst to get the
needed information (rt_iif & rt_spec_dst) and must then release the dst
reference.

We also forced a dst reference if the skb was put in the socket backlog,
even without IP_PKTINFO handling. This happens under stress/load.

We can instead store the needed information in skb->cb[], so that only
the softirq handler actually accesses dst, improving cache hit ratios.

This removes two atomic operations per packet, and false sharing as
well.

On a benchmark using a single-threaded receiver (doing only recvmsg()
calls), I can reach 720,000 pps instead of 570,000 pps.

IP_PKTINFO is typically used by DNS servers, and by any multihoming-aware
UDP application.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: reduce percpu needs for icmpmsg mibs · acb32ba3

Authored by Eric Dumazet on Nov 08, 2011

Reading /proc/net/snmp on a machine with a lot of cpus is very expensive
(can be ~88000 us).

This is because ICMPMSG MIB uses 4096 bytes per cpu, and folding values
for all possible cpus can read 16 Mbytes of memory.

ICMP messages are not considered fast path on a typical server, and in
practice only a few cpus handle them anyway. We can afford an atomic
operation instead of using percpu data.

This saves 4096 bytes per cpu and per network namespace.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Nov, 2011: 11 commits

net: rename sk_clone to sk_clone_lock · e56c57d0

Authored by Eric Dumazet on Nov 08, 2011

Make clear that sk_clone() and inet_csk_clone() return a locked socket.

Add a _lock() suffix and kerneldoc.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sch_choke: use skb_header_pointer() · 9ecd04bc

Authored by Eric Dumazet on Nov 08, 2011

Remove the assumption that skb_get_rxhash() makes the IP header and ports
linear, and use skb_header_pointer() instead in choke_match_flow().

This permits __skb_get_rxhash() to use skb_header_pointer() later on.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make ipv6 PKTINFO honour freebind · 2563fa59

Authored by Maciej Żenczykowski on Nov 07, 2011

This just makes it possible to spoof the source IPv6 address on a socket
without having to create and bind a new socket for every source IP
we wish to spoof.
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make ipv6 bind honour freebind · f74024d9

Authored by Maciej Żenczykowski on Nov 07, 2011

This makes native ipv6 bind follow the precedent set by:
  - native ipv4 bind behaviour
  - dual stack ipv4-mapped ipv6 bind behaviour.

This does allow an unprivileged process to spoof its source IPv6
address, just like it currently can spoof its source IPv4 address
(for example when using UDP).
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: fasthandoff with ASCONF at server-node · 34d2d89f

Authored by Michio Honda on Jun 17, 2011

Retransmit chunks to the newly confirmed destination when ASCONF and
HEARTBEAT negotiation succeeds with a single-homed peer.
Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: fasthandoff with ASCONF at mobile-node · ddc4bbee

Authored by Michio Honda on Jun 17, 2011

Perform fast retransmission after the last address has been changed
through ASCONF negotiation.
Signed-off-by: Michio Honda <micchie@sfc.wide.ad.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: better pcpu data alignment · 8ce120f1

Authored by Eric Dumazet on Nov 04, 2011

Tunnels can force an alignment of their percpu data to reduce the number
of cache lines used in the fast path or read in .ndo_get_stats().

percpu_alloc() is a very fine-grained allocator, so any small hole will
be used anyway.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Fix inetpeer expire time information · 2bc8ca40

Authored by Steffen Klassert on Oct 11, 2011

As we update the learned pmtu information on demand, we might
report a negative expiration time value to userspace if the
pmtu information has already expired and we have not sent a
packet to that inetpeer after expiration. With this patch we
report an expiration time of zero to userspace after expiration
until the next packet is sent to that inetpeer.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Fix comments for Nagle algorithm · 6d67e9be

Authored by Feng King on Nov 05, 2011

TCP_NODELAY is weaker than TCP_CORK: when TCP_CORK is set, small
segments will always pass the Nagle test regardless of the TCP_NODELAY
option.
Signed-off-by: Feng King <kinwin2008@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: fix l2tp_udp_recv_core() · e50e705c

Authored by Eric Dumazet on Nov 08, 2011

pskb_may_pull() can change skb->data, so we have to load ptr/optr at the
right place.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: drop packets when source address is multicast · c457338d

Authored by Brian Haley on Nov 08, 2011

RFC 4291 Section 2.7 says multicast addresses must not be used as source
addresses in IPv6 packets, so drop such packets on input and don't
process them further.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Reported-and-Tested-by: Kumar Sanghvi <divinekumar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Nov, 2011: 1 commit

wanrouter: Remove kernel_lock annotations · 039c811c

Authored by Richard Weinberger on Nov 07, 2011

The BKL is gone; these annotations are useless.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Nov, 2011: 4 commits

af_packet: de-inline some helper functions · eea49cc9

Authored by Olof Johansson on Nov 02, 2011

This produced some compiler warnings due to mismatched prototypes. Just
remove most manual inlines; the compiler should be able to figure out
what makes sense to inline and what does not.

net/packet/af_packet.c:252: warning: 'prb_curr_blk_in_use' declared inline after being called
net/packet/af_packet.c:252: warning: previous declaration of 'prb_curr_blk_in_use' was here
net/packet/af_packet.c:258: warning: 'prb_queue_frozen' declared inline after being called
net/packet/af_packet.c:258: warning: previous declaration of 'prb_queue_frozen' was here
net/packet/af_packet.c:248: warning: 'packet_previous_frame' declared inline after being called
net/packet/af_packet.c:248: warning: previous declaration of 'packet_previous_frame' was here
net/packet/af_packet.c:251: warning: 'packet_increment_head' declared inline after being called
net/packet/af_packet.c:251: warning: previous declaration of 'packet_increment_head' was here
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: Chetan Loke <loke.chetan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add back alignment for size for __alloc_skb · bc417e30

Authored by Tony Lindgren on Nov 02, 2011

Commit 87fb4b7b (net: more
accurate skb truesize) changed the alignment of size. This
can cause problems at least on some machines with NFS root:

Unhandled fault: alignment exception (0x801) at 0xc183a43a
Internal error: : 801 [#1] PREEMPT
Modules linked in:
CPU: 0    Not tainted  (3.1.0-08784-g5eeee4a #733)
pc : [<c02fbba0>]    lr : [<c02fbb9c>]    psr: 60000013
sp : c180fef8  ip : 00000000  fp : c181f580
r10: 00000000  r9 : c044b28c  r8 : 00000001
r7 : c183a3a0  r6 : c1835be0  r5 : c183a412  r4 : 000001f2
r3 : 00000000  r2 : 00000000  r1 : ffffffe6  r0 : c183a43a
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: 10004000  DAC: 00000017
Process swapper (pid: 1, stack limit = 0xc180e270)
Stack: (0xc180fef8 to 0xc1810000)
fee0:                                                       00000024 00000000
ff00: 00000000 c183b9c0 c183b8e0 c044b28c c0507ccc c019dfc4 c180ff2c c0503cf8
ff20: c180ff4c c180ff4c 00000000 c1835420 c182c740 c18349c0 c05233c0 00000000
ff40: 00000000 c00e6bb8 c180e000 00000000 c04dd82c c0507e7c c050cc18 c183b9c0
ff60: c05233c0 00000000 00000000 c01f34f4 c0430d70 c019d364 c04dd898 c04dd898
ff80: c04dd82c c0507e7c c180e000 00000000 c04c584c c01f4918 c04dd898 c04dd82c
ffa0: c04ddd28 c180e000 00000000 c0008758 c181fa60 3231d82c 00000037 00000000
ffc0: 00000000 c04dd898 c04dd82c c04ddd28 00000013 00000000 00000000 00000000
ffe0: 00000000 c04b2224 00000000 c04b21a0 c001056c c001056c 00000000 00000000
Function entered at [<c02fbba0>] from [<c019dfc4>]
Function entered at [<c019dfc4>] from [<c01f34f4>]
Function entered at [<c01f34f4>] from [<c01f4918>]
Function entered at [<c01f4918>] from [<c0008758>]
Function entered at [<c0008758>] from [<c04b2224>]
Function entered at [<c04b2224>] from [<c001056c>]
Code: e1a00005 e3a01028 ebfa7cb0 e35a0000 (e5858028)

Here PC is at __alloc_skb and &shinfo->dataref is unaligned because
skb->end can be unaligned without this patch.

As explained by Eric Dumazet <eric.dumazet@gmail.com>, this happens
only with SLOB, and not with SLAB or SLUB:

* Eric Dumazet <eric.dumazet@gmail.com> [111102 15:56]:
>
> Your patch is absolutely needed, I completely forgot about SLOB :(
>
> since, kmalloc(386) on SLOB gives exactly ksize=386 bytes, not nearest
> power of two.
>
> [   60.305763] malloc(size=385)->ffff880112c11e38 ksize=386 -> nsize=2
> [   60.305921] malloc(size=385)->ffff88007c92ce28 ksize=386 -> nsize=2
> [   60.306898] malloc(size=656)->ffff88007c44ad28 ksize=656 -> nsize=272
> [   60.325385] malloc(size=656)->ffff88007c575868 ksize=656 -> nsize=272
> [   60.325531] malloc(size=656)->ffff88011c777230 ksize=656 -> nsize=272
> [   60.325701] malloc(size=656)->ffff880114011008 ksize=656 -> nsize=272
> [   60.346716] malloc(size=385)->ffff880114142008 ksize=386 -> nsize=2
> [   60.346900] malloc(size=385)->ffff88011c777690 ksize=386 -> nsize=2
Signed-off-by: Tony Lindgren <tony@atomide.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add missing bh_unlock_sock() calls · 918eb399

Authored by Eric Dumazet on Nov 02, 2011

Simon Kirby reported lockdep warnings and the following messages:

[104661.897577] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

[104661.923653] huh, entered softirq 3 NET_RX ffffffff81613740
preempt_count 00000101, exited with 00000102?

Problem comes from commit 0e734419
(ipv4: Use inet_csk_route_child_sock() in DCCP and TCP.)

If inet_csk_route_child_sock() returns NULL, we should release the socket
lock before freeing the socket.

Another lock imbalance has existed since commit 093d2823 (tproxy: fix
hash locking issue when using port redirection in __inet_inherit_port())
if __inet_inherit_port() returns an error; a backport is also needed
for 2.6.37 and later kernels.
Reported-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Balazs Scheidler <bazsi@balabit.hu>
CC: KOVACS Krisztian <hidden@balabit.hu>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Simon Kirby <sim@hostway.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: fix race in l2tp_recv_dequeue() · e2e210c0

Authored by Eric Dumazet on Nov 02, 2011

Misha Labjuk reported panics occurring in l2tp_recv_dequeue().

If we release reorder_q.lock, we must not keep a dangling pointer (tmp),
since another thread could manipulate reorder_q in the meantime.

Instead we must restart the scan at the beginning of the list.
Reported-by: Misha Labjuk <spiked.yar@gmail.com>
Tested-by: Misha Labjuk <spiked.yar@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Nov, 2011: 4 commits

mac80211: disable powersave for broken APs · 05cb9108

Authored by Johannes Berg on Oct 28, 2011

Only AID values 1-2007 are valid, but some APs have been
found to send random bogus values. In the reported case, the
AP was sending the AID field value 0xffff, i.e. an AID of
0x3fff (16383).

There isn't much we can do but disable powersave since
there's no way it can work properly in this case.

Cc: stable@vger.kernel.org
Reported-by: Bill C Riemers <briemers@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: Fix TDLS support validation in add_station handler · e3a4cc2f

Authored by Jouni Malinen on Oct 23, 2011

We need to verify whether the command is successful before allocating
the station entry to avoid extra processing. This also fixes a memory
leak on the error path.
Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: config hw when going back on-channel · 6911bf04

Authored by Eliad Peller on Oct 20, 2011

When going back on-channel, we should reconfigure
the hw iff the hardware is not already configured
to the operational channel.
Signed-off-by: Eliad Peller <eliad@wizery.com>
Cc: stable@kernel.org # 2.6.39+
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: fix remain_off_channel regression · eaa7af2a

Authored by Eliad Peller on Oct 20, 2011

The offchannel code is currently broken: we should
remain_off_channel if the work was started, and
the work's channel and channel_type are the same
as local->tmp_channel and local->tmp_channel_type.

However, if wk->chan_type and local->tmp_channel_type
coexist (e.g. have the same channel type), we won't
remain_off_channel.

This behavior was introduced by commit da2fd1f0
("mac80211: Allow work items to use existing
channel type.")
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eliad Peller <eliad@wizery.com>
Cc: stable@kernel.org # 2.6.39+
Signed-off-by: John W. Linville <linville@tuxdriver.com>

02 Nov, 2011: 5 commits

udp: fix a race in encap_rcv handling · 0ad92ad0

Authored by Eric Dumazet on Nov 01, 2011

udp_queue_rcv_skb() has a possible race in encap_rcv handling, since
the encap_rcv pointer can be changed at any time.

We should use ACCESS_ONCE() to close the race.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

x25: Fix NULL dereference in x25_recvmsg · 501e89d3

Authored by Dave Jones on Nov 01, 2011

Commit cb101ed2 in 3.0 introduced a bug in x25_recvmsg(): when passed
bogus junk from userspace, x25->neighbour can be NULL, as shown in this
oops:

BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
IP: [<ffffffffa05482bd>] x25_recvmsg+0x4d/0x280 [x25]
PGD 1015f3067 PUD 105072067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU 0
Pid: 27928, comm: iknowthis Not tainted 3.1.0+ #2 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H
RIP: 0010:[<ffffffffa05482bd>]  [<ffffffffa05482bd>] x25_recvmsg+0x4d/0x280 [x25]
RSP: 0018:ffff88010c0b7cc8  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88010c0b7d78 RCX: 0000000000000c02
RDX: ffff88010c0b7d78 RSI: ffff88011c93dc00 RDI: ffff880103f667b0
RBP: ffff88010c0b7d18 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880103f667b0
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f479ce7f700(0000) GS:ffff88012a600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000000001c CR3: 000000010529e000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process iknowthis (pid: 27928, threadinfo ffff88010c0b6000, task ffff880103faa4f0)
Stack:
 0000000000000c02 0000000000000c02 ffff88010c0b7d18 ffffff958153cb37
 ffffffff8153cb60 0000000000000c02 ffff88011c93dc00 0000000000000000
 0000000000000c02 ffff88010c0b7e10 ffff88010c0b7de8 ffffffff815372c2
Call Trace:
 [<ffffffff8153cb60>] ? sock_update_classid+0xb0/0x180
 [<ffffffff815372c2>] sock_aio_read.part.10+0x142/0x150
 [<ffffffff812d6752>] ? inode_has_perm+0x62/0xa0
 [<ffffffff815372fd>] sock_aio_read+0x2d/0x40
 [<ffffffff811b05e2>] do_sync_read+0xd2/0x110
 [<ffffffff812d3796>] ? security_file_permission+0x96/0xb0
 [<ffffffff811b0a91>] ? rw_verify_area+0x61/0x100
 [<ffffffff811b103d>] vfs_read+0x16d/0x180
 [<ffffffff811b109d>] sys_read+0x4d/0x90
 [<ffffffff81657282>] system_call_fastpath+0x16/0x1b
Code: 8b 66 20 4c 8b 32 48 89 d3 48 89 4d b8 45 89 c7 c7 45 cc 95 ff ff ff 4d 85 e4 0f 84 ed 01 00 00 49 8b 84 24 18 05 00 00 4c 89 e7
 78 1c 01 45 19 ed 31 f6 e8 d5 37 ff e0 41 0f b6 44 24 0e 41
Signed-off-by: Dave Jones <davej@redhat.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make the tcp and udp file_operations for the /proc stuff const · 73cb88ec

Authored by Arjan van de Ven on Oct 30, 2011

The tcp and udp code creates a set of struct file_operations at runtime,
while this can also be done at compile time, with the added benefit of
then having these file operations be const.

The trickiest part was getting the THIS_MODULE reference right; the naive
method of declaring a struct at the place of registration would not work
for this reason.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: Don't propagate flag changes on down interfaces. · deede2fa

Authored by Matthijs Kooijman on Oct 31, 2011

When (de)configuring a vlan interface, the IFF_ALLMULTI and IFF_PROMISC
flags are cleared or set on the underlying interface. So, if these flags
are changed on a vlan interface that is not up, the flags on the
underlying interface might be set or cleared twice.

Only propagating flag changes when a device is up makes sure this does
not happen. It also makes sure that an underlying device is not set to
promiscuous or allmulti mode for a vlan device that is down.
Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: Kill bogus SMP protected debugging message. · 045f7b3b

Authored by David S. Miller on Nov 01, 2011

Whatever situations make this state legitimate when SMP would also
be legitimate when !SMP and, for example, preemption is
enabled.

This is dubious enough that we should just delete it entirely.  If we
want to add debugging for neigh timer races, better and more thorough
mechanisms are needed.
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Nov, 2011: 4 commits

netfilter: do not propagate nf_queue errors in nf_hook_slow · 563e1232

Authored by Florian Westphal on Oct 31, 2011

Commit f1585086
(netfilter: nfnetlink_queue: return error number to caller)
erroneously assigns the return value of nf_queue() to the "ret" variable.

This can cause bogus return values if we encounter a QUEUE verdict
when bypassing is enabled, the listener does not exist, and the
next hook returns NF_STOLEN.

In this case nf_hook_slow returned -ESRCH instead of 0.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipv6: fix afinfo->route refcnt leak on error · 2dad81ad

Authored by Florian Westphal on Oct 19, 2011

Several callers (h323 conntrack, xt_addrtype) assume that the
returned **dst only needs to be released if the function returns 0.

This is true for the ipv4 implementation, but not for the ipv6 one.

Instead of changing the users, change the ipv6 implementation
to behave like the ipv4 version by only providing the dst_entry result
in the success case.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: Remove unused variable "cs" from ip_vs_leave function. · ad542ced

Authored by Krzysztof Wilczynski on Oct 18, 2011

This is to address the following warning during compilation time:

  net/netfilter/ipvs/ip_vs_core.c: In function ‘ip_vs_leave’:
  net/netfilter/ipvs/ip_vs_core.c:532: warning: unused variable ‘cs’

This variable is indeed no longer in use.
Signed-off-by: Krzysztof Wilczynski <krzysztof.wilczynski@linux.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

netfilter: Remove unnecessary OOM logging messages · 0a9ee813

Authored by Joe Perches on Aug 29, 2011

Site specific OOM messages are duplications of a generic MM
out of memory message and aren't really useful, so just
delete them.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
