Commits · 551eaff1b384cc107eab6332ba8424b3ca1f304b · openeuler / raspberrypi-kernel

22 Nov, 2010 2 commits

pktgen: allow faster module unload · 551eaff1

Authored by Eric Dumazet on Nov 21, 2010

Unloading the pktgen module takes ~6 seconds on a 64-CPU machine, to stop
64 kthreads.

Add a pktgen_exiting variable to let kernel threads die faster, so that
kthread_stop() doesn't have to wait too long for them. This variable is
not tested in the fast path.

Note: before exiting from pktgen_thread_worker(), we must make sure
kthread_stop() is waiting for this thread to be stopped, as is done
in kernel/softirq.c
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: use vzalloc() · bbce5a59

Authored by Eric Dumazet on Nov 20, 2010

alloc_one_pg_vec_page() is supposed to return zeroed memory, so use
vzalloc() instead of vmalloc()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Nov, 2010 8 commits

X25: remove bkl in routing ioctls · 0670b8ae

Authored by andrew hendry on Nov 18, 2010

Routing doesn't use the socket data and is protected by x25_route_list_lock.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

X25: remove bkl in inq and outq ioctls · 54aafbd4

Authored by andrew hendry on Nov 18, 2010

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

X25: remove bkl in timestamp ioctls · 1ecd66bf

Authored by andrew hendry on Nov 18, 2010

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

X25: pushdown bkl in ioctls · 70be998c

Authored by andrew hendry on Nov 18, 2010

Push down the bkl in the ioctls so they can be removed one at a time.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

filter: use reciprocal divide · c26aed40

Authored by Eric Dumazet on Nov 18, 2010

At compile time, we can replace the DIV_K instruction (divide by a
constant value) with a reciprocal divide.

At exec time, the expensive divide is replaced by a multiply, a less
expensive operation on most processors.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
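
The reciprocal-divide idea in the commit above can be sketched in plain C. This is a minimal illustration, not the kernel's code: the names `recip_value` and `recip_divide` are hypothetical, and this simple form is only claimed to be exact for small dividends such as the ones shown, not for the full 32-bit range.

```c
#include <stdint.h>

/*
 * Hypothetical helper: precompute R = ceil(2^32 / k) once, when the
 * filter is checked. The caller must ensure k >= 2, because for k == 1
 * the result does not fit in 32 bits.
 */
static uint32_t recip_value(uint32_t k)
{
    uint64_t val = (1ULL << 32) + (k - 1);

    return (uint32_t)(val / k);
}

/*
 * Hypothetical helper: approximate a / k as (a * R) >> 32, so the
 * per-packet work is a multiply and a shift instead of a divide.
 */
static uint32_t recip_divide(uint32_t a, uint32_t r)
{
    return (uint32_t)(((uint64_t)a * r) >> 32);
}
```

The one real division happens in `recip_value()`, which would run once per filter, while `recip_divide()` would run once per packet.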

filter: cleanup codes[] init · 8c1592d6

Authored by Eric Dumazet on Nov 18, 2010

Starting the translated instruction numbering at 1 instead of 0 allows us to
remove one decrement at check time and makes codes[] array init
cleaner.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
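
A small sketch of why 1-based internal opcodes make a lookup table cleaner: unlisted slots of a static array are zero, so 0 can serve directly as the "invalid" marker. All opcode names and numbers below are invented for illustration and are not the kernel's values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical internal opcodes; numbering starts at 1 so 0 means "no translation". */
enum { OP_RET_K = 1, OP_LD_IMM, OP_ADD_K };

/* Hypothetical user-visible opcode numbers, invented for this sketch. */
enum { USER_LD_IMM = 0x01, USER_ADD_K = 0x04, USER_RET_K = 0x06 };

static const uint8_t codes[16] = {
    [USER_LD_IMM] = OP_LD_IMM,
    [USER_ADD_K]  = OP_ADD_K,
    [USER_RET_K]  = OP_RET_K,
    /* every other slot is implicitly 0, i.e. invalid */
};

/* Returns the internal opcode, or 0 when the user opcode is unknown. */
static uint8_t translate(uint16_t user_code)
{
    if (user_code >= sizeof(codes) / sizeof(codes[0]))
        return 0;
    return codes[user_code];
}
```

With 0-based internal opcodes, the table would have to store each value plus one and the checker would have to subtract it again.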

filter: optimize sk_run_filter · 93aaae2e

Authored by Eric Dumazet on Nov 19, 2010

Remove the pc variable to avoid the arithmetic needed to compute fentry at
each filter instruction. Jumps directly manipulate the fentry pointer.

As the last instruction of filter[] is guaranteed to be a RETURN, and
all jumps are before the last instruction, we don't need to check filter
bounds (number of instructions in filter array) at each iteration, so we
remove it from sk_run_filter() params.

On x86_32, remove the f_k var introduced in commit 57fe93b3
(filter: make sure filters dont read uninitialized memory)

Note: we could use a CONFIG_ARCH_HAS_{FEW|MANY}_REGISTERS in order to
avoid too many ifdefs in this code.

This helps the compiler use CPU registers to hold fentry and the A
accumulator.

On x86_32, this saves 401 bytes and, more importantly, sk_run_filter()
runs much faster because of lower register pressure (one less conditional
branch per BPF instruction).

# size net/core/filter.o net/core/filter_pre.o
   text    data     bss     dec     hex filename
   2948       0       0    2948     b84 net/core/filter.o
   3349       0       0    3349     d15 net/core/filter_pre.o

on x86_64 :
# size net/core/filter.o net/core/filter_pre.o
   text    data     bss     dec     hex filename
   5173       0       0    5173    1435 net/core/filter.o
   5224       0       0    5224    1468 net/core/filter_pre.o
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
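
The pointer-driven loop described above can be illustrated with a toy interpreter. This is a sketch under stated assumptions: the instruction set, `struct insn`, and `run_filter()` are hypothetical and far smaller than the real filter code; the point is only that a validated program ending in a return needs no per-iteration bounds check.

```c
#include <stdint.h>

/* Hypothetical opcodes for this sketch. */
enum { OP_LD_IMM = 1, OP_ADD_K, OP_JEQ_K, OP_RET_A, OP_RET_K };

struct insn {
    uint8_t code;
    uint8_t jt;   /* instructions to skip when the test is true */
    uint8_t jf;   /* instructions to skip when the test is false */
    uint32_t k;
};

/*
 * Runs a program that was validated beforehand: the last instruction is
 * a return and no jump goes past it, so the loop carries no instruction
 * count and no index variable.
 */
static uint32_t run_filter(const struct insn *fentry)
{
    uint32_t A = 0;

    for (;; fentry++) {
        switch (fentry->code) {
        case OP_LD_IMM: A = fentry->k; break;
        case OP_ADD_K:  A += fentry->k; break;
        case OP_JEQ_K:  fentry += (A == fentry->k) ? fentry->jt : fentry->jf; break;
        case OP_RET_A:  return A;
        case OP_RET_K:  return fentry->k;
        default:        return 0;   /* unknown opcode: reject */
        }
    }
}

/* Loads 5, adds 2, and returns A only if it equals 7; otherwise returns 0. */
static const struct insn sample_prog[] = {
    { OP_LD_IMM, 0, 0, 5 },
    { OP_ADD_K,  0, 0, 2 },
    { OP_JEQ_K,  1, 0, 7 },
    { OP_RET_K,  0, 0, 0 },
    { OP_RET_A,  0, 0, 0 },
};
```

Jumps add their offset to `fentry` directly, and the loop's own `fentry++` then moves to the next instruction.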

net: fix kernel-doc for sk_filter_rcu_release · 0302b862

Authored by Randy Dunlap on Nov 18, 2010

Fix kernel-doc warning for sk_filter_rcu_release():

Warning(net/core/filter.c:586): missing initial short description on line:
 * 	sk_filter_rcu_release: Release a socket filter by rcu_head
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Nov, 2010 12 commits

netfilter: fix IP_VS dependencies · dba4490d

Authored by Patrick McHardy on Nov 18, 2010

When NF_CONNTRACK is enabled, IP_VS uses conntrack symbols.
Therefore IP_VS can't be linked statically when conntrack
is built modular.
Reported-by: Justin P. Mattock <justinmattock@gmail.com>
Tested-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: irda: irttp: sync error paths of data- and udata-requests · 925e277f

Authored by Wolfram Sang on Nov 16, 2010

irttp_data_request() returns meaningful error codes, while irttp_udata_request()
just returns -1 in similar situations. Sync the two and the loglevels of the
accompanying output.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Expose reachable and retrans timer values as msecs · 18a31e1e

Authored by Thomas Graf on Nov 17, 2010

Expose reachable and retrans timer values in msecs instead of jiffies.
Both timer values are already exposed as msecs in the neighbour table
netlink interface.

The creation timestamp format with increased precision is kept but
cleaned up.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Expose IFLA_PROTINFO timer values in msecs instead of jiffies · 93908d19

Authored by Thomas Graf on Nov 17, 2010

IFLA_PROTINFO exposes timer-related per-device settings in jiffies.
Change it to expose these values in msecs like the sysctl interface
does.

I did not find any users of IFLA_PROTINFO which rely on any of these
values and even if there are, they are likely already broken because
there is no way for them to reliably convert such a value to another
time format.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
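
To make the conversion problem concrete: turning a tick count into milliseconds requires knowing the tick rate, which a consumer of the raw value may not have. The sketch below assumes a tick rate of 250 per second purely for illustration; `SKETCH_HZ` and `ticks_to_msecs()` are hypothetical names.

```c
#include <stdint.h>

/* Assumed tick rate for this sketch; the real rate depends on how the kernel was built. */
#define SKETCH_HZ 250u

/* Converts a tick count to milliseconds using the assumed tick rate. */
static uint32_t ticks_to_msecs(uint32_t ticks)
{
    return (uint32_t)(((uint64_t)ticks * 1000u) / SKETCH_HZ);
}
```

The same raw value of 250 would mean 1000 ms here but a different duration under another tick rate, which is why exporting msecs is the more portable choice.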

igmp: refine skb allocations · 57e1ab6e

Authored by Eric Dumazet on Nov 16, 2010

IGMP allocates MTU-sized skbs. This may fail for large MTUs (order-2
allocations), so add a fallback to try smaller sizes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
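
A fallback of this kind can be sketched in user space. The halving step, the 256-byte floor, and the helper names are choices made for this illustration, not a statement of what the patch does; `alloc_limited()` is a hypothetical stand-in for an allocator that fails on large requests.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MIN_SIZE 256u  /* assumed lower bound for this sketch */

/* Hypothetical allocator that refuses requests above `limit`. */
static void *alloc_limited(size_t size, size_t limit)
{
    return size > limit ? NULL : malloc(size);
}

/* Try the full size first, then halve until it succeeds or the floor is reached. */
static void *alloc_with_fallback(size_t want, size_t limit, size_t *got)
{
    size_t size = want;

    for (;;) {
        void *buf = alloc_limited(size, limit);

        if (buf) {
            *got = size;
            return buf;
        }
        if (size <= SKETCH_MIN_SIZE)
            break;              /* already at the floor: give up */
        size /= 2;
        if (size < SKETCH_MIN_SIZE)
            size = SKETCH_MIN_SIZE;
    }
    *got = 0;
    return NULL;
}
```

The caller learns the size it actually received through `got` and has to size its payload accordingly.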

net: move definitions of BPF_S_* to net/core/filter.c · 4c3710af

Authored by Changli Gao on Nov 16, 2010

BPF_S_* are used internally and should not be exposed to others.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

filter: Optimize instruction revalidation code. · cba328fc

Authored by Tetsuo Handa on Nov 16, 2010

Since repeating the u16 value to u8 value conversion using a switch() clause's
case statements is wasteful, this patch introduces a u16 to u8 mapping table
and removes most of the case statements. As a result, the size of net/core/filter.o
is reduced by about 29% on x86.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add priority field to pktgen · 9e50e3ac

Authored by John Fastabend on Nov 16, 2010

Add an option to set the skb priority in pktgen. Useful for testing
QoS features. Also, by running pktgen on the vlan device, the
qdisc on the real device can be tested.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: zero kobject in rx_queue_release · 7d8e76bf

Authored by John Fastabend on Nov 16, 2010

netif_set_real_num_rx_queues() can decrement and increment
the number of rx queues. For example ixgbe does this as
features and offloads are toggled. Presumably this could
also happen across down/up on most devices if the available
resources changed (cpu offlined).

The kobject needs to be zero'd in this case so that the
state is not preserved across kobject_put()/kobject_init_and_add().

This resolves the following error report.

ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong.
Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169
Call Trace:
 [<ffffffff8121c940>] kobject_init+0x3a/0x83
 [<ffffffff8121cf77>] kobject_init_and_add+0x23/0x57
 [<ffffffff8107b800>] ? mark_lock+0x21/0x267
 [<ffffffff813c6d11>] net_rx_queue_update_kobjects+0x63/0xc6
 [<ffffffff813b5e0e>] netif_set_real_num_rx_queues+0x5f/0x78
 [<ffffffffa0261d49>] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe]
 [<ffffffffa0262509>] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe]
 [<ffffffffa0274596>] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe]
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp ccid-2: whitespace fix-up · f72f2f4c

Authored by Gerrit Renker on Nov 18, 2010

This fixes whitespace noise introduced in commit "dccp ccid-2: Algorithm to
update buffer state", 5753fdfe, 14 Nov 2010.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: IGMP handling cleanup · 866f3b25

Authored by Eric Dumazet on Nov 18, 2010

Instead of iterating in_dev->mc_list from the bonding driver, it's better
to call a helper function provided by igmp.c
Details of implementation (locking) are private to igmp code.

ip_mc_rejoin_group(struct ip_mc_list *im) becomes
ip_mc_rejoin_groups(struct in_device *in_dev);
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cfg80211: fix can_beacon_sec_chan, reenable HT40 · 09a02fdb

Authored by Mark Mentovai on Nov 17, 2010

This follows wireless-testing 9236d838
("cfg80211: fix extension channel checks to initiate communication") and
fixes accidental case fall-through. Without this fix, HT40 is entirely
blocked.
Signed-off-by: Mark Mentovai <mark@moxienet.com>
Cc: stable@kernel.org
Acked-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

18 Nov, 2010 10 commits

net: ipv4: tcp_probe: cleanup snprintf() use · dda0b386

Authored by Vasiliy Kulikov on Nov 14, 2010

snprintf() returns the number of bytes that were copied if there is no overflow.
This code uses the return value as the number of copied bytes. Theoretically the format
string '%lu.%09lu %pI4:%u %pI4:%u %d %#x %#x %u %u %u %u\n' may be expanded
up to 163 bytes. In reality tv.tv_sec is just a few bytes instead of 20, the 2 ports
are just 5 bytes each instead of 10, and length is 5 bytes instead of 10. The rest
is untrusted input. Theoretically, if tv_sec is big, then copy_to_user() would
overflow tbuf.

tbuf was increased to fit in 163 bytes. snprintf() is used to follow return
value semantics.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
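
The return-value pitfall can be shown with standard C `snprintf()`, which on truncation returns the length the full output would have had, not the number of bytes stored. The helper below is a hypothetical sketch that clamps the result to what is really in the buffer; it uses a shortened format string rather than the one quoted in the commit.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats into buf and returns how many bytes are actually stored, excluding the NUL. */
static size_t format_clamped(char *buf, size_t size,
                             unsigned long sec, unsigned long nsec)
{
    int n = snprintf(buf, size, "%lu.%09lu\n", sec, nsec);

    if (n < 0 || size == 0)
        return 0;
    if ((size_t)n >= size)      /* output was truncated */
        return size - 1;
    return (size_t)n;
}
```

Passing the unclamped return value as a copy length would read past the end of the buffer whenever the output was truncated.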

net: zero kobject in rx_queue_release · 9ea19481

Authored by John Fastabend on Nov 16, 2010

netif_set_real_num_rx_queues() can decrement and increment
the number of rx queues. For example ixgbe does this as
features and offloads are toggled. Presumably this could
also happen across down/up on most devices if the available
resources changed (cpu offlined).

The kobject needs to be zero'd in this case so that the
state is not preserved across kobject_put()/kobject_init_and_add().

This resolves the following error report.

ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong.
Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169
Call Trace:
 [<ffffffff8121c940>] kobject_init+0x3a/0x83
 [<ffffffff8121cf77>] kobject_init_and_add+0x23/0x57
 [<ffffffff8107b800>] ? mark_lock+0x21/0x267
 [<ffffffff813c6d11>] net_rx_queue_update_kobjects+0x63/0xc6
 [<ffffffff813b5e0e>] netif_set_real_num_rx_queues+0x5f/0x78
 [<ffffffffa0261d49>] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe]
 [<ffffffffa0262509>] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe]
 [<ffffffffa0274596>] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe]
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use the macros defined for the members of flowi · 5811662b

Authored by Changli Gao on Nov 12, 2010

Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: Integer overflow in RDS cmsg handling · 218854af

Authored by Dan Rosenberg on Nov 17, 2010

In rds_cmsg_rdma_args(), the user-provided args->nr_local value is
restricted to less than UINT_MAX. This seems to need a tighter upper
bound, since the calculation of total iov_size can overflow, resulting
in a small sock_kmalloc() allocation. This would probably just result
in walking off the heap and crashing when calling rds_rdma_pages() with
a high count value. If it somehow doesn't crash here, then memory
corruption could occur soon after.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
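
The class of bug described above, a count multiplied by an element size wrapping around before allocation, can be guarded against with a checked multiply. This is a generic sketch with a hypothetical helper name; it does not claim to be the bound the patch itself applies.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Computes count * elem_size into *total; returns false if the product would overflow size_t. */
static bool checked_array_size(uint64_t count, size_t elem_size, size_t *total)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return false;
    *total = (size_t)count * elem_size;
    return true;
}
```

A caller would refuse the request when the function returns false instead of allocating a buffer that is too small for the count it later iterates over.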

ipv6: AF_INET6 link address family · b382b191

Authored by Thomas Graf on Nov 16, 2010

IPv6 already exposes some address family data via netlink in the
IFLA_PROTINFO attribute if an RTM_GETLINK request is sent with the
address family set to AF_INET6. We take over this format and
reuse all the code.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: AF_INET link address family · 9f0f7272

Authored by Thomas Graf on Nov 16, 2010

Implements the AF_INET link address family exposing the per
device configuration settings via netlink using the attribute
IFLA_INET_CONF.

The format of IFLA_INET_CONF differs depending on the direction
the attribute is sent. The attribute sent by the kernel consists
of a u32 array, basically a 1:1 copy of in_device->cnf.data[].
The attribute expected by the kernel must consist of a sequence
of nested u32 attributes, each representing a change request,
e.g.
	[IFLA_INET_CONF] = {
		[IPV4_DEVCONF_FORWARDING] = 1,
		[IPV4_DEVCONF_NOXFRM] = 0,
	}

libnl userspace API documentation and example available from:
http://www.infradead.org/~tgr/libnl/doc-git/group__link__inet.html
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: Link address family API · f8ff182c

Authored by Thomas Graf on Nov 16, 2010

Each net_device contains address family specific data such as
per device settings and statistics. We already expose this data
via procfs/sysfs and partially netlink.

The netlink method requires the requester to send one RTM_GETLINK
request for each address family it wishes to receive data of
and then merge this data itself.

This patch implements a new API which combines all address family
specific link data in a new netlink attribute IFLA_AF_SPEC.
IFLA_AF_SPEC contains a sequence of nested attributes, one for each
address family which in turn defines the structure of its own
attribute. Example:

   [IFLA_AF_SPEC] = {
       [AF_INET] = {
           [IFLA_INET_CONF] = ...,
       },
       [AF_INET6] = {
           [IFLA_INET6_FLAGS] = ...,
           [IFLA_INET6_CONF] = ...,
       }
   }

The API also allows for address families to implement a function
which parses the IFLA_AF_SPEC attribute sent by userspace to
implement address family specific link options.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

network: tcp_connect should return certain errors up the stack · ee586811

Authored by Eric Paris on Nov 16, 2010

The current tcp_connect code completely ignores errors from sending an skb.
This makes sense in many situations (like -ENOBUFS) but I want to be able to
immediately fail connections if they are denied by the SELinux netfilter hook.
Netfilter does not normally return ECONNREFUSED when it drops a packet, so we
respect that error code as a final and fatal error that cannot be recovered.
Based-on-patch-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
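
The policy described above, ignore most transmit errors but treat ECONNREFUSED as final, fits in a few lines. `connect_result()` is a hypothetical helper written for this illustration and is not the function the patch touches.

```c
#include <errno.h>

/* Hypothetical helper: what a connect attempt reports for a given transmit error. */
static int connect_result(int xmit_err)
{
    if (xmit_err == -ECONNREFUSED)
        return xmit_err;    /* final: fail the connection immediately */
    return 0;               /* anything else, e.g. -ENOBUFS, is treated as transient */
}
```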

netfilter: allow hooks to pass error code back up the stack · da683650

Authored by Eric Paris on Nov 16, 2010

SELinux would like to pass certain fatal errors back up the stack. This patch
implements the generic netfilter support for this functionality.
Based-on-patch-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
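
One way a hook could hand an error code back alongside its verdict is to pack the errno into the unused high bits of the verdict word. The sketch below assumes that layout for illustration only: the `SKETCH_*` constants, the 16-bit split, and the -EPERM default are assumptions, and none of the names are real netfilter identifiers.

```c
#include <errno.h>

#define SKETCH_DROP          0u       /* hypothetical "drop" verdict code */
#define SKETCH_VERDICT_BITS  16       /* low bits: verdict, high bits: errno */
#define SKETCH_VERDICT_MASK  0xffffu

/* Builds a drop verdict that carries a positive errno value in its high bits. */
static unsigned int drop_with_errno(int err)
{
    return ((unsigned int)err << SKETCH_VERDICT_BITS) | SKETCH_DROP;
}

/* Extracts the error for a drop verdict; falls back to -EPERM when none was set. */
static int verdict_to_error(unsigned int verdict)
{
    int err = (int)(verdict >> SKETCH_VERDICT_BITS);

    return err ? -err : -EPERM;
}
```

Hooks that return a bare drop verdict keep working unchanged under this scheme, since their high bits are zero.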

net/atm: Remove unnecessary casts of netdev_priv · 37d66800

Authored by Joe Perches on Nov 15, 2010

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Nov, 2010 8 commits

cfg80211: fix extension channel checks to initiate communication · 9236d838

Authored by Luis R. Rodriguez on Nov 12, 2010

When operating in a mode that initiates communication and using
HT40 we should fail if we cannot use both primary and secondary
channels to initiate communication. Our current ht40 allowmap
only covers STA mode of operation; for beaconing modes we need
a check on the fly, as the mode of operation is dynamic and
there are flags other than disable which we should read
to check if we can initiate communication.

Do not allow for initiating communication if our secondary HT40
channel is either disabled, has a passive scan flag, has a
no-ibss flag or is a radar channel. Userspace now has similar
checks but this is also needed in-kernel.
Reported-by: Jouni Malinen <jouni.malinen@atheros.com>
Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

xfrm: update flowi saddr in icmp_send if unset · 7d98ffd8

Authored by Ulrich Weber on Nov 05, 2010

Otherwise xfrm_lookup will fail to find the correct policy.
Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: use atomic_inc_not_zero_hint · c31504dc

Authored by Eric Dumazet on Nov 15, 2010

A UDP socket's refcount is usually 2, unless an incoming frame is going to
be queued in the receive or backlog queue.

Using atomic_inc_not_zero_hint() reduces latency, because the
processor issues fewer memory transactions.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
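
The hint idea can be sketched with C11 atomics: instead of loading the counter before the compare-and-swap, the caller guesses its likely value and goes straight to the swap. `inc_not_zero_hint()` below is a user-space illustration with a hypothetical name, not the kernel helper.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Increments *v unless it is zero, starting from a guessed value to skip the initial load. */
static bool inc_not_zero_hint(atomic_int *v, int hint)
{
    /* A zero hint carries no information, so fall back to reading the counter. */
    int expected = hint ? hint : atomic_load(v);

    while (expected != 0) {
        /* On failure, expected is refreshed with the value actually found. */
        if (atomic_compare_exchange_strong(v, &expected, expected + 1))
            return true;
    }
    return false;   /* counter was zero: the object is already being released */
}
```

When the guess is right, the increment completes in a single atomic operation; when it is wrong, the loop behaves like an ordinary compare-and-swap retry.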

vlan: remove ndo_select_queue() logic · 213b15ca

Authored by Eric Dumazet on Nov 11, 2010

Now that vlan devices are lockless, we don't need special ndo_select_queue() logic.
dev_pick_tx() will do the multiqueue stuff on the real device transmit.
Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: lockless transmit path · 4af429d2

Authored by Eric Dumazet on Nov 10, 2010

vlan is a stacked device, like tunnels. We should use the lockless
mechanism we are using in tunnels and loopback.

This patch completely removes locking in the TX path.

tx stat counters are added into the existing percpu stat structure, renamed
from vlan_rx_stats to vlan_pcpu_stats.

Note: this partially reverts commit 2e59af3d (vlan: multiqueue vlan
device)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: Enhance AF_PACKET implementation to not require high order contiguous... · 0e3125c7

Authored by Neil Horman on Nov 16, 2010

packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)

Version 4 of this patch.

Change notes:
1) Removed extra memset. Didn't think kcalloc added a GFP_ZERO the way kzalloc did :)

Summary:
It was shown to me recently that systems under high load were driven very deep
into swap when tcpdump was run. This happened because the
AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
application to specify how many entries an AF_PACKET socket will have and how
large each entry will be. It seems the default setting for tcpdump is to set
the ring buffer to 32 entries of 64 KB each, which implies 32 order-5
allocations. That's difficult under good circumstances, and horrid under memory
pressure.

I thought it would be good to make that a bit more usable. I was going to do a
simple conversion of the ring buffer from contiguous pages to iovecs, but
unfortunately, the metadata which AF_PACKET places in these buffers can easily
span a page boundary, and given that these buffers get mapped into user space,
and the data layout doesn't easily allow for a change to padding between frames
to avoid that, a simple iovec change is just going to break user space ABI
consistency.

So I've added a three-tiered mechanism to the af_packet set_ring
socket option. It attempts to allocate memory in the following order:

1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without
digging into swap

2) Using vmalloc

3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as
needed to get the memory

The effect is that we don't disturb the system as much when we're under load,
while still being able to conduct tcpdumps effectively.

Tested successfully by me.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
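
The three-tier order listed in the commit above can be modelled in user space. Everything here is a simulation under stated assumptions: `struct alloc_sim` lets each tier be forced to fail, `sim_alloc()` stands in for the real allocators, and the tier names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum alloc_tier { TIER_NONE, TIER_CONTIG_FAST, TIER_VIRTUAL, TIER_CONTIG_HARD };

/* Hypothetical switches that decide which simulated allocator succeeds. */
struct alloc_sim {
    bool fast_ok;
    bool virtual_ok;
    bool hard_ok;
};

/* Stand-in allocator: returns zeroed memory when allowed to succeed. */
static void *sim_alloc(bool ok, size_t size)
{
    return ok ? calloc(1, size) : NULL;
}

/* Tries the three tiers in order and reports which one provided the block. */
static void *alloc_block(const struct alloc_sim *sim, size_t size,
                         enum alloc_tier *tier)
{
    void *buf;

    /* 1) contiguous pages, failing quickly rather than digging into swap */
    buf = sim_alloc(sim->fast_ok, size);
    if (buf) {
        *tier = TIER_CONTIG_FAST;
        return buf;
    }

    /* 2) virtually contiguous memory */
    buf = sim_alloc(sim->virtual_ok, size);
    if (buf) {
        *tier = TIER_VIRTUAL;
        return buf;
    }

    /* 3) contiguous pages again, this time trying as hard as needed */
    buf = sim_alloc(sim->hard_ok, size);
    *tier = buf ? TIER_CONTIG_HARD : TIER_NONE;
    return buf;
}
```

Recording the tier matters in a design like this because memory from different allocators generally has to be released through the matching free routine.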

irda: irttp: allow zero byte packets · 4c62ab9c

Authored by Wolfram Sang on Nov 16, 2010

Sending zero byte packets is not necessarily an error (AF_INET accepts it,
too), so just apply a shortcut. This was discovered because of non-working
software with WINE. See

  http://bugs.winehq.org/show_bug.cgi?id=19397#c86
  http://thread.gmane.org/gmane.linux.irda.general/1643

for very detailed debugging information and a testcase. Kudos to Wolfgang for
those!
Reported-by: Wolfgang Schwotzer <wolfgang.schwotzer@gmx.net>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Mike Evans <mike.evans@cardolan.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix missing in6_ifa_put in addrconf · 9d82ca98

Authored by John Fastabend on Nov 15, 2010

Fix ref count bug introduced by

commit 2de79570
Author: Lorenzo Colitti <lorenzo@google.com>
Date:   Wed Oct 27 18:16:49 2010 +0000

ipv6: addrconf: don't remove address state on ifdown if the address
is being kept

Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr
refcnt correctly with in6_ifa_put().
Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>