Commits · 86911732d3996a9da07914b280621450111bb6da · openanolis / cloud-kernel

30 Jan, 2009 2 commits

gro: Avoid copying headers of unmerged packets · 86911732

Committed by Herbert Xu on Jan 29, 2009

Unfortunately simplicity isn't always the best.  The fraginfo
interface turned out to be suboptimal.  The problem was quite
obvious.  For every packet, we have to copy the headers from
the frags structure into skb->head, even though for 99% of the
packets this part is immediately thrown away after the merge.

LRO didn't have this problem because it directly read the headers
from the frags structure.

This patch attempts to address this by creating an interface
that allows GRO to access the headers in the first frag without
having to copy it.  Because all drivers that use frags place the
headers in the first frag this optimisation should be enough.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

86911732

gro: Move common completion code into helpers · 5d0d9be8

Committed by Herbert Xu on Jan 29, 2009

Currently VLAN still has a bit of common code handling the aftermath
of GRO that's shared with the common path.  This patch moves them
into shared helpers to reduce code duplication.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

5d0d9be8

28 Jan, 2009 3 commits

net: Get rid of by-hand TX queue hashing. · 7019298a

Committed by David S. Miller on Jan 27, 2009

We now only TX hash on pre-computed SKB properties.

The thinking is:

1) High performance routing and firewalling setups will
   have a multiqueue capable card used for receive, and
   therefore would have RX queue recordings made into
   the SKB which can be used for the TX side hash.

2) Locally generated packets will have an attached socket
   and thus a valid sk->sk_hash to make use of.
Signed-off-by: David S. Miller <davem@davemloft.net>

7019298a
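
The two cases listed in this commit message can be sketched as a small selection function. This is a minimal illustration in Python, not the kernel code: `select_tx_queue`, its parameters, the use of `zlib.crc32` as a stand-in mixing function, and the fallback to the protocol number when no socket hash is present are all assumptions made for the example.

```python
import zlib
from typing import Optional


def select_tx_queue(num_tx_queues: int,
                    rx_queue: Optional[int] = None,
                    sk_hash: Optional[int] = None,
                    protocol: int = 0x0800) -> int:
    """Pick a TX queue using only properties already stored on the packet."""
    if num_tx_queues <= 0:
        raise ValueError("num_tx_queues must be positive")

    # Case 1: forwarded traffic. A multiqueue RX driver recorded the RX
    # queue, so reuse it, folded into the valid TX queue range.
    if rx_queue is not None:
        return rx_queue % num_tx_queues

    # Case 2: locally generated traffic. Use the attached socket's hash.
    # Falling back to the protocol number is an assumption of this sketch.
    key = sk_hash if sk_hash else protocol

    # crc32 is only a stand-in for the kernel's hash function.
    mixed = zlib.crc32((key & 0xFFFFFFFF).to_bytes(4, "little"))

    # Scale the 32-bit value into [0, num_tx_queues).
    return (mixed * num_tx_queues) >> 32
```

For example, `select_tx_queue(4, rx_queue=6)` folds the recorded RX queue into a four-queue device, while `select_tx_queue(8, sk_hash=12345)` always maps the same socket to the same queue.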

net: If SKB has attached socket, use socket's hash for TX queue selection. · f7105d63

Committed by David S. Miller on Jan 27, 2009

Signed-off-by: David S. Miller <davem@davemloft.net>

f7105d63

net: Allow RX queue selection to seed TX queue hashing. · d5a9e24a

Committed by David S. Miller on Jan 27, 2009

The idea is that drivers which implement multiqueue RX
pre-seed the SKB by recording the RX queue selected by
the hardware.

If such a seed is found on TX, we'll use that to select
the outgoing TX queue.

This helps get more consistent load balancing on router
and firewall loads.
Signed-off-by: David S. Miller <davem@davemloft.net>

d5a9e24a

27 Jan, 2009 9 commits

Phonet: use per-namespace devices list · 9a3b7a42

Committed by remi.denis-courmont@nokia on Jan 23, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9a3b7a42

Phonet: remove useless locking in device cleanup · 6530e0fe

Committed by remi.denis-courmont@nokia on Jan 23, 2009

Incoming packets and sockets are already gone.
The netdevice notifier is unregistered under the RTNL lock.
There remains a race with the unregistration of the rtnetlink handlers, but it
is a generic RTNL issue that was already present before this change.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6530e0fe

Phonet: handle rtnetlink registration failure · 660f706d

Committed by remi.denis-courmont@nokia on Jan 23, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

660f706d

Phonet: allow phonet_device_init() to fail, put it to __init section · 76e02cf6

Committed by remi.denis-courmont@nokia on Jan 23, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76e02cf6

Phonet: check destination before delivering packets locally · 4b8f704b

Committed by remi.denis-courmont@nokia on Jan 23, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b8f704b

Phonet: move to Networking options like other protocol stacks · 5075138d

Committed by remi.denis-courmont@nokia on Jan 23, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5075138d

gre: optimize hash lookup · afcf1242

Committed by Timo Teras on Jan 26, 2009

Instead of keeping candidate tunnel device from all categories,
keep only one candidate with best score. This optimizes stack
usage and speeds up exit code.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

afcf1242
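
The idea of keeping a single best-scoring candidate, rather than one candidate per match category, can be sketched as follows. This is an illustration only: the `Tunnel` type, the `mismatch_score` function and its bit values are hypothetical and do not reproduce the scoring used in the GRE code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Tunnel:
    name: str
    remote: Optional[str]  # None means wildcard
    local: Optional[str]   # None means wildcard


def mismatch_score(t: Tunnel, remote: str, local: str) -> Optional[int]:
    """Return None if the tunnel cannot match; otherwise lower is better."""
    score = 0
    if t.remote is None:
        score |= 1          # wildcard remote: weaker match
    elif t.remote != remote:
        return None
    if t.local is None:
        score |= 2          # wildcard local: weaker match
    elif t.local != local:
        return None
    return score


def lookup(tunnels: Iterable[Tunnel], remote: str, local: str) -> Optional[Tunnel]:
    """Single pass that tracks one candidate and its score."""
    best: Optional[Tunnel] = None
    best_score: Optional[int] = None
    for t in tunnels:
        score = mismatch_score(t, remote, local)
        if score is None:
            continue
        if score == 0:
            return t        # exact match, nothing can beat it
        if best_score is None or score < best_score:
            best, best_score = t, score
    return best
```

Because only `best` and `best_score` are carried through the loop, the exit path has a single value to return instead of a set of per-category candidates to compare.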

vlan: Export symbols as non GPL symbols. · 116cb428

Committed by Ben Greear on Jan 26, 2009

In previous kernels, any kernel module could get access to the
'real-device' and the VLAN-ID for a particular VLAN.  In more recent
kernels, the code was restructured such that this is hard to do
without accessing private .h files for any module that cannot use
GPL-only symbols.

Attached is a patch to once again allow non-GPL modules the ability to
access the real-device and VLAN id for VLANs.
Signed-off-by: Ben Greear <greearb@candelatech.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

116cb428

net: Move config NET_NS to from net/Kconfig to init/Kconfig · d6eb633f

Committed by Matt Helsley on Jan 26, 2009

Make NET_NS available underneath the generic Namespaces config option
since all of the other namespace options are there.
Signed-off-by: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d6eb633f

26 Jan, 2009 1 commit

af_key: initialize xfrm encap_oa · a8d694c6

Committed by Timo Teras on Jan 25, 2009

Currently encap_oa is left uninitialized, so it contains garbage data which
is visible to userland via Netlink. Initialize it by zeroing it out.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8d694c6

23 Jan, 2009 15 commits

sctp: Fix another socket race during accept/peeloff · ae53b5bd

Committed by Vlad Yasevich on Jan 22, 2009

There is a race between sctp_rcv() and sctp_accept() where we
have moved the association from the listening socket to the
accepted socket, but sctp_rcv() processing cached the old
socket and continues to use it.

The easy solution is to check for the socket mismatch once we've
grabbed the socket lock.  If we hit a mismatch, that means
that we are currently holding the lock on the listening socket,
but the association is referencing a newly accepted socket.  We need
to drop the lock on the old socket and grab the lock on the new one.

A more proper solution might be to create accepted sockets when
the new association is established, similar to TCP.  That would
eliminate the race for 1-to-1 style sockets, but it would still
exist for 1-to-many sockets where a user wished to peel off an
association.  For now, we'll live with this easy solution as
it addresses the problem.
Reported-by: Michal Hocko <mhocko@suse.cz>
Reported-by: Karsten Keil <kkeil@suse.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae53b5bd
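
The "check for a mismatch after taking the lock" pattern described in this commit can be illustrated with ordinary locks. The sketch below is a user-space analogy in Python, not the SCTP code: `Sock`, `Assoc` and `lock_owner_socket` are hypothetical names, and `threading.Lock` stands in for the kernel socket lock.

```python
import threading


class Sock:
    def __init__(self, name: str) -> None:
        self.name = name
        self.lock = threading.Lock()


class Assoc:
    def __init__(self, sk: Sock) -> None:
        # accept/peeloff may move the association to another socket
        self.sk = sk


def lock_owner_socket(asoc: Assoc, cached_sk: Sock) -> Sock:
    """Lock the socket that owns asoc, starting from a possibly stale pointer.

    Returns the socket whose lock is held on return.
    """
    sk = cached_sk
    sk.lock.acquire()
    if asoc.sk is not sk:
        # We locked the old (listening) socket, but the association now
        # references a newly accepted one: drop the old lock, take the new.
        sk.lock.release()
        sk = asoc.sk
        sk.lock.acquire()
    return sk
```

The caller must release `sk.lock` on the returned socket, not on the socket it originally cached.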

sctp: Properly timestamp outgoing data chunks for rtx purposes · 759af00e

Committed by Vlad Yasevich on Jan 22, 2009

Recent changes to the retransmit code exposed a long-standing
bug where it was possible for a chunk to be timestamped
after the retransmit timer was reset.  This caused a rare
situation where the retransmit timer had expired, but
nothing was marked for retransmission because all of the
timestamps on data were less than 1 RTO ago.  As a result,
the timer was never restarted since nothing was retransmitted,
and this resulted in a hung association that couldn't
complete the data transfer.  The solution is to timestamp
the chunk when it's added to the packet for transmission
purposes.  After the packet is transmitted, the rtx timer
is restarted.  This guarantees that when the timer expires,
there will be data to retransmit.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

759af00e

sctp: Correctly start rtx timer on new packet transmissions. · 6574df9a

Committed by Vlad Yasevich on Jan 22, 2009

Commit 62aeaff5
(sctp: Start T3-RTX timer when fast retransmitting lowest TSN)
introduced a regression where it was possible to forcibly
restart the sctp retransmit timer at the transmission of any
new chunk.  This resulted in much longer timeout times and
sometimes hung sctp connections.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6574df9a

netns: ipmr: enable namespace support in ipv4 multicast routing code · 4feb88e5

Committed by Benjamin Thery on Jan 22, 2009

This last patch makes the appropriate changes to use and propagate the
network namespace where needed in IPv4 multicast routing code.

This consists mainly in replacing all the remaining init_net occurrences
with the current netns pointer retrieved from sockets, net devices or
mfc_caches depending on the routines' contexts.

Some routines receive a new 'struct net' parameter to propagate the current
netns:
* vif_add/vif_delete
* ipmr_new_tunnel
* mroute_clean_tables
* ipmr_cache_find
* ipmr_cache_report
* ipmr_cache_unresolved
* ipmr_mfc_add/ipmr_mfc_delete
* ipmr_get_route
* rt_fill_info (in route.c)
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

4feb88e5

netns: ipmr: declare ipmr /proc/net entries per-namespace · f6bb4514

Committed by Benjamin Thery on Jan 22, 2009

Declare IPv4 multicast forwarding /proc/net entries per-namespace:
/proc/net/ip_mr_vif
/proc/net/ip_mr_cache
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

f6bb4514

netns: ipmr: declare reg_vif_num per-namespace · 6c5143db

Committed by Benjamin Thery on Jan 22, 2009

Preliminary work to make IPv4 multicast routing netns-aware.

Declare variable 'reg_vif_num' per-namespace, move into struct netns_ipv4.

At the moment, this variable is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

6c5143db

netns: ipmr: declare mroute_do_assert and mroute_do_pim per-namespace · 6f9374a9

Committed by Benjamin Thery on Jan 22, 2009

Preliminary work to make IPv4 multicast routing netns-aware.

Declare IPv4 multicast routing variables 'mroute_do_assert' and
'mroute_do_pim' per-namespace in struct netns_ipv4.

At the moment, these variables are only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f9374a9

netns: ipmr: declare counter cache_resolve_queue_len per-namespace · 1e8fb3b6

Committed by Benjamin Thery on Jan 22, 2009

Preliminary work to make IPv4 multicast routing netns-aware.

Declare variable cache_resolve_queue_len per-namespace: move it into
struct netns_ipv4.

This variable counts the number of unresolved cache entries queued in the
list mfc_unres_queue. This list is kept global to all netns as the number
of entries per namespace is limited to 10 (hardcoded in routine
ipmr_cache_unresolved).
Entries belonging to different namespaces in mfc_unres_queue will be
identified by matching the mfc_net member introduced previously in
struct mfc_cache.

Keeping this list global to all netns, also allows us to keep a single
timer (ipmr_expire_timer) to handle their expiration.
In some places cache_resolve_queue_len value was tested for arming
or deleting the timer. These tests were equivalent to testing
mfc_unres_queue value instead and are replaced in this patch.

At the moment, cache_resolve_queue_len is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

1e8fb3b6

netns: ipmr: dynamically allocate mfc_cache_array · 2bb8b26c

Committed by Benjamin Thery on Jan 22, 2009

Preliminary work to make IPv4 multicast routing netns-aware.

Dynamically allocate IPv4 multicast forwarding cache, mfc_cache_array,
and move it to struct netns_ipv4.

At the moment, mfc_cache_array is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

2bb8b26c

netns: ipmr: store netns in struct mfc_cache · 5c0a66f5

Committed by Benjamin Thery on Jan 22, 2009

This patch stores into struct mfc_cache the network namespace each
mfc_cache belongs to. The new member is mfc_net.

mfc_net is assigned at cache allocation and doesn't change during
the rest of the cache entry life.
A new net parameter is added to ipmr_cache_alloc/ipmr_cache_alloc_unres.

This will help to retrieve the current netns around the IPv4 multicast
routing code.

At the moment, all mfc_cache are allocated in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

5c0a66f5

netns: ipmr: dynamically allocate vif_table · cf958ae3

Committed by Benjamin Thery on Jan 22, 2009

Preliminary work to make IPv4 multicast routing netns-aware.

Dynamically allocate interface table vif_table and move it to
struct netns_ipv4, and update MIF_EXISTS() macro.

At the moment, vif_table is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf958ae3

netns: ipmr: allocate mroute_socket per-namespace. · 70a269e6

Committed by Benjamin Thery on Jan 22, 2009

Preliminary work to make IPv4 multicast routing netns-aware.

Make IPv4 multicast routing mroute_socket per-namespace
and move it into struct netns_ipv4.

At the moment, mroute_socket is only referenced in init_net.
Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

70a269e6

sctp/ipv6.c: use ipv6_addr_copy · 4fe1d58b

Committed by Joe Perches on Jan 22, 2009

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4fe1d58b

mac80211: fix slot time debug message · 391429c1

Committed by Christian Lamparter on Jan 18, 2009

wlan0: switched to short barker preamble (BSSID=00:01:aa:bb:cc:dd)
wlan0: switched to short slot (BSSID=) <something is missing here>

should be:

wlan0: switched to short barker preamble (BSSID=00:01:aa:bb:cc:dd)
wlan0: switched to short slot (BSSID=00:01:aa:bb:cc:dd)
Signed-off-by: Christian Lamparter <chunkeey@web.de>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

391429c1

mac80211: decrement ref count to netdev after launching mesh discovery · 5dc306f3

Committed by Brian Cavagnolo on Jan 16, 2009

After launching mesh discovery in tx path, reference count was not being
decremented.  This was preventing module unload.
Signed-off-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: Andrey Yurovsky <andrey@cozybit.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

5dc306f3

22 Jan, 2009 10 commits

fs/Kconfig: move sunrpc out · 9098c24f

Committed by Alexey Dobriyan on Jan 22, 2009

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

9098c24f

gre: strict physical device binding · 749c10f9

Committed by Timo Teras on Jan 19, 2009

Check the device on receive path and allow otherwise identical devices
as long as the physical device differs.

This is useful for NBMA tunnels, where you want to use different gre IP
for each public IP available via different physical devices.
Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

749c10f9

inet: Allowing more than 64k connections and heavily optimize bind(0) time. · a9d8f911

Committed by Evgeniy Polyakov on Jan 19, 2009

With a simple extension to the binding mechanism, which allows binding more
than 64k sockets (or a smaller number, depending on sysctl parameters),
we have to traverse the whole bind hash table to find an empty bucket.
And while it is not a problem for, say, 32k connections, bind()
completion time grows exponentially (since after each successful binding
we have to traverse one more bucket to find an empty one) even if we start
each time from a random offset inside the hash table.

So, when the hash table is full and we want to add another socket, we have
to traverse the whole table no matter what, so effectively this will be
the worst-case performance and it will be constant.

Attached picture shows bind() time depending on number of already bound
sockets.

The green area corresponds to the usual process of binding to port zero, which
turns on kernel port selection as described above. The red area is the bind
process when the number of reuse-bound sockets is not limited by 64k (or
sysctl parameters). It shows the same exponential growth (hidden by the green
area) before the number of ports reaches the sysctl limit.

At this point the bind hash table has exactly one reuse-enabled socket in each
bucket, but it is possible that they have different addresses. Actually the
kernel selects the first port to try randomly, so at the beginning bind
will take roughly constant time, but over time the number of ports to check
after the random start will increase. That growth is exponential,
but because of the random selection above, not every subsequent port selection
will necessarily take longer than the previous one. So we have to consider
the area below in the graph (if you could zoom in, you would find that
there are many different times placed there), so one area can hide another.

Blue area corresponds to the port selection optimization.

This is a rather simple design approach: the hash table now maintains an
(imprecise and racily updated) count of currently bound sockets, and when the
number of such sockets becomes greater than a predefined value (I use the
maximum port range defined by sysctls), we stop traversing the whole bind hash
table and just stop at the first matching bucket after the random start. The
above limit roughly corresponds to the case when the bind hash table is full
and we have turned on the mechanism that allows binding more reuse-enabled
sockets, so it does not change the behaviour of other sockets.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>

a9d8f911
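
The shortcut described in the last paragraph of this commit message can be sketched with a toy model. This is a simplified illustration, not the kernel's port selection code: `pick_port`, the `owners` dictionary and the `reuse` flag are hypothetical, and the sketch assumes every socket already in a bucket is reuse-enabled and may share its port.

```python
import random
from typing import Dict, Optional


def pick_port(low: int, high: int, owners: Dict[int, int],
              bound_count: int, reuse: bool,
              rng: Optional[random.Random] = None) -> Optional[int]:
    """Choose a port for bind(0) in [low, high].

    owners maps a port to the number of sockets bound to it;
    bound_count is the (approximate) total number of bound sockets.
    """
    rng = rng or random.Random()
    span = high - low + 1
    # Once more sockets are bound than there are ports, treat the table
    # as full and stop looking for an empty bucket.
    table_full = bound_count > span
    start = rng.randrange(span)
    for i in range(span):
        port = low + (start + i) % span
        if owners.get(port, 0) == 0:
            return port      # empty bucket: always acceptable
        if reuse and table_full:
            return port      # first shareable bucket after the random start
    return None              # no empty bucket, and sharing is not allowed
```

With the shortcut active the loop returns on its first iteration, which is what keeps bind() time flat once the table is full; a socket that is not reuse-enabled still scans the whole range exactly as before.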

dccp: Debugging functions for feature negotiation · f3f3abb6

Committed by Gerrit Renker on Jan 16, 2009

Since all feature-negotiation processing now takes place in feat.c,
functions for producing verbose debugging output are concentrated
there.

New functions to print out values, entry records, and options are
provided, and also a macro is defined to not always have the function
name in the output line.

Thanks a lot to Wei Yongjun and Giuseppe Galeota for help and
discussion with an earlier revision of this patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3f3abb6

dccp: Initialisation and type-checking of feature sysctls · 883ca833

Committed by Gerrit Renker on Jan 16, 2009

This patch takes care of initialising and type-checking sysctls
related to feature negotiation. Type checking is important since some
of the sysctls now directly impact the feature-negotiation process.

The sysctls are initialised with the known default values for each
feature.  For the type-checking the value constraints from RFC 4340
are used:

 * Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes),
   tested and confirmed that it works up to 4294967295 - for Gbps speed;
 * Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer);
 * CCIDs are between 0 .. 255;
 * request_retries, retries1, retries2 also between 0..255 for good measure;
 * tx_qlen is checked to be non-negative;
 * sync_ratelimit remains as before.

Notes:
------
 1. Did s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c.
 2. As pointed out by Arnaldo, the pattern of type-checking repeats itself in
    other places, sometimes with exactly the same kind of definitions (e.g.
    "static int zero;"). It may be a good idea (kernel janitors?) to consolidate
    type checking. For the sake of keeping the changeset small and in order not
    to affect other subsystems, I have not strived to generalise here.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

883ca833
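
The value constraints listed in this commit message amount to a small table of bounds. The sketch below restates them in Python as an illustration only: `BOUNDS` and `check_sysctl` are hypothetical names, the key names for Sequence Window, Ack Ratio and the CCIDs are assumed, and `sync_ratelimit` is left out because the message only says it "remains as before".

```python
from typing import Dict, Optional, Tuple

# (minimum, maximum); a maximum of None means "no upper bound stated".
BOUNDS: Dict[str, Tuple[int, Optional[int]]] = {
    "seq_window": (32, 4294967295),   # Wmin=32 up to a 4-byte unsigned value
    "ack_ratio": (0, 0xFFFF),         # 2-byte unsigned integer
    "rx_ccid": (0, 255),
    "tx_ccid": (0, 255),
    "request_retries": (0, 255),
    "retries1": (0, 255),
    "retries2": (0, 255),
    "tx_qlen": (0, None),             # only checked to be non-negative
}


def check_sysctl(name: str, value: int) -> bool:
    """Return True if value lies within the bounds listed for name."""
    low, high = BOUNDS[name]
    return value >= low and (high is None or value <= high)
```

For instance, `check_sysctl("seq_window", 31)` is rejected because it is below Wmin, while `check_sysctl("ack_ratio", 0xFFFF)` is accepted.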

dccp: Implement both feature-local and feature-remote Sequence Window feature · 792b4878

Committed by Gerrit Renker on Jan 16, 2009

This adds full support for local/remote Sequence Window feature, from which the
  * sequence-number-validity (W) and
  * acknowledgment-number-validity (W') windows
derive as specified in RFC 4340, 7.5.3.

Specifically, the following is contained in this patch:
  * integrated new socket fields into dccp_sk;
  * updated the update_gsr/gss routines with regard to these fields;
  * updated handler code: the Sequence Window feature is located at the TX side,
    so the local feature is meant if the handler-rx flag is false;
  * the initialisation of `rcv_wnd' in reqsk is removed, since
    - rcv_wnd is not used by the code anywhere;
    - sequence number checks are not done in the LISTEN state (cf. 7.5.3);
    - dccp_check_req checks the Ack number validity more rigorously;
  * the `struct dccp_minisock' became empty and is now removed.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

792b4878

dccp: Initialisation framework for feature negotiation · f90f92ee

Committed by Gerrit Renker on Jan 16, 2009

This initialises feature negotiation from two tables, which are in
turn initialised from sysctls.

As a novel feature, specifics of the implementation (e.g. that short
seqnos and ECN are not yet available) are advertised for robustness.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>

f90f92ee

appletalk: remove unneeded stubs · 60961ce4

Committed by Stephen Hemminger on Jan 09, 2009

With net_device_ops, if set_mac_address is null, then the error
is -EOPNOTSUPP.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

60961ce4

rose: convert to network_device_ops · 3170c656

Committed by Stephen Hemminger on Jan 09, 2009

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3170c656

rose: convert to internal net_device_stats · d289d120

Committed by Stephen Hemminger on Jan 09, 2009

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

d289d120

openanolis / cloud-kernel synced successfully about 1 year ago