- 24 Feb 2009, 1 commit
-
-
Committed by Jesper Dangaard Brouer

The IP_ADVANCED_ROUTER Kconfig help text describes the rp_filter proc option. Recent changes added a loose mode. Instead of documenting this change in two places, refer to the document that describes it: Documentation/networking/ip-sysctl.txt. I'm considering moving the rp_filter description out of the Kconfig file into ip-sysctl.txt.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Feb 2009, 7 commits
-
-
Committed by Eric W. Biederman

To remove the possibility of packets flying around while network devices are being cleaned up, use register_pernet_subsys instead of register_pernet_device.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric W. Biederman

Recently I had a kernel panic in icmp_send during a network namespace cleanup. There were packets in the arp queue that failed to be sent, and we attempted to generate an ICMP host unreachable message, but failed because icmp_sk_exit had already been called. The network devices are removed from a network namespace and their arp queues are flushed before we attempt to shut down the subsystems, so this error should have been impossible.

It turns out icmp_init was using register_pernet_device instead of register_pernet_subsys, which resulted in icmp being shut down while we still had the possibility of packets in flight, making a nasty NULL pointer dereference in interrupt context possible. Changing this to register_pernet_subsys fixes the problem in my testing.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
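The change itself is one registration call; a minimal sketch of what the swap in net/ipv4/icmp.c presumably looks like (the pernet_operations wiring shown here is illustrative context, only the function swap is taken from the message):

    #include <net/net_namespace.h>

    static struct pernet_operations icmp_sk_ops = {
            .init = icmp_sk_init,
            .exit = icmp_sk_exit,
    };

    int __init icmp_init(void)
    {
            /* was: return register_pernet_device(&icmp_sk_ops);
             * A pernet subsystem is torn down only after all pernet devices,
             * so ICMP stays usable while devices are being cleaned up. */
            return register_pernet_subsys(&icmp_sk_ops);
    }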
-
Committed by Jesper Dangaard Brouer

While going through net/ipv4/Kconfig, clean up whitespace.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Jesper Dangaard Brouer

The reverse path filter (rp_filter) will NOT get enabled when forwarding is enabled. I have read the code and tested it in practice. Most distributions do enable it in their startup scripts.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Stephen Hemminger

Get rid of a compile warning about a non-const format string.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Stephen Hemminger

Extend the existing reverse path filter option to allow either strict or loose filtering (see http://en.wikipedia.org/wiki/Reverse_path_filtering). For compatibility with existing usage, the value 1 is chosen for strict mode and 2 for loose mode.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
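The strict/loose distinction comes down to what the reverse route lookup must return; a hedged sketch of the decision (not the exact fib_validate_source() code; all identifiers here are illustrative):

    #include <linux/netdevice.h>

    /* rpf is the effective rp_filter value for the inbound device:
     * 0 = off, 1 = strict, 2 = loose. route_back_dev is the output device
     * of the best route back to the packet's source (NULL if no route). */
    static bool rpf_accept(int rpf, const struct net_device *in_dev,
                           const struct net_device *route_back_dev)
    {
            if (!rpf)
                    return true;                    /* filtering disabled */
            if (!route_back_dev)
                    return false;                   /* no return route at all */
            if (rpf == 2)
                    return true;                    /* loose: any return route will do */
            return route_back_dev == in_dev;        /* strict: must leave via the ingress device */
    }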
-
Committed by Paul Moore

The CIPSO protocol engine incorrectly stated that the FIPS-188 specification could be found in the kernel's Documentation directory. This patch corrects that by removing the comment and directing users to the FIPS-188 documentation hosted online. For the sake of completeness, I've also included a link to the CIPSO draft specification on the NetLabel website. Thanks to Randy Dunlap for spotting the error and letting me know.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: James Morris <jmorris@namei.org>
-
- 22 Feb 2009, 1 commit
-
-
Committed by Herbert Xu

Our TCP stack does not set the urgent flag if the urgent pointer does not fit in 16 bits, i.e., if it is more than 64K from the sequence number of a packet. This behaviour is different from the BSDs, and clearly contradicts the purpose of urgent mode, which is to send the notification (though not necessarily the associated data) as soon as possible. Our current behaviour may in fact delay the urgent notification indefinitely if the receiver window does not open up.

Simply matching BSD, however, may break legacy applications which incorrectly rely on the out-of-band delivery of urgent data, and conversely the in-band delivery of non-urgent data. Alexey Kuznetsov suggested a safe solution of following BSD only if the urgent pointer itself has not yet been transmitted. This way we guarantee that when the remote end sees the packet with non-urgent data marked as urgent due to wrap-around, we would have advanced the urgent pointer beyond, either to the actual urgent data or to an as-yet untransmitted packet.

The only potential downside is that applications on the remote end may see multiple SIGURG notifications. However, this would occur anyway with other TCP stacks. More importantly, the outcome of such a duplicate notification is likely to be harmless, since the signal itself does not carry any information other than the fact that we're in urgent mode.

Thanks to Ilpo Järvinen for fixing a critical bug in this and Jeff Chua for reporting that bug.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
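A sketch of the transmit-side logic this describes, for a segment whose header is being built (field names follow common tcp_output.c conventions; treat the exact conditions as an approximation of the in-tree code):

    /* tcb->seq is the segment's starting sequence number, tp->snd_up the
     * urgent pointer, tp->snd_nxt the next sequence to be sent. */
    if (unlikely(tcp_urg_mode(tp) && before(tcb->seq, tp->snd_up))) {
            if (before(tp->snd_up, tcb->seq + 0x10000)) {
                    /* Urgent pointer fits in 16 bits: point straight at it. */
                    th->urg_ptr = htons(tp->snd_up - tcb->seq);
                    th->urg = 1;
            } else if (after(tcb->seq + 0xFFFF, tp->snd_nxt)) {
                    /* Urgent data is further than 64K away, but the urgent
                     * pointer itself has not been transmitted yet: saturate
                     * the pointer so the peer learns about urgent mode now. */
                    th->urg_ptr = htons(0xFFFF);
                    th->urg = 1;
            }
    }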
-
- 19 Feb 2009, 1 commit
-
-
Committed by Ilpo Järvinen

This is obsolete since the passes got combined.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Feb 2009, 2 commits
-
-
Committed by Thomas Gleixner

Impact: syntax fix

Interestingly enough, this compiles without any complaints:

    orphans = percpu_counter_sum_positive(&tcp_orphan_count),
    sockets = percpu_counter_sum_positive(&tcp_sockets_allocated),

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
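It compiles because the trailing comma turns the two assignments into a single comma-operator expression, so the compiler sees one well-formed statement. The fix is presumably just to terminate each assignment with a semicolon (illustration only):

    /* One expression, glued together by the comma operator (legal C). */
    orphans = percpu_counter_sum_positive(&tcp_orphan_count),
    sockets = percpu_counter_sum_positive(&tcp_sockets_allocated);

    /* Intended: two independent statements. */
    orphans = percpu_counter_sum_positive(&tcp_orphan_count);
    sockets = percpu_counter_sum_positive(&tcp_sockets_allocated);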
-
Committed by Patrick Ohly

Instructions for time stamping outgoing packets are taken from the socket layer and later copied into the new skb.

Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Feb 2009, 2 commits
-
-
Committed by Herbert Xu

gro: Optimise TCP packet reception

As this function can be called more than half a million times per second for 10GbE, it's important to optimise it as much as we can. This patch replaces logical ops with bit ops and open-codes memcmp to exploit alignment properties.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu

As this function can be called more than half a million times per second for 10GbE, it's important to optimise it as much as we can. This patch makes some obvious changes to use 2-byte and 4-byte operations instead of byte-oriented ones where possible. Bit ops are also used in place of logical ops to reduce branching.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
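As an illustration of the kind of transformation these two GRO patches describe (a generic sketch, not the actual inet_gro_receive()/tcp_gro_receive() code):

    /* Branchy comparison: each mismatch takes a conditional jump. */
    if (iph->saddr != iph2->saddr || iph->daddr != iph2->daddr)
            flush = 1;

    /* Branch-free equivalent on 32-bit words: any differing bit survives
     * the XOR and is OR-ed into the flush accumulator. */
    flush |= (__force u32)(iph->saddr ^ iph2->saddr) |
             (__force u32)(iph->daddr ^ iph2->daddr);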
-
- 07 Feb 2009, 1 commit
-
-
Committed by Ilpo Järvinen

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 06 Feb 2009, 3 commits
-
-
Committed by Jesper Dangaard Brouer

Like the UDP header fix, pskb_may_pull() can potentially alter the SKB buffer, so the saddr and daddr pointers may point into the old skb->data buffer. I haven't seen corruptions, as they only occur if the old skb->data buffer is reallocated by another user and written to very quickly (or poisoned by SLAB debugging).

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by David S. Miller

This reverts commit 64ff3b93. Jeff Chua reports that it breaks rlogin for him.

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Jesper Dangaard Brouer

The UDP header pointer assignment must happen after calling pskb_may_pull(), as pskb_may_pull() can potentially alter the SKB buffer. This was exposed by running multicast traffic through the NIU driver, as it won't pre-pull the protocol headers into the linear area on receive.

Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
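A minimal sketch of the ordering issue both of these pskb_may_pull() fixes address (hypothetical receive-path excerpt, not the exact in-tree code):

    /* Wrong: uh is derived from skb->data before pskb_may_pull(), which may
     * reallocate the buffer and leave uh pointing into freed memory. */
    struct udphdr *uh = udp_hdr(skb);
    if (!pskb_may_pull(skb, sizeof(struct udphdr)))
            goto drop;

    /* Right: take the header pointer only after the pull has succeeded. */
    if (!pskb_may_pull(skb, sizeof(struct udphdr)))
            goto drop;
    uh = udp_hdr(skb);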
-
- 03 Feb 2009, 1 commit
-
-
Committed by Eric Dumazet

Commit 93821778 (udp: Fix rcv socket locking) accidentally removed the sk_drops increments for UDP IPv4 sockets. This field can be used to detect incorrect sizing of socket receive buffers.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
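A sketch of the accounting being restored (placement in the UDP receive path is illustrative):

    /* When the datagram cannot be queued (e.g. the receive buffer is full),
     * record the drop so userspace can spot undersized buffers via sk_drops. */
    if (rc < 0)
            atomic_inc(&sk->sk_drops);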
-
- 02 Feb 2009, 2 commits
-
-
Committed by Herbert Xu

sk_alloc now sets sk_family, so this is redundant. In fact it caught my eye because sock_init_data already uses sk_family, so this assignment is too late anyway.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet

Also switch bsockets to atomic_t, since it might be changed in parallel.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 Feb 2009, 3 commits
-
-
Committed by Stephen Hemminger

From: Stephen Hemminger <shemminger@vyatta.com>

Fix a regression introduced by a9d8f911 ("inet: Allowing more than 64k connections and heavily optimize bind(0) time."). Based upon initial patches and feedback from Evgeniy Polyakov and Eric Dumazet.

From Eric Dumazet:
--------------------
Also there might be a problem at line 175:

    if (sk->sk_reuse && sk->sk_state != TCP_LISTEN && --attempts >= 0) {
            spin_unlock(&head->lock);
            goto again;

If we entered inet_csk_get_port() with a non-null snum, we can "goto again" while it was not expected.
--------------------

Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Stephen Hemminger

This adds another inet device option to enable gratuitous ARP when a device is brought up or an address is changed. This is handy for clusters or virtualization.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
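For context, a gratuitous ARP announcement is a broadcast ARP request whose sender and target IP are both the local address; a hedged sketch of firing one on address change (arp_send() is an existing helper; the option-check macro name is my guess, not taken from the patch):

    /* Announce 'addr' on 'dev' when the new per-device option is enabled:
     * sender IP == target IP == addr, and a NULL dest_hw means broadcast. */
    if (IN_DEV_ARP_NOTIFY(in_dev))
            arp_send(ARPOP_REQUEST, ETH_P_ARP, addr, dev, addr, NULL,
                     dev->dev_addr, NULL);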
-
Committed by Harvey Harrison

Base versions handle constant folding now.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
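Assuming this is the usual __constant_* byteorder cleanup, the point is that the plain helpers now fold constants at compile time, so the explicit variants are redundant (illustrative example):

    /* Before: spell out the constant-folding variant. */
    skb->protocol = __constant_htons(ETH_P_IP);

    /* After: htons() on a compile-time constant folds just as well. */
    skb->protocol = htons(ETH_P_IP);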
-
- 30 Jan 2009, 2 commits
-
-
Committed by Herbert Xu

Unfortunately simplicity isn't always the best. The fraginfo interface turned out to be suboptimal. The problem was quite obvious: for every packet, we have to copy the headers from the frags structure into skb->head, even though for 99% of the packets this part is immediately thrown away after the merge. LRO didn't have this problem because it directly read the headers from the frags structure.

This patch attempts to address this by creating an interface that allows GRO to access the headers in the first frag without having to copy them. Because all drivers that use frags place the headers in the first frag, this optimisation should be enough.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Zores

Signed-off-by: Benjamin Zores <benjamin.zores@alcatel-lucent.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 Jan 2009, 3 commits
-
-
Committed by Dimitris Michailidis

tcp_splice_data_recv has two lengths to consider: the len parameter it gets from tcp_read_sock, which specifies the amount of data in the skb, and rd_desc->count, which is the amount of data the splice caller still wants. Currently it passes just the latter to skb_splice_bits, which then splices min(rd_desc->count, skb->len - offset) bytes.

Most of the time this is fine, except when the skb contains urgent data. In that case len goes only up to the urgent byte and is less than skb->len - offset. By ignoring len, tcp_splice_data_recv may a) splice data tcp_read_sock told it not to, and b) return to tcp_read_sock a value > len. Now, tcp_read_sock doesn't handle used > len and leaves the socket in a bad state (both sk_receive_queue and copied_seq are bad at that point), resulting in duplicated data and corruption.

Fix by passing min(rd_desc->count, len) to skb_splice_bits.

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
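A sketch of the callback with the fix applied (abridged from the description above; the tcp_splice_state handling is simplified):

    static int tcp_splice_data_recv(read_descriptor_t *rd_desc, struct sk_buff *skb,
                                    unsigned int offset, size_t len)
    {
            struct tcp_splice_state *tss = rd_desc->arg.data;
            int ret;

            /* Splice no more than the caller still wants (rd_desc->count) and
             * no more than tcp_read_sock says is usable in this skb (len). */
            ret = skb_splice_bits(skb, offset, tss->pipe,
                                  min(rd_desc->count, len), tss->flags);
            if (ret > 0)
                    rd_desc->count -= ret;
            return ret;
    }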
-
Committed by Eric Dumazet

Commit 9088c560 (udp: Improve port randomization) introduced a regression for the UDP bind() syscall with a null port (getting a random port) in case a lot of ports are already in use. This is because we do about 28000 scans of very long chains (220 sockets per chain), with many spin_lock_bh()/spin_unlock_bh() calls.

Fix this using a bitmap (64 bytes for the current value of UDP_HTABLE_SIZE) so that we scan each chain at most once. Instead of 250 ms per bind() call, we get a time of 2.9 ms after the patch.

Based on a report from Vitaly Mayatskikh.

Reported-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Tested-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
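The core idea as a simplified sketch (identifiers, struct layout and locking are illustrative, not the exact udp_lib_get_port() code): ports that hash to the same chain differ by multiples of UDP_HTABLE_SIZE, so one pass over the chain can mark every taken candidate in a small bitmap, and port selection then tests bits instead of re-walking the chain.

    DECLARE_BITMAP(bitmap, PORTS_PER_CHAIN);

    bitmap_zero(bitmap, PORTS_PER_CHAIN);
    spin_lock_bh(&hslot->lock);
    sk_for_each(sk2, node, &hslot->head)            /* single walk of the chain */
            __set_bit(inet_sk(sk2)->num / UDP_HTABLE_SIZE, bitmap);

    for (port = first; port <= last; port += UDP_HTABLE_SIZE)
            if (!test_bit(port / UDP_HTABLE_SIZE, bitmap))
                    goto found;                     /* free port, no more chain walks */
    spin_unlock_bh(&hslot->lock);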
-
Committed by Timo Teras

Instead of keeping a candidate tunnel device from every category, keep only the single candidate with the best score. This optimizes stack usage and speeds up the exit code.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Jan 2009, 9 commits
-
-
Committed by Benjamin Thery

This last patch makes the appropriate changes to use and propagate the network namespace where needed in the IPv4 multicast routing code. This consists mainly of replacing all the remaining init_net occurrences with the current netns pointer retrieved from sockets, net devices or mfc_caches, depending on the routines' contexts. Some routines receive a new 'struct net' parameter to propagate the current netns:

* vif_add/vif_delete
* ipmr_new_tunnel
* mroute_clean_tables
* ipmr_cache_find
* ipmr_cache_report
* ipmr_cache_unresolved
* ipmr_mfc_add/ipmr_mfc_delete
* ipmr_get_route
* rt_fill_info (in route.c)

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

Declare the IPv4 multicast forwarding /proc/net entries per-namespace:

    /proc/net/ip_mr_vif
    /proc/net/ip_mr_cache

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Declare the variable 'reg_vif_num' per-namespace: move it into struct netns_ipv4. At the moment, this variable is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
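The commits in this series all follow the same pattern; a hedged sketch of what moving such a global into struct netns_ipv4 looks like (field names are taken from the commit messages, their exact types and placement are illustrative):

    /* include/net/netns/ipv4.h (sketch) */
    struct netns_ipv4 {
            /* ... existing fields ... */
            struct sock     *mroute_sk;     /* was the global mroute_socket */
            int             mroute_do_assert;
            int             mroute_do_pim;
            int             reg_vif_num;
    };

    /* net/ipv4/ipmr.c (sketch): accesses go through the owning namespace. */
    net->ipv4.reg_vif_num = vifi;           /* was: reg_vif_num = vifi; */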
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Declare the IPv4 multicast routing variables 'mroute_do_assert' and 'mroute_do_pim' per-namespace in struct netns_ipv4. At the moment, these variables are only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Declare the variable cache_resolve_queue_len per-namespace: move it into struct netns_ipv4. This variable counts the number of unresolved cache entries queued in the list mfc_unres_queue. This list is kept global to all netns, as the number of entries per namespace is limited to 10 (hardcoded in the routine ipmr_cache_unresolved). Entries belonging to different namespaces in mfc_unres_queue will be identified by matching the mfc_net member introduced previously in struct mfc_cache.

Keeping this list global to all netns also allows us to keep a single timer (ipmr_expire_timer) to handle their expiration. In some places the cache_resolve_queue_len value was tested to decide whether to arm or delete the timer. These tests were equivalent to testing mfc_unres_queue instead, and are replaced in this patch.

At the moment, cache_resolve_queue_len is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Dynamically allocate the IPv4 multicast forwarding cache, mfc_cache_array, and move it to struct netns_ipv4. At the moment, mfc_cache_array is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

This patch stores in struct mfc_cache the network namespace each mfc_cache belongs to. The new member is mfc_net.

mfc_net is assigned at cache allocation and doesn't change during the rest of the cache entry's life. A new net parameter is added to ipmr_cache_alloc/ipmr_cache_alloc_unres. This will help to retrieve the current netns around the IPv4 multicast routing code. At the moment, all mfc_cache entries are allocated in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
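A hedged sketch of the struct change and of the allocation helper gaining a net parameter (member layout and error handling abridged; how the namespace reference is recorded is an assumption):

    /* include/linux/mroute.h (sketch) */
    struct mfc_cache {
            struct mfc_cache *next;
            struct net      *mfc_net;       /* namespace this entry belongs to */
            __be32          mfc_origin;
            __be32          mfc_mcastgrp;
            /* ... */
    };

    /* net/ipv4/ipmr.c (sketch) */
    static struct mfc_cache *ipmr_cache_alloc(struct net *net)
    {
            struct mfc_cache *c = kmem_cache_zalloc(mrt_cachep, GFP_KERNEL);

            if (c)
                    c->mfc_net = net;       /* the real helper may also hold a reference */
            return c;
    }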
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Dynamically allocate the interface table vif_table and move it to struct netns_ipv4, and update the VIF_EXISTS() macro. At the moment, vif_table is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Benjamin Thery

Preliminary work to make IPv4 multicast routing netns-aware.

Make the IPv4 multicast routing mroute_socket per-namespace: move it into struct netns_ipv4. At the moment, mroute_socket is only referenced in init_net.

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 22 Jan 2009, 2 commits
-
-
Committed by Timo Teras

Check the device on the receive path and allow otherwise identical tunnel devices as long as the physical device differs. This is useful for NBMA tunnels, where you want to use a different GRE IP for each public IP available via different physical devices.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Evgeniy Polyakov

With a simple extension to the binding mechanism, which allows binding more than 64k sockets (or a smaller amount, depending on sysctl parameters), we have to traverse the whole bind hash table to find an empty bucket. While that is not a problem for, say, 32k connections, bind() completion time grows exponentially (since after each successful binding we have to traverse one more bucket to find an empty one), even if we start each time from a random offset inside the hash table. So, when the hash table is full and we want to add another socket, we have to traverse the whole table no matter what; effectively this is the worst-case performance, and it is constant.

The attached picture shows bind() time depending on the number of already-bound sockets.

The green area corresponds to the usual binding-to-port-zero process, which turns on the kernel port selection described above. The red area is the bind process when the number of reuse-bound sockets is not limited by 64k (or the sysctl parameters). The same exponential growth (hidden by the green area) occurs before the number of ports reaches the sysctl limit. At this point the bind hash table has exactly one reuse-enabled socket per bucket, but it is possible that they have different addresses. The kernel actually selects the first port to try randomly, so at the beginning bind will take roughly constant time, but over time the number of ports to check after the random start will increase. That growth is exponential, but because of the random selection above, not every next port selection necessarily takes longer than the previous one. So we have to consider the lower area of the graph (if you could zoom in, you would find many different times placed there), since one area can hide another.

The blue area corresponds to the port selection optimization. This is a rather simple design approach: the hash table now maintains an (imprecise and racily updated) count of currently bound sockets, and when the number of such sockets becomes greater than a predefined value (I use the maximum port range defined by the sysctls), we stop traversing the whole bind hash table and just stop at the first matching bucket after the random start. The limit above roughly corresponds to the case when the bind hash table is full and we have turned on the mechanism allowing more reuse-enabled sockets to be bound, so it does not change the behaviour of other sockets.

Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net>
Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
Signed-off-by: David S. Miller <davem@davemloft.net>
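A hedged sketch of the optimization's core check inside the randomized port-selection loop of inet_csk_get_port() (surrounding logic abridged; treat field names as approximations of the description above):

    /* hashinfo->bsockets approximately counts currently bound sockets. */
    if (tb->fastreuse > 0 && sk->sk_reuse && sk->sk_state != TCP_LISTEN) {
            smallest_rover = rover;         /* remember a reusable bucket */
            if (atomic_read(&hashinfo->bsockets) > (high - low) + 1) {
                    /* More bound sockets than ports in the range: the table
                     * is effectively full, so stop scanning and reuse the
                     * bucket found after the random starting point. */
                    snum = smallest_rover;
                    goto have_snum;
            }
    }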
-