Commits · 7512cbf6efc97644812f137527a54b8e92b6a90a · openeuler / Kernel

22 Mar, 2008 · 1 commit

[IPV4]: Fix null dereference in ip_defrag · 12b10155

Committed by Phil Oester on Mar 21, 2008

Been seeing occasional panics in my testing of 2.6.25-rc in ip_defrag.
Offending line in ip_defrag is here:

	net = skb->dev->nd_net

where dev is NULL.  Bisected the problem down to commit
ac18e750 ([NETNS][FRAGS]: Make the
inet_frag_queue lookup work in namespaces).  

Below patch (idea from Patrick McHardy) fixes the problem for me.
Signed-off-by: Phil Oester <kernel@linuxace.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Mar, 2008 · 2 commits

[TCP]: Fix shrinking windows with window scaling · 607bfbf2

Committed by Patrick McHardy on Mar 20, 2008

When selecting a new window, tcp_select_window() tries not to shrink
the offered window by using the maximum of the remaining offered window
size and the newly calculated window size. The newly calculated window
size is always a multiple of the window scaling factor, the remaining
window size however might not be since it depends on rcv_wup/rcv_nxt.
This means we're effectively shrinking the window when scaling it down.


The dump below shows the problem (scaling factor 2^7):

- Window size of 557 (71296) is advertised, up to 3111907257:

IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>

- New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
  below the last end:

IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>

The number 40 results from downscaling the remaining window:

3111907257 - 3111841425 = 65832
65832 / 2^7 = 514
65832 % 2^7 = 40

If the sender uses up the entire window before it is shrunk, this can have
chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
will notice that the window has been shrunk since tcp_wnd_end() is before
tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
This will fail the receiver's checks in tcp_sequence() however since it
is before its tp->rcv_wup, making it respond with a dupack.

If both sides are in this condition, this leads to a constant flood of
ACKs until the connection times out.

Make sure the window is never shrunk by aligning the remaining window to
the window scaling factor.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
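
The rounding in the dump can be reproduced in a few lines of userspace C. This is only a sketch of the arithmetic, not the kernel code: align_up(), advertise_truncated() and advertise_aligned() are hypothetical helper names, and everything else tcp_select_window() does is left out.

```c
#include <stdint.h>

/* Round x up to the next multiple of a, where a is a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Window the peer sees when the remaining window is simply scaled
 * down for the header and scaled back up: the low bits are lost.
 */
static uint32_t advertise_truncated(uint32_t remaining, unsigned int scale)
{
	return (remaining >> scale) << scale;
}

/*
 * Window the peer sees when the remaining window is first aligned up
 * to the scaling factor, so the right edge cannot move backwards.
 */
static uint32_t advertise_aligned(uint32_t remaining, unsigned int scale)
{
	return (align_up(remaining, UINT32_C(1) << scale) >> scale) << scale;
}
```

With the values from the dump (remaining window 65832, scale 7), advertise_truncated() yields 65792, which is the 40-byte shrink, while advertise_aligned() yields 65920.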

[NETFILTER]: ipt_recent: sanity check hit count · d0ebf133

Committed by Daniel Hokka Zakrisson on Mar 20, 2008

If a rule using ipt_recent is created with a hit count greater than
ip_pkt_list_tot, the rule will never match as it cannot keep track
of enough timestamps. This patch makes ipt_recent refuse to create such
rules.

With ip_pkt_list_tot's default value of 20, the following can be used
to reproduce the problem.

nc -u -l 0.0.0.0 1234 &
for i in `seq 1 100`; do echo $i | nc -w 1 -u 127.0.0.1 1234; done

This limits it to 20 packets:
iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
         --rsource
iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
         60 --hitcount 20 --name test --rsource -j DROP

While this is unlimited:
iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
         --rsource
iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
         60 --hitcount 21 --name test --rsource -j DROP

With the patch the second rule-set will throw an EINVAL.
Reported-by: Sean Kennedy <skennedy@vcn.com>
Signed-off-by: Daniel Hokka Zakrisson <daniel@hozac.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
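
The kind of check described above can be sketched as follows. check_hit_count() is a hypothetical userspace stand-in, not the function in ipt_recent, and pkt_list_tot only models the ip_pkt_list_tot module parameter.

```c
#include <errno.h>

/*
 * Hypothetical rule-creation check. pkt_list_tot models the
 * ip_pkt_list_tot parameter: how many timestamps are remembered
 * per address (20 by default, per the commit message).
 */
static int check_hit_count(unsigned int hit_count, unsigned int pkt_list_tot)
{
	/* A rule that needs more hits than stored timestamps can never match. */
	if (hit_count > pkt_list_tot)
		return -EINVAL;
	return 0;
}
```

Under these assumptions --hitcount 20 is accepted with the default limit and --hitcount 21 is refused with EINVAL, matching the two rule-sets above.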

18 Mar, 2008 · 2 commits

[IPV4]: esp_output() misannotations · 5e226e4d

Committed by Al Viro on Mar 17, 2008

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET] endianness noise: INADDR_ANY · e6f1cebf

Committed by Al Viro on Mar 17, 2008

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Mar, 2008 · 1 commit

[TCP]: Prevent sending past receiver window with TSO (at last skb) · 5ea3a748

Committed by Ilpo Järvinen on Mar 11, 2008

With TSO it was possible to send past the receiver window when the skb
to be sent was the last in the write queue while the receiver window
is the limiting factor. One can notice that there's a loophole in the
tcp_mss_split_point that lacked a receiver window check for the
tcp_write_queue_tail() if also cwnd was smaller than the full skb.

Noticed by Thomas Gleixner <tglx@linutronix.de> in form of "Treason
uncloaked! Peer ... shrinks window .... Repaired." messages (the peer
didn't actually shrink its window as the message suggests, we had just
sent something past it without a permission to do so).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Mar, 2008 · 2 commits

[IPCONFIG]: The kernel gets no IP from some DHCP servers · dea75bdf

Committed by Stephen Hemminger on Mar 04, 2008

From: Stephen Hemminger <shemminger@linux-foundation.org>

Based upon a patch by Marcel Wappler:
 
   This patch fixes a DHCP issue of the kernel: some DHCP servers
   (e.g. in the Linksys WRT54Gv5) are very strict about the contents
   of the DHCPDISCOVER packet they receive from clients.

   Table 5 in RFC2131 page 36 states that the fields 'ciaddr' and
   'siaddr' MUST be set to '0'.  These DHCP servers ignore the Linux
   kernel's DHCP discovery packets with these two fields set to
   '255.255.255.255' (in contrast to popular DHCP clients, such as
   'dhclient' or 'udhcpc').  As a result, the system fails to boot.
Signed-off-by: David S. Miller <davem@davemloft.net>

[ESP]: Add select on AUTHENC · ed58dd41

Committed by Herbert Xu on Mar 04, 2008

Now that ESP uses the AEAD interface even for algorithms which are
not combined mode, we need to select CONFIG_CRYPTO_AUTHENC as
otherwise only combined mode algorithms will work.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Mar, 2008 · 1 commit

[TCP]: Must count fack_count also when skipping · d152a7d8

Committed by Ilpo Järvinen on Mar 03, 2008

It makes fackets_out grow too slowly compared with the
real write queue.

This shouldn't cause those BUG_TRAP(packets <= tp->packets_out)
to trigger, but who knows how such an inconsistent fackets_out
affects things here and there around TCP when everything nowadays
assumes an accurate fackets_out. So let's see if this silences
them all.

Reported by Guillaume Chazarain <guichaz@gmail.com>.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Feb, 2008 · 3 commits

[TCP]: BIC web page link is corrected. · 0bc8c7bf

Committed by Sangtae Ha on Feb 28, 2008

Signed-off-by: Sangtae Ha <sha2@ncsu.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: Use proc_create() to setup ->proc_fops first · 77020720

Committed by Wang Chen on Feb 28, 2008

Use proc_create() to make sure that ->proc_fops is set up before the
PDE is glued into the main tree.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPCOMP]: Disable BH on output when using shared tfm · 21e43188

Committed by Herbert Xu on Feb 28, 2008

Because we use shared tfm objects in order to conserve memory,
(each tfm requires 128K of vmalloc memory), BH needs to be turned
off on output as that can occur in process context.

Previously this was done implicitly by the xfrm output code.
That was lost when it became lockless.  So we need to add the
BH disabling to IPComp directly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Feb, 2008 · 2 commits

[INET]: Don't create tunnels with '%' in name. · b37d428b

Committed by Pavel Emelyanov on Feb 26, 2008

Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a
pre-defined name for a device from userspace.  Since these drivers
call register_netdevice() (rtnl_lock is held), which does _not_
generate the device's name, this name may contain a '%' character.

Not sure how bad it is to have a device with a '%' in its name, but
all the other places either use register_netdev(), which calls
dev_alloc_name(), or explicitly call dev_alloc_name() before
registering, i.e. do not allow such names.

This had to be prior to the commit 34cc7b, but I forgot to number the
patches and this one got lost, sorry.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
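
The idea of resolving a '%' template before registration can be sketched in userspace C. Both helpers below are hypothetical: name_is_template() only detects the '%', and expand_name() is a simplified stand-in for name allocation that takes the unit number from the caller instead of searching for a free one. The rule that only a single %d is accepted is an assumption of this sketch.

```c
#include <stdio.h>
#include <string.h>

/* Does the requested name still contain a format character? */
static int name_is_template(const char *name)
{
	return strchr(name, '%') != NULL;
}

/*
 * Expand a "prefix%d" style template with the given unit number.
 * Returns 0 on success, -1 if the template is not a single %d or
 * the result does not fit into out.
 */
static int expand_name(char *out, size_t len, const char *tmpl, int unit)
{
	const char *p = strchr(tmpl, '%');
	int n;

	if (!p)		/* plain name: copy it unchanged */
		n = snprintf(out, len, "%s", tmpl);
	else if (p[1] != 'd' || strchr(p + 2, '%'))
		return -1;
	else		/* prefix, unit number, then whatever follows the %d */
		n = snprintf(out, len, "%.*s%d%s", (int)(p - tmpl), tmpl, unit, p + 2);

	return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

A caller would test name_is_template() on the userspace-supplied name and, if it is set, expand it before registering, so that no device ends up with a literal '%' in its name.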

[IPV4]: Reset scope when changing address · 148f9729

Committed by Bjorn Mork on Feb 26, 2008

This bug did bite at least one user, who did have to resort to rebooting
the system after an "ifconfig eth0 127.0.0.1" typo.

Deleting the address and adding a new one is a less intrusive workaround.
But I still believe this is a bug that should be fixed, one way or
another.

Another possibility would be to remove the scope mangling based on
address.  This will always be incomplete (are 127/8 the only address
space with host scope requirements?)

We set the scope to RT_SCOPE_HOST if an IPv4 interface is configured
with a loopback address (127/8).  The scope is never reset, and will
remain set to RT_SCOPE_HOST after changing the address. This patch
resets the scope if the address is changed again, to restore normal
functionality.
Signed-off-by: Bjorn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
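
The sticky-scope behaviour can be modelled with a much simplified, hypothetical address record. struct if_addr and both setters are illustrative only; the two scope constants mirror the values of RT_SCOPE_UNIVERSE and RT_SCOPE_HOST.

```c
#include <stdint.h>

/* Scope values mirroring RT_SCOPE_UNIVERSE and RT_SCOPE_HOST. */
enum { SCOPE_UNIVERSE = 0, SCOPE_HOST = 254 };

struct if_addr {		/* hypothetical, much simplified */
	uint32_t addr;		/* IPv4 address, host byte order */
	int scope;
};

/* 127/8 is the loopback range. */
static int is_loopback(uint32_t addr)
{
	return (addr >> 24) == 127;
}

/* Behaviour described as the bug: the scope is raised but never lowered. */
static void set_addr_sticky(struct if_addr *ifa, uint32_t addr)
{
	ifa->addr = addr;
	if (is_loopback(addr))
		ifa->scope = SCOPE_HOST;
}

/* Behaviour after the fix: the scope is recomputed on every change. */
static void set_addr_reset(struct if_addr *ifa, uint32_t addr)
{
	ifa->addr = addr;
	ifa->scope = is_loopback(addr) ? SCOPE_HOST : SCOPE_UNIVERSE;
}
```

Setting 127.0.0.1 and then 192.168.0.1 leaves the sticky variant at host scope, while the resetting variant returns to universe scope.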

24 Feb, 2008 · 1 commit

[IP_TUNNEL]: Don't limit the number of tunnels with generic name explicitly. · 34cc7ba6

Committed by Pavel Emelyanov on Feb 23, 2008

Use the added dev_alloc_name() call to create tunnel device name,
rather than iterate in a hand-made loop with an artificial limit.

Thanks Patrick for noticing this.

[ The way this works is, when the device is actually registered,
  the generic code notices the '%' in the name and invokes
  dev_alloc_name() to fully resolve the name.  -DaveM ]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Feb, 2008 · 3 commits

[NETFILTER]: Fix incorrect use of skb_make_writable · eb1197bc

Committed by Joonwoo Park on Feb 19, 2008

http://bugzilla.kernel.org/show_bug.cgi?id=9920
The function skb_make_writable returns true or false.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: {ip,ip6,nfnetlink}_queue: fix SKB_LINEAR_ASSERT when mangling packet data · e2b58a67

Committed by Patrick McHardy on Feb 19, 2008

As reported by Tomas Simonaitis <tomas.simonaitis@gmail.com>,
inserting new data in skbs queued over {ip,ip6,nfnetlink}_queue
triggers a SKB_LINEAR_ASSERT in skb_put().

Going back through the git history, it seems this bug is present since
at least 2.6.12-rc2, probably even since the removal of
skb_linearize() for netfilter.

Linearize non-linear skbs through skb_copy_expand() when enlarging
them.  Tested by Tomas, fixes bugzilla #9933.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4/fib_hash.c: fix NULL dereference · 94cb1503

Committed by Adrian Bunk on Feb 19, 2008

Unless I miss a guaranteed relation between "f" and
"new_fa->fa_info" this patch is required for fixing a NULL dereference
introduced by commit a6501e08 ("[IPV4]
FIB_HASH: Reduce memory needs and speedup lookups") and spotted by the
Coverity checker.

Eric Dumazet says:

	Hum, you are right, kmem_cache_free() doesn't allow a NULL
	object, like kfree() does.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Feb, 2008 · 3 commits

[TCP]: Fix tcp_v4_send_synack() comment · 9bf1d83e

Committed by Kris Katterjohn on Feb 17, 2008

Signed-off-by: Kris Katterjohn <katterjohn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: fix alignment of IP-Config output · 9c00409a

Committed by Uwe Kleine-Koenig on Feb 17, 2008

Make the indented lines aligned in the output (not in the code).
Signed-off-by: Uwe Kleine-Koenig <Uwe.Kleine-Koenig@digi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "[NDISC]: Fix race in generic address resolution" · 9ff56607

Committed by David S. Miller on Feb 17, 2008

This reverts commit 69cc64d8.

It causes recursive locking in IPV6 because unlike other
neighbour layer clients, it even needs neighbour cache
entries to send neighbour solicitation messages :-(

We'll have to find another way to fix this race.
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Feb, 2008 · 2 commits

[INET]: Unexport inet_listen_wlock · 324b5761

Committed by Adrian Bunk on Feb 13, 2008

This patch removes the no longer used EXPORT_SYMBOL(inet_listen_wlock).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Unexport __inet_hash_connect · 74da4d34

Committed by Adrian Bunk on Feb 13, 2008

This patch removes the unused EXPORT_SYMBOL_GPL(__inet_hash_connect).
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Feb, 2008 · 5 commits

[IPSEC]: Fix bogus usage of u64 on input sequence number · b318e0e4

Committed by Herbert Xu on Feb 12, 2008

Al Viro spotted a bogus use of u64 on the input sequence number which
is big-endian.  This patch fixes it by giving the input sequence number
its own member in the xfrm_skb_cb structure.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NDISC]: Fix race in generic address resolution · 69cc64d8

Committed by David S. Miller on Feb 11, 2008

Frank Blaschka provided the bug report and the initial suggested fix
for this bug.  He also validated this version of this fix.

The problem is that the access to neigh->arp_queue is inconsistent: we
grab references when dropping the lock to call
neigh->ops->solicit(), but this does not prevent other threads of
control from trying to send out that packet at the same time, causing
corruption because both code paths believe they have exclusive access
to the skb.

The best option seems to be to hold the write lock on neigh->lock
during the ->solicit() call.  I looked at all of the ndisc_ops
implementations and this seems workable.  The only case that needs
special care is the IPV4 ARP implementation of arp_solicit().  It
wants to take neigh->lock as a reader to protect the header entry in
neigh->ha during the emission of the solicitation.  We can simply
remove the read lock calls to take care of that since holding the lock
as a writer at the caller provides a superset of the protection
afforded by the existing read locking.

The rest of the ->solicit() implementations don't care whether the
neigh is locked or not.
Signed-off-by: David S. Miller <davem@davemloft.net>

fib_trie: /proc/net/route performance improvement · 8315f5d8

Committed by Stephen Hemminger on Feb 11, 2008

Use key/offset caching to change /proc/net/route (used by iputils route)
from O(n^2) to O(n). This improves performance from 30sec with 160,000
routes to 1sec.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
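
Why remembering the last position turns a quadratic dump into a linear one can be shown on a plain linked list. This is a generic illustration, not the fib_trie iterator: struct node, struct cursor and all function names are hypothetical, and a simple hop counter stands in for the real cost.

```c
#include <stddef.h>

struct node { struct node *next; };

/* Remembers where the previous read stopped. */
struct cursor { size_t pos; struct node *at; };

static unsigned long steps;	/* pointer hops, counted for illustration */

/* Without a cache: every read walks from the head again. */
static struct node *seek_from_head(struct node *head, size_t pos)
{
	struct node *n = head;

	while (n && pos--) {
		n = n->next;
		steps++;
	}
	return n;
}

/* With a cache: resume from the remembered position when possible. */
static struct node *seek_cached(struct node *head, struct cursor *c, size_t pos)
{
	struct node *n = head;
	size_t i = 0;

	if (c->at && c->pos <= pos) {
		n = c->at;
		i = c->pos;
	}
	for (; n && i < pos; i++) {
		n = n->next;
		steps++;
	}
	c->at = n;
	c->pos = pos;
	return n;
}

/* Read positions 0..n-1 in order over an n-node list; return total hops. */
static unsigned long hops_for_dump(size_t n, int use_cache)
{
	static struct node pool[1024];
	struct cursor c = { 0, NULL };
	size_t i;

	if (n == 0 || n > 1024)
		return 0;
	for (i = 0; i < n; i++)
		pool[i].next = (i + 1 < n) ? &pool[i + 1] : NULL;
	steps = 0;
	for (i = 0; i < n; i++) {
		if (use_cache)
			seek_cached(pool, &c, i);
		else
			seek_from_head(pool, i);
	}
	return steps;
}
```

For a 100-entry list, hops_for_dump(100, 0) counts 4950 hops (n(n-1)/2) while hops_for_dump(100, 1) counts 99 (n-1).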

fib_trie: handle empty tree · ec28cf73

Committed by Stephen Hemminger on Feb 11, 2008

This fixes possible problems when trie_firstleaf() returns NULL
to trie_leafindex().
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: Remove IP_TOS setting privilege checks. · e4f8b5d4

Committed by David S. Miller on Feb 11, 2008

Various RFCs have all sorts of things to say about the CS field of the
DSCP value.  In particular they try to make the distinction between
values that should be used by "user applications" and things like
routing daemons.

This seems to have influenced the CAP_NET_ADMIN check which exists for
IP_TOS socket option settings, but in fact it has an off-by-one error
so it wasn't allowing CS5 which is meant for "user applications" as
well.

Further adding to the inconsistency and brokenness here, IPV6 does not
validate the DSCP values specified for the IPV6_TCLASS socket option.

The real actual uses of these TOS values are system specific in the
final analysis, and these RFC recommendations are just that, "a
recommendation".  In fact the standards very purposefully use
"SHOULD" and "SHOULD NOT" when describing how these values can be
used.

In the final analysis the only clean way to provide consistency here
is to remove the CAP_NET_ADMIN check.  The alternatives just don't
work out:

1) If we add the CAP_NET_ADMIN check to ipv6, this can break existing
   setups.

2) If we just fix the off-by-one error in the class comparison in
   IPV4, certain DSCP values can be used in IPV6 but not IPV4 by
   default.  So people will just ask for a sysctl asking to
   override that.

I checked several other freely available kernel trees and they
do not make any privilege checks in this area like we do.  For
the BSD stacks, this goes back all the way to Stevens Volume 2
and beyond.
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Feb, 2008 · 1 commit

[IGMP]: Optimize kfree_skb in igmp_rcv. · cd557bc1

Committed by Denis V. Lunev on Feb 09, 2008

Merge error paths inside igmp_rcv.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Feb, 2008 · 2 commits

[IPV4]: route: fix crash ip_route_input · 4136cd52

Committed by Patrick McHardy on Feb 07, 2008

ip_route_me_harder() may call ip_route_input() with skbs that don't
have skb->dev set for skbs rerouted in LOCAL_OUT and TCP resets
generated by the REJECT target, resulting in a crash when dereferencing
skb->dev->nd_net. Since ip_route_input() has an input device argument,
it seems correct to use that one anyway.

Bug introduced in b5921910 (Routing cache virtualization).
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETFILTER]: nf_conntrack: fix ct_extend ->move operation · 86577c66

Committed by Patrick McHardy on Feb 07, 2008

The ->move operation has two bugs:

- It is called with the same extension as source and destination,
  so it doesn't update the new extension.

- The address of the old extension is calculated incorrectly,
  instead of (void *)ct->ext + ct->ext->offset[i] it uses
  ct->ext + ct->ext->offset[i].

Fixes a crash on x86_64 reported by Chuck Ebbert <cebbert@redhat.com>
and Thomas Woerner <twoerner@redhat.com>.
Tested-by: Thomas Woerner <twoerner@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
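
The second bug is ordinary C pointer arithmetic: adding an integer to a struct pointer advances in units of the struct size, not in bytes. The sketch below uses a hypothetical 64-byte struct ext as a stand-in for the extension header, and casts to char * where the kernel relies on the GNU extension for void * arithmetic.

```c
#include <stddef.h>

struct ext {			/* hypothetical stand-in, 64 bytes */
	unsigned char offset[4];	/* byte offsets of each extension */
	unsigned char len;
	char data[59];
};

/* Intended: the offset is counted in bytes from the start of the header. */
static void *ext_at_bytes(struct ext *e, size_t off)
{
	return (char *)e + off;
}

/* Buggy form: without the cast, the pointer moves off * sizeof(struct ext) bytes. */
static void *ext_at_elems(struct ext *e, size_t off)
{
	return e + off;
}

/* Distance in bytes between base and p. */
static ptrdiff_t byte_distance(const void *base, const void *p)
{
	return (const char *)p - (const char *)base;
}
```

With an offset of 8, the byte-based form lands 8 bytes past the header, whereas the uncast form lands 8 * sizeof(struct ext) = 512 bytes away, which would explain how the old extension address could end up far outside the intended object.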

06 Feb, 2008 · 2 commits

ipvs: Make wrr "no available servers" error message rate-limited · 9c1ca6e6

Committed by Sven Wegener on Feb 05, 2008

No available servers is more an error message than something informational. It
should also be rate-limited, else we're going to flood our logs on a busy
director, if all real servers are out of order with a weight of zero.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

NetLabel: introduce a new kernel configuration API for NetLabel · eda61d32

Committed by Paul Moore on Feb 04, 2008

Add a new set of configuration functions to the NetLabel/LSM API so that
LSMs can perform their own configuration of the NetLabel subsystem without
relying on assistance from userspace.
Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Reviewed-by: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

05 Feb, 2008 · 4 commits

[ICMP]: Restore pskb_pull calls in receive function · 8cf22943

Committed by Herbert Xu on Feb 05, 2008

Somewhere along the development of my ICMP relookup patch the header
length check went AWOL on the non-IPsec path.  This patch restores the
check.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[INET]: Fix accidentally broken inet(6)_hash_connect's port offset calculations. · 5d8c0aa9

Committed by Pavel Emelyanov on Feb 05, 2008

The port offset calculations depend on the protocol family, but, as
Adrian noticed, I broke this logic with the commit

	5ee31fc1
	[INET]: Consolidate inet(6)_hash_connect.

Restore this logic by passing the port offset directly into the
consolidated function.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Noticed-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV4]: Formatting fix for /proc/net/fib_trie. · b9c4d82a

Committed by Denis V. Lunev on Feb 05, 2008

The line in the /proc/net/fib_trie for route with TOS specified
- has extra \n at the end
- does not have a space after route scope
like below.
           |-- 1.1.1.1
              /32 universe UNICASTtos =1
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPSEC] xfrm4_beet_input(): fix an if() · 322c8a3c

Committed by Adrian Bunk on Feb 05, 2008

A bug every C programmer makes at some point in time...
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Feb, 2008 · 1 commit

[SOCK] proto: Add hashinfo member to struct proto · ab1e0a13

Committed by Arnaldo Carvalho de Melo on Feb 03, 2008

This way we can remove TCP and DCCP specific versions of

sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
sk->sk_prot->hash:     inet_hash is directly used, only v6 needs
                       a specific version to deal with mapped sockets
sk->sk_prot->unhash:   both v4 and v6 use inet_hash directly

struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
that inet_csk_get_port can find the per family routine.

Now only the lookup routines receive as a parameter a struct inet_hashtable.

With this we further reuse code, reducing the difference among INET transport
protocols.

Eventually work has to be done on UDP and SCTP to make them share this
infrastructure and get as a bonus inet_diag interfaces so that iproute can be
used with these protocols.

net-2.6/net/ipv4/inet_hashtables.c:
  struct proto			     |   +8
  struct inet_connection_sock_af_ops |   +8
 2 structs changed
  __inet_hash_nolisten               |  +18
  __inet_hash                        | -210
  inet_put_port                      |   +8
  inet_bind_bucket_create            |   +1
  __inet_hash_connect                |   -8
 5 functions changed, 27 bytes added, 218 bytes removed, diff: -191

net-2.6/net/core/sock.c:
  proto_seq_show                     |   +3
 1 function changed, 3 bytes added, diff: +3

net-2.6/net/ipv4/inet_connection_sock.c:
  inet_csk_get_port                  |  +15
 1 function changed, 15 bytes added, diff: +15

net-2.6/net/ipv4/tcp.c:
  tcp_set_state                      |   -7
 1 function changed, 7 bytes removed, diff: -7

net-2.6/net/ipv4/tcp_ipv4.c:
  tcp_v4_get_port                    |  -31
  tcp_v4_hash                        |  -48
  tcp_v4_destroy_sock                |   -7
  tcp_v4_syn_recv_sock               |   -2
  tcp_unhash                         | -179
 5 functions changed, 267 bytes removed, diff: -267

net-2.6/net/ipv6/inet6_hashtables.c:
  __inet6_hash                       |   +8
 1 function changed, 8 bytes added, diff: +8

net-2.6/net/ipv4/inet_hashtables.c:
  inet_unhash                        | +190
  inet_hash                          | +242
 2 functions changed, 432 bytes added, diff: +432

vmlinux:
 16 functions changed, 485 bytes added, 492 bytes removed, diff: -7

/home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
  tcp_v6_get_port                    |  -31
  tcp_v6_hash                        |   -7
  tcp_v6_syn_recv_sock               |   -9
 3 functions changed, 47 bytes removed, diff: -47

/home/acme/git/net-2.6/net/dccp/proto.c:
  dccp_destroy_sock                  |   -7
  dccp_unhash                        | -179
  dccp_hash                          |  -49
  dccp_set_state                     |   -7
  dccp_done                          |   +1
 5 functions changed, 1 bytes added, 242 bytes removed, diff: -241

/home/acme/git/net-2.6/net/dccp/ipv4.c:
  dccp_v4_get_port                   |  -31
  dccp_v4_request_recv_sock          |   -2
 2 functions changed, 33 bytes removed, diff: -33

/home/acme/git/net-2.6/net/dccp/ipv6.c:
  dccp_v6_get_port                   |  -31
  dccp_v6_hash                       |   -7
  dccp_v6_request_recv_sock          |   +5
 3 functions changed, 5 bytes added, 38 bytes removed, diff: -33
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Feb, 2008 · 2 commits

[NETNS]: Lookup in FIB semantic hashes taking into account the namespace. · 4814bdbd

Committed by Denis V. Lunev on Jan 31, 2008

The namespace is not available in the fib_sync_down_addr, add it as a
parameter.

Looking up a device by the pointer to it is OK. Looking up using a
result from fib_trie/fib_hash table lookup is also safe. No need to
fix that at all.  So, just fix lookup by address and insertion to the
hash table path.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS]: Add a namespace mark to fib_info. · 7462bd74

Committed by Denis V. Lunev on Jan 31, 2008

This is required to make fib_info lookups namespace aware. In the
other case initial namespace devices are marked as dead in the local
routing table during other namespace stop.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
