提交 · ebb53d75657f86587ac8cf3e38ab0c860a8e3d4f · openeuler / raspberrypi-kernel

29 1月, 2008 36 次提交

[NET] proto: Use pcounters for the inuse field · ebb53d75

由 Arnaldo Carvalho de Melo 提交于 11月 21, 2007

Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ebb53d75

mac80211: provide interface iterator for drivers · dabeb344

由 Johannes Berg 提交于 11月 09, 2007

Sometimes drivers need to know which interfaces are associated with
their hardware. Rather than forcing those drivers to keep track of
the interfaces that were added, this adds an iteration function to
mac80211.

As it is intended to be used from the interface add/remove callbacks,
the iteration function may currently only be called under RTNL.
Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dabeb344

[RAW]: Consolidate proc interface. · 42a73808

由 Pavel Emelyanov 提交于 11月 19, 2007

Both ipv6/raw.c and ipv4/raw.c use the seq files to walk
through the raw sockets hash and show them.

The "walking" code is rather huge, but is identical in both
cases. The difference is the hash table to walk over and
the protocol family to check (this was not in the first
virsion of the patch, which was noticed by YOSHIFUJI)

Make the ->open store the needed hash table and the family
on the allocated raw_iter_state and make the start/next/stop
callbacks work with it.

This removes most of the code.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42a73808

[RAW]: Consolidate proto->unhash callback · ab70768e

由 Pavel Emelyanov 提交于 11月 19, 2007

Same as the ->hash one, this is easily consolidated.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab70768e

[RAW]: Consolidate proto->hash callback · 65b4c50b

由 Pavel Emelyanov 提交于 11月 19, 2007

Having the raw_hashinfo it's easy to consolidate the
raw[46]_hash functions.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65b4c50b

[RAW]: Introduce raw_hashinfo structure · b673e4df

由 Pavel Emelyanov 提交于 11月 19, 2007

The ipv4/raw.c and ipv6/raw.c contain many common code (most
of which is proc interface) which can be consolidated.

Most of the places to consolidate deal with the raw sockets
hashtable, so introduce a struct raw_hashinfo which describes
the raw sockets hash.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b673e4df

[IPv6] RAW: Compact the API for the kernel · 69d6da0b

由 Pavel Emelyanov 提交于 11月 19, 2007

Same as in the previous patch for ipv4, compact the
API and hide hash table and rwlock inside the raw.c
file.

Plus fix some "bad" places from checkpatch.pl point
of view (assignments inside if()).
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

69d6da0b

[IPv4] RAW: Compact the API for the kernel · 7bc54c90

由 Pavel Emelyanov 提交于 11月 19, 2007

The raw sockets functions are explicitly used from
inside the kernel in two places:

1. in ip_local_deliver_finish to intercept skb-s
2. in icmp_error

For this purposes many functions and even data structures,
that are naturally internal for raw protocol, are exported.

Compact the API to two functions and hide all the other
(including hash table and rwlock) inside the net/ipv4/raw.c
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7bc54c90

[NET]: Make AF_PACKET handle multiple network namespaces · d12d01d6

由 Denis V. Lunev 提交于 11月 19, 2007

This is done by making packet_sklist_lock and packet_sklist per
network namespace and adding an additional filter condition on
received packets to ensure they came from the proper network
namespace.

Changes from v1:
- prohibit to call inet_dgram_ops.ioctl in other than init_net
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d12d01d6

[NET]: Make rtnetlink infrastructure network namespace aware (v3) · 97c53cac

由 Denis V. Lunev 提交于 11月 19, 2007

After this patch none of the netlink callback support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.

Changes from v2:
- IPv6 addrlabel processing

Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

97c53cac

[TCP]: Rewrite SACK block processing & sack_recv_cache use · 68f8353b

由 Ilpo Järvinen 提交于 11月 15, 2007

Key points of this patch are:

  - In case new SACK information is advance only type, no skb
    processing below previously discovered highest point is done
  - Optimize cases below highest point too since there's no need
    to always go up to highest point (which is very likely still
    present in that SACK), this is not entirely true though
    because I'm dropping the fastpath_skb_hint which could
    previously optimize those cases even better. Whether that's
    significant, I'm not too sure.

Currently it will provide skipping by walking. Combined with
RB-tree, all skipping would become fast too regardless of window
size (can be done incrementally later).

Previously a number of cases in TCP SACK processing fails to
take advantage of costly stored information in sack_recv_cache,
most importantly, expected events such as cumulative ACK and new
hole ACKs. Processing on such ACKs result in rather long walks
building up latencies (which easily gets nasty when window is
huge). Those latencies are often completely unnecessary
compared with the amount of _new_ information received, usually
for cumulative ACK there's no new information at all, yet TCP
walks whole queue unnecessary potentially taking a number of
costly cache misses on the way, etc.!

Since the inclusion of highest_sack, there's a lot information
that is very likely redundant (SACK fastpath hint stuff,
fackets_out, highest_sack), though there's no ultimate guarantee
that they'll remain the same whole the time (in all unearthly
scenarios). Take advantage of this knowledge here and drop
fastpath hint and use direct access to highest SACKed skb as
a replacement.

Effectively "special cased" fastpath is dropped. This change
adds some complexity to introduce better coveraged "fastpath",
though the added complexity should make TCP behave more cache
friendly.

The current ACK's SACK blocks are compared against each cached
block individially and only ranges that are new are then scanned
by the high constant walk. For other parts of write queue, even
when in previously known part of the SACK blocks, a faster skip
function is used (if necessary at all). In addition, whenever
possible, TCP fast-forwards to highest_sack skb that was made
available by an earlier patch. In typical case, no other things
but this fast-forward and mandatory markings after that occur
making the access pattern quite similar to the former fastpath
"special case".

DSACKs are special case that must always be walked.

The local to recv_sack_cache copying could be more intelligent
w.r.t DSACKs which are likely to be there only once but that
is left to a separate patch.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68f8353b

[TCP]: Convert highest_sack to sk_buff to allow direct access · a47e5a98

由 Ilpo Järvinen 提交于 11月 15, 2007

It is going to replace the sack fastpath hint quite soon... :-)
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a47e5a98

[NET]: Move sock_valbool_flag to socket.c · c0ef877b

由 Pavel Emelyanov 提交于 11月 15, 2007

The sock_valbool_flag() helper is used in setsockopt to
set or reset some flag on the sock. This helper is required
in the net/socket.c only, so move it there.

Besides, patch two places in sys_setsockopt() that repeat
this helper functionality manually.

Since this is not a bugfix, but a trivial cleanup, I
prepared this patch against net-2.6.25, but it also
applies (with a single offset) to the latest net-2.6.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0ef877b

[NET]: Move Qdisc_class_ops and Qdisc_ops in appropriate sections. · 20fea08b

由 Eric Dumazet 提交于 11月 14, 2007

Qdisc_class_ops are const, and Qdisc_ops are mostly read.

Using "const" and "__read_mostly" qualifiers helps to reduce false
sharing.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

20fea08b

[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table. · 2a8cc6c8

由 YOSHIFUJI Hideaki 提交于 11月 14, 2007

Policy table is implemented as an RCU linear list since we do not expect
large list nor frequent updates.
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2a8cc6c8

[IPSEC]: Kill afinfo->nf_post_routing · 294b4baf

由 David S. Miller 提交于 11月 14, 2007

After changeset:

	[NETFILTER]: Introduce NF_INET_ hook values

It always evaluates to NF_INET_POST_ROUTING.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

294b4baf

[NETFILTER]: Introduce NF_INET_ hook values · 6e23ae2a

由 Patrick McHardy 提交于 11月 19, 2007

The IPv4 and IPv6 hook values are identical, yet some code tries to figure
out the "correct" value by looking at the address family. Introduce NF_INET_*
values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__
section for userspace compatibility.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6e23ae2a

[IPSEC]: Add async resume support on input · 1bf06cd2

由 Herbert Xu 提交于 11月 19, 2007

This patch adds support for async resumptions on input.  To do so, the
transform would return -EINPROGRESS and subsequently invoke the
function xfrm_input_resume to resume processing.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1bf06cd2

[IPSEC]: Remove nhoff from xfrm_input · 60d5fcfb

由 Herbert Xu 提交于 11月 19, 2007

The nhoff field isn't actually necessary in xfrm_input.  For tunnel
mode transforms we now throw away the output IP header so it makes no
sense to fill in the nexthdr field.  For transport mode we can now let
the function transport_finish do the setting and it knows where the
nexthdr field is.

The only other thing that needs the nexthdr field to be set is the
header extraction code.  However, we can simply move the protocol
extraction out of the generic header extraction.

We want to minimise the amount of info we have to carry around between
transforms as this simplifies the resumption process for async crypto.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60d5fcfb

[IPSEC]: Make x->lastused an unsigned long · d26f3984

由 Herbert Xu 提交于 11月 13, 2007

Currently x->lastused is u64 which means that it cannot be
read/written atomically on all architectures.  David Miller observed
that the value stored in it is only an unsigned long which is always
atomic.

So based on his suggestion this patch changes the internal
representation from u64 to unsigned long while the user-interface
still refers to it as u64.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d26f3984

[IPSEC]: Merge most of the input path · 716062fd

由 Herbert Xu 提交于 11月 13, 2007

As part of the work on asynchronous cryptographic operations, we need
to be able to resume from the spot where they occur.  As such, it
helps if we isolate them to one spot.

This patch moves most of the remaining family-specific processing into
the common input code.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

716062fd

[IPSEC]: Add async resume support on output · c6581a45

由 Herbert Xu 提交于 11月 13, 2007

This patch adds support for async resumptions on output.  To do so,
the transform would return -EINPROGRESS and subsequently invoke the
function xfrm_output_resume to resume processing.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6581a45

[IPSEC]: Merge most of the output path · 862b82c6

由 Herbert Xu 提交于 11月 13, 2007

As part of the work on asynchrnous cryptographic operations, we need
to be able to resume from the spot where they occur.  As such, it
helps if we isolate them to one spot.

This patch moves most of the remaining family-specific processing into
the common output code.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

862b82c6

[IPV6]: Add ip6_local_out · ef76bc23

由 Herbert Xu 提交于 1月 11, 2008

Most callers of the LOCAL_OUT chain will set the IP packet length
before doing so.  They also share the same output function dst_output.

This patch creates a new function called ip6_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef76bc23

[IPV4]: Add ip_local_out · c439cb2e

由 Herbert Xu 提交于 1月 11, 2008

Most callers of the LOCAL_OUT chain will set the IP packet length and
header checksum before doing so.  They also share the same output
function dst_output.

This patch creates a new function called ip_local_out which does all
of that and converts the appropriate users over to it.

Apart from removing duplicate code, it will also help in merging the
IPsec output path once the same thing is done for IPv6.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c439cb2e

[IPSEC]: Separate inner/outer mode processing on input · 227620e2

由 Herbert Xu 提交于 11月 13, 2007

With inter-family transforms the inner mode differs from the outer
mode.  Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.

This patch separates the two parts on the input path so that each
function deals with one family only.

In particular, the functions xfrm4_extract_inut/xfrm6_extract_inut
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb.  This is then used by the inner mode
input functions to modify the inner IP header.  In this way the input
function no longer has to know about the outer address family.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

227620e2

[IPSEC]: Separate inner/outer mode processing on output · 36cf9acf

由 Herbert Xu 提交于 11月 13, 2007

With inter-family transforms the inner mode differs from the outer
mode.  Attempting to handle both sides from the same function means
that it needs to handle both IPv4 and IPv6 which creates duplication
and confusion.

This patch separates the two parts on the output path so that each
function deals with one family only.

In particular, the functions xfrm4_extract_output/xfrm6_extract_output
moves the pertinent fields from the IPv4/IPv6 IP headers into a
neutral format stored in skb->cb.  This is then used by the outer mode
output functions to write the outer IP header.  In this way the output
function no longer has to know about the inner address family.

Since the extract functions are only called by tunnel modes (the only
modes that can support inter-family transforms), I've also moved the
xfrm*_tunnel_check_size calls into them.  This allows the correct ICMP
message to be sent as opposed to now where you might call icmp_send
with an IPv6 packet and vice versa.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

36cf9acf

[INET]: Give outer DSCP directly to ip*_copy_dscp · 29bb43b4

由 Herbert Xu 提交于 11月 13, 2007

This patch changes the prototype of ipv4_copy_dscp and ipv6_copy_dscp so
that they directly take the outer DSCP rather than the outer IP header.
This will help us to unify the code for inter-family tunnels.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29bb43b4

[IPSEC]: Merge common code into xfrm_bundle_create · 25ee3286

由 Herbert Xu 提交于 12月 11, 2007

Half of the code in xfrm4_bundle_create and xfrm6_bundle_create are
common.  This patch extracts that logic and puts it into
xfrm_bundle_create.  The rest of it are then accessed through afinfo.

As a result this fixes the problem with inter-family transforms where
we treat every xfrm dst in the bundle as if it belongs to the top
family.

This patch also fixes a long-standing error-path bug where we may free
the xfrm states twice.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25ee3286

[IPSEC]: Move flow construction into xfrm_dst_lookup · 66cdb3ca

由 Herbert Xu 提交于 11月 13, 2007

This patch moves the flow construction from the callers of
xfrm_dst_lookup into that function.  It also changes xfrm_dst_lookup
so that it takes an xfrm state as its argument instead of explicit
addresses.

This removes any address-specific logic from the callers of
xfrm_dst_lookup which is needed to correctly support inter-family
transforms.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66cdb3ca

[IPSEC]: Replace x->type->{local,remote}_addr with flags · f04e7e8d

由 Herbert Xu 提交于 11月 13, 2007

The functions local_addr and remote_addr are more than what they're
needed for.  The same thing can be done easily with flags on the type
object.  This patch does that and simplifies the wrapper functions in
xfrm6_policy accordingly.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f04e7e8d

[NET]: Remove unnecessary inclusion of dst.h · 274b3426

由 Herbert Xu 提交于 11月 13, 2007

The file net/netevent.h only refers to struct dst_entry * so it
doesn't need to include dst.h.  I've replaced it with a forward
declaration.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

274b3426

[NET]: Eliminate duplicate copies of dst_discard · 352e512c

由 Herbert Xu 提交于 11月 13, 2007

We have a number of copies of dst_discard scattered around the place
which all do the same thing, namely free a packet on the input or
output paths.

This patch deletes all of them except dst_discard and points all the
users to it.

The only non-trivial bit is decnet where it returns an error.
However, conceptually this is identical to the blackhole functions
used in IPv4 and IPv6 which do not return errors.  So they should
either all return errors or all return zero.  For now I've stuck with
the majority and picked zero as the return value.

It doesn't really matter in practice since few if any driver would
react differently depending on a zero return value or NET_RX_DROP.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

352e512c

[IPV6]: Move nfheader_len into rt6_info · b4ce9277

由 Herbert Xu 提交于 11月 13, 2007

The dst member nfheader_len is only used by IPv6.  It's also currently
creating a rather ugly alignment hole in struct dst.  Therefore this patch
moves it from there into struct rt6_info.

It also reorders the fields in rt6_info to minimize holes.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4ce9277

[IPV4]: Add raw drops counter. · 33c732c3

由 Wang Chen 提交于 11月 13, 2007

Add raw drops counter for IPv4 in /proc/net/raw .
Signed-off-by: NWang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33c732c3

[TCP]: Splice receive support. · 9c55e01c

由 Jens Axboe 提交于 11月 06, 2007

Support for network splice receive.
Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9c55e01c

26 1月, 2008 1 次提交

IPoIB: improve IPv4/IPv6 to IB mcast mapping functions · a9e527e3

由 Rolf Manderscheid 提交于 12月 10, 2007

An IPoIB subnet on an IB fabric that spans multiple IB subnets can't
use link-local scope in multicast GIDs.  The existing routines that
map IP/IPv6 multicast addresses into IB link-level addresses hard-code
the scope to link-local, and they also leave the partition key field
uninitialised.  This patch adds a parameter (the link-level broadcast
address) to the mapping routines, allowing them to initialise both the
scope and the P_Key appropriately, and fixes up the call sites.

The next step will be to add a way to configure the scope for an IPoIB
interface.
Signed-off-by: NRolf Manderscheid <rvm@obsidianresearch.com>
Signed-off-by: NRoland Dreier <rolandd@cisco.com>

a9e527e3

09 1月, 2008 3 次提交

[SOCK]: Adds a rcu_dereference() in sk_filter · 9d3e4442

由 Eric Dumazet 提交于 1月 08, 2008

It seems commit fda9ef5d introduced a RCU 
protection for sk_filter(), without a rcu_dereference()

Either we need a rcu_dereference(), either a comment should explain why we 
dont need it. I vote for the former.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d3e4442

[XFRM]: xfrm_algo_clone() allocates too much memory · 0f99be0d

由 Eric Dumazet 提交于 1月 08, 2008

alg_key_len is the length in bits of the key, not in bytes.

Best way to fix this is to move alg_len() function from net/xfrm/xfrm_user.c 
to include/net/xfrm.h, and to use it in xfrm_algo_clone()

alg_len() is renamed to xfrm_alg_len() because of its global exposition.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f99be0d

[NET]: Clone the sk_buff 'iif' field in __skb_clone() · 02f1c89d

由 Paul Moore 提交于 1月 07, 2008

Both NetLabel and SELinux (other LSMs may grow to use it as well) rely
on the 'iif' field to determine the receiving network interface of
inbound packets.  Unfortunately, at present this field is not
preserved across a skb clone operation which can lead to garbage
values if the cloned skb is sent back through the network stack.  This
patch corrects this problem by properly copying the 'iif' field in
__skb_clone() and removing the 'iif' field assignment from
skb_act_clone() since it is no longer needed.

Also, while we are here, put the assignments in the same order as the
offsets to reduce cacheline bounces.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02f1c89d