Commits · f25f4e44808f0f6c9875d94ef1c41ef86c288eb2 · openeuler / Kernel

11 Jul, 2007 · 40 commits

[CORE] Stack changes to add multiqueue hardware support API · f25f4e44

Committed by Peter P Waskiewicz Jr on Jul 06, 2007

Add the multiqueue hardware device support API to the core network
stack.  Allow drivers to allocate multiple queues and manage them at
the netdev level if they choose to do so.

Added a new field to sk_buff, namely queue_mapping, for drivers to
know which tx_ring to select based on OS classification of the flow.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
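
The role of queue_mapping can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `demo_skb`, `demo_adapter`, `demo_select_ring` and `demo_xmit` are hypothetical stand-ins, not kernel structures or the API added by this commit, and the fall-back to ring 0 is an illustrative choice.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_NUM_TX_RINGS 4     /* hypothetical number of hardware TX rings */

/* Minimal stand-in for struct sk_buff: only the field discussed above. */
struct demo_skb {
    uint16_t queue_mapping;     /* filled in by the stack's flow classification */
};

struct demo_tx_ring {
    unsigned int packets;       /* frames queued on this ring so far */
};

struct demo_adapter {
    struct demo_tx_ring tx_ring[DEMO_NUM_TX_RINGS];
};

/* Choose the TX ring from queue_mapping; out-of-range values use ring 0. */
static struct demo_tx_ring *demo_select_ring(struct demo_adapter *ad,
                                             const struct demo_skb *skb)
{
    size_t idx = skb->queue_mapping;

    if (idx >= DEMO_NUM_TX_RINGS)
        idx = 0;
    return &ad->tx_ring[idx];
}

/* Hypothetical transmit path: count the frame on the selected ring and
 * return the index of the ring that was used. */
static unsigned int demo_xmit(struct demo_adapter *ad, const struct demo_skb *skb)
{
    struct demo_tx_ring *ring = demo_select_ring(ad, skb);

    ring->packets++;
    return (unsigned int)(ring - ad->tx_ring);
}
```

A driver modelled this way never classifies traffic itself; it only reads the index that the stack stored in the buffer.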

[NET]: [DOC] Multiqueue hardware support documentation · a093bf00

Committed by Peter P Waskiewicz Jr on Jun 28, 2007

Add a brief howto to Documentation/networking for multiqueue.  It
explains how to use the multiqueue API in a driver to support
multiqueue paths from the stack, as well as the qdiscs to use for
feeding a multiqueue device.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Fix TX checksum feature check · a298830c

Committed by Herbert Xu on Jun 28, 2007

This patch fixes a boolean error in the new TX checksum check
that causes bogus TSO packets to be generated.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[L2TP]: Add PPPoL2TP in-kernel documentation · 58e50a90

Committed by James Chapman on Jun 27, 2007

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[L2TP]: Add PPPoL2TP maintainer · a6d2370b

Committed by James Chapman on Jun 27, 2007

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[PPPOL2TP]: Use proper printf format specifier for size_t. · 38d15b65

Committed by David S. Miller on Jun 27, 2007

Signed-off-by: David S. Miller <davem@davemloft.net>

[L2TP]: PPP over L2TP driver core · 3557baab

Committed by James Chapman on Jun 27, 2007

This driver handles only L2TP data frames; control frames are handled
by a userspace application. It implements L2TP using the PPPoX socket
family. There is a PPPoX socket for each L2TP session in an L2TP
tunnel. PPP data within each session is passed through the kernel's
PPP subsystem via this driver. Kernel parameters of each socket can be
read or modified using ioctl() or [gs]etsockopt() calls.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[L2TP]: Changes to existing ppp and socket kernel headers for L2TP · cf14a4d0

Committed by James Chapman on Jun 27, 2007

Add struct sockaddr_pppol2tp to carry L2TP-specific address
information for the PPPoX (PPPoL2TP) socket. Unfortunately we can't
use the union inside struct sockaddr_pppox because the L2TP-specific
data is larger than the current size of the union and we must preserve
the size of struct sockaddr_pppox for binary compatibility.

Also add a PPPIOCGL2TPSTATS ioctl to allow userspace to obtain
L2TP counters and state from the kernel.

Add new if_pppol2tp.h header.

[ Modified to use aligned_u64 in statistics structure -DaveM ]
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[UDP]: Introduce UDP encapsulation type for L2TP · 342f0234

Committed by James Chapman on Jun 27, 2007

This patch adds a new UDP_ENCAP_L2TPINUDP encapsulation type for UDP
sockets. When a UDP socket's encap_type is UDP_ENCAP_L2TPINUDP, the
skb is delivered to a function pointed to by the udp_sock's
encap_rcv funcptr. If the skb isn't wanted by L2TP, it returns >0, which
causes it to be passed through to UDP.

Include padding to put the new encap_rcv field on a 4-byte boundary.

Previously, the only user of UDP encap sockets was ESP, so when
CONFIG_XFRM was not defined, some of the encap code was compiled
out. This patch changes that. As a result, udp_encap_rcv() will
now do a little more work when CONFIG_XFRM is not defined.
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
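
The dispatch rule described in this commit (a return value greater than 0 hands the packet back to UDP) can be modelled in user space as follows. This is a sketch, not the kernel code: all `demo_` names are hypothetical, the encapsulation constant is an illustrative value, treating zero or negative returns as "taken by the handler" is an assumption, and the length test in `demo_l2tp_rcv` is a toy rule that stands in for real L2TP header parsing.

```c
#include <stddef.h>

#define DEMO_ENCAP_NONE      0
#define DEMO_ENCAP_L2TPINUDP 3   /* illustrative value for the new encap type */

struct demo_skb {
    const unsigned char *data;
    size_t len;
};

struct demo_udp_sock {
    int encap_type;
    /* Per-socket receive hook, analogous to the encap_rcv function pointer. */
    int (*encap_rcv)(struct demo_udp_sock *sk, struct demo_skb *skb);
};

enum demo_verdict { DEMO_CONSUMED, DEMO_TO_UDP };

/* Offer the packet to the encapsulation handler first. A result > 0 means
 * "not wanted", so the packet continues through normal UDP processing. */
static enum demo_verdict demo_udp_rcv(struct demo_udp_sock *sk,
                                      struct demo_skb *skb)
{
    if (sk->encap_type == DEMO_ENCAP_L2TPINUDP && sk->encap_rcv) {
        int ret = sk->encap_rcv(sk, skb);

        if (ret <= 0)            /* assumption: handler took the packet */
            return DEMO_CONSUMED;
    }
    return DEMO_TO_UDP;
}

/* Hypothetical handler: claims any frame of at least 6 bytes (toy rule). */
static int demo_l2tp_rcv(struct demo_udp_sock *sk, struct demo_skb *skb)
{
    (void)sk;
    return skb->len >= 6 ? 0 : 1;
}
```

Keeping the hook as a per-socket function pointer means the UDP path needs no knowledge of L2TP beyond the encapsulation type.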

[NET]: dev: secondary unicast address support · 4417da66

Committed by Patrick McHardy on Jun 27, 2007

Add support for configuring secondary unicast addresses on network
devices. To support this, devices capable of filtering multiple
unicast addresses need to change their set_multicast_list function
to configure unicast filters as well and assign it to dev->set_rx_mode
instead of dev->set_multicast_list. Other devices are put into promiscuous
mode when secondary unicast addresses are present.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: dev_mcast: switch to generic net_device address lists · 3fba5a8b

Committed by Patrick McHardy on Jun 27, 2007

Use generic net_device address lists for multicast list handling.
Some defines are used to keep drivers working.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: dev: introduce generic net_device address lists · bf742482

Committed by Patrick McHardy on Jun 27, 2007

Introduce struct dev_addr_list and list maintenance functions
based on dev_mc_list and the related functions. This will be
used by follow-up patches for both multicast and secondary
unicast addresses.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: dev_mcast: unexport dev_mc_upload · 75ebe8f7

Committed by Patrick McHardy on Jun 27, 2007

dev_mc_add/dev_mc_delete take care of uploading the list when
necessary and that's the only interface other code should use.
Also remove two incorrect calls in DECnet.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: IPV6 checksum offloading in network devices · d212f87b

Committed by Stephen Hemminger on Jun 27, 2007

The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies device can do any arbitrary protocol.

This patch:
 * adds NETIF_F_IPV6_CSUM for those devices
 * fixes bnx2 and tg3 devices that need it
 * adds NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
 * fixes assumptions about NETIF_F_ALL_CSUM in nat
 * adjusts bridge union of checksumming computation
Signed-off-by: David S. Miller <davem@davemloft.net>
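
The distinction between the checksum feature flags can be sketched as a capability check. This is an illustrative user-space model: the `DEMO_F_*` bit values and the `demo_can_offload_csum` helper are assumptions made for the example and do not reproduce the kernel's flag values or code.

```c
#include <stdbool.h>

/* Illustrative bit values; the kernel's real NETIF_F_* values may differ. */
#define DEMO_F_IP_CSUM   (1u << 0)  /* checksum offload for IPv4 only */
#define DEMO_F_HW_CSUM   (1u << 1)  /* checksum offload for any protocol */
#define DEMO_F_IPV6_CSUM (1u << 2)  /* checksum offload for IPv6 */

enum demo_proto { DEMO_PROTO_IPV4, DEMO_PROTO_IPV6, DEMO_PROTO_OTHER };

/* Decide whether a device with the given feature bits can checksum a
 * packet of the given protocol in hardware. */
static bool demo_can_offload_csum(unsigned int features, enum demo_proto proto)
{
    if (features & DEMO_F_HW_CSUM)      /* generic engine: anything goes */
        return true;

    switch (proto) {
    case DEMO_PROTO_IPV4:
        return (features & DEMO_F_IP_CSUM) != 0;
    case DEMO_PROTO_IPV6:
        return (features & DEMO_F_IPV6_CSUM) != 0;
    default:
        return false;                   /* fall back to software checksum */
    }
}
```

A device that handles IPv4 and IPv6 but nothing else would advertise both protocol-specific bits and leave the generic bit clear.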

[XFRM]: Add module alias for transformation type. · d3d6dd3a

Committed by Masahide NAKAMURA on Jun 26, 2007

This cleans up the XFRM type modules and adds module aliases keyed by
protocol:
 ESP, AH, IPCOMP, IPIP and IPv6 for IPsec
 ROUTING and DSTOPTS for MIPv6

It is almost the same as the XFRM mode alias, but new XFRM_PROTO_XXX
defines are added for preprocessing, since some protocols are defined
as enum values.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Ingo Oeser <netdev@axxeo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

[IPV6] MIP6: Loadable module support for MIPv6. · 59fbb3a6

Committed by Masahide NAKAMURA on Jun 26, 2007

This patch makes MIPv6 a loadable module named "mip6".

Here is a modprobe.conf(5) example to load it automatically
when user application uses XFRM state for MIPv6:

alias xfrm-type-10-43 mip6
alias xfrm-type-10-60 mip6

Some MIPv6 features are not covered by this module; however, other
features such as IPsec and IPv6 should not be affected, with or
without the patch.
XFRM, MH (RAW socket) and ancillary data/sockopt may be discussed
separately as future work.

Loadable features:
* MH receiving check (to send ICMP error back)
* RO header parsing and building (i.e. RH2 and HAO in DSTOPTS)
* XFRM policy/state database handling for RO

These are NOT covered as loadable:
* Home Address flags and its rule on source address selection
* XFRM sub policy (depends on its own kernel option)
* XFRM functions to receive RO as IPv6 extension header
* MH sending/receiving through raw socket if user application
  opens it (since raw socket allows to do so)
* RH2 sending as ancillary data
* RH2 operation with setsockopt(2)
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
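
The alias names in the modprobe.conf example appear to follow an "xfrm-type-<family>-<protocol>" pattern, where 10 is the Linux address family number for IPv6 and 43 and 60 are the IPv6 routing header and destination options protocol numbers. The sketch below rebuilds the strings on that reading; the `DEMO_` constants and `demo_xfrm_type_alias` are hypothetical names for illustration, not kernel macros.

```c
#include <stdio.h>

#define DEMO_AF_INET6        10  /* Linux address family number for IPv6 */
#define DEMO_IPPROTO_ROUTING 43  /* IPv6 routing header */
#define DEMO_IPPROTO_DSTOPTS 60  /* IPv6 destination options header */

/* Format the module alias for one (family, protocol) pair into buf.
 * Returns the snprintf() result, i.e. the length the alias needs. */
static int demo_xfrm_type_alias(char *buf, size_t len, int family, int proto)
{
    return snprintf(buf, len, "xfrm-type-%d-%d", family, proto);
}
```

Calling the helper with the routing and destination options numbers yields the two alias names listed in the commit message, both of which map to the mip6 module.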

[IPV6] MIP6: Kill unnecessary ifdefs. · 136ebf08

Committed by Masahide NAKAMURA on Jun 26, 2007

Kill unnecessary CONFIG_IPV6_MIP6.

o It is redundant for the RAW socket to keep MH out based on the config,
  since it can handle any protocol.
o Clean-up at AH.
Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[RTNETLINK]: Fix rtnetlink compat attribute patch · 2371baa4

Committed by Patrick McHardy on Jun 26, 2007

Sent the wrong patch previously.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[RTNETLINK]: Add nested compat attribute · afdc3238

Committed by Patrick McHardy on Jun 25, 2007

Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.

The attribute looks like this:

struct {
        [ compat contents ]
        struct rtattr {
                .rta_len        = total size,
                .rta_type       = type,
        } rta;
        struct old_structure struct;

        [ nested top-level attribute ]
        struct rtattr {
                .rta_len        = nest size,
                .rta_type       = type,
        } nest_attr;

        [ optional 0 .. n nested attributes ]
        struct rtattr {
                .rta_len        = private attribute len,
                .rta_type       = private attribute type,
        } nested_attr;
        struct nested_data data;
};

Since both userspace and kernel deal correctly with attributes that are
larger than expected, old versions will just parse the compat part and
ignore the rest.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETLINK]: attr: add nested compat attribute type · 1092cb21

Committed by Patrick McHardy on Jun 25, 2007

Add a nested compat attribute type that can be used to convert
attributes that contain a structure to nested attributes in a
backwards compatible way.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[SKBUFF]: Keep track of writable header len of headerless clones · 334a8132

Committed by Patrick McHardy on Jun 25, 2007

Currently NAT (and others) that want to modify cloned skbs copy them,
even if in the vast majority of cases it's not necessary because the
skb is a clone made by TCP and the portion NAT wants to modify is
actually writable because TCP releases the header reference before
cloning.

The problem is that there is no clean way for NAT to find out how
long the writable header area is, so this patch introduces skb->hdr_len
to hold this length. When a headerless skb is cloned skb->hdr_len
is set to the current headroom, for regular clones it is copied from
the original. A new function skb_clone_writable(skb, len) returns
whether the skb is writable up to len bytes from skb->data. To avoid
enlarging the skb the mac_len field is reduced to 16 bit and the
new hdr_len field is put in the remaining 16 bit.

I've done a few rough benchmarks of NAT (not with this exact patch,
but a very similar one). As expected it saves huge amounts of system
time in case of sendfile, bringing it down to basically the same
amount as without NAT, with sendmsg it only helps on loopback,
probably because of the large MTU.

Transmit a 1GB file using sendfile/sendmsg over eth0/lo with and
without NAT:

- sendfile eth0, no NAT:	sys     0m0.388s
- sendfile eth0, NAT:		sys     0m1.835s
- sendfile eth0, NAT + patch:	sys     0m0.370s	(~ -80%)

- sendfile lo, no NAT:		sys     0m0.258s
- sendfile lo, NAT:		sys     0m2.609s
- sendfile lo, NAT + patch:	sys     0m0.260s	(~ -90%)

- sendmsg eth0, no NAT:		sys     0m2.508s
- sendmsg eth0, NAT:		sys     0m2.539s
- sendmsg eth0, NAT + patch:	sys     0m2.445s	(no change)

- sendmsg lo, no NAT:		sys	0m2.151s
- sendmsg lo, NAT:		sys     0m3.557s
- sendmsg lo, NAT + patch:	sys     0m2.159s	(~ -40%)

I expect other users can see a similar performance improvement,
packet mangling iptables targets, ipip and ip_gre come to mind ..
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
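
The writability test described above can be modelled in user space. This is a simplified sketch, not the kernel implementation: `demo_skb` keeps only three illustrative fields, and the extra `header_cloned` condition is an assumption about how the check guards against a header that is still shared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the sk_buff fields involved in the check. */
struct demo_skb {
    bool     header_cloned;  /* another clone still references the header */
    uint16_t headroom;       /* bytes between buffer start and the data pointer */
    uint16_t hdr_len;        /* writable header length recorded at clone time */
};

/* Returns true when the first len bytes starting at the data pointer lie
 * inside the writable header area, so they can be changed without copying. */
static bool demo_clone_writable(const struct demo_skb *skb, unsigned int len)
{
    return !skb->header_cloned &&
           (unsigned int)skb->headroom + len <= skb->hdr_len;
}
```

For example, if hdr_len was recorded as 128 at clone time and 54 bytes of headers were pushed afterwards (headroom 74), a 40-byte modification stays inside the writable area, while a 60-byte one would reach into shared payload and still require a copy.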

[NET]: qdisc_restart - couple of optimizations. · e50c41b5

Committed by Krishna Kumar on Jun 24, 2007

Changes :

- netif_queue_stopped need not be called inside qdisc_restart as
  it has been called already in qdisc_run() before the first skb
  is sent, and in __qdisc_run() after each intermediate skb is
  sent (note : we are the only sender, so the queue cannot get
  stopped while the tx lock was got in the ~LLTX case).

- BUG_ON((int) q->q.qlen < 0) was a relic from old times when -1
  meant more packets are available, and __qdisc_run used to loop
  when qdisc_restart() returned -1. During those days, it was
  necessary to make sure that qlen is never less than zero, since
  __qdisc_run would get into an infinite loop if no packets are on
  the queue and this bug in qdisc was there (and worse - no more
  skbs could ever get queue'd as we hold the queue lock too). With
  Herbert's recent change to return values, this check is not
  required.  Hopefully Herbert can validate this change. If at all
  this is required, it should be added to skb_dequeue (in failure
  case), and not to qdisc_qlen.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: qdisc_restart - readability changes plus one bug fix. · 6c1361a6

Committed by Krishna Kumar on Jun 24, 2007

New changes :

- Incorporated Peter Waskiewicz's comments.
- Re-added back one warning message (on driver returning wrong value).

Previous changes :

- Converted to use switch/case code which looks neater.

- "if (ret == NETDEV_TX_LOCKED && lockless)" is buggy, and the lockless
  check should be removed, since driver will return NETDEV_TX_LOCKED only
  if lockless is true and driver has to do the locking. In the original
  code as well as the latest code, this code can result in a bug where
  if LLTX is not set for a driver (lockless == 0) but the driver is written
  wrongly to do a trylock (despite LLTX being set), the driver returns
  LOCKED. But since lockless is zero, the packet is requeue'd instead of
  calling collision code which will issue warning and free up the skb.
  Instead this skb will be retried with this driver next time, and the same
  result will ensue. Removing this check will catch these driver bugs instead
  of hiding the problem. I am keeping this change to readability section
  since :
  	a. it is confusing to check two things as it is; and
  	b. it is difficult to keep this check in the changed 'switch' code.

- Changed some names, like try_get_tx_pkt to dev_dequeue_skb (as that is
  the work being done and easier to understand) and do_dev_requeue to
  dev_requeue_skb, merged handle_dev_cpu_collision and tx_islocked to
  dev_handle_collision (handle_dev_cpu_collision is a small routine with only
  one caller, so there is no need to have two separate routines, which also
  results in getting rid of two macros), etc.

- Removed an XXX comment as it should never fail (I suspect this was related
  to batch skb WIP, Jamal ?). Converted some functions to original coding
  style of having the return values and the function name on same line, eg
  prio2list.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
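
The switch-based handling of the driver's return value, including the removal of the extra lockless test, can be sketched as below. This is an illustrative model only: the `demo_` names and enum values are assumptions, and the warning counter stands in for the re-added warning message.

```c
/* Illustrative status codes; stand-ins for the NETDEV_TX_* return values. */
enum demo_tx_status { DEMO_TX_OK = 0, DEMO_TX_BUSY = 1, DEMO_TX_LOCKED = -1 };

enum demo_action { DEMO_ACT_DONE, DEMO_ACT_COLLISION, DEMO_ACT_REQUEUE };

struct demo_stats {
    unsigned int bad_return_warnings;  /* times a driver returned a bogus value */
};

/* Map the driver's transmit result to the follow-up action. */
static enum demo_action demo_after_xmit(int ret, struct demo_stats *st)
{
    switch (ret) {
    case DEMO_TX_OK:
        return DEMO_ACT_DONE;
    case DEMO_TX_LOCKED:
        /* No additional "lockless" condition here: a driver that returns
         * LOCKED without LLTX reaches the collision path as well, which
         * exposes the driver bug instead of requeueing forever. */
        return DEMO_ACT_COLLISION;
    default:
        /* BUSY is the expected case; anything else is worth a warning. */
        if (ret != DEMO_TX_BUSY)
            st->bad_return_warnings++;
        return DEMO_ACT_REQUEUE;
    }
}
```

With a plain switch, each return value has exactly one outcome, which is the readability argument made in the commit message.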

[CCID3]: Fix a bug in the send time processing · 49d66a70

Committed by Gerrit Renker on Jun 16, 2007

ccid3_hc_tx_send_packet currently returns 0 when the time difference between
current time and t_nom is less than 1000 microseconds.

In this case the packet is sent immediately; but, unlike other packets that can
be emitted on first attempt, it will not have its window counter updated and
its options set as required. This is a bug.

Fix: Require the time difference to be at least 1000 microseconds. The
algorithm then converges: time differences > 1000 microseconds trigger the
timer in dccp_write_xmit; after timer expiry this function is tried again; when
the time difference is less than 1000, the packet will have its options added
and window counter updated as required.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
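
The corrected decision can be modelled as follows. This is a sketch under stated assumptions, not the DCCP code: `demo_ccid3_tx` and `demo_send_packet` are hypothetical, times are plain microsecond counters, and returning the wait in milliseconds is an assumption about what the caller's timer expects.

```c
#include <stdint.h>

struct demo_ccid3_tx {
    int64_t      t_nom_us;     /* nominal send time, in microseconds */
    unsigned int win_updates;  /* times window counter and options were set */
};

/* Returns 0 when the packet may be sent now, otherwise the number of
 * milliseconds the caller should wait before trying again. */
static unsigned int demo_send_packet(struct demo_ccid3_tx *tx, int64_t now_us)
{
    int64_t delay_us = tx->t_nom_us - now_us;

    /* At least 1000 us early: arm the timer and come back later. */
    if (delay_us >= 1000)
        return (unsigned int)(delay_us / 1000);

    /* Less than 1000 us (or already late): send now, and make sure the
     * window counter and options are updated on this path too. */
    tx->win_updates++;
    return 0;
}
```

Every immediate send passes through the same update step, which is the property the fix restores.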

[CCID3]: Sending time: update to ktime_t · 8132da4d

Committed by Gerrit Renker on Jun 16, 2007

This updates the computation of t_nom and t_last_win_count to use the newer
gettimeofday interface.

Committer note: used ktime_to_timeval to set the 'now' variable to t_ld in
                ccid3hctx_no_feedback_timer
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[KTIME]: Introduce ktime_add_us · 1e180f72

Committed by Arnaldo Carvalho de Melo on Jun 16, 2007

Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[KTIME]: Introduce ktime_us_delta · f1c91da4

Committed by Gerrit Renker on Jun 16, 2007

This provides a reusable time difference function which returns the difference in
microseconds, as often used in the DCCP code.

Committer note: renamed ktime_delta to ktime_us_delta and put it in ktime.h.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
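
The two ktime helpers introduced in this commit and in the ktime_add_us commit listed just before it can be approximated in user space by treating a timestamp as a signed nanosecond count. This is a sketch: `demo_ktime_t` and both functions are hypothetical stand-ins and do not reproduce the kernel's ktime_t representation.

```c
#include <stdint.h>

typedef int64_t demo_ktime_t;   /* nanoseconds; stand-in for ktime_t */

/* Advance a timestamp by a number of microseconds. */
static demo_ktime_t demo_ktime_add_us(demo_ktime_t kt, uint64_t usec)
{
    return kt + (demo_ktime_t)(usec * 1000);
}

/* Difference later - earlier in microseconds; negative when the first
 * argument is actually the earlier time. */
static int64_t demo_ktime_us_delta(demo_ktime_t later, demo_ktime_t earlier)
{
    return (later - earlier) / 1000;
}
```

Having the delta as a signed microsecond value lets callers compare it directly against thresholds such as the 1000 microsecond limit used in the CCID3 send path.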

loss_interval: make struct dccp_li_hist_entry private · dd36a9ab

Committed by Arnaldo Carvalho de Melo on May 28, 2007

net/dccp/ccids/lib/loss_interval.c is the only place where this struct is used.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

loss_interval: Nuke dccp_li_hist · cc4d6a3a

Committed by Arnaldo Carvalho de Melo on May 28, 2007

It had just a slab cache, so, for the sake of simplicity just make
dccp_trfc_lib module init routine create the slab cache, no need for users of
the lib to create a private loss_interval object.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

loss_interval: Make dccp_li_hist_entry_{new,delete} private · c70b729e

Committed by Arnaldo Carvalho de Melo on May 28, 2007

Not used outside the loss_interval code anymore.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

loss_interval: unexport dccp_li_hist_interval_new · 8c281780

Committed by Arnaldo Carvalho de Melo on May 28, 2007

Now it's only used inside the loss_interval code.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

[DCCP] loss_interval: Move ccid3_hc_rx_update_li to loss_interval · cc0a910b

Committed by Arnaldo Carvalho de Melo on Jun 14, 2007

Renaming it to dccp_li_update_li.

Also based on previous work by Ian McDonald.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[CCID3]: Pass ccid3_li_hist to ccid3_hc_rx_update_li · 878ac600

Committed by Arnaldo Carvalho de Melo on Jun 14, 2007

Now ccid3_hc_rx_update_li is ready to be moved to
net/dccp/ccids/lib/loss_interval, it uses the same interface as the other
functions there.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Remove accesses to ccid3_hc_rx_sock in ccid3_hc_rx_{update,calc_first}_li · d83258a3

Committed by Arnaldo Carvalho de Melo on May 28, 2007

This is a preparatory patch for moving these loss interval functions from
net/dccp/ccids/ccid3.c to net/dccp/ccids/lib/loss_interval.c.

Based on a patch by Ian McDonald.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

loss_interval: Fix timeval initialisation · 6bc7efe8

Committed by Ian McDonald on May 28, 2007

When compiling with EXTRA_CFLAGS=-W noticed that tstamp is not initialised
correctly in dccp_li_calc_first_li.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

Fix dccp_sum_coverage · e961811f

Committed by Ian McDonald on May 28, 2007

When compiling with EXTRA_CFLAGS=-W notice that we have signed/unsigned issue
in dccp.h.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

ccid3: Update copyrights · b2f41ff4

Committed by Ian McDonald on May 28, 2007

Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>

[VLAN]: Use rtnl_link API · 07b5b17e

Committed by Patrick McHardy on Jun 13, 2007

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[VLAN]: Introduce symbolic constants for flag values · a4bf3af4

Committed by Patrick McHardy on Jun 13, 2007

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[VLAN]: Keep track of number of QoS mappings · b020cb48

Committed by Patrick McHardy on Jun 13, 2007

Keep track of the number of configured ingress/egress QoS mappings to
avoid iteration while calculating the netlink attribute size.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel · synced successfully about 1 year ago