提交 · 3f2c31d90327f21d76d296af34aa4ca547932ff4 · openeuler / Kernel

17 11月, 2008 15 次提交

virtio_net: VIRTIO_NET_F_MSG_RXBUF (imprive rcv buffer allocation) · 3f2c31d9

由 Mark McLoughlin 提交于 11月 16, 2008

If segmentation offload is enabled by the host, we currently allocate
maximum sized packet buffers and pass them to the host. This uses up
20 ring entries, allowing us to supply only 20 packet buffers to the
host with a 256 entry ring. This is a huge overhead when receiving
small packets, and is most keenly felt when receiving MTU sized
packets from off-host.

The VIRTIO_NET_F_MRG_RXBUF feature flag is set by hosts which support
using receive buffers which are smaller than the maximum packet size.
In order to transfer large packets to the guest, the host merges
together multiple receive buffers to form a larger logical buffer.
The number of merged buffers is returned to the guest via a field in
the virtio_net_hdr.

Make use of this support by supplying single page receive buffers to
the host. On receive, we extract the virtio_net_hdr, copy 128 bytes of
the payload to the skb's linear data buffer and adjust the fragment
offset to point to the remaining data. This ensures proper alignment
and allows us to not use any paged data for small packets. If the
payload occupies multiple pages, we simply append those pages as
fragments and free the associated skbs.

This scheme allows us to be efficient in our use of ring entries
while still supporting large packets. Benchmarking using netperf from
an external machine to a guest over a 10Gb/s network shows a 100%
improvement from ~1Gb/s to ~2Gb/s. With a local host->guest benchmark
with GSO disabled on the host side, throughput was seen to increase
from 700Mb/s to 1.7Gb/s.

Based on a patch from Herbert Xu.
Signed-off-by: NMark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3f2c31d9

virtio_net: hook up the set-tso ethtool op · 0276b497

由 Mark McLoughlin 提交于 11月 16, 2008

Seems like an oversight that we have set-tx-csum and set-sg hooked
up, but not set-tso.

Also leads to the strange situation that if you e.g. disable tx-csum,
then tso doesn't get disabled.
Signed-off-by: NMark McLoughlin <markmc@redhat.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0276b497

virtio_net: Recycle some more rx buffer pages · 0a888fd1

由 Mark McLoughlin 提交于 11月 16, 2008

Each time we re-fill the recv queue with buffers, we allocate
one too many skbs and free it again when adding fails. We should
recycle the pages allocated in this case.

A previous version of this patch made trim_pages() trim trailing
unused pages from skbs with some paged data, but this actually
caused a barely measurable slowdown.
Signed-off-by: NMark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (use netdev_priv)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a888fd1

net: use %pF for /proc/net/ptype · 908cd2da

由 Alexey Dobriyan 提交于 11月 16, 2008

Technically, patch changes format for modules, but I think nobody cares.

	-86dd          :ipv6:ipv6_rcv+0x0
	+86dd          ipv6_rcv+0x0/0x400 [ipv6]
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

908cd2da

net: make sure struct dst_entry refcount is aligned on 64 bytes · 5635c10d

由 Eric Dumazet 提交于 11月 16, 2008

As found in the past (commit f1dd9c37
[NET]: Fix tbench regression in 2.6.25-rc1), it is really
important that struct dst_entry refcount is aligned on a cache line.

We cannot use __atribute((aligned)), so manually pad the structure
for 32 and 64 bit arches.

for 32bit : offsetof(truct dst_entry, __refcnt) is 0x80
for 64bit : offsetof(truct dst_entry, __refcnt) is 0xc0

As it is not possible to guess at compile time cache line size,
we use a generic value of 64 bytes, that satisfies many current arches.
(Using 128 bytes alignment on 64bit arches would waste 64 bytes)

Add a BUILD_BUG_ON to catch future updates to "struct dst_entry" dont
break this alignment.

"tbench 8" is 4.4 % faster on a dual quad core (HP BL460c G1), Intel E5450 @3.00GHz
(2350 MB/s instead of 2250 MB/s)
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5635c10d

rcu: documents rculist_nulls · 536533e6

由 Eric Dumazet 提交于 11月 16, 2008

Adds Documentation/RCU/rculist_nulls.txt file to describe how 'nulls'
end-of-list can help in some RCU algos.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

536533e6

net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls · 3ab5aee7

由 Eric Dumazet 提交于 11月 16, 2008

RCU was added to UDP lookups, using a fast infrastructure :
- sockets kmem_cache use SLAB_DESTROY_BY_RCU and dont pay the
  price of call_rcu() at freeing time.
- hlist_nulls permits to use few memory barriers.

This patch uses same infrastructure for TCP/DCCP established
and timewait sockets.

Thanks to SLAB_DESTROY_BY_RCU, no slowdown for applications
using short lived TCP connections. A followup patch, converting
rwlocks to spinlocks will even speedup this case.

__inet_lookup_established() is pretty fast now we dont have to
dirty a contended cache line (read_lock/read_unlock)

Only established and timewait hashtable are converted to RCU
(bind table and listen table are still using traditional locking)
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ab5aee7

udp: Use hlist_nulls in UDP RCU code · 88ab1932

由 Eric Dumazet 提交于 11月 16, 2008

This is a straightforward patch, using hlist_nulls infrastructure.

RCUification already done on UDP two weeks ago.

Using hlist_nulls permits us to avoid some memory barriers, both
at lookup time and delete time.

Patch is large because it adds new macros to include/net/sock.h.
These macros will be used by TCP & DCCP in next patch.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88ab1932

rcu: Introduce hlist_nulls variant of hlist · bbaffaca

由 Eric Dumazet 提交于 11月 16, 2008

hlist uses NULL value to finish a chain.

hlist_nulls variant use the low order bit set to 1 to signal an end-of-list marker.

This allows to store many different end markers, so that some RCU lockless
algos (used in TCP/UDP stack for example) can save some memory barriers in
fast paths.

Two new files are added :

include/linux/list_nulls.h
  - mimics hlist part of include/linux/list.h, derived to hlist_nulls variant

include/linux/rculist_nulls.h
  - mimics hlist part of include/linux/rculist.h, derived to hlist_nulls variant

   Only four helpers are declared for the moment :

     hlist_nulls_del_init_rcu(), hlist_nulls_del_rcu(),
     hlist_nulls_add_head_rcu() and hlist_nulls_for_each_entry_rcu()

prefetches() were removed, since an end of list is not anymore NULL value.
prefetches() could trigger useless (and possibly dangerous) memory transactions.

Example of use (extracted from __udp4_lib_lookup())

	struct sock *sk, *result;
        struct hlist_nulls_node *node;
        unsigned short hnum = ntohs(dport);
        unsigned int hash = udp_hashfn(net, hnum);
        struct udp_hslot *hslot = &udptable->hash[hash];
        int score, badness;

        rcu_read_lock();
begin:
        result = NULL;
        badness = -1;
        sk_nulls_for_each_rcu(sk, node, &hslot->head) {
                score = compute_score(sk, net, saddr, hnum, sport,
                                      daddr, dport, dif);
                if (score > badness) {
                        result = sk;
                        badness = score;
                }
        }
        /*
         * if the nulls value we got at the end of this lookup is
         * not the expected one, we must restart lookup.
         * We probably met an item that was moved to another chain.
         */
        if (get_nulls_value(node) != hash)
                goto begin;

        if (result) {
                if (unlikely(!atomic_inc_not_zero(&result->sk_refcnt)))
                        result = NULL;
                else if (unlikely(compute_score(result, net, saddr, hnum, sport,
                                  daddr, dport, dif) < badness)) {
                        sock_put(result);
                        goto begin;
                }
        }
        rcu_read_unlock();
        return result;
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bbaffaca

TPROXY: implemented IP_RECVORIGDSTADDR socket option · e8b2dfe9

由 Balazs Scheidler 提交于 11月 16, 2008

In case UDP traffic is redirected to a local UDP socket,
the originally addressed destination address/port
cannot be recovered with the in-kernel tproxy.

This patch adds an IP_RECVORIGDSTADDR sockopt that enables
a IP_ORIGDSTADDR ancillary message in recvmsg(). This
ancillary message contains the original destination address/port
of the packet being received.
Signed-off-by: NBalazs Scheidler <bazsi@balabit.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e8b2dfe9

ipv4: Fix ARP behavior with many mac-vlans · 8164f1b7

由 Ben Greear 提交于 11月 16, 2008

Ben Greear wrote:
> I have 500 mac-vlans on a system talking to 500 other
> mac-vlans.  My problem is that the arp-table gets extremely
> huge because every time an arp-request comes in on all mac-vlans,
> a stale arp entry is added for each mac-vlan.  I have filtering
> turned on, but that doesn't help because the neigh_event_ns call
> below will cause a stale neighbor entry to be created regardless
> of whether a replay will be sent or not.
> Maybe the neigh_event code should be below the checks for dont_send,
> and only create check neigh_event_ns if we are !dont_send?

The attached patch makes it work much better for me.  The patch
will cause the code to NOT create a stale neighbor entry if we
are not going to respond to the ARP request.  The old code
*would* create a stale entry even if we are not going to respond.
Signed-off-by: NBen Greear <greearb@candelatech.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8164f1b7

e1000e: enable ECC correction on 82571 silicon · 6ea7ae1d

由 Alexander Duyck 提交于 11月 14, 2008

This change enables ECC correction for the packet buffer on all 82571
silicon.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6ea7ae1d

phylib: make mdio-gpio work without OF (v4) · f004f3ea

由 Paulius Zaleckas 提交于 11月 14, 2008

make mdio-gpio work with non OpenFirmware gpio implementation.

Aditional changes to mdio-gpio:
- use gpio_request() and gpio_free()
- place irq[] array in struct mdio_gpio_info
- add module description, author and license
- add note about compiling this driver as module
- rename mdc and mdio function (were ugly names)
- change MII to MDIO in bus name
- add __init __exit to module (un)loading functions
- probe fails if no phys added to the bus
- kzalloc bitbang with sizeof(*bitbang)

Changes since v3:
- keep bus naming "%x" to be compatible with existing drivers.

Changes since v2:
- more #ifdefs reduction
- platform driver will be registered on OF platforms also
- unified platform and OF bus_id to phy%i

Changes since v1:
- removed NO_IRQ
- reduced #idefs

Laurent, please test this driver under OF.
Signed-off-by: NPaulius Zaleckas <paulius.zaleckas@teltonika.lt>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f004f3ea

phylib: rename mdio-ofgpio to mdio-gpio · 72af187f

由 Paulius Zaleckas 提交于 11月 14, 2008

Signed-off-by: NPaulius Zaleckas <paulius.zaleckas@teltonika.lt>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72af187f

dm9000: Fix build error. · 6817ba2c

由 David S. Miller 提交于 11月 16, 2008

Reported by Stephen Rothwell:

drivers/net/dm9000.c:1450: error: expected ')' before ';' token
drivers/net/dm9000.c:1455: error: expected ';' before '}' token
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6817ba2c

16 11月, 2008 2 次提交

pegasus: minor resource shrinkage · cda2836d

由 David Brownell 提交于 11月 16, 2008

Make pegasus driver not allocate a workqueue until the driver
is bound to some device, which will need that workqueue if
the device is brought up.  This conserves resources when the
driver is linked but there's no pegasus device connected.

Also shrink the runtime footprint a smidgeon by moving some
init-only code into its proper section, and move an obnoxious
(frequent and meaningless) message to be debug-only.
Signed-off-by: NDavid Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cda2836d

ixgbe: Fix usage of netif_*_all_queues() with netif_carrier_{off|on}() · 74ad0a54

由 PJ Waskiewicz 提交于 11月 07, 2008

netif_carrier_off() is sufficient to stop Tx into the driver. Stopping the Tx
queues is redundant and unnecessary. By the same token, netif_carrier_on()
will be sufficient to re-enable Tx, so waking the queues is unnecessary.
Signed-off-by: NPeter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74ad0a54

14 11月, 2008 4 次提交

net: speedup dst_release() · ef711cf1

由 Eric Dumazet 提交于 11月 14, 2008

During tbench/oprofile sessions, I found that dst_release() was in third position.

CPU: Core 2, speed 2999.68 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        symbol name
483726    9.0185  __copy_user_zeroing_intel
191466    3.5697  __copy_user_intel
185475    3.4580  dst_release
175114    3.2648  ip_queue_xmit
153447    2.8608  tcp_sendmsg
108775    2.0280  tcp_recvmsg
102659    1.9140  sysenter_past_esp
101450    1.8914  tcp_current_mss
95067     1.7724  __copy_from_user_ll
86531     1.6133  tcp_transmit_skb

Of course, all CPUS fight on the dst_entry associated with 127.0.0.1 

Instead of first checking the refcount value, then decrement it,
we use atomic_dec_return() to help CPU to make the right memory transaction
(ie getting the cache line in exclusive mode)

dst_release() is now at the fifth position, and tbench a litle bit faster ;)

CPU: Core 2, speed 3000.1 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        symbol name
647107    8.8072  __copy_user_zeroing_intel
258840    3.5229  ip_queue_xmit
258302    3.5155  __copy_user_intel
209629    2.8531  tcp_sendmsg
165632    2.2543  dst_release
149232    2.0311  tcp_current_mss
147821    2.0119  tcp_recvmsg
137893    1.8767  sysenter_past_esp
127473    1.7349  __copy_from_user_ll
121308    1.6510  ip_finish_output
118510    1.6129  tcp_transmit_skb
109295    1.4875  tcp_v4_rcv
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef711cf1

pkt_sched: Remove qdisc->ops->requeue() etc. · f30ab418

由 Jarek Poplawski 提交于 11月 13, 2008

After implementing qdisc->ops->peek() and changing sch_netem into
classless qdisc there are no more qdisc->ops->requeue() users. This
patch removes this method with its wrappers (qdisc_requeue()), and
also unused qdisc->requeue structure. There are a few minor fixes of
warnings (htb_enqueue()) and comments btw.

The idea to kill ->requeue() and a similar patch were first developed
by David S. Miller.
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f30ab418

tcp: remove an unnecessary field in struct tcp_skb_cb · 38a7ddff

由 Petr Tesarik 提交于 11月 13, 2008

The urg_ptr field is not used anywhere and is merely confusing.
Signed-off-by: NPetr Tesarik <ptesarik@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38a7ddff

isdn: use %pI4, remove get_{u8/u16/u32} and put_{u8/u16/u32} inlines · 00bcd522

由 Harvey Harrison 提交于 11月 13, 2008

They would have been better named as get_be16, put_be16, etc.
as they were hiding an endian shift inside.

They don't add much over explicitly coding the byteshifting
and gcc sometimes has a problem with builtin_constant_p inside
inline functions, so it may do a better job of byteswapping
at compile time rather than runtime.
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

00bcd522

13 11月, 2008 11 次提交

netdevice: safe convert to netdev_priv() #part-4 · 524ad0a7

由 Wang Chen 提交于 11月 12, 2008

We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
   netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.

This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.
Signed-off-by: NWang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

524ad0a7

netdevice: safe convert to netdev_priv() #part-3 · 8f15ea42

由 Wang Chen 提交于 11月 12, 2008

We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
   netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.

This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.
Signed-off-by: NWang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f15ea42

netdevice: safe convert to netdev_priv() #part-2 · 4cf1653a

由 Wang Chen 提交于 11月 12, 2008

We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
   netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.

This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.
Signed-off-by: NWang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4cf1653a

netdevice: safe convert to netdev_priv() #part-1 · 454d7c9b

由 Wang Chen 提交于 11月 12, 2008

We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
   netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.

This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.
Signed-off-by: NWang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

454d7c9b

net: Remove unused parameter of xfrm_gen_index() · 7a12122c

由 Arnaud Ebalard 提交于 11月 12, 2008

In commit 2518c7c2 ("[XFRM]: Hash
policies when non-prefixed."), the last use of xfrm_gen_policy() first
argument was removed, but the argument was left behind in the
prototype.
Signed-off-by: NArnaud Ebalard <arno@natisbad.org>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a12122c

net: ifdef struct sock::sk_async_wait_queue · 23789824

由 Alexey Dobriyan 提交于 11月 12, 2008

Every user is under CONFIG_NET_DMA already, so ifdef field as well.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

23789824

bnx2: Update version to 1.8.2. · e4412cb8

由 Michael Chan 提交于 11月 12, 2008

Signed-off-by: NMichael Chan <mchan@broadcom.com>
Signed-off-by: NMatt Carlson <mcarlson@broadcom.com>
Signed-off-by: NBenjamin Li <benli@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e4412cb8

bnx2: Reorganize timeout constants. · 40105c0b

由 Michael Chan 提交于 11月 12, 2008

Move all related timeout constants to the same location.  BNX2
prefix is also added to make them more consistent.
Signed-off-by: NMichael Chan <mchan@broadcom.com>
Signed-off-by: NMatt Carlson <mcarlson@broadcom.com>
Signed-off-by: NBenjamin Li <benli@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

40105c0b

bnx2: Set rx buffer water marks based on MTU. · d8026d93

由 Michael Chan 提交于 11月 12, 2008

The default rx buffer water marks for XOFF/XON are for 1500 MTU.  At
larger MTUs, these water marks need to be adjusted for effective
flow control.
Signed-off-by: NMichael Chan <mchan@broadcom.com>
Signed-off-by: NMatt Carlson <mcarlson@broadcom.com>
Signed-off-by: NBenjamin Li <benli@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d8026d93

bnx2: Restrict WoL support. · 5ec6d7bf

由 Michael Chan 提交于 11月 12, 2008

On some quad-port cards that cannot support WoL on all ports due
to excessive power consumption, the driver needs to restrict WoL
on some ports by checking VAUX_PRESET bit.
Signed-off-by: NMichael Chan <mchan@broadcom.com>
Signed-off-by: NMatt Carlson <mcarlson@broadcom.com>
Signed-off-by: NBenjamin Li <benli@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5ec6d7bf

bnx2: Add PCI ID for 5716S. · 1caacecb

由 Michael Chan 提交于 11月 12, 2008

Signed-off-by: NMichael Chan <mchan@broadcom.com>
Signed-off-by: NMatt Carlson <mcarlson@broadcom.com>
Signed-off-by: NBenjamin Li <benli@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1caacecb

12 11月, 2008 8 次提交

net: Cleanup of neighbour code · e42ea986

由 Eric Dumazet 提交于 11月 12, 2008

Using read_pnet() and write_pnet() in neighbour code ease the reading
of code.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e42ea986

net: ib_net pointer should depends on CONFIG_NET_NS · 7a9546ee

由 Eric Dumazet 提交于 11月 12, 2008

We can shrink size of "struct inet_bind_bucket" by 50%, using
read_pnet() and write_pnet()
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a9546ee

net: Introduce read_pnet() and write_pnet() helpers · 8f424b5f

由 Eric Dumazet 提交于 11月 12, 2008

This patch introduces two helpers that deal with reading and writing
struct net pointers in various network structures.

Their implementation depends on CONFIG_NET_NS

For symmetry, both functions work with "struct net **pnet".

Their usage should reduce the number of #ifdef CONFIG_NET_NS,
without adding many helpers for each network structure
that hold a "struct net *pointer"
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f424b5f

dccp: Resolve dependencies of features on choice of CCID · 9eca0a47

由 Gerrit Renker 提交于 11月 12, 2008

This provides a missing link in the code chain, as several features implicitly
depend and/or rely on the choice of CCID. Most notably, this is the Send Ack Vector
feature, but also Ack Ratio and Send Loss Event Rate (also taken care of).

For Send Ack Vector, the situation is as follows:
 * since CCID2 mandates the use of Ack Vectors, there is no point in allowing 
   endpoints which use CCID2 to disable Ack Vector features such a connection;

 * a peer with a TX CCID of CCID2 will always expect Ack Vectors, and a peer
   with a RX CCID of CCID2 must always send Ack Vectors (RFC 4341, sec. 4);

 * for all other CCIDs, the use of (Send) Ack Vector is optional and thus
   negotiable. However, this implies that the code negotiating the use of Ack
   Vectors also supports it (i.e. is able to supply and to either parse or
   ignore received Ack Vectors). Since this is not the case (CCID-3 has no Ack
   Vector support), the use of Ack Vectors is here disabled, with a comment
   in the source code.

An analogous consideration arises for the Send Loss Event Rate feature,
since the CCID-3 implementation does not support the loss interval options
of RFC 4342. To make such use explicit, corresponding feature-negotiation
options are inserted which signal the use of the loss event rate option,
as it is used by the CCID3 code.

Lastly, the values of the Ack Ratio feature are matched to the choice of CCID.

The patch implements this as a function which is called after the user has
made all other registrations for changing default values of features.

The table is variable-length, the reserved (and hence for feature-negotiation
invalid, confirmed by considering section 19.4 of RFC 4340) feature number `0'
is used to mark the end of the table.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9eca0a47

dccp: Query supported CCIDs · d90ebcbf

由 Gerrit Renker 提交于 11月 12, 2008

This provides a data structure to record which CCIDs are locally supported
and three accessor functions:
 - a test function for internal use which is used to validate CCID requests
   made by the user;
 - a copy function so that the list can be used for feature-negotiation;   
 - documented getsockopt() support so that the user can query capabilities.

The data structure is a table which is filled in at compile-time with the
list of available CCIDs (which in turn depends on the Kconfig choices).

Using the copy function for cloning the list of supported CCIDs is useful for
feature negotiation, since the negotiation is now with the full list of available
CCIDs (e.g. {2, 3}) instead of the default value {2}. This means negotiation 
will not fail if the peer requests to use CCID3 instead of CCID2. 
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d90ebcbf

dccp: Registration routines for changing feature values · e8ef967a

由 Gerrit Renker 提交于 11月 12, 2008

Two registration routines, for SP and NN features, are provided by this patch,
replacing a previous routine which was used for both feature types.

These are internal-only routines and therefore start with `__feat_register'.

It further exports the known limits of Sequence Window and Ack Ratio as symbolic
constants.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: NIan McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e8ef967a

dccp: Limit feature negotiation to connection setup phase · f74e91b6

由 Gerrit Renker 提交于 11月 12, 2008

This patch limits feature (capability) negotation to the connection setup phase:

1. Although it is theoretically possible to perform feature negotiation at any
time (and RFC 4340 supports this), in practice this is prohibitively complex,
as it requires to put traffic on hold for each new negotiation.
2. As a byproduct of restricting feature negotiation to connection setup, the
feature-negotiation retransmit timer is no longer required. This part is now
mapped onto the protocol-level retransmission.
Details indicating why timers are no longer needed can be found on
http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/feature_negotiation/\
implementation_notes.html

This patch disables anytime negotiation, subsequent patches work out full
feature negotiation support for connection setup.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f74e91b6

net: remove struct dst_entry::entry_size · 6bb3ce25

由 Alexey Dobriyan 提交于 11月 11, 2008

Unused after kmem_cache_zalloc() conversion.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6bb3ce25

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功