Commits · dc2caba7b321289e7d02e63d7216961ccecfa103 · OpenHarmony / kernel_linux

26 Nov 2008 · 20 commits

netns xfrm: per-netns policy counts · dc2caba7

Committed by Alexey Dobriyan on Nov 25, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns xfrm_policy_bydst hash · a35f6c5d

Committed by Alexey Dobriyan on Nov 25, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns inexact policies · 8b18f8ea

Committed by Alexey Dobriyan on Nov 25, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns xfrm_policy_byidx hashmask · 8100bea7

Committed by Alexey Dobriyan on Nov 25, 2008

Per-netns hashes are independently resizable.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

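The point that per-netns hashes can be resized independently can be illustrated with a small user-space sketch. It assumes a hypothetical `struct netns_hash` in which each namespace owns its bucket array, mask and entry count; none of these names come from the kernel's xfrm code. Growing one table rehashes only that namespace's entries and leaves every other namespace's mask untouched.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical hashed object, keyed by an index. */
struct entry {
	unsigned int idx;
	struct entry *next;
};

/* Hypothetical per-namespace table: its own buckets, mask and count. */
struct netns_hash {
	struct entry **table;
	unsigned int hmask;	/* number of buckets - 1 (power of two) */
	unsigned int count;
};

static int netns_hash_init(struct netns_hash *h, unsigned int buckets)
{
	h->table = calloc(buckets, sizeof(*h->table));
	h->hmask = buckets - 1;
	h->count = 0;
	return h->table ? 0 : -1;
}

/* Double this namespace's table and move its entries to the new buckets. */
static int netns_hash_grow(struct netns_hash *h)
{
	unsigned int nsize = (h->hmask + 1) * 2;
	struct entry **ntable = calloc(nsize, sizeof(*ntable));

	if (!ntable)
		return -1;	/* old table stays valid */
	for (unsigned int i = 0; i <= h->hmask; i++) {
		struct entry *e = h->table[i];

		while (e) {
			struct entry *next = e->next;
			unsigned int b = e->idx & (nsize - 1);

			e->next = ntable[b];
			ntable[b] = e;
			e = next;
		}
	}
	free(h->table);
	h->table = ntable;
	h->hmask = nsize - 1;
	return 0;
}

static void netns_hash_insert(struct netns_hash *h, struct entry *e)
{
	unsigned int b = e->idx & h->hmask;

	e->next = h->table[b];
	h->table[b] = e;
	/* Grow when the load factor exceeds 1, judged for this namespace only. */
	if (++h->count > h->hmask + 1)
		netns_hash_grow(h);
}
```

Because each table carries its own `hmask`, the resize decision is local: after 20 inserts one namespace has grown to 32 buckets while an idle one stays at 8.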

netns xfrm: per-netns xfrm_policy_byidx hash · 93b851c1

Committed by Alexey Dobriyan on Nov 25, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns policy list · adfcf0b2

Committed by Alexey Dobriyan on Nov 25, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: add struct xfrm_policy::xp_net · 0331b1f3

Committed by Alexey Dobriyan on Nov 25, 2008

Again, this avoids complications with passing netns around when it is not necessary.
Again, ->xp_net is a set-once field: once set, it never changes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns km_waitq · 50a30657

Committed by Alexey Dobriyan on Nov 25, 2008

Disallow spurious wakeups in __xfrm_lookup().

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns state GC work · c7837144

Committed by Alexey Dobriyan on Nov 25, 2008

State GC is per-netns, and this is part of it.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns state GC list · b8a0ae20

Committed by Alexey Dobriyan on Nov 25, 2008

km_waitq is going to be made per-netns to disallow spurious wakeups
in __xfrm_lookup().

To avoid a wakeup after every garbage-collected xfrm_state (which can
potentially be from a different netns), make the state GC list per-netns.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

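A simplified model of the effect described in the km_waitq and GC-list commits: each namespace keeps its own GC list and its own wakeup target, so collecting states in one namespace never wakes waiters in another. A counter stands in for a real wait-queue wakeup, and all names here (`struct net`, `state_gc_queue()`, `state_gc_run()`) are hypothetical; waking once per GC batch is a choice made by this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical queued-for-destruction state. */
struct gc_node {
	struct gc_node *next;
};

/* Hypothetical per-namespace data; the counter stands in for wake_up(). */
struct net {
	struct gc_node *gc_list;	/* states awaiting destruction in this netns */
	unsigned int waitq_wakeups;	/* times this netns's waiters were woken */
};

static void state_gc_queue(struct net *net, struct gc_node *n)
{
	n->next = net->gc_list;
	net->gc_list = n;
}

/* Run GC for one namespace: free everything queued there, then wake
 * only that namespace's waiters, once for the whole batch. */
static unsigned int state_gc_run(struct net *net)
{
	unsigned int freed = 0;
	struct gc_node *n = net->gc_list;

	net->gc_list = NULL;
	while (n) {
		struct gc_node *next = n->next;

		free(n);
		freed++;
		n = next;
	}
	if (freed)
		net->waitq_wakeups++;
	return freed;
}
```

Running GC in namespace `a` leaves namespace `b`'s waiters asleep, which is the spurious wakeup the per-netns split is meant to avoid.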

netns xfrm: per-netns xfrm_hash_work · 63082733

Committed by Alexey Dobriyan on Nov 25, 2008

All of this is an implicit way of passing which netns's hashes should be resized.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

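The implicit passing works because the work item is embedded in the per-netns structure, so the handler can recover its namespace with `container_of()`. Below is a user-space sketch; the `struct work_struct`, `struct net` and `hash_resize()` shown are hypothetical simplifications, and this `container_of()` omits the pointer-type check the kernel macro adds.

```c
#include <assert.h>
#include <stddef.h>

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical minimal work item: just a callback. */
struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Hypothetical per-netns data with the resize work embedded in it. */
struct net {
	int id;
	unsigned int hmask;
	struct work_struct hash_work;
};

/* The handler gets only the work pointer, yet knows which netns to resize. */
static void hash_resize(struct work_struct *work)
{
	struct net *net = container_of(work, struct net, hash_work);

	net->hmask = (net->hmask << 1) | 1;	/* grow this netns's mask only */
}

static void net_init(struct net *net, int id)
{
	net->id = id;
	net->hmask = 7;
	net->hash_work.func = hash_resize;
}
```

A real workqueue would invoke `hash_work.func` asynchronously; calling it directly on namespace `a` grows only `a`'s mask.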

netns xfrm: per-netns xfrm_state counts · 0bf7c5b0

Committed by Alexey Dobriyan on Nov 25, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns xfrm_state_hmask · 529983ec

Committed by Alexey Dobriyan on Nov 25, 2008

Since the hashtables are per-netns, they can be resized independently.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns xfrm_state_byspi hash · b754a4fd

Committed by Alexey Dobriyan on Nov 25, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns xfrm_state_bysrc hash · d320bbb3

Committed by Alexey Dobriyan on Nov 25, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns xfrm_state_bydst hash · 73d189dc

Committed by Alexey Dobriyan on Nov 25, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns xfrm: per-netns xfrm_state_all list · 9d4139c7

Committed by Alexey Dobriyan on Nov 25, 2008

This is done to get:
a) a simple "something leaked" check;
b) cover against possible DoS when another netns puts many, many
   xfrm_states onto the list;
c) no missed "alien xfrm_state" check in some list iterators in the future.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

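Point a), the leak check, becomes trivial once the list is per namespace: at namespace exit the list must be empty. The sketch below assumes a minimal `list_head` modeled on the kernel's intrusive list, plus hypothetical `netns_xfrm_init()`/`netns_xfrm_exit()` hooks; it only prints a message where kernel code would typically emit a warning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Minimal intrusive doubly linked list, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Hypothetical per-netns data: each namespace has its own state_all list. */
struct netns_xfrm {
	struct list_head state_all;
};

static void netns_xfrm_init(struct netns_xfrm *x)
{
	INIT_LIST_HEAD(&x->state_all);
}

/* On namespace exit, any entry still on this netns's list has leaked. */
static bool netns_xfrm_exit(struct netns_xfrm *x)
{
	if (!list_empty(&x->state_all)) {
		fprintf(stderr, "xfrm: states leaked in dying netns\n");
		return false;
	}
	return true;
}
```

With a single global list, the same check could not tell whether a leftover state belonged to the dying namespace or to a live one.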

netns xfrm: add struct xfrm_state::xs_net · 673c09be

Committed by Alexey Dobriyan on Nov 25, 2008

To avoid unnecessary complications with passing netns around.

* set once, very early after allocating
* once set, never changes

For a while, create every xfrm_state in init_net.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

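The set-once back-pointer described above can be sketched in plain C. The structures `struct net` and `struct state` and the helpers `state_alloc()`, `state_net()` and `state_insert()` are hypothetical stand-ins, not the kernel's definitions; the sketch only shows why storing the namespace in the object removes the need to pass it as an extra argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct net and struct xfrm_state. */
struct net {
	int id;
	unsigned int state_count;
};

struct state {
	struct net *xs_net;	/* set once right after allocation, never changed */
	int spi;
};

static struct state *state_alloc(struct net *net)
{
	struct state *x = calloc(1, sizeof(*x));

	if (!x)
		return NULL;
	x->xs_net = net;	/* the only assignment to xs_net */
	return x;
}

/* Callees derive the namespace from the object itself, so no extra
 * "struct net *" argument has to be threaded through every call. */
static inline struct net *state_net(const struct state *x)
{
	return x->xs_net;
}

static void state_insert(struct state *x)
{
	state_net(x)->state_count++;	/* accounts to the owning netns */
}
```

In this sketch `xs_net` is written only in the allocator, before the object is visible to anyone else, so later readers never see it change.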

netns xfrm: add netns boilerplate · d62ddc21

Committed by Alexey Dobriyan on Nov 25, 2008

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: tcp_limit_reno_sacked can become static · 8eecaba9

Committed by Ilpo Järvinen on Nov 25, 2008

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Nov 2008 · 3 commits

tcp: Try to restore large SKBs while SACK processing · 832d11c5

Committed by Ilpo Järvinen on Nov 24, 2008

During SACK processing, most of the benefits of TSO are eaten by
the SACK blocks that one-by-one fragment SKBs to MSS-sized chunks.
Then we're in trouble when cleanup work for them has to be done
when a large cumulative ACK comes. Try to return to the pre-split
state already while more and more SACK info gets discovered, by
combining newly discovered SACK areas with the previous skb if
that one is SACKed as well.

This approach has a number of benefits:

1) The processing overhead is spread more equally over the RTT
2) Write queue has fewer skbs to process (affects everything
   which has to walk in the queue past the sacked areas)
3) Write queue is consistent the whole time, so no other parts
   of TCP have to be aware of this (this was not the case with
   some other approach that was, well, quite intrusive all
   around).
4) Clean_rtx_queue can release most of the pages using a single
   put_page instead of the previous PAGE_SIZE/mss+1 calls

In case a hole is fully filled by the new SACK block, we attempt
to combine the next skb too, which allows construction of skbs
that are even larger than what TSO split them into, and it handles
the hole-on-every-nth-segment patterns that often occur during
slow-start overshoot pretty nicely. For this to be really useful,
though, a retransmission would also have to get lost, since
cumulative ACKs advance one hole at a time in the most typical case.

TODO: handle upward-only merging. That should be rather easy
when a segment is fully SACKed, but I'm leaving that as a future
work item (it won't make a very large difference anyway, since
this current approach already covers quite a lot of normal
cases).

I was earlier thinking of some sophisticated way of tracking
timestamps of the first and the last segment, but later on I
realized that it won't be necessary at all to store the
timestamp of the last segment. The cases that can occur are
basically either:
  1) ambiguous => no sensible measurement can be taken anyway
  2) non-ambiguous due to reordering => having the timestamp
     of the last segment there just skews things more than it
     does any good, since the ACK got triggered by one of
     the holes (besides some subtle issues that would make
     determining the right hole/skb an even harder problem).
     Anyway, it has nothing to do with this change then.

I chose to route some abnormal-looking cases with goto noop;
some could be handled differently (e.g., by stopping the walk
at that skb), but in general they either shouldn't happen at all
or are rare enough to make no difference in practice.

In theory this change (as a whole) could cause some macroscale
(global) regression because of cache misses that are taken over
the round-trip time, but it very likely gets better because of far
fewer (local) cache misses for other write queue walkers and for
the big cumulative ACK that clears recovery.

It is worth noting that these benefits would be very easy to get
also without TSO/GSO being on, as long as the data is in pages so
that we can merge them. Currently I won't let that happen because
DSACK splitting at a fragment would mess up pcounts due to
sk_can_gso in tcp_set_skb_tso_segs. Once DSACK fragmenting gets
avoided, some conditions can be made less strict.

TODO: I will probably have to convert the excessive pointer
passing to struct sacktag_state... :-)

My testing revealed that a considerable number of skbs couldn't
be shifted because they were cloned (most likely still awaiting
tx reclaim)...

[The rest is considered future work instead, since I repeatedly
got EFAULT from tcpdump's recvfrom when I added pskb_expand_head
to deal with clones, so I separated that into another, later patch]

...To counter that, I gave up on the fifth advantage:

5) When growing the previous SACK block, fewer allocs for new skbs
   are done; basically a new alloc is needed only when a new hole
   is detected and when the previous skb runs out of frags space

...which now only happens if reclaim is fast enough to dispose of
the clone before the SACK block comes in (the window is RTT long);
otherwise we'll have to alloc some.

With clones being handled I got these numbers (will be somewhat
worse without that), taken with fine-grained mibs:

                  TCPSackShifted 398
                   TCPSackMerged 877
            TCPSackShiftFallback 320
      TCPSACKCOLLAPSEFALLBACKGSO 0
  TCPSACKCOLLAPSEFALLBACKSKBBITS 0
  TCPSACKCOLLAPSEFALLBACKSKBDATA 0
    TCPSACKCOLLAPSEFALLBACKBELOW 0
    TCPSACKCOLLAPSEFALLBACKFIRST 1
 TCPSACKCOLLAPSEFALLBACKPREVBITS 318
      TCPSACKCOLLAPSEFALLBACKMSS 1
   TCPSACKCOLLAPSEFALLBACKNOHEAD 0
    TCPSACKCOLLAPSEFALLBACKSHIFT 0
          TCPSACKCOLLAPSENOOPSEQ 0
  TCPSACKCOLLAPSENOOPSMALLPCOUNT 0
     TCPSACKCOLLAPSENOOPSMALLLEN 0
             TCPSACKCOLLAPSEHOLE 12

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

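The core merging idea (fold a newly SACKed segment into an already SACKed, contiguous predecessor, and also absorb the successor when a hole is completely filled) can be modeled on an array of sequence ranges. This is a hypothetical sketch, not the kernel's implementation: real skbs carry pages, pcounts and timestamps, and partially covered segments, which the kernel may fragment, are simply skipped here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical write-queue entry standing in for an skb covering
 * sequence numbers [start, end). */
struct seg {
	unsigned int start, end;
	bool sacked;
};

/* Remove q[i] by shifting the tail left; returns the new length. */
static size_t seg_remove(struct seg *q, size_t n, size_t i)
{
	for (size_t j = i; j + 1 < n; j++)
		q[j] = q[j + 1];
	return n - 1;
}

/* Mark segments fully inside [sack_start, sack_end) as SACKed, merging
 * each into a SACKed contiguous predecessor; if that fills a hole, the
 * SACKed successor is absorbed as well. Returns the new queue length. */
static size_t sack_and_merge(struct seg *q, size_t n,
			     unsigned int sack_start, unsigned int sack_end)
{
	size_t i = 0;

	while (i < n) {
		if (q[i].sacked || q[i].start < sack_start || q[i].end > sack_end) {
			i++;	/* already SACKed, or not fully covered */
			continue;
		}
		q[i].sacked = true;
		if (i > 0 && q[i - 1].sacked && q[i - 1].end == q[i].start) {
			q[i - 1].end = q[i].end;	/* grow the previous SACKed block */
			n = seg_remove(q, n, i);
			i--;
		}
		if (i + 1 < n && q[i + 1].sacked && q[i].end == q[i + 1].start) {
			q[i].end = q[i + 1].end;	/* hole filled: absorb successor */
			n = seg_remove(q, n, i + 1);
		}
		i++;
	}
	return n;
}
```

In the first test case a SACK for [10, 20) fills the only hole between two SACKed blocks, so three queue entries collapse into one; in the second, a growing SACK block keeps extending the previous entry.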

tcp: move tcp_simple_retransmit to tcp_input · e1aa680f

Committed by Ilpo Järvinen on Nov 24, 2008

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: avoid a pair of dst_hold()/dst_release() in ip_append_data() · 2e77d89b

Committed by Eric Dumazet on Nov 24, 2008

We can reduce pressure on the dst entry refcount that slows down the UDP
transmit path on SMP machines. This pressure is visible on RTP servers
delivering content to media gateways, especially big ones handling
thousands of streams. Several CPUs send UDP frames to the same
destination and hence use the same dst entry.

This patch makes ip_append_data() eventually steal the refcount its
callers had to take on the dst entry.

This doesn't avoid all refcounting, but it still gives speedups on SMP
on the UDP/RAW transmit path.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

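The ownership change can be modeled with an atomic reference count. In the sketch below, `struct dst`, `dst_hold()`, `dst_release()`, the two `append_data_*` variants and their callers are hypothetical; an extra counter records how many atomic read-modify-write operations touch the shared refcount. The "steal" variant takes over the caller's reference instead of taking its own, so the hold/release pair disappears. The conditions under which the real ip_append_data() steals the reference are not reproduced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a dst entry shared by many CPUs. */
struct dst {
	atomic_int refcnt;
	atomic_int refcnt_ops;	/* counts atomic ops on refcnt (illustration only) */
};

static void dst_init(struct dst *d)
{
	atomic_init(&d->refcnt, 1);	/* the reference the route lookup handed out */
	atomic_init(&d->refcnt_ops, 0);
}

static void dst_hold(struct dst *d)
{
	atomic_fetch_add(&d->refcnt, 1);
	atomic_fetch_add(&d->refcnt_ops, 1);
}

static void dst_release(struct dst *d)
{
	atomic_fetch_sub(&d->refcnt, 1);
	atomic_fetch_add(&d->refcnt_ops, 1);
}

struct cork {
	struct dst *dst;	/* reference owned by the cork */
};

/* Old style: the callee takes its own reference... */
static void append_data_hold(struct cork *c, struct dst *d)
{
	dst_hold(d);
	c->dst = d;
}

/* New style: the callee steals the caller's reference and clears the
 * caller's pointer so it cannot be released twice. */
static void append_data_steal(struct cork *c, struct dst **dp)
{
	c->dst = *dp;
	*dp = NULL;
}

static void send_old(struct cork *c, struct dst *rt)
{
	append_data_hold(c, rt);
	dst_release(rt);	/* ...and the caller drops the lookup reference */
}

static void send_new(struct cork *c, struct dst *rt)
{
	append_data_steal(c, &rt);
	assert(rt == NULL);	/* ownership moved; nothing left to release */
}
```

Both paths end with the same refcount, but the old one performs two atomic operations on a cache line that every sending CPU shares.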

24 Nov 2008 · 1 commit

net: Convert TCP/DCCP listening hash tables to use RCU · c25eb3bf

Committed by Eric Dumazet on Nov 23, 2008

This is the last step to be able to perform full RCU lookups
in __inet_lookup(): after the established/timewait tables, we
add RCU lookups to the listening hash table.

The only trick here is that a socket of a given type (TCP ipv4,
TCP ipv6, ...) can now move between two different tables
(established and listening) during an RCU grace period, so we
must use different 'nulls' end-of-chain values for the two tables.

We define a large value:

#define LISTENING_NULLS_BASE (1U << 29)

so that slots in the listening table are guaranteed to have different
end-of-chain values than slots in the established table. A reader can
still detect whether it finished its lookup in the right chain.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

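The nulls trick can be sketched with the same encoding the kernel's `hlist_nulls` lists use: an end-of-chain marker is an odd value whose upper bits carry a per-chain number, recovered by shifting right by one. The node type, `chain_lookup()` and the restart convention below are hypothetical simplifications of what an RCU reader does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LISTENING_NULLS_BASE (1U << 29)	/* value from the commit message */

struct node {
	struct node *next;	/* next node, or a nulls marker at chain end */
	int key;
};

/* Encode a chain number as an odd "pointer" marking the end of a chain. */
static struct node *make_nulls(uintptr_t value)
{
	return (struct node *)((value << 1) | 1UL);
}

static bool is_a_nulls(const struct node *p)
{
	return ((uintptr_t)p & 1UL) != 0;
}

static uintptr_t get_nulls_value(const struct node *p)
{
	return (uintptr_t)p >> 1;
}

enum lookup_result { LOOKUP_FOUND, LOOKUP_MISS, LOOKUP_RESTART };

/* Walk one chain. Ending on the wrong nulls marker means an entry we
 * followed was moved to another chain mid-walk, so the caller restarts. */
static enum lookup_result chain_lookup(const struct node *head, int key,
				       uintptr_t expected_nulls)
{
	const struct node *p;

	for (p = head; !is_a_nulls(p); p = p->next)
		if (p->key == key)
			return LOOKUP_FOUND;
	return get_nulls_value(p) == expected_nulls ? LOOKUP_MISS : LOOKUP_RESTART;
}
```

Offsetting listening slots by `LISTENING_NULLS_BASE` is what makes the last test case detectable: a reader that started in established slot 3 but followed a socket into listening slot 3 ends on a different marker and restarts, whereas with equal markers it would wrongly report a miss.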

22 Nov 2008 · 8 commits

WAN: syncppp.c is no longer used by any kernel code. Remove it. · 72364706

Committed by Krzysztof Hałasa on Aug 14, 2008

Signed-off-by: Krzysztof Hałasa <khc@pm.waw.pl>

net: use net_eq() in INET_MATCH and INET_TW_MATCH · f757fec4

Committed by Eric Dumazet on Nov 21, 2008

We can avoid some useless instructions if !CONFIG_NET_NS.

Because of RCU, we use INET_MATCH or INET_TW_MATCH twice for the found
socket, so that's six fewer instructions per incoming TCP packet.

Yet another tbench speedup :)

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

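A sketch of why `net_eq()` saves instructions: when namespaces are compiled out, the comparison becomes a constant `true`, and the compiler can drop it from match expressions like INET_MATCH. The `struct sock_key` and `sock_match()` below are hypothetical; only the two-branch shape of `net_eq()` mirrors the idea in the commit.

```c
#include <assert.h>
#include <stdbool.h>

struct net {
	int id;
};

/* With namespaces compiled in, compare pointers; without them there is
 * only one namespace, so the result is a compile-time constant. */
#ifdef CONFIG_NET_NS
static inline bool net_eq(const struct net *a, const struct net *b)
{
	return a == b;
}
#else
static inline bool net_eq(const struct net *a, const struct net *b)
{
	(void)a;
	(void)b;
	return true;
}
#endif

/* Hypothetical lookup key, loosely in the spirit of INET_MATCH. */
struct sock_key {
	const struct net *net;
	unsigned int saddr, daddr;
};

static inline bool sock_match(const struct sock_key *sk, const struct net *net,
			      unsigned int saddr, unsigned int daddr)
{
	/* When net_eq() is constant true, this first test compiles away. */
	return net_eq(sk->net, net) && sk->saddr == saddr && sk->daddr == daddr;
}
```

The test assumes a build where `CONFIG_NET_NS` is not defined, so a socket matches regardless of the namespace pointer passed in.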

wireless: missing include in lib80211.h · a1eb5fe3

Committed by Rami Rosen on Nov 19, 2008

This patch adds #include <linux/timer.h> in lib80211.h to avoid
these compilation errors:

> In file included from /work/src/wireless-testing/net/wireless/lib80211.c:24:
> /work/src/wireless-testing/include/net/lib80211.h:113: error: field
> 'crypt_deinit_timer' has incomplete type
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_info_init':
> /work/src/wireless-testing/net/wireless/lib80211.c:83: error: implicit
> declaration of function 'setup_timer'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_info_free':
> /work/src/wireless-testing/net/wireless/lib80211.c:95: error: implicit
> declaration of function 'del_timer_sync'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_deinit_handler':
> /work/src/wireless-testing/net/wireless/lib80211.c:157: error:
> implicit declaration of function 'add_timer'
> /work/src/wireless-testing/net/wireless/lib80211.c: In function
> 'lib80211_crypt_delayed_deinit':
> /work/src/wireless-testing/net/wireless/lib80211.c:182: error:
> implicit declaration of function 'timer_pending'
> make[3]: *** [net/wireless/lib80211.o] Error 1
> make[2]: *** [net/wireless] Error 2
> make[1]: *** [net] Error 2
> make: *** [sub-make] Error 2

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: add explicit padding in struct ieee80211_tx_info · 62727101

Committed by John W. Linville on Nov 12, 2008

Otherwise, the BUILD_BUG_ON calls in ieee80211_tx_info_clear_status can
fail on some architectures.

Signed-off-by: John W. Linville <linville@tuxdriver.com>

lib80211: consolidate crypt init routines · 2ba4b32e

Committed by John W. Linville on Nov 11, 2008

Signed-off-by: John W. Linville <linville@tuxdriver.com>

lib80211: absorb crypto bits from net/ieee80211 · 274bfb8d

Committed by John W. Linville on Oct 29, 2008

These bits are already shared between ipw2x00 and hostap, and could
probably be shared both more cleanly and with other drivers. This
commit simply relocates the code to lib80211 and adjusts the drivers
appropriately.

Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: remove more excess kernel-doc · 0ed94eaa

Committed by Randy Dunlap on Nov 07, 2008

Delete kernel-doc struct descriptions for fields that don't exist:

Warning(include/net/mac80211.h:1263): Excess struct/union/enum/typedef member 'conf_ht' description in 'ieee80211_ops'
Warning(net/mac80211/sta_info.h:309): Excess struct/union/enum/typedef member 'addr' description in 'sta_info'
Warning(net/mac80211/sta_info.h:309): Excess struct/union/enum/typedef member 'aid' description in 'sta_info'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Johannes Berg <johannes@sipsolutions.net>
cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

mac80211: fix BUILD_BUG_ON() caused by misalignment on arm · 4821277f

Committed by Felix Fietkau on Nov 03, 2008

On ARM, alignment is done slightly differently from other architectures.
struct ieee80211_tx_rate is aligned to word size, even though it only has 3
single-byte members, which triggers the BUILD_BUG_ON in
ieee80211_tx_info_clear_status.

This patch marks struct ieee80211_tx_rate as packed, so that ARM
behaves like the other architectures.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

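The alignment problem can be reproduced with a three-byte struct and a compile-time size check. This sketch assumes GCC or Clang, since `__attribute__((packed))` is a compiler extension, and uses C11 `_Static_assert` in place of the kernel's BUILD_BUG_ON; the struct, field names and `rates_bytes()` helper are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Three single-byte members, as the commit message describes for
 * struct ieee80211_tx_rate (names here are illustrative). */
struct tx_rate {
	int8_t idx;
	uint8_t count;
	uint8_t flags;
} __attribute__((packed));	/* GCC/Clang extension: no tail padding */

/* Compile-time check in the spirit of BUILD_BUG_ON(): the build fails
 * if an ABI pads the struct beyond 3 bytes. */
_Static_assert(sizeof(struct tx_rate) == 3, "struct tx_rate must be 3 bytes");

/* An array of rates then occupies exactly 3 bytes per entry. */
static inline unsigned int rates_bytes(unsigned int n)
{
	return (unsigned int)(n * sizeof(struct tx_rate));
}
```

Without the attribute, an ABI that rounds struct sizes up to a word would make `sizeof(struct tx_rate)` equal 4 and the static assertion would fail at compile time, which is the kind of build break the commit describes.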

21 Nov 2008 · 6 commits

DCB: Add support for DCB BCN · 859ee3c4

Committed by Alexander Duyck on Nov 20, 2008

Adds an interface to configure the Backward Congestion Notification
(BCN) feature. In a BCN-capable network, congestion notifications
from congested points out in the network can cause the end station
to limit the rate of a given traffic flow.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

DCB: Add interface to query the state of PFC feature. · 0eb3aa9b

Committed by Alexander Duyck on Nov 20, 2008

Adds a netlink interface for Data Center Bridging (DCB) to get and set
the enable state of the Priority Flow Control (PFC) feature.
Primarily, this is a way to turn off PFC in the driver while DCB
remains enabled.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

DCB: Add interface to query # of TCs supported by device · 33dbabc4

Committed by Alexander Duyck on Nov 20, 2008

Adds an interface for Data Center Bridging (DCB) to query (and set, if
supported) the number of traffic classes currently supported by the
device for the two DCB features: priority groups (PG) and priority
flow control (PFC).

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

DCB: Add interface to query for the DCB capabilities of a device. · 46132188

Committed by Alexander Duyck on Nov 20, 2008

Adds to the netlink interface for Data Center Bridging (DCB), allowing
the DCB capabilities supported by a device to be queried.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: this patch adds support for DCB to the kernel and ixgbe driver · 2f90b865

Committed by Alexander Duyck on Nov 20, 2008

This adds support for Data Center Bridging (DCB) features in the ixgbe
driver and adds an rtnetlink interface for configuring DCB to the
kernel. The DCB features supported are Priority Grouping (PG),
which allows bandwidth guarantees to be allocated to groups of traffic
based on the 802.1q priority, and Priority-based Flow Control (PFC),
which introduces a new MAC control PAUSE frame that works at the
granularity of the 802.1p priority instead of the link (IEEE 802.3x).

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: convert TCP/DCCP ehash rwlocks to spinlocks · 9db66bdc

Committed by Eric Dumazet on Nov 20, 2008

Now that TCP & DCCP use RCU lookups, we can convert ehash rwlocks to spinlocks.

/proc/net/tcp and other seq_file 'readers' can safely be converted to 'writers'.

This should speed up writers, since spin_lock()/spin_unlock()
only use one atomic operation instead of two for write_lock()/write_unlock().

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Nov 2008 · 2 commits

net: listening_hash get a spinlock per bucket · 5caea4ea

Committed by Eric Dumazet on Nov 20, 2008

This patch prepares the RCU migration of the listening_hash table for
the TCP/DCCP protocols.

Since the listening_hash table is small (32 slots per protocol), we add
a spinlock for each slot instead of a single rwlock for the whole table.

This should reduce readers' hold time and improve writer concurrency.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

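Per-bucket locking can be sketched in user space with a tiny test-and-set spinlock built on C11 `atomic_flag`. The 32-slot size comes from the commit message; the bucket structure, the port-based hash and all function names are hypothetical, and a kernel spinlock is more elaborate than this busy-wait loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define LHTABLE_SIZE 32	/* "32 slots per protocol", per the commit message */

/* A minimal test-and-set spinlock built on C11 atomic_flag. */
typedef struct {
	atomic_flag locked;
} spinlock_t;

static void spin_lock(spinlock_t *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		;	/* busy-wait until the holder clears the flag */
}

static void spin_unlock(spinlock_t *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical listening socket, hashed by port. */
struct sock_node {
	struct sock_node *next;
	unsigned int port;
};

/* One lock per bucket instead of one rwlock for the whole table. */
struct listen_bucket {
	spinlock_t lock;
	struct sock_node *head;
};

static struct listen_bucket listening_hash[LHTABLE_SIZE];

static void listen_table_init(void)
{
	for (size_t i = 0; i < LHTABLE_SIZE; i++) {
		atomic_flag_clear(&listening_hash[i].lock.locked);
		listening_hash[i].head = NULL;
	}
}

static void listen_add(struct sock_node *sk)
{
	struct listen_bucket *b = &listening_hash[sk->port & (LHTABLE_SIZE - 1)];

	spin_lock(&b->lock);	/* only this bucket is serialized */
	sk->next = b->head;
	b->head = sk;
	spin_unlock(&b->lock);
}
```

Writers adding sockets whose ports land in different buckets take different locks and never contend, whereas a single table-wide rwlock would serialize them.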

include/net net/ - csum_partial - remove unnecessary casts · 07f0757a

Committed by Joe Perches on Nov 19, 2008

The first argument to csum_partial is const void *, so
casts to char/u8 * are not necessary.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

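Why the casts are unnecessary: in C, any object pointer converts implicitly to `const void *`. The hypothetical `ones_complement_sum()` below is not the kernel's `csum_partial()` (which returns an unfolded 32-bit partial sum); it is a plain RFC 1071 style 16-bit ones' complement sum, used only to show a checksum helper that accepts any buffer without a cast. The first test bytes follow the numerical example in RFC 1071.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 16-bit ones' complement sum over big-endian words.
 * The const void * parameter lets callers pass any object pointer. */
static uint16_t ones_complement_sum(const void *buf, size_t len)
{
	const uint8_t *p = buf;	/* implicit conversion, no cast needed */
	uint32_t sum = 0;

	while (len > 1) {
		sum += (uint32_t)(p[0] << 8 | p[1]);	/* one 16-bit word */
		p += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)p[0] << 8;	/* pad an odd trailing byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold carries back in */
	return (uint16_t)sum;
}
```

The checksum placed in a header is the complement of this sum; for the RFC 1071 bytes the sum is `0xddf2` and its complement `0x220d`.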

OpenHarmony / kernel_linux · last synced more than 3 years ago
