提交 · 0331b1f383e1fa4049f8e75cafeea8f006171c64 · openeuler / raspberrypi-kernel

26 11月, 2008 29 次提交

netns xfrm: add struct xfrm_policy::xp_net · 0331b1f3

由 Alexey Dobriyan 提交于 11月 25, 2008

Again, to avoid complications with passing netns when not necessary.
Again, ->xp_net is set-once field, once set it never changes.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0331b1f3

netns xfrm: per-netns km_waitq · 50a30657

由 Alexey Dobriyan 提交于 11月 25, 2008

Disallow spurious wakeups in __xfrm_lookup().
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

50a30657

netns xfrm: per-netns state GC work · c7837144

由 Alexey Dobriyan 提交于 11月 25, 2008

State GC is per-netns, and this is part of it.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c7837144

netns xfrm: per-netns state GC list · b8a0ae20

由 Alexey Dobriyan 提交于 11月 25, 2008

km_waitq is going to be made per-netns to disallow spurious wakeups
in __xfrm_lookup().

To not wakeup after every garbage-collected xfrm_state (which potentially
can be from different netns) make state GC list per-netns.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b8a0ae20

netns xfrm: per-netns xfrm_hash_work · 63082733

由 Alexey Dobriyan 提交于 11月 25, 2008

All of this is implicit passing which netns's hashes should be resized.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63082733

netns xfrm: per-netns xfrm_state counts · 0bf7c5b0

由 Alexey Dobriyan 提交于 11月 25, 2008

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0bf7c5b0

netns xfrm: per-netns xfrm_state_hmask · 529983ec

由 Alexey Dobriyan 提交于 11月 25, 2008

Since hashtables are per-netns, they can be independently resized.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

529983ec

netns xfrm: per-netns xfrm_state_byspi hash · b754a4fd

由 Alexey Dobriyan 提交于 11月 25, 2008

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b754a4fd

netns xfrm: per-netns xfrm_state_bysrc hash · d320bbb3

由 Alexey Dobriyan 提交于 11月 25, 2008

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d320bbb3

netns xfrm: per-netns xfrm_state_bydst hash · 73d189dc

由 Alexey Dobriyan 提交于 11月 25, 2008

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73d189dc

netns xfrm: per-netns xfrm_state_all list · 9d4139c7

由 Alexey Dobriyan 提交于 11月 25, 2008

This is done to get
a) simple "something leaked" check
b) cover possible DoSes when other netns puts many, many xfrm_states
   onto a list.
c) not miss "alien xfrm_state" check in some of list iterators in future.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d4139c7

netns xfrm: add struct xfrm_state::xs_net · 673c09be

由 Alexey Dobriyan 提交于 11月 25, 2008

To avoid unnecessary complications with passing netns around.

* set once, very early after allocating
* once set, never changes

For a while create every xfrm_state in init_net.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

673c09be

netns xfrm: add netns boilerplate · d62ddc21

由 Alexey Dobriyan 提交于 11月 25, 2008

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d62ddc21

xfrm: initialise xfrm_policy_gc_work statically · c9583969

由 Alexey Dobriyan 提交于 11月 25, 2008

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9583969

bluetooth: fix warning in net/bluetooth/rfcomm/sock.c · 45555c0e

由 Ingo Molnar 提交于 11月 25, 2008

fix this warning:

  net/bluetooth/rfcomm/sock.c: In function ‘rfcomm_sock_ioctl’:
  net/bluetooth/rfcomm/sock.c:795: warning: unused variable ‘sk’

perhaps BT_DEBUG() should be improved to do printf format checking
instead of the #ifdef, but that looks quite intrusive: each bluetooth
.c file undefines the macro.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45555c0e

sunrpc: fix warning in net/sunrpc/xprtrdma/verbs.c · ff0db049

由 Ingo Molnar 提交于 11月 25, 2008

fix this warning:

  net/sunrpc/xprtrdma/verbs.c: In function ‘rpcrdma_conn_upcall’:
  net/sunrpc/xprtrdma/verbs.c:279: warning: unused variable ‘addr’
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff0db049

ax25: fix warning in net/ax25/sysctl_net_ax25.c · e14bec2e

由 Ingo Molnar 提交于 11月 25, 2008

fix this warning:

  net/ax25/sysctl_net_ax25.c:27: warning: ‘min_ds_timeout’ defined but not used
  net/ax25/sysctl_net_ax25.c:27: warning: ‘max_ds_timeout’ defined but not used

These are only used in the CONFIG_AX25_DAMA_SLAVE case.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e14bec2e

dccp: fix warning in net/dccp/options.c · 3ed7cc0f

由 Ingo Molnar 提交于 11月 25, 2008

this warning:

  net/dccp/options.c: In function ‘dccp_parse_options’:
  net/dccp/options.c:67: warning: ‘value’ may be used uninitialized in this function

is a bogus GCC warning. The compiler does not recognize the relation
between "value" and "mandatory" variables: the code flow can ever reach
the "out_invalid_option:" label if 'mandatory' is set to 1, and when
'mandatory' is non-zero, we'll always have 'value' initialized.

Help out the compiler by annotating the variable.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ed7cc0f

dsa: fix warning in net/dsa/mv88e6060.c · d3f644da

由 Ingo Molnar 提交于 11月 25, 2008

this warning:

  net/dsa/mv88e6060.c: In function ‘mv88e6060_poll_link’:
  net/dsa/mv88e6060.c:225: warning: ‘port_status’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between 'link' and 'port_status'.

Annotate it.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3f644da

dsa: fix warning in net/dsa/mv88e6xxx.c · 2a9e7978

由 Ingo Molnar 提交于 11月 25, 2008

this warning:

  net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_poll_link’:
  net/dsa/mv88e6xxx.c:361: warning: ‘port_status’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between 'link' and 'port_status'.

Annotate it.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2a9e7978

ipv6: fix warning in net/ipv6/ip6_flowlabel.c · 55205d40

由 Ingo Molnar 提交于 11月 25, 2008

this warning:

  net/ipv6/ip6_flowlabel.c: In function ‘ipv6_flowlabel_opt’:
  net/ipv6/ip6_flowlabel.c:467: warning: ‘err’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between fl_create() and 'err'.

Annotate it.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

55205d40

pkt_sched: fix warning in net/sched/sch_hfsc.c · dc0a0011

由 Ingo Molnar 提交于 11月 25, 2008

this warning:

  net/sched/sch_hfsc.c: In function ‘hfsc_enqueue’:
  net/sched/sch_hfsc.c:1577: warning: ‘err’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) error flow
between hfsc_classify(), 'cl' and 'err'.

Annotate it.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dc0a0011

sunrpc: fix warning in net/sunrpc/xprtrdma/svc_rdma_transport.c · ed72b9c6

由 Ingo Molnar 提交于 11月 25, 2008

this warning:

  net/sunrpc/xprtrdma/svc_rdma_transport.c: In function ‘svc_rdma_accept’:
  net/sunrpc/xprtrdma/svc_rdma_transport.c:830: warning: ‘dma_mr_acc’ may be used uninitialized in this function

triggers because GCC does not recognize the (correct) flow connection
between need_dma_mr and dma_mr_acc.

Annotate it.
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ed72b9c6

netns: filter out uevent not belonging to init_net · 09bb5217

由 Daniel Lezcano 提交于 11月 25, 2008

This patch will filter out the uevent not related to the init_net.
Without this patch if a network device is created in a network
namespace with the same name as one network device belonging to the
initial network namespace (eg. eth0), when the network namespace
will die and the network device will be destroyed, an event will
be sent and catched by the udevd daemon. That will result to have
the real network device to be shutdown because the udevd/uevent are
not namespace aware.
Signed-off-by: NDaniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09bb5217

tcp: skb_shift cannot cache frag ptrs past pskb_expand_head · 9f782db3

由 Ilpo Järvinen 提交于 11月 25, 2008

Since pskb_expand_head creates copy of the shared area we
cannot keep any frag ptr past de-cloning. This fixes the
tcpdump recvfrom -EFAULT problem.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9f782db3

pkt_sched: sch_api: Remove qdisc_list_lock · f6486d40

由 Jarek Poplawski 提交于 11月 25, 2008

After implementing qdisc->ops->peek() there is no more calling
qdisc_tree_decrease_qlen() without rtnl_lock(), so qdisc_list_lock
added by commit: f6e0b239 "pkt_sched:
Fix qdisc list locking" can be removed.
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f6486d40

net: udp_unhash() can test if sk is hashed · 723b4610

由 Eric Dumazet 提交于 11月 25, 2008

Impact: Optimization

Like done in inet_unhash(), we can avoid taking a chain lock if
socket is not hashed in udp_unhash()

Triggered by close(socket(AF_INET, SOCK_DGRAM, 0));
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

723b4610

net: Make sure BHs are disabled in sock_prot_inuse_add() · 5bc0b3bf

由 Eric Dumazet 提交于 11月 25, 2008

prot->destroy is not called with BH disabled. So we must add
explicit BH disable around call to sock_prot_inuse_add()
in sctp_destroy_sock()
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5bc0b3bf

tcp: tcp_limit_reno_sacked can become static · 8eecaba9

由 Ilpo Järvinen 提交于 11月 25, 2008

Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8eecaba9

25 11月, 2008 11 次提交

xfrm: remove useless forward declarations · fb7e0674

由 Alexey Dobriyan 提交于 11月 25, 2008

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb7e0674

ah4/ah6: remove useless NULL assignments · 6daad372

由 Alexey Dobriyan 提交于 11月 25, 2008

struct will be kfreed in a moment, so...
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6daad372

DCB: fix kconfig option · 7a6b6f51

由 Jeff Kirsher 提交于 11月 25, 2008

Since the netlink option for DCB is necessary to actually be useful,
simplified the Kconfig option.  In addition, added useful help text for the
Kconfig option.
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a6b6f51

tcp: handle shift/merge of cloned skbs too · 0ace2856

由 Ilpo Järvinen 提交于 11月 24, 2008

This caused me to get repeatably:

  tcpdump: pcap_loop: recvfrom: Bad address

Happens occassionally when I tcpdump my for-looped test xfers:
  while [ : ]; do echo -n "$(date '+%s.%N') "; ./sendfile; sleep 20; done

Rest of the relevant commands:
  ethtool -K eth0 tso off
  tc qdisc add dev eth0 root netem drop 4%
  tcpdump -n -s0 -i eth0 -w sacklog.all

Running net-next under kvm, connection goes to the same host
(basically just out of kvm). The connection itself works ok
and data gets sent without corruption even with a large
number of tests while tcpdump fails usually within less than
5 tests.

Whether it only happens because of this change or not, I
don't know for sure but it's the only thing with which
I've seen that error. The non-cloned variant works w/o it
for much longer time. I'm yet to debug where the error
actually comes from.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ace2856

tcp: add some mibs to track collapsing · 111cc8b9

由 Ilpo Järvinen 提交于 11月 24, 2008

Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

111cc8b9

tcp: Make shifting not clear the hints · 92ee76b6

由 Ilpo Järvinen 提交于 11月 24, 2008

The earlier version was just very basic one which is "playing
safe" by always clearing the hints. However, clearing of a hint
is extremely costly operation with large windows, so it must be
avoided at all cost whenever possible, there is a way with
shifting too achieve not-clearing.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92ee76b6

tcp: Try to restore large SKBs while SACK processing · 832d11c5

由 Ilpo Järvinen 提交于 11月 24, 2008

During SACK processing, most of the benefits of TSO are eaten by
the SACK blocks that one-by-one fragment SKBs to MSS sized chunks.
Then we're in problems when cleanup work for them has to be done
when a large cumulative ACK comes. Try to return back to pre-split
state already while more and more SACK info gets discovered by
combining newly discovered SACK areas with the previous skb if
that's SACKed as well.

This approach has a number of benefits:

1) The processing overhead is spread more equally over the RTT
2) Write queue has less skbs to process (affect everything
   which has to walk in the queue past the sacked areas)
3) Write queue is consistent whole the time, so no other parts
   of TCP has to be aware of this (this was not the case with
   some other approach that was, well, quite intrusive all
   around).
4) Clean_rtx_queue can release most of the pages using single
   put_page instead of previous PAGE_SIZE/mss+1 calls

In case a hole is fully filled by the new SACK block, we attempt
to combine the next skb too which allows construction of skbs
that are even larger than what tso split them to and it handles
hole per on every nth patterns that often occur during slow start
overshoot pretty nicely. Though this to be really useful also
a retransmission would have to get lost since cumulative ACKs
advance one hole at a time in the most typical case.

TODO: handle upwards only merging. That should be rather easy
when segment is fully sacked but I'm leaving that as future
work item (it won't make very large difference anyway since
this current approach already covers quite a lot of normal
cases).

I was earlier thinking of some sophisticated way of tracking
timestamps of the first and the last segment but later on
realized that it won't be that necessary at all to store the
timestamp of the last segment. The cases that can occur are
basically either:
  1) ambiguous => no sensible measurement can be taken anyway
  2) non-ambiguous is due to reordering => having the timestamp
     of the last segment there is just skewing things more off
     than does some good since the ack got triggered by one of
     the holes (besides some substle issues that would make
     determining right hole/skb even harder problem). Anyway,
     it has nothing to do with this change then.

I choose to route some abnormal looking cases with goto noop,
some could be handled differently (eg., by stopping the
walking at that skb but again). In general, they either
shouldn't happen at all or are rare enough to make no difference
in practice.

In theory this change (as whole) could cause some macroscale
regression (global) because of cache misses that are taken over
the round-trip time but it gets very likely better because of much
less (local) cache misses per other write queue walkers and the
big recovery clearing cumulative ack.

Worth to note that these benefits would be very easy to get also
without TSO/GSO being on as long as the data is in pages so that
we can merge them. Currently I won't let that happen because
DSACK splitting at fragment that would mess up pcounts due to
sk_can_gso in tcp_set_skb_tso_segs. Once DSACKs fragments gets
avoided, we have some conditions that can be made less strict.

TODO: I will probably have to convert the excessive pointer
passing to struct sacktag_state... :-)

My testing revealed that considerable amount of skbs couldn't
be shifted because they were cloned (most likely still awaiting
tx reclaim)...

[The rest is considering future work instead since I got
repeatably EFAULT to tcpdump's recvfrom when I added
pskb_expand_head to deal with clones, so I separated that
into another, later patch]

...To counter that, I gave up on the fifth advantage:

5) When growing previous SACK block, less allocs for new skbs
   are done, basically a new alloc is needed only when new hole
   is detected and when the previous skb runs out of frags space

...which now only happens of if reclaim is fast enough to dispose
the clone before the SACK block comes in (the window is RTT long),
otherwise we'll have to alloc some.

With clones being handled I got these numbers (will be somewhat
worse without that), taken with fine-grained mibs:

                  TCPSackShifted 398
                   TCPSackMerged 877
            TCPSackShiftFallback 320
      TCPSACKCOLLAPSEFALLBACKGSO 0
  TCPSACKCOLLAPSEFALLBACKSKBBITS 0
  TCPSACKCOLLAPSEFALLBACKSKBDATA 0
    TCPSACKCOLLAPSEFALLBACKBELOW 0
    TCPSACKCOLLAPSEFALLBACKFIRST 1
 TCPSACKCOLLAPSEFALLBACKPREVBITS 318
      TCPSACKCOLLAPSEFALLBACKMSS 1
   TCPSACKCOLLAPSEFALLBACKNOHEAD 0
    TCPSACKCOLLAPSEFALLBACKSHIFT 0
          TCPSACKCOLLAPSENOOPSEQ 0
  TCPSACKCOLLAPSENOOPSMALLPCOUNT 0
     TCPSACKCOLLAPSENOOPSMALLLEN 0
             TCPSACKCOLLAPSEHOLE 12
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

832d11c5

tcp: make tcp_sacktag_one able to handle partial skb too · f58b22fd

由 Ilpo Järvinen 提交于 11月 24, 2008

This is preparatory work for SACK combiner patch which may
have to count TCP state changes for only a part of the skb
because it will intentionally avoids splitting skb to SACKed
and not sacked parts.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f58b22fd

tcp: Make SACK code to split only at mss boundaries · adb92db8

由 Ilpo Järvinen 提交于 11月 24, 2008

Sadly enough, this adds possible divide though we try to avoid
it by checking one mss as common case.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

adb92db8

tcp: more aggressive skipping · e8bae275

由 Ilpo Järvinen 提交于 11月 24, 2008

I knew already when rewriting the sacktag that this condition
was too conservative, change it now since it prevent lot of
useless work (especially in the sack shifter decision code
that is being added by a later patch). This shouldn't change
anything really, just save some processing regardless of the
shifter.
Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e8bae275

tcp: move tcp_simple_retransmit to tcp_input · e1aa680f

由 Ilpo Järvinen 提交于 11月 24, 2008

Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e1aa680f