提交 · 2948d2ebbb98747b912ac6d0c864b4d02be8a6f5 · Linux-御风守护者 / linux

12 1月, 2008 1 次提交

[NETFILTER]: bridge: fix double POST_ROUTING invocation · 2948d2eb

由 Patrick McHardy 提交于 1月 11, 2008

The bridge code incorrectly causes two POST_ROUTING hook invocations
for DNATed packets that end up on the same bridge device. This
happens because packets with a changed destination address are passed
to dst_output() to make them go through the neighbour output function
again to build a new destination MAC address, before they will continue
through the IP hooks simulated by bridge netfilter.

The resulting hook order is:
 PREROUTING	(bridge netfilter)
 POSTROUTING	(dst_output -> ip_output)
 FORWARD	(bridge netfilter)
 POSTROUTING	(bridge netfilter)

The deferred hooks used to abort the first POST_ROUTING invocation,
but since the only thing bridge netfilter actually really wants is
a new MAC address, we can avoid going through the IP stack completely
by simply calling the neighbour output function directly.

Tested, reported and lots of data provided by: Damien Thebault <damien.thebault@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2948d2eb

11 1月, 2008 6 次提交

[NETFILTER]: xt_helper: Do not bypass RCU · 0ff4d77b

由 Jan Engelhardt 提交于 1月 10, 2008

Use the @helper variable that was just obtained.
Signed-off-by: NJan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ff4d77b

[NETFILTER]: ip6t_eui64: Fixes calculation of Universal/Local bit · 8f41f017

由 Yasuyuki Kozakai 提交于 1月 10, 2008

RFC2464 says that the next to lowerst order bit of the first octet
of the Interface Identifier is formed by complementing
the Universal/Local bit of the EUI-64. But ip6t_eui64 uses OR not XOR.

Thanks Peter Ivancik for reporing this bug and posting a patch
for it.
Signed-off-by: NYasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f41f017

[VLAN]: nested VLAN: fix lockdep's recursive locking warning · 0fe1e567

由 Jarek Poplawski 提交于 1月 10, 2008

Allow vlans nesting other vlans without lockdep's warnings (max. 2 levels
i.e. parent + child). Thanks to Patrick McHardy for pointing a bug in the
first version of this patch.

Reported-by: Benny Amorsen
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0fe1e567

[DECNET] ROUTE: fix rcu_dereference() uses in /proc/net/decnet_cache · 0d89d794

由 Eric Dumazet 提交于 1月 10, 2008

In dn_rt_cache_get_next(), no need to guard seq->private by a
rcu_dereference() since seq is private to the thread running this
function. Reading seq.private once (as guaranted bu rcu_dereference())
or several time if compiler really is dumb enough wont change the
result.
 
But we miss real spots where rcu_dereference() are needed, both in
dn_rt_cache_get_first() and dn_rt_cache_get_next()
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d89d794

[BLUETOOTH]: rfcomm tty BUG_ON() code fix · f951375d

由 Dave Young 提交于 1月 10, 2008

1) In tty.c the BUG_ON at line 115 will never be called, because the the
   before list_del_init in this same function.
	115          BUG_ON(!list_empty(&dev->list));
   So move the list_del_init to rfcomm_dev_del 

2) The rfcomm_dev_del could be called from diffrent path
   (rfcomm_tty_hangup/rfcomm_dev_state_change/rfcomm_release_dev),

   So add another BUG_ON when the rfcomm_dev_del is called more than
   one time.
Signed-off-by: NDave Young <hidave.darkstar@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f951375d

[AX25] af_ax25: Possible circular locking. · ecd2ebde

由 Jarek Poplawski 提交于 1月 10, 2008

Bernard Pidoux F6BVP reported:
> When I killall kissattach I can see the following message.
>
> This happens on kernel 2.6.24-rc5 already patched with the 6 previously
> patches I sent recently.
>
>
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.23.9 #1
> -------------------------------------------------------
> kissattach/2906 is trying to acquire lock:
>  (linkfail_lock){-+..}, at: [<d8bd4603>] ax25_link_failed+0x11/0x39 [ax25]
>
> but task is already holding lock:
>  (ax25_list_lock){-+..}, at: [<d8bd7c7c>] ax25_device_event+0x38/0x84
> [ax25]
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
...

lockdep is worried about the different order here:

#1 (rose_neigh_list_lock){-+..}:
#3 (ax25_list_lock){-+..}:

#0 (linkfail_lock){-+..}:
#1 (rose_neigh_list_lock){-+..}:

#3 (ax25_list_lock){-+..}:
#0 (linkfail_lock){-+..}:

So, ax25_list_lock could be taken before and after linkfail_lock. 
I don't know if this three-thread clutch is very probable (or
possible at all), but it seems another bug reported by Bernard
("[...] system impossible to reboot with linux-2.6.24-rc5")
could have similar source - namely ax25_list_lock held by
ax25_kill_by_device() during ax25_disconnect(). It looks like the
only place which calls ax25_disconnect() this way, so I guess, it
isn't necessary.

This patch is breaking the lock for ax25_disconnect().
Reported-and-tested-by: NBernard Pidoux <f6bvp@free.fr>
Signed-off-by: NJarek Poplawski <jarkao2@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ecd2ebde

10 1月, 2008 3 次提交

[AX25]: Kill user triggable printks. · 27d1cba2

由 maximilian attems 提交于 1月 10, 2008

sfuzz can easily trigger any of those.

move the printk message to the corresponding comment: makes the
intention of the code clear and easy to pick up on an scheduled
removal.  as bonus simplify the braces placement.
Signed-off-by: Nmaximilian attems <max@stro.at>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

27d1cba2

[IPV4] ROUTE: fix rcu_dereference() uses in /proc/net/rt_cache · 0bcceadc

由 Eric Dumazet 提交于 1月 10, 2008

In rt_cache_get_next(), no need to guard seq->private by a
rcu_dereference() since seq is private to the thread running this
function. Reading seq.private once (as guaranted bu rcu_dereference())
or several time if compiler really is dumb enough wont change the
result.

But we miss real spots where rcu_dereference() are needed, both in
rt_cache_get_first() and rt_cache_get_next()
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0bcceadc

[NEIGH]: Fix race between neigh_parms_release and neightbl_fill_parms · 9cd40029

由 Pavel Emelyanov 提交于 1月 10, 2008

The neightbl_fill_parms() is called under the write-locked tbl->lock
and accesses the parms->dev. The negh_parm_release() calls the
dev_put(parms->dev) without this lock. This creates a tiny race window
on which the parms contains potentially stale dev pointer.

To fix this race it's enough to move the dev_put() upper under the
tbl->lock, but note, that the parms are held by neighbors and thus can
live after the neigh_parms_release() is called, so we still can have a
parm with bad dev pointer.

I didn't find where the neigh->parms->dev is accessed, but still think
that putting the dev is to be done in a place, where the parms are
really freed. Am I right with that?
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9cd40029

09 1月, 2008 14 次提交

[ATM]: Check IP header validity in mpc_send_packet · 1c9b7aa1

由 Herbert Xu 提交于 1月 09, 2008

Al went through the ip_fast_csum callers and found this piece of code
that did not validate the IP header.  While root crashing the machine
by sending bogus packets through raw or AF_PACKET sockets isn't that
serious, it is still nice to react gracefully.

This patch ensures that the skb has enough data for an IP header and
that the header length field is valid.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1c9b7aa1

[IPV6]: IPV6_MULTICAST_IF setting is ignored on link-local connect() · 1ac4f008

由 Brian Haley 提交于 1月 08, 2008

Signed-off-by: NBrian Haley <brian.haley@hp.com>
Acked-by: NDavid L Stevens <dlstevens@us.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ac4f008

[XFRM]: xfrm_algo_clone() allocates too much memory · 0f99be0d

由 Eric Dumazet 提交于 1月 08, 2008

alg_key_len is the length in bits of the key, not in bytes.

Best way to fix this is to move alg_len() function from net/xfrm/xfrm_user.c 
to include/net/xfrm.h, and to use it in xfrm_algo_clone()

alg_len() is renamed to xfrm_alg_len() because of its global exposition.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f99be0d

[LRO] Fix lro_mgr->features checks · 877364e6

由 Brice Goglin 提交于 1月 07, 2008

lro_mgr->features contains a bitmask of LRO_F_* values which are
defined as power of two, not as bit indexes.
They must be checked with x&LRO_F_FOO, not with test_bit(LRO_F_FOO,&x).
Signed-off-by: NBrice Goglin <Brice.Goglin@inria.fr>
Acked-by: NAndrew Gallatin <gallatin@myri.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

877364e6

[NET]: Clone the sk_buff 'iif' field in __skb_clone() · 02f1c89d

由 Paul Moore 提交于 1月 07, 2008

Both NetLabel and SELinux (other LSMs may grow to use it as well) rely
on the 'iif' field to determine the receiving network interface of
inbound packets.  Unfortunately, at present this field is not
preserved across a skb clone operation which can lead to garbage
values if the cloned skb is sent back through the network stack.  This
patch corrects this problem by properly copying the 'iif' field in
__skb_clone() and removing the 'iif' field assignment from
skb_act_clone() since it is no longer needed.

Also, while we are here, put the assignments in the same order as the
offsets to reduce cacheline bounces.
Signed-off-by: NPaul Moore <paul.moore@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02f1c89d

[IPV4] ROUTE: ip_rt_dump() is unecessary slow · d8c92830

由 Eric Dumazet 提交于 1月 07, 2008

I noticed "ip route list cache x.y.z.t" can be *very* slow.

While strace-ing -T it I also noticed that first part of route cache
is fetched quite fast :

recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.000047>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.000042>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3740 <0.000055>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\
202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.000043>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\
202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3732 <0.000053>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3708 <0.000052>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202
GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3680 <0.000041>

while the part at the end of the table is more expensive:

recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3656 <0.003857>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\204\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3772 <0.003891>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3712 <0.003765>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3700 <0.003879>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3676 <0.003797>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"p\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\2\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3724 <0.003856>
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\234\0\0\0\30\0\2\0\254i\202GXm\0\0\2 \0\376\0\0\1\0\2"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 3736 <0.003848>

The following patch corrects this performance/latency problem,
removing quadratic behavior.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d8c92830

[NET]: Stop polling when napi_disable() is pending. · fed17f30

由 David S. Miller 提交于 1月 07, 2008

This finally adds the code in net_rx_action() to break out of the
->poll()'ing loop when a napi_disable() is found to be pending.

Now, even if a device is being flooded with packets it can be cleanly
brought down.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fed17f30

mac80211: return an error when SIWRATE doesn't match any rate · 5cdfed54

由 Andrew Lutomirski 提交于 1月 03, 2008

Currently mac80211 fails silently when trying to set a nonexistent
rate.  Return an error instead.
Signed-Off-By: NAndy Lutomirski <luto@myrealbox.com>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

5cdfed54

[IRDA]: irda_create() nuke user triggable printk · 9e8d6f89

由 maximilian attems 提交于 1月 07, 2008

easy to trigger as user with sfuzz.

irda_create() is quiet on unknown sock->type,
match this behaviour for SOCK_DGRAM unknown protocol
Signed-off-by: Nmaximilian attems <max@stro.at>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e8d6f89

[SCTP]: Add back the code that accounted for FORWARD_TSN parameter in INIT. · 036b579b

由 Vlad Yasevich 提交于 1月 07, 2008

Some recent changes completely removed accounting for the FORWARD_TSN
parameter length in the INIT and INIT-ACK chunk.  This is wrong and
should be restored.
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

036b579b

[SCTP]: Correctly handle AUTH parameters in unexpected INIT · 6df9cfc1

由 Vlad Yasevich 提交于 1月 07, 2008

When processing an unexpected INIT chunk, we do not need to
do any preservation of the old AUTH parameters.  In fact,
doing such preservations will nullify AUTH and allow connection
stealing.
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6df9cfc1

[SCTP]: Fix the name of the authentication event. · f691724c

由 Vlad Yasevich 提交于 1月 07, 2008

The even should be called SCTP_AUTHENTICATION_INDICATION.
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f691724c

[IPV4] ipconfig: Fix regression in ip command line processing · 92ffb85d

由 Amos Waterland 提交于 1月 05, 2008

The recent changes for ip command line processing fixed some problems
but unfortunately broke some common usage scenarios.  In current
2.6.24-rc6 the following command line results in no IP address
assignment, which is surely a regression:

 ip=10.0.2.15::10.0.2.2:255.255.255.0::eth0:off

Please find below a patch that works for all cases I can find.
Signed-off-by: NAmos Waterland <apw@us.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92ffb85d

[IPV4] raw: Strengthen check on validity of iph->ihl · f844c74f

由 Herbert Xu 提交于 1月 05, 2008

We currently check that iph->ihl is bounded by the real length and that
the real length is greater than the minimum IP header length.  However,
we did not check the caes where iph->ihl is less than the minimum IP
header length.

This breaks because some ip_fast_csum implementations assume that which
is quite reasonable.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f844c74f

04 1月, 2008 3 次提交

[INET]: Fix netdev renaming and inet address labels · 44344b2a

由 Mark McLoughlin 提交于 1月 04, 2008

When re-naming an interface, the previous secondary address
labels get lost e.g.

  $> brctl addbr foo
  $> ip addr add 192.168.0.1 dev foo
  $> ip addr add 192.168.0.2 dev foo label foo:00
  $> ip addr show dev foo | grep inet
    inet 192.168.0.1/32 scope global foo
    inet 192.168.0.2/32 scope global foo:00
  $> ip link set foo name bar
  $> ip addr show dev bar | grep inet
    inet 192.168.0.1/32 scope global bar
    inet 192.168.0.2/32 scope global bar:2

Turns out to be a simple thinko in inetdev_changename() - clearly we
want to look at the address label, rather than the device name, for
a suffix to retain.
Signed-off-by: NMark McLoughlin <markmc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44344b2a

[XFRM]: Do not define km_migrate() if !CONFIG_XFRM_MIGRATE · 2d60abc2

由 Eric Dumazet 提交于 1月 03, 2008

In include/net/xfrm.h we find :

#ifdef CONFIG_XFRM_MIGRATE
extern int km_migrate(struct xfrm_selector *sel, u8 dir, u8 type,
                      struct xfrm_migrate *m, int num_bundles);
...
#endif

We can also guard the function body itself in net/xfrm/xfrm_state.c
with same condition.

(Problem spoted by sparse checker)
make C=2 net/xfrm/xfrm_state.o
...
net/xfrm/xfrm_state.c:1765:5: warning: symbol 'km_migrate' was not declared. Should it be static?
...
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d60abc2

[X25]: Add missing x25_neigh_put · 76975f8a

由 Julia Lawall 提交于 1月 01, 2008

The function x25_get_neigh increments a reference count.  At the point of
the second goto out, the result of calling x25_get_neigh is only stored in
a local variable, and thus no one outside the function will be able to
decrease the reference count.  Thus, x25_neigh_put should be called before
the return in this case.

The problem was found using the following semantic match.
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>

@@
type T,T1,T2;
identifier E;
statement S;
expression x1,x2,x3;
int ret;
@@

  T E;
  ...
* if ((E = x25_get_neigh(...)) == NULL)
  S
  ... when != x25_neigh_put(...,(T1)E,...)
      when != if (E != NULL) { ... x25_neigh_put(...,(T1)E,...); ...}
      when != x1 = (T1)E
      when != E = x3;
      when any
  if (...) {
    ... when != x25_neigh_put(...,(T2)E,...)
        when != if (E != NULL) { ... x25_neigh_put(...,(T2)E,...); ...}
        when != x2 = (T2)E
(
*   return;
|
*   return ret;
)
  }
// </smpl>
Signed-off-by: NJulia Lawall <julia@diku.dk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76975f8a

03 1月, 2008 1 次提交

NFS: add newline to kernel warning message in auth_gss code · 3392c349

由 James Morris 提交于 12月 26, 2007

Add newline to kernel warning message in gss_create().
Signed-off-by: NJames Morris <jmorris@namei.org>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

3392c349

30 12月, 2007 2 次提交

[BLUETOOTH]: put_device before device_del fix · 38b7da09

由 Dave Young 提交于 12月 29, 2007

Because of workqueue delay, the put_device could be called before
device_del, so move it to del_conn.

Signed-off-by: Dave Young <hidave.darkstar@gmail.com> 
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38b7da09

[TCP]: use non-delayed ACK for congestion control RTT · 2072c228

由 Gavin McCullagh 提交于 12月 29, 2007

When a delayed ACK representing two packets arrives, there are two RTT
samples available, one for each packet.  The first (in order of seq
number) will be artificially long due to the delay waiting for the
second packet, the second will trigger the ACK and so will not itself
be delayed.

According to rfc1323, the SRTT used for RTO calculation should use the
first rtt, so receivers echo the timestamp from the first packet in
the delayed ack.  For congestion control however, it seems measuring
delayed ack delay is not desirable as it varies independently of
congestion.

The patch below causes seq_rtt and last_ackt to be updated with any
available later packet rtts which should have less (and hopefully
zero) delack delay.  The rtt value then gets passed to
ca_ops->pkts_acked().

Where TCP_CONG_RTT_STAMP was set, effort was made to supress RTTs from
within a TSO chunk (!fully_acked), using only the final ACK (which
includes any TSO delay) to generate RTTs.  This patch removes these
checks so RTTs are passed for each ACK to ca_ops->pkts_acked().

For non-delay based congestion control (cubic, h-tcp), rtt is
sometimes used for rtt-scaling.  In shortening the RTT, this may make
them a little less aggressive.  Delay-based schemes (eg vegas, veno,
illinois) should get a cleaner, more accurate congestion signal,
particularly for small cwnds.  The congestion control module can
potentially also filter out bad RTTs due to the delayed ack alarm by
looking at the associated cnt which (where delayed acking is in use)
should probably be 1 if the alarm went off or greater if the ACK was
triggered by a packet.
Signed-off-by: NGavin McCullagh <gavin.mccullagh@nuim.ie>
Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2072c228

29 12月, 2007 1 次提交

[IPV4] Fix ip=dhcp regression · 9cecd07c

由 Simon Horman 提交于 12月 27, 2007

David Brownell pointed out a regression in my recent "Fix ip command
line processing" patch. It turns out to be a fairly blatant oversight on
my part whereby ic_enable is never set, and thus autoconfiguration is
never enabled. Clearly my testing was broken :-(

The solution that I have is to set ic_enable to 1 if we hit
ip_auto_config_setup(), which basically means that autoconfiguration is
activated unless told otherwise. I then flip ic_enable to 0 if ip=off,
ip=none, ip=::::::off or ip=::::::none using ic_proto_name();

The incremental patch is below, let me know if a non-incremental version
is prepared, as I did as for the original patch to be reverted pending a
fix.
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9cecd07c

27 12月, 2007 4 次提交

[IPV4]: Fix ip command line processing. · a6c05c3d

由 Simon Horman 提交于 12月 25, 2007

Recently the documentation in Documentation/nfsroot.txt was
update to note that in fact ip=off and ip=::::::off as the
latter is ignored and the default (on) is used.

This was certainly a step in the direction of reducing confusion.
But it seems to me that the code ought to be fixed up so that
ip=::::::off actually turns off ip autoconfiguration.

This patch also notes more specifically that ip=on (aka ip=::::::on)
is the default.
Signed-off-by: NSimon Horman <horms@verge.net.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6c05c3d

[NETFILTER]: nf_conntrack_ipv4: fix module parameter compatibility · fae718dd

由 Patrick McHardy 提交于 12月 24, 2007

Some users do "modprobe ip_conntrack hashsize=...". Since we have the
module aliases this loads nf_conntrack_ipv4 and nf_conntrack, the
hashsize parameter is unknown for nf_conntrack_ipv4 however and makes
it fail.

Allow to specify hashsize= for both nf_conntrack and nf_conntrack_ipv4.

Note: the nf_conntrack message in the ringbuffer will display an
incorrect hashsize since nf_conntrack is first pulled in as a
dependency and calculates the size itself, then it gets changed
through a call to nf_conntrack_set_hashsize().
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fae718dd

mac80211: warn when receiving frames with unaligned data · 81100eb8

由 Johannes Berg 提交于 12月 18, 2007

This patch makes mac80211 warn (once) when the driver passes up a
frame in which the payload data is not aligned on a four-byte
boundary, with a long comment for people who run into the condition
and need to know what to do.
Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

81100eb8

mac80211: round station cleanup timer · 0d174406

由 Johannes Berg 提交于 12月 17, 2007

The station cleanup timer runs every ten seconds, the exact
timing is not relevant at all so it can well run together with
other things to save power.
Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

0d174406

21 12月, 2007 5 次提交

[IPV4]: OOPS with NETLINK_FIB_LOOKUP netlink socket · d883a036

由 Denis V. Lunev 提交于 12月 21, 2007

[ Regression added by changeset:
	cd40b7d3
	[NET]: make netlink user -> kernel interface synchronious
  -DaveM ]

nl_fib_input re-reuses incoming skb to send the reply. This means that this
packet will be freed twice, namely in:
- netlink_unicast_kernel
- on receive path
Use clone to send as a cure, the caller is responsible for kfree_skb on error.

Thanks to Alexey Dobryan, who originally found the problem.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d883a036

[NET]: Fix function put_cmsg() which may cause usr application memory overflow · 1ac70e7a

由 Wei Yongjun 提交于 12月 20, 2007

When used function put_cmsg() to copy kernel information to user 
application memory, if the memory length given by user application is 
not enough, by the bad length calculate of msg.msg_controllen, 
put_cmsg() function may cause the msg.msg_controllen to be a large 
value, such as 0xFFFFFFF0, so the following put_cmsg() can also write 
data to usr application memory even usr has no valid memory to store 
this. This may cause usr application memory overflow.

int put_cmsg(struct msghdr * msg, int level, int type, int len, void *data)
{
    struct cmsghdr __user *cm
        = (__force struct cmsghdr __user *)msg->msg_control;
    struct cmsghdr cmhdr;
    int cmlen = CMSG_LEN(len);
    ~~~~~~~~~~~~~~~~~~~~~
    int err;

    if (MSG_CMSG_COMPAT & msg->msg_flags)
        return put_cmsg_compat(msg, level, type, len, data);

    if (cm==NULL || msg->msg_controllen < sizeof(*cm)) {
        msg->msg_flags |= MSG_CTRUNC;
        return 0; /* XXX: return error? check spec. */
    }
    if (msg->msg_controllen < cmlen) {
    ~~~~~~~~~~~~~~~~~~~~~~~~
        msg->msg_flags |= MSG_CTRUNC;
        cmlen = msg->msg_controllen;
    }
    cmhdr.cmsg_level = level;
    cmhdr.cmsg_type = type;
    cmhdr.cmsg_len = cmlen;

    err = -EFAULT;
    if (copy_to_user(cm, &cmhdr, sizeof cmhdr))
        goto out;
    if (copy_to_user(CMSG_DATA(cm), data, cmlen - sizeof(struct cmsghdr)))
        goto out;
    cmlen = CMSG_SPACE(len);
~~~~~~~~~~~~~~~~~~~~~~~~~~~
    If MSG_CTRUNC flags is set, msg->msg_controllen is less than 
CMSG_SPACE(len), "msg->msg_controllen -= cmlen" will cause unsinged int 
type msg->msg_controllen to be a large value.
~~~~~~~~~~~~~~~~~~~~~~~~~~~
    msg->msg_control += cmlen;
    msg->msg_controllen -= cmlen;
    ~~~~~~~~~~~~~~~~~~~~~
    err = 0;
out:
    return err;
}

The same promble exists in put_cmsg_compat(). This patch can fix this 
problem.
Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ac70e7a

[NETFILTER] ipv4: Spelling fixes · e00ccd4a

由 Joe Perches 提交于 12月 20, 2007

Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e00ccd4a

[NETFILTER]: Spelling fixes · c8238177

由 Joe Perches 提交于 12月 20, 2007

Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c8238177

[SCTP]: Spelling fixes · 7aa1b54b

由 Joe Perches 提交于 12月 20, 2007

Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7aa1b54b

Linux-御风守护者 / linux 与 Fork 源项目一致

Linux-御风守护者 / linux
与 Fork 源项目一致