提交 · 0f7cc9a3c2bd89b15720dbf358e9b9e62af27126 · openeuler / raspberrypi-kernel

23 8月, 2013 1 次提交

tcp: increase throughput when reordering is high · 0f7cc9a3

由 Yuchung Cheng 提交于 8月 21, 2013

The stack currently detects reordering and avoid spurious
retransmission very well. However the throughput is sub-optimal under
high reordering because cwnd is increased only if the data is deliverd
in order. I.e., FLAG_DATA_ACKED check in tcp_ack().  The more packet
are reordered the worse the throughput is.

Therefore when reordering is proven high, cwnd should advance whenever
the data is delivered regardless of its ordering. If reordering is low,
conservatively advance cwnd only on ordered deliveries in Open state,
and retain cwnd in Disordered state (RFC5681).

Using netperf on a qdisc setup of 20Mbps BW and random RTT from 45ms
to 55ms (for reordering effect). This change increases TCP throughput
by 20 - 25% to near bottleneck BW.

A special case is the stretched ACK with new SACK and/or ECE mark.
For example, a receiver may receive an out of order or ECN packet with
unacked data buffered because of LRO or delayed ACK. The principle on
such an ACK is to advance cwnd on the cummulative acked part first,
then reduce cwnd in tcp_fastretrans_alert().
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0f7cc9a3

21 8月, 2013 16 次提交

net: ipv6: mcast: minor: use defines for rfc3810/8.1 lengths · 9fd07841

由 Daniel Borkmann 提交于 8月 20, 2013

Instead of hard-coding length values, use a define to make it clear
where those lengths come from.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9fd07841

net: ipv6: minor: *_start_timer: rather use unsigned long · c2cef4e8

由 Daniel Borkmann 提交于 8月 20, 2013

For the functions mld_gq_start_timer(), mld_ifc_start_timer(),
and mld_dad_start_timer(), rather use unsigned long than int
as we operate only on unsigned values anyway. This seems more
appropriate as there is no good reason to do type conversions
to int, that could lead to future errors.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c2cef4e8

net: ipv6: igmp6_event_query: use msecs_to_jiffies · 84698963

由 Daniel Borkmann 提交于 8月 20, 2013

Use proper API functions to calculate jiffies from milliseconds and
not the crude method of dividing HZ by a value. This ensures more
accurate values even in the case of strange HZ values. While at it,
also simplify code in the mlh2 case by using max().
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84698963

ip6_tunnel: ensure to always have a link local address · e837735e

由 Nicolas Dichtel 提交于 8月 20, 2013

When an Xin6 tunnel is set up, we check other netdevices to inherit the link-
local address. If none is available, the interface will not have any link-local
address. RFC4862 expects that each interface has a link local address.

Now than this kind of tunnels supports x-netns, it's easy to fall in this case
(by creating the tunnel in a netns where ethernet interfaces stand and then
moving it to a other netns where no ethernet interface is available).

RFC4291, Appendix A suggests two methods: the first is the one currently
implemented, the second is to generate a unique identifier, so that we can
always generate the link-local address. Let's use eth_random_addr() to generate
this interface indentifier.

I remove completly the previous method, hence for the whole life of the
interface, the link-local address remains the same (previously, it depends on
which ethernet interfaces were up when the tunnel interface was set up).
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e837735e

Revert "ipv6: fix checkpatch errors in net/ipv6/addrconf.c" · 7eaa48a4

由 David S. Miller 提交于 8月 20, 2013

This reverts commit df8372ca.

These changes are buggy and make unintended semantic changes
to ip6_tnl_add_linklocal().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7eaa48a4

ipip: dereferencing an ERR_PTR in ip_tunnel_init_net() · ea857f28

由 Dan Carpenter 提交于 8月 19, 2013

We need to move the derefernce after the IS_ERR() check.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ea857f28

ipv4: raise IP_MAX_MTU to theoretical limit · 734d2725

由 Eric Dumazet 提交于 8月 18, 2013

As discussed last year [1], there is no compelling reason
to limit IPv4 MTU to 0xFFF0, while real limit is 0xFFFF

[1] : http://marc.info/?l=linux-netdev&m=135607247609434&w=2

Willem raised this issue again because some of our internal
regression tests broke after lo mtu being set to 65536.

IP_MTU reports 0xFFF0, and the test attempts to send a RAW datagram of
mtu + 1 bytes, expecting the send() to fail, but it does not.

Alexey raised interesting points about TCP MSS, that should be addressed
in follow-up patches in TCP stack if needed, as someone could also set
an odd mtu anyway.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

734d2725

tcp: trivial: Remove nocache argument from tcp_v4_send_synack · 397b4174

由 Christoph Paasch 提交于 8月 18, 2013

The nocache-argument was used in tcp_v4_send_synack as an argument to
inet_csk_route_req. However, since ba3f7f04 (ipv4: Kill
FLOWI_FLAG_RT_NOCACHE and associated code.) this is no more used.

This patch removes the unsued argument from tcp_v4_send_synack.
Signed-off-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

397b4174

ipv6: fix checkpatch errors in net/ipv6/addrconf.c · df8372ca

由 dingtianhong 提交于 8月 17, 2013

ERROR: code indent should use tabs where possible: fix 2.
ERROR: do not use assignment in if condition: fix 5.
Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

df8372ca

ipv6: convert the uses of ADBG and remove the superfluous parentheses · ba3542e1

由 dingtianhong 提交于 8月 17, 2013

Just follow the Joe Perches's opinion, it is a better way to fix the
style errors.
Suggested-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDing Tianhong <dingtianhong@huawei.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ba3542e1

6lowpan: handle context based source address · 65d892c8

由 Alexander Aring 提交于 8月 16, 2013

Handle context based address when an unspecified address is given.
For other context based address we print a warning and drop the packet
because we don't support it right now.
Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
Reviewed-by: NWerner Almesberger <werner@almesberger.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65d892c8

6lowpan: lowpan_uncompress_addr with address_mode · ce2463b2

由 Alexander Aring 提交于 8月 16, 2013

This patch drops the pre and postcount calculation from the
lowpan_uncompress_addr function.We use instead a switch/case
over address_mode value. The original implementation has several
bugs in this function and it was hard to decrypt how it works.
To make it maintainable and fix these bugs this patch basically
reimplements lowpan_uncompress_addr from scratch.

A list of bugs we found in the current implementation:

1) Properly support uncompression of short-address based IPv6 addresses
   (instead of basically copying garbage)

2) Fix use and uncompression of long-addresses based IPv6 addresses

3) Add missing ff:fe00 in the case of SAM/DAM = 2 and M = 0
Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
Reviewed-by: NWerner Almesberger <werner@almesberger.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ce2463b2

6lowpan: add function to uncompress multicast addr · 84c2e7bc

由 Alexander Aring 提交于 8月 16, 2013

Add function to uncompress multicast address.
This function split the uncompress function for a multicast address
in a seperate function.

To uncompress a multicast address is different than a other
non-multicasts addresses according to rfc6282.
Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
Reviewed-by: NWerner Almesberger <werner@almesberger.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84c2e7bc

6lowpan: introduce lowpan_fetch_skb function · 4666669f

由 Alexander Aring 提交于 8月 16, 2013

This patch adds a helper function to parse the ipv6 header to a
6lowpan header in stream.

This function checks first if we can pull data with a specific
length from a skb. If this seems to be okay, we copy skb data to
a destination pointer and run skb_pull.
Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
Reviewed-by: NWerner Almesberger <werner@almesberger.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4666669f

6lowpan: Fix fragmentation with link-local compressed addresses · 31afe1f7

由 David Hauweele 提交于 8月 16, 2013

When a new 6lowpan fragment is received, a skbuff is allocated for
the reassembled packet. However when a 6lowpan packet compresses
link-local addresses based on link-layer addresses, the processing
function relies on the skb mac control block to find the related
link-layer address.

This patch copies the control block from the first fragment into
the newly allocated skb to keep a trace of the link-layer addresses
in case of a link-local compressed address.

Edit: small changes on comment issue
Signed-off-by: NDavid Hauweele <david@hauweele.net>
Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
Reviewed-by: NWerner Almesberger <werner@almesberger.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31afe1f7

6lowpan: init ipv6hdr buffer to zero · 84ce1ddf

由 Alexander Aring 提交于 8月 16, 2013

This patch simplify the handling to set fields inside of struct ipv6hdr
to zero. Instead of setting some memory regions with memset to zero we
initialize the whole ipv6hdr to zero.

This is a simplification for parsing the 6lowpan header for the upcomming
patches.
Signed-off-by: NAlexander Aring <alex.aring@gmail.com>
Reviewed-by: NWerner Almesberger <werner@almesberger.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

84ce1ddf

20 8月, 2013 1 次提交

openvswitch: Add vxlan tunneling support. · 58264848

由 Pravin B Shelar 提交于 8月 19, 2013

Following patch adds vxlan vport type for openvswitch using
vxlan api. So now there is vxlan dependency for openvswitch.

CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Acked-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

58264848

16 8月, 2013 2 次提交

netlink: Eliminate kmalloc in netlink dump operation. · 16b304f3

由 Pravin B Shelar 提交于 8月 15, 2013

Following patch stores struct netlink_callback in netlink_sock
to avoid allocating and freeing it on every netlink dump msg.
Only one dump operation is allowed for a given socket at a time
therefore we can safely convert cb pointer to cb struct inside
netlink_sock.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

16b304f3

net: proc_fs: trivial: print UIDs as unsigned int · d14c5ab6

由 Francesco Fusco 提交于 8月 15, 2013

UIDs are printed in the proc_fs as signed int, whereas
they are unsigned int.
Signed-off-by: NFrancesco Fusco <ffusco@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d14c5ab6

15 8月, 2013 9 次提交

net_sched: restore "linklayer atm" handling · 8a8e3d84

由 Jesper Dangaard Brouer 提交于 8月 14, 2013

commit 56b765b7 ("htb: improved accuracy at high rates")
broke the "linklayer atm" handling.

 tc class add ... htb rate X ceil Y linklayer atm

The linklayer setting is implemented by modifying the rate table
which is send to the kernel.  No direct parameter were
transferred to the kernel indicating the linklayer setting.

The commit 56b765b7 ("htb: improved accuracy at high rates")
removed the use of the rate table system.

To keep compatible with older iproute2 utils, this patch detects
the linklayer by parsing the rate table.  It also supports future
versions of iproute2 to send this linklayer parameter to the
kernel directly. This is done by using the __reserved field in
struct tc_ratespec, to convey the choosen linklayer option, but
only using the lower 4 bits of this field.

Linklayer detection is limited to speeds below 100Mbit/s, because
at high rates the rtab is gets too inaccurate, so bad that
several fields contain the same values, this resembling the ATM
detect.  Fields even start to contain "0" time to send, e.g. at
1000Mbit/s sending a 96 bytes packet cost "0", thus the rtab have
been more broken than we first realized.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a8e3d84

rtnetlink: remove an unneeded test · fce9b9be

由 Dan Carpenter 提交于 8月 14, 2013

We know that "dev" is a valid pointer at this point, so we can remove
the test and clean up a little.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fce9b9be

ip6tnl: add x-netns support · 0bd87628

由 Nicolas Dichtel 提交于 8月 13, 2013

This patch allows to switch the netns when packet is encapsulated or
decapsulated. In other word, the encapsulated packet is received in a netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injecting into the corresponding interface which
stands to another netns.

When one of the two netns is removed, the tunnel is destroyed.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0bd87628

ipip: add x-netns support · 6c742e71

由 Nicolas Dichtel 提交于 8月 13, 2013

When one of the two netns is removed, the tunnel is destroyed.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c742e71

ipv4 tunnels: use net_eq() helper to check netns · fc8f999d

由 Nicolas Dichtel 提交于 8月 13, 2013

It's better to use available helpers for these tests.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc8f999d

dev: move skb_scrub_packet() after eth_type_trans() · 64261f23

由 Nicolas Dichtel 提交于 8月 13, 2013

skb_scrub_packet() was called before eth_type_trans() to let eth_type_trans()
set pkt_type.

In fact, we should force pkt_type to PACKET_HOST, so move the call after
eth_type_trans().
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

64261f23

openvswitch: Reset tunnel key between input and output. · 36bf5cc6

由 Jesse Gross 提交于 8月 14, 2013

It doesn't make sense to output a tunnel packet using the same
parameters that it was received with since that will generally
just result in the packet going back to us. As a result, userspace
assumes that the tunnel key is cleared when transitioning through
the switch. In the majority of cases this doesn't matter since a
packet is either going to a tunnel port (in which the key is
overwritten with new values) or to a non-tunnel port (in which
case the key is ignored). However, it's theoreticaly possible that
userspace could rely on the documented behavior, so this corrects
it.
Signed-off-by: NJesse Gross <jesse@nicira.com>

36bf5cc6

openvswitch: Use correct type while allocating flex array. · 42415c90

由 Pravin B Shelar 提交于 7月 30, 2013

Flex array is used to allocate hash buckets which is type struct
hlist_head, but we use `struct hlist_head *` to calculate
array size.  Since hlist_head is of size pointer it works fine.

Following patch use correct type.
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NJesse Gross <jesse@nicira.com>

42415c90

openvswitch: Fix bad merge resolution. · 30444e98

由 Jesse Gross 提交于 5月 13, 2013

git silently included an extra hunk in vport_cmd_set() during
automatic merging. This code is unreachable so it does not actually
introduce a problem but it is clearly incorrect.
Signed-off-by: NJesse Gross <jesse@nicira.com>

30444e98

14 8月, 2013 5 次提交

rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header · 3e805ad2

由 Asbjoern Sloth Toennesen 提交于 8月 12, 2013

Fix the iproute2 command `bridge vlan show`, after switching from
rtgenmsg to ifinfomsg.

Let's start with a little history:

Feb 20:   Vlad Yasevich got his VLAN-aware bridge patchset included in
          the 3.9 merge window.
          In the kernel commit 6cbdceeb, he added attribute support to
          bridge GETLINK requests sent with rtgenmsg.

Mar 6th:  Vlad got this iproute2 reference implementation of the bridge
          vlan netlink interface accepted (iproute2 9eff0e5c)

Apr 25th: iproute2 switched from using rtgenmsg to ifinfomsg (63338dca)
          http://patchwork.ozlabs.org/patch/239602/
          http://marc.info/?t=136680900700007

Apr 28th: Linus released 3.9

Apr 30th: Stephen released iproute2 3.9.0

The `bridge vlan show` command haven't been working since the switch to
ifinfomsg, or in a released version of iproute2. Since the kernel side
only supports rtgenmsg, which iproute2 switched away from just prior to
the iproute2 3.9.0 release.

I haven't been able to find any documentation, about neither rtgenmsg
nor ifinfomsg, and in which situation to use which, but kernel commit
88c5b5ce seams to suggest that ifinfomsg should be used.

Fixing this in kernel will break compatibility, but I doubt that anybody
have been using it due to this bug in the user space reference
implementation, at least not without noticing this bug. That said the
functionality is still fully functional in 3.9, when reversing iproute2
commit 63338dca.

This could also be fixed in iproute2, but thats an ugly patch that would
reintroduce rtgenmsg in iproute2, and from searching in netdev it seams
like rtgenmsg usage is discouraged. I'm assuming that the only reason
that Vlad implemented the kernel side to use rtgenmsg, was because
iproute2 was using it at the time.
Signed-off-by: NAsbjoern Sloth Toennesen <ast@fiberby.net>
Reviewed-by: NVlad Yasevich <vyasevich@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3e805ad2

ipv6: make unsolicited report intervals configurable for mld · fc4eba58

由 Hannes Frederic Sowa 提交于 8月 14, 2013

Commit cab70040 ("net: igmp:
Reduce Unsolicited report interval to 1s when using IGMPv3") and
2690048c ("net: igmp: Allow user-space
configuration of igmp unsolicited report interval") by William Manley made
igmp unsolicited report intervals configurable per interface and corrected
the interval of unsolicited igmpv3 report messages resendings to 1s.

Same needs to be done for IPv6:

MLDv1 (RFC2710 7.10.): 10 seconds
MLDv2 (RFC3810 9.11.): 1 second

Both intervals are configurable via new procfs knobs
mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval.

(also added .force_mld_version to ipv6_devconf_dflt to bring structs in
line without semantic changes)

v2:
a) Joined documentation update for IPv4 and IPv6 MLD/IGMP
   unsolicited_report_interval procfs knobs.
b) incorporate stylistic feedback from William Manley

v3:
a) add new DEVCONF_* values to the end of the enum (thanks to David
   Miller)

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: William Manley <william.manley@youview.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc4eba58

ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id. · 4221f405

由 Pravin B Shelar 提交于 8月 13, 2013

Using inner-id for tunnel id is not safe in some rare cases.
E.g. packets coming from multiple sources entering same tunnel
can have same id. Therefore on tunnel packet receive we
could have packets from two different stream but with same
source and dst IP with same ip-id which could confuse ip packet
reassembly.

Following patch reverts optimization from commit
490ab081 (IP_GRE: Fix IP-Identification.)

CC: Jarno Rajahalme <jrajahalme@nicira.com>
CC: Ansis Atteka <aatteka@nicira.com>
Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4221f405

tcp: reset reordering est. selectively on timeout · 74c181d5

由 Yuchung Cheng 提交于 8月 12, 2013

On timeout the TCP sender unconditionally resets the estimated degree
of network reordering (tp->reordering). The idea behind this is that
the estimate is too large to trigger fast recovery (e.g., due to a IP
path change).

But for example if the sender only had 2 packets outstanding, then a
timeout doesn't tell much about reordering. A sender that learns about
reordering on big writes and loses packets on small writes will end up
falsely retransmitting again and again, especially when reordering is
more likely on big writes.

Therefore the sender should only suspect that tp->reordering is too
high if it could have gone into fast recovery with the (lower) default
estimate.
Signed-off-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74c181d5

net: sctp: Add rudimentary infrastructure to account for control chunks · 072017b4

由 Vlad Yasevich 提交于 8月 09, 2013

This patch adds a base infrastructure that allows SCTP to do
memory accounting for control chunks.  Real accounting code will
follow.

This patch alos fixes the following triggered bug ...

[  553.109742] kernel BUG at include/linux/skbuff.h:1813!
[  553.109766] invalid opcode: 0000 [#1] SMP
[  553.109789] Modules linked in: sctp libcrc32c rfcomm [...]
[  553.110259]  uinput i915 i2c_algo_bit drm_kms_helper e1000e drm ptp
pps_core i2c_core wmi video sunrpc
[  553.110320] CPU: 0 PID: 1636 Comm: lt-test_1_to_1_ Not tainted
3.11.0-rc3+ #2
[  553.110350] Hardware name: LENOVO 74597D6/74597D6, BIOS 6DET60WW
(3.10 ) 09/17/2009
[  553.110381] task: ffff88020a01dd40 ti: ffff880204ed0000 task.ti:
ffff880204ed0000
[  553.110411] RIP: 0010:[<ffffffffa0698017>]  [<ffffffffa0698017>]
skb_orphan.part.9+0x4/0x6 [sctp]
[  553.110459] RSP: 0018:ffff880204ed1bb8  EFLAGS: 00010286
[  553.110483] RAX: ffff8802086f5a40 RBX: ffff880204303300 RCX:
0000000000000000
[  553.110487] RDX: ffff880204303c28 RSI: ffff8802086f5a40 RDI:
ffff880202158000
[  553.110487] RBP: ffff880204ed1bb8 R08: 0000000000000000 R09:
0000000000000000
[  553.110487] R10: ffff88022f2d9a04 R11: ffff880233001600 R12:
0000000000000000
[  553.110487] R13: ffff880204303c00 R14: ffff8802293d0000 R15:
ffff880202158000
[  553.110487] FS:  00007f31b31fe740(0000) GS:ffff88023bc00000(0000)
knlGS:0000000000000000
[  553.110487] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  553.110487] CR2: 000000379980e3e0 CR3: 000000020d225000 CR4:
00000000000407f0
[  553.110487] Stack:
[  553.110487]  ffff880204ed1ca8 ffffffffa068d7fc 0000000000000000
0000000000000000
[  553.110487]  0000000000000000 ffff8802293d0000 ffff880202158000
ffffffff81cb7900
[  553.110487]  0000000000000000 0000400000001c68 ffff8802086f5a40
000000000000000f
[  553.110487] Call Trace:
[  553.110487]  [<ffffffffa068d7fc>] sctp_sendmsg+0x6bc/0xc80 [sctp]
[  553.110487]  [<ffffffff8128f185>] ? sock_has_perm+0x75/0x90
[  553.110487]  [<ffffffff815a3593>] inet_sendmsg+0x63/0xb0
[  553.110487]  [<ffffffff8128f2b3>] ? selinux_socket_sendmsg+0x23/0x30
[  553.110487]  [<ffffffff8151c5d6>] sock_sendmsg+0xa6/0xd0
[  553.110487]  [<ffffffff81637b05>] ? _raw_spin_unlock_bh+0x15/0x20
[  553.110487]  [<ffffffff8151cd38>] SYSC_sendto+0x128/0x180
[  553.110487]  [<ffffffff8151ce6b>] ? SYSC_connect+0xdb/0x100
[  553.110487]  [<ffffffffa0690031>] ? sctp_inet_listen+0x71/0x1f0
[sctp]
[  553.110487]  [<ffffffff8151d35e>] SyS_sendto+0xe/0x10
[  553.110487]  [<ffffffff81640202>] system_call_fastpath+0x16/0x1b
[  553.110487] Code: e0 48 c7 c7 00 22 6a a0 e8 67 a3 f0 e0 48 c7 [...]
[  553.110487] RIP  [<ffffffffa0698017>] skb_orphan.part.9+0x4/0x6
[sctp]
[  553.110487]  RSP <ffff880204ed1bb8>
[  553.121578] ---[ end trace 46c20c5903ef5be2 ]---

The approach taken here is to split data and control chunks
creation a  bit.  Data chunks already have memory accounting
so noting needs to happen.  For control chunks, add stubs handlers.
Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

072017b4

13 8月, 2013 5 次提交

netfilter: nfnetlink_queue: allow to attach expectations to conntracks · bd077937

由 Pablo Neira Ayuso 提交于 8月 07, 2013

This patch adds the capability to attach expectations via nfnetlink_queue.
This is required by conntrack helpers that trigger expectations based on
the first packet seen like the TFTP and the DHCPv6 user-space helpers.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

bd077937

netfilter: ctnetlink: refactor ctnetlink_create_expect · 0ef71ee1

由 Pablo Neira Ayuso 提交于 8月 07, 2013

This patch refactors ctnetlink_create_expect by spliting it in two
chunks. As a result, we have a new function ctnetlink_alloc_expect
to allocate and to setup the expectation from ctnetlink.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

0ef71ee1

genetlink: fix family dump race · 58ad436f

由 Johannes Berg 提交于 8月 13, 2013

When dumping generic netlink families, only the first dump call
is locked with genl_lock(), which protects the list of families,
and thus subsequent calls can access the data without locking,
racing against family addition/removal. This can cause a crash.
Fix it - the locking needs to be conditional because the first
time around it's already locked.

A similar bug was reported to me on an old kernel (3.4.47) but
the exact scenario that happened there is no longer possible,
on those kernels the first round wasn't locked either. Looking
at the current code I found the race described above, which had
also existed on the old kernel.

Cc: stable@vger.kernel.org
Reported-by: NAndrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

58ad436f

net: sctp: sctp_transport_destroy{, _rcu}: fix potential pointer corruption · 771085d6

由 Daniel Borkmann 提交于 8月 09, 2013

Probably this one is quite unlikely to be triggered, but it's more safe
to do the call_rcu() at the end after we have dropped the reference on
the asoc and freed sctp packet chunks. The reason why is because in
sctp_transport_destroy_rcu() the transport is being kfree()'d, and if
we're unlucky enough we could run into corrupted pointers. Probably
that's more of theoretical nature, but it's safer to have this simple fix.

Introduced by commit 8c98653f ("sctp: sctp_close: fix release of bindings
for deferred call_rcu's"). I also did the 8c98653f regression test and
it's fine that way.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Acked-by: NVlad Yasevich <vyasevich@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

771085d6

net: sctp: sctp_assoc_control_transport: fix MTU size in SCTP_PF state · ac4f9599

由 Daniel Borkmann 提交于 8月 09, 2013

The SCTP Quick failover draft [1] section 5.1, point 5 says that the cwnd
should be 1 MTU. So, instead of 1, set it to 1 MTU.

  [1] https://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05Reported-by: NKarl Heiss <kheiss@gmail.com>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: NVlad Yasevich <vyasevich@gmail.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ac4f9599

12 8月, 2013 1 次提交

af_unix: fix bug on large send() · f3dfd208

由 Eric Dumazet 提交于 8月 11, 2013

commit e370a723 ("af_unix: improve STREAM behavior with fragmented
memory") added a bug on large send() because the
skb_copy_datagram_from_iovec() call always start from the beginning
of iovec.

We must instead use the @sent variable to properly skip the
already processed part.
Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f3dfd208