Commits · e8388eb10371745627d1e538e018cb10ded86aa7 · openeuler / raspberrypi-kernel

27 Feb, 2014 5 commits

af_rxrpc: Request an ACK for every alternate DATA packet · e8388eb1

Authored by David Howells on Feb 14, 2014

Set the RxRPC header flag to request an ACK packet for every odd-numbered DATA
packet unless it's the last one (which implicitly requests an ACK anyway).
This is similar to how librx appears to work.

If we don't do this, we'll send out a full window of packets and then just sit
there until the other side gets bored and sends an ACK to indicate that it's
been idle for a while and has received no new packets.

Requesting a lot of ACKs shouldn't be a problem as ACKs should be merged when
possible.

As AF_RXRPC currently works, it will schedule an ACK to be generated upon
receipt of a DATA packet with the ACK-request flag set - and in the time
taken to schedule this in a work queue, several other packets are likely to
arrive and then all get ACK'd together.
Signed-off-by: David Howells <dhowells@redhat.com>
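
The flag selection described in this commit can be sketched as follows. This is a minimal illustration, not the kernel's code: `data_packet_flags()` is a hypothetical helper, and only the two flag bits relevant to ACK requests are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rx wire-header flag bits (only the two relevant here). */
#define RXRPC_REQUEST_ACK  0x02  /* ask the peer to send an ACK */
#define RXRPC_LAST_PACKET  0x04  /* final DATA packet of the call */

/*
 * Hypothetical helper, not the kernel's function: choose the ACK-related
 * flags for the DATA packet with sequence number `seq`.
 */
static uint8_t data_packet_flags(uint32_t seq, bool last)
{
	if (last)
		return RXRPC_LAST_PACKET;  /* implicitly requests an ACK */
	if (seq & 1)
		return RXRPC_REQUEST_ACK;  /* odd-numbered: ask explicitly */
	return 0;                          /* even-numbered: no request */
}
```

With this rule a sender that transmits packets 1 to 4 asks for an ACK on packets 1 and 3, so the peer has a reason to respond before the whole window has been sent.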

af_rxrpc: Expose more RxRPC parameters via sysctls · 817913d8

Authored by David Howells on Feb 07, 2014

Expose RxRPC parameters via sysctls to control the Rx window size, the Rx MTU
maximum size and the number of packets that can be glued into a jumbo packet.

More info added to Documentation/networking/rxrpc.txt.
Signed-off-by: David Howells <dhowells@redhat.com>

af_rxrpc: Improve ACK production · 9823f39a

Authored by David Howells on Feb 07, 2014

Improve ACK production by the following means:

 (1) Don't send an ACK_REQUESTED ack immediately even if the RXRPC_MORE_PACKETS
     flag isn't set on a data packet that also has RXRPC_REQUEST_ACK set.

     An unset MORE_PACKETS flag just means that the sender has emptied its Tx
     data buffer.  More data will be forthcoming unless RXRPC_LAST_PACKET is
     also flagged.

     It is possible to see runs of DATA packets with MORE_PACKETS unset that
     aren't waiting for an ACK.

     It is therefore better to wait a small instant to see if we can combine an
     ACK for several packets.

 (2) Don't send an ACK_IDLE ack immediately unless we're responding to the
     terminal data packet of a call.

     Whilst sending an ACK_IDLE mid-call serves to let the other side know
     that we won't be asking it to resend certain Tx buffers and that it can
     discard them, spamming it with loads of acks just because we've
     temporarily run out of data just distracts it.

 (3) Put the ACK_IDLE ack generation timeout up to half a second rather than a
     single jiffy.  Just because we haven't been given more data immediately
     doesn't mean that more isn't forthcoming.  The other side may be busily
     finding the data to send to us.
Signed-off-by: David Howells <dhowells@redhat.com>
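
The three rules above amount to choosing a delay before an ACK is generated. A rough sketch under stated assumptions: `ack_delay_ms()` and the `ack_reason` enum are hypothetical names, the delay for a requested ACK is an assumed placeholder for the "small instant" the message mentions, and milliseconds are used instead of jiffies for readability.

```c
#include <stdbool.h>

/* Hypothetical subset of ACK reasons, for illustration only. */
enum ack_reason { ACK_REQUESTED, ACK_IDLE };

#define REQUESTED_ACK_DELAY_MS 3    /* assumed value for the "small instant" */
#define IDLE_ACK_DELAY_MS      500  /* half a second, as in point (3) */

/* Return how long to wait before generating the ACK; 0 means "send now". */
static unsigned int ack_delay_ms(enum ack_reason reason, bool terminal_packet)
{
	switch (reason) {
	case ACK_REQUESTED:
		/* Point (1): wait briefly so one ACK can cover several packets. */
		return REQUESTED_ACK_DELAY_MS;
	case ACK_IDLE:
		/* Point (2): immediate only for the terminal data packet. */
		return terminal_packet ? 0 : IDLE_ACK_DELAY_MS;
	}
	return 0;
}
```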

af_rxrpc: Add sysctls for configuring RxRPC parameters · 5873c083

Authored by David Howells on Feb 07, 2014

Add sysctls for configuring RxRPC protocol handling, specifically controls on
delays before ack generation, the delay before resending a packet, the maximum
lifetime of a call and the expiration times of calls, connections and
transports that haven't been recently used.

More info added in Documentation/networking/rxrpc.txt.
Signed-off-by: David Howells <dhowells@redhat.com>

af_rxrpc: Fix UDP MTU calculation from ICMP_FRAG_NEEDED · 6c9a2d32

Authored by David Howells on Feb 14, 2014

AF_RXRPC sends UDP packets with the "Don't Fragment" bit set in an attempt to
determine the maximum packet size between the local socket and the peer by
invoking the generation of ICMP_FRAG_NEEDED packets.

Once a packet is sent with the "Don't Fragment" bit set, it is then
inconvenient to break it up as that requires recalculating all the rxrpc serial
and sequence numbers and reencrypting all the fragments, so we switch off the
"Don't Fragment" service temporarily and send the bounced packet again.  Future
packets then use the new MTU.

That's all fine.  The problem lies in rxrpc_UDP_error_report() where the code
that deals with ICMP_FRAG_NEEDED packets lives.  Packets of this type have a
field (ee_info) to indicate the maximum packet size at the reporting node - but
sometimes ee_info isn't filled in and is just left as 0 and the code must allow
for this.

When ee_info is 0, the code should take the MTU size we're currently using and
reduce it for the next packet we want to send.  However, it takes ee_info
(which is known to be 0) and tries to reduce that instead.

This was discovered by Coverity.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
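
The intended behaviour can be sketched as below. This is an illustration only: `next_mtu()` is a hypothetical function, and the specific reduction heuristic (halving large MTUs, subtracting 100 from smaller ones) is an assumption chosen for the example. The point it demonstrates is that when ee_info is 0, the estimate starts from the MTU currently in use rather than from ee_info.

```c
/*
 * Hypothetical helper: pick the MTU for the next packet after an
 * ICMP_FRAG_NEEDED report.  `ee_info` is the size reported by the
 * bouncing node (0 if it was not filled in), `cur_mtu` is the MTU
 * currently in use and `hdrsize` is the per-packet header overhead.
 */
static unsigned int next_mtu(unsigned int ee_info, unsigned int cur_mtu,
			     unsigned int hdrsize)
{
	unsigned int mtu = ee_info;

	if (mtu == 0) {
		/* No size reported: reduce the MTU in use, not ee_info. */
		mtu = cur_mtu;
		if (mtu > 1500) {
			mtu /= 2;
			if (mtu < 1500)
				mtu = 1500;
		} else if (mtu > hdrsize + 100) {
			mtu -= 100;
		} else {
			mtu = hdrsize + 4;  /* floor: header plus a little data */
		}
	}
	return mtu;
}
```

The bug described above corresponds to omitting the `mtu = cur_mtu` assignment, in which case the reduction is applied to a value already known to be 0.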

08 Feb, 2014 2 commits

af_rxrpc: Prevent RxRPC peers from ABORT-storming one another · b6f3a40c

Authored by Tim Smith on Feb 07, 2014

When an ABORT is sent, aborting a connection, the sender quite reasonably
forgets about the connection. If another frame is received, another ABORT
will be sent. When the receiver gets it, it no longer applies to an extant
connection, so an ABORT is sent, and so on...

Prevent this by never sending a rejection for an ABORT packet.
Signed-off-by: Tim Smith <tim@electronghost.co.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
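
The rule can be illustrated with a small decision function. This is a sketch, not the kernel's code: `should_send_rejection()` and its `connection_known` parameter are hypothetical, and only two Rx packet types are listed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rx packet types (only the two used in this example). */
#define RXRPC_PACKET_TYPE_DATA  1
#define RXRPC_PACKET_TYPE_ABORT 4

/*
 * Hypothetical helper: decide whether to answer an incoming packet with
 * a rejection (ABORT).  Packets for a known connection are handled
 * normally and never rejected here.
 */
static bool should_send_rejection(uint8_t packet_type, bool connection_known)
{
	if (connection_known)
		return false;
	/* Never reject an ABORT, or two peers can bounce ABORTs forever. */
	if (packet_type == RXRPC_PACKET_TYPE_ABORT)
		return false;
	return true;
}
```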

af_rxrpc: Remove incorrect checksum calculation from rxrpc_recvmsg() · 8961749e

Authored by Tim Smith on Feb 07, 2014

The UDP checksum was already verified in rxrpc_data_ready() - which calls
skb_checksum_complete() - as the RxRPC packet header contains no checksum of
its own. Subsequent calls to skb_copy_and_csum_datagram_iovec() are thus
redundant and are, in any case, being passed only a subset of the UDP payload -
so the checksum will always fail if that path is taken.

So there is no need to check skb->ip_summed in rxrpc_recvmsg(), and no need for
the csum_copy_error: exit path.
Signed-off-by: Tim Smith <tim@electronghost.co.uk>
Signed-off-by: David Howells <dhowells@redhat.com>

31 Jan, 2014 1 commit

x86, x32: Correct invalid use of user timespec in the kernel · 2def2ef2

Authored by PaX Team on Jan 30, 2014

The x32 case for the recvmmsg() timeout handling is broken:

  asmlinkage long compat_sys_recvmmsg(int fd, struct compat_mmsghdr __user *mmsg,
                                      unsigned int vlen, unsigned int flags,
                                      struct compat_timespec __user *timeout)
  {
          int datagrams;
          struct timespec ktspec;

          if (flags & MSG_CMSG_COMPAT)
                  return -EINVAL;

          if (COMPAT_USE_64BIT_TIME)
                  return __sys_recvmmsg(fd, (struct mmsghdr __user *)mmsg, vlen,
                                        flags | MSG_CMSG_COMPAT,
                                        (struct timespec *) timeout);
          ...

The timeout pointer parameter is provided by userland (hence the __user
annotation) but for x32 syscalls it's simply cast to a kernel pointer
and is passed to __sys_recvmmsg which will eventually directly
dereference it for both reading and writing.  Other callers to
__sys_recvmmsg properly copy from userland to the kernel first.

The bug was introduced by commit ee4fa23c ("compat: Use
COMPAT_USE_64BIT_TIME in net/compat.c") and should affect all kernels
since 3.4 (and perhaps vendor kernels if they backported x32 support
along with this code).

Note that CONFIG_X86_X32_ABI gets enabled at build time and only if
CONFIG_X86_X32 is enabled and ld can build x32 executables.

Other uses of COMPAT_USE_64BIT_TIME seem fine.

This addresses CVE-2014-0038.
Signed-off-by: PaX Team <pageexec@freemail.hu>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.4+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

29 Jan, 2014 2 commits

net: Fix warning on make htmldocs caused by skbuff.c · 7fceb4de

Authored by Masanari Iida on Jan 29, 2014

This patch fixes the following warnings when running "make htmldocs":

Warning(/net/core/skbuff.c:2164): No description found for parameter 'from'
Warning(/net/core/skbuff.c:2164): Excess function parameter 'source'
description in 'skb_zerocopy'

Replacing "@source" with "@from" fixes the warnings.
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

llc: remove noisy WARN from llc_mac_hdr_init · 0f1a24c9

Authored by Dave Jones on Jan 28, 2014

Sending malformed llc packets triggers this spew, which seems excessive.

WARNING: CPU: 1 PID: 6917 at net/llc/llc_output.c:46 llc_mac_hdr_init+0x85/0x90 [llc]()
device type not supported: 0
CPU: 1 PID: 6917 Comm: trinity-c1 Not tainted 3.13.0+ #95
 0000000000000009 00000000007e257d ffff88009232fbe8 ffffffffac737325
 ffff88009232fc30 ffff88009232fc20 ffffffffac06d28d ffff88020e07f180
 ffff88009232fec0 00000000000000c8 0000000000000000 ffff88009232fe70
Call Trace:
 [<ffffffffac737325>] dump_stack+0x4e/0x7a
 [<ffffffffac06d28d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffffac06d30c>] warn_slowpath_fmt+0x5c/0x80
 [<ffffffffc01736d5>] llc_mac_hdr_init+0x85/0x90 [llc]
 [<ffffffffc0173759>] llc_build_and_send_ui_pkt+0x79/0x90 [llc]
 [<ffffffffc057cdba>] llc_ui_sendmsg+0x23a/0x400 [llc2]
 [<ffffffffac605d8c>] sock_sendmsg+0x9c/0xe0
 [<ffffffffac185a37>] ? might_fault+0x47/0x50
 [<ffffffffac606321>] SYSC_sendto+0x121/0x1c0
 [<ffffffffac011847>] ? syscall_trace_enter+0x207/0x270
 [<ffffffffac6071ce>] SyS_sendto+0xe/0x10
 [<ffffffffac74aaa4>] tracesys+0xdd/0xe2

Until 2009, this was a printk, when it was changed in
bf9ae538: "llc: use dev_hard_header".

Let userland figure out what -EINVAL means by itself.
Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Jan, 2014 13 commits

net: gre: use icmp_hdr() to get inner ip header · c0c0c50f

Authored by Duan Jiong on Jan 28, 2014

When dealing with ICMP messages, skb->data points to the
IP header that triggered the sending of the ICMP message.

In gre_cisco_err(), parse_gre_header() is called, and
iptunnel_pull_header() is called at the end of parse_gre_header()
to pull the skb, so skb->data no longer points to the
inner IP header.

Unfortunately, ipgre_err still needs the IP addresses in the
inner IP header to look up the tunnel with ip_tunnel_lookup().

So just use icmp_hdr() to get the inner IP header instead of skb->data.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: 6lowpan: fixup for code movement · ce60e0c4

Authored by Stephen Rothwell on Jan 07, 2014

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix memory leak if TPROXY used with TCP early demux · a452ce34

Authored by Holger Eitzenberger on Jan 27, 2014

I see a memory leak when using a transparent HTTP proxy using TPROXY
together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):

unreferenced object 0xffff88008cba4a40 (size 1696):
  comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
  hex dump (first 32 bytes):
    0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00  .. j@..7..2.....
    02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9
    [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5
    [<ffffffff812702cf>] sk_clone_lock+0x14/0x283
    [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b
    [<ffffffff8129a893>] netlink_broadcast+0x14/0x16
    [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3
    [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d
    [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0
    [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e
    [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55
    [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725
    [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154
    [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514
    [<ffffffff8127aa77>] process_backlog+0xee/0x1c5
    [<ffffffff8127c949>] net_rx_action+0xa7/0x200
    [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157

But there are many more, resulting in the machine going OOM after some
days.

From looking at the TPROXY code, and with help from Florian, I see
that the memory leak is introduced in tcp_v4_early_demux():

  void tcp_v4_early_demux(struct sk_buff *skb)
  {
    /* ... */

    iph = ip_hdr(skb);
    th = tcp_hdr(skb);

    if (th->doff < sizeof(struct tcphdr) / 4)
        return;

    sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
                       iph->saddr, th->source,
                       iph->daddr, ntohs(th->dest),
                       skb->skb_iif);
    if (sk) {
        skb->sk = sk;

where the socket is assigned unconditionally to skb->sk, also bumping
the refcnt on it.  This is problematic, because in our case the skb
already has a socket assigned in the TPROXY target.  This then results
in the leak I see.

The very same issue seems to exist with IPv6, but I haven't tested it.
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
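
The leak mechanism can be modelled in a few lines of plain C. This is a toy model under loud assumptions: `struct sock` and `struct sk_buff` here are stripped-down stand-ins for the kernel structures, and `early_demux_attach()` is a hypothetical function showing one way to avoid the problem, namely skipping early demux when the skb already carries a socket. It is not claimed to be the exact change made by this commit.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields needed here. */
struct sock { int refcnt; };
struct sk_buff { struct sock *sk; };

/*
 * Hypothetical helper: attach the socket found by the established-socket
 * lookup to the skb.  Returns 1 if the socket was attached, 0 otherwise.
 */
static int early_demux_attach(struct sk_buff *skb, struct sock *found)
{
	/*
	 * If a socket is already assigned (e.g. by the TPROXY target),
	 * leave it alone.  Overwriting skb->sk would drop the only pointer
	 * to a socket whose reference is still held, leaking it.
	 */
	if (skb->sk != NULL)
		return 0;
	if (found == NULL)
		return 0;

	found->refcnt++;  /* the skb now holds a reference */
	skb->sk = found;
	return 1;
}
```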

libceph: follow redirect replies from osds · 205ee118