Commits · 6e28099d38c0e50d62c1afc054e37e573adf3d21 · openanolis / cloud-kernel

27 Feb, 2017 · 2 commits

ipv4: mask tos for input route · 6e28099d

Committed by Julian Anastasov on Feb 26, 2017

Restore the lost masking of TOS in input route code to
allow ip rules to match it properly.

Problem [1] noticed by Shmulik Ladkani <shmulik.ladkani@gmail.com>

[1] http://marc.info/?t=137331755300040&r=1&w=2

Fixes: 89aef892 ("ipv4: Delete routing cache.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
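
To illustrate why the masking matters, here is a minimal Python sketch of a TOS comparison against an ip rule. The two constants mirror the kernel's `IPTOS_TOS_MASK` and `IPTOS_RT_MASK` definitions; `rule_matches`, `input_route_tos` and the sample values are hypothetical stand-ins, not the code touched by this commit.

```python
IPTOS_TOS_MASK = 0x1E
IPTOS_RT_MASK = IPTOS_TOS_MASK & ~3  # 0x1C: drops the two low (ECN) bits


def rule_matches(rule_tos, flow_tos):
    """Hypothetical rule check: a rule TOS of 0 matches anything, otherwise exact match."""
    return rule_tos == 0 or rule_tos == flow_tos


def input_route_tos(header_tos, mask=True):
    """TOS value handed to the rule lookup, with or without the masking."""
    return header_tos & IPTOS_RT_MASK if mask else header_tos


packet_tos = 0x11  # TOS 0x10 with one ECN bit set
print(rule_matches(0x10, input_route_tos(packet_tos, mask=False)))  # unmasked lookup
print(rule_matches(0x10, input_route_tos(packet_tos)))              # masked lookup
```

Without the mask, the ECN bit leaks into the comparison and a rule written for TOS 0x10 no longer matches the packet.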

ipv4: add missing initialization for flowi4_uid · 8bcfd092

Committed by Julian Anastasov on Feb 26, 2017

Avoid matching of random stack value for uid when rules
are looked up on input route or when RP filter is used.
Problem should affect only setups that use ip rules with
uid range.

Fixes: 622ec2c9 ("net: core: add UID to flows, rules, and routes")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Feb, 2017 · 2 commits

tcp: account for ts offset only if tsecr not zero · eee2faab

Committed by Alexey Kodanev on Feb 22, 2017

We can get SYN with zero tsecr, don't apply offset in this case.

Fixes: ee684b6f ("tcp: send packets with a socket timestamp")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
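
The rule described above can be sketched in a few lines of Python. This is only a model under the assumption that timestamps are unsigned 32-bit values; `adjust_tsecr` is a hypothetical helper name, not a kernel function.

```python
U32 = 0xFFFFFFFF


def adjust_tsecr(rcv_tsecr, tsoffset):
    """Undo the per-connection offset in an echoed timestamp.

    A tsecr of zero means the peer echoed nothing (e.g. an initial SYN),
    so it is left untouched instead of being turned into a bogus value.
    """
    if rcv_tsecr == 0:
        return 0
    return (rcv_tsecr - tsoffset) & U32  # 32-bit wraparound arithmetic


print(adjust_tsecr(0, 1000))     # zero stays zero
print(adjust_tsecr(1500, 1000))  # offset removed
```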

tcp: setup timestamp offset when write_seq already set · 00355fa5

Committed by Alexey Kodanev on Feb 22, 2017

Found that when randomized tcp offsets are enabled (by default)
TCP client can still start new connections without them. Later,
if server does active close and re-uses sockets in TIME-WAIT
state, new SYN from client can be rejected on PAWS check inside
tcp_timewait_state_process(), because either tw_ts_recent or
rcv_tsval doesn't really have an offset set.

Here is how to reproduce it with LTP netstress tool:
    netstress -R 1 &
    netstress -H 127.0.0.1 -lr 1000000 -a1

    [...]
    < S  seq 1956977072 win 43690 TS val 295618 ecr 459956970
    > .  ack 1956911535 win 342 TS val 459967184 ecr 1547117608
    < R  seq 1956911535 win 0 length 0
+1. < S  seq 1956977072 win 43690 TS val 296640 ecr 459956970
    > S. seq 657450664 ack 1956977073 win 43690 TS val 459968205 ecr 296640

Fixes: 95a22cae ("tcp: randomize tcp timestamp offsets for each connection")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
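
A simplified model of the PAWS comparison shows how a missing offset on one side leads to the rejection described above. The sketch assumes a plain signed 32-bit comparison; the real check in the kernel has additional conditions, and the offset and timestamp values below are made up for illustration.

```python
def s32(x):
    """Interpret the low 32 bits of x as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def paws_ok(ts_recent, rcv_tsval, paws_win=0):
    """Accept the segment only if its timestamp is not older than ts_recent."""
    return s32(ts_recent - rcv_tsval) <= paws_win


OFFSET = 0x40000000  # hypothetical per-connection random offset

# Old connection used the offset, new SYN was sent without it: rejected.
print(paws_ok(295618 + OFFSET, 296640))
# Both timestamps carry the same offset: accepted.
print(paws_ok(295618 + OFFSET, 296640 + OFFSET))
```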

22 Feb, 2017 · 2 commits

tcp: Revert "tcp: tcp_probe: use spin_lock_bh()" · 29869d66

Committed by Eric Dumazet on Feb 21, 2017

This reverts commit e70ac171.

jtcp_rcv_established() is in fact called with hard irq being disabled.

Initial bug report from Ricardo Nabinger Sanchez [1] still needs
to be investigated, but does not look like a TCP bug.

[1] https://www.spinics.net/lists/netdev/msg420960.html

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Cc: Ricardo Nabinger Sanchez <rnsanchez@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: fix IP_CHECKSUM handling · ca4ef457

Committed by Paolo Abeni on Feb 21, 2017

The skbs processed by ip_cmsg_recv() are not guaranteed to
be linear, e.g. when sending UDP packets over loopback with
MSG_MORE.
Using csum_partial() on [potentially] the whole skb len
is dangerous; instead be on the safe side and use skb_checksum().

Thanks to the syzkaller team for detecting the issue and providing the
reproducer.

v1 -> v2:
 - move the variable declaration in a tighter scope

Fixes: ad6f939a ("ip: Add offset parameter to ip_cmsg_recv")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Feb, 2017 · 2 commits

tcp: use page_ref_inc() in tcp_sendmsg() · 4e33e346

Committed by Eric Dumazet on Feb 17, 2017

sk_page_frag_refill() allocates either a compound page or an order-0
page. We can use page_ref_inc() which is slightly faster than get_page().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: accommodate sequence number to a peer's shrunk receive window caused by... · a4ecb15a

Committed by Cui, Cheng on Feb 17, 2017

tcp: accommodate sequence number to a peer's shrunk receive window caused by precision loss in window scaling

Prevent sending out a left-shifted sequence number from a Linux sender in
response to a peer's shrunk receive-window caused by losing least significant
bits in window-scaling.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Cheng Cui <Cheng.Cui@netapp.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
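
The precision loss mentioned above can be worked through with a small arithmetic example. The sketch below only demonstrates how the apparent right edge of the window can move left; it does not reproduce the patch. The window scale of 7 and the sequence numbers are assumed values.

```python
def advertised_window(true_window, wscale):
    """Window as the sender sees it: the low `wscale` bits are lost in transit."""
    return (true_window >> wscale) << wscale


wscale = 7  # assumed window scale factor

# The receiver's real right edge stays at 2024 while its ACK point advances.
edge_before = 1000 + advertised_window(1024, wscale)  # ack=1000, real window 1024
edge_after = 1100 + advertised_window(924, wscale)    # ack=1100, real window 924
shrink = edge_before - edge_after

print(edge_before, edge_after, shrink)
```

The apparent shrink is always smaller than `1 << wscale`, since only the truncated low bits are lost.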

15 Feb, 2017 · 3 commits

esp: Add a software GRO codepath · 7785bba2

Committed by Steffen Klassert on Feb 15, 2017

This patch adds GRO infrastructure and callbacks for ESP on
ipv4 and ipv6.

In case the GRO layer detects an ESP packet, the
esp{4,6}_gro_receive() function does a xfrm state lookup
and calls the xfrm input layer if it finds a matching state.
The packet will be decapsulated and reinjected into layer 2.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

net: Add a skb_gro_flush_final helper. · 5f114163

Committed by Steffen Klassert on Feb 15, 2017

Add a skb_gro_flush_final helper to prepare for consuming
skbs in call_gro_receive. We will extend this helper to not
touch the skb if the skb is consumed by a gro callback with
a followup patch. We need this to handle the upcoming IPsec
ESP callbacks as they reinject the skb to the napi_gro_receive
asynchronously. The handler is used in all gro_receive functions
that can call the ESP gro handlers.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

tcp: tcp_probe: use spin_lock_bh() · e70ac171

Committed by Eric Dumazet on Feb 14, 2017

tcp_rcv_established() can now run in process context.

We need to disable BH while acquiring tcp probe spinlock,
or risk a deadlock.

Fixes: 5413d1ba ("net: do not block BH while processing socket backlog")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ricardo Nabinger Sanchez <rnsanchez@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Feb, 2017 · 1 commit

NET: Fix /proc/net/arp for AX.25 · 4872e57c

Committed by Ralf Baechle on Feb 11, 2017

When sending ARP requests over AX.25 links, the hwaddress in the neighbour
cache is not initialized.  For such an incomplete arp entry
ax2asc2 will generate an empty string resulting in /proc/net/arp output
like the following:

$ cat /proc/net/arp
IP address       HW type     Flags       HW address            Mask     Device
192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3
172.20.1.99      0x3         0x0              *        bpq0

The missing field will confuse the procfs parsing of arp(8) resulting in
incorrect output for the device such as the following:

$ arp
Address                  HWtype  HWaddress           Flags Mask            Iface
gateway                  ether   52:54:00:00:5d:5f   C                     ens3
172.20.1.99                      (incomplete)                              ens3

This changes the content of /proc/net/arp to:

$ cat /proc/net/arp
IP address       HW type     Flags       HW address            Mask     Device
172.20.1.99      0x3         0x0         *                     *        bpq0
192.168.122.1    0x1         0x2         52:54:00:00:5d:5f     *        ens3

To do so it changes ax2asc to put the string "*" in buf for a NULL address
argument.  Finally the HW address field is left aligned in a 17 character
field (the length of an ethernet HW address in the usual hex notation) for
readability.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
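
The parsing problem is easy to reproduce with any whitespace-delimited reader. The sketch below is not how arp(8) itself parses the file; it is a hypothetical Python parser applied to the bpq0 lines quoted above, showing how an empty HW address column shifts the remaining fields.

```python
FIELDS = ("ip", "hw_type", "flags", "hw_address", "mask", "device")

OLD_LINE = "172.20.1.99      0x3         0x0              *        bpq0"
NEW_LINE = "172.20.1.99      0x3         0x0         *                     *        bpq0"


def parse_arp_line(line):
    """Split on whitespace and pair the values with the column names in order."""
    return dict(zip(FIELDS, line.split()))


old = parse_arp_line(OLD_LINE)
new = parse_arp_line(NEW_LINE)
print(old)  # 'mask' ends up holding the device name, 'device' is missing
print(new)  # all six columns line up
```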

12 Feb, 2017 · 1 commit

net: rename dst_neigh_output back to neigh_output · c16ec185

Committed by Julian Anastasov on Feb 11, 2017

After the dst->pending_confirm flag was removed, we no longer
need to pass the dst arg to dst_neigh_output.
So, rename it to neigh_output as before commit 5110effe
("net: Do delayed neigh confirmation.").
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Feb, 2017 · 4 commits

ipv4: fib: Add events for FIB replace and append · 2f3a5272

Committed by Ido Schimmel on Feb 09, 2017

The FIB notification chain currently uses the NLM_F_{REPLACE,APPEND}
flags to signal routes being replaced or appended.

Instead of using netlink flags for in-kernel notifications we can simply
introduce two new events in the FIB notification chain. This has the
added advantage of making the API cleaner, thereby making it clear that
these events should be supported by listeners of the notification chain.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: fib: Send notification before deleting FIB alias · 5b7d616d

Committed by Ido Schimmel on Feb 09, 2017

When a FIB alias is replaced following NLM_F_REPLACE, the ENTRY_ADD
notification is sent after the reference on the previous FIB info was
dropped. This is problematic as potential listeners might need to access
it in their notification blocks.

Solve this by sending the notification prior to the deletion of the
replaced FIB alias. This is consistent with ENTRY_DEL notifications.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: fib: Send deletion notification with actual FIB alias type · 42d5aa76

Committed by Ido Schimmel on Feb 09, 2017

When a FIB alias is removed, a notification is sent using the type
passed from user space - can be RTN_UNSPEC - instead of the actual type
of the removed alias. This is problematic for listeners of the FIB
notification chain, as several FIB aliases can exist with matching
parameters except for the type.

Solve this by passing the actual type of the removed FIB alias.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: fib: Only flush FIB aliases belonging to currently flushed table · 58e3bdd5

Committed by Ido Schimmel on Feb 09, 2017

In case the MAIN table is flushed and its trie is shared with the LOCAL
table, then we might be flushing FIB aliases belonging to the latter.
This can lead to FIB_ENTRY_DEL notifications sent with the wrong table
ID.

The above doesn't affect current listeners, as the table ID is ignored
during entry deletion, but this will change later in the patchset.

When flushing a particular table, skip any aliases belonging to a
different one.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Patrick McHardy <kaber@trash.net>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
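
The skip rule can be modelled with a list standing in for the shared trie. This is a loose sketch: `FibAlias` and `flush_table` are hypothetical names, and the model removes every alias of the flushed table, whereas the kernel flush is more selective. The table IDs 254 and 255 are the usual values of `RT_TABLE_MAIN` and `RT_TABLE_LOCAL`.

```python
from dataclasses import dataclass

RT_TABLE_MAIN = 254
RT_TABLE_LOCAL = 255


@dataclass
class FibAlias:
    prefix: str
    tb_id: int


def flush_table(shared_trie, tb_id):
    """Flush one table from a trie shared by several tables.

    Returns the aliases left in the trie and the deletion events emitted.
    """
    kept, events = [], []
    for alias in shared_trie:
        if alias.tb_id != tb_id:
            kept.append(alias)  # belongs to another table: skip it
            continue
        events.append(("FIB_ENTRY_DEL", alias.prefix, tb_id))
    return kept, events


trie = [FibAlias("10.0.0.0/24", RT_TABLE_MAIN), FibAlias("127.0.0.1/32", RT_TABLE_LOCAL)]
kept, events = flush_table(trie, RT_TABLE_MAIN)
print(kept)
print(events)
```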

10 Feb, 2017 · 1 commit

igmp, mld: Fix memory leak in igmpv3/mld_del_delrec() · 9c8bb163

Committed by Hangbin Liu on Feb 08, 2017

In function igmpv3/mld_add_delrec() we allocate pmc and put it in
idev->mc_tomb, so we should free it when we don't need it in del_delrec().
But I removed kfree(pmc) incorrectly in latest two patches. Now fix it.

Fixes: 24803f38 ("igmp: do not remove igmp souce list info when ...")
Fixes: 1666d49e ("mld: do not remove mld souce list info when ...")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Feb, 2017 · 7 commits

xfrm: policy: make policy backend const · 37b10383

Committed by Florian Westphal on Feb 07, 2017

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: policy: remove family field · a2817d8b

Committed by Florian Westphal on Feb 07, 2017

Only needed it to register the policy backend at init time.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: policy: remove garbage_collect callback · 3d7d25a6

Committed by Florian Westphal on Feb 07, 2017

Just call xfrm_garbage_collect_deferred() directly.
This gets rid of a write to afinfo in register/unregister and allows
afinfo to be constified later on.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

xfrm: input: constify xfrm_input_afinfo · 960fdfde

Committed by Florian Westphal on Feb 07, 2017

Nothing writes to these structures (the module owner was not used).

While at it, size xfrm_input_afinfo[] by the highest existing xfrm family
(INET6), not AF_MAX.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

ipv4: fib: Notify about nexthop status changes · 982acb97

Committed by Ido Schimmel on Feb 08, 2017

When a multipath route is hit the kernel doesn't consider nexthops that
are DEAD or LINKDOWN when IN_DEV_IGNORE_ROUTES_WITH_LINKDOWN is set.
Devices that offload multipath routes need to be made aware of nexthop
status changes. Otherwise, the device will keep forwarding packets to
non-functional nexthops.

Add the FIB_EVENT_NH_{ADD,DEL} events to the fib notification chain,
which notify capable devices when they should add or delete a nexthop
from their tables.

Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Cc: David Ahern <dsa@cumulusnetworks.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gro_cells: move to net/core/gro_cells.c · 97e219b7

Committed by Eric Dumazet on Feb 07, 2017

We have many gro cells users, so let's move the code to avoid
duplication.

This creates a CONFIG_GRO_CELLS option.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ping: fix a null pointer dereference · 73d2c667

Committed by WANG Cong on Feb 07, 2017

Andrey reported a kernel crash:

  general protection fault: 0000 [#1] SMP KASAN
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Modules linked in:
  CPU: 2 PID: 3880 Comm: syz-executor1 Not tainted 4.10.0-rc6+ #124
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  task: ffff880060048040 task.stack: ffff880069be8000
  RIP: 0010:ping_v4_push_pending_frames net/ipv4/ping.c:647 [inline]
  RIP: 0010:ping_v4_sendmsg+0x1acd/0x23f0 net/ipv4/ping.c:837
  RSP: 0018:ffff880069bef8b8 EFLAGS: 00010206
  RAX: dffffc0000000000 RBX: ffff880069befb90 RCX: 0000000000000000
  RDX: 0000000000000018 RSI: ffff880069befa30 RDI: 00000000000000c2
  RBP: ffff880069befbb8 R08: 0000000000000008 R09: 0000000000000000
  R10: 0000000000000002 R11: 0000000000000000 R12: ffff880069befab0
  R13: ffff88006c624a80 R14: ffff880069befa70 R15: 0000000000000000
  FS:  00007f6f7c716700(0000) GS:ffff88006de00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000004a6f28 CR3: 000000003a134000 CR4: 00000000000006e0
  Call Trace:
   inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
   sock_sendmsg_nosec net/socket.c:635 [inline]
   sock_sendmsg+0xca/0x110 net/socket.c:645
   SYSC_sendto+0x660/0x810 net/socket.c:1687
   SyS_sendto+0x40/0x50 net/socket.c:1655
   entry_SYSCALL_64_fastpath+0x1f/0xc2

This is because we miss a check for NULL pointer for skb_peek() when
the queue is empty. Other places already have the same check.

Fixes: c319b4d7 ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
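
The missing check can be pictured with a queue whose peek operation may return nothing. In this sketch `skb_peek` is modelled after the kernel helper of the same name (head of the queue or nothing when empty), while `push_pending_frames` and the dictionary used as a packet are hypothetical simplifications.

```python
from collections import deque


def skb_peek(queue):
    """Return the head of the queue without removing it, or None if it is empty."""
    return queue[0] if queue else None


def push_pending_frames(write_queue):
    """Finish the pending packet; bail out early when nothing is queued."""
    skb = skb_peek(write_queue)
    if skb is None:  # the guard that was missing
        return 0
    skb["checksummed"] = True
    return 1


print(push_pending_frames(deque()))  # empty queue: nothing to do, no crash
queue = deque([{"checksummed": False}])
print(push_pending_frames(queue))
```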

08 Feb, 2017 · 5 commits

net: use dst_confirm_neigh for UDP, RAW, ICMP, L2TP · 0dec879f

Committed by Julian Anastasov on Feb 06, 2017

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

The datagram protocols can use MSG_CONFIRM to confirm the
neighbour. When used with MSG_PROBE we do not reach the
code where neighbour is confirmed, so we have to do the
same slow lookup by using the dst_confirm_neigh() helper.
When MSG_PROBE is not used, ip_append_data/ip6_append_data
will set the skb flag dst_pending_confirm.
Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effe ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bed ("ipv4: Cache output routes in fib_info nexthops.")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add confirm_neigh method to dst_ops · 63fca65d

Committed by Julian Anastasov on Feb 06, 2017

Add confirm_neigh method to dst_ops and use it from IPv4 and IPv6
to lookup and confirm the neighbour. Its usage via the new helper
dst_confirm_neigh() should be restricted to MSG_PROBE users for
performance reasons.

For XFRM prefer the last tunnel address, if present. With help
from Steffen Klassert.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: replace dst_confirm with sk_dst_confirm · c3a2e837

Committed by Julian Anastasov on Feb 06, 2017

When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.
Use the new sk_dst_confirm() helper to propagate the
indication from received packets to sock_confirm_neigh().
Reported-by: YueHaibing <yuehaibing@huawei.com>
Fixes: 5110effe ("net: Do delayed neigh confirmation.")
Fixes: f2bb4bed ("ipv4: Cache output routes in fib_info nexthops.")
Tested-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add dst_pending_confirm flag to skbuff · 4ff06203

Committed by Julian Anastasov on Feb 06, 2017

Add new skbuff flag to allow protocols to confirm neighbour.
When same struct dst_entry can be used for many different
neighbours we can not use it for pending confirmations.

Add sock_confirm_neigh() helper to confirm the neighbour and
use it for IPv4, IPv6 and VRF before dst_neigh_output.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: properly cope with csum errors · 69629464

Committed by Eric Dumazet on Feb 05, 2017

Dmitry reported that UDP sockets being destroyed would trigger the
WARN_ON(atomic_read(&sk->sk_rmem_alloc)); in inet_sock_destruct()

It turns out we do not properly destroy skb(s) that have wrong UDP
checksum.

Thanks again to syzkaller team.

Fixes: 7c13f97f ("udp: do fwd memory scheduling on dequeue")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Feb, 2017 · 1 commit

tcp: avoid infinite loop in tcp_splice_read() · ccf7abb9

Committed by Eric Dumazet on Feb 03, 2017

Splicing from TCP socket is vulnerable when a packet with URG flag is
received and stored into receive queue.

__tcp_splice_read() returns 0, and sk_wait_data() immediately
returns since there is the problematic skb in queue.

This is a nice way to burn cpu (aka infinite loop) and trigger
soft lockups.

Again, this gem was found by syzkaller tool.

Fixes: 9c55e01c ("[TCP]: Splice receive support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
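
The busy loop described above can be modelled in user space. In this sketch `splice_loop`, its `guard` flag and the spin limit are all hypothetical; the only assumptions taken from the commit message are that the splice step returns 0 for the problematic skb and that waiting for data returns immediately while the queue is non-empty.

```python
def splice_loop(queue, splice_once, guard, max_spins=1000):
    """Model of the read loop; returns (bytes_spliced, loop_iterations)."""
    spins = 0
    while spins < max_spins:
        spins += 1
        ret = splice_once(queue)
        if ret:
            return ret, spins
        if guard and queue:
            break  # nothing spliced although data is queued: stop instead of waiting
        if not queue:
            break  # real code would sleep here until data arrives
        # Queue is non-empty, so "waiting for data" returns at once and we spin.
    return 0, spins


stuck = ["skb with URG data"]
print(splice_loop(stuck, lambda q: 0, guard=False))  # spins until the model's limit
print(splice_loop(stuck, lambda q: 0, guard=True))   # leaves after one pass
```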

05 Feb, 2017 · 2 commits

netlabel: out of bound access in cipso_v4_validate() · d71b7896

Committed by Eric Dumazet on Feb 03, 2017

syzkaller found another out of bound access in ip_options_compile(),
or more exactly in cipso_v4_validate()

Fixes: 20e2a864 ("cipso: handle CIPSO options correctly when NetLabel is disabled")
Fixes: 446fda4f ("[NetLabel]: CIPSOv4 engine")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paul Moore <paul@paul-moore.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: keep skb->dst around in presence of IP options · 34b2cef2

Committed by Eric Dumazet on Feb 04, 2017

Andrey Konovalov got crashes in __ip_options_echo() when a NULL skb->dst
is accessed.

ipv4_pktinfo_prepare() should not drop the dst if (evil) IP options
are present.

We could refine the test to the presence of ts_needtime or srr,
but IP options are not often used, so let's be conservative.

Thanks to syzkaller team for finding this bug.

Fixes: d826eb14 ("ipv4: PKTINFO doesnt need dst reference")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Feb, 2017 · 2 commits

tcp: clear pfmemalloc on outgoing skb · 38ab52e8

Committed by Eric Dumazet on Feb 02, 2017

Josef Bacik diagnosed following problem :

   I was seeing random disconnects while testing NBD over loopback.
   This turned out to be because NBD sets pfmemalloc on its socket,
   however the receiving side is a user space application so does not
   have pfmemalloc set on its socket. This means that
   sk_filter_trim_cap will simply drop this packet, under the
   assumption that the other side will simply retransmit. Well we do
   retransmit, and then the packet is just dropped again for the same
   reason.

It seems the better way to address this problem is to clear pfmemalloc
in the TCP transmit path. pfmemalloc strict control really makes sense
on the receive path.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
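
The drop-and-retransmit cycle from the report can be sketched as follows. `receive_allowed` and `transmit` are hypothetical helpers and the dictionary is a stand-in for an skb; the receive rule assumes that a pfmemalloc skb is only accepted by a socket that has the memalloc flag set.

```python
def receive_allowed(skb_pfmemalloc, sock_memalloc):
    """Receive-side rule: pfmemalloc skbs are only delivered to memalloc sockets."""
    return not (skb_pfmemalloc and not sock_memalloc)


def transmit(skb, clear_pfmemalloc):
    """Return the skb as it leaves the sender, optionally with the flag cleared."""
    out = dict(skb)
    if clear_pfmemalloc:
        out["pfmemalloc"] = False
    return out


skb = {"pfmemalloc": True}
# Loopback to an ordinary user-space socket (no memalloc flag):
print(receive_allowed(transmit(skb, False)["pfmemalloc"], sock_memalloc=False))
print(receive_allowed(transmit(skb, True)["pfmemalloc"], sock_memalloc=False))
```

Retransmitting does not help in the first case because every copy carries the same flag and is dropped for the same reason.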

tcp: add tcp_mss_clamp() helper · 3541f9e8

Committed by Eric Dumazet on Feb 02, 2017

Small cleanup factorizing code doing the TCP_MAXSEG clamping.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Feb, 2017 · 2 commits

net: add LINUX_MIB_PFMEMALLOCDROP counter · 8fe809a9

Committed by Eric Dumazet on Feb 01, 2017

Debugging issues caused by pfmemalloc is often tedious.

Add a new SNMP counter to more easily diagnose these problems.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Josef Bacik <jbacik@fb.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: remove fib_lookup.h from devinet.c include list · 66109109

Committed by David Ahern on Feb 01, 2017

nothing in devinet.c relies on fib_lookup.h; remove it from the includes
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Feb, 2017 · 3 commits

netfilter: allow logging from non-init namespaces · 2851940f

Committed by Michal Kubeček on Jan 31, 2017

Commit 69b34fb9 ("netfilter: xt_LOG: add net namespace support for
xt_LOG") disabled logging packets using the LOG target from non-init
namespaces. The motivation was to prevent containers from flooding
kernel log of the host. The plan was to keep it that way until syslog
namespace implementation allows containers to log in a safe way.

However, the work on syslog namespace seems to have hit a dead end
somewhere in 2013 and there are users who want to use xt_LOG in all
network namespaces. This patch allows them to do so by setting

  /proc/sys/net/netfilter/nf_log_all_netns

to a nonzero value. This sysctl is only accessible from init_net so that
one cannot switch the behaviour from inside a container.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
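
The resulting policy boils down to a single condition. The Python sketch below is a hypothetical model of that decision; `may_log` is not a kernel function, and the sysctl value is passed in as a plain integer rather than read from /proc.

```python
def may_log(in_init_netns, nf_log_all_netns):
    """Logging is allowed in the init namespace, or anywhere once the sysctl is nonzero."""
    return in_init_netns or nf_log_all_netns != 0


print(may_log(in_init_netns=True, nf_log_all_netns=0))   # host: always allowed
print(may_log(in_init_netns=False, nf_log_all_netns=0))  # container, default setting
print(may_log(in_init_netns=False, nf_log_all_netns=1))  # container, sysctl enabled
```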

netfilter: add and use nf_ct_set helper · c74454fa

Committed by Florian Westphal on Jan 23, 2017

Add a helper to assign a nf_conn entry and the ctinfo bits to an sk_buff.
This avoids changing code in followup patch that merges skb->nfct and
skb->nfctinfo into skb->_nfct.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

skbuff: add and use skb_nfct helper · cb9c6836

Committed by Florian Westphal on Jan 23, 2017

Followup patch renames skb->nfct and changes its type so add a helper to
avoid intrusive rename change later.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
