提交 · fc33a8943efb25bc025750c7f4ea643fae526860 · openanolis / cloud-kernel

09 8月, 2017 1 次提交

net: avoid skb_warn_bad_offload false positives on UFO · 8d63bee6

由 Willem de Bruijn 提交于 8月 08, 2017

skb_warn_bad_offload triggers a warning when an skb enters the GSO
stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
checksum offload set.

Commit b2504a5d ("net: reduce skb_warn_bad_offload() noise")
observed that SKB_GSO_DODGY producers can trigger the check and
that passing those packets through the GSO handlers will fix it
up. But, the software UFO handler will set ip_summed to
CHECKSUM_NONE.

When __skb_gso_segment is called from the receive path, this
triggers the warning again.

Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
Tx these two are equivalent. On Rx, this better matches the
skb state (checksum computed), as CHECKSUM_NONE here means no
checksum computed.

See also this thread for context:
http://patchwork.ozlabs.org/patch/799015/

Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8d63bee6

04 8月, 2017 1 次提交

ipv6: set rt6i_protocol properly in the route when it is installed · b91d5329

由 Xin Long 提交于 8月 03, 2017

After commit c2ed1880 ("net: ipv6: check route protocol when
deleting routes"), ipv6 route checks rt protocol when trying to
remove a rt entry.

It introduced a side effect causing 'ip -6 route flush cache' not
to work well. When flushing caches with iproute, all route caches
get dumped from kernel then removed one by one by sending DELROUTE
requests to kernel for each cache.

The thing is iproute sends the request with the cache whose proto
is set with RTPROT_REDIRECT by rt6_fill_node() when kernel dumps
it. But in kernel the rt_cache protocol is still 0, which causes
the cache not to be matched and removed.

So the real reason is rt6i_protocol in the route is not set when
it is allocated. As David Ahern's suggestion, this patch is to
set rt6i_protocol properly in the route when it is installed and
remove the codes setting rtm_protocol according to rt6i_flags in
rt6_fill_node.

This is also an improvement to keep rt6i_protocol consistent with
rtm_protocol.

Fixes: c2ed1880 ("net: ipv6: check route protocol when deleting routes")
Reported-by: NJianlin Shi <jishi@redhat.com>
Suggested-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b91d5329

01 8月, 2017 1 次提交

udp6: fix jumbogram reception · cb891fa6

由 Paolo Abeni 提交于 7月 31, 2017

Since commit 67a51780 ("ipv6: udp: leverage scratch area
helpers") udp6_recvmsg() read the skb len from the scratch area,
to avoid a cache miss.
But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their
length exceeds the 16 bits available in the scratch area. As a side
effect the length returned by recvmsg() is:
<ingress datagram len> % (1<<16)

This commit addresses the issue allocating one more bit in the
IP6CB flags field and setting it for incoming jumbograms.
Such field is still in the first cacheline, so at recvmsg()
time we can check it and fallback to access skb->len if
required, without a measurable overhead.

Fixes: 67a51780 ("ipv6: udp: leverage scratch area helpers")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cb891fa6

30 7月, 2017 1 次提交

udp6: fix socket leak on early demux · c9f2c1ae

由 Paolo Abeni 提交于 7月 27, 2017

When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
sk reference is retrieved and used, but the relevant reference
count is leaked and the socket destructor is never called.
Beyond leaking the sk memory, if there are pending UDP packets
in the receive queue, even the related accounted memory is leaked.

In the long run, this will cause persistent forward allocation errors
and no UDP skbs (both ipv4 and ipv6) will be able to reach the
user-space.

Fix this by explicitly accessing the early demux reference before
the lookup, and properly decreasing the socket reference count
after usage.

Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
the now obsoleted comment about "socket cache".

The newly added code is derived from the current ipv4 code for the
similar path.

v1 -> v2:
  fixed the __udp6_lib_rcv() return code for resubmission,
  as suggested by Eric
Reported-by: NSam Edwards <CFSworks@gmail.com>
Reported-by: NMarc Haber <mh+netdev@zugschlus.de>
Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast")
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9f2c1ae

26 7月, 2017 1 次提交

ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment() · afce615a

由 Stefano Brivio 提交于 7月 24, 2017

RFC 2465 defines ipv6IfStatsOutFragFails as:

	"The number of IPv6 datagrams that have been discarded
	 because they needed to be fragmented at this output
	 interface but could not be."

The existing implementation, instead, would increase the counter
twice in case we fail to allocate room for single fragments:
once for the fragment, once for the datagram.

This didn't look intentional though. In one of the two affected
affected failure paths, the double increase was simply a result
of a new 'goto fail' statement, introduced to avoid a skb leak.
The other path appears to be affected since at least 2.6.12-rc2.
Reported-by: NSabrina Dubroca <sdubroca@redhat.com>
Fixes: 1d325d21 ("ipv6: ip6_fragment: fix headroom tests and skb leak")
Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

afce615a

20 7月, 2017 1 次提交

ipv6: avoid overflow of offset in ip6_find_1stfragopt · 6399f1fa

由 Sabrina Dubroca 提交于 7月 19, 2017

In some cases, offset can overflow and can cause an infinite loop in
ip6_find_1stfragopt(). Make it unsigned int to prevent the overflow, and
cap it at IPV6_MAXPLEN, since packets larger than that should be invalid.

This problem has been here since before the beginning of git history.
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6399f1fa

19 7月, 2017 1 次提交

ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check() · 18bcf290

由 Alexander Potapenko 提交于 7月 17, 2017

KMSAN reported use of uninitialized memory in skb_set_hash_from_sk(),
which originated from the TCP request socket created in
cookie_v6_check():

 ==================================================================
 BUG: KMSAN: use of uninitialized memory in tcp_transmit_skb+0xf77/0x3ec0
 CPU: 1 PID: 2949 Comm: syz-execprog Not tainted 4.11.0-rc5+ #2931
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 TCP: request_sock_TCPv6: Possible SYN flooding on port 20028. Sending cookies.  Check SNMP counters.
 Call Trace:
  <IRQ>
  __dump_stack lib/dump_stack.c:16
  dump_stack+0x172/0x1c0 lib/dump_stack.c:52
  kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
  __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
  skb_set_hash_from_sk ./include/net/sock.h:2011
  tcp_transmit_skb+0xf77/0x3ec0 net/ipv4/tcp_output.c:983
  tcp_send_ack+0x75b/0x830 net/ipv4/tcp_output.c:3493
  tcp_delack_timer_handler+0x9a6/0xb90 net/ipv4/tcp_timer.c:284
  tcp_delack_timer+0x1b0/0x310 net/ipv4/tcp_timer.c:309
  call_timer_fn+0x240/0x520 kernel/time/timer.c:1268
  expire_timers kernel/time/timer.c:1307
  __run_timers+0xc13/0xf10 kernel/time/timer.c:1601
  run_timer_softirq+0x36/0xa0 kernel/time/timer.c:1614
  __do_softirq+0x485/0x942 kernel/softirq.c:284
  invoke_softirq kernel/softirq.c:364
  irq_exit+0x1fa/0x230 kernel/softirq.c:405
  exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:657
  smp_apic_timer_interrupt+0x5a/0x80 arch/x86/kernel/apic/apic.c:966
  apic_timer_interrupt+0x86/0x90 arch/x86/entry/entry_64.S:489
 RIP: 0010:native_restore_fl ./arch/x86/include/asm/irqflags.h:36
 RIP: 0010:arch_local_irq_restore ./arch/x86/include/asm/irqflags.h:77
 RIP: 0010:__msan_poison_alloca+0xed/0x120 mm/kmsan/kmsan_instr.c:440
 RSP: 0018:ffff880024917cd8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
 RAX: 0000000000000246 RBX: ffff8800224c0000 RCX: 0000000000000005
 RDX: 0000000000000004 RSI: ffff880000000000 RDI: ffffea0000b6d770
 RBP: ffff880024917d58 R08: 0000000000000dd8 R09: 0000000000000004
 R10: 0000160000000000 R11: 0000000000000000 R12: ffffffff85abf810
 R13: ffff880024917dd8 R14: 0000000000000010 R15: ffffffff81cabde4
  </IRQ>
  poll_select_copy_remaining+0xac/0x6b0 fs/select.c:293
  SYSC_select+0x4b4/0x4e0 fs/select.c:653
  SyS_select+0x76/0xa0 fs/select.c:634
  entry_SYSCALL_64_fastpath+0x13/0x94 arch/x86/entry/entry_64.S:204
 RIP: 0033:0x4597e7
 RSP: 002b:000000c420037ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000017
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004597e7
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: 000000c420037ef0 R08: 000000c420037ee0 R09: 0000000000000059
 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000042dc20
 R13: 00000000000000f3 R14: 0000000000000030 R15: 0000000000000003
 chained origin:
  save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
  kmsan_save_stack mm/kmsan/kmsan.c:317
  kmsan_internal_chain_origin+0x12a/0x1f0 mm/kmsan/kmsan.c:547
  __msan_store_shadow_origin_4+0xac/0x110 mm/kmsan/kmsan_instr.c:259
  tcp_create_openreq_child+0x709/0x1ae0 net/ipv4/tcp_minisocks.c:472
  tcp_v6_syn_recv_sock+0x7eb/0x2a30 net/ipv6/tcp_ipv6.c:1103
  tcp_get_cookie_sock+0x136/0x5f0 net/ipv4/syncookies.c:212
  cookie_v6_check+0x17a9/0x1b50 net/ipv6/syncookies.c:245
  tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
  tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
  tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
  ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
  NF_HOOK ./include/linux/netfilter.h:257
  ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
  dst_input ./include/net/dst.h:492
  ip6_rcv_finish net/ipv6/ip6_input.c:69
  NF_HOOK ./include/linux/netfilter.h:257
  ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
  __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
  __netif_receive_skb net/core/dev.c:4246
  process_backlog+0x667/0xba0 net/core/dev.c:4866
  napi_poll net/core/dev.c:5268
  net_rx_action+0xc95/0x1590 net/core/dev.c:5333
  __do_softirq+0x485/0x942 kernel/softirq.c:284
 origin:
  save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
  kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
  kmsan_kmalloc+0x7f/0xe0 mm/kmsan/kmsan.c:337
  kmem_cache_alloc+0x1c2/0x1e0 mm/slub.c:2766
  reqsk_alloc ./include/net/request_sock.h:87
  inet_reqsk_alloc+0xa4/0x5b0 net/ipv4/tcp_input.c:6200
  cookie_v6_check+0x4f4/0x1b50 net/ipv6/syncookies.c:169
  tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:989
  tcp_v6_do_rcv+0xdd8/0x1c60 net/ipv6/tcp_ipv6.c:1298
  tcp_v6_rcv+0x41a3/0x4f00 net/ipv6/tcp_ipv6.c:1487
  ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
  NF_HOOK ./include/linux/netfilter.h:257
  ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
  dst_input ./include/net/dst.h:492
  ip6_rcv_finish net/ipv6/ip6_input.c:69
  NF_HOOK ./include/linux/netfilter.h:257
  ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
  __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
  __netif_receive_skb net/core/dev.c:4246
  process_backlog+0x667/0xba0 net/core/dev.c:4866
  napi_poll net/core/dev.c:5268
  net_rx_action+0xc95/0x1590 net/core/dev.c:5333
  __do_softirq+0x485/0x942 kernel/softirq.c:284
 ==================================================================

Similar error is reported for cookie_v4_check().

Fixes: 58d607d3 ("tcp: provide skb->hash to synack packets")
Signed-off-by: NAlexander Potapenko <glider@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

18bcf290

06 7月, 2017 1 次提交

net: ipv6: Compare lwstate in detecting duplicate nexthops · f06b7549

由 David Ahern 提交于 7月 05, 2017

Lennert reported a failure to add different mpls encaps in a multipath
route:

  $ ip -6 route add 1234::/16 \
        nexthop encap mpls 10 via fe80::1 dev ens3 \
        nexthop encap mpls 20 via fe80::1 dev ens3
  RTNETLINK answers: File exists

The problem is that the duplicate nexthop detection does not compare
lwtunnel configuration. Add it.

Fixes: 19e42e45 ("ipv6: support for fib route lwtunnel encap attributes")
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Reported-by: NJoão Taveira Araújo <joao.taveira@gmail.com>
Reported-by: NLennert Buytenhek <buytenh@wantstofly.org>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Tested-by: NLennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f06b7549

05 7月, 2017 1 次提交

net, calipso: convert calipso_doi.refcount from atomic_t to refcount_t · edcd9270

由 Reshetova, Elena 提交于 7月 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

edcd9270

04 7月, 2017 7 次提交

net, ipv6: convert ip6addrlbl_entry.refcnt from atomic_t to refcount_t · 87078f26

由 Reshetova, Elena 提交于 7月 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87078f26

net, ipv6: convert xfrm6_tunnel_spi.refcnt from atomic_t to refcount_t · d12f3827

由 Reshetova, Elena 提交于 7月 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d12f3827

net, ipv6: convert ifacaddr6.aca_refcnt from atomic_t to refcount_t · affa78bc

由 Reshetova, Elena 提交于 7月 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

affa78bc

net, ipv6: convert ifmcaddr6.mca_refcnt from atomic_t to refcount_t · d3981bc6

由 Reshetova, Elena 提交于 7月 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3981bc6

net, ipv6: convert inet6_ifaddr.refcnt from atomic_t to refcount_t · 271201c0

由 Reshetova, Elena 提交于 7月 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

271201c0

net, ipv6: convert inet6_dev.refcnt from atomic_t to refcount_t · 1be92460

由 Reshetova, Elena 提交于 7月 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1be92460

net, ipv6: convert ipv6_txoptions.refcnt from atomic_t to refcount_t · 0aeea21a

由 Reshetova, Elena 提交于 7月 04, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0aeea21a

03 7月, 2017 1 次提交

ipv6: dad: don't remove dynamic addresses if link is down · ec8add2a

由 Sabrina Dubroca 提交于 6月 29, 2017

Currently, when the link for $DEV is down, this command succeeds but the
address is removed immediately by DAD (1):

    ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800

In the same situation, this will succeed and not remove the address (2):

    ip addr add 1111::12/64 dev $DEV
    ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800

The comment in addrconf_dad_begin() when !IF_READY makes it look like
this is the intended behavior, but doesn't explain why:

     * If the device is not ready:
     * - keep it tentative if it is a permanent address.
     * - otherwise, kill it.

We clearly cannot prevent userspace from doing (2), but we can make (1)
work consistently with (2).

addrconf_dad_stop() is only called in two cases: if DAD failed, or to
skip DAD when the link is down. In that second case, the fix is to avoid
deleting the address, like we already do for permanent addresses.

Fixes: 3c21edbd ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec8add2a

01 7月, 2017 4 次提交

net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_t · b4217b82

由 Reshetova, Elena 提交于 6月 30, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4217b82

net: convert sock.sk_refcnt from atomic_t to refcount_t · 41c6d650

由 Reshetova, Elena 提交于 6月 30, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

This patch uses refcount_inc_not_zero() instead of
atomic_inc_not_zero_hint() due to absense of a _hint()
version of refcount API. If the hint() version must
be used, we might need to revisit API.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

41c6d650

net: convert sock.sk_wmem_alloc from atomic_t to refcount_t · 14afee4b

由 Reshetova, Elena 提交于 6月 30, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

14afee4b

net: convert sk_buff.users from atomic_t to refcount_t · 63354797

由 Reshetova, Elena 提交于 6月 30, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63354797

30 6月, 2017 1 次提交
- A
  do_ipv6_setsockopt(): don't open-code memdup_user() · 43727da9
  由 Al Viro 提交于 5月 13, 2017
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  43727da9
28 6月, 2017 1 次提交

ipv6: udp: leverage scratch area helpers · 67a51780

由 Paolo Abeni 提交于 6月 26, 2017

The commit b65ac446 ("udp: try to avoid 2 cache miss on dequeue")
leveraged the scratched area helpers for UDP v4 but I forgot to
update accordingly the IPv6 code path.

This change extends the scratch area usage to the IPv6 code, synching
the two implementations and giving some performance benefit.
IPv6 is again almost on the same level of IPv4, performance-wide.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67a51780

27 6月, 2017 3 次提交

net: add netlink_ext_ack argument to rtnl_link_ops.validate · a8b8a889

由 Matthias Schiffer 提交于 6月 25, 2017

Add support for extended error reporting.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8b8a889

net: add netlink_ext_ack argument to rtnl_link_ops.changelink · ad744b22

由 Matthias Schiffer 提交于 6月 25, 2017

Add support for extended error reporting.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad744b22

net: add netlink_ext_ack argument to rtnl_link_ops.newlink · 7a3f4a18

由 Matthias Schiffer 提交于 6月 25, 2017

Add support for extended error reporting.
Signed-off-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a3f4a18

25 6月, 2017 1 次提交

net: ipv6: reset daddr and dport in sk if connect() fails · 85cb73ff

由 Wei Wang 提交于 6月 23, 2017

In __ip6_datagram_connect(), reset sk->sk_v6_daddr and inet->dport if
error occurs.
In udp_v6_early_demux(), check for sk_state to make sure it is in
TCP_ESTABLISHED state.
Together, it makes sure unconnected UDP socket won't be considered as a
valid candidate for early demux.

v3: add TCP_ESTABLISHED state check in udp_v6_early_demux()
v2: fix compilation error

Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast")
Signed-off-by: NWei Wang <weiwan@google.com>
Acked-by: NMaciej Żenczykowski <maze@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

85cb73ff

24 6月, 2017 2 次提交

sit: use __GFP_NOWARN for user controlled allocation · 0ccc22f4

由 WANG Cong 提交于 6月 22, 2017

The memory allocation size is controlled by user-space,
if it is too large just fail silently and return NULL,
not to mention there is a fallback allocation later.
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Tested-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ccc22f4

net: account for current skb length when deciding about UFO · a5cb659b

由 Michal Kubeček 提交于 6月 19, 2017

Our customer encountered stuck NFS writes for blocks starting at specific
offsets w.r.t. page boundary caused by networking stack sending packets via
UFO enabled device with wrong checksum. The problem can be reproduced by
composing a long UDP datagram from multiple parts using MSG_MORE flag:

sendto(sd, buff, 1000, MSG_MORE, ...);
sendto(sd, buff, 1000, MSG_MORE, ...);
sendto(sd, buff, 3000, 0, ...);

Assume this packet is to be routed via a device with MTU 1500 and
NETIF_F_UFO enabled. When second sendto() gets into __ip_append_data(),
this condition is tested (among others) to decide whether to call
ip_ufo_append_data():

((length + fragheaderlen) > mtu) || (skb && skb_is_gso(skb))

At the moment, we already have skb with 1028 bytes of data which is not
marked for GSO so that the test is false (fragheaderlen is usually 20).
Thus we append second 1000 bytes to this skb without invoking UFO. Third
sendto(), however, has sufficient length to trigger the UFO path so that we
end up with non-UFO skb followed by a UFO one. Later on, udp_send_skb()
uses udp_csum() to calculate the checksum but that assumes all fragments
have correct checksum in skb->csum which is not true for UFO fragments.

When checking against MTU, we need to add skb->len to length of new segment
if we already have a partially filled skb and fragheaderlen only if there
isn't one.

In the IPv6 case, skb can only be null if this is the first segment so that
we have to use headersize (length of the first IPv6 header) rather than
fragheaderlen (length of IPv6 header of further fragments) for skb == NULL.

Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach")
Fixes: e4c5e13a ("ipv6: Should use consistent conditional judgement for
ip6 fragment between __ip6_append_data and ip6_finish_output")
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Acked-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5cb659b

23 6月, 2017 2 次提交

udp/v6: prefetch rmem_alloc in udp6_queue_rcv_skb() · 4b943fae

由 Paolo Abeni 提交于 6月 22, 2017

very similar to commit dd99e425 ("udp: prefetch
rmem_alloc in udp_queue_rcv_skb()"), this allows saving a cache
miss when the BH is bottle-neck for UDP over ipv6 packet
processing, e.g. for small packets when a single RX NIC ingress
queue is in use.

Performances under flood when multiple NIC RX queues used are
unaffected, but when a single NIC rx queue is in use, this
gives ~8% performance improvement.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b943fae

ipv6: avoid unregistering inet6_dev for loopback · 60abc0be

由 WANG Cong 提交于 6月 21, 2017

The per netns loopback_dev->ip6_ptr is unregistered and set to
NULL when its mtu is set to smaller than IPV6_MIN_MTU, this
leads to that we could set rt->rt6i_idev NULL after a
rt6_uncached_list_flush_dev() and then crash after another
call.

In this case we should just bring its inet6_dev down, rather
than unregistering it, at least prior to commit 176c39af
("netns: fix addrconf_ifdown kernel panic") we always
override the case for loopback.

Thanks a lot to Andrey for finding a reliable reproducer.

Fixes: 176c39af ("netns: fix addrconf_ifdown kernel panic")
Reported-by: NAndrey Konovalov <andreyknvl@google.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Daniel Lezcano <dlezcano@fr.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Tested-by: NAndrey Konovalov <andreyknvl@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60abc0be

22 6月, 2017 4 次提交

tcp: Add a tcp_filter hook before handle ack packet · 8fac365f

由 Chenbo Feng 提交于 6月 20, 2017

Currently in both ipv4 and ipv6 code path, the ack packet received when
sk at TCP_NEW_SYN_RECV state is not filtered by socket filter or cgroup
filter since it is handled from tcp_child_process and never reaches the
tcp_filter inside tcp_v4_rcv or tcp_v6_rcv. Adding a tcp_filter hooks
here can make sure all the ingress tcp packet can be correctly filtered.
Signed-off-by: NChenbo Feng <fengc@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fac365f

ipv6: only call ip6_route_dev_notify() once for NETDEV_UNREGISTER · 76da0704

由 WANG Cong 提交于 6月 20, 2017

In commit 242d3a49 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
I assumed NETDEV_REGISTER and NETDEV_UNREGISTER are paired,
unfortunately, as reported by jeffy, netdev_wait_allrefs()
could rebroadcast NETDEV_UNREGISTER event until all refs are
gone.

We have to add an additional check to avoid this corner case.
For netdev_wait_allrefs() dev->reg_state is NETREG_UNREGISTERED,
for dev_change_net_namespace(), dev->reg_state is
NETREG_REGISTERED. So check for dev->reg_state != NETREG_UNREGISTERED.

Fixes: 242d3a49 ("ipv6: reorder ip6_route_dev_notifier after ipv6_dev_notf")
Reported-by: Njeffy <jeffy.chen@rock-chips.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

76da0704

esp6_offload: Fix IP6CB(skb)->nhoff for ESP GRO · ca3a1b85

由 Yossi Kuperman 提交于 6月 22, 2017

IP6CB(skb)->nhoff is the offset of the nexthdr field in an IPv6
header, unless there are extension headers present, in which case
nhoff points to the nexthdr field of the last extension header.

In non-GRO code path, nhoff is set by ipv6_rcv before any XFRM code
is executed. Conversely, in GRO code path (when esp6_offload is loaded),
nhoff is not set. The following functions fail to read the correct value
and eventually the packet is dropped:

    xfrm6_transport_finish
    xfrm6_tunnel_input
    xfrm6_rcv_tnl

Set nhoff to the proper offset of nexthdr in esp6_gro_receive.

Fixes: 7785bba2 ("esp: Add a software GRO codepath")
Signed-off-by: NYossi Kuperman <yossiku@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

ca3a1b85

xfrm6: Fix IPv6 payload_len in xfrm6_transport_finish · 7c88e21a

由 Yossi Kuperman 提交于 6月 22, 2017

IPv6 payload length indicates the size of the payload, including any
extension headers.

In xfrm6_transport_finish, ipv6_hdr(skb)->payload_len is set to the
payload size only, regardless of the presence of any extension headers.
After ESP GRO transport mode decapsulation, ipv6_rcv trims the packet
according to the wrong payload_len, thus corrupting the packet.

Set payload_len to account for extension headers as well.

Fixes: 7785bba2 ("esp: Add a software GRO codepath")
Signed-off-by: NYossi Kuperman <yossiku@mellanox.com>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

7c88e21a

21 6月, 2017 2 次提交

ip6mr: add netlink notifications on mrt6msg cache reports · dd12d15c

由 Julien Gomes 提交于 6月 20, 2017

Add Netlink notifications on cache reports in ip6mr, in addition to the
existing mrt6msg sent to mroute6_sk.
Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV6_MROUTE_R.

MSGTYPE, MIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
same data as their equivalent fields in the mrt6msg header.
PKT attribute is the packet sent to mroute6_sk, without the added
mrt6msg header.
Suggested-by: NRyan Halbrook <halbrook@arista.com>
Signed-off-by: NJulien Gomes <julien@arista.com>
Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dd12d15c

ipv6: Do not leak throw route references · 07f61557

由 Serhey Popovych 提交于 6月 20, 2017

While commit 73ba57bf ("ipv6: fix backtracking for throw routes")
does good job on error propagation to the fib_rules_lookup()
in fib rules core framework that also corrects throw routes
handling, it does not solve route reference leakage problem
happened when we return -EAGAIN to the fib_rules_lookup()
and leave routing table entry referenced in arg->result.

If rule with matched throw route isn't last matched in the
list we overwrite arg->result losing reference on throw
route stored previously forever.

We also partially revert commit ab997ad4 ("ipv6: fix the
incorrect return value of throw route") since we never return
routing table entry with dst.error == -EAGAIN when
CONFIG_IPV6_MULTIPLE_TABLES is on. Also there is no point
to check for RTF_REJECT flag since it is always set throw
route.

Fixes: 73ba57bf ("ipv6: fix backtracking for throw routes")
Signed-off-by: NSerhey Popovych <serhe.popovych@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

07f61557

20 6月, 2017 2 次提交

tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix · 8917a777

由 Ivan Delalande 提交于 6月 15, 2017

Replace first padding in the tcp_md5sig structure with a new flag field
and address prefix length so it can be specified when configuring a new
key for TCP MD5 signature. The tcpm_flags field will only be used if the
socket option is TCP_MD5SIG_EXT to avoid breaking existing programs, and
tcpm_prefixlen only when the TCP_MD5SIG_FLAG_PREFIX flag is set.
Signed-off-by: NBob Gilligan <gilligan@arista.com>
Signed-off-by: NEric Mowat <mowat@arista.com>
Signed-off-by: NIvan Delalande <colona@arista.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8917a777

tcp: md5: add an address prefix for key lookup · 6797318e

由 Ivan Delalande 提交于 6月 15, 2017

This allows the keys used for TCP MD5 signature to be used for whole
range of addresses, specified with a prefix length, instead of only one
address as it currently is.
Signed-off-by: NBob Gilligan <gilligan@arista.com>
Signed-off-by: NEric Mowat <mowat@arista.com>
Signed-off-by: NIvan Delalande <colona@arista.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6797318e

19 6月, 2017 1 次提交

ip6_tunnel: Correct tos value in collect_md mode · 46f8cd9d

由 Haishuang Yan 提交于 6月 17, 2017

Same as ip_gre, geneve and vxlan, use key->tos as traffic class value.

CC: Peter Dawson <petedaws@gmail.com>
Fixes: 0e9a7095 ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets”)
Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: NPeter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

46f8cd9d

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功