Commits · d445516966dcb2924741b13b27738b54df2af01a · openanolis / cloud-kernel

18 Jul, 2017 · 1 commit

net: xdp: support xdp generic on virtual devices · d4455169

Committed by John Fastabend on Jul 17, 2017

XDP generic allows users to test XDP programs and/or run them with
degraded performance on devices that do not yet support XDP. For
testing, I typically run eBPF programs on a set of veth devices.
This allows testing topologies that would otherwise be difficult to
set up, especially in the early stages of development.

This patch adds an XDP generic hook to the netif_rx_internal()
function, which is called from dev_forward_skb(). With this addition,
attaching XDP programs to veth devices works as expected! I also
noticed that multiple drivers use netif_rx(). These devices will
benefit too, and generic XDP will work for them as well.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
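
To make the hook placement concrete, here is a minimal user-space model, not kernel code. A hypothetical `struct model_dev` carries an optional program callback, and a hypothetical `model_netif_rx()` runs it before queueing the packet, mirroring the idea that netif_rx_internal() now consults the generic XDP program first. Only the XDP_* verdict values come from the kernel's uapi enum; the struct names, counters and MODEL_RX_* codes are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The first verdict values of the kernel's uapi enum xdp_action. */
enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX };

/* Hypothetical stand-ins for the NET_RX_* return codes. */
enum { MODEL_RX_SUCCESS = 0, MODEL_RX_DROP = 1 };

struct model_pkt {
    const unsigned char *data;
    size_t len;
};

/* Hypothetical device: an optional attached program plus counters. */
struct model_dev {
    enum xdp_action (*xdp_prog)(const struct model_pkt *pkt);
    unsigned int backlog;     /* packets handed on to the stack */
    unsigned int not_passed;  /* packets kept out of the stack by XDP */
};

/* Model of the patched receive path: consult the generic XDP program,
 * if one is attached, before the packet is queued to the backlog. */
static int model_netif_rx(struct model_dev *dev, const struct model_pkt *pkt)
{
    if (dev->xdp_prog && dev->xdp_prog(pkt) != XDP_PASS) {
        /* XDP_TX and redirect handling are omitted in this sketch. */
        dev->not_passed++;
        return MODEL_RX_DROP;
    }
    dev->backlog++;
    return MODEL_RX_SUCCESS;
}

/* Example program: drop frames shorter than an Ethernet header. */
static enum xdp_action drop_runts(const struct model_pkt *pkt)
{
    return pkt->len < 14 ? XDP_DROP : XDP_PASS;
}
```

A device without a program behaves exactly as before, which is why drivers that already call netif_rx() pick up generic XDP without further changes in this model.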

17 Jul, 2017 · 13 commits

inetpeer: remove AVL implementation in favor of RB tree · b145425f

Committed by Eric Dumazet on Jul 17, 2017

As discussed in Faro during Netfilter Workshop 2017, RB trees can be
used with RCU, using a seqlock.

Note that net/rxrpc/conn_service.c is already using this.

This patch converts inetpeer from an AVL tree to an RB tree, since this
allows removing the private AVL implementation in favor of shared RB code.

$ size net/ipv4/inetpeer.before net/ipv4/inetpeer.after
   text    data     bss     dec     hex filename
   3195      40     128    3363     d23 net/ipv4/inetpeer.before
   1562      24       0    1586     632 net/ipv4/inetpeer.after

The same technique can be used to speed up
net/netfilter/nft_set_rbtree.c (removing rwlock contention in the fast path).
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
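
The seqlock idea that makes RB trees usable with RCU lookups can be sketched in user space: a writer bumps a sequence counter to an odd value before modifying shared data and back to even afterwards, and a lockless reader retries whenever it saw an odd value or the counter changed during its read. This is a simplified, single-threaded illustration using C11 atomics; the `seq_*` and `pair_*` names are hypothetical and this is not the kernel's seqlock_t API, which also serializes writers with a spinlock and relies on RCU to keep tree nodes alive. The plain field accesses are only safe here because the example runs in one thread.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical sequence counter: even = stable, odd = write in progress. */
struct seq_counter {
    atomic_uint seq;
};

static void seq_write_begin(struct seq_counter *s) { atomic_fetch_add(&s->seq, 1u); }
static void seq_write_end(struct seq_counter *s)   { atomic_fetch_add(&s->seq, 1u); }

/* Start a read section, waiting out any writer that is mid-update. */
static unsigned seq_read_begin(struct seq_counter *s)
{
    unsigned v;
    while ((v = atomic_load(&s->seq)) & 1u)
        ;  /* a real reader would relax the CPU here */
    return v;
}

/* Non-zero if a writer ran since seq_read_begin(): the read must retry. */
static int seq_read_retry(struct seq_counter *s, unsigned start)
{
    return atomic_load(&s->seq) != start;
}

/* Data guarded by the counter; invariant: b == 2 * a. */
struct guarded_pair {
    struct seq_counter sc;
    int a, b;
};

static void pair_init(struct guarded_pair *p)
{
    atomic_init(&p->sc.seq, 0u);
    p->a = p->b = 0;
}

static void pair_update(struct guarded_pair *p, int a)
{
    seq_write_begin(&p->sc);  /* writers are assumed to be serialized */
    p->a = a;
    p->b = 2 * a;
    seq_write_end(&p->sc);
}

/* Lockless read: repeat until a consistent snapshot was observed. */
static void pair_read(struct guarded_pair *p, int *a, int *b)
{
    unsigned start;
    do {
        start = seq_read_begin(&p->sc);
        *a = p->a;
        *b = p->b;
    } while (seq_read_retry(&p->sc, start));
}
```

In a tree lookup the same retry loop wraps the descent, so a reader that raced with a rebalance simply walks the tree again.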

net/unix: drop obsolete fd-recursion limits · 27eac47b

Committed by David Herrmann on Jul 17, 2017

All unix sockets now account inflight FDs to the respective sender.
This was introduced in:

    commit 712f4aad
    Author: willy tarreau <w@1wt.eu>
    Date:   Sun Jan 10 07:54:56 2016 +0100

        unix: properly account for FDs passed over unix sockets

and further refined in:

    commit 415e3d3e
    Author: Hannes Frederic Sowa <hannes@stressinduktion.org>
    Date:   Wed Feb 3 02:11:03 2016 +0100

        unix: correctly track in-flight fds in sending process user_struct

Hence, regardless of the stacking depth of FDs, the total number of
inflight FDs is limited, and accounted. There is no known way for a
local user to exceed those limits or exploit the accounting.

Furthermore, the GC logic is independent of the recursion/stacking depth
as well. It solely depends on the total number of inflight FDs,
regardless of their layout.

Lastly, the current `recursion_level' suffers from a TOCTOU race, since
depths are only checked and inherited at queue time. If we consider
`A<-B' to mean `queue-B-on-A', the following sequence easily circumvents
the recursion limit:

    A<-B
       B<-C
          C<-D
             ...
               Y<-Z

resulting in:

    A<-B<-C<-...<-Z

With all of this in mind, let's drop the recursion limit. It no longer
has any security value. On the contrary, it randomly confuses message
brokers that try to forward file descriptors, since any sendmsg(2) call
can fail spuriously with ETOOMANYREFS if a client maliciously modifies
the FD while it is inflight.

Cc: Alban Crequy <alban.crequy@collabora.co.uk>
Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
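
A small user-space model shows why a level that is only checked and inherited at queue time can be bypassed. It assumes a limit of 4 and a single in-flight slot per socket; `model_sock`, `model_queue()` and `model_depth()` are hypothetical and only approximate the shape of the removed kernel check.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_RECURSION 4  /* assumed limit for the model */

/* Hypothetical socket: cached recursion level plus the one socket it
 * carries in flight (a single slot keeps the model small). */
struct model_sock {
    unsigned level;
    struct model_sock *inflight;
};

/* Queue `fd` on `dest` (i.e. "dest<-fd"). The level is checked and
 * inherited only now, at queue time. Returns 0, or -1 (ETOOMANYREFS). */
static int model_queue(struct model_sock *dest, struct model_sock *fd)
{
    unsigned new_level = fd->level + 1;

    if (new_level > MODEL_MAX_RECURSION)
        return -1;
    if (new_level > dest->level)
        dest->level = new_level;
    dest->inflight = fd;
    return 0;
}

/* True nesting depth, found by walking the in-flight chain. */
static unsigned model_depth(const struct model_sock *s)
{
    unsigned d = 0;

    for (s = s->inflight; s; s = s->inflight)
        d++;
    return d;
}
```

Queueing top-down (A<-B, then B<-C, and so on) never raises A's cached level above 1, so all 25 steps succeed even though the real chain is 25 deep; queueing the same chain bottom-up trips the limit after four steps.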

skbuff: optimize the pull_pages code in __pskb_pull_tail() · 3ccc6c6f

Committed by linzhang on Jul 17, 2017

In the pull_pages code block, if the size of the first frag is greater
than eat, we can end the loop early to avoid an extra copy.
Signed-off-by: Lin Zhang <xiaolou4617@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
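
A simplified user-space rendering of the loop shows where the early exit pays off. `struct model_frag` and `model_pull_pages()` are hypothetical stand-ins for the skb fragment array and the pull_pages block; page refcounting is omitted, and the copy counter exists only to make the saved work visible.

```c
#include <assert.h>

/* Simplified fragment descriptor standing in for skb_frag_t. */
struct model_frag {
    unsigned int offset;
    unsigned int size;
};

/* Consume `eat` bytes from the front of frags[0..*nr), compacting the
 * array in place. Returns how many descriptor copies were made. */
static unsigned int model_pull_pages(struct model_frag *frags, int *nr,
                                     unsigned int eat)
{
    unsigned int copies = 0;
    int i, k = 0;

    for (i = 0; i < *nr; i++) {
        if (frags[i].size <= eat) {
            eat -= frags[i].size;  /* fully consumed: drop (kernel also unrefs the page) */
            continue;
        }
        frags[k] = frags[i];       /* kept: copied down, as in the original loop */
        copies++;
        if (eat) {
            frags[k].offset += eat;
            frags[k].size -= eat;
            /* Early exit: if frag 0 absorbed all of `eat`, nothing was
             * dropped, so the remaining frags are already in place. */
            if (i == 0)
                return copies;     /* *nr is unchanged */
            eat = 0;
        }
        k++;
    }
    *nr = k;
    return copies;
}
```

Without the early exit, every remaining fragment would be copied onto itself after the first one was trimmed.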

sctp: remove the typedef sctp_hmac_algo_param_t · 1474774a

Committed by Xin Long on Jul 17, 2017

This patch removes the typedef sctp_hmac_algo_param_t and replaces it
with struct sctp_hmac_algo_param wherever the typedef was used.

It also uses sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
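
The pattern applied across this series can be illustrated with a minimal, self-contained sketch. The struct layouts below are simplified assumptions (host-endian fields, a fixed-size random_val array), and `fill_random_param()` is a hypothetical helper, not kernel code; the point is only the use of the struct tag instead of a `_t` typedef, and of `sizeof(*param)` instead of `sizeof(type)`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified parameter header; the kernel's version uses __be16 fields. */
struct sctp_paramhdr {
    uint16_t type;
    uint16_t length;
};

/* Before: typedef struct sctp_random_param { ... } sctp_random_param_t;
 * After:  only the struct tag is used. */
struct sctp_random_param {
    struct sctp_paramhdr param_hdr;
    uint8_t random_val[32];  /* fixed size only for this sketch */
};

/* Hypothetical helper: sizing by the variable keeps the expression
 * correct even if the variable's declared type changes later. */
static uint16_t fill_random_param(struct sctp_random_param *param,
                                  uint16_t type)
{
    memset(param, 0, sizeof(*param));
    param->param_hdr.type = type;
    param->param_hdr.length = (uint16_t)sizeof(*param);
    return param->param_hdr.length;
}
```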

sctp: remove the typedef sctp_chunks_param_t · a762a9d9

Committed by Xin Long on Jul 17, 2017

This patch removes the typedef sctp_chunks_param_t and replaces it
with struct sctp_chunks_param wherever the typedef was used.

It also uses sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: remove the typedef sctp_random_param_t · b02db702

Committed by Xin Long on Jul 17, 2017

This patch removes the typedef sctp_random_param_t and replaces it
with struct sctp_random_param wherever the typedef was used.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: remove the typedef sctp_supported_ext_param_t · 15328d9f

Committed by Xin Long on Jul 17, 2017

This patch removes the typedef sctp_supported_ext_param_t and replaces
it with struct sctp_supported_ext_param wherever the typedef was used.

It also uses sizeof(variable) instead of sizeof(type).
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: remove the typedef sctp_adaptation_ind_param_t · 85f6bd24

Committed by Xin Long on Jul 17, 2017

This patch removes the typedef sctp_adaptation_ind_param_t and replaces
it with struct sctp_adaptation_ind_param wherever the typedef was used.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: remove the typedef sctp_supported_addrs_param_t · e925d506

Committed by Xin Long on Jul 17, 2017

This patch removes the typedef sctp_supported_addrs_param_t and replaces
it with struct sctp_supported_addrs_param wherever the typedef was used.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: remove the typedef sctp_cookie_preserve_param_t · 365ddb65

Committed by Xin Long on Jul 17, 2017

This patch removes the typedef sctp_cookie_preserve_param_t and replaces
it with struct sctp_cookie_preserve_param wherever the typedef was used.

It also fixes some indentation in sctp_sf_do_5_2_6_stale().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: remove the typedef sctp_ipv6addr_param_t · 00987cc0

Committed by Xin Long on Jul 17, 2017

This patch removes the typedef sctp_ipv6addr_param_t and replaces it
with struct sctp_ipv6addr_param wherever the typedef was used.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: remove the typedef sctp_ipv4addr_param_t · a38905e6

Committed by Xin Long on Jul 17, 2017

This patch removes the typedef sctp_ipv4addr_param_t and replaces it
with struct sctp_ipv4addr_param wherever the typedef was used.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: cancel send/recv work before queuing connection shutdown · aed20a53

Committed by Sowmini Varadhan on Jul 16, 2017

We could end up executing rds_conn_shutdown before the rds_recv_worker
thread; rds_conn_shutdown -> rds_tcp_conn_shutdown can then do a
sock_release and set sock->sk to NULL, which may interleave badly with
rds_recv_worker. For example, it could result in:

"BUG: unable to handle kernel NULL pointer dereference at 0000000000000078"
    [ffff881769f6fd70] release_sock at ffffffff815f337b
    [ffff881769f6fd90] rds_tcp_recv at ffffffffa043c888 [rds_tcp]
    [ffff881769f6fdb0] rds_recv_worker at ffffffffa04a4810 [rds]
    [ffff881769f6fde0] process_one_work at ffffffff810a14c1
    [ffff881769f6fe40] worker_thread at ffffffff810a1940
    [ffff881769f6fec0] kthread at ffffffff810a6b1e

Also, do not enqueue any new shutdown workq items when the connection is
shutting down (this may happen for rds-tcp in softirq mode, if a FIN
or CLOSE is received while the module is in the middle of an unload).
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
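
The two rules in this message (stop the I/O work before the socket goes away, and do not queue new shutdowns during teardown) can be modelled in a few lines. This is a hypothetical user-space sketch, not the RDS code: `struct model_conn` and its functions are invented names, and clearing a flag stands in for synchronously cancelling the send/recv work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection state for the model. */
struct model_conn {
    atomic_bool unloading;   /* module unload / destroy in progress */
    bool recv_work_pending;  /* rds_recv_worker-like work is queued */
    void *sock;              /* stands in for sock->sk; NULL once released */
    unsigned int shutdown_queued;
};

/* Drop path, e.g. on FIN/CLOSE in softirq: no new shutdown work is
 * queued once teardown has started. Returns true if work was queued. */
static bool model_conn_drop(struct model_conn *c)
{
    if (atomic_load(&c->unloading))
        return false;
    c->shutdown_queued++;
    return true;
}

/* Shutdown worker: cancel pending send/recv work before releasing the
 * socket, so no worker can run against a NULL sk afterwards. */
static void model_conn_shutdown(struct model_conn *c)
{
    c->recv_work_pending = false;  /* stand-in for a synchronous cancel */
    c->sock = NULL;
}

/* The crash in the message corresponds to this returning false. */
static bool model_recv_worker_safe(const struct model_conn *c)
{
    return !c->recv_work_pending || c->sock != NULL;
}
```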

13 Jul, 2017 · 1 commit

datagram: fix kernel-doc comments · d3f6cd9e

Committed by stephen hemminger on Jul 12, 2017

An underscore in a kernel-doc comment has special meaning, and misusing
it generates errors:

./net/core/datagram.c:207: ERROR: Unknown target name: "msg".
./net/core/datagram.c:379: ERROR: Unknown target name: "msg".
./net/core/datagram.c:816: ERROR: Unknown target name: "t".
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Jul, 2017 · 2 commits

net: ipmr: ipmr_get_table() returns NULL · 2e3d232e

Committed by Dan Carpenter on Jul 12, 2017

The ipmr_get_table() function doesn't return error pointers; it returns
NULL on error.

Fixes: 4f75ba69 ("net: ipmr: Add ipmr_rtm_getroute")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
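
The distinction matters because an IS_ERR()-style check does not catch NULL. The sketch below re-implements the error-pointer convention locally (`model_is_err()` and `model_err_ptr()`, using the kernel's MAX_ERRNO value of 4095) and uses a hypothetical `model_get_table()` that, like ipmr_get_table(), signals failure with NULL. It illustrates the pitfall only; it is not the patched ipmr code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the error-pointer convention: the top MODEL_MAX_ERRNO
 * addresses encode negative errno values. */
#define MODEL_MAX_ERRNO 4095

static bool model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

static void *model_err_ptr(intptr_t error)
{
    return (void *)error;
}

struct model_table { unsigned int id; };

static struct model_table only_table = { .id = 1 };

/* Hypothetical lookup that, like ipmr_get_table(), returns NULL on failure. */
static struct model_table *model_get_table(unsigned int id)
{
    return id == only_table.id ? &only_table : NULL;
}

/* Caller: testing model_is_err() here would let NULL slip through. */
static int model_getroute(unsigned int id)
{
    struct model_table *mrt = model_get_table(id);

    if (!mrt)
        return -2;  /* an errno-style failure (2 == ENOENT on Linux) */
    return 0;
}
```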

bridge: mdb: fix leak on complete_info ptr on fail path · 1bfb1596

Committed by Eduardo Valentin on Jul 11, 2017

We currently get the following kmemleak report:
unreferenced object 0xffff8800039d9820 (size 32):
  comm "softirq", pid 0, jiffies 4295212383 (age 792.416s)
  hex dump (first 32 bytes):
    00 0c e0 03 00 88 ff ff ff 02 00 00 00 00 00 00  ................
    00 00 00 01 ff 11 00 02 86 dd 00 00 ff ff ff ff  ................
  backtrace:
    [<ffffffff8152b4aa>] kmemleak_alloc+0x4a/0xa0
    [<ffffffff811d8ec8>] kmem_cache_alloc_trace+0xb8/0x1c0
    [<ffffffffa0389683>] __br_mdb_notify+0x2a3/0x300 [bridge]
    [<ffffffffa038a0ce>] br_mdb_notify+0x6e/0x70 [bridge]
    [<ffffffffa0386479>] br_multicast_add_group+0x109/0x150 [bridge]
    [<ffffffffa0386518>] br_ip6_multicast_add_group+0x58/0x60 [bridge]
    [<ffffffffa0387fb5>] br_multicast_rcv+0x1d5/0xdb0 [bridge]
    [<ffffffffa037d7cf>] br_handle_frame_finish+0xcf/0x510 [bridge]
    [<ffffffffa03a236b>] br_nf_hook_thresh.part.27+0xb/0x10 [br_netfilter]
    [<ffffffffa03a3738>] br_nf_hook_thresh+0x48/0xb0 [br_netfilter]
    [<ffffffffa03a3fb9>] br_nf_pre_routing_finish_ipv6+0x109/0x1d0 [br_netfilter]
    [<ffffffffa03a4400>] br_nf_pre_routing_ipv6+0xd0/0x14c [br_netfilter]
    [<ffffffffa03a3c27>] br_nf_pre_routing+0x197/0x3d0 [br_netfilter]
    [<ffffffff814a2952>] nf_iterate+0x52/0x60
    [<ffffffff814a29bc>] nf_hook_slow+0x5c/0xb0
    [<ffffffffa037ddf4>] br_handle_frame+0x1a4/0x2c0 [bridge]

This happens when switchdev_port_obj_add() fails. This patch
frees the complete_info object on the failure path.
Reviewed-by: Vallish Vaidyeshwara <vallish@amazon.com>
Signed-off-by: Eduardo Valentin <eduval@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
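
The ownership rule behind this fix can be shown with a small allocation-counting model. All names here are hypothetical: `model_port_obj_add()` stands in for switchdev_port_obj_add() under the assumption that the completion path frees the info on success but never runs on failure, leaving the caller responsible for it.

```c
#include <assert.h>
#include <stdlib.h>

static int live_allocs;  /* outstanding allocations, to expose leaks */

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        live_allocs++;
    return p;
}

static void tracked_free(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

struct model_complete_info { int port; };

/* Hypothetical stand-in for switchdev_port_obj_add(): on success the
 * completion path consumes `info` (modelled as an immediate free); on
 * failure the completion never runs and `info` stays with the caller. */
static int model_port_obj_add(int should_fail, struct model_complete_info *info)
{
    if (should_fail)
        return -1;  /* some negative errno */
    tracked_free(info);
    return 0;
}

static void model_mdb_notify(int should_fail)
{
    struct model_complete_info *info = tracked_alloc(sizeof(*info));

    if (!info)
        return;
    info->port = 1;
    if (model_port_obj_add(should_fail, info))
        tracked_free(info);  /* the fix: free on the failure path */
}
```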

08 Jul, 2017 · 3 commits

mpls: fix uninitialized in_label var warning in mpls_getroute · a906c1aa

Committed by Roopa Prabhu on Jul 07, 2017

Fix the following warning generated by a static checker:
    net/mpls/af_mpls.c:2111 mpls_getroute()
    error: uninitialized symbol 'in_label'.

Fixes: 397fc9e5 ("mpls: route get support")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: avoid NETDEV_CHANGEMTU event when unregistering slave · f51048c3

Committed by WANG Cong on Jul 06, 2017

As Hongjun/Nicolas summarized in their original patch:

"
When a device changes from one netns to another, it's first unregistered,
then the netns reference is updated and the dev is registered in the new
netns. Thus, when a slave moves to another netns, it is first
unregistered. This triggers a NETDEV_UNREGISTER event which is caught by
the bonding driver. The driver calls bond_release(), which calls
dev_set_mtu() and thus triggers NETDEV_CHANGEMTU (the device is still in
the old netns).
"

This is a very special case: because the device is being unregistered,
no one should still care about the NETDEV_CHANGEMTU event triggered
at this point. We can therefore avoid broadcasting this event on this
path, and avoid touching the inetdev_event()/addrconf_notify() path.

This requires exporting __dev_set_mtu() to the bonding driver.
Reported-by: Hongjun Li <hongjun.li@6wind.com>
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
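
The split between a notifying and a non-notifying MTU setter can be modelled as follows. The names are hypothetical analogues, not kernel functions: `model_set_mtu()` plays the role of dev_set_mtu() (change, then broadcast NETDEV_CHANGEMTU), and `model_set_mtu_quiet()` plays the role of __dev_set_mtu() (change only).

```c
#include <assert.h>

static int changemtu_events;  /* NETDEV_CHANGEMTU broadcasts seen */

struct model_netdev { int mtu; };

/* Analogue of __dev_set_mtu(): change the MTU without notifying. */
static void model_set_mtu_quiet(struct model_netdev *dev, int mtu)
{
    dev->mtu = mtu;
}

/* Analogue of dev_set_mtu(): change the MTU, then broadcast the event. */
static void model_set_mtu(struct model_netdev *dev, int mtu)
{
    model_set_mtu_quiet(dev, mtu);
    changemtu_events++;
}

/* Restore a slave's MTU on release; when the slave is being
 * unregistered nobody needs the event, so skip the broadcast. */
static void model_release_slave(struct model_netdev *slave, int orig_mtu,
                                int unregistering)
{
    if (unregistering)
        model_set_mtu_quiet(slave, orig_mtu);
    else
        model_set_mtu(slave, orig_mtu);
}
```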

rds: tcp: use sock_create_lite() to create the accept socket · 0933a578

Committed by Sowmini Varadhan on Jul 06, 2017

There are two problems with calling sock_create_kern() from
rds_tcp_accept_one():
1. It sets up a new_sock->sk that is wasteful, because this ->sk
   is going to be replaced by inet_accept() in the subsequent ->accept().
2. The new_sock->sk becomes a leaked reference in sock_graft(), which
   expects to find a NULL parent->sk.

Avoid these problems by calling sock_create_lite().
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
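
The leak in point 2 can be reproduced with a tiny counting model. Every name below is hypothetical: `model_create_kern()` and `model_create_lite()` mimic only the difference that matters here (whether ->sk is pre-allocated), and `model_graft()` mimics sock_graft() overwriting parent->sk on the assumption that it was NULL.

```c
#include <assert.h>
#include <stdlib.h>

static int live_sk;  /* outstanding sock objects in the model */

struct model_sk { int state; };
struct model_socket { struct model_sk *sk; };

static struct model_sk *model_sk_alloc(void)
{
    struct model_sk *sk = calloc(1, sizeof(*sk));
    if (sk)
        live_sk++;
    return sk;
}

static void model_sk_free(struct model_sk *sk)
{
    if (sk) {
        free(sk);
        live_sk--;
    }
}

/* Like sock_create_kern(): allocates an ->sk up front. */
static void model_create_kern(struct model_socket *s) { s->sk = model_sk_alloc(); }

/* Like sock_create_lite(): leaves ->sk NULL for ->accept() to fill in. */
static void model_create_lite(struct model_socket *s) { s->sk = NULL; }

/* Like sock_graft(): assumes parent->sk is NULL and overwrites it. */
static void model_graft(struct model_socket *parent, struct model_sk *accepted)
{
    parent->sk = accepted;
}
```

After accepting through both paths and freeing each socket's final ->sk, one object remains outstanding, and it is the one pre-allocated by the create_kern-style path.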

07 Jul, 2017 · 20 commits

libceph: osd_state is 32 bits wide in luminous · 0bb05da2

Committed by Ilya Dryomov on Jun 22, 2017

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

crush: remove an obsolete comment · 9eebe45c

Committed by Ilya Dryomov on Jun 22, 2017

Reflects ceph.git commit dca1ae1e0a6b02029c3a7f9dec4114972be26d50.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

crush: crush_init_workspace starts with struct crush_work · b88ed8d8

Committed by Ilya Dryomov on Jun 22, 2017

It is not just a pointer to crush_work; it is the whole structure.
That is not a problem today, since crush_work only contains a pointer,
but it will become one if new data members are added to crush_work.

Reflects ceph.git commit ee957dd431bfbeb6dadaf77764db8e0757417328.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
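
A sketch of the layout rule, assuming a workspace buffer that begins with the structure and is followed by a per-bucket pointer array. The `extra` member is hypothetical, added only to show what goes wrong once crush_work grows, and `model_init_workspace()` is an illustrative helper rather than the ceph function.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified crush_work: a pointer today, plus a hypothetical member
 * standing in for "new data members added later". */
struct crush_work {
    void **work;          /* per-bucket scratch pointers */
    unsigned int extra;   /* hypothetical future member */
};

/* Carve a caller-provided workspace: struct crush_work comes first,
 * followed by the per-bucket pointer array. Returns the end of the
 * array. Skipping only sizeof(struct crush_work *) would make the
 * array overlap `extra`. */
static char *model_init_workspace(void *v, unsigned int max_buckets)
{
    struct crush_work *w = v;
    char *p = v;

    p += sizeof(struct crush_work);   /* the whole structure */
    w->work = (void **)p;
    p += max_buckets * sizeof(void *);
    w->extra = 0;
    return p;
}
```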

libceph, crush: per-pool crush_choose_arg_map for crush_do_rule() · 5cf9c4a9

Committed by Ilya Dryomov on Jun 22, 2017

If there is no crush_choose_arg_map for a given pool, a NULL pointer is
passed to preserve existing crush_do_rule() behavior.

Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b,
dbe36e08be00c6519a8c89718dd47b0219c20516.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
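
The NULL-means-default convention can be sketched as a per-pool lookup whose result is passed straight through. The `choose_arg_map` shape here (a pool id plus a single weight override) is a deliberately reduced assumption, and `effective_weight()` is a hypothetical stand-in for the part of crush_do_rule() that consumes the map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-reduced choose_arg map: one weight override. */
struct choose_arg_map {
    int64_t pool;
    unsigned int weight_override;
};

static const struct choose_arg_map maps[] = { { 3, 0x20000 } };

/* Return the map for `pool`, or NULL if the pool has none. */
static const struct choose_arg_map *lookup_choose_args(int64_t pool)
{
    for (size_t i = 0; i < sizeof(maps) / sizeof(maps[0]); i++)
        if (maps[i].pool == pool)
            return &maps[i];
    return NULL;
}

/* Stand-in for crush_do_rule(): NULL means "use the bucket's own
 * weights", i.e. the behavior before per-pool maps existed. */
static unsigned int effective_weight(unsigned int item_weight,
                                     const struct choose_arg_map *arg)
{
    return arg ? arg->weight_override : item_weight;
}
```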

crush: implement weight and id overrides for straw2 · 069f3222

Committed by Ilya Dryomov on Jun 22, 2017

bucket_straw2_choose needs to use weights that may be different from
item_weights, for instance to compensate for an uneven distribution
caused by a low number of values, or to fix the probability bias
introduced by conditional probabilities (see
http://tracker.ceph.com/issues/15653 for more information).

We introduce a weight_set for each straw2 bucket to set the desired
weight for a given item at a given position. The weight of a given item
when picking the first replica (first position) may be different from
its weight when picking the second replica (second position). For
instance, the weight matrix for a given bucket containing items 3, 7
and 13 could be as follows:

          position 0   position 1

item 3     0x10000      0x100000
item 7     0x40000       0x10000
item 13    0x40000       0x10000

When crush_do_rule picks the first of two replicas (position 0), items 7
and 13 are each four times more likely to be chosen by
bucket_straw2_choose than item 3. When choosing the second replica
(position 1), item 3 is sixteen times (0x100000 / 0x10000) more likely
to be chosen than item 7 or item 13.

By default the weight_set of each bucket exactly matches the content of
item_weights for each position to ensure backward compatibility.

bucket_straw2_choose compares items by their ids. The same ids are also
used to index buckets, so they must be unique. For the items in a bucket,
an array of replacement ids can be provided for placement purposes; these
are used instead of the original ids. If no replacement ids are provided,
the legacy behavior is preserved.

Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
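
The weight_set and replacement-id mechanisms can be illustrated with a floating-point sketch of straw2 selection. Each item draws ln(u)/w from a hash of (x, id, position), and the largest draw wins, which makes an item's chance proportional to its weight at that position. This is not the kernel's fixed-point crush_ln()/rjenkins implementation: the splitmix64-based hash, the series-based logarithm (used to avoid a libm dependency), and the `straw2_sketch` layout are assumptions chosen only to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* splitmix64 finalizer, a stand-in for CRUSH's rjenkins hash. */
static uint64_t mix64(uint64_t z)
{
    z += 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

/* ln(u) for 0 < u < 1 without libm: scale u into [0.5, 1), then use
 * ln(m) = 2 * atanh((m - 1) / (m + 1)) and its power series. */
static double ln_unit(double u)
{
    int k = 0;
    while (u < 0.5) {
        u *= 2.0;
        k++;
    }
    double t = (u - 1.0) / (u + 1.0), t2 = t * t, term = t, sum = 0.0;
    for (int n = 1; n < 40; n += 2) {
        sum += term / n;
        term *= t2;
    }
    return 2.0 * sum - k * 0.69314718055994530942;
}

/* Hypothetical straw2 bucket with a two-position weight set and
 * optional replacement ids that are used only for hashing. */
struct straw2_sketch {
    unsigned int size;
    const int *items;                 /* ids that identify the items */
    const int *choose_ids;            /* NULL: hash with items[] (legacy) */
    const uint32_t (*weight_set)[2];  /* weight_set[i][position] */
};

/* Each item draws ln(u) / w; the largest draw wins, so the win
 * probability is proportional to w. Returns -1 if no item has weight. */
static int straw2_choose(const struct straw2_sketch *b, uint32_t x,
                         unsigned int position)
{
    int best = -1;
    double best_draw = 0.0;

    for (unsigned int i = 0; i < b->size; i++) {
        uint32_t w = b->weight_set[i][position];
        int id = b->choose_ids ? b->choose_ids[i] : b->items[i];
        if (w == 0)
            continue;  /* zero weight: never chosen */
        uint64_t h = mix64(mix64(((uint64_t)x << 32) | (uint32_t)id) + position);
        double u = ((double)(h >> 11) + 0.5) / 9007199254740992.0;  /* in (0, 1) */
        double draw = ln_unit(u) / w;
        if (best < 0 || draw > best_draw) {
            best = (int)i;
            best_draw = draw;
        }
    }
    return best < 0 ? -1 : b->items[best];
}
```

Fed the weight matrix from the message and run over many values of x, the sketch should pick items 7 and 13 roughly four times as often as item 3 at position 0, and item 3 roughly sixteen times as often as either of them at position 1. Supplying `choose_ids` only changes the hash inputs; the returned value is still the item's own id.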

libceph: apply_upmap() · 1c2e7b45