- 18 7月, 2017 1 次提交
-
-
由 John Fastabend 提交于
XDP generic allows users to test XDP programs and/or run them with degraded performance on devices that do not yet support XDP. For testing I typically test eBPF programs using a set of veth devices. This allows testing topologies that would otherwise be difficult to setup especially in the early stages of development. This patch adds a xdp generic hook to the netif_rx_internal() function which is called from dev_forward_skb(). With this addition attaching XDP programs to veth devices works as expected! Also I noticed multiple drivers using netif_rx(). These devices will also benefit and generic XDP will work for them as well. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Tested-by: NAndy Gospodarek <andy@greyhouse.net> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 7月, 2017 13 次提交
-
-
由 Eric Dumazet 提交于
As discussed in Faro during Netfilter Workshop 2017, RB trees can be used with RCU, using a seqlock. Note that net/rxrpc/conn_service.c is already using this. This patch converts inetpeer from AVL tree to RB tree, since it allows to remove private AVL implementation in favor of shared RB code. $ size net/ipv4/inetpeer.before net/ipv4/inetpeer.after text data bss dec hex filename 3195 40 128 3363 d23 net/ipv4/inetpeer.before 1562 24 0 1586 632 net/ipv4/inetpeer.after The same technique can be used to speed up net/netfilter/nft_set_rbtree.c (removing rwlock contention in fast path) Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Herrmann 提交于
All unix sockets now account inflight FDs to the respective sender. This was introduced in: commit 712f4aad Author: willy tarreau <w@1wt.eu> Date: Sun Jan 10 07:54:56 2016 +0100 unix: properly account for FDs passed over unix sockets and further refined in: commit 415e3d3e Author: Hannes Frederic Sowa <hannes@stressinduktion.org> Date: Wed Feb 3 02:11:03 2016 +0100 unix: correctly track in-flight fds in sending process user_struct Hence, regardless of the stacking depth of FDs, the total number of inflight FDs is limited, and accounted. There is no known way for a local user to exceed those limits or exploit the accounting. Furthermore, the GC logic is independent of the recursion/stacking depth as well. It solely depends on the total number of inflight FDs, regardless of their layout. Lastly, the current `recursion_level' suffers a TOCTOU race, since it checks and inherits depths only at queue time. If we consider `A<-B' to mean `queue-B-on-A', the following sequence circumvents the recursion level easily: A<-B B<-C C<-D ... Y<-Z resulting in: A<-B<-C<-...<-Z With all of this in mind, lets drop the recursion limit. It has no additional security value, anymore. On the contrary, it randomly confuses message brokers that try to forward file-descriptors, since any sendmsg(2) call can fail spuriously with ETOOMANYREFS if a client maliciously modifies the FD while inflight. Cc: Alban Crequy <alban.crequy@collabora.co.uk> Cc: Simon McVittie <simon.mcvittie@collabora.co.uk> Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com> Reviewed-by: NTom Gundersen <teg@jklm.no> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 linzhang 提交于
In the pull_pages code block, if the first frag size > eat, we can end the loop in advance to avoid extra copy. Signed-off-by: NLin Zhang <xiaolou4617@gmail.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
This patch is to remove the typedef sctp_hmac_algo_param_t, and replace with struct sctp_hmac_algo_param in the places where it's using this typedef. It is also to use sizeof(variable) instead of sizeof(type). Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
This patch is to remove the typedef sctp_chunks_param_t, and replace with struct sctp_chunks_param in the places where it's using this typedef. It is also to use sizeof(variable) instead of sizeof(type). Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
This patch is to remove the typedef sctp_random_param_t, and replace with struct sctp_random_param in the places where it's using this typedef. Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
This patch is to remove the typedef sctp_supported_ext_param_t, and replace with struct sctp_supported_ext_param in the places where it's using this typedef. It is also to use sizeof(variable) instead of sizeof(type). Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
This patch is to remove the typedef sctp_adaptation_ind_param_t, and replace with struct sctp_adaptation_ind_param in the places where it's using this typedef. Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
This patch is to remove the typedef sctp_supported_addrs_param_t, and replace with struct sctp_supported_addrs_param in the places where it's using this typedef. Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
This patch is to remove the typedef sctp_cookie_preserve_param_t, and replace with struct sctp_cookie_preserve_param in the places where it's using this typedef. It is also to fix some indents in sctp_sf_do_5_2_6_stale(). Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
This patch is to remove the typedef sctp_ipv6addr_param_t, and replace with struct sctp_ipv6addr_param in the places where it's using this typedef. Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
This patch is to remove the typedef sctp_ipv4addr_param_t, and replace with struct sctp_ipv4addr_param in the places where it's using this typedef. Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sowmini Varadhan 提交于
We could end up executing rds_conn_shutdown before the rds_recv_worker thread, then rds_conn_shutdown -> rds_tcp_conn_shutdown can do a sock_release and set sock->sk to null, which may interleave in bad ways with rds_recv_worker, e.g., it could result in: "BUG: unable to handle kernel NULL pointer dereference at 0000000000000078" [ffff881769f6fd70] release_sock at ffffffff815f337b [ffff881769f6fd90] rds_tcp_recv at ffffffffa043c888 [rds_tcp] [ffff881769f6fdb0] rds_recv_worker at ffffffffa04a4810 [rds] [ffff881769f6fde0] process_one_work at ffffffff810a14c1 [ffff881769f6fe40] worker_thread at ffffffff810a1940 [ffff881769f6fec0] kthread at ffffffff810a6b1e Also, do not enqueue any new shutdown workq items when the connection is shutting down (this may happen for rds-tcp in softirq mode, if a FIN or CLOSE is received while the modules is in the middle of an unload) Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 7月, 2017 1 次提交
-
-
由 stephen hemminger 提交于
An underscore in the kernel-doc comment section has special meaning and mis-use generates an errors. ./net/core/datagram.c:207: ERROR: Unknown target name: "msg". ./net/core/datagram.c:379: ERROR: Unknown target name: "msg". ./net/core/datagram.c:816: ERROR: Unknown target name: "t". Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 7月, 2017 2 次提交
-
-
由 Dan Carpenter 提交于
The ipmr_get_table() function doesn't return error pointers it returns NULL on error. Fixes: 4f75ba69 ("net: ipmr: Add ipmr_rtm_getroute") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eduardo Valentin 提交于
We currently get the following kmemleak report: unreferenced object 0xffff8800039d9820 (size 32): comm "softirq", pid 0, jiffies 4295212383 (age 792.416s) hex dump (first 32 bytes): 00 0c e0 03 00 88 ff ff ff 02 00 00 00 00 00 00 ................ 00 00 00 01 ff 11 00 02 86 dd 00 00 ff ff ff ff ................ backtrace: [<ffffffff8152b4aa>] kmemleak_alloc+0x4a/0xa0 [<ffffffff811d8ec8>] kmem_cache_alloc_trace+0xb8/0x1c0 [<ffffffffa0389683>] __br_mdb_notify+0x2a3/0x300 [bridge] [<ffffffffa038a0ce>] br_mdb_notify+0x6e/0x70 [bridge] [<ffffffffa0386479>] br_multicast_add_group+0x109/0x150 [bridge] [<ffffffffa0386518>] br_ip6_multicast_add_group+0x58/0x60 [bridge] [<ffffffffa0387fb5>] br_multicast_rcv+0x1d5/0xdb0 [bridge] [<ffffffffa037d7cf>] br_handle_frame_finish+0xcf/0x510 [bridge] [<ffffffffa03a236b>] br_nf_hook_thresh.part.27+0xb/0x10 [br_netfilter] [<ffffffffa03a3738>] br_nf_hook_thresh+0x48/0xb0 [br_netfilter] [<ffffffffa03a3fb9>] br_nf_pre_routing_finish_ipv6+0x109/0x1d0 [br_netfilter] [<ffffffffa03a4400>] br_nf_pre_routing_ipv6+0xd0/0x14c [br_netfilter] [<ffffffffa03a3c27>] br_nf_pre_routing+0x197/0x3d0 [br_netfilter] [<ffffffff814a2952>] nf_iterate+0x52/0x60 [<ffffffff814a29bc>] nf_hook_slow+0x5c/0xb0 [<ffffffffa037ddf4>] br_handle_frame+0x1a4/0x2c0 [bridge] This happens when switchdev_port_obj_add() fails. This patch frees complete_info object in the fail path. Reviewed-by: NVallish Vaidyeshwara <vallish@amazon.com> Signed-off-by: NEduardo Valentin <eduval@amazon.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 7月, 2017 3 次提交
-
-
由 Roopa Prabhu 提交于
Fix the below warning generated by static checker: net/mpls/af_mpls.c:2111 mpls_getroute() error: uninitialized symbol 'in_label'." Fixes: 397fc9e5 ("mpls: route get support") Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 WANG Cong 提交于
As Hongjun/Nicolas summarized in their original patch: " When a device changes from one netns to another, it's first unregistered, then the netns reference is updated and the dev is registered in the new netns. Thus, when a slave moves to another netns, it is first unregistered. This triggers a NETDEV_UNREGISTER event which is caught by the bonding driver. The driver calls bond_release(), which calls dev_set_mtu() and thus triggers NETDEV_CHANGEMTU (the device is still in the old netns). " This is a very special case, because the device is being unregistered no one should still care about the NETDEV_CHANGEMTU event triggered at this point, we can avoid broadcasting this event on this path, and avoid touching inetdev_event()/addrconf_notify() path. It requires to export __dev_set_mtu() to bonding driver. Reported-by: NHongjun Li <hongjun.li@6wind.com> Reported-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sowmini Varadhan 提交于
There are two problems with calling sock_create_kern() from rds_tcp_accept_one() 1. it sets up a new_sock->sk that is wasteful, because this ->sk is going to get replaced by inet_accept() in the subsequent ->accept() 2. The new_sock->sk is a leaked reference in sock_graft() which expects to find a null parent->sk Avoid these problems by calling sock_create_lite(). Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 7月, 2017 20 次提交
-
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Reflects ceph.git commit dca1ae1e0a6b02029c3a7f9dec4114972be26d50. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
It is not just a pointer to crush_work, it is the whole structure. That is not a problem since it only contains a pointer. But it will be a problem if new data members are added to crush_work. Reflects ceph.git commit ee957dd431bfbeb6dadaf77764db8e0757417328. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
If there is no crush_choose_arg_map for a given pool, a NULL pointer is passed to preserve existing crush_do_rule() behavior. Reflects ceph.git commits 55fb91d64071552ea1bc65ab4ea84d3c8b73ab4b, dbe36e08be00c6519a8c89718dd47b0219c20516. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
bucket_straw2_choose needs to use weights that may be different from weight_items. For instance to compensate for an uneven distribution caused by a low number of values. Or to fix the probability biais introduced by conditional probabilities (see http://tracker.ceph.com/issues/15653 for more information). We introduce a weight_set for each straw2 bucket to set the desired weight for a given item at a given position. The weight of a given item when picking the first replica (first position) may be different from the weight the second replica (second position). For instance the weight matrix for a given bucket containing items 3, 7 and 13 could be as follows: position 0 position 1 item 3 0x10000 0x100000 item 7 0x40000 0x10000 item 13 0x40000 0x10000 When crush_do_rule picks the first of two replicas (position 0), item 7, 3 are four times more likely to be choosen by bucket_straw2_choose than item 13. When choosing the second replica (position 1), item 3 is ten times more likely to be choosen than item 7, 13. By default the weight_set of each bucket exactly matches the content of item_weights for each position to ensure backward compatibility. bucket_straw2_choose compares items by using their id. The same ids are also used to index buckets and they must be unique. For each item in a bucket an array of ids can be provided for placement purposes and they are used instead of the ids. If no replacement ids are provided, the legacy behavior is preserved. Reflects ceph.git commit 19537a450fd5c5a0bb8b7830947507a76db2ceca. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Previously, pg_to_raw_osds() didn't filter for existent OSDs because raw_to_up_osds() would filter for "up" ("up" is predicated on "exists") and raw_to_up_osds() was called directly after pg_to_raw_osds(). Now, with apply_upmap() call in there, nonexistent OSDs in pg_to_raw_osds() output can affect apply_upmap(). Introduce remove_nonexistent_osds() to deal with that. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Move raw_pg_to_pg() call out of get_temp_osds() and into ceph_pg_to_up_acting_osds(), for upcoming apply_upmap(). Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
pg_temp and pg_upmap encodings are the same (PG -> array of osds), except for the incremental remove: it's an empty mapping in new_pg_temp for pg_temp and a separate old_pg_upmap set for pg_upmap. (This isn't to allow for empty pg_upmap mappings -- apparently, pg_temp just wasn't looked at as an example for pg_upmap encoding.) Reuse __decode_pg_temp() for decoding pg_upmap and new_pg_upmap. __decode_pg_temp() stores into pg_temp union member, but since pg_upmap union member is identical, reading through pg_upmap later is OK. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Some of these won't be as efficient as they could be (e.g. ceph_decode_skip_set(... 32 ...) could advance by len * sizeof(u32) once instead of advancing by sizeof(u32) len times), but that's fine and not worth a bunch of extra macro code. Replace skip_name_map() with ceph_decode_skip_map as an example. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Switch to DEFINE_RB_FUNCS2-generated {insert,lookup,erase}_pg_mapping(). Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Make __{lookup,remove}_pg_mapping() look like their ceph_spg_mapping counterparts: take const struct ceph_pg *. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
For luminous and beyond we are encoding the actual spgid, which requires operating with the correct pg_num, i.e. that of the target pool. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
need_check_tiering logic doesn't make a whole lot of sense. Drop it and apply tiering unconditionally on every calc_target() call instead. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Otherwise we may miss events like PG splits, pool deletions, etc when we get multiple incremental maps at once. Because check_pool_dne() can now be fed an unlinked request, finish_request() needed to be taught to handle unlinked requests. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
When processing a map update consisting of multiple incrementals, we may end up running check_linger_pool_dne() on a lingering request that was previously added to need_resend_linger list. If it is concluded that the target pool doesn't exist, the request is killed off while still on need_resend_linger list, which leads to a crash on a NULL lreq->osd in kick_requests(): libceph: linger_id 18446462598732840961 pool does not exist BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: ceph_osdc_handle_map+0x4ae/0x870 Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Note that ceph_osd_request_target fields are updated regardless of RESEND_ON_SPLIT. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Replace it with more fine-grained bools to separate updating ceph_osd_request_target fields and the decision to resend. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-