- 08 4月, 2016 3 次提交
-
-
由 shamir rabinovitch 提交于
When PAGE_SIZE > 4k single page can contain 2 RDS fragments. If 'rds_ib_cong_recv' ignore the RDS fragment offset in to the page it then read the data fragment as far congestion map update and lead to corruption of the RDS connection far congestion map. Signed-off-by: NShamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 shamir rabinovitch 提交于
Fix issue in 'rds_ib_cong_recv' when accessing unaligned memory allocated by 'rds_page_remainder_alloc' using uint64_t pointer. Signed-off-by: NShamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch fixes an issue I found in which we were dropping frames if we had enabled checksums on GRE headers that were encapsulated by either FOU or GUE. Without this patch I was barely able to get 1 Gb/s of throughput. With this patch applied I am now at least getting around 6 Gb/s. The issue is due to the fact that with FOU or GUE applied we do not provide a transport offset pointing to the GRE header, nor do we offload it in software as the GRE header is completely skipped by GSO and treated like a VXLAN or GENEVE type header. As such we need to prevent the stack from generating it and also prevent GRE from generating it via any interface we create. Fixes: c3483384 ("gro: Allow tunnel stacking in the case of FOU/GUE") Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 4月, 2016 4 次提交
-
-
由 Dexuan Cui 提交于
This is for the recent kcm driver, which introduces AF_KCM(41) in b7ac4eb(kcm: Kernel Connection Multiplexor module). Signed-off-by: NDexuan Cui <decui@microsoft.com> Cc: Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jorgen Hansen 提交于
The check in vmci_transport_peer_detach_cb should only allow a detach when the qp handle of the transport matches the one in the detach message. Testing: Before this change, a detach from a peer on a different socket would cause an active stream socket to register a detach. Reviewed-by: NGeorge Zhang <georgezhang@vmware.com> Signed-off-by: NJorgen Hansen <jhansen@vmware.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dave Jones 提交于
Trinity and other fuzzers can hit this WARN on far too easily, resulting in a tainted kernel that hinders automated fuzzing. Replace it with a rate-limited printk. Signed-off-by: NDave Jones <davej@codemonkey.org.uk> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This reverts commit c862cc9b. Patch lacks a real-name Signed-off-by. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 4月, 2016 3 次提交
-
-
When creating an ip6tnl tunnel with ip tunnel, rtnl_link_ops is not set before ip6_tnl_create2 is called. When register_netdevice is called, there is no linkinfo attribute in the NEWLINK message because of that. Setting rtnl_link_ops before calling register_netdevice fixes that. Fixes: 0b112457 ("ip6tnl: add support of link creation via rtnl") Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@redhat.com> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Bjorn Helgaas 提交于
This reverts commit 543e3a8d. Direct callers of __netpoll_setup() depend on it to set np->dev, so we can't simply move that assignment up to netpoll_stup(). Reported-by: NBart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marcelo Ricardo Leitner 提交于
There is no point on delaying the packet if we can't fit a single byte of data on it anymore. So lets just reduce the threshold by the amount that a data chunk with 4 bytes (rounding) would use. v2: based on the right tree Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 4月, 2016 3 次提交
-
-
由 Bastien Philbert 提交于
This fixes the incorrect variable assignment on error path in br_sysfs_addbr for when the call to kobject_create_and_add fails to assign the value of -EINVAL to the returned variable of err rather then incorrectly return zero making callers think this function has succeededed due to the previous assignment being assigned zero when assigning it the successful return value of the call to sysfs_create_group which is zero. Signed-off-by: NBastien Philbert <bastienphilbert@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Haishuang Yan 提交于
pskb_may_pull() can change skb->data, so we have to load ptr/optr at the right place. Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Haishuang Yan 提交于
pskb_may_pull() can change skb->data, so we have to load ptr/optr at the right place. Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 4月, 2016 1 次提交
-
-
由 Daniel Borkmann 提交于
Sasha Levin reported a suspicious rcu_dereference_protected() warning found while fuzzing with trinity that is similar to this one: [ 52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage! [ 52.765688] other info that might help us debug this: [ 52.765695] rcu_scheduler_active = 1, debug_locks = 1 [ 52.765701] 1 lock held by a.out/1525: [ 52.765704] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20 [ 52.765721] stack backtrace: [ 52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264 [...] [ 52.765768] Call Trace: [ 52.765775] [<ffffffff813e488d>] dump_stack+0x85/0xc8 [ 52.765784] [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110 [ 52.765792] [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90 [ 52.765801] [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun] [ 52.765810] [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun] [ 52.765818] [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210 [ 52.765827] [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun] [ 52.765834] [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690 [ 52.765843] [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60 [ 52.765850] [<ffffffff81261519>] SyS_ioctl+0x79/0x90 [ 52.765858] [<ffffffff81003ba2>] do_syscall_64+0x62/0x140 [ 52.765866] [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25 Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH, DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices. Since the fix in f91ff5b9 ("net: sk_{detach|attach}_filter() rcu fixes") sk_attach_filter()/sk_detach_filter() now dereferences the filter with rcu_dereference_protected(), checking whether socket lock is held in control path. Since its introduction in 99405162 ("tun: socket filter support"), tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the sock_owned_by_user(sk) doesn't apply in this specific case and therefore triggers the false positive. Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair that is used by tap filters and pass in lockdep_rtnl_is_held() for the rcu_dereference_protected() checks instead. Reported-by: NSasha Levin <sasha.levin@oracle.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 4月, 2016 1 次提交
-
-
由 Nicolas Dichtel 提交于
Size of the attribute IFLA_PHYS_PORT_NAME was missing. Fixes: db24a904 ("net: add support for phys_port_name") CC: David Ahern <dsahern@gmail.com> Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 3月, 2016 5 次提交
-
-
由 Daniel Borkmann 提交于
Make the 2 byte padding in struct bpf_tunnel_key between tunnel_ttl and tunnel_label members explicit. No issue has been observed, and gcc/llvm does padding for the old struct already, where tunnel_label was not yet present, so the current code works, but since it's part of uapi, make sure we don't introduce holes in structs. Therefore, add tunnel_ext that we can use generically in future (f.e. to flag OAM messages for backends, etc). Also add the offset to the compat tests to be sure should some compilers not padd the tail of the old version of bpf_tunnel_key. Fixes: 4018ab18 ("bpf: support flow label for bpf_skb_{set, get}_tunnel_key") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
IPv6 counters updates use a different macro than IPv4. Fixes: 36cbb245 ("udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Rick Jones <rick.jones2@hp.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch should fix the issues seen with a recent fix to prevent tunnel-in-tunnel frames from being generated with GRO. The fix itself is correct for now as long as we do not add any devices that support NETIF_F_GSO_GRE_CSUM. When such a device is added it could have the potential to mess things up due to the fact that the outer transport header points to the outer UDP header and not the GRE header as would be expected. Fixes: fac8e0f5 ("tunnels: Don't apply GRO to multiple layers of encapsulation.") Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marcelo Ricardo Leitner 提交于
Somehow my patch for commit cea8768f ("sctp: allow sctp_transmit_packet and others to use gfp") missed two important chunks, which are now added. Fixes: cea8768f ("sctp: allow sctp_transmit_packet and others to use gfp") Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-By: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Haishuang Yan 提交于
When NET_SWITCHDEV=n, switchdev_port_attr_set will return -EOPNOTSUPP, we should ignore this error code and continue to set the ageing time. Fixes: c62987bb ("bridge: push bridge setting ageing_time down to switchdev") Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 3月, 2016 11 次提交
-
-
由 Liping Zhang 提交于
Commit fa50d974 ("ipv4: Namespaceify ip_default_ttl sysctl knob") use sock_net(skb->sk) to get the net namespace, but we can't assume that sk_buff->sk is always exist, so when it is NULL, oops will happen. Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com> Reviewed-by: NNikolay Borisov <kernel@kyup.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
Make sure the table names via getsockopt GET_ENTRIES is nul-terminated in ebtables and all the x_tables variants and their respective compat code. Uncovered by KASAN. Reported-by: NBaozeng Ding <sploving1@gmail.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
When netlink unicast fails to deliver the message to userspace, we should also check if the NFQA_CFG_F_FAIL_OPEN flag is set so we reinject the packet back to the stack. I think the user expects no packet drops when this flag is set due to queueing to userspace errors, no matter if related to the internal queue or when sending the netlink message to userspace. The userspace application will still get the ENOBUFS error via recvmsg() so the user still knows that, with the current configuration that is in place, the userspace application is not consuming the messages at the pace that the kernel needs. Reported-by: N"Yigal Reiss (yreiss)" <yreiss@cisco.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Tested-by: N"Yigal Reiss (yreiss)" <yreiss@cisco.com>
-
由 Florian Westphal 提交于
Ben Hawkes says: In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it is possible for a user-supplied ipt_entry structure to have a large next_offset field. This field is not bounds checked prior to writing a counter value at the supplied offset. Problem is that mark_source_chains should not have been called -- the rule doesn't have a next entry, so its supposed to return an absolute verdict of either ACCEPT or DROP. However, the function conditional() doesn't work as the name implies. It only checks that the rule is using wildcard address matching. However, an unconditional rule must also not be using any matches (no -m args). The underflow validator only checked the addresses, therefore passing the 'unconditional absolute verdict' test, while mark_source_chains also tested for presence of matches, and thus proceeeded to the next (not-existent) rule. Unify this so that all the callers have same idea of 'unconditional rule'. Reported-by: NBen Hawkes <hawkes@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Otherwise this function may read data beyond the ruleset blob. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
We should check that e->target_offset is sane before mark_source_chains gets called since it will fetch the target entry for loop detection. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Arnd Bergmann 提交于
The openvswitch code has gained support for calling into the nf-nat-ipv4/ipv6 modules, however those can be loadable modules in a configuration in which openvswitch is built-in, leading to link errors: net/built-in.o: In function `__ovs_ct_lookup': :(.text+0x2cc2c8): undefined reference to `nf_nat_icmp_reply_translation' :(.text+0x2cc66c): undefined reference to `nf_nat_icmpv6_reply_translation' The dependency on (!NF_NAT || NF_NAT) prevents similar issues, but NF_NAT is set to 'y' if any of the symbols selecting it are built-in, but the link error happens when any of them are modular. A second issue is that even if CONFIG_NF_NAT_IPV6 is built-in, CONFIG_NF_NAT_IPV4 might be completely disabled. This is unlikely to be useful in practice, but the driver currently only handles IPv6 being optional. This patch improves the Kconfig dependency so that openvswitch cannot be built-in if either of the two other symbols are set to 'm', and it replaces the incorrect #ifdef in ovs_ct_nat_execute() with two "if (IS_ENABLED())" checks that should catch all corner cases also make the code more readable. The same #ifdef exists ovs_ct_nat_to_attr(), where it does not cause a link error, but for consistency I'm changing it the same way. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Fixes: 05752523 ("openvswitch: Interface with NAT.") Acked-by: NJoe Stringer <joe@ovn.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Jarno Rajahalme 提交于
OVS should call into CT NAT for packets of new expected connections only when the conntrack state is persisted with the 'commit' option to the OVS CT action. The test for this condition is doubly wrong, as the CT status field is ANDed with the bit number (IPS_EXPECTED_BIT) rather than the mask (IPS_EXPECTED), and due to the wrong assumption that the expected bit would apply only for the first (i.e., 'new') packet of a connection, while in fact the expected bit remains on for the lifetime of an expected connection. The 'ctinfo' value IP_CT_RELATED derived from the ct status can be used instead, as it is only ever applicable to the 'new' packets of the expected connection. Fixes: 05752523 ('openvswitch: Interface with NAT.') Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NJarno Rajahalme <jarno@ovn.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Vishwanath Pai 提交于
This fix adds a new reference counter (ref_netlink) for the struct ip_set. The other reference counter (ref) can be swapped out by ip_set_swap and we need a separate counter to keep track of references for netlink events like dump. Using the same ref counter for dump causes a race condition which can be demonstrated by the following script: ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \ counters ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \ counters ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \ counters ipset save & ipset swap hash_ip3 hash_ip2 ipset destroy hash_ip3 /* will crash the machine */ Swap will exchange the values of ref so destroy will see ref = 0 instead of ref = 1. With this fix in place swap will not succeed because ipset save still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink). Both delete and swap will error out if ref_netlink != 0 on the set. Note: The changes to *_head functions is because previously we would increment ref whenever we called these functions, we don't do that anymore. Reviewed-by: NJoshua Hunt <johunt@akamai.com> Signed-off-by: NVishwanath Pai <vpai@akamai.com> Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Haishuang Yan 提交于
For the input parameter count, it's better to use the size of destination buffer size, as nla_memcpy would take into account the length of the source netlink attribute when a data is copied from an attribute. Signed-off-by: NHaishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Quentin Armitage 提交于
For a route with IPv6 encapsulation, the traffic class and hop limit values are interchanged when returned to userspace by the kernel. For example, see below. ># ip route add 192.168.0.1 dev eth0.2 encap ip6 dst 0x50 tc 0x50 hoplimit 100 table 1000 ># ip route show table 1000 192.168.0.1 encap ip6 id 0 src :: dst fe83::1 hoplimit 80 tc 100 dev eth0.2 scope link Signed-off-by: NQuentin Armitage <quentin@armitage.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 3月, 2016 9 次提交
-
-
由 Geliang Tang 提交于
Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code. Signed-off-by: NGeliang Tang <geliangtang@163.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Don't open-code sizeof_footer() in read_partial_message() and ceph_msg_revoke(). Also, after switching to sizeof_footer(), it's now possible to use con_out_kvec_add() in prepare_write_message_footer(). Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NAlex Elder <elder@linaro.org>
-
由 Yan, Zheng 提交于
This helper duplicates last extent operation in OSD request, then adjusts the new extent operation's offset and length. The helper is for scatterd page writeback, which adds nonconsecutive dirty pages to single OSD request. Signed-off-by: NYan, Zheng <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Turn r_ops into a flexible array member to enable large, consisting of up to 16 ops, OSD requests. The use case is scattered writeback in cephfs and, as far as the kernel client is concerned, 16 is just a made up number. r_ops had size 3 for copyup+hint+write, but copyup is really a special case - it can only happen once. ceph_osd_request_cache is therefore stuffed with num_ops=2 requests, anything bigger than that is allocated with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which means either num_ops=1 or num_ops=2 for use_mempool=true - all existing users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with that. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
ceph_osd_request_cache was introduced a long time ago. Also, osd_req is about to get a flexible array member, which ceph_osd_request_cache is going to be aware of. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Although msg_size is calculated correctly, the terms are grouped in a misleading way - snaps appears to not have room for a u32 length. Move calculation closer to its use and regroup terms. No functional change. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
This avoids defining large array of r_reply_op_{len,result} in in struct ceph_osd_request. Signed-off-by: NYan, Zheng <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Follow userspace nomenclature on this - the next commit adds outdata_len. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
This can happen if __close_session() in ceph_monc_stop() races with a connection reset. We need to ignore such faults, otherwise it's likely we would take !hunting, call __schedule_delayed() and end up with delayed_work() executing on invalid memory, among other things. The (two!) con->private tests are useless, as nothing ever clears con->private. Nuke them. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-