- 24 12月, 2016 2 次提交
-
-
由 Dave Jones 提交于
By setting certain socket options on ipv6 raw sockets, we can confuse the length calculation in rawv6_push_pending_frames triggering a BUG_ON. RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40 RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282 RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002 RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00 RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009 R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030 R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80 Call Trace: [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830 [<ffffffff81772697>] inet_sendmsg+0x67/0xa0 [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50 [<ffffffff816d982f>] SYSC_sendto+0xef/0x170 [<ffffffff816da27e>] SyS_sendto+0xe/0x10 [<ffffffff81002910>] do_syscall_64+0x50/0xa0 [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25 Handle by jumping to the failure path if skb_copy_bits gets an EFAULT. Reproducer: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #define LEN 504 int main(int argc, char* argv[]) { int fd; int zero = 0; char buf[LEN]; memset(buf, 0, LEN); fd = socket(AF_INET6, SOCK_RAW, 7); setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4); setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN); sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110); } Signed-off-by: NDave Jones <davej@codemonkey.org.uk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within the packet. For sockets that have transport headers pulled, transport offset can be negative. Use signed comparison to avoid overflow. Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing") Reported-by: NNisar Jagabar <njagabar@cloudmark.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 12月, 2016 2 次提交
-
-
由 Mantas M 提交于
The protocol field is checked when deleting IPv4 routes, but ignored for IPv6, which causes problems with routing daemons accidentally deleting externally set routes (observed by multiple bird6 users). This can be verified using `ip -6 route del <prefix> proto something`. Signed-off-by: NMantas Mikulėnas <grawity@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
A user may call listen with binding an explicit port with the intent that the kernel will assign an available port to the socket. In this case inet_csk_get_port does a port scan. For such sockets, the user may also set soreuseport with the intent a creating more sockets for the port that is selected. The problem is that the initial socket being opened could inadvertently choose an existing and unreleated port number that was already created with soreuseport. This patch adds a boolean parameter to inet_bind_conflict that indicates rather soreuseport is allowed for the check (in addition to sk->sk_reuseport). In calls to inet_bind_conflict from inet_csk_get_port the argument is set to true if an explicit port is being looked up (snum argument is nonzero), and is false if port scan is done. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 12月, 2016 5 次提交
-
-
由 Liping Zhang 提交于
Acctually ntohl and htonl are identical, so this doesn't affect anything, but it is conceptually wrong. Signed-off-by: NLiping Zhang <zlpnobody@gmail.com> Acked-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
instead of allocating each xt_counter individually, allocate 4k chunks and then use these for counter allocation requests. This should speed up rule evaluation by increasing data locality, also speeds up ruleset loading because we reduce calls to the percpu allocator. As Eric points out we can't use PAGE_SIZE, page_allocator would fail on arches with 64k page size. Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Keeps some noise away from a followup patch. Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
On SMP we overload the packet counter (unsigned long) to contain percpu offset. Hide this from callers and pass xt_counters address instead. Preparation patch to allocate the percpu counters in page-sized batch chunks. Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
nf_defrag modules for ipv4 and ipv6 export an empty stub function. Any module that needs the defragmentation hooks registered simply 'calls' this empty function to create a phony module dependency -- modprobe will then load the defrag module too. This extends netfilter ipv4/ipv6 defragmentation modules to delay the hook registration until the functionality is requested within a network namespace instead of module load time for all namespaces. Hooks are only un-registered on module unload or when a namespace that used such defrag functionality exits. We have to use struct net for this as the register hooks can be called before netns initialization here from the ipv4/ipv6 conntrack module init path. There is no unregister functionality support, defrag will always be active once it was requested inside a net namespace. The reason is that defrag has impact on nft and iptables rulesets (without defrag we might see framents). Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 06 12月, 2016 2 次提交
-
-
由 Erik Nordmark 提交于
Made kernel accept IPv6 routes with IPv4-mapped address as next-hop. It is possible to configure IP interfaces with IPv4-mapped addresses, and one can add IPv6 routes for IPv4-mapped destinations/prefixes, yet prior to this fix the kernel returned an EINVAL when attempting to add an IPv6 route with an IPv4-mapped address as a nexthop/gateway. RFC 4798 (a proposed standard RFC) uses IPv4-mapped addresses as nexthops, thus in order to support that type of address configuration the kernel needs to allow IPv4-mapped addresses as nexthops. Signed-off-by: NErik Nordmark <nordmark@arista.com> Signed-off-by: NBob Gilligan <gilligan@arista.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
tsq_flags being in the same cache line than sk_wmem_alloc makes a lot of sense. Both fields are changed from tcp_wfree() and more generally by various TSQ related functions. Prior patch made room in struct sock and added sk_tsq_flags, this patch deletes tsq_flags from struct tcp_sock. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 12月, 2016 7 次提交
-
-
由 Florian Westphal 提交于
This makes use of nf_ct_netns_get/put added in previous patch. We add get/put functions to nf_conntrack_l3proto structure, ipv4 and ipv6 then implement use-count to track how many users (nft or xtables modules) have a dependency on ipv4 and/or ipv6 connection tracking functionality. When count reaches zero, the hooks are unregistered. This delays activation of connection tracking inside a namespace until stateful firewall rule or nat rule gets added. This patch breaks backwards compatibility in the sense that connection tracking won't be active anymore when the protocol tracker module is loaded. This breaks e.g. setups that ctnetlink for flow accounting and the like, without any '-m conntrack' packet filter rules. Followup patch restores old behavour and makes new delayed scheme optional via sysctl. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
so that conntrack core will add the needed hooks in this namespace. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
currently aliased to try_module_get/_put. Will be changed in next patch when we add functions to make use of ->net argument to store usercount per l3proto tracker. This is needed to avoid registering the conntrack hooks in all netns and later only enable connection tracking in those that need conntrack. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Davide Caratti 提交于
CONFIG_NF_CT_PROTO_UDPLITE is no more a tristate. When set to y, connection tracking support for UDPlite protocol is built-in into nf_conntrack.ko. footprint test: $ ls -l net/netfilter/nf_conntrack{_proto_udplite,}.ko \ net/ipv4/netfilter/nf_conntrack_ipv4.ko \ net/ipv6/netfilter/nf_conntrack_ipv6.ko (builtin)|| udplite| ipv4 | ipv6 |nf_conntrack ---------++--------+--------+--------+-------------- none || 432538 | 828755 | 828676 | 6141434 UDPlite || - | 829649 | 829362 | 6498204 Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Davide Caratti 提交于
CONFIG_NF_CT_PROTO_SCTP is no more a tristate. When set to y, connection tracking support for SCTP protocol is built-in into nf_conntrack.ko. footprint test: $ ls -l net/netfilter/nf_conntrack{_proto_sctp,}.ko \ net/ipv4/netfilter/nf_conntrack_ipv4.ko \ net/ipv6/netfilter/nf_conntrack_ipv6.ko (builtin)|| sctp | ipv4 | ipv6 | nf_conntrack ---------++--------+--------+--------+-------------- none || 498243 | 828755 | 828676 | 6141434 SCTP || - | 829254 | 829175 | 6547872 Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Davide Caratti 提交于
CONFIG_NF_CT_PROTO_DCCP is no more a tristate. When set to y, connection tracking support for DCCP protocol is built-in into nf_conntrack.ko. footprint test: $ ls -l net/netfilter/nf_conntrack{_proto_dccp,}.ko \ net/ipv4/netfilter/nf_conntrack_ipv4.ko \ net/ipv6/netfilter/nf_conntrack_ipv6.ko (builtin)|| dccp | ipv4 | ipv6 | nf_conntrack ---------++--------+--------+--------+-------------- none || 469140 | 828755 | 828676 | 6141434 DCCP || - | 830566 | 829935 | 6533526 Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Arturo Borrero Gonzalez 提交于
The email address has changed, let's update the copyright statements. Signed-off-by: NArturo Borrero Gonzalez <arturo@debian.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 04 12月, 2016 1 次提交
-
-
由 Erik Nordmark 提交于
Implemented RFC7527 Enhanced DAD. IPv6 duplicate address detection can fail if there is some temporary loopback of Ethernet frames. RFC7527 solves this by including a random nonce in the NS messages used for DAD, and if an NS is received with the same nonce it is assumed to be a looped back DAD probe and is ignored. RFC7527 is enabled by default. Can be disabled by setting both of conf/{all,interface}/enhanced_dad to zero. Signed-off-by: NErik Nordmark <nordmark@arista.com> Signed-off-by: NBob Gilligan <gilligan@arista.com> Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 12月, 2016 5 次提交
-
-
由 David Ahern 提交于
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run any time a process in the cgroup opens an AF_INET or AF_INET6 socket. Currently only sk_bound_dev_if is exported to userspace for modification by a bpf program. This allows a cgroup to be configured such that AF_INET{6} sockets opened by processes are automatically bound to a specific device. In turn, this enables the running of programs that do not support SO_BINDTODEVICE in a specific VRF context / L3 domain. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Artem Savkov 提交于
segs needs to be checked for being NULL in ipv6_gso_segment() before calling skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference: [ 97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc [ 97.819112] IP: [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0 [ 97.825214] PGD 0 [ 97.827047] [ 97.828540] Oops: 0000 [#1] SMP [ 97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod [ 97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1 [ 97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010 [ 97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000 [ 97.947720] RIP: 0010:[<ffffffff816e52f9>] [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0 [ 97.956251] RSP: 0018:ffff88012fc43a10 EFLAGS: 00010207 [ 97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594 [ 97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000 [ 97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000 [ 97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3 [ 97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000 [ 97.997198] FS: 0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000 [ 98.005280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0 [ 98.018149] Stack: [ 98.020157] 00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e [ 98.027584] 0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3 [ 98.035010] ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98 [ 98.042437] Call Trace: [ 98.044879] <IRQ> [ 98.046803] [<ffffffffa017ad0a>] ? tg3_start_xmit+0x84a/0xd60 [tg3] [ 98.053156] [<ffffffff815eeee0>] skb_mac_gso_segment+0xb0/0x130 [ 98.059158] [<ffffffff815eefd3>] __skb_gso_segment+0x73/0x110 [ 98.064985] [<ffffffff815ef40d>] validate_xmit_skb+0x12d/0x2b0 [ 98.070899] [<ffffffff815ef5d2>] validate_xmit_skb_list+0x42/0x70 [ 98.077073] [<ffffffff81618560>] sch_direct_xmit+0xd0/0x1b0 [ 98.082726] [<ffffffff815efd86>] __dev_queue_xmit+0x486/0x690 [ 98.088554] [<ffffffff8135c135>] ? cpumask_next_and+0x35/0x50 [ 98.094380] [<ffffffff815effa0>] dev_queue_xmit+0x10/0x20 [ 98.099863] [<ffffffffa09ce057>] br_dev_queue_push_xmit+0xa7/0x170 [bridge] [ 98.106907] [<ffffffffa09ce161>] br_forward_finish+0x41/0xc0 [bridge] [ 98.113430] [<ffffffff81627cf2>] ? nf_iterate+0x52/0x60 [ 98.118735] [<ffffffff81627d6b>] ? nf_hook_slow+0x6b/0xc0 [ 98.124216] [<ffffffffa09ce32c>] __br_forward+0x14c/0x1e0 [bridge] [ 98.130480] [<ffffffffa09ce120>] ? br_dev_queue_push_xmit+0x170/0x170 [bridge] [ 98.137785] [<ffffffffa09ce4bd>] br_forward+0x9d/0xb0 [bridge] [ 98.143701] [<ffffffffa09cfbb7>] br_handle_frame_finish+0x267/0x560 [bridge] [ 98.150834] [<ffffffffa09d0064>] br_handle_frame+0x174/0x2f0 [bridge] [ 98.157355] [<ffffffff8102fb89>] ? sched_clock+0x9/0x10 [ 98.162662] [<ffffffff810b63b2>] ? sched_clock_cpu+0x72/0xa0 [ 98.168403] [<ffffffff815eccf5>] __netif_receive_skb_core+0x1e5/0xa20 [ 98.174926] [<ffffffff813659f9>] ? timerqueue_add+0x59/0xb0 [ 98.180580] [<ffffffff815ed548>] __netif_receive_skb+0x18/0x60 [ 98.186494] [<ffffffff815ee625>] process_backlog+0x95/0x140 [ 98.192145] [<ffffffff815edccd>] net_rx_action+0x16d/0x380 [ 98.197713] [<ffffffff8170cff1>] __do_softirq+0xd1/0x283 [ 98.203106] [<ffffffff8170b2bc>] do_softirq_own_stack+0x1c/0x30 [ 98.209107] <EOI> [ 98.211029] [<ffffffff8108a5c0>] do_softirq+0x50/0x60 [ 98.216166] [<ffffffff815ec853>] netif_rx_ni+0x33/0x80 [ 98.221386] [<ffffffffa09eeff7>] tun_get_user+0x487/0x7f0 [tun] [ 98.227388] [<ffffffffa09ef3ab>] tun_sendmsg+0x4b/0x60 [tun] [ 98.233129] [<ffffffffa0b68932>] handle_tx+0x282/0x540 [vhost_net] [ 98.239392] [<ffffffffa0b68c25>] handle_tx_kick+0x15/0x20 [vhost_net] [ 98.245916] [<ffffffffa0abacfe>] vhost_worker+0x9e/0xf0 [vhost] [ 98.251919] [<ffffffffa0abac60>] ? vhost_umem_alloc+0x40/0x40 [vhost] [ 98.258440] [<ffffffff81003a47>] ? do_syscall_64+0x67/0x180 [ 98.264094] [<ffffffff810a44d9>] kthread+0xd9/0xf0 [ 98.268965] [<ffffffff810a4400>] ? kthread_park+0x60/0x60 [ 98.274444] [<ffffffff8170a4d5>] ret_from_fork+0x25/0x30 [ 98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 <41> 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66 [ 98.299425] RIP [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0 [ 98.305612] RSP <ffff88012fc43a10> [ 98.309094] CR2: 00000000000000cc [ 98.312406] ---[ end trace 726a2c7a2d2d78d0 ]--- Signed-off-by: NArtem Savkov <asavkov@redhat.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
jiffies based timestamps allow for easy inference of number of devices behind NAT translators and also makes tracking of hosts simpler. commit ceaa1fef ("tcp: adding a per-socket timestamp offset") added the main infrastructure that is needed for per-connection ts randomization, in particular writing/reading the on-wire tcp header format takes the offset into account so rest of stack can use normal tcp_time_stamp (jiffies). So only two items are left: - add a tsoffset for request sockets - extend the tcp isn generator to also return another 32bit number in addition to the ISN. Re-use of ISN generator also means timestamps are still monotonically increasing for same connection quadruple, i.e. PAWS will still work. Includes fixes from Eric Dumazet. Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eli Cooper 提交于
This reverts commit ae148b08 ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"). skb->protocol is now set in __ip_local_out() and __ip6_local_out() before dst_output() is called. It is no longer necessary to do it for each tunnel. Cc: stable@vger.kernel.org Signed-off-by: NEli Cooper <elicooper@gmx.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eli Cooper 提交于
When xfrm is applied to TSO/GSO packets, it follows this path: xfrm_output() -> xfrm_output_gso() -> skb_gso_segment() where skb_gso_segment() relies on skb->protocol to function properly. This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called, fixing a bug where GSO packets sent through an ipip6 tunnel are dropped when xfrm is involved. Cc: stable@vger.kernel.org Signed-off-by: NEli Cooper <elicooper@gmx.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 12月, 2016 1 次提交
-
-
由 Guillaume Nault 提交于
Socket flags aren't updated atomically, so the socket must be locked while reading the SOCK_ZAPPED flag. This issue exists for both l2tp_ip and l2tp_ip6. For IPv6, this patch also brings error handling for __ip6_datagram_connect() failures. Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 11月, 2016 2 次提交
-
-
由 Tobias Brunner 提交于
When handling inbound packets, the two halves of the sequence number stored on the skb are already in network order. Fixes: 000ae7b2 ("esp6: Switch to new AEAD interface") Signed-off-by: NTobias Brunner <tobias@strongswan.org> Acked-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
-
由 Florian Westphal 提交于
Dmitry Vyukov reported GPF in network stack that Andrey traced down to negative nh offset in nf_ct_frag6_queue(). Problem is that all network headers before fragment header are pulled. Normal ipv6 reassembly will drop the skb when errors occur further down the line. netfilter doesn't do this, and instead passed the original fragment along. That was also fine back when netfilter ipv6 defrag worked with cloned fragments, as the original, pristine fragment was passed on. So we either have to undo the pull op, or discard such fragments. Since they're malformed after all (e.g. overlapping fragment) it seems preferrable to just drop them. Same for temporary errors -- it doesn't make sense to accept (and perhaps forward!) only some fragments of same datagram. Fixes: 029f7f3b ("netfilter: ipv6: nf_defrag: avoid/free clone operations") Reported-by: NDmitry Vyukov <dvyukov@google.com> Debugged-by: NAndrey Konovalov <andreyknvl@google.com> Diagnosed-by: NEric Dumazet <Eric Dumazet <edumazet@google.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 29 11月, 2016 1 次提交
-
-
由 David Ahern 提交于
Andrey reported the following while fuzzing the kernel with syzkaller: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN Modules linked in: CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 task: ffff8800666d4200 task.stack: ffff880067348000 RIP: 0010:[<ffffffff833617ec>] [<ffffffff833617ec>] icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451 RSP: 0018:ffff88006734f2c0 EFLAGS: 00010206 RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018 RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003 R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000 R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0 FS: 00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0 Stack: ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460 ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046 ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000 Call Trace: [<ffffffff83364ddc>] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557 [< inline >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88 [<ffffffff83394405>] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157 [<ffffffff8339a759>] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663 [<ffffffff832ee773>] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191 ... icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both cases the dst->dev should be preferred for determining the L3 domain if the dst has been set on the skb. Fallback to the skb->dev if it has not. This covers the case reported here where icmp6_send is invoked on Rx before the route lookup. Fixes: 5d41ce29 ("net: icmp6_send should use dst dev to determine L3 domain") Reported-by: NAndrey Konovalov <andreyknvl@google.com> Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 11月, 2016 1 次提交
-
-
由 Daniel Mack 提交于
If the cgroup associated with the receiving socket has an eBPF programs installed, run them from ip_output(), ip6_output() and ip_mc_output(). From mentioned functions we have two socket contexts as per 7026b1dd ("netfilter: Pass socket pointer down through okfn()."). We explicitly need to use sk instead of skb->sk here, since otherwise the same program would run multiple times on egress when encap devices are involved, which is not desired in our case. eBPF programs used in this context are expected to either return 1 to let the packet pass, or != 1 to drop them. The programs have access to the skb through bpf_skb_load_bytes(), and the payload starts at the network headers (L3). Note that cgroup_bpf_run_filter() is stubbed out as static inline nop for !CONFIG_CGROUP_BPF, and is otherwise guarded by a static key if the feature is unused. Signed-off-by: NDaniel Mack <daniel@zonque.org> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 11月, 2016 2 次提交
-
-
由 Eric Dumazet 提交于
In commits 93821778 ("udp: Fix rcv socket locking") and f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite was forgotten. This leads to crashes if UDPlite header is pulled twice, which happens starting from commit e6afc8ac ("udp: remove headers from UDP packets before queueing") Bug found by syzkaller team, thanks a lot guys ! Note that backlog use in UDP/UDPlite is scheduled to be removed starting from linux-4.10, so this patch is only needed up to linux-4.9 Fixes: 93821778 ("udp: Fix rcv socket locking") Fixes: f7ad74fe ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb") Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NAndrey Konovalov <andreyknvl@google.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
When an ipv6 address has the tentative flag set, it can't be used as source for egress traffic, while the associated route, if any, can be looked up and even stored into some dst_cache. In the latter scenario, the source ipv6 address selected and stored in the cache is most probably wrong (e.g. with link-local scope) and the entity using the dst_cache will experience lack of ipv6 connectivity until said cache is cleared or invalidated. Overall this may cause lack of connectivity over most IPv6 tunnels (comprising geneve and vxlan), if the first egress packet reaches the tunnel before the DaD is completed for the used ipv6 address. This patch bumps a new genid after that the IFA_F_TENTATIVE flag is cleared, so that dst_cache will be invalidated on next lookup and ipv6 connectivity restored. Fixes: 0c1d70af ("net: use dst_cache for vxlan device") Fixes: 468dfffc ("geneve: add dst caching support") Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 11月, 2016 1 次提交
-
-
由 David Ahern 提交于
nf_send_reset6 is not considering the L3 domain and lookups are sent to the wrong table. For example consider the following output rule: ip6tables -A OUTPUT -p tcp --dport 12345 -j REJECT --reject-with tcp-reset using perf to analyze lookups via the fib6_table_lookup tracepoint shows: swapper 0 [001] 248.787816: fib6:fib6_table_lookup: table 255 oif 0 iif 1 src 2100:1::3 dst 2100:1: ffffffff81439cdc perf_trace_fib6_table_lookup ([kernel.kallsyms]) ffffffff814c1ce3 trace_fib6_table_lookup ([kernel.kallsyms]) ffffffff814c3e89 ip6_pol_route ([kernel.kallsyms]) ffffffff814c40d5 ip6_pol_route_output ([kernel.kallsyms]) ffffffff814e7b6f fib6_rule_action ([kernel.kallsyms]) ffffffff81437f60 fib_rules_lookup ([kernel.kallsyms]) ffffffff814e7c79 fib6_rule_lookup ([kernel.kallsyms]) ffffffff814c2541 ip6_route_output_flags ([kernel.kallsyms]) 528 nf_send_reset6 ([nf_reject_ipv6]) The lookup is directed to table 255 rather than the table associated with the device via the L3 domain. Update nf_send_reset6 to pull the L3 domain from the dst currently attached to the skb. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 22 11月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
UDP_SKB_CB(skb)->partial_cov is located at offset 66 in skb, requesting a cold cache line being read in cpu cache. We can avoid this cache line miss for UDP sockets, as partial_cov has a meaning only for UDPLite. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 11月, 2016 1 次提交
-
-
由 Alexey Dobriyan 提交于
1) cast to "int" is unnecessary: u8 will be promoted to int before decrementing, small positive numbers fit into "int", so their values won't be changed during promotion. Once everything is int including loop counters, signedness doesn't matter: 32-bit operations will stay 32-bit operations. But! Someone tried to make this loop smart by making everything of the same type apparently in an attempt to optimise it. Do the optimization, just differently. Do the cast where it matters. :^) 2) frag size is unsigned entity and sum of fragments sizes is also unsigned. Make everything unsigned, leave no MOVSX instruction behind. add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-4 (-4) function old new delta skb_cow_data 835 834 -1 ip_do_fragment 2549 2548 -1 ip6_fragment 3130 3128 -2 Total: Before=154865032, After=154865028, chg -0.00% Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 11月, 2016 3 次提交
-
-
由 Alexey Dobriyan 提交于
Make struct pernet_operations::id unsigned. There are 2 reasons to do so: 1) This field is really an index into an zero based array and thus is unsigned entity. Using negative value is out-of-bound access by definition. 2) On x86_64 unsigned 32-bit data which are mixed with pointers via array indexing or offsets added or subtracted to pointers are preffered to signed 32-bit data. "int" being used as an array index needs to be sign-extended to 64-bit before being used. void f(long *p, int i) { g(p[i]); } roughly translates to movsx rsi, esi mov rdi, [rsi+...] call g MOVSX is 3 byte instruction which isn't necessary if the variable is unsigned because x86_64 is zero extending by default. Now, there is net_generic() function which, you guessed it right, uses "int" as an array index: static inline void *net_generic(const struct net *net, int id) { ... ptr = ng->ptr[id - 1]; ... } And this function is used a lot, so those sign extensions add up. Patch snipes ~1730 bytes on allyesconfig kernel (without all junk messing with code generation): add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) Unfortunately some functions actually grow bigger. This is a semmingly random artefact of code generation with register allocator being used differently. gcc decides that some variable needs to live in new r8+ registers and every access now requires REX prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be used which is longer than [r8] However, overall balance is in negative direction: add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730) function old new delta nfsd4_lock 3886 3959 +73 tipc_link_build_proto_msg 1096 1140 +44 mac80211_hwsim_new_radio 2776 2808 +32 tipc_mon_rcv 1032 1058 +26 svcauth_gss_legacy_init 1413 1429 +16 tipc_bcbase_select_primary 379 392 +13 nfsd4_exchange_id 1247 1260 +13 nfsd4_setclientid_confirm 782 793 +11 ... put_client_renew_locked 494 480 -14 ip_set_sockfn_get 730 716 -14 geneve_sock_add 829 813 -16 nfsd4_sequence_done 721 703 -18 nlmclnt_lookup_host 708 686 -22 nfsd4_lockt 1085 1063 -22 nfs_get_client 1077 1050 -27 tcf_bpf_init 1106 1076 -30 nfsd4_encode_fattr 5997 5930 -67 Total: Before=154856051, After=154854321, chg -0.00% Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
UDP busy polling is restricted to connected UDP sockets. This is because sk_busy_loop() only takes care of one NAPI context. There are cases where it could be extended. 1) Some hosts receive traffic on a single NIC, with one RX queue. 2) Some applications use SO_REUSEPORT and associated BPF filter to split the incoming traffic on one UDP socket per RX queue/thread/cpu 3) Some UDP sockets are used to send/receive traffic for one flow, but they do not bother with connect() This patch records the napi_id of first received skb, giving more reach to busy polling. Tested: lpaa23:~# echo 70 >/proc/sys/net/core/busy_read lpaa24:~# echo 70 >/proc/sys/net/core/busy_read lpaa23:~# for f in `seq 1 10`; do ./super_netperf 1 -H lpaa24 -t UDP_RR -l 5; done Before patch : 27867 28870 37324 41060 41215 36764 36838 44455 41282 43843 After patch : 73920 73213 70147 74845 71697 68315 68028 75219 70082 73707 Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
If an ip6 tunnel is configured to inherit the traffic class from the inner header, the dst_cache must be disabled or it will foul the policy routing. The issue is apprently there since at leat Linux-2.6.12-rc2. Reported-by: NLiam McBirnie <liam.mcbirnie@boeing.com> Cc: Liam McBirnie <liam.mcbirnie@boeing.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 11月, 2016 1 次提交
-
-
由 David Lebrun 提交于
This patch adds a new option CONFIG_IPV6_SEG6_LWTUNNEL to enable/disable support of encapsulation with the lightweight tunnels. When this option is enabled, CONFIG_LWTUNNEL is automatically selected. Fix commit 6c8702c6 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Without a proper option to control lwtunnel support for SR-IPv6, if CONFIG_LWTUNNEL=n then the IPv6 initialization fails as a consequence of seg6_iptunnel_init() failure with EOPNOTSUPP: NET: Registered protocol family 10 IPv6: Attempt to unregister permanent protocol 6 IPv6: Attempt to unregister permanent protocol 136 IPv6: Attempt to unregister permanent protocol 17 NET: Unregistered protocol family 10 Tested (compiling, booting, and loading ipv6 module when relevant) with possible combinations of CONFIG_IPV6={y,m,n}, CONFIG_IPV6_SEG6_LWTUNNEL={y,n} and CONFIG_LWTUNNEL={y,n}. Reported-by: NLorenzo Colitti <lorenzo@google.com> Suggested-by: NRoopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: NDavid Lebrun <david.lebrun@uclouvain.be> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 11月, 2016 2 次提交
-
-
由 Pablo Neira 提交于
Honor udptable parameter that is passed to __udp*_lib_mcast_deliver(), otherwise udplite broadcast/multicast use the wrong table and it breaks. Fixes: 2dc41cff ("udp: Use hash2 for long hash1 chains in __udp*_lib_mcast_deliver.") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
The commit 850cbadd ("udp: use it's own memory accounting schema") assumes that the socket proto has memory accounting enabled, but this is not the case for UDPLITE. Fix it enabling memory accounting for UDPLITE and performing fwd allocated memory reclaiming on socket shutdown. UDP and UDPLITE share now the same memory accounting limits. Also drop the backlog receive operation, since is no more needed. Fixes: 850cbadd ("udp: use it's own memory accounting schema") Reported-by: NAndrei Vagin <avagin@gmail.com> Suggested-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-