- 02 6月, 2018 1 次提交
-
-
由 Daniel Borkmann 提交于
In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the case of struct bpf_map_info but also struct bpf_prog_info. In net-next commit b85fab0e ("bpf: Add gpl_compatible flag to struct bpf_prog_info") added a bitfield into it to expose some flags related to programs. Thus, add an unnamed __u32 bitfield for both so that alignment keeps the same in both 32 and 64 bit cases, and can be naturally extended from there as in b85fab0e. Before: # file test.o test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped # pahole test.o struct bpf_map_info { __u32 type; /* 0 4 */ __u32 id; /* 4 4 */ __u32 key_size; /* 8 4 */ __u32 value_size; /* 12 4 */ __u32 max_entries; /* 16 4 */ __u32 map_flags; /* 20 4 */ char name[16]; /* 24 16 */ __u32 ifindex; /* 40 4 */ __u64 netns_dev; /* 44 8 */ __u64 netns_ino; /* 52 8 */ /* size: 64, cachelines: 1, members: 10 */ /* padding: 4 */ }; After (same as on 64 bit): # file test.o test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped # pahole test.o struct bpf_map_info { __u32 type; /* 0 4 */ __u32 id; /* 4 4 */ __u32 key_size; /* 8 4 */ __u32 value_size; /* 12 4 */ __u32 max_entries; /* 16 4 */ __u32 map_flags; /* 20 4 */ char name[16]; /* 24 16 */ __u32 ifindex; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ __u64 netns_dev; /* 48 8 */ __u64 netns_ino; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ /* size: 64, cachelines: 1, members: 10 */ /* sum members: 60, holes: 1, sum holes: 4 */ }; Reported-by: NDmitry V. Levin <ldv@altlinux.org> Reported-by: NEugene Syromiatnikov <esyr@redhat.com> Fixes: 52775b33 ("bpf: offload: report device information about offloaded maps") Fixes: 675fc275 ("bpf: offload: report device information for offloaded programs") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 31 5月, 2018 1 次提交
-
-
由 Neil Armstrong 提交于
The dw_hdmi_setup_rx_sense exported function should not use struct device to recover the dw-hdmi context using drvdata, but take struct dw_hdmi directly like other exported functions. This caused a regression using Meson DRM on S905X since v4.17-rc1 : Internal error: Oops: 96000007 [#1] PREEMPT SMP [...] CPU: 0 PID: 124 Comm: irq/32-dw_hdmi_ Not tainted 4.17.0-rc7 #2 Hardware name: Libre Technology CC (DT) [...] pc : osq_lock+0x54/0x188 lr : __mutex_lock.isra.0+0x74/0x530 [...] Process irq/32-dw_hdmi_ (pid: 124, stack limit = 0x00000000adf418cb) Call trace: osq_lock+0x54/0x188 __mutex_lock_slowpath+0x10/0x18 mutex_lock+0x30/0x38 __dw_hdmi_setup_rx_sense+0x28/0x98 dw_hdmi_setup_rx_sense+0x10/0x18 dw_hdmi_top_thread_irq+0x2c/0x50 irq_thread_fn+0x28/0x68 irq_thread+0x10c/0x1a0 kthread+0x128/0x130 ret_from_fork+0x10/0x18 Code: 34000964 d00050a2 51000484 9135c042 (f864d844) ---[ end trace 945641e1fbbc07da ]--- note: irq/32-dw_hdmi_[124] exited with preempt_count 1 genirq: exiting task "irq/32-dw_hdmi_" (124) is an active IRQ thread (irq 32) Fixes: eea034af ("drm/bridge/synopsys: dw-hdmi: don't clobber drvdata") Signed-off-by: NNeil Armstrong <narmstrong@baylibre.com> Tested-by: NKoen Kooi <koen@dominion.thruhere.net> Signed-off-by: NSean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/1527673438-20643-1-git-send-email-narmstrong@baylibre.com
-
- 26 5月, 2018 2 次提交
-
-
由 Jonathan Cameron 提交于
The case of a new numa node got missed in avoiding using the node info from page_struct during hotplug. In this path we have a call to register_mem_sect_under_node (which allows us to specify it is hotplug so don't change the node), via link_mem_sections which unfortunately does not. Fix is to pass check_nid through link_mem_sections as well and disable it in the new numa node path. Note the bug only 'sometimes' manifests depending on what happens to be in the struct page structures - there are lots of them and it only needs to match one of them. The result of the bug is that (with a new memory only node) we never successfully call register_mem_sect_under_node so don't get the memory associated with the node in sysfs and meminfo for the node doesn't report it. It came up whilst testing some arm64 hotplug patches, but appears to be universal. Whilst I'm triggering it by removing then reinserting memory to a node with no other elements (thus making the node disappear then appear again), it appears it would happen on hotplugging memory where there was none before and it doesn't seem to be related the arm64 patches. These patches call __add_pages (where most of the issue was fixed by Pavel's patch). If there is a node at the time of the __add_pages call then all is well as it calls register_mem_sect_under_node from there with check_nid set to false. Without a node that function returns having not done the sysfs related stuff as there is no node to use. This is expected but it is the resulting path that fails... Exact path to the problem is as follows: mm/memory_hotplug.c: add_memory_resource() The node is not online so we enter the 'if (new_node)' twice, on the second such block there is a call to link_mem_sections which calls into drivers/node.c: link_mem_sections() which calls drivers/node.c: register_mem_sect_under_node() which calls get_nid_for_pfn and keeps trying until the output of that matches the expected node (passed all the way down from add_memory_resource) It is effectively the same fix as the one referred to in the fixes tag just in the code path for a new node where the comments point out we have to rerun the link creation because it will have failed in register_new_memory (as there was no node at the time). (actually that comment is wrong now as we don't have register_new_memory any more it got renamed to hotplug_memory_register in Pavel's patch). Link: http://lkml.kernel.org/r/20180504085311.1240-1-Jonathan.Cameron@huawei.com Fixes: fc44f7f9 ("mm/memory_hotplug: don't read nid from struct page during hotplug") Signed-off-by: NJonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: NPavel Tatashin <pasha.tatashin@oracle.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Oscar has noticed that we splat WARNING: CPU: 0 PID: 64 at ./include/linux/gfp.h:467 vmemmap_alloc_block+0x4e/0xc9 [...] CPU: 0 PID: 64 Comm: kworker/u4:1 Tainted: G W E 4.17.0-rc5-next-20180517-1-default+ #66 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Workqueue: kacpi_hotplug acpi_hotplug_work_fn Call Trace: vmemmap_populate+0xf2/0x2ae sparse_mem_map_populate+0x28/0x35 sparse_add_one_section+0x4c/0x187 __add_pages+0xe7/0x1a0 add_pages+0x16/0x70 add_memory_resource+0xa3/0x1d0 add_memory+0xe4/0x110 acpi_memory_device_add+0x134/0x2e0 acpi_bus_attach+0xd9/0x190 acpi_bus_scan+0x37/0x70 acpi_device_hotplug+0x389/0x4e0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x146/0x340 worker_thread+0x47/0x3e0 kthread+0xf5/0x130 ret_from_fork+0x35/0x40 when adding memory to a node that is currently offline. The VM_WARN_ON is just too loud without a good reason. In this particular case we are doing alloc_pages_node(node, GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_NOWARN, order) so we do not insist on allocating from the given node (it is more a hint) so we can fall back to any other populated node and moreover we explicitly ask to not warn for the allocation failure. Soften the warning only to cases when somebody asks for the given node explicitly by __GFP_THISNODE. Link: http://lkml.kernel.org/r/20180523125555.30039-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Reported-by: NOscar Salvador <osalvador@techadventures.net> Tested-by: NOscar Salvador <osalvador@techadventures.net> Reviewed-by: NPavel Tatashin <pasha.tatashin@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 5月, 2018 3 次提交
-
-
Since the following commit: b91473ff ("sched,tracing: Update trace_sched_pi_setprio()") the sched_pi_setprio trace point shows the "newprio" during a deboost: |futex sched_pi_setprio: comm=futex_requeue_p pid"34 oldprio newprio=3D98 |futex sched_switch: prev_comm=futex_requeue_p prev_pid"34 prev_prio=120 This patch open codes __rt_effective_prio() in the tracepoint as the 'newprio' to get the old behaviour back / the correct priority: |futex sched_pi_setprio: comm=futex_requeue_p pid"20 oldprio newprio=3D120 |futex sched_switch: prev_comm=futex_requeue_p prev_pid"20 prev_prio=120 Peter suggested to open code the new priority so people using tracehook could get the deadline data out. Reported-by: NMansky Christian <man@keba.com> Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: b91473ff ("sched,tracing: Update trace_sched_pi_setprio()") Link: http://lkml.kernel.org/r/20180524132647.gg6ziuogczdmjjzu@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Eric Biggers 提交于
The PPPIOCDETACH ioctl effectively tries to "close" the given ppp file before f_count has reached 0, which is fundamentally a bad idea. It does check 'f_count < 2', which excludes concurrent operations on the file since they would only be possible with a shared fd table, in which case each fdget() would take a file reference. However, it fails to account for the fact that even with 'f_count == 1' the file can still be linked into epoll instances. As reported by syzbot, this can trivially be used to cause a use-after-free. Yet, the only known user of PPPIOCDETACH is pppd versions older than ppp-2.4.2, which was released almost 15 years ago (November 2003). Also, PPPIOCDETACH apparently stopped working reliably at around the same time, when the f_count check was added to the kernel, e.g. see https://lkml.org/lkml/2002/12/31/83. Also, the current 'f_count < 2' check makes PPPIOCDETACH only work in single-threaded applications; it always fails if called from a multithreaded application. All pppd versions released in the last 15 years just close() the file descriptor instead. Therefore, instead of hacking around this bug by exporting epoll internals to modules, and probably missing other related bugs, just remove the PPPIOCDETACH ioctl and see if anyone actually notices. Leave a stub in place that prints a one-time warning and returns EINVAL. Reported-by: syzbot+16363c99d4134717c05b@syzkaller.appspotmail.com Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: NEric Biggers <ebiggers@google.com> Acked-by: NPaul Mackerras <paulus@ozlabs.org> Reviewed-by: NGuillaume Nault <g.nault@alphalink.fr> Tested-by: NGuillaume Nault <g.nault@alphalink.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joonsoo Kim 提交于
This reverts the following commits that change CMA design in MM. 3d2054ad ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y") 1d47a3ec ("mm/cma: remove ALLOC_CMA") bad8c6c0 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE") Ville reported a following error on i386. Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) microcode: microcode updated early to revision 0x4, date = 2013-06-28 Initializing CPU#0 Initializing HighMem for node 0 (000377fe:00118000) Initializing Movable for node 0 (00000001:00118000) BUG: Bad page state in process swapper pfn:377fe page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0 flags: 0x80000000() raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145 Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013 Call Trace: dump_stack+0x60/0x96 bad_page+0x9a/0x100 free_pages_check_bad+0x3f/0x60 free_pcppages_bulk+0x29d/0x5b0 free_unref_page_commit+0x84/0xb0 free_unref_page+0x3e/0x70 __free_pages+0x1d/0x20 free_highmem_page+0x19/0x40 add_highpages_with_active_regions+0xab/0xeb set_highmem_pages_init+0x66/0x73 mem_init+0x1b/0x1d7 start_kernel+0x17a/0x363 i386_start_kernel+0x95/0x99 startup_32_smp+0x164/0x168 The reason for this error is that the span of MOVABLE_ZONE is extended to whole node span for future CMA initialization, and, normal memory is wrongly freed here. I submitted the fix and it seems to work, but, another problem happened. It's so late time to fix the later problem so I decide to reverting the series. Reported-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Acked-by: NLaura Abbott <labbott@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 5月, 2018 2 次提交
-
-
由 Daniel Borkmann 提交于
While reviewing the verifier code, I recently noticed that the following two program variants in relation to tail calls can be loaded. Variant 1: # bpftool p d x i 15 0: (15) if r1 == 0x0 goto pc+3 1: (18) r2 = map[id:5] 3: (05) goto pc+2 4: (18) r2 = map[id:6] 6: (b7) r3 = 7 7: (35) if r3 >= 0xa0 goto pc+2 8: (54) (u32) r3 &= (u32) 255 9: (85) call bpf_tail_call#12 10: (b7) r0 = 1 11: (95) exit # bpftool m s i 5 5: prog_array flags 0x0 key 4B value 4B max_entries 4 memlock 4096B # bpftool m s i 6 6: prog_array flags 0x0 key 4B value 4B max_entries 160 memlock 4096B Variant 2: # bpftool p d x i 20 0: (15) if r1 == 0x0 goto pc+3 1: (18) r2 = map[id:8] 3: (05) goto pc+2 4: (18) r2 = map[id:7] 6: (b7) r3 = 7 7: (35) if r3 >= 0x4 goto pc+2 8: (54) (u32) r3 &= (u32) 3 9: (85) call bpf_tail_call#12 10: (b7) r0 = 1 11: (95) exit # bpftool m s i 8 8: prog_array flags 0x0 key 4B value 4B max_entries 160 memlock 4096B # bpftool m s i 7 7: prog_array flags 0x0 key 4B value 4B max_entries 4 memlock 4096B In both cases the index masking inserted by the verifier in order to control out of bounds speculation from a CPU via b2157399 ("bpf: prevent out-of-bounds speculation") seems to be incorrect in what it is enforcing. In the 1st variant, the mask is applied from the map with the significantly larger number of entries where we would allow to a certain degree out of bounds speculation for the smaller map, and in the 2nd variant where the mask is applied from the map with the smaller number of entries, we get buggy behavior since we truncate the index of the larger map. The original intent from commit b2157399 is to reject such occasions where two or more different tail call maps are used in the same tail call helper invocation. However, the check on the BPF_MAP_PTR_POISON is never hit since we never poisoned the saved pointer in the first place! We do this explicitly for map lookups but in case of tail calls we basically used the tail call map in insn_aux_data that was processed in the most recent path which the verifier walked. Thus any prior path that stored a pointer in insn_aux_data at the helper location was always overridden. Fix it by moving the map pointer poison logic into a small helper that covers both BPF helpers with the same logic. After that in fixup_bpf_calls() the poison check is then hit for tail calls and the program rejected. Latter only happens in unprivileged case since this is the *only* occasion where a rewrite needs to happen, and where such rewrite is specific to the map (max_entries, index_mask). In the privileged case the rewrite is generic for the insn->imm / insn->code update so multiple maps from different paths can be handled just fine since all the remaining logic happens in the instruction processing itself. This is similar to the case of map lookups: in case there is a collision of maps in fixup_bpf_calls() we must skip the inlined rewrite since this will turn the generic instruction sequence into a non- generic one. Thus the patch_call_imm will simply update the insn->imm location where the bpf_map_lookup_elem() will later take care of the dispatch. Given we need this 'poison' state as a check, the information of whether a map is an unpriv_array gets lost, so enforcing it prior to that needs an additional state. In general this check is needed since there are some complex and tail call intensive BPF programs out there where LLVM tends to generate such code occasionally. We therefore convert the map_ptr rather into map_state to store all this w/o extra memory overhead, and the bit whether one of the maps involved in the collision was from an unpriv_array thus needs to be retained as well there. Fixes: b2157399 ("bpf: prevent out-of-bounds speculation") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Jason Gunthorpe 提交于
The err pointer comes from uverbs_attr_get, not from the uobject member, which does not store an ERR_PTR. Fixes: be934cca ("IB/uverbs: Add device memory registration ioctl support") Signed-off-by: NJason Gunthorpe <jgg@mellanox.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
-
- 23 5月, 2018 1 次提交
-
-
由 Xin Long 提交于
Now sctp uses inet_dgram_connect as its proto_ops .connect, and the flags param can't be passed into its proto .connect where this flags is really needed. sctp works around it by getting flags from socket file in __sctp_connect. It works for connecting from userspace, as inherently the user sock has socket file and it passes f_flags as the flags param into the proto_ops .connect. However, the sock created by sock_create_kern doesn't have a socket file, and it passes the flags (like O_NONBLOCK) by using the flags param in kernel_connect, which calls proto_ops .connect later. So to fix it, this patch defines a new proto_ops .connect for sctp, sctp_inet_connect, which calls __sctp_connect() directly with this flags param. After this, the sctp's proto .connect can be removed. Note that sctp_inet_connect doesn't need to do some checks that are not needed for sctp, which makes thing better than with inet_dgram_connect. Suggested-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reviewed-by: NMichal Kubecek <mkubecek@suse.cz> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 5月, 2018 1 次提交
-
-
由 Alexei Starovoitov 提交于
Detect code patterns where malicious 'speculative store bypass' can be used and sanitize such patterns. 39: (bf) r3 = r10 40: (07) r3 += -216 41: (79) r8 = *(u64 *)(r7 +0) // slow read 42: (7a) *(u64 *)(r10 -72) = 0 // verifier inserts this instruction 43: (7b) *(u64 *)(r8 +0) = r3 // this store becomes slow due to r8 44: (79) r1 = *(u64 *)(r6 +0) // cpu speculatively executes this load 45: (71) r2 = *(u8 *)(r1 +0) // speculatively arbitrary 'load byte' // is now sanitized Above code after x86 JIT becomes: e5: mov %rbp,%rdx e8: add $0xffffffffffffff28,%rdx ef: mov 0x0(%r13),%r14 f3: movq $0x0,-0x48(%rbp) fb: mov %rdx,0x0(%r14) ff: mov 0x0(%rbx),%rdi 103: movzbq 0x0(%rdi),%rsi Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 19 5月, 2018 1 次提交
-
-
由 Souptick Joarder 提交于
Many places in drivers/ file systems, error was handled in a common way like below: ret = (ret == -ENOMEM) ? VM_FAULT_OOM : VM_FAULT_SIGBUS; vmf_error() will replace this and return vm_fault_t type err. A lot of drivers and filesystems currently have a rather complex mapping of errno-to-VM_FAULT code. We have been able to eliminate a lot of it by just returning VM_FAULT codes directly from functions which are called exclusively from the fault handling path. Some functions can be called both from the fault handler and other context which are expecting an errno, so they have to continue to return an errno. Some users still need to choose different behaviour for different errnos, but vmf_error() captures the essential error translation that's common to all users, and those that need to handle additional errors can handle them first. Link: http://lkml.kernel.org/r/20180510174826.GA14268@jordon-HP-15-Notebook-PCSigned-off-by: NSouptick Joarder <jrdr.linux@gmail.com> Reviewed-by: NMatthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 5月, 2018 3 次提交
-
-
由 Eric Biggers 提交于
wiphy names were recently limited to 128 bytes by commit a7cfebcb ("cfg80211: limit wiphy names to 128 bytes"). As it turns out though, this isn't sufficient because dev_vprintk_emit() needs the syslog header string "SUBSYSTEM=ieee80211\0DEVICE=+ieee80211:$devname" to fit into 128 bytes. This triggered the "device/subsystem name too long" WARN when the device name was >= 90 bytes. As before, this was reproduced by syzbot by sending an HWSIM_CMD_NEW_RADIO command to the MAC80211_HWSIM generic netlink family. Fix it by further limiting wiphy names to 64 bytes. Reported-by: syzbot+e64565577af34b3768dc@syzkaller.appspotmail.com Fixes: a7cfebcb ("cfg80211: limit wiphy names to 128 bytes") Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
由 Matt Mullins 提交于
scatterlist code expects virt_to_page() to work, which fails with CONFIG_VMAP_STACK=y. Fixes: c46234eb ("tls: RX path for ktls") Signed-off-by: NMatt Mullins <mmullins@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willy Tarreau 提交于
proc_pid_cmdline_read() and environ_read() directly access the target process' VM to retrieve the command line and environment. If this process remaps these areas onto a file via mmap(), the requesting process may experience various issues such as extra delays if the underlying device is slow to respond. Let's simply refuse to access file-backed areas in these functions. For this we add a new FOLL_ANON gup flag that is passed to all calls to access_remote_vm(). The code already takes care of such failures (including unmapped areas). Accesses via /proc/pid/mem were not changed though. This was assigned CVE-2018-1120. Note for stable backports: the patch may apply to kernels prior to 4.11 but silently miss one location; it must be checked that no call to access_remote_vm() keeps zero as the last argument. Reported-by: NQualys Security Advisory <qsa@qualys.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NWilly Tarreau <w@1wt.eu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 5月, 2018 1 次提交
-
-
由 Saeed Mahameed 提交于
Avoid using the kernel's irq_descriptor and return IRQ vector affinity directly from the driver. This fixes the following build break when CONFIG_SMP=n include/linux/mlx5/driver.h: In function ‘mlx5_get_vector_affinity_hint’: include/linux/mlx5/driver.h:1299:13: error: ‘struct irq_desc’ has no member named ‘affinity_hint’ Fixes: 6082d9c9 ("net/mlx5: Fix mlx5_get_vector_affinity function") Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> CC: Randy Dunlap <rdunlap@infradead.org> CC: Guenter Roeck <linux@roeck-us.net> CC: Thomas Gleixner <tglx@linutronix.de> Tested-by: NIsrael Rukshin <israelr@mellanox.com> Reported-by: Nkbuild test robot <lkp@intel.com> Reported-by: NRandy Dunlap <rdunlap@infradead.org> Tested-by: NRandy Dunlap <rdunlap@infradead.org> Acked-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 5月, 2018 2 次提交
-
-
由 Waiman Long 提交于
The filesystem freezing code needs to transfer ownership of a rwsem embedded in a percpu-rwsem from the task that does the freezing to another one that does the thawing by calling percpu_rwsem_release() after freezing and percpu_rwsem_acquire() before thawing. However, the new rwsem debug code runs afoul with this scheme by warning that the task that releases the rwsem isn't the one that acquires it, as reported by Amir Goldstein: DEBUG_LOCKS_WARN_ON(sem->owner != get_current()) WARNING: CPU: 1 PID: 1401 at /home/amir/build/src/linux/kernel/locking/rwsem.c:133 up_write+0x59/0x79 Call Trace: percpu_up_write+0x1f/0x28 thaw_super_locked+0xdf/0x120 do_vfs_ioctl+0x270/0x5f1 ksys_ioctl+0x52/0x71 __x64_sys_ioctl+0x16/0x19 do_syscall_64+0x5d/0x167 entry_SYSCALL_64_after_hwframe+0x49/0xbe To work properly with the rwsem debug code, we need to annotate that the rwsem ownership is unknown during the tranfer period until a brave soul comes forward to acquire the ownership. During that period, optimistic spinning will be disabled. Reported-by: NAmir Goldstein <amir73il@gmail.com> Tested-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NWaiman Long <longman@redhat.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Theodore Y. Ts'o <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-fsdevel@vger.kernel.org Link: http://lkml.kernel.org/r/1526420991-21213-3-git-send-email-longman@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Lidong Chen 提交于
User-space may invoke ibv_reg_mr and ibv_dereg_mr in different threads. If ibv_dereg_mr is called after the thread which invoked ibv_reg_mr has exited, get_pid_task will return NULL and ib_umem_release will not decrease mm->pinned_vm. Instead of using threads to locate the mm, use the overall tgid from the ib_ucontext struct instead. This matches the behavior of ODP and disassociate in handling the mm of the process that called ibv_reg_mr. Cc: <stable@vger.kernel.org> Fixes: 87773dd5 ("IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_get") Signed-off-by: NLidong Chen <lidongchen@tencent.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
- 15 5月, 2018 2 次提交
-
-
由 Geert Uytterhoeven 提交于
The __DIVIDE() macro checks whether it is called with a 32-bit or 64-bit dividend, to select the appropriate divide-and-round-up routine. As the check uses the ternary operator, the result will always be promoted to a type that can hold both results, i.e. unsigned long long. When using this result in a division on a 32-bit system, this may lead to link errors like: ERROR: "__udivdi3" [drivers/mtd/nand/raw/nand.ko] undefined! Fix this by casting the result of the division to the type of the dividend. Fixes: 8878b126 ("mtd: nand: add ->exec_op() implementation") Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NBoris Brezillon <boris.brezillon@bootlin.com>
-
由 Steven Rostedt (VMware) 提交于
Doing an audit of trace events, I discovered two trace events in the xen subsystem that use a hack to create zero data size trace events. This is not what trace events are for. Trace events add memory footprint overhead, and if all you need to do is see if a function is hit or not, simply make that function noinline and use function tracer filtering. Worse yet, the hack used was: __array(char, x, 0) Which creates a static string of zero in length. There's assumptions about such constructs in ftrace that this is a dynamic string that is nul terminated. This is not the case with these tracepoints and can cause problems in various parts of ftrace. Nuke the trace events! Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 95a7d768 ("xen/mmu: Use Xen specific TLB flush instead of the generic one.") Reviewed-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 14 5月, 2018 3 次提交
-
-
由 David Howells 提交于
Add a tracepoint to record callbacks from servers for which we don't have a record. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 Ben Hutchings 提交于
Commit 9e343e87 ("mtd: cfi: convert inline functions to macros") changed map_word_andequal() into a macro, but also changed the right hand side of the comparison from val3 to val2. Change it back to use val3 on the right hand side. Thankfully this did not cause a regression because all callers currently pass the same argument for val2 and val3. Fixes: 9e343e87 ("mtd: cfi: convert inline functions to macros") Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NBoris Brezillon <boris.brezillon@bootlin.com>
-
由 Ard Biesheuvel 提交于
Mixed mode allows a kernel built for x86_64 to interact with 32-bit EFI firmware, but requires us to define all struct definitions carefully when it comes to pointer sizes. 'struct efi_pci_io_protocol_32' currently uses a 'void *' for the 'romimage' field, which will be interpreted as a 64-bit field on such kernels, potentially resulting in bogus memory references and subsequent crashes. Tested-by: NHans de Goede <hdegoede@redhat.com> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20180504060003.19618-13-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 5月, 2018 3 次提交
-
-
Since commit c1adf200 ("Introduce rb_replace_node_rcu()") rbtree_augmented.h uses RCU related data structures but does not include the header file. It works as long as it gets somehow included before that and fails otherwise. Link: http://lkml.kernel.org/r/20180504103159.19938-1-bigeasy@linutronix.deSigned-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
Since exit_mmap() is done without the protection of mm->mmap_sem, it is possible for the oom reaper to concurrently operate on an mm until MMF_OOM_SKIP is set. This allows munlock_vma_pages_all() to concurrently run while the oom reaper is operating on a vma. Since munlock_vma_pages_range() depends on clearing VM_LOCKED from vm_flags before actually doing the munlock to determine if any other vmas are locking the same memory, the check for VM_LOCKED in the oom reaper is racy. This is especially noticeable on architectures such as powerpc where clearing a huge pmd requires serialize_against_pte_lookup(). If the pmd is zapped by the oom reaper during follow_page_mask() after the check for pmd_none() is bypassed, this ends up deferencing a NULL ptl or a kernel oops. Fix this by manually freeing all possible memory from the mm before doing the munlock and then setting MMF_OOM_SKIP. The oom reaper can not run on the mm anymore so the munlock is safe to do in exit_mmap(). It also matches the logic that the oom reaper currently uses for determining when to set MMF_OOM_SKIP itself, so there's no new risk of excessive oom killing. This issue fixes CVE-2018-1000200. Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.corp.google.com Fixes: 21292580 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: NDavid Rientjes <rientjes@google.com> Suggested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Al Viro 提交于
For anything NFS-exported we do _not_ want to unlock new inode before it has grown an alias; original set of fixes got the ordering right, but missed the nasty complication in case of lockdep being enabled - unlock_new_inode() does lockdep_annotate_inode_mutex_key(inode) which can only be done before anyone gets a chance to touch ->i_mutex. Unfortunately, flipping the order and doing unlock_new_inode() before d_instantiate() opens a window when mkdir can race with open-by-fhandle on a guessed fhandle, leading to multiple aliases for a directory inode and all the breakage that follows from that. Correct solution: a new primitive (d_instantiate_new()) combining these two in the right order - lockdep annotate, then d_instantiate(), then the rest of unlock_new_inode(). All combinations of d_instantiate() with unlock_new_inode() should be converted to that. Cc: stable@kernel.org # 2.6.29 and later Tested-by: NMike Marshall <hubcap@omnibond.com> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 11 5月, 2018 4 次提交
-
-
由 Debabrata Banerjee 提交于
There was a regression at some point from the intended functionality of commit f60c3704 ("bonding: Fix alb mode to only use first level vlans.") Given the return value vlan_get_encap_level() we need to store the nest level of the bond device, and then compare the vlan's encap level to this. Without this, this check always fails and learning packets are never sent. In addition, this same commit caused a regression in the behavior of balance_alb, which requires learning packets be sent for all interfaces using the slave's mac in order to load balance properly. For vlan's that have not set a user mac, we can send after checking one bit. Otherwise we need send the set mac, albeit defeating rx load balancing for that vlan. Signed-off-by: NDebabrata Banerjee <dbanerje@akamai.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wanpeng Li 提交于
Our virtual machines make use of device assignment by configuring 12 NVMe disks for high I/O performance. Each NVMe device has 129 MSI-X Table entries: Capabilities: [50] MSI-X: Enable+ Count=129 Masked-Vector table: BAR=0 offset=00002000 The windows virtual machines fail to boot since they will map the number of MSI-table entries that the NVMe hardware reported to the bus to msi routing table, this will exceed the 1024. This patch extends MAX_IRQ_ROUTES to 4096 for all archs, in the future this might be extended again if needed. Reviewed-by: NCornelia Huck <cohuck@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim KrÄmář <rkrcmar@redhat.com> Cc: Cornelia Huck <cohuck@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NTonny Lu <tonnylu@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Howells 提交于
Add a tracepoint to log transmission failure from the UDP transport socket being used by AF_RXRPC. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Add a tracepoint to log received ICMP/ICMP6 events and other error messages. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
- 10 5月, 2018 1 次提交
-
-
由 Ilya Dryomov 提交于
... and store num_bvecs for client code's convenience. Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NJeff Layton <jlayton@redhat.com> Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
-
- 09 5月, 2018 1 次提交
-
-
由 Pablo Neira Ayuso 提交于
When removing a rule that jumps to chain and such chain in the same batch, this bogusly hits EBUSY. Add activate and deactivate operations to expression that can be called from the preparation and the commit/abort phases. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 08 5月, 2018 1 次提交
-
-
由 Wolfram Sang 提交于
Signed-off-by: NWolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 5月, 2018 1 次提交
-
-
由 Randy Dunlap 提交于
Fix 88 instances of a kernel-doc warning: ../include/net/mac80211.h:2083: warning: bad line: > Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: linux-wireless@vger.kernel.org Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
- 05 5月, 2018 4 次提交
-
-
由 Thomas Gleixner 提交于
The migitation control is simpler to implement in architecture code as it avoids the extra function call to check the mode. Aside of that having an explicit seccomp enabled mode in the architecture mitigations would require even more workarounds. Move it into architecture code and provide a weak function in the seccomp code. Remove the 'which' argument as this allows the architecture to decide which mitigations are relevant for seccomp. Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Kees Cook 提交于
If a seccomp user is not interested in Speculative Store Bypass mitigation by default, it can set the new SECCOMP_FILTER_FLAG_SPEC_ALLOW flag when adding filters. Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
For certain use cases it is desired to enforce mitigations so they cannot be undone afterwards. That's important for loader stubs which want to prevent a child from disabling the mitigation again. Will also be used for seccomp(). The extra state preserving of the prctl state for SSB is a preparatory step for EBPF dymanic speculation control. Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Bhadram Varka 提交于
It adds support for BCM89610 (Single-Port 10/100/1000BASE-T) transceiver which is used in P3310 Tegra186 platform. Signed-off-by: NBhadram Varka <vbhadram@nvidia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 5月, 2018 2 次提交
-
-
由 Mauro Carvalho Chehab 提交于
From now on, I'll start using my @kernel.org as my development e-mail. As such, let's remove the entries that point to the old mchehab@s-opensource.com at MAINTAINERS file. For the files written with a copyright with mchehab@s-opensource, let's keep Samsung on their names, using mchehab+samsung@kernel.org, in order to keep pointing to my employer, with sponsors the work. For the files written before I join Samsung (on July, 4 2013), let's just use mchehab@kernel.org. For bug reports, we can simply point to just kernel.org, as this will reach my mchehab+samsung inbox anyway. Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: NBrian Warner <brian.warner@samsung.com> Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
-
由 Peter Zijlstra 提交于
Gaurav reported a perceived problem with TASK_PARKED, which turned out to be a broken wait-loop pattern in __kthread_parkme(), but the reported issue can (and does) in fact happen for states that do not do condition based sleeps. When the 'current->state = TASK_RUNNING' store of a previous (concurrent) try_to_wake_up() collides with the setting of a 'special' sleep state, we can loose the sleep state. Normal condition based wait-loops are immune to this problem, but for sleep states that are not condition based are subject to this problem. There already is a fix for TASK_DEAD. Abstract that and also apply it to TASK_STOPPED and TASK_TRACED, both of which are also without condition based wait-loop. Reported-by: NGaurav Kohli <gkohli@codeaurora.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-