- 23 7月, 2016 2 次提交
-
-
由 Andrey Ryabinin 提交于
radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags. Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot() leading to crash: RIP: radix_tree_next_slot include/linux/radix-tree.h:473 find_get_pages_tag+0x334/0x930 mm/filemap.c:1452 .... Call Trace: pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960 mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516 ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736 do_writepages+0x97/0x100 mm/page-writeback.c:2364 __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300 filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490 ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115 vfs_fsync_range+0x10a/0x250 fs/sync.c:195 vfs_fsync fs/sync.c:209 do_fsync+0x42/0x70 fs/sync.c:219 SYSC_fdatasync fs/sync.c:232 SyS_fdatasync+0x19/0x20 fs/sync.c:230 entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207 We must reset iterator's tags to bail out from radix_tree_next_slot() and go to the slow-path in radix_tree_next_chunk(). Fixes: 46437f9a ("radix-tree: fix race in gang lookup") Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Acked-by: NKonstantin Khlebnikov <koct9i@gmail.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> -
由 Johannes Weiner 提交于
The memory controller has quite a bit of state that usually outlives the cgroup and pins its CSS until said state disappears. At the same time it imposes a 16-bit limit on the CSS ID space to economically store IDs in the wild. Consequently, when we use cgroups to contain frequent but small and short-lived jobs that leave behind some page cache, we quickly run into the 64k limitations of outstanding CSSs. Creating a new cgroup fails with -ENOSPC while there are only a few, or even no user-visible cgroups in existence. Although pinning CSSs past cgroup removal is common, there are only two instances that actually need an ID after a cgroup is deleted: cache shadow entries and swapout records. Cache shadow entries reference the ID weakly and can deal with the CSS having disappeared when it's looked up later. They pose no hurdle. Swap-out records do need to pin the css to hierarchically attribute swapins after the cgroup has been deleted; though the only pages that remain swapped out after offlining are tmpfs/shmem pages. And those references are under the user's control, so they are manageable. This patch introduces a private 16-bit memcg ID and switches swap and cache shadow entries over to using that. This ID can then be recycled after offlining when the CSS remains pinned only by objects that don't specifically need it. This script demonstrates the problem by faulting one cache page in a new cgroup and deleting it again: set -e mkdir -p pages for x in `seq 128000`; do [ $((x % 1000)) -eq 0 ] && echo $x mkdir /cgroup/foo echo $$ >/cgroup/foo/cgroup.procs echo trex >pages/$x echo $$ >/cgroup/cgroup.procs rmdir /cgroup/foo done When run on an unpatched kernel, we eventually run out of possible IDs even though there are no visible cgroups: [root@ham ~]# ./cssidstress.sh [...] 65000 mkdir: cannot create directory '/cgroup/foo': No space left on device After this patch, the IDs get released upon cgroup destruction and the cache and css objects get released once memory reclaim kicks in. [hannes@cmpxchg.org: init the IDR] Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org Fixes: b2052564 ("mm: memcontrol: continue cache reclaim from offlined groups") Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reported-by: NJohn Garcia <john.garcia@mesosphere.io> Reviewed-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NTejun Heo <tj@kernel.org> Cc: Nikolay Borisov <kernel@kyup.com> Cc: <stable@vger.kernel.org> [3.19+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 7月, 2016 1 次提交
-
-
由 Paolo Abeni 提交于
macsec can't cope with mtu frames which need vlan tag insertion, and vlan device set the default mtu equal to the underlying dev's one. By default vlan over macsec devices use invalid mtu, dropping all the large packets. This patch adds a netif helper to check if an upper vlan device needs mtu reduction. The helper is used during vlan devices initialization to set a valid default and during mtu updating to forbid invalid, too bit, mtu values. The helper currently only check if the lower dev is a macsec device, if we get more users, we need to update only the helper (possibly reserving an additional IFF bit). Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 7月, 2016 2 次提交
-
-
由 Hugh Dickins 提交于
The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's worth: the syzkaller fuzzer hit it again. It's still wrong for some THP cases, because linear_page_index() was never intended to apply to addresses before the start of a vma. That's easily fixed with a signed long cast inside linear_page_index(); and Dmitry has tested such a patch, to verify the false positive. But why extend linear_page_index() just for this case? when the avoidance in page_move_anon_rmap() has already grown ugly, and there's no reason for the check at all (nothing else there is using address or index). Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE, remove CONFIG_DEBUG_VM PageTransHuge adjustment. And one more thing: should the compound_head(page) be done inside or outside page_move_anon_rmap()? It's usually pushed down to the lowest level nowadays (and mm/memory.c shows no other explicit use of it), so I think it's better done in page_move_anon_rmap() than by caller. Fixes: 0798d3c0 ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()") Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvilsSigned-off-by: NHugh Dickins <hughd@google.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
I found a race condition triggering VM_BUG_ON() in freeze_page(), when running a testcase with 3 processes: - process 1: keep writing thp, - process 2: keep clearing soft-dirty bits from virtual address of process 1 - process 3: call migratepages for process 1, The kernel message is like this: kernel BUG at /src/linux-dev/mm/huge_memory.c:3096! invalid opcode: 0000 [#1] SMP Modules linked in: cfg80211 rfkill crc32c_intel ppdev serio_raw pcspkr virtio_balloon virtio_console parport_pc parport pvpanic acpi_cpufreq tpm_tis tpm i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy virtio_pci virtio_ring virtio CPU: 0 PID: 28863 Comm: migratepages Not tainted 4.6.0-v4.6-160602-0827-+ #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff880037320000 ti: ffff88007cdd0000 task.ti: ffff88007cdd0000 RIP: 0010:[<ffffffff811f8e06>] [<ffffffff811f8e06>] split_huge_page_to_list+0x496/0x590 RSP: 0018:ffff88007cdd3b70 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff88007c7b88c0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000700000200 RDI: ffffea0003188000 RBP: ffff88007cdd3bb8 R08: 0000000000000001 R09: 00003ffffffff000 R10: ffff880000000000 R11: ffffc000001fffff R12: ffffea0003188000 R13: ffffea0003188000 R14: 0000000000000000 R15: 0400000000000080 FS: 00007f8ec241d740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8ec1f3ed20 CR3: 000000003707b000 CR4: 00000000000006f0 Call Trace: ? list_del+0xd/0x30 queue_pages_pte_range+0x4d1/0x590 __walk_page_range+0x204/0x4e0 walk_page_range+0x71/0xf0 queue_pages_range+0x75/0x90 ? queue_pages_hugetlb+0x190/0x190 ? new_node_page+0xc0/0xc0 ? change_prot_numa+0x40/0x40 migrate_to_node+0x71/0xd0 do_migrate_pages+0x1c3/0x210 SyS_migrate_pages+0x261/0x290 entry_SYSCALL_64_fastpath+0x1a/0xa4 Code: e8 b0 87 fb ff 0f 0b 48 c7 c6 30 32 9f 81 e8 a2 87 fb ff 0f 0b 48 c7 c6 b8 46 9f 81 e8 94 87 fb ff 0f 0b 85 c0 0f 84 3e fd ff ff <0f> 0b 85 c0 0f 85 a6 00 00 00 48 8b 75 c0 4c 89 f7 41 be f0 ff RIP split_huge_page_to_list+0x496/0x590 I'm not sure of the full scenario of the reproduction, but my debug showed that split_huge_pmd_address(freeze=true) returned without running main code of pmd splitting because pmd_present(*pmd) in precheck somehow returned 0. If this happens, the subsequent try_to_unmap() fails and returns non-zero (because page_mapcount() still > 0), and finally VM_BUG_ON() fires. This patch tries to fix it by prechecking pmd state inside ptl. Link: http://lkml.kernel.org/r/1466990929-7452-1-git-send-email-n-horiguchi@ah.jp.nec.comSigned-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 7月, 2016 4 次提交
-
-
由 Frederic Weisbecker 提交于
The vtime irqtime accounting headers are very scattered and convoluted right now. Reorganize them such that it is obvious that only CONFIG_VIRT_CPU_ACCOUNTING_NATIVE does use it. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1468421405-20056-5-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Frederic Weisbecker 提交于
Vtime generic irqtime accounting has been removed but there are a few remnants to clean up: * The vtime_accounting_cpu_enabled() check in irq entry was only used by CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can safely remove it. * Without the vtime_accounting_cpu_enabled(), we no longer need to have a vtime_common_account_irq_enter() indirect function. * Move vtime_account_irq_enter() implementation under CONFIG_VIRT_CPU_ACCOUNTING_NATIVE which is the last user. * The vtime_account_user() call was only used on irq entry for CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can remove that too. Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1468421405-20056-4-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not appear to currently work right. On CPUs without nohz_full=, only tick based irq time sampling is done, which breaks down when dealing with a nohz_idle CPU. On firewalls and similar systems, no ticks may happen on a CPU for a while, and the irq time spent may never get accounted properly. This can cause issues with capacity planning and power saving, which use the CPU statistics as inputs in decision making. Remove the VTIME_GEN vtime irq time code, and replace it with the IRQ_TIME_ACCOUNTING code, when selected as a config option by the user. Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Link: http://lkml.kernel.org/r/1468421405-20056-3-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Willem de Bruijn 提交于
Sockets can have a filter program attached that drops or trims incoming packets based on the filter program return value. Rose requires data packets to have at least ROSE_MIN_LEN bytes. It verifies this on arrival in rose_route_frame and unconditionally pulls the bytes in rose_recvmsg. The filter can trim packets to below this value in-between, causing pull to fail, leaving the partial header at the time of skb_copy_datagram_msg. Place a lower bound on the size to which sk_filter may trim packets by introducing sk_filter_trim_cap and call this for rose packets. Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 7月, 2016 1 次提交
-
-
由 Jeff Layton 提交于
Currently the two are unioned together, but I don't think that's safe. It looks like get_cached_acl could race with the last put in posix_acl_release. get_cached_acl calls atomic_inc_not_zero on a_refcount, but that field could have already been clobbered by call_rcu, and may no longer be zero. Fix this by de-unioning the two fields. Fixes: b8a7a3a6 (posix_acl: Inode acl caching fixes) Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 11 7月, 2016 2 次提交
-
-
由 Ingo Molnar 提交于
This reverts commit 2c95afc1. Stephane reported the following regression: > Since Andi added: > > commit 2c95afc1 > Author: Andi Kleen <ak@linux.intel.com> > Date: Thu Jun 9 06:14:38 2016 -0700 > > perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86 > > $ perf stat -e ref-cycles ls > <not counted> .... > > fails systematically because the ref-cycles is now used by the > watchdog and given this is a system-wide pinned event, it monopolizes > the fixed counter 2 which is the only counter able to measure this event. Since the next merge window is near, fix the regression for now by reverting the commit. Reported-by: NStephane Eranian <eranian@google.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Lukas Wunner 提交于
The EFI firmware on Macs contains a full-fledged network stack for downloading OS X images from osrecovery.apple.com. Unfortunately on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331 wireless card on every boot and leaves it enabled even after ExitBootServices has been called. The card continues to assert its IRQ line, causing spurious interrupts if the IRQ is shared. It also corrupts memory by DMAing received packets, allowing for remote code execution over the air. This only stops when a driver is loaded for the wireless card, which may be never if the driver is not installed or blacklisted. The issue seems to be constrained to the Broadcom 4331. Chris Milsted has verified that the newer Broadcom 4360 built into the MacBookPro11,3 (2013/2014) does not exhibit this behaviour. The chances that Apple will ever supply a firmware fix for the older machines appear to be zero. The solution is to reset the card on boot by writing to a reset bit in its mmio space. This must be done as an early quirk and not as a plain vanilla PCI quirk to successfully combat memory corruption by DMAed packets: Matthew Garrett found out in 2012 that the packets are written to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html). This type of memory is made available to the page allocator by efi_free_boot_services(). Plain vanilla PCI quirks run much later, in subsys initcall level. In-between a time window would be open for memory corruption. Random crashes occurring in this time window and attributed to DMAed packets have indeed been observed in the wild by Chris Bainbridge. When Matthew Garrett analyzed the memory corruption issue in 2012, he sought to fix it with a grub quirk which transitions the card to D3hot: http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56 This approach does not help users with other bootloaders and while it may prevent DMAed packets, it does not cure the spurious interrupts emanating from the card. Unfortunately the card's mmio space is inaccessible in D3hot, so to reset it, we have to undo the effect of Matthew's grub patch and transition the card back to D0. Note that the quirk takes a few shortcuts to reduce the amount of code: The size of BAR 0 and the location of the PM capability is identical on all affected machines and therefore hardcoded. Only the address of BAR 0 differs between models. Also, it is assumed that the BCMA core currently mapped is the 802.11 core. The EFI driver seems to always take care of this. Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback towards finding the best solution to this problem. The following should be a comprehensive list of affected models: iMac13,1 2012 21.5" [Root Port 00:1c.3 = 8086:1e16] iMac13,2 2012 27" [Root Port 00:1c.3 = 8086:1e16] Macmini5,1 2011 i5 2.3 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,2 2011 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini5,3 2011 i7 2.0 GHz [Root Port 00:1c.1 = 8086:1c12] Macmini6,1 2012 i5 2.5 GHz [Root Port 00:1c.1 = 8086:1e12] Macmini6,2 2012 i7 2.3 GHz [Root Port 00:1c.1 = 8086:1e12] MacBookPro8,1 2011 13" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,2 2011 15" [Root Port 00:1c.1 = 8086:1c12] MacBookPro8,3 2011 17" [Root Port 00:1c.1 = 8086:1c12] MacBookPro9,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro9,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,1 2012 15" [Root Port 00:1c.1 = 8086:1e12] MacBookPro10,2 2012 13" [Root Port 00:1c.1 = 8086:1e12] For posterity, spurious interrupts caused by the Broadcom 4331 wireless card resulted in splats like this (stacktrace omitted): irq 17: nobody cared (try booting with the "irqpoll" option) handlers: [<ffffffff81374370>] pcie_isr [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci] [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec] Disabling IRQ #17 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130 Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732 Tested-by: Konstantin Simanov <k.simanov@stlk.ru> # [MacBookPro8,1] Tested-by: Lukas Wunner <lukas@wunner.de> # [MacBookPro9,1] Tested-by: Bryan Paradis <bryan.paradis@gmail.com> # [MacBookPro9,2] Tested-by: Andrew Worsley <amworsley@gmail.com> # [MacBookPro10,1] Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2] Signed-off-by: NLukas Wunner <lukas@wunner.de> Acked-by: NRafał Miłecki <zajec5@gmail.com> Acked-by: NMatt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chris Milsted <cmilsted@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Michael Buesch <m@bues.ch> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> Cc: b43-dev@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: stable@vger.kernel.org Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Apply nvidia_bugs quirk only on root bus Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Reintroduce scanning of secondary buses Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de [ Did minor readability edits. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 7月, 2016 1 次提交
-
-
由 Paolo Bonzini 提交于
Thanks to all the work that was done by Andy Lutomirski and others, enter_from_user_mode() and prepare_exit_to_usermode() are now called only with interrupts disabled. Let's provide them a version of user_enter()/user_exit() that skips saving and restoring the interrupt flag. On an AMD-based machine I tested this patch on, with force-enabled context tracking, the speed-up in system calls was 90 clock cycles or 6%, measured with the following simple benchmark: #include <sys/signal.h> #include <time.h> #include <unistd.h> #include <stdio.h> unsigned long rdtsc() { unsigned long result; asm volatile("rdtsc; shl $32, %%rdx; mov %%eax, %%eax\n" "or %%rdx, %%rax" : "=a" (result) : : "rdx"); return result; } int main() { unsigned long tsc1, tsc2; int pid = getpid(); int i; tsc1 = rdtsc(); for (i = 0; i < 100000000; i++) kill(pid, SIGWINCH); tsc2 = rdtsc(); printf("%ld\n", tsc2 - tsc1); } Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NRik van Riel <riel@redhat.com> Reviewed-by: NAndy Lutomirski <luto@kernel.org> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Link: http://lkml.kernel.org/r/1466434712-31440-2-git-send-email-pbonzini@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 7月, 2016 3 次提交
-
-
由 Ingo Molnar 提交于
Re-organize the GUID table so that every GUID takes a single line. This makes each line super long, but if you have a large enough terminal (or zoom out of a small terminal) then you can see the structure at a glance - which is more readable than it was the case with the multi-line layout. Acked-by: NMatt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Joe Perches <joe@perches.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Jones <pjones@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20160627104920.GA9099@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dmitry Safonov 提交于
Add possibility for 32-bit user-space applications to move the vDSO mapping. Previously, when a user-space app called mremap() for the vDSO address, in the syscall return path it would land on the previous address of the vDSOpage, resulting in segmentation violation. Now it lands fine and returns to userspace with a remapped vDSO. This will also fix the context.vdso pointer for 64-bit, which does not affect the user of vDSO after mremap() currently, but this may change in the future. As suggested by Andy, return -EINVAL for mremap() that would split the vDSO image: that operation cannot possibly result in a working system so reject it. Renamed and moved the text_mapping structure declaration inside map_vdso(), as it used only there and now it complements the vvar_mapping variable. There is still a problem for remapping the vDSO in glibc applications: the linker relocates addresses for syscalls on the vDSO page, so you need to relink with the new addresses. Without that the next syscall through glibc may fail: Program received signal SIGSEGV, Segmentation fault. #0 0xf7fd9b80 in __kernel_vsyscall () #1 0xf7ec8238 in _exit () from /usr/lib32/libc.so.6 Signed-off-by: NDmitry Safonov <dsafonov@virtuozzo.com> Acked-by: NAndy Lutomirski <luto@kernel.org> Cc: 0x7f454c46@gmail.com Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160628113539.13606-2-dsafonov@virtuozzo.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Have printk*once() return a bool which denotes whether the string was printed or not so that calling code can react accordingly. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1467671487-10344-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 07 7月, 2016 1 次提交
-
-
由 Davidlohr Bueso 提交于
With the inclusion of atomic FETCH-OP variants, many places in the kernel can make use of atomic_fetch_$op() to avoid the callers that need to compute the value/state _before_ the operation. Peter Zijlstra laid out the machinery but we are still missing the simpler dec,inc() calls (which future patches will make use of). This patch only deals with the generic code, as at least right now no arch actually implement them -- which is similar to what the OP-RETURN primitives currently do. Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: James.Bottomley@HansenPartnership.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: awalls@md.metrocast.net Cc: bp@alien8.de Cc: cw00.choi@samsung.com Cc: davem@davemloft.net Cc: dledford@redhat.com Cc: dougthompson@xmission.com Cc: gregkh@linuxfoundation.org Cc: hans.verkuil@cisco.com Cc: heiko.carstens@de.ibm.com Cc: jikos@kernel.org Cc: kys@microsoft.com Cc: mchehab@osg.samsung.com Cc: pfg@sgi.com Cc: schwidefsky@de.ibm.com Cc: sean.hefty@intel.com Cc: sumit.semwal@linaro.org Link: http://lkml.kernel.org/r/20160628215651.GA20048@linux-80c1.suseSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 04 7月, 2016 1 次提交
-
-
由 David Lechner 提交于
The initial use for this is for PHYs that have a mode related to USB OTG. There are several SoCs (e.g. TI OMAP and DA8xx) that have a mode setting in the USB PHY to override OTG VBUS and ID signals. Of course, the enum can be expaned in the future to include modes for other types of PHYs as well. Suggested-by: NKishon Vijay Abraham I <kishon@ti.com> Signed-off-by: NDavid Lechner <david@lechnology.com>
-
- 03 7月, 2016 1 次提交
-
-
由 Linus Walleij 提交于
Leonard Crestez observed the following phenomenon: when using hard interrupt triggers (the DRDY line coming out of an ST sensor) sometimes a new value would arrive while reading the previous value, due to latencies in the system. We discovered that the ST hardware as far as can be observed is designed for level interrupts: the DRDY line will be held asserted as long as there are new values coming. The interrupt handler should be re-entered until we're out of values to handle from the sensor. If interrupts were handled as occurring on the edges (usually low-to-high) new values could appear and the line be held asserted after that, and these values would be missed, the interrupt handler would also lock up as new data was available, but as no new edges occurs on the DRDY signal, nothing happens: the edge detector only detects edges. To counter this, do the following: - Accept interrupt lines to be flagged as level interrupts using IRQF_TRIGGER_HIGH and IRQF_TRIGGER_LOW. If the line is marked like this (in the device tree node or ACPI table or similar) it will be utilized as a level IRQ. We mark the line with IRQF_ONESHOT and mask the IRQ while processing a sample, then the top half will be entered again if new values are available. - If we are flagged as using edge interrupts with IRQF_TRIGGER_RISING or IRQF_TRIGGER_FALLING: remove IRQF_ONESHOT so that the interrupt line is not masked while running the thread part of the interrupt. This way we will never miss an interrupt, then introduce a loop that polls the data ready registers repeatedly until no new samples are available, then exit the interrupt handler. This way we know no new values are available when the interrupt handler exits and new (edge) interrupts will be triggered when data arrives. Take some extra care to update the timestamp in the poll loop if this happens. The timestamp will not be 100% perfect, but it will at least be closer to the actual events. Usually the extra poll loop will handle the new samples, but once in a blue moon, we get a new IRQ while exiting the loop, before returning from the thread IRQ bottom half with IRQ_HANDLED. On these rare occasions, the removal of IRQF_ONESHOT means the interrupt will immediately fire again. - If no interrupt type is indicated from the DT/ACPI, choose IRQF_TRIGGER_RISING as default, as this is necessary for legacy boards. Tested successfully on the LIS331DL and L3G4200D by setting sampling frequency to 400Hz/800Hz and stressing the system: extra reads in the threaded interrupt handler occurs. Cc: Giuseppe Barba <giuseppe.barba@st.com> Cc: Denis Ciocca <denis.ciocca@st.com> Tested-by: NCrestez Dan Leonard <cdleonard@gmail.com> Reported-by: NCrestez Dan Leonard <cdleonard@gmail.com> Signed-off-by: NLinus Walleij <linus.walleij@linaro.org> Signed-off-by: NJonathan Cameron <jic23@kernel.org>
-
- 02 7月, 2016 3 次提交
-
-
由 Venkat Reddy Talla 提交于
adding suspend and resume funtionality for extcon-adc-jack driver to configure system wake up for extcon events, also adding support to enable/disable system wakeup through flag wakeup_source based on platform requirement. Signed-off-by: NVenkat Reddy Talla <vreddytalla@nvidia.com> Signed-off-by: NChanwoo Choi <cw00.choi@samsung.com>
-
由 WANG Cong 提交于
Similar to commit 9b368814 ("net: fix bridge multicast packet checksum validation") we need to fixup the checksum for CHECKSUM_COMPLETE when pushing skb on RX path. Otherwise we get similar splats. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Acked-by: NJamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
People who use PACKET_FANOUT_HASH want a symmetric hash, meaning that they want packets going in both directions on a flow to hash to the same bucket. The core kernel SKB hash became non-symmetric when the ipv6 flow label and other entities were incorporated into the standard flow hash order to increase entropy. But there are no users of PACKET_FANOUT_HASH who want an assymetric hash, they all want a symmetric one. Therefore, use the flow dissector to compute a flat symmetric hash over only the protocol, addresses and ports. This hash does not get installed into and override the normal skb hash, so this change has no effect whatsoever on the rest of the stack. Reported-by: NEric Leblond <eric@regit.org> Tested-by: NEric Leblond <eric@regit.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 7月, 2016 2 次提交
-
-
由 Mohamad Haj Yahia 提交于
The current implementation does not handle timeout in case of command with callback request, and this can lead to deadlock if the command doesn't get fw response. Add delayed callback timeout work before posting the command to fw. In case of real fw command completion we will cancel the delayed work. In case of fw command timeout the callback timeout handler will be called and it will simulate fw completion with timeout error. Fixes: e126ba97 ('mlx5: Add driver for Mellanox Connect-IB adapters') Signed-off-by: NMohamad Haj Yahia <mohamad@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gregor Boirie 提交于
Adds a new per-device sysfs attribute "current_timestamp_clock" to allow userspace to select a particular POSIX clock for buffered samples and events timestamping. Following clocks, as listed in clock_gettime(2), are supported: CLOCK_REALTIME, CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_REALTIME_COARSE, CLOCK_MONOTONIC_COARSE, CLOCK_BOOTTIME and CLOCK_TAI. Signed-off-by: NGregor Boirie <gregor.boirie@parrot.com> Acked-by: NSanchayan Maity <maitysanchayan@gmail.com> Signed-off-by: NJonathan Cameron <jic23@kernel.org>
-
- 30 6月, 2016 6 次提交
-
-
由 Steve Twiss 提交于
Fix compiler warning caused by an uninitialised variable inside da9052_group_write() function. Defaulting the value to zero covers the trivial case. Signed-off-by: NSteve Twiss <stwiss.opensource@diasemi.com> Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NLee Jones <lee.jones@linaro.org>
-
由 Lee Jones 提交于
Standardise the way inline functions: devm_reset_control_get_shared_by_index devm_reset_control_get_exclusive_by_index ... are formatted. Signed-off-by: NLee Jones <lee.jones@linaro.org> Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
-
由 Lee Jones 提交于
Consumers need to be able to specify whether they are requesting an 'exclusive' or 'shared' reset line no matter which API (of_*, devm_*, etc) they are using. This change allows users of the optional_* API in particular to specify that their request is for a 'shared' line. Signed-off-by: NLee Jones <lee.jones@linaro.org> Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
-
由 Lee Jones 提交于
Consumers need to be able to specify whether they are requesting an 'exclusive' or 'shared' reset line no matter which API (of_*, devm_*, etc) they are using. This change allows users of the of_* API in particular to specify that their request is for a 'shared' line. Signed-off-by: NLee Jones <lee.jones@linaro.org> Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
-
由 Lee Jones 提交于
Phasing out generic reset line requests enables us to make some better decisions on when and how to (de)assert said lines. If an 'exclusive' line is requested, we know a device *requires* a reset and that it's preferable to act upon a request right away. However, if a 'shared' reset line is requested, we can reasonably assume sure that placing a device into reset isn't a hard requirement, but probably a measure to save power and is thus able to cope with not being asserted if another device is still in use. In order allow gentle adoption and not to forcing all consumers to move to the API immediately, causing administration headache between subsystems, this patch adds some temporary stand-in shim-calls. This will ease the burden at merge time and allow subsystems to migrate over to the new API in a more realistic time-frame. Signed-off-by: NLee Jones <lee.jones@linaro.org> Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
-
由 Lee Jones 提交于
We're about to split the current API into two, where consumers will be forced to be explicit when requesting reset lines. The choice will be to either the call the *_exclusive or *_shared variant depending on whether they can actually tolorate not being asserted when that request is made. The new API will look like this once reorded and complete: reset_control_get_exclusive() reset_control_get_shared() reset_control_get_optional_exclusive() reset_control_get_optional_shared() of_reset_control_get_exclusive() of_reset_control_get_shared() of_reset_control_get_exclusive_by_index() of_reset_control_get_shared_by_index() devm_reset_control_get_exclusive() devm_reset_control_get_shared() devm_reset_control_get_optional_exclusive() devm_reset_control_get_optional_shared() devm_reset_control_get_exclusive_by_index() devm_reset_control_get_shared_by_index() Signed-off-by: NLee Jones <lee.jones@linaro.org> Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
-
- 29 6月, 2016 3 次提交
-
-
由 Chanwoo Choi 提交于
This patch fixes the wrong description about extcon_set/get_cable_state_() because they use the unique id of external connector instead of legacy name. Signed-off-by: NChanwoo Choi <cw00.choi@samsung.com> -
由 Daniel Borkmann 提交于
Commit dead9f29 ("perf: Fix race in BPF program unregister") moved destruction of BPF program from free_event_rcu() callback to __free_event(), which is problematic if used with tail calls: if prog A is attached as trace event directly, but at the same time present in a tail call map used by another trace event program elsewhere, then we need to delay destruction via RCU grace period since it can still be in use by the program doing the tail call (the prog first needs to be dropped from the tail call map, then trace event with prog A attached destroyed, so we get immediate destruction). Fixes: dead9f29 ("perf: Fix race in BPF program unregister") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Cc: Jann Horn <jann@thejh.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Richard Guy Briggs 提交于
The only users of audit_get_tty and audit_put_tty are internal to audit, so move it out of include/linux/audit.h to kernel.h and create a proper function rather than inlining it. This also reduces kABI changes. Suggested-by: NPaul Moore <pmoore@redhat.com> Signed-off-by: NRichard Guy Briggs <rgb@redhat.com> [PM: line wrapped description] Signed-off-by: NPaul Moore <paul@paul-moore.com>
-
- 28 6月, 2016 2 次提交
-
-
由 Willem de Bruijn 提交于
Diag intends to broadcast tcp_sk and udp_sk socket destruction. Testing sk->sk_protocol for IPPROTO_TCP/IPPROTO_UDP alone is not sufficient for this. Raw sockets can have the same type. Add a test for sk->sk_type. Fixes: eb4cb008 ("sock_diag: define destruction multicast groups") Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Frey 提交于
This driver implements support for the Sensirion SHT3x-DIS chip, a humidity and temperature sensor. Temperature is measured in degrees celsius, relative humidity is expressed as a percentage. In the sysfs interface, all values are scaled by 1000, i.e. the value for 31.5 degrees celsius is 31500. Signed-off-by: NPascal Sachs <pascal.sachs@sensirion.com> [groeck: Fixed 'Variable length array is used' gcc warning] Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
-
- 27 6月, 2016 5 次提交
-
-
由 Benjamin Marzinski 提交于
gfs2 needs to be able to skip the check to see if a page is outside of the file size when writing it out. gfs2 can get into a situation where it needs to flush its in-memory log to disk while a truncate is in progress. If the file being trucated has data journaling enabled, it is possible that there are data blocks in the log that are past the end of the file. gfs can't finish the log flush without either writing these blocks out or revoking them. Otherwise, if the node crashed, it could overwrite subsequent changes made by other nodes in the cluster when it's journal was replayed. Unfortunately, there is no way to add log entries to the log during a flush. So gfs2 simply writes out the page instead. This situation can only occur when the truncate code still has the file locked exclusively, and hasn't marked this block as free in the metadata (which happens later in truc_dealloc). After gfs2 writes this page out, the truncation code will shortly invalidate it and write out any revokes if necessary. In order to make this work, gfs2 needs to be able to skip the check for writes outside the file size. Since the check exists in block_write_full_page, this patch exports __block_write_full_page, which doesn't have the check. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
由 Chanwoo Choi 提交于
This patch adds the resource-managed functions for register/unregister the extcon notifier with the id of each external connector. This function will make it easy to handle the extcon notifier. - int devm_extcon_register_notifier(struct device *dev, struct extcon_dev *edev, unsigned int id, struct notifier_block *nb); - void devm_extcon_unregister_notifier(struct device *dev, struct extcon_dev *edev, unsigned int id, struct notifier_block *nb); Signed-off-by: NChanwoo Choi <cw00.choi@samsung.com> -
由 Chanwoo Choi 提交于
This patch moves the struct extcon_cable because that should be only handled by extcon core. There are no reason to publish the internal structure. Signed-off-by: NChanwoo Choi <cw00.choi@samsung.com> -
由 Arnd Bergmann 提交于
Nothing calls the efi_get_time() function on x86, but it does suffer from the 32-bit time_t overflow in 2038. This removes the function, we can always put it back in case we need it later. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk> Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1466839230-12781-8-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alex Thorlton 提交于
This commit makes a few slight modifications to the efi_call_virt() macro to get it to work with function pointers that are stored in locations other than efi.systab->runtime, and renames the macro to efi_call_virt_pointer(). The majority of the changes here are to pull these macros up into header files so that they can be accessed from outside of drivers/firmware/efi/runtime-wrappers.c. The most significant change not directly related to the code move is to add an extra "p" argument into the appropriate efi_call macros, and use that new argument in place of the, formerly hard-coded, efi.systab->runtime pointer. The last piece of the puzzle was to add an efi_call_virt() macro back into drivers/firmware/efi/runtime-wrappers.c to wrap around the new efi_call_virt_pointer() macro - this was mainly to keep the code from looking too cluttered by adding a bunch of extra references to efi.systab->runtime everywhere. Note that I also broke up the code in the efi_call_virt_pointer() macro a bit in the process of moving it. Signed-off-by: NAlex Thorlton <athorlton@sgi.com> Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roy Franz <roy.franz@linaro.org> Cc: Russ Anderson <rja@sgi.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1466839230-12781-5-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
-