- 06 11月, 2017 6 次提交
-
-
由 Christoffer Dall 提交于
kvm_timer_should_fire() can be called in two different situations from the kvm_vcpu_block(). The first case is before calling kvm_timer_schedule(), used for wait polling, and in this case the VCPU thread is running and the timer state is loaded onto the hardware so all we have to do is check if the virtual interrupt lines are asserted, becasue the timer interrupt handler functions will raise those lines as appropriate. The second case is inside the wait loop of kvm_vcpu_block(), where we have already called kvm_timer_schedule() and therefore the hardware will be disabled and the software view of the timer state is up to date (timer->loaded is false), and so we can simply check if the timer should fire by looking at the software state. Signed-off-by: NChristoffer Dall <cdall@linaro.org> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
Now when both the vtimer and the ptimer when using both the in-kernel vgic emulation and a userspace IRQ chip are driven by the timer signals and at the vcpu load/put boundaries, instead of recomputing the timer state at every entry/exit to/from the guest, we can get entirely rid of the flush hwstate function. Signed-off-by: NChristoffer Dall <cdall@linaro.org> Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
We don't need to save and restore the hardware timer state and examine if it generates interrupts on on every entry/exit to the guest. The timer hardware is perfectly capable of telling us when it has expired by signaling interrupts. When taking a vtimer interrupt in the host, we don't want to mess with the timer configuration, we just want to forward the physical interrupt to the guest as a virtual interrupt. We can use the split priority drop and deactivate feature of the GIC to do this, which leaves an EOI'ed interrupt active on the physical distributor, making sure we don't keep taking timer interrupts which would prevent the guest from running. We can then forward the physical interrupt to the VM using the HW bit in the LR of the GIC, like we do already, which lets the guest directly deactivate both the physical and virtual timer simultaneously, allowing the timer hardware to exit the VM and generate a new physical interrupt when the timer output is again asserted later on. We do need to capture this state when migrating VCPUs between physical CPUs, however, which we use the vcpu put/load functions for, which are called through preempt notifiers whenever the thread is scheduled away from the CPU or called directly if we return from the ioctl to userspace. One caveat is that we have to save and restore the timer state in both kvm_timer_vcpu_[put/load] and kvm_timer_[schedule/unschedule], because we can have the following flows: 1. kvm_vcpu_block 2. kvm_timer_schedule 3. schedule 4. kvm_timer_vcpu_put (preempt notifier) 5. schedule (vcpu thread gets scheduled back) 6. kvm_timer_vcpu_load (preempt notifier) 7. kvm_timer_unschedule And a version where we don't actually call schedule: 1. kvm_vcpu_block 2. kvm_timer_schedule 7. kvm_timer_unschedule Since kvm_timer_[schedule/unschedule] may not be followed by put/load, but put/load also may be called independently, we call the timer save/restore functions from both paths. Since they rely on the loaded flag to never save/restore when unnecessary, this doesn't cause any harm, and we ensure that all invokations of either set of functions work as intended. An added benefit beyond not having to read and write the timer sysregs on every entry and exit is that we no longer have to actively write the active state to the physical distributor, because we configured the irq for the vtimer to only get a priority drop when handling the interrupt in the GIC driver (we called irq_set_vcpu_affinity()), and the interrupt stays active after firing on the host. Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@linaro.org>
-
由 Christoffer Dall 提交于
We were using the same hrtimer for emulating the physical timer and for making sure a blocking VCPU thread would be eventually woken up. That worked fine in the previous arch timer design, but as we are about to actually use the soft timer expire function for the physical timer emulation, change the logic to use a dedicated hrtimer. This has the added benefit of not having to cancel any work in the sync path, which in turn allows us to run the flush and sync with IRQs disabled. Note that the hrtimer used to program the host kernel's timer to generate an exit from the guest when the emulated physical timer fires never has to inject any work, and to share the soft_timer_cancel() function with the bg_timer, we change the function to only cancel any pending work if the pointer to the work struct is not null. Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@linaro.org>
-
由 Christoffer Dall 提交于
As we are about to introduce a separate hrtimer for the physical timer, call this timer bg_timer, because we refer to this timer as the background timer in the code and comments elsewhere. Signed-off-by: NChristoffer Dall <cdall@linaro.org> Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
We are about to add an additional soft timer to the arch timer state for a VCPU and would like to be able to reuse the functions to program and cancel a timer, so we make them slightly more generic and rename to make it more clear that these functions work on soft timers and not the hardware resource that this code is managing. The armed flag on the timer state is only used to assert a condition, and we don't rely on this assertion in any meaningful way, so we can simply get rid of this flack and slightly reduce complexity. Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@linaro.org>
-
- 02 11月, 2017 3 次提交
-
-
由 Paul Burton 提交于
The gic_all_vpes_local_irq_controller chip currently attempts to operate on all CPUs/VPs in the system when masking or unmasking an interrupt. This has a few drawbacks: - In multi-cluster systems we may not always have access to all CPUs in the system. When all CPUs in a cluster are powered down that cluster's GIC may also power down, in which case we cannot configure its state. - Relatedly, if we power down a cluster after having configured interrupts for CPUs within it then the cluster's GIC may lose state & we need to reconfigure it. The current approach doesn't take this into account. - It's wasteful if we run Linux on fewer VPs than are present in the system. For example if we run a uniprocessor kernel on CPU0 of a system with 16 CPUs then there's no point in us configuring CPUs 1-15. - The implementation is also lacking in that it expects the range 0..gic_vpes-1 to represent valid Linux CPU numbers which may not always be the case - for example if we run on a system with more VPs than the kernel is configured to support. Fix all of these issues by only configuring the affected interrupts for CPUs which are online at the time, and recording the configuration in a new struct gic_all_vpes_chip_data for later use by CPUs being brought online. We register a CPU hotplug state (reusing CPUHP_AP_IRQ_GIC_STARTING which the ARM GIC driver uses, and which seems suitably generic for reuse with the MIPS GIC) and execute irq_cpu_online() in order to configure the interrupts on the newly onlined CPU. Signed-off-by: NPaul Burton <paul.burton@mips.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mips@linux-mips.org Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Dou Liyang 提交于
Commit: f110711a ("irqdomain: Convert irqdomain-%3Eof_node to fwnode") converted of_node field to fwnode, but didn't update its comments. Update it. Fixes: f110711a ("irqdomain: Convert irqdomain-%3Eof_node to fwnode") Signed-off-by: NDou Liyang <douly.fnst@cn.fujitsu.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
So far, we require the hypervisor to update the VLPI properties once the the VLPI mapping has been established. While this makes it easy for the ITS driver, it creates a window where an incoming interrupt can be delivered with an unknown set of properties. Not very nice. Instead, let's add a "properties" field to the mapping structure, and use that to configure the VLPI before it actually gets mapped. Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 19 10月, 2017 4 次提交
-
-
由 Marc Zyngier 提交于
So far, we map all VPEs on all ITSs. While this is not wrong, this is quite a big hammer, as moving a VPE around requires all ITSs to be synchronized. Needles to say, this is an expensive proposition. Instead, let's switch to a mode where we issue VMAPP commands only on ITSs that are actually involved in reporting interrupts to the given VM. For that purpose, we refcount the number of interrupts are are mapped for this VM on each ITS, performing the map/unmap operations as required. It then allows us to use this refcount to only issue VMOVP to the ITSs that need to know about this VM. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
As we're about to make use of the maximum number of ITSs in a GICv4 system, let's make this value global (and rename it to GICv4_ITS_LIST_MAX). Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Shanker Donthineni 提交于
A new feature Range Selector (RS) has been added to GIC specification in order to support more than 16 CPUs at affinity level 0. New fields are introduced in SGI system registers (ICC_SGI0R_EL1, ICC_SGI1R_EL1 and ICC_ASGI1R_EL1) to relax an artificial limit of 16 at level 0. - A new RSS field in ICC_CTLR_EL3, ICC_CTLR_EL1 and ICV_CTLR_EL1: [18] - Range Selector Support (RSS) 0b0 = Targeted SGIs with affinity level 0 values of 0-15 are supported. 0b1 = Targeted SGIs with affinity level 0 values of 0-255 are supported. - A new RS field in ICC_SGI0R_EL1, ICC_SGI1R_EL1 and ICC_ASGI1R_EL1: [47:44] - RangeSelector (RS) which group of 16 TargetList[n] field TargetList[n] represents aff0 value ((RS*16)+n) When ICC_CTLR_EL3.RSS==0 or ICC_CTLR_EL1.RSS==0, RS is RES0. - A new RSS field in GICD_TYPER: [26] - Range Selector Support (RSS) 0b0 = Targeted SGIs with affinity level 0 values of 0-15 are supported. 0b1 = Targeted SGIs with affinity level 0 values of 0-255 are supported. Signed-off-by: NShanker Donthineni <shankerd@codeaurora.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Masahiro Yamada 提交于
The revmap_trees_mutex protects domain->revmap_tree. There is no need to make it global because it is allowed to modify revmap_tree of two different domains concurrently. Having said that, this would not be a actual bottleneck because the interrupt map/unmap does not occur quite often. Rather, the motivation is to tidy up the code from a data structure point of view. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 17 10月, 2017 1 次提交
-
-
由 Ladislav Michl 提交于
All mach-omap2 variants are device tree only now, so this function is dead code. Remove it. Signed-off-by: NLadislav Michl <ladis@linux-mips.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NTony Lindgren <tony@atomide.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-omap@vger.kernel.org Cc: Jason Cooper <jason@lakedaemon.net> Link: https://lkml.kernel.org/r/20171016160422.uu2i7vvrgy7cc4aw@lenoch
-
- 04 10月, 2017 11 次提交
-
-
由 Thomas Gleixner 提交于
The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu() on powerpc because it is called multiple times for the boot CPU. The first call is via: start_wd_on_cpu+0x80/0x2f0 watchdog_nmi_reconfigure+0x124/0x170 softlockup_reconfigure_threads+0x110/0x130 lockup_detector_init+0xbc/0xe0 kernel_init_freeable+0x18c/0x37c kernel_init+0x2c/0x160 ret_from_kernel_thread+0x5c/0xbc And then again via the CPU hotplug registration: start_wd_on_cpu+0x80/0x2f0 cpuhp_invoke_callback+0x194/0x620 cpuhp_thread_fun+0x7c/0x1b0 smpboot_thread_fn+0x290/0x2a0 kthread+0x168/0x1b0 ret_from_kernel_thread+0x5c/0xbc This can be avoided by setting up the cpu hotplug state with nocalls and move the initialization to the watchdog_nmi_probe() function. That initializes the hotplug callbacks without invoking the callback and the following core initialization function then configures the watchdog for the online CPUs (in this case CPU0) via softlockup_reconfigure_threads(). Reported-and-tested-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org
-
由 Thomas Gleixner 提交于
The recent cleanup of the watchdog code split watchdog_nmi_reconfigure() into two stages. One to stop the NMI and one to restart it after reconfiguration. That was done by adding a boolean 'run' argument to the code, which is functionally correct but not necessarily a piece of art. Replace it by two explicit functions: watchdog_nmi_stop() and watchdog_nmi_start(). Fixes: 6592ad2f ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage") Requested-by: NLinus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org> Signed-off-by: NThomas 'Mopping up garbage' Gleixner <tglx@linutronix.de> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
-
由 Linus Walleij 提交于
In may, Steven sent a patch deleting the bounce buffer handling and the CONFIG_MMC_BLOCK_BOUNCE option. I chose the less invasive path of making it a runtime config option, and we merged that successfully for kernel v4.12. The code is however just standing in the way and taking up space for seemingly no gain on any systems in wide use today. Pierre says the code was there to improve speed on TI SDHCI controllers on certain HP laptops and possibly some Ricoh controllers as well. Early SDHCI controllers lacked the scatter-gather feature, which made software bounce buffers a significant speed boost. We are clearly talking about the list of SDHCI PCI-based MMC/SD card readers found in the pci_ids[] list in drivers/mmc/host/sdhci-pci-core.c. The TI SDHCI derivative is not supported by the upstream kernel. This leaves the Ricoh. What we can however notice is that the x86 defconfigs in the kernel did not enable CONFIG_MMC_BLOCK_BOUNCE option, which means that any such laptop would have to have a custom configured kernel to actually take advantage of this bounce buffer speed-up. It simply seems like there was a speed optimization for the Ricoh controllers that noone was using. (I have not checked the distro defconfigs but I am pretty sure the situation is the same there.) Bounce buffers increased performance on the OMAP HSMMC at one point, and was part of the original submission in commit a45c6cb8 ("[ARM] 5369/1: omap mmc: Add new omap hsmmc controller for 2430 and 34xx, v3") This optimization was removed in commit 0ccd76d4 ("omap_hsmmc: Implement scatter-gather emulation") which found that scatter-gather emulation provided even better performance. The same was introduced for SDHCI in commit 2134a922 ("sdhci: scatter-gather (ADMA) support") I am pretty positively convinced that software scatter-gather emulation will do for any host controller what the bounce buffers were doing. Essentially, the bounce buffer was a reimplementation of software scatter-gather-emulation in the MMC subsystem, and it should be done away with. Cc: Pierre Ossman <pierre@ossman.eu> Cc: Juha Yrjola <juha.yrjola@solidboot.com> Cc: Steven J. Hill <Steven.Hill@cavium.com> Cc: Shawn Lin <shawn.lin@rock-chips.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Suggested-by: NSteven J. Hill <Steven.Hill@cavium.com> Suggested-by: NShawn Lin <shawn.lin@rock-chips.com> Signed-off-by: NLinus Walleij <linus.walleij@linaro.org> Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
-
由 Mike Rapoport 提交于
Before commit 9c5d760b ("mm: split gfp_mask and mapping flags into separate fields") the private_* fields of struct adrress_space were grouped together and using "ditto" in comments describing the last fields was correct. With introduction of gpf_mask between private_lock and private_list "ditto" references the wrong description. Fix it by using the elaborate description. Link: http://lkml.kernel.org/r/1507009987-8746-1-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 YASUAKI ISHIMATSU 提交于
pfn_to_section_nr() and section_nr_to_pfn() are defined as macro. pfn_to_section_nr() has no issue even if it is defined as macro. But section_nr_to_pfn() has overflow issue if sec is defined as int. section_nr_to_pfn() just shifts sec by PFN_SECTION_SHIFT. If sec is defined as unsigned long, section_nr_to_pfn() returns pfn as 64 bit value. But if sec is defined as int, section_nr_to_pfn() returns pfn as 32 bit value. __remove_section() calculates start_pfn using section_nr_to_pfn() and scn_nr defined as int. So if hot-removed memory address is over 16TB, overflow issue occurs and section_nr_to_pfn() does not calculate correct pfn. To make callers use proper arg, the patch changes the macros to inline functions. Fixes: 815121d2 ("memory_hotplug: clear zone when removing the memory") Link: http://lkml.kernel.org/r/e643a387-e573-6bbf-d418-c60c8ee3d15e@gmail.comSigned-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Masahiro Yamada 提交于
I do not see anything that restricts this macro to 32 bit width. Link: http://lkml.kernel.org/r/1505921975-23379-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Patch series "exec: binfmt_misc: fix use-after-free, kill iname[BINPRM_BUF_SIZE]". It looks like this code was always wrong, then commit 948b701a ("binfmt_misc: add persistent opened binary handler for containers") added more problems. This patch (of 6): load_script() can simply use i_name instead, it points into bprm->buf[] and nobody can change this memory until we call prepare_binprm(). The only complication is that we need to also change the signature of bprm_change_interp() but this change looks good too. While at it, do whitespace/style cleanups. NOTE: the real motivation for this change is that people want to increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but this looks more complicated because afaics it is very buggy. Link: http://lkml.kernel.org/r/20170918163446.GA26793@redhat.comSigned-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NKees Cook <keescook@chromium.org> Cc: Travis Gummels <tgummels@redhat.com> Cc: Ben Woodard <woodard@redhat.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: <tdhooge@llnl.gov> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sherry Yang 提交于
Drop the global lru lock in isolate callback before calling zap_page_range which calls cond_resched, and re-acquire the global lru lock before returning. Also change return code to LRU_REMOVED_RETRY. Use mmput_async when fail to acquire mmap sem in an atomic context. Fix "BUG: sleeping function called from invalid context" errors when CONFIG_DEBUG_ATOMIC_SLEEP is enabled. Also restore mmput_async, which was initially introduced in commit ec8d7c14 ("mm, oom_reaper: do not mmput synchronously from the oom reaper context"), and was removed in commit 21292580 ("mm: oom: let oom_reap_task and exit_mmap run concurrently"). Link: http://lkml.kernel.org/r/20170914182231.90908-1-sherryy@android.com Fixes: f2517eb7 ("android: binder: Add global lru shrinker to binder") Signed-off-by: NSherry Yang <sherryy@android.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Reported-by: NKyle Yan <kyan@codeaurora.org> Acked-by: NArve Hjønnevåg <arve@android.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Martijn Coenen <maco@google.com> Cc: Todd Kjos <tkjos@google.com> Cc: Riley Andrews <riandrews@android.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hdanton@sina.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hoeun Ryu <hoeun.ryu@gmail.com> Cc: Christopher Lameter <cl@linux.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Andrea has noticed that the oom_reaper doesn't invalidate the range via mmu notifiers (mmu_notifier_invalidate_range_start/end) and that can corrupt the memory of the kvm guest for example. tlb_flush_mmu_tlbonly already invokes mmu notifiers but that is not sufficient as per Andrea: "mmu_notifier_invalidate_range cannot be used in replacement of mmu_notifier_invalidate_range_start/end. For KVM mmu_notifier_invalidate_range is a noop and rightfully so. A MMU notifier implementation has to implement either ->invalidate_range method or the invalidate_range_start/end methods, not both. And if you implement invalidate_range_start/end like KVM is forced to do, calling mmu_notifier_invalidate_range in common code is a noop for KVM. For those MMU notifiers that can get away only implementing ->invalidate_range, the ->invalidate_range is implicitly called by mmu_notifier_invalidate_range_end(). And only those secondary MMUs that share the same pagetable with the primary MMU (like AMD iommuv2) can get away only implementing ->invalidate_range" As the callback is allowed to sleep and the implementation is out of hand of the MM it is safer to simply bail out if there is an mmu notifier registered. In order to not fail too early make the mm_has_notifiers check under the oom_lock and have a little nap before failing to give the current oom victim some more time to exit. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170913113427.2291-1-mhocko@kernel.org Fixes: aac45363 ("mm, oom: introduce oom reaper") Signed-off-by: NMichal Hocko <mhocko@suse.com> Reported-by: NAndrea Arcangeli <aarcange@redhat.com> Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
There's a typo in recent change of VM_MPX definition. We want it to be VM_HIGH_ARCH_4, not VM_HIGH_ARCH_BIT_4. This bug does cause visible regressions. In arch_vma_name the vmflags are tested against VM_MPX. With the incorrect value of VM_MPX, a number of vmas (such as the stack) test positive and end up being marked as "[mpx]" in /proc/N/maps instead of their correct names. This confuses tools like rr which expect to be able to find familiar vmas. Fixes: df3735c5 ("x86,mpx: make mpx depend on x86-64 to free up VMA flag") Link: http://lkml.kernel.org/r/20170918140253.36856-1-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kyle Huey <me@kylehuey.com> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexei Starovoitov 提交于
- bpf prog_array just like all other types of bpf array accepts 32-bit index. Clarify that in the comment. - fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes - tighten corresponding check in the interpreter to stay consistent The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag in commit 96eabe7a in 4.14. Before that the map_flags would stay zero and though JIT code is wrong it will check bounds correctly. Hence two fixes tags. All other JITs don't have this problem. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Fixes: 96eabe7a ("bpf: Allow selecting numa node during map creation") Fixes: b52f00e6 ("x86: bpf_jit: implement bpf_tail_call() helper") Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NMartin KaFai Lau <kafai@fb.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 10月, 2017 2 次提交
-
-
由 Khazhismel Kumykov 提交于
iscsi_session_teardown was the only user of this function. Function currently is just short for iscsi_remove_session + iscsi_free_session. Signed-off-by: NKhazhismel Kumykov <khazhy@google.com> Acked-by: NChris Leech <cleech@redhat.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Martin K. Petersen 提交于
SBC-4 states: "A MAXIMUM UNMAP LBA COUNT field set to a non-zero value indicates the maximum number of LBAs that may be unmapped by an UNMAP command" "A MAXIMUM WRITE SAME LENGTH field set to a non-zero value indicates the maximum number of contiguous logical blocks that the device server allows to be unmapped or written in a single WRITE SAME command." Despite the spec being clear on the topic, some devices incorrectly expect WRITE SAME commands with the UNMAP bit set to be limited to the value reported in MAXIMUM UNMAP LBA COUNT in the Block Limits VPD. Implement a blacklist option that can be used to accommodate devices with this behavior. Cc: <stable@vger.kernel.org> Reported-by: NBill Kuzeja <William.Kuzeja@stratus.com> Reported-by: NEwan D. Milne <emilne@redhat.com> Reviewed-by: NEwan D. Milne <emilne@redhat.com> Tested-by: NLaurence Oberman <loberman@redhat.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 01 10月, 2017 2 次提交
-
-
由 Paolo Abeni 提交于
The UDP early demux can leverate the rx dst cache even for multicast unconnected sockets. In such scenario the ipv4 source address is validated only on the first packet in the given flow. After that, when we fetch the dst entry from the socket rx cache, we stop enforcing the rp_filter and we even start accepting any kind of martian addresses. Disabling the dst cache for unconnected multicast socket will cause large performace regression, nearly reducing by half the max ingress tput. Instead we factor out a route helper to completely validate an skb source address for multicast packets and we call it from the UDP early demux for mcast packets landing on unconnected sockets, after successful fetching the related cached dst entry. This still gives a measurable, but limited performance regression: rp_filter = 0 rp_filter = 1 edmux disabled: 1182 Kpps 1127 Kpps edmux before: 2238 Kpps 2238 Kpps edmux after: 2037 Kpps 2019 Kpps The above figures are on top of current net tree. Applying the net-next commit 6e617de8 ("net: avoid a full fib lookup when rp_filter is disabled.") the delta with rp_filter == 0 will decrease even more. Fixes: 421b3885 ("udp: ipv4: Add udp early demux") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Currently no error is emitted, but this infrastructure will used by the next patch to allow source address validation for mcast sockets. Since early demux can do a route lookup and an ipv4 route lookup can return an error code this is consistent with the current ipv4 route infrastructure. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 9月, 2017 6 次提交
-
-
由 Peter Zijlstra 提交于
Currently TASK_PARKED is masqueraded as TASK_INTERRUPTIBLE, give it its own print state because it will not in fact get woken by regular wakeups and is a long-term state. This requires moving TASK_PARKED into the TASK_REPORT mask, and since that latter needs to be a contiguous bitmask, we need to shuffle the bits around a bit. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Markus reported that kthreads that idle using TASK_IDLE instead of TASK_INTERRUPTIBLE are reported in as TASK_UNINTERRUPTIBLE and things like htop mark those red. This is undesirable, so add an explicit state for TASK_IDLE. Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Remove yet another task-state char instance. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Convert trace_sched_switch to use the common task-state helpers and fix the "X" and "Z" order, possibly they ended up in the wrong order because TASK_REPORT has them in the wrong order too. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Bit patterns are easier in hex. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Currently get_task_state() and task_state_to_char() report different states, create a number of common helpers and unify the reported state space. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 9月, 2017 3 次提交
-
-
由 Kees Cook 提交于
Modern kernel callback systems pass the structure associated with a given callback to the callback function. The timer callback remains one of the legacy cases where an arbitrary unsigned long argument continues to be passed as the callback argument. This has several problems: - This bloats the timer_list structure with a normally redundant .data field. - No type checking is being performed, forcing callbacks to do explicit type casts of the unsigned long argument into the object that was passed, rather than using container_of(), as done in most of the other callback infrastructure. - Neighboring buffer overflows can overwrite both the .function and the .data field, providing attackers with a way to elevate from a buffer overflow into a simplistic ROP-like mechanism that allows calling arbitrary functions with a controlled first argument. - For future Control Flow Integrity work, this creates a unique function prototype for timer callbacks, instead of allowing them to continue to be clustered with other void functions that take a single unsigned long argument. This adds a new timer initialization API, which will ultimately replace the existing setup_timer(), setup_{deferrable,pinned,etc}_timer() family, named timer_setup() (to mirror hrtimer_setup(), making instances of its use much easier to grep for). In order to support the migration of existing timers into the new callback arguments, timer_setup() casts its arguments to the existing legacy types, and explicitly passes the timer pointer as the legacy data argument. Once all setup_*timer() callers have been replaced with timer_setup(), the casts can be removed, and the data argument can be dropped with the timer expiration code changed to just pass the timer to the callback directly. Since the regular pattern of using container_of() during local variable declaration repeats the need for the variable type declaration to be included, this adds a helper modeled after other from_*() helpers that wrap container_of(), named from_timer(). This helper uses typeof(*variable), removing the type redundancy and minimizing the need for line wraps in forthcoming conversions from "unsigned data long" to "struct timer_list *" in the timer callbacks: -void callback(unsigned long data) +void callback(struct timer_list *t) { - struct some_data_structure *local = (struct some_data_structure *)data; + struct some_data_structure *local = from_timer(local, t, timer); Finally, in order to support the handful of timer users that perform open-coded assignments of the .function (and .data) fields, provide cast macros (TIMER_FUNC_TYPE and TIMER_DATA_TYPE) that can be used temporarily. Once conversion has been completed, these can be globally trivially removed. Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20170928133817.GA113410@beast
-
由 Raed Salem 提交于
Added check for the maximal number of flow counters attached to rule (FTE). Fixes: bd5251db ('net/mlx5_core: Introduce flow steering destination of type counter') Signed-off-by: NRaed Salem <raeds@mellanox.com> Reviewed-by: NMaor Gottlieb <maorg@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
-
由 Inbar Karmy 提交于
Currently, FPGA capability is located in (mdev)->caps.hca_cur, change the location to be (mdev)->caps.fpga, since hca_cur is reserved for HCA device capabilities. Fixes: e29341fb ("net/mlx5: FPGA, Add basic support for Innova") Signed-off-by: NInbar Karmy <inbark@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
-
- 27 9月, 2017 1 次提交
-
-
由 Jean-Philippe Brucker 提交于
The definition of map_sg was split during a recent addition to iommu_ops. Put it back together. Fixes: add02cfd ("iommu: Introduce Interface for IOMMU TLB Flushing") Signed-off-by: NJean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de>
-
- 26 9月, 2017 1 次提交
-
-
由 Mark Rutland 提交于
As raw_cpu_generic_read() is a plain read from a raw_cpu_ptr() address, it's possible (albeit unlikely) that the compiler will split the access across multiple instructions. In this_cpu_generic_read() we disable preemption but not interrupts before calling raw_cpu_generic_read(). Thus, an interrupt could be taken in the middle of the split load instructions. If a this_cpu_write() or RMW this_cpu_*() op is made to the same variable in the interrupt handling path, this_cpu_read() will return a torn value. For native word types, we can avoid tearing using READ_ONCE(), but this won't work in all cases (e.g. 64-bit types on most 32-bit platforms). This patch reworks this_cpu_generic_read() to use READ_ONCE() where possible, otherwise falling back to disabling interrupts. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: NTejun Heo <tj@kernel.org>
-