- 24 9月, 2014 1 次提交
-
-
由 Wanpeng Li 提交于
The following bug can be triggered by hot adding and removing a large number of xen domain0's vcpus repeatedly: BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group PGD 5a9d5067 PUD 13067 PMD 0 Oops: 0000 [#3] SMP [...] Call Trace: load_balance ? _raw_spin_unlock_irqrestore idle_balance __schedule schedule schedule_timeout ? lock_timer_base schedule_timeout_uninterruptible msleep lock_device_hotplug_sysfs online_store dev_attr_store sysfs_write_file vfs_write SyS_write system_call_fastpath Last level cache shared mask is built during CPU up and the build_sched_domain() routine takes advantage of it to setup the sched domain CPU topology. However, llc_shared_mask is not released during CPU disable, which leads to an invalid sched domainCPU topology. This patch fix it by releasing the llc_shared_mask correctly during CPU disable. Yasuaki also reported that this can happen on real hardware: https://lkml.org/lkml/2014/7/22/1018 His case is here: == Here is an example on my system. My system has 4 sockets and each socket has 15 cores and HT is enabled. In this case, each core of sockes is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 Socket#2 | 30-44, 90-104 Socket#3 | 45-59, 105-119 Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000. It means that last level cache of Socket#2 is shared with CPU#30-44 and 90-104. When hot-removing socket#2 and #3, each core of sockets is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 remains having 0x3fff80000001fffc0000000. After that, when hot-adding socket#2 and #3, each core of sockets is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 Socket#2 | 30-59 Socket#3 | 90-119 Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000. It means that last level cache of Socket#2 is shared with CPU#30-59 and 90-104. So the mask has the wrong value. Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com> Tested-by: NLinn Crosetto <linn@hp.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NToshi Kani <toshi.kani@hp.com> Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 30 8月, 2014 1 次提交
-
-
由 Vivek Goyal 提交于
Currently new system call kexec_file_load() and all the associated code compiles if CONFIG_KEXEC=y. But new syscall also compiles purgatory code which currently uses gcc option -mcmodel=large. This option seems to be available only gcc 4.4 onwards. Hiding new functionality behind a new config option will not break existing users of old gcc. Those who wish to enable new functionality will require new gcc. Having said that, I am trying to figure out how can I move away from using -mcmodel=large but that can take a while. I think there are other advantages of introducing this new config option. As this option will be enabled only on x86_64, other arches don't have to compile generic kexec code which will never be used. This new code selects CRYPTO=y and CRYPTO_SHA256=y. And all other arches had to do this for CONFIG_KEXEC. Now with introduction of new config option, we can remove crypto dependency from other arches. Now CONFIG_KEXEC_FILE is available only on x86_64. So whereever I had CONFIG_X86_64 defined, I got rid of that. For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed it to "depends on CRYPTO=y". This should be safer as "select" is not recursive. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Tested-by: NShaun Ruffell <sruffell@digium.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 8月, 2014 1 次提交
-
-
由 Jiang Liu 提交于
Now IOAPIC driver dynamically allocates IRQ numbers for IOAPIC pins. We need to keep IRQ assignment for PCI devices during runtime power management, otherwise it may cause failure of device wakeups. Commit 3eec5952 "x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation" has fixed the issue for suspend/ hibernation, we also need the same fix for runtime device sleep too. Fix: https://bugzilla.kernel.org/show_bug.cgi?id=83271Reported-and-Tested-by: NEmanueL Czirai <amanual@openmailbox.org> Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: EmanueL Czirai <amanual@openmailbox.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Grant Likely <grant.likely@linaro.org> Link: http://lkml.kernel.org/r/1409304383-18806-1-git-send-email-jiang.liu@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 28 8月, 2014 1 次提交
-
-
由 Wang Nan 提交于
This patch frees the 'optinsn' slot when we get a range check error, to prevent memory leaks. Before this patch, cache entry in kprobe_insn_cache() won't be freed if kprobe optimizing fails due to range check failure. Signed-off-by: NWang Nan <wangnan0@huawei.com> Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Pei Feiyue <peifeiyue@huawei.com> Link: http://lkml.kernel.org/r/1406550019-70935-1-git-send-email-wangnan0@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 8月, 2014 1 次提交
-
-
由 Jiang Liu 提交于
Commit 15a3c7cc "x86, irq: Introduce two helper functions to support irqdomain map operation" breaks LPSS ACPI enumerated devices. On startup, IOAPIC driver preallocates IRQ descriptors and programs IOAPIC pins with default level and polarity attributes for all legacy IRQs. Later legacy IRQ users may fail to set IOAPIC pin attributes if the requested attributes conflicts with the default IOAPIC pin attributes. So change mp_irqdomain_map() to allow the first legacy IRQ user to reprogram IOAPIC pin with different attributes. Reported-and-tested-by: NMika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Grant Likely <grant.likely@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1409118795-17046-1-git-send-email-jiang.liu@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 26 8月, 2014 1 次提交
-
-
由 Andy Shevchenko 提交于
Upstream commit: 95d76acc ("x86, irq: Count legacy IRQs by legacy_pic->nr_legacy_irqs instead of NR_IRQS_LEGACY") removed reserved interrupts for the platforms that do not have a legacy IOAPIC. Which breaks the boot on Intel MID platforms such as Medfield: BUG: unable to handle kernel NULL pointer dereference at 0000003a IP: [<c107079a>] setup_irq+0xf/0x4d [ 0.000000] *pdpt = 0000000000000000 *pde = 9bbf32453167e510 The culprit is an uncoditional setting of IRQ2 which is used as cascade IRQ on legacy platforms. It seems we have to check if we have enough legacy IRQs reserved before we can call setup_irq(). The fix adds such check in native_init_IRQ() and in setup_default_timer_irq(). Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: NJiang Liu <jiang.liu@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: David Cohen <david.a.cohen@linux.intel.com> Link: http://lkml.kernel.org/r/1405931920-12871-1-git-send-email-andriy.shevchenko@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 8月, 2014 1 次提交
-
-
由 Stefan Bader 提交于
commit 554086d8 "x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)" introduced a new jump label (sysenter_badsys) but somehow the END statements seem to have gone wrong (at least it feels that way to me). This does not seem to be a fatal problem, but just for the sake of symmetry, change the second syscall_badsys to sysenter_badsys. Signed-off-by: NStefan Bader <stefan.bader@canonical.com> Link: http://lkml.kernel.org/r/1408093066-31021-1-git-send-email-stefan.bader@canonical.comAcked-by: NAndy Lutomirski <luto@amacapital.net> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 13 8月, 2014 1 次提交
-
-
由 Benoit Taine 提交于
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines. This issue was reported by checkpatch. A simplified version of the semantic patch that makes this change is as follows (http://coccinelle.lip6.fr/): // <smpl> @@ identifier i; declarer name DEFINE_PCI_DEVICE_TABLE; initializer z; @@ - DEFINE_PCI_DEVICE_TABLE(i) + const struct pci_device_id i[] = z; // </smpl> [bhelgaas: add semantic patch] Signed-off-by: NBenoit Taine <benoit.taine@lip6.fr> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 09 8月, 2014 7 次提交
-
-
由 Vivek Goyal 提交于
This is the final piece of the puzzle of verifying kernel image signature during kexec_file_load() syscall. This patch calls into PE file routines to verify signature of bzImage. If signature are valid, kexec_file_load() succeeds otherwise it fails. Two new config options have been introduced. First one is CONFIG_KEXEC_VERIFY_SIG. This option enforces that kernel has to be validly signed otherwise kernel load will fail. If this option is not set, no signature verification will be done. Only exception will be when secureboot is enabled. In that case signature verification should be automatically enforced when secureboot is enabled. But that will happen when secureboot patches are merged. Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG. This option enables signature verification support on bzImage. If this option is not set and previous one is set, kernel image loading will fail because kernel does not have support to verify signature of bzImage. I tested these patches with both "pesign" and "sbsign" signed bzImages. I used signing_key.priv key and signing_key.x509 cert for signing as generated during kernel build process (if module signing is enabled). Used following method to sign bzImage. pesign ====== - Convert DER format cert to PEM format cert openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM -outform PEM - Generate a .p12 file from existing cert and private key file openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv -in signing_key.x509.PEM - Import .p12 file into pesign db pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign - Sign bzImage pesign -i /boot/vmlinuz-3.16.0-rc3+ -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign -c "Glacier signing key - Magrathea" -s sbsign ====== sbsign --key signing_key.priv --cert signing_key.x509.PEM --output /boot/vmlinuz-3.16.0-rc3+.signed.sbsign /boot/vmlinuz-3.16.0-rc3+ Patch details: Well all the hard work is done in previous patches. Now bzImage loader has just call into that code and verify whether bzImage signature are valid or not. Also create two config options. First one is CONFIG_KEXEC_VERIFY_SIG. This option enforces that kernel has to be validly signed otherwise kernel load will fail. If this option is not set, no signature verification will be done. Only exception will be when secureboot is enabled. In that case signature verification should be automatically enforced when secureboot is enabled. But that will happen when secureboot patches are merged. Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG. This option enables signature verification support on bzImage. If this option is not set and previous one is set, kernel image loading will fail because kernel does not have support to verify signature of bzImage. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Fleming <matt@console-pimps.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vivek Goyal 提交于
This patch does two things. It passes EFI run time mappings to second kernel in bootparams efi_info. Second kernel parse this info and create new mappings in second kernel. That means mappings in first and second kernel will be same. This paves the way to enable EFI in kexec kernel. This patch also prepares and passes EFI setup data through bootparams. This contains bunch of information about various tables and their addresses. These information gathering and passing has been written along the lines of what current kexec-tools is doing to make kexec work with UEFI. [akpm@linux-foundation.org: s/get_efi/efi_get/g, per Matt] Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Fleming <matt@console-pimps.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vivek Goyal 提交于
This patch adds support for loading a kexec on panic (kdump) kernel usning new system call. It prepares ELF headers for memory areas to be dumped and for saved cpu registers. Also prepares the memory map for second kernel and limits its boot to reserved areas only. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vivek Goyal 提交于
This is loader specific code which can load bzImage and set it up for 64bit entry. This does not take care of 32bit entry or real mode entry. 32bit mode entry can be implemented if somebody needs it. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vivek Goyal 提交于
Load purgatory code in RAM and relocate it based on the location. Relocation code has been inspired by module relocation code and purgatory relocation code in kexec-tools. Also compute the checksums of loaded kexec segments and store them in purgatory. Arch independent code provides this functionality so that arch dependent bootloaders can make use of it. Helper functions are provided to get/set symbol values in purgatory which are used by bootloaders later to set things like stack and entry point of second kernel etc. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vivek Goyal 提交于
Previous patch provided the interface definition and this patch prvides implementation of new syscall. Previously segment list was prepared in user space. Now user space just passes kernel fd, initrd fd and command line and kernel will create a segment list internally. This patch contains generic part of the code. Actual segment preparation and loading is done by arch and image specific loader. Which comes in next patch. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Walter 提交于
Replace obsolete strict_strto calls with appropriate kstrto calls Signed-off-by: NDaniel Walter <dwalter@google.com> Acked-by: NBorislav Petkov <bp@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 8月, 2014 1 次提交
-
-
由 Thomas Gleixner 提交于
Commit ea431643 ("x86/mce: Fix CMCI preemption bugs") breaks RT by the completely unrelated conversion of the cmci_discover_lock to a regular (non raw) spinlock. This lock was annotated in commit 59d958d2 ("locking, x86: mce: Annotate cmci_discover_lock as raw") with a proper explanation why. The argument for converting the lock back to a regular spinlock was: - it does percpu ops without disabling preemption. Preemption is not disabled due to the mistaken use of a raw spinlock. Which is complete nonsense. The raw_spinlock is disabling preemption in the same way as a regular spinlock. In mainline spinlock maps to raw_spinlock, in RT spinlock becomes a "sleeping" lock. raw_spinlock has on RT exactly the same semantics as in mainline. And because this lock is taken in non preemptible context it must be raw on RT. Undo the locking brainfart. Reported-by: NClark Williams <williams@redhat.com> Reported-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 8月, 2014 1 次提交
-
-
由 Dan Carpenter 提交于
I don't know if we really need 64 bits here but these variables are declared as u64 and it can't hurt to cast this so we prevent any shift wrapping. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NAubrey Li <aubrey.li@linux.intel.com> Link: http://lkml.kernel.org/r/20140801082715.GE28869@mwandaSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 02 8月, 2014 1 次提交
-
-
由 H. Peter Anvin 提交于
Since checkin 411cf9ee x86, vsmp: Remove is_vsmp_box() from apic_is_clustered_box() ... is_vsmp_box() is only used in vsmp_64.c and does not have any header file declaring it, so make it explicitly static. Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Cc: Shai Fultheim <shai@scalemp.com> Cc: Oren Twaig <oren@scalemp.com> Link: http://lkml.kernel.org/r/1404036068-11674-1-git-send-email-oren@scalemp.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 31 7月, 2014 10 次提交
-
-
由 Dave Hansen 提交于
I think the flush_tlb_mm_range() code that tries to tune the flush sizes based on the CPU needs to get ripped out for several reasons: 1. It is obviously buggy. It uses mm->total_vm to judge the task's footprint in the TLB. It should certainly be using some measure of RSS, *NOT* ->total_vm since only resident memory can populate the TLB. 2. Haswell, and several other CPUs are missing from the intel_tlb_flushall_shift_set() function. Thus, it has been demonstrated to bitrot quickly in practice. 3. It is plain wrong in my vm: [ 0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] tlb_flushall_shift: 6 Which leads to it to never use invlpg. 4. The assumptions about TLB refill costs are wrong: http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com (more on this in later patches) 5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59 I believe the sample times were too short. Running the benchmark in a loop yields times that vary quite a bit. Note that this leaves us with a static ceiling of 1 page. This is a conservative, dumb setting, and will be revised in a later patch. This also removes the code which attempts to predict whether we are flushing data or instructions. We expect instruction flushes to be relatively rare and not worth tuning for explicitly. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.comAcked-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 David Rientjes 提交于
The enable_apic_mode() apic callback is never called, so remove it. Signed-off-by: NDavid Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302352320.17503@chino.kir.corp.google.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 David Rientjes 提交于
Since commit b5660ba7 ("x86, platforms: Remove NUMAQ") removed NUMAQ, the setup_portio_remap() apic callback has been obsolete. Remove it. Signed-off-by: NDavid Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302351480.17503@chino.kir.corp.google.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 David Rientjes 提交于
Since commit b5660ba7 ("x86, platforms: Remove NUMAQ") removed NUMAQ, the multi_timer_check() apic callback has been obsolete. Remove it. Signed-off-by: NDavid Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302351120.17503@chino.kir.corp.google.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 David Rientjes 提交于
noop_check_apicid_used() has the same implementation as default_check_apicid_used() in the standard header file, so replace the former with the latter. Signed-off-by: NDavid Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302350450.17503@chino.kir.corp.google.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 David Rientjes 提交于
The check_apicid_present() apic callback is never called, so remove it and functions that implement it. Signed-off-by: NDavid Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302350160.17503@chino.kir.corp.google.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 David Rientjes 提交于
Since commit b5660ba7 ("x86, platforms: Remove NUMAQ") removed NUMAQ, the mps_oem_check() apic callback has been obsolete. Remove it. This allows generic_mps_oem_check() to be removed as well. Signed-off-by: NDavid Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302349390.17503@chino.kir.corp.google.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 David Rientjes 提交于
Since commit b5660ba7 ("x86, platforms: Remove NUMAQ") removed NUMAQ, the smp_callin_clear_local_apic() apic callback has been obsolete. Remove it. Signed-off-by: NDavid Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302349040.17503@chino.kir.corp.google.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 David Rientjes 提交于
The trampoline_phys_{high,low} members of struct apic are always initialized to DEFAULT_TRAMPOLINE_PHYS_HIGH and TRAMPOLINE_PHYS_LOW, respectively. Hardwire the constants and remove the unneeded members. Signed-off-by: NDavid Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302348330.17503@chino.kir.corp.google.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 David Rientjes 提交于
Since commit b5660ba7 ("x86, platforms: Remove NUMAQ") removed NUMAQ, the x86_32_numa_cpu_node() apic callback has been obsolete. Remove it. Signed-off-by: NDavid Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1407302348060.17503@chino.kir.corp.google.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 29 7月, 2014 1 次提交
-
-
由 Andy Lutomirski 提交于
This moves the espfix64 logic into native_iret. To make this work, it gets rid of the native patch for INTERRUPT_RETURN: INTERRUPT_RETURN on native kernels is now 'jmp native_iret'. This changes the 16-bit SS behavior on Xen from OOPSing to leaking some bits of the Xen hypervisor's RSP (I think). [ hpa: this is a nonzero cost on native, but probably not enough to measure. Xen needs to fix this in their own code, probably doing something equivalent to espfix64. ] Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org>
-
- 26 7月, 2014 4 次提交
-
-
由 Andy Lutomirski 提交于
This commit in Linux 3.6: commit c767a54b Author: Joe Perches <joe@perches.com> Date: Mon May 21 19:50:07 2012 -0700 x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level> caused warn_bad_vsyscall to output garbage in the middle of the line. Revert the bad part of it. The printk in question isn't actually bare; the level is "%s". The bug this fixes is purely cosmetic; backports are optional. Cc: <stable@vger.kernel.org> # v3.6+ Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/03eac1f24110bbe496ecc12a4df467e0d88466d4.1406330947.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Li, Aubrey 提交于
Add the following interfaces to exposes PMC device state and sleep state residency via debugfs: /sys/kernel/debugfs/pmc_atom/dev_state /sys/kernel/debugfs/pmc_atom/sleep_state Signed-off-by: NAubrey Li <aubrey.li@linux.intel.com> Link: http://lkml.kernel.org/r/53B0FF59.8000600@linux.intel.comSigned-off-by: NKasagar, Srinidhi <srinidhi.kasagar@intel.com> Reviewed-by: NRudramuni, Vishwesh M <vishwesh.m.rudramuni@intel.com> Reviewed-by: NJoe Perches <joe@perches.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Li, Aubrey 提交于
Disable PMC S0IX_WAKE_EN events coming from LPC block(unused) and also from GPIO_SUS ored dedicated IRQs (must be disabled as per PMC programming rule), GPIOSCORE ored dedicated IRQs (must be disabled as per PMC programming rule), GPIO_SUS shared IRQ (not necessary since the IOAPIC_DS wake event will still work), GPIO_SCORE shared IRQ (not necessary since the IOAPIC_DS wake event will still work). Signed-off-by: NAubrey Li <aubrey.li@linux.intel.com> Link: http://lkml.kernel.org/r/53B0FF22.5080403@linux.intel.comSigned-off-by: NOlivier Leveque <olivier.leveque@intel.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Li, Aubrey 提交于
The Power Management Controller (PMC) controls many of the power management features present in the Atom SoC. This driver provides a native power off function via PMC PCI IO port. On some ACPI hardware-reduced platforms(e.g. ASUS-T100), ACPI sleep registers are not valid so that (*pm_power_off)() is not hooked by acpi_power_off(). The power off function in this driver is installed only when pm_power_off is NULL. Signed-off-by: NAubrey Li <aubrey.li@linux.intel.com> Link: http://lkml.kernel.org/r/53B0FEEA.3010805@linux.intel.comSigned-off-by: NLejun Zhu <lejun.zhu@linux.intel.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 24 7月, 2014 3 次提交
-
-
由 Thomas Gleixner 提交于
The members of the new struct are the required ones for the new NMI safe accessor to clcok monotonic. In order to reuse the existing timekeeping code and to make the update of the fast NMI safe timekeepers a simple memcpy use the struct for the timekeeper as well and convert all users. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
cycle_last was added to the clocksource to support the TSC validation. We moved that to the core code, so we can get rid of the extra copy. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
The only user of the cycle_last validation is the x86 TSC. In order to provide NMI safe accessor functions for clock monotonic and monotonic_raw we need to do that in the core. We can't do the TSC specific if (now < cycle_last) now = cycle_last; for the other wrapping around clocksources, but TSC has CLOCKSOURCE_MASK(64) which actually does not mask out anything so if now is less than cycle_last the subtraction will give a negative result. So we can check for that in clocksource_delta() and return 0 for that case. Implement and enable it for x86 Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
- 23 7月, 2014 3 次提交
-
-
由 Peter Zijlstra 提交于
P4 systems with cpuid level < 4 can have SMT, but the cache topology description available (cpuid2) does not include SMP information. Now we know that SMT shares all cache levels, and therefore we can mark all available cache levels as shared. We do this by setting cpu_llc_id to ->phys_proc_id, since that's the same for each SMT thread. We can do this unconditional since if there's no SMT its still true, the one CPU shares cache with only itself. This fixes a problem where such CPUs report an incorrect LLC CPU mask. This in turn fixes a crash in the scheduler where the topology was build wrong, it assumes the LLC mask to include at least the SMT CPUs. Cc: Josh Boyer <jwboyer@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: NBruno Wolff III <bruno@wolff.to> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140722133514.GM12054@laptop.lanSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Tomasz Nowicki 提交于
GHES currently maps two pages with atomic_ioremap. From now on, NMI is architectural depended so there is no need to allocate an NMI page for platforms without NMI support. To make it possible to not use a second page, swap the existing page order so that the IRQ context page is first, and the optional NMI context page is second. Then, use HAVE_ACPI_APEI_NMI to decide how many pages are to be allocated. Signed-off-by: NTomasz Nowicki <tomasz.nowicki@linaro.org> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 Tomasz Nowicki 提交于
This commit abstracts MCE calls and provides weak corresponding default implementation for those architectures which do not need arch specific actions. Each platform willing to do additional architectural actions should provides desired function definition. It allows us to avoid wrap code into #ifdef in generic code and prevent new platform from introducing dummy stub function too. Initially, there are two APEI arch-specific calls: - arch_apei_enable_cmcff() - arch_apei_report_mem_error() Both interact with MCE driver for X86 architecture. Signed-off-by: NTomasz Nowicki <tomasz.nowicki@linaro.org> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 22 7月, 2014 1 次提交
-
-
由 Sven Wegener 提交于
Commit 554086d8 ("x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry code, resulting in syscall() not returning proper errors for undefined syscalls on CPUs supporting the sysenter feature. The following code: > int result = syscall(666); > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno)); results in: > result=666 errno=0 error=Success Obviously, the syscall return value is the called syscall number, but it should have been an ENOSYS error. When run under ptrace it behaves correctly, which makes it hard to debug in the wild: > result=-1 errno=38 error=Function not implemented The %eax register is the return value register. For debugging via ptrace the syscall entry code stores the complete register context on the stack. The badsys handlers only store the ENOSYS error code in the ptrace register set and do not set %eax like a regular syscall handler would. The old resume_userspace call chain contains code that clobbers %eax and it restores %eax from the ptrace registers afterwards. The same goes for the ptrace-enabled call chain. When ptrace is not used, the syscall return value is the passed-in syscall number from the untouched %eax register. Use %eax as the return value register in syscall_badsys and sysenter_badsys, like a real syscall handler does, and have the caller push the value onto the stack for ptrace access. Signed-off-by: NSven Wegener <sven.wegener@stealer.net> Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.netReviewed-and-tested-by: NAndy Lutomirski <luto@amacapital.net> Cc: <stable@vger.kernel.org> # If 554086d8 is backported Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-