- 07 6月, 2015 1 次提交
-
-
由 Borislav Petkov 提交于
In talking to Aravind recently about making certain AMD topology attributes available to the MCE injection module, it seemed like that CONFIG_X86_HT thing is more or less superfluous. It is def_bool y, depends on SMP and gets enabled in the majority of .configs - distro and otherwise - out there. So let's kill it and make code behind it depend directly on SMP. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Walter <dwalter@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1433436928-31903-18-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 6月, 2015 1 次提交
-
-
由 Ingo Molnar 提交于
Peter Zijstra noticed that in arch/x86/Kconfig there are a lot of X86_{32,64} clauses in the X86 symbol, plus there are a number of similar selects in the X86_32 and X86_64 config definitions as well - which all overlap in an inconsistent mess. So: - move all select's from X86_32 and X86_64 to the X64 config option - sort their names, so that duplications are easier to spot - align their if clauses, so that they are easier to identify at a glance - and so that weirdnesses stand out more No change in functionality: 105 insertions(+) 105 deletions(-) Originally-from: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20150602153027.GU3644@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 5月, 2015 3 次提交
-
-
由 Jan Beulich 提交于
... as their only caller is. Signed-off-by: NJan Beulich <jbeulich@suse.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/5566EE07020000780007E683@mail.emea.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rafael J. Wysocki 提交于
Commit 97badf87 (device property: Make it possible to use secondary firmware nodes) uncovered a bug in the x86 (and ia64) PCI host bridge initialization code that assumes bridge->bus->sysdata to always point to a struct pci_sysdata object which need not be the case (in particular, the Xen PCI frontend driver sets it to point to a different data type). If it is not the case, an incorrect pointer (or a piece of data that is not a pointer at all) will be passed to ACPI_COMPANION_SET() and that may cause interesting breakage to happen going forward. To work around this problem use the observation that the ACPI host bridge initialization always passes NULL as parent to pci_create_root_bus(), so if pcibios_root_bridge_prepare() sees a non-NULL parent of the bridge, it should not attempt to set an ACPI companion for it, because that means that pci_create_root_bus() has been called by someone else. Fixes: 97badf87 (device property: Make it possible to use secondary firmware nodes) Reported-and-tested-by: NSander Eikelenboom <linux@eikelenboom.it> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NBjorn Helgaas <bhelgaas@google.com>
-
Changes mainly to account for minor differences in Knights Landing(KNL): 1. KNL supports C1 and C6 core states. 2. KNL supports PC2, PC3 and PC6 package states. 3. KNL has a different encoding of the TURBO_RATIO_LIMIT MSR Signed-off-by: NDasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
- 27 5月, 2015 12 次提交
-
-
由 Luis R. Rodriguez 提交于
Two Linux device drivers cannot work with PAT and the work required to make them work is significant. There is not enough motivation to convert these drivers over to use PAT properly, the compromise reached is to let drivers that cannot be ported to PAT check if PAT was enabled and if so fail on probe with a recommendation to boot with the "nopat" kernel parameter. Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Walls <awalls@md.metrocast.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Doug Ledford <dledford@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430425520-22275-4-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-14-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Luis R. Rodriguez 提交于
We use pat_enabled in x86-specific code to see if PAT is enabled or not but we're granting full access to it even though readers do not need to set it. If, for instance, we granted access to it to modules later they then could override the variable setting... no bueno. This renames pat_enabled to a new static variable __pat_enabled. Folks are redirected to use pat_enabled() now. Code that sets this can only be internal to pat.c. Apart from the early kernel parameter "nopat" to disable PAT, we also have a few cases that disable it later and make use of a helper pat_disable(). It is wrapped under an ifdef but since that code cannot run unless PAT was enabled its not required to wrap it with ifdefs, unwrap that. Likewise, since "nopat" doesn't really change non-PAT systems just remove that ifdef as well. Although we could add and use an early_param_off(), these helpers don't use __read_mostly but we want to keep __read_mostly for __pat_enabled as this is a hot path -- upon boot, for instance, a simple guest may see ~4k accesses to pat_enabled(). Since __read_mostly early boot params are not that common we don't add a helper for them just yet. Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Walls <awalls@md.metrocast.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Doug Ledford <dledford@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kyle McMartin <kyle@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-13-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Luis R. Rodriguez 提交于
It is possible to enable CONFIG_MTRR and CONFIG_X86_PAT and end up with a system with MTRR functionality disabled but PAT functionality enabled. This can happen, for instance, when the Xen hypervisor is used where MTRRs are not supported but PAT is. This can happen on Linux as of commit 47591df5 ("xen: Support Xen pv-domains using PAT") by Juergen, introduced in v3.19. Technically, we should assume the proper CPU bits would be set to disable MTRRs but we can't always rely on this. At least on the Xen Hypervisor, for instance, only X86_FEATURE_MTRR was disabled as of Xen 4.4 through Xen commit 586ab6a [0], but not X86_FEATURE_K6_MTRR, X86_FEATURE_CENTAUR_MCR, or X86_FEATURE_CYRIX_ARR for instance. Roger Pau Monné has clarified though that although this is technically true we will never support PVH on these CPU types so Xen has no need to disable these bits on those systems. As per Roger, AMD K6, Centaur and VIA chips don't have the necessary hardware extensions to allow running PVH guests [1]. As per Toshi it is also possible for the BIOS to disable MTRR support, in such cases get_mtrr_state() would update the MTRR state as per the BIOS, we need to propagate this information as well. x86 MTRR code relies on quite a bit of checks for mtrr_if being set to check to see if MTRRs did get set up. Instead, lets provide a generic getter for that. This also adds a few checks where they were not before which could potentially safeguard ourselves against incorrect usage of MTRR where this was not desirable. Where possible match error codes as if MTRRs were disabled on arch/x86/include/asm/mtrr.h. Lastly, since disabling MTRRs can happen at run time and we could end up with PAT enabled, best record now in our logs when MTRRs are disabled. [0] ~/devel/xen (git::stable-4.5)$ git describe --contains 586ab6a 4.4.0-rc1~18 [1] http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg03460.htmlSigned-off-by: NLuis R. Rodriguez <mcgrof@suse.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Ville Syrjälä <syrjala@sci.fi> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: bhelgaas@google.com Cc: david.vrabel@citrix.com Cc: jbeulich@suse.com Cc: konrad.wilk@oracle.com Cc: venkatesh.pallipadi@intel.com Cc: ville.syrjala@linux.intel.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1426893517-2511-3-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-12-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Luis R. Rodriguez 提交于
There is only one user but since we're going to bury MTRR next out of access to drivers, expose this last piece of API to drivers in a general fashion only needing io.h for access to helpers. Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Abhilash Kesavan <a.kesavan@samsung.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Cristian Stoica <cristian.stoica@freescale.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thierry Reding <treding@nvidia.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Ville Syrjälä <syrjala@sci.fi> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will.deacon@arm.com> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1429722736-4473-1-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-11-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Luis R. Rodriguez 提交于
As part of the effort to phase out MTRR use document write-combining MTRR effects on pages with different non-PAT page attributes flags and different PAT entry values. Extend arch_phys_wc_add() documentation to clarify power of two sizes / boundary requirements as we phase out mtrr_add() use. Lastly hint towards ioremap_uc() for corner cases on device drivers working with devices with mixed regions where MTRR size requirements would otherwise not enable write-combining effective memory types. Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Ville Syrjälä <syrjala@sci.fi> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: linux-fbdev@vger.kernel.org Link: http://lkml.kernel.org/r/1430343851-967-3-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-10-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Luis R. Rodriguez 提交于
Use pr_info() instead of the old printk to prefix the component where things are coming from. With this readers will know exactly where the message is coming from. We use pr_* helpers but define pr_fmt to the empty string for easier grepping for those error messages. We leave the users of dprintk() in place, this will print only when the debugpat kernel parameter is enabled. We want to leave those enabled as a debug feature, but also make them use the same prefix. Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com> [ Kill pr_fmt. ] Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Walls <awalls@md.metrocast.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Doug Ledford <dledford@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: cocci@systeme.lip6.fr Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Link: http://lkml.kernel.org/r/1430425520-22275-2-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-9-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Toshi Kani 提交于
This patch adds the argument 'uniform' to mtrr_type_lookup(), which gets set to 1 when a given range is covered uniformly by MTRRs, i.e. the range is fully covered by a single MTRR entry or the default type. Change pud_set_huge() and pmd_set_huge() to honor the 'uniform' flag to see if it is safe to create a huge page mapping in the range. This allows them to create a huge page mapping in a range covered by a single MTRR entry of any memory type. It also detects a non-optimal request properly. They continue to check with the WB type since it does not effectively change the uniform mapping even if a request spans multiple MTRR entries. pmd_set_huge() logs a warning message to a non-optimal request so that driver writers will be aware of such a case. Drivers should make a mapping request aligned to a single MTRR entry when the range is covered by MTRRs. Signed-off-by: NToshi Kani <toshi.kani@hp.com> [ Realign, flesh out comments, improve warning message. ] Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-8-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Toshi Kani 提交于
MTRRs contain fixed and variable entries. mtrr_type_lookup() may repeatedly call __mtrr_type_lookup() to handle a request that overlaps with variable entries. However, __mtrr_type_lookup() also handles the fixed entries, which do not have to be repeated. Therefore, this patch creates separate functions, mtrr_type_lookup_fixed() and mtrr_type_lookup_variable(), to handle the fixed and variable ranges respectively. The patch also updates the function headers to clarify the return values and output argument. It updates comments to clarify that the repeating is necessary to handle overlaps with the default type, since overlaps with multiple entries alone can be handled without such repeating. There is no functional change in this patch. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-6-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-6-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Toshi Kani 提交于
mtrr_type_lookup() returns verbatim 0xFF when MTRRs are disabled. This patch defines MTRR_TYPE_INVALID to clarify the meaning of this value, and documents its usage. Document the return values of the kernel virtual address mapping helpers pud_set_huge(), pmd_set_huge, pud_clear_huge() and pmd_clear_huge(). There is no functional change in this patch. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-5-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-5-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Toshi Kani 提交于
'mtrr_state.enabled' contains the FE (fixed MTRRs enabled) and E (MTRRs enabled) flags in MSR_MTRRdefType. Intel SDM, section 11.11.2.1, defines these flags as follows: - All MTRRs are disabled when the E flag is clear. The FE flag has no affect when the E flag is clear. - The default type is enabled when the E flag is set. - MTRR variable ranges are enabled when the E flag is set. - MTRR fixed ranges are enabled when both E and FE flags are set. MTRR state checks in __mtrr_type_lookup() do not match with SDM. Hence, this patch makes the following changes: - The current code detects MTRRs disabled when both E and FE flags are clear in mtrr_state.enabled. Fix to detect MTRRs disabled when the E flag is clear. - The current code does not check if the FE bit is set in mtrr_state.enabled when looking at the fixed entries. Fix to check the FE flag. - The current code returns the default type when the E flag is clear in mtrr_state.enabled. However, the default type is UC when the E flag is clear. Remove the code as this case is handled as MTRR disabled with the 1st change. In addition, this patch defines the E and FE flags in mtrr_state.enabled as follows. - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED - E flag: MTRR_STATE_MTRR_ENABLED print_mtrr_state() and x86_get_mtrr_mem_range() are also updated accordingly. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-4-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Toshi Kani 提交于
When an MTRR entry is inclusive to a requested range, i.e. the start and end of the request are not within the MTRR entry range but the range contains the MTRR entry entirely: range_start ... [mtrr_start ... mtrr_end] ... range_end __mtrr_type_lookup() ignores such a case because both start_state and end_state are set to zero. This bug can cause the following issues: 1) reserve_memtype() tracks an effective memory type in case a request type is WB (ex. /dev/mem blindly uses WB). Missing to track with its effective type causes a subsequent request to map the same range with the effective type to fail. 2) pud_set_huge() and pmd_set_huge() check if a requested range has any overlap with MTRRs. Missing to detect an overlap may cause a performance penalty or undefined behavior. This patch fixes the bug by adding a new flag, 'inclusive', to detect the inclusive case. This case is then handled in the same way as end_state:1 since the first region is the same. With this fix, __mtrr_type_lookup() handles the inclusive case properly. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-3-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Toshi Kani 提交于
Simplify the conditions selecting HAVE_ARCH_HUGE_VMAP since X86_PAE depends on X86_32 already. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-2-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 5月, 2015 1 次提交
-
-
由 Alexei Starovoitov 提交于
x86 has variable length encoding. x86 JIT compiler is trying to pick the shortest encoding for given bpf instruction. While doing so the jump targets are changing, so JIT is doing multiple passes over the program. Typical program needs 3 passes. Some very short programs converge with 2 passes. Large programs may need 4 or 5. But specially crafted bpf programs may hit the pass limit and if the program converges on the last iteration the JIT compiler will be producing an image full of 'int 3' insns. Fix this corner case by doing final iteration over bpf program. Fixes: 0a14842f ("net: filter: Just In Time compiler for x86-64") Reported-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Tested-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 5月, 2015 4 次提交
-
-
由 Liang Li 提交于
The MPX feature requires eager KVM FPU restore support. We have verified that MPX cannot work correctly with the current lazy KVM FPU restore mechanism. Eager KVM FPU restore should be enabled if the MPX feature is exposed to VM. Signed-off-by: NYang Zhang <yang.z.zhang@intel.com> Signed-off-by: NLiang Li <liang.z.li@intel.com> [Also activate the FPU on AMD processors. - Paolo] Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This reverts commit 4473b570. We'll use the hook again. Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andrea Arcangeli 提交于
memslot->userfault_addr is set by the kernel with a mmap executed from the kernel but the userland can still munmap it and lead to the below oops after memslot->userfault_addr points to a host virtual address that has no vma or mapping. [ 327.538306] BUG: unable to handle kernel paging request at fffffffffffffffe [ 327.538407] IP: [<ffffffff811a7b55>] put_page+0x5/0x50 [ 327.538474] PGD 1a01067 PUD 1a03067 PMD 0 [ 327.538529] Oops: 0000 [#1] SMP [ 327.538574] Modules linked in: macvtap macvlan xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT iptable_filter ip_tables tun bridge stp llc rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache xprtrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ipmi_devintf iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp dcdbas intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ipmi_si ipmi_msghandler acpi_pad wmi acpi_power_meter lpc_ich mfd_core mei_me [ 327.539488] mei shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc mlx4_ib ib_sa ib_mad ib_core mlx4_en vxlan ib_addr ip_tunnel xfs libcrc32c sd_mod crc_t10dif crct10dif_common crc32c_intel mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ahci i2c_core libahci mlx4_core libata tg3 ptp pps_core megaraid_sas ntb dm_mirror dm_region_hash dm_log dm_mod [ 327.539956] CPU: 3 PID: 3161 Comm: qemu-kvm Not tainted 3.10.0-240.el7.userfault19.4ca4011.x86_64.debug #1 [ 327.540045] Hardware name: Dell Inc. PowerEdge R420/0CN7CM, BIOS 2.1.2 01/20/2014 [ 327.540115] task: ffff8803280ccf00 ti: ffff880317c58000 task.ti: ffff880317c58000 [ 327.540184] RIP: 0010:[<ffffffff811a7b55>] [<ffffffff811a7b55>] put_page+0x5/0x50 [ 327.540261] RSP: 0018:ffff880317c5bcf8 EFLAGS: 00010246 [ 327.540313] RAX: 00057ffffffff000 RBX: ffff880616a20000 RCX: 0000000000000000 [ 327.540379] RDX: 0000000000002014 RSI: 00057ffffffff000 RDI: fffffffffffffffe [ 327.540445] RBP: ffff880317c5bd10 R08: 0000000000000103 R09: 0000000000000000 [ 327.540511] R10: 0000000000000000 R11: 0000000000000000 R12: fffffffffffffffe [ 327.540576] R13: 0000000000000000 R14: ffff880317c5bd70 R15: ffff880317c5bd50 [ 327.540643] FS: 00007fd230b7f700(0000) GS:ffff880630800000(0000) knlGS:0000000000000000 [ 327.540717] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 327.540771] CR2: fffffffffffffffe CR3: 000000062a2c3000 CR4: 00000000000427e0 [ 327.540837] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 327.540904] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 327.540974] Stack: [ 327.541008] ffffffffa05d6d0c ffff880616a20000 0000000000000000 ffff880317c5bdc0 [ 327.541093] ffffffffa05ddaa2 0000000000000000 00000000002191bf 00000042f3feab2d [ 327.541177] 00000042f3feab2d 0000000000000002 0000000000000001 0321000000000000 [ 327.541261] Call Trace: [ 327.541321] [<ffffffffa05d6d0c>] ? kvm_vcpu_reload_apic_access_page+0x6c/0x80 [kvm] [ 327.543615] [<ffffffffa05ddaa2>] vcpu_enter_guest+0x3f2/0x10f0 [kvm] [ 327.545918] [<ffffffffa05e2f10>] kvm_arch_vcpu_ioctl_run+0x2b0/0x5a0 [kvm] [ 327.548211] [<ffffffffa05e2d02>] ? kvm_arch_vcpu_ioctl_run+0xa2/0x5a0 [kvm] [ 327.550500] [<ffffffffa05ca845>] kvm_vcpu_ioctl+0x2b5/0x680 [kvm] [ 327.552768] [<ffffffff810b8d12>] ? creds_are_invalid.part.1+0x12/0x50 [ 327.555069] [<ffffffff810b8d71>] ? creds_are_invalid+0x21/0x30 [ 327.557373] [<ffffffff812d6066>] ? inode_has_perm.isra.49.constprop.65+0x26/0x80 [ 327.559663] [<ffffffff8122d985>] do_vfs_ioctl+0x305/0x530 [ 327.561917] [<ffffffff8122dc51>] SyS_ioctl+0xa1/0xc0 [ 327.564185] [<ffffffff816de829>] system_call_fastpath+0x16/0x1b [ 327.566480] Code: 0b 31 f6 4c 89 e7 e8 4b 7f ff ff 0f 0b e8 24 fd ff ff e9 a9 fd ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 2a 8b 47 1c 85 c0 74 1e f0 Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Ingo Molnar 提交于
The kernel's handling of 'compacted' xsave state layout is buggy: http://marc.info/?l=linux-kernel&m=142967852317199 I don't have such a system, and the description there is vague, but from extrapolation I guess that there were two kinds of bugs observed: - boot crashes, due to size calculations being wrong and the dynamic allocation allocating a too small xstate area. (This is now fixed in the new FPU code - but still present in stable kernels.) - FPU state corruption and ABI breakage: if signal handlers try to change the FPU state in standard format, which then the kernel tries to restore in the compacted format. These breakages are scary, but they only occur on a small number of systems that have XSAVES* CPU support. Yet we have had XSAVES support in the upstream kernel for a large number of stable kernel releases, and the fixes are involved and unproven. So do the safe resolution first: disable XSAVES* support and only use the standard xstate format. This makes the code work and is easy to backport. On top of this we can work on enabling (and testing!) proper compacted format support, without backporting pressure, on top of the new, cleaned up FPU code. Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 19 5月, 2015 6 次提交
-
-
由 Feng Wu 提交于
Show the statistics information for notification event and wakeup event for posted-interrupt in /proc/interrupts. [ tglx: Named the short identifiers PIN and PIW to match the long identifiers ] Signed-off-by: NFeng Wu <feng.wu@intel.com> Cc: jiang.liu@linux.intel.com Link: http://lkml.kernel.org/r/1432026437-16560-5-git-send-email-feng.wu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Feng Wu 提交于
Currently, we use a global vector as the Posted-Interrupts Notification Event for all the vCPUs in the system. We need to introduce another global vector for VT-d Posted-Interrtups, which will be used to wakeup the sleep vCPU when an external interrupt from a direct-assigned device happens for that vCPU. [ tglx: Removed a gazillion of extra newlines ] Signed-off-by: NFeng Wu <feng.wu@intel.com> Cc: jiang.liu@linux.intel.com Link: http://lkml.kernel.org/r/1432026437-16560-4-git-send-email-feng.wu@intel.comSuggested-by: NYang Zhang <yang.z.zhang@intel.com> Acked-by: NH. Peter Anvin <hpa@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Feng Wu 提交于
Implement irq_set_vcpu_affinity for pci_msi_ir_controller. Signed-off-by: NFeng Wu <feng.wu@intel.com> Reviewed-by: NJiang Liu <jiang.liu@linux.intel.com> Link: http://lkml.kernel.org/r/1432026437-16560-3-git-send-email-feng.wu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Paul Gortmaker 提交于
This routine has been around for over a decade, but with EISA being dead and abandoned for about twice that long, the name can be kind of confusing. The function is going at the PIC Edge/Level Configuration Registers (ELCR), so rename it as such and mentally decouple it from the long since dead EISA bus. Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Reviewed-by: NMaciej W. Rozycki <macro@linux-mips.org> Acked-by: NPavel Machek <pavel@ucw.cz> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Len Brown <len.brown@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1431217657-934-1-git-send-email-paul.gortmaker@windriver.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Peter Zijlstra 提交于
Since set_mb() is really about an smp_mb() -- not a IO/DMA barrier like mb() rename it to match the recent smp_load_acquire() and smp_store_release(). Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Since we assume set_mb() to result in a single store followed by a full memory barrier, employ WRITE_ONCE(). Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 5月, 2015 2 次提交
-
-
由 Ingo Molnar 提交于
So while testing kernels using tools/kvm/ (kvmtool) I noticed that it booted super slow: [ 0.142991] Performance Events: no PMU driver, software events only. [ 0.149265] x86: Booting SMP configuration: [ 0.149765] .... node #0, CPUs: #1 [ 0.148304] kvm-clock: cpu 1, msr 2:1bfe9041, secondary cpu clock [ 10.158813] KVM setup async PF for cpu 1 [ 10.159000] #2 [ 10.159000] kvm-stealtime: cpu 1, msr 211a4d400 [ 10.158829] kvm-clock: cpu 2, msr 2:1bfe9081, secondary cpu clock [ 20.167805] KVM setup async PF for cpu 2 [ 20.168000] #3 [ 20.168000] kvm-stealtime: cpu 2, msr 211a8d400 [ 20.167818] kvm-clock: cpu 3, msr 2:1bfe90c1, secondary cpu clock [ 30.176902] KVM setup async PF for cpu 3 [ 30.177000] #4 [ 30.177000] kvm-stealtime: cpu 3, msr 211acd400 One CPU booted up per 10 seconds. With 120 CPUs that takes a while. Bisection pinpointed this commit: 853b160a ("Revert f5d6a52f ("x86/smpboot: Skip delays during SMP initialization similar to Xen")") But that commit just restores previous behavior, so it cannot cause the problem. After some head scratching it turns out that these two commits: 1a744cb3 ("x86/smp/boot: Remove 10ms delay from cpu_up() on modern processors") d68921f9 ("x86/smp/boot: Add cmdline "cpu_init_udelay=N" to specify cpu_up() delay") added the following code to smpboot.c: - mdelay(10); + mdelay(init_udelay); Note the mismatch in the units: the delay is called 'udelay' and is set to microseconds - while the function used here is actually 'mdelay', which counts in milliseconds ... So the delay for legacy systems is off by a factor of 1,000, so instead of 10 msecs we waited for 10 seconds ... The reason bisection pointed to 853b160a was that 853b160a removed a (broken) boot-time speedup patch, which masked the factor 1,000 bug. Fix it by using udelay(). This fixes my bootup problems. Cc: Len Brown <len.brown@intel.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Derek noticed that a critical MCE gets reported with the wrong error type description: [Hardware Error]: CPU 34: Machine Check Exception: 5 Bank 9: f200003f000100b0 [Hardware Error]: RIP !INEXACT! 10:<ffffffff812e14c1> {intel_idle+0xb1/0x170} [Hardware Error]: TSC 49587b8e321cb [Hardware Error]: PROCESSOR 0:306e4 TIME 1431561296 SOCKET 1 APIC 29 [Hardware Error]: Some CPUs didn't answer in synchronization [Hardware Error]: Machine check: Invalid ^^^^^^^ The last line with 'Invalid' should have printed the high level MCE error type description we get from mce_severity, i.e. something like: [Hardware Error]: Machine check: Action required: data load error in a user process this happens due to the fact that mce_no_way_out() iterates over all MCA banks and possibly overwrites the @msg argument which is used in the panic printing later. Change behavior to take the message of only and the (last) critical MCE it detects. Reported-by: NDerek <denc716@gmail.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1431936437-25286-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 17 5月, 2015 3 次提交
-
-
由 Denys Vlasenko 提交于
The "movw %ds,%cx" instruction needs a 0x66 prefix, while "movl %ds,%ecx" does not. The difference is that latter form (on 64-bit CPUs) overwrites the entire %ecx, not only its lower half. But subsequent code doesn't depend on the value of upper half of %ecx, so we can safely use the shorter instruction. The new code is also faster than the old one - now we don't depend on the old value of %ecx, but this code fragment is not performance-critical so it does not matter much. Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1431722346-26585-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Make the disassembly look less confusing: -- head_64.o.before.asm ++ head_64.o.after.asm 0000000000000120 <early_idt_handler>: 120: fc cld 121: 83 3c 24 02 cmpl $0x2,(%rsp) - 125: 0f 84 9d 00 00 00 je 1c8 <is_nmi> + 125: 0f 84 9d 00 00 00 je 1c8 <early_idt_handler+0xa8> 12b: 83 3d 00 00 00 00 02 cmpl $0x2,0x0(%rip) # 132 <early_idt_handler+0x12> 132: 74 7e je 1b2 <early_idt_handler+0x92> 134: ff 05 00 00 00 00 incl 0x0(%rip) # 13a <early_idt_handler+0x1a> @@ -1198,9 +1198,7 @@ Disassembly of section .init.text: 1bf: 5a pop %rdx 1c0: 59 pop %rcx 1c1: 58 pop %rax - 1c2: ff 0d 00 00 00 00 decl 0x0(%rip) # 1c8 <is_nmi> - -00000000000001c8 <is_nmi>: + 1c2: ff 0d 00 00 00 00 decl 0x0(%rip) # 1c8 <early_idt_handler+0xa8> 1c8: 48 83 c4 10 add $0x10,%rsp 1cc: 48 cf iretq -- head_32.o.before.asm ++ head_32.o.after.asm 0000016c <early_idt_handler>: 16c: fc cld 16d: 83 3c 24 02 cmpl $0x2,(%esp) - 171: 74 73 je 1e6 <is_nmi> + 171: 74 73 je 1e6 <ex_entry+0xc> 173: 36 83 3d 00 00 00 00 cmpl $0x2,%ss:0x0 17a: 02 17b: 74 5a je 1d7 <hlt_loop> @@ -483,8 +483,6 @@ Disassembly of section .init.text: 1dd: 59 pop %ecx 1de: 58 pop %eax 1df: 36 ff 0d 00 00 00 00 decl %ss:0x0 - -000001e6 <is_nmi>: 1e6: 83 c4 08 add $0x8,%esp 1e9: cf iret 1ea: 66 90 xchg %ax,%ax No functionality change. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431793079-11153-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Ingo Molnar 提交于
Packing loops tightly (-falign-loops=1) is beneficial to code size: text data bss dec filename 12566391 1617840 1089536 15273767 vmlinux.align.16-byte 12224951 1617840 1089536 14932327 vmlinux.align.1-byte 11976567 1617840 1089536 14683943 vmlinux.align.1-byte.funcs-1-byte 11903735 1617840 1089536 14611111 vmlinux.align.1-byte.funcs-1-byte.loops-1-byte Which reduces the size of the kernel by another 0.6%, so the the total combined size reduction of the alignment-packing patches is ~5.5%. The x86 decoder bandwidth and caching arguments laid out in: be6cb027 ("x86: Align jump targets to 1-byte boundaries") apply to loop alignment as well. Furtermore, modern CPU uarchs have a loop cache/buffer that is a L0 cache before even any uop cache, covering a few dozen most recently executed instructions. This loop cache generally does not have the 16-byte alignment restrictions of the uop cache. Now loop alignment can still be beneficial if: - a loop is cache-hot and its surroundings are not. - if the loop is so cache hot that the instruction flow becomes x86 decoder bandwidth limited But loop alignment is harmful if: - a loop is cache-cold - a loop's surroundings are cache-hot as well - two cache-hot loops are close to each other - if the loop fits into the loop cache - if the code flow is not decoder bandwidth limited and I'd argue that the latter five scenarios are much more common in the kernel, as our hottest loops are typically: - pointer chasing: this should fit into the loop cache in most cases and is typically data cache and address generation limited - generic memory ops (memset, memcpy, etc.): these generally fit into the loop cache as well, and are likewise data cache limited. So this patch packs loop addresses tightly as well. Acked-by: NDenys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/20150410123017.GB19918@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 5月, 2015 3 次提交
-
-
由 Thomas Gleixner 提交于
smp.c and irq_work.c implement the same inline helper. Move it to apic.h and use it everywhere. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
-
由 Thomas Gleixner 提交于
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Ingo Molnar 提交于
The following NOP in a hot function caught my attention: > 5a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1) That's a dead NOP that bloats the function a bit, added for the default 16-byte alignment that GCC applies for jump targets. I realize that x86 CPU manufacturers recommend 16-byte jump target alignments (it's in the Intel optimization manual), to help their relatively narrow decoder prefetch alignment and uop cache constraints, but the cost of that is very significant: text data bss dec filename 12566391 1617840 1089536 15273767 vmlinux.align.16-byte 12224951 1617840 1089536 14932327 vmlinux.align.1-byte By using 1-byte jump target alignment (i.e. no alignment at all) we get an almost 3% reduction in kernel size (!) - and a probably similar reduction in I$ footprint. Now, the usual justification for jump target alignment is the following: - modern decoders tend to have 16-byte (effective) decoder prefetch windows. (AMD documents it higher but measurements suggest the effective prefetch window on curretn uarchs is still around 16 bytes) - on Intel there's also the uop-cache with cachelines that have 16-byte granularity and limited associativity. - older x86 uarchs had a penalty for decoder fetches that crossed 16-byte boundaries. These limits are mostly gone from recent uarchs. So if a forward jump target is aligned to cacheline boundary then prefetches will start from a new prefetch-cacheline and there's higher chance for decoding in fewer steps and packing tightly. But I think that argument is flawed for typical optimized kernel code flows: forward jumps often go to 'cold' (uncommon) pieces of code, and aligning cold code to cache lines does not bring a lot of advantages (they are uncommon), while it causes collateral damage: - their alignment 'spreads out' the cache footprint, it shifts followup hot code further out - plus it slows down even 'cold' code that immediately follows 'hot' code (like in the above case), which could have benefited from the partial cacheline that comes off the end of hot code. But even in the cache-hot case the 16 byte alignment brings disadvantages: - it spreads out the cache footprint, possibly making the code fall out of the L1 I$. - On Intel CPUs, recent microarchitectures have plenty of uop cache (typically doubling every 3 years) - while the size of the L1 cache grows much less aggressively. So workloads are rarely uop cache limited. The only situation where alignment might matter are tight loops that could fit into a single 16 byte chunk - but those are pretty rare in the kernel: if they exist they tend to be pointer chasing or generic memory ops, which both tend to be cache miss (or cache allocation) intensive and are not decoder bandwidth limited. So the balance of arguments strongly favors packing kernel instructions tightly versus maximizing for decoder bandwidth: this patch changes the jump target alignment from 16 bytes to 1 byte (tightly packed, unaligned). Acked-by: NDenys Vlasenko <dvlasenk@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Aswin Chandramouleeswaran <aswin@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Link: http://lkml.kernel.org/r/20150410120846.GA17101@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 14 5月, 2015 3 次提交
-
-
由 Borislav Petkov 提交于
Move __copy_user_nocache() to arch/x86/lib/copy_user_64.S and kill the containing file. No functionality change. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431538944-27724-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Pull it up into the header and kill duplicate versions. Separately, both macros are identical: 35948b2bd3431aee7149e85cfe4becbc /tmp/a 35948b2bd3431aee7149e85cfe4becbc /tmp/b Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431538944-27724-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
No code changed: # arch/x86/lib/copy_user_nocache_64.o: text data bss dec hex filename 390 0 0 390 186 copy_user_nocache_64.o.before 390 0 0 390 186 copy_user_nocache_64.o.after md5: 7fa0577b28700af89d3a67a8b590426e copy_user_nocache_64.o.before.asm 7fa0577b28700af89d3a67a8b590426e copy_user_nocache_64.o.after.asm Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1431538944-27724-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 13 5月, 2015 1 次提交
-
-
由 Thomas Gleixner 提交于
We removed the only user of this define in the rtmutex code. Get rid of it. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
-