1. 21 7月, 2015 1 次提交
  2. 06 7月, 2015 4 次提交
  3. 25 6月, 2015 1 次提交
    • T
      mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute · fc6daaf9
      Tony Luck 提交于
      Some high end Intel Xeon systems report uncorrectable memory errors as a
      recoverable machine check.  Linux has included code for some time to
      process these and just signal the affected processes (or even recover
      completely if the error was in a read only page that can be replaced by
      reading from disk).
      
      But we have no recovery path for errors encountered during kernel code
      execution.  Except for some very specific cases were are unlikely to ever
      be able to recover.
      
      Enter memory mirroring. Actually 3rd generation of memory mirroing.
      
      Gen1: All memory is mirrored
      	Pro: No s/w enabling - h/w just gets good data from other side of the
      	     mirror
      	Con: Halves effective memory capacity available to OS/applications
      
      Gen2: Partial memory mirror - just mirror memory begind some memory controllers
      	Pro: Keep more of the capacity
      	Con: Nightmare to enable. Have to choose between allocating from
      	     mirrored memory for safety vs. NUMA local memory for performance
      
      Gen3: Address range partial memory mirror - some mirror on each memory
            controller
      	Pro: Can tune the amount of mirror and keep NUMA performance
      	Con: I have to write memory management code to implement
      
      The current plan is just to use mirrored memory for kernel allocations.
      This has been broken into two phases:
      
      1) This patch series - find the mirrored memory, use it for boot time
         allocations
      
      2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the
         unused mirrored memory from mm/memblock.c and only give it out to
         select kernel allocations (this is still being scoped because
         page_alloc.c is scary).
      
      This patch (of 3):
      
      Add extra "flags" to memblock to allow selection of memory based on
      attribute.  No functional changes
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Xiexiuqi <xiexiuqi@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fc6daaf9
  4. 09 6月, 2015 14 次提交
  5. 07 6月, 2015 9 次提交
    • T
      x86/mm/pat: Add set_memory_wt() for Write-Through type · 623dffb2
      Toshi Kani 提交于
      Now that reserve_ram_pages_type() accepts the WT type, add
      set_memory_wt(), set_memory_array_wt() and set_pages_array_wt()
      in order to be able to set memory to Write-Through page cache
      mode.
      
      Also, extend ioremap_change_attr() to accept the WT type.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Elliott@hp.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arnd@arndb.de
      Cc: hch@lst.de
      Cc: hmh@hmh.eng.br
      Cc: jgross@suse.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-nvdimm@lists.01.org
      Cc: stefan.bader@canonical.com
      Cc: yigal@plexistor.com
      Link: http://lkml.kernel.org/r/1433436928-31903-13-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      623dffb2
    • T
      x86/mm/pat: Extend set_page_memtype() to support Write-Through type · 35a5a104
      Toshi Kani 提交于
      As set_memory_wb() calls free_ram_pages_type(), which then calls
      set_page_memtype() with -1, _PGMT_DEFAULT is used for tracking
      the WB type. _PGMT_WB is defined but unused. Thus, rename
      _PGMT_DEFAULT to _PGMT_WB to clarify the usage, and release the
      slot used by _PGMT_WB.
      
      Furthermore, change free_ram_pages_type() to call
      set_page_memtype() with _PGMT_WB, and get_page_memtype() to
      return _PAGE_CACHE_MODE_WB for _PGMT_WB.
      
      Then, define _PGMT_WT in the freed slot. This allows
      set_page_memtype() to track the WT type.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Elliott@hp.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arnd@arndb.de
      Cc: hch@lst.de
      Cc: hmh@hmh.eng.br
      Cc: jgross@suse.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-nvdimm@lists.01.org
      Cc: stefan.bader@canonical.com
      Cc: yigal@plexistor.com
      Link: http://lkml.kernel.org/r/1433436928-31903-12-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      35a5a104
    • T
      x86/mm/pat: Add pgprot_writethrough() · d1b4bfbf
      Toshi Kani 提交于
      Add pgprot_writethrough() for setting page protection flags to
      Write-Through mode.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Elliott@hp.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arnd@arndb.de
      Cc: hch@lst.de
      Cc: hmh@hmh.eng.br
      Cc: jgross@suse.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-nvdimm@lists.01.org
      Cc: stefan.bader@canonical.com
      Cc: yigal@plexistor.com
      Link: http://lkml.kernel.org/r/1433436928-31903-11-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d1b4bfbf
    • T
      x86/mm, asm-generic: Add ioremap_wt() for creating Write-Through mappings · d838270e
      Toshi Kani 提交于
      Add ioremap_wt() for creating Write-Through mappings on x86. It
      follows the same model as ioremap_wc() for multi-arch support.
      Define ARCH_HAS_IOREMAP_WT in the x86 version of io.h to
      indicate that ioremap_wt() is implemented on x86.
      
      Also update the PAT documentation file to cover ioremap_wt().
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Elliott@hp.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arnd@arndb.de
      Cc: hch@lst.de
      Cc: hmh@hmh.eng.br
      Cc: jgross@suse.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-nvdimm@lists.01.org
      Cc: stefan.bader@canonical.com
      Cc: yigal@plexistor.com
      Link: http://lkml.kernel.org/r/1433436928-31903-8-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d838270e
    • T
      x86/mm/pat: Change reserve_memtype() for Write-Through type · 0d69bdff
      Toshi Kani 提交于
      When a target range is in RAM, reserve_ram_pages_type() verifies
      the requested type. Change it to fail WT and WP requests with
      -EINVAL since set_page_memtype() is limited to handle three
      types: WB, WC and UC-.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Elliott@hp.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arnd@arndb.de
      Cc: hch@lst.de
      Cc: hmh@hmh.eng.br
      Cc: jgross@suse.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-nvdimm@lists.01.org
      Cc: stefan.bader@canonical.com
      Cc: yigal@plexistor.com
      Link: http://lkml.kernel.org/r/1433436928-31903-6-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0d69bdff
    • T
      x86/mm/pat: Use 7th PAT MSR slot for Write-Through PAT type · d79a40ca
      Toshi Kani 提交于
      Assign Write-Through type to the PA7 slot in the PAT MSR when
      the processor is not affected by PAT errata. The PA7 slot is
      chosen to improve robustness in the presence of errata that
      might cause the high PAT bit to be ignored. This way a buggy PA7
      slot access will hit the PA3 slot, which is UC, so at worst we
      lose performance without causing a correctness issue.
      
      The following Intel processors are affected by the PAT errata.
      
        Errata               CPUID
        ----------------------------------------------------
        Pentium 2, A52       family 0x6, model 0x5
        Pentium 3, E27       family 0x6, model 0x7, 0x8
        Pentium 3 Xenon, G26 family 0x6, model 0x7, 0x8, 0xa
        Pentium M, Y26       family 0x6, model 0x9
        Pentium M 90nm, X9   family 0x6, model 0xd
        Pentium 4, N46       family 0xf, model 0x0
      
      Instead of making sharp boundary checks, we remain conservative
      and exclude all Pentium 2, 3, M and 4 family processors. For
      those, _PAGE_CACHE_MODE_WT is redirected to UC- per the default
      setup in __cachemode2pte_tbl[].
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Elliott@hp.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arnd@arndb.de
      Cc: hch@lst.de
      Cc: hmh@hmh.eng.br
      Cc: jgross@suse.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-nvdimm@lists.01.org
      Cc: stefan.bader@canonical.com
      Cc: yigal@plexistor.com
      Link: https://lkml.kernel.org/r/1433187393-22688-2-git-send-email-toshi.kani@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d79a40ca
    • B
      x86/mm/pat: Remove pat_enabled() checks · 7202fdb1
      Borislav Petkov 提交于
      Now that we emulate a PAT table when PAT is disabled, there's no
      need for those checks anymore as the PAT abstraction will handle
      those cases too.
      
      Based on a conglomerate patch from Toshi Kani.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NToshi Kani <toshi.kani@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Elliott@hp.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arnd@arndb.de
      Cc: hch@lst.de
      Cc: hmh@hmh.eng.br
      Cc: jgross@suse.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-nvdimm@lists.01.org
      Cc: stefan.bader@canonical.com
      Cc: yigal@plexistor.com
      Link: http://lkml.kernel.org/r/1433436928-31903-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7202fdb1
    • B
      x86/mm/pat: Emulate PAT when it is disabled · 9cd25aac
      Borislav Petkov 提交于
      In the case when PAT is disabled on the command line with
      "nopat" or when virtualization doesn't support PAT (correctly) -
      see
      
        9d34cfdf ("x86: Don't rely on VMWare emulating PAT MSR correctly").
      
      we emulate it using the PWT and PCD cache attribute bits. Get
      rid of boot_pat_state while at it.
      
      Based on a conglomerate patch from Toshi Kani.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NToshi Kani <toshi.kani@hp.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Elliott@hp.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arnd@arndb.de
      Cc: hch@lst.de
      Cc: hmh@hmh.eng.br
      Cc: konrad.wilk@oracle.com
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-nvdimm@lists.01.org
      Cc: stefan.bader@canonical.com
      Cc: yigal@plexistor.com
      Link: http://lkml.kernel.org/r/1433436928-31903-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9cd25aac
    • B
      x86/mm/pat: Untangle pat_init() · 9dac6290
      Borislav Petkov 提交于
      Split it into a BSP and AP version which makes the PAT
      initialization path actually readable again.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NToshi Kani <toshi.kani@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Elliott@hp.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arnd@arndb.de
      Cc: hch@lst.de
      Cc: hmh@hmh.eng.br
      Cc: jgross@suse.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: linux-nvdimm@lists.01.org
      Cc: stefan.bader@canonical.com
      Cc: yigal@plexistor.com
      Link: http://lkml.kernel.org/r/1433436928-31903-2-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9dac6290
  6. 03 6月, 2015 1 次提交
    • S
      x86/mm: Decouple <linux/vmalloc.h> from <asm/io.h> · d6472302
      Stephen Rothwell 提交于
      Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so
      remove it from there and fix up the resulting build problems
      triggered on x86 {64|32}-bit {def|allmod|allno}configs.
      
      The breakages were triggering in places where x86 builds relied
      on vmalloc() facilities but did not include <linux/vmalloc.h>
      explicitly and relied on the implicit inclusion via <asm/io.h>.
      
      Also add:
      
        - <linux/init.h> to <linux/io.h>
        - <asm/pgtable_types> to <asm/io.h>
      
      ... which were two other implicit header file dependencies.
      Suggested-by: NDavid Miller <davem@davemloft.net>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      [ Tidied up the changelog. ]
      Acked-by: NDavid Miller <davem@davemloft.net>
      Acked-by: NTakashi Iwai <tiwai@suse.de>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Acked-by: NVinod Koul <vinod.koul@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Colin Cross <ccross@android.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: James E.J. Bottomley <JBottomley@odin.com>
      Cc: Jaroslav Kysela <perex@perex.cz>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Suma Ramars <sramars@cisco.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      d6472302
  7. 28 5月, 2015 1 次提交
  8. 27 5月, 2015 5 次提交
    • L
      x86/mm/pat: Export pat_enabled() · fbe7193a
      Luis R. Rodriguez 提交于
      Two Linux device drivers cannot work with PAT and the work
      required to make them work is significant. There is not enough
      motivation to convert these drivers over to use PAT properly,
      the compromise reached is to let drivers that cannot be ported
      to PAT check if PAT was enabled and if so fail on probe with a
      recommendation to boot with the "nopat" kernel parameter.
      Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Walls <awalls@md.metrocast.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1430425520-22275-4-git-send-email-mcgrof@do-not-panic.com
      Link: http://lkml.kernel.org/r/1432628901-18044-14-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fbe7193a
    • L
      x86/mm/pat: Wrap pat_enabled into a function API · cb32edf6
      Luis R. Rodriguez 提交于
      We use pat_enabled in x86-specific code to see if PAT is enabled
      or not but we're granting full access to it even though readers
      do not need to set it. If, for instance, we granted access to it
      to modules later they then could override the variable
      setting... no bueno.
      
      This renames pat_enabled to a new static variable __pat_enabled.
      Folks are redirected to use pat_enabled() now.
      
      Code that sets this can only be internal to pat.c. Apart from
      the early kernel parameter "nopat" to disable PAT, we also have
      a few cases that disable it later and make use of a helper
      pat_disable(). It is wrapped under an ifdef but since that code
      cannot run unless PAT was enabled its not required to wrap it
      with ifdefs, unwrap that. Likewise, since "nopat" doesn't really
      change non-PAT systems just remove that ifdef as well.
      
      Although we could add and use an early_param_off(), these
      helpers don't use __read_mostly but we want to keep
      __read_mostly for __pat_enabled as this is a hot path -- upon
      boot, for instance, a simple guest may see ~4k accesses to
      pat_enabled(). Since __read_mostly early boot params are not
      that common we don't add a helper for them just yet.
      Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Walls <awalls@md.metrocast.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kyle McMartin <kyle@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1430425520-22275-3-git-send-email-mcgrof@do-not-panic.com
      Link: http://lkml.kernel.org/r/1432628901-18044-13-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cb32edf6
    • L
      x86/mm/pat: Convert to pr_*() usage · 9e76561f
      Luis R. Rodriguez 提交于
      Use pr_info() instead of the old printk to prefix the component
      where things are coming from. With this readers will know
      exactly where the message is coming from. We use pr_* helpers
      but define pr_fmt to the empty string for easier grepping for
      those error messages.
      
      We leave the users of dprintk() in place, this will print only
      when the debugpat kernel parameter is enabled. We want to leave
      those enabled as a debug feature, but also make them use the
      same prefix.
      Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com>
      [ Kill pr_fmt. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Walls <awalls@md.metrocast.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: cocci@systeme.lip6.fr
      Cc: plagnioj@jcrosoft.com
      Cc: tomi.valkeinen@ti.com
      Link: http://lkml.kernel.org/r/1430425520-22275-2-git-send-email-mcgrof@do-not-panic.com
      Link: http://lkml.kernel.org/r/1432628901-18044-9-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9e76561f
    • T
      x86/mm/mtrr: Enhance MTRR checks in kernel mapping helpers · b73522e0
      Toshi Kani 提交于
      This patch adds the argument 'uniform' to mtrr_type_lookup(),
      which gets set to 1 when a given range is covered uniformly by
      MTRRs, i.e. the range is fully covered by a single MTRR entry or
      the default type.
      
      Change pud_set_huge() and pmd_set_huge() to honor the 'uniform'
      flag to see if it is safe to create a huge page mapping in the
      range.
      
      This allows them to create a huge page mapping in a range
      covered by a single MTRR entry of any memory type. It also
      detects a non-optimal request properly. They continue to check
      with the WB type since it does not effectively change the
      uniform mapping even if a request spans multiple MTRR entries.
      
      pmd_set_huge() logs a warning message to a non-optimal request
      so that driver writers will be aware of such a case. Drivers
      should make a mapping request aligned to a single MTRR entry
      when the range is covered by MTRRs.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      [ Realign, flesh out comments, improve warning message. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Elliott@hp.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave.hansen@intel.com
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: pebolle@tiscali.nl
      Link: http://lkml.kernel.org/r/1431714237-880-7-git-send-email-toshi.kani@hp.com
      Link: http://lkml.kernel.org/r/1432628901-18044-8-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b73522e0
    • T
      x86/mm/mtrr: Use symbolic define as a retval for disabled MTRRs · 3d3ca416
      Toshi Kani 提交于
      mtrr_type_lookup() returns verbatim 0xFF when MTRRs are
      disabled. This patch defines MTRR_TYPE_INVALID to clarify the
      meaning of this value, and documents its usage.
      
      Document the return values of the kernel virtual address mapping
      helpers pud_set_huge(), pmd_set_huge, pud_clear_huge() and
      pmd_clear_huge().
      
      There is no functional change in this patch.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Elliott@hp.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave.hansen@intel.com
      Cc: linux-mm <linux-mm@kvack.org>
      Cc: pebolle@tiscali.nl
      Link: http://lkml.kernel.org/r/1431714237-880-5-git-send-email-toshi.kani@hp.com
      Link: http://lkml.kernel.org/r/1432628901-18044-5-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3d3ca416
  9. 19 5月, 2015 4 次提交
    • I
      x86/fpu: Harmonize FPU register state types · c47ada30
      Ingo Molnar 提交于
      Use these consistent names:
      
          struct fregs_state           # was: i387_fsave_struct
          struct fxregs_state          # was: i387_fxsave_struct
          struct swregs_state          # was: i387_soft_struct
          struct xregs_state           # was: xsave_struct
          union  fpregs_state          # was: thread_xstate
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c47ada30
    • I
      x86/fpu: Rename all the fpregs, xregs, fxregs and fregs handling functions · c6813144
      Ingo Molnar 提交于
      Standardize the naming of the various functions that copy register
      content in specific FPU context formats:
      
        copy_fxregs_to_kernel()         # was: fpu_fxsave()
        copy_xregs_to_kernel()          # was: xsave_state()
      
        copy_kernel_to_fregs()          # was: frstor_checking()
        copy_kernel_to_fxregs()         # was: fxrstor_checking()
        copy_kernel_to_xregs()          # was: fpu_xrstor_checking()
        copy_kernel_to_xregs_booting()  # was: xrstor_state_booting()
      
        copy_fregs_to_user()            # was: fsave_user()
        copy_fxregs_to_user()           # was: fxsave_user()
        copy_xregs_to_user()            # was: xsave_user()
      
        copy_user_to_fregs()            # was: frstor_user()
        copy_user_to_fxregs()           # was: fxrstor_user()
        copy_user_to_xregs()            # was: xrestore_user()
        copy_user_to_fpregs_zeroing()   # was: restore_user_xstate()
      
      Eliminate fpu_xrstor_checking(), because it was just a wrapper.
      
      No change in functionality.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c6813144
    • I
      x86/fpu: Simplify FPU handling by embedding the fpstate in task_struct (again) · 7366ed77
      Ingo Molnar 提交于
      So 6 years ago we made the FPU fpstate dynamically allocated:
      
        aa283f49 ("x86, fpu: lazy allocation of FPU area - v5")
        61c4628b ("x86, fpu: split FPU state from task struct - v5")
      
      In hindsight this was a mistake:
      
         - it complicated context allocation failure handling, such as:
      
      		/* kthread execs. TODO: cleanup this horror. */
      		if (WARN_ON(fpstate_alloc_init(fpu)))
      			force_sig(SIGKILL, tsk);
      
         - it caused us to enable irqs in fpu__restore():
      
                      local_irq_enable();
                      /*
                       * does a slab alloc which can sleep
                       */
                      if (fpstate_alloc_init(fpu)) {
                              /*
                               * ran out of memory!
                               */
                              do_group_exit(SIGKILL);
                              return;
                      }
                      local_irq_disable();
      
         - it (slightly) slowed down task creation/destruction by adding
           slab allocation/free pattens.
      
         - it made access to context contents (slightly) slower by adding
           one more pointer dereference.
      
      The motivation for the dynamic allocation was two-fold:
      
         - reduce memory consumption by non-FPU tasks
      
         - allocate and handle only the necessary amount of context for
           various XSAVE processors that have varying hardware frame
           sizes.
      
      These days, with glibc using SSE memcpy by default and GCC optimizing
      for SSE/AVX by default, the scope of FPU using apps on an x86 system is
      much larger than it was 6 years ago.
      
      For example on a freshly installed Fedora 21 desktop system, with a
      recent kernel, all non-kthread tasks have used the FPU shortly after
      bootup.
      
      Also, even modern embedded x86 CPUs try to support the latest vector
      instruction set - so they'll too often use the larger xstate frame
      sizes.
      
      So remove the dynamic allocation complication by embedding the FPU
      fpstate in task_struct again. This should make the FPU a lot more
      accessible to all sorts of atomic contexts.
      
      We could still optimize for the xstate frame size in the future,
      by moving the state structure to the last element of task_struct,
      and allocating only a part of that.
      
      This change is kept minimal by still keeping the ctx_alloc()/free()
      routines (that now do nothing substantial) - we'll remove them in
      the following patches.
      Reviewed-by: NBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      7366ed77
    • I
      x86/fpu: Rename fpu_save_init() to copy_fpregs_to_fpstate() · 4f836347
      Ingo Molnar 提交于
      So fpu_save_init() is a historic name that got its name when the only
      way the FPU state was FNSAVE, which cleared (well, destroyed) the FPU
      state after saving it.
      
      Nowadays the name is misleading, because ever since the introduction of
      FXSAVE (and more modern FPU saving instructions) the 'we need to reload
      the FPU state' part is only true if there's a pending FPU exception [*],
      which is almost never the case.
      
      So rename it to copy_fpregs_to_fpstate() to make it clear what's
      happening. Also add a few comments about why we cannot keep registers
      in certain cases.
      
      Also clean up the control flow a bit, to make it more apparent when
      we are dropping/keeping FP registers, and to optimize the common
      case (of keeping fpregs) some more.
      
      [*] Probably not true anymore, modern instructions always leave the FPU
          state intact, even if exceptions are pending: because pending FP
          exceptions are posted on the next FP instruction, not asynchronously.
      
          They were truly asynchronous back in the IRQ13 case, and we had to
          synchronize with them, but that code is not working anymore: we don't
          have IRQ13 mapped in the IDT anymore.
      
          But a cleanup patch is obviously not the place to change subtle behavior.
      Reviewed-by: NBorislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4f836347