1. 22 4月, 2019 1 次提交
    • D
      x86/kdump: Fall back to reserve high crashkernel memory · b9ac3849
      Dave Young 提交于
      crashkernel=xM tries to reserve memory for the crash kernel under 4G,
      which is enough, usually. But this could fail sometimes, for example
      when one tries to reserve a big chunk like 2G, for example.
      
      So let the crashkernel=xM just fall back to use high memory in case it
      fails to find a suitable low range. Do not set the ,high as default
      because it allocates extra low memory for DMA buffers and swiotlb, and
      this is not always necessary for all machines.
      
      Typically, crashkernel=128M usually works with low reservation under 4G,
      so keep <4G as default.
      
       [ bp: Massage. ]
      Signed-off-by: NDave Young <dyoung@redhat.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NBaoquan He <bhe@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: linux-doc@vger.kernel.org
      Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: piliu@redhat.com
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Sinan Kaya <okaya@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thymo van Beers <thymovanbeers@gmail.com>
      Cc: vgoyal@redhat.com
      Cc: x86-ml <x86@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Zhimin Gu <kookoo.gu@intel.com>
      Link: https://lkml.kernel.org/r/20190422031905.GA8387@dhcp-128-65.nay.redhat.com
      b9ac3849
  2. 26 2月, 2019 1 次提交
  3. 22 2月, 2019 1 次提交
  4. 14 2月, 2019 1 次提交
    • F
      async: Add cmdline option to specify drivers to be async probed · 1ea61b68
      Feng Tang 提交于
      Asynchronous driver probing can help much on kernel fastboot, and
      this option can provide a flexible way to optimize and quickly verify
      async driver probe.
      
      Also it will help in below cases:
      * Some driver actually covers several families of HWs, some of which
        could use async probing while others don't. So we can't simply
        turn on the PROBE_PREFER_ASYNCHRONOUS flag in driver, but use this
        cmdline option, like igb driver async patch discussed at
        https://www.spinics.net/lists/netdev/msg545986.html
      
      * For SOC (System on Chip) with multiple spi or i2c controllers, most
        of the slave spi/i2c devices will be assigned with fixed controller
        number, while async probing may make those controllers get different
        index for each boot, which prevents those controller drivers to be
        async probed. For platforms not using these spi/i2c slave devices,
        they can use this cmdline option to benefit from the async probing.
      Suggested-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: NFeng Tang <feng.tang@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1ea61b68
  5. 07 2月, 2019 2 次提交
  6. 06 2月, 2019 1 次提交
  7. 04 2月, 2019 1 次提交
    • A
      efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation · 69c1f396
      Ard Biesheuvel 提交于
      Move the x86 EFI earlyprintk implementation to a shared location under
      drivers/firmware and tweak it slightly so we can expose it as an earlycon
      implementation (which is generic) rather than earlyprintk (which is only
      implemented for a few architectures)
      
      This also involves switching to write-combine mappings by default (which
      is required on ARM since device mappings lack memory semantics, and so
      memcpy/memset may not be used on them), and adding support for shared
      memory framebuffers on cache coherent non-x86 systems (which do not
      tolerate mismatched attributes).
      
      Note that 32-bit ARM does not populate its struct screen_info early
      enough for earlycon=efifb to work, so it is disabled there.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: NAlexander Graf <agraf@suse.de>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
      Cc: Jeffrey Hugo <jhugo@codeaurora.org>
      Cc: Lee Jones <lee.jones@linaro.org>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190202094119.13230-10-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      69c1f396
  8. 02 2月, 2019 1 次提交
  9. 31 1月, 2019 1 次提交
    • L
      iommu/vt-d: Leave scalable mode default off · 8950dcd8
      Lu Baolu 提交于
      Commit 765b6a98 ("iommu/vt-d: Enumerate the scalable
      mode capability") enables VT-d scalable mode if hardware
      advertises the capability. As we will bring up different
      features and use cases to upstream in different patch
      series, it will leave some intermediate kernel versions
      which support partial features. Hence, end user might run
      into problems when they use such kernels on bare metals
      or virtualization environments.
      
      This leaves scalable mode default off and end users could
      turn it on with "intel-iommu=sm_on" only when they have
      clear ideas about which scalable features are supported
      in the kernel.
      
      Cc: Liu Yi L <yi.l.liu@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Suggested-by: NAshok Raj <ashok.raj@intel.com>
      Suggested-by: NKevin Tian <kevin.tian@intel.com>
      Signed-off-by: NLu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      8950dcd8
  10. 26 1月, 2019 2 次提交
  11. 09 1月, 2019 1 次提交
  12. 05 1月, 2019 1 次提交
  13. 01 1月, 2019 1 次提交
  14. 29 12月, 2018 1 次提交
  15. 20 12月, 2018 1 次提交
  16. 11 12月, 2018 3 次提交
  17. 02 12月, 2018 1 次提交
    • P
      rcutorture: Remove cbflood facility · fc6f9c57
      Paul E. McKenney 提交于
      Now that the forward-progress code does a full-bore continuous callback
      flood lasting multiple seconds, there is little point in also posting a
      mere 60,000 callbacks every second or so.  This commit therefore removes
      the old cbflood testing.  Over time, it may be desirable to concurrently
      do full-bore continuous callback floods on all CPUs simultaneously, but
      one dragon at a time.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      fc6f9c57
  18. 01 12月, 2018 1 次提交
    • J
      psi: make disabling/enabling easier for vendor kernels · e0c27447
      Johannes Weiner 提交于
      Mel Gorman reports a hackbench regression with psi that would prohibit
      shipping the suse kernel with it default-enabled, but he'd still like
      users to be able to opt in at little to no cost to others.
      
      With the current combination of CONFIG_PSI and the psi_disabled bool set
      from the commandline, this is a challenge.  Do the following things to
      make it easier:
      
      1. Add a config option CONFIG_PSI_DEFAULT_DISABLED that allows distros
         to enable CONFIG_PSI in their kernel but leave the feature disabled
         unless a user requests it at boot-time.
      
         To avoid double negatives, rename psi_disabled= to psi=.
      
      2. Make psi_disabled a static branch to eliminate any branch costs
         when the feature is disabled.
      
      In terms of numbers before and after this patch, Mel says:
      
      : The following is a comparision using CONFIG_PSI=n as a baseline against
      : your patch and a vanilla kernel
      :
      :                          4.20.0-rc4             4.20.0-rc4             4.20.0-rc4
      :                 kconfigdisable-v1r1                vanilla        psidisable-v1r1
      : Amean     1       1.3100 (   0.00%)      1.3923 (  -6.28%)      1.3427 (  -2.49%)
      : Amean     3       3.8860 (   0.00%)      4.1230 *  -6.10%*      3.8860 (  -0.00%)
      : Amean     5       6.8847 (   0.00%)      8.0390 * -16.77%*      6.7727 (   1.63%)
      : Amean     7       9.9310 (   0.00%)     10.8367 *  -9.12%*      9.9910 (  -0.60%)
      : Amean     12     16.6577 (   0.00%)     18.2363 *  -9.48%*     17.1083 (  -2.71%)
      : Amean     18     26.5133 (   0.00%)     27.8833 *  -5.17%*     25.7663 (   2.82%)
      : Amean     24     34.3003 (   0.00%)     34.6830 (  -1.12%)     32.0450 (   6.58%)
      : Amean     30     40.0063 (   0.00%)     40.5800 (  -1.43%)     41.5087 (  -3.76%)
      : Amean     32     40.1407 (   0.00%)     41.2273 (  -2.71%)     39.9417 (   0.50%)
      :
      : It's showing that the vanilla kernel takes a hit (as the bisection
      : indicated it would) and that disabling PSI by default is reasonably
      : close in terms of performance for this particular workload on this
      : particular machine so;
      
      Link: http://lkml.kernel.org/r/20181127165329.GA29728@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Tested-by: NMel Gorman <mgorman@techsingularity.net>
      Reported-by: NMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e0c27447
  19. 28 11月, 2018 4 次提交
    • T
      x86/speculation: Provide IBPB always command line options · 55a97402
      Thomas Gleixner 提交于
      Provide the possibility to enable IBPB always in combination with 'prctl'
      and 'seccomp'.
      
      Add the extra command line options and rework the IBPB selection to
      evaluate the command instead of the mode selected by the STIPB switch case.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.de
      55a97402
    • T
      x86/speculation: Add seccomp Spectre v2 user space protection mode · 6b3e64c2
      Thomas Gleixner 提交于
      If 'prctl' mode of user space protection from spectre v2 is selected
      on the kernel command-line, STIBP and IBPB are applied on tasks which
      restrict their indirect branch speculation via prctl.
      
      SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it
      makes sense to prevent spectre v2 user space to user space attacks as
      well.
      
      The Intel mitigation guide documents how STIPB works:
          
         Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
         prevents the predicted targets of indirect branches on any logical
         processor of that core from being controlled by software that executes
         (or executed previously) on another logical processor of the same core.
      
      Ergo setting STIBP protects the task itself from being attacked from a task
      running on a different hyper-thread and protects the tasks running on
      different hyper-threads from being attacked.
      
      While the document suggests that the branch predictors are shielded between
      the logical processors, the observed performance regressions suggest that
      STIBP simply disables the branch predictor more or less completely. Of
      course the document wording is vague, but the fact that there is also no
      requirement for issuing IBPB when STIBP is used points clearly in that
      direction. The kernel still issues IBPB even when STIBP is used until Intel
      clarifies the whole mechanism.
      
      IBPB is issued when the task switches out, so malicious sandbox code cannot
      mistrain the branch predictor for the next user space task on the same
      logical processor.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.de
      
      6b3e64c2
    • T
      x86/speculation: Enable prctl mode for spectre_v2_user · 7cc765a6
      Thomas Gleixner 提交于
      Now that all prerequisites are in place:
      
       - Add the prctl command line option
      
       - Default the 'auto' mode to 'prctl'
      
       - When SMT state changes, update the static key which controls the
         conditional STIBP evaluation on context switch.
      
       - At init update the static key which controls the conditional IBPB
         evaluation on context switch.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.de
      
      7cc765a6
    • T
      x86/speculation: Add command line control for indirect branch speculation · fa1202ef
      Thomas Gleixner 提交于
      Add command line control for user space indirect branch speculation
      mitigations. The new option is: spectre_v2_user=
      
      The initial options are:
      
          -  on:   Unconditionally enabled
          - off:   Unconditionally disabled
          -auto:   Kernel selects mitigation (default off for now)
      
      When the spectre_v2= command line argument is either 'on' or 'off' this
      implies that the application to application control follows that state even
      if a contradicting spectre_v2_user= argument is supplied.
      Originally-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.082720373@linutronix.de
      fa1202ef
  20. 21 11月, 2018 2 次提交
    • P
      perf/x86/intel: Fix regression by default disabling perfmon v4 interrupt handling · 2a5bf23d
      Peter Zijlstra 提交于
      Kyle Huey reported that 'rr', a replay debugger, broke due to the following commit:
      
        af3bdb99 ("perf/x86/intel: Add a separate Arch Perfmon v4 PMI handler")
      
      Rework the 'disable_counter_freezing' __setup() parameter such that we
      can explicitly enable/disable it and switch to default disabled.
      
      To this purpose, rename the parameter to "perf_v4_pmi=" which is a much
      better description and allows requiring a bool argument.
      
      [ mingo: Improved the changelog some more. ]
      Reported-by: NKyle Huey <me@kylehuey.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert O'Callahan <robert@ocallahan.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/20181120170842.GZ2131@hirez.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2a5bf23d
    • W
      Documentation: Use "while" instead of "whilst" · 806654a9
      Will Deacon 提交于
      Whilst making an unrelated change to some Documentation, Linus sayeth:
      
        | Afaik, even in Britain, "whilst" is unusual and considered more
        | formal, and "while" is the common word.
        |
        | [...]
        |
        | Can we just admit that we work with computers, and we don't need to
        | use þe eald Englisc spelling of words that most of the world never
        | uses?
      
      dictionary.com refers to the word as "Chiefly British", which is
      probably an undesirable attribute for technical documentation.
      
      Replace all occurrences under Documentation/ with "while".
      
      Cc: David Howells <dhowells@redhat.com>
      Cc: Liam Girdwood <lgirdwood@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NJonathan Corbet <corbet@lwn.net>
      806654a9
  21. 13 11月, 2018 1 次提交
  22. 07 11月, 2018 1 次提交
    • K
      USB: Wait for extra delay time after USB_PORT_FEAT_RESET for quirky hub · 781f0766
      Kai-Heng Feng 提交于
      Devices connected under Terminus Technology Inc. Hub (1a40:0101) may
      fail to work after the system resumes from suspend:
      [  206.063325] usb 3-2.4: reset full-speed USB device number 4 using xhci_hcd
      [  206.143691] usb 3-2.4: device descriptor read/64, error -32
      [  206.351671] usb 3-2.4: device descriptor read/64, error -32
      
      Info for this hub:
      T:  Bus=03 Lev=01 Prnt=01 Port=01 Cnt=01 Dev#=  2 Spd=480 MxCh= 4
      D:  Ver= 2.00 Cls=09(hub  ) Sub=00 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=1a40 ProdID=0101 Rev=01.11
      S:  Product=USB 2.0 Hub
      C:  #Ifs= 1 Cfg#= 1 Atr=e0 MxPwr=100mA
      I:  If#= 0 Alt= 0 #EPs= 1 Cls=09(hub  ) Sub=00 Prot=00 Driver=hub
      
      Some expirements indicate that the USB devices connected to the hub are
      innocent, it's the hub itself is to blame. The hub needs extra delay
      time after it resets its port.
      
      Hence wait for extra delay, if the device is connected to this quirky
      hub.
      Signed-off-by: NKai-Heng Feng <kai.heng.feng@canonical.com>
      Cc: stable <stable@vger.kernel.org>
      Acked-by: NAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      781f0766
  23. 01 11月, 2018 1 次提交
  24. 27 10月, 2018 1 次提交
    • A
      mm: provide kernel parameter to allow disabling page init poisoning · f682a97a
      Alexander Duyck 提交于
      Patch series "Address issues slowing persistent memory initialization", v5.
      
      The main thing this patch set achieves is that it allows us to initialize
      each node worth of persistent memory independently.  As a result we reduce
      page init time by about 2 minutes because instead of taking 30 to 40
      seconds per node and going through each node one at a time, we process all
      4 nodes in parallel in the case of a 12TB persistent memory setup spread
      evenly over 4 nodes.
      
      This patch (of 3):
      
      On systems with a large amount of memory it can take a significant amount
      of time to initialize all of the page structs with the PAGE_POISON_PATTERN
      value.  I have seen it take over 2 minutes to initialize a system with
      over 12TB of RAM.
      
      In order to work around the issue I had to disable CONFIG_DEBUG_VM and
      then the boot time returned to something much more reasonable as the
      arch_add_memory call completed in milliseconds versus seconds.  However in
      doing that I had to disable all of the other VM debugging on the system.
      
      In order to work around a kernel that might have CONFIG_DEBUG_VM enabled
      on a system that has a large amount of memory I have added a new kernel
      parameter named "vm_debug" that can be set to "-" in order to disable it.
      
      Link: http://lkml.kernel.org/r/20180925201921.3576.84239.stgit@localhost.localdomainReviewed-by: NPavel Tatashin <pavel.tatashin@microsoft.com>
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f682a97a
  25. 11 10月, 2018 1 次提交
  26. 09 10月, 2018 1 次提交
  27. 03 10月, 2018 3 次提交
  28. 02 10月, 2018 1 次提交
    • A
      perf/x86/intel: Add a separate Arch Perfmon v4 PMI handler · af3bdb99
      Andi Kleen 提交于
      Implements counter freezing for Arch Perfmon v4 (Skylake and
      newer). This allows to speed up the PMI handler by avoiding
      unnecessary MSR writes and make it more accurate.
      
      The Arch Perfmon v4 PMI handler is substantially different than
      the older PMI handler.
      
      Differences to the old handler:
      
      - It relies on counter freezing, which eliminates several MSR
        writes from the PMI handler and lowers the overhead significantly.
      
        It makes the PMI handler more accurate, as all counters get
        frozen atomically as soon as any counter overflows. So there is
        much less counting of the PMI handler itself.
      
        With the freezing we don't need to disable or enable counters or
        PEBS. Only BTS which does not support auto-freezing still needs to
        be explicitly managed.
      
      - The PMU acking is done at the end, not the beginning.
        This makes it possible to avoid manual enabling/disabling
        of the PMU, instead we just rely on the freezing/acking.
      
      - The APIC is acked before reenabling the PMU, which avoids
        problems with LBRs occasionally not getting unfreezed on Skylake.
      
      - Looping is only needed to workaround a corner case which several PMIs
        are very close to each other. For common cases, the counters are freezed
        during PMI handler. It doesn't need to do re-check.
      
      This patch:
      
      - Adds code to enable v4 counter freezing
      - Fork <=v3 and >=v4 PMI handlers into separate functions.
      - Add kernel parameter to disable counter freezing. It took some time to
        debug counter freezing, so in case there are new problems we added an
        option to turn it off. Would not expect this to be used until there
        are new bugs.
      - Only for big core. The patch for small core will be posted later
        separately.
      
      Performance:
      
      When profiling a kernel build on Kabylake with different perf options,
      measuring the length of all NMI handlers using the nmi handler
      trace point:
      
      V3 is without counter freezing.
      V4 is with counter freezing.
      The value is the average cost of the PMI handler.
      (lower is better)
      
      perf options    `           V3(ns) V4(ns)  delta
      -c 100000                   1088   894     -18%
      -g -c 100000                1862   1646    -12%
      --call-graph lbr -c 100000  3649   3367    -8%
      --c.g. dwarf -c 100000      2248   1982    -12%
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/1533712328-2834-2-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      af3bdb99
  29. 01 10月, 2018 1 次提交
  30. 21 9月, 2018 1 次提交