1. 10 12月, 2016 2 次提交
  2. 08 10月, 2016 2 次提交
    • T
      x86/apic: Prevent pointless warning messages · df610d67
      Thomas Gleixner 提交于
      Markus reported that he sees new warnings:
      
        APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 4/0x84 ignored.
        APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 5/0x85 ignored.
      
      This comes from the recent persistant cpuid - nodeid changes. The code
      which emits the warning has been called prior to these changes only for
      enabled processors. Now it's called for disabled processors as well to get
      the possible cpu accounting correct. So if the kernel is compiled for the
      number of actual available/enabled CPUs and the BIOS reports disabled CPUs
      as well then the above warnings are printed.
      
      That's a pointless exercise as it only makes sense if there are more CPUs
      enabled than the kernel supports.
      
      Nake the warning conditional on enabled processors so we are back to the
      state before these changes.
      
      Fixes: 8f54969d ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") 
      Reported-and-tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: linux-acpi@vger.kernel.org
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      df610d67
    • C
      nmi_backtrace: add more trigger_*_cpu_backtrace() methods · 9a01c3ed
      Chris Metcalf 提交于
      Patch series "improvements to the nmi_backtrace code" v9.
      
      This patch series modifies the trigger_xxx_backtrace() NMI-based remote
      backtracing code to make it more flexible, and makes a few small
      improvements along the way.
      
      The motivation comes from the task isolation code, where there are
      scenarios where we want to be able to diagnose a case where some cpu is
      about to interrupt a task-isolated cpu.  It can be helpful to see both
      where the interrupting cpu is, and also an approximation of where the
      cpu that is being interrupted is.  The nmi_backtrace framework allows us
      to discover the stack of the interrupted cpu.
      
      I've tested that the change works as desired on tile, and build-tested
      x86, arm, mips, and sparc64.  For x86 I confirmed that the generic
      cpuidle stuff as well as the architecture-specific routines are in the
      new cpuidle section.  For arm, mips, and sparc I just build-tested it
      and made sure the generic cpuidle routines were in the new cpuidle
      section, but I didn't attempt to figure out which the platform-specific
      idle routines might be.  That might be more usefully done by someone
      with platform experience in follow-up patches.
      
      This patch (of 4):
      
      Currently you can only request a backtrace of either all cpus, or all
      cpus but yourself.  It can also be helpful to request a remote backtrace
      of a single cpu, and since we want that, the logical extension is to
      support a cpumask as the underlying primitive.
      
      This change modifies the existing lib/nmi_backtrace.c code to take a
      cpumask as its basic primitive, and modifies the linux/nmi.h code to use
      the new "cpumask" method instead.
      
      The existing clients of nmi_backtrace (arm and x86) are converted to
      using the new cpumask approach in this change.
      
      The other users of the backtracing API (sparc64 and mips) are converted
      to use the cpumask approach rather than the all/allbutself approach.
      The mips code ignored the "include_self" boolean but with this change it
      will now also dump a local backtrace if requested.
      
      Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.comSigned-off-by: NChris Metcalf <cmetcalf@mellanox.com>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Reviewed-by: NAaron Tomlin <atomlin@redhat.com>
      Reviewed-by: NPetr Mladek <pmladek@suse.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a01c3ed
  3. 04 10月, 2016 1 次提交
    • M
      x86/irq: Prevent force migration of irqs which are not in the vector domain · db91aa79
      Mika Westerberg 提交于
      When a CPU is about to be offlined we call fixup_irqs() that resets IRQ
      affinities related to the CPU in question. The same thing is also done when
      the system is suspended to S-states like S3 (mem).
      
      For each IRQ we try to complete any on-going move regardless whether the
      IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
      its chip_data, assume it is of type struct apic_chip_data and manipulate it
      by clearing old_domain mask etc. For irq_chips that are not part of the
      x86_vector_domain, like those created by various GPIO drivers, will find
      their chip_data being changed unexpectly.
      
      Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets
      corrupted after resume:
      
        # cat /sys/kernel/debug/gpio
        gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
         gpio-511 (                    |sysfs               ) in  hi
      
        # rtcwake -s10 -mmem
        <10 seconds passes>
      
        # cat /sys/kernel/debug/gpio
        gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
         gpio-511 (                    |sysfs               ) in  ?
      
      Note '?' in the output. It means the struct gpio_chip ->get function is
      NULL whereas before suspend it was there.
      
      Fix this by first checking that the IRQ belongs to x86_vector_domain before
      we try to use the chip_data as struct apic_chip_data.
      Reported-and-tested-by: NSakari Ailus <sakari.ailus@linux.intel.com>
      Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Cc: stable@vger.kernel.org # 4.4+
      Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      db91aa79
  4. 27 9月, 2016 1 次提交
  5. 22 9月, 2016 2 次提交
    • G
      x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping · 8f54969d
      Gu Zheng 提交于
      The whole patch-set aims at making cpuid <-> nodeid mapping persistent. So that,
      when node online/offline happens, cache based on cpuid <-> nodeid mapping such as
      wq_numa_possible_cpumask will not cause any problem.
      It contains 4 steps:
      1. Enable apic registeration flow to handle both enabled and disabled cpus.
      2. Introduce a new array storing all possible cpuid <-> apicid mapping.
      3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
      4. Establish all possible cpuid <-> nodeid mapping.
      
      This patch finishes step 2.
      
      In this patch, we introduce a new static array named cpuid_to_apicid[],
      which is large enough to store info for all possible cpus.
      
      And then, we modify the cpuid calculation. In generic_processor_info(),
      it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid
      mapping changes with node hotplug.
      
      After this patch, we find the next unused cpuid, map it to an apicid,
      and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
      mapping will be persistent.
      
      And finally we will use this array to make cpuid <-> nodeid persistent.
      
      cpuid <-> apicid mapping is established at local apic registeration time.
      But non-present or disabled cpus are ignored.
      
      In this patch, we establish all possible cpuid <-> apicid mapping when
      registering local apic.
      Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: NZhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Signed-off-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: mika.j.penttila@gmail.com
      Cc: len.brown@intel.com
      Cc: rafael@kernel.org
      Cc: rjw@rjwysocki.net
      Cc: yasu.isimatu@gmail.com
      Cc: linux-mm@kvack.org
      Cc: linux-acpi@vger.kernel.org
      Cc: isimatu.yasuaki@jp.fujitsu.com
      Cc: gongzhaogang@inspur.com
      Cc: tj@kernel.org
      Cc: izumi.taku@jp.fujitsu.com
      Cc: cl@linux.com
      Cc: chen.tang@easystack.cn
      Cc: akpm@linux-foundation.org
      Cc: kamezawa.hiroyu@jp.fujitsu.com
      Cc: lenb@kernel.org
      Link: http://lkml.kernel.org/r/1472114120-3281-4-git-send-email-douly.fnst@cn.fujitsu.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      8f54969d
    • G
      x86/acpi: Enable acpi to register all possible cpus at boot time · f7c28833
      Gu Zheng 提交于
      cpuid <-> nodeid mapping is firstly established at boot time. And workqueue caches
      the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.
      
      When doing node online/offline, cpuid <-> nodeid mapping is established/destroyed,
      which means, cpuid <-> nodeid mapping will change if node hotplug happens. But
      workqueue does not update wq_numa_possible_cpumask.
      
      So here is the problem:
      
      Assume we have the following cpuid <-> nodeid in the beginning:
      
        Node | CPU
      
      ------------------------
      node 0 |  0-14, 60-74
      node 1 | 15-29, 75-89
      node 2 | 30-44, 90-104
      node 3 | 45-59, 105-119
      
      and we hot-remove node2 and node3, it becomes:
      
        Node | CPU
      ------------------------
      node 0 |  0-14, 60-74
      node 1 | 15-29, 75-89
      
      and we hot-add node4 and node5, it becomes:
      
        Node | CPU
      ------------------------
      node 0 |  0-14, 60-74
      node 1 | 15-29, 75-89
      node 4 | 30-59
      node 5 | 90-119
      
      But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the like.
      
      When a pool workqueue is initialized, if its cpumask belongs to a node, its
      pool->node will be mapped to that node. And memory used by this workqueue will
      also be allocated on that node.
      
      static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
      ...
              /* if cpumask is contained inside a NUMA node, we belong to that node */
              if (wq_numa_enabled) {
                      for_each_node(node) {
                              if (cpumask_subset(pool->attrs->cpumask,
                                                 wq_numa_possible_cpumask[node])) {
                                      pool->node = node;
                                      break;
                              }
                      }
              }
      
      Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node,
      which will lead to memory allocation failure:
      
       SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
        cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
        node 0: slabs: 6172, objs: 259224, free: 245741
        node 1: slabs: 3261, objs: 136962, free: 127656
      
      It happens here:
      
      create_worker(struct worker_pool *pool)
       |--> worker = alloc_worker(pool->node);
      
      static struct worker *alloc_worker(int node)
      {
              struct worker *worker;
      
              worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, useing the wrong node.
      
              ......
      
              return worker;
      }
      
      [Solution]
      
      There are four mappings in the kernel:
      1. nodeid (logical node id)   <->   pxm
      2. apicid (physical cpu id)   <->   nodeid
      3. cpuid (logical cpu id)     <->   apicid
      4. cpuid (logical cpu id)     <->   nodeid
      
      1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and nodeid <-> pxm
         mapping is setup at boot time. This mapping is persistent, won't change.
      
      2. apicid <-> nodeid mapping is setup using info in 1. The mapping is setup at boot
         time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
         persistent.
      
      3. cpuid <-> apicid mapping is setup at boot time and CPU hotadd time. cpuid is
         allocated, lower ids first, and released at CPU hotremove time, reused for other
         hotadded CPUs. So this mapping is not persistent.
      
      4. cpuid <-> nodeid mapping is also setup at boot time and CPU hotadd time, and
         cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.
      
      To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
      cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
      cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
      mapping. So the key point is obtaining all cpus' apicid.
      
      apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
      MADT (Multiple APIC Description Table). So we finish the job in the following steps:
      
      1. Enable apic registeration flow to handle both enabled and disabled cpus.
         This is done by introducing an extra parameter to generic_processor_info to let the
         caller control if disabled cpus are ignored.
      
      2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
         the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
         registering local apic. Store the mapping in this array.
      
      3. Enable _MAT and MADT relative apis to return non-present or disabled cpus' apicid.
         This is also done by introducing an extra parameter to these apis to let the caller
         control if disabled cpus are ignored.
      
      4. Establish all possible cpuid <-> nodeid mapping.
         This is done via an additional acpi namespace walk for processors.
      
      This patch finished step 1.
      Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: NZhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Signed-off-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: mika.j.penttila@gmail.com
      Cc: len.brown@intel.com
      Cc: rafael@kernel.org
      Cc: rjw@rjwysocki.net
      Cc: yasu.isimatu@gmail.com
      Cc: linux-mm@kvack.org
      Cc: linux-acpi@vger.kernel.org
      Cc: isimatu.yasuaki@jp.fujitsu.com
      Cc: gongzhaogang@inspur.com
      Cc: tj@kernel.org
      Cc: izumi.taku@jp.fujitsu.com
      Cc: cl@linux.com
      Cc: chen.tang@easystack.cn
      Cc: akpm@linux-foundation.org
      Cc: kamezawa.hiroyu@jp.fujitsu.com
      Cc: lenb@kernel.org
      Link: http://lkml.kernel.org/r/1472114120-3281-3-git-send-email-douly.fnst@cn.fujitsu.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      f7c28833
  6. 20 9月, 2016 2 次提交
  7. 14 9月, 2016 1 次提交
    • M
      x86: Clean up various simple wrapper functions · f148b41e
      Masahiro Yamada 提交于
      Remove unneeded variables and assignments.
      
      While we are here, let's fix the following as well:
      
        - Remove unnecessary parentheses
        - Remove unnecessary unsigned-suffix 'U' from constant values
        - Reword the comment in set_apic_id() (suggested by Thomas Gleixner)
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Daniel J Blueman <daniel@numascale.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Nathan Zimmer <nzimmer@sgi.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steffen Persvold <sp@numascale.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/1473573502-27954-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f148b41e
  8. 08 9月, 2016 1 次提交
  9. 24 8月, 2016 2 次提交
  10. 15 8月, 2016 1 次提交
    • B
      x86/apic, ACPI: Remove the repeated lapic address override entry parsing · 6de42119
      Baoquan He 提交于
      The ACPI MADT has a 32-bit field providing lapic address at which
      each processor can access its lapic information. MADT also contains
      an optional entry to provide a 64-bit address to override the 32-bit
      one. However the current code does the lapic address override entry
      parsing twice. One is in early_acpi_boot_init() because AMD NUMA need
      get boot_cpu_id earlier. The other is in acpi_boot_init() which parses
      all MADT entries.
      
      So in this patch we remove the repeated code in the 2nd part.
      
      Meanwhile print lapic override entry information like other MADT entry,
      this will be added to boot log.
      
      This patch is not supposed to change any runtime behavior, other than
      improving kernel messages.
      Signed-off-by: NBaoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-acpi@vger.kernel.org
      Cc: rjw@rjwysocki.net
      Link: http://lkml.kernel.org/r/1470985033-22493-2-git-send-email-bhe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6de42119
  11. 11 8月, 2016 1 次提交
  12. 10 8月, 2016 6 次提交
    • M
      x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems · 5a52e8f8
      Mike Travis 提交于
      The latest UV kernel support panics when RHEL7 kexec's the kdump kernel
      to make a dumpfile.  This patch fixes the problem by turning off all UV
      support if NUMA is off.
      Tested-by: NFrank Ramsay <framsay@sgi.com>
      Tested-by: NJohn Estabrook <estabrook@sgi.com>
      Signed-off-by: NMike Travis <travis@sgi.com>
      Reviewed-by: NDimitri Sivanich <sivanich@sgi.com>
      Reviewed-by: NNathan Zimmer <nzimmer@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160801184050.577755634@asylum.americas.sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5a52e8f8
    • M
      x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values · 22ac2bca
      Mike Travis 提交于
      There are some circumstances where the UV4 BIOS cannot provide the
      correct Proximity Node values to associate with specific Sockets and
      Physical Nodes.  The decision was made to remove these values from BIOS
      and for the kernel to get these values from the standard ACPI tables.
      Tested-by: NFrank Ramsay <framsay@sgi.com>
      Tested-by: NJohn Estabrook <estabrook@sgi.com>
      Signed-off-by: NMike Travis <travis@sgi.com>
      Reviewed-by: NDimitri Sivanich <sivanich@sgi.com>
      Reviewed-by: NNathan Zimmer <nzimmer@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160801184050.414210079@asylum.americas.sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      22ac2bca
    • M
      x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous · 054f621f
      Mike Travis 提交于
      The UV4 Socket IDs are not guaranteed to equate to Node values which
      can cause the GAM (Global Addressable Memory) table lookups to fail.
      Fix this by using an independent index into the GAM table instead of
      the Socket ID to reference the base address.
      Tested-by: NFrank Ramsay <framsay@sgi.com>
      Tested-by: NJohn Estabrook <estabrook@sgi.com>
      Signed-off-by: NMike Travis <travis@sgi.com>
      Reviewed-by: NDimitri Sivanich <sivanich@sgi.com>
      Reviewed-by: NNathan Zimmer <nzimmer@sgi.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Banman <abanman@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160801184050.048755337@asylum.americas.sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      054f621f
    • K
      x86: Apply more __ro_after_init and const · 404f6aac
      Kees Cook 提交于
      Guided by grsecurity's analogous __read_only markings in arch/x86,
      this applies several uses of __ro_after_init to structures that are
      only updated during __init, and const for some structures that are
      never updated.  Additionally extends __init markings to some functions
      that are only used during __init, and cleans up some missing C99 style
      static initializers.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Brown <david.brown@linaro.org>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20160808232906.GA29731@www.outflux.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      404f6aac
    • N
      x86/timers/apic: Inform TSC deadline clockevent device about recalibration · 6731b0d6
      Nicolai Stange 提交于
      This patch eliminates a source of imprecise APIC timer interrupts,
      which imprecision may result in double interrupts or even late
      interrupts.
      
      The TSC deadline clockevent devices' configuration and registration
      happens before the TSC frequency calibration is refined in
      tsc_refine_calibration_work().
      
      This results in the TSC clocksource and the TSC deadline clockevent
      devices being configured with slightly different frequencies: the former
      gets the refined one and the latter are configured with the inaccurate
      frequency detected earlier by means of the "Fast TSC calibration using PIT".
      
      Within the APIC code, introduce the notifier function
      lapic_update_tsc_freq() which reconfigures all per-CPU TSC deadline
      clockevent devices with the current tsc_khz.
      
      Call it from the TSC code after TSC calibration refinement has happened.
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christopher S. Hall <christopher.s.hall@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20160714152255.18295-3-nicstange@gmail.com
      [ Pushed #ifdef CONFIG_X86_LOCAL_APIC into header, improved changelog. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6731b0d6
    • N
      x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents... · 1a9e4c56
      Nicolai Stange 提交于
      x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
      
      I noticed the following bug/misbehavior on certain Intel systems: with a
      single task running on a NOHZ CPU on an Intel Haswell, I recognized
      that I did not only get the one expected local_timer APIC interrupt, but
      two per second at minimum. (!)
      
      Further tracing showed that the first one precedes the programmed deadline
      by up to ~50us and hence, it did nothing except for reprogramming the TSC
      deadline clockevent device to trigger shortly thereafter again.
      
      The reason for this is imprecise calibration, the timeout we program into
      the APIC results in 'too short' timer interrupts. The core (hr)timer code
      notices this (because it has a precise ktime source and sees the short
      interrupt) and fixes it up by programming an additional very short
      interrupt period.
      
      This is obviously suboptimal.
      
      The reason for the imprecise calibration is twofold, and this patch
      fixes the first reason:
      
      In setup_APIC_timer(), the registered clockevent device's frequency
      is calculated by first dividing tsc_khz by TSC_DIVISOR and multiplying
      it with 1000 afterwards:
      
        (tsc_khz / TSC_DIVISOR) * 1000
      
      The multiplication with 1000 is done for converting from kHz to Hz and the
      division by TSC_DIVISOR is carried out in order to make sure that the final
      result fits into an u32.
      
      However, with the order given in this calculation, the roundoff error
      introduced by the division gets magnified by a factor of 1000 by the
      following multiplication.
      
      To fix it, reversing the order of the division and the multiplication a la:
      
        (tsc_khz * 1000) / TSC_DIVISOR
      
      ... reduces the roundoff error already.
      
      Furthermore, if TSC_DIVISOR divides 1000, associativity holds:
      
        (tsc_khz * 1000) / TSC_DIVISOR = tsc_khz * (1000 / TSC_DIVISOR)
      
      and thus, the roundoff error even vanishes and the whole operation can be
      carried out within 32 bits.
      
      The powers of two that divide 1000 are 2, 4 and 8. A value of 8 for
      TSC_DIVISOR still allows for TSC frequencies up to
      2^32 / 10^9ns * 8 = 34.4GHz which is way larger than anything to expect
      in the next years.
      
      Thus we also replace the current TSC_DIVISOR value of 32 by 8. Reverse
      the order of the divison and the multiplication in the calculation of
      the registered clockevent device's frequency.
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christopher S. Hall <christopher.s.hall@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20160714152255.18295-2-nicstange@gmail.com
      [ Improved changelog. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1a9e4c56
  13. 04 8月, 2016 1 次提交
    • M
      tree-wide: replace config_enabled() with IS_ENABLED() · 97f2645f
      Masahiro Yamada 提交于
      The use of config_enabled() against config options is ambiguous.  In
      practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
      author might have used it for the meaning of IS_ENABLED().  Using
      IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
      clearer.
      
      This commit replaces config_enabled() with IS_ENABLED() where possible.
      This commit is only touching bool config options.
      
      I noticed two cases where config_enabled() is used against a tristate
      option:
      
       - config_enabled(CONFIG_HWMON)
        [ drivers/net/wireless/ath/ath10k/thermal.c ]
      
       - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
        [ drivers/gpu/drm/gma500/opregion.c ]
      
      I did not touch them because they should be converted to IS_BUILTIN()
      in order to keep the logic, but I was not sure it was the authors'
      intention.
      
      Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Nikolay Martynov <mar.kolya@gmail.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: James Cowgill <James.Cowgill@imgtec.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Tony Wu <tung7970@gmail.com>
      Cc: Huaitong Han <huaitong.han@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Cc: "Maciej W. Rozycki" <macro@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97f2645f
  14. 25 7月, 2016 1 次提交
  15. 19 7月, 2016 1 次提交
  16. 15 7月, 2016 2 次提交
  17. 14 7月, 2016 1 次提交
    • P
      x86/kernel: Audit and remove any unnecessary uses of module.h · 186f4360
      Paul Gortmaker 提交于
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  The advantage
      in doing so is that module.h itself sources about 15 other headers;
      adding significantly to what we feed cpp, and it can obscure what
      headers we are effectively using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
      for the presence of either and replace as needed.  Build testing
      revealed some implicit header usage that was fixed up accordingly.
      
      Note that some bool/obj-y instances remain since module.h is
      the header for some exception table entry stuff, and for things
      like __init_or_module (code that is tossed when MODULES=n).
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      186f4360
  18. 11 7月, 2016 1 次提交
    • I
      Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86" · 44530d58
      Ingo Molnar 提交于
      This reverts commit 2c95afc1.
      
      Stephane reported the following regression:
      
       > Since Andi added:
       >
       > commit 2c95afc1
       > Author: Andi Kleen <ak@linux.intel.com>
       > Date:   Thu Jun 9 06:14:38 2016 -0700
       >
       >    perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
       >
       > $ perf stat -e ref-cycles ls
       >   <not counted> ....
       >
       > fails systematically because the ref-cycles is now used by the
       > watchdog and given this is a system-wide pinned event, it monopolizes
       > the fixed counter 2 which is the only counter able to measure this event.
      
      Since the next merge window is near, fix the regression for now
      by reverting the commit.
      Reported-by: NStephane Eranian <eranian@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      44530d58
  19. 07 7月, 2016 1 次提交
  20. 04 7月, 2016 1 次提交
  21. 14 6月, 2016 1 次提交
    • A
      perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86 · 2c95afc1
      Andi Kleen 提交于
      The NMI watchdog uses either the fixed cycles or a generic cycles
      counter. This causes a lot of conflicts with users of the PMU who want
      to run a full group including the cycles fixed counter, for example
      the --topdown support recently added to perf stat. The code needs to
      fall back to not use groups, which can cause measurement inaccuracy
      due to multiplexing errors.
      
      This patch switches the NMI watchdog to use reference cycles
      on Intel systems.  This is actually more accurate than cycles,
      because cycles can tick faster than the measured CPU Frequency
      due to Turbo mode.
      
      The ref cycles always tick at their frequency, or slower when
      the system is idling. That means the NMI watchdog can never
      expire too early, unlike with cycles.
      
      The reference cycles tick roughly at the frequency of the TSC,
      so the same period computation can be used.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/1465478079-19993-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2c95afc1
  22. 10 6月, 2016 3 次提交
  23. 21 5月, 2016 1 次提交
    • P
      printk/nmi: generic solution for safe printk in NMI · 42a0bb3f
      Petr Mladek 提交于
      printk() takes some locks and could not be used a safe way in NMI
      context.
      
      The chance of a deadlock is real especially when printing stacks from
      all CPUs.  This particular problem has been addressed on x86 by the
      commit a9edc880 ("x86/nmi: Perform a safe NMI stack trace on all
      CPUs").
      
      The patchset brings two big advantages.  First, it makes the NMI
      backtraces safe on all architectures for free.  Second, it makes all NMI
      messages almost safe on all architectures (the temporary buffer is
      limited.  We still should keep the number of messages in NMI context at
      minimum).
      
      Note that there already are several messages printed in NMI context:
      WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
      handlers.  These are not easy to avoid.
      
      This patch reuses most of the code and makes it generic.  It is useful
      for all messages and architectures that support NMI.
      
      The alternative printk_func is set when entering and is reseted when
      leaving NMI context.  It queues IRQ work to copy the messages into the
      main ring buffer in a safe context.
      
      __printk_nmi_flush() copies all available messages and reset the buffer.
      Then we could use a simple cmpxchg operations to get synchronized with
      writers.  There is also used a spinlock to get synchronized with other
      flushers.
      
      We do not longer use seq_buf because it depends on external lock.  It
      would be hard to make all supported operations safe for a lockless use.
      It would be confusing and error prone to make only some operations safe.
      
      The code is put into separate printk/nmi.c as suggested by Steven
      Rostedt.  It needs a per-CPU buffer and is compiled only on
      architectures that call nmi_enter().  This is achieved by the new
      HAVE_NMI Kconfig flag.
      
      The are MN10300 and Xtensa architectures.  We need to clean up NMI
      handling there first.  Let's do it separately.
      
      The patch is heavily based on the draft from Peter Zijlstra, see
      
        https://lkml.org/lkml/2015/6/10/327
      
      [arnd@arndb.de: printk-nmi: use %zu format string for size_t]
      [akpm@linux-foundation.org: min_t->min - all types are size_t here]
      Signed-off-by: NPetr Mladek <pmladek@suse.com>
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>	[arm part]
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jiri Kosina <jkosina@suse.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      42a0bb3f
  24. 05 5月, 2016 1 次提交
    • A
      x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init · 08914f43
      Alex Thorlton 提交于
      A while back the following commit:
      
        d394f2d9 ("x86/platform/UV: Remove EFI memmap quirk for UV2+")
      
      changed uv_system_init() to only call map_low_mmrs() on older UV1 hardware,
      which requires EFI_OLD_MEMMAP to be set in order to boot.
      
      The recent changes to the EFI memory mapping code in:
      
        d2f7cbe7 ("x86/efi: Runtime services virtual mapping")
      
      exposed some issues with the fact that we were relying on the EFI memory
      mapping mechanisms to map in our MMRs for us, after commit d394f2d9.
      
      Rather than revert the entire commit and go back to forcing
      EFI_OLD_MEMMAP on all UVs, we're going to add the call to map_low_mmrs()
      back into uv_system_init(), and then fix up our EFI runtime calls to use
      the appropriate page table.
      
      For now, UV2+ will still need efi=old_map to boot, but there will be
      other changes soon that should eliminate the need for this.
      Signed-off-by: NAlex Thorlton <athorlton@sgi.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Link: http://lkml.kernel.org/r/1462401592-120735-1-git-send-email-athorlton@sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      08914f43
  25. 04 5月, 2016 3 次提交