1. 09 Jan, 2017: 1 commit
  2. 06 Jan, 2017: 1 commit
  3. 13 Dec, 2016: 1 commit
      x86/smpboot: Make logical package management more robust · 9d85eb91
      Committed by Thomas Gleixner
      The logical package management has several issues:
      
       - The APIC ids provided by ACPI are not required to be the same as the
         initial APIC id which can be retrieved by CPUID. The APIC ids provided
         by ACPI are those which are written by the BIOS into the APIC. The
         initial id is set by hardware and can not be changed. The hardware
         provided ids contain the real hardware package information.
      
         Especially AMD sets the effective APIC id different from the hardware id
         as they need to reserve space for the IOAPIC ids starting at id 0.
      
         As a consequence those machines trigger the currently active firmware
         bug printouts in dmesg. These are obviously wrong.
      
       - Virtual machines have their own interesting ways of enumerating APICs
         and packages, which are not reliably covered by the current implementation.
      
      The sizing of the mapping array has been tweaked to be generously large to
      handle systems which provide a wrong core count when HT is disabled so the
      whole magic which checks for space in the physical hotplug case is not
      needed anymore.
      
      Simplify the whole machinery and do the mapping when the CPU starts and the
      CPUID derived physical package information is available. This solves the
      observed problems on AMD machines and works for the virtualization issues
      as well.
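      A minimal userspace sketch of the idea, assuming the physical package id is the CPUID-derived initial APIC id shifted right by a hypothetical coreid_bits width: each CPU, as it starts, looks up its physical package and receives a logical package id the first time that package is seen. All names here (MAX_PHYS_PKGS, pkg_map_init(), phys_pkg_from_initial_apicid(), map_cpu_to_logical_pkg()) are illustrative, not the kernel's.

```c
#include <assert.h>

#define MAX_PHYS_PKGS  64   /* hypothetical bound, sized generously */
#define NO_LOGICAL_PKG (-1)

static int phys_to_logical_pkg[MAX_PHYS_PKGS];
static int logical_pkgs_used;

/* Reset the mapping: no physical package has a logical id yet. */
static void pkg_map_init(void)
{
    for (int i = 0; i < MAX_PHYS_PKGS; i++)
        phys_to_logical_pkg[i] = NO_LOGICAL_PKG;
    logical_pkgs_used = 0;
}

/* Physical package id derived from the hardware-provided initial APIC id. */
static int phys_pkg_from_initial_apicid(unsigned int initial_apicid,
                                        unsigned int coreid_bits)
{
    return (int)(initial_apicid >> coreid_bits);
}

/* Called while a CPU starts, once its CPUID-derived package id is known. */
static int map_cpu_to_logical_pkg(int phys_pkg)
{
    assert(phys_pkg >= 0 && phys_pkg < MAX_PHYS_PKGS);
    if (phys_to_logical_pkg[phys_pkg] == NO_LOGICAL_PKG)
        phys_to_logical_pkg[phys_pkg] = logical_pkgs_used++;
    return phys_to_logical_pkg[phys_pkg];
}
```

      Because the lookup starts from the initial APIC id rather than the id the BIOS wrote into the APIC, an offset such as the IOAPIC id reservation described above does not affect which CPUs end up in the same logical package.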
      
      Remove the extra call from the XEN CPU bringup code as it is no longer
      required.
      
      Fixes: d49597fd ("x86/cpu: Deal with broken firmware (VMWare/XEN)")
      Reported-and-tested-by: Borislav Petkov <bp@suse.de>
      Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: M. Vefa Bicakci <m.v.b@runbox.com>
      Cc: xen-devel <xen-devel@lists.xen.org>
      Cc: Charles (Chas) Williams <ciwillia@brocade.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1612121102260.3429@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  4. 10 Dec, 2016: 2 commits
  5. 10 Nov, 2016: 1 commit
      x86/apic: Prevent tracing on apic_msr_write_eoi() · 8ca22552
      Committed by Wanpeng Li
      The following RCU lockdep warning led to adding irq_enter()/irq_exit() into
      smp_reschedule_interrupt():
      
       RCU used illegally from idle CPU!
       rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       no locks held by swapper/1/0.
       
        do_trace_write_msr
        native_write_msr
        native_apic_msr_eoi_write
        smp_reschedule_interrupt
        reschedule_interrupt
      
      As Peterz pointed out:
      
      | So now we're making a very frequent interrupt slower because of debug 
      | code.
      |
      | The thing is, many many smp_reschedule_interrupt() invocations don't
      | actually execute anything much at all and are only sent to tickle the
      | return to user path (which does the actual preemption).
      | 
      | Having to do the whole irq_enter/irq_exit dance just for this unlikely
      | debug case totally blows.
      
      Use the wrmsr_notrace() variant in native_apic_msr_write_eoi, annotate the
      kvm variant with notrace and add a native_apic_eoi callback to the apic
      structure so KVM guests are covered as well.
      
      This allows reverting the irq_enter/irq_exit dance in
      smp_reschedule_interrupt().
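      A userspace model of the change, assuming a counter can stand in for the tracepoint inside the normal MSR write: the native EOI path uses the untraced write, and a paravirtual EOI that falls back to the native hook stays untraced as well. Every name below (raw_msr_write(), traced_wrmsr(), wrmsr_notrace_sketch(), struct apic_sketch, pv_eoi_pending) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static unsigned int trace_events;  /* bumped only by the traced write path */
static unsigned int eoi_writes;    /* counts writes that reach the "MSR" */
static uint64_t fake_eoi_msr;      /* stands in for the x2APIC EOI register */

static void raw_msr_write(uint64_t val) { fake_eoi_msr = val; eoi_writes++; }

/* Like wrmsr(): emits a trace event, which needs RCU to be watching. */
static void traced_wrmsr(uint64_t val) { trace_events++; raw_msr_write(val); }

/* Like wrmsr_notrace(): the same write, without the trace event. */
static void wrmsr_notrace_sketch(uint64_t val) { raw_msr_write(val); }

struct apic_sketch {
    void (*eoi_write)(void);         /* may be replaced by a PV variant */
    void (*native_eoi_write)(void);  /* always the native, untraced EOI */
};

static void old_msr_eoi_write(void)    { traced_wrmsr(0); }
static void native_msr_eoi_write(void) { wrmsr_notrace_sketch(0); }

static struct apic_sketch apic_ops = {
    .eoi_write        = native_msr_eoi_write,
    .native_eoi_write = native_msr_eoi_write,
};

static int pv_eoi_pending;  /* 1 if the host will accept a PV EOI */

/* PV EOI: skip the MSR write if possible, else fall back untraced. */
static void kvm_pv_eoi_write_sketch(void)
{
    if (pv_eoi_pending) {
        pv_eoi_pending = 0;
        return;
    }
    assert(apic_ops.native_eoi_write);
    apic_ops.native_eoi_write();
}
```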
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/1478488420-5982-3-git-send-email-wanpeng.li@hotmail.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 08 Oct, 2016: 1 commit
      x86/apic: Prevent pointless warning messages · df610d67
      Committed by Thomas Gleixner
      Markus reported that he sees new warnings:
      
        APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 4/0x84 ignored.
        APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 5/0x85 ignored.
      
      This comes from the recent persistent cpuid - nodeid changes. The code
      which emits the warning has been called prior to these changes only for
      enabled processors. Now it's called for disabled processors as well to get
      the possible cpu accounting correct. So if the kernel is compiled for the
      number of actual available/enabled CPUs and the BIOS reports disabled CPUs
      as well then the above warnings are printed.
      
      That's a pointless exercise as it only makes sense if there are more CPUs
      enabled than the kernel supports.
      
      Make the warning conditional on enabled processors so we are back to the
      state before these changes.
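      The resulting rule can be sketched as a predicate, assuming num_processors counts the CPUs already registered: a processor beyond the nr_cpu_ids limit is still ignored, but the warning is only worth printing when that processor is enabled. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Should registering this processor print the NR_CPUS/possible_cpus warning? */
static bool cpu_limit_warning_needed(unsigned int num_processors,
                                     unsigned int nr_cpu_ids, bool enabled)
{
    assert(nr_cpu_ids > 0);
    bool over_limit = num_processors >= nr_cpu_ids;  /* CPU gets ignored */
    return over_limit && enabled;  /* disabled CPUs are dropped silently */
}
```

      With a kernel built for 4 CPUs and firmware that also lists disabled processors 4 and 5, as in the report above, those processors hit the limit with enabled == false and no longer produce the message.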
      
      Fixes: 8f54969d ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") 
      Reported-and-tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: linux-acpi@vger.kernel.org
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  7. 27 Sep, 2016: 1 commit
  8. 22 Sep, 2016: 2 commits
      x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping · 8f54969d
      Committed by Gu Zheng
      The whole patch set aims to make the cpuid <-> nodeid mapping persistent, so
      that when a node goes online/offline, caches based on the cpuid <-> nodeid
      mapping, such as wq_numa_possible_cpumask, do not cause any problem.
      It contains 4 steps:
      1. Enable the apic registration flow to handle both enabled and disabled cpus.
      2. Introduce a new array storing all possible cpuid <-> apicid mappings.
      3. Enable the _MAT and MADT related APIs to return non-present or disabled cpus' apicid.
      4. Establish all possible cpuid <-> nodeid mappings.
      
      This patch finishes step 2.
      
      In this patch, we introduce a new static array named cpuid_to_apicid[],
      which is large enough to store info for all possible cpus.
      
      Then we modify the cpuid calculation. Currently, generic_processor_info()
      simply finds the next unused cpuid, which is also why the cpuid <-> nodeid
      mapping changes with node hotplug.
      
      After this patch, we find the next unused cpuid, map it to an apicid,
      and store the mapping in cpuid_to_apicid[], so that cpuid <-> apicid
      mapping will be persistent.
      
      And finally we will use this array to make cpuid <-> nodeid persistent.
      
      Currently, the cpuid <-> apicid mapping is established at local apic
      registration time, but non-present or disabled cpus are ignored.
      
      In this patch, we establish all possible cpuid <-> apicid mapping when
      registering local apic.
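      A userspace sketch of the persistent allocation, assuming a fixed-size table (SKETCH_NR_CPU_IDS and the function names are hypothetical): a cpuid is looked up by apicid first, so an apicid that comes back after hot-remove gets its old cpuid, and only apicids never seen before consume a new slot.

```c
#include <assert.h>

#define SKETCH_NR_CPU_IDS 8   /* hypothetical stand-in for nr_cpu_ids */

static int cpuid_to_apicid_sketch[SKETCH_NR_CPU_IDS];
static int nr_logical_cpuids;

static void cpuid_map_init(void)
{
    for (int i = 0; i < SKETCH_NR_CPU_IDS; i++)
        cpuid_to_apicid_sketch[i] = -1;   /* -1: slot unused */
    nr_logical_cpuids = 0;
}

/* Return the persistent cpuid for apicid, allocating one on first sight. */
static int allocate_logical_cpuid_sketch(int apicid)
{
    assert(apicid >= 0);
    for (int i = 0; i < nr_logical_cpuids; i++)
        if (cpuid_to_apicid_sketch[i] == apicid)
            return i;                     /* seen before: reuse its cpuid */

    if (nr_logical_cpuids >= SKETCH_NR_CPU_IDS)
        return -1;                        /* table full */

    cpuid_to_apicid_sketch[nr_logical_cpuids] = apicid;
    return nr_logical_cpuids++;
}
```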
      Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: mika.j.penttila@gmail.com
      Cc: len.brown@intel.com
      Cc: rafael@kernel.org
      Cc: rjw@rjwysocki.net
      Cc: yasu.isimatu@gmail.com
      Cc: linux-mm@kvack.org
      Cc: linux-acpi@vger.kernel.org
      Cc: isimatu.yasuaki@jp.fujitsu.com
      Cc: gongzhaogang@inspur.com
      Cc: tj@kernel.org
      Cc: izumi.taku@jp.fujitsu.com
      Cc: cl@linux.com
      Cc: chen.tang@easystack.cn
      Cc: akpm@linux-foundation.org
      Cc: kamezawa.hiroyu@jp.fujitsu.com
      Cc: lenb@kernel.org
      Link: http://lkml.kernel.org/r/1472114120-3281-4-git-send-email-douly.fnst@cn.fujitsu.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      x86/acpi: Enable acpi to register all possible cpus at boot time · f7c28833
      Committed by Gu Zheng
      The cpuid <-> nodeid mapping is first established at boot time, and workqueue
      caches the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.
      
      When a node goes online/offline, the cpuid <-> nodeid mapping is established/destroyed,
      which means the cpuid <-> nodeid mapping changes whenever node hotplug happens. But
      workqueue does not update wq_numa_possible_cpumask.
      
      So here is the problem:
      
      Assume we have the following cpuid <-> nodeid in the beginning:
      
        Node | CPU
      ------------------------
      node 0 |  0-14, 60-74
      node 1 | 15-29, 75-89
      node 2 | 30-44, 90-104
      node 3 | 45-59, 105-119
      
      and we hot-remove node2 and node3, it becomes:
      
        Node | CPU
      ------------------------
      node 0 |  0-14, 60-74
      node 1 | 15-29, 75-89
      
      and we hot-add node4 and node5, it becomes:
      
        Node | CPU
      ------------------------
      node 0 |  0-14, 60-74
      node 1 | 15-29, 75-89
      node 4 | 30-59
      node 5 | 90-119
      
      But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and so on.
      
      When a pool workqueue is initialized, if its cpumask belongs to a node, its
      pool->node will be mapped to that node. And memory used by this workqueue will
      also be allocated on that node.
      
      static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
      ...
              /* if cpumask is contained inside a NUMA node, we belong to that node */
              if (wq_numa_enabled) {
                      for_each_node(node) {
                              if (cpumask_subset(pool->attrs->cpumask,
                                                 wq_numa_possible_cpumask[node])) {
                                      pool->node = node;
                                      break;
                              }
                      }
              }
      
      Since wq_numa_possible_cpumask is not updated, it could be mapped to an offline node,
      which will lead to memory allocation failure:
      
       SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
        cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
        node 0: slabs: 6172, objs: 259224, free: 245741
        node 1: slabs: 3261, objs: 136962, free: 127656
      
      It happens here:
      
      create_worker(struct worker_pool *pool)
       |--> worker = alloc_worker(pool->node);
      
      static struct worker *alloc_worker(int node)
      {
              struct worker *worker;
      
              worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.
      
              ......
      
              return worker;
      }
      
      [Solution]
      
      There are four mappings in the kernel:
      1. nodeid (logical node id)   <->   pxm
      2. apicid (physical cpu id)   <->   nodeid
      3. cpuid (logical cpu id)     <->   apicid
      4. cpuid (logical cpu id)     <->   nodeid
      
      1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the nodeid <-> pxm
         mapping is set up at boot time. This mapping is persistent; it won't change.
      
      2. The apicid <-> nodeid mapping is set up using the info in 1. It is set up at boot
         time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
         persistent.
      
      3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time. cpuids are
         allocated lowest first, released at CPU hotremove time, and reused for other
         hotadded CPUs. So this mapping is not persistent.
      
      4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd time, and
         cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.
      
      To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
      cpus at boot time, and make it persistent. And according to init_cpu_to_node(),
      cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
      mapping. So the key point is obtaining all cpus' apicid.
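      Put as code, under the assumption that both inputs are plain arrays (the names are hypothetical, and -1 stands for an unknown apicid or NUMA_NO_NODE): once cpuid <-> apicid is known for every possible cpu, the cpuid <-> nodeid mapping follows by composing it with the persistent apicid <-> nodeid mapping.

```c
#include <assert.h>

#define NO_NODE (-1)   /* plays the role of NUMA_NO_NODE */

/* cpu_to_node[cpu] = apicid_to_node[cpuid_to_apicid[cpu]] for every possible cpu. */
static void build_cpu_to_node(const int cpuid_to_apicid[], int nr_cpus,
                              const int apicid_to_node[], int nr_apicids,
                              int cpu_to_node[])
{
    assert(nr_cpus >= 0 && nr_apicids >= 0);
    for (int cpu = 0; cpu < nr_cpus; cpu++) {
        int apicid = cpuid_to_apicid[cpu];
        cpu_to_node[cpu] = (apicid >= 0 && apicid < nr_apicids)
                               ? apicid_to_node[apicid]
                               : NO_NODE;   /* apicid not known (yet) */
    }
}
```

      If both inputs stay fixed across hotplug, so does the output, which is exactly why the steps below focus on obtaining every cpu's apicid.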
      
      apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
      MADT (Multiple APIC Description Table). So we finish the job in the following steps:
      
      1. Enable the apic registration flow to handle both enabled and disabled cpus.
         This is done by introducing an extra parameter to generic_processor_info to let the
         caller control if disabled cpus are ignored.
      
      2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
         the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
         registering local apic. Store the mapping in this array.
      
      3. Enable the _MAT and MADT related APIs to return non-present or disabled cpus' apicid.
         This is also done by introducing an extra parameter to these apis to let the caller
         control if disabled cpus are ignored.
      
      4. Establish all possible cpuid <-> nodeid mapping.
         This is done via an additional acpi namespace walk for processors.
      
      This patch finishes step 1.
      Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: mika.j.penttila@gmail.com
      Cc: len.brown@intel.com
      Cc: rafael@kernel.org
      Cc: rjw@rjwysocki.net
      Cc: yasu.isimatu@gmail.com
      Cc: linux-mm@kvack.org
      Cc: linux-acpi@vger.kernel.org
      Cc: isimatu.yasuaki@jp.fujitsu.com
      Cc: gongzhaogang@inspur.com
      Cc: tj@kernel.org
      Cc: izumi.taku@jp.fujitsu.com
      Cc: cl@linux.com
      Cc: chen.tang@easystack.cn
      Cc: akpm@linux-foundation.org
      Cc: kamezawa.hiroyu@jp.fujitsu.com
      Cc: lenb@kernel.org
      Link: http://lkml.kernel.org/r/1472114120-3281-3-git-send-email-douly.fnst@cn.fujitsu.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  9. 20 Sep, 2016: 1 commit
  10. 08 Sep, 2016: 1 commit
  11. 24 Aug, 2016: 2 commits
  12. 15 Aug, 2016: 1 commit
      x86/apic, ACPI: Remove the repeated lapic address override entry parsing · 6de42119
      Committed by Baoquan He
      The ACPI MADT has a 32-bit field providing the lapic address at which
      each processor can access its lapic information. MADT also contains
      an optional entry to provide a 64-bit address to override the 32-bit
      one. However, the current code parses the lapic address override entry
      twice: once in early_acpi_boot_init(), because AMD NUMA needs to get
      boot_cpu_id early, and again in acpi_boot_init(), which parses all
      MADT entries.
      
      So in this patch we remove the repeated code in the second place.
      
      Meanwhile, print the lapic address override entry information like the
      other MADT entries, so that it is added to the boot log.
      
      This patch is not supposed to change any runtime behavior, other than
      improving kernel messages.
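      The rule that the single remaining parse has to implement can be sketched as follows, assuming a reduced, hypothetical view of the parsed MADT: the 32-bit local APIC address from the table header is used unless a Local APIC Address Override entry supplies a 64-bit one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct madt_lapic_info {            /* hypothetical, reduced MADT view */
    uint32_t lapic_addr32;          /* 32-bit address field in the MADT header */
    bool     has_addr_override;     /* override entry present? */
    uint64_t lapic_addr_override;   /* 64-bit address from that entry */
};

/* Address at which each processor accesses its local APIC. */
static uint64_t effective_lapic_address(const struct madt_lapic_info *m)
{
    assert(m);
    return m->has_addr_override ? m->lapic_addr_override
                                : (uint64_t)m->lapic_addr32;
}
```

      Evaluating this once, early, and reusing the result is what makes the second parse in acpi_boot_init() redundant.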
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-acpi@vger.kernel.org
      Cc: rjw@rjwysocki.net
      Link: http://lkml.kernel.org/r/1470985033-22493-2-git-send-email-bhe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 10 Aug, 2016: 2 commits
      x86/timers/apic: Inform TSC deadline clockevent device about recalibration · 6731b0d6
      Committed by Nicolai Stange
      This patch eliminates a source of imprecise APIC timer interrupts;
      this imprecision may result in double interrupts or even late
      interrupts.
      
      The TSC deadline clockevent devices' configuration and registration
      happens before the TSC frequency calibration is refined in
      tsc_refine_calibration_work().
      
      This results in the TSC clocksource and the TSC deadline clockevent
      devices being configured with slightly different frequencies: the former
      gets the refined frequency, while the latter are configured with the
      inaccurate frequency detected earlier by means of the "Fast TSC calibration using PIT".
      
      Within the APIC code, introduce the notifier function
      lapic_update_tsc_freq() which reconfigures all per-CPU TSC deadline
      clockevent devices with the current tsc_khz.
      
      Call it from the TSC code after TSC calibration refinement has happened.
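      A userspace sketch of the notifier, assuming per-CPU devices can be modeled as an array and that TSC_DIVISOR is 8 as set by commit 1a9e4c56 listed below: every CPU that uses the TSC deadline clockevent is reconfigured with the refined tsc_khz, and other CPUs are left alone. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4
#define TSC_DIVISOR    8   /* value after commit 1a9e4c56 */

struct lapic_clockevent_sketch {
    bool     tsc_deadline;  /* this CPU uses the TSC deadline timer */
    uint32_t freq_hz;       /* frequency the device is configured with */
};

static struct lapic_clockevent_sketch lapic_events_sketch[SKETCH_NR_CPUS];

/* Reconfigure every TSC deadline device with the refined tsc_khz. */
static void lapic_update_tsc_freq_sketch(uint32_t tsc_khz)
{
    assert(tsc_khz <= UINT32_MAX / (1000 / TSC_DIVISOR));  /* fits in u32 */
    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        if (lapic_events_sketch[cpu].tsc_deadline)
            lapic_events_sketch[cpu].freq_hz = tsc_khz * (1000 / TSC_DIVISOR);
}
```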
      Signed-off-by: Nicolai Stange <nicstange@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christopher S. Hall <christopher.s.hall@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20160714152255.18295-3-nicstange@gmail.com
      [ Pushed #ifdef CONFIG_X86_LOCAL_APIC into header, improved changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error · 1a9e4c56
      Committed by Nicolai Stange
      
      I noticed the following bug/misbehavior on certain Intel systems: with a
      single task running on a NOHZ CPU on an Intel Haswell, I got not only
      the one expected local_timer APIC interrupt, but at least two per
      second. (!)
      
      Further tracing showed that the first one precedes the programmed deadline
      by up to ~50us and hence, it did nothing except for reprogramming the TSC
      deadline clockevent device to trigger shortly thereafter again.
      
      The reason for this is imprecise calibration: the timeout we program into
      the APIC results in 'too short' timer interrupts. The core (hr)timer code
      notices this (because it has a precise ktime source and sees the short
      interrupt) and fixes it up by programming an additional very short
      interrupt period.
      
      This is obviously suboptimal.
      
      The reason for the imprecise calibration is twofold, and this patch
      fixes the first reason:
      
      In setup_APIC_timer(), the registered clockevent device's frequency
      is calculated by first dividing tsc_khz by TSC_DIVISOR and multiplying
      it with 1000 afterwards:
      
        (tsc_khz / TSC_DIVISOR) * 1000
      
      The multiplication with 1000 is done for converting from kHz to Hz and the
      division by TSC_DIVISOR is carried out in order to make sure that the final
      result fits into a u32.
      
      However, with the order given in this calculation, the roundoff error
      introduced by the division gets magnified by a factor of 1000 by the
      following multiplication.
      
      To fix it, reversing the order of the division and the multiplication a la:
      
        (tsc_khz * 1000) / TSC_DIVISOR
      
      ... reduces the roundoff error already.
      
      Furthermore, if TSC_DIVISOR divides 1000, associativity holds:
      
        (tsc_khz * 1000) / TSC_DIVISOR = tsc_khz * (1000 / TSC_DIVISOR)
      
      and thus, the roundoff error even vanishes and the whole operation can be
      carried out within 32 bits.
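      The effect of the reordering can be checked with plain integer arithmetic. The sketch below assumes a hypothetical calibration result of tsc_khz = 2494345 and compares the old formula (TSC_DIVISOR 32, division first), the reordered one, and the new one with TSC_DIVISOR 8; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Old: divide first. The remainder of tsc_khz / 32 (up to 31 kHz) is lost
 * and then scaled by 1000, so up to ~969 Hz of the result disappears. */
static uint32_t clockevent_freq_old(uint32_t tsc_khz)
{
    return (tsc_khz / 32) * 1000;
}

/* Reordered, same divisor: truncation is now below 1 Hz. A 64-bit
 * intermediate is used because tsc_khz * 1000 overflows 32 bits above ~4.29 GHz. */
static uint64_t clockevent_freq_reordered(uint32_t tsc_khz)
{
    return ((uint64_t)tsc_khz * 1000) / 32;
}

/* New: TSC_DIVISOR = 8 divides 1000, so tsc_khz * 125 is exact in 32 bits. */
static uint32_t clockevent_freq_new(uint32_t tsc_khz)
{
    assert(tsc_khz <= UINT32_MAX / (1000 / 8));
    return tsc_khz * (1000 / 8);
}
```

      For this input the exact value of tsc_hz / 32 is 77948281.25 Hz, so the old formula (77948000) is off by about 281 Hz, roughly 3.6 ppm, the reordered one (77948281) by 0.25 Hz, and the new one is exact: 2494345000 / 8 = 311793125.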
      
      The powers of two that divide 1000 are 2, 4 and 8. A value of 8 for
      TSC_DIVISOR still allows for TSC frequencies up to
      2^32 Hz * 8 = 34.4GHz, which is far beyond anything to be expected
      in the coming years.
      
      Thus we also replace the current TSC_DIVISOR value of 32 by 8 and reverse
      the order of the division and the multiplication in the calculation of
      the registered clockevent device's frequency.
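      The 32-bit headroom quoted above can be recomputed directly. The helper below is hypothetical and only accepts divisors of 1000, the case in which the formula stays exact.

```c
#include <assert.h>
#include <stdint.h>

/* Largest tsc_khz for which tsc_khz * (1000 / divisor) still fits in a u32. */
static uint32_t max_tsc_khz(uint32_t divisor)
{
    assert(divisor != 0 && 1000 % divisor == 0);  /* exact only if divisor | 1000 */
    return UINT32_MAX / (1000 / divisor);
}
```

      max_tsc_khz(8) evaluates to 34359738 kHz, about 34.36 GHz, matching the 34.4GHz figure; with a divisor of 2 the limit would drop to about 8.59 GHz.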
      Signed-off-by: Nicolai Stange <nicstange@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christopher S. Hall <christopher.s.hall@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20160714152255.18295-2-nicstange@gmail.com
      [ Improved changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  14. 04 Aug, 2016: 1 commit
      tree-wide: replace config_enabled() with IS_ENABLED() · 97f2645f
      Committed by Masahiro Yamada
      The use of config_enabled() against config options is ambiguous.  In
      practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
      author might have used it for the meaning of IS_ENABLED().  Using
      IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
      clearer.
      
      This commit replaces config_enabled() with IS_ENABLED() where possible.
      This commit is only touching bool config options.
      
      I noticed two cases where config_enabled() is used against a tristate
      option:
      
       - config_enabled(CONFIG_HWMON)
        [ drivers/net/wireless/ath/ath10k/thermal.c ]
      
       - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
        [ drivers/gpu/drm/gma500/opregion.c ]
      
      I did not touch them because they should be converted to IS_BUILTIN()
      in order to keep the logic, but I was not sure it was the authors'
      intention.
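      The difference is visible in a small re-creation of the preprocessor trick behind these helpers. The sketch assumes Kconfig output where CONFIG_SKETCH_BUILTIN=y (defined to 1) and CONFIG_SKETCH_TRISTATE=m (which defines CONFIG_SKETCH_TRISTATE_MODULE instead); the option names and macro spellings are hypothetical, modeled on include/linux/kconfig.h rather than copied from it.

```c
#include <assert.h>

/* Pretend autoconf.h output for this sketch. */
#define CONFIG_SKETCH_BUILTIN          1   /* =y */
#define CONFIG_SKETCH_TRISTATE_MODULE  1   /* =m */
/* CONFIG_SKETCH_OFF is not defined at all (=n). */

/* IS_DEFINED(x) is 1 if the macro x is defined to 1, else 0. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED(x)        IS_DEFINED_(x)
#define IS_DEFINED_(val)     IS_DEFINED__(ARG_PLACEHOLDER_##val)
#define IS_DEFINED__(junk)   TAKE_SECOND_ARG(junk 1, 0, 0)

#define IS_BUILTIN(option)   IS_DEFINED(option)
#define IS_MODULE(option)    IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option)   (IS_BUILTIN(option) || IS_MODULE(option))
#define config_enabled(cfg)  IS_DEFINED(cfg)   /* old helper: same as IS_BUILTIN */

static void check_kconfig_sketch(void)
{
    assert(IS_BUILTIN(CONFIG_SKETCH_BUILTIN) && IS_ENABLED(CONFIG_SKETCH_BUILTIN));
    assert(IS_MODULE(CONFIG_SKETCH_TRISTATE) && IS_ENABLED(CONFIG_SKETCH_TRISTATE));
    assert(!config_enabled(CONFIG_SKETCH_TRISTATE));  /* =m reads as "off" */
    assert(!IS_ENABLED(CONFIG_SKETCH_OFF));
}
```

      This is why the two tristate call sites listed above were left alone: switching them to IS_ENABLED() would also turn on the =m case, which config_enabled() never did.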
      
      Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Nikolay Martynov <mar.kolya@gmail.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: James Cowgill <James.Cowgill@imgtec.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Tony Wu <tung7970@gmail.com>
      Cc: Huaitong Han <huaitong.han@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Cc: "Maciej W. Rozycki" <macro@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 25 Jul, 2016: 1 commit
  16. 14 Jul, 2016: 1 commit
      x86/kernel: Audit and remove any unnecessary uses of module.h · 186f4360
      Committed by Paul Gortmaker
      Historically a lot of these existed because we did not have
      a distinction between what was modular code and what was providing
      support to modules via EXPORT_SYMBOL and friends.  That changed
      when we forked out support for the latter into the export.h file.
      
      This means we should be able to reduce the usage of module.h
      in code that is obj-y Makefile or bool Kconfig.  The advantage
      in doing so is that module.h itself sources about 15 other headers;
      adding significantly to what we feed cpp, and it can obscure what
      headers we are effectively using.
      
      Since module.h was the source for init.h (for __init) and for
      export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
      for the presence of either and replace as needed.  Build testing
      revealed some implicit header usage that was fixed up accordingly.
      
      Note that some bool/obj-y instances remain since module.h is
      the header for some exception table entry stuff, and for things
      like __init_or_module (code that is tossed when MODULES=n).
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  17. 10 Jun, 2016: 1 commit
  18. 13 Apr, 2016: 2 commits
  19. 31 Mar, 2016: 1 commit
  20. 29 Feb, 2016: 1 commit
      x86/topology: Create logical package id · 1f12e32f
      Committed by Thomas Gleixner
      For per-package services we must be able to rely on the number of CPU
      packages being within bounds. Create a tracking facility, which
      
      - calculates the number of possible packages depending on nr_cpu_ids after boot
      
      - makes sure that the package id is within the number of possible packages. If
        the apic id is outside that range, we map it to a logical package id if there
        is enough space available.
      
      Provide interfaces for drivers to query the mapping and do translations from
      physical to logical ids.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Harish Chegondi <harish.chegondi@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160222221011.541071755@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  21. 24 Feb, 2016: 1 commit
  22. 19 Dec, 2015: 1 commit
      x86/apic: Introduce apic_extnmi command line parameter · b7c4948e
      Committed by Hidehiro Kawai
      This patch introduces a command line parameter apic_extnmi:
      
       apic_extnmi=( bsp|all|none )
      
      The default value is "bsp" and this is the current behavior: only the
      Boot-Strapping Processor receives an external NMI.
      
      "all" allows external NMIs to be broadcast to all CPUs. This would
      raise the success rate of panic on NMI when BSP hangs in NMI context
      or the external NMI is swallowed by other NMI handlers on the BSP.
      
      If you specify "none", no CPUs receive external NMIs. This is useful for
      the dump capture kernel so that it cannot be shot down by accidentally
      pressing the external NMI button (on platforms which have it) while
      saving a crash dump.
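      A userspace sketch of how the three values could be parsed and applied, assuming a boolean is_bsp per CPU; the enum, the function names, and the -1 error convention are hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum apic_extnmi_mode { EXTNMI_BSP, EXTNMI_ALL, EXTNMI_NONE };

/* Parse "bsp", "all" or "none"; returns 0 on success, -1 otherwise. */
static int parse_apic_extnmi_sketch(const char *arg, enum apic_extnmi_mode *mode)
{
    assert(mode);
    if (!arg)
        return -1;
    if (strcmp(arg, "bsp") == 0)
        *mode = EXTNMI_BSP;
    else if (strcmp(arg, "all") == 0)
        *mode = EXTNMI_ALL;
    else if (strcmp(arg, "none") == 0)
        *mode = EXTNMI_NONE;
    else
        return -1;          /* unknown value: *mode is left unchanged */
    return 0;
}

/* Does this CPU get external NMIs under the given mode? */
static bool cpu_gets_extnmi(enum apic_extnmi_mode mode, bool is_bsp)
{
    switch (mode) {
    case EXTNMI_ALL:  return true;    /* broadcast to every CPU */
    case EXTNMI_NONE: return false;   /* e.g. the dump capture kernel */
    case EXTNMI_BSP:
    default:          return is_bsp;  /* default: BSP only */
    }
}
```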
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Bandan Das <bsd@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: kexec@lists.infradead.org
      Cc: linux-doc@vger.kernel.org
      Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: x86-ml <x86@kernel.org>
      Link: http://lkml.kernel.org/r/20151210014632.25437.43778.stgit@softrs
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  23. 24 Nov, 2015: 1 commit
  24. 01 Oct, 2015: 1 commit
  25. 15 Sep, 2015: 1 commit
      x86/apic: Serialize LVTT and TSC_DEADLINE writes · 5d7c631d
      Committed by Shaohua Li
      The APIC LVTT register is MMIO mapped but the TSC_DEADLINE register is an
      MSR. The write to the TSC_DEADLINE MSR is not serializing, so it's not
      guaranteed that the write to LVTT has reached the APIC before the
      TSC_DEADLINE MSR is written. In such a case the write to the MSR is
      ignored and as a consequence the local timer interrupt never fires.
      
      The SDM describes this issue for both xAPIC and x2APIC modes. The
      serialization methods it recommends differ.
      
      xAPIC:
       "1. Memory-mapped write to LVT Timer Register, setting bits 18:17 to 10b.
        2. WRMSR to the IA32_TSC_DEADLINE MSR a value much larger than current time-stamp counter.
        3. If RDMSR of the IA32_TSC_DEADLINE MSR returns zero, go to step 2.
        4. WRMSR to the IA32_TSC_DEADLINE MSR the desired deadline."
      
      x2APIC:
       "To allow for efficient access to the APIC registers in x2APIC mode,
        the serializing semantics of WRMSR are relaxed when writing to the
        APIC registers. Thus, system software should not use 'WRMSR to APIC
        registers in x2APIC mode' as a serializing instruction. Read and write
        accesses to the APIC registers will occur in program order. A WRMSR to
        an APIC register may complete before all preceding stores are globally
        visible; software can prevent this by inserting a serializing
        instruction, an SFENCE, or an MFENCE before the WRMSR."
      
      The xAPIC method is to just wait for the memory mapped write to hit
      the LVTT by checking whether the MSR write has reached the hardware.
      There is no reason why a proper MFENCE after the memory mapped write would
      not do the same. Andi Kleen confirmed that MFENCE is sufficient for the
      xAPIC case as well.
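
      For comparison, steps 2 to 4 of the SDM xAPIC sequence quoted above
      can be sketched as a retry loop. This is a hypothetical user-space
      simulation: sim_wrmsr/sim_rdmsr stand in for WRMSR/RDMSR on
      IA32_TSC_DEADLINE, and "the LVTT write has not landed yet" is
      modelled as a counter of MSR writes that get dropped. The real
      hardware race cannot be reproduced this way.

```c
#include <stdint.h>

/* Hypothetical model: the first N MSR writes are ignored because the
   LVTT write (step 1) has not yet reached the APIC. */
static int writes_until_lvtt_visible = 3;
static uint64_t sim_tsc_deadline_msr;   /* reads back 0 until a write sticks */
static int sim_wrmsr_calls;

static void sim_wrmsr(uint64_t value)
{
    sim_wrmsr_calls++;
    if (writes_until_lvtt_visible > 0) {
        writes_until_lvtt_visible--;    /* timer not in deadline mode: dropped */
        return;
    }
    sim_tsc_deadline_msr = value;
}

static uint64_t sim_rdmsr(void)
{
    return sim_tsc_deadline_msr;
}

/* Steps 2-4 of the SDM xAPIC method; step 1 (LVTT write) assumed done. */
static void sdm_xapic_arm(uint64_t now, uint64_t deadline)
{
    do {
        sim_wrmsr(now + (1ull << 40));  /* step 2: far-future deadline */
    } while (sim_rdmsr() == 0);         /* step 3: retry while it reads zero */
    sim_wrmsr(deadline);                /* step 4: the desired deadline */
}
```

      With three dropped writes, the loop issues four far-future writes
      before one sticks, followed by the final deadline write: five
      WRMSRs in total. The MFENCE approach described next avoids this
      polling entirely.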
      
      Issue MFENCE before writing to the TSC_DEADLINE MSR. This can be done
      unconditionally as all CPUs which have TSC_DEADLINE also have MFENCE
      support.
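
      The ordering the fix establishes (MMIO write to LVTT, full fence,
      then the TSC_DEADLINE write) can be sketched as follows. This is a
      hypothetical user-space illustration: fake_lvtt and
      fake_wrmsr_tsc_deadline stand in for the memory-mapped LVT Timer
      register and the IA32_TSC_DEADLINE MSR. The GCC/Clang builtin
      __atomic_thread_fence(__ATOMIC_SEQ_CST) is used as a portable
      stand-in; on x86-64 it typically compiles to MFENCE, whereas kernel
      code would issue the instruction directly.

```c
#include <stdint.h>

/* LVT Timer mode field, bits 18:17 = 10b selects TSC-deadline mode. */
#define LVT_TIMER_TSC_DEADLINE (2u << 17)

/* Hypothetical stand-ins for the xAPIC LVTT register and the MSR. */
static volatile uint32_t fake_lvtt;
static uint64_t fake_tsc_deadline;

static void fake_wrmsr_tsc_deadline(uint64_t value)
{
    fake_tsc_deadline = value;
}

static void sketch_arm_tsc_deadline(uint32_t vector, uint64_t deadline)
{
    /* 1. Memory-mapped write: put the timer into TSC-deadline mode. */
    fake_lvtt = vector | LVT_TIMER_TSC_DEADLINE;

    /* 2. Full fence so the MMIO store is globally visible before the
          (non-serializing) MSR write below. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);

    /* 3. Only now program the deadline. */
    fake_wrmsr_tsc_deadline(deadline);
}
```
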
      
      [ tglx: Massaged the changelog ]
      Signed-off-by: Shaohua Li <shli@fb.com>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: <Kernel-team@fb.com>
      Cc: <lenb@kernel.org>
      Cc: <fenghua.yu@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org #v3.7+
      Link: http://lkml.kernel.org/r/20150909041352.GA2059853@devbig257.prn2.facebook.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  26. 22 Aug, 2015: 1 commit
      x86/apic: Fix fallout from x2apic cleanup · a57e456a
      Committed by Thomas Gleixner
      In the recent x2apic cleanup I got two things really wrong:
      1) The safety check in __disable_x2apic which allows the function to
         be called unconditionally is backwards. The check is there to
         prevent access to the apic MSR in case that the machine has no
         apic. However, right now it returns if the machine has an apic and
         therefore the disabling of x2apic is never invoked.
      
      2) x2apic_disable() sets x2apic_mode to 0 after registering the local
         apic. That's wrong, because register_lapic_address() checks x2apic
         mode and therefore takes the wrong code path.
      
      This results in boot failures on machines with x2apic preenabled by
      BIOS and can also lead to a fatal MSR access on machines without
      apic.
      
      The solutions are simple:
      1) Correct the sanity check for apic availability
      2) Clear x2apic_mode _before_ calling register_lapic_address()
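
      Both corrections can be modelled in a few lines. This is a
      hypothetical user-space sketch: sketch_disable_x2apic and
      sketch_x2apic_disable only mimic the roles of __disable_x2apic()
      and x2apic_disable(), fake_register_lapic_address() merely records
      which x2apic_mode it would observe, and cpu_has_apic and msr_writes
      are invented state for the illustration.

```c
#include <stdbool.h>

/* Hypothetical state standing in for kernel globals. */
static bool cpu_has_apic = true;   /* does the machine have a local APIC? */
static int x2apic_mode = 1;        /* x2APIC preenabled by the BIOS */
static int msr_writes;             /* simulated APIC MSR accesses */
static int mode_seen_by_register = -1;

/* register_lapic_address() chooses its code path from x2apic_mode;
   here we only record the value it would see. */
static void fake_register_lapic_address(void)
{
    mode_seen_by_register = x2apic_mode;
}

static void sketch_disable_x2apic(void)
{
    /* Fix 1: return early when there is NO apic (the broken check was
       inverted and returned when an apic WAS present). */
    if (!cpu_has_apic)
        return;
    msr_writes++;  /* would clear the x2APIC enable bit via the APIC MSR */
}

static void sketch_x2apic_disable(void)
{
    sketch_disable_x2apic();
    /* Fix 2: clear x2apic_mode _before_ registering the lapic address. */
    x2apic_mode = 0;
    fake_register_lapic_address();
}
```

      Swapping the last two statements of sketch_x2apic_disable() would
      reproduce the second bug: the registration step would still see
      x2apic_mode == 1.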
      
      Fixes: 659006bf 'x86/x2apic: Split enable and setup function'
      Reported-and-tested-by: Javier Monteagudo <javiermon@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1224764
      Cc: stable@vger.kernel.org # 4.0+
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
  27. 30 Jul, 2015: 2 commits
  28. 06 Jul, 2015: 2 commits
  29. 01 Apr, 2015: 1 commit
  30. 14 Feb, 2015: 1 commit
  31. 22 Jan, 2015: 3 commits