1. 05 May, 2010 12 commits
    • E
      x86, acpi/irq: Handle isa irqs that are not identity mapped to gsi's. · 988856ee
      Eric W. Biederman committed
      ACPI irq source overrides are allowed for the 16 isa irqs and are
      allowed to map any gsi to any isa irq.  A few motherboards have been
      seen to take advantage of this and put the isa irqs on the 2nd or
      3rd ioapic.  This causes some problems, most notably the fact
      that we can not use any gsi < 16.
      
      To correct this, move the gsis that are not isa irqs and have
      a gsi number < 16 into the linux irq space just past gsi_end.
      This is what the es7000 platform is doing today.  Moving only the
      low 16 gsis above the rest of the gsi's only penalizes weird
      platforms, leaving sane acpi implementations with a 1-1 mapping
      of gsis and irqs.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-14-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      988856ee
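The remapping described in this commit can be illustrated with a minimal sketch. It is not the kernel's code: the name `gsi_to_irq_sketch`, the `isa_irq_to_gsi` table parameter, the reading of `gsi_end` as the highest gsi, and the exact offset `gsi_end + 1 + gsi` are assumptions chosen to match the phrase "just past gsi_end".

```c
#include <assert.h>
#include <stdint.h>

#define NR_ISA_IRQS 16u

/*
 * Hypothetical helper, not the kernel's implementation.
 * isa_irq_to_gsi[i] is the gsi that isa irq i is routed to
 * (identity unless an ACPI irq source override says otherwise).
 * gsi_end is assumed to be the highest gsi of any ioapic.
 */
static uint32_t gsi_to_irq_sketch(const uint32_t isa_irq_to_gsi[NR_ISA_IRQS],
                                  uint32_t gsi_end, uint32_t gsi)
{
    assert(gsi <= gsi_end);

    /* A gsi that backs an isa irq keeps that isa irq number. */
    for (uint32_t irq = 0; irq < NR_ISA_IRQS; irq++)
        if (isa_irq_to_gsi[irq] == gsi)
            return irq;

    /* Sane case: gsis at or above 16 stay identity mapped. */
    if (gsi >= NR_ISA_IRQS)
        return gsi;

    /* Weird case: a non-isa gsi below 16 moves just past gsi_end. */
    return gsi_end + 1 + gsi;
}
```

With isa irq 9 overridden to gsi 20 and `gsi_end` equal to 23, gsi 20 maps to irq 9, the now unclaimed gsi 9 moves to irq 33, and gsi 17 stays at irq 17.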
    • E
      x86, ioapic: Simplify probe_nr_irqs_gsi. · 4afc51a8
      Eric W. Biederman committed
      Use the global gsi_end value now that all ioapics have
      valid gsi numbers instead of a combination of acpi_probe_gsi
      and walking all of the ioapics and counting their number of
      entries by hand if acpi_probe_gsi gave us an answer we did
      not like.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-13-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      4afc51a8
    • E
      x86, ioapic: Optimize pin_2_irq · d464207c
      Eric W. Biederman committed
      Now that all ioapics have valid gsi_base values, use this to
      accelerate pin_2_irq.  In the case of acpi this also ensures
      that pin_2_irq will compute the same irq value for an ioapic
      pin as acpi will.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-12-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      d464207c
    • E
      x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic. · 7716a5c4
      Eric W. Biederman committed
      Now that all ioapic registration happens in mp_register_ioapic we can
      move the calculation of nr_ioapic_registers there from enable_IO_APIC.
      The number of ioapic registers is already calculated in mp_register_ioapic
      so all that really needs to be done is to save the calculated value
      in nr_ioapic_registers.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-11-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      7716a5c4
    • E
      x86, ioapic: In mpparse use mp_register_ioapic · cf7500c0
      Eric W. Biederman committed
      Long ago MP_ioapic_info was the primary way of setting up our
      ioapic data structures and mp_register_ioapic was a compatibility
      shim for acpi code.  Now the situation is reversed and
      mp_register_ioapic is the primary way of setting up our
      ioapic data structures.

      Keep the setting up of ioapic data structures uniform by
      having MP_ioapic_info call mp_register_ioapic.
      
      This changes a few fields:
      
      - type: is now hard coded to MP_IOAPIC, but type had to
        be MP_IOAPIC or MP_ioapic_info would not have been called.

      - flags: is now hard coded to MPC_APIC_USABLE.
        We require flags to contain at least MPC_APIC_USABLE in
        MP_ioapic_info and we don't ever examine flags, so dropping
        a few flags that might possibly exist that we have never
        used is harmless.
      
      - apicaddr: Unchanged
      
      - apicver: Read from the ioapic instead of using the cached
        hardware value in the MP table.  The real hardware value
        will be more accurate.
      
      - apicid: Now verified to be unique and changed if it is not.
        If the BIOS got this right this is a noop.  If the BIOS did
        not, fixing things appears to be the better solution.

      This adds gsi_base and gsi_end values to our ioapics defined with
      the mptable, which will make our lives simpler later since
      we can always assume gsi_base and gsi_end are valid.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-10-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      cf7500c0
    • E
      x86, ioapic: Teach mp_register_ioapic to compute a global gsi_end · 5777372a
      Eric W. Biederman committed
      Add the global variable gsi_end and teach mp_register_ioapic
      to keep it up to date as we add more ioapics into the system.

      ioapics can only be added early in boot so the code that
      runs later can treat gsi_end as a constant.

      Remove the hacks in sfi.c to second guess mp_register_ioapic
      by keeping its own running total of how many gsi's have been seen,
      and instead use the gsi_end.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-9-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      5777372a
    • E
      x86, ioapic: Fix the types of gsi values · eddb0c55
      Eric W. Biederman committed
      This patch fixes the types of gsi_base and gsi_end values in
      struct mp_ioapic_gsi, and the gsi parameter of mp_find_ioapic
      and mp_find_ioapic_pin.

      A gsi is canonically a u32, not an int.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-8-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      eddb0c55
    • E
      x86, ioapic: Fix io_apic_redir_entries to return the number of entries. · 4b6b19a1
      Eric W. Biederman committed
      io_apic_redir_entries has a huge conceptual bug.  It returns the maximum
      redirection entry, not the number of redirection entries, which simply
      does not match the name of the function.  This just caught me
      and it caught Feng Tang and Len Brown when they wrote sfi_parse_ioapic.

      Modify io_apic_redir_entries to actually return the number of redirection
      entries, and fix the callers so that they properly handle receiving the
      number of redirection table entries, instead of the
      number of redirection table entries less one.
      
      While the usage in sfi.c does not show up in this patch it is fixed
      by virtue of the fact that io_apic_redir_entries now has the semantics
      sfi_parse_ioapic most reasonably expects.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-7-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      4b6b19a1
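The off-by-one this commit describes comes from how the IOAPIC version register encodes its size: bits 16-23 hold the index of the last redirection entry, not the count. A minimal sketch of the distinction follows; the helper names `max_redir_entry` and `redir_entry_count` are hypothetical and are not the kernel's functions.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 16-23 of the IOAPIC version register: index of the last entry. */
static unsigned int max_redir_entry(uint32_t version_reg)
{
    return (version_reg >> 16) & 0xffu;
}

/* What the function name promises: the number of redirection entries. */
static unsigned int redir_entry_count(uint32_t version_reg)
{
    unsigned int count = max_redir_entry(version_reg) + 1;

    assert(count >= 1 && count <= 256);  /* an 8-bit field plus one */
    return count;
}
```

For a version register value of 0x00170020 the maximum redirection entry is 23, so the ioapic has 24 pins. A caller that treats 23 as a count silently drops the last pin.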
    • E
      x86, acpi/irq: Generalize mp_config_acpi_legacy_irqs · 0fd52670
      Eric W. Biederman committed
      Remove the assumption that there is not an override for isa irq 0.
      Instead look up the gsi and from that look up the ioapic and pin of each
      isa irq individually.

      In general this should not have any behavioural effect but in
      perverse cases this gets all of the details correct, instead of
      doing something weird.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-5-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      0fd52670
    • E
      x86, acpi/irq: Fix acpi_sci_ioapic_setup so it has both bus_irq and gsi · 9d2062b8
      Eric W. Biederman committed
      Currently acpi_sci_ioapic_setup calls mp_override_legacy_irq with
      bus_irq == gsi, which is wrong if we are coming from an override.
      Instead pass the bus_irq into acpi_sci_ioapic_setup.

      This fix was inspired by a similar fix from:
      Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-4-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      9d2062b8
    • E
      x86, acpi/irq: Teach acpi_get_override_irq to take a gsi not an isa_irq · 9a0a91bb
      Eric W. Biederman committed
      In perverse acpi implementations the isa irqs are not identity mapped
      to the first 16 gsi.  Furthermore at least the extended interrupt
      resource capability may return gsi's and not isa irqs.  So since
      what we get from acpi is a gsi, teach acpi_get_override_irq to
      operate on a gsi instead of an isa_irq.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-2-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      9a0a91bb
    • E
      x86, acpi/irq: Introduce acpi_isa_irq_to_gsi · 2c2df841
      Eric W. Biederman committed
      There are a number of cases where the current code makes the assumption
      that isa irqs identity map to the first 16 acpi global system interrupts.
      In most instances that assumption is correct as that is the required
      behaviour in dual i8259 mode and the default behavior in ioapic mode.

      However there are some systems out there that take advantage of acpi's
      interrupt remapping for the isa irqs to have a completely different
      mapping of isa_irq to gsi.

      Introduce acpi_isa_irq_to_gsi to perform this mapping explicitly in the
      code that needs it.  Initially this will be just the current assumed
      identity mapping to ensure its introduction does not cause regressions.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <1269936436-7039-1-git-send-email-ebiederm@xmission.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      2c2df841
  2. 25 April, 2010 1 commit
    • D
      VMware Balloon driver · 453dc659
      Dmitry Torokhov committed
      This is a standalone version of the VMware Balloon driver.  Ballooning is a
      technique that allows the hypervisor to dynamically limit the amount of memory
      available to the guest (with guest cooperation).  In the overcommit
      scenario, when the hypervisor detects that it needs to shuffle some
      memory, it instructs the driver to allocate a certain number of pages, and
      the underlying memory gets returned to the hypervisor.  Later the hypervisor
      may return memory to the guest by reattaching memory to the page frames and
      instructing the driver to "deflate" the balloon.
      
      We are submitting a standalone driver because KVM maintainer (Avi Kivity)
      expressed opinion (rightly) that our transport does not fit well into
      virtqueue paradigm and thus it does not make much sense to integrate with
      virtio.
      
      There were also some concerns whether current ballooning technique is the
      right thing.  If there appears a better framework to achieve this we are
      prepared to evaluate and switch to using it, but in the meantime we'd like
      to get this driver upstream.
      
      We want to get the driver accepted in distributions so that users do not
      have to deal with an out-of-tree module and many distributions have
      "upstream first" requirement.
      
      The driver has been shipping for a number of years and users running on
      VMware platform will have it installed as part of VMware Tools even if it
      will not come from a distribution, thus there should not be additional
      risk in pulling the driver into mainline.  The driver will only activate
      if host is VMware so everyone else should not be affected at all.
      Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      453dc659
  3. 24 April, 2010 2 commits
    • H
      x86: Disable large pages on CPUs with Atom erratum AAE44 · 7a0fc404
      H. Peter Anvin committed
      Atom erratum AAE44/AAF40/AAG38/AAH41:
      
      "If software clears the PS (page size) bit in a present PDE (page
      directory entry), that will cause linear addresses mapped through this
      PDE to use 4-KByte pages instead of using a large page after old TLB
      entries are invalidated. Due to this erratum, if a code fetch uses
      this PDE before the TLB entry for the large page is invalidated then
      it may fetch from a different physical address than specified by
      either the old large page translation or the new 4-KByte page
      translation. This erratum may also cause speculative code fetches from
      incorrect addresses."
      
      [http://download.intel.com/design/processor/specupdt/319536.pdf]
      
      Whereas commit 211b3d03 seems to
      work around erratum AAH41 (mixed 4K TLBs), it only reduces the window of
      opportunity for the bug to occur and does not totally remove it.  This
      patch disables mixed 4K/4MB page tables totally, avoiding the page
      splitting and not tripping this processor issue.
      
      This is based on an original patch by Colin King.
      Originally-by: Colin Ian King <colin.king@canonical.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com>
      Cc: <stable@kernel.org>
      7a0fc404
    • H
      x86-64: Clear a 64-bit FS/GS base on fork if selector is nonzero · 7ce5a2b9
      H. Peter Anvin committed
      When we do a thread switch, we clear the outgoing FS/GS base if the
      corresponding selector is nonzero.  This is taken by __switch_to() as
      an entry invariant; it does not verify that it is true on entry.
      However, copy_thread() doesn't enforce this constraint, which can
      result in inconsistent results after fork().
      
      Make copy_thread() match the behavior of __switch_to().
      Reported-and-tested-by: Samuel Thibault <samuel.thibault@inria.fr>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <4BD1E061.8030605@zytor.com>
      Cc: <stable@kernel.org>
      7ce5a2b9
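The invariant in this commit (a nonzero selector implies a zero 64-bit base) can be sketched outside the kernel. The `seg_state` struct and `copy_seg_state` function below are hypothetical stand-ins for the per-thread state and for the relevant part of copy_thread(); they only illustrate the rule, not the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread segment state. */
struct seg_state {
    uint16_t fsindex, gsindex;   /* segment selectors */
    uint64_t fs, gs;             /* 64-bit segment bases */
};

/* Copy the parent's state, enforcing the invariant the thread switch relies on. */
static struct seg_state copy_seg_state(const struct seg_state *parent)
{
    struct seg_state child;

    assert(parent != NULL);
    child = *parent;
    if (child.fsindex)
        child.fs = 0;   /* nonzero selector: the 64-bit base must be clear */
    if (child.gsindex)
        child.gs = 0;
    return child;
}
```

A parent with a nonzero FS selector and a stale FS base therefore yields a child whose FS base is zero, while a base that belongs to a zero selector is inherited unchanged.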
  4. 21 April, 2010 1 commit
  5. 09 April, 2010 1 commit
    • F
      perf: Fix unsafe frame rewinding with hot regs fetching · ab285f2b
      Frederic Weisbecker committed
      When we fetch the hot regs and rewind to the nth caller, it
      might happen that we dereference a frame pointer outside the
      kernel stack boundaries, like in this example:
      
              perf_trace_sched_switch+0xd5/0x120
              schedule+0x6b5/0x860
              retint_careful+0xd/0x21
      
      Since we directly dereference a userspace frame pointer here while
      rewinding behind retint_careful, this may end up in a crash.
      
      Fix this by simply using probe_kernel_address() when we rewind the
      frame pointer.
      
      This issue will have a much more proper fix in the next version of the
      perf_arch_fetch_caller_regs() API that will only need to rewind to the
      first caller.
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Archs <linux-arch@vger.kernel.org>
      ab285f2b
  6. 07 April, 2010 5 commits
  7. 06 April, 2010 1 commit
    • V
      perf, x86: Enable Nehalem-EX support · 134fbadf
      Vince Weaver committed
      According to the Intel Software Developer's Manual Volume 3B, the
      Nehalem-EX PMU is just like regular Nehalem (except for the
      uncore support, which is completely different).
      Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Lin Ming <ming.m.lin@intel.com>
      LKML-Reference: <alpine.DEB.2.00.1004060956580.1417@cl320.eecs.utk.edu>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      134fbadf
  8. 03 April, 2010 4 commits
    • S
      x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards · 472a474c
      Suresh Siddha committed
      Jan Grossmann reported a kernel boot panic while booting an SMP
      kernel on his system with a single core cpu. SMP kernels call
      enable_IR_x2apic() from native_smp_prepare_cpus() and on
      platforms where the kernel doesn't find an SMP configuration we
      ended up calling enable_IR_x2apic() again from the
      APIC_init_uniprocessor() call in smp_sanity_check(), thus
      leading to a kernel panic.
      
      Don't call enable_IR_x2apic() and default_setup_apic_routing()
      from APIC_init_uniprocessor() in CONFIG_SMP case.
      
      NOTE: this kind of non-idempotent and asymmetric initialization
      sequence is rather fragile and unclean, we'll clean that up
      in v2.6.35. This is the minimal fix for v2.6.34.
      
      Reported-by: Jan.Grossmann@kielnet.net
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: <jbarnes@virtuousgeek.org>
      Cc: <david.woodhouse@intel.com>
      Cc: <weidong.han@intel.com>
      Cc: <youquan.song@intel.com>
      Cc: <Jan.Grossmann@kielnet.net>
      Cc: <stable@kernel.org> # [v2.6.32.x, v2.6.33.x]
      LKML-Reference: <1270083887.7835.78.camel@sbs-t61.sc.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      472a474c
    • T
      perf, x86: Fix callgraphs of 32-bit processes on 64-bit kernels · 257ef9d2
      Torok Edwin committed
      When profiling a 32-bit process on a 64-bit kernel, callgraph tracing
      stopped after the first function, because it saw a garbage memory
      address (it tried to interpret the frame pointer and return address as
      64-bit pointers).
      
      Fix this by using a struct stack_frame with 32-bit pointers when the
      TIF_IA32 flag is set.
      
      Note that TIF_IA32 flag must be used, and not is_compat_task(), because
      the latter is only set when the 32-bit process is executing a syscall,
      which may not always be the case (when tracing page fault events for
      example).
      Signed-off-by: Török Edwin <edwintorok@gmail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: x86@kernel.org
      Cc: linux-kernel@vger.kernel.org
      LKML-Reference: <1268820436-13145-1-git-send-email-edwintorok@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      257ef9d2
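Why the callgraph stopped after one frame can be shown with a small user-space sketch. The struct names `frame32` and `frame64`, the `read_frame` helper, and the `task_is_32bit` flag (standing in for the TIF_IA32 check) are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Frame layout as a 32-bit task lays it out on its stack. */
struct frame32 { uint32_t next_frame; uint32_t return_address; };
/* Frame layout of a 64-bit task. */
struct frame64 { uint64_t next_frame; uint64_t return_address; };

/* Read one frame from raw stack bytes; a 32-bit frame is widened to 64 bits. */
static struct frame64 read_frame(const unsigned char *stack, int task_is_32bit)
{
    struct frame64 out;

    assert(stack != NULL);
    if (task_is_32bit) {
        struct frame32 f;

        memcpy(&f, stack, sizeof f);
        out.next_frame = f.next_frame;
        out.return_address = f.return_address;
    } else {
        memcpy(&out, stack, sizeof out);
    }
    return out;
}
```

Reading a 32-bit frame with the 64-bit layout fuses the frame pointer and return address into one bogus 64-bit "pointer", which is the garbage address the commit message mentions.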
    • P
      perf, x86: Fix AMD hotplug & constraint initialization · b38b24ea
      Peter Zijlstra committed
      Commit 3f6da390 ("perf: Rework and fix the arch CPU-hotplug hooks") moved
      the amd northbridge allocation from CPUS_ONLINE to CPUS_PREPARE_UP.
      However, amd_nb_id() doesn't work yet on prepare, so it would simply bail,
      basically reverting to a state where we do not properly track node wide
      constraints - causing weird perf results.
      
      Fix up the AMD NorthBridge initialization code by allocating from
      CPU_UP_PREPARE and installing it from CPU_STARTING once we have the
      proper nb_id. It also properly deals with the allocation failing.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      [ robustify using amd_has_nb() ]
      Signed-off-by: Stephane Eranian <eranian@google.com>
      LKML-Reference: <1269353485.5109.48.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b38b24ea
    • P
      x86: Move notify_cpu_starting() callback to a later stage · 85257024
      Peter Zijlstra committed
      Because we need to have cpu identification things done by the time we run
      CPU_STARTING notifiers.

      ( This init ordering will be relied on by the next fix. )
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1269353485.5109.48.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      85257024
  9. 02 April, 2010 4 commits
    • Y
      ibft, x86: Change reserve_ibft_region() to find_ibft_region() · 042be38e
      Yinghai Lu committed
      This allows arch code to decide how to reserve the ibft.

      We should also reserve the ibft as early as possible, instead of at the BOOTMEM
      stage, in case the table is in a RAM range and is not reserved by the BIOS
      (this will often be the case.)
      
      Move to just after find_smp_config().
      
      Also when CONFIG_NO_BOOTMEM=y, we will not have reserve_bootmem() anymore.

      -v2: fix typo about ibft pointed out by Konrad Rzeszutek Wilk <konrad@darnok.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4BB510FB.80601@kernel.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
      CC: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      042be38e
    • A
      x86, hpet: Fix bug in RTC emulation · b4a5e8a1
      Alok Kataria committed
      We think there exists a bug in the HPET code that emulates the RTC.
      
      In the normal case, when the RTC frequency is set, the rtc driver tells
      the hpet code about it here:
      
      int hpet_set_periodic_freq(unsigned long freq)
      {
              uint64_t clc;
      
              if (!is_hpet_enabled())
                      return 0;
      
              if (freq <= DEFAULT_RTC_INT_FREQ)
                      hpet_pie_limit = DEFAULT_RTC_INT_FREQ / freq;
              else {
                      clc = (uint64_t) hpet_clockevent.mult * NSEC_PER_SEC;
                      do_div(clc, freq);
                      clc >>= hpet_clockevent.shift;
                      hpet_pie_delta = (unsigned long) clc;
              }
              return 1;
      }
      
      If freq is set to 64Hz (DEFAULT_RTC_INT_FREQ) or lower, then
      hpet_pie_limit (a static) is set to non-zero.  Then, on every one-shot
      HPET interrupt, hpet_rtc_timer_reinit is called to compute the next
      timeout.  Well, that function has this logic:
      
              if (!(hpet_rtc_flags & RTC_PIE) || hpet_pie_limit)
                      delta = hpet_default_delta;
              else
                      delta = hpet_pie_delta;
      
      Since hpet_pie_limit is not 0, hpet_default_delta is used.  That
      corresponds to 64Hz.
      
      Now, if you set a different rtc frequency, you'll take the else path
      through hpet_set_periodic_freq, but unfortunately no one resets
      hpet_pie_limit back to 0.
      
      Boom....now you are stuck with 64Hz RTC interrupts forever.
      
      The patch below just resets the hpet_pie_limit value when requested freq
      is greater than DEFAULT_RTC_INT_FREQ, which we think fixes this problem.
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      LKML-Reference: <201003112200.o2BM0Hre012875@imap1.linux-foundation.org>
      Signed-off-by: Daniel Hecht <dhecht@vmware.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      b4a5e8a1
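The stuck-at-64Hz behaviour and the reset that cures it can be reproduced with a simplified user-space model. This is a sketch under stated assumptions: the `_model` function names are hypothetical, deltas are plain nanoseconds rather than clockevent mult/shift arithmetic, and the `pie_enabled` flag stands in for the RTC_PIE check.

```c
#include <assert.h>
#include <stdbool.h>

#define DEFAULT_RTC_INT_FREQ 64

/* Simplified model state; deltas are in nanoseconds. */
static unsigned long hpet_pie_limit;
static unsigned long hpet_pie_delta;
static const unsigned long hpet_default_delta = 1000000000UL / DEFAULT_RTC_INT_FREQ;

static void set_periodic_freq_model(unsigned long freq)
{
    assert(freq > 0);
    if (freq <= DEFAULT_RTC_INT_FREQ) {
        hpet_pie_limit = DEFAULT_RTC_INT_FREQ / freq;
    } else {
        hpet_pie_delta = 1000000000UL / freq;
        hpet_pie_limit = 0;   /* the reset this commit describes */
    }
}

/* Mirrors the delta selection quoted from hpet_rtc_timer_reinit. */
static unsigned long next_delta_model(bool pie_enabled)
{
    if (!pie_enabled || hpet_pie_limit)
        return hpet_default_delta;
    return hpet_pie_delta;
}
```

In this model, setting 64Hz and then 1024Hz yields a delta of 976562 ns. Without the reset line, `hpet_pie_limit` would stay nonzero and the delta would remain the 64Hz default of 15625000 ns.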
    • P
      x86, hpet: Erratum workaround for read after write of HPET comparator · 8da854cb
      Pallipadi, Venkatesh committed
      On Wed, Feb 24, 2010 at 03:37:04PM -0800, Justin Piszcz wrote:
      > Hello,
      >
      > Again, on the Intel DP55KG board:
      >
      > # uname -a
      > Linux host 2.6.33 #1 SMP Wed Feb 24 18:31:00 EST 2010 x86_64 GNU/Linux
      >
      > [    1.237600] ------------[ cut here ]------------
      > [    1.237890] WARNING: at arch/x86/kernel/hpet.c:404 hpet_next_event+0x70/0x80()
      > [    1.238221] Hardware name:
      > [    1.238504] hpet: compare register read back failed.
      > [    1.238793] Modules linked in:
      > [    1.239315] Pid: 0, comm: swapper Not tainted 2.6.33 #1
      > [    1.239605] Call Trace:
      > [    1.239886]  <IRQ>  [<ffffffff81056c13>] ? warn_slowpath_common+0x73/0xb0
      > [    1.240409]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
      > [    1.240699]  [<ffffffff81056cb0>] ? warn_slowpath_fmt+0x40/0x50
      > [    1.240992]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
      > [    1.241281]  [<ffffffff81041ad0>] ? hpet_next_event+0x70/0x80
      > [    1.241573]  [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0
      > [    1.241859]  [<ffffffff81078e32>] ? tick_handle_oneshot_broadcast+0xe2/0x100
      > [    1.246533]  [<ffffffff8102a67a>] ? timer_interrupt+0x1a/0x30
      > [    1.246826]  [<ffffffff81085499>] ? handle_IRQ_event+0x39/0xd0
      > [    1.247118]  [<ffffffff81087368>] ? handle_edge_irq+0xb8/0x160
      > [    1.247407]  [<ffffffff81029f55>] ? handle_irq+0x15/0x20
      > [    1.247689]  [<ffffffff810294a2>] ? do_IRQ+0x62/0xe0
      > [    1.247976]  [<ffffffff8146be53>] ? ret_from_intr+0x0/0xa
      > [    1.248262]  <EOI>  [<ffffffff8102f277>] ? mwait_idle+0x57/0x80
      > [    1.248796]  [<ffffffff8102645c>] ? cpu_idle+0x5c/0xb0
      > [    1.249080] ---[ end trace db7f668fb6fef4e1 ]---
      >
      > Is this something Intel has to fix or is it a bug in the kernel?
      
      This is a chipset erratum.
      
      Thomas: You mentioned we can retain this check only for known-buggy and
      hpet debug kind of options. But here is the simple workaround patch for
      this particular erratum.
      
      Some chipsets have an erratum due to which a read immediately following a
      write of the HPET comparator returns the old comparator value instead of the most
      recently written value.

      Erratum 15 in
      "Intel I/O Controller Hub 9 (ICH9) Family Specification Update"
      (http://www.intel.com/assets/pdf/specupdate/316973.pdf)

      The workaround for the erratum is to read the comparator twice if the first
      read fails.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      LKML-Reference: <20100225185348.GA9674@linux-os.sc.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com>
      Cc: <stable@kernel.org>
      8da854cb
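The read-twice workaround can be sketched against a toy model of an affected comparator. Everything here is illustrative: `flaky_comparator`, `comparator_read`, and `comparator_readback_ok` are hypothetical names, and the model assumes only the first read after a write is stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of an affected comparator: the first read after a write is stale. */
struct flaky_comparator {
    uint32_t stale_value;
    uint32_t written_value;
    unsigned int reads_since_write;
};

static uint32_t comparator_read(struct flaky_comparator *cmp)
{
    assert(cmp != NULL);
    return cmp->reads_since_write++ == 0 ? cmp->stale_value
                                         : cmp->written_value;
}

/* Accept the write if either the first or the second read-back matches. */
static bool comparator_readback_ok(struct flaky_comparator *cmp, uint32_t expected)
{
    if (comparator_read(cmp) == expected)
        return true;
    /* The first read may have returned the old value; read once more. */
    return comparator_read(cmp) == expected;
}
```

A genuine mismatch still fails after the second read, so the check keeps its value as a sanity test while no longer warning on hardware with this erratum.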
    • A
      x86: Handle overlapping mptables · 909fc87b
      Andi Kleen committed
      We found a system where the MP table MPC and MPF structures overlap.
      
      That doesn't really matter because the mptable is not used anyways with ACPI,
      but it leads to a panic in the early allocator due to the overlapping
      reservations in 2.6.33.
      
      Earlier kernels handled this without problems.
      
      Simply change these reservations to reserve_early_overlap_ok to avoid
      the panic.
      Reported-by: Thomas Renninger <trenn@suse.de>
      Tested-by: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      LKML-Reference: <20100329074111.GA22821@basil.fritz.box>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: <stable@kernel.org>
      909fc87b
  10. 01 April, 2010 3 commits
    • J
      x86,kgdb: Always initialize the hw breakpoint attribute · ab310b5e
      Jason Wessel committed
      It is required to call hw_breakpoint_init() on an attr before using it
      in any other calls.  This fixes the problem where kgdb will sometimes
      fail to initialize on x86_64.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: 2.6.33 <stable@kernel.org>
      LKML-Reference: <1269975907-27602-1-git-send-email-jason.wessel@windriver.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      ab310b5e
    • F
      perf: Use hot regs with software sched switch/migrate events · e49a5bd3
      Frederic Weisbecker committed
      Scheduler's task migration events don't work because they always
      pass NULL regs to perf_sw_event(). The event hence gets filtered
      in perf_swevent_add().

      Scheduler's context switch events use task_pt_regs() to get
      the context when the event occurred, which is the wrong thing to
      do as this won't give us the place in the kernel where we went
      to sleep but the place where we left userspace. The result is
      even more wrong if we switch from a kernel thread.
      
      Use the hot regs snapshot for both events as they belong to the
      non-interrupt/exception based events family. Unlike page faults
      or so that provide the regs matching the exact origin of the event,
      we need to save the current context.
      
      This makes the task migration event work and fixes the context
      switch callchains and origin ip.
      
      Example: perf record -a -e cs
      
      Before:
      
          10.91%      ksoftirqd/0                  0  [k] 0000000000000000
                      |
                      --- (nil)
                          perf_callchain
                          perf_prepare_sample
                          __perf_event_overflow
                          perf_swevent_overflow
                          perf_swevent_add
                          perf_swevent_ctx_event
                          do_perf_sw_event
                          __perf_sw_event
                          perf_event_task_sched_out
                          schedule
                          run_ksoftirqd
                          kthread
                          kernel_thread_helper
      
      After:
      
          23.77%  hald-addon-stor  [kernel.kallsyms]  [k] schedule
                  |
                  --- schedule
                     |
                     |--60.00%-- schedule_timeout
                     |          wait_for_common
                     |          wait_for_completion
                     |          blk_execute_rq
                     |          scsi_execute
                     |          scsi_execute_req
                     |          sr_test_unit_ready
                     |          |
                     |          |--66.67%-- sr_media_change
                     |          |          media_changed
                     |          |          cdrom_media_changed
                     |          |          sr_block_media_changed
                     |          |          check_disk_change
                     |          |          cdrom_open
      
      v2: Always build perf_arch_fetch_caller_regs() now that software
      events need that too. They don't need it from modules, unlike trace
      events, so we keep the EXPORT_SYMBOL in trace_event_perf.c
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      e49a5bd3
    • Y
      x86: Make e820_remove_range to handle all covered case · 9f3a5f52
      Yinghai Lu committed
      Rusty found that on lguest with trim_bios_range, max_pfn is not right anymore, and
      it looks like e820_remove_range does not work right.
      
      [    0.000000] BIOS-provided physical RAM map:
      [    0.000000]  LGUEST: 0000000000000000 - 0000000004000000 (usable)
      [    0.000000] Notice: NX (Execute Disable) protection missing in CPU or disabled in BIOS!
      [    0.000000] DMI not present or invalid.
      [    0.000000] last_pfn = 0x3fa0 max_arch_pfn = 0x100000
      [    0.000000] init_memory_mapping: 0000000000000000-0000000003fa0000
      
      The root cause is that e820_remove_range() doesn't handle the
      all-covered case, where the range to remove lies entirely inside one
      entry.  e820_remove_range(BIOS_START, BIOS_END - BIOS_START, ...)
      produces a bogus range as a result.
      
      Make it match e820_update_range() by handling that case too.
      Reported-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Tested-by: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <4BB18E55.6090903@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      9f3a5f52
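The all-covered case in the e820_remove_range commit above can be illustrated with a small user-space sketch. This is not the kernel code: `struct range_map`, `map_add()` and `map_remove_range()` are hypothetical names, and the BIOS range used in the usage note (0xa0000 to 0x100000) is an assumption that happens to match the log, where a 0x4000000-byte map ends up with last_pfn = 0x3fa0 because the entry was shrunk by 0x60000 bytes instead of being split.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 8

/* Hypothetical, simplified stand-in for an e820-style memory map. */
struct range_entry {
	uint64_t addr;
	uint64_t size;	/* size == 0 marks an unused slot */
};

struct range_map {
	struct range_entry e[MAX_ENTRIES];
	size_t nr;
};

static void map_add(struct range_map *m, uint64_t addr, uint64_t size)
{
	assert(m->nr < MAX_ENTRIES);
	m->e[m->nr].addr = addr;
	m->e[m->nr].size = size;
	m->nr++;
}

/* Remove [start, start + size) from the map; returns the bytes removed. */
static uint64_t map_remove_range(struct range_map *m, uint64_t start,
				 uint64_t size)
{
	uint64_t end = start + size;
	uint64_t removed = 0;
	size_t n = m->nr;	/* do not revisit entries appended by a split */

	for (size_t i = 0; i < n; i++) {
		struct range_entry *ei = &m->e[i];
		uint64_t ei_end = ei->addr + ei->size;

		if (ei->size == 0 || ei_end <= start || ei->addr >= end)
			continue;	/* unused slot or no overlap */

		if (ei->addr >= start && ei_end <= end) {
			/* The entry lies entirely inside the removed range. */
			removed += ei->size;
			ei->addr = 0;
			ei->size = 0;
			continue;
		}

		if (ei->addr < start && ei_end > end) {
			/* The removed range lies entirely inside the entry:
			 * keep the head in place and add the tail as a new entry. */
			map_add(m, end, ei_end - end);
			ei->size = start - ei->addr;
			removed += size;
			continue;
		}

		/* Partial overlap at one edge: trim the entry. */
		uint64_t final_start = start > ei->addr ? start : ei->addr;
		uint64_t final_end = end < ei_end ? end : ei_end;

		removed += final_end - final_start;
		if (ei->addr < final_start) {
			ei->size = final_start - ei->addr;	/* keep the head */
		} else {
			ei->size = ei_end - final_end;		/* keep the tail */
			ei->addr = final_end;
		}
	}
	return removed;
}
```

Starting from a single entry covering 0 to 0x4000000 and removing 0x60000 bytes at 0xa0000, the sketch leaves two entries, 0 to 0xa0000 and 0x100000 to 0x4000000, rather than one shortened entry.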
  11. 30 Mar, 2010 3 commits
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo authored
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to place the new include so that its order
        conforms to its surroundings.  It's put in the include block which
        contains core kernel includes, in the same order that the rest are
        ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end
        if there doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
    • Y
      x86: Make sure free_init_pages() frees pages on page boundary · c967da6a
      Yinghai Lu authored
      When CONFIG_NO_BOOTMEM=y, it could use memory more efficiently, or
      in a more compact fashion.
      
      Example:
      
       Allocated new RAMDISK: 00ec2000 - 0248ce57
       Move RAMDISK from 000000002ea04000 - 000000002ffcee56 to 00ec2000 - 0248ce56
      
      The new RAMDISK's end is not page aligned.
      Last page could be shared with other users.
      
      When free_init_pages() is called for initrd or .init, that shared
      page could be freed and we could corrupt other data.
      
      code segment in free_init_pages():
      
       |        for (; addr < end; addr += PAGE_SIZE) {
       |                ClearPageReserved(virt_to_page(addr));
       |                init_page_count(virt_to_page(addr));
       |                memset((void *)(addr & ~(PAGE_SIZE-1)),
       |                        POISON_FREE_INITMEM, PAGE_SIZE);
       |                free_page(addr);
       |                totalram_pages++;
       |        }
      
      The last, partially used page could thus be freed as one whole page.
      
      So page align the boundaries.
      
      -v2: make the original initramdisk to be aligned, according to
           Johannes, otherwise we have the chance to lose one page.
           we still need to keep initrd_end not aligned, otherwise it could
           confuse decompressor.
      -v3: change to WARN_ON instead, suggested by Johannes.
      -v4: use PAGE_ALIGN, suggested by Johannes.
           We may fix that macro name later to PAGE_ALIGN_UP, and PAGE_ALIGN_DOWN
           Add comments about assuming ramdisk start is aligned
           in relocate_initrd(), change to re get ramdisk_image instead of save it
           to make diff smaller. Add warning for wrong range, suggested by Johannes.
      -v6: remove one WARN()
           We need to align beginning in free_init_pages()
           do not copy more than ramdisk_size, noticed by Johannes
      Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Tested-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1269830604-26214-3-git-send-email-yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c967da6a
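The boundary problem in the free_init_pages() commit above can be worked through in user space. The sketch below is not the kernel implementation: the helper names and the fixed 4096-byte page size are assumptions made for illustration. It compares the number of pages that the quoted `for (; addr < end; addr += PAGE_SIZE)` loop would touch with the number of pages that lie completely inside the range.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Round an address up to the next page boundary. */
static uint64_t page_align_up(uint64_t addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Round an address down to the previous page boundary. */
static uint64_t page_align_down(uint64_t addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* Pages touched by a loop that steps from begin while addr < end. */
static uint64_t pages_touched_naive(uint64_t begin, uint64_t end)
{
	uint64_t n = 0;

	assert(begin == page_align_down(begin));	/* begin assumed aligned */
	for (uint64_t addr = begin; addr < end; addr += SKETCH_PAGE_SIZE)
		n++;
	return n;
}

/* Pages that lie completely inside [begin, end) and are safe to free. */
static uint64_t pages_fully_inside(uint64_t begin, uint64_t end)
{
	uint64_t first = page_align_up(begin);
	uint64_t last = page_align_down(end);

	return last > first ? (last - first) / SKETCH_PAGE_SIZE : 0;
}
```

With the RAMDISK range from the example, 0x00ec2000 to 0x0248ce57, the naive loop touches one page more than the count of fully contained pages. That extra page is the partially used last page that may be shared with other users.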
    • Y
      x86: Make smp_locks end with page alignment · 596b711e
      Yinghai Lu authored
      Fix:
      
       ------------[ cut here ]------------
       WARNING: at arch/x86/mm/init.c:342 free_init_pages+0x4c/0xfa()
       free_init_pages: range [0x40daf000, 0x40db5c24] is not aligned
       Modules linked in:
       Pid: 0, comm: swapper Not tainted 2.6.34-rc2-tip-03946-g4f16b23-dirty #50
       Call Trace:
        [<40232e9f>] warn_slowpath_common+0x65/0x7c
        [<4021c9f0>] ? free_init_pages+0x4c/0xfa
        [<40881434>] ? _etext+0x0/0x24
        [<40232eea>] warn_slowpath_fmt+0x24/0x27
        [<4021c9f0>] free_init_pages+0x4c/0xfa
        [<40881434>] ? _etext+0x0/0x24
        [<40d3f4bd>] alternative_instructions+0xf6/0x100
        [<40d3fe4f>] check_bugs+0xbd/0xbf
        [<40d398a7>] start_kernel+0x2d5/0x2e4
        [<40d390ce>] i386_start_kernel+0xce/0xd5
       ---[ end trace 4eaa2a86a8e2da22 ]---
      
      Comments in vmlinux.lds.S already said:
      
       |        /*
       |         * smp_locks might be freed after init
       |         * start/end must be page aligned
       |         */
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1269830604-26214-2-git-send-email-yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      596b711e
  12. 23 Mar, 2010 1 commit
  13. 20 Mar, 2010 1 commit
    • A
      x86, amd: Restrict usage of c1e_idle() · 035a02c1
      Andreas Herrmann authored
      Currently c1e_idle returns true for all CPUs greater than or equal to
      family 0xf model 0x40. This covers too many CPUs.
      
      Meanwhile a respective erratum for the underlying problem was filed
      (#400). This patch adds the logic to check whether erratum #400
      applies to a given CPU.
      Especially for CPUs where SMI/HW triggered C1e is not supported,
      c1e_idle() doesn't need to be used. We can check this by looking at
      the respective OSVW bit for erratum #400.
      
      Cc: <stable@kernel.org> # .32.x .33.x
      Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
      LKML-Reference: <20100319110922.GA19614@alberich.amd.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      035a02c1
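The decision described in the c1e_idle() commit above could look roughly like the following user-space sketch. It is an illustration only, resting on several assumptions not stated in the commit message: `struct cpu_desc` and `needs_c1e_idle()` are hypothetical, the erratum is assumed to be reported in OSVW status bit 1, only families 0xf and 0x10 are considered, and a CPU whose OSVW data does not cover that bit is treated as affected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the CPU properties the check needs. */
struct cpu_desc {
	bool is_amd;
	unsigned int family;
	unsigned int model;
	bool has_osvw;			/* OSVW feature advertised */
	uint64_t osvw_id_length;	/* number of valid OSVW status bits */
	uint64_t osvw_status;		/* bit set: erratum applies */
};

/* Assumption for this sketch: erratum #400 maps to OSVW status bit 1. */
#define OSVW_ID_ERRATUM_400 1

static bool needs_c1e_idle(const struct cpu_desc *c)
{
	assert(c != 0);

	if (!c->is_amd)
		return false;

	/* Family 0xf: keep the model-based rule from the old check. */
	if (c->family == 0x0f)
		return c->model >= 0x40;

	if (c->family == 0x10) {
		/* Trust the OSVW bit only if the CPU reports it as valid. */
		if (c->has_osvw && c->osvw_id_length > OSVW_ID_ERRATUM_400)
			return (c->osvw_status >> OSVW_ID_ERRATUM_400) & 1;
		return true;	/* no OSVW information: assume affected */
	}

	/* Other families are assumed unaffected in this sketch. */
	return false;
}
```

The point of the design is that the workaround is skipped only when the hardware explicitly reports that the erratum does not apply; missing information falls back to the conservative path.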
  14. 17 Mar, 2010 1 commit
    • F
      perf: Fix unexported generic perf_arch_fetch_caller_regs · dcd5c166
      Frederic Weisbecker authored
      perf_arch_fetch_caller_regs() is exported for the overridden x86
      version, but not for the generic weak version.
      
      As a general rule, weak functions should not have their symbol
      exported in the same file in which they are defined.
      
      So let's export it in trace_event_perf.c as it is used by trace
      events only.
      
      This fixes:
      
      	ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
      	ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
      
      -v2: And also only build it if trace events are enabled.
      -v3: Fix changelog mistake
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      dcd5c166