26 September 2006, 40 commits
    • [PATCH] Don't leak NT bit into next task · 658fdbef
      Committed by Andi Kleen
      SYSENTER can leave the NT flag set, which might cause crashes on the IRET
      in the next task (see the sketch after this entry).
      
      Following similar i386 patch from Linus.
      Signed-off-by: Andi Kleen <ak@suse.de>
      658fdbef
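      A minimal sketch (illustration only, not the patch itself) of the operation
      involved: clear a stray NT flag (bit 14 of EFLAGS) before it can reach an
      IRET. Assumes x86-64 and GCC inline asm; runnable in user space.

      #include <stdio.h>

      #define X86_EFLAGS_NT 0x00004000UL   /* Nested Task flag */

      static unsigned long read_eflags(void)
      {
              unsigned long f;
              asm volatile("pushfq ; popq %0" : "=rm" (f));
              return f;
      }

      static void write_eflags(unsigned long f)
      {
              asm volatile("pushq %0 ; popfq" : : "rm" (f) : "cc", "memory");
      }

      static void clear_stray_nt(void)
      {
              unsigned long f = read_eflags();

              if (f & X86_EFLAGS_NT)              /* only rewrite if NT leaked in */
                      write_eflags(f & ~X86_EFLAGS_NT);
      }

      int main(void)
      {
              clear_stray_nt();
              printf("NT clear: %d\n", !(read_eflags() & X86_EFLAGS_NT));
              return 0;
      }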
    • [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder · adf14236
      Committed by Jan Beulich
      Current gcc generates calls, not jumps, to noreturn functions. When that
      happens the return address can point to the next function, which confuses
      the unwinder.
      
      This patch works around it by marking asynchronous exception frames, as
      opposed to normal call frames, in the unwind information, and then teaches
      the unwinder to decode this.
      
      For normal call frames the unwinder now subtracts one from the address,
      which avoids this problem.  The standard libgcc unwinder uses the same trick.
      
      It doesn't include adjustment of the printed address (i.e. for the original
      example, it'd still be kernel_math_error+0 that gets displayed, but the
      unwinder wouldn't get confused anymore).
      
      Unfortunately this only works with binutils 2.17+ and some of H.J. Lu's
      2.16 releases, because earlier binutils don't support .cfi_signal_frame.
      
      [AK: added automatic detection of the new binutils and wrote description]
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      adf14236
    • [PATCH] Remove all traces of signal number conversion · dd54a110
      Committed by Andi Kleen
      This was old code that was needed for iBCS, which x86-64 never supported.
      
      Pointed out by Albert Cahalan
      Signed-off-by: Andi Kleen <ak@suse.de>
      dd54a110
    • [PATCH] Don't synchronize time reading on single core AMD systems · 2049336f
      Committed by Andi Kleen
      We do some additional CPU synchronization in gettimeofday et al. to make
      sure the time stamps are always monotonic over multiple CPUs. But on
      single-core systems that is not needed, so don't do it.
      Signed-off-by: Andi Kleen <ak@suse.de>
      2049336f
    • [PATCH] Use string instructions for Core2 copy/clear · 27fbe5b2
      Committed by Andi Kleen
      It is faster than an unrolled loop for the use cases the kernel
      cares about (cached data, sizes typically < 4K); see the sketch after this entry.
      Signed-off-by: Andi Kleen <ak@suse.de>
      27fbe5b2
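      A minimal sketch of a string-instruction copy primitive (illustration only,
      not the kernel's actual copy_page/clear_page code), assuming x86-64 and GCC
      inline asm:

      #include <stddef.h>

      /* Copy n bytes (n assumed to be a multiple of 8) with REP MOVSQ.
       * On CPUs with fast string support this is competitive with, or faster
       * than, an unrolled loop for small cache-hot buffers. */
      static inline void copy_qwords(void *dst, const void *src, size_t n)
      {
              size_t qwords = n / 8;

              asm volatile("rep movsq"
                           : "+D" (dst), "+S" (src), "+c" (qwords)
                           : : "memory");
      }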
    • [PATCH] x86: - restore i8259A eoi status on resume · 35d534a3
      Committed by Matthew Garrett
      Got it: i8259A_resume calls init_8259A(0) unconditionally, even if
      auto_eoi has been set. Keep track of the current status and restore it on
      resume. This fixes it for AMD64 and i386 (see the sketch after this entry).
      Signed-off-by: Matthew Garrett <mjg59@srcf.ucam.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      35d534a3
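      A minimal sketch of the idea, with illustrative names rather than the
      kernel's exact symbols: remember which mode the PIC was last programmed
      with and reuse it on resume instead of hard-coding 0.

      /* Illustration only; the real logic lives in the kernel's i8259 driver. */
      static int i8259A_auto_eoi;              /* mode we were last initialised with */

      static void init_8259A(int auto_eoi)
      {
              i8259A_auto_eoi = auto_eoi;      /* remember for suspend/resume */
              /* ... program ICW1-ICW4, selecting auto-EOI when auto_eoi != 0 ... */
      }

      static int i8259A_resume(void)
      {
              init_8259A(i8259A_auto_eoi);     /* instead of init_8259A(0) */
              return 0;
      }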
    • [PATCH] Insert GART region into resource map · 56dd669a
      Committed by Aaron Durbin
      This patch inserts the GART region into the iomem resource map; the GART
      will then be visible within /proc/iomem. It also allows other users of the
      GART (AGP or an IOMMU) to sub-reserve the region (see the sketch after
      this entry).
      Signed-off-by: Aaron Durbin <adurbin@google.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      56dd669a
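      A hedged sketch of what such an insertion looks like in kernel code;
      aper_base and aper_size stand in for the detected aperture and are not the
      commit's exact variable names.

      #include <linux/ioport.h>
      #include <linux/types.h>

      static struct resource gart_resource = {
              .name  = "GART",
              .flags = IORESOURCE_MEM,
      };

      /* Call once the aperture base and size are known. */
      static void reserve_gart_region(u64 aper_base, u64 aper_size)
      {
              gart_resource.start = aper_base;
              gart_resource.end   = aper_base + aper_size - 1;
              insert_resource(&iomem_resource, &gart_resource); /* shows up in /proc/iomem */
      }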
    • [PATCH] Fix idle notifiers · a15da49d
      Committed by Andi Kleen
      Previously exit_idle would be called more often than enter_idle.
      
      Now, instead of using complicated tests, just keep track of the state with
      the per-CPU variable as a flip-flop (see the sketch after this entry).  I
      moved the idle state into the PDA to make the access more efficient.
      
      Original bug report and an initial patch from Stephane Eranian,
      redone by AK.
      
      Cc: Stephane Eranian <eranian@hpl.hp.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      a15da49d
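      A minimal user-space sketch of the flip-flop, with a thread-local flag
      standing in for the kernel's per-CPU PDA field:

      #include <stdio.h>

      static __thread int is_idle;     /* stands in for the per-CPU isidle flag */

      static void enter_idle(void)
      {
              is_idle = 1;
              /* ... run idle-start notifiers ... */
      }

      static void __exit_idle(void)
      {
              /* ... run idle-end notifiers ... */
      }

      static void exit_idle(void)
      {
              if (!is_idle)            /* never entered, or already exited */
                      return;
              is_idle = 0;
              __exit_idle();
      }

      int main(void)
      {
              enter_idle();
              exit_idle();
              exit_idle();             /* second call is now a harmless no-op */
              puts("enter/exit balanced");
              return 0;
      }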
    • [PATCH] Mark per cpu data initialization __initdata again · b38337a6
      Committed by Andi Kleen
      Before 2.6.16 this was changed to work around code that accessed
      CPUs not in the possible map. But that code should all be fixed now,
      so mark it __initdata again.
      Signed-off-by: Andi Kleen <ak@suse.de>
      b38337a6
    • [PATCH] Fix zeroing on exception in copy_*_user · 3022d734
      Committed by Andi Kleen
      - Don't zero for __copy_from_user_inatomic, following i386.
        This prevents spurious zeros for parallel file system writers when
        one takes an exception.
      - The string instruction version didn't zero the output on
        exception.  Oops.
      
      Also I cleaned up the code a bit while I was at it and added a minor
      optimization to the string instruction path (a sketch of the zeroing
      semantics follows this entry).
      Signed-off-by: Andi Kleen <ak@suse.de>
      3022d734
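      A hedged sketch of the required semantics (not the kernel's actual
      assembly): when a copy from user space faults partway through, the
      uncopied tail of the kernel buffer is zeroed so callers never see stale
      data. raw_copy() here is a hypothetical primitive that returns the number
      of bytes left uncopied.

      #include <string.h>

      /* Hypothetical low-level copy: returns bytes NOT copied on a fault. */
      unsigned long raw_copy(void *to, const void *from, unsigned long n);

      unsigned long copy_from_user_sketch(void *to, const void *from, unsigned long n)
      {
              unsigned long left = raw_copy(to, from, n);

              if (left)   /* faulted: zero the tail that was never written */
                      memset((char *)to + (n - left), 0, left);
              return left;
      }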
    • [PATCH] insert IOAPIC(s) and Local APIC into resource map · 54dbc0c9
      Committed by adurbin@google.com
      This patch places the IOAPIC(s) and the Local APIC specified by ACPI
      tables into the resource map. The APICs will then be visible within
      /proc/iomem.
      Signed-off-by: Aaron Durbin <adurbin@google.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      54dbc0c9
    • [PATCH] Fix a PDA warning uncovered by the new type checking · 7b0bda74
      Committed by Andi Kleen
      Fix
      linux/arch/x86_64/kernel/process.c: In function __switch_to:
      linux/arch/x86_64/kernel/process.c:626: warning: assignment makes integer from pointer without a cast
      Signed-off-by: Andi Kleen <ak@suse.de>
      7b0bda74
    • [PATCH] Fix a irqcount comment in entry.S · 96e54049
      Committed by Andi Kleen
      Signed-off-by: Andi Kleen <ak@suse.de>
      96e54049
    • [PATCH] Add the canary field to the PDA area and the task struct · 0a425405
      Committed by Arjan van de Ven
      This patch adds the per-thread cookie field to the task struct and the PDA.
      It also makes sure that the PDA gets the new cookie value at context
      switch, and that a new task gets a new cookie at task creation time.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andi Kleen <ak@suse.de>
      CC: Andi Kleen <ak@suse.de>
      0a425405
    • [PATCH] Don't use kernel_text_address in oops context · e8c7391d
      Committed by Andi Kleen
      Because it can take spinlocks.
      
      Suggested by Mathieu Desnoyers
      
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      e8c7391d
    • [PATCH] Avoid overwriting the current pgd (V4, x86_64) · 4bfaaef0
      Committed by Magnus Damm
      kexec: Avoid overwriting the current pgd (V4, x86_64)
      
      This patch upgrades the x86_64-specific kexec code to avoid overwriting the
      current pgd. Overwriting the current pgd is bad when CONFIG_CRASH_DUMP is used
      to start a secondary kernel that dumps the memory of the previous kernel.
      
      The code introduces a new set of page tables. These tables are used to provide
      an executable identity mapping without overwriting the current pgd.
      Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
      4bfaaef0
    • [PATCH] Remove most of the special cases for the debug IST stack · f5741644
      Committed by Keith Owens
      Remove most of the special cases for the debug IST stack.  This is a
      follow-on cleanup patch; it requires the bug-fix patch that adds
      orig_ist.
      Signed-off-by: Keith Owens <kaos@ocs.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      f5741644
    • [PATCH] Optimize PDA accesses slightly · 53ee11ae
      Committed by Andi Kleen
      Based on an idea by Jeremy Fitzhardinge:
      
      Replace the volatiles and memory clobbers in the PDA accessors by telling
      gcc about accesses to a proxy PDA structure that doesn't actually exist.
      The dummy accesses still give a defined ordering for read/write accesses
      (see the sketch after this entry).
      
      Also add some memory barriers to the early GS initialization to
      make sure no PDA access is moved before it.
      
      Advantage is some .text savings (probably most from better
      code for accessing "current"):
      
         text    data     bss     dec     hex filename
      4845647 1223688  615864 6685199  66020f vmlinux
      4837780 1223688  615864 6677332  65e354 vmlinux-pda
      
      1.2% smaller code
      
      Cc:  Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      53ee11ae
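      A hedged, simplified sketch of the trick (field names and the pda_read
      macro are illustrative, not the kernel's actual pda_from_op/pda_to_op
      macros, and %gs is assumed to point at the PDA, so this only makes sense
      in kernel context): the asm names a dummy "m" operand on a proxy object,
      so gcc sees a real data dependency and keeps PDA accesses ordered without
      volatile or a blanket "memory" clobber.

      #include <stddef.h>

      /* Illustrative per-CPU layout; the real kernel fields differ. */
      struct pda {
              unsigned long kernelstack;
              int cpunumber;
      };

      /* Never touched at run time: it only exists so gcc can model
       * read/write dependencies on individual PDA fields. */
      struct pda _proxy_pda;

      #define pda_read(field)                                           \
      ({                                                                \
              typeof(_proxy_pda.field) ret__;                           \
              asm("mov %%gs:%c1, %0"                                    \
                  : "=r" (ret__)                                        \
                  : "i" (offsetof(struct pda, field)),                  \
                    "m" (_proxy_pda.field));    /* dependency only */   \
              ret__;                                                    \
      })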
    • [PATCH] Put .note.* sections into a PT_NOTE segment · f2a9e1de
      Committed by Ian Campbell
      This patch updates x86_64 linker script to pack any .note.* sections
      into a PT_NOTE segment in the output file.
      
      To do this, we tell ld that we need a PT_NOTE segment.  This requires
      us to start explicitly mapping sections to segments, so we also need
      to explicitly create PT_LOAD segments for text and data, and map the
      sections to them appropriately.  Fortunately, each section will
      default to its previous section's segment, so it doesn't take many
      changes to vmlinux.lds.S.
      
      The corresponding change is already made for i386 in -mm and I'd like
      this patch to join it. The section-to-segment mappings do change, as do
      the segment flags, so some time in -mm would be good for that reason as
      well, just in case.
      
      In particular .data and .bss move from the text segment to the data
      segment, and .data.cacheline_aligned and .data.read_mostly are put in the
      data segment instead of a separate one.
      
      I think that it would be possible to exactly match the existing section
      to segment mapping and flags but it would be a more intrusive change and
      I'm not sure there is a reason for the existing layout other than it is
      what you get by default if you don't explicitly specify something else.
      If there is a reason for the existing layout then I will of course make
      the more intrusive change. If there is no reason we could probably drop
      the executable or writable flags from some segments but I don't know how
      much attention is paid to them anyway so it might not be worth the
      effort.
      
      The vsyscall related sections need to go in a different segment to the
      normal data segment and so I invented a "user" segment to contain them.
      I believe this should appear to be another data segment as far as the
      kernel is concerned so the flags are setup accordingly.
      
      The notes will be used in the Xen paravirt_ops backend to provide
      additional information to the domain builder. I am in the process of
      converting the xen-unstable kernels and tools over to this scheme at the
      moment to support this in the future.
      
      It has been suggested to me that the notes segment should have flags 0
      (i.e. not readable) since it is only used by the loader and is not used
      at runtime. For now I went with a readable segment since that is what
      the i386 patch uses.
      
      AK: dropped NOTES addition right now because the needed infrastructure
      for that is not merged yet
      Signed-off-by: Ian Campbell <ian.campbell@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      f2a9e1de
    • [PATCH] Reload CS when startup_64 is used. · 26374c7b
      Committed by Eric W. Biederman
      In long mode the %cs is largely a relic.  However there are a few cases
      like iret where it matters that we have a valid value.  Without this
      patch it is possible to enter the kernel in startup_64 without setting
      %cs to a valid value.  With this patch we don't care what %cs value
      we enter the kernel with, so long as the cs shadow register indicates
      it is a privileged code segment.
      
      Thanks to Magnus Damm for finding this problem and posting the
      first workable patch.  I have moved the jump that sets %cs down a
      few instructions so we don't need to take an extra jump, which
      keeps the code simpler.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      26374c7b
    • [PATCH] Remove non e820 fallbacks in high level code · 8380aabb
      Committed by Andi Kleen
      Drop support for non-e820 BIOS calls to get the memory map.
      
      The boot assembler code still has some support for them, but the C code
      no longer does.
      Signed-off-by: Andi Kleen <ak@suse.de>
      8380aabb
    • [PATCH] Add a missing check for irq flags tracing in NMI · 7a0a2dff
      Committed by Andi Kleen
      NMIs are not supposed to track the irq flags, but TRACE_IRQS_IRETQ
      did it anyway. Add a check.
      
      Cc: mingo@elte.hu
      Signed-off-by: Andi Kleen <ak@suse.de>
      7a0a2dff
    • [PATCH] Fix coding style and output of the mptable parser · aecc6361
      Committed by Andi Kleen
      Give the printks a consistent prefix.
      Add some missing white space.
      
      Cc: len.brown@intel.com
      Signed-off-by: Andi Kleen <ak@suse.de>
      aecc6361
    • [PATCH] Remove some cruft in apic id checking during processor setup · e4251e13
      Committed by Andi Kleen
      - Remove a define that was used only once.
      - Remove the too-large APIC ID check because we always support
        the full 8-bit range of APIC IDs.
      - Restructure the code a bit to be simpler.
      
      Cc: len.brown@intel.com
      Signed-off-by: Andi Kleen <ak@suse.de>
      e4251e13
    • [PATCH] Remove APIC version/cpu capability mpparse checking/printing · f2c2cca3
      Committed by Andi Kleen
      ACPI went to great trouble to get the APIC version and CPU capabilities
      of different CPUs before passing them to the mpparser. But all that data
      was used for was to print it out.  Actually it even faked some data based
      on the boot cpu, not on the actual CPU being booted.
      
      Remove all this code because it's not needed.
      
      Cc: len.brown@intel.com
      Signed-off-by: Andi Kleen <ak@suse.de>
      f2c2cca3
    • [PATCH] Remove safe_smp_processor_id() · 151f8cc1
      Committed by Andi Kleen
      And replace all users with ordinary smp_processor_id().  The function
      was originally added to get some basic oops information out even
      if the GS register was corrupted. However that didn't really work
      anymore, because printk is needed to print the oops and it already
      uses smp_processor_id(). Also GS register corruptions are not
      particularly common anymore.
      
      This also helps the Xen port which would otherwise need to
      do this in a special way because it can't access the local APIC.
      
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      151f8cc1
    • [PATCH] Detect clock skew during suspend · 34464a5b
      Committed by Rafael J. Wysocki
      Detect the situations in which the time after a resume from disk would
      be earlier than the time before the suspend and prevent them from
      happening on x86_64.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Andi Kleen <ak@suse.de>
      34464a5b
    • [PATCH] Don't force reserve the 640k-1MB range · dbf9272e
      Committed by Andi Kleen
      x86-64 inherited code from i386 that force-reserves the 640k-1MB area.
      That was needed on some old systems.
      
      But we generally trust the e820 map to be correct on 64-bit systems
      and mark all areas that are not memory correctly.
      
      This patch will allow the real memory in there to be used.
      
      Or rather, the only way to find out if it's still needed is to
      try.  So far I'm optimistic.
      Signed-off-by: Andi Kleen <ak@suse.de>
      dbf9272e
    • [PATCH] mark init_amd() as __cpuinit · ed77504b
      Committed by Magnus Damm
      The init_amd() function is only called from identify_cpu() which is already
      marked as __cpuinit. So let's mark it as __cpuinit.
      Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
      ed77504b
    • [PATCH] wire up oops_enter()/oops_exit() · abf0f109
      Committed by Andrew Morton
      Implement pause_on_oops() on x86_64.
      
      AK: I redid the patch to do the oops_enter/exit in the existing
      oops_begin()/end(). This makes it much shorter.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      abf0f109
    • [PATCH] non lazy "sleazy" fpu implementation · e07e23e1
      Committed by Arjan van de Ven
      Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
      context switch a trap is taken for the first FPU use to restore the FPU
      context lazily.  This is of course great for applications that have very
      sporadic or no FPU use (since then you avoid doing the expensive
      save/restore all the time).  However for very frequent FPU users...  you
      take an extra trap every context switch.
      
      The patch below adds a simple heuristic to this code: after 5 consecutive
      context switches of FPU use, the lazy behavior is disabled and the context
      gets restored every context switch.  If the app indeed uses the FPU, the
      trap is avoided (the chance of the 6th time slice using the FPU after the
      previous 5 having done so is obviously quite high).
      
      After 256 switches, this is reset and lazy behavior is resumed (until
      there are 5 consecutive ones again).  The reason for this is to give apps
      that do only longer bursts of FPU use the lazy behavior back after some
      time.  (A sketch of the heuristic follows this entry.)
      
      [akpm@osdl.org: place new task_struct field next to jit_keyring to save space]
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      e07e23e1
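      A minimal sketch of the heuristic with an illustrative counter field (the
      kernel's actual field and call sites differ):

      #include <stdint.h>
      #include <stdio.h>

      struct task {
              uint8_t fpu_counter;    /* consecutive time slices that used the FPU;
                                         wraps after 256, restoring lazy behaviour */
      };

      /* Called when the outgoing task is seen to have used the FPU this slice. */
      static void note_fpu_use(struct task *t)
      {
              t->fpu_counter++;       /* uint8_t wrap-around implements the reset */
      }

      /* At switch-in: preload FPU state eagerly only after 5 FPU-using slices,
       * so frequent FPU users skip the device-not-available trap. */
      static int preload_fpu(const struct task *t)
      {
              return t->fpu_counter > 5;
      }

      int main(void)
      {
              struct task t = { 0 };

              for (int i = 0; i < 6; i++)
                      note_fpu_use(&t);
              printf("preload after 6 slices: %d\n", preload_fpu(&t)); /* 1 */
              return 0;
      }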
    • [PATCH] Auto size the per cpu area. · ba4d40bb
      Committed by Eric W. Biederman
      Now for a completely different but trivial approach.
      I just boot tested it with 255 CPUS and everything worked.
      
      Currently we know at compile time about everything (except module data)
      we place in the per-cpu area.  So instead of allocating a fixed size for
      the per_cpu area, allocate the number of bytes we need plus a fixed
      constant to be used for modules (see the sketch after this entry).
      
      It isn't perfect but it is much less of a pain to
      work with than what we are doing now.
      
      AK: fixed warning
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      ba4d40bb
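      A hedged sketch of the sizing, assuming the usual linker-provided section
      boundary symbols; the module reserve constant is illustrative, not the
      value the commit uses.

      /* Provided by the linker script around the per-cpu data section. */
      extern char __per_cpu_start[], __per_cpu_end[];

      /* Illustrative slack for per-CPU data declared by modules. */
      #define PERCPU_MODULE_RESERVE   8192

      /* Bytes to allocate per CPU: actual compile-time usage plus module
       * slack, instead of one large fixed-size guess. */
      static unsigned long per_cpu_area_size(void)
      {
              return (unsigned long)(__per_cpu_end - __per_cpu_start)
                     + PERCPU_MODULE_RESERVE;
      }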
    • [PATCH] Make boot_param_data pure BSS · 2717941c
      Committed by Andi Kleen
      Since it's all zero.
      
      Actually I think gcc 4+ will do that automatically, but earlier compilers won't.
      Signed-off-by: Andi Kleen <ak@suse.de>
      2717941c
    • [PATCH] X86_64 monotonic_clock goes backwards · cbf9b4bb
      Committed by Dimitri Sivanich
      I've noticed some erratic behavior while testing the X86_64 version
      of monotonic_clock().
      
      While spinning in a loop reading monotonic clock values (pinned to a
      single cpu) I noticed that the difference between subsequent values
      occasionally went negative (time going backwards).
      
      I found that in the following code:
                      this_offset = get_cycles_sync();
                      /* FIXME: 1000 or 1000000? */
      -->             offset = (this_offset - last_offset)*1000 / cpu_khz;
              }
              return base + offset;
      
      the offset sometimes turns out to be 0, even though
      this_offset > last_offset (a worked example follows this entry).
      
      +Added fix From: Toyo Abe <toyoa@mvista.com>
      
      The x86_64-mm-monotonic-clock.patch in 2.6.18-rc4-mm2 made a change to
      the updating of monotonic_base. It now uses cycles_2_ns().
      
      I suggest that a set_cyc2ns_scale() should be done prior to the setup_irq().
      Because cycles_2_ns() can be called from the timer ISR right after the irq0
      is enabled.
      Signed-off-by: Toyo Abe <toyoa@mvista.com>
      Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      cbf9b4bb
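      A small runnable illustration of the truncation described above, with
      made-up numbers: on a 2 GHz CPU (cpu_khz = 2000000), a TSC delta of 1500
      cycles gives (1500 * 1000) / 2000000 == 0 ns, so successive readings can
      appear not to advance even though the TSC did.

      #include <stdio.h>

      int main(void)
      {
              unsigned long cpu_khz     = 2000000;        /* 2 GHz CPU */
              unsigned long last_offset = 1000000;        /* TSC at the last update */
              unsigned long this_offset = 1000000 + 1500; /* 1500 cycles later */

              /* Same integer arithmetic as the snippet quoted above: truncates
               * to 0 nanoseconds even though this_offset > last_offset. */
              unsigned long offset = (this_offset - last_offset) * 1000 / cpu_khz;

              printf("offset = %lu ns\n", offset);        /* prints: offset = 0 ns */
              return 0;
      }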
    • [PATCH] x86: error_code is not safe for kprobes · d28c4393
      Committed by Prasanna S.P
      This patch moves entry.S:error_entry to the .kprobes.text section:
      since code marked unsafe for kprobes jumps directly to entry.S::error_entry,
      it must be marked unsafe as well.
      This patch also changes all the ".previous.text" asm directives to ".previous"
      for the kprobes section.
      
      AK: Following a similar i386 patch from Chuck Ebbert
      AK: Also merged Jeremy's fix in.
      
      +From: Jeremy Fitzhardinge <jeremy@goop.org>
      
      KPROBE_ENTRY does a .section .kprobes.text, and expects its users to
      do a .previous at the end of the function.
      
      Unfortunately, if any code within the function switches sections, for
      example .fixup, then the .previous ends up putting all subsequent code
      into .fixup.  Worse, any subsequent .fixup code gets intermingled with
      the code it's supposed to be fixing (which is also in .fixup).  It's
      surprising this didn't cause more havoc.
      
      The fix is to use .pushsection/.popsection, so this stuff nests
      properly.  A further cleanup would be to get rid of all
      .section/.previous pairs, since they're inherently fragile.
      
      +From: Chuck Ebbert <76306.1226@compuserve.com>
      
      Because code marked unsafe for kprobes jumps directly to
      entry.S::error_code, that must be marked unsafe as well.
      The easiest way to do that is to move the page fault entry
      point to just before error_code and let it inherit the same
      section.
      
      Also moved all the ".previous" asm directives for kprobes
      sections to column 1 and removed ".text" from them.
      Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      d28c4393
    • [PATCH] Check for end of stack trace before falling back · be7a9170
      Committed by Andi Kleen
      Signed-off-by: Andi Kleen <ak@suse.de>
      be7a9170
    • [PATCH] Merge stacktrace and show_trace · c0b766f1
      Committed by Andi Kleen
      This unifies the standard backtracer and the new stacktrace-to-memory
      backtracer.  The standard one is converted to use callbacks, and
      stacktrace is then reimplemented on top of the new callbacks (see the
      sketch after this entry).
      
      The main advantage is that stacktrace can now use the new dwarf2 unwinder
      and avoid false positives in many cases.
      
      I kept it simple to make sure the standard backtracer stays reliable.
      
      Cc: mingo@elte.hu
      Signed-off-by: Andi Kleen <ak@suse.de>
      c0b766f1
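      A hedged sketch of the callback-driven shape this unification takes; the
      struct and the is_kernel_text() helper are illustrative, not the kernel's
      exact stacktrace_ops interface.

      /* The walker reports each plausible return address through a callback;
       * the printing backtracer and the save-to-memory stacktrace differ only
       * in the callback they pass in. */
      struct trace_ops {
              void (*address)(void *data, unsigned long addr);
      };

      /* Hypothetical helper: does addr look like kernel text? */
      extern int is_kernel_text(unsigned long addr);

      static void walk_stack(const unsigned long *stack, unsigned long nwords,
                             const struct trace_ops *ops, void *data)
      {
              for (unsigned long i = 0; i < nwords; i++)
                      if (is_kernel_text(stack[i]))
                              ops->address(data, stack[i]);
      }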
    • [PATCH] Don't access the APIC in safe_smp_processor_id when it is not mapped yet · b7f5e3c7
      Committed by Andi Kleen
      Lockdep can call the dwarf2 unwinder early, and the dwarf2 code
      uses safe_smp_processor_id which tries to access the local APIC page.
      But that doesn't work before the APIC code has set up its fixmap.
      
      Check for this case and always return the boot CPU then.
      
      Cc: jbeulich@novell.com
      Cc: mingo@elte.hu
      Signed-off-by: Andi Kleen <ak@suse.de>
      b7f5e3c7
    • [PATCH] x86: Some preparationary cleanup for stack trace · 5a1b3999
      Committed by Andi Kleen
      - Remove the unused all_contexts parameter; no caller used it.
      - Move the skip argument into the structure (needed for
        follow-on patches).
      
      Cc: mingo@elte.hu
      Signed-off-by: Andi Kleen <ak@suse.de>
      5a1b3999
    • M